13 - Bayesian Networks: Reasoning Over Time
Up until now, our Bayesian Networks have represented static snapshots of the world. However, many real-world environments are dynamic. The state of the world changes, and we often receive a sequence of observations over time.
To track a moving robot, recognize spoken speech, analyze financial markets, or diagnose an evolving disease, we must introduce time and sequencing into our models. The fundamental approach to this is using Markov Models and their more powerful extension, Hidden Markov Models (HMMs).
1. Markov Models (Markov Chains)
A Markov Model is essentially a chain-structured Bayesian Network. It models a system as a sequence of states: .
The Markov Assumption
To prevent the model from growing infinitely complex, we make the First-Order Markov Assumption: The conditional probability of the next state depends exclusively on the current state.
In plain English: The future is independent of the past given the present. Everything you need to know about the history of the system is encapsulated in the current state.
(Note: We can also define Higher-Order Markov Models where the next state depends on the last states, but standard models are first-order).
Defining a Markov Chain
A discrete-time, discrete-state Markov Chain is fully defined by:
- Initial State Probabilities (Prior): , the probability distribution of the starting state.
- Transition Probabilities (Dynamics): , how the state evolves over one time step.
- We typically make a Stationarity Assumption: the transition probabilities do not change over time. The rules of physics/dynamics remain constant.
The joint distribution of an entire sequence is simply the product of the prior and the chained transitions:
Passage of Time and Stationary Distributions
If we know the initial distribution , we can simulate forward to find the distribution at time using the Mini-Forward Algorithm:
What happens if we simulate a Markov Chain far into the future without ever observing the real world? Uncertainty accumulates. Eventually, we have no idea what the exact state is. For most chains, the distribution eventually stabilizes into a Stationary Distribution—a steady state where , entirely independent of the initial starting state .
2. Hidden Markov Models (HMMs)
Markov Chains are not very useful for AI agents because an agent's uncertainty simply grows to infinity without new information. Furthermore, agents rarely have the luxury of directly observing the true state of the world. Instead, they receive noisy measurements or effects, .
A Hidden Markov Model (HMM) solves this. It consists of an underlying, unobservable Markov Chain over states , accompanied by observable emissions (or effects) at each time step.
Defining an HMM
An HMM is defined by three components:
- Prior Probabilities:
- Transition Model:
- Emission Model: — The probability of receiving observation given that the true underlying state is .
Conditional Independence in HMMs
HMMs have two crucial independence properties:
- Markov Hidden Process: The future state depends on the past strictly via the present state.
- Sensor Independence: The current observation/emission is conditionally independent of everything else (past states, future states, past observations) given the current state .
The joint distribution is:
Examples of HMMs
- Speech Recognition: = Specific words/phonemes, = Acoustic audio signals.
- Robot Localization: = True coordinates on a map, = Noisy laser rangefinder readings.
3. Filtering and The Forward Algorithm
The most common task performed on an HMM is Filtering (or Monitoring). Filtering is the task of tracking the Belief State over time. The Belief State is the posterior distribution of the current state, given all evidence observed up to the current time:
As time progresses, our agent operates in a continuous two-step loop:
- Elapse Time (Passage of Time)
- Observe (Incorporate Evidence)
Step 1: Passage of Time (1-Step State Evolution)
As one unit of time passes (before we take a new measurement), the state of the world changes according to the transition dynamics. Because time passes, our uncertainty increases.
We calculate the intermediate belief , representing our prediction for time based only on evidence up to time :
Conceptually: We take our old belief, and push it forward through the transition matrix.
Step 2: Observation (1-Step Emission)
Now, we receive a new piece of evidence from our sensors. This new information allows us to decrease our uncertainty.
We update our intermediate prediction by reweighting it with the likelihood of the new observation, using Bayes' Rule.
Conceptually: We take our prediction, multiply it by the emission probability of what we just saw, and normalize the table to sum to 1. States where the observation is highly unlikely are suppressed; states where the observation is expected are boosted.
The Forward Algorithm
The Forward Algorithm simply chains these two operations together simultaneously in an online fashion. At every time step, it takes the previous belief, projects it forward through the transitions, multiplies it by the new emission likelihoods, and normalizes it.
By running this loop indefinitely, an agent can maintain a highly accurate, constantly updating belief about its location, the spoken word, or the state of the world, strictly by maintaining a single probability distribution array in memory.
Summary
- Markov Chains represent sequences of states over time using transition probabilities, bounded by the Markov Assumption (future independent of past given present).
- In a pure Markov Chain without observations, uncertainty accumulates until the system reaches a stationary distribution.
- Hidden Markov Models (HMMs) augment Markov Chains with an Emission Model, acknowledging that the true state is hidden and only noisy sensor readings are available.
- Filtering is the process of tracking the agent's Belief State.
- The Forward Algorithm performs filtering via a continuous two-step process: Elapsing time (which increases uncertainty by pushing the belief through transition dynamics) and Observing (which decreases uncertainty by reweighting the belief according to the emission likelihoods).