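A Markov process is one in which the future depends only on the present state, not on the past. Let
$$P(x,t\mid x_0,t_0)$$
denote the probability density of transitioning from state $x_0$ at time $t_0$ to state $x$ at time $t$. The Chapman–Kolmogorov equation tells us that long-time transitions can be computed by “summing over” intermediate states. For three times $t_0<t_1<t_2$, the equation reads:
$$P(x,t_2\mid x_0,t_0)=\int P(x,t_2\mid y,t_1)\,P(y,t_1\mid x_0,t_0)\,dy.$$
Law of Total Probability:
To compute the probability of being in state $x$ at time $t_2$ given $x_0$ at time $t_0$, we condition on an intermediate state $y$ at time $t_1$:
$$P(x,t_2\mid x_0,t_0)=\int P(x,t_2\mid y,t_1;\,x_0,t_0)\,P(y,t_1\mid x_0,t_0)\,dy.$$
By the Markov property, the first factor does not depend on $(x_0,t_0)$, which gives the Chapman–Kolmogorov equation.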
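As a quick numerical sanity check, the sketch below evaluates both sides of the Chapman–Kolmogorov equation for one concrete Markov process. It assumes a driftless Brownian motion with a Gaussian transition density; the function names, the grid, and all parameter values are illustrative choices, not part of the derivation. The integral over the intermediate state y is replaced by a simple Riemann sum.

```python
import numpy as np

def gaussian_kernel(x, t, x0, t0, sigma=1.0):
    """Transition density of driftless Brownian motion (illustrative assumption)."""
    var = sigma**2 * (t - t0)
    return np.exp(-(x - x0) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def chapman_kolmogorov_rhs(x, t2, x0, t0, t1, y_grid):
    """Riemann sum of P(x,t2|y,t1) * P(y,t1|x0,t0) over the intermediate state y."""
    integrand = gaussian_kernel(x, t2, y_grid, t1) * gaussian_kernel(y_grid, t1, x0, t0)
    dy = y_grid[1] - y_grid[0]
    return np.sum(integrand) * dy

# Intermediate states y; the grid is wide enough that the Gaussian tails are negligible.
y_grid = np.linspace(-12.0, 12.0, 4001)
x0, t0, t1, t2, x = 0.0, 0.0, 0.7, 1.5, 0.4

direct = gaussian_kernel(x, t2, x0, t0)                             # left-hand side
via_intermediate = chapman_kolmogorov_rhs(x, t2, x0, t0, t1, y_grid)  # right-hand side
print(direct, via_intermediate)
```

The two printed numbers should agree to many digits; moving t1 anywhere between t0 and t2 should not change the result.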
2.3. Defining Moments and Kramers–Moyal Coefficients
Define the $n$th moment over the small interval $\Delta t$ as:
$$M^{(n)}(y,t,\Delta t)=\int (x-y)^n\,p(x,t+\Delta t\mid y,t)\,dx.$$
Assuming these moments scale linearly with $\Delta t$, we define the Kramers–Moyal coefficients as:
$$D^{(n)}(y,t)=\lim_{\Delta t\to 0}\frac{1}{n!\,\Delta t}\,M^{(n)}(y,t,\Delta t).$$
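The definitions above can be turned into a rough Monte Carlo estimator. The sketch below is only an illustration under an assumption that is not made in the text at this point: the short-time transition density is taken to be Gaussian with mean y + μ(y)Δt and variance σ²(y)Δt. The names `km_coefficients`, `mu`, and `sigma` are hypothetical. With the 1/n! convention used here, the second coefficient should come out near σ²/2.

```python
import numpy as np

def km_coefficients(y, dt, mu, sigma, n_samples=1_000_000, seed=0):
    """Estimate D1 and D2 at state y from samples of one short step of length dt."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    # Assumed short-time transition: Gaussian step started at y.
    x = y + mu(y) * dt + sigma(y) * np.sqrt(dt) * z
    m1 = np.mean(x - y)          # first moment M1
    m2 = np.mean((x - y) ** 2)   # second moment M2
    d1 = m1 / dt                 # M1 / (1! * dt)
    d2 = m2 / (2.0 * dt)         # M2 / (2! * dt)
    return d1, d2

mu = lambda y: -0.5 * y   # hypothetical drift
sigma = lambda y: 0.8     # hypothetical constant noise intensity

d1, d2 = km_coefficients(y=1.0, dt=1e-3, mu=mu, sigma=sigma)
print(d1, d2)  # roughly mu(1.0) = -0.5 and sigma^2 / 2 = 0.32
```

The estimate of the first coefficient is much noisier than that of the second, because the drift contribution of order Δt is small compared with the random part of order √Δt.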
2.4. Fokker–Planck Equation
Inserting the Taylor expansion into the integrated Chapman–Kolmogorov equation, subtracting the zeroth-order term, dividing by Δt, and letting Δt→0, we obtain:
$$\frac{\partial}{\partial t}p(x,t)=\sum_{n=1}^{\infty}(-1)^n\frac{\partial^n}{\partial x^n}\left[D^{(n)}(x,t)\,p(x,t)\right].$$
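In many applications, the coefficients $D^{(n)}$ for $n\ge 3$ vanish or are negligible. Truncating at $n=2$ yields the Fokker–Planck equation:
$$\frac{\partial}{\partial t}p(x,t)=-\frac{\partial}{\partial x}\left[D^{(1)}(x,t)\,p(x,t)\right]+\frac{\partial^2}{\partial x^2}\left[D^{(2)}(x,t)\,p(x,t)\right].$$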
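One way to build confidence in the truncated equation is to check it on a case with a known solution. The sketch below assumes constant coefficients D(1) = μ and D(2) = σ²/2, for which the Gaussian density with mean μt and variance σ²t is a solution, and compares the two sides with central finite differences. The parameter values and the helper name `density` are illustrative.

```python
import numpy as np

MU, SIGMA = 0.3, 0.8           # illustrative constants
D1, D2 = MU, 0.5 * SIGMA**2    # Kramers-Moyal coefficients for this example

def density(x, t):
    """Gaussian density with mean MU*t and variance SIGMA^2*t."""
    var = SIGMA**2 * t
    return np.exp(-(x - MU * t) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

x = np.linspace(-3.0, 3.0, 601)
h = x[1] - x[0]
t, dt = 1.0, 1e-5

p = density(x, t)
# Central differences on the interior grid points.
dp_dt = (density(x[1:-1], t + dt) - density(x[1:-1], t - dt)) / (2.0 * dt)
dp_dx = (p[2:] - p[:-2]) / (2.0 * h)
d2p_dx2 = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / h**2

# Right-hand side: -d/dx[D1 p] + d^2/dx^2[D2 p], with constant D1 and D2.
rhs = -D1 * dp_dx + D2 * d2p_dx2
residual = np.max(np.abs(dp_dt - rhs))
print(residual)  # small, limited by the finite-difference error
```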
3.1. Real-World Example: Brownian Motion of a Pollen Grain
Imagine you are observing a tiny pollen grain suspended in water. The grain is bombarded by water molecules, and these collisions cause it to move in a seemingly random way. This erratic motion is called Brownian motion.
1. Discrete Modeling
Suppose you record the position of the pollen grain at discrete time intervals of length Δt. At each time step, the grain’s position changes due to:
Drift: There might be a very slight overall current in the water, which gives a predictable, small shift.
Random Kicks (Diffusion): The collisions with water molecules produce random displacements.
A discrete update of the position $X_t$ can be written as:
$$X_{t+\Delta t}=X_t+\mu(X_t,t)\,\Delta t+\sigma(X_t,t)\,\sqrt{\Delta t}\,Z,$$
where:
$X_t$ is the pollen grain’s position at time $t$.
$\mu(X_t,t)$ represents any systematic drift (for example, due to a gentle water current).
$\sigma(X_t,t)$ represents the intensity of the random collisions.
$Z$ is a standard normal random variable, $Z\sim\mathcal{N}(0,1)$.
In this context, the term $\mu(X_t,t)\,\Delta t$ models the small, steady displacement due to the current, and the term $\sigma(X_t,t)\,\sqrt{\Delta t}\,Z$ models the random displacements caused by molecular collisions.
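The discrete update can be simulated directly; this scheme is commonly known as the Euler–Maruyama method. The sketch below is a minimal version, assuming a constant drift and a constant noise intensity for the pollen grain (the values 0.2 and 0.5, and the function name `simulate_paths`, are made up for illustration).

```python
import numpy as np

def simulate_paths(x0, mu, sigma, dt, n_steps, n_paths, seed=0):
    """Apply X <- X + mu*dt + sigma*sqrt(dt)*Z repeatedly to many independent grains."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    t = 0.0
    for _ in range(n_steps):
        z = rng.standard_normal(n_paths)  # fresh N(0,1) kick for every grain
        x = x + mu(x, t) * dt + sigma(x, t) * np.sqrt(dt) * z
        t += dt
    return x

mu = lambda x, t: 0.2 * np.ones_like(x)     # gentle constant current (assumed)
sigma = lambda x, t: 0.5 * np.ones_like(x)  # constant collision intensity (assumed)

final = simulate_paths(0.0, mu, sigma, dt=0.01, n_steps=100, n_paths=50_000)
print(final.mean(), final.var())  # close to 0.2 and 0.25 at total time 1
```

With constant coefficients the final positions are Gaussian with mean μT and variance σ²T, which gives an easy target to compare against.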
2. Variance and Scaling
Because Z is normally distributed with mean 0 and variance 1, the variance of the random term is:
$$\operatorname{Var}\!\left[\sigma(X_t,t)\,\sqrt{\Delta t}\,Z\right]=\sigma^2(X_t,t)\,\Delta t.$$
This shows that over a short time interval Δt, the variance of the displacement is proportional to Δt, which is a hallmark of Brownian motion.
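The scaling is easy to confirm empirically. The short sketch below assumes a constant σ = 0.5 (an illustrative value) and measures the variance of the random kick for several step sizes; the ratio of variance to Δt should stay near σ² regardless of Δt.

```python
import numpy as np

SIGMA = 0.5  # illustrative constant noise intensity
rng = np.random.default_rng(1)
z = rng.standard_normal(400_000)  # samples of Z ~ N(0, 1)

def variance_ratio(dt):
    """Sample variance of the kick sigma*sqrt(dt)*Z, divided by dt."""
    kick = SIGMA * np.sqrt(dt) * z
    return kick.var() / dt

ratios = {dt: variance_ratio(dt) for dt in (0.001, 0.01, 0.1)}
for dt, ratio in ratios.items():
    print(dt, ratio)  # each ratio is close to SIGMA**2 = 0.25
```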
3. Taking the Continuous-Time Limit
As $\Delta t\to 0$, any fixed time window contains more and more steps, each of them smaller. In the limit, by the central limit theorem (more precisely, Donsker’s invariance principle), the cumulative effect of the random displacements $\sqrt{\Delta t}\,Z$ converges to a continuous-time Brownian motion $W_t$. Therefore, the discrete update
$$X_{t+\Delta t}=X_t+\mu(X_t,t)\,\Delta t+\sigma(X_t,t)\,\sqrt{\Delta t}\,Z$$
transforms into the stochastic differential equation (SDE):
$$dX_t=\mu(X_t,t)\,dt+\sigma(X_t,t)\,dW_t.$$
Here,
The term $\mu(X_t,t)\,dt$ still represents the drift (the effect of the current in the water).
The term $\sigma(X_t,t)\,dW_t$ represents the random fluctuations (the effect of molecular collisions), with the important property that
$$(dW_t)^2=dt.$$
3.2. The Setup
Suppose that $X_t$ satisfies the stochastic differential equation (SDE):
$$dX_t=\mu(X_t,t)\,dt+\sigma(X_t,t)\,dW_t,$$
where:
$\mu(X_t,t)$ is the drift,
$\sigma(X_t,t)$ is the diffusion coefficient,
$dW_t$ is an increment of standard Brownian motion, with
$$\mathbb{E}[dW_t]=0,\quad (dW_t)^2=dt,\quad dt\,dW_t=0,\quad dt^2=0.$$
Let $f(x,t)$ be a function in $C^{1,2}$ (i.e., once continuously differentiable in $t$ and twice continuously differentiable in $x$). We want to compute $df(X_t,t)$.
3.3. Derivation
Taylor Expansion:
Expand $f(X_t+dX_t,t+dt)$ around $(X_t,t)$:
$$df=f_t(X_t,t)\,dt+f_x(X_t,t)\,dX_t+\tfrac{1}{2}f_{xx}(X_t,t)\,(dX_t)^2+\text{higher-order terms}.$$
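The expansion can be carried one step further. The following is a sketch of the standard continuation, using only the SDE and the multiplication rules from the setup (dt² = 0, dt dW = 0, (dW)² = dt); all functions are evaluated at (X_t, t). The first line computes the squared increment, the second substitutes it together with dX_t into the Taylor expansion and drops the higher-order terms, giving the usual statement of Itô’s lemma.

```latex
\begin{aligned}
(dX_t)^2 &= \mu^2\,dt^2 + 2\mu\sigma\,dt\,dW_t + \sigma^2\,(dW_t)^2 = \sigma^2\,dt,\\[4pt]
df &= f_t\,dt + f_x\left(\mu\,dt + \sigma\,dW_t\right) + \tfrac{1}{2}f_{xx}\,\sigma^2\,dt\\
   &= \left(f_t + \mu f_x + \tfrac{1}{2}\sigma^2 f_{xx}\right)dt + \sigma f_x\,dW_t .
\end{aligned}
```

The extra term involving the second derivative is what distinguishes this result from the ordinary chain rule; it survives because (dW)² is of order dt rather than dt².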
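Suppose you have a function $\sigma(t)$ (possibly random, but non-anticipative, meaning it depends only on information available up to time $t$) and wish to integrate it with respect to Brownian motion $W_t$. The Itô integral is defined by the following construction: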
Partitioning the Time Interval:
Divide the interval $[0,t]$ into small subintervals:
$$0=t_0<t_1<\cdots<t_n=t.$$
Forming the Riemann Sum:
Let $\Delta W_i=W_{t_{i+1}}-W_{t_i}$. Then approximate the integral as:
$$S_n=\sum_{i=0}^{n-1}\sigma(t_i)\,\Delta W_i.$$
The evaluation of $\sigma(t_i)$ at the left endpoint ensures the integral is non-anticipative.
Taking the Limit:
As the partition gets finer (the largest subinterval shrinks to zero), the sum converges (in the mean-square sense) to the Itô integral:
$$\int_0^t\sigma(s)\,dW_s=\lim_{n\to\infty}S_n.$$
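The construction can be tried out on a case with a known answer. The sketch below takes the integrand to be the Brownian path itself, σ(t) = W_t, for which the Itô integral over [0, t] equals (W_t² − t)/2. It forms the left-endpoint sum on one simulated path and, for contrast, a right-endpoint sum that peeks at the future increment. The function name `ito_sum` and the step count are illustrative choices.

```python
import numpy as np

def ito_sum(integrand_at_grid, dW):
    """Left-endpoint sum: sum_i sigma(t_i) * (W_{t_{i+1}} - W_{t_i})."""
    return np.sum(integrand_at_grid[:-1] * dW)

rng = np.random.default_rng(2)
t_final, n = 1.0, 200_000
dt = t_final / n

dW = rng.standard_normal(n) * np.sqrt(dt)    # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))   # W at t_0, ..., t_n

ito = ito_sum(W, dW)                         # integrand evaluated at left endpoints
closed_form = 0.5 * (W[-1] ** 2 - t_final)   # known value of the Ito integral of W dW
right_endpoint = np.sum(W[1:] * dW)          # anticipating sum, for comparison only

print(ito, closed_form)
print(right_endpoint - ito)                  # close to t_final, not to zero
```

The gap between the two sums is the sum of squared increments, which does not vanish as the partition is refined. That is why the choice of evaluation point matters for stochastic integrals, unlike for ordinary Riemann sums.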
An Itô process combines a drift part and a diffusion part:
$$X_t=X_0+\int_0^t\mu(s)\,ds+\int_0^t\sigma(s)\,dW_s.$$
The drift term $\int_0^t\mu(s)\,ds$ is a standard Lebesgue integral.
The diffusion term $\int_0^t\sigma(s)\,dW_s$ is the Itô integral.
4.3. Some Key Properties
Continuity:
The process $X_t$ has continuous sample paths (under suitable conditions on $\mu$ and $\sigma$).
Quadratic Variation:
The quadratic variation is contributed solely by the diffusion part:
$$\langle X\rangle_t=\int_0^t\sigma^2(s)\,ds.$$
Martingale Component:
Once the drift is removed, the remaining diffusion part $\int_0^t\sigma(s)\,dW_s$ is a martingale (under suitable integrability conditions on $\sigma$).
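The quadratic variation property can be checked on a simulated path. The sketch below assumes a constant drift μ = 2 and a deterministic σ(s) = 1 + 0.5s (both made up for illustration), builds one discretized Itô process, and compares the sum of squared increments with a Riemann sum for the integral of σ². It also shows that a path driven by the drift alone has essentially zero quadratic variation.

```python
import numpy as np

def quadratic_variation(path):
    """Sum of squared increments along a discretized path."""
    return np.sum(np.diff(path) ** 2)

rng = np.random.default_rng(3)
t_final, n = 1.0, 100_000
dt = t_final / n
t = np.linspace(0.0, t_final, n + 1)

mu_vals = 2.0 * np.ones(n)        # assumed constant drift
sigma_vals = 1.0 + 0.5 * t[:-1]   # assumed deterministic sigma(s), left endpoints
dW = rng.standard_normal(n) * np.sqrt(dt)

# Discretized Ito process: X_t = X_0 + sum(mu ds) + sum(sigma dW), with X_0 = 0.
X = np.concatenate(([0.0], np.cumsum(mu_vals * dt + sigma_vals * dW)))
drift_path = np.concatenate(([0.0], np.cumsum(mu_vals * dt)))

qv = quadratic_variation(X)
expected = np.sum(sigma_vals**2) * dt   # Riemann sum for the integral of sigma^2
qv_drift = quadratic_variation(drift_path)

print(qv, expected)   # these two agree up to sampling error
print(qv_drift)       # tiny: the drift contributes nothing in the limit
```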
5. Diffusion Models: Putting It All Together
Diffusion models use these concepts to describe how data is gradually corrupted by noise and then recovered.
Forward Process (Noising):
Starting with a data sample $x_0$, noise is gradually added by evolving it according to an Itô process. The evolution of the probability density $p(x,t)$ is governed by the Fokker–Planck equation (obtained by truncating the Kramers–Moyal expansion), with $D^{(1)}=\mu$ and $D^{(2)}=\sigma^2/2$.
Reverse Process (Denoising):
To generate or recover data, the process is reversed. The reverse-time stochastic differential equation, derived using time-reversal techniques and Itô’s lemma, uses the gradient of the log-density (known as the score function) to guide a noisy sample back to the data distribution.
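A toy version of the forward and reverse processes can be written down when the score is known in closed form. The sketch below is not how practical diffusion models work (they learn the score with a neural network); it assumes one-dimensional Gaussian data N(2, 0.5²) and a forward process dX = −½βX dt + √β dW, for which the marginal density at every time is Gaussian. It then integrates the reverse-time SDE, in the form dX = [μ − σ² ∂ₓ log p] dt + σ dW̄ for state-independent σ, backward from a standard normal sample. All names and parameter values are illustrative.

```python
import numpy as np

BETA = 1.0          # noise rate of the assumed forward process
M0, S0 = 2.0, 0.5   # toy data distribution N(M0, S0^2)

def marginal_mean_var(t):
    """Mean and variance of the forward marginal p(x, t) for Gaussian data."""
    decay = np.exp(-BETA * t)
    return M0 * np.sqrt(decay), S0**2 * decay + (1.0 - decay)

def score(x, t):
    """Gradient of log p(x, t); analytic here because p is Gaussian."""
    mean, var = marginal_mean_var(t)
    return -(x - mean) / var

def reverse_sample(n_paths=20_000, t_final=5.0, n_steps=1000, seed=4):
    """Euler steps of the reverse-time SDE, run from t_final down to 0."""
    rng = np.random.default_rng(seed)
    dt = t_final / n_steps
    x = rng.standard_normal(n_paths)   # start from the N(0, 1) prior
    t = t_final
    for _ in range(n_steps):
        forward_drift = -0.5 * BETA * x
        z = rng.standard_normal(n_paths)
        # Step backward in time: subtract (drift - sigma^2 * score) * dt, add fresh noise.
        x = x - (forward_drift - BETA * score(x, t)) * dt + np.sqrt(BETA * dt) * z
        t -= dt
    return x

samples = reverse_sample()
print(samples.mean(), samples.std())  # close to M0 = 2.0 and S0 = 0.5
```

Starting from pure noise, the score term pulls the samples back toward the region where the data density is high, so the final samples should approximately follow the original data distribution.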