A Beginner's Guide to Diffusion Models

1. The Chapman–Kolmogorov Equation
- 1.1. What Is It?
- 1.2. Step-by-Step Derivation
2. The Kramers–Moyal Expansion
3. Itô’s Lemma
4. Constructing an Itô Process
5. Diffusion Models: Putting It All Together

1. The Chapman–Kolmogorov Equation

1.1. What Is It?

A Markov process is one whose future depends only on its present state, not on its past. Let

$$P(x, t \mid x_0, t_0)$$

denote the transition probability density: the density of finding the process in state $x$ at time $t$, given that it was in state $x_0$ at time $t_0$. The Chapman–Kolmogorov equation says that a transition over a long interval can be computed by “summing over” every state the process could occupy at an intermediate time. For three times $t_0 < t_1 < t_2$, the equation reads:

$$P(x, t_2 \mid x_0, t_0) = \int_{-\infty}^{\infty} P(x, t_2 \mid y, t_1)\,P(y, t_1 \mid x_0, t_0)\,dy.$$
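The equation can be checked numerically for a process whose transition density is known in closed form. The sketch below assumes standard Brownian motion, whose transition density is Gaussian with variance equal to the elapsed time; the function name `gaussian_kernel` and the specific times and states are illustrative choices, not part of the text above.

```python
import numpy as np

def gaussian_kernel(x, y, dt, sigma=1.0):
    """Transition density of Brownian motion: state x at time t+dt given y at time t."""
    var = sigma**2 * dt
    return np.exp(-(x - y) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

x0, t0, t1, t2 = 0.0, 0.0, 0.7, 1.5   # arbitrary illustrative values
x = 0.4                                # target state at time t2

# Right-hand side: integrate over the intermediate state y on a wide grid.
y = np.linspace(-12.0, 12.0, 4001)
dy = y[1] - y[0]
integrand = gaussian_kernel(x, y, t2 - t1) * gaussian_kernel(y, x0, t1 - t0)
rhs = np.sum(integrand) * dy

# Left-hand side: the direct transition density from t0 to t2.
lhs = gaussian_kernel(x, x0, t2 - t0)

print(f"direct: {lhs:.6f}, via intermediate state: {rhs:.6f}")
```

The two numbers should agree up to the small error of the numerical integration, whichever intermediate time $t_1$ is chosen.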

1.2. Step-by-Step Derivation

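Law of Total Probability:
To compute the probability density of being in state $x$ at time $t_2$ given $x_0$ at time $t_0$, we marginalize over the intermediate state $y$ at time $t_1$ (here $\Pr$ denotes a density when the state space is continuous):
$\Pr(X_{t_2} = x \mid X_{t_0} = x_0) = \int \Pr(X_{t_2} = x,\, X_{t_1} = y \mid X_{t_0} = x_0)\,dy.$
Product Rule:
The joint density inside the integral factors as
$\Pr(X_{t_2} = x,\, X_{t_1} = y \mid X_{t_0} = x_0) = \Pr(X_{t_2} = x \mid X_{t_1} = y,\, X_{t_0} = x_0)\,\Pr(X_{t_1} = y \mid X_{t_0} = x_0).$
Markov Property:
Since the process is Markovian, the first factor does not depend on $x_0$:
$\Pr(X_{t_2} = x \mid X_{t_1} = y,\, X_{t_0} = x_0) = \Pr(X_{t_2} = x \mid X_{t_1} = y),$
so, substituting into the integral, we can write:
$P(x, t_2 \mid x_0, t_0) = \int P(x, t_2 \mid y, t_1)\,P(y, t_1 \mid x_0, t_0)\,dy.$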
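Discrete Case:
In a discrete state space, the integral becomes a sum. Writing $P_{ij}^{(n)}$ for the probability of moving from state $i$ to state $j$ in $n$ steps of a time-homogeneous chain:
$P_{ij}^{(n+m)} = \sum_{k} P_{ik}^{(n)}\,P_{kj}^{(m)}.$
This is a matrix product, $P^{(n+m)} = P^{(n)} P^{(m)}$: multi-step transition probabilities are obtained by multiplying shorter-step transition matrices.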
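The discrete case is easy to verify directly. The sketch below uses a made-up 3-state transition matrix (the numbers are purely illustrative) and compares the $(n+m)$-step matrix with the sum over intermediate states $k$.

```python
import numpy as np

# Hypothetical one-step transition matrix of a 3-state chain (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

n, m = 2, 3
P_n = np.linalg.matrix_power(P, n)       # n-step transition probabilities
P_m = np.linalg.matrix_power(P, m)       # m-step transition probabilities
P_nm = np.linalg.matrix_power(P, n + m)  # (n+m)-step transition probabilities

# Chapman-Kolmogorov: summing over the intermediate state k is a matrix product.
ck_sum = np.einsum("ik,kj->ij", P_n, P_m)
print(np.allclose(P_nm, ck_sum))
```

The explicit `einsum` over the index `k` mirrors the sum in the formula; `P_n @ P_m` would compute the same thing.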

2. The Kramers–Moyal Expansion

2.1. From Chapman–Kolmogorov to a Differential Equation

For continuous processes, we study how the probability density $p(x,t)$ evolves over a small time interval $\Delta t$. (From here on the transition density is written with a lowercase $p$.) Starting with the Chapman–Kolmogorov equation for the times $t_0 < t < t + \Delta t$:

$$p(x, t+\Delta t \mid x_0, t_0) = \int_{-\infty}^{\infty} p(x, t+\Delta t \mid y, t)\,p(y, t \mid x_0, t_0)\,dy,$$

we probe the evolution by multiplying both sides by a smooth, compactly supported test function $\phi(x)$ and integrating over $x$.

2.2. Taylor Expansion of the Test Function

For a fixed intermediate state $y$, we expand $\phi(x)$ about $x = y$:

$$\phi(x) = \phi(y) + (x-y)\,\phi'(y) + \frac{(x-y)^2}{2}\,\phi''(y) + \cdots.$$

Substituting this expansion into the inner integral yields:

$$\int \phi(x)\,p(x, t+\Delta t \mid y, t)\,dx = \phi(y) \underbrace{\int p(x, t+\Delta t \mid y, t)\,dx}_{=1} + \phi'(y) \int (x-y)\,p(x, t+\Delta t \mid y, t)\,dx + \frac{\phi''(y)}{2} \int (x-y)^2\,p(x, t+\Delta t \mid y, t)\,dx + \cdots.$$

2.3. Defining Moments and Kramers–Moyal Coefficients

Define the $n$th moment of the displacement $x - y$ over the small interval $\Delta t$ as:

$$M^{(n)}(y,t,\Delta t) = \int (x-y)^n\,p(x, t+\Delta t \mid y, t)\,dx.$$

Assuming these moments scale linearly with $\Delta t$ to leading order (so that the limit below exists), we define the Kramers–Moyal coefficients as:

$$D^{(n)}(y,t) = \lim_{\Delta t \to 0} \frac{1}{n!\,\Delta t}\,M^{(n)}(y,t,\Delta t).$$
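These coefficients can be estimated from samples of a single short step. The sketch below is an illustration under stated assumptions: the process is taken to have a hypothetical linear drift $-\theta y$ and constant noise intensity $\sigma$, and one step is simulated with a Gaussian increment (the discrete update introduced in Section 3). The helper name `km_coefficient` is made up for this example.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

theta, sigma = 1.0, 1.0                 # hypothetical drift rate and noise intensity
y, dt, n_samples = 1.0, 0.01, 400_000   # starting state, small time step, sample count

# Many independent copies of one short step, all started at the same state y.
z = rng.standard_normal(n_samples)
x = y + (-theta * y) * dt + sigma * np.sqrt(dt) * z

def km_coefficient(n):
    """Estimate D^(n)(y) as M^(n) / (n! * dt) from the sampled displacements."""
    moment = np.mean((x - y) ** n)
    return moment / (math.factorial(n) * dt)

for n in (1, 2, 3, 4):
    print(n, km_coefficient(n))
```

Under these assumptions the first estimate should be close to $-\theta y$, the second close to $\sigma^2/2$, and the third and fourth small (of order $\Delta t$), which is the pattern that justifies truncating the expansion after two terms.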

2.4. Fokker–Planck Equation

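Inserting the Taylor expansion into the integrated Chapman–Kolmogorov equation, subtracting the zeroth-order term, dividing by $\Delta t$, and letting $\Delta t \to 0$ gives an equation for $\int \phi(x)\,\partial_t p(x,t)\,dx$ in terms of the derivatives of $\phi$. Integrating by parts $n$ times moves each derivative from $\phi$ onto $D^{(n)} p$ (producing the factor $(-1)^n$), and since $\phi$ is arbitrary we obtain the Kramers–Moyal expansion, where $p(x,t)$ is shorthand for $p(x, t \mid x_0, t_0)$:

$$\frac{\partial p(x,t)}{\partial t} = \sum_{n=1}^{\infty} (-1)^n \frac{\partial^n}{\partial x^n} \Bigl[D^{(n)}(x,t)\,p(x,t)\Bigr].$$

In many applications, the coefficients $D^{(n)}$ for $n \ge 3$ vanish or are negligible; for a process driven by Brownian motion, as in Section 3, they vanish in the limit $\Delta t \to 0$. Truncating at $n=2$ yields the Fokker–Planck equation:

$$\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\Bigl[D^{(1)}(x,t)\,p(x,t)\Bigr] + \frac{\partial^2}{\partial x^2}\Bigl[D^{(2)}(x,t)\,p(x,t)\Bigr].$$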
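As a sanity check, the simplest case can be tested numerically. The sketch below assumes zero drift ($D^{(1)} = 0$) and a constant $D^{(2)} = D$, for which the Gaussian density with variance $2Dt$ is a known solution, and compares both sides of the Fokker–Planck equation using finite differences. The grid, step sizes, and value of `D` are arbitrary illustrative choices.

```python
import numpy as np

D = 0.5   # constant diffusion coefficient D^(2); the drift D^(1) is taken to be 0

def p(x, t):
    """Gaussian density with mean 0 and variance 2*D*t."""
    return np.exp(-x**2 / (4.0 * D * t)) / np.sqrt(4.0 * np.pi * D * t)

x = np.linspace(-3.0, 3.0, 13)   # points at which the equation is checked
t, h, k = 1.0, 1e-3, 1e-4        # time, spatial step, temporal step

# Central finite differences for the two sides of the Fokker-Planck equation.
dp_dt = (p(x, t + k) - p(x, t - k)) / (2.0 * k)
d2p_dx2 = (p(x + h, t) - 2.0 * p(x, t) + p(x - h, t)) / h**2

residual = np.max(np.abs(dp_dt - D * d2p_dx2))
print(f"largest residual: {residual:.2e}")
```

The residual should be small relative to the size of the individual terms, limited only by the finite-difference error.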

3. Itô’s Lemma

3.1. Real-World Example: Brownian Motion of a Pollen Grain

Imagine you are observing a tiny pollen grain suspended in water. The grain is bombarded by water molecules, and these collisions cause it to move in a seemingly random way. This erratic motion is called Brownian motion.

1. Discrete Modeling

Suppose you record the position of the pollen grain at discrete time intervals of length $\Delta t$. At each time step, the grain’s position changes due to:

- Drift: There might be a very slight overall current in the water, which gives a predictable, small shift.
- Random Kicks (Diffusion): The collisions with water molecules produce random displacements.

A discrete update of the position $X_t$ can be written as:

$$X_{t+\Delta t} = X_t + \mu(X_t, t)\,\Delta t + \sigma(X_t, t)\,\sqrt{\Delta t}\,Z,$$

where:

- $X_t$ is the pollen grain’s position at time $t$.
- $\mu(X_t, t)$ represents any systematic drift (for example, due to a gentle water current).
- $\sigma(X_t, t)$ represents the intensity of the random collisions.
- $Z$ is a standard normal random variable, $Z\sim N(0,1)$, drawn independently at each step.

In this context, the term $\mu(X_t,t)\,\Delta t$ models the small, steady displacement due to the current, and the term $\sigma(X_t,t)\,\sqrt{\Delta t}\,Z$ models the random displacements caused by molecular collisions.
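The discrete update translates almost line for line into code. The sketch below is one minimal way to iterate it (this scheme is commonly called the Euler–Maruyama method); the function name `euler_maruyama` and the constant drift and intensity values are assumptions made for illustration.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, t_end, n_steps, rng):
    """Iterate X <- X + mu(X, t)*dt + sigma(X, t)*sqrt(dt)*Z and return the whole path."""
    dt = t_end / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for i in range(n_steps):
        t = i * dt
        z = rng.standard_normal()   # fresh N(0, 1) draw for every step
        path[i + 1] = (path[i]
                       + mu(path[i], t) * dt
                       + sigma(path[i], t) * np.sqrt(dt) * z)
    return path

rng = np.random.default_rng(42)
# Hypothetical pollen grain: gentle constant current, constant collision intensity.
path = euler_maruyama(mu=lambda x, t: 0.1, sigma=lambda x, t: 0.5,
                      x0=0.0, t_end=1.0, n_steps=1000, rng=rng)
print(path[-1])
```

Passing `mu` and `sigma` as functions of `(x, t)` keeps the sketch general: state-dependent drift or noise only requires changing the two lambdas.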

2. Variance and Scaling

Because $Z$ is normally distributed with mean 0 and variance 1, the variance of the random term (for a given position $X_t$) is:

$$\text{Var}\Bigl[\sigma(X_t,t)\,\sqrt{\Delta t}\,Z\Bigr] = \sigma^2(X_t,t)\,\Delta t.$$

This shows that over a short time interval $\Delta t$, the variance of the displacement is proportional to $\Delta t$, so the typical size of a random kick scales like $\sqrt{\Delta t}$. This scaling is a hallmark of Brownian motion.

3. Taking the Continuous-Time Limit

As $\Delta t \to 0$, any fixed time interval contains more and more steps, each of them smaller. In the limit, by the central limit theorem (more precisely, Donsker’s invariance principle), the cumulative effect of the random displacements converges to a continuous-time Brownian motion $W_t$. Therefore, the discrete update

$$X_{t+\Delta t} = X_t + \mu(X_t, t)\,\Delta t + \sigma(X_t, t)\,\sqrt{\Delta t}\,Z$$

transforms into the stochastic differential equation (SDE):

$$dX_t = \mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t.$$

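Here:

- The term $\mu(X_t,t)\,dt$ still represents the drift (the effect of the current in the water).
- The term $\sigma(X_t,t)\,dW_t$ represents the random fluctuations (the effect of molecular collisions), with the important property that

$$(dW_t)^2 = dt.$$

This rule is shorthand: the squared increments of Brownian motion over a partition of $[0,t]$ sum to $t$ (in the mean-square sense) as the partition becomes finer.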
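The rule $(dW_t)^2 = dt$ can be seen numerically by summing squared Brownian increments over a fixed interval $[0, T]$ and refining the partition. The sketch below is illustrative only; the interval length and step counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2.0
sums = {}

for n_steps in (100, 10_000, 1_000_000):
    dt = T / n_steps
    dW = np.sqrt(dt) * rng.standard_normal(n_steps)   # Brownian increments
    # Sum of squared increments over [0, T]; it concentrates around T as dt -> 0.
    sums[n_steps] = np.sum(dW**2)
    print(n_steps, sums[n_steps])
```

The sum fluctuates for coarse partitions and settles near $T$ for fine ones, whereas the sum of $(\Delta t)^2$ over the same partition goes to zero. That difference is why $(dW_t)^2$ survives in the calculus below while $(dt)^2$ does not.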

3.2. The Setup

Suppose that $X_t$ satisfies the stochastic differential equation (SDE):

$$dX_t = \mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t,$$

where:

- $\mu(X_t,t)$ is the drift,
- $\sigma(X_t,t)$ is the diffusion coefficient,
- $dW_t$ is an increment of standard Brownian motion, with $\mathbb{E}[dW_t] = 0$ and the multiplication rules $(dW_t)^2 = dt,\quad dt\,dW_t = 0,\quad (dt)^2 = 0.$

Let $f(x,t)$ be a function in $C^{1,2}$ (i.e., once continuously differentiable in $t$ and twice continuously differentiable in $x$). We want to compute $df(X_t,t)$.

3.3. Derivation

Taylor Expansion:
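Expand $f(X_t + dX_t, t+dt)$ about $(X_t, t)$, keeping terms up to second order in $dX_t$:
$df = f_t(X_t,t)\,dt + f_x(X_t,t)\,dX_t + \frac{1}{2} f_{xx}(X_t,t)\,(dX_t)^2 + \text{higher order terms}.$
The second-order term is kept because $(dX_t)^2$ contains $(dW_t)^2 = dt$, which is first order in $dt$; the remaining higher order terms vanish by the rules above.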
Substitute the SDE:
Replace $dX_t$ by
$dX_t = \mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t.$
Thus,
$f_x(X_t,t)\,dX_t = f_x(X_t,t) \Bigl[\mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t\Bigr].$
Compute $(dX_t)^2$:
We have
$(dX_t)^2 = \Bigl[\mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t\Bigr]^2.$
Expanding:
$(dX_t)^2 = \mu^2(X_t,t)(dt)^2 + 2\mu(X_t,t)\sigma(X_t,t)\,dt\,dW_t + \sigma^2(X_t,t)(dW_t)^2.$
Using the rules:
$(dt)^2 = 0,\quad dt\,dW_t = 0,\quad (dW_t)^2 = dt,$
we get:
$(dX_t)^2 = \sigma^2(X_t,t)\,dt.$
Combine the Terms:
Substitute back into the Taylor expansion:
$\begin{aligned} df &= f_t(X_t,t)\,dt + f_x(X_t,t) \Bigl[\mu(X_t,t)\,dt + \sigma(X_t,t)\,dW_t\Bigr] + \frac{1}{2} f_{xx}(X_t,t)\,\sigma^2(X_t,t)\,dt\\[1mm] &= \Bigl[f_t(X_t,t) + \mu(X_t,t)f_x(X_t,t) + \frac{1}{2}\sigma^2(X_t,t)f_{xx}(X_t,t)\Bigr]dt + \sigma(X_t,t)f_x(X_t,t)\,dW_t. \end{aligned}$

This is Itô’s lemma:

$$\boxed{df(X_t,t)= \left[f_t(X_t,t) + \mu(X_t,t)f_x(X_t,t) + \frac{1}{2}\sigma^2(X_t,t)f_{xx}(X_t,t)\right]dt + \sigma(X_t,t)f_x(X_t,t)\,dW_t.}$$
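As a worked example (an illustration, not part of the derivation above), take geometric Brownian motion, $\mu(x,t) = a\,x$ and $\sigma(x,t) = b\,x$ with constants $a$ and $b$ assumed for this example, and apply the lemma to $f(x,t) = \ln x$. Then $f_t = 0$, $f_x = 1/x$, and $f_{xx} = -1/x^2$, so:

$$d(\ln X_t) = \left[a X_t \cdot \frac{1}{X_t} + \frac{1}{2} b^2 X_t^2 \cdot \left(-\frac{1}{X_t^2}\right)\right]dt + b X_t \cdot \frac{1}{X_t}\,dW_t = \left(a - \frac{1}{2} b^2\right)dt + b\,dW_t.$$

The extra $-\frac{1}{2} b^2$ is the Itô correction coming from the $f_{xx}$ term; the ordinary chain rule would give only $a\,dt + b\,dW_t$.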

4. Constructing an Itô Process

4.1. The Itô Integral

Suppose you have a function $\sigma(t)$ (possibly random, but non-anticipative, meaning its value at time $t$ does not depend on the future of the Brownian motion) and wish to integrate it with respect to Brownian motion $W_t$. The Itô integral is constructed in three steps:

Partitioning the Time Interval:
Divide the interval $[0,t]$ into small subintervals:
$0 = t_0 < t_1 < \cdots < t_n = t.$
Forming the Riemann Sum:
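Let $\Delta W_i = W_{t_{i+1}} - W_{t_i}$. Then approximate the integral as:
$S_n = \sum_{i=0}^{n-1} \sigma(t_i)\,\Delta W_i.$
Evaluating $\sigma$ at the left endpoint $t_i$ keeps the sum non-anticipative: each $\sigma(t_i)$ is determined before the increment $\Delta W_i$ it multiplies. The choice matters, because a different evaluation point (such as the midpoint) leads to a different integral, the Stratonovich integral.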
Taking the Limit:
As the partition gets finer, the sum converges (in the mean-square sense) to the Itô integral:
$\int_0^t \sigma(s)\,dW_s = \lim_{\max(t_{i+1}-t_i) \to 0} \sum_{i=0}^{n-1} \sigma(t_i) \Bigl(W_{t_{i+1}} - W_{t_i}\Bigr).$
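The construction can be tried on the classic integrand $\sigma(s) = W_s$, for which the Itô integral has the known closed form $\int_0^T W_s\,dW_s = \tfrac{1}{2}(W_T^2 - T)$. The sketch below (illustrative parameters, one simulated path) compares the left-endpoint sum with that value, and with an endpoint-averaged sum, which corresponds to the Stratonovich integral and gives $\tfrac{1}{2}W_T^2$ instead.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 100_000
dt = T / n

dW = np.sqrt(dt) * rng.standard_normal(n)        # increments W_{t_{i+1}} - W_{t_i}
W = np.concatenate(([0.0], np.cumsum(dW)))       # W[i] is the path at time t_i

ito_sum = np.sum(W[:-1] * dW)                          # integrand at the left endpoint
averaged_sum = np.sum(0.5 * (W[:-1] + W[1:]) * dW)     # average of both endpoints

print("left-endpoint sum :", ito_sum)
print("(W_T^2 - T) / 2   :", (W[-1] ** 2 - T) / 2)
print("endpoint-averaged :", averaged_sum)
print("W_T^2 / 2         :", W[-1] ** 2 / 2)
```

The two sums differ by roughly $T/2$ no matter how fine the partition is, which is why the evaluation point has to be fixed as part of the definition.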

4.2. Defining the Itô Process

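An Itô process combines a drift part and a diffusion part:

$$X_t = X_0 + \int_0^t \mu(s)\,ds + \int_0^t \sigma(s)\,dW_s.$$

- The drift term $\int_0^t \mu(s)\,ds$ is an ordinary (Lebesgue) integral in time.
- The diffusion term $\int_0^t \sigma(s)\,dW_s$ is the Itô integral.

The differential form $dX_t = \mu\,dt + \sigma\,dW_t$ used in Section 3 is shorthand for this integral equation.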

4.3. Some Key Properties

Continuity:
The process $X_t$ has continuous sample paths (under suitable conditions on $\mu$ and $\sigma$).
Quadratic Variation:
The quadratic variation is contributed solely by the diffusion part: $\langle X \rangle_t = \int_0^t \sigma^2(s)\,ds.$
Martingale Component:
Removing the drift, the diffusion part $\int_0^t \sigma(s)\,dW_s$ has zero mean and forms a martingale (in general a local martingale; a true martingale when $\sigma$ is square-integrable).
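The quadratic variation and martingale properties can be checked by simulation. The sketch below assumes a hypothetical deterministic drift $\mu(s) = 0.5$ and diffusion coefficient $\sigma(s) = 1 + s$, chosen only because $\int_0^T (1+s)^2\,ds$ has a simple closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n_steps, n_paths = 1.0, 2_000, 1_000
dt = T / n_steps
s = np.arange(n_steps) * dt                  # left endpoints of the partition

mu = 0.5 * np.ones(n_steps)                  # hypothetical drift mu(s) = 0.5
sigma = 1.0 + s                              # hypothetical sigma(s) = 1 + s

dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
dX = mu * dt + sigma * dW                    # increments of the Ito process

realized_qv = np.sum(dX**2, axis=1)          # sum of squared increments, per path
exact_qv = ((1.0 + T) ** 3 - 1.0) / 3.0      # integral of (1 + s)^2 over [0, T]
diffusion_part = np.sum(sigma * dW, axis=1)  # Ito integral term at time T, per path

print("mean realized quadratic variation:", realized_qv.mean())
print("integral of sigma^2              :", exact_qv)
print("mean of diffusion part           :", diffusion_part.mean())
```

The drift shifts the paths but contributes almost nothing to the sum of squared increments, and the diffusion part averages to approximately zero across paths, consistent with the properties listed above.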

5. Diffusion Models: Putting It All Together

Diffusion models use these concepts to describe how data is gradually corrupted by noise and then recovered.

Forward Process (Noising):
Starting with a data sample $x_0$, noise is gradually added by evolving $x_0$ using an Itô process $dX_t = \mu\,dt + \sigma\,dW_t$. The evolution of the probability density $p(x,t)$ is governed by the Fokker–Planck equation (obtained by truncating the Kramers–Moyal expansion), with coefficients $D^{(1)} = \mu$ and $D^{(2)} = \sigma^2/2$.
Reverse Process (Denoising):
To generate or recover data, the process is reversed. The reverse-time stochastic differential equation, derived using time-reversal techniques and Itô’s lemma, employs the gradient of the log-density $\partial_x \log p(x,t)$ (known as the score function) to guide a noisy sample back to the data distribution.
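A toy one-dimensional example makes both directions concrete. The sketch below rests on several assumptions that are not from the text above: the data distribution is a Gaussian $N(m_0, s_0^2)$, the forward SDE is $dX_t = -\tfrac{1}{2}\beta X_t\,dt + \sqrt{\beta}\,dW_t$ with constant $\beta$, and the reverse-time SDE is taken in the form $dX_t = [\mu - \sigma^2\,\partial_x \log p(x,t)]\,dt + \sigma\,d\bar{W}_t$ run backward in time, which applies when $\sigma$ does not depend on $x$. With Gaussian data the score is available in closed form; in a real diffusion model it is unknown and has to be approximated.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, T, n_steps, n_samples = 2.0, 5.0, 1000, 20_000
dt = T / n_steps
m0, s0 = 2.0, 0.5            # hypothetical data distribution N(m0, s0^2)

def marginal(t):
    """Mean and variance of p(x, t) for the forward SDE started from N(m0, s0^2)."""
    decay = np.exp(-beta * t)
    return m0 * np.sqrt(decay), s0**2 * decay + (1.0 - decay)

def score(x, t):
    """d/dx log p(x, t); analytic here only because p(x, t) is Gaussian."""
    mean, var = marginal(t)
    return -(x - mean) / var

# Forward (noising): dX = -0.5*beta*X dt + sqrt(beta) dW, discretized step by step.
x = m0 + s0 * rng.standard_normal(n_samples)
for i in range(n_steps):
    x = x - 0.5 * beta * x * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)
x_noised = x.copy()

# Reverse (denoising): start from fresh N(0, 1) noise and step backward in time.
x = rng.standard_normal(n_samples)
for i in range(n_steps):
    t = T - i * dt
    drift = -0.5 * beta * x - beta * score(x, t)   # mu - sigma^2 * score
    x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n_samples)

print("noised   mean/std:", x_noised.mean(), x_noised.std())
print("denoised mean/std:", x.mean(), x.std())
```

After the forward pass the samples are close to standard normal noise, and after the reverse pass their mean and standard deviation should be close to $m_0$ and $s_0$ again, up to discretization and sampling error.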
