How to Judge if a Time Series Is a Martingale? Using the Black–Scholes Model as an Explanation

| February 23, 2025

1. Introduction

The Black–Scholes model, famous for its role in option pricing, assumes that asset prices follow a geometric Brownian motion (GBM). A key property of GBM—and martingales in general—is that the conditional expectation of the future value, given the current information, equals the current value. By simulating asset price paths using the Black–Scholes model, we can study their statistical properties to assess whether they exhibit martingale-like behavior.

In our analysis, we simulate multiple asset price paths over a five-year period and then examine them using two advanced tools:

  • Complexity–Entropy Causality Plane: This technique combines permutation entropy and statistical complexity to capture both randomness and hidden structure in the time series.
  • Power Spectral Density (PSD) Analysis: Using Welch’s method, we estimate how the variance of the series is distributed across frequencies, revealing characteristic power–law behavior.

2. The Black–Scholes Model and Geometric Brownian Motion

2.1. Stochastic Differential Equation

The Black–Scholes model describes the evolution of an asset price $S_t$ with the stochastic differential equation (SDE):

$$dS_t = \mu S_t\, dt + \sigma S_t\, dW_t,$$

where:

  • $\mu$ is the drift rate,
  • $\sigma$ is the volatility, and
  • $dW_t$ is the increment of a standard Brownian motion.

2.2. Derivation via Itô’s Lemma

To solve the SDE, we define $X_t = \ln S_t$ and apply Itô’s lemma. This gives:

$$dX_t = \frac{1}{S_t}\, dS_t - \frac{1}{2}\frac{1}{S_t^2}(dS_t)^2.$$

Substituting the expression for $dS_t$ and noting that $(dW_t)^2 = dt$, we obtain:

$$dX_t = \mu\, dt + \sigma\, dW_t - \frac{1}{2}\sigma^2\, dt,$$

which simplifies to

$$dX_t = \left(\mu - \frac{1}{2}\sigma^2\right) dt + \sigma\, dW_t.$$

Integrating from $0$ to $t$, we have:

$$X_t = \ln S_0 + \left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma W_t.$$

Exponentiating yields the solution:

$$S_t = S_0\,\exp\!\Bigl[\left(\mu - \frac{1}{2}\sigma^2\right)t + \sigma\,W_t\Bigr].$$

2.3. Discrete Simulation

In practice, we simulate the process by discretizing time. Over a small interval $\Delta t$, the discrete model is given by:

$$S_{t+\Delta t} = S_t\,\exp\!\Bigl[(\mu - \tfrac{1}{2}\sigma^2)\Delta t + \sigma\sqrt{\Delta t}\,Z\Bigr],$$

with $Z \sim \mathcal{N}(0,1)$. This recursive formula allows us to generate asset price paths.

3. Simulation of Multiple Traces

For our analysis, we simulate three independent Black–Scholes paths over a five-year period. Assuming approximately 252 trading days per year, this results in around 1,260 time steps per simulation. These multiple traces allow us to compare the dynamics across different realizations and study their collective behavior.
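The simulation described above can be sketched in NumPy. The parameter values below ($S_0 = 100$, $\mu = 0.05$, $\sigma = 0.2$) are illustrative choices, not values fixed by the text:

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, n_steps, dt, n_paths, seed=0):
    """Simulate GBM paths with the exact discrete recursion
    S_{t+dt} = S_t * exp((mu - sigma^2/2) * dt + sigma * sqrt(dt) * Z)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.cumsum(log_increments, axis=1)
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))

# Three independent 5-year paths at 252 trading days per year (1,260 steps).
paths = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2,
                     n_steps=5 * 252, dt=1 / 252, n_paths=3)
```

Because the recursion exponentiates the exact log-increment, the simulated paths follow the GBM distribution exactly at the grid points; no Euler discretization error is introduced.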

4. Advanced Time Series Analysis

After generating our simulated asset price paths, we apply two advanced analytical techniques to judge their martingale properties.

4.1. Complexity–Entropy Causality Plane

The Complexity–Entropy (CH) causality plane is a tool that plots two key quantities calculated from a time series:

  • Permutation Entropy $H(P)$:
    This measure quantifies the randomness in the ordering of values by examining the frequency of ordinal patterns. For a chosen embedding dimension $d$, if $p(\pi)$ is the probability of an ordinal pattern $\pi$, then:

    $$H(P) = -\sum_{\pi} p(\pi) \ln p(\pi).$$

    It is often normalized by dividing by $\ln(d!)$, yielding:

    $$H_{\text{norm}} = \frac{H(P)}{\ln(d!)}.$$

    Example of Computing $P(\pi)$:
    Consider a short time series $[3,\, 1,\, 2,\, 4]$ with $d=3$ and $\tau=1$.

    • The first window, $[3,\, 1,\, 2]$, has the ordinal pattern $(1,\, 2,\, 0)$: the smallest value ($1$) sits at index 1, the next ($2$) at index 2, and the largest ($3$) at index 0.
    • The next window, $[1,\, 2,\, 4]$, is in increasing order, yielding the ordinal pattern $(0,\, 1,\, 2)$.
      If these are the only windows, then the probability distribution is:

    $$P((1,2,0)) = \frac{1}{2}, \quad P((0,1,2)) = \frac{1}{2},$$

    with zero probability for all other patterns.

  • Statistical Complexity $C_{JS}$:
    This measure combines the normalized entropy with the Jensen–Shannon divergence between the observed distribution $P$ and the uniform distribution $P_e$. The explicit formula is:

    $$C_{JS} = D_{JS}(P, P_e) \times H_{\text{norm}},$$

    where $D_{JS}(P, P_e)$ is the Jensen–Shannon divergence, detailed in Appendix B.

Plotting $H_{\text{norm}}$ against $C_{JS}$ provides a visual representation of how the time series balances randomness with structure—a key indicator of martingale behavior.
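Both quantities can be sketched as follows, using the argsort convention from the example above and the formula $C_{JS} = D_{JS}(P, P_e) \times H_{\text{norm}}$ as written (note that many references additionally normalize $D_{JS}$ by its maximum value; the helper names here are ours):

```python
import math
from itertools import permutations

import numpy as np

def ordinal_distribution(x, d=3, tau=1):
    """Empirical probabilities of the d! ordinal patterns (argsort of each window)."""
    x = np.asarray(x, dtype=float)
    patterns = list(permutations(range(d)))
    counts = dict.fromkeys(patterns, 0)
    n_windows = len(x) - (d - 1) * tau
    for i in range(n_windows):
        window = x[i : i + (d - 1) * tau + 1 : tau]
        counts[tuple(int(k) for k in np.argsort(window))] += 1
    return np.array([counts[p] / n_windows for p in patterns])

def shannon_entropy(p):
    p = p[p > 0]                        # 0 * log(0) contributes nothing
    return float(-np.sum(p * np.log(p)))

def complexity_entropy(x, d=3, tau=1):
    """Return (H_norm, C_JS) for the complexity-entropy causality plane."""
    p = ordinal_distribution(x, d, tau)
    n = len(p)                          # n = d!
    h_norm = shannon_entropy(p) / math.log(n)
    p_e = np.full(n, 1.0 / n)           # uniform reference distribution
    m = 0.5 * (p + p_e)
    d_js = shannon_entropy(m) - 0.5 * shannon_entropy(p) - 0.5 * shannon_entropy(p_e)
    return h_norm, d_js * h_norm

# The toy series from the example: two windows, patterns (1,2,0) and (0,1,2).
h, c = complexity_entropy([3, 1, 2, 4], d=3, tau=1)
```

For the toy series, only two of the six possible patterns occur, so $H(P) = \ln 2$ and $H_{\text{norm}} = \ln 2 / \ln 6$.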

4.2. Power Spectral Density Analysis

The PSD of a time series shows how its variance is distributed over frequency. Many stochastic processes follow a power–law:

$$S(f) \propto \frac{1}{f^\alpha},$$

where $\alpha$ is the spectral exponent. For Brownian motion, typically $\alpha \approx 2$.

To estimate the PSD, Welch’s method is used. This technique:

  1. Divides the time series into overlapping segments.
  2. Applies a window function to each segment to minimize spectral leakage.
  3. Computes the periodogram (squared magnitude of the Fourier transform) for each segment.
  4. Averages the periodograms to produce a robust PSD estimate.

A linear regression on the log–log plot of the PSD yields an estimate of $\alpha$, which provides further evidence of martingale behavior if it aligns with theoretical expectations.
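The four steps plus the log–log fit can be sketched with SciPy's `welch`; the segment length and the low-frequency fitting range below are illustrative choices, not prescribed by the text:

```python
import numpy as np
from scipy.signal import welch

def spectral_exponent(x, fs=1.0, nperseg=256, f_max=0.1):
    """Estimate alpha in S(f) ~ 1/f^alpha via a log-log fit to the Welch PSD.

    The fit is restricted to low frequencies (0 < f < f_max), where the
    power-law behavior of a random walk is cleanest.
    """
    f, pxx = welch(x, fs=fs, nperseg=nperseg)   # segment, window, average
    mask = (f > 0) & (f < f_max)                # drop the zero-frequency bin
    slope, _ = np.polyfit(np.log(f[mask]), np.log(pxx[mask]), 1)
    return -slope

# A discrete Brownian motion (cumulative sum of white noise) should give
# an exponent close to 2.
rng = np.random.default_rng(1)
bm = np.cumsum(rng.standard_normal(5000))
alpha = spectral_exponent(bm)
```

`welch` performs the segmenting, windowing (a Hann window by default), and averaging internally, so only the regression step needs to be written by hand.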

5. Visualization and Interpretation

In our analysis:

  • The left panel displays the three simulated price paths over five years in distinct colors.
  • The top-right panel shows the CH–plane trajectories for each trace. Instead of a single point, the accumulated history of $H_{\text{norm}}$ and $C_{JS}$ is plotted, revealing how these metrics evolve over time.
  • The bottom-right panel presents the PSD spectrum for one simulation on a log–log scale, along with a fitted line and the estimated exponent $\alpha$.

These visualizations help determine whether the time series behaves as a martingale by revealing key characteristics such as the lack of exploitable structure (in the CH–plane) and a PSD exponent consistent with theoretical models.

6. Conclusion

By simulating asset price paths with the Black–Scholes model and analyzing them with the Complexity–Entropy causality plane and PSD estimation via Welch’s method, we can assess whether a time series behaves like a martingale. A martingale process, by definition, has no predictable trends beyond its current value. When the CH–plane indicates high normalized entropy and low statistical complexity, and the PSD exhibits a power–law with an exponent consistent with Brownian motion, these are strong indicators of martingale-like behavior.


Appendix A: Welch’s Method and PSD Derivation

A.1. Welch’s Method

Welch’s method refines the periodogram approach to PSD estimation by reducing variance. The procedure involves:

  1. Segmenting the Data:
    The time series is divided into overlapping segments (commonly with 50% overlap).

  2. Windowing:
    Each segment is multiplied by a window function (e.g., Hann or Hamming) to reduce spectral leakage.

  3. Computing Periodograms:
    For each windowed segment, the discrete Fourier transform (DFT) is computed, and the squared magnitude yields the periodogram.

  4. Averaging:
    The periodograms are averaged to produce the final PSD estimate:

    $$S(f) \approx \frac{1}{K}\sum_{k=1}^{K} P_k(f),$$

    where $P_k(f)$ is the periodogram of the $k$-th segment and $K$ is the number of segments.

A.2. PSD Derivation

For a stationary process $y(t)$, the theoretical PSD is defined by:

$$S(f) = \lim_{T\to\infty} \frac{1}{T} \left|\int_0^T y(t)\,e^{-i2\pi f t}\, dt\right|^2.$$

For a finite segment of length $L$ with window $w(n)$ and sampling interval $\Delta t$, the periodogram is given by:

$$P_k(f) = \frac{1}{U}\left|\sum_{n=0}^{L-1} y_k(n)\, w(n)\,e^{-i2\pi f n\Delta t}\right|^2,$$

where the normalization factor is:

$$U = \frac{1}{L}\sum_{n=0}^{L-1} w^2(n).$$

Averaging over $K$ segments yields the PSD estimate.


Appendix B: Derivation of the Jensen–Shannon Divergence

B.1. Definition and Derivation

The Jensen–Shannon divergence (JSD) is a symmetrized and smoothed version of the Kullback–Leibler (KL) divergence. For two probability distributions $P = \{p_i\}$ and $Q = \{q_i\}$, the KL divergence is:

$$D_{KL}(P\|Q) = \sum_{i} p_i \ln \frac{p_i}{q_i}.$$

However, the KL divergence is asymmetric and can be infinite if $q_i = 0$ for any $i$ where $p_i > 0$. The JSD is defined as:

$$D_{JS}(P, Q) = \frac{1}{2} D_{KL}\bigl(P \,\big\|\, M\bigr) + \frac{1}{2} D_{KL}\bigl(Q \,\big\|\, M\bigr),$$

where the mixture distribution is:

$$M = \frac{1}{2}(P + Q).$$

This definition ensures symmetry and finiteness.

B.2. Explicit Expression for $D_{JS}(P, P_e)$

When comparing a distribution $P$ with the uniform distribution $P_e$ (where $p_e(i) = 1/n$ for $n$ outcomes), the JSD becomes:

$$D_{JS}(P, P_e) = \frac{1}{2}\sum_{i=1}^{n} p_i \ln \frac{2p_i}{p_i + \frac{1}{n}} + \frac{1}{2}\sum_{i=1}^{n} \frac{1}{n} \ln \frac{2/n}{p_i + \frac{1}{n}}.$$

B.3. Simple Example

Consider a probability distribution $P = (0.5,\, 0.3,\, 0.2)$ over 3 outcomes. The uniform distribution is:

$$P_e = \left(\frac{1}{3},\, \frac{1}{3},\, \frac{1}{3}\right) \approx (0.333,\, 0.333,\, 0.333).$$

  1. Compute the Mixture:
    For each outcome $i$, the mixture is:

    $$M_i = \frac{1}{2}\left(p_i + \frac{1}{3}\right).$$

    For $i=1$:

    $$M_1 = \frac{1}{2}\left(0.5 + 0.333\right) \approx 0.4165.$$

    For $i=2$:

    $$M_2 = \frac{1}{2}\left(0.3 + 0.333\right) \approx 0.3165.$$

    For $i=3$:

    $$M_3 = \frac{1}{2}\left(0.2 + 0.333\right) \approx 0.2665.$$

  2. Compute the KL Divergences:
    Calculate:

    $$D_{KL}(P\|M) = 0.5\ln\frac{0.5}{0.4165} + 0.3\ln\frac{0.3}{0.3165} + 0.2\ln\frac{0.2}{0.2665},$$

    and

    $$D_{KL}(P_e\|M) = \frac{1}{3}\ln\frac{0.333}{0.4165} + \frac{1}{3}\ln\frac{0.333}{0.3165} + \frac{1}{3}\ln\frac{0.333}{0.2665}.$$

  3. Jensen–Shannon Divergence:
    Finally, the divergence is:

    $$D_{JS}(P, P_e) = \frac{1}{2}D_{KL}(P\|M) + \frac{1}{2}D_{KL}(P_e\|M).$$
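The three steps can be checked numerically with a short sketch (`jensen_shannon` is our own helper, not a library function):

```python
import numpy as np

def jensen_shannon(p, q):
    """D_JS(P,Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), with M = (P+Q)/2 (natural log)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                  # terms with a_i = 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.3, 0.2]
p_e = [1 / 3, 1 / 3, 1 / 3]
djs = jensen_shannon(p, p_e)          # about 0.0173
```

The small value reflects how close $P$ already is to the uniform distribution; identical distributions give a divergence of exactly zero.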

Appendix C: Martingale Process

C.1. Filtration

A filtration is a family of sigma–algebras

$$\{\mathcal{F}_t\}_{t \ge 0},$$

which represents the information available up to each time $t$. It is an increasing family, meaning that for any $0 \le s \le t$,

$$\mathcal{F}_s \subseteq \mathcal{F}_t.$$

Intuitively, $\mathcal{F}_t$ includes all events (or outcomes) that have occurred by time $t$. For example, if you are observing the price of a stock over time, $\mathcal{F}_t$ would consist of all the historical data, news, and any other relevant information known up to time $t$.

C.2. Martingale Definition

A stochastic process $(M_t)_{t \ge 0}$ is called a martingale with respect to the filtration $(\mathcal{F}_t)_{t \ge 0}$ if it satisfies the following conditions:

  1. Integrability:

    $$\mathbb{E}[|M_t|] < \infty \quad \text{for all } t \ge 0.$$

  2. Martingale Property:
    For any $0 \le s < t$,

    $$\mathbb{E}[M_t \mid \mathcal{F}_s] = M_s \quad \text{(almost surely)}.$$

This definition captures the essence of a martingale: given all the information available at time $s$, the best prediction for the value at time $t$ is the current value $M_s$. In other words, there is no predictable trend or “drift” in the process.

C.3. A Simple Example: Fair Coin Toss Random Walk

Consider a simple game based on tossing a fair coin. Define a sequence of random variables $X_1, X_2, \dots$ by

$$X_i = \begin{cases} +1, & \text{if the } i\text{-th toss is Heads}, \\ -1, & \text{if the } i\text{-th toss is Tails}. \end{cases}$$

Since the coin is fair, each $X_i$ has an expected value of zero:

$$\mathbb{E}[X_i] = 0.$$

Now, define the cumulative sum (or simple random walk) by

$$M_n = \sum_{i=1}^{n} X_i, \quad \text{with } M_0 = 0.$$

Let the filtration $\{\mathcal{F}_n\}_{n \ge 0}$ be the sigma–algebra generated by the outcomes of the first $n$ tosses. Since $X_{n+1}$ is independent of $\mathcal{F}_n$, the martingale property is verified for any $n$ as follows:

$$\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = \mathbb{E}\left[M_n + X_{n+1} \mid \mathcal{F}_n\right] = M_n + \mathbb{E}[X_{n+1}] = M_n.$$
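The coin-toss martingale can also be checked empirically with a quick simulation (a sketch, not part of the derivation): because the increments are independent of the past, both the mean increment at every step and the mean of $M_n$ itself should stay near $M_0 = 0$.

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps = 100_000, 50

# Fair coin: each toss is +1 or -1 with probability 1/2.
x = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
m = np.cumsum(x, axis=1)              # M_n = X_1 + ... + X_n along each path

# Averaged over many paths, the increment at every step should be close
# to zero, and E[M_n] should remain at M_0 = 0 for all n.
mean_increment = x.mean(axis=0)
mean_path = m.mean(axis=0)
```

With $10^5$ paths the sampling error of each mean is of order $10^{-2}$ or smaller, so both arrays should sit tightly around zero.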