1. Overview
CausalImpact Analysis is a statistical method designed to estimate the causal effect of an intervention by comparing observed data against a counterfactual scenario—what would have happened in the absence of the intervention. This analysis leverages a Structural Time Series (STS) model to capture the underlying data-generating processes and employs Gibbs Sampling, a Bayesian inference technique, to derive posterior distributions of the model parameters.
2. Structural Time Series (STS) Model
The Structural Time Series (STS) model offers a robust framework for modeling time-series data by decomposing it into various components such as trend, seasonality, and regression effects.
2.1. State-Space Representation
The STS model is formulated within a state-space framework, comprising two primary equations: the State Transition Equation and the Observation Equation.
a. State Vector (xt)
The state vector encapsulates all latent (unobserved) components influencing the observed data at time t:
$$
x_t = \begin{bmatrix} \ell_t \\ s_{1,t} \\ s_{2,t} \\ \vdots \\ s_{K,t} \\ \beta_t \end{bmatrix}
$$
Components:
ℓt: Local Level capturing the underlying trend at time t.
sk,t: Seasonal Componentk at time t for k=1,…,K.
βt: Regression Coefficients representing the influence of covariates at time t.
b. State Transition Equation
The evolution of the state vector over time is governed by:
$$x_t = G\, x_{t-1} + w_t, \qquad w_t \sim N(0, W)$$
Where:
G: State Transition Matrix dictating how each state evolves.
wt: State noise vector, with block-diagonal covariance W (see Section 7.3).
σℓ2: Variance of the local level noise.
σsk2: Variance of the k-th seasonal component noise.
Σβ: Covariance matrix for the regression coefficients.
I: Identity matrix of appropriate dimension.
2.2. Observation Equation
The Observation Equation links the latent state vector to the observed data:
$$y_t = F^\top x_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)$$
Where:
F: Observation Matrix, defined as:
$$F = \begin{bmatrix} 1 & 0_K^\top & X_t^\top \end{bmatrix}^\top$$
Xt: Covariate vector at time t.
σε2: Variance of the observation noise.
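The two equations above can be exercised with a short simulation. The sketch below (NumPy; the dimensions, noise scales, and regression weights are illustrative assumptions, not values from the text) generates a local level as a random walk and observes it through the regression term:

```python
import numpy as np

rng = np.random.default_rng(0)

T, P = 100, 2                      # time steps and number of covariates (illustrative)
sigma_level, sigma_obs = 0.1, 0.5  # assumed state and observation noise scales

X = rng.normal(size=(T, P))        # covariate rows X_t
beta = np.array([1.5, -0.8])       # fixed regression weights for the simulation

# State transition: the local level follows a random walk,
# l_t = l_{t-1} + w_t,  w_t ~ N(0, sigma_level^2)
level = np.cumsum(rng.normal(0.0, sigma_level, size=T))

# Observation equation: y_t = F^T x_t + eps_t = l_t + X_t beta + eps_t
y = level + X @ beta + rng.normal(0.0, sigma_obs, size=T)
```

Stacking the level and the covariate effect like this is exactly the product $F^\top x_t$ written out component by component.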
3. Priors and Hyperparameters
In Bayesian analysis, priors represent initial beliefs about the model parameters before observing the data. Proper specification of priors is essential as they influence the posterior distributions.
3.1. Local Level Variance Prior
$$\sigma_\ell^2 \sim \text{Inverse-Gamma}(\alpha_\ell, \beta_\ell)$$
αℓ: Shape parameter.
βℓ: Scale parameter.
3.2. Observation Noise Variance Prior
$$\sigma_\varepsilon^2 \sim \text{Inverse-Gamma}(\alpha_\varepsilon, \beta_\varepsilon)$$
αε: Shape parameter.
βε: Scale parameter.
3.3. Regression Weights Prior
Assuming a multivariate normal prior for regression coefficients β:
$$\beta \sim N(0, \Lambda^{-1})$$
Λ: Precision matrix, often derived from the design matrix X:
$$\Lambda = 0.01 \times \frac{0.5}{N} \times (X^\top X)$$
N: Number of observations.
3.4. Initial State Priors
$$\ell_0 \sim N(y_0, \sigma_y^2), \qquad s_{k,0} \sim N(0, \sigma_{s_k}^2), \qquad \beta_0 \sim N(0, \Sigma_\beta)$$
y0: Initial observed value.
σy2: Variance of the initial level.
σsk2: Variance of the initial seasonal component k.
Σβ: Covariance matrix for the initial regression coefficients.
4. Bayesian Inference via Gibbs Sampling
Gibbs Sampling is a Markov Chain Monte Carlo (MCMC) method used to sample from the joint posterior distribution of model parameters and latent states.
4.1. Posterior Distribution
The objective is to sample from the joint posterior distribution:
$$p(x_{1:T}, \theta \mid y_{1:T})$$
Where:
x1:T: State vectors from time 1 to T.
θ: Model parameters (e.g., σℓ2,σε2,Λ).
y1:T: Observed data from time 1 to T.
Using Bayes’ theorem:
$$p(x_{1:T}, \theta \mid y) \propto p(y \mid x, \theta) \cdot p(x_{1:T} \mid \theta) \cdot p(\theta)$$
4.2. Gibbs Sampling Steps
Gibbs Sampling iteratively samples each parameter conditioned on the current values of all other parameters.
Step 1: Sample Local Level Variance (σℓ2)
$$\sigma_\ell^2 \mid x_{1:T}, y_{1:T} \sim \text{Inverse-Gamma}(\alpha_\ell^*, \beta_\ell^*)$$
Where:
$$\alpha_\ell^* = \alpha_\ell + \frac{T}{2}, \qquad \beta_\ell^* = \beta_\ell + \frac{1}{2} \sum_{t=1}^{T} (\ell_t - \ell_{t-1})^2$$
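This conditional draw is easy to implement because NumPy exposes a Gamma sampler, and if $X \sim \text{Gamma}(\alpha, \text{rate} = \beta)$ then $1/X \sim \text{Inverse-Gamma}(\alpha, \beta)$. A minimal sketch, assuming the level path is already available from the state-sampling step and using illustrative hyperparameters $\alpha_\ell = \beta_\ell = 0.01$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_level_variance(level, alpha0=0.01, beta0=0.01, rng=rng):
    """Draw sigma_l^2 | l_{0:T} from its Inverse-Gamma full conditional:
    alpha* = alpha0 + T/2
    beta*  = beta0 + 0.5 * sum_t (l_t - l_{t-1})^2
    """
    diffs = np.diff(level)
    alpha_star = alpha0 + diffs.size / 2.0
    beta_star = beta0 + 0.5 * np.sum(diffs ** 2)
    # Gamma is parameterized by shape and scale; scale = 1/rate = 1/beta*.
    return 1.0 / rng.gamma(alpha_star, 1.0 / beta_star)

# Illustrative level path: random walk with true step sd 0.3 (variance 0.09).
level = np.cumsum(rng.normal(0.0, 0.3, size=500))
draw = sample_level_variance(level)
```

With 500 steps the posterior concentrates tightly around the true step variance, so repeated draws hover near 0.09.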
Step 2: Sample Observation Noise Variance (σε2)
$$\sigma_\varepsilon^2 \mid x_{1:T}, y_{1:T} \sim \text{Inverse-Gamma}(\alpha_\varepsilon^*, \beta_\varepsilon^*)$$
Where:
$$\alpha_\varepsilon^* = \alpha_\varepsilon + \frac{T}{2}, \qquad \beta_\varepsilon^* = \beta_\varepsilon + \frac{1}{2} \sum_{t=1}^{T} (y_t - F^\top x_t)^2$$
Step 3: Sample Regression Weights (β)
Assuming time-invariant regression coefficients:
$$\beta \mid x_{1:T}, y_{1:T}, \sigma_\varepsilon^2 \sim N(m, V)$$
Where:
$$V = \left( \Lambda + \frac{X^\top X}{\sigma_\varepsilon^2} \right)^{-1}, \qquad m = V \left( \frac{X^\top y}{\sigma_\varepsilon^2} \right)$$
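A sketch of this conjugate Gaussian draw (NumPy; the prior precision scaling and all data below are illustrative assumptions, and `resid` stands for the observations with the non-regression state components already subtracted):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_beta(X, resid, sigma_obs2, Lam, rng=rng):
    """Draw beta ~ N(m, V) with
    V = (Lam + X^T X / sigma_obs2)^{-1},  m = V X^T resid / sigma_obs2.
    """
    V = np.linalg.inv(Lam + X.T @ X / sigma_obs2)
    m = V @ (X.T @ resid / sigma_obs2)
    return rng.multivariate_normal(m, V)

T, P = 400, 2
X = rng.normal(size=(T, P))
beta_true = np.array([1.0, -2.0])
resid = X @ beta_true + rng.normal(0.0, 0.5, size=T)

# Weak illustrative prior precision, proportional to X^T X / N.
Lam = 0.005 * (X.T @ X) / T
beta_draw = sample_beta(X, resid, 0.25, Lam)
```

Because the prior precision is tiny relative to the data precision $X^\top X / \sigma_\varepsilon^2$, draws land close to the least-squares solution.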
Step 4: Sample State Vectors (x1:T)
Utilize Forward-Backward Sampling or similar algorithms to sample the latent states given current parameter estimates and observed data.
4.3. Multiple MCMC Chains
To ensure convergence and robustness:
Run Multiple Gibbs Chains (C chains): Each with different initializations.
Combine Samples Across Chains: Aggregate after convergence to form the posterior distribution.
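A common cross-chain convergence check is the Gelman-Rubin statistic $\hat{R}$, which compares between-chain and within-chain variance; values near 1 indicate the chains agree. A minimal sketch for scalar draws (the `(C, N)` layout and the synthetic well-mixed chains are assumptions for illustration):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for a (C, N) array of draws."""
    chains = np.asarray(chains)
    C, N = chains.shape
    chain_means = chains.mean(axis=1)
    B = N * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # average within-chain variance
    var_hat = (N - 1) / N * W + B / N        # pooled posterior variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = rng.normal(size=(4, 1000))           # four chains sampling the same target
rhat = gelman_rubin(mixed)
```

For chains that have not converged (e.g., stuck at different means), $\hat{R}$ rises well above 1, flagging the problem.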
5. Posterior Predictive Inference
With posterior samples, derive predictions for the counterfactual scenario (y^t) and assess the impact of the intervention.
5.1. Posterior Means
For each time t, the posterior mean prediction is:
$$\hat{y}_t = \mathbb{E}[y_t \mid y_{1:T}]$$
In practice, $\hat{y}_t$ is approximated by averaging $F^\top x_t$ over the posterior draws of the state vector $x_t$.
5.2. Credible Intervals
Compute the $(1-\alpha)$ credible interval (e.g., 95% with $\alpha = 0.05$) for $\hat{y}_t$:
$$\hat{y}_t^{(q)} = \text{Quantile}(\hat{y}_t, q), \qquad q \in \left\{ \frac{\alpha}{2},\; 1 - \frac{\alpha}{2} \right\}$$
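Given a matrix of posterior predictive draws, the interval endpoints are just empirical quantiles across the sample axis. A sketch (synthetic Gaussian draws stand in for real MCMC output):

```python
import numpy as np

rng = np.random.default_rng(4)

# Posterior predictive draws: one row per MCMC sample, one column per time step.
draws = rng.normal(loc=10.0, scale=2.0, size=(5000, 50))

alpha = 0.05
lower = np.quantile(draws, alpha / 2, axis=0)      # 2.5% quantile per time step
upper = np.quantile(draws, 1 - alpha / 2, axis=0)  # 97.5% quantile per time step
point = draws.mean(axis=0)                         # posterior mean prediction
```

The resulting `lower`/`upper` arrays trace out the credible band that is later compared against the observed series.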
6. Causal Effect Estimation
Assess the intervention’s impact by comparing observed data with model predictions.
6.1. Point Effects
The immediate difference at time t:
$$\text{Point Effect}_t = y_t - \hat{y}_t$$
6.2. Cumulative Effects
Total impact from intervention start $T_{\text{start}}$ to time t:
$$\text{Cumulative Effect}_t = \sum_{\tau = T_{\text{start}}}^{t} \left( y_\tau - \hat{y}_\tau \right)$$
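Both effect series follow directly from the observed data and the counterfactual predictions. A sketch with a simulated post-intervention lift (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

T, t_start = 100, 70                            # intervention begins at index 70
y_hat = np.full(T, 10.0)                        # counterfactual predictions
y = y_hat + rng.normal(0.0, 0.1, size=T)
y[t_start:] += 2.0                              # simulated lift after the intervention

point_effect = y - y_hat                        # y_t - y_hat_t at every step
cumulative = np.cumsum(point_effect[t_start:])  # running total since T_start
```

Before the intervention the point effects hover around zero; afterwards the cumulative series grows roughly linearly with the per-step lift.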
7. Matrix Operations
Efficient computation and representation of the STS model rely heavily on matrix operations.
7.1. State Transition Matrix (G)
$$
G = \begin{bmatrix}
1 & 0 & \cdots & 0 & 0^\top \\
0 & 1 & \cdots & 0 & 0^\top \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0^\top \\
0 & 0 & \cdots & 0 & I
\end{bmatrix}
$$
Diagonal elements set to 1 for identity transitions.
Off-diagonal elements are 0, except for potential seasonal dependencies.
7.2. Observation Matrix (F)
$$F^\top = \begin{bmatrix} 1 & 0_K^\top & X_t^\top \end{bmatrix}$$
Selects the local level component directly via the leading 1.
Includes regression coefficients via Xt⊤.
7.3. Covariance Matrices (W)
$$
W = \begin{bmatrix}
\sigma_\ell^2 & 0 & \cdots & 0 \\
0 & \sigma_{s_1}^2 I & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Sigma_\beta
\end{bmatrix}
$$
Block-diagonal matrix with a variance (or covariance block) for each state component.
Σβ represents the covariance matrix for regression weights.
7.4. Precision Matrix for Regression Weights (Λ)
$$\Lambda = 0.01 \times \frac{0.5}{N} \times (X^\top X)$$
Derived from the design matrix X (covariates).
Controls the prior variance of regression weights.
7.5. Likelihood Function
For the entire dataset, the likelihood is:
$$p(y \mid x, \theta) = \prod_{t=1}^{T} N\!\left( y_t \mid F^\top x_t, \sigma_\varepsilon^2 \right)$$
7.6. Posterior Distribution
Using Bayes’ theorem:
$$p(x_{1:T}, \theta \mid y) \propto p(y \mid x, \theta) \cdot p(x_{1:T} \mid \theta) \cdot p(\theta)$$
Where:
p(y∣x,θ): Likelihood.
p(x1:T∣θ): Prior on states.
p(θ): Priors on parameters.
8. Data Standardization and Scaling
Proper data preprocessing ensures that the model accurately captures patterns without being skewed by varying scales.
8.1. Standardizing Data
$$y_t' = \frac{y_t - \mu_y}{\sigma_y}$$
μy: Mean of the pre-intervention data.
σy: Standard deviation of the pre-intervention data.
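The standardization step can be written as a small helper that fits $\mu_y$ and $\sigma_y$ on the pre-intervention window only and then applies them to the whole series (the window boundary below is an illustrative assumption):

```python
import numpy as np

def standardize(y, pre_period_end):
    """Z-score the series using pre-intervention mean and sd only."""
    pre = y[:pre_period_end]
    mu, sd = pre.mean(), pre.std()
    return (y - mu) / sd, mu, sd

rng = np.random.default_rng(6)
y = rng.normal(50.0, 5.0, size=120)           # illustrative series
y_std, mu, sd = standardize(y, pre_period_end=90)
```

Fitting the scaling on the pre-period only matters: including post-intervention data would leak the effect being measured into the preprocessing. Predictions are mapped back to the original scale with $y = y' \sigma_y + \mu_y$.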
8.2. Scaling Priors
Level Scale (σℓ):
σℓ=prior_level_sd×σy
Seasonal Drift Scales (σs):
σs=0.01×σy
9. Seasonal Effects Handling
Seasonality is a common feature in time-series data, representing periodic fluctuations.
9.1. Seasonal Components (sk,t)
Number of Seasons (m): Defines the periodicity (e.g., m=12 for monthly data with yearly seasonality).
Steps per Season (n): Granularity within each season (e.g., weekly steps within a yearly cycle).
9.2. Seasonal Drift (σsk)
Allows seasonal trends to gradually change over time:
$$s_{k,t} = s_{k,t-m} + \epsilon_{k,t}, \qquad \epsilon_{k,t} \sim N(0, \sigma_{s_k}^2)$$
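The drift recursion can be visualized by carrying one seasonal pattern forward cycle by cycle. A sketch (the sinusoidal initial pattern, season length, and drift scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

m = 12                                        # season length (e.g., monthly data)
sigma_season = 0.05                           # drift scale
base = np.sin(2 * np.pi * np.arange(m) / m)   # initial seasonal pattern

# s_{k,t} = s_{k,t-m} + eps,  eps ~ N(0, sigma_season^2):
# each season's effect is carried forward one full cycle with a small perturbation.
n_years = 10
seasons = np.empty((n_years, m))
seasons[0] = base
for year in range(1, n_years):
    seasons[year] = seasons[year - 1] + rng.normal(0.0, sigma_season, size=m)

seasonal_path = seasons.ravel()               # s_t for t = 1 .. n_years * m
```

Because the perturbations accumulate, the pattern stays recognizably sinusoidal for a while but slowly drifts away from `base`, which is exactly the "gradually changing seasonality" the model allows.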
10. Summary
This CausalImpact analysis formulation encompasses a comprehensive Bayesian Structural Time Series (STS) model with the following core components:
State-Space Model: Defines the dynamics of latent states, including local level, seasonal components, and regression coefficients, influencing the observed data.
Priors: Utilizes Inverse-Gamma priors for variances and Normal priors for regression weights and initial states, integrating domain knowledge and ensuring regularization.
Gibbs Sampling: Employs Gibbs Sampling to iteratively sample from conditional posterior distributions of model parameters and latent states, ensuring convergence to the joint posterior.
Posterior Predictive Inference: Derives posterior mean predictions and credible intervals, providing probabilistic estimates of counterfactual scenarios.
Causal Effect Estimation: Quantifies the impact of interventions through point and cumulative effects by juxtaposing observed data against model predictions.
Matrix Operations: Leverages matrix algebra for efficient representation and computation of state transitions, observations, and parameter updates.
Articulating the entire process in precise mathematical terms enhances interpretability and lays the groundwork for extensions or modifications to the model as analysis needs evolve.
11. Appendix
Derivation of Sampling the Local Level Variance (σℓ2)
This section provides a detailed mathematical derivation of the sampling step for the Local Level Variance (σℓ2) within the Gibbs Sampling procedure.
1. Model Setup
1.1. State Transition Equation
The evolution of the Local Level component is given by:
$$\ell_t = \ell_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \sigma_\ell^2)$$
1.2. Prior for σℓ2
An Inverse-Gamma prior is assumed:
$$\sigma_\ell^2 \sim \text{Inverse-Gamma}(\alpha_\ell, \beta_\ell)$$
2. Likelihood Function
Given the state transition, the likelihood of the level path $\{\ell_t\}_{t=1}^{T}$ is:
$$p(\ell_{1:T} \mid \ell_0, \sigma_\ell^2) = \prod_{t=1}^{T} N(\ell_t \mid \ell_{t-1}, \sigma_\ell^2) \propto (\sigma_\ell^2)^{-T/2} \exp\!\left( -\frac{1}{2\sigma_\ell^2} \sum_{t=1}^{T} (\ell_t - \ell_{t-1})^2 \right)$$
3. Posterior Update
Multiplying this likelihood by the Inverse-Gamma prior density, $p(\sigma_\ell^2) \propto (\sigma_\ell^2)^{-\alpha_\ell - 1} e^{-\beta_\ell / \sigma_\ell^2}$, and collecting powers of $\sigma_\ell^2$ yields
$$p(\sigma_\ell^2 \mid \ell_{0:T}) \propto (\sigma_\ell^2)^{-(\alpha_\ell + T/2) - 1} \exp\!\left( -\frac{1}{\sigma_\ell^2} \left[ \beta_\ell + \frac{1}{2} \sum_{t=1}^{T} (\ell_t - \ell_{t-1})^2 \right] \right)$$
which is the kernel of $\text{Inverse-Gamma}\!\left( \alpha_\ell + \frac{T}{2},\; \beta_\ell + \frac{1}{2} \sum_{t=1}^{T} (\ell_t - \ell_{t-1})^2 \right)$, matching the update in Step 1 of the Gibbs sampler.
This conjugate relationship between the Normal likelihood and the Inverse-Gamma prior facilitates efficient Gibbs Sampling, enabling a straightforward update of $\sigma_\ell^2$ in each iteration.