Tutorial

How to Run an Event Study in Python

From returns to CAAR and the Patell Z, vectorised across many events with pandas, numpy and statsmodels.

In short

To run an event study in Python, fit the market model per firm with statsmodels.OLS over the estimation window, compute abnormal returns as realised minus predicted in the event window, aggregate into CAR and CAAR with vectorised pandas operations, and test significance with the cross-sectional t-test, the Patell Z and the BMP test. The free ARC calculator runs the same pipeline from a CSV upload, no code required.

When researchers reach for Python

Python is the natural choice when the event study is one step in a larger data pipeline: scraping or querying prices, merging with deal or fundamentals data, and feeding results into a machine-learning or panel model. This tutorial builds the same estimator as the R tutorial so results match exactly; only the syntax differs.

Setup: pandas, numpy, statsmodels

import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

# panel: DataFrame[firm, date, price]
# market: DataFrame[date, mkt_price]
# events: DataFrame[firm, event_date]
EST_WIN = (-250, -11)   # estimation window in event time
EVT_WIN = (-5, 5)       # event window in event time
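To try the steps below without real data, the three input tables can be simulated. This is only a minimal sketch: the helper name make_toy_inputs, the firm labels, the sample sizes and the return parameters are all illustrative assumptions, chosen so that every firm has a full 250-day estimation window before its event.

```python
import numpy as np
import pandas as pd

def make_toy_inputs(n_firms=5, n_days=400, seed=0):
    """Simulate panel, market and events tables with the column layout used in this tutorial."""
    rng = np.random.default_rng(seed)
    dates = pd.bdate_range("2020-01-01", periods=n_days)
    mkt_ret = rng.normal(0.0003, 0.01, n_days)
    market = pd.DataFrame({"date": dates, "mkt_price": 100 * np.cumprod(1 + mkt_ret)})

    frames, ev_rows = [], []
    for i in range(n_firms):
        firm = f"F{i}"
        beta = rng.uniform(0.5, 1.5)                      # hypothetical true beta
        ret = beta * mkt_ret + rng.normal(0, 0.015, n_days)
        frames.append(pd.DataFrame({"firm": firm, "date": dates,
                                    "price": 50 * np.cumprod(1 + ret)}))
        # Place the event late enough to leave 250 trading days before it
        ev_rows.append({"firm": firm, "event_date": dates[rng.integers(300, n_days - 10)]})

    panel = pd.concat(frames, ignore_index=True)
    events = pd.DataFrame(ev_rows)
    return panel, market, events

panel, market, events = make_toy_inputs()
```

Because the simulated returns contain no event effect, the test statistics computed later should be insignificant in most draws, which makes this a convenient sanity check of the pipeline.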

Step 1: Load prices, align trading days, compute returns

Compute simple returns $R_t = P_t/P_{t-1}-1$ per firm, attach the market return, and index each row in event time relative to the firm's event date.

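panel = panel.sort_values(["firm", "date"]).copy()
# fill_method=None stops pandas from forward-filling missing prices before differencing
panel["ret"] = panel.groupby("firm")["price"].pct_change(fill_method=None)

market = market.sort_values("date").copy()
market["mkt"] = market["mkt_price"].pct_change(fill_method=None)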

df = (panel
      .merge(market[["date", "mkt"]], on="date", how="inner")
      .merge(events, on="firm", how="inner")
      .dropna(subset=["ret", "mkt"]))

# Event-time offset: 0 on the first trading day on/after the event date.
def add_event_time(g):
    g = g.sort_values("date").reset_index(drop=True)
    k = (g["date"] >= g["event_date"]).idxmax()
    g["rel_day"] = g.index - k
    return g

df = df.groupby("firm", group_keys=False).apply(add_event_time)

Step summary. Use groupby(firm).pct_change() for returns, merge the market return, and build an event-time index where rel_day = 0 is the first trading day on or after the event date.
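The event-time index can also be built without groupby.apply, whose handling of the grouping column has changed across pandas releases. The sketch below is one possible vectorised variant, not the tutorial's own function; the name add_event_time_vectorised is hypothetical. It assumes one event per firm and drops any firm that has no trading day on or after its event date.

```python
import pandas as pd

def add_event_time_vectorised(df):
    """Add rel_day = trading-day offset from the first date >= event_date, per firm."""
    out = df.sort_values(["firm", "date"]).reset_index(drop=True)
    pos = out.groupby("firm").cumcount()                 # 0, 1, 2, ... within each firm
    on_or_after = out["date"] >= out["event_date"]
    # Position of the first trading day on/after the event; NaN if the firm has none
    k = pos.where(on_or_after).groupby(out["firm"]).transform("min")
    out["rel_day"] = pos - k
    # Firms whose event falls after their last observation have NaN offsets and are dropped
    return out.dropna(subset=["rel_day"]).astype({"rel_day": int})

toy = pd.DataFrame({
    "firm": ["A"] * 4 + ["B"] * 3,
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05",
                            "2024-01-02", "2024-01-03", "2024-01-04"]),
    "event_date": pd.to_datetime(["2024-01-04"] * 4 + ["2024-02-01"] * 3),
})
toy_rel = add_event_time_vectorised(toy)
```

In the toy frame, firm A gets offsets -2 to 1, while firm B is removed because its event date lies beyond its last observation.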

Step 2: Fit the market model and extract alpha and beta

$$R_{i,t} = \alpha_i + \beta_i R_{m,t} + \varepsilon_{i,t}, \qquad t \in [T_0, T_1].$$

def fit_market_model(g):
    est = g[(g.rel_day >= EST_WIN[0]) & (g.rel_day <= EST_WIN[1])]
    X = sm.add_constant(est["mkt"])
    res = sm.OLS(est["ret"], X).fit()
    M = len(est)
    return pd.Series({
        "alpha":   res.params["const"],
        "beta":    res.params["mkt"],
        "s_ar":    np.sqrt(res.ssr / (M - 2)),   # residual SD, M-2 dof
        "M":       M,
        "mkt_bar": est["mkt"].mean(),
        "mkt_ss":  ((est["mkt"] - est["mkt"].mean()) ** 2).sum(),
    })

params = df.groupby("firm").apply(fit_market_model).reset_index()

Step summary. Fit OLS(ret, const + mkt) per firm on the estimation window and keep alpha, beta, the residual standard deviation $S_{AR_i}=\sqrt{SSR/(M-2)}$, and $M_i$.
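With a single regressor, the OLS estimates have a closed form, so the per-firm fit can be cross-checked (or replaced for large samples) with grouped moments: $\hat\beta_i = S_{xy}/S_{xx}$, $\hat\alpha_i = \bar R_i - \hat\beta_i \bar R_m$ and $SSR = S_{yy} - \hat\beta_i S_{xy}$. The following is a sketch under the assumption that est already contains only estimation-window rows with columns firm, ret and mkt; the function name is hypothetical.

```python
import numpy as np
import pandas as pd

def market_model_closed_form(est):
    """Per-firm OLS of ret on mkt from grouped moments; est holds estimation-window rows only."""
    firm = est["firm"]
    g = est.groupby("firm")
    M = g["ret"].size()
    mkt_bar, ret_bar = g["mkt"].mean(), g["ret"].mean()
    d_mkt = est["mkt"] - g["mkt"].transform("mean")      # demeaned market return
    d_ret = est["ret"] - g["ret"].transform("mean")      # demeaned firm return
    mkt_ss = (d_mkt ** 2).groupby(firm).sum()            # S_xx
    cross = (d_mkt * d_ret).groupby(firm).sum()          # S_xy
    beta = cross / mkt_ss
    alpha = ret_bar - beta * mkt_bar
    ssr = (d_ret ** 2).groupby(firm).sum() - beta * cross
    s_ar = np.sqrt(ssr.clip(lower=0) / (M - 2))          # clip guards against tiny negative rounding
    return pd.DataFrame({"alpha": alpha, "beta": beta, "s_ar": s_ar,
                         "M": M, "mkt_bar": mkt_bar, "mkt_ss": mkt_ss})

rng = np.random.default_rng(1)
mkt = rng.normal(0, 0.01, 60)
est_demo = pd.DataFrame({"firm": "A", "mkt": mkt,
                         "ret": 0.001 + 1.5 * mkt + rng.normal(0, 0.02, 60)})
params_demo = market_model_closed_form(est_demo)
# Independent check with numpy's least-squares line fit
slope, intercept = np.polyfit(est_demo["mkt"].to_numpy(), est_demo["ret"].to_numpy(), 1)
```

The returned columns mirror those produced by fit_market_model, so the two outputs can be compared directly.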

Step 3: Vectorised abnormal returns, CAR and CAAR

$$AR_{i,t} = R_{i,t} - (\hat\alpha_i + \hat\beta_i R_{m,t}), \quad CAR_i = \sum_{t} AR_{i,t}, \quad CAAR = \frac{1}{N}\sum_i CAR_i.$$

evt = (df[(df.rel_day >= EVT_WIN[0]) & (df.rel_day <= EVT_WIN[1])]
       .merge(params, on="firm"))
evt["ar"] = evt["ret"] - (evt["alpha"] + evt["beta"] * evt["mkt"])

aar  = evt.groupby("rel_day")["ar"].mean()              # AAR path
car  = evt.groupby("firm")["ar"].sum().rename("car")    # CAR per firm
caar = car.mean()
N    = car.shape[0]
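A tiny worked example makes the aggregation concrete. The numbers below are invented for illustration; the frame mimics the firm, rel_day and ar columns of evt for two firms over a three-day window.

```python
import pandas as pd

evt_demo = pd.DataFrame({
    "firm":    ["A", "A", "A", "B", "B", "B"],
    "rel_day": [-1, 0, 1, -1, 0, 1],
    "ar":      [0.00, 0.02, 0.01, 0.01, 0.04, -0.02],
})

# Firms in rows, event days in columns
ar_matrix = evt_demo.pivot(index="firm", columns="rel_day", values="ar")

car_demo = ar_matrix.sum(axis=1)     # per-firm CAR: about 0.03 for both A and B
aar_demo = ar_matrix.mean(axis=0)    # per-day AAR: about 0.005, 0.03, -0.005
caar_demo = car_demo.mean()          # about 0.03
```

When every firm has every event-window day, summing the AAR path gives the same number as averaging the CARs. With missing firm-days the two can differ, so it is worth checking which definition a reported CAAR uses.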

Step 4: Significance tests in Python

Cross-sectional t-test: $t = \sqrt{N}\,CAAR/S_{CAAR}$ with $S_{CAAR}^2=\frac{1}{N-1}\sum_i(CAR_i-CAAR)^2$.

# (a) Cross-sectional t-test.
s_caar = car.std(ddof=1)
t_cs   = np.sqrt(N) * caar / s_caar
p_cs   = 2 * stats.t.sf(abs(t_cs), df=N - 1)
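The cross-sectional test is a one-sample t-test of the CARs against zero, so it can be checked against scipy.stats.ttest_1samp. The CAR values below are made up for illustration.

```python
import numpy as np
from scipy import stats

car_demo = np.array([0.031, -0.012, 0.045, 0.008, 0.027, -0.004, 0.019, 0.036])
n = car_demo.size

# Manual statistic, following the formula above
t_manual = np.sqrt(n) * car_demo.mean() / car_demo.std(ddof=1)
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Same test via scipy
t_scipy, p_scipy = stats.ttest_1samp(car_demo, popmean=0.0)
```

Agreement between the two pairs confirms that the ddof=1 standard deviation and the N - 1 degrees of freedom are applied consistently.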

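Patell Z: standardise each AR by its forecast-error-adjusted SD, $SAR_{i,t}=AR_{i,t}/S_{AR_{i,t}}$, sum over the event window to get $CSAR_i=\sum_t SAR_{i,t}$, then aggregate across firms. $S_{AR_{i,t}}^2 = S_{AR_i}^2\!\left(1+\frac{1}{M_i}+\frac{(R_{m,t}-\bar R_m)^2}{\sum(R_{m,s}-\bar R_m)^2}\right)$ and $z=\frac{1}{\sqrt N}\sum_i CSAR_i/S_{CSAR_i}$ with $S_{CSAR_i}^2=L_2\frac{M_i-2}{M_i-4}$, where $L_2$ is the number of event-window days.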

# (b) Patell Z.
evt["fe_var"] = evt["s_ar"]**2 * (1 + 1/evt["M"]
                + (evt["mkt"] - evt["mkt_bar"])**2 / evt["mkt_ss"])
evt["sar"] = evt["ar"] / np.sqrt(evt["fe_var"])

g = evt.groupby("firm")
csar = g["sar"].sum()
L2   = g.size()
M    = g["M"].first()
var_cs = L2 * (M - 2) / (M - 4)            # L2 * (M-2)/(M-4)
z_i  = csar / np.sqrt(var_cs)
z_patell = z_i.sum() / np.sqrt(len(z_i))
p_patell = 2 * stats.norm.sf(abs(z_patell))
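The factor $(M_i-2)/(M_i-4)$ is the variance of a Student t variable with $M_i-2$ degrees of freedom, which is the null distribution usually assumed for each standardised abnormal return. The short check below illustrates this with window lengths matching EST_WIN and EVT_WIN above (240 and 11 days); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

M_demo, L2_demo = 240, 11                 # estimation and event window lengths

var_t = stats.t.var(df=M_demo - 2)        # variance of t with M-2 degrees of freedom
var_formula = (M_demo - 2) / (M_demo - 4) # the factor used in var_cs above

# Variance and SD of a sum of L2 independent standardised abnormal returns
var_csar = L2_demo * var_formula
sd_csar = np.sqrt(var_csar)
```

For an estimation window of this length the correction is under one percent, so it matters mainly when $M_i$ is small.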

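BMP (standardised cross-sectional) test: $t_{BMP}=\sqrt N\,\overline{SCAR}/S_{\overline{SCAR}}$ with $SCAR_i=CAR_i/S_{CAR_i}$, where $S_{\overline{SCAR}}$ denotes the cross-sectional standard deviation of the $SCAR_i$. The test is robust to event-induced volatility (Boehmer, Musumeci & Poulsen, 1991).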

# (c) BMP test. Compute SCAR_i = CAR_i / S_CAR_i per firm, where S_CAR_i^2
# scales the residual variance to the event window with its own
# forecast-error correction (full expression on /significance-tests).
def scar_per_firm(sub):
    L2 = len(sub)
    M  = sub["M"].iloc[0]
    s  = sub["s_ar"].iloc[0]
    fe = ((sub["mkt"] - sub["mkt_bar"].iloc[0])**2).sum() / sub["mkt_ss"].iloc[0]
    s_car = np.sqrt(s**2 * (L2 + L2/M + fe))
    return sub["ar"].sum() / s_car

scar = evt.groupby("firm").apply(scar_per_firm)
t_bmp = np.sqrt(len(scar)) * scar.mean() / scar.std(ddof=1)
p_bmp = 2 * stats.t.sf(abs(t_bmp), df=len(scar) - 1)

print(f"CAAR={caar:.4f} | t_cs={t_cs:.2f} (p={p_cs:.3f}) | "
      f"Patell Z={z_patell:.2f} (p={p_patell:.3f}) | BMP t={t_bmp:.2f} (p={p_bmp:.3f})")

Step summary. The cross-sectional t-test gives a quick read; the Patell Z corrects for estimation error and unequal variances; the BMP test adds robustness to event-induced volatility.
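The scaling $S_{CAR_i}$ in the BMP code adds up the per-day forecast-error variances. Because the estimation errors in $\hat\alpha_i$ and $\hat\beta_i$ are shared by every event day, a textbook alternative also carries the cross-day covariance, giving $S_{CAR_i}^2 = S_{AR_i}^2\left(L_2 + L_2^2/M_i + (\sum_t (R_{m,t}-\bar R_m))^2/\sum_s (R_{m,s}-\bar R_m)^2\right)$. The sketch below compares the two. The second function is an assumption for illustration, not necessarily the definition ARC uses, and all input numbers are invented.

```python
import numpy as np

def s_car_summed(s, M, mkt_evt, mkt_bar, mkt_ss):
    """S_CAR as in the BMP code above: per-day forecast-error variances added up."""
    L2 = len(mkt_evt)
    fe = ((mkt_evt - mkt_bar) ** 2).sum() / mkt_ss
    return np.sqrt(s**2 * (L2 + L2 / M + fe))

def s_car_with_covariance(s, M, mkt_evt, mkt_bar, mkt_ss):
    """Illustrative variant: also keeps the covariance of forecast errors across event days."""
    L2 = len(mkt_evt)
    fe = (mkt_evt - mkt_bar).sum() ** 2 / mkt_ss
    return np.sqrt(s**2 * (L2 + L2**2 / M + fe))

mkt_evt = np.array([0.004, -0.010, 0.012, 0.001, -0.003])   # event-window market returns
a = s_car_summed(0.02, 240, mkt_evt, 0.0005, 0.024)
b = s_car_with_covariance(0.02, 240, mkt_evt, 0.0005, 0.024)
rel_diff = abs(a - b) / a
```

For a one-day window the two expressions coincide. With a long estimation window and a short event window the difference is typically small, but it grows as $L_2$ approaches $M_i$.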

Skip the code. Upload your CSV to ARC and get AR, CAR, CAAR, the Patell Z and BMP statistics directly, validated against the published test definitions.

Step 5: Plotting the CAAR path

import matplotlib.pyplot as plt
caar_path = aar.sort_index().cumsum()        # CAAR builds as cumulative AAR
ax = caar_path.plot(marker="o")
ax.axvline(0, ls="--", lw=1)                  # event day
ax.axhline(0, color="0.6", lw=0.8)
ax.set_xlabel("Event time (days)"); ax.set_ylabel("CAAR")
plt.tight_layout(); plt.show()

The CAAR path is the running sum of AAR across event time. A step at rel_day = 0 with a flat pre-event line is the textbook efficient-reaction pattern; a slope that begins before day 0 suggests information leakage.
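To judge whether a step or a pre-event drift is distinguishable from noise, the path can be drawn with a cross-sectional band. The helper below is a sketch with a hypothetical name; it assumes one event per firm (so each firm and rel_day pair is unique) and uses a plain plus or minus 1.96 standard-error band, which ignores the estimation-error corrections of the Patell and BMP tests.

```python
import numpy as np
import pandas as pd

def caar_path_with_band(evt, z=1.96):
    """Running CAAR with a cross-sectional +/- z standard-error band; needs firm, rel_day, ar."""
    ar_matrix = evt.pivot(index="firm", columns="rel_day", values="ar").sort_index(axis=1)
    running_car = ar_matrix.cumsum(axis=1)          # each firm's CAR from window start to day t
    caar = running_car.mean(axis=0)
    n_obs = running_car.notna().sum(axis=0)         # firms available on each event day
    se = running_car.std(axis=0, ddof=1) / np.sqrt(n_obs)
    return pd.DataFrame({"caar": caar, "lo": caar - z * se, "hi": caar + z * se})

evt_demo = pd.DataFrame({
    "firm":    ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "rel_day": [-1, 0, 1] * 3,
    "ar":      [0.00, 0.02, 0.01, 0.01, 0.04, -0.02, -0.01, 0.03, 0.00],
})
band = caar_path_with_band(evt_demo)
```

The band can be shaded on the existing axes with ax.fill_between(band.index, band["lo"], band["hi"], alpha=0.2).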

No-code alternative: the free ARC calculator

The ARC calculator runs this same pipeline from a CSV upload. Choose the model on Expected-return models, the tests on Significance tests, upload, and download AR, CAR, CAAR and every statistic.

FAQ

How do I handle missing data?

Drop firm-days with missing returns before estimation and track the resulting estimation-window length $M_i$ per firm, because the Patell and BMP variance corrections depend on it. Do not forward-fill prices, which creates artificial zero returns.
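The sketch below illustrates both points: how forward-filling a price gap manufactures a zero return, and how to count usable estimation-window observations per firm before fitting. The function name and the minimum-observation threshold are hypothetical; pick a threshold suited to the study.

```python
import numpy as np
import pandas as pd

# Forward-filling a missing price creates an artificial 0% return on the gap day
prices = pd.Series([100.0, np.nan, 102.0, 103.0])
filled_ret = prices.ffill().pct_change(fill_method=None)
raw_ret = prices.pct_change(fill_method=None)       # returns around the gap stay missing

def estimation_counts(df, est_win=(-250, -11)):
    """Count firm-days in the estimation window with both ret and mkt observed."""
    in_est = df["rel_day"].between(est_win[0], est_win[1])
    ok = in_est & df["ret"].notna() & df["mkt"].notna()
    return ok.groupby(df["firm"]).sum().rename("M")

demo = pd.DataFrame({
    "firm":    ["A", "A", "A", "B", "B", "B"],
    "rel_day": [-3, -2, -1, -3, -2, -1],
    "ret":     [0.01, np.nan, 0.02, 0.00, 0.01, -0.01],
    "mkt":     [0.00, 0.01, 0.01, 0.00, 0.01, 0.00],
})
counts = estimation_counts(demo, est_win=(-3, -1))
min_obs = 3                                         # hypothetical threshold for this toy frame
keep_firms = counts[counts >= min_obs].index.tolist()
```

Here firm A loses one observation to the missing return and falls below the toy threshold, so only firm B would go on to estimation.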

What about overlapping events?

When event windows overlap in calendar time, abnormal returns are cross-sectionally correlated and the Patell Z over-rejects. The BMP test relies on the same cross-sectional independence, so prefer the Kolari-Pynnönen adjusted statistics described on Significance tests.
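A quick way to see how much clustering a sample has is to count, for each event, how many other events have an overlapping window. The sketch below uses calendar days as a rough stand-in for trading days: a half-width of 7 calendar days approximates the 5 trading days of EVT_WIN, which is an assumption. The function name and the example dates are hypothetical.

```python
import numpy as np
import pandas as pd

def count_window_overlaps(events, half_width_days=7):
    """For each event, count other events whose window [d - h, d + h] overlaps its own."""
    d = events["event_date"].to_numpy()
    gap = np.abs(d[:, None] - d[None, :])                  # pairwise distance between event dates
    # Two windows of half-width h overlap when the event dates are at most 2h apart
    overlaps = gap <= np.timedelta64(2 * half_width_days, "D")
    n_other = overlaps.sum(axis=1) - 1                     # subtract the event itself
    return pd.Series(n_other, index=events["firm"].to_numpy(), name="n_overlaps")

events_demo = pd.DataFrame({
    "firm": ["A", "B", "C"],
    "event_date": pd.to_datetime(["2024-03-01", "2024-03-10", "2024-06-01"]),
})
n_overlaps = count_window_overlaps(events_demo)
```

In the example, A and B overlap with each other and C stands alone. A sample where most counts are zero is close to the independence the unadjusted tests assume.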

OLS or GARCH for the return model?

For short daily event windows OLS on the market model is standard and robust; GARCH adds little because abnormal returns are dominated by the event-period return. GARCH helps mainly for long-horizon studies. See Choosing a return model.