Academic Research with Event Studies: Findings, Methods, Tools

Q: What event window should I use?

For clean, precisely dated news use the shortest window the information diffusion allows, typically [0,+1] or [-1,+1]; the bulk of a clean reaction lands on the announcement day and the next. Extend to a few days only for slow-diffusing or hard-to-date events, accepting a known cost in power and confounding exposure (Brown and Warner, 1985).

The event study is the dominant empirical method for measuring the value relevance of information, and it powers thousands of published papers across finance, accounting, economics, management, marketing, operations, information systems, political science, and law.

An event study converts a dateable corporate or economic event into a number: the abnormal return, the slice of a security's price move that the event, and not the market, explains. The same six-step procedure that Fama, Fisher, Jensen and Roll (1969) ran on stock splits now anchors more than five hundred published studies, sits inside U.S. securities-fraud verdicts, and runs on monetary-policy desks at the Federal Reserve. This page maps the canonical findings with exact magnitudes, a worked numeric example, the methodology choices a referee will probe, who runs these studies in the real world, and how to run one for free in our calculators (no SAS, CRSP, or WRDS license needed).

This page assumes you know the mechanics of the method. If you are new to it, our introduction to event study methodology covers the abnormal-return intuition, the estimation and event windows, and the CAR, CAAR and BHAR definitions, while the expected-return models and significance tests pages cover the benchmark model and the test statistics. What follows is about how event studies are used in published research: the canonical findings, who runs them, and the choices a referee will probe.

Why the event study is the workhorse of empirical research

The method is unusually durable. It was invented by Fama, Fisher, Jensen and Roll (1969), who studied 940 splits on 622 NYSE common stocks (January 1927 to December 1959), introduced the market-model cumulative-average-residual framework, and delivered the first direct test of market efficiency. Their seminal efficiency result is subtle and often forgotten: cumulative average residuals rose steadily in the months before a split and were flat after it, and the post-split path depended on the dividend that followed. Splits followed by dividend increases showed no post-split drift, while splits followed by dividend decreases drifted down, evidence that the split itself carried no information beyond the dividend signal it predicted. In parallel, Ball and Brown (1968), studying 261 firms over 1946 to 1966, showed that annual earnings carry information content (roughly 85 to 90 percent of which is impounded by the announcement month) and, as a by-product, documented the post-announcement drift that still anchors the anomalies literature. The procedure those papers established is, in its essentials, the procedure researchers run today. For the conceptual foundations, see our introduction to the event study methodology.

The breadth of borrowing is documented. Corrado (2011) traces the migration of the technique out of finance and accounting into economics, history, law, management, marketing, and political science; the standard surveys add operations and supply-chain management and information systems. Kothari and Warner (2007) put a number on it: a census of 565 event-study papers published 1974 to 2000 across the Journal of Business, Journal of Finance, Journal of Financial Economics, the Journal of Financial and Quantitative Analysis, and the Review of Financial Studies, which they call a lower bound. Of those 565, roughly 200 use a maximum window of twelve months or more: about a third of the literature ventures into the fragile long-horizon zone despite it being the least reliable, a striking juxtaposition of popularity and validity. The dominance has not faded. Roughly 305 articles in the Journal of Finance and the Review of Financial Studies alone used event-study methods between 2010 and 2025, extending the workhorse status into the current decade.

Free since 1969, by design. The method needs only daily security returns and a market index. FFJR ran it in 1969 with no commercial database. That is the whole value proposition of our calculators: the same estimators, on any market and any period, in the browser, for free, with no SAS, CRSP, or WRDS subscription.

What the research shows

Across applications, the empirical regularities are consistent enough that researchers can sanity-check their own results against them. Information is impounded fast: for clean single-firm news, the bulk of the price reaction occurs on the announcement day and the day after, which is precisely why credible studies use very short windows such as $[-1,+1]$ or $[0,+1]$.

The typical abnormal-return signs and magnitudes vary by event type:

Mergers and acquisitions are the most replicated application and one of the most robust findings in financial economics. Andrade, Mitchell and Stafford (2001), across roughly 4,000 deals from 1973 to 1998, report target shareholders earning an average three-day $(-1,+1)$ CAR of about $+16\%$ (rising to about $+23.8\%$ over the longer $(-20,\text{close})$ window), acquirers earning about $-0.7\%$ on the 3-day window (statistically indistinguishable from zero) and about $-3.8\%$ over the longer window, and combined returns of about $+1.8\%$ (3-day) to $+1.9\%$ (longer): targets win big, acquirers roughly break even or lose modestly. See our mergers and acquisitions page.
Earnings surprises move prices in the direction of the surprise; top-decile positive surprises generate average short-window CARs on the order of $+3\%$ to $+5\%$, far smaller than M&A target reactions. See dividends and earnings announcements.
Post-earnings-announcement drift (PEAD) is the granddaddy underreaction anomaly: prices keep drifting in the direction of the surprise for weeks and months. Bernard and Thomas (1989) show that a long/short strategy on the extreme surprise deciles earned roughly $18\%$ over the year following portfolio formation, which they interpret as underreaction rather than risk (the companion paper, Bernard and Thomas (1990), reports about 8 to 9 percent per quarter before costs). This is a 1980s-era magnitude that has since shrunk; see what has held up since 2010.
Dividend signaling is asymmetric and modest: initiations and increases earn small positive abnormal returns, while omissions and cuts produce larger negative reactions. Asquith and Mullins (1983) quantify the seminal case: firms initiating dividends earn about $+3.7\%$ over the two days around the announcement, with larger initiations producing larger reactions.
Marketing events (product launches, alliances, brand and advertising events) generate economically meaningful but generally smaller abnormal returns than M&A; the discipline treats them as measurable shareholder-value effects of marketing actions (Sorescu, Warren and Ertekin, 2017).

Several cross-sectional drivers of CAR magnitude recur regardless of discipline: the size and sign of the surprise, firm size (smaller firms react more), leverage, the prior information environment and analyst coverage, and the credibility of the source. Bad news also tends to be impounded faster and more strongly than good news.

Empirical findings at a glance

The exact, quotable magnitudes behind the prose above, one citation per row. This is the kind of literature-anchored reference that paywalled, software-only competitors do not provide: the incumbent products publish a test-statistic feature list but zero literature-anchored findings with magnitudes, so a free tool that also hands the researcher the canonical benchmarks to sanity-check against is strictly more useful.

Finding	Magnitude	Sample	Source
First event study; semi-strong efficiency	Pre-split CAR rises, post-split flat; drift conditional on dividend outcome	940 splits, 622 NYSE stocks, 1927-1959	FFJR (1969)
Earnings have information content; first drift	~85-90% impounded by announcement month	261 firms, 1946-1966	Ball & Brown (1968)
M&A announcement returns	3-day $(-1,+1)$: target ~+16%, acquirer ~-0.7%, combined ~+1.8%. Longer $(-20,\text{close})$ window: target ~+23.8%, acquirer ~-3.8%, combined ~+1.9%	~4,000 deals, 1973-1998	Andrade, Mitchell & Stafford (2001)
Dividend initiation	~+3.7% two-day excess return	Initiating firms	Asquith & Mullins (1983)
Post-earnings-announcement drift (PEAD)	~+18% over the year following portfolio formation	SUE-decile long/short	Bernard & Thomas (1989)
Short-window test power	Reliably detects ~1% event-day abnormal return	Daily-return simulation	Brown & Warner (1985)

The single most important stylized fact is about the method itself: there is a sharp short-horizon versus long-horizon divide. Short-window studies are reliable and well powered. Long-horizon abnormal-return estimates are fragile because they are sensitive to the choice of benchmark model (the "bad-model problem"). Fama (1998) argues that long-run "anomalies" are about equally split between apparent overreaction and underreaction and tend to dissolve under reasonable changes in methodology, and Kothari and Warner (2007) conclude that short-horizon methods are relatively reliable while long-horizon inference suffers serious, unresolved misspecification.

A worked example you can follow

The arithmetic is simpler than the notation suggests. Take one stock. From the estimation window, an OLS market-model regression of the stock's return on the market return yields an intercept $\alpha = 0.0005$ and a slope $\beta = 1.1$. On the event day the market returns $R_m = +0.4\%$. The benchmark (normal) return is therefore:

Expected return $= \alpha + \beta \cdot R_m = 0.0005 + 1.1 \times 0.004 = 0.0049 = 0.49\%$.

The stock actually returned $+2.0\%$ that day. The abnormal return is the gap:

Abnormal return $= 2.0\% - 0.49\% = 1.51\%$.

To get the CAR over a three-day window $(-1,+1)$, repeat the calculation for each day and sum. If the abnormal returns on days $-1$, $0$, and $+1$ were $0.30\%$, $1.51\%$, and $0.45\%$, the CAR is $0.30 + 1.51 + 0.45 = 2.26\%$. Average that CAR across many sample firms and you have the cumulative average abnormal return (CAAR) that a significance test then evaluates against zero.
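The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the numbers from the worked example; the two extra firm-level CARs used to illustrate the CAAR are hypothetical, and the variable and function names are illustrative rather than anything our calculators expose.

```python
# Market-model parameters and event-day inputs from the worked example.
alpha = 0.0005          # intercept from the estimation-window regression
beta = 1.1              # slope from the estimation-window regression
market_return = 0.004   # event-day market return (+0.4%)
stock_return = 0.020    # event-day stock return (+2.0%)

# Benchmark (normal) return: keep the intercept, do not use beta * R_m alone.
expected_return = alpha + beta * market_return
abnormal_return = stock_return - expected_return

# Abnormal returns on days -1, 0, +1 as given in the text (decimals).
window_ars = [0.0030, abnormal_return, 0.0045]
car = sum(window_ars)   # sum over days, for one firm


def caar(firm_cars):
    """Average (not sum) the per-firm CARs to obtain the CAAR."""
    return sum(firm_cars) / len(firm_cars)


# Hypothetical CARs for two other firms, purely for illustration.
sample_caar = caar([car, 0.0150, -0.0040])

print(f"expected {expected_return:.2%}, AR {abnormal_return:.2%}, CAR {car:.2%}")
print(f"CAAR over three firms {sample_caar:.2%}")
```

Note the split of roles: days are summed within a firm, firms are averaged across the sample.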

Common student errors on this exact calculation. Graders see the same four mistakes: (1) dropping the intercept $\alpha$ and using $\beta \cdot R_m$ alone as the normal return; (2) counting calendar days instead of trading days when laying out the windows; (3) letting the event day (or leakage days) slip into the estimation window, which contaminates $\alpha$ and $\beta$; and (4) summing CARs across firms instead of averaging them into a CAAR. Each one quietly biases the result; none throws an error message.

Reading a results table. Two numbers per row do the work, and they answer different questions. The AR or CAAR is the point estimate: the economic size of the reaction. The $t$, Patell, or BMP statistic is the reliability: whether that size is distinguishable from zero. Read them together. A large CAAR with $t < 2$ is not a finding, and a tiny CAAR with $t = 8$ is reliable but may be economically trivial. Significance without magnitude, or magnitude without significance, is half a result.

The canonical real-data anchor. MacKinlay (1997) illustrates the method on 600 quarterly earnings announcements for the 30 Dow Jones Industrial Average firms (January 1989 to December 1993), using the market model on the CRSP value-weighted index, a 250-day estimation window, and a 41-day event window $(-20,+20)$. Sorting announcements into good-news, no-news, and bad-news groups, the day-0 average abnormal return is about $+0.965\%$ for good news, about zero (statistically insignificant) for no news, and about $-0.679\%$ for bad news, with the CAARs diverging sharply by group. The good-news figure carries a standard error of about $0.104\%$ and a $t$-statistic of about $9.28$, so the null of no reaction is rejected emphatically; the bad-news $-0.679\%$ is likewise significant. This is the worked example every textbook reaches for, and you can run the same computation on your own sample in ARC.

Who runs event studies in practice

The event study is not only published-research furniture. It is a load-bearing instrument in courts, central banks, antitrust agencies, and trading desks. Naming the institutions that rely on these exact estimators is the cleanest way to see why a free, browser-based implementation matters: the competitor product locks identical estimators behind a SAS plus CRSP or WRDS subscription, even as it lists regulators such as the Office of the Comptroller of the Currency and the Office of Financial Research among its subscribers.

Litigation

Under SEC Rule 10b-5, a single event study is used to establish four distinct legal elements at once: materiality, reliance, loss causation, and per-share damages. The U.S. Supreme Court has effectively institutionalized the technique. In Halliburton II (573 U.S. 258, 2014) the Court let defendants rebut the fraud-on-the-market presumption of reliance at class certification with direct evidence of no price impact, which in practice means an event study, making it the gatekeeper for whether a securities class action is certified at all. In Dura Pharmaceuticals v. Broudo (544 U.S. 336, 2005) the Court required proof that the price actually fell on a corrective disclosure, isolated from confounding news, which is exactly an event study. Admissibility runs through the Daubert standard, and because the method is peer-reviewed, tested, and has a known error rate, courts treat it as presumptively reliable. The cost: single-firm ($n=1$) studies, the kind required against an individual defendant, are statistically far weaker than the multi-firm studies that validated the method, and they over-reject in volatile markets (Baker, 2016; Brav and Heaton, 2015). The robust statistics this page recommends (BMP, Corrado, the Patell time-series test) are therefore not academic nicety but the line between admissible and excluded testimony.

Inside the courtroom. The doctrine has concrete cases behind it. On remand in Halliburton, the court ran event studies on six corrective-disclosure dates, found five statistically insignificant, and certified the class only on the December 7, 2001 date, with the event study acting as the class-certification gatekeeper. In the Tesla "funding secured" case (In re Tesla Securities Litigation), the plaintiffs' expert Michael Hartzmark presented a roughly $12 billion event-study damages estimate over the August 2018 tweets, yet the jury in February 2023 found Elon Musk not liable. In In re Vivendi, Dr. Blaine Nye's event study underpinned a price-maintenance (inflation-maintenance) theory of damages that the Second Circuit accepted on appeal in 2016. The lesson running through all three: the magnitude is only half the case; the significance test is the other half.

See our security fraud litigation page; run it in ARC.

Regulators and central banks

Central banks operate the method as infrastructure. The Federal Reserve Bank of San Francisco publishes and regularly updates the US Monetary Policy Event-Study Database, recording high-frequency asset-price moves in 30-minute windows around FOMC statements and 70-minute windows around press conferences across money-market futures, OIS, Treasury and TIPS yields, equity indexes, and the dollar (SF Fed USMPD). The canonical magnitude behind that database: Bernanke and Kuttner (2005) find that a surprise 25-basis-point cut in the fed funds target is associated with about a $+1\%$ rise in broad equity indexes, working mostly through expected future excess returns rather than real rates or dividends, with the surprise extracted from the announcement-day move in fed-funds futures (a high-frequency event study). The recent reassessment of high-frequency monetary identification (Bauer and Swanson, 2023) revised the macro interpretation but left the asset-price estimates largely unchanged, so the event-study numbers underpinning the database still hold. The Federal Trade Commission's Bureau of Economics has published event-study merger analyses, and the OCC and the Office of Financial Research appear as institutional users. This is the method as government infrastructure, not a journal technique.

Antitrust authorities

Antitrust use inverts the usual sign logic. If a horizontal merger is anticompetitive, the merging firms' rivals should earn positive abnormal returns, anticipating higher industry prices. The classic test is Eckbo and Wier (1985): rivals of merging firms rose at the merger announcement but reacted normally when the deal was challenged under antitrust law, which is the signature of efficiency gains rather than market power (if the merger threatened higher industry prices, rivals should have fallen on the challenge). The FTC found the complementary positive-rival pattern around the 1986 May Co. / Associated Dry Goods and 1988 American Stores / Lucky Stores deals (the latter the closest to an anticompetitive signal). The honest caveat, worth citing for balance, is that event studies sometimes fail to flag mergers later proven anticompetitive. The companion lesson for confounding control is enforced in court: in Comcast Corp. v. Behrend (569 U.S. 27, 2013) the Supreme Court decertified an antitrust class because the damages model failed to isolate the challenged conduct from other price drivers, the courtroom version of the confounding-events pitfall. Measure rival reactions with AVC and ARC.

Investors and arbitrageurs

Merger arbitrageurs trade the same signal academics measure. They read the announcement-day target jump and the residual spread to the offer as a market-implied probability of deal completion. For a cash deal, the implied completion probability is approximately the current price divided by the offer price (downside-adjusted for the fall-back price if the deal breaks); the residual spread is the market's compensation for deal-break risk. Mitchell and Pulvino (2001) document a median target jump of about $+27\%$ on announcement and a residual spread of about $3.5\%$, with the large majority of announced U.S. deals completing. Academics measure the abnormal return; arbitrageurs trade it. See mergers and acquisitions.
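One common way to make the downside adjustment explicit is to solve price = p × offer + (1 − p) × fall-back for p. The sketch below does that under stated assumptions: risk-neutral pricing, no discounting for time to close, and a known fall-back price. The function names and the deal numbers are hypothetical and are not taken from Mitchell and Pulvino (2001).

```python
def implied_completion_probability(price, offer, fallback):
    """Solve price = p * offer + (1 - p) * fallback for p.

    Assumes risk-neutral pricing and ignores discounting to the closing date.
    """
    if not fallback < offer:
        raise ValueError("fallback price must be below the offer price")
    return (price - fallback) / (offer - fallback)


def deal_spread(price, offer):
    """Residual spread to the offer, as a fraction of the current price."""
    return (offer - price) / price


# Hypothetical cash deal: offer at 50, target trades at 48.50,
# and the stock is assumed to fall back to 40 if the deal breaks.
offer, price, fallback = 50.0, 48.5, 40.0

naive_ratio = price / offer                                  # price / offer shortcut
p_complete = implied_completion_probability(price, offer, fallback)
spread = deal_spread(price, offer)

print(f"price/offer ratio: {naive_ratio:.3f}")
print(f"downside-adjusted completion probability: {p_complete:.3f}")
print(f"residual spread: {spread:.2%}")
```

The gap between the two probability figures shows why the fall-back price matters: the simple ratio overstates the market-implied odds whenever the break price is well above zero.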

Corporates and consultants

Litigation-support and economic-consulting firms run event studies for damages estimation, valuation, and regulatory submissions, while corporate strategy and investor-relations teams use them to benchmark the market reaction to their own and rivals' announcements. The same procedure that fills a journal article fills an expert report.

How to run this kind of event study

The canonical procedure, as laid out in the standard pedagogical surveys (MacKinlay, 1997; Kothari and Warner, 2007), has six steps:

Define the event and the event date. This is the single most error-prone step. The date must be the first public release of the information; mis-dating (for example, using the wrong filing or news date) blurs the reaction and destroys statistical power. Pre-announcement leakage and information that diffuses over more than one day argue for including the day after.
Set the windows. Use an estimation window (commonly about 120 to 250 trading days ending before the event) and a short event window. Leave a gap between the estimation and event windows at least as large as the pre-event portion of the event window, so the event cannot contaminate the normal-return parameters; widen the gap if leakage is suspected. Keep the event window as short as the diffusion of the information allows: longer windows mechanically raise the odds of capturing confounding news and lower power.
Estimate a normal-return (benchmark) model. The market model (an OLS regression on a market index) is the workhorse and is hard to beat for short windows; mean-adjusted and market-adjusted models are cruder, and Fama-French or Carhart multifactor models add little for short windows but matter more at long horizons. See expected-return models.
Compute abnormal returns as actual minus predicted over the event window.
Aggregate abnormal returns to a CAR per firm, then average the CARs across firms to get the cumulative average abnormal return (CAAR).
Test significance with parametric and nonparametric statistics, reported together.
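Steps two to five of the procedure above can be sketched in plain Python. This is an illustration under stated assumptions, not the implementation behind our calculators: list positions stand for trading days, the data are synthetic and noise-free so the result is easy to check by eye, and the function names, window lengths, and injected 3 percent abnormal return are all hypothetical.

```python
import math


def fit_market_model(stock, market):
    """OLS of stock returns on market returns; returns (alpha, beta)."""
    n = len(stock)
    mean_s = sum(stock) / n
    mean_m = sum(market) / n
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(market, stock))
    var = sum((m - mean_m) ** 2 for m in market)
    beta = cov / var
    alpha = mean_s - beta * mean_m   # keep the intercept
    return alpha, beta


def event_study(stock, market, event_idx, est_len=120, gap=10, pre=1, post=1):
    """Return (alpha, beta, abnormal returns, CAR) for one firm-event."""
    est_end = event_idx - pre - gap          # exclusive end of estimation window
    est_start = est_end - est_len
    if est_start < 0 or event_idx + post >= len(stock):
        raise ValueError("not enough data around the event")
    alpha, beta = fit_market_model(stock[est_start:est_end],
                                   market[est_start:est_end])
    days = range(event_idx - pre, event_idx + post + 1)
    ars = [stock[t] - (alpha + beta * market[t]) for t in days]
    return alpha, beta, ars, sum(ars)


# Synthetic, noise-free data (hypothetical values, for checkability only).
TRUE_ALPHA, TRUE_BETA, INJECTED_AR = 0.0002, 1.2, 0.03
market = [0.01 * math.sin(0.7 * t) for t in range(200)]


def make_stock(event_idx):
    """Stock that follows the market model exactly, plus a jump on the event day."""
    stock = [TRUE_ALPHA + TRUE_BETA * m for m in market]
    stock[event_idx] += INJECTED_AR
    return stock


event_days = [150, 160, 170]
results = [event_study(make_stock(e), market, e) for e in event_days]
cars = [r[3] for r in results]
caar = sum(cars) / len(cars)     # average across firms, do not sum

print(f"estimated beta (first firm): {results[0][1]:.3f}")
print(f"CAAR over (-1,+1): {caar:.2%}")
```

Because the estimation window ends before the gap, the injected event-day jump never enters the regression, so the sketch recovers the true beta and attributes the whole jump to the abnormal return. With real data the same layout applies, but the abnormal returns carry noise and step six (the significance test) becomes essential.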

A step-by-step recipe, from data ingestion to publication-ready output, is set out in the application blueprint.

A methodology decision table

The choices a referee will probe, the typical academic answer, and the app that handles each. Named window conventions let you point to precedent rather than defend a number from scratch.

Decision	Typical academic choice	Why	App
Event date	First public disclosure	Mis-dating blurs the reaction and kills power	EDI
Estimation window	~120-250 trading days (e.g. Johnston 250 + 45-day gap; Sorescu ~100; Wiles 90 ending t-6)	Enough data for stable parameters without staleness	ARC
Estimation/event gap	≥ pre-event half-width of the event window	Keeps the event out of the benchmark	ARC
Event window	Single day to $(-1,+1)$ / $(0,+1)$; longer only if slow-diffusing	Short windows maximize power, minimize confounds	ARC
Normal-return model	Market model (short window); FF/Carhart (long)	Multifactor adds little at short horizons	Models / ARC
Test statistic	Patell/BMP baseline; Corrado/GRANK for non-normality; Kolari-Pynnonen ADJ for clustering	Daily returns are fat-tailed; variance shifts at the event	ARC
Confounder screen	Drop overlapping news; report screened and full samples	Confounds bias the AR (decisive for longer windows)	CATA / EDI
Thin trading	Scholes-Williams / Dimson / Fowler-Rorke beta	OLS beta is biased under non-synchronous trading	ARC

Choosing the right test statistic

Daily stock returns are fat-tailed and non-normal, and abnormal-return variance typically rises around the event date, so the plain cross-sectional t-test can be misspecified. Brown and Warner (1985) showed by simulation that simple market-model procedures are nevertheless well specified and powerful at the portfolio level, but the literature has since converged on a menu of robust statistics matched to the event type:

Use the Patell standardized-residual test as a parametric baseline: it standardizes each firm's event-window abnormal return by its estimation-period standard deviation before aggregating (Patell, 1976).
Use the standardized cross-sectional (BMP) test when the event plausibly changes return volatility, because it is robust to event-induced variance (Boehmer, Musumeci and Poulsen, 1991). The reason it is the right default whenever an event could move volatility: an event-induced variance increase is statistically indistinguishable from genuine cross-sectional heterogeneity in the true abnormal return, and both make the plain cross-sectional $t$-test reject too often, so the standardized statistic is the conservative choice.
Use the Corrado rank test or a generalized sign test for robustness to non-normality and fat tails (Corrado, 1989).
Use the Kolari-Pynnonen adjusted test when events cluster in calendar time: even small cross-sectional correlation of abnormal returns severely over-rejects the null, and the cross-correlation-robust adjustment restores correct size (Kolari and Pynnonen, 2010).
Use the generalized rank (GRANK) test as a strong single default: it is robust to event-induced volatility, clustering-induced cross-sectional correlation, and non-normality simultaneously, is valid for multi-day CARs (where earlier rank tests failed), and empirically out-powers both Patell and BMP (Kolari and Pynnonen, 2011). Our ARC computes both the Generalized Rank T and Z variants.

For the full catalog and the formulae behind each, see significance tests. Corrado (2011) reviews these short-horizon refinements and recommends nonparametric and standardized-cross-sectional statistics over the plain t-test.
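To make the difference between the plain cross-sectional test and the standardized statistics concrete, here is a simplified single-day sketch. It is deliberately stripped down and rests on assumptions: every firm has the same estimation-window length, the forecast-error correction to each firm's standard deviation is omitted, and the abnormal returns and estimation-period standard deviations are hypothetical. It is not the full formula set documented on the significance tests page.

```python
import math
from statistics import mean, stdev


def cross_sectional_t(ars):
    """Plain cross-sectional t: mean AR over its cross-sectional standard error."""
    return mean(ars) / (stdev(ars) / math.sqrt(len(ars)))


def standardize(ars, est_sds):
    """Scale each firm's AR by its estimation-period residual standard deviation."""
    return [ar / sd for ar, sd in zip(ars, est_sds)]


def patell_z(ars, est_sds, est_len):
    """Simplified Patell Z: sum of standardized ARs over its theoretical std dev.

    Each standardized AR is treated as t-distributed with est_len - 2 degrees
    of freedom, whose variance is (est_len - 2) / (est_len - 4).
    """
    sars = standardize(ars, est_sds)
    var_sum = len(sars) * (est_len - 2) / (est_len - 4)
    return sum(sars) / math.sqrt(var_sum)


def bmp_t(ars, est_sds):
    """Simplified BMP: cross-sectional t-test applied to the standardized ARs."""
    sars = standardize(ars, est_sds)
    return mean(sars) / (stdev(sars) / math.sqrt(len(sars)))


# Hypothetical event-day ARs and estimation-period standard deviations.
ars = [0.021, 0.015, 0.030, -0.004, 0.012]
est_sds = [0.015, 0.010, 0.030, 0.020, 0.012]
EST_LEN = 120   # assumed estimation-window length in trading days

t_plain = cross_sectional_t(ars)
z_patell = patell_z(ars, est_sds, EST_LEN)
t_bmp = bmp_t(ars, est_sds)

print(f"cross-sectional t: {t_plain:.2f}")
print(f"Patell Z:          {z_patell:.2f}")
print(f"BMP t:             {t_bmp:.2f}")
```

The design point is visible in the code: Patell relies on the estimation-period variance staying valid at the event, while BMP re-estimates dispersion from the event-day cross-section, which is why it tolerates event-induced variance.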

Statistical power and sample size

Power is the question students get wrong most often. In Brown and Warner's (1985) simulations (250 portfolios of 50 securities each), the market-model test detects a 1 percent day-0 abnormal return about 80 percent of the time, versus about 76 percent for the cruder mean-adjusted return, while keeping rejection under the null near the nominal 5 percent. Power rises with sample size and falls with abnormal-return variance. The practical implication: large effects (M&A targets at $+15\%$ to $+30\%$) are detectable with a handful of events, while small effects (sub-1-percent earnings or marketing reactions) need dozens to hundreds. Because the confounder screen trades bias for power, always report the effective sample size after screening. This is exactly where a free tool helps: ARC handles large samples at no cost and with no CRSP or WRDS license.

Design thresholds, with numbers. Kothari and Warner (2007) compute the analytic power functions so you can size a sample before you run it. For low-variance (decile-1) firms, 21 stocks give 90 percent power to detect a one-day 1 percent abnormal return; for high-variance (decile-10) firms, even a 5 percent abnormal return needs about 60 stocks for the same power. Horizon is the real power killer: a 10 percent abnormal return concentrated on one known day is detected essentially 100 percent of the time with just 6 stocks, but the same 10 percent spread over six months needs 200 firms to be detected even 65 percent of the time. MacKinlay (1997) gives the complementary warning: with a true abnormal return of 0.5 percent and only 20 firms, a 5 percent test has power of just 0.20, so four times in five it will miss a real effect.
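A back-of-envelope power calculation can be sketched with the standard library alone. The assumptions here are ours, not a reproduction of any published table: abnormal returns are independent and normally distributed across firms with a common, known daily standard deviation, and the test is two-sided. With an assumed standard deviation of 2 percent, the function gives a power of about 0.20 for a 0.5 percent abnormal return and 20 firms, in line with the MacKinlay figure quoted above. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_power(true_ar, sigma, n_firms, size=0.05):
    """Power of a two-sided z-test on the average abnormal return.

    Assumes i.i.d. normal abnormal returns with known daily std dev `sigma`.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - size / 2)
    shift = true_ar * math.sqrt(n_firms) / sigma   # mean of the test statistic
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def firms_needed(true_ar, sigma, target_power=0.90, size=0.05, max_firms=100_000):
    """Smallest sample size that reaches the target power."""
    for n in range(1, max_firms + 1):
        if two_sided_power(true_ar, sigma, n, size) >= target_power:
            return n
    raise ValueError("target power not reached within max_firms")


power_20 = two_sided_power(true_ar=0.005, sigma=0.02, n_firms=20)
n_for_90 = firms_needed(true_ar=0.005, sigma=0.02, target_power=0.90)

print(f"power with 20 firms: {power_20:.2f}")
print(f"firms needed for 90% power: {n_for_90}")
```

Under a zero true effect the function returns the test size, which is a quick sanity check on any power routine.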

Recent Monte Carlo work backs the robust-test defaults this page recommends: under event-date clustering, the Patell and BMP statistics over-reject more as the sample grows (from, say, 10 to 50 securities), while the Kolari-Pynnonen ADJ-BMP and rank or GRANK tests hold their size near the nominal level. When in doubt under clustering, default to GRANK or the adjusted test rather than letting a larger sample worsen the problem.

Sample construction and common pitfalls

Three threats recur across disciplines. First, confounding events: other material news in the event window contaminates the abnormal return. The discipline reviews treat this as a leading validity threat and prescribe short windows plus an explicit confounding-event screen that drops firm-events overlapping with simultaneous earnings, M&A, dividend, or other announcements (McWilliams and Siegel, 1997). It is also the step most often skipped in practice: a census of 29 short-term event studies in operations and supply-chain management found the market model dominant and event windows of three days or fewer standard, but the confounding screen under-applied and longer windows usually adopted without theoretical justification (Wu, Gaur, Modi et al., 2018). The nuance the literature actually settled on, however, is important: Sorescu, Warren and Ertekin (2017) report that, on a large 3-day-window sample (the published figure is 3,982 firm-events), the difference between the full sample and the confounder-excluded sample was statistically insignificant, so for very short windows the screen is best treated as a robustness check rather than a conclusion-changer, while for longer windows it can be decisive. Screen, and report both.

Second, event clustering and cross-sectional correlation: when many sample firms share the same calendar date, as in a regulatory shock, abnormal returns are correlated and naive tests over-reject by a quantifiable amount. The inflation of the test statistic equals $\sqrt{1+(N-1)\rho}$ (Kothari and Warner, 2007, eq. 10): with an average pairwise correlation of only $\rho = 0.02$ and $N = 100$ firms, the $t$-statistic is overstated by a factor of about $1.73$, and by a factor of two or more for the typical multi-year long-horizon sample (empirical pairwise BHAR correlations run about 0.02 to 0.03, Mitchell and Stafford, 2000). The scope condition is sharp and worth memorizing: cross-correlation is largely irrelevant for short windows unless events share a calendar date, but unavoidable at long horizons even without calendar clustering. Remedies are calendar-time portfolios or the Kolari-Pynnonen adjustment.

Third, thin trading: non-synchronous trading biases the estimated beta downward, so require a minimum number of non-missing returns and, for small, illiquid, or non-US samples, use a non-synchronous-trading beta correction (Scholes and Williams, 1977; Dimson, 1979; Fowler and Rorke, 1983).
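The clustering inflation factor $\sqrt{1+(N-1)\rho}$ quoted in this section is easy to check numerically. The sketch below computes it and applies it as a rough deflator to a naive $t$-statistic. The deflation step is only a back-of-envelope illustration under the assumption of a common pairwise correlation, not the Kolari-Pynnonen adjusted statistic, and the naive $t$ of 2.5 is hypothetical.

```python
import math


def t_stat_inflation(n_firms, avg_corr):
    """Factor by which a naive t-statistic is overstated under clustering."""
    return math.sqrt(1 + (n_firms - 1) * avg_corr)


def deflated_t(naive_t, n_firms, avg_corr):
    """Rough correction: divide the naive t by the inflation factor."""
    return naive_t / t_stat_inflation(n_firms, avg_corr)


factor = t_stat_inflation(n_firms=100, avg_corr=0.02)
adjusted = deflated_t(naive_t=2.5, n_firms=100, avg_corr=0.02)

print(f"inflation factor: {factor:.2f}")
print(f"naive t = 2.50 becomes roughly {adjusted:.2f}")

# The factor grows with the sample: more clustered firms make it worse.
for n in (10, 50, 100, 500):
    print(n, round(t_stat_inflation(n, 0.02), 2))
```

In this illustration a naive $t$ of 2.5, comfortably significant on its face, drops below the conventional 1.96 threshold once the correlation is accounted for.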

Data quality off CRSP. Our calculators work on any market, which is a genuine strength, but it puts the onus on your data. For non-US or non-CRSP samples, control survivorship bias by including delisted and "dead" firms, beware padded or stale returns on non-trading days, and choose a defensible local market index. The free reach of the method is real only if the inputs are clean.

The cautionary literature is worth taking seriously. McWilliams and Siegel (1997) replicated three published management studies and showed that conclusions flip with small design changes, and McWilliams, Siegel and Teoh (1999) demonstrated that five corporate-social-responsibility studies with conflicting results differed purely in research design. The magnitude of an abnormal return is also only interpretable against the size of the surprise: the standard second stage is a cross-sectional regression of CAR on surprise measures and firm characteristics, and text-based scoring of the announcement can quantify the tone and surprise that drive the reaction.
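The second-stage regression mentioned above can be sketched in its simplest form: one regressor, the earnings or announcement surprise. This is an illustration only. The data are simulated with a seeded generator, the true coefficients (intercept 0.002, slope 0.8) are hypothetical, and a real second stage would add firm characteristics such as size, leverage, and analyst coverage as further regressors.

```python
import random


def simple_ols(x, y):
    """Univariate OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated cross-section of 500 firm-events (all values hypothetical).
rng = random.Random(0)
surprise = [rng.uniform(-0.05, 0.05) for _ in range(500)]
cars = [0.002 + 0.8 * s + rng.gauss(0, 0.005) for s in surprise]

intercept, slope = simple_ols(surprise, cars)

print(f"intercept: {intercept:.4f}")
print(f"CAR response per unit of surprise: {slope:.3f}")
```

The slope is the quantity of interest: it converts a raw CAR into a reaction per unit of surprise, which is what makes magnitudes comparable across events.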

The long-horizon recipe

For horizons of months to years, do not extend the short-window machinery; switch methods, because this is where the bad-model problem bites hardest. Two estimators dominate. Buy-and-hold abnormal returns (BHAR) are computed geometrically, $\text{BHAR} = \prod_t (1+R_{it}) - \prod_t (1+E[R_{it}])$, with each sample firm matched to a control firm on size deciles and then book-to-market (Barber and Lyon, 1997). Because BHAR carries skewness, new-listing, and rebalancing biases, the bias-robust alternative is the calendar-time portfolio (CTAR) approach (Jaffe, 1974; Mandelker, 1974), which forms one portfolio observation per calendar month and regresses excess portfolio returns on Fama-French or Carhart factors. Report both, and treat the point estimates with appropriate caution: at long horizons the misspecification is not cosmetic. Long-horizon parametric tests can reject a true null more than 30 percent of the time at a nominal 5 percent level (Kothari and Warner, 2007), a roughly six-fold size distortion that makes spurious "anomalies" easy to manufacture. The clean one-line statement of the divide: short-horizon test specification is not highly sensitive to the benchmark model or to dependence assumptions, whereas long-horizon specification is quite sensitive to both.
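The geometric BHAR definition above differs from simply summing abnormal returns, and the gap is easy to see with a small example. The sketch below uses three hypothetical monthly returns for a sample firm and its matched control firm; the names and numbers are illustrative, and the matching itself (size decile, then book-to-market) is assumed to have been done already.

```python
import math


def buy_and_hold_return(returns):
    """Compound a sequence of simple returns into a holding-period return."""
    return math.prod(1 + r for r in returns) - 1


def bhar(firm_returns, benchmark_returns):
    """Buy-and-hold abnormal return: compounded firm minus compounded benchmark."""
    return buy_and_hold_return(firm_returns) - buy_and_hold_return(benchmark_returns)


def car(firm_returns, benchmark_returns):
    """Cumulative abnormal return: arithmetic sum of period-by-period gaps."""
    return sum(f - b for f, b in zip(firm_returns, benchmark_returns))


# Hypothetical monthly returns for a sample firm and its matched control firm.
firm = [0.10, -0.05, 0.04]
control = [0.02, 0.01, 0.015]

firm_bhar = bhar(firm, control)
firm_car = car(firm, control)

print(f"BHAR: {firm_bhar:.4f}")
print(f"CAR:  {firm_car:.4f}")
```

Even over three periods the two measures diverge, and compounding widens the gap as the horizon lengthens, which is one reason long-horizon results are so sensitive to the estimator chosen.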

What has held up since 2010

A definitive resource should state the current state of the evidence, not just the canon. Three threads matter.

Merger returns by decade. The target-wins pattern has been durable. Renneboog and Vansteenkiste (2019) document target CARs of roughly 24 percent in the 2000s rising toward roughly 29 percent in the 2010s, confirming the Andrade-Mitchell-Stafford target stylized fact across additional decades. The acquirer side, however, has genuinely updated: Alexandridis, Antypas and Travlos (2017) show acquirer announcement returns reversed sign post-2009, averaging about $+1.05\%$ versus about $-1.08\%$ over 1990 to 2009, with 54 percent of public deals showing positive acquirer CARs post-2009 (versus 39 percent in the 2000s), driven mainly by mega-deals and post-crisis improvements in acquirer governance. So "acquirers break even" is a full-sample average that masks a positive post-crisis sub-result, a clean reminder that what an event study finds can depend on the era it samples.

Anomalies have decayed. PEAD has attenuated with a documented mechanism, not just with time. Quarterly hedge returns fell from roughly 5 to 6 percent in the 1970s and 1980s, to about 4 percent in the 2000s, to roughly 2 to 3 percent or lower in the 2010s. Chordia, Subrahmanyam and Tong (2014) attribute the decline to cheaper arbitrage after 2001 decimalization (higher hedge-fund assets, more short interest, more turnover): the signal was traded away. The decay is uneven, and that is the tell that it is arbitrage rather than mismeasurement: a value-weighted earnings-surprise strategy now earns about 0.04 percent per month in the most liquid stocks but about 2.43 percent per month in the most illiquid, so the drift survives mainly where trading costs make it expensive to exploit. The Bernard-Thomas magnitude is best read as a 1980s artifact. More broadly, McLean and Pontiff (2016) find published anomaly returns about 26 percent lower out of sample and about 58 percent lower after publication, and Hou, Xue and Zhang (2020) report that a majority of published anomalies fail to replicate at conventional thresholds (with much of the failure concentrated in microcaps). As a counterweight, Jensen, Kelly and Pedersen (2023), applying a hierarchical Bayesian framework across many factors and markets, argue that roughly 82 to 85 percent of factors do replicate. The honest reading: short-window event-study results remain robust, while long-horizon and cross-sectional anomaly claims demand out-of-sample and post-publication checks.

Information is impounded even faster now. In modern markets a large share of the price adjustment to scheduled news (on the order of 70 percent in the first ten seconds) occurs almost instantly via high-frequency trading, reinforcing the case for short, precisely dated windows. The practical upshot for current practice: favor value-weighting and NYSE breakpoints, apply multiple-testing-aware thresholds, and validate with out-of-sample and post-publication checks. The secular pattern, in one line: short-window event-study measurements (M&A target reactions, FOMC asset-price moves, announcement-day CARs) have stayed stable across decades, while the exploitable drift and anomaly magnitudes have decayed as arbitrage cheapened. What held up is the measurement; what decayed is the tradable mispricing.

Common misconceptions

Myth	Fact
A bigger event window captures more of the effect.	Wider windows mechanically lower power and admit confounding news; the bulk of a clean reaction is on day 0 and day +1 (Brown & Warner, 1985).
A significant CAR proves the event caused the move.	Only if confounds are screened and the date is the first public disclosure; otherwise the CAR reflects whatever else happened that day (McWilliams & Siegel, 1997).
Use Fama-French / Carhart for everything.	Multifactor models add little over the market model for short windows; they matter mainly at long horizons (Kothari & Warner, 2007).
Long-horizon abnormal returns are as reliable as short ones.	They are fragile due to the bad-model problem and often dissolve under reasonable methodology changes (Fama, 1998).
An event study needs many firms to be valid.	Single-firm studies are valid using the time-series Patell test and are standard in litigation, though weaker in volatile markets (Patell, 1976; Baker, 2016; Brav & Heaton, 2015).

Test yourself. Three quick interpretation drills, each followed by its answer.

You study acquirers and find an average CAR of $-0.7\%$ with $t = -0.4$. Failed study? No. A near-zero acquirer CAR is the expected result, and $t = -0.4$ says it is statistically indistinguishable from zero. This is a confirmation, not a null finding to bury.
You report a $+12\%$ CAR over a $(-30,+5)$ window, but an earnings release fell inside that window and was not screened out. Trustworthy? No. The window is wide enough to swallow a confounder; re-run on $(-1,+1)$ and apply a confounding-event screen before believing the magnitude.
You see a significant abnormal return on day $-3$ when the announcement is dated day $0$. What does it mean? Suspect leakage or a mis-dated event. A clean reaction concentrates on day 0 and day +1; a pre-event spike usually means the news leaked, or your event date is wrong.

Discipline-specific norms

The mechanics are shared, but the norms a referee enforces differ by field. Match your design to the discipline you are publishing in. The newest growth area is sustainability: event studies are now the workhorse for measuring market reactions to ESG and carbon-disclosure news and to climate-policy shocks, where high-carbon sectors typically show the largest responses, extending the cross-discipline reach into a literature that barely existed a decade ago.

Discipline	Typical window	Typical sample	Dominant threat	Recommended test
Finance / accounting	$(-1,+1)$, $(0,+1)$	Large CRSP-style panels	Clustering, long-horizon bad model	Patell / BMP, GRANK
Management / strategy	Short, hand-dated	Small, hand-collected	Confounding events, small N	Nonparametric + confounder screen (McWilliams & Siegel)
Marketing	Single day to a few days	~100-day estimation	Confounds in voluntary announcements	BMP / Corrado (Sorescu et al.)
Law / economics	Single-firm or small cluster	n = 1 to a few	Power, Daubert admissibility	Patell time-series, robust stats
Operations / IS	Short	Industry or technology panels	Confounds, thin trading	BMP / Corrado

Effect-size cheat sheet

Sanity-check your magnitudes against these typical short-window benchmarks before you trust a surprising result.

Event type	Typical short-window abnormal return
M&A target	~+15% to +30%
M&A acquirer	~0% to slightly negative on average (but ~+1% post-2009, mega-deal driven; Alexandridis et al. 2017)
Top-decile positive earnings surprise	~+3% to +5%
PEAD (long/short, post-formation)	~+18% over the following year (1980s magnitude, since shrunk)
Dividend initiation / omission	Small positive / larger negative
Marketing events	Economically meaningful but smaller than M&A

Run it with our tools

Our calculators implement the estimators this literature recommends, the same ones that courts accept under Daubert, that appear in Federal Reserve publications, and that the FTC uses. They run in the browser, for free, with no SAS, CRSP, or WRDS license required. The method needs only security returns plus a market index, so it works on any market and period. Results are emailed to you.

Abnormal Return Calculator (ARC) computes abnormal returns and CARs under the full set of normal-return models (market model, market-adjusted, mean-adjusted, Fama-French and Carhart factors) and reports the Patell, standardized cross-sectional BMP, Corrado rank, generalized sign, Kolari-Pynnonen adjusted, and generalized rank (GRANK T and Z) statistics: steps 3 through 6 of the procedure above.
Abnormal Volume Calculator (AVC) runs the same logic on trading volume, detecting information arrival even when the direction of the price reaction is ambiguous.
Volatility (AVyC) models event-induced variance directly with GARCH, treating the volatility change as the object of study rather than a nuisance.
Event Date Identifier (EDI) helps you pin the precise first-disclosure date, the step most likely to sink a study if done wrong.
News Analytics (CATA) scores the tone and information content of the announcement text, giving you the surprise measure for the cross-sectional second stage and the confounder screen.

Research FAQ

Method questions distinct from the tool-usage FAQ.

Why not just compare the stock price before and after the event?

Because a raw before-and-after change mixes three things together: the event, whatever the market did that day, and the firm's own normal drift. If the stock rose 2 percent but the market rose 1.5 percent and the stock normally tracks the market, almost all of that move was not the event. The benchmark model predicts the counterfactual ("but-for") return the stock would have earned absent the event, and the abnormal return is the part left over once the market move is stripped out. That subtraction is the entire point of the method. See the intuition.
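The subtraction can be sketched in a few lines. The example below fits a market model by ordinary least squares on an estimation window and then strips the predicted return out of the event-day return, using the 2 percent and 1.5 percent figures from the paragraph above. The estimation-window data and function names are hypothetical, chosen so that the stock tracks the market one-for-one.

```python
def fit_market_model(stock_returns, market_returns):
    """Estimate alpha and beta by OLS of stock returns on market returns."""
    n = len(stock_returns)
    mean_stock = sum(stock_returns) / n
    mean_market = sum(market_returns) / n
    covariance = sum(
        (m - mean_market) * (s - mean_stock)
        for s, m in zip(stock_returns, market_returns)
    )
    market_variance = sum((m - mean_market) ** 2 for m in market_returns)
    beta = covariance / market_variance
    alpha = mean_stock - beta * mean_market
    return alpha, beta

def abnormal_return(stock_return, market_return, alpha, beta):
    """Actual return minus the counterfactual ("but-for") return."""
    return stock_return - (alpha + beta * market_return)

# Hypothetical estimation window in which the stock tracks the market exactly.
estimation_market = [0.010, -0.005, 0.002, 0.007, -0.012]
estimation_stock = list(estimation_market)

alpha, beta = fit_market_model(estimation_stock, estimation_market)

# Event day: the stock rose 2 percent while the market rose 1.5 percent.
ar = abnormal_return(0.02, 0.015, alpha, beta)
print(f"alpha={alpha:.4f}, beta={beta:.2f}, abnormal return={ar:.4f}")
```

Under these assumptions only about half a percentage point of the 2 percent move is left for the event to explain.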

What event window should I use?

For clean, precisely dated news, use the shortest window the information diffusion allows, typically $[0,+1]$ or $[-1,+1]$; the bulk of a clean reaction lands on the announcement day and the next. Extend to a few days only for slow-diffusing or hard-to-date events, accepting a known cost in power and confounding exposure (Brown & Warner, 1985). Set the window in ARC.

How many events do I need for statistical power?

It depends on the effect size. Large effects such as M&A target reactions are detectable with a handful of events, while sub-1-percent effects need dozens to hundreds. As concrete anchors: about 21 stocks give 90 percent power to detect a one-day 1 percent abnormal return for low-variance firms, but a 0.5 percent effect with only 20 firms has power of just 0.20, and a 10 percent effect spread over six months needs about 200 firms to be caught even 65 percent of the time (Brown & Warner, 1985; Kothari & Warner, 2007; MacKinlay, 1997). Report your effective sample after the confounder screen. See statistical power.
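For a rough feel of how power scales with sample size, the sketch below uses a normal approximation for a two-sided test on the mean abnormal return. It assumes independent abnormal returns with a common standard deviation, and the inputs (a 1 percent effect, 2 percent daily volatility) are hypothetical; it is not meant to reproduce the simulation-based figures in the cited papers, which rest on their own variance assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect, sigma, n_firms, alpha=0.05):
    """Two-sided power of a z-test on the mean abnormal return.

    Assumes n_firms independent abnormal returns, each with standard
    deviation sigma, and a normally distributed test statistic.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * sqrt(n_firms) / sigma  # mean of the statistic under H1
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# Hypothetical inputs: a 1 percent one-day effect, 2 percent daily volatility.
for n in (10, 20, 50, 100):
    print(n, round(approx_power(0.01, 0.02, n), 3))
```

The useful pattern is that power depends on the ratio of effect size to volatility times the square root of the number of firms, so halving the effect requires roughly four times the sample.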

What is the difference between CAR, CAAR, and BHAR?

CAR sums one firm's daily abnormal returns arithmetically over the event window; CAAR averages those CARs across firms; BHAR compounds returns geometrically over long horizons against a matched control firm. Use CAR/CAAR for short windows and BHAR (or calendar-time portfolios) for long horizons (Barber & Lyon, 1997). See the long-horizon recipe.
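The two short-window aggregations can be sketched as follows. The abnormal returns are hypothetical values over a $(-1,+1)$ window for three firms, and the function names are illustrative only.

```python
def cumulative_abnormal_return(abnormal_returns):
    """CAR: arithmetic sum of one firm's abnormal returns over the window."""
    return sum(abnormal_returns)

def cumulative_average_abnormal_return(abnormal_returns_by_firm):
    """CAAR: cross-sectional average of the firm-level CARs."""
    cars = [cumulative_abnormal_return(ars) for ars in abnormal_returns_by_firm]
    return sum(cars) / len(cars)

# Hypothetical abnormal returns on days -1, 0, +1 for three firms.
abnormal_returns_by_firm = [
    [0.001, 0.020, 0.004],
    [-0.002, 0.015, 0.001],
    [0.000, 0.010, -0.003],
]

cars = [cumulative_abnormal_return(ars) for ars in abnormal_returns_by_firm]
caar = cumulative_average_abnormal_return(abnormal_returns_by_firm)
print("CARs:", [round(c, 4) for c in cars])
print("CAAR:", round(caar, 4))
```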

Why are acquirer abnormal returns near zero?

In competitive markets the gains from a deal are largely competed away to the target's shareholders, so acquirers earn close to zero on average; Andrade, Mitchell and Stafford (2001) estimate about $-0.7\%$. A near-zero acquirer CAR is the expected result, not a failed study. Two modern caveats keep this honest. First, the average is sample-period dependent: Alexandridis, Antypas and Travlos (2017) find acquirer announcement returns turned positive post-2009 (about $+1.05\%$ versus about $-1.08\%$ over 1990 to 2009), driven by mega-deals and improved acquirer governance. Second, even where it is near zero, the acquirer CAR does not correlate with ex-post value creation because news about the standalone acquirer dominates it (Ben-David, Bhattacharya, Huang and Jacobsen, 2024). The teaching point: a CAR measures the market's instantaneous revision of expectations, not realized deal value. See mergers and acquisitions.

Can I run an event study on a single firm?

Yes. The time-series Patell test evaluates one firm's event-window abnormal return against its own estimation-period volatility, and single-firm studies are standard in securities litigation. They are statistically weaker than multi-firm studies and over-reject in volatile periods, so robust statistics and a clean estimation window matter (Patell, 1976; Baker, 2016; Brav and Heaton, 2015). Run it in ARC.
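The core idea, scaling the event-day abnormal return by the firm's own estimation-period volatility, can be sketched as below. This is a deliberately simplified standardized abnormal return, not the full Patell statistic: it omits the forecast-error correction and uses the plain sample standard deviation of the residuals. The residuals and the event-day abnormal return are hypothetical.

```python
from statistics import NormalDist, stdev

def standardized_abnormal_return(event_ar, estimation_residuals):
    """Event-day abnormal return divided by estimation-period residual volatility.

    Simplification: uses the sample standard deviation of the residuals and
    ignores the forecast-error adjustment of the full Patell statistic.
    """
    return event_ar / stdev(estimation_residuals)

def two_sided_p_value(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Hypothetical estimation-window residuals and event-day abnormal return.
residuals = [0.01, -0.01, 0.01, -0.01]
z = standardized_abnormal_return(0.05, residuals)
p = two_sided_p_value(z)
print(f"z = {z:.3f}, two-sided p = {p:.5f}")
```

In a volatile estimation window the denominator grows and the same abnormal return looks less significant, which is why the choice of a clean estimation window matters so much for single-firm work.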

How do I handle events that cluster on the same date?

Calendar-clustered events have cross-sectionally correlated abnormal returns, and even small correlation makes naive tests over-reject by a measurable amount: with an average pairwise correlation of just 0.02 and 100 firms, the $t$-statistic is inflated by about 1.73x, and by a factor of two or more for typical multi-year samples (Kothari & Warner, 2007). Use the Kolari-Pynnonen adjusted test or the GRANK test, or form calendar-time portfolios (Kolari & Pynnonen, 2010, 2011). For short, non-clustered samples the adjustment is unnecessary. See choosing a test.
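The inflation figure follows from a standard back-of-the-envelope formula: with $N$ firms, equal variances, and a common average pairwise correlation $\rho$, the variance of the mean abnormal return is understated by a factor of $1 + (N-1)\rho$, so the naive $t$-statistic is overstated by its square root. The sketch below applies that formula to the numbers in the paragraph above; the naive $t$ value is hypothetical.

```python
from math import sqrt

def t_stat_inflation(n_firms, avg_pairwise_corr):
    """Factor by which a naive t-statistic is overstated under clustering.

    Assumes equal abnormal-return variances across firms and a common
    average pairwise correlation.
    """
    return sqrt(1.0 + (n_firms - 1) * avg_pairwise_corr)

inflation = t_stat_inflation(100, 0.02)  # sqrt(2.98), about 1.73

# Hypothetical naive t-statistic, deflated by the clustering factor.
naive_t = 2.5
adjusted_t = naive_t / inflation
print(f"inflation = {inflation:.3f}, adjusted t = {adjusted_t:.3f}")
```

With zero correlation the factor is exactly one, which is why the adjustment is unnecessary for non-clustered samples.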

Which expected-return model should I pick?

For short windows the market model is hard to beat; mean-adjusted and market-adjusted models are cruder fallbacks, and Fama-French or Carhart add little at short horizons but matter at long ones. Match the model to the horizon (Kothari & Warner, 2007). See expected-return models.

Short-horizon versus long-horizon: which is reliable?

Short-horizon inference is well specified and powerful; long-horizon abnormal-return inference is fragile because it is highly sensitive to the benchmark model (the bad-model problem) and many long-run anomalies dissolve under reasonable methodology changes (Fama, 1998; Kothari & Warner, 2007). See what has held up since 2010.

This page sits in the Academic and teaching group and takes the "why the method and how to use our apps" angle; for the event-type catalog and discipline-specific deep dives, see the sibling pages rather than duplicating them here.

References

See the full bibliography for all sources cited across the site.

Beyond finance: how far the method has traveled

Although the event study grew up in finance and accounting, the same logic now travels across disciplines wherever a precise, unanticipated event meets a traded price. Marketing researchers study product launches and recalls. Operations and supply-chain scholars study disruptions, plant closings, and logistics shocks. Information-systems researchers study IT investment and security-breach announcements. Sustainable-finance work studies green-bond issuance and other ESG signals. Law and political-science researchers study court decisions and regulatory rulings, and recent frontiers extend to pandemic and climate-disaster responses.

Two lessons recur across these fields. First, non-financial information moves prices whenever the event is sharp and genuinely new to the market. Second, each discipline tends to contribute its own methodological refinements: careful event dating in management studies, contemporaneous-news screening in marketing, and explicit anticipation modeling where events leak ahead of formal disclosure.

Event-date identification is the linchpin of power

The single highest-leverage design choice is often the least discussed one: getting the event date right. When an event is mis-dated or diffuse, the abnormal return spreads across several days and the test silently loses power, even when the underlying effect is real.

It helps to separate scheduled events from negotiable or leaked ones. Scheduled events, such as earnings releases, index-reconstitution dates, and regulatory decision dates, carry a clean, knowable date. Negotiable or leaked events, such as mergers and alliances, often surface through rumor and partial disclosure before any formal announcement, so the relevant date is the first credible public disclosure, and the run-up window deserves inspection for leakage. The Boeing 737 MAX sequence is a useful illustration: the crashes, groundings, and regulatory actions unfolded as a chain of dates rather than a single clean event, and analysts must decide which date carries the new information. Our Event Date Identifier (EDI) is built to recover power in exactly these diffuse cases.

A CAR is an expectations number, not a value number

One caveat deserves its own heading because it is so easily forgotten in applied work. A cumulative abnormal return measures how the market revised its expectations of shareholder wealth around an event. It is not a measurement of realized cash flow, accounting value, or long-run welfare.

The 1982 Tylenol cyanide-tampering episode is the classic reminder. The immediate reaction reflected the market's revised expectation of damage, yet long-run value largely recovered as the company's response reset those expectations. The general point is that an announcement-window reaction reflects beliefs at a moment in time, and those beliefs can be revised again. Treating a CAR as proof of permanent value creation or destruction confuses the expectation with the outcome.

Survey of research themes

The literature can be read along two axes: how many event types a study examines, and whether the event originates with the firm itself or with a third party. The streams below organize the field and point to the event-type pages where each is documented in detail.

Multi-event-type studies

Some studies pool or compare several event categories at once. See stock-market responses to economy-wide events for market and regulatory shocks, comparative event-type analyses that place multiple event categories side by side, and competitive dynamics for how rivals react to a focal firm's moves.

Single-event-type studies

Other studies focus on one event type and refine its measurement. Long-running streams include earnings and dividend announcements, mergers and acquisitions, seasoned equity offerings, index reconstitutions, share repurchases, alliances and joint ventures, and divestitures. Across these streams the field has trended toward cross-event comparison and text-based event sourcing, for which large-scale contemporaneous-news harvesting with CATA is increasingly useful.

To see these themes applied to specific corporate events, visit mergers and acquisitions, earnings and dividend announcements, alliances and joint ventures, and divestitures. For multi-event perspectives, see economy-wide events, comparative analyses, and competitive dynamics. For the field-facing view of who uses these methods and why, see the practical-applications overview, and for mechanics start with the application blueprint.
