Significance Tests for Event Studies
Every number is guilty unless proven innocent.
(Rao, 1997: 152)
The abnormal and cumulative abnormal returns from event studies are typically used in two ways. Either they are deployed as dependent variables in subsequent regression analyses or they are interpreted as such. This latter, direct interpretation seeks to answer the question of whether the abnormal returns of individual events or samples of events are significantly different from zero and thus not the result of pure chance.
The answer about statistical significance is given by means of hypothesis testing, where the null hypothesis ($H_0$) claims that there are no abnormal returns within the event window and the alternative hypothesis ($H_1$) suggests the opposite. Formally, the testing framework reads as follows:
\begin{equation}H_0: μ = 0 \end{equation}
\begin{equation}H_1: μ \neq 0 \end{equation}
μ, however, may not only represent simple abnormal returns (ARs). Event studies are often multilevel calculations, where ARs are compounded to cumulative abnormal returns (CARs), and CARs are 'averaged' to cumulative average abnormal returns (CAARs) in cross-sectional studies (sometimes also called 'sample studies'). In long-run event studies, the buy-and-hold abnormal return (BHAR) is often used to replace CAR. BHAR can then again be 'averaged' to ABHAR for cross-sectional studies. Significance testing is needed for each of these values, meaning that μ in the above testing framework can represent ARs, CARs, BHARs, AARs, CAARs, and ABHARs. Let us briefly revisit these six different forms of abnormal return calculations, as presented in the introduction:
\begin{equation}AR_{i,t}=R_{i,t}-E[R_{i,t}|\Omega_{i,t}] \end{equation}
\begin{equation}AAR_{t}= \frac{1}{N} \sum\limits_{i=1}^{N}AR_{i,t} \end{equation}
\begin{equation}CAR_{i}=\sum\limits_{t=T_1 + 1}^{T_2} AR_{i,t} \end{equation}
\begin{equation}BHAR_{i}=\prod\limits_{t=T_1 + 1}^{T_2} (1 + R_{i,t}) - \prod\limits_{t=T_1 + 1}^{T_2} (1 + E[R_{i,t}|\Omega_{i,t}])\end{equation}
\begin{equation}CAAR=\frac{1}{N}\sum\limits_{i=1}^{N}CAR_i\end{equation}
\begin{equation}ABHAR=\frac{1}{N}\sum\limits_{i=1}^{N}BHAR_{i}\end{equation}
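The six quantities above can be traced on a toy data set. The following is a minimal sketch, assuming that expected returns have already been estimated by some benchmark model; the numbers and the variable names (`returns`, `expected`) are hypothetical and only serve to illustrate the definitions.

```python
from math import prod

def mean(values):
    return sum(values) / len(values)

def car(abnormal_returns):
    """CAR_i: sum of abnormal returns over the event window."""
    return sum(abnormal_returns)

def bhar(returns, expected_returns):
    """BHAR_i: compounded actual return minus compounded expected return."""
    return prod(1 + r for r in returns) - prod(1 + e for e in expected_returns)

# Hypothetical event-window data: N = 2 firms, L_2 = 3 days.
returns = [[0.02, -0.01, 0.03], [0.00, 0.02, -0.02]]
expected = [[0.01, 0.00, 0.01], [0.01, 0.00, 0.00]]

# AR_{i,t} = R_{i,t} - E[R_{i,t} | Omega_{i,t}]
ar = [[r - e for r, e in zip(rs, es)] for rs, es in zip(returns, expected)]

aar = [mean(day) for day in zip(*ar)]               # AAR_t, one value per day
cars = [car(a) for a in ar]                         # CAR_i, one value per firm
bhars = [bhar(r, e) for r, e in zip(returns, expected)]
caar = mean(cars)                                   # CAAR
abhar = mean(bhars)                                 # ABHAR
```

Note that CAR sums abnormal returns while BHAR compounds raw and expected returns separately, so the two generally differ even over short windows.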
For grouped observations, along either the firm or the event dimension, we provide a precision-weighted CAAR, which offers a standardization similar to that of the Patell test:
\begin{equation}PWCAAR=\sum\limits_{i=1}^{N}\sum\limits_{t=T_1 + 1}^{T_2}\omega_i AR_{i, t}\end{equation}
where $$\omega_i = \frac{\left(\sum\limits_{t=T_1 + 1}^{T_2} S^2_{AR_{i,t}}\right)^{-0.5}}{ \sum\limits_{i=1}^{N}\left(\sum\limits_{t=T_1 + 1}^{T_2}S^2_{AR_{i,t}}\right)^{-0.5}}$$
and $S^2_{AR_{i,t}}$ is the forecast-error-corrected variance, i.e. the square of the forecast-error-corrected standard deviation $S_{AR_{i,t}}$.
The literature on event study test statistics is extensive and covers a wide range of significance tests. Generally, significance tests can be grouped into parametric and nonparametric tests. Parametric tests assume that the individual firm's abnormal returns are normally distributed, whereas nonparametric tests do not rely on any such assumption. Scholars typically combine parametric tests with nonparametric tests to verify that the research findings are not driven by outliers (see Schipper and Smith (1983) for an example). Table 1 provides an overview of the different test statistics.
| Null hypothesis | Parametric tests | Nonparametric tests | Test level |
|---|---|---|---|
| $H_0: AR = 0$ | AR Test | | Individual Event |
| $H_0: AAR = 0$ | Cross-Sectional Test, Time-Series Standard Deviation Test, Patell Test, Adjusted Patell Test, Standardized Cross-Sectional Test, Adjusted Standardized Cross-Sectional Test, and Skewness Corrected Test | Generalized Sign Test, Generalized Rank T Test, and Generalized Rank Z Test | Sample of Events |
| $H_0: CAR = 0$ | CAR t-test | | Individual Event |
| $H_0: CAAR = 0$ | Cross-Sectional Test, Time-Series Standard Deviation Test, Patell Test, Adjusted Patell Test, Standardized Cross-Sectional Test, Adjusted Standardized Cross-Sectional Test, and Skewness Corrected Test | Generalized Sign Test, Generalized Rank T Test, and Generalized Rank Z Test | Sample of Events |
| $H_0: BHAR = 0$ | BHAR Test | | Individual Event |
| $H_0: ABHAR = 0$ | ABHAR Test and Skewness Corrected Test | | Sample of Events |
Most parametric tests are advanced forms of the standard t-test, which correct for the t-test's prediction error. The most widely used of these 'scaled' tests are those developed by Patell (1976) and Boehmer, Musumeci and Poulsen (1991). Among the nonparametric tests, the rank test of Corrado (1989) and the sign-based test of Cowan (1992) are very popular.
Why different test statistics are needed
An informed choice of test statistic should be based on the research setting and the statistical issues present in the analyzed data. Specifically, event-date clustering poses a problem leading to (1) cross-sectional correlation of abnormal returns, and (2) distortions from event-induced volatility changes. Cross-sectional correlation arises when sample studies focus on one or more events that happened to multiple firms on the same day(s). Event-induced volatility changes, in contrast, are a phenomenon common to many event types (e.g., M&A transactions) that becomes problematic when events are clustered. Both issues introduce a downward bias in the standard deviation and thus overstate the t-statistic, leading to an over-rejection of the null hypothesis.
Comparison of test statistics
There have been several attempts to address these statistical issues. Patell (1976, 1979), for example, tried to overcome the t-test's proneness to event-induced volatility by standardizing the event window's ARs. He used the dispersion of the estimation interval's ARs to limit the impact of stocks with high return standard deviations. Yet, the test too often rejects a true null hypothesis, particularly when samples are characterized by non-normal returns, low prices or little liquidity. Also, the test has been found to be still affected by event-induced volatility changes (Campbell and Wasley, 1993; Cowan and Sergeant, 1996; Maynes and Rumsey, 1993; Kolari and Pynnönen, 2010). Boehmer, Musumeci and Poulsen (1991) resolved this latter issue and developed a test statistic robust against volatility-changing events. Furthermore, the simulation study of Kolari and Pynnönen (2010) indicates an over-rejection of the null hypothesis for both the Patell and the BMP test if the cross-sectional correlation is ignored. Kolari and Pynnönen (2010) developed an adjusted version of both test statistics that accounts for cross-sectional correlation.
The nonparametric rank test of Corrado and Zivney (1992) (RANK) applies re-standardized event window returns and has proven robust against induced volatility and cross-correlation. Sign tests are another category of tests. One advantage the tests' authors stress over the common t-test is that they are apt to also identify small levels of abnormal returns. Moreover, scholars have recommended the use of nonparametric sign and rank tests for applications that require robustness against non-normally distributed data. Past research (e.g., Fama, 1976) has argued that daily return distributions are more fat-tailed (exhibit very large skewness or kurtosis) than normal distributions, which suggests the use of nonparametric tests.
Several authors have further advanced the sign and rank tests pioneered by Cowan (1992) and Corrado and Zivney (1992). Campbell and Wasley (1993), for example, improved the RANK test by introducing an incremental bias into the standard error for longer CARs, creating the Campbell-Wasley test statistic (CUMRANK). Another nonparametric test is the generalized rank test (GRANK), whose statistic follows a Student t-distribution with T - 2 degrees of freedom (T is the number of observations). GRANK appears to be one of the most powerful instruments for both shorter and longer CAR windows.
The Cowan (1992) sign test (SIGN) is also used for testing CARs by comparing the share of positive ARs close to an event to the proportion from a normal period. SIGN's null hypothesis includes the possibility of an asymmetric return distribution. Because this test considers only the sign of the difference between abnormal returns, the associated volatility does not influence its rejection rates in any way. Thus, in the presence of induced volatility, scholars recommend the use of BMP, GRANK, or SIGN.
Most studies have shown that if the focus is only on single-day ARs, the means of all tests stick close to zero. In the case of longer event windows, however, the mean values deviate from zero. Compared to their nonparametric counterparts, the Patell and the BMP tests produce means that deviate quite quickly from zero, whereas the standard deviations of all tests gravitate towards zero. For longer event windows, academics recommend nonparametric over parametric tests.
Therefore, for longer event windows, conclusions about the tests' power should be drawn very carefully because of the many over- or under-rejections of the null hypothesis. Overall, comparing the different test statistics yields the following insights (see Table 2 for further details):
- Parametric tests based on scaled abnormal returns perform better than those based on non-standardized returns
- Generally, nonparametric tests tend to be more powerful than parametric tests
- The generalized rank test (GRANK) is one of the most powerful tests for both shorter CAR windows and longer periods
| No. | Name | Key Reference | Abbreviation in EST Results | Strengths | Weaknesses |
|---|---|---|---|---|---|
| 1 | T test | | | | |
| 2 | Cross-Sectional Test | | CSect T | | |
| 3 | Time-Series Standard Deviation Test | | CDA T | | |
| 4 | Patell Test | Patell (1976) | Patell Z | | |
| 5 | Adjusted Patell Test | Kolari and Pynnönen (2010) | Adjusted Patell Z | | |
| 6 | Standardized Cross-Sectional Test | Boehmer, Musumeci and Poulsen (1991) | StdCSect Z | | |
| 7 | Adjusted Standardized Cross-Sectional Test | Kolari and Pynnönen (2010) | Adjusted StdCSect Z | | |
| 8 | Skewness Corrected Test | Hall (1992) | Skewness Corrected T | | |
| 9 | Jackknife Test | Giaccotto and Sfiridis (1996) | Jackknife T | | |
| 10 | Corrado Rank Test | Corrado and Zivney (1992) | Rank Z | | |
| 11 | Generalized Rank Test | Kolari and Pynnönen (2011) | Generalized Rank T | | |
| 12 | Generalized Rank Test | Kolari and Pynnönen (2011) | Generalized Rank Z | | |
| 13 | Sign Test | Cowan (1992) | not available | | |
| 14 | Cowan Generalized Sign Test | Cowan (1992) | Generalized Sign Z | | |
| 15 | Wilcoxon Signed-Rank Test | Wilcoxon (1945) | | | |
Source: The strengths and weaknesses presented above were compiled from Kolari and Pynnönen (2011).
Formulas, acronyms, and the decision rule applicable to all test statistics
Let $L_1 = T_1 - T_0 + 1$ be the estimation window length, with $T_0$ as the 'earliest' day and $T_1$ as the 'latest' day of the estimation window relative to the event day, and let $L_2 = T_2 - T_1$ be the event window length, with $T_2$ as the 'latest' day of the event window relative to the event day. Define $N$ as the sample size (i.e., the number of events / observations). $S_{AR_i}$ represents the standard deviation as produced by the regression analysis over the estimation window according to the following formula
$$S^2_{AR_i} = \frac{1}{M_i - 2} \sum\limits_{t=T_0}^{T_1}(AR_{i,t})^2$$
$M_{i}$ refers to the number of non-missing (i.e., matched) returns. This standard deviation corresponds to the market model. For other models, some adjustments are needed.
Parametric Tests
[1] T test
Our research app provides test statistics for single firms at each point of time $t$. The Null is: $H_0: AR_{i, t} = 0,$
$$t_{AR_{i,t}}=\frac{AR_{i,t}}{S_{AR_i}}, $$
where $S_{AR_i}$ is the standard deviation of the abnormal returns in the estimation window,
$$S^2_{AR_i} = \frac{1}{M_i-2} \sum\limits_{t=T_0}^{T_1}(AR_{i,t})^2.$$
Second, we provide t statistics of the cumulative abnormal returns for each firm. The t statistic under the null $H_0: CAR_{i} = 0$ is defined as
$$t_{CAR}=\frac{CAR_i}{S_{CAR}},$$
where
$$S^2_{CAR} = L_2 S^2_{AR_i}.$$
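As an illustration of the two single-firm statistics, here is a minimal sketch in plain Python. It assumes the market-model case (divisor $M_i - 2$) and no missing returns, so that $M_i$ is simply the length of the estimation-window list; the function names and the sample numbers are hypothetical.

```python
from math import sqrt

def sd_ar(estimation_ars):
    """S_AR_i: estimation-window standard deviation with M_i - 2 in the divisor."""
    m = len(estimation_ars)
    return sqrt(sum(a * a for a in estimation_ars) / (m - 2))

def t_ar(ar_event_day, s_ar):
    """t statistic for a single abnormal return AR_{i,t}."""
    return ar_event_day / s_ar

def t_car(event_ars, s_ar):
    """t statistic for CAR_i, using S_CAR = sqrt(L_2) * S_AR_i."""
    l2 = len(event_ars)
    return sum(event_ars) / (sqrt(l2) * s_ar)

# Hypothetical abnormal returns of one firm.
est = [0.01, -0.01, 0.02, -0.02, 0.01, -0.01]   # estimation window
event = [0.03, 0.01, 0.02, 0.00]                # event window, L_2 = 4

s = sd_ar(est)
t_first_day = t_ar(event[0], s)
t_window = t_car(event, s)
```

In a real application the estimation window would be far longer than six days; the short series here only keeps the arithmetic easy to follow.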
[2] Cross-Sectional Test (Abbr.: CSect T)
A simple test for testing $H_0: AAR = 0$ is given by
$$t_{AAR_t}=\sqrt{N}\frac{AAR_t}{S_{AAR_t}},$$
where $S_{AAR_t}$ is the standard deviation across firms at time $t$:
$$S^2_{AAR_t} =\frac{1}{N-1} \sum\limits_{i=1}^{N}(AR_{i, t} - AAR_t)^2.$$
The test statistic for testing $H_0: CAAR = 0$ is given by
$$t_{CAAR}=\sqrt{N}\frac{CAAR}{S_{CAAR}},$$
where $S_{CAAR}$ is the standard deviation of the cumulative abnormal returns across the sample
$$S^2_{CAAR} =\frac{1}{N-1} \sum\limits_{i=1}^{N}(CAR_{i} - CAAR)^2.$$
Brown and Warner (1985) showed that the cross-sectional test is prone to event-induced volatility. Thus, the test has low power.
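The cross-sectional statistic is the same computation for AAR and CAAR, only applied to different inputs. The sketch below is one possible way to write it, assuming a plain list of per-firm values; `cross_sectional_t` and the sample CARs are hypothetical names and numbers.

```python
from math import sqrt
from statistics import mean, stdev

def cross_sectional_t(values):
    """sqrt(N) * mean / cross-sectional SD (stdev uses N - 1 in the divisor)."""
    n = len(values)
    return sqrt(n) * mean(values) / stdev(values)

# Hypothetical CAR_i of N = 5 firms; pass AR_{i,t} of one day to test AAR_t instead.
cars = [0.02, -0.01, 0.03, 0.00, 0.01]
t_caar = cross_sectional_t(cars)
```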
[3] Time-Series Standard Deviation or Crude Dependence Test (Abbr.: CDA T)
The time-series standard deviation test uses the entire sample for variance estimation. By construction, the test therefore does not account for unequal variances across observations. The variance is estimated as
$$S^2_{AAR} =\frac{1}{M-2} \sum\limits_{t=T_0}^{T_1}(AAR_{t} - \overline{AAR})^2,$$
where $[T_0, T_1]$ is the estimation window and
$$\overline{AAR} = \frac{1}{M} \sum\limits_{t=T_0}^{T_1}AAR_{t}.$$
The test statistic for testing $H_0: AAR_t = 0$ is given by
$$t_{AAR_t}=\sqrt{N}\frac{AAR_t}{S_{AAR}}.$$
The test statistic for testing $H_0: CAAR = 0$ is given by
$$t_{CAAR}=\frac{CAAR}{\sqrt{T_2 - T_1}S_{AAR}}.$$
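The variance estimate and the CAAR statistic of this test can be sketched as follows. The sketch assumes no missing returns, so that $M$ equals the number of estimation days, and it covers only the CAAR case; the function name and the data layout (one list of abnormal returns per firm) are assumptions made for illustration.

```python
from math import sqrt

def cda_t_caar(est_ars, event_ars):
    """CDA statistic for CAAR; est_ars and event_ars hold one list per firm."""
    n = len(est_ars)
    # AAR_t over the estimation window, then its time-series standard deviation.
    aar_est = [sum(day) / n for day in zip(*est_ars)]
    m = len(aar_est)
    aar_bar = sum(aar_est) / m
    s_aar = sqrt(sum((a - aar_bar) ** 2 for a in aar_est) / (m - 2))
    # CAAR over the event window of length L_2 = T_2 - T_1.
    l2 = len(event_ars[0])
    caar = sum(sum(firm) for firm in event_ars) / n
    return caar / (sqrt(l2) * s_aar)

# Hypothetical data: N = 2 firms, 4 estimation days, 2 event days.
est_ars = [[0.01, -0.01, 0.02, -0.02], [0.01, 0.01, -0.02, 0.00]]
event_ars = [[0.02, 0.01], [0.00, 0.01]]
t_cda = cda_t_caar(est_ars, event_ars)
```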
[4] Patell or Standardized Residual Test (Abbr.: Patell Z)
The Patell test is a widely used test statistic in event studies. In a first step, Patell (1976, 1979) suggested standardizing each $AR_{i,t}$ by the forecast-error-corrected standard deviation before calculating the test statistic.
\begin{equation}SAR_{i,t} = \frac{AR_{i,t}}{S_{AR_{i,t}}} \label{eq:sar}\end{equation}
As the event-window abnormal returns are out-of-sample predictions, Patell adjusts the standard error by the forecast error:
\begin{equation}S^2_{AR_{i,t}} = S^2_{AR_i} \left(1+\frac{1}{M_i}+\frac{(R_{m,t}-\overline{R}_{m})^2} {\sum\limits_{t=T_0}^{T_1}(R_{m,t}-\overline{R}_{m})^2}\right)\label{EQ:FESD}\end{equation}
with $\overline{R}_{m}$ as the mean of the market returns in the estimation window. Under the null, $SAR_{i,t}$ follows a t-distribution with $M_i-2$ degrees of freedom. The test statistic for testing $H_0: AAR = 0$ is then given by
$$z_{Patell, t} = \frac{ASAR_t}{S_{ASAR_t}},$$
where $ASAR_t$ is the sum over the sample of the standardized abnormal returns
$$ASAR_t = \sum\limits_{i=1}^N SAR_{i,t},$$
with expectation zero and variance
$$S^2_{ASAR_t} = \sum\limits_{i=1}^N \frac{M_i-2}{M_i-4}.$$
Test statistic for testing $H_0: CAAR = 0$ is given by
$$z_{Patell}=\frac{1}{\sqrt{N}}\sum\limits_{i=1}^{N}\frac{CSAR_i}{S_{CSAR_i}},$$
with $CSAR$ as the cumulative standardized abnormal returns
$$CSAR_{i} = \sum\limits_{t=T_1+1}^{T_2} SAR_{i,t}$$
with expectation zero and variance
$$S^2_{CSAR_i} = L_2\frac{M_i-2}{M_i-4}.$$
Under the assumption of cross-sectional independence and some other conditions (Patell, 1976), $z_{Patell}$ follows a standard normal distribution.
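To make the chain from abnormal returns to $z_{Patell}$ concrete, the following sketch walks through the CAAR version for the market model. It assumes no missing returns (so $M_i$ is the length of the estimation-window market return series, and $M_i > 4$) and a hypothetical tuple layout per firm; none of these names come from the EST tools.

```python
from math import sqrt

def forecast_error_sd(s_ar, rm_t, rm_est):
    """S_AR_{i,t}: estimation-window SD inflated by the forecast-error term."""
    m = len(rm_est)
    rm_bar = sum(rm_est) / m
    ssq = sum((r - rm_bar) ** 2 for r in rm_est)
    return s_ar * sqrt(1 + 1 / m + (rm_t - rm_bar) ** 2 / ssq)

def patell_z_caar(firms):
    """firms: list of (s_ar, rm_est, ar_event, rm_event) tuples (hypothetical layout)."""
    total = 0.0
    for s_ar, rm_est, ar_event, rm_event in firms:
        m, l2 = len(rm_est), len(ar_event)
        # CSAR_i: sum of standardized abnormal returns over the event window.
        csar = sum(ar / forecast_error_sd(s_ar, rm, rm_est)
                   for ar, rm in zip(ar_event, rm_event))
        # Scale by S_CSAR_i = sqrt(L_2 * (M_i - 2) / (M_i - 4)).
        total += csar / sqrt(l2 * (m - 2) / (m - 4))
    return total / sqrt(len(firms))

# Hypothetical inputs: two identical firms, 6 estimation days, 1 event day.
rm_est = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
firm = (0.01, rm_est, [0.02], [0.0])
z_patell = patell_z_caar([firm, firm])
```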
[5] Kolari and Pynnönen adjusted Patell or Standardized Residual Test (Abbr.: Adjusted Patell Z)
Kolari and Pynnönen (2010) propose a modification to the Patell test to account for cross-correlation of the abnormal returns. Using the standardized abnormal returns ($SAR_{i,t}$) defined as in (EQ: $\ref{eq:sar}$), and defining $\overline r$ as the average of the sample cross-correlation of the estimation period abnormal returns, the test statistic for $H_0: AAR = 0$ of the adjusted Patell test is
$$z_{Patell, t}^{adj}=z_{Patell, t} \sqrt{\frac{1}{1 + (N - 1) \overline r}},$$
where $z_{Patell, t}$ is the Patell test statistic. It is easily seen that if the correlation $\overline r$ is zero, the adjusted test statistic reduces to the original Patell test statistic. Assuming the square-root rule holds for the standard deviation of different return periods, this test can be used when considering cumulated abnormal returns ($H_0: CAAR = 0$):
$$z_{Patell}^{adj}=z_{Patell} \sqrt{\frac{1}{1 + (N - 1) \overline r}}.$$
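A small sketch of the adjustment may help to see how strongly even a modest average correlation shrinks the statistic. It assumes $\overline r$ is computed as the plain average of all pairwise Pearson correlations between the firms' estimation-period abnormal returns; the helper names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equally long series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def mean_cross_correlation(est_ars):
    """Average of all pairwise correlations; est_ars holds one series per firm."""
    pairs = list(combinations(est_ars, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

def adjusted_patell(z_patell, n, r_bar):
    """Scale the Patell statistic by sqrt(1 / (1 + (N - 1) * r_bar))."""
    return z_patell * sqrt(1 / (1 + (n - 1) * r_bar))

# With N = 5 firms and an average correlation of 0.2, a Patell z of 3.0 shrinks to about 2.24.
z_adj = adjusted_patell(3.0, 5, 0.2)
```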
[6] Standardized CrossSectional or BMP Test (Abbr.: StdCSect Z)
Similarly, Boehmer, Musumeci and Poulsen (1991) proposed a standardized cross-sectional method which is robust to the variance induced by the event. The test statistic on day $t$ ($H_0: AAR = 0$) in the event window is given by
$$z_{BMP, t}= \frac{ASAR_t}{\sqrt{N}S_{ASAR_t}},$$
with $ASAR_t$ defined as for the Patell test [4] and with standard deviation
$$S^2_{ASAR_t} = \frac{1}{N-1}\sum\limits_{i=1}^{N}\left(SAR_{i, t} - \frac{1}{N} \sum\limits_{l=1}^N SAR_{l, t} \right)^2.$$
Furthermore, EST API provides the test statistic for testing $H_0: CAAR = 0$ given by
$$z_{BMP}=\sqrt{N}\frac{\overline{SCAR}}{S_{\overline{SCAR}}},$$
where $\overline{SCAR}$ is the averaged standardized cumulated abnormal returns across the $N$ firms, with standard deviation
$$S^2_{\overline{SCAR}} = \frac{1}{N-1} \sum\limits_{i=1}^{N} \left(SCAR_i - \overline{SCAR}\right)^2,$$
$$\overline{SCAR} = \frac{1}{N}\sum\limits_{i=1}^{N}SCAR_i$$
and $SCAR_i = \frac{CAR_i}{S_{CAR_i}}$. $S_{CAR_i}$ is the forecast-error-corrected standard deviation from Mikkelson and Partch (1988). The Mikkelson and Partch correction adjusts each firm's test statistic for serial correlation in the returns. The correction terms are
 Market Model:
$$S^2_{CAR_i} = S_{AR_i}^2\left(L_i + \frac{L^2_i}{M_i} + \frac{\left(\sum\limits_{t=T_1+1}^{T_2}(R_{m,t}-\overline{R}_{m})\right)^2} {\sum\limits_{t=T_0}^{T_1}(R_{m,t}-\overline{R}_{m})^2}\right)$$
 Comparison Period Mean Adjusted Model:
$$S^2_{CAR_i} = S_{AR_i}^2\left(L_i + \frac{L^2_i}{M_i}\right)$$
 Market Adjusted Model:
$$S^2_{CAR_i} = S_{AR_i}^2L_i,$$
where $L_i$ is the count of non-missing return values in the event window and $M_i$ is the count of non-missing return values in the estimation window for firm $i$. $\overline{R}_{m}$ is the mean of the market returns in the estimation window (see the Patell test).
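The BMP statistic for CAAR combines the market-model correction term with a cross-sectional standardization. The sketch below shows both pieces separately, assuming no missing returns so that $L_i$ and $M_i$ are the lengths of the event- and estimation-window series; the function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def s_car_market_model(s_ar, rm_event, rm_est):
    """Forecast-error-corrected SD of CAR_i for the market model."""
    l, m = len(rm_event), len(rm_est)
    rm_bar = mean(rm_est)
    num = sum(r - rm_bar for r in rm_event) ** 2     # squared sum of deviations
    den = sum((r - rm_bar) ** 2 for r in rm_est)     # sum of squared deviations
    return s_ar * sqrt(l + l * l / m + num / den)

def bmp_z(scars):
    """sqrt(N) * mean(SCAR) / cross-sectional SD of SCAR."""
    return sqrt(len(scars)) * mean(scars) / stdev(scars)

# Hypothetical inputs for one firm: S_AR_i = 0.01, 4 estimation days, 2 event days.
s_car = s_car_market_model(0.01, [0.01, 0.01], [0.01, -0.01, 0.01, -0.01])

# SCAR_i = CAR_i / S_CAR_i would be computed per firm; three hypothetical values:
z_bmp = bmp_z([1.0, 2.0, 3.0])
```

Because the denominator of `bmp_z` is estimated from the event-window cross-section itself, an event-induced increase in variance inflates numerator and denominator alike, which is the intuition behind the test's robustness.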
[7] Kolari and Pynnönen Adjusted Standardized CrossSectional or BMP Test (Abbr.: Adjusted StdCSect Z)
Kolari and Pynnönen (2010) propose a modification to the BMP test to account for cross-correlation of the abnormal returns. Using the standardized abnormal returns ($SAR_{i,t}$) defined as in the previous section, and defining $\overline r$ as the average of the sample cross-correlation of the estimation period abnormal returns, the test statistic for $H_0: AAR = 0$ of the adjusted BMP test is
$$z_{BMP, t}^{adj}=z_{BMP, t} \sqrt{\frac{1 - \overline r}{1 + (N - 1) \overline r}},$$
where $z_{BMP, t}$ is the BMP test statistic. It is easily seen that if the correlation $\overline r$ is zero, the adjusted test statistic reduces to the original BMP test statistic. Assuming the square-root rule holds for the standard deviation of different return periods, this test can be used when considering cumulated abnormal returns ($H_0: CAAR = 0$):
$$z_{BMP}^{adj}=z_{BMP} \sqrt{\frac{1 - \overline r}{1 + (N - 1) \overline r}}.$$
[8] Skewness Corrected Test (Abbr.: Skewness Corrected T)
The skewness-adjusted t-test, introduced by Hall (1992), corrects the cross-sectional t-test for a skewed abnormal return distribution. This test is applicable to the averaged abnormal return ($H_0: AAR = 0$), the cumulative averaged abnormal return ($H_0: CAAR = 0$), and the averaged buy-and-hold abnormal return ($H_0: ABHAR = 0$). In the following, we restrict attention to the case of cumulative averaged abnormal returns. First, let's revisit the cross-sectional standard deviation (unbiased by sample size):
$$S^2_{CAAR} = \frac{1}{N-1} \sum\limits_{i=1}^{N}(CAR_i - CAAR)^2.$$
The skewness estimation (unbiased by sample size) is given by:
$$\gamma = \frac{N}{(N-2)(N-1)} \sum\limits_{i=1}^{N}(CAR_i - CAAR)^3S^{-3}_{CAAR} .$$
Furthermore, let
$$S = \frac{CAAR}{S_{CAAR}},$$
then the skewness adjusted test statistic for CAAR is given by
$$t_{skew} = \sqrt{N}\left(S + \frac{1}{3}\gamma S^2 + \frac{1}{27}\gamma^2S^3 + \frac{1}{6N}\gamma\right),$$
which is asymptotically standard normal distributed. For a further discussion on skewness transformation we refer to Hall (1992) and for further discussion on unbiased estimation of the second and third moment we refer to Cramer (1961) or Rimoldini (2013).
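The transformation can be sketched in a few lines. The following is a minimal illustration for the CAAR case with hypothetical function names and data; it shows that the statistic coincides with the cross-sectional t-test for a symmetric sample and moves away from it when the sample is skewed.

```python
from math import sqrt
from statistics import mean, stdev

def skewness_corrected_t(cars):
    """Skewness-adjusted statistic for CAAR, following the formulas above."""
    n = len(cars)
    caar, s_caar = mean(cars), stdev(cars)
    gamma = n / ((n - 2) * (n - 1)) * sum((c - caar) ** 3 for c in cars) / s_caar ** 3
    s = caar / s_caar
    return sqrt(n) * (s + gamma * s ** 2 / 3 + gamma ** 2 * s ** 3 / 27 + gamma / (6 * n))

def cross_sectional_t(cars):
    """Plain cross-sectional t statistic, for comparison."""
    return sqrt(len(cars)) * mean(cars) / stdev(cars)

symmetric = [0.02, -0.01, 0.03, 0.00, 0.01]     # hypothetical CARs, zero skewness
skewed = [0.00, 0.00, 0.00, 0.00, 0.05]         # hypothetical CARs, right-skewed

t_sym = skewness_corrected_t(symmetric)
t_skewed = skewness_corrected_t(skewed)
```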
[9] Jackknife Test (Abbr.: Jackknife T)
coming soon.
Nonparametric Tests
[10] Corrado Rank Test (Abbr.: Rank Z)
In a first step, Corrado's (1989) rank test transforms abnormal returns into ranks. Ranking is done for all abnormal returns of both the event and the estimation period. If ranks are tied, the midrank is used. To adjust for missing values, Corrado and Zivney (1992) suggested standardizing the ranks by the number of non-missing values plus 1
$$K_{i, t}=\frac{rank(AR_{i, t})}{1 + M_i + L_i},$$
where $M_i$ and $L_i$ refer to the number of non-missing (i.e., matched) returns in the estimation window and the event window, respectively. The rank statistic for testing on a single day ($H_0: AAR = 0$) is then given by
$$t_{rank, t} = \frac{\overline{K}_t - 0.5}{S_{\overline{K}}},$$
where $\overline{K}_t = \frac{1}{N_t}\sum\limits_{i=1}^{N_t}K_{i, t}$, $N_t$ is the number of non-missing returns across firms, and
$$S^2_{\overline{K}} = \frac{1}{L_1 + L_2} \sum\limits_{t=T_0}^{T_2} \frac{N_t}{N}\left(\overline{K}_t - 0.5 \right)^2.$$
When analyzing a multi-day event period, Campbell and Wasley (1993) define the RANK test considering the sum of the mean excess rank for the event window as follows ($H_0: CAAR = 0$):
$$t_{rank} =\sqrt{L_2} \left(\frac{\overline{K}_{T_1, T_2} - 0.5}{S_{\overline{K}}}\right),$$
where $\overline{K}_{T_1, T_2} = \frac{1}{L_2} \sum\limits_{t=T_1 + 1}^{T_2}\overline{K}_t$ is the mean rank across firms and time in the event window. By adjusting the last day in the event window $T_2$, one can get a series of test statistics as defined by Campbell and Wasley (1993).
Note 1: The adjustment for event-induced variance as done by Campbell and Wasley (1993) is omitted here and may be implemented in a future version. Where event-induced variance is a concern, we recommend the GRANK-T or GRANK-Z test.
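The rank transformation and the two rank statistics can be sketched as follows. The sketch assumes no missing returns (so $N_t = N$, $M_i = L_1$ and $L_i = L_2$) and omits the adjustment for event-induced variance; `midranks` and `corrado_rank` are hypothetical helper names.

```python
from math import sqrt

def midranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def corrado_rank(est_ars, event_ars):
    """Return (list of single-day statistics, multi-day statistic)."""
    n, l1, l2 = len(est_ars), len(est_ars[0]), len(event_ars[0])
    # Rank each firm's combined series and standardize by 1 + M_i + L_i.
    k = [[r / (1 + l1 + l2) for r in midranks(e + v)]
         for e, v in zip(est_ars, event_ars)]
    k_bar = [sum(day) / n for day in zip(*k)]
    s_k = sqrt(sum((kb - 0.5) ** 2 for kb in k_bar) / (l1 + l2))
    t_days = [(kb - 0.5) / s_k for kb in k_bar[l1:]]
    t_window = sqrt(l2) * (sum(k_bar[l1:]) / l2 - 0.5) / s_k
    return t_days, t_window

# Hypothetical data: one firm, 3 estimation days, 1 event day.
t_days, t_window = corrado_rank([[-0.01, 0.00, 0.01]], [[0.05]])
```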
[11] Generalized Rank T Test (Abbr.: Generalized Rank T)
In the following steps we assume, for the sake of simplicity, that there are no missing values in the estimation or event window for any firm. In order to account for possible event-induced volatility, the GRANK test squeezes the whole event window into one observation, the so-called 'cumulative event day'. First, define the standardized cumulative abnormal returns of firm $i$ in the event window
$$SCAR_{i}=\frac{CAR_{i}}{S_{CAR_{i}}},$$
where $S_{CAR_{i}}$ is the standard deviation of the prediction errors in the cumulative abnormal returns of firm $i$, namely
\begin{equation}S^2_{CAR_{i}} = S^2_{AR_i} \left(L_2+\frac{L_2}{L_1}+\frac{\sum\limits_{t=T_1+1}^{T_2}(R_{m,t}-\overline{R}_{m})^2} {\sum\limits_{t=T_0}^{T_1}(R_{m,t}-\overline{R}_{m})^2}\right).\end{equation}
The standardized CAR value $SCAR_{i}$ has an expectation of zero and approximately unit variance. To account for event-induced volatility, $SCAR_{i}$ is re-standardized by the cross-sectional standard deviation
$$SCAR^*_{i}=\frac{SCAR_{i}}{S_{SCAR}},$$
where
$$S^2_{SCAR}=\frac{1}{N-1} \sum\limits_{i=1}^N \left(SCAR_{i} - \overline{SCAR} \right)^2 \quad \text{ and } \quad \overline{SCAR} = \frac{1}{N} \sum\limits_{i=1}^N SCAR_{i}.$$
By construction $SCAR^*_{i}$ has again an expectation of zero with unit variance. Now, let's define the generalized standardized abnormal returns ($GSAR$):
$$GSAR_{i, t} = \begin{cases} SCAR^*_i & \text{for } t \text{ in the event window} \\ SAR_{i,t} & \text{for } t \text{ in the estimation window} \end{cases}$$
The CAR window is thus treated as a single time point (the cumulative event day), while at all other time points GSAR equals the standardized abnormal return. On these $L_1 + 1$ points, define the standardized ranks:
$$K_{i, t}=\frac{rank(GSAR_{i, t})}{L_1 + 2} - 0.5$$
Then the generalized rank t-statistic for testing $H_0: CAAR = 0$ is defined as:
$$t_{grank}=Z\left(\frac{L_1 - 1}{L_1 - Z^2}\right)^{1/2}$$
with
$$Z=\frac{\overline{K_{0}}}{S_{\overline{K}}},$$
$t=0$ indicates the cumulative event day, and
$$S^2_{\overline{K}}=\frac{1}{L_1 + 1}\sum\limits_{t \in CW}\frac{N_t}{N}\overline{K}_t^2$$
with CW representing the combined window consisting of estimation window and the cumulative event day, and
$$\overline{K}_t=\frac{1}{N_t}\sum\limits_{i=1}^{N_t}K_{i, t}.$$
$t_{grank}$ is t-distributed with $L_1 - 1$ degrees of freedom.
Formulas testing on a single day ($H_0: AAR = 0$) are straightforward from the ones shown above.
[12] Generalized Rank Z Test (Abbr.: Generalized Rank Z)
Using some facts about statistics on ranks, we get the standard deviation of $\overline{K_{0}}$
$$S^2_{\overline{K_{0}}} =\frac{L_1}{12N(L_1 + 2)}.$$
By this calculation, the following test statistic can be defined
$$z_{grank} = \frac{ \overline{K_{0}} }{ S_{\overline{ K_{0} } } } = \sqrt{ \frac{12N(L_1+ 2)}{L_1}} \overline{K_{0}},$$
which under the null hypothesis converges quickly to the standard normal distribution as the number of firms $N$ increases.
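Both generalized rank statistics share the same ranking step, so they can be sketched together. The sketch below assumes no missing values and takes already standardized inputs (the estimation-window $SAR_{i,t}$ and one $SCAR_i$ per firm); the helper names and the data are hypothetical. It also assumes $Z^2 < L_1$, which the square root in $t_{grank}$ requires.

```python
from math import sqrt
from statistics import stdev

def midranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def grank(sar_est, scar):
    """Return (t_grank, z_grank) from estimation-window SARs and per-firm SCARs."""
    n, l1 = len(scar), len(sar_est[0])
    s_scar = stdev(scar)
    scar_star = [s / s_scar for s in scar]           # re-standardized SCAR*_i
    # GSAR: estimation-window SARs followed by the cumulative event day.
    k = [[r / (l1 + 2) - 0.5 for r in midranks(est + [star])]
         for est, star in zip(sar_est, scar_star)]
    k_bar = [sum(day) / n for day in zip(*k)]
    k0 = k_bar[-1]                                   # mean rank on the cumulative event day
    s_k = sqrt(sum(kb ** 2 for kb in k_bar) / (l1 + 1))
    z = k0 / s_k
    t_grank = z * sqrt((l1 - 1) / (l1 - z * z))
    z_grank = sqrt(12 * n * (l1 + 2) / l1) * k0
    return t_grank, z_grank

# Hypothetical data: N = 2 firms, L_1 = 4 estimation days.
sar_est = [[-1.0, 0.5, -0.5, 1.0], [0.3, -0.8, 1.2, -0.2]]
t_grank, z_grank = grank(sar_est, [2.0, 3.0])
```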
[13] Sign Test (Abbr.: not available in our apps)
This sign test has been proposed by Cowan (1991) and builds on the ratio of positive cumulative abnormal returns $\hat{p}$ present in the event window. Under the null hypothesis, this ratio should not significantly differ from 0.5.
$$t_{sign}= \sqrt{N}\left(\frac{\hat{p}-0.5}{\sqrt{0.5(1-0.5)}}\right)$$
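As a quick numerical illustration, the sign statistic can be computed directly from the share of positive CARs. The function name and the sample below are hypothetical.

```python
from math import sqrt

def sign_test(cars):
    """sqrt(N) * (p_hat - 0.5) / sqrt(0.5 * (1 - 0.5))."""
    n = len(cars)
    p_hat = sum(1 for c in cars if c > 0) / n
    return sqrt(n) * (p_hat - 0.5) / sqrt(0.5 * (1 - 0.5))

# Hypothetical sample: 12 of 16 firms show a positive CAR.
t_sign = sign_test([0.01] * 12 + [-0.01] * 4)
```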
[14] Cowan Generalized Sign Test (Abbr.: Generalized Sign Z)
Under the null hypothesis of no abnormal returns, the number of stocks with positive cumulative abnormal returns ($CAR$) is expected to be in line with the fraction $\hat{p}$ of positive abnormal returns from the estimation period. When the number of positive $CAR$ is significantly higher than the number expected from the estimated fraction, it is suggested to reject the null hypothesis.
The fraction $\hat{p}$ is estimated as
$$\hat{p}=\frac{1}{N}\sum\limits_{i=1}^{N}\frac{1}{L_1}\sum\limits_{t=T_0}^{T_1}\varphi_{i, t},$$
where $\varphi_{i,t}$ is $1$ if the sign is positive and $0$ otherwise. The Generalized sign test statistic ($H_0: CAAR = 0$) is
$$z_{gsign}=\frac{(w-N\hat{p})}{\sqrt{N\hat{p}(1-\hat{p})}},$$
where $w$ is the number of stocks with positive cumulative abnormal returns during the event period. For the test statistic, a normal approximation of the binomial distribution with the parameters $\hat{p}$ and $N$, is used.
Note 1: This test is based on the paper of Cowan, A. R. (1992).
Note 2: EST provides GSIGN test statistics also for single days ($H_0: AAR = 0$) in the event time period.
Note 3: The GSIGN test is based on the traditional SIGN test where the null hypothesis assumes a binomial distribution with parameter $p=0.5$ for the sign of the $N$ cumulative abnormal returns.
Note 4: If $N$ is small, the normal approximation is inaccurate for calculating the p-value; in such cases we recommend using the binomial distribution to calculate the p-value.
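The generalized sign statistic and the exact binomial alternative mentioned in Note 4 can be sketched as follows. The sketch assumes no missing returns and a one-sided (upper-tail) p-value; both function names and the data are hypothetical.

```python
from math import comb, sqrt

def generalized_sign_z(est_ars, cars):
    """Return (z_gsign, p_hat, w) from estimation-window ARs and per-firm CARs."""
    n = len(cars)
    # p_hat: average over firms of the share of positive estimation-window ARs.
    p_hat = sum(sum(1 for a in est if a > 0) / len(est) for est in est_ars) / n
    w = sum(1 for c in cars if c > 0)                # firms with positive CAR
    z = (w - n * p_hat) / sqrt(n * p_hat * (1 - p_hat))
    return z, p_hat, w

def binomial_upper_tail(w, n, p):
    """Exact P(X >= w) for X ~ Binomial(n, p), an alternative for small N."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(w, n + 1))

# Hypothetical sample: 16 firms, half of the estimation-window ARs positive,
# and 12 firms with a positive CAR in the event window.
est_ars = [[0.01, -0.01, 0.02, -0.02]] * 16
cars = [0.01] * 12 + [-0.01] * 4
z_gsign, p_hat, w = generalized_sign_z(est_ars, cars)
p_exact = binomial_upper_tail(w, len(cars), p_hat)
```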
[15] Wilcoxon Test (Abbr.: Wilcoxon Z)
The Wilcoxon rank test can be regarded as an extension of the GSIGN test, since it considers both the sign and the magnitude of abnormal returns. This test assumes that no two absolute values are equal and that none of them is zero. Let
$$W_t = \sum\limits_{i=1}^N rank(A_{i,t})^+$$
where $rank(A_{i,t})^+$ is the positive rank of the absolute value of abnormal returns $A_{i, t}$ at time point $t$ for firm $i$. The test statistic for testing $H_0: AAR = 0$ is then defined as
$$z_{wilcoxon, t} = \frac{W_t - N(N - 1) / 4}{\sqrt{N(N+1)(2N+1)/12}}$$
for large $N$. To test the cumulative averaged abnormal return ($H_0: CAAR = 0$), we add the CAAR value for each firm $i$ to the abnormal returns in the event window and do the same calculations as for AAR.