Event studies are generally the first step in a two-step analysis that aims to identify the determinants of stock market responses to distinct event types. Their outcome are abnormal returns (ARs), which are cumulated over time to cumulative abnormal returns (CARs) and then 'averaged' (in the case of so-called sample studies) over several observations of identical events to AARs and CAARs, where the second 'A' stands for 'average'. These event study results are then often used as dependent variables in regression analyses.
Explaining abnormal returns by means of regression analysis, however, is only meaningful if the abnormal returns are significantly different from zero, and thus not the result of pure chance. This assessment is made by hypothesis testing. Following general principles of inferential statistics, the null hypothesis ($H_0$) maintains that there are no abnormal returns within the event window, whereas the alternative hypothesis ($H_1$) suggests the presence of ARs within the event window. Formally, the testing framework reads as follows:
$$H_0: \mu = 0 \qquad (1)$$
$$H_1: \mu \neq 0 \qquad (2)$$
Event studies may imply a hierarchy of calculations, with ARs being compounded to CARs, which can again be 'averaged' to CAARs in cross-sectional studies (sometimes also called 'sample studies'). Significance testing is needed at each of these levels; $\mu$ in the above equations may thus represent ARs, CARs, or CAARs. Let us briefly revisit these three forms of abnormal return calculations, as presented in the introduction:
$$AR_{i,t}=R_{i,t}-E(R_{i,t}) \qquad (3)$$
$$AAR_{t}= \frac{1}{N} \sum\limits_{i=1}^{N}AR_{i,t} \qquad (4)$$
$$CAR(t_1,t_2)_{i}=\sum\limits_{t=t_1}^{t_2} AR_{i,t} \qquad (5)$$
$$CAAR(t_1,t_2)=\frac{1}{N}\sum\limits_{i=1}^{N}CAR(t_1,t_2)_i \qquad (6)$$
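Equations (3) to (6) can be illustrated with a minimal numeric sketch. All return figures below are hypothetical, and the benchmark expectation is a placeholder for whatever return model is used:

```python
import numpy as np

# Minimal numeric sketch of equations (3)-(6); all return figures are hypothetical.
# Rows = events i (N = 3), columns = event-window days t (T = 5).
actual = np.array([[0.02, 0.01, -0.03, 0.00, 0.01],
                   [0.01, 0.00,  0.02, 0.01, -0.01],
                   [0.00, 0.02, -0.01, 0.03, 0.00]])
expected = np.full_like(actual, 0.005)   # E(R_{i,t}) from some benchmark model

AR = actual - expected          # eq. (3): abnormal returns per event and day
AAR = AR.mean(axis=0)           # eq. (4): average across events, per day
CAR = AR.sum(axis=1)            # eq. (5): cumulate over the window, per event
CAAR = CAR.mean()               # eq. (6): average of the events' CARs

print(AR.shape, CAR.round(4), round(CAAR, 4))
```

Note how AAR averages across events per day, while CAR cumulates across days per event; CAAR applies both operations.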
The literature on event study test statistics is very rich, as is the range of significance tests. Generally, significance tests can be grouped into parametric and nonparametric tests (NPTs). Parametric tests assume that individual firms' abnormal returns are normally distributed, whereas nonparametric tests do not rely on any such assumption. Scholars commonly complement a parametric test with a nonparametric test to verify that the research findings are not driven by, e.g., an outlier (see Schipper and Smith (1983) for an example). Table 1 provides an overview and links to the formulas of the different test statistics.
Table 1: 'Recommended' Significance Tests per Test Level (note: work in progress/draft)

| Null hypothesis tested | Parametric tests | Nonparametric tests | Test level |
| --- | --- | --- | --- |
| $H_0: AR = 0$ | AR t-test* | | Individual Event |
| $H_0: AAR = 0$ | AAR t-test, Patell test, BMP test, J-test | GRANK | Sample of Events |
| $H_0: CAR = 0$ | | GRANK, SIGN | Individual Event |
| $H_0: CAAR = 0$ | | | Sample of Events |

N.B.: *-labeled test results are included in the output of EventStudyTools' abnormal return calculator
Parametric test statistics are grounded in the classic t-test, which scholars have further developed to correct for its prediction error. The most widely used of these 'scaled' tests are those developed by Patell (1976) and by Boehmer, Musumeci and Poulsen (1991). Among the nonparametric tests, the rank test of Corrado (1989) and the sign test of Cowan (1992) are very popular. EST provides these test statistics (soon) in its analysis results reports.
Why different test statistics are needed
The choice of test statistic should be informed by the research setting and the statistical issues present in the analyzed data. Specifically, event-date clustering poses a problem, leading to (1) cross-sectional correlation of abnormal returns and (2) distortions from event-induced changes in volatility. Cross-sectional correlation arises when sample studies focus on one or more events that happened for multiple firms on the same day(s). Event-induced volatility, in turn, is a phenomenon common to many event types (e.g., M&A transactions) that becomes problematic when events are clustered. As a consequence, both issues bias the standard deviation downward and thus overstate the t-statistic, leading to an over-rejection of the null hypothesis.
Comparison of test statistics
There have been several attempts to address these statistical issues. Patell (1976, 1979), for example, tried to overcome the t-test's proneness to event-induced volatility by standardizing the event window's ARs. He used the dispersion of the estimation interval's ARs to limit the impact of stocks with high return standard deviations. Yet, the test rejects the true null hypothesis too often, particularly when samples are characterized by non-normal returns, low prices, or little liquidity. The test has also been found to remain affected by event-induced changes in volatility (Campbell and Wasley, 1993; Cowan and Sergeant, 1996; Maynes and Rumsey, 1993; Kolari and Pynnonen, 2010). Boehmer, Musumeci and Poulsen (1991) resolved this latter issue and developed a test statistic that is robust against volatility-changing events.
The nonparametric rank test of Corrado and Zivney (1992) (RANK) applies re-standardized event-window returns and has proven robust against induced volatility and cross-correlation. Sign tests are another category of tests; one advantage their authors stress over the common t-test is that they are apt to identify even small levels of abnormal returns. Moreover, scholars have recommended the use of nonparametric sign and rank tests for applications that require robustness against non-normally distributed data. Past research (e.g., Fama, 1976) has argued that daily return distributions are more fat-tailed (i.e., exhibit very large skewness or kurtosis) than normal distributions, which suggests the use of nonparametric tests.
Several authors have further advanced the sign and rank tests pioneered by Cowan (1992) and Corrado and Zivney (1992). Campbell and Wasley (1993), for example, adapted the RANK test to longer CAR windows by adjusting the standard error accordingly, creating the Campbell-Wasley test statistic (CUMRANK). Another NPT is the generalized rank (GRANK) test, which follows a Student t-distribution with $T-2$ degrees of freedom ($T$ being the number of observations). GRANK appears to be one of the most powerful instruments for both shorter and longer CAR windows.
The Cowan (1992) sign test (SIGN) is also used for testing CARs; it compares the share of positive ARs close to an event with the proportion from a normal period. SIGN's null hypothesis accommodates asymmetric return distributions. Because this test considers only the sign of the difference between abnormal returns, event-induced volatility does not influence its rejection rates. In the presence of induced volatility, scholars therefore recommend the BMP, GRANK, and SIGN tests.
Most studies have shown that if the focus is only on single-day ARs, the means of all tests stay close to zero. For longer event windows, however, the mean values deviate from zero. Compared with their nonparametric counterparts, the Patell and BMP tests produce means that deviate from zero rather quickly, whereas the standard deviations of all tests gravitate towards zero. For longer event windows, academics therefore recommend nonparametric over parametric tests.
The main implication is that, for longer event windows, conclusions about the tests' power should be drawn very carefully because of the many over- or under-rejections of the null hypothesis. Overall, comparing the different test statistics yields the following insights:
- Parametric tests based on scaled abnormal returns perform better than those based on non-standardized returns
- Generally, nonparametric tests tend to be more powerful than parametric tests
- The generalized rank test (GRANK) is one of the most powerful tests for both shorter CAR windows and longer periods
Table 2 provides a short summary of the individual test statistics discussed above.
Table 2: Summary Overview of Main Test Statistics

| # | Name [synonym] | Key Reference | Type (P/NP) | Antecedent | Strengths | Weaknesses |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | t-test [ORDIN] | | P | | | |
| 2 | Standardized residual test [Patell test] | Patell (1976) | P | ORDIN | | |
| 3 | Standardized cross-sectional test [BMP test] | Boehmer, Musumeci and Poulsen (1991) | P | Patell test | | |
| 4 | Adjusted BMP test [J-test] | Kolari and Pynnönen (2010) | P | BMP test | | |
| 5 | Generalized sign test [SIGN] | Cowan (1992) | NP | | | |
| 6 | Rank test [RANK] | Corrado and Zivney (1992) | NP | | | |
| 7 | Campbell-Wasley test statistic [CUMRANK] | Campbell and Wasley (1993) | NP | RANK | | |
| 8 | Generalized rank test [GRANK] | Kolari and Pynnonen (2010) | NP | RANK | | |
| 9 | Generalized sign test [GSIGN] | | NP | SIGN | | |
| 10 | Wilcoxon signed-rank test | Wilcoxon (1945) | NP | | | |

Notes: P = parametric, NP = nonparametric; insights about strengths and weaknesses were compiled from Kolari and Pynnonen (2011)
Formulas, acronyms, and the decision rule applicable to all test statistics
$T= t_2-t_1+1$ (days in the event window), with $t_1$ denoting the 'earliest' day of the event window and $t_2$ the 'latest' day of the event window; $N$ = sample size (i.e., number of events/observations); $EW$ = estimation window, with $EW_{min}$ denoting the 'earliest' day of the estimation window and $EW_{max}$ the 'latest' day of the estimation window; $\hat{\sigma}^2_{AR_i}$ and $\hat{\sigma}_{AR_i}$ represent the variance and the standard deviation, respectively, as produced by the regression analysis over the estimation window according to the following formula.
$$\hat{\sigma}^2_{AR_i} = \frac{1}{M_i-dF} \sum\limits_{t=EW_{min}}^{EW_{max}}(AR_{i,t})^2$$
$M_{i}$ refers to the number of non-missing (i.e., matched) returns and $dF$ to the degrees of freedom (for the market model, $dF = 2$). Please note: If you use the ARC of this website, the 'analysis report' CSV provides $\hat{\sigma}_{AR_i}$ for each event/observation.
The decision rule for all test statistics mandates rejecting the null hypothesis at a confidence level of $1-\alpha$ when the absolute value of the test statistic exceeds the critical value from the t-table (i.e., if $|t(AR_{i,t})|>t_c(\alpha)$).
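The decision rule can be sketched in a few lines; the test statistic and the estimation-window length below are hypothetical values chosen purely for illustration:

```python
import numpy as np
from scipy import stats

# A short sketch of the two-sided decision rule; the test statistic and the
# estimation-window length are hypothetical values chosen for illustration.
alpha = 0.05
M_i, dF = 120, 2                 # estimation-window days, market-model parameters
df = M_i - dF                    # degrees of freedom
t_stat = 2.40                    # hypothetical test statistic

t_crit = stats.t.ppf(1 - alpha / 2, df)   # critical value from the t-table
reject_H0 = abs(t_stat) > t_crit
print(round(t_crit, 3), reject_H0)
```

For large degrees of freedom the critical value approaches the familiar normal quantile of about 1.96.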
Please note: There are alternative approaches to calculating the standard deviations for CARs and CAARs (see, for example, Campbell, Lo and MacKinlay (1997)).
[2] Patell test (standardized residual test)
The cross-sectional t-test does not account for event-induced variance and thus overstates significance levels. Patell (1976, 1979) suggested correcting for this overstatement by first standardizing each $AR_{i,t}$ and then calculating the test statistic from the standardized values:
$$SAR_{i,t} = \frac{AR_{i,t}}{S(AR_{i,t})} $$
As the event-window abnormal returns are out-of-sample predictions, Patell adjusts the standard error by the forecast error, where $\bar{R}_{m}$ denotes the mean market return over the estimation window:
$$S(AR_{i,t}) = \hat{\sigma}_{AR_i} \sqrt{1+\frac{1}{M_i}+\frac{(R_{m,t}-\bar{R}_{m})^2} {\sum\limits_{t=EW_{min}}^{EW_{max}}(R_{m,t}-\bar{R}_{m})^2}}$$
'Cumulating' these standardized abnormal returns over time gives:
$$CSAR_{i}(t_1, t_2) = \sum\limits_{t=t_1}^{t_2} SAR_{i,t} $$
Assuming a Student's t-distribution with $M_i-d$ degrees of freedom (Campbell, Lo and MacKinlay, 1997), the expected value of $CSAR_i(t_1,t_2)$ is zero and the standard deviation assumes the following value:
$$\hat{\sigma}_{CSAR_i} = \sqrt{T\frac{M_i-d}{M_i-2d}}$$
The t-statistic reads as:
$$t_{Patell}=\frac{1}{\sqrt{N}}\sum\limits_{i=1}^{N}\frac{CSAR_i}{\hat{\sigma}_{CSAR_i}}$$
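The Patell procedure can be sketched end-to-end as follows. All inputs are simulated under the null (no abnormal performance) with $d = 2$ market-model parameters; the variable names are illustrative and not EST's implementation:

```python
import numpy as np

# Hedged sketch of the Patell test from the formulas above; all inputs are
# simulated placeholders, d = 2 market-model parameters is assumed.
rng = np.random.default_rng(0)
N, M, T, d = 10, 120, 5, 2       # events, estimation days, event-window days, params

AR_est = rng.normal(0, 0.01, (N, M))   # estimation-window abnormal returns
AR_evt = rng.normal(0, 0.01, (N, T))   # event-window abnormal returns
Rm_est = rng.normal(0, 0.01, (N, M))   # market returns, estimation window
Rm_evt = rng.normal(0, 0.01, (N, T))   # market returns, event window

sigma2_AR = (AR_est ** 2).sum(axis=1) / (M - d)          # regression residual variance
Rm_bar = Rm_est.mean(axis=1, keepdims=True)
ssx = ((Rm_est - Rm_bar) ** 2).sum(axis=1, keepdims=True)

# Forecast-error-adjusted standard error, then standardized abnormal returns
S2 = sigma2_AR[:, None] * (1 + 1 / M + (Rm_evt - Rm_bar) ** 2 / ssx)
SAR = AR_evt / np.sqrt(S2)

CSAR = SAR.sum(axis=1)                                   # cumulate over the window
sigma_CSAR = np.sqrt(T * (M - d) / (M - 2 * d))          # std. dev. under H0
t_patell = (CSAR / sigma_CSAR).sum() / np.sqrt(N)
print(round(t_patell, 3))
```

Under the null, the statistic should be approximately standard normal for samples of this size.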
[3] BMP test (standardized cross-sectional test)
Similarly, Boehmer, Musumeci and Poulsen (1991) proposed a standardized cross-sectional method that is robust to the variance induced by the event. It builds on the standardized residual test:
$$\overline{CSAR(t_1,t_2)} = \frac{1}{N}\sum\limits_{i=1}^{N}CSAR(t_1,t_2)_i$$
$$\hat{\sigma}(\overline{CSAR(t_1,t_2)}) = \sqrt{\frac{1}{N(N-1)}\sum\limits_{i=1}^{N}\left(CSAR(t_1,t_2)_i-\overline{CSAR(t_1,t_2)}\right)^2}$$
$$t_{BMP}= \frac{\overline{CSAR(t_1,t_2)}}{\hat{\sigma}(\overline{CSAR(t_1,t_2)})}$$
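In code, the BMP statistic is a plain cross-sectional t-test on the standardized cumulative abnormal returns. The $CSAR$ values below are simulated placeholders for those produced by the Patell standardization:

```python
import numpy as np

# Hedged sketch of the BMP statistic: a cross-sectional t-test on the
# standardized cumulative abnormal returns CSAR_i (simulated placeholders).
rng = np.random.default_rng(1)
N = 10
CSAR = rng.normal(0, 1.5, N)     # one CSAR(t1,t2)_i per event

csar_mean = CSAR.mean()
se = np.sqrt(((CSAR - csar_mean) ** 2).sum() / (N * (N - 1)))  # SE of the mean
t_bmp = csar_mean / se
print(round(t_bmp, 3))
```

Because the dispersion is estimated cross-sectionally at the event date, event-induced variance inflates the denominator along with the numerator, which is what makes the test robust.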
[4] J-test (adjusted BMP-test)
Kolari and Pynnönen (2010) propose a modification of the BMP-test to account for cross-correlation of the abnormal returns. Using the standardized abnormal returns ($SAR_{i,t}$) defined as in the previous section, and defining $\bar r$ as the average of the sample cross-correlations of the estimation-period residuals, the J-test can be written as:
$$t_{J}=\frac{\overline{SAR}_{0}\sqrt{N}}{\hat{\sigma}_{SAR} \sqrt{1+(N-1)\bar r}}$$
where $\overline{SAR}_{0}$ is the mean of the $SAR_{i,0}$ at the event date, $N$ the number of firms, and the estimated standard deviation is defined as $\hat{\sigma}_{SAR}=\sqrt{\frac{1}{N-1}\sum\limits_{i=1}^{N}(SAR_{i,0}-\overline{SAR}_{0})^2}$.
Assuming the square-root rule holds for the standard deviation of different return periods, this test can also be applied to cumulative abnormal returns: while the average cross-correlation remains unchanged, $SAR_{i,0}$ is replaced by $CSAR_{i}(t_1,t_2)$ in the estimation.
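The adjustment amounts to inflating the BMP denominator by $\sqrt{1+(N-1)\bar r}$. In this sketch the $SAR$ values and $\bar r$ are hypothetical; in practice $\bar r$ is estimated from the estimation-period residuals:

```python
import numpy as np

# Hedged sketch of the cross-correlation adjustment: relative to BMP, the
# denominator is inflated by sqrt(1 + (N-1)*rbar). Inputs are hypothetical.
rng = np.random.default_rng(2)
N = 10
SAR0 = rng.normal(0, 1.0, N)     # standardized abnormal returns at the event date
rbar = 0.05                      # assumed average cross-correlation

sar_mean = SAR0.mean()
sigma_sar = np.sqrt(((SAR0 - sar_mean) ** 2).sum() / (N - 1))
t_plain = sar_mean * np.sqrt(N) / sigma_sar                   # unadjusted (BMP-style)
t_j = t_plain / np.sqrt(1 + (N - 1) * rbar)                   # J-test adjustment
print(round(t_j, 3))
```

With positive average cross-correlation the adjusted statistic is always smaller in magnitude than the unadjusted one, counteracting the over-rejection described above.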
[5] Sign test (SIGN)
This sign test was proposed by Cowan (1992) and builds on the ratio of positive cumulative abnormal returns $p^{+}_0$ observed in the event window. Under the null hypothesis, this ratio should not significantly differ from 0.5:
$$t_{SIGN}= \frac{p^+_0-0.5}{\sqrt{0.5(1-0.5)/N}}$$
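A minimal sketch with hypothetical event-window CARs:

```python
import numpy as np

# Minimal sketch of the sign statistic with hypothetical event-window CARs.
CAR = np.array([0.012, -0.004, 0.021, 0.003, -0.010, 0.007, 0.015, 0.002])
N = len(CAR)
p_pos = (CAR > 0).mean()                          # share of positive CARs, p^+_0
t_sign = (p_pos - 0.5) / np.sqrt(0.5 * (1 - 0.5) / N)
print(p_pos, round(t_sign, 3))
```

Here six of eight CARs are positive ($p^+_0 = 0.75$), yielding a statistic of about 1.414, which is not significant at conventional levels for this tiny illustrative sample.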
[6] Rank test (RANK)
In a first step, Corrado's (1989) rank test transforms abnormal returns into ranks. The ranking is done for each event-stock combination across all abnormal returns of both the event and the estimation window ('tied ranks').
$$K_{i, t}=rank(AR_{i, t})$$
Thereafter, the average rank is calculated as 0.5 plus half the number of returns observed in the event ($L_2$) and the estimation window ($L_1$).
$$AK_{i, L_1+L_2}=0.5+\frac{(L_1+L_2)}{2}$$
The t-statistic then reads as:
$$T_{Corrado}=\frac{1}{\sqrt{N}}\sum\limits_{i=1}^{N}\frac{K_{i, t}-AK_{i, L_1+L_2}}{\hat{\sigma}_U}$$
The standard deviation is calculated as follows, where $l_{1b}$ denotes the first day of the estimation window and $l_{2e}$ the last day of the event window:
$$\hat{\sigma}_U=\sqrt{\frac{1}{L_1+L_2}\sum\limits_{t=l_{1b}}^{l_{2e}}\left(\frac{1}{\sqrt{N}}\sum\limits_{i=1}^{N}(K_{i, t}-AK_{i, L_1+L_2})\right)^2}$$
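The ranking-and-scaling logic can be sketched for a single event day. ARs are simulated under the null, and ties are handled with scipy's average-rank method:

```python
import numpy as np
from scipy.stats import rankdata

# Hedged sketch of Corrado's rank statistic for a single event day; ARs are
# simulated, ranking is per stock across the combined estimation and event
# window ('tied ranks' via scipy's average method).
rng = np.random.default_rng(3)
N, L1, L2 = 10, 120, 5
AR = rng.normal(0, 0.01, (N, L1 + L2))       # columns: estimation then event days

K = np.apply_along_axis(rankdata, 1, AR)     # ranks per stock, ties averaged
AK = 0.5 + (L1 + L2) / 2                     # expected (average) rank

dev = (K - AK).sum(axis=0) / np.sqrt(N)      # scaled rank deviation per day
sigma_U = np.sqrt((dev ** 2).mean())         # std. dev. over all L1 + L2 days

t0 = L1                                      # column index of the event day
t_rank = (K[:, t0] - AK).sum() / (np.sqrt(N) * sigma_U)
print(round(t_rank, 3))
```

Because only ranks enter the statistic, extreme return outliers cannot dominate it.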
[7] Campbell-Wasley test (CUMRANK)
When analyzing multi-day event periods, Campbell and Wasley (1993) define the RANK test statistic over the sum of the mean excess ranks for the event window as follows:
$$t_{CRANK}=\frac{\sum\limits_{\tau=t}^{t+L_2}\overline{K}_\tau}{\sqrt{\sum\limits_{\tau=t}^{t+L_2}\hat{\sigma}^2(\overline{K}_{\tau})}}$$
In this equation, $t$ is the starting date of the event period and $L_2$ is the number of days in the event window as before. $\overline{K}_\tau$ is the mean excess rank on day $\tau$, defined as $\overline{K}_\tau=\frac{1}{N}\sum\limits_{i=1}^{N}(K_{i,\tau}AK_i)$, where $K_{i,\tau}$ is the rank of the abnormal return of firm $i$ on period $\tau$ and $AK_i$ is the average rank of $i$ as defined in the RANKtest. Finally, $\hat{\sigma}^2(\overline{K}_{\tau})$ is defined as the variance used in the RANKtest.
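The cumulation over event days can be sketched as follows, with simulated data and the RANK-test variance used as the per-day variance proxy:

```python
import numpy as np
from scipy.stats import rankdata

# Hedged CUMRANK sketch: cumulate the daily mean excess ranks over the event
# window and scale by the root of the summed per-day variances; data simulated.
rng = np.random.default_rng(4)
N, L1, L2 = 10, 120, 5
AR = rng.normal(0, 0.01, (N, L1 + L2))

K = np.apply_along_axis(rankdata, 1, AR)
AK = 0.5 + (L1 + L2) / 2
K_bar = (K - AK).mean(axis=0)                # mean excess rank per day
var_Kbar = (K_bar ** 2).mean()               # per-day variance estimate, all days

t_cumrank = K_bar[L1:].sum() / np.sqrt(L2 * var_Kbar)
print(round(t_cumrank, 3))
```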
[8] Generalized rank test (GRANK)
The GRANK test compresses the whole event window into a single observation, the so-called 'cumulative event day'. The demeaned standardized abnormal ranks of the generalized standardized abnormal returns ($GSAR$) then read as below; for the definition of $L_1$, see the RANK test.
$$K_{i, t}=\frac{rank(GSAR_{i, t})}{L_1+1}-0.5$$
The generalized rank t-statistic is then defined as:
$$t_{grank}=Z\left(\frac{L_1-2}{L_1-1-Z^2}\right)^{1/2}$$
with
$$Z=\frac{\overline{K_{0}}}{\sigma_{\overline{K}}}$$
$$\sigma_{\overline{K}}=\sqrt{\frac{1}{L_1}\sum\limits_{t \in CW}\frac{n_t}{n}\overline{K}_t^2}$$
with $CW$ representing the combined window consisting of the estimation window and the cumulative event day, $n_t$ the number of valid (non-missing) returns on day $t$, $n$ the total number of stocks, and
$$\overline{K}_t=\frac{1}{n_t}\sum\limits_{i=1}^{n_t}K_{i,t}$$
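The full GRANK recipe can be sketched under the simplifying assumption of complete data ($n_t = n$ on every day), with simulated $GSAR$ values:

```python
import numpy as np
from scipy.stats import rankdata

# Hedged GRANK sketch assuming complete data (n_t = n for all days): the event
# window is collapsed into one cumulative event day whose GSAR is ranked
# together with the L1 estimation-window values. GSARs are simulated.
rng = np.random.default_rng(5)
N, L1 = 10, 120
GSAR = rng.normal(0, 1, (N, L1 + 1))     # L1 estimation days + cumulative event day

K = np.apply_along_axis(rankdata, 1, GSAR) / (L1 + 1) - 0.5   # demeaned std. ranks
K_bar = K.mean(axis=0)                   # mean rank per day in the combined window
sigma_K = np.sqrt((K_bar ** 2).sum() / L1)

Z = K_bar[-1] / sigma_K                  # cumulative event day = last column
t_grank = Z * np.sqrt((L1 - 2) / (L1 - 1 - Z ** 2))
print(round(t_grank, 3))
```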
[9] Generalized sign test (GSIGN)
Under the null hypothesis of no abnormal returns, the number of stocks with positive cumulative abnormal returns ($CAR$s) is expected to be in line with the fraction $\hat{p}_{EW}^{+}$ of positive $CAR$s from the estimation period. When the number of positive $CAR$s is significantly higher than the number expected from the estimated fraction, the null hypothesis is rejected.
The fraction $\hat{p}_{EW}^{+}$ is estimated as $\hat{p}_{EW}^{+}=\frac{1}{N}\sum\limits_{i=1}^{N}\frac{1}{T_i}\sum\limits_{t=1}^{T_i}\varphi_{i,t}$, where $\varphi_{i,t}$ is $1$ if the sign is positive and $0$ otherwise.
The Generalized sign test statistic is
$$Z_G=\frac{w-N\hat{p}_{EW}^{+}}{\sqrt{N\hat{p}_{EW}^{+}(1-\hat{p}_{EW}^{+})}}$$
where $w$ is the number of stocks with a positive $CAR$ during the event period.
Comment: The GSIGN test is based on the traditional SIGN test where the null hypothesis assumes a binomial distribution with parameter $p=0.5$ for the sign of the $N$ cumulative abnormal returns.
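The estimation of $\hat{p}_{EW}^{+}$ and the statistic itself can be sketched as follows; all inputs are simulated placeholders:

```python
import numpy as np

# Hedged GSIGN sketch: the expected fraction of positive signs is estimated
# from estimation-window ARs, then compared with the count w of positive
# event-window CARs. All inputs are simulated placeholders.
rng = np.random.default_rng(6)
N, M = 12, 120
AR_est = rng.normal(0, 0.01, (N, M))     # estimation-window ARs per event
CAR = rng.normal(0, 0.02, N)             # event-window CAR per event

p_hat = (AR_est > 0).mean()              # \hat{p}^+_EW over all events and days
w = (CAR > 0).sum()                      # positive event-window CARs
z_gsign = (w - N * p_hat) / np.sqrt(N * p_hat * (1 - p_hat))
print(round(float(z_gsign), 3))
```

Estimating the benchmark fraction from the estimation window, rather than fixing it at 0.5, is what accommodates asymmetric return distributions.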
References and further readings
Boehmer, E., Musumeci, J. and Poulsen, A. B. 1991. 'Event-study methodology under conditions of event-induced variance'. Journal of Financial Economics, 30(2): 253-272.
Campbell, C. J. and Wasley, C. E. 1993. 'Measuring security performance using daily NASDAQ returns'. Journal of Financial Economics, 33(1): 73-92.
Campbell, J., Lo, A. and MacKinlay, A. C. 1997. 'The econometrics of financial markets'. Princeton: Princeton University Press.
Corrado, C. J. and Zivney, T. L. 1992. 'The specification and power of the sign test in event study hypothesis tests using daily stock returns'. Journal of Financial and Quantitative Analysis, 27(3): 465-478.
Cowan, A. R. 1992. 'Nonparametric event study tests'. Review of Quantitative Finance and Accounting, 2: 343-358.
Cowan, A. R. and Sergeant, A. M. A. 1996. 'Trading frequency and event study test specification'. Journal of Banking and Finance, 20(10): 1731-1757.
Fama, E. F. 1976. Foundations of Finance. New York: Basic Books.
Kolari, J. W. and Pynnonen, S. 2010. 'Event study testing with cross-sectional correlation of abnormal returns'. Review of Financial Studies, 23(11): 3996-4025.
Maynes, E. and Rumsey, J. 1993. 'Conducting event studies with thinly traded stocks'. Journal of Banking and Finance, 17(1): 145-157.
Patell, J. M. 1976. 'Corporate forecasts of earnings per share and stock price behavior: Empirical tests'. Journal of Accounting Research, 14(2): 246-276.
Schipper, K. and Smith, A. 1983. 'Effects of recontracting on shareholder wealth: The case of voluntary spin-offs'. Journal of Financial Economics, 12(4): 437-467.
Wilcoxon, F. 1945. 'Individual comparisons by ranking methods'. Biometrics Bulletin, 1(6): 80-83.