Instructions: News Analytics

News analytics typically involves several steps. First, you need to build a database of press releases or other news items. Second, you typically want to order your dataset in time by identifying the announcement date of each news item. Third, you characterize your news items along the dimensions you are interested in, e.g., through text analysis. Finally, you may quantify the capital market responses to the captured news. This website's research apps and services support you in all of these steps.

(1) Micro Services for Database Building/Data Scraping:
> Scraping and consolidation of press releases into a CSV file (Service 1, Service 2): These micro services help you create a CSV file containing the press releases of a set of firms you specify. This CSV file lets you use the EDI and ETC tools of this website and may thus serve as the empirical basis for your research project. The micro services can scrape from different sources, such as company websites or third-party websites.
> Collection of financial data and preparation of CSV files: This micro service simplifies the use of EST's abnormal return calculator (ARC). Based on the events you want to study and the parameters you choose for your event study, the contractor provides you with the time series of firm and stock market data in the format the ARC requires.
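Quantifying capital market responses is what the ARC is for. As a rough illustration of the underlying idea, the following Python sketch computes abnormal returns with the market model, one common event-study approach: it fits stock returns on market returns by ordinary least squares over an estimation window and treats the deviations from that fit in the event window as abnormal returns. The function `market_model_abnormal_returns` and its parameters are hypothetical, and the ARC's actual models and options may differ.

```python
def market_model_abnormal_returns(stock, market, est_len):
    """Hypothetical helper: market-model abnormal returns.

    The first `est_len` observations form the estimation window, in which
    R_stock = alpha + beta * R_market is fitted by ordinary least squares.
    The remaining observations form the event window, for which the
    abnormal return is AR_t = R_stock,t - (alpha + beta * R_market,t).
    """
    if len(stock) != len(market):
        raise ValueError("stock and market series must have equal length")
    if not 2 <= est_len < len(stock):
        raise ValueError("need at least 2 estimation points and 1 event point")
    xs, ys = market[:est_len], stock[:est_len]
    mean_x = sum(xs) / est_len
    mean_y = sum(ys) / est_len
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx  # OLS slope; raises ZeroDivisionError if market returns are constant
    alpha = mean_y - beta * mean_x  # OLS intercept
    return [s - (alpha + beta * m)
            for s, m in zip(stock[est_len:], market[est_len:])]
```

Summing the returned values gives the cumulative abnormal return (CAR) over the event window.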

(2) Event Date Identifier (EDI):
The EDI is a regular expression-based tool that identifies dates in texts. There is practically no limit to the number of text strings you can process, and the maximum length of a text string is set high enough for most applications. To start the EDI, either enter your text into the dialogue box and press the 'continue' button, or upload a CSV file containing your texts. In the first case, the EDI directly displays the dates it has found; in the second, it produces a CSV file listing each text ID followed by the dates found in that text.
> If you want to use the EDI in batch mode, please provide a CSV file with the following structure: Text ID; Issuer; "Text-String" (review and use this CSV file if you need an example).
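To make the regular-expression approach concrete, here is a minimal Python sketch of a date finder plus a batch step that mirrors the input and output layout described above. The three date formats, the function names `find_dates` and `edi_batch`, and the assumption that the input CSV has no header row are all hypothetical; the EDI's actual patterns are not documented here and are likely more extensive.

```python
import csv
import re

# Hypothetical date formats (assumption): ISO, US style, and European style.
MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
DATE_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"                   # ISO style: 2014-03-05
    rf"|\b{MONTHS}\s+\d{{1,2}},\s*\d{{4}}\b"   # US style: March 5, 2014
    rf"|\b\d{{1,2}}\s+{MONTHS}\s+\d{{4}}\b"    # European style: 5 March 2014
)


def find_dates(text):
    """Return all date strings found in `text`, in order of appearance."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


def edi_batch(in_file, out_file):
    """Read rows 'Text ID; Issuer; "Text-String"' (no header assumed) and
    write rows 'Text ID; date1; date2; ...' to `out_file`."""
    reader = csv.reader(in_file, delimiter=";", skipinitialspace=True)
    writer = csv.writer(out_file, delimiter=";")
    for row in reader:
        if len(row) < 3:
            continue  # skip empty or malformed lines
        text_id, _issuer, text = row[0], row[1], row[2]
        writer.writerow([text_id, *find_dates(text)])
```

With real files, open both the input and the output file with `newline=""`, as the `csv` module documentation recommends.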

(3) Text Analyzer/Event Type Categorizer (CATA):
The text analyzer is a powerful and versatile content-analysis tool that can process large amounts of text. It lets you apply a categorization scheme of your choice to a large corpus of individual texts (e.g., press releases) for text scaling (e.g., Laver, Benoit et al. 2003) and text categorization. With these features, the text analyzer is a server-side alternative to existing CATA tools such as Yoshikoder or Harvard University's General Inquirer. It supports you in several steps of content-analysis research and integrates well with the other tools of this website. For further references on computer-aided text analysis, please refer to the content analysis website of the University of Georgia or to this website, which focuses on the sentiment of texts.
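At its core, applying a categorization scheme means counting, per category, how often a text hits that category's keywords. The Python sketch below illustrates this under simplifying assumptions: single-word keywords with `*` wildcards (matched here with the standard `fnmatch` module, which also treats `?` and `[...]` as special), a naive tokenizer, and a "likely category" that is simply the category with the most hits. It does not reproduce CATA's Likelihood KPIs or benchmark levels, and the names `count_categories` and `likely_category` are hypothetical.

```python
import fnmatch
import re


def tokenize(text):
    """Naive tokenizer (assumption): lower-cased runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_categories(text, scheme):
    """Count keyword hits per category; `scheme` maps label -> list of keywords.

    A token matching several keywords of one category is counted once per keyword.
    """
    tokens = tokenize(text)
    return {
        label: sum(1 for tok in tokens for kw in keywords
                   if fnmatch.fnmatchcase(tok, kw.lower()))
        for label, keywords in scheme.items()
    }


def likely_category(counts):
    """Category with the most hits (first one wins ties); None if nothing matched."""
    if not counts:
        return None
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None
```

For example, with a scheme `{"Earnings": ["income", "profit*"], "M&A": ["merger*"]}`, the sentence "Net income rose and profits increased after the merger." yields two Earnings hits and one M&A hit.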

CATA allows you to apply a distance condition to the counting of keyphrases. If you add "income{1,7}increase*" (without the quotation marks) to your keyword list, the counter increases by one whenever the word income and a word matching increase* appear within 1 to 7 words of each other, in either order (income first or increase* first). The wildcard * matches any word ending, so in this case it covers, e.g., increase, increases, and increased.
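The distance condition can be sketched in Python as follows. The sketch fills in details the description leaves open, and these are assumptions: distance is measured in token positions, so adjacent words are 1 apart; every qualifying pair of matches adds one to the counter; and `*` is matched with the standard `fnmatch` module. The function `proximity_count` is hypothetical, not CATA's implementation.

```python
import fnmatch
import re

# Keyphrase syntax from the text: first{min,max}second, e.g. income{1,7}increase*
PATTERN_RE = re.compile(r"(\S+?)\{(\d+),(\d+)\}(\S+)")


def proximity_count(text, keyphrase):
    """Count pairs of matching words that lie min..max words apart, in either order."""
    m = PATTERN_RE.fullmatch(keyphrase)
    if m is None:
        raise ValueError(f"not a distance keyphrase: {keyphrase!r}")
    first, second = m.group(1).lower(), m.group(4).lower()
    lo, hi = int(m.group(2)), int(m.group(3))
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # naive tokenizer (assumption)
    pos_first = [i for i, t in enumerate(tokens) if fnmatch.fnmatchcase(t, first)]
    pos_second = [i for i, t in enumerate(tokens) if fnmatch.fnmatchcase(t, second)]
    # abs() makes the condition symmetric: income first or increase* first
    return sum(1 for i in pos_first for j in pos_second
               if i != j and lo <= abs(i - j) <= hi)
```

Under these assumptions, "Net income is expected to increase by 5 percent." counts once (income and increase are 4 positions apart), whereas a sentence with 10 positions between income and increased counts zero.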

For example input and output CSVs (i.e., texts, analysis scheme, and results at the text and event-category level) that illustrate the analysis capability, please refer to this zip file. Table 1 summarizes the data items in the different input and output files. Note: You can zip the individual CSVs before uploading to reduce your upload time.
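If you prefer to script the zipping step, a minimal Python sketch using the standard `zipfile` module could look like this. The function name `zip_csvs` and the file names are placeholders, and since the expected archive layout is not specified here, the sketch simply stores each CSV at the top level of the archive.

```python
import zipfile
from pathlib import Path


def zip_csvs(csv_paths, archive_path):
    """Compress the given CSV files into one ZIP archive (DEFLATE compression)."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in map(Path, csv_paths):
            zf.write(path, arcname=path.name)  # store without directory prefix
    return Path(archive_path)
```

Usage, with placeholder names: `zip_csvs(["texts.csv", "scheme.csv"], "cata_input.zip")`.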

(4) CATA Filter (CATA_Filter):
This tool lets you generate a subsample from a larger (CATA) file holding press releases. Simply upload the source file and a CSV file holding the IDs of the releases you are interested in. The system then generates a CSV file holding only the subsample. The CATA_Filter app can also generate individual CSV files for the subsample: if you check this option, it produces a zip archive holding one CSV file per press release.
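The same filtering logic can be sketched locally in Python. The sketch assumes, hypothetically, that the source CSV is semicolon-separated, has no header row, and holds a unique release ID in its first column; the function names `filter_releases` and `split_to_zip` are illustrative and not the CATA_Filter app's interface.

```python
import csv
import io
import zipfile


def filter_releases(source_rows, wanted_ids, id_column=0):
    """Keep only the rows whose ID (in `id_column`) is among `wanted_ids`."""
    wanted = {str(i).strip() for i in wanted_ids}
    return [row for row in source_rows
            if len(row) > id_column and row[id_column].strip() in wanted]


def split_to_zip(rows, zip_target, id_column=0, delimiter=";"):
    """Write one single-row CSV per release into a ZIP archive named by release ID.

    `zip_target` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for row in rows:
            buf = io.StringIO()
            csv.writer(buf, delimiter=delimiter).writerow(row)
            zf.writestr(f"{row[id_column]}.csv", buf.getvalue())
```

The rows can come from `csv.reader(f, delimiter=";")` on the source file, and the wanted IDs from the first column of the ID file read the same way.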
 

Table 1: Structure of CATA's Input and Output CSV Files

Input files
> Text CSV: Text ID; Firm; Text; Cut-off value
> Analysis scheme: Category ID; Category Label; S (Scaling) / CC (Competing Categories); Columns with keywords

Output files
> Text-level results: Text String ID; Total Text String Length; Cut-off; Considered Text String Length; Considered words; Likely Event Category; Likelihood KPI1; Likelihood KPI2; Actual keywords occurrence; Expected keywords occurrence; Counter Keyword list <Label Category 1>; Counter Keyword list <Label Category 2>
> Category-level results: Category ID; Label; Scaling (S) / Competing Categories (CC); Benchmark Level (= Normal Level of Occurrence); Number of Texts Assigned