## News Classification Algorithms

This website describes the different text classification algorithms implemented in our CATA tool.

The overall challenge of press release classification can be stated as follows:

Given:

• A collection (corpus) of documents D
• A set of classes C
• For each class in C, a set of terms characterizing that class

Terms are unique within a class, but a term can appear in any number of classes.

Goal:

• For each document D_i, predict the corresponding class C_j
• This is multiclass classification (not multilabel), because each press release typically represents one corporate event

Subgoal:

• Calculate the probability distribution of documents over classes using the class terms
• This yields a document-class matrix M of size |D| × |C|, where M_{ij} is the probability that document i belongs to class j
• For each document, select the class with the maximal probability
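The subgoal above can be sketched in a few lines. Everything here is illustrative: the whitespace tokenisation, the term sets, and the simple count-based normalisation are assumptions, not the actual CATA implementation.

```python
def classify(documents, class_terms):
    """For each document, build one row of the |D| x |C| matrix M
    (probability distribution over classes from class-term matches)
    and return the argmax class; None when no term matches."""
    labels = list(class_terms)
    predictions = []
    for doc in documents:
        tokens = set(doc.lower().split())
        # score = number of class terms found in the document
        scores = [len(tokens & class_terms[c]) for c in labels]
        total = sum(scores)
        if total == 0:
            predictions.append(None)  # no class term matched: abstain
            continue
        probs = [s / total for s in scores]  # row of M, sums to 1
        best = max(range(len(labels)), key=probs.__getitem__)
        predictions.append(labels[best])
    return predictions
```

For example, with hypothetical term sets `{"JV": {"joint", "venture"}, "MA": {"merger", "acquisition"}}`, a document mentioning a joint venture is assigned to `JV`.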

Preliminary estimation of the quality of ALG1

Estimation of binary prediction, using the JV dataset jv_train.

Fixed output shape: (3025, 13)

| Classes | Precision | Recall | F-score |
|--------:|----------:|-------:|--------:|
|       2 |     90.4% |  75.7% |   82.4% |
|       3 |     90.7% |  69.4% |   78.6% |
|       4 |     91.0% |  64.1% |   75.2% |
|       5 |     90.0% |  53.9% |   67.5% |
|       6 |     90.5% |  52.6% |   66.6% |
|       7 |     90.5% |  52.1% |   66.1% |
|       8 |     90.8% |  51.3% |   65.6% |
|       9 |     90.9% |  50.3% |   64.7% |
|      10 |     90.8% |  48.9% |   63.5% |
|      11 |     91.1% |  48.4% |   63.2% |
|      15 |     91.8% |  47.0% |   62.1% |
|      20 |     91.7% |  45.2% |   60.6% |
|      36 |     91.8% |  44.3% |   59.8% |
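The binary figures above can be computed as a JV-vs-rest reduction of the multiclass labels. A minimal sketch (the label names and the exact evaluation protocol are assumptions):

```python
def binary_metrics(y_true, y_pred, positive="JV"):
    """Collapse labels to positive-vs-rest and return the binary
    precision, recall, and F-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fscore = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    return precision, recall, fscore
```

Adding more negative classes leaves the positive-class precision largely intact but adds more ways to miss positives, which is consistent with the recall decline in the table.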

---------------------------------------------------------------------

Artificial cases:

2 classes (keywords of the control class vs. keywords drawn at random from the corpus):

| Split | Precision | Recall | F-score |
|-------|----------:|-------:|--------:|
| train |     91.2% |  93.3% |   92.2% |
| test  |     91.5% |  94.5% |   93.0% |

1 class in the input for CATA:

| Split | Precision | Recall | F-score |
|-------|----------:|-------:|--------:|
| train |    100.0% |  50.9% |   67.4% |
| test  |    100.0% |  49.6% |   66.3% |

----------------------------------------------------

Summary of the experiments with ALG1 (as a binary classifier):

Good:

- ALG1 shows high and stable precision (~91%)

Limitations:

- Recall drops as more classes are added
- Recall strongly depends on the samples from the other classes

---------------------------------------------------------

Since in practice we use ALG1 as a multiclass classifier (not as a binary one), we also need to test it on a multiclass dataset.

Comparison of results on a binary dataset (2newsgroups) and a multiclass dataset (20newsgroups):

Evaluation of CATA as a binary classifier on the binary dataset:

| Dataset                 | Precision | Recall | F-score |
|-------------------------|----------:|-------:|--------:|
| 2news_test, 2 classes   |     84.9% |  86.6% |   85.8% |
| 2news_test, 20 classes  |     92.8% |  26.0% |   40.6% |

Evaluation of CATA as a binary classifier on the multiclass dataset:

| Dataset                 | Precision | Recall | F-score |
|-------------------------|----------:|-------:|--------:|
| 20news_test, 20 classes |     35.1% |  26.6% |   30.3% |
| 20news_test, 2 classes  |      6.7% |  82.1% |   12.4% |

Evaluation of CATA as a multiclass classifier on the multiclass dataset (averaged metrics):

| Averaging | Precision | Recall | F-score |
|-----------|----------:|-------:|--------:|
| macro     |     44.9% |  44.0% |   43.8% |
| micro     |     46.9% |  46.9% |   46.9% |
| weighted  |     47.5% |  46.9% |   46.6% |
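The three averaging schemes can be reproduced directly from per-class counts (scikit-learn's `average=` parameter computes the same quantities). A pure-Python sketch; note that micro precision equals micro recall whenever every sample receives exactly one prediction, which is why the micro row repeats one value.

```python
from collections import Counter

def averaged_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 combined three ways:
    macro = unweighted mean over classes,
    micro = metrics from pooled TP/FP/FN counts,
    weighted = mean weighted by class support."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    per_class, tp_all, fp_all, fn_all = [], 0, 0, 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class.append((prec, rec, f1))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    n = len(classes)
    macro = tuple(sum(m[i] for m in per_class) / n for i in range(3))
    micro_p = tp_all / (tp_all + fp_all) if tp_all + fp_all else 0.0
    micro_r = tp_all / (tp_all + fn_all) if tp_all + fn_all else 0.0
    micro_f = (2 * micro_p * micro_r / (micro_p + micro_r)
               if micro_p + micro_r else 0.0)
    total = len(y_true)
    weighted = tuple(
        sum(m[i] * support[c] for m, c in zip(per_class, classes)) / total
        for i in range(3))
    return {"macro": macro,
            "micro": (micro_p, micro_r, micro_f),
            "weighted": weighted}
```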

Summary:
- On the binary 2news dataset, ALG1 shows results similar to those on the binary JV dataset
- On the multiclass dataset, ALG1 shows worse results, with weighted average precision of 47.5% and recall of 46.9%

--------------------------------------------------------------------------------

Results with the Tfidf algorithm (compared with ALG1)

20news_test (16 keywords, 1-ngrams):

| Variant               | Precision | Recall | F-score |
|-----------------------|----------:|-------:|--------:|
| ALG1 (weighted stats) |     47.5% |  46.9% |   46.6% |
| Tfidf ngrams=1        |     59.3% |  54.8% |   55.6% |

20news_test (32 keywords, 1-2-ngrams):

| Variant               | Precision | Recall | F-score |
|-----------------------|----------:|-------:|--------:|
| random benchmark      |      5.2% |   4.5% |    4.7% |
| ALG1 (weighted stats) |     52.3% |  52.1% |   51.4% |
| Tfidf ng=2            |     61.5% |  59.0% |   59.0% |
| Tfidf ng=1            |     63.4% |  60.6% |   60.8% |
| Tfidf ng=1,fake       |     70.0% |  55.0% |   60.5% |
| Tfidf ng=2,stemm      |     63.2% |  60.6% |   60.8% |
| Tfidf ng=1,stemm      |     64.6% |  61.9% |   62.2% |
| Tfidf ng=1,stemm,fake |     69.3% |  58.2% |   62.3% |
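A minimal unigram tf-idf sketch of the scoring behind the Tfidf variants (the actual runs presumably use a full vectorizer with `ngram_range` and a stemmer; the unsmoothed idf and whitespace tokenisation here are assumptions):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse tf-idf vector (dict: term -> weight) per document.
    tf = term count / doc length; idf = log(N / document frequency)."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for toks in tokenised for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Documents can then be scored against each class's keyword list by the cosine between the document vector and the keywords' vector, with the argmax class chosen as in ALG1.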

Cut with a mincut threshold (Tfidf ng=1,stemm,fake):

| mincut | Precision | Recall | F-score |
|-------:|----------:|-------:|--------:|
|   0.15 |     70.2% |  57.9% |   62.4% |
|   0.30 |     83.5% |  37.9% |   50.7% |
|   0.45 |     93.0% |   9.5% |   16.6% |
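The mincut rows above amount to a confidence cutoff: keep the argmax prediction only when its score clears the threshold, abstain otherwise. A sketch (the score scale and the abstain-as-None convention are assumptions):

```python
def predict_with_mincut(scores, mincut=0.30):
    """Return the best-scoring class only if its score reaches the
    mincut threshold; otherwise abstain (None). Raising mincut
    trades recall for precision, as in the rows above."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= mincut else None
```

Usage: with hypothetical per-class scores `{"JV": 0.4, "MA": 0.2}` the prediction is `JV`; if no class clears the cutoff, the document is left unlabeled, which lowers recall but removes low-confidence (often wrong) predictions.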

Summary on the 20newsgroups dataset (with labels):

- New experiment setup: calculate the quality of class prediction on a multiclass dataset; use the mixed 20-class dataset, predict every class, and compute averaged metrics
- random benchmark  precision:   5.2% recall:   4.5% fscore:   4.7%
- ALG1              precision:  52.3% recall:  52.1% fscore:  51.4%
- Tfidf tuned       precision:  70.2% recall:  57.9% fscore:  62.4%
- From the literature we know that a supervised model trained on the labeled 20news train set reaches precision and fscore > 80% on the test set

---------------------------------------------------------------------------------------------

SnP full dataset with JV labels (test part): 57243 samples, of which 858 are labeled JV.

Labels are known only for the JV class, so the metrics concern only this class.

ALG1 fails on this dataset: it returns only the first 28 records and then breaks.

| Variant                  | Precision | Recall | F-score |
|--------------------------|----------:|-------:|--------:|
| random benchmark         |      1.2% |   0.9% |    1.1% |
| Tfidf ng=1               |      7.8% |  48.4% |   13.5% |
| Tfidf ng=2               |      8.6% |  53.6% |   14.9% |
| Tfidf ng=3               |      8.8% |  53.0% |   15.0% |
| Tfidf ng=2,fake32        |     20.7% |  15.6% |   17.8% |
| Tfidf ng=3,fake32        |     20.7% |  15.5% |   17.7% |
| Tfidf ng=2,stemm         |      9.9% |  64.8% |   17.1% |
| Tfidf ng=3,stemm         |     10.0% |  64.3% |   17.2% |
| Tfidf ng=2,stemm,fake16  |     22.5% |  19.2% |   20.7% |
| Tfidf ng=3,stemm,fake16  |     22.7% |  19.2% |   20.8% |
| Tfidf ng=2,stemm,fake32  |     20.5% |  23.0% |   21.6% |
| Tfidf ng=3,stemm,fake32  |     20.4% |  22.7% |   21.5% |
| Tfidf ng=2,stemm,fake128 |     21.6% |   3.7% |    6.4% |

Cut with a mincut threshold (Tfidf ng=2,stemm,fake32):

| mincut | Precision | Recall | F-score |
|-------:|----------:|-------:|--------:|
|   0.15 |     20.7% |  23.0% |   21.8% |
|   0.30 |     35.6% |  12.6% |   18.6% |
|   0.45 |     40.0% |   0.2% |    0.5% |

--------

Summary on the SnP500 dataset with "Alliances and joint ventures" labels:

- Experiment setup: calculate the quality of class prediction on a multiclass dataset; use the full SnP500 dataset, predict every class, but compute metrics using only the JV labels
- random benchmark  precision:   1.2% recall:   0.9% fscore:   1.1%
- ALG1              NA (fails on this dataset)
- Tfidf tuned       precision:  20.7% recall:  23.0% fscore:  21.8%