Predictor for Company Mergers

(Bachelor's thesis by Anne Lorenz)

This project contains Python classes for text mining and machine learning models that recognize company mergers in news articles. The CSV file classification_labelled_corrected.csv contains 1497 labeled news articles and is used for the machine learning models.

Best F1 score results:

  • Support Vector Machines Classifier (SVM):
    F1 score: 0.894
    Best parameters set found on development set: {'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}

  • Naive Bayes Classifier:
    F1 score: 0.841 (average)
    Parameters: SelectPercentile(100), own Bag of Words implementation, 10-fold cross-validation
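
The parameter names reported for the SVM (SVC__C, perc__percentile and so on) follow scikit-learn's step__parameter convention for pipelines. The block below is a minimal sketch of how such a grid search could be set up, not the code used in the thesis. The step names "perc" and "SVC" are taken from the reported parameters. Everything else is an assumption made for illustration: the "vect" step with CountVectorizer, the chi2 scoring function, the candidate values in the grid, and the helper name build_search.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_search(n_splits=10):
    """Return a grid search over a bag-of-words, feature selection, SVM pipeline.

    Hypothetical helper: only the step names "perc" and "SVC" come from
    the README; the rest is assumed for illustration.
    """
    pipeline = Pipeline([
        ("vect", CountVectorizer()),       # assumed bag-of-words step
        ("perc", SelectPercentile(chi2)),  # chi2 scoring is an assumption
        ("SVC", SVC()),
    ])
    # Keys use the "<step name>__<parameter>" convention of scikit-learn.
    # The candidate values are illustrative, not the grid from the thesis.
    param_grid = {
        "perc__percentile": [25, 50, 75, 100],
        "SVC__C": [0.01, 0.1, 1.0],
        "SVC__gamma": [0.001, 0.01, 0.1],
        "SVC__kernel": ["linear", "rbf"],
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    # scoring="f1" selects the parameter set with the best binary F1 score.
    return GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
```

After calling fit(texts, labels) on the returned object with a list of article texts and binary labels, best_params_ holds a dictionary of the same shape as the one reported above. Note that scikit-learn's SVC uses gamma only for the rbf, poly and sigmoid kernels, so the gamma value has no effect when the linear kernel is selected.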

The complete documentation can be found in the LaTeX document in the thesis folder.

Installation under Windows

$ pip install xy




The scripts can be called separately. At least one of them requires a valid personal key, which you need to enter before calling it. In addition, you need to change the path assigned to the JAVAHOME environment variable in the find_companies method to match your system.
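
As a rough illustration of the JAVAHOME step, the sketch below shows one common way to set an environment variable from Python. How find_companies actually sets the variable is not shown in this README, so the helper name set_java_home and the path are hypothetical placeholders.

```python
import os


def set_java_home(java_path):
    """Set the JAVAHOME environment variable for the current process."""
    os.environ["JAVAHOME"] = java_path
    return os.environ["JAVAHOME"]


# Hypothetical placeholder path; replace it with the location of the
# Java executable on your own machine.
set_java_home(r"C:\path\to\java.exe")
```

The change only applies to the running Python process and the processes it starts; it does not modify the system-wide Windows settings.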

Author: Anne Lorenz / Datavard AG

Project Status: work in progress