# Predictor for Company Mergers (Bachelor Thesis Anne)

This project contains Python classes for text mining and machine learning models that recognize company mergers in news articles.

The CSV file *classification_labelled_corrected.csv* contains 1497 labeled news articles from *Reuters.com* and is used by the machine learning models.

**Best F1 score results**:

* **Support Vector Machines Classifier (SVM):**
  F1 score: 0.894.
  Best parameter set found on the development set: {'SVC\__C': 0.1, 'SVC\__gamma': 0.01, 'SVC\__kernel': 'linear', 'perc\__percentile': 50}

* **Naive Bayes Classifier:**
  F1 score: 0.841 (average).
  Parameters: SelectPercentile(100), own Bag of Words implementation, 10-fold cross validation

The complete documentation can be found in the LaTeX document in the *thesis* folder.

## Installation under Windows

```bash
$ pip install xy
```

### Requirements

* pandas==0.20.1
* nltk==3.2.5
* webhoseio==0.5
* numpy==1.14.0
* graphviz==0.9
* scikit_learn==0.19.2

## Usage

The scripts can be called separately.

You need to enter a valid personal key for *webhose.io* before you call *Requester.py*.

To run *NER.py*, you need to adjust the path assigned to the *JAVAHOME* environment variable in the *find_companies* method.

---

**Author:** Anne Lorenz / Datavard AG

**Project Status:** work in progress
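
## Example: SVM pipeline sketch

The best SVM parameter set listed under **Best F1 score results** uses scikit-learn's `step__parameter` naming, which suggests a `Pipeline` with steps named `perc` and `SVC`. The following is a minimal sketch of how such a pipeline could be assembled and configured with those values. The `CountVectorizer` step, the `chi2` score function, the helper name `build_svm_pipeline`, and the toy texts are assumptions for illustration; they are not taken from the project code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_svm_pipeline():
    """Hypothetical helper: bag of words -> feature selection -> SVM."""
    return Pipeline([
        # Assumption: a plain CountVectorizer stands in for the project's
        # own text preprocessing.
        ("vect", CountVectorizer()),
        # Step name "perc" matches the reported key 'perc__percentile';
        # the chi2 score function is an assumption.
        ("perc", SelectPercentile(chi2)),
        # Step name "SVC" matches the reported keys 'SVC__C' etc.
        ("SVC", SVC()),
    ])


# Best parameter set as reported in this README.
best_params = {
    "SVC__C": 0.1,
    "SVC__gamma": 0.01,
    "SVC__kernel": "linear",
    "perc__percentile": 50,
}

pipeline = build_svm_pipeline().set_params(**best_params)

# Toy data for illustration only (1 = merger article, 0 = other).
texts = [
    "Company A agrees to merge with Company B",
    "Firm announces merger with rival firm",
    "Central bank keeps interest rates unchanged",
    "Oil prices fall as supply rises",
]
labels = [1, 1, 0, 0]

pipeline.fit(texts, labels)
predictions = pipeline.predict(texts)
```

In practice, the texts and labels would come from *classification_labelled_corrected.csv*, and the model would be evaluated on held-out articles rather than on its own training data.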