From a866390ae3696e8a3bbc7e14a909d20111c0641f Mon Sep 17 00:00:00 2001
From: Anne Lorenz
Date: Thu, 18 Oct 2018 22:10:11 +0200
Subject: [PATCH] updated readme

---
 README.md | 69 +++++++++++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/README.md b/README.md
index be5d26e..f867e5c 100644
--- a/README.md
+++ b/README.md
@@ -1,51 +1,40 @@
-# Predictor for Company Mergers
-State: October 2018 (in progress)
+# Prediction of Company Mergers (Bachelorthesis Anne)
 
-My python classes for text mining, machine learning models, …
+This project contains python classes for text mining, machine learning models, …
+The csv file *classification_labelled_corrected.csv* contains 1497 labeled news articles from *Reuters.com* and is used for the machine learning models.
 
-The scripts can be called separately.
+**Best F1 score results**:
 
-The complete documentation can be found in the latex document in the thesis folder.
-
-The csv file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
-
-Note:
-Please enter a valid webhose personal key before you call 'Requester.py'.
-Also, please change the path to your JAVAHOME environment variable in 'NER.find_companies' method.
-
-example:
-java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
-os.environ['JAVAHOME'] = java_path
-
-### Best F1 score results:
-
-SVM:
-
-F1 score: 0.8944166649330559
-
-best parameters set found on development set:
+* **Support Vector Machines Classifier (SVM):**
+F1 score: 0.8944166649330559
+Best parameters set found on development set:
 {'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
 
-Naive Bayes:
+* **Naive Bayes Classifier**:
+F1 score: 0.8324014738144634 (average)
+Parameters: SelectPercentile(25), own Bag of Words implementation, 10-fold cross validation
 
-parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
-
-F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
-
-
-## Requirements
-
-pandas==0.20.1
-nltk==3.2.5
-webhoseio==0.5
-numpy==1.14.0
-graphviz==0.9
-scikit_learn==0.19.2
+The complete documentation can be found in the latex document in the *thesis* folder.
 
 ## Installation under Windows
+```bash
+$ pip install xy
+```
+### Requirements
 
-pip install XY
+pandas==0.20.1
+nltk==3.2.5
+webhoseio==0.5
+numpy==1.14.0
+graphviz==0.9
+scikit_learn==0.19.2
 
-## Installation under UBUNTU
+## Usage
+The scripts can be called separately.
+You need to enter a valid personal key for *webhose.io* before you call *Requester.py*.
+To run *NER.py* you need to change the path to the JAVAHOME environment variable in *find_companies* method.
+---
 
-apt-get install XX
+**Author:** Anne Lorenz / Datavard AG
+
+**Project Status:** work in progress
\ No newline at end of file