updated Readme

Anne Lorenz 2018-10-18 14:32:46 +02:00
parent c85ce71e24
commit 701d0fdd7e


@@ -1,22 +1,10 @@
-# Anne's Bachelor Thesis
+# Predictor for Company Mergers
 State: October 2018 (in progress)
 My python classes for text mining, machine learning models, …
 The scripts can be called separately.
-Best F1 score results were:
-SVM
----
-F1 score: 0.8944166649330559
-best parameters set found on development set:
-{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
-Naive Bayes
------------
-parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
-F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
 The complete documentation can be found in the latex document in the thesis folder.
 The csv file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
@@ -26,10 +14,24 @@ Please enter a valid webhose personal key before you call 'Requester.py'.
 Also, please change the path to your JAVAHOME environment variable in 'NER.find_companies' method.
 example:
# set paths
 java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
 os.environ['JAVAHOME'] = java_path
+### Best F1 score results:
+SVM:
+F1 score: 0.8944166649330559
+best parameters set found on development set:
+{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
+Naive Bayes:
+parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
+F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
 ## Requirements
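
The Naive Bayes entry in the README reports the minimum, maximum and average F1 score over 10-fold cross validation. As a rough illustration of how such a summary can be derived from per-fold predictions, here is a minimal sketch in plain Python. It is not the thesis code: the function names `f1_score` and `summarize_folds`, the choice of `1` as the positive label, and the toy labels are all assumptions made for this example.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: define F1 as 0 to avoid dividing by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_folds(fold_scores):
    """Return the min, max and average of the per-fold scores."""
    return {
        "min": min(fold_scores),
        "max": max(fold_scores),
        "average": mean(fold_scores),
    }


# Toy data with two folds (hypothetical labels, not the Reuters articles).
# Each entry is (true labels, predicted labels) for one held-out fold.
folds = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 1, 1, 0], [0, 1, 0, 1]),
]
scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
summary = summarize_folds(scores)
print(summary)
```

With real data, `folds` would hold one pair of label lists per cross-validation split (ten pairs for 10-fold cross validation), and the summary would correspond to the min, max and average figures quoted in the README.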