updated Readme

Anne Lorenz 2018-10-18 14:32:46 +02:00
parent c85ce71e24
commit 701d0fdd7e
1 changed file with 17 additions and 15 deletions

@@ -1,22 +1,10 @@
# Anne's Bachelor Thesis
# Predictor for Company Mergers
State: October 2018 (in progress)
My Python classes for text mining, machine learning models, …
The scripts can be called separately.
Best F1 score results were:
SVM
---
F1 score: 0.8944166649330559
best parameters set found on development set:
{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
Naive Bayes
-----------
parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
The complete documentation can be found in the LaTeX document in the thesis folder.
The CSV file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
@@ -26,10 +14,24 @@ Please enter a valid webhose personal key before you call 'Requester.py'.
Also, please change the path for the JAVAHOME environment variable in the 'NER.find_companies' method.
Example:
# set paths
java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
os.environ['JAVAHOME'] = java_path
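
The snippet above hard-codes a Windows path. A slightly more defensive variant, as a sketch only: it checks that the directory exists before setting JAVAHOME. The helper name `set_java_home` is hypothetical and not part of the repository.

```python
import os


def set_java_home(java_path):
    """Set the JAVAHOME environment variable if java_path is an existing directory."""
    if not os.path.isdir(java_path):
        # Fail early with a clear message instead of a later, less obvious Java error
        raise FileNotFoundError(f"Java directory not found: {java_path}")
    os.environ["JAVAHOME"] = java_path
    return java_path
```

It could be called with the same path as in the example, e.g. `set_java_home("C:\\Program Files (x86)\\Java\\jre1.8.0_181")`, before the code that needs Java runs.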
### Best F1 score results:
SVM:
F1 score: 0.8944166649330559
best parameters set found on development set:
{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
Naive Bayes:
parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
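
The Naive Bayes line reports the minimum, maximum, and average F1 score over the cross-validation folds. A minimal sketch of such a summary, assuming F1 is computed per fold from true-positive, false-positive, and false-negative counts. The function names and the per-fold counts are made up for illustration and are not the thesis data.

```python
from statistics import mean


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarise(scores):
    """Return min, max, and average of a list of per-fold scores."""
    return {"min": min(scores), "max": max(scores), "average": mean(scores)}


# Illustrative (tp, fp, fn) counts for three folds (hypothetical values)
folds = [(8, 2, 2), (9, 1, 2), (7, 3, 1)]
scores = [f1_score(tp, fp, fn) for tp, fp, fn in folds]
summary = summarise(scores)
print(summary)
```

With 10-fold cross validation, `folds` would hold ten such tuples, one per held-out fold.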
## Requirements