diff --git a/README.md b/README.md
index 733816d..be5d26e 100644
--- a/README.md
+++ b/README.md
@@ -1,22 +1,10 @@
-# Anne's Bachelor Thesis
+# Predictor for Company Mergers
 State: October 2018 (in progress)
 
 My python classes for text mining, machine learning models, …
+
 The scripts can be called separately.
 
-Best F1 score results were:
-
-SVM
----
-F1 score: 0.8944166649330559
-best parameters set found on development set:
-{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
-
-Naive Bayes
------------
-parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
-F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
-
 The complete documentation can be found in the latex document in the thesis folder.
 
 The csv file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
@@ -26,10 +14,24 @@ Please enter a valid webhose personal key before you call 'Requester.py'.
 Also, please change the path to your JAVAHOME environment variable in 'NER.find_companies' method.
 
 example:
-# set paths
 java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
 os.environ['JAVAHOME'] = java_path
 
+### Best F1 score results:
+
+SVM:
+
+F1 score: 0.8944166649330559
+
+best parameters set found on development set:
+{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
+
+Naive Bayes:
+
+parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
+
+F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
+
 ## Requirements