# Prediction of Company Mergers (Bachelor's Thesis by Anne)

State: October 2018 (in progress)

This project contains Python classes for text mining, machine learning models, …

The CSV file *classification_labelled_corrected.csv* contains 1497 labeled news articles from *Reuters.com* and is used for the machine learning models.
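
The Naive Bayes result below relies on an own bag-of-words implementation that this README does not describe. The following is a rough, hypothetical illustration of the general technique, not the project's code. Each article is mapped to word counts over a shared vocabulary. The tokenizer regex, the function names and the sample sentences are all assumptions.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters/digits (an assumption, not the project's rule).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Split a news text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(texts):
    """Map every token seen in the corpus to a fixed column index (sorted for stability)."""
    tokens = {tok for text in texts for tok in tokenize(text)}
    return {tok: i for i, tok in enumerate(sorted(tokens))}


def bag_of_words(text, vocabulary):
    """Return a count vector for one text; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


# Made-up sample sentences, not taken from the labeled CSV file.
articles = [
    "Company A agrees to buy Company B",
    "Company B shares rise after merger talks",
]
vocab = build_vocabulary(articles)
print(bag_of_words(articles[0], vocab))  # → [1, 0, 1, 1, 1, 2, 0, 0, 0, 0, 1]
```

The vocabulary has to be built on the training texts only. Words first seen at prediction time are then simply dropped.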
## Best F1 score results
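
Both results are F1 scores, the harmonic mean of precision and recall. The README does not state which averaging mode was used (binary, macro, …). The minimal sketch below therefore covers only the binary case. It computes the score from hypothetical true-positive (`tp`), false-positive (`fp`) and false-negative (`fn`) counts.

```python
def f1_score(tp, fp, fn):
    """Binary F1 score: harmonic mean of precision and recall.

    Equivalent to 2*TP / (2*TP + FP + FN); returns 0.0 when it is undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(tp=3, fp=1, fn=0))  # precision 0.75, recall 1.0
```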
* **Support Vector Machine classifier (SVM):**
  * F1 score: 0.8944166649330559
  * Best parameter set found on the development set:
    `{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}`
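
The parameter names follow scikit-learn's `<step>__<parameter>` convention, which `Pipeline` and `GridSearchCV` use. This suggests a pipeline with a step named `perc` (presumably `SelectPercentile`, keeping the top 50% of features) and a step named `SVC`. The remaining steps and the searched grid are not documented here. Note that scikit-learn's `SVC` only uses `gamma` for the `'rbf'`, `'poly'` and `'sigmoid'` kernels, so with `'linear'` the `gamma` value has no effect. The small sketch below splits the reported dictionary by step. `split_by_step` is a hypothetical helper.

```python
# Best parameter set reported above for the SVM.
best_params = {'SVC__C': 0.1, 'SVC__gamma': 0.01,
               'SVC__kernel': 'linear', 'perc__percentile': 50}


def split_by_step(params):
    """Group scikit-learn style 'step__param' keys into one dict per pipeline step.

    Only the first '__' separates the step name, as nested parameters may contain more.
    """
    grouped = {}
    for key, value in params.items():
        step, _, name = key.partition('__')
        grouped.setdefault(step, {})[name] = value
    return grouped


for step, settings in split_by_step(best_params).items():
    print(step, settings)
# → SVC {'C': 0.1, 'gamma': 0.01, 'kernel': 'linear'}
# → perc {'percentile': 50}
```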
* **Naive Bayes classifier:**
  * F1 score: 0.8324014738144634 (average over the 10 folds; min = 0.7586206896551724, max = 0.8846153846153846)
  * Parameters: SelectPercentile(25), own bag-of-words implementation, 10-fold cross-validation
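
The Naive Bayes score is summarized over 10 folds. The sketch below shows plain, unshuffled 10-fold splitting and the min/max/average summary. Fold sizes follow scikit-learn's `KFold` rule: the first `n_samples % k` folds get one extra sample. The project may have used shuffled or stratified folds instead. `k_fold_indices` and `summarize` are hypothetical helpers.

```python
from statistics import mean


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def summarize(scores):
    """Reduce per-fold scores to the min/max/average figures reported above."""
    return {'min': min(scores), 'max': max(scores), 'average': mean(scores)}


# Assuming all 1497 labeled articles are used: 7 folds of 150 and 3 folds of 149.
sizes = [len(test) for _, test in k_fold_indices(1497, k=10)]
print(sizes)  # → [150, 150, 150, 150, 150, 150, 150, 149, 149, 149]
```

Each article appears in exactly one test fold. The reported average is therefore the mean of ten scores, each measured on roughly 150 unseen articles.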

The complete documentation can be found in the LaTeX document in the *thesis* folder.

## Installation under Windows
```bash
$ pip install xy
```
### Requirements

* pandas==0.20.1
* nltk==3.2.5
* numpy==1.14.0
* graphviz==0.9
* scikit_learn==0.19.2
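
To compare an existing environment against these pins, a small standard-library sketch can be used. `check_requirements` is a hypothetical helper. It uses the PyPI name `scikit-learn` for `scikit_learn`, since pip treats `-` and `_` alike. `importlib.metadata` needs Python 3.8 or newer. The older interpreters these pandas and NumPy versions likely need can fall back to the `importlib_metadata` backport, which provides the same functions.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    import importlib_metadata as metadata  # backport: pip install importlib_metadata

# Pinned versions from the Requirements list above.
PINNED = {
    'pandas': '0.20.1',
    'nltk': '3.2.5',
    'numpy': '1.14.0',
    'graphviz': '0.9',
    'scikit-learn': '0.19.2',
}


def check_requirements(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatching package."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == '__main__':
    for name, (expected, installed) in check_requirements(PINNED).items():
        print(f'{name}: expected {expected}, found {installed}')
```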

## Usage

Each script can be called separately.

You need to enter a valid personal key for *webhose.io* before you call *Requester.py*.
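
As an alternative to entering the key in the script, it could be read from an environment variable. This is an assumption, not how *Requester.py* currently works. The variable name `WEBHOSE_TOKEN` and the helper `get_webhose_key` are hypothetical.

```python
import os

# Hypothetical variable name; the README only says a personal key must be entered.
WEBHOSE_KEY_VAR = 'WEBHOSE_TOKEN'


def get_webhose_key(environ=os.environ):
    """Return the webhose.io personal key from the environment, or fail with a clear message."""
    key = environ.get(WEBHOSE_KEY_VAR, '').strip()
    if not key:
        raise RuntimeError(
            f'Set the {WEBHOSE_KEY_VAR} environment variable to your webhose.io personal key '
            'before running Requester.py.')
    return key
```

This keeps the personal key out of the source files, so it cannot be committed by accident.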
To run *NER.py*, you need to change the Java installation path assigned to the JAVAHOME environment variable in the *find_companies* method.
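
A sketch of that step follows. It assumes the method only needs `JAVAHOME` to point at a Java installation, which is presumably what NLTK checks when it locates Java. `set_java_home` is a hypothetical helper that also checks the directory exists. The path is an example for a Windows JRE 1.8 and must be replaced with your own installation.

```python
import os

# Example Windows path; replace it with the location of your own Java installation.
java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"


def set_java_home(path, environ=os.environ):
    """Point the JAVAHOME environment variable at an existing Java installation directory."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f'No Java installation found at {path!r}')
    environ['JAVAHOME'] = path
    return path
```

Call `set_java_home(java_path)` before any Java-based NLTK component runs. A wrong path then fails immediately with a clear error rather than later inside the tagger.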

---

**Author:** Anne Lorenz / Datavard AG

**Project Status:** work in progress