updated readme

Anne Lorenz 2018-10-18 22:10:11 +02:00
parent 701d0fdd7e
commit a866390ae3
1 changed files with 29 additions and 40 deletions


@@ -1,39 +1,26 @@
# Predictor for Company Mergers
State: October 2018 (in progress)
# Prediction of Company Mergers (Bachelor's thesis, Anne)
My Python classes for text mining, machine learning models, …
This project contains Python classes for text mining, machine learning models, …
The csv file *classification_labelled_corrected.csv* contains 1497 labeled news articles from *Reuters.com* and is used for the machine learning models.
The scripts can be called separately.
The complete documentation can be found in the LaTeX document in the thesis folder.
The csv file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
Note:
Please enter a valid webhose personal key before you call 'Requester.py'.
Also, please change the path to your JAVAHOME environment variable in 'NER.find_companies' method.
example:
java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
os.environ['JAVAHOME'] = java_path
### Best F1 score results:
SVM:
**Best F1 score results**:
* **Support Vector Machines Classifier (SVM):**
F1 score: 0.8944166649330559
best parameters set found on development set:
Best parameters set found on development set:
{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
Naive Bayes:
* **Naive Bayes Classifier**:
F1 score: 0.8324014738144634 (average)
Parameters: SelectPercentile(25), own Bag of Words implementation, 10-fold cross validation
parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
The complete documentation can be found in the LaTeX document in the *thesis* folder.
F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
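
The parameter keys reported above (`SVC__C`, `SVC__gamma`, `SVC__kernel`, `perc__percentile`) suggest a scikit-learn `Pipeline` with a feature selection step named `perc` and a classifier step named `SVC`, tuned by grid search. The following is only a minimal sketch under that assumption, not the project's actual code: the synthetic data from `make_classification` stands in for the vectorized articles, and the grid values, the 5-fold split and the `f1` scoring are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the vectorized news articles (assumption).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=10, random_state=0
)

# Step names chosen to match the parameter keys quoted in this README.
pipeline = Pipeline([
    ("perc", SelectPercentile()),  # keeps the top-scoring percentile of features
    ("SVC", SVC()),
])

# Illustrative grid (assumption); it includes the reported best values.
param_grid = {
    "perc__percentile": [25, 50],
    "SVC__C": [0.1, 1],
    "SVC__gamma": [0.01, 0.1],
    "SVC__kernel": ["linear", "rbf"],
}

# Exhaustive search over the grid, scored by F1 with 5-fold cross validation.
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best mean F1 score:", search.best_score_)
```

On the real data, `X` and `y` would be the article features and labels derived from the labeled csv file; the scores printed for the synthetic data are not comparable to the results listed above.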
## Requirements
## Installation under Windows
```bash
$ pip install xy
```
### Requirements
pandas==0.20.1
nltk==3.2.5
@@ -42,10 +29,12 @@ numpy==1.14.0
graphviz==0.9
scikit_learn==0.19.2
## Installation under Windows
## Usage
The scripts can be called separately.
You need to enter a valid personal key for *webhose.io* before you call *Requester.py*.
To run *NER.py*, you need to change the path assigned to the JAVAHOME environment variable in the *find_companies* method.
---
pip install XY
**Author:** Anne Lorenz / Datavard AG
## Installation under UBUNTU
apt-get install XX
**Project Status:** work in progress