updated readme
parent
701d0fdd7e
commit
a866390ae3
69
README.md
|
@@ -1,51 +1,40 @@
|
||||||
# Predictor for Company Mergers
|
# Prediction of Company Mergers (Bachelor's Thesis Anne)
|
||||||
State: October 2018 (in progress)
|
|
||||||
|
|
||||||
My python classes for text mining, machine learning models, …
|
This project contains Python classes for text mining, machine learning models, …
|
||||||
|
The CSV file *classification_labelled_corrected.csv* contains 1497 labeled news articles from *Reuters.com* and is used for the machine learning models.
|
||||||
|
|
||||||
The scripts can be called separately.
|
**Best F1 score results**:
|
||||||
|
|
||||||
The complete documentation can be found in the latex document in the thesis folder.
|
* **Support Vector Machines Classifier (SVM):**
|
||||||
|
F1 score: 0.8944166649330559
|
||||||
The csv file 'classification_labelled_corrected.csv' contains 1497 labeled news articles from Reuters.com and is used for the machine learning models.
|
Best parameter set found on the development set:
|
||||||
|
|
||||||
Note:
|
|
||||||
Please enter a valid webhose personal key before you call 'Requester.py'.
|
|
||||||
Also, please change the path to your JAVAHOME environment variable in 'NER.find_companies' method.
|
|
||||||
|
|
||||||
example:
|
|
||||||
java_path = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"
|
|
||||||
os.environ['JAVAHOME'] = java_path
|
|
||||||
|
|
||||||
### Best F1 score results:
|
|
||||||
|
|
||||||
SVM:
|
|
||||||
|
|
||||||
F1 score: 0.8944166649330559
|
|
||||||
|
|
||||||
best parameters set found on development set:
|
|
||||||
{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
|
{'SVC__C': 0.1, 'SVC__gamma': 0.01, 'SVC__kernel': 'linear', 'perc__percentile': 50}
|
||||||
|
|
||||||
Naive Bayes:
|
* **Naive Bayes Classifier**:
|
||||||
|
F1 score: 0.8324014738144634 (average)
|
||||||
|
Parameters: SelectPercentile(25), custom bag-of-words implementation, 10-fold cross-validation
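
The parameter keys reported for the SVM (`SVC__C`, `perc__percentile`) follow scikit-learn's `step__parameter` naming, which suggests a `Pipeline` with one step named `perc` and one named `SVC`. The following is a minimal sketch under that assumption, not the thesis code itself. That `perc` is a `SelectPercentile` step is inferred from the Naive Bayes entry, and its scoring function is left at the library default as a guess.

```python
from sklearn.feature_selection import SelectPercentile
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Best parameter set as reported in this README.
best_params = {
    "SVC__C": 0.1,
    "SVC__gamma": 0.01,
    "SVC__kernel": "linear",
    "perc__percentile": 50,
}

# Assumed pipeline layout: the step names must match the key prefixes above.
pipeline = Pipeline([
    ("perc", SelectPercentile()),  # assumption: default scoring function
    ("SVC", SVC()),
])

# Apply the reported values to the matching pipeline steps.
pipeline.set_params(**best_params)
```

With `kernel='linear'`, scikit-learn's `SVC` does not use `gamma`, so the reported `SVC__gamma` value has no effect on this particular configuration.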
|
||||||
|
|
||||||
parameters: SelectPercentile(25), own BOW implementation, 10-fold cross validation
|
The complete documentation can be found in the LaTeX document in the *thesis* folder.
|
||||||
|
|
||||||
F1 score: min = 0.7586206896551724, max = 0.8846153846153846, average = 0.8324014738144634
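
The minimum, maximum, and average above summarize one F1 score per cross-validation fold. As a reminder of how such figures can be derived, here is a small sketch that uses only the standard library. The function names and the counts in the usage lines are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def f1_from_counts(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize_folds(scores):
    """Return (min, max, average) over a list of per-fold F1 scores."""
    return min(scores), max(scores), mean(scores)


# Hypothetical counts for two folds, for illustration only.
fold_scores = [f1_from_counts(8, 2, 2), f1_from_counts(5, 0, 5)]
lowest, highest, average = summarize_folds(fold_scores)
```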
|
|
||||||
|
|
||||||
|
|
||||||
## Requirements
|
|
||||||
|
|
||||||
pandas==0.20.1
|
|
||||||
nltk==3.2.5
|
|
||||||
webhoseio==0.5
|
|
||||||
numpy==1.14.0
|
|
||||||
graphviz==0.9
|
|
||||||
scikit_learn==0.19.2
|
|
||||||
|
|
||||||
## Installation under Windows
|
## Installation under Windows
|
||||||
|
```bash
|
||||||
|
$ pip install xy
|
||||||
|
```
|
||||||
|
### Requirements
|
||||||
|
|
||||||
pip install XY
|
pandas==0.20.1
|
||||||
|
nltk==3.2.5
|
||||||
|
webhoseio==0.5
|
||||||
|
numpy==1.14.0
|
||||||
|
graphviz==0.9
|
||||||
|
scikit_learn==0.19.2
|
||||||
|
|
||||||
## Installation under UBUNTU
|
## Usage
|
||||||
|
The scripts can be called separately.
|
||||||
|
You need to enter a valid personal key for *webhose.io* before you call *Requester.py*.
|
||||||
|
To run *NER.py*, you need to change the path assigned to the JAVAHOME environment variable in the *find_companies* method.
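
One way to avoid editing a hard-coded path is to resolve it in a single place before the named-entity code runs. The sketch below assumes a hypothetical helper, `configure_java_home`, which is not part of this project. The Windows path is the example from an earlier version of this README and will differ on your machine.

```python
import os

# Example path from an earlier version of this README; adjust to your system.
DEFAULT_JAVA_PATH = "C:\\Program Files (x86)\\Java\\jre1.8.0_181"


def configure_java_home(java_path=None):
    """Set JAVAHOME for the current process and return the path used.

    Hypothetical helper: prefers an explicit argument, then an existing
    JAVAHOME value, then the example default above.
    """
    path = java_path or os.environ.get("JAVAHOME") or DEFAULT_JAVA_PATH
    os.environ["JAVAHOME"] = path
    return path


configure_java_home()
```

Calling the helper with an explicit path, for example `configure_java_home("/usr/lib/jvm/default-java")`, overrides both the environment and the default.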
|
||||||
|
---
|
||||||
|
|
||||||
apt-get install XX
|
**Author:** Anne Lorenz / Datavard AG
|
||||||
|
|
||||||
|
**Project Status:** work in progress
|