README
Author: Imad Hamoumi
1- Put your data into the /data directory.
2- Start the script with: python run.py
3- Follow the instructions.
Note:
+ Only two file extensions are currently allowed: csv and pdf.
CSV:
+ CSV files are read using pandas.
+ You have to provide the name of the column from which the script reads the text data.
PDF:
+ In some cases, reading a PDF file is not allowed.
+ Some PDF files are not well encoded, so their text may not be extracted correctly.
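For the CSV case, the sketch below shows one way a text column could be read with pandas. It is only an illustration, not the code in run.py: the function name read_text_column, the column name "text" and the sample data are assumptions made for this example.

```python
import io

import pandas as pd


def read_text_column(source, column):
    """Return the non-empty values of one CSV column as a list of strings.

    `source` can be a file path or any file-like object accepted by pandas.
    """
    df = pd.read_csv(source)
    if column not in df.columns:
        # Fail early with a helpful message when the column name is wrong.
        raise KeyError(
            f"Column {column!r} not found; available columns: {list(df.columns)}"
        )
    # Drop rows with missing text and make sure every value is a string.
    return df[column].dropna().astype(str).tolist()


# In-memory stand-in for a CSV file placed in /data (hypothetical content).
sample = io.StringIO("id,text\n1,first document\n2,\n3,third document\n")
texts = read_text_column(sample, "text")
print(texts)
```

Rows with an empty text cell are dropped, so only the rows that actually contain text are passed on.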
You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size.
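To make the n-gram size parameter concrete, here is a minimal sketch of word n-gram extraction in plain Python. It does not reproduce the cleaning step of this script: the function word_ngrams and its parameters n_min and n_max are hypothetical names chosen for illustration.

```python
def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of `text` with sizes from n_min to n_max."""
    # Very simple tokenization: lowercase and split on whitespace.
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens over the token list.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Unigrams and bigrams of a short example sentence.
grams = word_ngrams("Topic models need clean text", n_min=1, n_max=2)
print(grams)
```

A larger n_max captures longer phrases but also produces many more, and rarer, features, so the value is usually kept small.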