Author: Imad Hamoumi
1- Put your data into the /data directory.
2- Start the script with: python run.py
3- Follow the instructions.
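
The following is a minimal sketch of how the files placed in the data directory in step 1 might be collected. It is not taken from run.py: the helper name find_input_files and the constant SUPPORTED_EXTENSIONS are hypothetical, the extensions come from the notes below (CSV and PDF), and the directory is passed in as a parameter because whether /data is an absolute path or a folder next to run.py is an assumption.

```python
from pathlib import Path

# Assumption: the two extensions the notes say are supported (CSV and PDF).
SUPPORTED_EXTENSIONS = {".csv", ".pdf"}


def find_input_files(data_dir):
    """Return the supported files directly inside `data_dir`, sorted by name.

    Hypothetical helper: run.py may discover its input files differently.
    """
    files = []
    for path in sorted(Path(data_dir).iterdir(), key=lambda p: p.name):
        # Keep regular files only, matching the extension case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            files.append(path)
    return files
```

Files with any other extension (for example .txt) are simply skipped rather than causing an error.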
Note:
Only two file extensions are currently supported: CSV and PDF.

CSV:
+ CSV files are read using pandas.
+ You have to provide the name of the column from which the script reads the text data.
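
As a rough illustration of the CSV case, the sketch below reads a file with pandas and returns the values of the column whose name you provide. The helper load_csv_texts is hypothetical and only assumes that the script needs a list of strings from one column; the check for a missing column is an added convenience, not documented behavior of run.py.

```python
import pandas as pd


def load_csv_texts(path, text_column):
    """Read a CSV file with pandas and return the text values of one column.

    Hypothetical helper, not a function from run.py.
    """
    df = pd.read_csv(path)
    if text_column not in df.columns:
        # Fail early with a helpful message instead of a bare KeyError.
        raise ValueError(
            f"Column {text_column!r} not found; available columns: {list(df.columns)}"
        )
    # Skip empty cells and make sure every remaining entry is a string.
    return df[text_column].dropna().astype(str).tolist()
```

For example, load_csv_texts("data/reviews.csv", "text") would return the non-empty cells of the text column, where the file name is only a placeholder.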
PDF:
+ In some cases, a PDF file cannot be read because reading it is not permitted.
+ Some PDF files are poorly encoded, so their text may not be extracted correctly.
You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size.
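
To show what the n-gram size controls, here is a minimal sketch of a cleaning step followed by word n-gram generation. Both functions, clean_text and word_ngrams, are hypothetical and only illustrate the idea; the actual cleaning rules and parameter names used in the pipeline may differ.

```python
import re


def clean_text(text):
    """Lowercase the text and keep only runs of word characters (hypothetical cleaning step)."""
    return re.findall(r"\w+", text.lower())


def word_ngrams(tokens, n=2):
    """Return all contiguous n-grams of `tokens`, each joined with a space."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, word_ngrams(clean_text("Hello, World! Hello again."), 2) returns ["hello world", "world hello", "hello again"]; raising n produces longer but fewer n-grams.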