Author: Imad Hamoumi
1- Put your data into the directory /data.
2- Start the script with python run.py
3- Follow the instructions.
Note:
CSV:
+ Only two file extensions are currently allowed. The first is csv, which is read using pandas.
+ You have to provide the name of the column from which the script can read the text data.
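As a rough illustration of the CSV notes above, here is a minimal sketch of reading one text column with pandas. The helper name read_text_column, the column name "text", and the in-memory sample are assumptions made for this example; they are not taken from run.py.

```python
import io

import pandas as pd


def read_text_column(csv_source, column):
    """Return the entries of `column` as a list of strings (hypothetical helper)."""
    df = pd.read_csv(csv_source)
    if column not in df.columns:
        # Fail early with the available names so the user can pick a valid column.
        raise KeyError(f"Column {column!r} not found; available: {list(df.columns)}")
    # Drop missing rows so later cleaning steps only see real text.
    return df[column].dropna().astype(str).tolist()


# In-memory stand-in for a CSV file placed in the data directory.
sample = io.StringIO("id,text\n1,first document\n2,second document\n")
texts = read_text_column(sample, "text")
print(texts)
```

With a real file, the first argument would be the path to the CSV file instead of the in-memory sample.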
PDF:
+ In some cases, reading a PDF file is not allowed.
+ Some PDF files are not well encoded.
You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size.
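To make the n-gram size parameter concrete, the sketch below shows one possible way a cleaning step could turn text into n-grams. The function names clean_tokens and ngrams and the parameter n are hypothetical; the actual pipeline may clean and tokenize differently.

```python
import string


def clean_tokens(text):
    """Lowercase the text, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    # Discard tokens that were punctuation only.
    return [tok for tok in tokens if tok]


def ngrams(tokens, n=2):
    """Return all runs of n consecutive tokens; n plays the role of the n-gram size."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = clean_tokens("Some PDF files are not well encoded.")
bigrams = ngrams(tokens, n=2)
trigrams = ngrams(tokens, n=3)
print(bigrams[:2])
```

A larger n captures longer phrases but yields fewer, sparser features, which is why it is worth exposing as a tunable parameter.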