Author: Imad Hamoumi

1- Put your data into the directory /data.
2- Start the script with python run.py
3- Follow the instructions.

Note:

CSV:
+ Only two file extensions are currently supported: csv and pdf. CSV files are read using pandas.
+ You have to provide the name of the column from which the script reads the text data.

PDF:
+ In some cases, reading a PDF file is not permitted.
+ Some PDF files are not well encoded.

You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size, etc.
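
To illustrate what changing the n-gram size does to the text, here is a minimal sketch. It is not the code used in run.py: the function name word_ngrams and its tokenization rule (lowercase, letters and digits only) are assumptions made for this example, and the cleaning step in the pipeline may work differently.

```python
import re


def word_ngrams(text, n=2):
    """Return the word n-grams of `text` as a list of strings.

    Hypothetical helper for illustration only; it is not part of run.py.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Lowercase the text and keep runs of letters/digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of n tokens over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    sample = "Put your data into the directory"
    print(word_ngrams(sample, n=1))  # single words
    print(word_ngrams(sample, n=2))  # pairs of adjacent words
```

A larger n keeps more word-order context but produces sparser features, so short texts may yield few or no n-grams.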