README

Author: Imad Hamoumi


1- Put your data into the directory /data.
2- Start the script with: python run.py
3- Follow the instructions.


Note:
CSV:
+ Only two file extensions are currently supported: csv and pdf. CSV files are read using pandas.
+ You have to provide the name of the column from which the script should read the text data.
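
To illustrate the CSV notes above, here is a minimal sketch of reading one text column with pandas. The function name read_text_column, the column name "text", and the in-memory sample are assumptions made for this example; they are not taken from run.py.

```python
import io

import pandas as pd


def read_text_column(csv_source, column):
    """Return the non-empty entries of one CSV column as a list of strings."""
    frame = pd.read_csv(csv_source)
    if column not in frame.columns:
        raise KeyError(f"Column {column!r} not found; available: {list(frame.columns)}")
    # Drop missing values and cast to str so later cleaning steps see uniform input.
    return frame[column].dropna().astype(str).tolist()


# Hypothetical in-memory sample standing in for a CSV file in the data directory.
sample = io.StringIO("id,text\n1,first document\n2,\n3,third document\n")
texts = read_text_column(sample, "text")
```

Checking the column name up front gives a clear error message when the name typed at the prompt does not match the file header.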

PDF:
+ In some cases, reading a PDF file is not allowed.
+ Some PDF files are not well encoded, so the text read from them may be unusable.
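
Because some PDF files cannot be read, one possible approach is to skip the failing files and report them at the end instead of stopping the whole run. The sketch below assumes a caller-supplied extract_text function (hypothetical; the PDF library used by run.py is not named in this README) and shows only the error handling around it.

```python
def extract_all(paths, extract_text):
    """Apply extract_text to each path; collect failures instead of stopping."""
    texts, failed = {}, {}
    for path in paths:
        try:
            texts[path] = extract_text(path)
        except Exception as error:  # deliberately broad: any reader error skips the file
            failed[path] = str(error)
    return texts, failed


# Hypothetical stand-in for a real PDF reader, used only to demonstrate the flow.
def fake_extract_text(path):
    if "locked" in path:
        raise PermissionError("reading not allowed")
    return "text of " + path


texts, failed = extract_all(["a.pdf", "locked.pdf"], fake_extract_text)
```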


You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size.
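
As a rough illustration of what a cleaning step with a configurable n-gram size could look like, here is a small standalone sketch. The function clean_and_ngram and its parameters ngram_size and min_token_length are assumptions for this example and do not reflect the actual names used in the pipeline.

```python
import re


def clean_and_ngram(text, ngram_size=2, min_token_length=2):
    """Lowercase, tokenize, drop short tokens, and return n-grams as tuples."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if len(t) >= min_token_length]
    # A sliding window of width ngram_size; empty when there are too few tokens.
    return [tuple(tokens[i:i + ngram_size]) for i in range(len(tokens) - ngram_size + 1)]


bigrams = clean_and_ngram("The indexer reads a PDF file.", ngram_size=2)
```

Changing ngram_size to 1 yields single tokens, and larger values yield longer windows over the cleaned token list.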