textnavi/indexer
Imad c9176ebe3e indexer impl. 2019-01-28 17:08:35 +01:00

README

Author: Imad Hamoumi


1- Put your data into the directory /data.
2- Start the script with python run.py.
3- Follow the instructions.


Note:
    CSV:
     + Only two file extensions are currently allowed: csv and pdf. CSV files are read using pandas.
     + You have to provide the name of the column from which the script reads the text data.
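
As a rough illustration of the column requirement above, the sketch below reads a single named text column with pandas. It is not taken from run.py: the helper name read_text_column, the column name "text" and the in-memory sample data are all assumptions made for the example.

```python
import io

import pandas as pd


def read_text_column(csv_source, column):
    """Return the values of one named column as a list of strings.

    Hypothetical helper, not part of run.py.
    """
    # Load only the requested column instead of the whole table.
    frame = pd.read_csv(csv_source, usecols=[column])
    # Drop empty cells so that only real text is returned.
    return frame[column].dropna().astype(str).tolist()


# In-memory sample standing in for a CSV file placed in /data.
sample = io.StringIO("id,text\n1,first document\n2,second document\n")
texts = read_text_column(sample, "text")
print(texts)
```

The same call should work with a file path instead of the in-memory sample, for example a file in the /data directory.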

    PDF:
     + In some cases, reading a PDF file is not allowed.
     + Some PDF files are not well encoded, so the extracted text may be unreliable.


You can add your own training model to the pipeline or change the cleaning parameters, such as the n-gram size.
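
To show what a cleaning parameter such as the n-gram size controls, here is a minimal sketch of a cleaning step followed by n-gram generation. The function names clean_text and make_ngrams and the parameter name ngram_size are assumptions for illustration only; the actual pipeline in run.py may clean and tokenize text differently.

```python
import re


def clean_text(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_ngrams(tokens, ngram_size=2):
    """Return every run of ngram_size consecutive tokens, joined by a space."""
    if ngram_size < 1:
        raise ValueError("ngram_size must be at least 1")
    # An input shorter than ngram_size yields an empty list.
    return [
        " ".join(tokens[i:i + ngram_size])
        for i in range(len(tokens) - ngram_size + 1)
    ]


tokens = clean_text("Indexing text, page by page.")
print(make_ngrams(tokens, ngram_size=2))
# prints ['indexing text', 'text page', 'page by', 'by page']
```

A larger ngram_size produces fewer and longer phrases, while ngram_size=1 returns the individual tokens.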