---- 1. Load User Document
----> Start with text documents, such as PDF, TXT, and DOCX files.
----> Data ingestion takes in the user's data: load the embedding model, embed the data, then create a vector database from the embeddings.
----> Considerations:
1. PDFs already have pages, so a text splitter won't be used on them. We want to be able to reference the page on which a retrieved passage can be found.
2. The approach for other data types can be different. We can use a text splitter for TXT files and, if possible, add page numbers to the resulting chunks for easy reference.
3.
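A minimal sketch of considerations 1 and 2, assuming the page text has already been extracted from the PDF by some other tool. The names `Chunk`, `chunks_from_pdf_pages`, `chunks_from_plain_text`, and the `lines_per_page` parameter are hypothetical and only illustrate the idea: PDFs keep their real page numbers, while TXT files are split into fixed-size groups of lines that act as pseudo-pages.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # file the chunk came from
    page: int    # 1-based page number used when citing a search hit


def chunks_from_pdf_pages(page_texts, source):
    """One chunk per PDF page; no text splitter, the PDF's own pages are the units."""
    return [
        Chunk(text=text.strip(), source=source, page=number)
        for number, text in enumerate(page_texts, start=1)
        if text.strip()  # skip blank pages but keep the original numbering
    ]


def chunks_from_plain_text(text, source, lines_per_page=40):
    """Split a TXT file into groups of lines and treat each group as a pseudo-page."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_page):
        body = "\n".join(lines[start:start + lines_per_page]).strip()
        if body:
            page = start // lines_per_page + 1
            chunks.append(Chunk(text=body, source=source, page=page))
    return chunks
```

Both functions return the same `Chunk` shape, so whatever comes next (embedding, storage, search) does not need to know which file type a chunk came from.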
Data Ingestion Module:
This module will handle the data ingestion process.
utils.py --> keeps the reusable functions
pdf_ingest.py --> handles PDFs
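A minimal sketch of the ingestion flow described earlier (embed the data, then build a searchable vector index), using only the standard library. `toy_embed` and `InMemoryVectorStore` are hypothetical stand-ins: a real pipeline would load an actual embedding model and write to a real vector database, and which file these helpers end up in is an assumption too.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Stand-in for a vector database: keeps vectors in a list, brute-force search."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []  # each row is (vector, text, metadata)

    def add(self, text, metadata):
        self.rows.append((self.embed(text), text, metadata))

    def search(self, query, k=3):
        """Return the k most similar rows as (score, text, metadata) tuples."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for vec, text, meta in self.rows
        ]
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]
```

Storing the page number in `metadata`, for example `store.add(page_text, {"source": "report.pdf", "page": 3})`, is what lets a search hit point back to the page it came from.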
Logging Module:
This module will keep logs of what the application is doing.
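One possible shape for this module, built on Python's standard `logging` library. The helper name `get_logger`, the default logger name `"ingestion"`, and the message format are assumptions made for illustration only.

```python
import logging


def get_logger(name="ingestion", level=logging.INFO):
    """Return a named logger with a single stream handler; safe to call repeatedly."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid attaching a duplicate handler on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Other modules could then call something like `get_logger().info("Loaded %d chunks from %s", count, path)` so that every ingestion step leaves a timestamped trace.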