ds_erp_ai/Projct Structure.txt
2024-08-05 22:14:19 +01:00
---- 1. Load User Document
----> Start with text documents such as PDF, TXT and DOCX files.
----> Data ingestion takes in the user's documents, loads the embedding model, and then creates a vector database from the embedded document contents.
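The data ingestion step above (embed the user's text, store the vectors, search them later) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions: embed() is a hypothetical hashed bag-of-words stand-in for a real embedding model, and VectorStore is a hypothetical in-memory substitute for a vector database. Neither is the project's actual code.

```python
import hashlib
import math
from dataclasses import dataclass, field


def embed(text: str, dim: int = 64) -> list:
    """Hypothetical stand-in for a real embedding model.

    Hashes each lower-cased token into one of `dim` buckets (bag of words),
    then L2-normalises the vector so a dot product equals cosine similarity.
    hashlib is used instead of the built-in hash(), which is randomised per process.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class VectorStore:
    """Hypothetical in-memory vector database of (vector, text, metadata) entries."""

    dim: int = 64
    entries: list = field(default_factory=list)

    def add(self, text: str, metadata: dict) -> None:
        # Embed once at ingestion time and keep the metadata (e.g. source file, page).
        self.entries.append((embed(text, self.dim), text, metadata))

    def search(self, query: str, k: int = 3) -> list:
        # Brute-force cosine similarity against every stored vector, best first.
        q = embed(query, self.dim)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for vec, text, meta in self.entries
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


store = VectorStore()
store.add("Invoices are approved by the finance team.", {"source": "policy.pdf", "page": 2})
store.add("Leave requests go to the HR department.", {"source": "handbook.txt", "page": 1})
score, text, meta = store.search("Invoices are approved by the finance team.", k=1)[0]
print(round(score, 6), meta)
```

In the real pipeline, embed() would be replaced by the chosen embedding model and VectorStore by a proper vector database; keeping the page in each entry's metadata is what lets a search hit point back to the page it came from.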
----> Considerations:
1. PDFs already have pages, so no text splitter will be used for them. We want search results to reference the page on which the matching content was found.
2. The approach for other data types can differ. We can use a text splitter for TXT files and, if possible, add page numbers to the resulting chunks for easy reference.
3.
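The two considerations above can be sketched as follows, assuming the PDF page texts have already been extracted. Chunk, split_text and the other names are hypothetical; the character-based splitter and the grouping of every few TXT chunks into a numbered pseudo-page are illustrative choices, not a decided design.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    page: int  # 1-based; a real page for PDFs, a pseudo-page for TXT files


def chunks_from_pdf_pages(page_texts: list, source: str) -> list:
    """PDFs: one chunk per page (no text splitter), so hits can cite the real page."""
    return [
        Chunk(text, source, number)
        for number, text in enumerate(page_texts, start=1)
        if text.strip()  # skip blank pages but keep the original page numbering
    ]


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Character-based splitter; consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text would already be covered by the previous chunk.
    return [text[start:start + chunk_size] for start in range(0, max(len(text) - overlap, 1), step)]


def chunks_from_txt(text: str, source: str, chunks_per_page: int = 3) -> list:
    """TXT files: split, then number every `chunks_per_page` chunks as one pseudo-page."""
    return [
        Chunk(piece, source, index // chunks_per_page + 1)
        for index, piece in enumerate(split_text(text))
    ]


pdf_chunks = chunks_from_pdf_pages(["Introduction", "   ", "Price list"], "catalogue.pdf")
print([(c.page, c.text) for c in pdf_chunks])  # [(1, 'Introduction'), (3, 'Price list')]
```

For TXT files the pseudo-page is only a stable reference number, not a printed page; a file that already marks page breaks with form-feed characters could be split on those instead.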
Data Ingestion Module:
This module will handle the data ingestion process.
utils.py --> keeps the reusable functions
pdf_ingest.py --> handles PDF files
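One possible split of responsibilities between utils.py and pdf_ingest.py is sketched below as a single file so it runs on its own. All function names are hypothetical; read_pdf assumes the third-party pypdf package and read_docx assumes python-docx. Both are imported only when those file types are actually loaded, so the TXT demo needs neither.

```python
import tempfile
from pathlib import Path


def read_txt(path: Path) -> list:
    """TXT has no pages; return the whole file as a single 'page'."""
    return [path.read_text(encoding="utf-8")]


def read_pdf(path: Path) -> list:
    """One string per PDF page. Assumes the third-party `pypdf` package."""
    from pypdf import PdfReader  # lazy import: only needed when a PDF is loaded

    return [page.extract_text() or "" for page in PdfReader(str(path)).pages]


def read_docx(path: Path) -> list:
    """DOCX has no fixed pages; join the paragraphs. Assumes `python-docx`."""
    import docx  # the python-docx distribution installs the `docx` module

    return ["\n".join(p.text for p in docx.Document(str(path)).paragraphs)]


# Hypothetical dispatch table from file extension to reader function.
READERS = {".pdf": read_pdf, ".txt": read_txt, ".docx": read_docx}


def load_pages(path) -> list:
    """Pick a reader by file extension and return the document's page texts."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported file type: {path.suffix or '<none>'}")
    return reader(path)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("Leave requests go to HR.", encoding="utf-8")
    pages = load_pages(sample)
```

In the project, load_pages and read_txt would plausibly sit in utils.py and read_pdf in pdf_ingest.py; the dispatch table keeps adding a new file type to a one-line change.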
Logging Module:
This module will record logs of the application's activity.
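A minimal sketch of what the logging module could expose, using only the standard library logging package. The get_logger helper, the format string and the "ingestion" logger name are assumptions for illustration.

```python
import logging
import sys
import tempfile
from pathlib import Path
from typing import Optional

LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def get_logger(name: str, log_file: Optional[str] = None, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger that writes to stderr and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # configure once; repeated calls reuse the same handlers
        handlers = [logging.StreamHandler(sys.stderr)]
        if log_file:
            handlers.append(logging.FileHandler(log_file, encoding="utf-8"))
        formatter = logging.Formatter(LOG_FORMAT)
        for handler in handlers:
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "ingestion.log"
    log = get_logger("ingestion", log_file=str(log_path))
    log.info("Loaded %d pages from %s", 3, "policy.pdf")
    log_text = log_path.read_text(encoding="utf-8")
    # Close file handles before the temporary directory is deleted.
    for handler in list(log.handlers):
        handler.close()
        log.removeHandler(handler)
```

Giving each module its own named logger (for example one per ingestion module) lets the log lines show which part of the pipeline produced them.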