Project Structure.txt

---- 1. Load User Document
    ----> Start with text documents such as PDF, TXT and DOCX files.
    ----> Data ingestion takes in the user's data, loads the embedding model, then creates a vector database from that data.
    ----> Considerations: 
            1. PDFs already have pages, so a text splitter won't be used. We want to be able to reference the pages on which a search result is found.
            2. The approach for other data types can differ. We can use a text splitter for TXT files and, if possible, add page numbers to the resulting chunks for easy reference.
            3. 
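
A minimal sketch of the two considerations above, using only the standard library. The names `Chunk`, `chunks_from_pdf_pages`, `chunks_from_plain_text` and the `lines_per_page` value are assumptions made for illustration, not part of the plan. The PDF page texts are assumed to have been extracted already by whichever PDF library ends up being used.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int    # 1-based page number used when citing a search hit
    source: str  # name of the file the chunk came from


def chunks_from_pdf_pages(page_texts, source):
    """One chunk per PDF page; the real page number is kept as-is."""
    return [
        Chunk(text=text, page=number, source=source)
        for number, text in enumerate(page_texts, start=1)
        if text.strip()  # skip pages with no extractable text
    ]


def chunks_from_plain_text(text, source, lines_per_page=40):
    """Split a txt file into fixed-size 'pseudo-pages' of lines (assumed size)."""
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), lines_per_page):
        body = "\n".join(lines[start:start + lines_per_page]).strip()
        if body:
            page = start // lines_per_page + 1
            chunks.append(Chunk(text=body, page=page, source=source))
    return chunks
```

Either function returns chunks that carry a page number, so a later search step can report where a result was found regardless of the input file type.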
        
        Data Ingestion Module: 
            This module will handle the data ingestion process.
                utils.py --> keeps the reusable functions 
                pdf_ingest.py --> handles PDFs 
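
A rough sketch of how the ingestion flow described earlier (embed the data, then build a vector database from it) could fit together. Everything here is a stand-in: `toy_embed` is a hashed bag-of-words placeholder for the real embedding model, `InMemoryVectorStore` is a placeholder for the real vector database, and `ingest_pages` is a hypothetical helper name. None of them are the project's actual choices.

```python
import math
import re
import zlib


def toy_embed(text, dim=256):
    """Placeholder for a real embedding model: hashed bag of words, L2-normalised."""
    vector = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vector[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Placeholder for a vector database: keeps vectors and metadata in a list."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []  # each row is (vector, text, metadata)

    def add(self, text, metadata):
        self.rows.append((self.embed(text), text, metadata))

    def search(self, query, k=3):
        """Return the k rows with the highest cosine similarity to the query."""
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text, meta)
            for vec, text, meta in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


def ingest_pages(page_texts, source, store):
    """Add one entry per non-empty page so each hit can cite its page number."""
    for number, text in enumerate(page_texts, start=1):
        if text.strip():
            store.add(text, {"source": source, "page": number})
```

For example, after `ingest_pages(pages, "report.pdf", store)`, each result of `store.search("some question")` carries the `source` and `page` metadata needed to point the user back to the original document.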

        
        Logging Module: 
            This module will keep logs of what happens during ingestion.
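
One possible shape for this module, built on Python's standard `logging` package. The helper name `get_logger`, the default logger name and the line format are assumptions for illustration; the format is not meant to match any existing log output exactly.

```python
import logging


def get_logger(name="ingestion", stream=None):
    """Return a logger that writes timestamped lines to a stream (stderr by default)."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured; avoid duplicate output
        return logger
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A pipeline step could then call `get_logger().info("Pdf Ingestion pipeline completed")` when it finishes.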
Pdf Ingestion pipeline completed 2024-08-05 22:14:19 +01:00