ryanwong
2024-04-26 19:22:42 -04:00
parent 40f8d089d9
commit 0ca33b3fc5
2 changed files with 29 additions and 14 deletions
+29 -14
@@ -5,18 +5,22 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 1: Data Preparation and Backend Setup
### Task 1: E-commerce Dataset Cleaning
- *Objective*: Ensure the dataset is clean and ready for analysis and vectorization.
- *Key Actions*: Remove duplicates, handle missing values, and standardize formats.
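
The three key actions for Task 1 can be sketched with the standard library alone. This is a minimal illustration, not the project's actual cleaning code: the field names `product_id`, `title`, and `price` are assumptions, since the dataset's schema is not given here.

```python
from typing import Dict, List, Optional

# Hypothetical column names; the real dataset's schema may differ.
REQUIRED_FIELDS = ("product_id", "title")


def parse_price(raw) -> Optional[float]:
    """Parse values such as "$1,299.00" into a float; return None if unusable."""
    if raw is None or raw == "":
        return None
    try:
        return float(str(raw).replace("$", "").replace(",", ""))
    except ValueError:
        return None


def clean_rows(rows: List[Dict]) -> List[Dict]:
    """Drop incomplete and duplicate rows, and standardize text and price fields."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Standardize formats: trim whitespace on every string value.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

        # Handle missing values: skip rows that lack a required field.
        if any(not row.get(name) for name in REQUIRED_FIELDS):
            continue

        # Remove duplicates: keep the first row seen for each product_id.
        if row["product_id"] in seen_ids:
            continue
        seen_ids.add(row["product_id"])

        # Standardize formats: store the price as a float, or None if unparseable.
        row["price"] = parse_price(row.get("price"))
        cleaned.append(row)
    return cleaned
```

Dropping incomplete rows is only one policy for missing values. Imputing a default or flagging the row for review may suit some columns better.
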
### Task 2: Vector Database Creation
- *Objective*: Set up a vector database using Pinecone to store product vectors.
- *Key Actions*: Define the database schema and integrate with Pinecone.
### Task 3: Similarity Metrics Selection
- *Objective*: Choose and justify the similarity metrics used to compare product vectors.
- *Key Actions*: Evaluate different metrics (e.g., cosine similarity, dot product) and select the best fit based on the dataset characteristics.
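
To make the comparison in Task 3 concrete, here is a small pure-Python sketch of the two metrics named above. The example vectors are made up for illustration. Real product embeddings would come from whatever embedding model the team selects.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


query = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]  # same direction as query, twice the magnitude

print(dot_product(query, scaled))        # 28.0, grows with vector magnitude
print(cosine_similarity(query, scaled))  # close to 1.0, ignores magnitude
```

Cosine similarity depends only on direction, while the dot product also rewards larger magnitudes. On unit-normalized vectors the two give the same ranking, so the choice matters mainly when vector lengths carry meaning.
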
### Endpoint 1: Product Recommendation Service
- *Functionality*: Handle natural language queries to recommend products, including safeguards against bad queries and sensitive data exposure.
- *Input*: Customer's natural language query.
- *Output*: Product matches array and a natural language response within specified constraints.
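
One possible shape for Endpoint 1's input check and output payload is sketched below. Everything named here is hypothetical: the `ProductMatch` fields, the `products` and `response` keys, and the 500-character limit are assumptions, since the exact constraints are not specified in this section.

```python
import json
import re
from dataclasses import asdict, dataclass, field
from typing import List

MAX_QUERY_LENGTH = 500  # assumed limit; the source does not specify one


@dataclass
class ProductMatch:
    product_id: str
    description: str
    score: float


@dataclass
class RecommendationResponse:
    products: List[ProductMatch] = field(default_factory=list)
    response: str = ""


def validate_query(query) -> str:
    """Return a whitespace-normalized query, or raise ValueError if unusable."""
    if not isinstance(query, str):
        raise ValueError("query must be a string")
    cleaned = re.sub(r"\s+", " ", query).strip()
    if not cleaned:
        raise ValueError("query is empty")
    if len(cleaned) > MAX_QUERY_LENGTH:
        raise ValueError("query is too long")
    return cleaned


def build_response(matches: List[ProductMatch]) -> str:
    """Serialize the matches array plus a natural language summary as JSON."""
    if matches:
        text = f"I found {len(matches)} product(s) that may match your request."
    else:
        text = "I could not find any products matching your request."
    payload = RecommendationResponse(products=matches, response=text)
    return json.dumps(asdict(payload))
```

Keeping validation separate from response building lets Endpoints 2 and 3 reuse the same output format after their own input handling.
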
@@ -24,14 +28,17 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 2: OCR and Web Scraping
### Task 4: OCR Functionality Implementation
- *Objective*: Develop the capability to extract text from images using OCR technology.
- *Key Actions*: Integrate and configure an OCR tool (e.g., Tesseract).
### Task 5: Web Scraping for Product Images
- *Objective*: Scrape product images from e-commerce websites as training data for the products in ``CNN_Model_Train_Data.csv``.
- *Key Actions*: Automate scraping, download images, store them systematically, and make sure you have enough data to train the CNN model.
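
As one way to read "store them systematically" and "enough data", the sketch below derives a deterministic file path per image and reports products with too few images. The directory layout, the `.jpg` extension, and the threshold of 100 are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict, Iterable

MIN_IMAGES_PER_PRODUCT = 100  # assumed threshold; tune it for your model


def image_path(product: str, image_url: str, root: str = "data/images") -> PurePosixPath:
    """Build a deterministic path: <root>/<product-slug>/<url-hash>.jpg."""
    slug = "_".join(product.lower().split())
    # Hashing the URL gives a stable name, so re-scraping does not create duplicates.
    digest = hashlib.sha256(image_url.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(root) / slug / f"{digest}.jpg"


def underrepresented(
    paths: Iterable[PurePosixPath], minimum: int = MIN_IMAGES_PER_PRODUCT
) -> Dict[str, int]:
    """Return the products (by folder name) that have fewer than `minimum` images."""
    counts = Counter(p.parent.name for p in paths)
    return {product: n for product, n in counts.items() if n < minimum}
```

Note that `underrepresented` only sees products that have at least one image. Products with no images at all need a separate check against the product list.
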
### Endpoint 2: OCR-Based Query Processing
- *Functionality*: Extract and process handwritten queries using the same logic as Endpoint 1.
- *Input*: Image file with handwritten text.
- *Output*: Same output format as Endpoint 1, adapted for image inputs.
@@ -39,10 +46,12 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 3: CNN Model Development
### Task 6: CNN Model Training
- *Objective*: Develop a CNN model from scratch to identify products from images, using only the ``products`` listed in ``CNN_Model_Train_Data.csv``.
- *Key Actions*: Train the model using scraped images and clean data without using pre-trained models.
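
Restricting the classifier to the products in the CSV could start with a label map like the one sketched here. The column name is passed in explicitly because the file's header is not shown in this section. The `products` column used in the usage lines is an assumption, and the sample rows are made up.

```python
import csv
import io
from typing import Dict, TextIO


def load_label_map(csv_file: TextIO, column: str) -> Dict[str, int]:
    """Map each distinct product name in the CSV to a class index."""
    reader = csv.DictReader(csv_file)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in CSV header")
    # Collect distinct, non-empty names; sort so indices are stable across runs.
    names = sorted({(row[column] or "").strip() for row in reader} - {""})
    return {name: index for index, name in enumerate(names)}


# Usage with in-memory sample data standing in for CNN_Model_Train_Data.csv.
sample = io.StringIO("products\nmug\nlamp\nmug\nchair\n")
label_map = load_label_map(sample, column="products")
print(label_map)
```

Any scraped image whose product is not a key in the label map would then be excluded from training.
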
### Endpoint 3: Image-Based Product Detection
- *Functionality*: Use the CNN model to identify products from images and match them using the vector database.
- *Input*: Product image.
- *Output*: Product description and matching products in a format consistent with other endpoints.
@@ -50,12 +59,15 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 4: Frontend Development and Integration
### Frontend Page 1: Text Query Interface
- *Features*: Form to submit text queries, display natural language responses, and a product details table.
### Frontend Page 2: Image Query Interface
- *Features*: Allows users to upload images of handwritten queries and displays results similar to Page 1.
### Frontend Page 3: Product Image Upload Interface
- *Features*: Users can upload product images and view the identified product description and related products in natural language and tabular format.
## Instructions for Presentation
@@ -69,11 +81,11 @@ Each module completion should be accompanied by a concise, to-the-point report t
- *Title Page*: Include the module number and title, the names of the team members, and the submission date.
- *Introduction*: Briefly describe the objectives of the module and its importance to the overall project.
- *High-Level Flow*:
  - *Description*: Outline the main tasks and functionalities developed in the module.
  - *Diagrams*: Include flowcharts or diagrams that visually represent the architecture and data flow.
- *Key Decisions*: Summarize crucial decisions made during the module, such as choice of technology, design patterns, and configurations.
- *Challenges and Solutions*:
  - Briefly discuss any challenges faced during the module and how they were addressed.
- *Conclusion*: Sum up the outcomes of the module and its readiness for integration with other modules.
- *References*: Cite any tools, libraries, or external resources that were used.
@@ -84,13 +96,13 @@ Participants are required to create two sets of videos for each module, detailin
#### Video Requirements:
- *Functional Demonstration Video*:
  - *Content*: Demonstrate the functionality of each endpoint and page developed in the module.
  - *Focus*: Show how the system responds to various inputs and scenarios. Explain the user interaction with the system.
  - *Duration*: Keep the video concise, preferably under 5 minutes.
- *Code Explanation Video*:
  - *Content*: Provide a high-level overview of the codebase for the module.
  - *Focus*: Explain the structure of the code, major classes, and functions. Highlight any significant patterns or algorithms used.
  - *Duration*: Limit the explanation to under 10 minutes.
### Submission Guidelines:
@@ -101,12 +113,15 @@ Participants are required to create two sets of videos for each module, detailin
## Instructions for Coding
### General Guidelines
- *Class-Based Implementation*: Use a class-based implementation for all backend services to keep the code organized, reusable, and maintainable.
- *Best Practices*:
  - *ACID Properties*: Ensure that database transactions are Atomic, Consistent, Isolated, and Durable to maintain data integrity and reliability.
  - *Modularity*: Build the codebase with clear modularity in mind. Separate different functionalities into distinct modules to enhance readability and maintainability.
  - *Packaging*: Organize your code into packages that reflect the services they provide. This approach not only helps in maintaining the code but also simplifies the deployment and scaling process.
- *Directories*: Whenever you test in a notebook, keep all notebooks in the ``notebook`` directory and give them descriptive names.
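
The class-based and ACID guidelines above can be illustrated together with a small sketch. It assumes a relational store (SQLite here, via the standard library) sits alongside the vector database. The `ProductStore` class and its table layout are hypothetical and not part of the project specification.

```python
import sqlite3
from typing import Iterable, Tuple


class ProductStore:
    """Small class-based wrapper around a SQLite table of products."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS products (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        self.conn.commit()

    def add_many(self, products: Iterable[Tuple[str, str]]) -> None:
        """Insert all rows or none of them (atomicity)."""
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO products (id, name) VALUES (?, ?)", products
            )

    def count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

If a batch contains a duplicate `id`, the insert raises `sqlite3.IntegrityError` and none of that batch's rows are kept, so the table never holds a half-written batch.
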
### Tech Stack
- *Web Framework*: Use Flask for developing the backend. Flask provides flexibility and ease of use for setting up API services.
- *Vector Database*: Integrate Pinecone to manage and query vector data efficiently. Pinecone supports scalable vector searches, which are crucial for the recommendation systems in this project.
BIN
Binary file not shown.