Commit 0ca33b3fc5 (parent 40f8d089d9)
ryanwong, 2024-04-26 19:22:42 -04:00
2 changed files with 29 additions and 14 deletions
+18 -3
@@ -5,18 +5,22 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 1: Data Preparation and Backend Setup
### Task 1: E-commerce Dataset Cleaning
- *Objective*: Ensure the dataset is clean and ready for analysis and vectorization.
- *Key Actions*: Remove duplicates, handle missing values, and standardize formats.
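A minimal sketch of these three steps using only the Python standard library, assuming the raw data is a CSV. The column names (``product_id``, ``title``, ``price``, ``category``) and the rule of keeping the first valid row per ``product_id`` are hypothetical; adapt them to the actual dataset.

```python
import csv
import io


def clean_rows(rows, required=("product_id", "title")):
    """Deduplicate, handle missing values, and standardize formats.

    Column names are hypothetical placeholders for the real dataset schema.
    """
    seen = set()
    cleaned = []
    for raw in rows:
        # Standardize formats: lower-case header names and trim every value.
        row = {k.strip().lower(): (v or "").strip() for k, v in raw.items() if k is not None}
        # Handle missing values: drop rows without the required fields...
        if any(not row.get(name) for name in required):
            continue
        # ...and fill an empty category with a default label.
        row["category"] = row.get("category", "").lower() or "unknown"
        # Standardize prices such as "$1,299.00" to floats (None if unparseable).
        try:
            row["price"] = float(row.get("price", "").replace("$", "").replace(",", ""))
        except ValueError:
            row["price"] = None
        # Remove duplicates: keep only the first valid row for each product_id.
        if row["product_id"] in seen:
            continue
        seen.add(row["product_id"])
        cleaned.append(row)
    return cleaned


raw_csv = io.StringIO(
    "product_id,title,price,category\n"
    "1, Blue Mug ,$12.50,Kitchen\n"
    "1,Blue Mug,$12.50,Kitchen\n"
    "2,,9.99,Toys\n"
    '3,Desk Lamp,"$1,299.00",\n'
)
products = clean_rows(csv.DictReader(raw_csv))
```

Deduplicating only after validation means that if the first copy of a product is incomplete, a later complete copy is kept instead of being discarded as a duplicate.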
### Task 2: Vector Database Creation
- *Objective*: Set up a vector database using Pinecone to store product vectors.
- *Key Actions*: Define the database schema and integrate with Pinecone.
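Pinecone stores each item as a record with an ``id``, a ``values`` vector whose length must equal the index dimension, and optional ``metadata``. The sketch below maps a cleaned product to that record shape. It assumes a hypothetical 4-dimensional embedding and hypothetical metadata fields. The resulting dicts match the shape a Pinecone index's ``upsert`` accepts, but creating the index and calling the client are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

EMBEDDING_DIM = 4  # hypothetical; must match the dimension the index was created with


@dataclass
class ProductRecord:
    product_id: str
    embedding: List[float]
    title: str
    category: str
    price: Optional[float] = None

    def to_vector_record(self, dim: int = EMBEDDING_DIM) -> dict:
        """Return an {id, values, metadata} dict ready to be upserted."""
        if len(self.embedding) != dim:
            raise ValueError(f"expected a {dim}-dim vector, got {len(self.embedding)}")
        # Keep metadata small and flat so it can be used for filtering at query time.
        metadata = {"title": self.title, "category": self.category}
        if self.price is not None:  # skip missing values instead of storing null
            metadata["price"] = self.price
        return {
            "id": self.product_id,
            "values": [float(x) for x in self.embedding],
            "metadata": metadata,
        }


record = ProductRecord("p1", [0.1, 0.2, 0.3, 0.4], "Blue Mug", "kitchen", 12.5).to_vector_record()
```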
### Task 3: Similarity Metrics Selection
- *Objective*: Choose and justify the similarity metrics used to compare product vectors.
- *Key Actions*: Evaluate different metrics (e.g., cosine similarity, dot product) and select the best fit based on the dataset characteristics.
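The practical difference between the two example metrics fits in a few lines. Dot product is affected by vector magnitude and cosine similarity is not. Once vectors are L2-normalized, the two produce identical scores. The following plain-Python sketch uses made-up two-dimensional vectors:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)


query = [1.0, 1.0]
short_doc = [1.0, 1.0]  # same direction as the query
long_doc = [3.0, 0.0]   # different direction, larger magnitude

# Dot product rewards magnitude: long_doc scores higher (3.0 vs 2.0).
print(dot(query, short_doc), dot(query, long_doc))
# Cosine ignores magnitude: short_doc scores higher (~1.0 vs ~0.707).
print(cosine(query, short_doc), cosine(query, long_doc))
# After L2-normalization, dot product equals cosine similarity.
print(dot(normalize(query), normalize(long_doc)))
```

If the embedding model already outputs normalized vectors, either metric gives the same ranking. If magnitudes vary, cosine is the safer default unless magnitude is meant to carry signal.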
### Endpoint 1: Product Recommendation Service
- *Functionality*: Handle natural language queries to recommend products, including safeguards against bad queries and sensitive data exposure.
- *Input*: Customer's natural language query.
- *Output*: An array of matching products and a natural language response that meets the specified constraints.
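One way to organize the safeguards is a small validator that runs before the query reaches the vector search. The sketch below assumes hypothetical length limits and three illustrative redaction patterns for emails, card-like numbers, and phone-like numbers. Real patterns would need tuning: the phone pattern, for example, would also match a long run of sizes or model numbers.

```python
import re

# Illustrative patterns only; tune them against real queries before relying on them.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class QueryGuard:
    """Validate and redact a customer query before it reaches the vector search."""

    def __init__(self, min_chars=3, max_chars=500):
        self.min_chars = min_chars  # hypothetical limits
        self.max_chars = max_chars

    def check(self, query):
        """Return (True, cleaned_query) or (False, error_message)."""
        if not isinstance(query, str):
            return False, "query must be a string"
        cleaned = " ".join(query.split())  # collapse runs of whitespace
        if len(cleaned) < self.min_chars:
            return False, "query is too short"
        if len(cleaned) > self.max_chars:
            return False, "query is too long"
        # Redact card-like numbers before phone-like ones so the broader
        # phone pattern does not consume them first.
        cleaned = CARD.sub("[REDACTED_CARD]", cleaned)
        cleaned = EMAIL.sub("[REDACTED_EMAIL]", cleaned)
        cleaned = PHONE.sub("[REDACTED_PHONE]", cleaned)
        return True, cleaned


guard = QueryGuard()
ok, text = guard.check("waterproof hiking boots, email me at jane.doe@example.com")
```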
@@ -24,14 +28,17 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 2: OCR and Web Scraping
### Task 4: OCR Functionality Implementation
- *Objective*: Develop the capability to extract text from images using OCR technology.
- *Key Actions*: Integrate and configure an OCR tool (e.g., Tesseract).
### Task 5: Web Scraping for Product Images
- *Objective*: Scrape product images from e-commerce websites to use as CNN training data for the products listed in ``CNN_Model_Train_Data.csv``.
- *Key Actions*: Automate scraping, download the images, and store them systematically, making sure there is enough data to train the CNN model.
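The following sketch covers the storage side. It assumes images are saved as JPEG under one folder per product. Folder names come from a hypothetical slug rule. File names come from a hash of the image URL, so re-running the scraper does not create duplicates. A per-product count check flags classes that still need images. The ``min_images`` threshold of 100 is an arbitrary placeholder, not a project requirement, and collecting the image URLs from each site is not shown.

```python
import hashlib
import re
import urllib.request
from pathlib import Path, PurePosixPath


def product_slug(name):
    """Filesystem-safe folder name for a product label (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def image_path(root, product, url):
    """Deterministic location: <root>/<product_slug>/<first 16 hex chars of sha1(url)>.jpg."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(root) / product_slug(product) / f"{digest}.jpg"


def download(url, dest, timeout=10):
    """Fetch one image unless it already exists, so re-runs are idempotent."""
    dest = Path(dest)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        dest.write_bytes(resp.read())
    return dest


def underfilled(counts, products, min_images=100):
    """Products with fewer than min_images stored images (threshold is a placeholder)."""
    return sorted(p for p in products if counts.get(p, 0) < min_images)


# Hypothetical usage; no network access happens here.
path = image_path("data/images", "Blue Mug", "https://example.com/img/123.jpg")
todo = underfilled({"Blue Mug": 120, "Desk Lamp": 5}, ["Blue Mug", "Desk Lamp", "Office Chair"])
```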
### Endpoint 2: OCR-Based Query Processing
- *Functionality*: Extract and process handwritten queries using the same logic as Endpoint 1.
- *Input*: Image file with handwritten text.
- *Output*: Same output format as Endpoint 1, adapted for image inputs.
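Because Endpoint 2 reuses Endpoint 1's logic, a natural structure is to OCR the image and pass the extracted text to the same handler. In this sketch both the OCR engine and the text handler are injected callables (hypothetical interfaces). The default engine uses ``pytesseract`` with Pillow. The ``products`` and ``response`` keys stand in for whatever output format Endpoint 1 actually defines.

```python
class OCRQueryService:
    """Endpoint 2 sketch: OCR the image, then delegate to Endpoint 1's handler."""

    def __init__(self, text_handler, ocr=None):
        self.text_handler = text_handler  # the same callable that serves Endpoint 1
        self.ocr = ocr or self._tesseract_ocr

    @staticmethod
    def _tesseract_ocr(image_path):
        # Imported lazily so the class can be used (and tested) without Tesseract installed.
        import pytesseract
        from PIL import Image

        with Image.open(image_path) as img:
            return pytesseract.image_to_string(img)

    def handle(self, image_path):
        raw = self.ocr(image_path)
        query = " ".join(raw.split())  # OCR output often contains stray line breaks
        if not query:
            return {"products": [], "response": "No readable text was found in the image."}
        return self.text_handler(query)


# Hypothetical usage with a fake OCR engine and a stub Endpoint 1 handler.
service = OCRQueryService(
    text_handler=lambda q: {"products": [q], "response": "ok"},
    ocr=lambda path: "red\nrunning  shoes\n",
)
result = service.handle("note.png")
```

Injecting the OCR engine also makes it easy to swap Tesseract for another tool later without touching the recommendation logic.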
@@ -39,10 +46,12 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 3: CNN Model Development
### Task 6: CNN Model Training
- *Objective*: Develop a CNN model from scratch, using only the ``products`` listed in ``CNN_Model_Train_Data.csv``, to identify products from images.
- *Key Actions*: Train the model using scraped images and clean data without using pre-trained models.
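Before any training, the image set has to be restricted to the products in the CSV and split so the model can be evaluated on images it has not seen. The stdlib sketch below shows that preparation step. It assumes the CSV has a ``products`` column with one product name per row and that each scraped image is known as a ``(path, product)`` pair. The 20% validation fraction is a placeholder. The CNN itself, built and trained from scratch, is left to the chosen framework.

```python
import csv
import io
import random
from collections import defaultdict


def allowed_products(csv_file, column="products"):
    """Product names listed in CNN_Model_Train_Data.csv (column name is an assumption)."""
    names = ((row.get(column) or "").strip() for row in csv.DictReader(csv_file))
    return {name for name in names if name}


def build_dataset(samples, allowed, val_fraction=0.2, seed=0):
    """Keep only allowed products, assign class ids, and split each class separately.

    samples: iterable of (image_path, product_name) pairs.
    Returns (class_to_idx, train, val) where train/val hold (image_path, class_id).
    """
    by_label = defaultdict(list)
    for path, label in samples:
        if label in allowed:  # drop images of products not in the CSV
            by_label[label].append(path)
    class_to_idx = {label: i for i, label in enumerate(sorted(by_label))}
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        # Per-class split: every product with 2+ images appears in validation.
        n_val = max(1, int(len(paths) * val_fraction)) if len(paths) > 1 else 0
        idx = class_to_idx[label]
        val.extend((p, idx) for p in paths[:n_val])
        train.extend((p, idx) for p in paths[n_val:])
    return class_to_idx, train, val


allowed = allowed_products(io.StringIO("products\nMug\nDesk Lamp\n"))
samples = [(f"mug_{i}.jpg", "Mug") for i in range(10)]
samples += [(f"lamp_{i}.jpg", "Desk Lamp") for i in range(5)]
samples.append(("chair_0.jpg", "Office Chair"))  # not in the CSV, so it is dropped
classes, train, val = build_dataset(samples, allowed)
```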
### Endpoint 3: Image-Based Product Detection
- *Functionality*: Use the CNN model to identify products from images and match them using the vector database.
- *Input*: Product image.
- *Output*: Product description and matching products in a format consistent with other endpoints.
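The following sketch shows one way to chain the stages while keeping the response shape aligned with the other endpoints. The ``classify``, ``describe``, and ``search`` callables, the confidence threshold, and the ``products``/``response`` keys are all hypothetical. In the project, ``classify`` would wrap the trained CNN and ``search`` a vector database query.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ImageProductService:
    """Endpoint 3 sketch: CNN label -> description -> vector search."""

    classify: Callable[[str], Tuple[str, float]]  # image path -> (label, confidence)
    describe: Callable[[str], str]                # label -> product description
    search: Callable[[str, int], List[dict]]      # (text, top_k) -> matching products
    min_confidence: float = 0.5                   # hypothetical threshold
    top_k: int = 5

    def handle(self, image_path):
        label, confidence = self.classify(image_path)
        if confidence < self.min_confidence:
            return {"products": [], "response": "The product in the image could not be identified confidently."}
        description = self.describe(label)
        matches = self.search(description, self.top_k)
        return {
            "products": matches,
            "response": f"This looks like {description}. Found {len(matches)} related product(s).",
        }


# Hypothetical usage with stub components.
service = ImageProductService(
    classify=lambda path: ("mug", 0.92),
    describe=lambda label: f"a {label}",
    search=lambda text, k: [{"id": "p1", "title": "Blue Mug"}][:k],
)
out = service.handle("photo.jpg")
```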
@@ -50,12 +59,15 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 4: Frontend Development and Integration
### Frontend Page 1: Text Query Interface
- *Features*: Form to submit text queries, display natural language responses, and a product details table.
### Frontend Page 2: Image Query Interface
- *Features*: Allows users to upload images of handwritten queries and displays results similar to Page 1.
### Frontend Page 3: Product Image Upload Interface
- *Features*: Users can upload product images and view the identified product description and related products, in both natural language and tabular format.
## Instructions for Presentation
@@ -101,12 +113,15 @@ Participants are required to create two sets of videos for each module, detailin
## Instructions for Coding
### General Guidelines
- *Class-Based Implementation*: A class-based implementation is recommended for all backend services to keep the code organized, reusable, and maintainable.
- *Best Practices*:
  - *ACID Properties*: Ensure that database transactions are Atomic, Consistent, Isolated, and Durable to maintain data integrity and reliability.
  - *Modularity*: Build the codebase with clear modularity in mind. Separate different functionalities into distinct modules to enhance readability and maintainability.
  - *Packaging*: Organize your code into packages that reflect the services they provide. This makes the code easier to maintain and simplifies deployment and scaling.
- *Directories*: Keep every notebook used for testing in the ``notebook`` directory, and give each notebook a descriptive name.
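The following class-based repository illustrates atomicity. It uses Python's built-in ``sqlite3`` as a stand-in, since the document does not name a transactional database, and its table and column names are hypothetical. Using the connection as a context manager commits the transaction when the block succeeds and rolls it back when it raises. The order insert and the stock update therefore either both happen or neither does.

```python
import sqlite3


class InventoryRepository:
    """Hypothetical data-access class; sqlite3 stands in for the project's database."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS stock ("
                "product_id TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
            )
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS orders ("
                "id INTEGER PRIMARY KEY, product_id TEXT NOT NULL, qty INTEGER NOT NULL)"
            )

    def add_stock(self, product_id, qty):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO stock (product_id, qty) VALUES (?, ?)", (product_id, qty)
            )

    def place_order(self, product_id, qty):
        """Record an order and decrement stock as a single atomic transaction."""
        with self.conn:  # commit on success, roll back on any exception
            self.conn.execute(
                "INSERT INTO orders (product_id, qty) VALUES (?, ?)", (product_id, qty)
            )
            # The CHECK constraint raises sqlite3.IntegrityError if stock would go negative.
            cur = self.conn.execute(
                "UPDATE stock SET qty = qty - ? WHERE product_id = ?", (qty, product_id)
            )
            if cur.rowcount != 1:
                raise ValueError(f"unknown product {product_id!r}")

    def stock_of(self, product_id):
        row = self.conn.execute(
            "SELECT qty FROM stock WHERE product_id = ?", (product_id,)
        ).fetchone()
        return row[0] if row else None

    def order_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


repo = InventoryRepository()
repo.add_stock("mug", 2)
repo.place_order("mug", 1)
```

An order for more units than are in stock, or for an unknown product, raises and leaves both tables exactly as they were.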
### Tech Stack
- *Web Framework*: Use Flask for developing the backend. Flask provides flexibility and ease of use for setting up API services.
- *Vector Database*: Integrate Pinecone to manage and query vector data efficiently. Pinecone supports scalable vector search, which is crucial for the recommendation systems in this project.
BIN
Binary file not shown.