updated
@@ -5,18 +5,22 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 1: Data Preparation and Backend Setup

### Task 1: E-commerce Dataset Cleaning

- *Objective*: Ensure the dataset is clean and ready for analysis and vectorization.
- *Key Actions*: Remove duplicates, handle missing values, and standardize formats.

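The three cleaning actions in Task 1 can be sketched with the standard library alone. This is a minimal illustration, not the project's cleaning pipeline: the column names `name` and `price` and the sample rows are assumptions, since the dataset's actual columns are not specified here.

```python
from typing import Iterable


def clean_products(rows: Iterable[dict]) -> list[dict]:
    """Deduplicate, drop incomplete rows, and standardize formats.

    Assumes hypothetical `name` and `price` columns.
    """
    seen: set[str] = set()
    cleaned: list[dict] = []
    for row in rows:
        # Standardize formats: trim whitespace and collapse runs of spaces.
        name = " ".join(str(row.get("name") or "").split())
        raw = row.get("price")
        raw_price = "" if raw is None else str(raw)
        raw_price = raw_price.replace("$", "").replace(",", "").strip()

        # Handle missing values: skip rows without a name or a parseable price.
        if not name:
            continue
        try:
            price = round(float(raw_price), 2)
        except ValueError:
            continue

        # Remove duplicates: compare names case-insensitively.
        key = name.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "price": price})
    return cleaned


raw_rows = [
    {"name": "  Red   Mug ", "price": "$12.50"},
    {"name": "red mug", "price": "12.5"},    # duplicate after normalization
    {"name": "", "price": "3.00"},           # missing name
    {"name": "Blue Plate", "price": None},   # missing price
    {"name": "Green Bowl", "price": "1,299"},
]
cleaned_rows = clean_products(raw_rows)
```

Dropping incomplete rows is only one way to handle missing values; imputing a default may suit some columns better, depending on the dataset.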
### Task 2: Vector Database Creation

- *Objective*: Set up a vector database using Pinecone to store product vectors.
- *Key Actions*: Define the database schema and integrate with Pinecone.

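One way to think about the "schema" for a vector store is as a typed record that is validated before it is sent to the index. The sketch below makes no Pinecone client calls; the class name `ProductRecord`, its fields, and the metadata keys are hypothetical. The `id` / `values` / `metadata` dictionary shape follows the record layout Pinecone documents for upserts, but verify it against the current client documentation before relying on it.

```python
from dataclasses import dataclass, field


@dataclass
class ProductRecord:
    """Hypothetical in-code schema for one product vector."""

    product_id: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)

    def to_record(self, expected_dim: int) -> dict:
        # Every vector in an index must share the same dimension,
        # so check it before building the payload.
        if len(self.embedding) != expected_dim:
            raise ValueError(
                f"expected {expected_dim} dimensions, got {len(self.embedding)}"
            )
        return {
            "id": self.product_id,
            "values": self.embedding,
            "metadata": self.metadata,
        }


record = ProductRecord("sku-1", [0.1, 0.2, 0.3], {"name": "Red Mug"})
payload = record.to_record(expected_dim=3)
```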
### Task 3: Similarity Metrics Selection

- *Objective*: Choose and justify the similarity metrics used to compare product vectors.
- *Key Actions*: Evaluate different metrics (e.g., cosine similarity, dot product) and select the best fit based on the dataset characteristics.

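To see why the choice of metric matters, the sketch below compares the two metrics named above on toy vectors (the vectors are invented for illustration). Dot product rewards vector magnitude, while cosine similarity depends only on direction, so the two can rank the same candidates differently.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / norm


query = [1.0, 0.0]
aligned = [2.0, 0.0]    # same direction as the query, small magnitude
off_axis = [3.0, 4.0]   # different direction, larger magnitude

# Dot product prefers the larger off-axis vector...
dot_scores = (dot(query, aligned), dot(query, off_axis))
# ...while cosine similarity prefers the aligned one.
cos_scores = (cosine_similarity(query, aligned), cosine_similarity(query, off_axis))
```

If the embeddings are normalized to unit length, the two metrics give the same ranking, which is worth checking before justifying one over the other.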
### Endpoint 1: Product Recommendation Service

- *Functionality*: Handle natural language queries to recommend products, including safeguards against bad queries and sensitive data exposure.
- *Input*: Customer's natural language query.
- *Output*: Product matches array and a natural language response within specified constraints.

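The safeguards and output shape for Endpoint 1 might look roughly like the sketch below. Everything here is an assumption made for illustration: the 500-character limit, the redaction patterns, and the field names `products` and `response` are not specified by the project, and real sensitive-data filtering would need more than two regular expressions.

```python
import re
from dataclasses import dataclass, field

MAX_QUERY_CHARS = 500  # hypothetical limit
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_DIGITS_RE = re.compile(r"\b\d{9,}\b")  # e.g. card or phone numbers


@dataclass
class RecommendationResponse:
    """Hypothetical output shape: matches plus a natural language reply."""

    products: list[dict] = field(default_factory=list)
    response: str = ""


def sanitize_query(query: str) -> str:
    """Reject bad queries and redact obviously sensitive tokens."""
    text = " ".join(query.split())  # normalize whitespace
    if not text:
        raise ValueError("query is empty")
    if len(text) > MAX_QUERY_CHARS:
        raise ValueError("query is too long")
    text = EMAIL_RE.sub("[redacted]", text)
    return LONG_DIGITS_RE.sub("[redacted]", text)


safe = sanitize_query("  cheap  mugs, mail me at a.b@example.com ")
reply = RecommendationResponse(products=[], response="No matching products found.")
```

Running the sanitizer before the query reaches the embedding step or any language model keeps the redaction in one place for every endpoint that reuses this logic.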
@@ -24,14 +28,17 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 2: OCR and Web Scraping

### Task 4: OCR Functionality Implementation

- *Objective*: Develop the capability to extract text from images using OCR technology.
- *Key Actions*: Integrate and configure an OCR tool (e.g., Tesseract).

### Task 5: Web Scraping for Product Images

- *Objective*: Scrape images from e-commerce websites for the products listed in ``CNN_Model_Train_Data.csv``, to use as training data.
- *Key Actions*: Automate scraping, download the images, store them systematically, and make sure you collect enough data to train the CNN model.

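"Store them systematically" and "enough data" can both be made checkable. The sketch below is one possible layout, not a required one: a folder per product, a filename derived from the image URL so that re-scraping does not create duplicates, and a per-product count check. The root path, the `.jpg` extension, and the threshold of 50 images are all assumptions.

```python
import hashlib
import re
from collections import Counter
from pathlib import PurePosixPath

MIN_IMAGES_PER_PRODUCT = 50  # hypothetical threshold


def slugify(name: str) -> str:
    """Turn a product name into a filesystem-safe folder name."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def image_path(root: str, product: str, url: str) -> PurePosixPath:
    """Build a deterministic path: <root>/<product-slug>/<url-hash>.jpg."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(root) / slugify(product) / f"{digest}.jpg"


def underfilled(paths: list[PurePosixPath], minimum: int = MIN_IMAGES_PER_PRODUCT) -> dict[str, int]:
    """Return product folders that hold fewer than `minimum` images."""
    counts = Counter(p.parent.name for p in paths)
    return {label: n for label, n in counts.items() if n < minimum}


paths = [
    image_path("data/images", "Red Mug (Large)", "https://example.com/a.jpg"),
    image_path("data/images", "Red Mug (Large)", "https://example.com/b.jpg"),
    image_path("data/images", "Green Bowl", "https://example.com/c.jpg"),
]
short_classes = underfilled(paths, minimum=2)
```

A product with no images at all never appears in these counts, so the result should also be compared against the product list in ``CNN_Model_Train_Data.csv``.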
### Endpoint 2: OCR-Based Query Processing

- *Functionality*: Extract and process handwritten queries using the same logic as Endpoint 1.
- *Input*: Image file with handwritten text.
- *Output*: Same output format as Endpoint 1, adapted for image inputs.

@@ -39,10 +46,12 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 3: CNN Model Development

### Task 6: CNN Model Training

- *Objective*: Develop a CNN model from scratch to identify products from images, using only the ``products`` listed in ``CNN_Model_Train_Data.csv``.
- *Key Actions*: Train the model using scraped images and clean data without using pre-trained models.

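For orientation, the sketch below shows what a single convolutional block computes in its forward pass: convolution, ReLU, then 2x2 max pooling. It is a pure-Python illustration on a toy image with a hand-picked kernel, not a training recipe; in an actual model the kernel values are learned from the scraped images, and a deep learning framework would normally do this work.

```python
Matrix = list[list[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5x5 image: dark on the left two columns, bright on the right three.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

pooled = max_pool_2x2(relu(conv2d_valid(image, kernel)))
```

The pooled map is non-zero only where the edge sits, which is the kind of localized feature response that later layers combine to tell products apart.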
### Endpoint 3: Image-Based Product Detection

- *Functionality*: Use the CNN model to identify products from images and match them using the vector database.
- *Input*: Product image.
- *Output*: Product description and matching products in a format consistent with other endpoints.

@@ -50,12 +59,15 @@ This project is divided into four main modules, each focusing on a distinct aspe
## Module 4: Frontend Development and Integration

### Frontend Page 1: Text Query Interface

- *Features*: Form to submit text queries, display natural language responses, and a product details table.

### Frontend Page 2: Image Query Interface

- *Features*: Allows users to upload images of handwritten queries and displays results similar to Page 1.

### Frontend Page 3: Product Image Upload Interface

- *Features*: Users can upload product images and view the identified product description and related products in natural language and tabular format.

## Instructions for Presentation

@@ -101,12 +113,15 @@ Participants are required to create two sets of videos for each module, detailin
## Instructions for Coding

### General Guidelines

- *Class-Based Implementation*: It is recommended to use class-based implementation for all backend services to ensure organized, reusable, and maintainable code.
- *Best Practices*:
  - *ACID Properties*: Ensure that database transactions are Atomic, Consistent, Isolated, and Durable to maintain data integrity and reliability.
  - *Modularity*: Build the codebase with clear modularity in mind. Separate different functionalities into distinct modules to enhance readability and maintainability.
  - *Packaging*: Organize your code into packages that reflect the services they provide. This approach not only helps in maintaining the code but also simplifies the deployment and scaling process.
  - *Directories*: Whenever you test in a notebook, keep all notebooks in the ``notebook`` directory and give them descriptive names.

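As an illustration of the class-based, modular style recommended above, the sketch below keeps the recommendation logic in one class and passes its dependencies in from outside. All names are hypothetical: `VectorIndex` is an interface invented for this example (it is not the Pinecone client API), and `InMemoryIndex` is a stand-in that makes the service testable without a network connection.

```python
from typing import Callable, Protocol


class VectorIndex(Protocol):
    """Hypothetical interface the service depends on."""

    def query(self, vector: list[float], top_k: int) -> list[dict]: ...


class InMemoryIndex:
    """Stand-in index for tests: ranks stored items by dot product."""

    def __init__(self, items: list[tuple[list[float], dict]]):
        self._items = items

    def query(self, vector: list[float], top_k: int) -> list[dict]:
        ranked = sorted(
            self._items,
            key=lambda item: -sum(a * b for a, b in zip(vector, item[0])),
        )
        return [payload for _, payload in ranked[:top_k]]


class RecommendationService:
    """Owns the recommendation flow; knows nothing about HTTP or storage."""

    def __init__(self, embed: Callable[[str], list[float]], index: VectorIndex, top_k: int = 5):
        self._embed = embed
        self._index = index
        self._top_k = top_k

    def recommend(self, query: str) -> dict:
        matches = self._index.query(self._embed(query), self._top_k)
        return {
            "products": matches,
            "response": f"Found {len(matches)} matching product(s).",
        }


def toy_embed(text: str) -> list[float]:
    # Placeholder embedding, for demonstration only.
    return [1.0, 0.0] if "mug" in text.lower() else [0.0, 1.0]


index = InMemoryIndex([
    ([1.0, 0.0], {"name": "Red Mug"}),
    ([0.0, 1.0], {"name": "Blue Plate"}),
])
service = RecommendationService(toy_embed, index, top_k=1)
result = service.recommend("a mug for coffee")
```

Because the service only depends on the small `VectorIndex` interface, the web layer and the real vector database can each live in their own module and be swapped without touching this class.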
### Tech Stack

- *Web Framework*: Use Flask for developing the backend. Flask provides flexibility and ease of use for setting up API services.
- *Vector Database*: Integrate Pinecone to manage and query vector data efficiently. Pinecone supports scalable vector search, which is crucial for the recommendation system in this project.

Binary file not shown.