DS Task Recycling Project
This project is a Flask API that processes images of motherboards to detect memory modules. It uses computer vision to identify and draw bounding boxes around memory modules present in the input images.
Project Overview
- Input Types:
  - Image upload via the Flask API
  - A hardcoded test image (memory_out19.png) for testing purposes
- Dataset:
  - 20 pictures of motherboards with memory
  - 20 pictures of motherboards without memory
- Output:
  - An annotated image with bounding boxes around each detected memory module
  - For example, if there are two memory modules, two boxes are drawn; if only one is detected, then one box is drawn
- Annotation Tool:
  - makesense.ai was used for manual annotation
Implementation Details
Algorithm Choice & Rationale
- Which algorithm was chosen?
  - YOLOv8 (specifically YOLOv8n - the nano version) was selected for this task
- Why this algorithm?
  - Fast inference speed suitable for real-time applications
  - Good balance between accuracy and computational requirements
  - Built-in support for transfer learning
  - Excellent performance on object detection tasks
  - Easy integration with Python/Flask applications
  - Robust community support and documentation
Hardware Considerations
- CPU/GPU Impact:
  - The current implementation runs on CPU for broader accessibility
  - Training parameters were chosen to keep CPU training manageable:
    - Reduced batch size (8)
    - Lightweight augmentation
    - Early stopping with patience=15
  - GPU support is available through YOLO if needed for scaling
  - Current performance is suitable for the demo nature of the project
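As a rough illustration of the CPU-by-default setup with an optional GPU path, the helper below picks a device string. The name `pick_device` and the `prefer_gpu` flag are hypothetical and not part of the project; the sketch assumes PyTorch is what reports GPU availability and falls back to CPU when it cannot be imported.

```python
def pick_device(prefer_gpu=False):
    """Return "cpu" by default, or "cuda:0" when a GPU is requested and usable."""
    if not prefer_gpu:
        return "cpu"
    try:
        # Imported lazily so CPU-only callers never touch PyTorch here.
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The returned string could then be passed as the `device` argument of an Ultralytics call, for example `model.predict(image, device=pick_device())`.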
Video Processing Approach
- Handling Video Input:
  - While not currently implemented, video processing would involve:
    - Frame extraction
    - Batch processing of frames
    - Real-time detection using YOLO's video processing capabilities
    - Optional frame skipping for performance optimization
  - The current architecture can be extended for video by:
    - Adding a video upload endpoint
    - Implementing frame-by-frame processing
    - Returning annotated video or real-time stream
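Since video support is not implemented, the following is only a sketch of how frame extraction with optional frame skipping might look. It assumes OpenCV (`cv2`) is available; the function names and the default stride of 5 are illustrative choices, not project code.

```python
def sampled_indices(total_frames, stride):
    """Indices of the frames that would be processed when keeping every `stride`-th frame."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, total_frames, stride))


def iter_sampled_frames(video_path, stride=5):
    """Yield (index, frame) pairs for every `stride`-th frame of a video file."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    import cv2  # imported lazily; only needed when a video is actually read

    capture = cv2.VideoCapture(video_path)
    try:
        index = 0
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            if index % stride == 0:
                yield index, frame
            index += 1
    finally:
        capture.release()
```

Each yielded frame is a BGR array that could be passed to the detector in the same way as an uploaded image; a larger stride trades detection coverage for speed.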
API Implementation
Endpoints
- Image Upload (/detect):
  POST /detect
  Content-Type: multipart/form-data
  - Accepts image uploads
  - Returns annotated image with detection boxes
- Test Detection (/detect/test):
  GET /detect/test
  - Uses a hardcoded test image (memory_out19.png)
  - Returns annotated image with detection boxes
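To make the two endpoints concrete, here is a minimal sketch of how they could be wired up with Flask and the Ultralytics YOLO API. It is not the project's actual `run.py`: the `create_app` factory, the weights path `best.pt`, and the form field name `image` are assumptions made for illustration. The thresholds are the ones listed under Processing Workflow.

```python
import io

CONF_THRESHOLD = 0.25  # confidence threshold from the Processing Workflow section
IOU_THRESHOLD = 0.45   # IoU threshold from the Processing Workflow section
TEST_IMAGE = "memory_out19.png"


def create_app(weights_path="best.pt"):  # weights path is an assumption
    # Heavy dependencies are imported here so the module itself stays importable.
    import cv2
    import numpy as np
    from flask import Flask, abort, request, send_file
    from ultralytics import YOLO

    app = Flask(__name__)
    model = YOLO(weights_path)

    def annotate(image_bgr):
        """Run detection and return the annotated image as a PNG response."""
        results = model.predict(
            image_bgr, conf=CONF_THRESHOLD, iou=IOU_THRESHOLD, verbose=False
        )
        annotated = results[0].plot()  # BGR array with boxes drawn on it
        ok, buffer = cv2.imencode(".png", annotated)
        if not ok:
            abort(500, "could not encode annotated image")
        return send_file(io.BytesIO(buffer.tobytes()), mimetype="image/png")

    @app.post("/detect")
    def detect():
        upload = request.files.get("image")  # field name is an assumption
        if upload is None:
            abort(400, "no image uploaded")
        data = np.frombuffer(upload.read(), dtype=np.uint8)
        image = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if image is None:
            abort(400, "uploaded file is not a readable image")
        return annotate(image)

    @app.get("/detect/test")
    def detect_test():
        image = cv2.imread(TEST_IMAGE)
        if image is None:
            abort(500, "test image not found")
        return annotate(image)

    return app
```

Under these assumptions the app could be started with `create_app().run(port=5000)`, and an upload could be sent with any multipart client, for example `curl -F image=@board.png http://localhost:5000/detect -o out.png`.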
Processing Workflow
- Image Reception:
  - Via file upload or hardcoded test image
- Detection:
  - YOLOv8 processes the image
  - Confidence threshold: 0.25
  - IoU threshold: 0.45
- Annotation:
  - Bounding boxes drawn around detected modules
- Response:
  - Annotated image returned in PNG format
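The two thresholds in the detection step play different roles: the confidence threshold discards weak detections, and the IoU threshold controls how much two boxes may overlap before the lower-scoring one is suppressed. YOLO applies this internally; the pure-Python sketch below is only a simplified greedy non-maximum suppression to show the idea, with hypothetical function names and boxes given as (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_threshold=0.25, iou_threshold=0.45):
    """Keep confident boxes, dropping any that overlap a higher-scoring kept box."""
    candidates = sorted(
        (d for d in detections if d[1] >= conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

With the defaults shown, two heavily overlapping boxes on the same module collapse to one, while boxes on separate modules are both kept, which matches the "one box per detected module" output described earlier.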
Model Training
The model was trained with the following parameters:
- 50 epochs
- Image size: 640x640
- Batch size: 8
- Early stopping patience: 15
- Augmentations:
- Rotation (±5°)
- Scale (0.5)
- Translation (0.1)
- Horizontal flip (0.5)
- Mosaic (1.0)
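For orientation, the parameters above could be passed to the Ultralytics training API roughly as follows. This is a sketch rather than the project's `train.py`: the mapping of each augmentation to an Ultralytics argument (`degrees`, `scale`, `translate`, `fliplr`, `mosaic`), the starting weights `yolov8n.pt`, and the CPU device are assumptions based on the descriptions in this README.

```python
TRAIN_ARGS = {
    "data": "dataset.yaml",
    "epochs": 50,
    "imgsz": 640,
    "batch": 8,
    "patience": 15,    # early stopping patience
    "device": "cpu",   # assumption: CPU training, as described under Hardware Considerations
    "degrees": 5.0,    # rotation of up to +/- 5 degrees
    "scale": 0.5,
    "translate": 0.1,
    "fliplr": 0.5,     # probability of a horizontal flip
    "mosaic": 1.0,
}


def train(weights="yolov8n.pt"):
    """Fine-tune pretrained YOLOv8n weights on the memory-module dataset."""
    from ultralytics import YOLO  # imported lazily; only needed when training runs

    model = YOLO(weights)
    return model.train(**TRAIN_ARGS)
```

Calling `train()` would start a run with these settings; keeping the arguments in one dictionary makes it easy to compare them against the list above.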
Dataset Preparation
training/
├── memory/
│   └── (images with memory modules)       # You have this
├── no_memory/
│   └── (images without memory modules)    # You have this as well
├── train/
│   ├── images/
│   │   ├── memory_*.png
│   │   └── no_memory_*.png
│   └── labels/
│       ├── memory_*.txt
│       └── no_memory_*.txt
└── val/
    ├── images/
    │   ├── memory_*.png
    │   └── no_memory_*.png
    └── labels/
        ├── memory_*.txt
        └── no_memory_*.txt
dataset.yaml
The dataset is organized as follows:
- training/memory/: Source directory for images with memory modules
- training/no_memory/: Source directory for images without memory modules
- training/train/: Training dataset
  - images/: Contains both memory and no-memory images with appropriate prefixes
  - labels/: Contains YOLO format annotation files
- training/val/: Validation dataset
  - images/: Contains both memory and no-memory images with appropriate prefixes
  - labels/: Contains YOLO format annotation files
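The layout above can be produced by a small script along these lines. This is a sketch, not the project's `prepare_dataset.py`: the 80/20 split, the fixed seed, the numbered output names, and the assumption that each annotation file sits next to its source image with the same stem are all illustrative. Images without an annotation file get an empty label file, which YOLO treats as a background image.

```python
import random
import shutil
from pathlib import Path


def split_dataset(root="training", val_fraction=0.2, seed=0):
    """Copy source images into train/ and val/ with class prefixes; return counts."""
    root = Path(root)
    counts = {"train": 0, "val": 0}
    for prefix in ("memory", "no_memory"):
        images = sorted((root / prefix).glob("*.png"))
        random.Random(seed).shuffle(images)  # deterministic shuffle per class
        n_val = int(len(images) * val_fraction)
        for i, src in enumerate(images):
            split = "val" if i < n_val else "train"
            image_dir = root / split / "images"
            label_dir = root / split / "labels"
            image_dir.mkdir(parents=True, exist_ok=True)
            label_dir.mkdir(parents=True, exist_ok=True)

            name = f"{prefix}_{i}"
            shutil.copy(src, image_dir / f"{name}.png")

            src_label = src.with_suffix(".txt")  # assumed location of the annotation
            if src_label.exists():
                shutil.copy(src_label, label_dir / f"{name}.txt")
            else:
                (label_dir / f"{name}.txt").write_text("")  # background image
            counts[split] += 1
    return counts
```

Splitting each class separately keeps the ratio of memory to no-memory images the same in both the training and validation sets.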
The dataset.yaml file contains:
path: training # dataset root dir
train: train/images # train images
val: val/images # validation images
nc: 1 # number of classes
names: ['memory_module'] # class names
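With a single class (nc: 1), every line in a label file starts with class index 0, followed by the box center and size normalized to the image dimensions. The helpers below sketch the conversion between pixel coordinates and that format; the function names are hypothetical and are not part of the project.

```python
def to_yolo_line(box, image_width, image_height, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def from_yolo_line(line, image_width, image_height):
    """Convert a YOLO label line back to (class_id, pixel box)."""
    class_id, x_center, y_center, width, height = line.split()
    x_center, y_center = float(x_center), float(y_center)
    width, height = float(width), float(height)
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return int(class_id), (x_min, y_min, x_max, y_max)
```

For example, a box from (100, 200) to (300, 400) in a 640x480 image becomes `0 0.312500 0.625000 0.312500 0.416667`.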
Getting Started
- Clone the repository:
  git clone http://23.29.118.76:3000/michael/ds_task_recycling_project.git
- Install dependencies:
  pip install -r requirements.txt
- Prepare the dataset:
  python prepare_dataset.py
- Train the model (if not already trained):
  python train.py
- Run the Flask application:
  python run.py
- Access the web interface at http://localhost:5000
Testing
The project includes tests for the detector, covering:
- Batch detection testing
- Threshold optimization
- Various confidence/IoU threshold combinations
Run tests with:
pytest tests/
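As an illustration of what trying various confidence/IoU combinations can look like, the helper below sweeps a grid of thresholds and counts the boxes returned for each pair. It is a sketch under assumptions: `sweep_thresholds` is a hypothetical name, and `detect` stands for any callable that accepts an image plus `conf` and `iou` keyword arguments and returns a list of boxes; the project's actual test code may be organized differently.

```python
import itertools


def sweep_thresholds(detect, images, conf_values, iou_values):
    """Return {(conf, iou): total number of boxes detected across all images}."""
    totals = {}
    for conf, iou in itertools.product(conf_values, iou_values):
        totals[(conf, iou)] = sum(
            len(detect(image, conf=conf, iou=iou)) for image in images
        )
    return totals
```

Comparing the totals against the known number of memory modules in the test images gives a simple way to choose a threshold pair.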
Future Improvements
- GPU support for faster processing
- Video input support
- Real-time streaming capabilities
- More sophisticated augmentation techniques
- Model quantization for improved CPU performance