# DS Task Recycling Project

This project is a Flask API that processes images of motherboards to detect memory modules. It uses computer vision to identify and draw bounding boxes around memory modules present in the input images.

## Project Overview

- **Input Types:**
  - Image upload via the Flask API
  - A hardcoded test image (`memory_out19.png`) for testing purposes

- **Dataset:**
  - 20 pictures of motherboards with memory
  - 20 pictures of motherboards without memory

- **Output:**
  - An annotated image with bounding boxes around each detected memory module
  - For example, if there are two memory modules, two boxes are drawn; if only one is detected, then one box is drawn

- **Annotation Tool:**
  - [makesense.ai](https://www.makesense.ai/) was used for manual annotation

## Implementation Details

### Algorithm Choice & Rationale

1. **Which algorithm was chosen?**
   - YOLOv8 (specifically YOLOv8n, the nano version) was selected for this task

2. **Why this algorithm?**
   - Fast inference speed suitable for real-time applications
   - Good balance between accuracy and computational requirements
   - Built-in support for transfer learning
   - Excellent performance on object detection tasks
   - Easy integration with Python/Flask applications
   - Robust community support and documentation

### Hardware Considerations

3. **CPU/GPU Impact:**
   - The current implementation runs on CPU for broader accessibility
   - Training settings were chosen with CPU execution in mind:
     - Reduced batch size (8)
     - Lightweight augmentation
     - Early stopping with patience=15
   - GPU support is available through YOLO if needed for scaling
   - Current CPU performance is adequate for a demo project

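As a rough illustration of the CPU-first choice, the sketch below picks a device string that could be passed to a YOLO call through its `device` argument. The helper name `pick_device` is hypothetical and not part of this project; it falls back to CPU whenever PyTorch or a CUDA device is unavailable.

```python
def pick_device(prefer_gpu: bool = False) -> str:
    """Return "cuda:0" when a GPU is requested and usable, otherwise "cpu".

    Hypothetical helper for illustration; the project itself runs on CPU.
    """
    if not prefer_gpu:
        return "cpu"
    try:
        # Imported lazily so the helper still works where torch is missing.
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


print(pick_device())  # cpu
```

Keeping the default at `"cpu"` matches the current setup; opting into a GPU stays a one-argument change.
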
### Video Processing Approach

4. **Handling Video Input:**
   - While not currently implemented, video processing would involve:
     - Frame extraction
     - Batch processing of frames
     - Real-time detection using YOLO's video processing capabilities
     - Optional frame skipping for performance optimization
   - The current architecture can be extended for video by:
     - Adding a video upload endpoint
     - Implementing frame-by-frame processing
     - Returning an annotated video or a real-time stream

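Since video support is not implemented, the following is only a sketch of how frame skipping and batching could be combined. The function names `skip_frames` and `batched` are hypothetical, and integers stand in for decoded frames so the example runs without any video library.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def skip_frames(frames: Iterable[T], stride: int) -> Iterator[T]:
    """Keep every `stride`-th frame, starting with the first one."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return islice(frames, 0, None, stride)


def batched(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group frames into lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Integers stand in for decoded video frames.
batches = list(batched(skip_frames(range(10), stride=3), batch_size=2))
print(batches)  # [[0, 3], [6, 9]]
```

Each batch could then be handed to the detector in one call, which trades a little latency for fewer model invocations.
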
## API Implementation

### Endpoints

1. **Image Upload (`/detect`):**
   ```http
   POST /detect
   Content-Type: multipart/form-data
   ```
   - Accepts image uploads
   - Returns annotated image with detection boxes

2. **Test Detection (`/detect/test`):**
   ```http
   GET /detect/test
   ```
   - Uses a hardcoded test image (`memory_out19.png`)
   - Returns annotated image with detection boxes

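A client for the upload endpoint might look like the sketch below, which uses only the Python standard library. The form field name `image` is an assumption (the README does not state which field the API expects), and `build_multipart` and `detect` are hypothetical helper names.

```python
import os
import uuid
from typing import Tuple
from urllib import request


def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "image/png") -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def detect(image_path: str, out_path: str,
           url: str = "http://localhost:5000/detect") -> None:
    """POST an image to /detect and save the annotated PNG that comes back."""
    with open(image_path, "rb") as fh:
        # "image" is an assumed field name; adjust it to match the API.
        body, header = build_multipart("image", os.path.basename(image_path), fh.read())
    req = request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

With the Flask app running locally, `detect("board.png", "annotated.png")` would upload `board.png` and write the response to `annotated.png`.
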
### Processing Workflow

1. Image Reception:
   - Via file upload or hardcoded test image
2. Detection:
   - YOLOv8 processes the image
   - Confidence threshold: 0.25
   - IoU threshold: 0.45 (used for non-maximum suppression)
3. Annotation:
   - Bounding boxes drawn around detected modules
4. Response:
   - Annotated image returned in PNG format

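To make the two thresholds concrete, here is a plain-Python sketch of confidence filtering followed by greedy non-maximum suppression. YOLO performs this step internally, so this is not the project's code; `iou` and `filter_detections` are illustrative names and the boxes are made-up values.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: List[Tuple[Box, float]],
                      conf: float = 0.25,
                      iou_thr: float = 0.45) -> List[Tuple[Box, float]]:
    """Drop low-confidence boxes, then suppress overlapping duplicates."""
    candidates = sorted((d for d in dets if d[1] >= conf),
                        key=lambda d: d[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a stronger kept box too much.
        if all(iou(box, kept_box) <= iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((10, 10, 110, 60), 0.90),
    ((12, 12, 112, 62), 0.60),     # near-duplicate of the first box
    ((200, 10, 300, 60), 0.80),
    ((400, 400, 420, 420), 0.10),  # below the confidence threshold
]
print(len(filter_detections(detections)))  # 2
```

Raising the confidence threshold removes weak detections; lowering the IoU threshold makes duplicate suppression more aggressive.
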
## Model Training

The model was trained with the following parameters:
- Up to 50 epochs
- Image size: 640x640
- Batch size: 8
- Early stopping patience: 15 epochs
- Augmentations:
  - Rotation (±5°)
  - Scale (gain 0.5)
  - Translation (fraction 0.1)
  - Horizontal flip (probability 0.5)
  - Mosaic (probability 1.0)

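The parameters above map onto Ultralytics training arguments roughly as sketched below. This is not a copy of the project's `train.py`; the helper names, the `yolov8n.pt` starting weights, and the `dataset.yaml` path are assumptions made for illustration.

```python
def build_train_args(device: str = "cpu") -> dict:
    """Collect the training settings listed above as keyword arguments."""
    return {
        "data": "dataset.yaml",  # assumed location of the dataset config
        "epochs": 50,
        "imgsz": 640,
        "batch": 8,
        "patience": 15,          # early stopping patience, in epochs
        "degrees": 5.0,          # rotation range, +/- degrees
        "scale": 0.5,            # scale gain
        "translate": 0.1,        # translation, fraction of image size
        "fliplr": 0.5,           # horizontal flip probability
        "mosaic": 1.0,           # mosaic probability
        "device": device,
    }


def train(weights: str = "yolov8n.pt"):
    """Fine-tune a pretrained YOLOv8n model with the settings above."""
    # Imported lazily; requires the `ultralytics` package to be installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args())
```

Calling `train()` starts a CPU run; passing a different `device` to `build_train_args` is the only change needed to try a GPU.
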
## Dataset Preparation

```bash
training/
├── memory/
│   └── (images with memory modules)  # You have this
├── no_memory/
│   └── (images without memory modules)  # You have this as well
├── train/
│   ├── images/
│   │   ├── memory_*.png
│   │   └── no_memory_*.png
│   └── labels/
│       ├── memory_*.txt
│       └── no_memory_*.txt
└── val/
    ├── images/
    │   ├── memory_*.png
    │   └── no_memory_*.png
    └── labels/
        ├── memory_*.txt
        └── no_memory_*.txt

dataset.yaml
```

The dataset is organized as follows:
- `training/memory/`: Source directory for images with memory modules
- `training/no_memory/`: Source directory for images without memory modules
- `training/train/`: Training dataset
  - `images/`: Contains both memory and no-memory images, distinguished by the `memory_` and `no_memory_` filename prefixes
  - `labels/`: Contains YOLO format annotation files
- `training/val/`: Validation dataset
  - `images/`: Contains both memory and no-memory images, distinguished by the `memory_` and `no_memory_` filename prefixes
  - `labels/`: Contains YOLO format annotation files

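A YOLO format annotation file holds one line per box: the class index followed by the box centre, width, and height, all normalized to the image size. The sketch below converts a pixel-space box into such a line; `to_yolo_line` is a hypothetical helper and the numbers are made up.

```python
from typing import Tuple


def to_yolo_line(class_id: int,
                 box: Tuple[float, float, float, float],
                 img_w: int,
                 img_h: int) -> str:
    """Convert a pixel box (x1, y1, x2, y2) into a YOLO label line."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Class 0 is `memory_module`, the only class in dataset.yaml.
print(to_yolo_line(0, (160, 120, 480, 360), 640, 480))
# 0 0.500000 0.500000 0.500000 0.500000
```
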
The `dataset.yaml` file contains:
```yaml
path: training  # dataset root dir
train: train/images  # train images
val: val/images  # validation images
nc: 1  # number of classes
names: ['memory_module']  # class names
```

## Getting Started

1. Clone the repository:
   ```bash
   git clone http://23.29.118.76:3000/michael/ds_task_recycling_project.git
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Prepare the dataset:
   ```bash
   python prepare_dataset.py
   ```
4. Train the model (if not already trained):
   ```bash
   python train.py
   ```
5. Run the Flask application:
   ```bash
   python run.py
   ```
6. Access the web interface at `http://localhost:5000`

## Testing

The project includes tests for the detector that cover:
- Batch detection testing
- Threshold optimization
- Various confidence/IoU threshold combinations

Run tests with:
```bash
pytest tests/
```

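Threshold optimization over confidence/IoU combinations can be pictured as a small grid search. The sketch below is not taken from the project's test suite: `best_thresholds` is a hypothetical helper, and `toy_score` is a stand-in for a real evaluation of the detector on validation images.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def best_thresholds(conf_values: Iterable[float],
                    iou_values: Iterable[float],
                    score_fn: Callable[[float, float], float]) -> Tuple[float, float, float]:
    """Return (score, conf, iou) for the best-scoring threshold pair."""
    return max((score_fn(c, i), c, i) for c, i in product(conf_values, iou_values))


def toy_score(conf: float, iou: float) -> float:
    """Stand-in metric that peaks at conf=0.25, iou=0.45."""
    return -abs(conf - 0.25) - abs(iou - 0.45)


score, conf, iou_thr = best_thresholds([0.15, 0.25, 0.35], [0.35, 0.45, 0.55], toy_score)
print(conf, iou_thr)  # 0.25 0.45
```

In a real run, `score_fn` would evaluate the detector with the given thresholds and return a metric such as precision or recall on the validation set.
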
## Future Improvements

1. GPU support for faster processing
2. Video input support
3. Real-time streaming capabilities
4. More sophisticated augmentation techniques
5. Model quantization for improved CPU performance