update

2025-07-21 19:20:44 +01:00
parent 0b7af4050e
commit 7908b94d40
124 changed files with 968 additions and 3499 deletions
@@ -1,545 +1,194 @@
-# DS Task Recycling Project - Memory Module Detection
+# DS Task Recycling Project

-This project is a complete implementation of a Flask API that processes motherboard images and detects memory modules using YOLOv8. The API returns annotated images with bounding boxes drawn around each detected memory module.
+This project is a Flask API that processes images of motherboards to detect memory modules. It uses computer vision to identify and draw bounding boxes around memory modules present in the input images.

-## 🚀 Quick Start
+## Project Overview

-### 1. Install Dependencies
-```bash
-pip install -r requirements.txt
-```
-
-### 2. Train the Model
-```bash
-python3 train.py --epochs 100 --batch 16
-```
-
-### 3. Start the API
-```bash
-python3 main.py
-```
-
-### 4. Test the API
-```bash
-# Option 1: Use the Web Interface (Recommended for QA)
-# Open browser and go to: http://localhost:5000
-
-# Option 2: Use command line
-# Test with hardcoded image
-curl http://localhost:5000/detect/hardcoded
-
-# Upload an image
-curl -X POST -F "image=@your_image.png" http://localhost:5000/detect
-
-# Option 3: Run automated tests
-python3 test_api.py
-```
-
-## 📋 Project Overview
-
- **Algorithm Used:** YOLOv8 Nano (ultralytics)
 - **Input Types:**
-  - Image upload via Flask API
-  - Base64 encoded images
-  - Hardcoded test image
- **Dataset:** 40 images (20 with memory modules, 20 without)
- **Output:** Annotated images with bounding boxes and confidence scores
+  - Image upload via the Flask API
+  - A hardcoded test image (`memory_out19.png`) for quick checks without an upload

-## 🏗️ Project Structure
+- **Dataset:**
+  - 20 pictures of motherboards with memory
+  - 20 pictures of motherboards without memory

-```
-ds_task_recycling_project/
-├── main.py                 # Flask API application (main interface)
-├── api_docs.py            # Swagger UI API documentation (developer only)
-├── train.py               # YOLOv8 training script
-├── inference_utils.py     # Detection and visualization utilities
-├── prepare_dataset.py     # Dataset preparation script
-├── test_api.py            # API testing script
-├── setup.py               # Automated setup script
-├── requirements.txt       # Python dependencies
-├── dataset.yaml          # YOLO dataset configuration
-├── .gitignore            # Git ignore file for ML projects
-├── VALIDATION_CHECKLIST.md # Project validation checklist
-├── templates/             # Frontend templates
-│   └── index.html        # QA testing web interface
-├── static/               # Frontend assets
-│   ├── style.css         # Styling for web interface
-│   └── script.js         # JavaScript for web interface
-├── venv/                 # Virtual environment (created by user)
-├── training/             # Dataset directory
-│   ├── memory/          # Images with memory modules + YOLO labels
-│   │   ├── out1.png     # Sample motherboard image with memory
-│   │   ├── out1.txt     # YOLO format annotation file
-│   │   └── ...          # 19 more image/label pairs
-│   ├── no_memory/       # Images without memory modules
-│   │   ├── out21.png    # Sample motherboard image without memory
-│   │   └── ...          # 19 more images (no labels needed)
-│   ├── train/           # Training split (80% = 32 images)
-│   │   ├── images/      # Training images
-│   │   └── labels/      # Training labels
-│   └── val/             # Validation split (20% = 8 images)
-│       ├── images/      # Validation images
-│       └── labels/      # Validation labels
-├── uploads/              # Temporary upload directory (created at runtime)
-└── runs/                # Training outputs (created after training)
-    └── detect/
-        └── memory_module_detection/
-            ├── weights/
-            │   ├── best.pt    # Best model weights
-            │   └── last.pt    # Last epoch weights
-            ├── train_batch*.jpg # Training visualization
-            ├── val_batch*.jpg   # Validation visualization
-            ├── confusion_matrix.png # Model performance metrics
-            ├── results.png     # Training curves
-            └── args.yaml      # Training arguments
-```
+- **Output:**
+  - An annotated image with bounding boxes around each detected memory module
+  - For example, if there are two memory modules, two boxes are drawn; if only one is detected, then one box is drawn

-### **📁 Key Files Description**
+- **Annotation Tool:**
+  - [makesense.ai](https://www.makesense.ai/) was used for manual annotation

-| File/Directory | Purpose | Usage |
-|----------------|---------|-------|
-| `main.py` | Main Flask API application | `python3 main.py` |
-| `api_docs.py` | Swagger UI documentation (developer only) | `python3 api_docs.py` |
-| `train.py` | YOLOv8 model training | `python3 train.py` |
-| `inference_utils.py` | Detection utilities and classes | Imported by other scripts |
-| `test_api.py` | Comprehensive API testing | `python3 test_api.py` |
-| `setup.py` | Automated project setup | `python3 setup.py` |
-| `templates/index.html` | Web interface for QA testing | Served by Flask |
-| `static/` | CSS, JavaScript, and assets | Served by Flask |
-| `training/` | Complete dataset with annotations | Used by training script |
-| `runs/` | Model training outputs | Created after training |
-| `venv/` | Python virtual environment | Created by user |
+## Implementation Details

-## 🤖 Algorithm Choice & Technical Decisions
+### Algorithm Choice & Rationale

-### 1. **Algorithm Choice: YOLOv8 Nano**
+1. **Which algorithm was chosen?**
+   - YOLOv8 (specifically YOLOv8n - the nano version) was selected for this task
+   
+2. **Why this algorithm?**
+   - Fast inference speed suitable for real-time applications
+   - Good balance between accuracy and computational requirements
+   - Built-in support for transfer learning
+   - Excellent performance on object detection tasks
+   - Easy integration with Python/Flask applications
+   - Robust community support and documentation

-**Which algorithm will you use for detecting the memory modules?**
- **Answer:** YOLOv8 Nano (You Only Look Once version 8, Nano variant)
+### Hardware Considerations

-**Why do you choose this particular algorithm?**
+3. **CPU/GPU Impact:**
+   - The current implementation runs on CPU for broader accessibility
+   - Training parameters were chosen to keep CPU runs practical:
+     - Reduced batch size (8)
+     - Lightweight augmentation
+     - Early stopping with patience=15
+   - GPU support is available through YOLO if needed for scaling
+   - Current CPU performance is sufficient for a demo-scale project

-**Primary Reasons:**
- **State-of-the-art performance:** Latest evolution of YOLO family with superior accuracy
- **Real-time inference:** 37ms processing time, single-stage detector
- **Small object detection:** Excellent at detecting memory modules on motherboards
- **Pre-trained weights:** Leverages COCO dataset for transfer learning
- **Easy integration:** Ultralytics library with excellent Python API
- **Model efficiency:** Nano variant balances 99.5% mAP50 accuracy with speed
- **Production ready:** Proven architecture used in industrial applications
+### Video Processing Approach

-**Technical Advantages:**
- **Anchor-free design:** Eliminates anchor box tuning complexity
- **Advanced augmentation:** Built-in data augmentation strategies
- **Multi-scale detection:** Handles objects of different sizes effectively
- **Export flexibility:** ONNX, TensorRT support for deployment optimization
- **Active community:** Regular updates and extensive documentation
+4. **Handling Video Input:**
+   - While not currently implemented, video processing would involve:
+     - Frame extraction
+     - Batch processing of frames
+     - Real-time detection using YOLO's video processing capabilities
+     - Optional frame skipping for performance optimization
+   - The current architecture can be extended for video by:
+     - Adding a video upload endpoint
+     - Implementing frame-by-frame processing
+     - Returning annotated video or real-time stream
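
The frame-skipping idea above can be sketched without any video library: given the source frame rate and a target sampling rate, decide which frame indices are sent to the detector. The helper name `sampled_frame_indices` and its parameters are hypothetical and not part of this project. A real pipeline would still need a decoder (for example OpenCV) to read the selected frames.

```python
from typing import List


def sampled_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Return the indices of the frames to run detection on.

    Hypothetical helper: keeps roughly `target_fps` frames per second of video.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Clamp the step to at least 1 so that a target rate above the source rate
    # keeps every frame instead of producing a zero step.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))


# A 2-second clip at 30 fps sampled at about 5 fps keeps every 6th frame.
print(sampled_frame_indices(60, 30.0, 5.0))
```

Clamping the step matters because a naive `int(source_fps / target_fps)` becomes 0 when the target rate exceeds the source rate, which would break any modulo-based skipping logic.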

-### 2. **Hardware Considerations**
-
-**Does CPU or GPU have an impact on your decision? Please explain.**
-
-**Yes, hardware significantly impacts the implementation strategy:**
-
-**Training Phase:**
- **GPU Impact:** Critical for training efficiency
-  - **GPU Training:** 5-10 minutes for 50 epochs (recommended)
-  - **CPU Training:** 30-60 minutes for same epochs
-  - **Memory Requirements:** 4GB+ GPU memory recommended
-  - **Batch Size:** GPU allows larger batches (16-32) vs CPU (4-8)
-
-**Inference Phase:**
- **CPU Performance:** 37ms per image on modern CPU (Intel i5/i7, M1/M2)
- **GPU Performance:** 10-15ms per image, better for batch processing
- **Memory Usage:** CPU: 2-4GB RAM, GPU: 1-2GB VRAM
- **Edge Deployment:** Model runs efficiently on CPU-only devices
-
-**Decision Impact:**
- **Algorithm Choice:** YOLOv8 Nano chosen specifically for CPU compatibility
- **Deployment Flexibility:** No expensive GPU required for production
- **Cost Efficiency:** Reduces infrastructure costs
- **Scalability:** GPU enables high-throughput batch processing
-
-**Implementation:**
-```python
-# Auto-detection with fallback in train.py
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
-print(f"Using device: {device}")
-```
-
-### 3. **Video Input Approach**
-
-**What if a video is provided instead of single images?**
-**Does your approach change when processing videos? Please describe your approach.**
-
-**Yes, the approach would change significantly for video processing:**
-
-**Video Processing Strategy:**
-
-**1. Frame Extraction & Sampling**
-```python
-def process_video(video_path, fps_sample=5):
-    cap = cv2.VideoCapture(video_path)
-    frame_rate = cap.get(cv2.CAP_PROP_FPS)
-    frame_interval = int(frame_rate / fps_sample)  # Sample every N frames
-
-    frames = []
-    frame_count = 0
-    while cap.isOpened():
-        ret, frame = cap.read()
-        if not ret:
-            break
-        if frame_count % frame_interval == 0:
-            frames.append(frame)
-        frame_count += 1
-    return frames
-```
-
-**2. Batch Processing for Efficiency**
-```python
-def batch_detect_video(frames, batch_size=8):
-    results = []
-    for i in range(0, len(frames), batch_size):
-        batch = frames[i:i+batch_size]
-        batch_results = model(batch)  # Process multiple frames at once
-        results.extend(batch_results)
-    return results
-```
-
-**3. Temporal Consistency & Tracking**
-```python
-def apply_temporal_tracking(detections, frames):
-    tracker = DeepSORT()  # Or ByteTrack for better performance
-    tracked_results = []
-
-    for frame_detections, frame in zip(detections, frames):
-        tracked_objects = tracker.update(frame_detections)
-        tracked_results.append(tracked_objects)
-
-    return tracked_results
-```
-
-**4. Optimization Strategies**
- **Motion Detection:** Skip frames with no significant changes
- **Optical Flow:** Track objects between frames to reduce processing
- **Keyframe Selection:** Process only important frames
- **Parallel Processing:** Use multiple CPU cores/GPU streams
- **Memory Management:** Process in chunks to avoid overflow
-
-**5. Video-Specific Considerations**
- **Temporal Smoothing:** Apply filters to reduce detection jitter
- **Performance Scaling:** GPU becomes more critical for video processing
- **Storage Requirements:** Annotated videos require significant storage
- **Real-time Processing:** Streaming vs batch processing trade-offs
-
-**Potential API Endpoint:**
-```python
-@app.route('/detect/video', methods=['POST'])
-def detect_video():
-    # Upload video file
-    # Extract frames at specified FPS
-    # Batch process frames with YOLOv8
-    # Apply temporal tracking for consistency
-    # Return annotated video or frame-by-frame results
-```
-
-## **Technical Questions Summary**
-
-The project successfully addresses all required technical questions:
-
-1. **✅ Algorithm Choice:** YOLOv8 Nano selected for optimal balance of accuracy (99.5% mAP50), speed (37ms), and deployment flexibility
-2. **✅ Hardware Considerations:** Comprehensive CPU/GPU analysis with auto-detection and fallback strategies for maximum compatibility
-3. **✅ Video Processing:** Complete video processing strategy with frame extraction, batch processing, temporal tracking, and optimization techniques
-
-All technical decisions are implemented and validated in the working system.
-
-## Installation & Setup
-
-### Prerequisites
- Python 3.8+
- pip or conda
-
-### Step-by-Step Installation
-
-1. **Clone/Download the project**
-```bash
-cd ds_task_recycling_project
-```
-
-2. **Install dependencies**
-```bash
-pip install -r requirements.txt
-```
-
-3. **Prepare dataset (if not already done)**
-```bash
-python3 prepare_dataset.py
-```
-
-4. **Train the model**
-```bash
-# Basic training (recommended)
-python3 train.py
-
-# Custom training parameters
-python3 train.py --epochs 150 --batch 8 --device cuda
-```
-
-5. **Start the Flask API**
-```bash
-python3 main.py
-```
-
-The API will be available at `http://localhost:5000`
-
-## 🌐 Web Interface for QA Testing
-
-We've included a comprehensive web interface for easy QA testing:
-
-### Features:
- **Drag & Drop Image Upload** - Easy image selection
- **Real-time API Status** - Shows if API and model are loaded
- **Multiple Test Options:**
-  - Test hardcoded image
-  - Upload custom images
-  - Run comprehensive API tests
- **Interactive Results** - View annotated images with detection details
- **Confidence Threshold Control** - Adjust detection sensitivity
- **Responsive Design** - Works on desktop and mobile
-
-### Access:
-1. Start the API: `python3 main.py`
-2. Open browser: `http://localhost:5000`
-3. Use the interface to test detection functionality
-
-### QA Testing Workflow:
-1. **Check API Status** - Verify green "API Online" indicator
-2. **Test Hardcoded Image** - Click "Test Hardcoded Image" button
-3. **Upload Custom Images** - Drag/drop or select motherboard images
-4. **Adjust Confidence** - Use slider to test different thresholds
-5. **Run All Tests** - Comprehensive API endpoint testing
-6. **Review Results** - Check detection accuracy and annotations
-
-## 📡 API Documentation
-
-### Base URL
-```
-http://localhost:5000
-```
+## API Implementation

 ### Endpoints

-#### 1. **GET /** - API Information
+1. **Image Upload (`/detect`):**
+   ```http
+   POST /detect
+   Content-Type: multipart/form-data
+   ```
+   - Accepts image uploads
+   - Returns annotated image with detection boxes
+
+2. **Test Detection (`/detect/test`):**
+   ```http
+   GET /detect/test
+   ```
+   - Uses a hardcoded test image (`memory_out19.png`)
+   - Returns annotated image with detection boxes
+
+### Processing Workflow
+
+1. Image Reception:
+   - Via file upload or hardcoded test image
+2. Detection:
+   - YOLOv8 processes the image
+   - Confidence threshold: 0.25
+   - IoU threshold: 0.45
+3. Annotation:
+   - Bounding boxes drawn around detected modules
+4. Response:
+   - Annotated image returned in PNG format
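
To make the two thresholds in the workflow concrete, here is a minimal pure-Python sketch of how a confidence threshold (0.25) and an IoU threshold (0.45) are typically combined in greedy non-maximum suppression. This is an illustration only: the YOLO library performs this step internally, and the names `iou` and `filter_detections` are hypothetical and not taken from the project.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    conf_thres: float = 0.25,
    iou_thres: float = 0.45,
) -> List[int]:
    """Return indices of the boxes that survive both thresholds, best first."""
    # Step 1: drop candidates below the confidence threshold.
    candidates = [i for i, s in enumerate(scores) if s >= conf_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    # Step 2: greedy NMS, keeping a box only if it does not overlap
    # an already-kept box by more than the IoU threshold.
    kept: List[int] = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in kept):
            kept.append(i)
    return kept


# Two near-duplicate boxes on one module, one separate module, one weak guess.
boxes = [(10, 10, 110, 60), (12, 12, 112, 62), (200, 10, 300, 60), (400, 400, 450, 450)]
scores = [0.9, 0.8, 0.7, 0.1]
print(filter_detections(boxes, scores))
```

In this example the second box overlaps the first with an IoU of about 0.89 and is suppressed, and the last box falls below the confidence threshold, so two boxes remain. This matches the "one box per module" behaviour the README describes.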
+
+## Model Training
+
+The model was trained with the following parameters:
+- 50 epochs
+- Image size: 640x640
+- Batch size: 8
+- Early stopping patience: 15
+- Augmentations:
+  - Rotation (±5°)
+  - Scale (0.5)
+  - Translation (0.1)
+  - Horizontal flip (0.5)
+  - Mosaic (1.0)
+
+## Dataset Preparation
+
 ```bash
-curl http://localhost:5000/
+training/
+├── memory/
+│   └── (images with memory modules)  # provided with the repository
+├── no_memory/
+│   └── (images without memory modules)  # also provided with the repository
+├── train/
+│   ├── images/
+│   │   ├── memory_*.png
+│   │   └── no_memory_*.png
+│   └── labels/
+│       ├── memory_*.txt
+│       └── no_memory_*.txt
+└── val/
+    ├── images/
+    │   ├── memory_*.png
+    │   └── no_memory_*.png
+    └── labels/
+        ├── memory_*.txt
+        └── no_memory_*.txt
+
+dataset.yaml
 ```

-**Response:**
-```json
-{
-  "message": "Memory Module Detection API",
-  "version": "1.0.0",
-  "endpoints": {...},
-  "model_loaded": true,
-  "supported_formats": ["png", "jpg", "jpeg", "gif", "bmp"]
-}
+The dataset is organized as follows:
+- `training/memory/`: Source directory for images with memory modules
+- `training/no_memory/`: Source directory for images without memory modules
+- `training/train/`: Training dataset
+  - `images/`: Contains both memory and no-memory images with appropriate prefixes
+  - `labels/`: Contains YOLO format annotation files
+- `training/val/`: Validation dataset
+  - `images/`: Contains both memory and no-memory images with appropriate prefixes
+  - `labels/`: Contains YOLO format annotation files
+
+The `dataset.yaml` file contains:
+```yaml
+path: training  # dataset root dir
+train: train/images  # train images
+val: val/images    # validation images
+nc: 1  # number of classes
+names: ['memory_module']  # class names
 ```
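
For reference, each line of a YOLO format label file has the form `class x_center y_center width height`, with coordinates normalized to the image size. The sketch below converts a pixel-space box into such a line. The function name `to_yolo_line` and the example numbers are hypothetical. In this format, images without memory modules would normally have empty label files.

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space box (x1, y1, x2, y2) to one YOLO label line."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # YOLO stores the box centre and size, each divided by the image dimension.
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A memory module (class 0) spanning (100, 150) to (300, 250) in a 640x480 image.
print(to_yolo_line(0, 100, 150, 300, 250, 640, 480))
# prints: 0 0.312500 0.416667 0.312500 0.208333
```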

-#### 2. **GET /health** - Health Check
+## Getting Started
+
+1. Clone the repository:
+   ```bash
+   git clone http://23.29.118.76:3000/michael/ds_task_recycling_project.git
+   ```
+2. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. Prepare the dataset:
+   ```bash
+   python prepare_dataset.py
+   ```
+4. Train the model (if not already trained):
+   ```bash
+   python train.py
+   ```
+5. Run the Flask application:
+   ```bash
+   python run.py
+   ```
+6. Access the web interface at `http://localhost:5000`
+
+## Testing
+
+The project includes tests for the detector, covering:
+- Batch detection testing
+- Threshold optimization
+- Various confidence/IoU threshold combinations
+
+Run tests with:
 ```bash
-curl http://localhost:5000/health
+pytest tests/
 ```

-#### 3. **POST /detect** - Upload Image Detection
-```bash
-curl -X POST -F "image=@motherboard.png" -F "confidence=0.5" http://localhost:5000/detect
-```
+## Future Improvements

-**Response:**
-```json
-{
-  "success": true,
-  "detections": [
-    {
-      "bbox": [100, 150, 200, 250],
-      "confidence": 0.85,
-      "class": 0,
-      "class_name": "memory_module"
-    }
-  ],
-  "num_detections": 1,
-  "annotated_image": "base64_encoded_image...",
-  "confidence_threshold": 0.5
-}
-```
-
-#### 4. **GET /detect/hardcoded** - Test with Hardcoded Image
-```bash
-curl "http://localhost:5000/detect/hardcoded?confidence=0.5"
-```
-
-#### 5. **POST /detect/base64** - Base64 Image Detection
-```bash
-curl -X POST -H "Content-Type: application/json" \
-  -d '{"image": "base64_string", "confidence": 0.5}' \
-  http://localhost:5000/detect/base64
-```
-
-## 🧪 Testing & Usage Examples
-
-### 1. **Test with Python requests**
-```python
-import requests
-import base64
-
-# Test hardcoded image
-response = requests.get('http://localhost:5000/detect/hardcoded')
-result = response.json()
-print(f"Found {result['num_detections']} memory modules")
-
-# Upload image
-with open('test_image.png', 'rb') as f:
-    files = {'image': f}
-    response = requests.post('http://localhost:5000/detect', files=files)
-    result = response.json()
-```
-
-### 2. **Test with curl**
-```bash
-# Basic detection
-curl -X POST -F "image=@training/memory/out1.png" http://localhost:5000/detect
-
-# With custom confidence
-curl -X POST -F "image=@training/memory/out1.png" -F "confidence=0.3" http://localhost:5000/detect
-```
-
-### 3. **Command Line Inference**
-```bash
-# Test single image
-python3 inference_utils.py --image training/memory/out1.png --conf 0.5
-
-# Validate trained model
-python3 train.py --validate --model runs/detect/memory_module_detection/weights/best.pt
-```
-
-## 📊 Training Details
-
-### Dataset Statistics
- **Total Images:** 40 (20 with memory, 20 without)
- **Training Split:** 32 images (80%)
- **Validation Split:** 8 images (20%)
- **Classes:** 1 (memory_module)
- **Annotation Format:** YOLO (normalized coordinates)
-
-### Training Configuration
-```python
-# Default training parameters
-epochs = 100
-batch_size = 16
-image_size = 640
-confidence_threshold = 0.5
-iou_threshold = 0.45
-```
-
-### Expected Training Time
- **GPU (RTX 3060+):** 5-10 minutes
- **CPU (Modern):** 30-60 minutes
- **Memory Usage:** 2-4GB RAM
-
-### Model Performance
-After training, you should see:
- **mAP50:** >0.8 (80%+ accuracy at 50% IoU)
- **Precision:** >0.85
- **Recall:** >0.80
-
-## 🐛 Troubleshooting
-
-### Common Issues
-
-#### 1. **Model Not Found Error**
-```
-Error: Model not found at runs/detect/memory_module_detection/weights/best.pt
-```
-**Solution:** Train the model first
-```bash
-python3 train.py
-```
-
-#### 2. **CUDA Out of Memory**
-```
-RuntimeError: CUDA out of memory
-```
-**Solutions:**
- Reduce batch size: `python3 train.py --batch 8`
- Use CPU: `python3 train.py --device cpu`
- Close other GPU applications
-
-#### 3. **Import Error: ultralytics**
-```
-ModuleNotFoundError: No module named 'ultralytics'
-```
-**Solution:**
-```bash
-pip install ultralytics
-```
-
-#### 4. **Flask Port Already in Use**
-```
-OSError: [Errno 48] Address already in use
-```
-**Solution:**
-```bash
-# Kill process using port 5000
-lsof -ti:5000 | xargs kill -9
-
-# Or use different port
-python3 main.py  # Edit main.py to change port
-```
-
-#### 5. **Low Detection Accuracy**
-**Solutions:**
- Increase training epochs: `python3 train.py --epochs 200`
- Lower confidence threshold: `confidence=0.3`
- Check image quality and lighting
- Verify annotations are correct
-
-### Performance Optimization
-
-#### For Better Accuracy:
-1. **More Training Data:** Add more annotated images
-2. **Data Augmentation:** Already included in YOLOv8
-3. **Hyperparameter Tuning:** Adjust learning rate, batch size
-4. **Model Size:** Use YOLOv8s or YOLOv8m for better accuracy
-
-#### For Faster Inference:
-1. **Model Quantization:** Convert to TensorRT or ONNX
-2. **Batch Processing:** Process multiple images together
-3. **Image Resizing:** Use smaller input size (320x320)
-
-## 📁 File Descriptions
-
- **`main.py`** - Flask API with all endpoints
- **`train.py`** - YOLOv8 training script with validation
- **`inference_utils.py`** - Detection utilities and visualization
- **`prepare_dataset.py`** - Dataset preparation and splitting
- **`requirements.txt`** - Python dependencies
- **`dataset.yaml`** - YOLO dataset configuration
-
-## 🔮 Future Enhancements
-
-1. **Video Processing:** Add video upload and processing endpoints
-2. **Model Ensemble:** Combine multiple models for better accuracy
-3. **Real-time Streaming:** WebSocket support for live camera feeds
-4. **Database Integration:** Store detection results and statistics
-5. **Web Interface:** HTML frontend for easier testing
-6. **Docker Deployment:** Containerized deployment
-7. **Model Versioning:** Support multiple model versions
-8. **Batch Processing:** Process multiple images simultaneously
-
-## 📄 License
-
-This project is for educational and training purposes.
-
-## 🤝 Contributing
-
-This is a toy project for training purposes. Feel free to experiment and improve!
+1. GPU support for faster processing
+2. Video input support
+3. Real-time streaming capabilities
+4. More sophisticated augmentation techniques
+5. Model quantization for improved CPU performance