# 🚜 Agricultural Photo Keyword Training Guide

## Overview

This guide explains how to train a custom agricultural keyword generation model on your dataset of 30,000 tagged photos.

## 📋 Prerequisites

### 1. Hardware Requirements
- **GPU**: NVIDIA GPU with 8GB+ VRAM (recommended)
- **RAM**: 16GB+ system RAM
- **Storage**: 50GB+ free space for model and data

### 2. Software Requirements
```bash
# Install additional training dependencies
# The cu118 index serves CUDA 11.8 builds; use the index URL that matches your CUDA version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install transformers datasets accelerate
pip install scikit-learn tqdm
```

## 📁 Data Preparation

### 1. Organize Your 30,000 Photos
```
data/training/
├── photo_001.jpg
├── photo_002.jpg
├── ...
├── photo_30000.jpg
└── metadata.csv
```

### 2. Create Metadata CSV
Your `metadata.csv` should have this format:
```csv
filename,keywords
photo_001.jpg,"farmer, corn, field, agriculture, male, tractor"
photo_002.jpg,"dairy cow, barn, livestock, farming, rural"
photo_003.jpg,"chicken, poultry, farm, feeding, outdoor"
...
```

**Required columns:**
- `filename`: Image filename (must exist in data/training/)
- `keywords`: Comma-separated keywords for the image
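
Before training, it can help to confirm that the CSV satisfies both requirements above. The following is a minimal sketch, not part of `src/train_model.py`: the function name `validate_metadata` is hypothetical, and it assumes the file is UTF-8 encoded with a header row.

```python
import csv
from pathlib import Path


def validate_metadata(data_dir, metadata_file):
    """Return a list of human-readable problems found in the metadata CSV."""
    data_dir = Path(data_dir)
    problems = []
    with open(metadata_file, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {"filename", "keywords"} - set(reader.fieldnames or [])
        if missing:
            return [f"missing required column(s): {sorted(missing)}"]
        # The header is row 1, so data rows are numbered from 2.
        for row_no, row in enumerate(reader, start=2):
            filename = (row["filename"] or "").strip()
            keywords = [k.strip() for k in (row["keywords"] or "").split(",") if k.strip()]
            if not filename:
                problems.append(f"row {row_no}: empty filename")
            elif not (data_dir / filename).is_file():
                problems.append(f"row {row_no}: {filename} not found in {data_dir}")
            if not keywords:
                problems.append(f"row {row_no}: no keywords")
    return problems
```

For example, `validate_metadata("data/training", "data/training/metadata.csv")` returns an empty list when every row names an existing image and has at least one keyword.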

## 🚀 Training Process

### Step 1: Prepare Sample Data (Testing)
```bash
# Create sample data for testing the pipeline
python3 src/train_model.py --create-sample --data-dir data/training
```

### Step 2: Train on Your 30,000 Photos
```bash
# Basic training command
python3 src/train_model.py \
    --data-dir data/training \
    --metadata-file data/training/metadata.csv \
    --epochs 5 \
    --batch-size 8 \
    --learning-rate 5e-5

# Advanced training with custom settings
python3 src/train_model.py \
    --data-dir data/training \
    --metadata-file data/training/metadata.csv \
    --output-dir models/custom_agricultural_model \
    --epochs 10 \
    --batch-size 16 \
    --learning-rate 3e-5 \
    --val-split 0.15 \
    --num-workers 8
```

### Step 3: Monitor Training
By default, training logs are saved to `models/agricultural_blip/training.log` (if you set a custom `--output-dir`, check that directory instead):
```bash
# Monitor training progress
tail -f models/agricultural_blip/training.log
```

### Step 4: Use Trained Model
```bash
# Use your custom trained model for inference
python3 src/main.py \
    --input data/raw \
    --output outputs \
    --model-path models/agricultural_blip/best_model
```

## ⚙️ Training Parameters

### Key Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--epochs` | 5 | Number of training epochs |
| `--batch-size` | 8 | Training batch size (reduce if you run out of GPU memory) |
| `--learning-rate` | 5e-5 | Learning rate for optimization |
| `--val-split` | 0.2 | Fraction of data for validation |
| `--num-workers` | 4 | Data loading workers |
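
To see how `--val-split` and `--batch-size` interact, the arithmetic can be sketched as below. This is an illustration only: the helper `steps_per_epoch` is hypothetical, and it assumes the split is rounded to the nearest image and that the last partial batch is kept, which may differ from what the training script does.

```python
import math


def steps_per_epoch(num_images, val_split, batch_size):
    """Return (train_count, val_count, batches_per_epoch)."""
    val_count = round(num_images * val_split)
    train_count = num_images - val_count
    # Assumes the final, smaller batch is still processed.
    return train_count, val_count, math.ceil(train_count / batch_size)


print(steps_per_epoch(30_000, 0.2, 8))    # (24000, 6000, 3000)
print(steps_per_epoch(30_000, 0.15, 16))  # (25500, 4500, 1594)
```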

### GPU Memory Optimization
If you encounter GPU memory issues:
```bash
# Reduce batch size
python3 src/train_model.py --batch-size 4

# Use gradient accumulation (simulates larger batch)
# This is handled automatically in the training code
```
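
For readers unfamiliar with gradient accumulation, the idea can be shown on a toy model. This sketch is not taken from the training code: it uses a one-parameter model `y_hat = w * x` in plain Python and assumes equally sized micro-batches, purely to show that summing scaled micro-batch gradients reproduces the gradient of the larger batch.

```python
def mse_grad(w, batch):
    """Gradient of mean squared error with respect to w for y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Sum scaled micro-batch gradients, as gradient accumulation does."""
    chunks = [data[i:i + micro_batch_size] for i in range(0, len(data), micro_batch_size)]
    total = 0.0
    for chunk in chunks:
        # Dividing by the number of accumulation steps turns the sum into an average.
        total += mse_grad(w, chunk) / len(chunks)
    return total


data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
full = mse_grad(0.5, data)              # one batch of 8
accum = accumulated_grad(0.5, data, 4)  # two micro-batches of 4
print(abs(full - accum) < 1e-9)         # True
```

In a real training loop the optimizer step would run once after the accumulated micro-batches, so memory use follows the micro-batch size while the update behaves like the larger batch.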

## 📊 Training Monitoring

### Training Metrics
The training script tracks:
- **Training Loss**: How well the model fits the training data
- **Validation Loss**: How well the model generalizes to unseen data
- **Learning Rate**: The current value of the learning-rate schedule

### Expected Training Time
- **30,000 photos**: ~6-12 hours on a modern GPU
- **Batch size 8**: ~45 minutes per epoch
- **Early stopping**: Training stops early if the model stops improving
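
The figures above can be combined into a rough total. The sketch below assumes the ~45 minutes per epoch holds for every epoch and ignores validation and checkpointing overhead; `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(epochs, minutes_per_epoch=45):
    """Rough total training time in hours, ignoring per-epoch overhead."""
    return epochs * minutes_per_epoch / 60


for epochs in (5, 10):
    print(epochs, "epochs:", estimate_hours(epochs), "hours")
# 5 epochs: 3.75 hours
# 10 epochs: 7.5 hours
```

At that rate, the 6-12 hour range corresponds to roughly 8 to 16 epochs, so budget toward the upper end if you train longer than the default or use a slower GPU.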

### Model Checkpoints
Models are saved to `models/agricultural_blip/`:
- `best_model/`: Best performing model (lowest validation loss)
- `final_model/`: Model after all epochs
- `checkpoint_epoch_N/`: Intermediate checkpoints

## 🎯 Training Data Quality

### Keyword Quality Guidelines
For best results, ensure your 30,000 photos have:

1. **Consistent Keywords**: Use standardized terms
   - ✅ "farmer" not "farm worker" or "agricultural worker"
   - ✅ "tractor" not "farm equipment" or "machinery"

2. **Specific Agricultural Terms**:
   - ✅ "dairy farmer" vs "rancher" vs "chicken farmer"
   - ✅ "corn field" vs "wheat field" vs "soybean field"

3. **5-10 Keywords per Image**: Optimal range for training

4. **Balanced Dataset**: Include a variety of:
   - Crops (corn, wheat, soy, etc.)
   - Livestock (cattle, pigs, chickens)
   - Equipment (tractors, harvesters)
   - People (farmers, ranchers, workers)
   - Settings (fields, barns, farms)
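
One way to enforce consistent keywords is to normalize them before writing `metadata.csv`. The sketch below is an assumption about how that might be done, not part of the project: the `SYNONYMS` map and the `normalize_keywords` function are hypothetical, and the map should be filled with the variants that actually occur in your tags.

```python
# Hypothetical synonym map: variant term -> standardized term.
SYNONYMS = {
    "farm worker": "farmer",
    "agricultural worker": "farmer",
}


def normalize_keywords(raw, synonyms=SYNONYMS):
    """Lowercase, trim, map synonyms, and drop duplicates while keeping order."""
    result = []
    for keyword in raw.split(","):
        keyword = " ".join(keyword.lower().split())  # collapse stray whitespace
        keyword = synonyms.get(keyword, keyword)
        if keyword and keyword not in result:
            result.append(keyword)
    return result


print(normalize_keywords("Farm Worker, corn,  field , farmer"))
# ['farmer', 'corn', 'field']
```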

### Data Analysis
Before training, analyze your dataset:
```bash
# The training script will show data analysis
python3 src/train_model.py --data-dir data/training --metadata-file data/training/metadata.csv
```
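
If you want to inspect the dataset without starting a training run, a standalone check along these lines may help. It is independent of whatever analysis `train_model.py` prints; the function name `keyword_stats` is hypothetical, and the 5-10 range comes from the guidelines above.

```python
import csv
from collections import Counter


def keyword_stats(metadata_file, min_keywords=5, max_keywords=10):
    """Return (keyword frequency Counter, [(filename, count)] outside the range)."""
    counts = Counter()
    out_of_range = []
    with open(metadata_file, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            keywords = [k.strip().lower() for k in row["keywords"].split(",") if k.strip()]
            counts.update(keywords)
            if not min_keywords <= len(keywords) <= max_keywords:
                out_of_range.append((row["filename"], len(keywords)))
    return counts, out_of_range
```

Calling `counts, out_of_range = keyword_stats("data/training/metadata.csv")` and then `counts.most_common(20)` shows the dominant terms, while very rare keywords near the tail often point to spelling variants worth standardizing.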

## 🔧 Troubleshooting

### Common Issues

**1. GPU Out of Memory**
```bash
# Solution: Reduce batch size
python3 src/train_model.py --batch-size 4
```

**2. Training Too Slow**
```bash
# Solution: Increase batch size and workers (if GPU allows)
python3 src/train_model.py --batch-size 16 --num-workers 8
```

**3. Poor Model Performance**
- Check keyword quality and consistency
- Increase training epochs
- Verify image quality and variety

**4. Model Not Loading**
```bash
# Check if model path exists
ls -la models/agricultural_blip/best_model/
```

## 📈 Performance Expectations

### After Training on 30,000 Photos
- **Keyword Accuracy**: 80-90% relevant keywords
- **Agricultural Distinctions**: Improved farmer vs rancher detection
- **Domain Specificity**: Better recognition of agricultural terms
- **Processing Speed**: Same as pre-trained model (~3 seconds/image)
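
The guide does not define how keyword accuracy is measured. One plausible reading, sketched here as an assumption, is precision against the reference tags (the share of predicted keywords that appear in the ground truth), with recall as a companion metric. The function name `keyword_precision_recall` is hypothetical.

```python
def keyword_precision_recall(predicted, reference):
    """Compare two keyword lists case-insensitively; return (precision, recall)."""
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in reference if k.strip()}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


p, r = keyword_precision_recall(
    ["farmer", "corn", "field", "tractor", "sky"],
    ["farmer", "corn", "field", "agriculture", "male", "tractor"],
)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

Averaging these values over the validation images gives a single number to compare between the pre-trained and fine-tuned models. Exact string matching is strict, so near-synonyms count as misses unless keywords are normalized first.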

### Validation Metrics
- **Training Loss**: Should decrease over epochs
- **Validation Loss**: Should decrease and stabilize
- **Early Stopping**: Prevents overfitting
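
The early-stopping behavior described above is commonly implemented with a patience counter on validation loss. The class below is a generic sketch of that pattern, not the code in `train_model.py`; the `patience` and `min_delta` values are illustrative assumptions.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.90, 0.70, 0.65, 0.66, 0.68, 0.60], start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}, best validation loss {stopper.best}")
        break
# stopping after epoch 5, best validation loss 0.65
```

The example also shows the trade-off: with a patience of 2, the later drop to 0.60 is never reached, so a larger patience suits noisy validation curves.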

## 🚀 Production Deployment

### Using Trained Model
```bash
# Use your custom model in place of the pre-trained model
python3 src/main.py \
    --input data/raw \
    --output outputs \
    --model-path models/agricultural_blip/best_model
```

### Model Sharing
Your trained model can be shared by copying:
```
models/agricultural_blip/best_model/
├── config.json
├── pytorch_model.bin
├── preprocessor_config.json
├── tokenizer.json
├── tokenizer_config.json
└── training_state.pt
```
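
Before copying the directory, or when the model fails to load, a quick check that every listed file is present can save time. This sketch assumes the file list shown above; `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "config.json",
    "pytorch_model.bin",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "training_state.pt",
]


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

`missing_model_files("models/agricultural_blip/best_model")` returns an empty list when the directory is complete. Depending on the library version used for saving, the weights may be stored under a different name (for example `model.safetensors`), so adjust `EXPECTED_FILES` to match what your run actually produced.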

## 📋 Training Checklist

- [ ] **Hardware**: GPU with 8GB+ VRAM available
- [ ] **Data**: 30,000 photos organized in data/training/
- [ ] **Metadata**: CSV file with filename and keywords columns
- [ ] **Dependencies**: Training packages installed
- [ ] **Storage**: 50GB+ free space
- [ ] **Time**: 6-12 hours available for training
- [ ] **Monitoring**: Training logs being tracked

## 🎯 Next Steps

1. **Prepare your 30,000 photo dataset**
2. **Create metadata.csv with keywords**
3. **Run training script**
4. **Evaluate trained model performance**
5. **Deploy for production use**

---

**Ready to train?** Start with sample data to test the pipeline, then scale to your full 30,000-photo dataset!