# Smart Farm Photo Keyword Tagging AI

## Project Overview
This project uses AI to automate the generation of high-quality, agriculture-specific keyword tags for agricultural stock photos. The system will replace the current manual tagging process, saving time and making keywords more consistent across the photo catalog.

## What is Expected
- **AI Model**: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
- **CSV Output**: For each photo, output a CSV row with:
  - Photo file name
  - Human-entered keywords (for comparison)
  - AI-generated keywords
  - AI-generated title (if available)
  - Location (if available)
- **Training**: The system should be trainable on the existing set of ~30,000 manually keyword-tagged photos.
- **Scalability**: Must handle at least 1,000 photos per month (processed in batches of 500), with volume potentially doubling to about 2,000 photos per month within 3 years.
- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
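
The **AI Model** requirement asks for 5–10 keywords per image. One common way to get a variable-length list is to treat tagging as multi-label classification over a keyword vocabulary built from the ~30,000 tagged photos, then pick keywords from the per-keyword scores. The sketch below assumes such a model already exists and returns a score between 0 and 1 for each keyword; `select_keywords`, the 0.5 threshold, and the example scores are hypothetical.

```python
from typing import Dict, List


def select_keywords(
    scores: Dict[str, float],
    threshold: float = 0.5,
    min_k: int = 5,
    max_k: int = 10,
) -> List[str]:
    """Pick between min_k and max_k keywords from per-keyword model scores (hypothetical helper)."""
    if min_k > max_k:
        raise ValueError("min_k must not exceed max_k")
    # Rank by descending score; break ties alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    confident = [kw for kw, score in ranked if score >= threshold]
    if len(confident) < min_k:
        # Too few confident keywords: fall back to the top min_k overall.
        return [kw for kw, _ in ranked[:min_k]]
    # Otherwise keep the confident keywords, capped at max_k.
    return confident[:max_k]


if __name__ == "__main__":
    example = {"farmer": 0.9, "cornfield": 0.8, "tractor": 0.6, "rancher": 0.3, "cattle": 0.2}
    print(select_keywords(example))
```

The threshold controls the trade-off between precision and keyword count; in practice it would be tuned on a validation split rather than fixed at 0.5.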
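
For **Location Extraction**, one likely source of location metadata is the EXIF GPS block written by cameras and phones. The sketch below assumes Pillow is installed (for `Image.open`, `getexif()`, and `Exif.get_ifd`) and converts the stored degrees/minutes/seconds into decimal coordinates. The helper names are hypothetical, and turning coordinates into a place name such as "Iowa" would need a separate reverse-geocoding step, which is not shown.

```python
from typing import Optional, Sequence, Tuple, Union

# EXIF tag ids: pointer to the GPS sub-IFD and the GPS latitude/longitude tags.
GPS_IFD_POINTER = 0x8825
GPS_LATITUDE_REF, GPS_LATITUDE = 1, 2
GPS_LONGITUDE_REF, GPS_LONGITUDE = 3, 4


def gps_to_decimal(dms: Sequence[float], ref: Union[str, bytes]) -> float:
    """Convert (degrees, minutes, seconds) plus an N/S/E/W reference to decimal degrees."""
    if isinstance(ref, bytes):
        ref = ref.decode("ascii", errors="ignore")
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    return -value if ref.strip().upper() in ("S", "W") else value


def read_gps_coordinates(path: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from a photo's EXIF GPS block, or None if absent."""
    # Imported here so gps_to_decimal can be used without Pillow installed.
    from PIL import Image

    with Image.open(path) as img:
        gps = img.getexif().get_ifd(GPS_IFD_POINTER)
    required = (GPS_LATITUDE_REF, GPS_LATITUDE, GPS_LONGITUDE_REF, GPS_LONGITUDE)
    if not all(tag in gps for tag in required):
        return None
    lat = gps_to_decimal(gps[GPS_LATITUDE], gps[GPS_LATITUDE_REF])
    lon = gps_to_decimal(gps[GPS_LONGITUDE], gps[GPS_LONGITUDE_REF])
    return lat, lon
```

Photos without GPS data simply return `None`, which maps to an empty location column. Some photo libraries may instead store place names in IPTC or XMP fields; if so, those would be worth checking first.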
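
The CSV keeps human-entered keywords "for comparison," and the **Quality** requirement needs a measurable check. A simple starting point is per-photo precision, recall, and F1 between the AI and human keyword sets. The sketch below (the `keyword_agreement` name is hypothetical) normalizes case and whitespace and uses exact matching, so synonyms count as misses while distinctions like "farmer" vs. "rancher" are scored strictly.

```python
from typing import Dict, Iterable


def keyword_agreement(human: Iterable[str], predicted: Iterable[str]) -> Dict[str, float]:
    """Precision, recall, and F1 of predicted keywords against human keywords."""
    # Normalize: trim whitespace, lowercase, drop empty strings.
    h = {k.strip().lower() for k in human if k.strip()}
    p = {k.strip().lower() for k in predicted if k.strip()}
    overlap = len(h & p)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(h) if h else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(keyword_agreement(
        ["farmer", "cornfield", "Iowa", "tractor"],
        ["Farmer", "cornfield", "rancher"],
    ))
```

Averaging these scores over a held-out set gives one number to track between model versions; a manual review of disagreements is still needed, since human tags are not guaranteed to be complete.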

## Folder Structure
```
.
├── data/         # Datasets: training, validation, test images, and CSVs
│   ├── raw/      # Raw, unprocessed images and metadata
│   ├── processed/ # Preprocessed data ready for modeling
│   └── ...
├── notebooks/    # Jupyter notebooks for EDA, prototyping, and experiments
├── src/          # Source code
│   ├── data/     # Data loading, preprocessing scripts
│   ├── model/    # Model architecture, training, inference code
│   ├── utils/    # Utility functions
│   └── main.py   # Main entry point for training/inference
├── outputs/      # Generated outputs (CSVs, predictions, logs)
├── docs.txt      # Project requirements and notes
├── README.md     # Project overview and instructions
└── .gitignore    # Files and folders to ignore in git
```
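
The tree above places training, validation, and test data under `data/`. One way to split the ~30,000 tagged photos is to hash each file name into a bucket, so a photo always lands in the same split even as new photos are added. The sketch below assumes an 80/10/10 split; `assign_split` and the ratios are hypothetical choices.

```python
import hashlib


def assign_split(file_name: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically assign a photo to 'train', 'validation', or 'test' by file name."""
    # MD5 is used only as a stable, well-mixed hash here, not for security.
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # bucket in 0..99
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "validation"
    return "train"


if __name__ == "__main__":
    for name in ["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0003.jpg"]:
        print(name, assign_split(name))
```

If many photos come from the same shoot, near-duplicates could end up in different splits and inflate test scores; hashing a shoot or session identifier instead of the file name would avoid that, if such an identifier exists.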

### Directory Details
- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
- **src/**: All source code, organized by function (data, model, utils). `main.py` is the entry point for both training and inference.
- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
- **docs.txt**: The original requirements and project notes.
- **README.md**: This file.
- **.gitignore**: Keeps unnecessary files out of version control.
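
`src/main.py` serves as the entry point for both training and inference. A minimal sketch of how it could expose both modes, using the 500-photo batch size from the Scalability requirement as the default. The subcommand names, flags, default paths, and the `batched` helper are hypothetical, and the actual training and tagging calls are placeholders (print statements).

```python
import argparse
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Farm photo keyword tagging (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="Train on keyword-tagged photos")
    train.add_argument("--data-dir", default="data/processed")

    infer = sub.add_parser("infer", help="Generate keywords for new photos")
    infer.add_argument("--input-dir", required=True)
    infer.add_argument("--output-csv", default="outputs/keywords.csv")
    infer.add_argument("--batch-size", type=int, default=500)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    if args.command == "train":
        print(f"Would train on {args.data_dir}")  # placeholder for the training loop
        return
    photos = sorted(Path(args.input_dir).glob("*.jpg"))
    for i, batch in enumerate(batched(photos, args.batch_size), start=1):
        # Placeholder: run the model on this batch and append rows to args.output_csv.
        print(f"Batch {i}: {len(batch)} photos -> {args.output_csv}")


if __name__ == "__main__":
    main()
```

Usage would look like `python src/main.py infer --input-dir data/raw/new_batch`. Processing in fixed-size batches keeps memory use bounded regardless of how many photos arrive in a month.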

## Deliverables
- Well-documented code in `src/`
- At least one Jupyter notebook showing EDA and model prototyping
- Example CSV output as described above
- Instructions for running the system
- (Optional) Trained model weights
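
The example CSV deliverable follows the columns listed under **CSV Output**. A minimal sketch using the standard-library `csv` module; the column names, the `PhotoResult` type, and joining keywords with `"; "` inside a single cell are assumptions, not a format fixed by the requirements.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, TextIO

FIELDNAMES = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


@dataclass
class PhotoResult:
    file_name: str
    human_keywords: List[str] = field(default_factory=list)
    ai_keywords: List[str] = field(default_factory=list)
    ai_title: Optional[str] = None
    location: Optional[str] = None


def write_keyword_csv(results: Iterable[PhotoResult], fh: TextIO) -> None:
    """Write one CSV row per photo; keyword lists are joined into a single cell."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
    writer.writeheader()
    for r in results:
        writer.writerow({
            "file_name": r.file_name,
            "human_keywords": "; ".join(r.human_keywords),
            "ai_keywords": "; ".join(r.ai_keywords),
            "ai_title": r.ai_title or "",  # empty when no title was generated
            "location": r.location or "",  # empty when no location metadata
        })


if __name__ == "__main__":
    buf = io.StringIO()
    write_keyword_csv([
        PhotoResult("IMG_0001.jpg", ["farmer", "cornfield"], ["farmer", "son", "cornfield"],
                    "Farmer and son walking in cornfield", "Iowa"),
    ], buf)
    print(buf.getvalue())
```

When writing to a real file, open it with `newline=""` (for example, `open("outputs/keywords.csv", "w", newline="", encoding="utf-8")`) so the `csv` module controls line endings.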

## Deadline
**All deliverables are expected within 3 days of project start.**