This project aims to automate the generation of high-quality keyword tags for agricultural stock photos using AI. The system is intended to replace the current manual tagging process, saving time and improving consistency.

What is Expected

AI Model: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
Title Generation: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
Location Extraction: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
CSV Output: For each photo, output a CSV row with:
- Photo file name
- Human-entered keywords (for comparison)
- AI-generated keywords
- AI-generated title (if available)
- Location (if available)
Training: The system should be trainable on a dataset of ~30,000 photos that already have human-entered keywords.
Scalability: Should handle at least 1,000 photos/month (in batches of 500), with volume potentially doubling within 3 years.
Quality: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
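
The CSV output and the 500-photo batching described above could be sketched as follows. This is a minimal illustration, not the project's actual implementation: the column names, the "; " keyword separator, and the helper names make_row, iter_batches, and write_rows are all assumptions, since the requirements only list what each row must contain.

```python
import csv
from itertools import islice
from pathlib import Path

# Assumed column names; the requirements only describe the row contents.
FIELDNAMES = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


def make_row(file_name, human_keywords, ai_keywords, ai_title=None, location=None):
    """Build one CSV row; optional fields become empty strings when missing."""
    return {
        "file_name": file_name,
        # Assumed convention: keywords joined into one cell with "; ".
        "human_keywords": "; ".join(human_keywords),
        "ai_keywords": "; ".join(ai_keywords),
        "ai_title": ai_title or "",
        "location": location or "",
    }


def iter_batches(items, batch_size=500):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def write_rows(rows, csv_path):
    """Write rows to csv_path with a header line, creating parent folders."""
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

For example, write_rows([make_row("IMG_0001.jpg", ["farmer", "corn"], ["farmer", "cornfield"], location="Iowa")], "outputs/example.csv") would produce a one-row file under outputs/, and iter_batches(photo_paths) would split a monthly delivery into groups of up to 500 photos.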

Folder Structure

.
├── data/         # Datasets: training, validation, test images, and CSVs
│   ├── raw/      # Raw, unprocessed images and metadata
│   ├── processed/ # Preprocessed data ready for modeling
│   └── ...
├── notebooks/    # Jupyter notebooks for EDA, prototyping, and experiments
├── src/          # Source code
│   ├── data/     # Data loading, preprocessing scripts
│   ├── model/    # Model architecture, training, inference code
│   ├── utils/    # Utility functions
│   └── main.py   # Main entry point for training/inference
├── outputs/      # Generated outputs (CSVs, predictions, logs)
├── docs.txt      # Project requirements and notes
├── README.md     # Project overview and instructions
└── .gitignore    # Files and folders to ignore in git

Directory Details

data/: All datasets. Use raw/ for original files and processed/ for cleaned, ready-to-use data.
notebooks/: Jupyter notebooks for data exploration, prototyping, and model development.
src/: All source code, organized by function (data, model, utils). main.py is the main script.
outputs/: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
docs.txt: The original requirements and project notes.
README.md: This file.
.gitignore: Keeps unnecessary files out of version control.
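
Since main.py is described as the single entry point for both training and inference, one possible shape for it is a small command-line parser with two subcommands. This is only a sketch: the subcommand names (train, infer), the flags, and the default paths are hypothetical and chosen to match the folder layout above, and the bodies just print what a real run would do.

```python
import argparse


def build_parser():
    """Create a parser with hypothetical 'train' and 'infer' subcommands."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Train or run the keyword tagging model.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    train = subparsers.add_parser("train", help="Train on tagged photos")
    train.add_argument("--data-dir", default="data/processed")

    infer = subparsers.add_parser("infer", help="Generate keywords for photos")
    infer.add_argument("--input-dir", default="data/raw")
    infer.add_argument("--output-csv", default="outputs/predictions.csv")
    return parser


def main(argv=None):
    """Parse arguments and dispatch; real training/inference code is omitted."""
    args = build_parser().parse_args(argv)
    if args.command == "train":
        print(f"Would train on {args.data_dir}")
    else:
        print(f"Would tag photos in {args.input_dir} and write {args.output_csv}")
    return args
```

In an actual src/main.py, main() would be called under the usual `if __name__ == "__main__":` guard, so that a command such as `python src/main.py infer --input-dir data/raw` selects the inference path.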

Deliverables

Well-documented code in src/
At least one Jupyter notebook showing EDA and model prototyping
Example CSV output as described above
Instructions for running the system
(Optional) Trained model weights

Deadline

All deliverables are expected within 3 days of project start.


Smart Farm Photo Keyword Tagging AI
