# Smart Farm Photo Keyword Tagging AI

## Project Overview

This project aims to automate the generation of high-quality, agriculture-relevant keyword tags for agricultural stock photos using AI. The system will replace the current manual keyword tagging process, saving significant time and improving consistency.

## What is Expected

- **AI Model**: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
- **CSV Output**: For each photo, output a CSV row with:
  - Photo file name
  - Human-entered keywords (for comparison)
  - AI-generated keywords
  - AI-generated title (if available)
  - Location (if available)
- **Training**: The system should be trainable on a dataset of ~30,000 currently keyword-tagged photos.
- **Scalability**: Should handle at least 1,000 photos/month (in batches of 500), with potential to double in 3 years.
- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.

## Folder Structure

```
.
├── data/               # Datasets: training, validation, test images, and CSVs
│   ├── raw/            # Raw, unprocessed images and metadata
│   ├── processed/      # Preprocessed data ready for modeling
│   └── ...
├── notebooks/          # Jupyter notebooks for EDA, prototyping, and experiments
├── src/                # Source code
│   ├── data/           # Data loading, preprocessing scripts
│   ├── model/          # Model architecture, training, inference code
│   ├── utils/          # Utility functions
│   └── main.py         # Main entry point for training/inference
├── outputs/            # Generated outputs (CSVs, predictions, logs)
├── docs.txt            # Project requirements and notes
├── README.md           # Project overview and instructions
└── .gitignore          # Files and folders to ignore in git
```

### Directory Details

- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
- **docs.txt**: The original requirements and project notes.
- **README.md**: This file.
- **.gitignore**: Keeps unnecessary files out of version control.

## Deliverables

- Well-documented code in `src/`
- At least one Jupyter notebook showing EDA and model prototyping
- Example CSV output as described above
- Instructions for running the system
- (Optional) Trained model weights

## Deadline

**All deliverables are expected within 3 days of project start.**
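
For the CSV output described under "What is Expected", the following is a minimal sketch of how one row per photo might be written with Python's standard `csv` module. The column names, the `build_row` and `write_rows` helpers, the `"; "` keyword separator, and the example file name are all assumptions made for illustration; this README only lists the fields, not their exact headers or encoding.

```python
import csv
import sys

# Assumed column names: the README lists the fields but not their headers.
FIELDNAMES = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


def build_row(file_name, human_keywords, ai_keywords, ai_title=None, location=None):
    """Build one CSV row as a dict; missing optional fields become empty strings."""
    return {
        "file_name": file_name,
        # Keywords are joined with "; " so each list fits in a single cell (assumption).
        "human_keywords": "; ".join(human_keywords),
        "ai_keywords": "; ".join(ai_keywords),
        "ai_title": ai_title or "",
        "location": location or "",
    }


def write_rows(rows, stream):
    """Write a header line followed by one line per photo to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up example values, not real project data.
    example = build_row(
        "IMG_0001.jpg",
        human_keywords=["farmer", "cornfield"],
        ai_keywords=["farmer", "son", "cornfield", "walking", "Iowa"],
        ai_title="Farmer and son walking in cornfield",
        location="Iowa",
    )
    write_rows([example], sys.stdout)
```

Keeping human-entered and AI-generated keywords in adjacent columns makes the side-by-side comparison mentioned above straightforward in a spreadsheet. When writing to a file instead of `sys.stdout`, open it with `newline=""` as the `csv` module documentation recommends.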
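
The scalability requirement (at least 1,000 photos/month, processed in batches of 500) suggests a simple batching step in front of inference. The sketch below shows one way to split any sequence of photo paths into fixed-size batches; `iter_batches` is a hypothetical helper name and the file names are placeholders, not part of the project.

```python
from itertools import islice


def iter_batches(items, batch_size=500):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Placeholder file names standing in for one month of photos.
    photo_names = [f"photo_{i:04d}.jpg" for i in range(1000)]
    for number, batch in enumerate(iter_batches(photo_names), start=1):
        print(f"batch {number}: {len(batch)} photos")
```

Because the helper works on any iterable, the same code would cover the projected doubling in volume without changes; only the number of batches grows.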