2025-07-03 15:27:59 +01:00

# Smart Farm Photo Keyword Tagging AI
## Project Overview
This project aims to automate the generation of high-quality, agriculture-relevant keyword tags for agricultural stock photos using AI. The system will replace the current manual keyword tagging process, saving significant time and improving consistency.
## What is Expected
- **AI Model**: A model trained to generate 5-10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
- **CSV Output**: For each photo, output a CSV row with:
  - Photo file name
  - Human-entered keywords (for comparison)
  - AI-generated keywords
  - AI-generated title (if available)
  - Location (if available)
- **Training**: The system should be trainable on a dataset of ~30,000 currently keyword-tagged photos.
- **Scalability**: Should handle at least 1,000 photos/month (in batches of 500), with potential to double in 3 years.
- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
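
The CSV row and batch size described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the `PhotoRecord` class, the `"; "` keyword separator, and the helper names `record_to_row`, `write_csv`, and `iter_batches` are all assumptions introduced here.

```python
import csv
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Sequence, TextIO

# Hypothetical column names; the requirements list the fields but not their headers.
CSV_COLUMNS = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


@dataclass
class PhotoRecord:
    """One photo's tagging result (hypothetical container for this sketch)."""
    file_name: str
    human_keywords: List[str] = field(default_factory=list)
    ai_keywords: List[str] = field(default_factory=list)
    ai_title: Optional[str] = None   # title generation is optional
    location: Optional[str] = None   # only set when the image metadata has it


def record_to_row(record: PhotoRecord) -> List[str]:
    """Flatten a record into one CSV row; keyword lists are joined with '; '."""
    return [
        record.file_name,
        "; ".join(record.human_keywords),
        "; ".join(record.ai_keywords),
        record.ai_title or "",
        record.location or "",
    ]


def write_csv(records: Iterable[PhotoRecord], out: TextIO) -> int:
    """Write a header plus one row per photo; return the number of photo rows."""
    writer = csv.writer(out)
    writer.writerow(CSV_COLUMNS)
    count = 0
    for record in records:
        writer.writerow(record_to_row(record))
        count += 1
    return count


def iter_batches(items: Sequence[str], batch_size: int = 500) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items (500 per the requirements)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Joining keywords into a single cell keeps one row per photo, which makes the human and AI keyword columns easy to compare side by side in a spreadsheet.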
## Folder Structure
```
.
├── data/              # Datasets: training, validation, test images, and CSVs
│   ├── raw/           # Raw, unprocessed images and metadata
│   ├── processed/     # Preprocessed data ready for modeling
│   └── ...
├── notebooks/         # Jupyter notebooks for EDA, prototyping, and experiments
├── src/               # Source code
│   ├── data/          # Data loading, preprocessing scripts
│   ├── model/         # Model architecture, training, inference code
│   ├── utils/         # Utility functions
│   └── main.py        # Main entry point for training/inference
├── outputs/           # Generated outputs (CSVs, predictions, logs)
├── docs.txt           # Project requirements and notes
├── README.md          # Project overview and instructions
└── .gitignore         # Files and folders to ignore in git
```
### Directory Details
- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
- **docs.txt**: The original requirements and project notes.
- **README.md**: This file.
- **.gitignore**: Keeps unnecessary files out of version control.
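
For a fresh checkout, the layout above could be created with a small script along these lines. This is only a sketch: the `scaffold` function and the `DIRECTORIES` and `FILES` constants are hypothetical names, and the files are created empty as placeholders.

```python
from pathlib import Path
from typing import List

# Directories and files named in the folder structure above.
DIRECTORIES = [
    "data/raw",
    "data/processed",
    "notebooks",
    "src/data",
    "src/model",
    "src/utils",
    "outputs",
]
FILES = ["src/main.py", "docs.txt", "README.md", ".gitignore"]


def scaffold(root: Path) -> List[Path]:
    """Create missing directories and empty files under root.

    Existing paths are left untouched; the return value lists what was created.
    """
    created: List[Path] = []
    for rel in DIRECTORIES:
        path = root / rel
        if not path.exists():
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    for rel in FILES:
        path = root / rel
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` a second time returns an empty list, so the script is safe to re-run on an existing project.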
## Deliverables
- Well-documented code in `src/`
- At least one Jupyter notebook showing EDA and model prototyping
- Example CSV output as described above
- Instructions for running the system
- (Optional) Trained model weights
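
Since `main.py` is described as the entry point for both training and inference, its command-line interface might look something like the sketch below. The subcommand names (`train`, `infer`), every flag, and the default paths are assumptions made for illustration; the README does not specify them.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical 'train' and 'infer' subcommands."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Keyword tagging for farm photos"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="Train on already-tagged photos")
    train.add_argument("--data-dir", default="data/processed")  # assumed default

    infer = sub.add_parser("infer", help="Tag photos and write a CSV")
    infer.add_argument("--images", required=True)
    infer.add_argument("--output", default="outputs/predictions.csv")  # assumed path
    infer.add_argument("--batch-size", type=int, default=500)
    return parser


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments; real training or inference would be dispatched from here."""
    args = build_parser().parse_args(argv)
    return args
```

Under these assumptions, an inference run would be invoked as `python src/main.py infer --images data/raw`, and `main(["infer", "--images", "data/raw"])` exercises the same path from a notebook or test.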
## Deadline
**All deliverables are expected within 3 days of project start.**