setup repo structure

2025-07-03 15:27:59 +01:00
commit d0668af517
5 changed files with 130 additions and 0 deletions
@@ -0,0 +1,56 @@
+# Smart Farm Photo Keyword Tagging AI
+
+## Project Overview
+This project uses AI to automatically generate high-quality keyword tags for agricultural stock photos, with an emphasis on agriculture-specific terms. The system will replace the current manual tagging process, saving significant time and improving consistency.
+
+## What is Expected
+- **AI Model**: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
+- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
+- **Location Extraction**: If the image file contains location metadata, extract it and use it as a keyword (e.g., "Iowa").
+- **CSV Output**: For each photo, output a CSV row with:
+  - Photo file name
+  - Human-entered keywords (for comparison)
+  - AI-generated keywords
+  - AI-generated title (if available)
+  - Location (if available)
+- **Training**: The system should be trainable on the existing dataset of ~30,000 manually keyword-tagged photos.
+- **Scalability**: The system should handle at least 1,000 photos per month, processed in batches of 500, and volume may double within 3 years.
+- **Quality**: Keywords and titles must be accurate and relevant, and must capture subtle agriculture-specific concepts.
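+
+One way to enforce the 5–10 keyword range is to post-process per-keyword scores from a multi-label classifier. The sketch below assumes the model outputs a confidence score for every keyword in a fixed vocabulary; the function name `select_keywords`, the 0.5 threshold, and the example scores are hypothetical.
+
+```python
+def select_keywords(
+    scores: dict[str, float],
+    min_k: int = 5,
+    max_k: int = 10,
+    threshold: float = 0.5,
+) -> list[str]:
+    """Return between min_k and max_k keywords, highest score first.
+
+    Keywords at or above `threshold` are kept (capped at max_k). If fewer
+    than min_k pass, the top min_k keywords overall are returned instead
+    (or the whole vocabulary, if it is smaller than min_k).
+    """
+    # Rank by descending score; ties are broken alphabetically for determinism.
+    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
+    confident = [kw for kw, score in ranked if score >= threshold]
+    if len(confident) >= min_k:
+        return confident[:max_k]
+    # Too few confident keywords: fall back to the best min_k candidates.
+    return [kw for kw, _ in ranked[:min_k]]
+```
+
+For example, with scores `{"farmer": 0.92, "cornfield": 0.88, "harvest": 0.67, "tractor": 0.55, "combine": 0.41, "soil": 0.35, "rancher": 0.12}`, only four keywords clear 0.5, so the fallback returns `["farmer", "cornfield", "harvest", "tractor", "combine"]`. Whether a low-confidence fifth keyword is better than returning fewer is a policy choice to validate against the human-tagged photos.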
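+
+The CSV layout described above can be produced with Python's standard `csv` module. This is a minimal sketch: the column names, the `PhotoTags` dataclass, and joining multiple keywords in one cell with `"; "` are assumptions rather than an agreed format.
+
+```python
+import csv
+from dataclasses import dataclass, field
+from typing import Iterable, Optional
+
+# Hypothetical column names, one row per photo.
+FIELDNAMES = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]
+
+
+@dataclass
+class PhotoTags:
+    file_name: str
+    human_keywords: list[str] = field(default_factory=list)
+    ai_keywords: list[str] = field(default_factory=list)
+    ai_title: Optional[str] = None  # left empty if no title was generated
+    location: Optional[str] = None  # left empty if the image has no location metadata
+
+
+def write_tags_csv(path: str, photos: Iterable[PhotoTags]) -> None:
+    """Write a header plus one row per photo; keyword lists are joined with '; '."""
+    with open(path, "w", newline="", encoding="utf-8") as f:
+        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
+        writer.writeheader()
+        for photo in photos:
+            writer.writerow({
+                "file_name": photo.file_name,
+                "human_keywords": "; ".join(photo.human_keywords),
+                "ai_keywords": "; ".join(photo.ai_keywords),
+                "ai_title": photo.ai_title or "",
+                "location": photo.location or "",
+            })
+```
+
+`csv.DictWriter` quotes any cell that contains a comma, so a title such as "Farmer, son and dog in cornfield" stays in a single column.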
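+
+Keeping the human-entered keywords beside the AI-generated ones lets each photo be scored. A sketch assuming case-insensitive exact matching (no stemming or synonyms, so "farmer" and "farmers" count as different keywords); the function name `keyword_overlap` is hypothetical.
+
+```python
+def keyword_overlap(human: list[str], ai: list[str]) -> dict[str, float]:
+    """Precision, recall and F1 of AI keywords measured against human keywords."""
+    # Normalize: trim whitespace, lowercase, drop blanks and duplicates.
+    h = {k.strip().lower() for k in human if k.strip()}
+    a = {k.strip().lower() for k in ai if k.strip()}
+    matched = len(h & a)
+    precision = matched / len(a) if a else 0.0
+    recall = matched / len(h) if h else 0.0
+    # When matched > 0, precision and recall are both positive, so the division is safe.
+    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
+    return {"precision": precision, "recall": recall, "f1": f1}
+```
+
+For example, comparing human keywords `["Farmer", "cornfield", "son", "Iowa"]` with AI keywords `["farmer", "cornfield", "walking", "summer"]` gives precision, recall and F1 of 0.5 each. Exact matching will understate quality whenever the model picks a valid synonym.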
+
+## Folder Structure
+```
+.
+├── data/           # Datasets: training, validation, test images, and CSVs
+│   ├── raw/        # Raw, unprocessed images and metadata
+│   ├── processed/  # Preprocessed data ready for modeling
+│   └── ...
+├── notebooks/      # Jupyter notebooks for EDA, prototyping, and experiments
+├── src/            # Source code
+│   ├── data/       # Data loading, preprocessing scripts
+│   ├── model/      # Model architecture, training, inference code
+│   ├── utils/      # Utility functions
+│   └── main.py     # Main entry point for training/inference
+├── outputs/        # Generated outputs (CSVs, predictions, logs)
+├── docs.txt        # Project requirements and notes
+├── README.md       # Project overview and instructions
+└── .gitignore      # Files and folders to ignore in git
+```
+
+### Directory Details
+- **data/**: All datasets. `raw/` holds original files; `processed/` holds cleaned, ready-to-use data.
+- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
+- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
+- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
+- **docs.txt**: The original requirements and project notes.
+- **README.md**: This file.
+- **.gitignore**: Keeps unnecessary files out of version control.
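+
+`main.py` could expose training and inference as subcommands. The sketch below uses the standard `argparse` module; the `train`/`infer` subcommand names, the flags, and the accepted image extensions are hypothetical, and `--batch-size` defaults to 500 to match the expected batch size. Model calls are left as placeholders.
+
+```python
+import argparse
+from pathlib import Path
+from typing import Iterator, Optional, Sequence
+
+IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}  # assumed photo formats
+
+
+def batched(items: Sequence[Path], size: int) -> Iterator[Sequence[Path]]:
+    """Yield consecutive slices of at most `size` items."""
+    for start in range(0, len(items), size):
+        yield items[start:start + size]
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Farm photo keyword tagging")
+    sub = parser.add_subparsers(dest="command")
+
+    train = sub.add_parser("train", help="train on keyword-tagged photos")
+    train.add_argument("--data-dir", type=Path, default=Path("data/processed"))
+
+    infer = sub.add_parser("infer", help="tag new photos and write a CSV")
+    infer.add_argument("--input-dir", type=Path, required=True)
+    infer.add_argument("--output", type=Path, default=Path("outputs/tags.csv"))
+    infer.add_argument("--batch-size", type=int, default=500)
+    return parser
+
+
+def main(argv: Optional[Sequence[str]] = None) -> None:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+    if args.command == "train":
+        print(f"Training on {args.data_dir}")  # placeholder for src/model/ training code
+    elif args.command == "infer":
+        photos = sorted(
+            p for p in args.input_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
+        )
+        for i, batch in enumerate(batched(photos, args.batch_size), start=1):
+            # Placeholder: run the model on this batch and append rows to args.output.
+            print(f"Batch {i}: {len(batch)} photos")
+    else:
+        parser.print_help()  # no subcommand given
+
+
+if __name__ == "__main__":
+    main()
+```
+
+With this layout, `python src/main.py infer --input-dir data/raw` on a folder of 1,000 photos would run two batches of 500. Keeping batching inside `infer` means doubling the monthly volume only changes the number of batches, not the code.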
+
+## Deliverables
+- Well-documented code in `src/`
+- At least one Jupyter notebook showing EDA and model prototyping
+- Example CSV output as described above
+- Instructions for running the system
+- (Optional) Trained model weights
+
+## Deadline
+**All deliverables are expected within 3 days of project start.**