setup repo structure

2025-07-03 15:27:59 +01:00
commit d0668af517
5 changed files with 130 additions and 0 deletions
@@ -0,0 +1,41 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Jupyter Notebook checkpoints
+.ipynb_checkpoints
+
+# PyCharm
+.idea/
+
+# VS Code
+.vscode/
+
+# Data and outputs
+data/
+outputs/
+
+# OS files
+.DS_Store
@@ -0,0 +1,56 @@
+# Smart Farm Photo Keyword Tagging AI
+
+## Project Overview
+This project uses AI to generate high-quality, agriculture-relevant keyword tags for agricultural stock photos. The system is intended to replace the current manual tagging process, which takes about 10 hours/month, and to make tagging more consistent.
+
+## What is Expected
+- **AI Model**: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
+- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
+- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
+- **CSV Output**: For each photo, output a CSV row with:
+  - Photo file name
+  - Human-entered keywords (for comparison)
+  - AI-generated keywords
+  - AI-generated title (if available)
+  - Location (if available)
+- **Training**: The system should be trainable on the existing dataset of ~30,000 keyword-tagged photos.
+- **Scalability**: The system should handle at least 1,000 photos/month (in batches of 500), with volume possibly doubling within 3 years.
+- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
+
+## Folder Structure
+```
+.
+├── data/         # Datasets: training, validation, test images, and CSVs
+│   ├── raw/      # Raw, unprocessed images and metadata
+│   ├── processed/ # Preprocessed data ready for modeling
+│   └── ...
+├── notebooks/    # Jupyter notebooks for EDA, prototyping, and experiments
+├── src/          # Source code
+│   ├── data/     # Data loading, preprocessing scripts
+│   ├── model/    # Model architecture, training, inference code
+│   ├── utils/    # Utility functions
+│   └── main.py   # Main entry point for training/inference
+├── outputs/      # Generated outputs (CSVs, predictions, logs)
+├── docs.txt      # Project requirements and notes
+├── README.md     # Project overview and instructions
+└── .gitignore    # Files and folders to ignore in git
+```
+
+### Directory Details
+- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
+- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
+- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
+- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
+- **docs.txt**: The original requirements and project notes.
+- **README.md**: This file.
+- **.gitignore**: Keeps unnecessary files out of version control.
+
+## Deliverables
+- Well-documented code in `src/`
+- At least one Jupyter notebook showing EDA and model prototyping
+- Example CSV output as described above
+- Instructions for running the system
+- (Optional) Trained model weights
+
+## Deadline
+**All deliverables are expected within 3 days of project start.**
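
Reviewer sketch (not part of this commit): the README's "CSV Output" bullet lists what each row should contain but does not fix column names or a keyword delimiter. The snippet below is one possible layout, assuming the hypothetical column names in `FIELDNAMES`, a hypothetical helper `write_tag_rows`, and `"; "` as the separator inside keyword cells. It uses only the standard library.

```python
import csv
import io

# Column names are assumptions; the README only lists what each row should contain.
FIELDNAMES = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


def write_tag_rows(rows, stream):
    """Write a header plus one CSV row per photo; keyword lists are joined with '; '."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "file_name": row["file_name"],
            "human_keywords": "; ".join(row.get("human_keywords", [])),
            "ai_keywords": "; ".join(row.get("ai_keywords", [])),
            # Title and location are optional, so missing values become empty cells.
            "ai_title": row.get("ai_title") or "",
            "location": row.get("location") or "",
        })


example = [{
    "file_name": "IMG_0001.jpg",  # hypothetical file name
    "human_keywords": ["farmer", "cornfield"],
    "ai_keywords": ["farmer", "son", "cornfield", "walking", "Iowa"],
    "ai_title": "Farmer and son walking in cornfield",
    "location": "Iowa",
}]

buffer = io.StringIO()
write_tag_rows(example, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Joining keywords inside a single cell keeps one row per photo, as the README asks; a different delimiter or one column per keyword would work equally well if downstream tools prefer it.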
@@ -0,0 +1,33 @@
+You want to build a custom AI-powered system to automatically generate keyword tags for agricultural stock photos.
+
+You want the system to help eliminate your manual keyword tagging process, which is currently handled by an assistant and takes about 10 hours/month.
+
+You need to process 1,000 photos per month, in batches of 500, and this number may scale up over time (possibly doubling in 3 years).
+
+You want the system to generate 5 to 10 high-quality keywords per image, with a focus on agricultural relevance.
+
+You want to be able to train the AI using your current keyword-tagged photo dataset, which contains about 30,000 photos.
+
+The system must differentiate subtle ag-specific concepts, such as:
+
+Farmer vs. rancher
+
+Dairy farmer vs. rancher
+
+Chicken farmer (not rancher)
+
+Male vs. female farmers (for diversity tagging)
+
+You want the system to optionally generate a descriptive product title like: “Farmer and son walking in cornfield.”
+
+If location metadata is available in the image file, you want the system to extract and use that data as a keyword (e.g., “Iowa”).
+
+You want the final output in CSV format, with each photo’s file name matched to its:
+
+Human-entered keywords (for comparison, if needed)
+
+AI-generated keywords
+
+AI-generated title (if available)
+
+Location (if available)
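
Reviewer sketch (not part of this commit): the notes ask for human-entered keywords to sit next to the AI-generated ones "for comparison" but do not say how the two should be compared. One simple option, shown here purely as an assumption, is a case-insensitive set overlap (Jaccard index) plus lists of missed and extra keywords. The function names `normalize` and `keyword_overlap` are hypothetical.

```python
def normalize(keywords):
    """Lower-case and strip keywords, dropping empty entries."""
    return {kw.strip().lower() for kw in keywords if kw.strip()}


def keyword_overlap(human, ai):
    """Compare two keyword lists as case-insensitive sets."""
    h, a = normalize(human), normalize(ai)
    shared = h & a
    union = h | a
    return {
        "shared": sorted(shared),   # keywords both sources agree on
        "missed": sorted(h - a),    # human keywords the AI did not produce
        "extra": sorted(a - h),     # AI keywords the human did not enter
        # Jaccard index: |intersection| / |union|; two empty lists count as a perfect match.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


report = keyword_overlap(
    ["Farmer", "cornfield", "Iowa"],
    ["farmer", "son", "cornfield", "walking"],
)
print(report)
```

Exact string matching will undercount near-synonyms (for example "corn field" vs. "cornfield"), so a score like this is best read as a rough lower bound on agreement rather than a quality metric.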