setup repo structure

OwusuBlessing
2025-07-03 15:27:59 +01:00
commit d0668af517
5 changed files with 130 additions and 0 deletions
+41
@@ -0,0 +1,41 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# Jupyter Notebook checkpoints
.ipynb_checkpoints
# PyCharm
.idea/
# VS Code
.vscode/
# Data and outputs
data/
outputs/
# OS files
.DS_Store
+56
@@ -0,0 +1,56 @@
# Smart Farm Photo Keyword Tagging AI
## Project Overview
This project aims to automate the generation of high-quality, agriculture-relevant keyword tags for agricultural stock photos using AI. The system will replace the current manual keyword tagging process, saving significant time and improving consistency.
## What is Expected
- **AI Model**: A model trained to generate 5 to 10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
- **CSV Output**: For each photo, output a CSV row with:
  - Photo file name
  - Human-entered keywords (for comparison)
  - AI-generated keywords
  - AI-generated title (if available)
  - Location (if available)
- **Training**: The system should be trainable on a dataset of ~30,000 currently keyword-tagged photos.
- **Scalability**: Should handle at least 1,000 photos/month (in batches of 500), with potential to double in 3 years.
- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
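
The CSV output and batch size described above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the column headers, the `PhotoResult` class, the `"; "` keyword separator, and the helper names are all assumptions, since the requirements list the fields but not their exact format.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Sequence, TextIO

# Assumed header names; the requirements name the fields but not the headers.
CSV_COLUMNS = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


@dataclass
class PhotoResult:
    """Hypothetical container for one photo's tagging result."""
    file_name: str
    human_keywords: List[str] = field(default_factory=list)
    ai_keywords: List[str] = field(default_factory=list)
    ai_title: Optional[str] = None   # optional per the requirements
    location: Optional[str] = None   # only set when image metadata has it


def to_row(result: PhotoResult) -> List[str]:
    """Flatten one result into a CSV row; missing optionals become empty cells."""
    return [
        result.file_name,
        "; ".join(result.human_keywords),  # assumed separator within a cell
        "; ".join(result.ai_keywords),
        result.ai_title or "",
        result.location or "",
    ]


def write_results(results: Iterable[PhotoResult], stream: TextIO) -> None:
    """Write a header line followed by one row per photo."""
    writer = csv.writer(stream)
    writer.writerow(CSV_COLUMNS)
    for result in results:
        writer.writerow(to_row(result))


def batched(items: Sequence, size: int = 500) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items (500 per the requirements)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Usage: write a single example row to an in-memory buffer.
buffer = io.StringIO()
write_results(
    [PhotoResult("a.jpg", ["farmer"], ["farmer", "cornfield"],
                 "Farmer and son walking in cornfield", "Iowa")],
    buffer,
)
```

Joining keywords with a separator other than a comma keeps each keyword list in a single cell; a different convention would work equally well as long as it is applied consistently.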
## Folder Structure
```
.
├── data/ # Datasets: training, validation, test images, and CSVs
│ ├── raw/ # Raw, unprocessed images and metadata
│ ├── processed/ # Preprocessed data ready for modeling
│ └── ...
├── notebooks/ # Jupyter notebooks for EDA, prototyping, and experiments
├── src/ # Source code
│ ├── data/ # Data loading, preprocessing scripts
│ ├── model/ # Model architecture, training, inference code
│ ├── utils/ # Utility functions
│ └── main.py # Main entry point for training/inference
├── outputs/ # Generated outputs (CSVs, predictions, logs)
├── docs.txt # Project requirements and notes
├── README.md # Project overview and instructions
└── .gitignore # Files and folders to ignore in git
```
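
Because `data/` and `outputs/` are listed in `.gitignore`, they will not exist in a fresh clone. A small helper along these lines could recreate the directories from the tree above; the function name `create_skeleton` and the `PROJECT_DIRS` list are illustrative assumptions, not part of the repository.

```python
from pathlib import Path
from typing import List

# Directories taken from the folder tree above.
PROJECT_DIRS = [
    "data/raw",
    "data/processed",
    "notebooks",
    "src/data",
    "src/model",
    "src/utils",
    "outputs",
]


def create_skeleton(root: Path) -> List[Path]:
    """Create any missing project directories under `root` and return their paths."""
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_skeleton(Path("."))` from the repository root is safe to repeat, since existing directories are left untouched.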
### Directory Details
- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
- **docs.txt**: The original requirements and project notes.
- **README.md**: This file.
- **.gitignore**: Keeps unnecessary files out of version control.
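
This commit does not show the contents of `main.py`, so the following is only a sketch of how a single entry point for training and inference might be laid out with `argparse`. The subcommand names, flags, and default paths are hypothetical; the training and tagging steps are placeholders that only print what they would do.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical `train` and `infer` subcommands."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Keyword tagging for agricultural stock photos (sketch).",
    )
    commands = parser.add_subparsers(dest="command", required=True)

    train = commands.add_parser("train", help="train on already-tagged photos")
    train.add_argument("--data-dir", default="data/processed")

    infer = commands.add_parser("infer", help="tag new photos and write a CSV")
    infer.add_argument("--images-dir", default="data/raw")
    infer.add_argument("--output-csv", default="outputs/predictions.csv")
    infer.add_argument("--batch-size", type=int, default=500)

    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Dispatch to the chosen subcommand; both branches are placeholders."""
    args = build_parser().parse_args(argv)
    if args.command == "train":
        print(f"[sketch] would train on data in {args.data_dir}")
    else:
        print(
            f"[sketch] would tag images in {args.images_dir} "
            f"in batches of {args.batch_size} and write {args.output_csv}"
        )
    return 0
```

In a real script, a final `if __name__ == "__main__":` guard calling `main()` would let it run as, for example, `python src/main.py infer --batch-size 500`.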
## Deliverables
- Well-documented code in `src/`
- At least one Jupyter notebook showing EDA and model prototyping
- Example CSV output as described above
- Instructions for running the system
- (Optional) Trained model weights
## Deadline
**All deliverables are expected within 3 days of project start.**
+33
@@ -0,0 +1,33 @@
You want to build a custom AI-powered system to automatically generate keyword tags for agricultural stock photos.
You want the system to help eliminate your manual keyword tagging process, which is currently handled by an assistant and takes about 10 hours/month.
You need to process 1,000 photos per month, in batches of 500, and this number may scale up over time (possibly doubling in 3 years).
You want the system to generate 5 to 10 high-quality keywords per image, with a focus on agricultural relevance.
You want to be able to train the AI using your current keyword-tagged photo dataset, which contains about 30,000 photos.
The system must differentiate subtle ag-specific concepts, such as:
    Farmer vs. rancher
    Dairy farmer vs. rancher
    Chicken farmer (not rancher)
    Male vs. female farmers (for diversity tagging)
You want the system to optionally generate a descriptive product title like: “Farmer and son walking in cornfield.”
If location metadata is available in the image file, you want the system to extract and use that data as a keyword (e.g., “Iowa”).
You want the final output in CSV format, with each photo's file name matched to its:
    Human-entered keywords (for comparison, if needed)
    AI-generated keywords
    AI-generated title (if available)
    Location (if available)