commit d0668af517d7dce0f2a455349ef3f2f9841c5567
Author: OwusuBlessing
Date:   Thu Jul 3 15:27:59 2025 +0100

    setup repo structure

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ebf54f5
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,41 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# Jupyter Notebook checkpoints
+.ipynb_checkpoints
+
+# PyCharm
+.idea/
+
+# VS Code
+.vscode/
+
+# Data and outputs
+data/
+outputs/
+
+# OS files
+.DS_Store
\ No newline at end of file
diff --git a/.gitkeep b/.gitkeep
new file mode 100644
index 0000000..e69de29
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..78fa89e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,56 @@
+# Smart Farm Photo Keyword Tagging AI
+
+## Project Overview
+This project aims to automate the generation of high-quality, agriculture-relevant keyword tags for agricultural stock photos using AI. The system will replace the current manual keyword tagging process, saving significant time and improving consistency.
+
+## What is Expected
+- **AI Model**: A model trained to generate 5–10 relevant, high-quality keywords per image, with a focus on agricultural context and subtle distinctions (e.g., farmer vs. rancher, male vs. female farmer).
+- **Title Generation**: Optionally generate a descriptive product title for each photo (e.g., "Farmer and son walking in cornfield").
+- **Location Extraction**: If location metadata is present in the image, extract and use it as a keyword (e.g., "Iowa").
+- **CSV Output**: For each photo, output a CSV row with:
+  - Photo file name
+  - Human-entered keywords (for comparison)
+  - AI-generated keywords
+  - AI-generated title (if available)
+  - Location (if available)
+- **Training**: The system should be trainable on a dataset of ~30,000 currently keyword-tagged photos.
+- **Scalability**: Should handle at least 1,000 photos/month (in batches of 500), with potential to double in 3 years.
+- **Quality**: Keywords and titles must be accurate, relevant, and reflect subtle ag-specific concepts.
+
+## Folder Structure
+```
+.
+├── data/         # Datasets: training, validation, test images, and CSVs
+│   ├── raw/      # Raw, unprocessed images and metadata
+│   ├── processed/# Preprocessed data ready for modeling
+│   └── ...
+├── notebooks/    # Jupyter notebooks for EDA, prototyping, and experiments
+├── src/          # Source code
+│   ├── data/     # Data loading, preprocessing scripts
+│   ├── model/    # Model architecture, training, inference code
+│   ├── utils/    # Utility functions
+│   └── main.py   # Main entry point for training/inference
+├── outputs/      # Generated outputs (CSVs, predictions, logs)
+├── docs.txt      # Project requirements and notes
+├── README.md     # Project overview and instructions
+└── .gitignore    # Files and folders to ignore in git
+```
+
+### Directory Details
+- **data/**: All datasets. Use `raw/` for original files, `processed/` for cleaned/ready-to-use data.
+- **notebooks/**: Jupyter notebooks for data exploration, prototyping, and model development.
+- **src/**: All source code, organized by function (data, model, utils). `main.py` is the main script.
+- **outputs/**: All generated outputs, including CSVs with AI-generated tags/titles, logs, and model predictions.
+- **docs.txt**: The original requirements and project notes.
+- **README.md**: This file.
+- **.gitignore**: Keeps unnecessary files out of version control.
+
+## Deliverables
+- Well-documented code in `src/`
+- At least one Jupyter notebook showing EDA and model prototyping
+- Example CSV output as described above
+- Instructions for running the system
+- (Optional) Trained model weights
+
+## Deadline
+**All deliverables are expected within 3 days of project start.**
\ No newline at end of file
diff --git a/docs.txt b/docs.txt
new file mode 100644
index 0000000..c052bde
--- /dev/null
+++ b/docs.txt
@@ -0,0 +1,33 @@
+You want to build a custom AI-powered system to automatically generate keyword tags for agricultural stock photos.
+
+You want the system to help eliminate your current manual keyword tagging process, which is currently handled by an assistant and takes about 10 hours/month.
+
+You need to process 1,000 photos per month, in batches of 500, and this number may scale up over time (possibly doubling in 3 years).
+
+You want the system to generate 5 to 10 high-quality keywords per image, with a focus on agricultural relevance.
+
+You want to be able to train the AI using your current keyword-tagged photo dataset, which contains about 30,000 photos.
+
+The system must differentiate subtle ag-specific concepts, such as:
+
+Farmer vs. rancher
+
+Dairy farmer vs. rancher
+
+Chicken farmer (not rancher)
+
+Male vs. female farmers (for diversity tagging)
+
+You want the system to optionally generate a descriptive product title like: “Farmer and son walking in cornfield.”
+
+If location metadata is available in the image file, you want the system to extract and use that data as a keyword (e.g., “Iowa”).
+
+You want the final output in CSV format, with each photo’s file name matched to its:
+
+Human-entered keywords (for comparison, if needed)
+
+AI-generated keywords
+
+AI-generated title (if available)
+
+Location (if available)
\ No newline at end of file
diff --git a/src/main.py b/src/main.py
new file mode 100644
index 0000000..e69de29
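
Note that `src/main.py` is added empty in this commit, so the CSV output described in README.md and docs.txt has no implementation yet. The following is only a rough sketch of how one row per photo and the batches of 500 might be produced, using just the Python standard library. The column names, the `PhotoRecord` class, the `write_records` and `batched` helpers, and the `"; "` keyword separator are all assumptions made for illustration; none of them appear in the repository.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Sequence, TextIO

# Hypothetical column names; the README only lists what each row must contain.
CSV_COLUMNS = ["file_name", "human_keywords", "ai_keywords", "ai_title", "location"]


@dataclass
class PhotoRecord:
    """One output row: a photo and the tags associated with it (assumed shape)."""
    file_name: str
    human_keywords: List[str] = field(default_factory=list)
    ai_keywords: List[str] = field(default_factory=list)
    ai_title: Optional[str] = None   # optional per the requirements
    location: Optional[str] = None   # only present if the image carries metadata


def write_records(records: Iterable[PhotoRecord], out: TextIO) -> None:
    """Write a header plus one CSV row per photo to an open text stream."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(CSV_COLUMNS)
    for rec in records:
        writer.writerow([
            rec.file_name,
            "; ".join(rec.human_keywords),  # separator is an assumption
            "; ".join(rec.ai_keywords),
            rec.ai_title or "",             # empty cell when not available
            rec.location or "",
        ])


def batched(items: Sequence, size: int = 500) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items (500 per the requirements)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    example = PhotoRecord(
        file_name="a.jpg",
        human_keywords=["farmer"],
        ai_keywords=["farmer", "cornfield"],
        ai_title="Farmer and son walking in cornfield",
        location="Iowa",
    )
    buffer = io.StringIO()
    write_records([example], buffer)
    print(buffer.getvalue())
```

Writing to a stream rather than a fixed path keeps the sketch independent of the `outputs/` layout; keyword generation and metadata-based location extraction are deliberately left out, since the commit does not say how they will be done.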