updated readme

OwusuBlessing
2025-08-06 22:49:29 +01:00
parent fef3f5ae35
commit fd54d4be39
+46 -46
@@ -2,7 +2,7 @@
A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching). A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching).
## 🎯 Supported Tasks ## Supported Tasks
This framework supports multiple NLP tasks with organized configurations: This framework supports multiple NLP tasks with organized configurations:
@@ -13,60 +13,60 @@ This framework supports multiple NLP tasks with organized configurations:
### Current Implementation Status ### Current Implementation Status
- **Classification**: Fully implemented with emotion classification example - **Classification**: Fully implemented with emotion classification example
- 🔄 **Completion**: Planned for future updates - **Completion**: Planned for future updates
- 🔄 **Styling**: Planned for future updates - **Styling**: Planned for future updates
- 🔄 **Matching**: Planned for future updates - **Matching**: Planned for future updates
**Note**: Currently, only the classification task is supported. Other tasks (completion, styling, matching) are planned for future updates. **Note**: Currently, only the classification task is supported. Other tasks (completion, styling, matching) are planned for future updates.
## 🏗️ Project Structure ## Project Structure
``` ```
fine-tune-task/ fine-tune-task/
├── configs/ # YAML configuration files ├── configs/ # YAML configuration files
│ ├── classification/ # Implemented │ ├── classification/ # Implemented
│ │ ├── emotion.yaml # Emotion classification │ │ ├── emotion.yaml # Emotion classification
│ │ └── custom.yaml # Custom dataset │ │ └── custom.yaml # Custom dataset
│ ├── completion/ # 🔄 Planned for future updates │ ├── completion/ # Planned for future updates
│ ├── styling/ # 🔄 Planned for future updates │ ├── styling/ # Planned for future updates
│ └── matching/ # 🔄 Planned for future updates │ └── matching/ # Planned for future updates
├── data/ # Data directories ├── data/ # Data directories
│ ├── raw/ # Raw input data │ ├── raw/ # Raw input data
│ │ ├── classification/ # Implemented │ │ ├── classification/ # Implemented
│ │ ├── completion/ # 🔄 Planned for future updates │ │ ├── completion/ # Planned for future updates
│ │ ├── styling/ # 🔄 Planned for future updates │ │ ├── styling/ # Planned for future updates
│ │ └── matching/ # 🔄 Planned for future updates │ │ └── matching/ # Planned for future updates
│ └── processed/ # Processed data │ └── processed/ # Processed data
│ ├── classification/ # Implemented │ ├── classification/ # Implemented
│ ├── completion/ # 🔄 Planned for future updates │ ├── completion/ # Planned for future updates
│ ├── styling/ # 🔄 Planned for future updates │ ├── styling/ # Planned for future updates
│ └── matching/ # 🔄 Planned for future updates │ └── matching/ # Planned for future updates
├── pipelines/ # Core pipeline scripts ├── pipelines/ # Core pipeline scripts
│ ├── classification/ # Implemented │ ├── classification/ # Implemented
│ │ ├── data_processor.py # Data processing │ │ ├── data_processor.py # Data processing
│ │ ├── train.py # Training │ │ ├── train.py # Training
│ │ └── inference.py # Inference │ │ └── inference.py # Inference
│ ├── completion/ # 🔄 Framework ready │ ├── completion/ # Planned for future updates
│ ├── styling/ # 🔄 Framework ready │ ├── styling/ # Planned for future updates
│ └── matching/ # 🔄 Framework ready │ └── matching/ # Planned for future updates
├── scripts/ # User-friendly scripts ├── scripts/ # User-friendly scripts
│ ├── classification/ # Implemented │ ├── classification/ # Implemented
│ │ ├── data_processor.py # Data processing script │ │ ├── data_processor.py # Data processing script
│ │ ├── trainer.py # Training script │ │ ├── trainer.py # Training script
│ │ └── inference.py # Inference script │ │ └── inference.py # Inference script
│ ├── completion/ # 🔄 Framework ready │ ├── completion/ # Planned for future updates
│ ├── styling/ # 🔄 Framework ready │ ├── styling/ # Planned for future updates
│ └── matching/ # 🔄 Framework ready │ └── matching/ # Planned for future updates
├── results/ # Model outputs ├── results/ # Model outputs
│ ├── classification/ # Implemented │ ├── classification/ # Implemented
│ ├── completion/ # 🔄 Ready │ ├── completion/ # Planned for future updates
│ ├── styling/ # 🔄 Ready │ ├── styling/ # Planned for future updates
│ └── matching/ # 🔄 Ready │ └── matching/ # Planned for future updates
└── utils/ # Shared utility modules └── utils/ # Shared utility modules
``` ```
## 🚀 Quick Start (Classification Task) ## Quick Start (Classification Task)
### 1. Setup Environment ### 1. Setup Environment
@@ -93,7 +93,7 @@ ls -la ./data/processed/classification/emotion/classification/
**Expected Output:** **Expected Output:**
``` ```
Data processing completed successfully! Data processing completed successfully!
Data source: huggingface Data source: huggingface
Dataset: dair-ai/emotion Dataset: dair-ai/emotion
Total samples: 2999 Total samples: 2999
@@ -117,7 +117,7 @@ ls -la ./results/classification/emotion_model/
**Expected Output:** **Expected Output:**
``` ```
Training completed successfully! Training completed successfully!
Model: bert-base-uncased Model: bert-base-uncased
Data directory: ./data/processed/classification/emotion Data directory: ./data/processed/classification/emotion
Training for 3 epochs with batch size 16 Training for 3 epochs with batch size 16
@@ -136,7 +136,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
**Expected Output:** **Expected Output:**
``` ```
Inference completed successfully! Inference completed successfully!
Loading model from: ./results/classification/emotion_model Loading model from: ./results/classification/emotion_model
Predicted label: joy Predicted label: joy
Confidence: 0.8542 Confidence: 0.8542
@@ -146,7 +146,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
- surprise: 0.0224 - surprise: 0.0224
``` ```
## 🔧 Adding New Tasks ## Adding New Tasks
To add a new task (e.g., completion, styling, matching), follow these steps: To add a new task (e.g., completion, styling, matching), follow these steps:
@@ -288,7 +288,7 @@ python scripts/completion/trainer.py --config configs/completion/text_generation
python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time" python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time"
``` ```
## 📋 YAML Configuration Guide ## YAML Configuration Guide
### Configuration Structure ### Configuration Structure
@@ -335,7 +335,7 @@ inference:
- `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset - `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset
- `configs/classification/custom.yaml` - Custom dataset processing - `configs/classification/custom.yaml` - Custom dataset processing
## 🔧 Usage Examples ## Usage Examples
### Data Processing Examples ### Data Processing Examples
@@ -385,7 +385,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
python scripts/classification/inference.py examples python scripts/classification/inference.py examples
``` ```
## 🐛 Troubleshooting Common Errors ## Troubleshooting Common Errors
### 1. ModuleNotFoundError: No module named 'utils' ### 1. ModuleNotFoundError: No module named 'utils'
@@ -405,7 +405,7 @@ python scripts/classification/data_processor.py --config configs/classification/
**Error:** **Error:**
``` ```
Model path not found: ./results/classification/emotion_model Model path not found: ./results/classification/emotion_model
``` ```
**Solution:** **Solution:**
@@ -421,7 +421,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
**Error:** **Error:**
``` ```
Data directory not found: ./data/processed/classification/emotion Data directory not found: ./data/processed/classification/emotion
``` ```
**Solution:** **Solution:**
@@ -480,7 +480,7 @@ python scripts/classification/trainer.py --config configs/classification/emotion
python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu
``` ```
## 📊 Monitoring and Logs ## Monitoring and Logs
### Check Processing Status ### Check Processing Status
@@ -510,7 +510,7 @@ tail -f logs/training.log
└── label_info.json # Label mappings └── label_info.json # Label mappings
``` ```
## 🔄 Workflow Summary ## Workflow Summary
1. **Setup**: Install dependencies and set PYTHONPATH 1. **Setup**: Install dependencies and set PYTHONPATH
2. **Data Processing**: Process raw data into organized splits 2. **Data Processing**: Process raw data into organized splits
@@ -518,7 +518,7 @@ tail -f logs/training.log
4. **Inference**: Use trained model for predictions 4. **Inference**: Use trained model for predictions
5. **Monitoring**: Check logs and outputs for errors 5. **Monitoring**: Check logs and outputs for errors
## 📝 Creating Custom Configurations ## Creating Custom Configurations
### For New Datasets ### For New Datasets
@@ -560,7 +560,7 @@ data:
python scripts/classification/data_processor.py --config configs/classification/custom.yaml python scripts/classification/data_processor.py --config configs/classification/custom.yaml
``` ```
## 🎯 Best Practices ## Best Practices
1. **Always check output directories** before running the next step 1. **Always check output directories** before running the next step
2. **Use small datasets for testing** before full runs 2. **Use small datasets for testing** before full runs
@@ -569,7 +569,7 @@ python scripts/classification/data_processor.py --config configs/classification/
5. **Use version control** for YAML files 5. **Use version control** for YAML files
6. **Test with CLI overrides** for quick experiments 6. **Test with CLI overrides** for quick experiments
## 📞 Support ## Support
For issues and questions: For issues and questions:
1. Check the troubleshooting section above 1. Check the troubleshooting section above
@@ -579,4 +579,4 @@ For issues and questions:
--- ---
**Happy fine-tuning! 🚀** **Happy fine-tuning!**
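
The README's workflow summary (process data, train, run inference) and its advice to check output directories before each step can be sketched in a few lines of Python. This is only an illustration, not part of the repository: the helper names `build_steps` and `runnable_steps` are hypothetical, the script and directory paths are copied from the README text shown in this diff, and the inference command may need further arguments that the diff truncates.

```python
from pathlib import Path

# Paths as they appear in the README; adjust them for other configs.
CONFIG = "configs/classification/emotion.yaml"
PROCESSED_DIR = "./data/processed/classification/emotion"
MODEL_DIR = "./results/classification/emotion_model"


def build_steps(config=CONFIG):
    """Return (name, command, required_dir) for each workflow step.

    required_dir is a directory that should exist before the step runs,
    or None when the step has no prerequisite.
    """
    return [
        ("process",
         ["python", "scripts/classification/data_processor.py", "--config", config],
         None),
        ("train",
         ["python", "scripts/classification/trainer.py", "--config", config],
         PROCESSED_DIR),
        # Assumption: inference likely takes extra arguments (elided in the diff).
        ("infer",
         ["python", "scripts/classification/inference.py", "--config", config],
         MODEL_DIR),
    ]


def runnable_steps(steps, exists=lambda p: Path(p).is_dir()):
    """Names of the steps whose prerequisite directory is present (or not needed)."""
    return [name for name, _cmd, required in steps
            if required is None or exists(required)]


if __name__ == "__main__":
    steps = build_steps()
    ready = runnable_steps(steps)
    for name, cmd, _required in steps:
        status = "ready" if name in ready else "blocked"
        print(f"{name:8s} {status:8s} {' '.join(cmd)}")
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once its step is reported as ready. A missing directory corresponds to the "Data directory not found" and "Model path not found" errors covered in the troubleshooting section.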