diff --git a/README.md b/README.md
index 6929a7d..f387422 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching).
 
-## 🎯 Supported Tasks
+## Supported Tasks
 
 This framework supports multiple NLP tasks with organized configurations:
 
@@ -13,60 +13,60 @@ This framework supports multiple NLP tasks with organized configurations:
 
 ### Current Implementation Status
 
-- ✅ **Classification**: Fully implemented with emotion classification example
-- 🔄 **Completion**: Planned for future updates
-- 🔄 **Styling**: Planned for future updates
-- 🔄 **Matching**: Planned for future updates
+- **Classification**: Fully implemented with emotion classification example
+- **Completion**: Planned for future updates
+- **Styling**: Planned for future updates
+- **Matching**: Planned for future updates
 
 **Note**: Currently only classification task is supported. Other tasks (completion, styling, matching) are planned for future updates.
 
-## 🏗️ Project Structure
+## Project Structure
 
 ```
 fine-tune-task/
 ├── configs/  # YAML configuration files
-│   ├── classification/  # ✅ Implemented
+│   ├── classification/  # Implemented
 │   │   ├── emotion.yaml  # Emotion classification
 │   │   └── custom.yaml  # Custom dataset
-│   ├── completion/  # 🔄 Planned for future updates
-│   ├── styling/  # 🔄 Planned for future updates
-│   └── matching/  # 🔄 Planned for future updates
+│   ├── completion/  # Planned for future updates
+│   ├── styling/  # Planned for future updates
+│   └── matching/  # Planned for future updates
 ├── data/  # Data directories
 │   ├── raw/  # Raw input data
-│   │   ├── classification/  # ✅ Implemented
-│   │   ├── completion/  # 🔄 Planned for future updates
-│   │   ├── styling/  # 🔄 Planned for future updates
-│   │   └── matching/  # 🔄 Planned for future updates
+│   │   ├── classification/  # Implemented
+│   │   ├── completion/  # Planned for future updates
+│   │   ├── styling/  # Planned for future updates
+│   │   └── matching/  # Planned for future updates
 │   └── processed/  # Processed data
-│       ├── classification/  # ✅ Implemented
-│       ├── completion/  # 🔄 Planned for future updates
-│       ├── styling/  # 🔄 Planned for future updates
-│       └── matching/  # 🔄 Planned for future updates
+│       ├── classification/  # Implemented
+│       ├── completion/  # Planned for future updates
+│       ├── styling/  # Planned for future updates
+│       └── matching/  # Planned for future updates
 ├── pipelines/  # Core pipeline scripts
-│   ├── classification/  # ✅ Implemented
+│   ├── classification/  # Implemented
 │   │   ├── data_processor.py  # Data processing
 │   │   ├── train.py  # Training
 │   │   └── inference.py  # Inference
-│   ├── completion/  # 🔄 Framework ready
-│   ├── styling/  # 🔄 Framework ready
-│   └── matching/  # 🔄 Framework ready
+│   ├── completion/  # Planned for future updates
+│   ├── styling/  # Planned for future updates
+│   └── matching/  # Planned for future updates
 ├── scripts/  # User-friendly scripts
-│   ├── classification/  # ✅ Implemented
+│   ├── classification/  # Implemented
 │   │   ├── data_processor.py  # Data processing script
 │   │   ├── trainer.py  # Training script
 │   │   └── inference.py  # Inference script
-│   ├── completion/  # 🔄 Framework ready
-│   ├── styling/  # 🔄 Framework ready
-│   └── matching/  # 🔄 Framework ready
+│   ├── completion/  # Planned for future updates
+│   ├── styling/  # Planned for future updates
+│   └── matching/  # Planned for future updates
 ├── results/  # Model outputs
-│   ├── classification/  # ✅ Implemented
-│   ├── completion/  # 🔄 Ready
-│   ├── styling/  # 🔄 Ready
-│   └── matching/  # 🔄 Ready
+│   ├── classification/  # Implemented
+│   ├── completion/  # Planned for future updates
+│   ├── styling/  # Planned for future updates
+│   └── matching/  # Planned for future updates
 └── utils/  # Shared utility modules
 ```
 
-## 🚀 Quick Start (Classification Task)
+## Quick Start (Classification Task)
 
 ### 1. Setup Environment
 
@@ -93,7 +93,7 @@ ls -la ./data/processed/classification/emotion/classification/
 
 **Expected Output:**
 ```
-✅ Data processing completed successfully!
+Data processing completed successfully!
 Data source: huggingface
 Dataset: dair-ai/emotion
 Total samples: 2999
@@ -117,7 +117,7 @@ ls -la ./results/classification/emotion_model/
 
 **Expected Output:**
 ```
-✅ Training completed successfully!
+Training completed successfully!
 Model: bert-base-uncased
 Data directory: ./data/processed/classification/emotion
 Training for 3 epochs with batch size 16
@@ -136,7 +136,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 
 **Expected Output:**
 ```
-✅ Inference completed successfully!
+Inference completed successfully!
 Loading model from: ./results/classification/emotion_model
 Predicted label: joy
 Confidence: 0.8542
@@ -146,7 +146,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 - surprise: 0.0224
 ```
 
-## 🔧 Adding New Tasks
+## Adding New Tasks
 
 To add a new task (e.g., completion, styling, matching), follow these steps:
 
@@ -288,7 +288,7 @@ python scripts/completion/trainer.py --config configs/completion/text_generation
 python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time"
 ```
 
-## 📋 YAML Configuration Guide
+## YAML Configuration Guide
 
 ### Configuration Structure
 
@@ -335,7 +335,7 @@ inference:
 - `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset
 - `configs/classification/custom.yaml` - Custom dataset processing
 
-## 🔧 Usage Examples
+## Usage Examples
 
 ### Data Processing Examples
 
@@ -385,7 +385,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 python scripts/classification/inference.py examples
 ```
 
-## 🐛 Troubleshooting Common Errors
+## Troubleshooting Common Errors
 
 ### 1. ModuleNotFoundError: No module named 'utils'
 
@@ -405,7 +405,7 @@ python scripts/classification/data_processor.py --config configs/classification/
 
 **Error:**
 ```
-❌ Model path not found: ./results/classification/emotion_model
+Model path not found: ./results/classification/emotion_model
 ```
 
 **Solution:**
@@ -421,7 +421,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 
 **Error:**
 ```
-❌ Data directory not found: ./data/processed/classification/emotion
+Data directory not found: ./data/processed/classification/emotion
 ```
 
 **Solution:**
@@ -480,7 +480,7 @@ python scripts/classification/trainer.py --config configs/classification/emotion
 python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu
 ```
 
-## 📊 Monitoring and Logs
+## Monitoring and Logs
 
 ### Check Processing Status
 
@@ -510,7 +510,7 @@ tail -f logs/training.log
 └── label_info.json  # Label mappings
 ```
 
-## 🔄 Workflow Summary
+## Workflow Summary
 
 1. **Setup**: Install dependencies and set PYTHONPATH
 2. **Data Processing**: Process raw data into organized splits
@@ -518,7 +518,7 @@ tail -f logs/training.log
 4. **Inference**: Use trained model for predictions
 5. **Monitoring**: Check logs and outputs for errors
 
-## 📝 Creating Custom Configurations
+## Creating Custom Configurations
 
 ### For New Datasets
 
@@ -560,7 +560,7 @@ data:
 python scripts/classification/data_processor.py --config configs/classification/custom.yaml
 ```
 
-## 🎯 Best Practices
+## Best Practices
 
 1. **Always check output directories** before running next step
 2. **Use small datasets for testing** before full runs
@@ -569,7 +569,7 @@ python scripts/classification/data_processor.py --config configs/classification/
 5. **Use version control** for YAML files
 6. **Test with CLI overrides** for quick experiments
 
-## 📞 Support
+## Support
 
 For issues and questions:
 1. Check the troubleshooting section above
@@ -579,4 +579,4 @@ For issues and questions:
 
 ---
 
-**Happy fine-tuning! 🚀**
+**Happy fine-tuning!**