updated readme
@@ -2,7 +2,7 @@
 
 A comprehensive framework for fine-tuning NLP models with organized YAML configurations, supporting multiple tasks (classification, completion, styling, matching).
 
-## 🎯 Supported Tasks
+## Supported Tasks
 
 This framework supports multiple NLP tasks with organized configurations:
 
@@ -13,60 +13,60 @@ This framework supports multiple NLP tasks with organized configurations:
 
 ### Current Implementation Status
 
-- ✅ **Classification**: Fully implemented with emotion classification example
-- 🔄 **Completion**: Planned for future updates
-- 🔄 **Styling**: Planned for future updates
-- 🔄 **Matching**: Planned for future updates
+- **Classification**: Fully implemented with emotion classification example
+- **Completion**: Planned for future updates
+- **Styling**: Planned for future updates
+- **Matching**: Planned for future updates
 
 **Note**: Currently only classification task is supported. Other tasks (completion, styling, matching) are planned for future updates.
 
-## 🏗️ Project Structure
+## Project Structure
 
 ```
 fine-tune-task/
 ├── configs/ # YAML configuration files
-│   ├── classification/ # ✅ Implemented
+│   ├── classification/ # Implemented
 │   │   ├── emotion.yaml # Emotion classification
 │   │   └── custom.yaml # Custom dataset
-│   ├── completion/ # 🔄 Planned for future updates
-│   ├── styling/ # 🔄 Planned for future updates
-│   └── matching/ # 🔄 Planned for future updates
+│   ├── completion/ # Planned for future updates
+│   ├── styling/ # Planned for future updates
+│   └── matching/ # Planned for future updates
 ├── data/ # Data directories
 │   ├── raw/ # Raw input data
-│   │   ├── classification/ # ✅ Implemented
-│   │   ├── completion/ # 🔄 Planned for future updates
-│   │   ├── styling/ # 🔄 Planned for future updates
-│   │   └── matching/ # 🔄 Planned for future updates
+│   │   ├── classification/ # Implemented
+│   │   ├── completion/ # Planned for future updates
+│   │   ├── styling/ # Planned for future updates
+│   │   └── matching/ # Planned for future updates
 │   └── processed/ # Processed data
-│       ├── classification/ # ✅ Implemented
-│       ├── completion/ # 🔄 Planned for future updates
-│       ├── styling/ # 🔄 Planned for future updates
-│       └── matching/ # 🔄 Planned for future updates
+│       ├── classification/ # Implemented
+│       ├── completion/ # Planned for future updates
+│       ├── styling/ # Planned for future updates
+│       └── matching/ # Planned for future updates
 ├── pipelines/ # Core pipeline scripts
-│   ├── classification/ # ✅ Implemented
+│   ├── classification/ # Implemented
 │   │   ├── data_processor.py # Data processing
 │   │   ├── train.py # Training
 │   │   └── inference.py # Inference
-│   ├── completion/ # 🔄 Framework ready
-│   ├── styling/ # 🔄 Framework ready
-│   └── matching/ # 🔄 Framework ready
+│   ├── completion/ # Planned for future updates
+│   ├── styling/ # Planned for future updates
+│   └── matching/ # Planned for future updates
 ├── scripts/ # User-friendly scripts
-│   ├── classification/ # ✅ Implemented
+│   ├── classification/ # Implemented
 │   │   ├── data_processor.py # Data processing script
 │   │   ├── trainer.py # Training script
 │   │   └── inference.py # Inference script
-│   ├── completion/ # 🔄 Framework ready
-│   ├── styling/ # 🔄 Framework ready
-│   └── matching/ # 🔄 Framework ready
+│   ├── completion/ # Planned for future updates
+│   ├── styling/ # Planned for future updates
+│   └── matching/ # Planned for future updates
 ├── results/ # Model outputs
-│   ├── classification/ # ✅ Implemented
-│   ├── completion/ # 🔄 Ready
-│   ├── styling/ # 🔄 Ready
-│   └── matching/ # 🔄 Ready
+│   ├── classification/ # Implemented
+│   ├── completion/ # Planned for future updates
+│   ├── styling/ # Planned for future updates
+│   └── matching/ # Planned for future updates
 └── utils/ # Shared utility modules
 ```
 
-## 🚀 Quick Start (Classification Task)
+## Quick Start (Classification Task)
 
 ### 1. Setup Environment
 
@@ -93,7 +93,7 @@ ls -la ./data/processed/classification/emotion/classification/
 
 **Expected Output:**
 ```
-✅ Data processing completed successfully!
+Data processing completed successfully!
 Data source: huggingface
 Dataset: dair-ai/emotion
 Total samples: 2999
@@ -117,7 +117,7 @@ ls -la ./results/classification/emotion_model/
 
 **Expected Output:**
 ```
-✅ Training completed successfully!
+Training completed successfully!
 Model: bert-base-uncased
 Data directory: ./data/processed/classification/emotion
 Training for 3 epochs with batch size 16
@@ -136,7 +136,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 
 **Expected Output:**
 ```
-✅ Inference completed successfully!
+Inference completed successfully!
 Loading model from: ./results/classification/emotion_model
 Predicted label: joy
 Confidence: 0.8542
@@ -146,7 +146,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 - surprise: 0.0224
 ```
 
-## 🔧 Adding New Tasks
+## Adding New Tasks
 
 To add a new task (e.g., completion, styling, matching), follow these steps:
 
@@ -288,7 +288,7 @@ python scripts/completion/trainer.py --config configs/completion/text_generation
 python scripts/completion/inference.py --config configs/completion/text_generation.yaml --input-text "Once upon a time"
 ```
 
-## 📋 YAML Configuration Guide
+## YAML Configuration Guide
 
 ### Configuration Structure
 
@@ -335,7 +335,7 @@ inference:
 - `configs/classification/emotion.yaml` - Emotion classification with HuggingFace dataset
 - `configs/classification/custom.yaml` - Custom dataset processing
 
-## 🔧 Usage Examples
+## Usage Examples
 
 ### Data Processing Examples
 
@@ -385,7 +385,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 python scripts/classification/inference.py examples
 ```
 
-## 🐛 Troubleshooting Common Errors
+## Troubleshooting Common Errors
 
 ### 1. ModuleNotFoundError: No module named 'utils'
 
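Review note: the README's own fix for this error falls outside the hunk, so the following is only a minimal sketch of one common workaround. It assumes the layout shown in the project tree, where the scripts live two directories below the project root and `utils/` sits at the root. The helper name `add_project_root` and its `levels_up` parameter are hypothetical and do not come from the repository.

```python
import sys
from pathlib import Path


def add_project_root(script_file: str, levels_up: int = 2) -> Path:
    """Put the project root on sys.path so `import utils` can resolve.

    Assumption: the script sits at <root>/scripts/<task>/<name>.py, so the
    root is two directory levels above the script's own directory.
    """
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        # Insert at the front so the project's packages win over others.
        sys.path.insert(0, str(root))
    return root


if __name__ == "__main__":
    # Hypothetical usage from inside scripts/classification/data_processor.py
    print(add_project_root(__file__, levels_up=0))
```

Exporting `PYTHONPATH` to the project root before running the scripts, which the Workflow Summary lists under Setup, has the same effect without touching the code.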
@@ -405,7 +405,7 @@ python scripts/classification/data_processor.py --config configs/classification/
 
 **Error:**
 ```
-❌ Model path not found: ./results/classification/emotion_model
+Model path not found: ./results/classification/emotion_model
 ```
 
 **Solution:**
@@ -421,7 +421,7 @@ python scripts/classification/inference.py --config configs/classification/emoti
 
 **Error:**
 ```
-❌ Data directory not found: ./data/processed/classification/emotion
+Data directory not found: ./data/processed/classification/emotion
 ```
 
 **Solution:**
@@ -480,7 +480,7 @@ python scripts/classification/trainer.py --config configs/classification/emotion
 python scripts/classification/trainer.py --config configs/classification/emotion.yaml --device cpu
 ```
 
-## 📊 Monitoring and Logs
+## Monitoring and Logs
 
 ### Check Processing Status
 
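Review note: the README's status-check commands are cut off by the hunk boundary. As a rough sketch of the same idea, the snippet below checks that each step's inputs exist before the next step runs. The two paths are taken from the "Data directory not found" and "Model path not found" messages in the troubleshooting section. The helper names and the mapping of steps to paths are assumptions, not part of the repository.

```python
from pathlib import Path

# Paths copied from the error messages quoted in the README.
DATA_DIR = "data/processed/classification/emotion"
MODEL_DIR = "results/classification/emotion_model"


def missing_paths(project_root, required):
    """Return the entries of `required` that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in required if not (root / rel).exists()]


def ready_for(step, project_root="."):
    """Hypothetical helper: True when the inputs assumed for `step` exist."""
    # Assumed mapping: training reads the processed data, inference the model.
    needs = {"train": [DATA_DIR], "inference": [MODEL_DIR]}
    return missing_paths(project_root, needs[step]) == []


if __name__ == "__main__":
    for step in ("train", "inference"):
        print(step, "ready" if ready_for(step) else "missing inputs")
```

Running a check like this between steps follows the README's advice to check output directories before running the next step.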
@@ -510,7 +510,7 @@ tail -f logs/training.log
 └── label_info.json # Label mappings
 ```
 
-## 🔄 Workflow Summary
+## Workflow Summary
 
 1. **Setup**: Install dependencies and set PYTHONPATH
 2. **Data Processing**: Process raw data into organized splits
@@ -518,7 +518,7 @@ tail -f logs/training.log
 4. **Inference**: Use trained model for predictions
 5. **Monitoring**: Check logs and outputs for errors
 
-## 📝 Creating Custom Configurations
+## Creating Custom Configurations
 
 ### For New Datasets
 
@@ -560,7 +560,7 @@ data:
 python scripts/classification/data_processor.py --config configs/classification/custom.yaml
 ```
 
-## 🎯 Best Practices
+## Best Practices
 
 1. **Always check output directories** before running next step
 2. **Use small datasets for testing** before full runs
@@ -569,7 +569,7 @@ python scripts/classification/data_processor.py --config configs/classification/
 5. **Use version control** for YAML files
 6. **Test with CLI overrides** for quick experiments
 
-## 📞 Support
+## Support
 
 For issues and questions:
 1. Check the troubleshooting section above
@@ -579,4 +579,4 @@ For issues and questions:
 
 ---
 
-**Happy fine-tuning! 🚀**
+**Happy fine-tuning!**
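Review note: most of this commit is a mechanical removal of emoji from headings, status lines, and sample output. The commit does not say how the edit was made, so the following is only a sketch of how a similar cleanup could be scripted. The function name `strip_emoji` and the two Unicode ranges are assumptions, chosen to cover the symbols that appear in this diff. The sketch does not reproduce the wording changes in the tree comments, such as "Framework ready" becoming "Planned for future updates".

```python
import re

# One emoji from the assumed ranges, optionally followed by the
# variation selector U+FE0F (as in the "Project Structure" heading).
EMOJI = "[\U0001F300-\U0001FAFF\u2600-\u27BF]\uFE0F?"

# An emoji at the start of a line takes its trailing space with it.
LINE_START = re.compile("^" + EMOJI + " ?", re.MULTILINE)
# An emoji elsewhere takes its leading space with it.
INLINE = re.compile(" ?" + EMOJI)


def strip_emoji(text: str) -> str:
    """Remove emoji and the single space that accompanies each one."""
    return INLINE.sub("", LINE_START.sub("", text))


if __name__ == "__main__":
    print(strip_emoji("## \U0001F3AF Supported Tasks"))
    print(strip_emoji("\u2705 Training completed successfully!"))
```

The box-drawing characters in the project tree fall outside both ranges, so a pass like this leaves the tree layout untouched.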