commit a8b09f2a2126dfb9b7905ea0b4ce35ab31ca8c9d
Author: OwusuBlessing
Date:   Fri Jul 25 21:05:23 2025 +0100

    first commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..f5bd376
--- /dev/null
+++ b/README.md
@@ -0,0 +1,255 @@
+# ML Engineer Assessment: Custom Model Fine-tuning Challenge
+
+## 🎯 Scenario
+You are tasked with building a **Customer Support Intent Classification System** for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.
+
+## 📊 Dataset Provided
+**BANKING77 Dataset** - Real banking customer service queries
+- **Source**: HuggingFace `datasets` library (`banking77`)
+- **Size**: 13,083 labeled customer queries
+- **Classes**: 77 banking-related intents (card_arrival, transfer, balance, etc.)
+- **Format**: `{'text': 'query', 'label': intent_id}`
+- **Split**: You'll need to create train/validation/test splits
+- **Domain**: Banking and financial services customer support
+
+**Sample Data Points:**
+```
+"What is the base rate of the bank?" → get_exchange_rate
+"I am still waiting on my card" → card_arrival
+"Can you help me make a payment?" → transfer
+```
+
+## 🎯 Your Mission
+Build a complete fine-tuning pipeline for **Banking Customer Support Intent Classification** that demonstrates your ML engineering skills across the full lifecycle.
+
+---
+
+## 🔧 Technical Requirements
+
+### Core Implementation (Must Have)
+1. **Model Selection & Fine-tuning**
+   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
+   - Implement fine-tuning with proper hyperparameter configuration
+   - Handle class imbalance if present
+
+2. **Data Pipeline**
+   - Clean and preprocess the provided dataset
+   - Implement proper train/val/test splits (70/15/15)
+   - Create data loaders with appropriate batching
+
+3. **Training Infrastructure**
+   - Implement training loop with proper logging
+   - Add early stopping and learning rate scheduling
+   - Track key metrics (accuracy, macro F1, per-class F1)
+
+4. **Evaluation & Metrics**
+   - Comprehensive evaluation on test set
+   - Confusion matrix and classification report
+   - Error analysis with examples
+
+5. **Inference Demo**
+   - Create a simple inference script/API
+   - Demonstrate prediction on new examples
+   - Show confidence scores
+
+6. **Executable Pipelines** (Required)
+   - **Training Pipeline**: End-to-end automated training with single command
+   - **Inference Pipeline**: Batch or single prediction pipeline
+   - **Evaluation Pipeline**: Automated model evaluation and reporting
+
+7. **Jupyter Notebooks** (Required)
+   - **Data Exploration**: EDA, class distribution, sample analysis
+   - **Model Experimentation**: Different approaches, hyperparameter testing
+   - **Results Analysis**: Performance analysis, error analysis, insights
+
+### Additional Features
+- Experiment tracking (simple logging)
+- Model versioning and checkpointing
+- Hyperparameter optimization
+- Simple web interface for testing
+
+---
+
+## 📋 Deliverables
+
+### 1. Code Structure (Clean & Modular)
+```
+project/
+├── data/
+│   ├── raw/
+│   └── processed/
+├── notebooks/
+│   ├── 01_data_exploration.ipynb
+│   ├── 02_model_experimentation.ipynb
+│   └── 03_results_analysis.ipynb
+├── src/
+│   ├── __init__.py
+│   ├── data_preprocessing.py
+│   ├── model.py
+│   ├── train.py
+│   ├── evaluate.py
+│   ├── inference.py
+│   └── utils.py
+├── pipelines/
+│   ├── train_pipeline.py
+│   ├── inference_pipeline.py
+│   └── evaluation_pipeline.py
+├── configs/
+│   ├── model_config.yaml
+│   ├── train_config.yaml
+│   └── inference_config.yaml
+├── experiments/
+│   └── logs/
+├── models/
+│   └── checkpoints/
+├── requirements.txt
+├── README.md
+└── run_demo.py
+```
+
+### 2. Documentation & Notebooks
+- **README.md**: Setup instructions, usage examples, design decisions
+- **Jupyter Notebooks**:
+  - Data exploration with visualizations and insights
+  - Model experimentation and hyperparameter analysis
+  - Results analysis with error examples and improvement suggestions
+- **Code comments**: Clear docstrings and inline comments
+- **Results summary**: Model performance, key findings
+
+### 3. Executable Pipelines
+Create command-line interfaces for each major workflow:
+
+**Training Pipeline:**
+```bash
+python pipelines/train_pipeline.py --config configs/train_config.yaml
+# Should handle: data loading → preprocessing → training → validation → model saving
+```
+
+**Inference Pipeline:**
+```bash
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
+# Should handle: model loading → preprocessing → batch prediction → output saving
+```
+
+**Evaluation Pipeline:**
+```bash
+python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
+# Should handle: model loading → evaluation → metrics calculation → report generation
+```
+
+### 4. Live Demo
+- **Pipeline Demonstration**: Show training, inference, and evaluation pipelines in action
+- **Notebook Walkthrough**: Key insights from data exploration and experimentation
+- **Code Architecture**: Explain design choices and component interactions
+- **Performance Analysis**: Model results, error analysis, improvement ideas
+- **Q&A**: Discuss trade-offs, improvements, production considerations
+
+---
+
+## 🎯 Evaluation Criteria
+
+### Technical Skills
+- **Code Quality**: Clean, modular, well-documented code
+- **ML Implementation**: Proper fine-tuning, evaluation, metrics
+- **Data Handling**: Preprocessing, splitting, batching
+- **Pipeline Design**: Executable, configurable, and robust workflows
+- **Notebook Quality**: Clear analysis, insights, and experimentation
+
+### System Design
+- **Architecture**: Logical code organization and separation of concerns
+- **Pipeline Integration**: Seamless flow between training, inference, and evaluation
+- **Configurability**: Easy to modify hyperparameters and model choices
+- **Reproducibility**: Consistent results across runs
+- **Best Practices**: Following ML engineering conventions
+
+### Problem Solving
+- **Dataset Analysis**: Understanding data characteristics and challenges
+- **Model Choice**: Justified selection of model and approach
+- **Performance Optimization**: Addressing class imbalance, overfitting, etc.
+- **Trade-off Awareness**: Understanding of speed vs accuracy, etc.
+
+### Communication
+- **Documentation**: Clear README and code documentation
+- **Problem Articulation**: Clear explanation of challenges and solutions
+
+
+
+---
+
+## 🛠️ Suggested Tech Stack
+**Required:**
+- PyTorch or TensorFlow/Keras
+- HuggingFace Transformers
+- pandas, numpy, scikit-learn
+- matplotlib/seaborn for visualization
+- FastAPI for inference API
+
+
+
+## 🔧 Pipeline Requirements
+
+### Training Pipeline (`pipelines/train_pipeline.py`)
+**Must include:**
+```python
+# Key components your training pipeline should handle:
+# - Config loading and validation
+# - Data loading and preprocessing
+# - Model initialization
+# - Training loop with logging
+# - Validation and early stopping
+# - Model checkpointing and saving
+# - Experiment metadata logging
+```
+
+**Usage:**
+```bash
+python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
+```
+
+### Inference Pipeline (`pipelines/inference_pipeline.py`)
+**Must include:**
+```python
+# Key components your inference pipeline should handle:
+# - Model loading from checkpoint
+# - Input data preprocessing
+# - Batch or single prediction
+# - Confidence score calculation
+# - Output formatting and saving
+# - Error handling for malformed inputs
+```
+
+**Usage:**
+```bash
+# Single prediction
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"
+
+# Batch prediction
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
+```
+
+### Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)
+**Must include:**
+```python
+# Key components your evaluation pipeline should handle:
+# - Model loading from checkpoint
+# - Test data loading and preprocessing
+# - Comprehensive evaluation metrics
+# - Confusion matrix and classification report
+# - Error analysis with examples
+# - Results saving and visualization
+```
+
+**Usage:**
+```bash
+python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
+```
+
+
+## ⚡ Success Indicators
+- All three pipelines execute successfully without errors
+- Jupyter notebooks show clear data insights and experimentation
+- Model achieves >85% accuracy on test set
+- Clean, production-ready code structure with proper separation of concerns
+- Comprehensive evaluation with actionable insights
+- Clear demonstration of ML engineering best practices
+- Ability to articulate technical decisions confidently during demo
\ No newline at end of file