first commit

2025-07-25 21:05:23 +01:00
commit a8b09f2a21
1 changed files with 255 additions and 0 deletions
@@ -0,0 +1,255 @@
+# ML Engineer Assessment: Custom Model Fine-tuning Challenge
+
+## 🎯 Scenario
+You are tasked with building a **Customer Support Intent Classification System** for our e-commerce platform. The system should automatically categorize customer inquiries and route them to the appropriate support teams.
+
+## 📊 Dataset Provided
+**BANKING77 Dataset** - Real banking customer service queries
+- **Source**: HuggingFace `datasets` library (`banking77`)
+- **Size**: 13,083 labeled customer queries
+- **Classes**: 77 banking-related intents (card_arrival, transfer, balance, etc.)
+- **Format**: `{'text': 'query', 'label': intent_id}`
+- **Split**: You'll need to create train/validation/test splits
+- **Domain**: Banking and financial services customer support
+
+**Sample Data Points:**
+```
+"What is the base rate of the bank?" → get_exchange_rate
+"I am still waiting on my card" → card_arrival  
+"Can you help me make a payment?" → transfer
+```
+
+## 🎯 Your Mission
+Build a complete fine-tuning pipeline for **Banking Customer Support Intent Classification** that demonstrates your ML engineering skills across the full lifecycle.
+
+---
+
+## 🔧 Technical Requirements
+
+### Core Implementation (Must Have)
+1. **Model Selection & Fine-tuning**
+   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
+   - Implement fine-tuning with proper hyperparameter configuration
+   - Handle class imbalance if present
+
+2. **Data Pipeline**
+   - Clean and preprocess the provided dataset
+   - Implement proper train/val/test splits (70/15/15)
+   - Create data loaders with appropriate batching
+
+3. **Training Infrastructure**
+   - Implement training loop with proper logging
+   - Add early stopping and learning rate scheduling
+   - Track key metrics (accuracy, F1-macro, F1-per-class)
+
+4. **Evaluation & Metrics**
+   - Comprehensive evaluation on the test set
+   - Confusion matrix and classification report
+   - Error analysis with examples
+
+5. **Inference Demo**
+   - Create a simple inference script/API
+   - Demonstrate prediction on new examples
+   - Show confidence scores
+
+6. **Executable Pipelines** (Required)
+   - **Training Pipeline**: End-to-end automated training with a single command
+   - **Inference Pipeline**: Batch or single prediction pipeline
+   - **Evaluation Pipeline**: Automated model evaluation and reporting
+
+7. **Jupyter Notebooks** (Required)
+   - **Data Exploration**: EDA, class distribution, sample analysis
+   - **Model Experimentation**: Different approaches, hyperparameter testing
+   - **Results Analysis**: Performance analysis, error analysis, insights
+
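A minimal sketch of the 70/15/15 split from the Data Pipeline item, using only the standard library. The function name `stratified_split`, the seed, and the toy labels are illustrative assumptions; it splits each class separately so that rare intents appear in all three subsets.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, splitting each class by `ratios`."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels: 3 classes with 20 examples each (a stand-in for the 77 intents).
toy_labels = [i % 3 for i in range(60)]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

If scikit-learn is available, calling `train_test_split` twice with its `stratify` argument is a common alternative to hand-rolled splitting.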
+### Additional Features
+- Experiment tracking (simple logging)
+- Model versioning and checkpointing
+- Hyperparameter optimization
+- Simple web interface for testing
+
+---
+
+## 📋 Deliverables
+
+### 1. Code Structure (Clean & Modular)
+```
+project/
+├── data/
+│   ├── raw/
+│   └── processed/
+├── notebooks/
+│   ├── 01_data_exploration.ipynb
+│   ├── 02_model_experimentation.ipynb
+│   └── 03_results_analysis.ipynb
+├── src/
+│   ├── __init__.py
+│   ├── data_preprocessing.py
+│   ├── model.py
+│   ├── train.py
+│   ├── evaluate.py
+│   ├── inference.py
+│   └── utils.py
+├── pipelines/
+│   ├── train_pipeline.py
+│   ├── inference_pipeline.py
+│   └── evaluation_pipeline.py
+├── configs/
+│   ├── model_config.yaml
+│   ├── train_config.yaml
+│   └── inference_config.yaml
+├── experiments/
+│   └── logs/
+├── models/
+│   └── checkpoints/
+├── requirements.txt
+├── README.md
+└── run_demo.py
+```
+
+### 2. Documentation & Notebooks
+- **README.md**: Setup instructions, usage examples, design decisions
+- **Jupyter Notebooks**: 
+  - Data exploration with visualizations and insights
+  - Model experimentation and hyperparameter analysis
+  - Results analysis with error examples and improvement suggestions
+- **Code comments**: Clear docstrings and inline comments
+- **Results summary**: Model performance, key findings
+
+### 3. Executable Pipelines
+Create command-line interfaces for each major workflow:
+
+**Training Pipeline:**
+```bash
+python pipelines/train_pipeline.py --config configs/train_config.yaml
+# Should handle: data loading → preprocessing → training → validation → model saving
+```
+
+**Inference Pipeline:**
+```bash
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
+# Should handle: model loading → preprocessing → batch prediction → output saving
+```
+
+**Evaluation Pipeline:**
+```bash
+python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
+# Should handle: model loading → evaluation → metrics calculation → report generation
+```
+
+### 4. Live Demo
+- **Pipeline Demonstration**: Show training, inference, and evaluation pipelines in action
+- **Notebook Walkthrough**: Key insights from data exploration and experimentation
+- **Code Architecture**: Explain design choices and component interactions
+- **Performance Analysis**: Model results, error analysis, improvement ideas
+- **Q&A**: Discuss trade-offs, improvements, production considerations
+
+---
+
+## 🎯 Evaluation Criteria
+
+### Technical Skills
+- **Code Quality**: Clean, modular, well-documented code
+- **ML Implementation**: Proper fine-tuning, evaluation, metrics
+- **Data Handling**: Preprocessing, splitting, batching
+- **Pipeline Design**: Executable, configurable, and robust workflows
+- **Notebook Quality**: Clear analysis, insights, and experimentation
+
+### System Design
+- **Architecture**: Logical code organization and separation of concerns
+- **Pipeline Integration**: Seamless flow between training, inference, and evaluation
+- **Configurability**: Easy to modify hyperparameters and model choices
+- **Reproducibility**: Consistent results across runs
+- **Best Practices**: Following ML engineering conventions
+
+### Problem Solving
+- **Dataset Analysis**: Understanding data characteristics and challenges
+- **Model Choice**: Justified selection of model and approach
+- **Performance Optimization**: Addressing class imbalance, overfitting, etc.
+- **Trade-off Awareness**: Understanding of speed vs accuracy, etc.
+
+### Communication
+- **Documentation**: Clear README and code documentation
+- **Problem Articulation**: Clear explanation of challenges and solutions
+
+---
+
+## 🛠️ Suggested Tech Stack
+**Required:**
+- PyTorch or TensorFlow/Keras
+- HuggingFace Transformers
+- pandas, numpy, scikit-learn
+- matplotlib/seaborn for visualization
+- FastAPI for inference API
+
+
+
+## 🔧 Pipeline Requirements
+
+### Training Pipeline (`pipelines/train_pipeline.py`)
+**Must include:**
+```python
+# Key components your training pipeline should handle:
+# - Config loading and validation
+# - Data loading and preprocessing
+# - Model initialization
+# - Training loop with logging
+# - Validation and early stopping
+# - Model checkpointing and saving
+# - Experiment metadata logging
+```
+
+**Usage:**
+```bash
+python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
+```
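One way the "validation and early stopping" component might look, sketched without any framework. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the simulated loss values are assumptions made for illustration, not part of the assessment.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses standing in for a real training loop.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In a real pipeline the checkpoint would typically be saved whenever `best` improves, so that the model restored after stopping is the best one rather than the last one.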
+
+### Inference Pipeline (`pipelines/inference_pipeline.py`)
+**Must include:**
+```python
+# Key components your inference pipeline should handle:
+# - Model loading from checkpoint
+# - Input data preprocessing
+# - Batch or single prediction
+# - Confidence score calculation
+# - Output formatting and saving
+# - Error handling for malformed inputs
+```
+
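Confidence scores are commonly derived by applying a softmax to the classifier's raw logits. The sketch below assumes that approach; the helper names `softmax` and `top_k`, the three-label mapping, and the logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, confidence) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits for three intents; a real model would emit 77 values.
id2label = {0: "card_arrival", 1: "transfer", 2: "balance"}
predictions = top_k([2.0, 0.5, -1.0], id2label, k=2)
for label, confidence in predictions:
    print(f"{label}: {confidence:.3f}")
```

Softmax probabilities from a fine-tuned model are often overconfident, so a threshold for routing low-confidence queries to a human may need tuning on the validation set.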
+**Usage:**
+```bash
+# Single prediction
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"
+
+# Batch prediction
+python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
+```
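The two invocation modes above could be parsed with `argparse` roughly as follows. The flag names come from the commands shown; the `build_parser` function, the help strings, and the choice to make `--text` and `--input_file` mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Intent classification inference")
    parser.add_argument("--model_path", required=True, help="Path to a saved checkpoint")

    # Exactly one input source must be given: a single query or a batch file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Single query to classify")
    source.add_argument("--input_file", help="CSV file of queries for batch prediction")

    parser.add_argument("--output_file", help="Where to write batch predictions")
    return parser


# Parse an explicit argument list here; a real script would call parse_args().
args = build_parser().parse_args(
    ["--model_path", "models/best_model.pt", "--text", "I am still waiting on my card"]
)
print(args.model_path, args.text, args.input_file)
```

With this layout, passing both `--text` and `--input_file`, or neither, makes `argparse` exit with a usage error, which covers one class of malformed input before any model is loaded.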
+
+### Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)
+**Must include:**
+```python
+# Key components your evaluation pipeline should handle:
+# - Model loading from checkpoint
+# - Test data loading and preprocessing
+# - Comprehensive evaluation metrics
+# - Confusion matrix and classification report
+# - Error analysis with examples
+# - Results saving and visualization
+```
+
+**Usage:**
+```bash
+python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
+```
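To make the requested metrics concrete, here is a dependency-free sketch of accuracy, per-class F1, and macro-F1. The function names and the five toy predictions are illustrative assumptions; with 77 classes, macro-F1 weights every intent equally, so weak performance on rare intents shows up even when accuracy looks high.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy predictions; a real run would use the held-out test set.
y_true = ["card_arrival", "card_arrival", "transfer", "transfer", "balance"]
y_pred = ["card_arrival", "transfer", "transfer", "transfer", "balance"]
print(accuracy(y_true, y_pred), per_class_f1(y_true, y_pred), macro_f1(y_true, y_pred))
```

In practice, `sklearn.metrics.classification_report` and `f1_score(..., average="macro")` report the same quantities and are the usual choice once scikit-learn is installed.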
+
+
+## ⚡ Success Indicators
+- All three pipelines execute successfully without errors
+- Jupyter notebooks show clear data insights and experimentation
+- Model achieves >85% accuracy on the test set
+- Clean, production-ready code structure with proper separation of concerns
+- Comprehensive evaluation with actionable insights
+- Clear demonstration of ML engineering best practices
+- Ability to articulate technical decisions confidently during demo