# ML Engineer Assessment: Custom Model Fine-tuning Challenge

## 🎯 Scenario

You are tasked with building a **Customer Support Intent Classification System** for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.

## 📊 Dataset Provided

**BANKING77 Dataset** - Real banking customer service queries

- **Source**: HuggingFace `datasets` library (`banking77`)
- **Size**: 13,083 labeled customer queries
- **Classes**: 77 banking-related intents (card_arrival, transfer, balance, etc.)
- **Format**: `{'text': 'query', 'label': intent_id}`
- **Split**: You'll need to create train/validation/test splits
- **Domain**: Banking and financial services customer support
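
One possible way to create the splits is a per-label (stratified) shuffle, so that each of the 77 intents shows up in every split. The sketch below uses only the standard library; the function name `stratified_split`, the default ratios, and the seed are illustrative assumptions, not part of the assignment.

```python
import random
from collections import defaultdict


def stratified_split(examples, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split {'text', 'label'} dicts into train/val/test, label by label.

    Hypothetical helper: each label's examples are shuffled with a fixed
    seed and then cut according to `ratios`, so class proportions stay
    roughly equal across the three splits.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)

    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        items = by_label[label]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
        test.extend(items[n_train + n_val:])  # remainder goes to test
    return train, val, test


if __name__ == "__main__":
    toy = [{"text": f"query {i}", "label": i % 3} for i in range(300)]
    train, val, test = stratified_split(toy)
    print(len(train), len(val), len(test))
```

With real data, `examples` could be the rows of the loaded dataset converted to dicts; libraries such as scikit-learn offer equivalent stratified splitting if you prefer not to hand-roll it.
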

**Sample Data Points:**

```
"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival
"Can you help me make a payment?" → transfer
```

## 🎯 Your Mission

Build a complete fine-tuning pipeline for **Banking Customer Support Intent Classification** that demonstrates your ML engineering skills across the full lifecycle.

---

## 🔧 Technical Requirements

### Core Implementation (Must Have)

1. **Model Selection & Fine-tuning**
   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
   - Implement fine-tuning with proper hyperparameter configuration
   - Handle class imbalance if present

2. **Data Pipeline**
   - Clean and preprocess the provided dataset
   - Implement proper train/val/test splits (70/15/15)
   - Create data loaders with appropriate batching

3. **Training Infrastructure**
   - Implement training loop with proper logging
   - Add early stopping and learning rate scheduling
   - Track key metrics (accuracy, F1-macro, F1-per-class)

4. **Evaluation & Metrics**
   - Comprehensive evaluation on test set
   - Confusion matrix and classification report
   - Error analysis with examples

5. **Inference Demo**
   - Create a simple inference script/API
   - Demonstrate prediction on new examples
   - Show confidence scores

6. **Executable Pipelines** (Required)
   - **Training Pipeline**: End-to-end automated training with single command
   - **Inference Pipeline**: Batch or single prediction pipeline
   - **Evaluation Pipeline**: Automated model evaluation and reporting

7. **Jupyter Notebooks** (Required)
   - **Data Exploration**: EDA, class distribution, sample analysis
   - **Model Experimentation**: Different approaches, hyperparameter testing
   - **Results Analysis**: Performance analysis, error analysis, insights
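
For the "handle class imbalance if present" item, one common option is to weight the loss by inverse class frequency. The sketch below computes the usual "balanced" weights, `n_samples / (n_classes * count)`; the helper name `inverse_frequency_weights` is an assumption for illustration, and whether weighting is needed at all depends on what your EDA shows.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Return one weight per class using n_samples / (n_classes * count).

    Hypothetical helper: rare classes get weights above 1, frequent
    classes get weights below 1. Classes that never occur get 0.0 so
    the division is always defined.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for class_id in range(num_classes):
        n = counts.get(class_id, 0)
        weights.append(total / (num_classes * n) if n else 0.0)
    return weights


if __name__ == "__main__":
    # Three examples of class 0, one example of class 1.
    print(inverse_frequency_weights([0, 0, 0, 1], num_classes=2))
```

The resulting list could be turned into a tensor and passed as the `weight` argument of a cross-entropy loss in PyTorch, if that is the framework you choose.
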

### Additional Features

- Experiment tracking (simple logging)
- Model versioning and checkpointing
- Hyperparameter optimization
- Simple web interface for testing
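
"Simple logging" for experiment tracking can be as small as appending one JSON record per step to a file. The following is a minimal sketch under that assumption; the function names and the `experiments/logs/` path in the usage example are illustrative.

```python
import json
import time
from pathlib import Path


def log_metrics(log_path, step, **metrics):
    """Append one JSON line with a timestamp, the step, and any metrics."""
    record = {"time": time.time(), "step": step, **metrics}
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def read_log(log_path):
    """Load every record back, e.g. for plotting in a notebook."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    log_file = "experiments/logs/baseline_bert.jsonl"  # hypothetical path
    log_metrics(log_file, step=1, train_loss=2.31, val_accuracy=0.42)
    print(read_log(log_file)[-1])
```

A JSON-lines file is easy to diff and to load with pandas later; a dedicated tracking tool is an alternative if you want dashboards.
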

---

## 📋 Deliverables

### 1. Code Structure (Clean & Modular)

```
project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_experimentation.ipynb
│   └── 03_results_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
├── pipelines/
│   ├── train_pipeline.py
│   ├── inference_pipeline.py
│   └── evaluation_pipeline.py
├── configs/
│   ├── model_config.yaml
│   ├── train_config.yaml
│   └── inference_config.yaml
├── experiments/
│   └── logs/
├── models/
│   └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py
```

### 2. Documentation & Notebooks

- **README.md**: Setup instructions, usage examples, design decisions
- **Jupyter Notebooks**:
  - Data exploration with visualizations and insights
  - Model experimentation and hyperparameter analysis
  - Results analysis with error examples and improvement suggestions
- **Code comments**: Clear docstrings and inline comments
- **Results summary**: Model performance, key findings

### 3. Executable Pipelines

Create command-line interfaces for each major workflow:

**Training Pipeline:**

```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving
```

**Inference Pipeline:**

```bash
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving
```

**Evaluation Pipeline:**

```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation
```

### 4. Live Demo

- **Pipeline Demonstration**: Show training, inference, and evaluation pipelines in action
- **Notebook Walkthrough**: Key insights from data exploration and experimentation
- **Code Architecture**: Explain design choices and component interactions
- **Performance Analysis**: Model results, error analysis, improvement ideas
- **Q&A**: Discuss trade-offs, improvements, production considerations

---

## 🎯 Evaluation Criteria

### Technical Skills

- **Code Quality**: Clean, modular, well-documented code
- **ML Implementation**: Proper fine-tuning, evaluation, metrics
- **Data Handling**: Preprocessing, splitting, batching
- **Pipeline Design**: Executable, configurable, and robust workflows
- **Notebook Quality**: Clear analysis, insights, and experimentation

### System Design

- **Architecture**: Logical code organization and separation of concerns
- **Pipeline Integration**: Seamless flow between training, inference, and evaluation
- **Configurability**: Easy to modify hyperparameters and model choices
- **Reproducibility**: Consistent results across runs
- **Best Practices**: Following ML engineering conventions
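
For the reproducibility criterion, a typical first step is to seed every random number generator the pipeline touches. The sketch below is one way to do that; `set_seed` is a hypothetical helper, and the NumPy and PyTorch calls only run if those libraries happen to be installed. Seeding alone does not guarantee bit-identical results on GPU.

```python
import random


def set_seed(seed=42):
    """Seed the RNGs that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass
    return seed


if __name__ == "__main__":
    set_seed(123)
    first = random.random()
    set_seed(123)
    second = random.random()
    print(first == second)
```

Calling such a helper once at the start of each pipeline, with the seed read from the config, keeps data shuffling and weight initialization repeatable across runs.
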

### Problem Solving

- **Dataset Analysis**: Understanding data characteristics and challenges
- **Model Choice**: Justified selection of model and approach
- **Performance Optimization**: Addressing class imbalance, overfitting, etc.
- **Trade-off Awareness**: Understanding trade-offs such as speed vs. accuracy

### Communication

- **Documentation**: Clear README and code documentation
- **Problem Articulation**: Clear explanation of challenges and solutions

---

## 🛠️ Suggested Tech Stack

**Required:**

- PyTorch or TensorFlow/Keras
- HuggingFace Transformers
- pandas, numpy, scikit-learn
- matplotlib/seaborn for visualization
- FastAPI for inference API

## 🔧 Pipeline Requirements

### Training Pipeline (`pipelines/train_pipeline.py`)

**Must include:**

```python
# Key components your training pipeline should handle:
# - Config loading and validation
# - Data loading and preprocessing
# - Model initialization
# - Training loop with logging
# - Validation and early stopping
# - Model checkpointing and saving
# - Experiment metadata logging
```
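
The "validation and early stopping" component can be isolated in a small helper that the training loop calls once per epoch. This is a sketch, assuming a higher-is-better validation metric such as macro F1; the class name, `patience`, and `min_delta` are illustrative choices.

```python
class EarlyStopping:
    """Signal a stop when the validation metric has not improved
    for `patience` consecutive epochs (higher metric = better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True if training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0  # improvement: reset the counter
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for epoch, val_f1 in enumerate([0.50, 0.60, 0.59, 0.58], start=1):
        if stopper.step(val_f1):
            print(f"stopping after epoch {epoch}, best={stopper.best}")
            break
```

A natural place to save a checkpoint is whenever `step` resets the counter, so the best model on validation is the one kept on disk.
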

**Usage:**

```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
```
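
A command-line entry point matching the usage above could be sketched with `argparse` plus a small config check. The required keys listed in `REQUIRED_KEYS` are assumptions for illustration only; the assignment does not prescribe a config schema. `load_config` relies on PyYAML (`yaml.safe_load`), which is imported lazily because it is a third-party package.

```python
import argparse

# Hypothetical schema: adjust to whatever your train_config.yaml contains.
REQUIRED_KEYS = ("model_name", "learning_rate", "num_epochs", "batch_size")


def build_train_parser():
    """CLI with the two flags shown in the usage example."""
    parser = argparse.ArgumentParser(description="Training pipeline (sketch)")
    parser.add_argument("--config", required=True, help="Path to a YAML config")
    parser.add_argument("--experiment_name", default="default",
                        help="Name used for logs and checkpoints")
    return parser


def validate_config(config):
    """Fail early with a clear message if expected keys are missing."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {', '.join(missing)}")
    return config


def load_config(path):
    """Read and validate a YAML config file (requires PyYAML)."""
    import yaml  # third-party; imported here so the parser works without it
    with open(path, encoding="utf-8") as handle:
        return validate_config(yaml.safe_load(handle) or {})


if __name__ == "__main__":
    args = build_train_parser().parse_args()
    config = load_config(args.config)
    print(f"experiment={args.experiment_name} config={config}")
```

Validating the config before any data or model is loaded keeps failures cheap and the error messages specific.
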

### Inference Pipeline (`pipelines/inference_pipeline.py`)

**Must include:**

```python
# Key components your inference pipeline should handle:
# - Model loading from checkpoint
# - Input data preprocessing
# - Batch or single prediction
# - Confidence score calculation
# - Output formatting and saving
# - Error handling for malformed inputs
```
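
Two of the components above, confidence scores and handling of malformed inputs, can be illustrated without any model. The sketch below assumes the model returns one raw logit per class and treats the softmax probability of the top class as the confidence; all function names and the `id2label` mapping are hypothetical.

```python
import math


def clean_text(text):
    """Reject malformed inputs and normalize whitespace."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    return " ".join(text.split())


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_with_confidence(logits, id2label):
    """Map raw logits to the top label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "confidence": probs[best]}


if __name__ == "__main__":
    id2label = {0: "card_arrival", 1: "transfer", 2: "balance"}  # toy mapping
    print(clean_text("  I am still   waiting on my card "))
    print(predict_with_confidence([2.0, 0.1, -1.0], id2label))
```

Softmax probabilities from fine-tuned transformers tend to be overconfident, so treat them as a ranking signal unless you calibrate them.
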

**Usage:**

```bash
# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"

# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
```
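
Since the two usage modes take either `--text` or `--input_file`, one way to enforce that on the command line is an `argparse` mutually exclusive group. The sketch below also shows a small CSV writer for batch output; the column names `text`, `label`, and `confidence` are assumptions, not a required format.

```python
import argparse
import csv


def build_inference_parser():
    """CLI that requires exactly one of --text or --input_file."""
    parser = argparse.ArgumentParser(description="Inference pipeline (sketch)")
    parser.add_argument("--model_path", required=True)
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Single query to classify")
    source.add_argument("--input_file", help="CSV file with queries")
    parser.add_argument("--output_file", default=None,
                        help="Where to write batch predictions")
    return parser


def write_predictions(rows, output_file):
    """Write prediction dicts to CSV with a fixed (assumed) column order."""
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text", "label", "confidence"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    args = build_inference_parser().parse_args()
    mode = "single" if args.text is not None else "batch"
    print(f"mode={mode} model={args.model_path}")
```

Passing both flags, or neither, makes `argparse` exit with a usage error before any model is loaded.
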

### Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)

**Must include:**

```python
# Key components your evaluation pipeline should handle:
# - Model loading from checkpoint
# - Test data loading and preprocessing
# - Comprehensive evaluation metrics
# - Confusion matrix and classification report
# - Error analysis with examples
# - Results saving and visualization
```
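
To make the metric definitions concrete, here is a dependency-free sketch of a confusion matrix with per-class and macro F1. In practice scikit-learn's metrics would likely be used instead; the function names below are illustrative, and labels are assumed to be integer class ids.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts examples with true class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_id, pred_id in zip(y_true, y_pred):
        matrix[true_id][pred_id] += 1
    return matrix


def f1_per_class(matrix):
    """F1 = 2*TP / (2*TP + FP + FN) for each class; 0.0 if undefined."""
    size = len(matrix)
    scores = []
    for cls in range(size):
        tp = matrix[cls][cls]
        fp = sum(matrix[row][cls] for row in range(size)) - tp
        fn = sum(matrix[cls]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of per-class F1, so rare intents count equally."""
    scores = f1_per_class(matrix)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    print(cm, f1_per_class(cm), macro_f1(cm))
```

Macro F1 is worth reporting next to accuracy here because, with 77 classes, a model can score well on accuracy while failing on a handful of small intents.
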

**Usage:**

```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
```
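
For the error-analysis part of the report written to `--output_dir`, one simple approach is to list the most frequently confused (true, predicted) label pairs with a few example queries each. The sketch below assumes predictions are already available as parallel lists; the function names and the `error_analysis.json` filename are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def top_confusions(texts, y_true, y_pred, k=5, examples_per_pair=2):
    """Return the k most common misclassification pairs with sample texts."""
    pair_counts = Counter()
    samples = {}
    for text, true_label, pred_label in zip(texts, y_true, y_pred):
        if true_label == pred_label:
            continue
        pair = (true_label, pred_label)
        pair_counts[pair] += 1
        bucket = samples.setdefault(pair, [])
        if len(bucket) < examples_per_pair:
            bucket.append(text)
    return [
        {"true": true_label, "predicted": pred_label, "count": count,
         "examples": samples[(true_label, pred_label)]}
        for (true_label, pred_label), count in pair_counts.most_common(k)
    ]


def save_report(report, output_dir, filename="error_analysis.json"):
    """Write the report as JSON, creating the output directory if needed."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    texts = ["a", "b", "c", "d"]
    report = top_confusions(texts, [0, 0, 1, 1], [1, 1, 1, 0])
    print(save_report(report, "results/evaluation"))
```

Reading a few examples per confused pair usually shows whether the errors come from overlapping intent definitions or from the model itself.
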

## ⚡ Success Indicators

- All three pipelines execute successfully without errors
- Jupyter notebooks show clear data insights and experimentation
- Model achieves >85% accuracy on test set
- Clean, production-ready code structure with proper separation of concerns
- Comprehensive evaluation with actionable insights
- Clear demonstration of ML engineering best practices
- Ability to articulate technical decisions confidently during demo