ML Engineer Assessment: Custom Model Fine-tuning Challenge
🎯 Scenario
You are tasked with building a Customer Support Intent Classification System for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.
📊 Dataset Provided
BANKING77 Dataset - Real banking customer service queries
- Source: HuggingFace datasets library (banking77)
- Size: 13,083 labeled customer queries
- Classes: 77 banking-related intents (card_arrival, transfer, balance, etc.)
- Format: {'text': 'query', 'label': intent_id}
- Split: You'll need to create train/validation/test splits
- Domain: Banking and financial services customer support
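Since the splits have to be created by hand and there are 77 classes, a stratified split helps keep every intent represented in each partition. The following is a minimal stdlib-only sketch, not a prescribed implementation: the function name `stratified_split`, the seed, and the toy labels are assumptions for illustration, and the 70/15/15 fractions follow the data pipeline requirement in this brief.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train_idx, val_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)  # local RNG so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Toy labels standing in for the real intent ids (hypothetical data).
labels = [i % 5 for i in range(500)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

The returned index lists can then be used to select rows from whatever container holds the queries. Fixing the seed keeps the split stable across runs, which supports the reproducibility criterion.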
Sample Data Points:
"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival
"Can you help me make a payment?" → transfer
🎯 Your Mission
Build a complete fine-tuning pipeline for Banking Customer Support Intent Classification that demonstrates your ML engineering skills across the full lifecycle.
🔧 Technical Requirements
Core Implementation (Must Have)
Model Selection & Fine-tuning
- Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
- Implement fine-tuning with proper hyperparameter configuration
- Handle class imbalance if present
Data Pipeline
- Clean and preprocess the provided dataset
- Implement proper train/val/test splits (70/15/15)
- Create data loaders with appropriate batching
Training Infrastructure
- Implement training loop with proper logging
- Add early stopping and learning rate scheduling
- Track key metrics (accuracy, F1-macro, F1-per-class)
Evaluation & Metrics
- Comprehensive evaluation on test set
- Confusion matrix and classification report
- Error analysis with examples
Inference Demo
- Create a simple inference script/API
- Demonstrate prediction on new examples
- Show confidence scores
Executable Pipelines (Required)
- Training Pipeline: End-to-end automated training with a single command
- Inference Pipeline: Batch or single prediction pipeline
- Evaluation Pipeline: Automated model evaluation and reporting
Jupyter Notebooks (Required)
- Data Exploration: EDA, class distribution, sample analysis
- Model Experimentation: Different approaches, hyperparameter testing
- Results Analysis: Performance analysis, error analysis, insights
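One common way to address the "handle class imbalance if present" requirement is to weight the loss by inverse class frequency. The sketch below is an illustration under assumptions, not the expected solution: the function name `balanced_class_weights` and the toy labels are hypothetical, and the formula used is n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Toy, deliberately imbalanced labels (hypothetical data).
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(toy_labels)
for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

Weights like these could be passed to a weighted cross-entropy loss in the chosen framework. Whether weighting is needed at all should be decided from the class distribution found during data exploration.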
Additional Features
- Experiment tracking (simple logging)
- Model versioning and checkpointing
- Hyperparameter optimization
- Simple web interface for testing
📋 Deliverables
1. Code Structure (Clean & Modular)
project/
├── data/
│ ├── raw/
│ └── processed/
├── notebooks/
│ ├── 01_data_exploration.ipynb
│ ├── 02_model_experimentation.ipynb
│ └── 03_results_analysis.ipynb
├── src/
│ ├── __init__.py
│ ├── data_preprocessing.py
│ ├── model.py
│ ├── train.py
│ ├── evaluate.py
│ ├── inference.py
│ └── utils.py
├── pipelines/
│ ├── train_pipeline.py
│ ├── inference_pipeline.py
│ └── evaluation_pipeline.py
├── configs/
│ ├── model_config.yaml
│ ├── train_config.yaml
│ └── inference_config.yaml
├── experiments/
│ └── logs/
├── models/
│ └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py
2. Documentation & Notebooks
- README.md: Setup instructions, usage examples, design decisions
- Jupyter Notebooks:
  - Data exploration with visualizations and insights
  - Model experimentation and hyperparameter analysis
  - Results analysis with error examples and improvement suggestions
- Code comments: Clear docstrings and inline comments
- Results summary: Model performance, key findings
3. Executable Pipelines
Create command-line interfaces for each major workflow:
Training Pipeline:
python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving
Inference Pipeline:
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving
Evaluation Pipeline:
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation
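As a rough illustration of how such a command-line interface could be wired, here is a minimal `argparse` sketch for the inference pipeline. The flag names (`--model_path`, `--text`, `--input_file`, `--output_file`) are taken from the usage lines in this brief; the helper name `build_parser`, the help strings, and the choice to make `--text` and `--input_file` mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a CLI parser for single or batch prediction (sketch)."""
    parser = argparse.ArgumentParser(description="Inference pipeline CLI (sketch)")
    parser.add_argument("--model_path", required=True,
                        help="Path to the saved model checkpoint")
    # Exactly one input source must be given: a single text or a file of queries.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Single query to classify")
    source.add_argument("--input_file", help="CSV file with queries for batch prediction")
    parser.add_argument("--output_file", default=None,
                        help="Where to write batch predictions (optional)")
    return parser


# Parse an explicit argument list here so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--model_path", "models/best_model.pt", "--text", "I am still waiting on my card"]
)
print(args.model_path, args.text, args.input_file)
```

In a real script the list passed to `parse_args` would be omitted so that the arguments come from `sys.argv`. The training and evaluation pipelines could follow the same pattern with their own flags.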
4. Live Demo
- Pipeline Demonstration: Show training, inference, and evaluation pipelines in action
- Notebook Walkthrough: Key insights from data exploration and experimentation
- Code Architecture: Explain design choices and component interactions
- Performance Analysis: Model results, error analysis, improvement ideas
- Q&A: Discuss trade-offs, improvements, production considerations
🎯 Evaluation Criteria
Technical Skills
- Code Quality: Clean, modular, well-documented code
- ML Implementation: Proper fine-tuning, evaluation, metrics
- Data Handling: Preprocessing, splitting, batching
- Pipeline Design: Executable, configurable, and robust workflows
- Notebook Quality: Clear analysis, insights, and experimentation
System Design
- Architecture: Logical code organization and separation of concerns
- Pipeline Integration: Seamless flow between training, inference, and evaluation
- Configurability: Easy to modify hyperparameters and model choices
- Reproducibility: Consistent results across runs
- Best Practices: Following ML engineering conventions
Problem Solving
- Dataset Analysis: Understanding data characteristics and challenges
- Model Choice: Justified selection of model and approach
- Performance Optimization: Addressing class imbalance, overfitting, etc.
- Trade-off Awareness: Understanding of speed vs accuracy, etc.
Communication
- Documentation: Clear README and code documentation
- Problem Articulation: Clear explanation of challenges and solutions
🛠️ Suggested Tech Stack
Required:
- PyTorch or TensorFlow/Keras
- HuggingFace Transformers
- pandas, numpy, scikit-learn
- matplotlib/seaborn for visualization
- FastAPI for inference API
🔧 Pipeline Requirements
Training Pipeline (pipelines/train_pipeline.py)
Must include:
# Key components your training pipeline should handle:
- Config loading and validation
- Data loading and preprocessing
- Model initialization
- Training loop with logging
- Validation and early stopping
- Model checkpointing and saving
- Experiment metadata logging
Usage:
python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
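The "validation and early stopping" component can be kept independent of the training framework. Below is a minimal sketch, assuming a validation metric where higher is better (for example macro F1); the class name `EarlyStopping`, its `patience` and `min_delta` parameters, and the per-epoch scores are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop once the metric has not improved for `patience` epochs in a row."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record one validation metric; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric  # improvement: a checkpoint would be saved here
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation scores, one per epoch.
val_scores = [0.61, 0.70, 0.74, 0.73, 0.74, 0.72, 0.71]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, score in enumerate(val_scores, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

Saving the checkpoint at each improvement, rather than at the end of training, means the saved model corresponds to the best validation score instead of the last epoch.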
Inference Pipeline (pipelines/inference_pipeline.py)
Must include:
# Key components your inference pipeline should handle:
- Model loading from checkpoint
- Input data preprocessing
- Batch or single prediction
- Confidence score calculation
- Output formatting and saving
- Error handling for malformed inputs
Usage:
# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"
# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
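For the confidence score calculation, a common approach is to apply a softmax to the model's output logits and report the top few classes. The sketch below uses plain Python to show the idea; the function names `softmax` and `top_k`, the three-label mapping, and the logit values are illustrative assumptions, with the label names borrowed from the intent examples earlier in this brief.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Hypothetical label mapping and logits for a single query.
id2label = {0: "card_arrival", 1: "transfer", 2: "balance"}
predictions = top_k([2.0, 0.5, -1.0], id2label, k=2)
for label, prob in predictions:
    print(f"{label}: {prob:.3f}")
```

Softmax scores from a fine-tuned classifier are not necessarily well calibrated, so it may be worth checking them against held-out accuracy before using them as routing thresholds.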
Evaluation Pipeline (pipelines/evaluation_pipeline.py)
Must include:
# Key components your evaluation pipeline should handle:
- Model loading from checkpoint
- Test data loading and preprocessing
- Comprehensive evaluation metrics
- Confusion matrix and classification report
- Error analysis with examples
- Results saving and visualization
Usage:
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
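To make the per-class and macro F1 metrics concrete, here is a small stdlib-only sketch that computes them from true and predicted labels using F1 = 2TP / (2TP + FP + FN). The function name `f1_report` and the toy labels are assumptions for illustration; in practice a library implementation such as scikit-learn's would likely be used.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Return (per_class_f1, macro_f1) over all classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        per_class[cls] = 2 * tp[cls] / denom if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)  # unweighted mean over classes
    return per_class, macro


# Toy labels (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
per_class, macro = f1_report(y_true, y_pred)
print(per_class, round(macro, 4))
```

Because macro F1 averages classes without weighting by support, a handful of poorly predicted rare intents pulls it down more than it pulls down accuracy, which is why both are worth reporting.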
⚡ Success Indicators
- All three pipelines execute successfully without errors
- Jupyter notebooks show clear data insights and experimentation
- Model achieves >85% accuracy on test set
- Clean, production-ready code structure with proper separation of concerns
- Comprehensive evaluation with actionable insights
- Clear demonstration of ML engineering best practices
- Ability to articulate technical decisions confidently during demo