# ML Engineer Assessment: Custom Model Fine-tuning Challenge

## 🎯 Scenario

You are tasked with building a **Customer Support Intent Classification System** for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.

## 📊 Dataset Provided

**BANKING77 Dataset** - Real banking customer service queries

- **Source**: HuggingFace `datasets` library (`banking77`)
- **Size**: 13,083 labeled customer queries
- **Classes**: 77 banking-related intents (card_arrival, transfer, balance, etc.)
- **Format**: `{'text': 'query', 'label': intent_id}`
- **Split**: You'll need to create train/validation/test splits
- **Domain**: Banking and financial services customer support

**Sample Data Points:**

```
"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival
"Can you help me make a payment?" → transfer
```

## 🎯 Your Mission

Build a complete fine-tuning pipeline for **Banking Customer Support Intent Classification** that demonstrates your ML engineering skills across the full lifecycle.

---

## 🔧 Technical Requirements

### Core Implementation (Must Have)

1. **Model Selection & Fine-tuning**
   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
   - Implement fine-tuning with proper hyperparameter configuration
   - Handle class imbalance if present

2. **Data Pipeline**
   - Clean and preprocess the provided dataset
   - Implement proper train/val/test splits (70/15/15)
   - Create data loaders with appropriate batching

3. **Training Infrastructure**
   - Implement training loop with proper logging
   - Add early stopping and learning rate scheduling
   - Track key metrics (accuracy, F1-macro, F1-per-class)

4. **Evaluation & Metrics**
   - Comprehensive evaluation on test set
   - Confusion matrix and classification report
   - Error analysis with examples

5. **Inference Demo**
   - Create a simple inference script/API
   - Demonstrate prediction on new examples
   - Show confidence scores

6. **Executable Pipelines** (Required)
   - **Training Pipeline**: End-to-end automated training with a single command
   - **Inference Pipeline**: Batch or single prediction pipeline
   - **Evaluation Pipeline**: Automated model evaluation and reporting

7. **Jupyter Notebooks** (Required)
   - **Data Exploration**: EDA, class distribution, sample analysis
   - **Model Experimentation**: Different approaches, hyperparameter testing
   - **Results Analysis**: Performance analysis, error analysis, insights

### Additional Features

- Experiment tracking (simple logging)
- Model versioning and checkpointing
- Hyperparameter optimization
- Simple web interface for testing

---

## 📋 Deliverables

### 1. Code Structure (Clean & Modular)

```
project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_experimentation.ipynb
│   └── 03_results_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
├── pipelines/
│   ├── train_pipeline.py
│   ├── inference_pipeline.py
│   └── evaluation_pipeline.py
├── configs/
│   ├── model_config.yaml
│   ├── train_config.yaml
│   └── inference_config.yaml
├── experiments/
│   └── logs/
├── models/
│   └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py
```

### 2. Documentation & Notebooks

- **README.md**: Setup instructions, usage examples, design decisions
- **Jupyter Notebooks**:
  - Data exploration with visualizations and insights
  - Model experimentation and hyperparameter analysis
  - Results analysis with error examples and improvement suggestions
- **Code comments**: Clear docstrings and inline comments
- **Results summary**: Model performance, key findings

### 3. Executable Pipelines

Create command-line interfaces for each major workflow:

**Training Pipeline:**

```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving
```

**Inference Pipeline:**

```bash
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving
```

**Evaluation Pipeline:**

```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation
```

### 4. Live Demo

- **Pipeline Demonstration**: Show training, inference, and evaluation pipelines in action
- **Notebook Walkthrough**: Key insights from data exploration and experimentation
- **Code Architecture**: Explain design choices and component interactions
- **Performance Analysis**: Model results, error analysis, improvement ideas
- **Q&A**: Discuss trade-offs, improvements, production considerations

---

## 🎯 Evaluation Criteria

### Technical Skills

- **Code Quality**: Clean, modular, well-documented code
- **ML Implementation**: Proper fine-tuning, evaluation, metrics
- **Data Handling**: Preprocessing, splitting, batching
- **Pipeline Design**: Executable, configurable, and robust workflows
- **Notebook Quality**: Clear analysis, insights, and experimentation

### System Design

- **Architecture**: Logical code organization and separation of concerns
- **Pipeline Integration**: Seamless flow between training, inference, and evaluation
- **Configurability**: Easy to modify hyperparameters and model choices
- **Reproducibility**: Consistent results across runs
- **Best Practices**: Following ML engineering conventions

### Problem Solving

- **Dataset Analysis**: Understanding data characteristics and challenges
- **Model Choice**: Justified selection of model and approach
- **Performance Optimization**: Addressing class imbalance, overfitting, etc.
- **Trade-off Awareness**: Understanding of speed vs accuracy, etc.

### Communication

- **Documentation**: Clear README and code documentation
- **Problem Articulation**: Clear explanation of challenges and solutions

---

## 🛠️ Suggested Tech Stack

**Required:**

- PyTorch or TensorFlow/Keras
- HuggingFace Transformers
- pandas, numpy, scikit-learn
- matplotlib/seaborn for visualization
- FastAPI for inference API

## 🔧 Pipeline Requirements

### Training Pipeline (`pipelines/train_pipeline.py`)

**Must include:**

```python
# Key components your training pipeline should handle:
# - Config loading and validation
# - Data loading and preprocessing
# - Model initialization
# - Training loop with logging
# - Validation and early stopping
# - Model checkpointing and saving
# - Experiment metadata logging
```

**Usage:**

```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
```

### Inference Pipeline (`pipelines/inference_pipeline.py`)

**Must include:**

```python
# Key components your inference pipeline should handle:
# - Model loading from checkpoint
# - Input data preprocessing
# - Batch or single prediction
# - Confidence score calculation
# - Output formatting and saving
# - Error handling for malformed inputs
```

**Usage:**

```bash
# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"

# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
```

### Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)

**Must include:**

```python
# Key components your evaluation pipeline should handle:
# - Model loading from checkpoint
# - Test data loading and preprocessing
# - Comprehensive evaluation metrics
# - Confusion matrix and classification report
# - Error analysis with examples
# - Results saving and visualization
```

**Usage:**

```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
```

## ⚡ Success Indicators

- All three pipelines execute successfully without errors
- Jupyter notebooks show clear data insights and experimentation
- Model achieves >85% accuracy on test set
- Clean, production-ready code structure with proper separation of concerns
- Comprehensive evaluation with actionable insights
- Clear demonstration of ML engineering best practices
- Ability to articulate technical decisions confidently during the demo
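
As an illustration of the 70/15/15 split requested under Data Pipeline, here is a minimal sketch of a per-class (stratified) split using only the Python standard library. The helper name `stratified_split`, the seed value, and the toy labels are assumptions made for this example and are not part of the assessment. In a real submission, scikit-learn's `train_test_split` with its `stratify` argument would be the more usual choice.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, splitting each class by `ratios`.

    Hypothetical helper for illustration; not part of any library.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")

    # Group example indices by label so each class is split independently.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Toy labels standing in for the real intents: 3 classes x 20 examples each.
toy_labels = [i % 3 for i in range(60)]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each class keeps rare intents represented in all three subsets, which matters when per-class F1 is one of the tracked metrics. Because of rounding, very small classes may not hit the target ratios exactly.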