OwusuBlessing a8b09f2a21 first commit
2025-07-25 21:05:23 +01:00


ML Engineer Assessment: Custom Model Fine-tuning Challenge

🎯 Scenario

You are tasked with building a Customer Support Intent Classification System for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.

📊 Dataset Provided

BANKING77 Dataset - Real banking customer service queries

  • Source: HuggingFace datasets library (banking77)
  • Size: 13,083 labeled customer queries
  • Classes: 77 banking-related intents (card_arrival, transfer, balance, etc.)
  • Format: {'text': 'query', 'label': intent_id}
  • Split: The HuggingFace release provides only train and test splits, so you'll need to create your own train/validation/test splits
  • Domain: Banking and financial services customer support
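It helps to measure class imbalance before deciding how to handle it. Below is a minimal standard-library sketch. It assumes the labels have already been loaded as a list of integer intent ids in the `{'text', 'label'}` format above. The function names are hypothetical. The weighting formula is the same one scikit-learn uses for `class_weight='balanced'`.

```python
from collections import Counter
from typing import Dict, List


def class_distribution(labels: List[int]) -> Dict[int, int]:
    """Count how many examples each intent id has."""
    return dict(Counter(labels))


def imbalance_ratio(labels: List[int]) -> float:
    """Most frequent class count divided by least frequent (1.0 means perfectly balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def inverse_frequency_weights(labels: List[int], num_classes: int) -> List[float]:
    """Per-class weight n_samples / (num_classes * count_c), as in sklearn's 'balanced' mode.

    Classes that never occur in `labels` get weight 0.0 instead of dividing by zero.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]
```

If the ratio turns out to be large, the weights could be passed as `weight=torch.tensor(weights)` to `torch.nn.CrossEntropyLoss`. If the classes are close to balanced, plain unweighted loss is usually fine.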

Sample Data Points:

"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival  
"Can you help me make a payment?" → transfer

🎯 Your Mission

Build a complete fine-tuning pipeline for Banking Customer Support Intent Classification that demonstrates your ML engineering skills across the full lifecycle.


🔧 Technical Requirements

Core Implementation (Must Have)

  1. Model Selection & Fine-tuning

    • Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
    • Implement fine-tuning with proper hyperparameter configuration
    • Handle class imbalance if present
  2. Data Pipeline

    • Clean and preprocess the provided dataset
    • Implement proper train/val/test splits (70/15/15)
    • Create data loaders with appropriate batching
  3. Training Infrastructure

    • Implement training loop with proper logging
    • Add early stopping and learning rate scheduling
    • Track key metrics (accuracy, F1-macro, F1-per-class)
  4. Evaluation & Metrics

    • Comprehensive evaluation on test set
    • Confusion matrix and classification report
    • Error analysis with examples
  5. Inference Demo

    • Create a simple inference script/API
    • Demonstrate prediction on new examples
    • Show confidence scores
  6. Executable Pipelines (Required)

    • Training Pipeline: End-to-end automated training with single command
    • Inference Pipeline: Batch or single prediction pipeline
    • Evaluation Pipeline: Automated model evaluation and reporting
  7. Jupyter Notebooks (Required)

    • Data Exploration: EDA, class distribution, sample analysis
    • Model Experimentation: Different approaches, hyperparameter testing
    • Results Analysis: Performance analysis, error analysis, insights
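The 70/15/15 split in the Data Pipeline item can be stratified, so that each of the 77 intents keeps roughly the same proportion in train, validation, and test. Below is a minimal standard-library sketch. It assumes the labels from all queries have been pooled into one list. `stratified_split` is a hypothetical helper name, and the fixed seed is there for reproducibility.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int],
    ratios: Tuple[float, float, float] = (0.70, 0.15, 0.15),
    seed: int = 42,
) -> Dict[str, List[int]]:
    """Return train/val/test index lists whose per-class proportions follow `ratios`.

    Indices are grouped by label, shuffled with a seeded RNG, and cut per class,
    so every intent with enough examples appears in every split.
    """
    if abs(sum(ratios) - 1.0) > 1e-6:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # local RNG: does not disturb the global random state
    by_label: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    splits: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):  # fixed iteration order keeps results reproducible
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_val = round(len(idxs) * ratios[1])
        splits["train"].extend(idxs[:n_train])
        splits["val"].extend(idxs[n_train:n_train + n_val])
        splits["test"].extend(idxs[n_train + n_val:])  # remainder absorbs rounding
    return splits
```

An equivalent approach is to call scikit-learn's `train_test_split(..., stratify=labels)` twice. The first call carves off 30%, and the second halves that remainder into validation and test.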

Additional Features

  • Experiment tracking (simple logging)
  • Model versioning and checkpointing
  • Hyperparameter optimization
  • Simple web interface for testing
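For the experiment tracking and checkpoint versioning items, plain JSON-lines files under a directory such as `experiments/logs/` are often enough. The class below is a hypothetical sketch, not a prescribed design. Each run gets a timestamped id, so reruns of the same experiment never overwrite earlier logs. Checkpoint paths are recorded next to the metric they achieved.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict


class ExperimentLogger:
    """Append-only JSON-lines log: one config record, then one record per event."""

    def __init__(self, log_dir: str, experiment_name: str, config: Dict[str, Any]):
        # Timestamped run id acts as a simple version tag for this run
        self.run_id = f"{experiment_name}_{time.strftime('%Y%m%d-%H%M%S')}"
        self.path = Path(log_dir) / f"{self.run_id}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self._write({"type": "config", "experiment": experiment_name, "config": config})

    def _write(self, record: Dict[str, Any]) -> None:
        record["time"] = time.time()
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def log_metrics(self, step: int, **metrics: float) -> None:
        """Record e.g. val_accuracy / val_f1_macro for a given epoch or step."""
        self._write({"type": "metrics", "step": step, **metrics})

    def log_checkpoint(self, path: str, metric: float) -> None:
        """Record where a checkpoint was saved and the metric it achieved."""
        self._write({"type": "checkpoint", "path": path, "metric": metric})
```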

📋 Deliverables

1. Code Structure (Clean & Modular)

project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_experimentation.ipynb
│   └── 03_results_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
├── pipelines/
│   ├── train_pipeline.py
│   ├── inference_pipeline.py
│   └── evaluation_pipeline.py
├── configs/
│   ├── model_config.yaml
│   ├── train_config.yaml
│   └── inference_config.yaml
├── experiments/
│   └── logs/
├── models/
│   └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py

2. Documentation & Notebooks

  • README.md: Setup instructions, usage examples, design decisions
  • Jupyter Notebooks:
    • Data exploration with visualizations and insights
    • Model experimentation and hyperparameter analysis
    • Results analysis with error examples and improvement suggestions
  • Code comments: Clear docstrings and inline comments
  • Results summary: Model performance, key findings

3. Executable Pipelines

Create command-line interfaces for each major workflow:

Training Pipeline:

python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving

Inference Pipeline:

python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving

Evaluation Pipeline:

python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation

4. Live Demo

  • Pipeline Demonstration: Show training, inference, and evaluation pipelines in action
  • Notebook Walkthrough: Key insights from data exploration and experimentation
  • Code Architecture: Explain design choices and component interactions
  • Performance Analysis: Model results, error analysis, improvement ideas
  • Q&A: Discuss trade-offs, improvements, production considerations

🎯 Evaluation Criteria

Technical Skills

  • Code Quality: Clean, modular, well-documented code
  • ML Implementation: Proper fine-tuning, evaluation, metrics
  • Data Handling: Preprocessing, splitting, batching
  • Pipeline Design: Executable, configurable, and robust workflows
  • Notebook Quality: Clear analysis, insights, and experimentation

System Design

  • Architecture: Logical code organization and separation of concerns
  • Pipeline Integration: Seamless flow between training, inference, and evaluation
  • Configurability: Easy to modify hyperparameters and model choices
  • Reproducibility: Consistent results across runs
  • Best Practices: Following ML engineering conventions

Problem Solving

  • Dataset Analysis: Understanding data characteristics and challenges
  • Model Choice: Justified selection of model and approach
  • Performance Optimization: Addressing class imbalance, overfitting, etc.
  • Trade-off Awareness: Understanding of speed vs accuracy, etc.

Communication

  • Documentation: Clear README and code documentation
  • Problem Articulation: Clear explanation of challenges and solutions


🛠️ Suggested Tech Stack

Required:

  • PyTorch or TensorFlow/Keras
  • HuggingFace Transformers
  • pandas, numpy, scikit-learn
  • matplotlib/seaborn for visualization
  • FastAPI for inference API

🔧 Pipeline Requirements

Training Pipeline (pipelines/train_pipeline.py)

Must include:

# Key components your training pipeline should handle:
- Config loading and validation
- Data loading and preprocessing
- Model initialization
- Training loop with logging
- Validation and early stopping
- Model checkpointing and saving
- Experiment metadata logging
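The validation, early-stopping, and learning-rate-scheduling steps can be kept framework-agnostic. Below is a minimal sketch. It assumes that higher is better for the monitored metric (e.g. validation F1-macro). `EarlyStopping` and `linear_warmup_decay` are hypothetical names. The schedule follows the same linear warmup-then-decay shape as `transformers.get_linear_schedule_with_warmup`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EarlyStopping:
    """Stop after `patience` consecutive evaluations without improving by more than `min_delta`."""
    patience: int = 3
    min_delta: float = 0.0
    best: Optional[float] = None
    bad_evals: int = 0

    def step(self, metric: float) -> bool:
        """Record one validation result; return True on a new best (a good time to checkpoint)."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evals = 0
            return True
        self.bad_evals += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_evals >= self.patience


def linear_warmup_decay(step: int, total_steps: int, warmup_steps: int) -> float:
    """LR multiplier: rises linearly from 0 to 1 during warmup, then decays linearly to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

With PyTorch, the multiplier plugs into `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: linear_warmup_decay(s, total_steps, warmup_steps))`. The training loop can save a checkpoint whenever `step()` returns True and break once `should_stop` is set.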

Usage:

python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
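Below is a sketch of the command-line entry point and config validation for the command above. It assumes PyYAML for parsing and a config containing the four keys shown. The key names and the `baseline` default are hypothetical and should match whatever `configs/train_config.yaml` actually contains.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical schema: adapt to the keys your train_config.yaml really defines.
REQUIRED_KEYS = {
    "model_name": str,
    "learning_rate": float,
    "batch_size": int,
    "num_epochs": int,
}


def validate_config(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Fail fast on missing keys, wrong types, or non-positive values."""
    for key, expected in REQUIRED_KEYS.items():
        if key not in cfg:
            raise ValueError(f"missing config key: {key}")
        if not isinstance(cfg[key], expected):
            raise TypeError(f"{key} should be {expected.__name__}, got {type(cfg[key]).__name__}")
    if cfg["learning_rate"] <= 0 or cfg["batch_size"] <= 0 or cfg["num_epochs"] <= 0:
        raise ValueError("learning_rate, batch_size and num_epochs must be positive")
    return cfg


def load_config(path: str) -> Dict[str, Any]:
    import yaml  # PyYAML; imported here so validation works without it installed
    with open(path, encoding="utf-8") as f:
        return validate_config(yaml.safe_load(f))


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Fine-tune an intent classifier.")
    parser.add_argument("--config", required=True, help="Path to train_config.yaml")
    parser.add_argument("--experiment_name", default="baseline", help="Tag for logs and checkpoints")
    return parser.parse_args(argv)
```

One YAML gotcha is worth validating for. PyYAML follows YAML 1.1, so `learning_rate: 2e-5` loads as the string `'2e-5'`. Writing `2.0e-5` yields a float, and the type check above catches the mistake before training starts.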

Inference Pipeline (pipelines/inference_pipeline.py)

Must include:

# Key components your inference pipeline should handle:
- Model loading from checkpoint
- Input data preprocessing
- Batch or single prediction
- Confidence score calculation
- Output formatting and saving
- Error handling for malformed inputs
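Confidence scores and input checks are small enough to sketch without any framework. This minimal version assumes the model returns one raw logit per intent and that an `id2label` list maps ids to intent names. The function names and the 512-character cap are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtracting the max logit avoids overflow in exp()."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], id2label: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely intents together with their softmax confidence."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


def clean_query(text: object, max_chars: int = 512) -> str:
    """Reject malformed inputs up front instead of letting the tokenizer fail later."""
    if not isinstance(text, str):
        raise TypeError(f"expected a string, got {type(text).__name__}")
    text = " ".join(text.split())  # collapse runs of whitespace and newlines
    if not text:
        raise ValueError("empty query")
    return text[:max_chars]
```

Returning the top few intents, not just the argmax, makes low-confidence predictions visible. A router could, for example, send those to a human agent.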

Usage:

# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"

# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
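Both invocations above can share one parser if `--text` and `--input_file` are declared mutually exclusive, so that argparse itself rejects ambiguous calls. Below is a sketch using only the standard library. The `--batch_size` flag and the CSV column name `text` are assumptions, not part of the spec.

```python
import argparse
import csv
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Predict intents for one query or a CSV of queries.")
    parser.add_argument("--model_path", required=True)
    # Exactly one input source: a single query or a batch file
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Single query to classify")
    source.add_argument("--input_file", help="CSV with a 'text' column (assumed column name)")
    parser.add_argument("--output_file", default=None, help="Where to write batch predictions")
    parser.add_argument("--batch_size", type=int, default=32)  # hypothetical flag
    args = parser.parse_args(argv)
    if args.output_file and not args.input_file:
        parser.error("--output_file only applies to batch mode (--input_file)")
    return args


def read_queries(path: str, column: str = "text") -> List[str]:
    """Load queries from a CSV, failing clearly if the expected column is missing."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"{path} has no '{column}' column")
        return [row[column] for row in reader]
```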

Evaluation Pipeline (pipelines/evaluation_pipeline.py)

Must include:

# Key components your evaluation pipeline should handle:
- Model loading from checkpoint
- Test data loading and preprocessing
- Comprehensive evaluation metrics
- Confusion matrix and classification report
- Error analysis with examples
- Results saving and visualization

Usage:

python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
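The core evaluation metrics need nothing beyond the standard library, which makes them easy to unit-test. This minimal sketch assumes integer label ids and an `id2label` list, and all function names are hypothetical. In practice, `sklearn.metrics.confusion_matrix`, `classification_report`, and `f1_score(average='macro')` compute the same quantities. The sketch follows scikit-learn's convention of rows as true labels.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts examples of true class t predicted as class p (rows = truth)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """F1 = 2TP / (2TP + FP + FN) per class; 0.0 when the class never occurs."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truth was another class
        fn = sum(cm[c]) - tp                        # truth c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def summarize(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> Dict[str, object]:
    cm = confusion_matrix(y_true, y_pred, num_classes)
    f1 = per_class_f1(cm)
    # Macro average over labels seen in y_true or y_pred, like scikit-learn's default
    present = sorted(set(y_true) | set(y_pred))
    return {
        "accuracy": sum(cm[c][c] for c in range(num_classes)) / len(y_true),
        "f1_macro": sum(f1[c] for c in present) / len(present),
        "f1_per_class": f1,
        "confusion_matrix": cm,
    }


def top_confusions(cm: List[List[int]], id2label: Sequence[str], k: int = 5) -> List[Tuple[str, str, int]]:
    """Most frequent (true intent, predicted intent, count) mistakes for error analysis."""
    pairs = [(cm[t][p], t, p) for t in range(len(cm)) for p in range(len(cm)) if t != p and cm[t][p]]
    pairs.sort(reverse=True)
    return [(id2label[t], id2label[p], n) for n, t, p in pairs[:k]]


def misclassified(texts: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int],
                  id2label: Sequence[str], limit: int = 10) -> List[Tuple[str, str, str]]:
    """Up to `limit` (text, true intent, predicted intent) examples for the report."""
    errors = [(x, id2label[t], id2label[p]) for x, t, p in zip(texts, y_true, y_pred) if t != p]
    return errors[:limit]
```

With 77 intents, the `top_confusions` list tends to be more actionable than the full matrix. It points directly at the pairs of similar intents the model mixes up.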

Success Indicators

  • All three pipelines execute successfully without errors
  • Jupyter notebooks show clear data insights and experimentation
  • Model achieves >85% accuracy on test set
  • Clean, production-ready code structure with proper separation of concerns
  • Comprehensive evaluation with actionable insights
  • Clear demonstration of ML engineering best practices
  • Ability to articulate technical decisions confidently during demo