Test/TuneML-Custom-Model-Fine-tuning-Challenge

OwusuBlessing a8b09f2a21 first commit

2025-07-25 21:05:23 +01:00

ML Engineer Assessment: Custom Model Fine-tuning Challenge

🎯 Scenario

You are tasked with building a Customer Support Intent Classification System for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.

📊 Dataset Provided

BANKING77 Dataset - Real banking customer service queries

Source: HuggingFace datasets library (banking77)
Size: 13,083 labeled customer queries
Classes: 77 banking-related intents (card_arrival, transfer, balance, etc.)
Format: {'text': 'query', 'label': intent_id}
Split: You'll need to create train/validation/test splits
Domain: Banking and financial services customer support

Sample Data Points:

"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival  
"Can you help me make a payment?" → transfer
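
Because the dataset has 77 classes and the splits must be created by hand, it may help to split per class so that every intent appears in all three subsets. The following is a minimal sketch in plain Python, assuming the 70/15/15 ratio requested in the Data Pipeline requirements; the function name `stratified_split` and the toy labels are illustrative and not part of the assessment.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Return (train_idx, val_idx, test_idx) with per-class proportions preserved."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)  # local RNG so the split is repeatable

    # Group example indices by label so each class is split independently.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Toy stand-in for the real label column: 3 classes with 20 examples each.
toy_labels = [i % 3 for i in range(60)]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With scikit-learn available, two successive calls to `train_test_split` with the `stratify` argument are a common alternative to hand-rolled code like this.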

🎯 Your Mission

Build a complete fine-tuning pipeline for Banking Customer Support Intent Classification that demonstrates your ML engineering skills across the full lifecycle.

🔧 Technical Requirements

Core Implementation (Must Have)

1. Model Selection & Fine-tuning
   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
   - Implement fine-tuning with proper hyperparameter configuration
   - Handle class imbalance if present
2. Data Pipeline
   - Clean and preprocess the provided dataset
   - Implement proper train/val/test splits (70/15/15)
   - Create data loaders with appropriate batching
3. Training Infrastructure
   - Implement training loop with proper logging
   - Add early stopping and learning rate scheduling
   - Track key metrics (accuracy, F1-macro, F1-per-class)
4. Evaluation & Metrics
   - Comprehensive evaluation on test set
   - Confusion matrix and classification report
   - Error analysis with examples
5. Inference Demo
   - Create a simple inference script/API
   - Demonstrate prediction on new examples
   - Show confidence scores
6. Executable Pipelines (Required)
   - Training Pipeline: End-to-end automated training with single command
   - Inference Pipeline: Batch or single prediction pipeline
   - Evaluation Pipeline: Automated model evaluation and reporting
7. Jupyter Notebooks (Required)
   - Data Exploration: EDA, class distribution, sample analysis
   - Model Experimentation: Different approaches, hyperparameter testing
   - Results Analysis: Performance analysis, error analysis, insights
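
For the "handle class imbalance if present" requirement, one option among several (resampling and focal loss are others) is to weight the loss by inverse class frequency. The sketch below computes such weights in plain Python; the helper name `inverse_frequency_weights` is hypothetical, and the toy labels are made up for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight for class c = N / (num_classes * count_c).

    Perfectly balanced data yields a weight of 1.0 for every class;
    rarer classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no training examples")
        weights.append(total / (num_classes * counts[c]))
    return weights


# Toy example: class 0 is three times as frequent as class 2.
weights = inverse_frequency_weights([0, 0, 0, 1, 1, 2], num_classes=3)
print([round(w, 3) for w in weights])  # [0.667, 1.0, 2.0]
```

In a PyTorch setup, a list like this could be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`. Whether weighting is needed at all should follow from the class distribution found during data exploration.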

Additional Features

Experiment tracking (simple logging)
Model versioning and checkpointing
Hyperparameter optimization
Simple web interface for testing
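
"Simple logging" for experiment tracking can be as small as one JSON line per event, which keeps runs easy to compare later with pandas or a notebook. The following sketch assumes a log directory such as `experiments/logs/` from the project layout; the class name `JsonlExperimentLogger` and the metric values are illustrative only.

```python
import json
import tempfile
import time
from pathlib import Path


class JsonlExperimentLogger:
    """Append one JSON object per logged event to <log_dir>/<experiment_name>.jsonl."""

    def __init__(self, log_dir, experiment_name):
        self.path = Path(log_dir) / f"{experiment_name}.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, step, **metrics):
        """Write a timestamped record containing the step and any keyword metrics."""
        record = {"time": time.time(), "step": step, **metrics}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def read_all(self):
        """Load every record back as a list of dicts."""
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]


# Demo in a temporary directory; the numbers are made up, not real results.
with tempfile.TemporaryDirectory() as tmp:
    logger = JsonlExperimentLogger(tmp, "baseline_bert")
    logger.log(step=1, train_loss=2.31, val_accuracy=0.42)
    logger.log(step=2, train_loss=1.05, val_accuracy=0.71)
    records = logger.read_all()

print(len(records), records[-1]["val_accuracy"])
```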

📋 Deliverables

1. Code Structure (Clean & Modular)

project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_experimentation.ipynb
│   └── 03_results_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
├── pipelines/
│   ├── train_pipeline.py
│   ├── inference_pipeline.py
│   └── evaluation_pipeline.py
├── configs/
│   ├── model_config.yaml
│   ├── train_config.yaml
│   └── inference_config.yaml
├── experiments/
│   └── logs/
├── models/
│   └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py

2. Documentation & Notebooks

README.md: Setup instructions, usage examples, design decisions
Jupyter Notebooks:
- Data exploration with visualizations and insights
- Model experimentation and hyperparameter analysis
- Results analysis with error examples and improvement suggestions
Code comments: Clear docstrings and inline comments
Results summary: Model performance, key findings

3. Executable Pipelines

Create command-line interfaces for each major workflow:

Training Pipeline:

python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving

Inference Pipeline:

python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving

Evaluation Pipeline:

python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation

4. Live Demo

Pipeline Demonstration: Show training, inference, and evaluation pipelines in action
Notebook Walkthrough: Key insights from data exploration and experimentation
Code Architecture: Explain design choices and component interactions
Performance Analysis: Model results, error analysis, improvement ideas
Q&A: Discuss trade-offs, improvements, production considerations

🎯 Evaluation Criteria

Technical Skills

Code Quality: Clean, modular, well-documented code
ML Implementation: Proper fine-tuning, evaluation, metrics
Data Handling: Preprocessing, splitting, batching
Pipeline Design: Executable, configurable, and robust workflows
Notebook Quality: Clear analysis, insights, and experimentation

System Design

Architecture: Logical code organization and separation of concerns
Pipeline Integration: Seamless flow between training, inference, and evaluation
Configurability: Easy to modify hyperparameters and model choices
Reproducibility: Consistent results across runs
Best Practices: Following ML engineering conventions
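
For the reproducibility criterion, a usual first step is to seed every random number generator the pipeline touches. The helper below is a sketch under that assumption; `set_seed` is a hypothetical name, and NumPy and PyTorch are seeded only if they happen to be installed. Seeding alone does not guarantee bit-identical results on GPU.

```python
import os
import random


def set_seed(seed=42):
    """Seed the RNGs that are available; NumPy and PyTorch are optional here."""
    # Only affects Python subprocesses started after this call, not this one.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# The same seed should reproduce the same sequence of draws.
set_seed(7)
first = [random.random() for _ in range(3)]
set_seed(7)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```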

Problem Solving

Dataset Analysis: Understanding data characteristics and challenges
Model Choice: Justified selection of model and approach
Performance Optimization: Addressing class imbalance, overfitting, etc.
Trade-off Awareness: Understanding of speed vs accuracy, etc.

Communication

Documentation: Clear README and code documentation
Problem Articulation: Clear explanation of challenges and solutions

🛠️ Suggested Tech Stack

Required:

PyTorch or TensorFlow/Keras
HuggingFace Transformers
pandas, numpy, scikit-learn
matplotlib/seaborn for visualization
FastAPI for inference API

🔧 Pipeline Requirements

Training Pipeline (`pipelines/train_pipeline.py`)

Must include:

# Key components your training pipeline should handle:
- Config loading and validation
- Data loading and preprocessing
- Model initialization
- Training loop with logging
- Validation and early stopping
- Model checkpointing and saving
- Experiment metadata logging
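
The "config loading and validation" item could be handled with a small dataclass that rejects unknown keys and out-of-range values before any training starts. This is a sketch only: the field names and defaults are assumptions, not values prescribed by the assessment, and parsing the YAML file into a dict is left out.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainConfig:
    # All field names and defaults are illustrative assumptions.
    model_name: str = "distilbert-base-uncased"
    learning_rate: float = 2e-5
    batch_size: int = 32
    num_epochs: int = 5
    patience: int = 2

    def __post_init__(self):
        # Fail fast on values that would only surface as errors mid-training.
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate must be in (0, 1)")
        for name in ("batch_size", "num_epochs", "patience"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")


def config_from_dict(raw):
    """Build a TrainConfig, rejecting keys that are not declared fields."""
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


cfg = config_from_dict({"batch_size": 16, "num_epochs": 3})
print(cfg)
```

Rejecting unknown keys catches typos such as `learning_rat` that would otherwise silently fall back to the default.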

Usage:

python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
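
The "validation and early stopping" step is often implemented as a small patience counter on a validation metric. Here is one possible sketch, assuming a higher-is-better metric such as macro F1; the class name `EarlyStopping` and the per-epoch values are made up for illustration.

```python
class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record a higher-is-better metric; return True if training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0  # improvement resets the counter
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
val_f1_per_epoch = [0.61, 0.70, 0.69, 0.70, 0.68]  # made-up values
stopped_at = None
for epoch, val_f1 in enumerate(val_f1_per_epoch, start=1):
    if stopper.step(val_f1):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

In a real training loop, the checkpoint would typically be saved whenever `step` registers an improvement, so the best model is the one kept when training stops.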

Inference Pipeline (`pipelines/inference_pipeline.py`)

Must include:

# Key components your inference pipeline should handle:
- Model loading from checkpoint
- Input data preprocessing
- Batch or single prediction
- Confidence score calculation
- Output formatting and saving
- Error handling for malformed inputs

Usage:

# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"

# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
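
Two of the inference items, confidence scores and handling of malformed inputs, can be sketched without a model: confidence is commonly taken as the softmax probability of each class, and input checks can run before tokenization. The helper names, the three-label mapping, and the logits below are illustrative assumptions; the model forward pass is deliberately left out.

```python
import math


def validate_text(text):
    """Return the stripped text, or raise ValueError for malformed input."""
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    return text.strip()


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_predictions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]


# Illustrative label subset and made-up logits standing in for model output.
id2label = {0: "card_arrival", 1: "exchange_rate", 2: "transfer_timing"}
fake_logits = [2.0, 0.5, 0.1]

query = validate_text("  I am still waiting on my card ")
for label, prob in top_k_predictions(fake_logits, id2label, k=2):
    print(f"{query!r} -> {label}: {prob:.3f}")
```

Softmax probabilities from a fine-tuned classifier are not necessarily well calibrated, so treating them as routing thresholds may call for a calibration check on the validation set.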

Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)

Must include:

# Key components your evaluation pipeline should handle:
- Model loading from checkpoint
- Test data loading and preprocessing
- Comprehensive evaluation metrics
- Confusion matrix and classification report
- Error analysis with examples
- Results saving and visualization

Usage:

python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
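
To make the requested metrics concrete, the sketch below computes confusion counts, per-class F1, macro F1, and a list of misclassified examples in plain Python. The function names and toy predictions are illustrative; in practice scikit-learn's `confusion_matrix`, `f1_score(average="macro")`, and `classification_report` cover the same ground.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return a Counter mapping (true_label, predicted_label) -> count."""
    return Counter(zip(y_true, y_pred))


def per_class_f1(y_true, y_pred):
    """Return {label: F1}; F1 = 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_counts(y_true, y_pred)
    scores = {}
    for c in labels:
        tp = cm[(c, c)]
        fp = sum(n for (t, p), n in cm.items() if p == c and t != c)
        fn = sum(n for (t, p), n in cm.items() if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare intents count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def misclassified(texts, y_true, y_pred):
    """Return (text, true, predicted) triples for error analysis."""
    return [(t, yt, yp) for t, yt, yp in zip(texts, y_true, y_pred) if yt != yp]


# Toy predictions, made up for illustration.
texts = ["q1", "q2", "q3", "q4", "q5", "q6"]
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
print(misclassified(texts, y_true, y_pred))
```

Sorting the confusion counts by size, excluding the diagonal, gives a quick list of the most frequently confused intent pairs, which is a natural starting point for the error analysis.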

⚡ Success Indicators

All three pipelines execute successfully without errors
Jupyter notebooks show clear data insights and experimentation
Model achieves >85% accuracy on test set
Clean, production-ready code structure with proper separation of concerns
Comprehensive evaluation with actionable insights
Clear demonstration of ML engineering best practices
Ability to articulate technical decisions confidently during demo
