# ML Engineer Assessment: Custom Model Fine-tuning Challenge

## 🎯 Scenario
You are tasked with building a **Customer Support Intent Classification System** for our e-commerce platform. The system should automatically categorize customer inquiries to route them to appropriate support teams.

## 📊 Dataset Provided
**BANKING77 Dataset** - Real banking customer service queries
- **Source**: HuggingFace `datasets` library (`banking77`)
- **Size**: 13,083 labeled customer queries
- **Classes**: 77 banking-related intents (card_arrival, transfer, balance, etc.)
- **Format**: `{'text': 'query', 'label': intent_id}`
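- **Split**: The HuggingFace release ships predefined train (10,003) and test (3,080) splits but no validation split; you'll need to create your own train/validation/test splits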
- **Domain**: Banking and financial services customer support

**Sample Data Points:**
```
"What is the base rate of the bank?" → get_exchange_rate
"I am still waiting on my card" → card_arrival  
"Can you help me make a payment?" → transfer
```
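
Before deciding on a split strategy or loss, it can help to check how evenly the 77 intents are represented. The following is a minimal sketch, assuming the `label` column has already been read into a plain Python list; the helper names `label_distribution` and `imbalance_ratio` are hypothetical and the toy labels stand in for the real data.

```python
from collections import Counter


def label_distribution(labels):
    """Return (label, count, share) tuples sorted from rarest to most common."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(
        ((label, n, n / total) for label, n in counts.items()),
        key=lambda row: (row[1], row[0]),
    )


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Toy labels standing in for the real `label` column (an assumption).
toy_labels = [0, 0, 0, 0, 1, 1, 2]
for label, n, share in label_distribution(toy_labels):
    print(f"label={label} count={n} share={share:.2f}")
print("imbalance ratio:", imbalance_ratio(toy_labels))
```

A ratio close to 1 suggests plain cross-entropy may be enough; a large ratio is a hint that class weighting or resampling deserves a look.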

## 🎯 Your Mission
Build a complete fine-tuning pipeline for **Banking Customer Support Intent Classification** that demonstrates your ML engineering skills across the full lifecycle.

---

## 🔧 Technical Requirements

### Core Implementation (Must Have)
1. **Model Selection & Fine-tuning**
   - Choose and justify a pre-trained model (BERT, RoBERTa, DistilBERT, etc.)
   - Implement fine-tuning with proper hyperparameter configuration
   - Handle class imbalance if present

2. **Data Pipeline**
   - Clean and preprocess the provided dataset
   - Implement proper train/val/test splits (70/15/15)
   - Create data loaders with appropriate batching

3. **Training Infrastructure**
   - Implement training loop with proper logging
   - Add early stopping and learning rate scheduling
   - Track key metrics (accuracy, F1-macro, F1-per-class)

4. **Evaluation & Metrics**
   - Comprehensive evaluation on test set
   - Confusion matrix and classification report
   - Error analysis with examples

5. **Inference Demo**
   - Create a simple inference script/API
   - Demonstrate prediction on new examples
   - Show confidence scores

6. **Executable Pipelines** (Required)
   - **Training Pipeline**: End-to-end automated training with single command
   - **Inference Pipeline**: Batch or single prediction pipeline
   - **Evaluation Pipeline**: Automated model evaluation and reporting

7. **Jupyter Notebooks** (Required)
   - **Data Exploration**: EDA, class distribution, sample analysis
   - **Model Experimentation**: Different approaches, hyperparameter testing
   - **Results Analysis**: Performance analysis, error analysis, insights
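
The 70/15/15 split in the Data Pipeline item can be done per class so that each of the 77 intents appears in every split. This is one possible sketch using only the standard library; the function name `stratified_split` and the default seed are assumptions, and scikit-learn's `train_test_split` with its `stratify` argument is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split example indices into train/val/test, class by class.

    Each class is shuffled and cut separately, so every split keeps
    roughly the original class proportions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = int(round(len(idxs) * fractions[0]))
        n_val = int(round(len(idxs) * fractions[1]))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy example: 3 classes with 20 examples each.
labels = [c for c in range(3) for _ in range(20)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fixing the seed keeps the split identical across runs, which matters when results from different experiments are compared.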

### Additional Features
- Experiment tracking (simple logging)
- Model versioning and checkpointing
- Hyperparameter optimization
- Simple web interface for testing

---

## 📋 Deliverables

### 1. Code Structure (Clean & Modular)
```
project/
├── data/
│   ├── raw/
│   └── processed/
├── notebooks/
│   ├── 01_data_exploration.ipynb
│   ├── 02_model_experimentation.ipynb
│   └── 03_results_analysis.ipynb
├── src/
│   ├── __init__.py
│   ├── data_preprocessing.py
│   ├── model.py
│   ├── train.py
│   ├── evaluate.py
│   ├── inference.py
│   └── utils.py
├── pipelines/
│   ├── train_pipeline.py
│   ├── inference_pipeline.py
│   └── evaluation_pipeline.py
├── configs/
│   ├── model_config.yaml
│   ├── train_config.yaml
│   └── inference_config.yaml
├── experiments/
│   └── logs/
├── models/
│   └── checkpoints/
├── requirements.txt
├── README.md
└── run_demo.py
```

### 2. Documentation & Notebooks
- **README.md**: Setup instructions, usage examples, design decisions
- **Jupyter Notebooks**: 
  - Data exploration with visualizations and insights
  - Model experimentation and hyperparameter analysis
  - Results analysis with error examples and improvement suggestions
- **Code comments**: Clear docstrings and inline comments
- **Results summary**: Model performance, key findings

### 3. Executable Pipelines
Create command-line interfaces for each major workflow:

**Training Pipeline:**
```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml
# Should handle: data loading → preprocessing → training → validation → model saving
```
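
Since the training pipeline is driven by `configs/train_config.yaml`, it is worth validating the parsed values before any training starts. The sketch below assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`); every field name and default in `TrainConfig` is a hypothetical choice, not a required schema.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # All field names and defaults are illustrative assumptions.
    model_name: str = "distilbert-base-uncased"
    num_labels: int = 77
    learning_rate: float = 2e-5
    batch_size: int = 32
    num_epochs: int = 5
    seed: int = 42

    def validate(self):
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size < 1 or self.num_epochs < 1:
            raise ValueError("batch_size and num_epochs must be >= 1")
        if self.num_labels < 2:
            raise ValueError("num_labels must be >= 2")
        return self


def config_from_dict(raw):
    """Build a TrainConfig from a parsed mapping, rejecting unknown keys."""
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw).validate()


cfg = config_from_dict({"learning_rate": 3e-5, "batch_size": 16})
print(cfg)
```

Rejecting unknown keys catches typos such as `lr` instead of `learning_rate`, which would otherwise silently fall back to the default.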

**Inference Pipeline:**
```bash
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/test_samples.csv
# Should handle: model loading → preprocessing → batch prediction → output saving
```

**Evaluation Pipeline:**
```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv
# Should handle: model loading → evaluation → metrics calculation → report generation
```
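
The command-line interfaces can be built with the standard `argparse` module. The sketch below covers the inference pipeline only and reuses the flags that appear in this README (`--model_path`, `--text`, `--input_file`, `--output_file`); making `--text` and `--input_file` mutually exclusive is a design assumption, not a stated requirement.

```python
import argparse


def build_parser():
    """CLI for the inference pipeline, mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(description="Intent classification inference")
    parser.add_argument("--model_path", required=True, help="Path to a saved model")
    # Exactly one input source must be given: a single query or a CSV file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Single query to classify")
    source.add_argument("--input_file", help="CSV file with queries for batch mode")
    parser.add_argument("--output_file", default=None, help="Where to save batch predictions")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--model_path", "models/best_model.pt", "--text", "I am still waiting on my card"]
)
print("batch mode" if args.input_file else "single mode", "->", args.model_path)
```

In the real script, calling `parse_args()` with no arguments reads from `sys.argv` instead.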

### 4. Live Demo
- **Pipeline Demonstration**: Show training, inference, and evaluation pipelines in action
- **Notebook Walkthrough**: Key insights from data exploration and experimentation
- **Code Architecture**: Explain design choices and component interactions
- **Performance Analysis**: Model results, error analysis, improvement ideas
- **Q&A**: Discuss trade-offs, improvements, production considerations

---

## 🎯 Evaluation Criteria

### Technical Skills
- **Code Quality**: Clean, modular, well-documented code
- **ML Implementation**: Proper fine-tuning, evaluation, metrics
- **Data Handling**: Preprocessing, splitting, batching
- **Pipeline Design**: Executable, configurable, and robust workflows
- **Notebook Quality**: Clear analysis, insights, and experimentation

### System Design
- **Architecture**: Logical code organization and separation of concerns
- **Pipeline Integration**: Seamless flow between training, inference, and evaluation
- **Configurability**: Easy to modify hyperparameters and model choices
- **Reproducibility**: Consistent results across runs
- **Best Practices**: Following ML engineering conventions
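
For the reproducibility criterion, a common first step is to seed every random number generator the pipeline touches. The helper below is a sketch under the assumption that NumPy and PyTorch may or may not be installed; the name `set_seed` is hypothetical, and full determinism on GPU can require further framework-specific settings that are not shown here.

```python
import os
import random


def set_seed(seed=42):
    """Seed the common RNG sources; optional libraries are seeded if installed."""
    random.seed(seed)
    # Only affects Python processes started after this point (e.g. workers).
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass


set_seed(7)
first = random.random()
set_seed(7)
second = random.random()
print(first == second)
```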

### Problem Solving
- **Dataset Analysis**: Understanding data characteristics and challenges
- **Model Choice**: Justified selection of model and approach
- **Performance Optimization**: Addressing class imbalance, overfitting, etc.
- **Trade-off Awareness**: Understanding of speed vs accuracy, etc.
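
If the class distribution turns out to be skewed, one simple way to address it is to weight the loss by inverse class frequency. The sketch below uses the formula `N / (num_classes * count_c)`; the function name is hypothetical, and whether weighting actually helps on this dataset is something to verify experimentally.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight for class c = N / (num_classes * count_c); absent classes get 0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Toy labels: class 0 is three times as frequent as class 1.
weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
print(weights)
```

With PyTorch, a list like this could be converted to a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.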

### Communication
- **Documentation**: Clear README and code documentation
- **Problem Articulation**: Clear explanation of challenges and solutions

---

## 🛠️ Suggested Tech Stack
**Required:**
- PyTorch or TensorFlow/Keras
- HuggingFace Transformers
- pandas, numpy, scikit-learn
- matplotlib/seaborn for visualization
- FastAPI for inference API


## 🔧 Pipeline Requirements

### Training Pipeline (`pipelines/train_pipeline.py`)
**Must include:**
```python
# Key components your training pipeline should handle:
# - Config loading and validation
# - Data loading and preprocessing
# - Model initialization
# - Training loop with logging
# - Validation and early stopping
# - Model checkpointing and saving
# - Experiment metadata logging
```

**Usage:**
```bash
python pipelines/train_pipeline.py --config configs/train_config.yaml --experiment_name "baseline_bert"
```
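
Early stopping in the training loop can be kept separate from the loop itself. The class below is a minimal sketch assuming a "higher is better" validation metric such as macro F1; the name `EarlyStopping`, the `patience` and `min_delta` parameters, and the scores in the demo are all illustrative.

```python
class EarlyStopping:
    """Stop when the monitored metric has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_checks = 0

    def step(self, metric):
        """Record a new validation score (higher is better); return True to stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Placeholder validation scores, not real results.
stopper = EarlyStopping(patience=2)
for epoch, val_f1 in enumerate([0.60, 0.70, 0.69, 0.70, 0.68], start=1):
    if stopper.step(val_f1):
        print(f"stopping after epoch {epoch}, best={stopper.best}")
        break
```

A checkpoint would normally be saved whenever `step` records a new best score, so the best model survives even if training continues for a few more epochs.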

### Inference Pipeline (`pipelines/inference_pipeline.py`)
**Must include:**
```python
# Key components your inference pipeline should handle:
# - Model loading from checkpoint
# - Input data preprocessing
# - Batch or single prediction
# - Confidence score calculation
# - Output formatting and saving
# - Error handling for malformed inputs
```
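
Handling malformed inputs usually means separating usable queries from rejected ones before they reach the tokenizer. The following sketch shows one way to do that; the function name `clean_queries` and the 512-character cap are assumptions chosen for illustration.

```python
def clean_queries(raw_queries, max_chars=512):
    """Split inputs into usable (index, text) pairs and rejected (index, reason) pairs."""
    valid, rejected = [], []
    for i, q in enumerate(raw_queries):
        if not isinstance(q, str):
            rejected.append((i, "not a string"))
            continue
        text = " ".join(q.split())  # collapse whitespace runs and trim the ends
        if not text:
            rejected.append((i, "empty"))
        else:
            valid.append((i, text[:max_chars]))
    return valid, rejected


valid, rejected = clean_queries(["  I lost   my card ", None, "   "])
print(valid)
print(rejected)
```

Keeping the original index with each entry lets the batch output report which input rows were skipped and why.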

**Usage:**
```bash
# Single prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --text "I want to return my order"

# Batch prediction
python pipelines/inference_pipeline.py --model_path models/best_model.pt --input_file data/new_queries.csv --output_file results/predictions.csv
```
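
Confidence scores are typically derived by applying softmax to the model's output logits. The sketch below does this in plain Python for a single example; the logits are made up, the three-label mapping is a toy stand-in for the real 77 labels, and `top_k` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most likely (label, confidence) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy logits and a hypothetical 3-label mapping.
id2label = {0: "card_arrival", 1: "transfer", 2: "balance"}
for label, conf in top_k([2.0, 0.5, -1.0], id2label, k=2):
    print(f"{label}: {conf:.3f}")
```

Softmax probabilities from fine-tuned transformers are often overconfident, so it may be worth treating them as a ranking signal rather than a calibrated probability.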

### Evaluation Pipeline (`pipelines/evaluation_pipeline.py`)
**Must include:**
```python
# Key components your evaluation pipeline should handle:
# - Model loading from checkpoint
# - Test data loading and preprocessing
# - Comprehensive evaluation metrics
# - Confusion matrix and classification report
# - Error analysis with examples
# - Results saving and visualization
```

**Usage:**
```bash
python pipelines/evaluation_pipeline.py --model_path models/best_model.pt --test_data data/test.csv --output_dir results/evaluation/
```
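
The core evaluation metrics (accuracy, per-class F1, macro F1, and the confusion matrix) are simple enough to write out by hand, which helps when checking library output. The sketch below uses toy labels for three classes; in practice scikit-learn's `classification_report` and `confusion_matrix` cover the same ground.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, num_classes):
    """F1 for each class; a class with no true or predicted examples scores 0."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, num_classes)) / num_classes


def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts examples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Toy labels, not real model output.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred, 3), 3))
print("confusion matrix:", confusion_matrix(y_true, y_pred, 3))
```

Macro F1 weights all classes equally, so it exposes weak performance on rare intents that overall accuracy can hide.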


## ⚡ Success Indicators
- All three pipelines execute successfully without errors
- Jupyter notebooks show clear data insights and experimentation
- Model achieves >85% accuracy on test set
- Clean, production-ready code structure with proper separation of concerns
- Comprehensive evaluation with actionable insights
- Clear demonstration of ML engineering best practices
- Ability to articulate technical decisions confidently during demo

First commit: 2025-07-25 21:05:23 +01:00