task_fraud_detection/experiments/model_training.ipynb
Aherobo Ovie Victor 07c7df3067 code reviewed
2025-07-22 22:05:14 +01:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model Training for Fraud Detection\n",
"\n",
"This notebook focuses on training and evaluating machine learning models for fraud detection using the preprocessed transaction data.\n",
"\n",
"## Enhanced Features (Addressing Code Review):\n",
"- **Parameter configurations**: Easy-to-modify settings for testing different hypotheses\n",
"- **Easy model switching**: Flexible architecture for testing different algorithms\n",
"- **Detailed confusion matrix analysis**: Comprehensive precision/recall analysis across models, parameters, and balancing techniques\n",
"- **Class balancing comparison**: SMOTE vs Downsampling vs Class Weighting with thorough analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🎛️ Enhanced Configuration Section\n",
"**Easy-to-modify parameters for testing different hypotheses and configurations**\n",
"\n",
"### Quick Start Guide:\n",
"1. **For Model Comparison**: Set multiple models to `True` in `MODELS_TO_TEST`\n",
"2. **For Parameter Tuning**: Modify `MODEL_PARAMS` ranges for specific models\n",
"3. **For Balancing Analysis**: Enable different techniques in `BALANCING_TECHNIQUES`\n",
"4. **For Business Focus**: Adjust `EVALUATION_CONFIG['scoring_metric']` based on priorities"
]
},
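{
"cell_type": "markdown",
"metadata": {},
"source": [
"The quick start steps can be sketched as a narrowed configuration, applied to the dictionaries defined in the configuration cell. This is only an illustration: the values below are assumptions picked for a fast smoke run, not recommended settings, and `'recall'` is just one possible choice of `scoring_metric`.\n",
"\n",
"```python\n",
"# Hypothetical smoke-run settings (assumed values, for illustration only)\n",
"MODELS_TO_TEST = {\n",
"    'logistic_regression': True,   # step 1: enable the models to compare\n",
"    'random_forest': True,\n",
"    'gradient_boosting': False,\n",
"    'xgboost': False\n",
"}\n",
"# step 2: shrink one model's grid\n",
"MODEL_PARAMS['random_forest'] = {'n_estimators': [100], 'max_depth': [10, None]}\n",
"# step 3: choose the balancing techniques to compare\n",
"BALANCING_TECHNIQUES = {\n",
"    'smote': True,\n",
"    'random_downsample': False,\n",
"    'class_weight': True,\n",
"    'no_balancing': True\n",
"}\n",
"# step 4: favour catching fraud over avoiding false alarms\n",
"EVALUATION_CONFIG['scoring_metric'] = 'recall'\n",
"```\n",
"\n",
"With these values, logistic regression keeps its 6 default combinations and random forest has 2, so (6 + 2) × 3 = 24 model fits are run."
]
},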
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 🎛️ EXPERIMENT CONFIGURATION\n",
"# ================================\n",
"\n",
"# Model Selection (set to True to include in experiments)\n",
"MODELS_TO_TEST = {\n",
" 'logistic_regression': True,\n",
" 'random_forest': True,\n",
" 'gradient_boosting': True,\n",
" 'xgboost': True\n",
"}\n",
"\n",
"# Class Balancing Techniques (set to True to include)\n",
"BALANCING_TECHNIQUES = {\n",
" 'smote': True,\n",
" 'random_downsample': True,\n",
" 'class_weight': True,\n",
" 'no_balancing': True # Baseline\n",
"}\n",
"\n",
"# Model Parameters\n",
"MODEL_PARAMS = {\n",
" 'logistic_regression': {\n",
" 'max_iter': [1000, 2000],\n",
" 'C': [0.1, 1.0, 10.0]\n",
" },\n",
" 'random_forest': {\n",
" 'n_estimators': [100, 200],\n",
" 'max_depth': [10, 20, None],\n",
" 'min_samples_split': [2, 5]\n",
" },\n",
" 'gradient_boosting': {\n",
" 'n_estimators': [100, 200],\n",
" 'learning_rate': [0.1, 0.2],\n",
" 'max_depth': [3, 5]\n",
" },\n",
" 'xgboost': {\n",
" 'n_estimators': [100, 200],\n",
" 'learning_rate': [0.1, 0.2],\n",
" 'max_depth': [3, 5]\n",
" }\n",
"}\n",
"\n",
"# Evaluation Settings\n",
"EVALUATION_CONFIG = {\n",
" 'test_size': 0.2,\n",
" 'random_state': 42,\n",
" 'cv_folds': 3,\n",
" 'scoring_metric': 'f1', # Primary metric for model selection\n",
" 'plot_confusion_matrix': True,\n",
" 'plot_precision_recall': True,\n",
" 'plot_roc_curve': True\n",
"}\n",
"\n",
"# SMOTE Parameters\n",
"SMOTE_CONFIG = {\n",
" 'sampling_strategy': 'auto', # or specific ratio like 0.5\n",
" 'k_neighbors': 5\n",
"}\n",
"\n",
"# Downsampling Parameters\n",
"DOWNSAMPLE_CONFIG = {\n",
" 'sampling_strategy': 'auto', # Undersample the majority class down to the minority class size\n",
" 'replacement': False\n",
"}\n",
"\n",
"print(\"✅ Configuration loaded successfully!\")\n",
"print(f\"Models to test: {[k for k, v in MODELS_TO_TEST.items() if v]}\")\n",
"print(f\"Balancing techniques: {[k for k, v in BALANCING_TECHNIQUES.items() if v]}\")\n",
"\n",
"# Import needed for experiment calculation\n",
"from itertools import product\n",
"\n",
"# Calculate total experiments\n",
"total_experiments = 0\n",
"for model, enabled in MODELS_TO_TEST.items():\n",
"    if enabled:\n",
"        params = MODEL_PARAMS.get(model, {})\n",
"        if params:\n",
"            param_combinations = list(product(*params.values()))\n",
"            total_experiments += len(param_combinations) * sum(BALANCING_TECHNIQUES.values())\n",
"        else:\n",
"            total_experiments += sum(BALANCING_TECHNIQUES.values())\n",
"\n",
"print(f\"\\n🎯 Total experiments planned: {total_experiments}\")"
]
},
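{
"cell_type": "markdown",
"metadata": {},
"source": [
"The experiment count printed by the configuration cell can be checked by hand. Assuming the default grids above and all four balancing techniques enabled, each model contributes the product of its grid sizes, and the sum is multiplied by the number of balancing techniques:\n",
"\n",
"$$\n",
"\\underbrace{2 \\cdot 3}_{\\text{logistic regression}} + \\underbrace{2 \\cdot 3 \\cdot 2}_{\\text{random forest}} + \\underbrace{2 \\cdot 2 \\cdot 2}_{\\text{gradient boosting}} + \\underbrace{2 \\cdot 2 \\cdot 2}_{\\text{XGBoost}} = 34\n",
"$$\n",
"\n",
"$$\n",
"34 \\times 4 = 136 \\text{ model fits}\n",
"$$\n",
"\n",
"If XGBoost is not installed, the import cell disables it later on, and the number of fits actually run drops to $26 \\times 4 = 104$."
]
},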
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import os\n",
"import sys\n",
"import joblib\n",
"import warnings\n",
"from itertools import product\n",
"from collections import defaultdict\n",
"import json\n",
"from IPython.display import display\n",
"\n",
"# Suppress warnings for cleaner output\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Set plot style\n",
"plt.style.use('seaborn-v0_8-whitegrid')\n",
"sns.set_theme(font_scale=1.1)\n",
"\n",
"# Configure plot size\n",
"plt.rcParams['figure.figsize'] = (12, 8)\n",
"plt.rcParams['font.size'] = 10\n",
"\n",
"# Display all columns\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.width', None)\n",
"\n",
"print(\"📚 Libraries imported successfully!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Add the project root to the path so we can import from src\n",
"sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath('__file__'))))\n",
"from src import config\n",
"\n",
"print(f\"📁 Project paths configured:\")\n",
"print(f\" - Data directory: {config.DATA_DIR}\")\n",
"print(f\" - Models directory: {config.MODELS_DIR}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🏗️ Model & Balancing Framework\n",
"**Flexible architecture for easy model and technique switching**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import ML libraries\n",
"from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.metrics import (\n",
" accuracy_score, precision_score, recall_score, f1_score, \n",
" confusion_matrix, classification_report, roc_auc_score,\n",
" precision_recall_curve, roc_curve, auc\n",
")\n",
"\n",
"# Import models\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\n",
"try:\n",
"    import xgboost as xgb\n",
"    XGBOOST_AVAILABLE = True\n",
"    print(\"✅ XGBoost available\")\n",
"except ImportError:\n",
"    XGBOOST_AVAILABLE = False\n",
"    print(\"⚠️ XGBoost not available - will skip XGBoost experiments\")\n",
"    MODELS_TO_TEST['xgboost'] = False\n",
"\n",
"# Import balancing techniques\n",
"from imblearn.over_sampling import SMOTE\n",
"from imblearn.under_sampling import RandomUnderSampler\n",
"from sklearn.utils.class_weight import compute_class_weight\n",
"\n",
"print(\"🤖 ML libraries imported successfully!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 🏭 MODEL FACTORY\n",
"# ================================\n",
"\n",
"def get_model(model_name, params=None, class_weights=None):\n",
"    \"\"\"\n",
"    Factory function to create a model with the specified parameters.\n",
"\n",
"    Only the requested model is instantiated, so model-specific parameters\n",
"    (e.g. 'C' for logistic regression) are never passed to another estimator.\n",
"\n",
"    Args:\n",
"        model_name (str): Name of the model\n",
"        params (dict): Model parameters\n",
"        class_weights (dict): Class weights for imbalanced data\n",
"\n",
"    Returns:\n",
"        Configured model instance, or None if the model is unknown or unavailable\n",
"    \"\"\"\n",
"    params = dict(params or {})\n",
"    seed = EVALUATION_CONFIG['random_state']\n",
"\n",
"    if model_name == 'logistic_regression':\n",
"        return LogisticRegression(random_state=seed, class_weight=class_weights, **params)\n",
"    if model_name == 'random_forest':\n",
"        return RandomForestClassifier(random_state=seed, class_weight=class_weights, **params)\n",
"    if model_name == 'gradient_boosting':\n",
"        # GradientBoostingClassifier has no class_weight argument, so class_weights is ignored here\n",
"        return GradientBoostingClassifier(random_state=seed, **params)\n",
"    if model_name == 'xgboost' and XGBOOST_AVAILABLE:\n",
"        # class_weights is not applied to XGBoost in this notebook\n",
"        return xgb.XGBClassifier(random_state=seed, eval_metric='logloss', **params)\n",
"\n",
"    return None\n",
"\n",
"# ================================\n",
"# ⚖️ BALANCING TECHNIQUES FACTORY\n",
"# ================================\n",
"\n",
"def apply_balancing_technique(X_train, y_train, technique):\n",
" \"\"\"\n",
" Apply specified balancing technique to training data\n",
" \n",
" Args:\n",
" X_train: Training features\n",
" y_train: Training labels\n",
" technique (str): Balancing technique name\n",
" \n",
" Returns:\n",
" tuple: (X_balanced, y_balanced, class_weights, technique_info)\n",
" \"\"\"\n",
" technique_info = {'name': technique, 'original_shape': X_train.shape}\n",
" \n",
" if technique == 'smote':\n",
" smote = SMOTE(\n",
" sampling_strategy=SMOTE_CONFIG['sampling_strategy'],\n",
" k_neighbors=SMOTE_CONFIG['k_neighbors'],\n",
" random_state=EVALUATION_CONFIG['random_state']\n",
" )\n",
" X_balanced, y_balanced = smote.fit_resample(X_train, y_train)\n",
" class_weights = None\n",
" technique_info['new_shape'] = X_balanced.shape\n",
" technique_info['description'] = 'SMOTE oversampling'\n",
" \n",
" elif technique == 'random_downsample':\n",
" downsampler = RandomUnderSampler(\n",
" sampling_strategy=DOWNSAMPLE_CONFIG['sampling_strategy'],\n",
" replacement=DOWNSAMPLE_CONFIG['replacement'],\n",
" random_state=EVALUATION_CONFIG['random_state']\n",
" )\n",
" X_balanced, y_balanced = downsampler.fit_resample(X_train, y_train)\n",
" class_weights = None\n",
" technique_info['new_shape'] = X_balanced.shape\n",
" technique_info['description'] = 'Random undersampling'\n",
" \n",
" elif technique == 'class_weight':\n",
" X_balanced, y_balanced = X_train, y_train\n",
" # Compute class weights\n",
" classes = np.unique(y_train)\n",
" weights = compute_class_weight('balanced', classes=classes, y=y_train)\n",
" class_weights = dict(zip(classes, weights))\n",
" technique_info['new_shape'] = X_balanced.shape\n",
" technique_info['description'] = f'Class weighting: {class_weights}'\n",
" \n",
" elif technique == 'no_balancing':\n",
" X_balanced, y_balanced = X_train, y_train\n",
" class_weights = None\n",
" technique_info['new_shape'] = X_balanced.shape\n",
" technique_info['description'] = 'No balancing (baseline)'\n",
" \n",
" else:\n",
" raise ValueError(f\"Unknown balancing technique: {technique}\")\n",
" \n",
" return X_balanced, y_balanced, class_weights, technique_info\n",
"\n",
"print(\"🏭 Model and balancing factories created!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📊 Comprehensive Evaluation Framework\n",
"**Detailed analysis and comparison system for all models and techniques**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 📈 EVALUATION FRAMEWORK\n",
"# ================================\n",
"\n",
"class ModelEvaluator:\n",
" \"\"\"\n",
" Comprehensive evaluation framework for fraud detection models\n",
" \"\"\"\n",
" \n",
" def __init__(self):\n",
" self.results = []\n",
" self.confusion_matrices = {}\n",
" \n",
" def evaluate_model(self, model, X_test, y_test, model_name, balancing_technique, params=None):\n",
" \"\"\"\n",
" Comprehensive model evaluation with detailed metrics\n",
" \"\"\"\n",
" # Make predictions\n",
" y_pred = model.predict(X_test)\n",
" y_pred_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else None\n",
" \n",
" # Calculate metrics\n",
" metrics = {\n",
" 'model_name': model_name,\n",
" 'balancing_technique': balancing_technique,\n",
" 'parameters': params or {},\n",
" 'accuracy': accuracy_score(y_test, y_pred),\n",
" 'precision': precision_score(y_test, y_pred, zero_division=0),\n",
" 'recall': recall_score(y_test, y_pred, zero_division=0),\n",
" 'f1_score': f1_score(y_test, y_pred, zero_division=0),\n",
" 'roc_auc': roc_auc_score(y_test, y_pred_proba) if y_pred_proba is not None else None\n",
" }\n",
" \n",
" # Confusion matrix analysis\n",
" cm = confusion_matrix(y_test, y_pred, labels=[0, 1]) # fixed labels keep the matrix 2x2 so ravel() always yields four values\n",
" tn, fp, fn, tp = cm.ravel()\n",
" \n",
" # Detailed confusion matrix metrics\n",
" metrics.update({\n",
" 'true_negatives': int(tn),\n",
" 'false_positives': int(fp),\n",
" 'false_negatives': int(fn),\n",
" 'true_positives': int(tp),\n",
" 'specificity': tn / (tn + fp) if (tn + fp) > 0 else 0,\n",
" 'sensitivity': tp / (tp + fn) if (tp + fn) > 0 else 0,\n",
" 'false_positive_rate': fp / (fp + tn) if (fp + tn) > 0 else 0,\n",
" 'false_negative_rate': fn / (fn + tp) if (fn + tp) > 0 else 0\n",
" })\n",
" \n",
" # Store results\n",
" self.results.append(metrics)\n",
" \n",
" # Store confusion matrix for detailed analysis\n",
" key = f\"{model_name}_{balancing_technique}\"\n",
" self.confusion_matrices[key] = {\n",
" 'matrix': cm,\n",
" 'model_name': model_name,\n",
" 'balancing_technique': balancing_technique,\n",
" 'metrics': metrics\n",
" }\n",
" \n",
" return metrics\n",
" \n",
" def plot_confusion_matrix_detailed(self, model_name, balancing_technique, figsize=(10, 8)):\n",
" \"\"\"\n",
" Plot detailed confusion matrix with comprehensive analysis\n",
" \"\"\"\n",
" key = f\"{model_name}_{balancing_technique}\"\n",
" if key not in self.confusion_matrices:\n",
" print(f\"No results found for {key}\")\n",
" return\n",
" \n",
" data = self.confusion_matrices[key]\n",
" cm = data['matrix']\n",
" metrics = data['metrics']\n",
" \n",
" fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=figsize)\n",
" fig.suptitle(f'Detailed Analysis: {model_name.title()} with {balancing_technique.title()}', \n",
" fontsize=16, fontweight='bold')\n",
" \n",
" # 1. Raw confusion matrix\n",
" sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax1, \n",
" xticklabels=['Not Fraud', 'Fraud'], yticklabels=['Not Fraud', 'Fraud'])\n",
" ax1.set_title('Raw Counts')\n",
" ax1.set_xlabel('Predicted')\n",
" ax1.set_ylabel('Actual')\n",
" \n",
" # 2. Normalized confusion matrix\n",
" cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n",
" sns.heatmap(cm_norm, annot=True, fmt='.3f', cmap='Oranges', ax=ax2,\n",
" xticklabels=['Not Fraud', 'Fraud'], yticklabels=['Not Fraud', 'Fraud'])\n",
" ax2.set_title('Normalized by True Class')\n",
" ax2.set_xlabel('Predicted')\n",
" ax2.set_ylabel('Actual')\n",
" \n",
" # 3. Metrics visualization\n",
" metric_names = ['Precision', 'Recall', 'F1-Score', 'Specificity']\n",
" metric_values = [metrics['precision'], metrics['recall'], \n",
" metrics['f1_score'], metrics['specificity']]\n",
" \n",
" bars = ax3.bar(metric_names, metric_values, color=['skyblue', 'lightcoral', 'lightgreen', 'gold'])\n",
" ax3.set_title('Key Metrics')\n",
" ax3.set_ylabel('Score')\n",
" ax3.set_ylim(0, 1)\n",
" \n",
" # Add value labels on bars\n",
" for bar, value in zip(bars, metric_values):\n",
" ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, \n",
" f'{value:.3f}', ha='center', va='bottom')\n",
" \n",
" # 4. Error analysis\n",
" tn, fp, fn, tp = cm.ravel()\n",
" error_types = ['True Neg', 'False Pos', 'False Neg', 'True Pos']\n",
" error_counts = [tn, fp, fn, tp]\n",
" colors = ['green', 'red', 'orange', 'blue']\n",
" \n",
" wedges, texts, autotexts = ax4.pie(error_counts, labels=error_types, colors=colors, \n",
" autopct='%1.1f%%', startangle=90)\n",
" ax4.set_title('Prediction Distribution')\n",
" \n",
" plt.tight_layout()\n",
" plt.show()\n",
" \n",
" # Print detailed analysis\n",
" self._print_confusion_matrix_analysis(metrics, model_name, balancing_technique)\n",
" \n",
" def _print_confusion_matrix_analysis(self, metrics, model_name, balancing_technique):\n",
" \"\"\"\n",
" Print detailed textual analysis of confusion matrix\n",
" \"\"\"\n",
" print(f\"\\n🔍 DETAILED ANALYSIS: {model_name.upper()} with {balancing_technique.upper()}\")\n",
" print(\"=\" * 80)\n",
" \n",
" print(f\"\\n📊 CONFUSION MATRIX BREAKDOWN:\")\n",
" print(f\" • True Negatives (TN): {metrics['true_negatives']:,} - Correctly identified non-fraud\")\n",
" print(f\" • False Positives (FP): {metrics['false_positives']:,} - Incorrectly flagged as fraud\")\n",
" print(f\" • False Negatives (FN): {metrics['false_negatives']:,} - Missed fraud cases\")\n",
" print(f\" • True Positives (TP): {metrics['true_positives']:,} - Correctly identified fraud\")\n",
" \n",
" print(f\"\\n🎯 PRECISION & RECALL ANALYSIS:\")\n",
" print(f\" • Precision: {metrics['precision']:.4f}\")\n",
" print(f\" → Of all fraud predictions, {metrics['precision']*100:.2f}% were actually fraud\")\n",
" print(f\" → {metrics['false_positives']:,} legitimate transactions incorrectly flagged\")\n",
" \n",
" print(f\" • Recall (Sensitivity): {metrics['recall']:.4f}\")\n",
" print(f\" → Detected {metrics['recall']*100:.2f}% of all actual fraud cases\")\n",
" print(f\" → Missed {metrics['false_negatives']:,} fraud transactions\")\n",
" \n",
" print(f\" • Specificity: {metrics['specificity']:.4f}\")\n",
" print(f\" → Correctly identified {metrics['specificity']*100:.2f}% of legitimate transactions\")\n",
" \n",
" print(f\"\\n⚖️ TRADE-OFF ANALYSIS:\")\n",
" if metrics['precision'] > 0.8 and metrics['recall'] > 0.8:\n",
" print(f\" ✅ EXCELLENT: High precision AND high recall - optimal performance\")\n",
" elif metrics['precision'] > 0.8:\n",
" print(f\" 🎯 HIGH PRECISION: Low false alarms, but may miss some fraud\")\n",
" print(f\" → Good for minimizing customer inconvenience\")\n",
" elif metrics['recall'] > 0.8:\n",
" print(f\" 🔍 HIGH RECALL: Catches most fraud, but more false alarms\")\n",
" print(f\" → Good for maximizing fraud detection\")\n",
" else:\n",
" print(f\" ⚠️ BELOW TARGET: Neither precision nor recall exceeds 0.80\")\n",
" \n",
" print(f\"\\n💰 BUSINESS IMPACT:\")\n",
" fp_cost = metrics['false_positives'] * 10 # Assume $10 cost per false positive\n",
" fn_cost = metrics['false_negatives'] * 100 # Assume $100 cost per missed fraud\n",
" total_cost = fp_cost + fn_cost\n",
" print(f\" • Estimated FP cost: ${fp_cost:,} ({metrics['false_positives']:,} × $10)\")\n",
" print(f\" • Estimated FN cost: ${fn_cost:,} ({metrics['false_negatives']:,} × $100)\")\n",
" print(f\" • Total estimated cost: ${total_cost:,}\")\n",
"\n",
"# Initialize evaluator\n",
"evaluator = ModelEvaluator()\n",
"print(\"📊 Comprehensive evaluation framework ready!\")"
]
},
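{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a compact reference for the quantities computed in `evaluate_model`, the confusion-matrix metrics can be written out as follows, with fraud as the positive class. The cost line only restates the placeholder assumptions in the code (10 per false positive, 100 per missed fraud); these are not calibrated business figures.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{precision} &= \\frac{TP}{TP + FP} \\\\\n",
"\\text{recall (sensitivity)} &= \\frac{TP}{TP + FN} \\\\\n",
"\\text{specificity} &= \\frac{TN}{TN + FP} \\\\\n",
"\\text{false positive rate} &= \\frac{FP}{FP + TN} = 1 - \\text{specificity} \\\\\n",
"\\text{false negative rate} &= \\frac{FN}{FN + TP} = 1 - \\text{recall} \\\\\n",
"F_1 &= \\frac{2 \\cdot \\text{precision} \\cdot \\text{recall}}{\\text{precision} + \\text{recall}} \\\\\n",
"\\text{estimated cost} &= 10 \\cdot FP + 100 \\cdot FN\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"For a purely hypothetical validation set with $TP = 80$, $FP = 20$, $FN = 20$ and $TN = 9880$, this gives precision $= 80/100 = 0.80$, recall $= 80/100 = 0.80$, specificity $= 9880/9900 \\approx 0.998$, and an estimated cost of $10 \\cdot 20 + 100 \\cdot 20 = 2200$."
]
},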
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Load the Preprocessed Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load the preprocessed training and test data that we created in the feature engineering notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load preprocessed training data\n",
"try:\n",
"    train_data = pd.read_csv(config.PROCESSED_TRAIN_DATA_PATH)\n",
"    print(f'Loaded preprocessed training data from {config.PROCESSED_TRAIN_DATA_PATH}')\n",
"except FileNotFoundError:\n",
"    print(f'Preprocessed training data not found at {config.PROCESSED_TRAIN_DATA_PATH}')\n",
"    print('Please run the feature_engineering.ipynb notebook first to create the preprocessed data.')\n",
"    # Fallback: load the raw data so the notebook can still run.\n",
"    # No preprocessing is applied here; that is normally handled by the feature engineering notebook.\n",
"    train_data = pd.read_csv(config.TRAIN_DATA_PATH)\n",
"    print(f'Loaded raw training data from {config.TRAIN_DATA_PATH} instead.')\n",
"\n",
"# Load preprocessed test data\n",
"try:\n",
"    test_data = pd.read_csv(config.PROCESSED_TEST_DATA_PATH)\n",
"    print(f'Loaded preprocessed test data from {config.PROCESSED_TEST_DATA_PATH}')\n",
"except FileNotFoundError:\n",
"    print(f'Preprocessed test data not found at {config.PROCESSED_TEST_DATA_PATH}')\n",
"    # If preprocessed data doesn't exist, we'll load the raw data\n",
"    test_data = pd.read_csv(config.TEST_DATA_PATH)\n",
"    print(f'Loaded raw test data from {config.TEST_DATA_PATH} instead.')\n",
"\n",
"print(f'\\n📊 Data Summary:')\n",
"print(f' • Training data shape: {train_data.shape}')\n",
"print(f' • Test data shape: {test_data.shape}')\n",
"\n",
"# Check for target variable\n",
"if 'is_fraud' in train_data.columns:\n",
"    fraud_rate = train_data['is_fraud'].mean()\n",
"    print(f' • Fraud rate: {fraud_rate:.4f} ({fraud_rate*100:.2f}%)')\n",
"    print(f' • Class distribution: {train_data[\"is_fraud\"].value_counts().to_dict()}')\n",
"else:\n",
"    print(' ⚠️ Target variable \"is_fraud\" not found in training data')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Display the first few rows of the training data\n",
"print(\"📋 Sample of training data:\")\n",
"display(train_data.head())\n",
"\n",
"print(\"\\n📋 Data types and missing values:\")\n",
"info_df = pd.DataFrame({\n",
" 'Data Type': train_data.dtypes,\n",
" 'Missing Values': train_data.isnull().sum(),\n",
" 'Missing %': (train_data.isnull().sum() / len(train_data) * 100).round(2)\n",
"})\n",
"display(info_df[info_df['Missing Values'] > 0]) # Only show columns with missing values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🚀 Comprehensive Experiment Runner\n",
"**Systematic testing of all model and balancing technique combinations**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 🧪 EXPERIMENT RUNNER\n",
"# ================================\n",
"\n",
"def run_comprehensive_experiments():\n",
" \"\"\"\n",
" Run systematic experiments across all model and balancing combinations\n",
" \"\"\"\n",
" print(\"🚀 Starting Comprehensive Fraud Detection Experiments\")\n",
" print(\"=\" * 60)\n",
" \n",
" # Prepare data\n",
" if 'is_fraud' not in train_data.columns:\n",
" print(\"❌ Error: Target variable 'is_fraud' not found\")\n",
" return\n",
" \n",
" # Split features and target\n",
" X = train_data.drop('is_fraud', axis=1)\n",
" y = train_data['is_fraud']\n",
" \n",
" # Split into train and validation sets\n",
" X_train, X_val, y_train, y_val = train_test_split(\n",
" X, y, \n",
" test_size=EVALUATION_CONFIG['test_size'],\n",
" random_state=EVALUATION_CONFIG['random_state'],\n",
" stratify=y\n",
" )\n",
" \n",
" print(f\"📊 Data split completed:\")\n",
" print(f\" • Training: {X_train.shape[0]:,} samples\")\n",
" print(f\" • Validation: {X_val.shape[0]:,} samples\")\n",
" \n",
" # Identify feature types\n",
" categorical_cols = X_train.select_dtypes(include=['object', 'category']).columns.tolist()\n",
" numerical_cols = X_train.select_dtypes(include='number').columns.tolist() # all numeric dtypes, not only int64/float64\n",
" \n",
" print(f\" • Categorical features: {len(categorical_cols)}\")\n",
" print(f\" • Numerical features: {len(numerical_cols)}\")\n",
" \n",
" # Create preprocessing pipeline\n",
" preprocessor = ColumnTransformer(\n",
" transformers=[\n",
" ('num', StandardScaler(), numerical_cols),\n",
" ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)\n",
" ]\n",
" )\n",
" \n",
" # Fit the preprocessor on the training split only, then transform the validation data once\n",
" print(\"\\n🔄 Preprocessing validation data...\")\n",
" preprocessor.fit(X_train)\n",
" X_val_processed = preprocessor.transform(X_val)\n",
" \n",
" # Initialize results storage\n",
" experiment_results = []\n",
" experiment_count = 0\n",
" \n",
" # Calculate total experiments\n",
" active_models = [k for k, v in MODELS_TO_TEST.items() if v]\n",
" active_balancing = [k for k, v in BALANCING_TECHNIQUES.items() if v]\n",
" total_experiments = len(active_models) * len(active_balancing)\n",
" \n",
" print(f\"\\n🎯 Running {total_experiments} experiments...\")\n",
" print(f\" • Models: {active_models}\")\n",
" print(f\" • Balancing techniques: {active_balancing}\")\n",
" \n",
" # Run experiments\n",
" for model_name in active_models:\n",
" for balancing_technique in active_balancing:\n",
" experiment_count += 1\n",
" print(f\"\\n🔬 Experiment {experiment_count}/{total_experiments}: {model_name.upper()} + {balancing_technique.upper()}\")\n",
" print(\"-\" * 50)\n",
" \n",
" try:\n",
" # Encode features first (preprocessor fitted on the training split only),\n",
" # so that SMOTE and undersampling receive purely numeric input\n",
" X_train_encoded = preprocessor.fit_transform(X_train)\n",
" \n",
" # Apply balancing technique to the encoded training data\n",
" X_train_processed, y_train_balanced, class_weights, technique_info = apply_balancing_technique(\n",
" X_train_encoded, y_train, balancing_technique\n",
" )\n",
" \n",
" print(f\" ⚖️ {technique_info['description']}\")\n",
" print(f\" Original: {technique_info['original_shape']} → Balanced: {technique_info['new_shape']}\")\n",
" \n",
" # Test different parameter combinations for this model\n",
" model_params = MODEL_PARAMS.get(model_name, {})\n",
" \n",
" if model_params:\n",
" # Generate parameter combinations\n",
" param_names = list(model_params.keys())\n",
" param_values = list(model_params.values())\n",
" param_combinations = list(product(*param_values))\n",
" \n",
" print(f\" 🔧 Testing {len(param_combinations)} parameter combinations...\")\n",
" \n",
" for param_combo in param_combinations:\n",
" # Create parameter dictionary\n",
" current_params = dict(zip(param_names, param_combo))\n",
" param_str = ', '.join([f'{k}={v}' for k, v in current_params.items()])\n",
" \n",
" print(f\" 🎛️ Parameters: {param_str}\")\n",
" \n",
" # Get model with current parameters\n",
" model = get_model(model_name, params=current_params, class_weights=class_weights)\n",
" \n",
" if model is None:\n",
" print(f\" ❌ Model {model_name} not available\")\n",
" continue\n",
" \n",
" # Train model\n",
" model.fit(X_train_processed, y_train_balanced)\n",
" \n",
" # Evaluate model\n",
" metrics = evaluator.evaluate_model(\n",
" model, X_val_processed, y_val, \n",
" f\"{model_name}_{param_str.replace(' ', '').replace(',', '_').replace('=', '')}\", \n",
" balancing_technique, \n",
" params=current_params\n",
" )\n",
" \n",
" # Store results with parameter info\n",
" experiment_results.append({\n",
" 'experiment_id': experiment_count,\n",
" 'model_name': model_name,\n",
" 'balancing_technique': balancing_technique,\n",
" 'parameters': current_params,\n",
" 'param_string': param_str,\n",
" 'technique_info': technique_info,\n",
" 'metrics': metrics\n",
" })\n",
" \n",
" # Print quick summary\n",
" print(f\" ✅ F1={metrics['f1_score']:.3f}, P={metrics['precision']:.3f}, R={metrics['recall']:.3f}\")\n",
" \n",
" else:\n",
" # No parameters to test, use default\n",
" model = get_model(model_name, class_weights=class_weights)\n",
" \n",
" if model is None:\n",
" print(f\" ❌ Model {model_name} not available\")\n",
" continue\n",
" \n",
" # Train model\n",
" print(f\" 🏋️ Training {model_name} with default parameters...\")\n",
" model.fit(X_train_processed, y_train_balanced)\n",
" \n",
" # Evaluate model\n",
" print(f\" 📊 Evaluating...\")\n",
" metrics = evaluator.evaluate_model(\n",
" model, X_val_processed, y_val, \n",
" model_name, balancing_technique\n",
" )\n",
" \n",
" # Store results\n",
" experiment_results.append({\n",
" 'experiment_id': experiment_count,\n",
" 'model_name': model_name,\n",
" 'balancing_technique': balancing_technique,\n",
" 'parameters': {},\n",
" 'param_string': 'default',\n",
" 'technique_info': technique_info,\n",
" 'metrics': metrics\n",
" })\n",
" \n",
" # Print quick summary\n",
" print(f\" ✅ Results: F1={metrics['f1_score']:.3f}, Precision={metrics['precision']:.3f}, Recall={metrics['recall']:.3f}\")\n",
" \n",
" except Exception as e:\n",
" print(f\" ❌ Error in experiment: {str(e)}\")\n",
" continue\n",
" \n",
" print(f\"\\n🎉 All experiments completed! ({experiment_count} total)\")\n",
" return experiment_results\n",
"\n",
"# Run the comprehensive experiments\n",
"print(\"Starting comprehensive experiments...\")\n",
"all_results = run_comprehensive_experiments()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📈 Comprehensive Results Analysis\n",
"**Detailed comparison and analysis of all experiments**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 📊 RESULTS ANALYSIS FRAMEWORK\n",
"# ================================\n",
"\n",
"def analyze_experiment_results(results):\n",
" \"\"\"\n",
" Comprehensive analysis of all experiment results\n",
" \"\"\"\n",
" if not results:\n",
" print(\"❌ No results to analyze\")\n",
" return\n",
" \n",
" print(\"📊 COMPREHENSIVE RESULTS ANALYSIS\")\n",
" print(\"=\" * 60)\n",
" \n",
" # Create results DataFrame\n",
" results_data = []\n",
" for result in results:\n",
" metrics = result['metrics']\n",
" results_data.append({\n",
" 'Model': result['model_name'].replace('_', ' ').title(),\n",
" 'Balancing': result['balancing_technique'].replace('_', ' ').title(),\n",
" 'F1 Score': metrics['f1_score'],\n",
" 'Precision': metrics['precision'],\n",
" 'Recall': metrics['recall'],\n",
" 'Accuracy': metrics['accuracy'],\n",
" 'ROC AUC': metrics['roc_auc'] if metrics['roc_auc'] else 0,\n",
" 'True Positives': metrics['true_positives'],\n",
" 'False Positives': metrics['false_positives'],\n",
" 'False Negatives': metrics['false_negatives'],\n",
" 'True Negatives': metrics['true_negatives']\n",
" })\n",
" \n",
" results_df = pd.DataFrame(results_data)\n",
" \n",
" # 1. Overall Performance Summary\n",
" print(\"\\n🏆 TOP PERFORMERS BY METRIC:\")\n",
" print(\"-\" * 40)\n",
" \n",
" metrics_to_analyze = ['F1 Score', 'Precision', 'Recall', 'Accuracy']\n",
" for metric in metrics_to_analyze:\n",
" best_idx = results_df[metric].idxmax()\n",
" best_result = results_df.loc[best_idx] # idxmax() returns an index label, so look it up with .loc\n",
" print(f\" 🥇 Best {metric}: {best_result['Model']} + {best_result['Balancing']} ({best_result[metric]:.4f})\")\n",
" \n",
" # 2. Model Comparison\n",
" print(\"\\n🤖 MODEL PERFORMANCE COMPARISON:\")\n",
" print(\"-\" * 40)\n",
" model_comparison = results_df.groupby('Model')[['F1 Score', 'Precision', 'Recall']].agg(['mean', 'std']).round(4)\n",
" display(model_comparison)\n",
" \n",
" # 3. Balancing Technique Comparison\n",
" print(\"\\n⚖️ BALANCING TECHNIQUE COMPARISON:\")\n",
" print(\"-\" * 40)\n",
" balancing_comparison = results_df.groupby('Balancing')[['F1 Score', 'Precision', 'Recall']].agg(['mean', 'std']).round(4)\n",
" display(balancing_comparison)\n",
" \n",
" # 4. Detailed Results Table\n",
" print(\"\\n📋 DETAILED RESULTS TABLE:\")\n",
" print(\"-\" * 40)\n",
" display_df = results_df[['Model', 'Balancing', 'F1 Score', 'Precision', 'Recall', 'Accuracy']].round(4)\n",
" display_df = display_df.sort_values('F1 Score', ascending=False)\n",
" display(display_df)\n",
" \n",
" return results_df\n",
"\n",
"def plot_comprehensive_comparison(results_df):\n",
" \"\"\"\n",
" Create comprehensive visualization of all results\n",
" \"\"\"\n",
" fig, axes = plt.subplots(2, 3, figsize=(20, 12))\n",
" fig.suptitle('Comprehensive Model & Balancing Technique Comparison', fontsize=16, fontweight='bold')\n",
" \n",
" # 1. F1 Score Heatmap (best run per model/balancing pair; plain pivot() fails on the duplicate pairs created by the parameter grid)\n",
" pivot_f1 = results_df.pivot_table(index='Model', columns='Balancing', values='F1 Score', aggfunc='max')\n",
" sns.heatmap(pivot_f1, annot=True, fmt='.3f', cmap='YlOrRd', ax=axes[0,0])\n",
" axes[0,0].set_title('Best F1 Score by Model & Balancing')\n",
" \n",
" # 2. Precision Heatmap (best run per pair)\n",
" pivot_precision = results_df.pivot_table(index='Model', columns='Balancing', values='Precision', aggfunc='max')\n",
" sns.heatmap(pivot_precision, annot=True, fmt='.3f', cmap='Blues', ax=axes[0,1])\n",
" axes[0,1].set_title('Best Precision by Model & Balancing')\n",
" \n",
" # 3. Recall Heatmap (best run per pair)\n",
" pivot_recall = results_df.pivot_table(index='Model', columns='Balancing', values='Recall', aggfunc='max')\n",
" sns.heatmap(pivot_recall, annot=True, fmt='.3f', cmap='Greens', ax=axes[0,2])\n",
" axes[0,2].set_title('Best Recall by Model & Balancing')\n",
" \n",
" # 4. Model Performance Comparison\n",
" model_means = results_df.groupby('Model')[['F1 Score', 'Precision', 'Recall']].mean()\n",
" model_means.plot(kind='bar', ax=axes[1,0])\n",
" axes[1,0].set_title('Average Performance by Model')\n",
" axes[1,0].set_ylabel('Score')\n",
" axes[1,0].legend(bbox_to_anchor=(1.05, 1), loc='upper left')\n",
" axes[1,0].tick_params(axis='x', rotation=45)\n",
" \n",
" # 5. Balancing Technique Performance\n",
" balancing_means = results_df.groupby('Balancing')[['F1 Score', 'Precision', 'Recall']].mean()\n",
" balancing_means.plot(kind='bar', ax=axes[1,1])\n",
" axes[1,1].set_title('Average Performance by Balancing Technique')\n",
" axes[1,1].set_ylabel('Score')\n",
" axes[1,1].legend(bbox_to_anchor=(1.05, 1), loc='upper left')\n",
" axes[1,1].tick_params(axis='x', rotation=45)\n",
" \n",
" # 6. Precision vs Recall Scatter\n",
" for balancing in results_df['Balancing'].unique():\n",
" subset = results_df[results_df['Balancing'] == balancing]\n",
" axes[1,2].scatter(subset['Precision'], subset['Recall'], \n",
" label=balancing, s=100, alpha=0.7)\n",
" \n",
" axes[1,2].set_xlabel('Precision')\n",
" axes[1,2].set_ylabel('Recall')\n",
" axes[1,2].set_title('Precision vs Recall by Balancing Technique')\n",
" axes[1,2].legend()\n",
" axes[1,2].grid(True, alpha=0.3)\n",
" \n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"# Analyze results\n",
"if 'all_results' in locals() and all_results:\n",
" results_df = analyze_experiment_results(all_results)\n",
" plot_comprehensive_comparison(results_df)\n",
" \n",
" # CRITICAL: Add parameter variation analysis\n",
" print(\"\\n\" + \"=\" * 80)\n",
" print(\"🔧 PARAMETER VARIATION ANALYSIS\")\n",
" print(\"=\" * 80)\n",
" \n",
" # Analyze how parameters affect performance for each model-balancing combination\n",
" param_analysis_results = {}\n",
" \n",
" for result in all_results:\n",
" model_name = result['model_name']\n",
" balancing = result['balancing_technique']\n",
" key = f\"{model_name}_{balancing}\"\n",
" \n",
" if key not in param_analysis_results:\n",
" param_analysis_results[key] = []\n",
" \n",
" param_analysis_results[key].append({\n",
" 'param_string': result.get('param_string', 'default'),\n",
" 'parameters': result.get('parameters', {}),\n",
" 'f1_score': result['metrics']['f1_score'],\n",
" 'precision': result['metrics']['precision'],\n",
" 'recall': result['metrics']['recall'],\n",
" 'false_positives': result['metrics']['false_positives'],\n",
" 'false_negatives': result['metrics']['false_negatives']\n",
" })\n",
" \n",
" # Display parameter impact for each combination\n",
" for key, param_results in param_analysis_results.items():\n",
" if len(param_results) > 1: # Only analyze if multiple parameter combinations\n",
" model_name, balancing = key.split('_', 1)\n",
" \n",
" print(f\"\\n🔍 PARAMETER IMPACT: {model_name.upper()} + {balancing.upper()}\")\n",
" print(\"-\" * 60)\n",
" \n",
" # Create comparison DataFrame\n",
" param_df = pd.DataFrame(param_results).sort_values('f1_score', ascending=False)\n",
" \n",
" print(\"📊 Parameter Performance Comparison (sorted by F1 Score):\")\n",
" display_cols = ['param_string', 'f1_score', 'precision', 'recall', 'false_positives', 'false_negatives']\n",
" display(param_df[display_cols].round(4))\n",
" \n",
" # Analyze best vs worst\n",
" best = param_df.iloc[0]\n",
" worst = param_df.iloc[-1]\n",
" \n",
" print(f\"\\n🏆 BEST PARAMETERS: {best['param_string']}\")\n",
" print(f\" • F1: {best['f1_score']:.4f}, Precision: {best['precision']:.4f}, Recall: {best['recall']:.4f}\")\n",
" print(f\" • Errors: {best['false_positives']} FP, {best['false_negatives']} FN\")\n",
" \n",
" print(f\"\\n📉 WORST PARAMETERS: {worst['param_string']}\")\n",
" print(f\" • F1: {worst['f1_score']:.4f}, Precision: {worst['precision']:.4f}, Recall: {worst['recall']:.4f}\")\n",
" print(f\" • Errors: {worst['false_positives']} FP, {worst['false_negatives']} FN\")\n",
" \n",
" # Calculate improvement\n",
" f1_improvement = best['f1_score'] - worst['f1_score']\n",
" precision_improvement = best['precision'] - worst['precision']\n",
" recall_improvement = best['recall'] - worst['recall']\n",
" \n",
" print(f\"\\n📈 PARAMETER TUNING IMPACT:\")\n",
" print(f\" • F1 Score improvement: {f1_improvement:.4f} ({f1_improvement/worst['f1_score']*100:.1f}% relative)\")\n",
" print(f\" • Precision change: {precision_improvement:+.4f}\")\n",
" print(f\" • Recall change: {recall_improvement:+.4f}\")\n",
" \n",
" # Confusion matrix comparison insight\n",
" fp_change = best['false_positives'] - worst['false_positives']\n",
" fn_change = best['false_negatives'] - worst['false_negatives']\n",
" \n",
" print(f\"\\n🎯 CONFUSION MATRIX CHANGES:\")\n",
" print(f\" • False Positives: {fp_change:+d} ({'reduced' if fp_change < 0 else 'increased'} customer inconvenience)\")\n",
" print(f\" • False Negatives: {fn_change:+d} ({'reduced' if fn_change < 0 else 'increased'} missed fraud)\")\n",
" \n",
" if fp_change < 0 and fn_change < 0:\n",
" print(f\" ✅ EXCELLENT: Parameter tuning reduced both types of errors!\")\n",
" elif fp_change < 0:\n",
" print(f\" 🎯 PRECISION FOCUSED: Reduced false alarms (better customer experience)\")\n",
" elif fn_change < 0:\n",
" print(f\" 🔍 RECALL FOCUSED: Reduced missed fraud (better fraud detection)\")\n",
" else:\n",
" print(f\" ⚠️ TRADE-OFF: Parameter tuning improved F1 through better balance\")\n",
" \n",
" else:\n",
" model_name, balancing = key.split('_', 1)\n",
" print(f\"\\n⚠️ {model_name.upper()} + {balancing.upper()}: Only one parameter combination tested\")\n",
" \n",
"else:\n",
" print(\"⚠️ No experiment results found. Please run the experiments first.\")"
]
},
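{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: summarizing parameter spread per (model, balancing) pair\n",
"\n",
"The parameter variation analysis groups runs by model and balancing technique and compares the best and worst F1 within each group. A minimal sketch of the same idea with a pandas `groupby` on two columns is given in the next cell; it avoids building and splitting combined string keys. The `demo_results` records and their values are hypothetical and only mirror the `model_name` / `balancing_technique` / `param_string` / `metrics` layout used in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical records mirroring the layout of all_results (illustrative values only)\n",
"demo_results = [\n",
"    {'model_name': 'random_forest', 'balancing_technique': 'class_weight',\n",
"     'param_string': 'n_estimators=100', 'metrics': {'f1_score': 0.80}},\n",
"    {'model_name': 'random_forest', 'balancing_technique': 'class_weight',\n",
"     'param_string': 'n_estimators=300', 'metrics': {'f1_score': 0.84}},\n",
"    {'model_name': 'logistic_regression', 'balancing_technique': 'smote',\n",
"     'param_string': 'C=1.0', 'metrics': {'f1_score': 0.70}},\n",
"]\n",
"\n",
"# Flatten the nested records into one row per run\n",
"demo_df = pd.DataFrame([\n",
"    {'model': r['model_name'], 'balancing': r['balancing_technique'],\n",
"     'params': r.get('param_string', 'default'), 'f1': r['metrics']['f1_score']}\n",
"    for r in demo_results\n",
"])\n",
"\n",
"# One row per (model, balancing) pair: number of runs plus best and worst F1\n",
"demo_spread = demo_df.groupby(['model', 'balancing'])['f1'].agg(['count', 'max', 'min'])\n",
"demo_spread['f1_range'] = demo_spread['max'] - demo_spread['min']\n",
"print(demo_spread)"
]
},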
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🎯 Detailed Confusion Matrix Analysis\n",
"**In-depth analysis of precision/recall trade-offs for each approach**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 🎯 CONFUSION MATRIX DEEP DIVE\n",
"# ================================\n",
"\n",
"def analyze_confusion_matrices():\n",
" \"\"\"\n",
" Detailed analysis of confusion matrices for all experiments\n",
" \"\"\"\n",
" print(\"🎯 DETAILED CONFUSION MATRIX ANALYSIS\")\n",
" print(\"=\" * 60)\n",
" \n",
" if not evaluator.confusion_matrices:\n",
" print(\"❌ No confusion matrices found. Please run experiments first.\")\n",
" return\n",
" \n",
" # Analyze each model-balancing combination\n",
" for key, data in evaluator.confusion_matrices.items():\n",
" model_name = data['model_name']\n",
" balancing_technique = data['balancing_technique']\n",
" \n",
" print(f\"\\n🔍 Analyzing: {model_name.upper()} + {balancing_technique.upper()}\")\n",
" print(\"=\" * 50)\n",
" \n",
" # Plot detailed confusion matrix\n",
" evaluator.plot_confusion_matrix_detailed(model_name, balancing_technique)\n",
"\n",
"def compare_balancing_techniques_detailed():\n",
" \"\"\"\n",
" Detailed comparison of how different balancing techniques affect precision/recall\n",
" \"\"\"\n",
" print(\"\\n⚖️ BALANCING TECHNIQUES: PRECISION/RECALL TRADE-OFF ANALYSIS\")\n",
" print(\"=\" * 70)\n",
" \n",
" if not all_results:\n",
" print(\"❌ No results available for analysis\")\n",
" return\n",
" \n",
" # Group results by balancing technique\n",
" balancing_analysis = {}\n",
" \n",
" for result in all_results:\n",
" technique = result['balancing_technique']\n",
" metrics = result['metrics']\n",
" \n",
" if technique not in balancing_analysis:\n",
" balancing_analysis[technique] = {\n",
" 'results': [],\n",
" 'avg_precision': 0,\n",
" 'avg_recall': 0,\n",
" 'avg_f1': 0,\n",
" 'total_fp': 0,\n",
" 'total_fn': 0\n",
" }\n",
" \n",
" balancing_analysis[technique]['results'].append(metrics)\n",
" balancing_analysis[technique]['total_fp'] += metrics['false_positives']\n",
" balancing_analysis[technique]['total_fn'] += metrics['false_negatives']\n",
" \n",
" # Calculate averages and analyze\n",
" for technique, data in balancing_analysis.items():\n",
" results = data['results']\n",
" n_results = len(results)\n",
" \n",
" avg_precision = sum(r['precision'] for r in results) / n_results\n",
" avg_recall = sum(r['recall'] for r in results) / n_results\n",
" avg_f1 = sum(r['f1_score'] for r in results) / n_results\n",
" \n",
" data['avg_precision'] = avg_precision\n",
" data['avg_recall'] = avg_recall\n",
" data['avg_f1'] = avg_f1\n",
" \n",
" print(f\"\\n🔬 {technique.upper().replace('_', ' ')} ANALYSIS:\")\n",
" print(\"-\" * 40)\n",
" print(f\" 📊 Average Metrics (across {n_results} models):\")\n",
" print(f\" • Precision: {avg_precision:.4f}\")\n",
" print(f\" • Recall: {avg_recall:.4f}\")\n",
" print(f\" • F1 Score: {avg_f1:.4f}\")\n",
" \n",
" print(f\" 🎯 Error Analysis:\")\n",
" print(f\" • Total False Positives: {data['total_fp']:,}\")\n",
" print(f\" • Total False Negatives: {data['total_fn']:,}\")\n",
" \n",
" # Technique-specific insights\n",
" if technique == 'smote':\n",
" print(f\" 💡 SMOTE Insights:\")\n",
" print(f\" • Synthetic oversampling tends to improve recall\")\n",
" print(f\" • May introduce noise, potentially affecting precision\")\n",
" print(f\" • Good for learning minority class patterns\")\n",
" elif technique == 'random_downsample':\n",
" print(f\" 💡 Downsampling Insights:\")\n",
" print(f\" • Reduces dataset size, faster training\")\n",
" print(f\" • May lose important majority class information\")\n",
" print(f\" • Can lead to overfitting on reduced data\")\n",
" elif technique == 'class_weight':\n",
" print(f\" 💡 Class Weighting Insights:\")\n",
" print(f\" • Preserves all original data\")\n",
" print(f\" • Adjusts model's decision boundary\")\n",
" print(f\" • May be sensitive to weight selection\")\n",
" elif technique == 'no_balancing':\n",
" print(f\" 💡 No Balancing Insights:\")\n",
" print(f\" • Baseline performance with imbalanced data\")\n",
" print(f\" • Typically biased toward majority class\")\n",
" print(f\" • May have high precision but low recall\")\n",
" \n",
" # Create comparison visualization\n",
" create_balancing_comparison_plot(balancing_analysis)\n",
"\n",
"def create_balancing_comparison_plot(balancing_analysis):\n",
" \"\"\"\n",
" Create detailed visualization comparing balancing techniques\n",
" \"\"\"\n",
" fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))\n",
" fig.suptitle('Balancing Techniques: Detailed Comparison', fontsize=16, fontweight='bold')\n",
" \n",
" techniques = list(balancing_analysis.keys())\n",
" precisions = [balancing_analysis[t]['avg_precision'] for t in techniques]\n",
" recalls = [balancing_analysis[t]['avg_recall'] for t in techniques]\n",
" f1_scores = [balancing_analysis[t]['avg_f1'] for t in techniques]\n",
" \n",
" # 1. Precision Comparison\n",
" bars1 = ax1.bar(techniques, precisions, color='skyblue', alpha=0.8)\n",
" ax1.set_title('Average Precision by Balancing Technique')\n",
" ax1.set_ylabel('Precision')\n",
" ax1.set_ylim(0, 1)\n",
" for bar, val in zip(bars1, precisions):\n",
" ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, \n",
" f'{val:.3f}', ha='center', va='bottom')\n",
" ax1.tick_params(axis='x', rotation=45)\n",
" \n",
" # 2. Recall Comparison\n",
" bars2 = ax2.bar(techniques, recalls, color='lightcoral', alpha=0.8)\n",
" ax2.set_title('Average Recall by Balancing Technique')\n",
" ax2.set_ylabel('Recall')\n",
" ax2.set_ylim(0, 1)\n",
" for bar, val in zip(bars2, recalls):\n",
" ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, \n",
" f'{val:.3f}', ha='center', va='bottom')\n",
" ax2.tick_params(axis='x', rotation=45)\n",
" \n",
" # 3. F1 Score Comparison\n",
" bars3 = ax3.bar(techniques, f1_scores, color='lightgreen', alpha=0.8)\n",
" ax3.set_title('Average F1 Score by Balancing Technique')\n",
" ax3.set_ylabel('F1 Score')\n",
" ax3.set_ylim(0, 1)\n",
" for bar, val in zip(bars3, f1_scores):\n",
" ax3.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, \n",
" f'{val:.3f}', ha='center', va='bottom')\n",
" ax3.tick_params(axis='x', rotation=45)\n",
" \n",
" # 4. Precision vs Recall Trade-off\n",
" colors = ['blue', 'red', 'green', 'orange']\n",
" for i, technique in enumerate(techniques):\n",
" ax4.scatter(precisions[i], recalls[i], \n",
" s=200, alpha=0.7, color=colors[i % len(colors)], \n",
" label=technique.replace('_', ' ').title())\n",
" ax4.annotate(technique.replace('_', ' ').title(), \n",
" (precisions[i], recalls[i]), \n",
" xytext=(5, 5), textcoords='offset points')\n",
" \n",
" ax4.set_xlabel('Precision')\n",
" ax4.set_ylabel('Recall')\n",
" ax4.set_title('Precision vs Recall Trade-off')\n",
" ax4.grid(True, alpha=0.3)\n",
" ax4.legend()\n",
" \n",
" # Add diagonal line for F1 score reference\n",
" ax4.plot([0, 1], [0, 1], 'k--', alpha=0.3, label='Equal Precision/Recall')\n",
" \n",
" plt.tight_layout()\n",
" plt.show()\n",
"\n",
"# Run detailed analysis\n",
"if 'all_results' in locals() and all_results:\n",
" analyze_confusion_matrices()\n",
" compare_balancing_techniques_detailed()\n",
" \n",
" # CRITICAL: Add comprehensive confusion matrix variation analysis\n",
" print(\"\\n\" + \"=\" * 90)\n",
" print(\"🎯 COMPREHENSIVE CONFUSION MATRIX VARIATION ANALYSIS\")\n",
" print(\"=\" * 90)\n",
" print(\"\\nThis section analyzes how confusion matrices change across:\")\n",
" print(\"1. Different models (Logistic Regression, Random Forest, etc.)\")\n",
" print(\"2. Different parameter settings for each model\")\n",
" print(\"3. Different class balancing approaches (SMOTE, downsampling, etc.)\")\n",
" print(\"\\nFocus: Understanding precision/recall trade-offs and their business impact\")\n",
" \n",
" # Group results by model for comparison\n",
" model_comparison = defaultdict(list)\n",
" balancing_comparison = defaultdict(list)\n",
" parameter_comparison = defaultdict(list)\n",
" \n",
" for result in all_results:\n",
" metrics = result['metrics']\n",
" model_name = result['model_name']\n",
" balancing = result['balancing_technique']\n",
" param_str = result.get('param_string', 'default')\n",
" \n",
" # Group by model\n",
" model_comparison[model_name].append({\n",
" 'balancing': balancing,\n",
" 'params': param_str,\n",
" 'precision': metrics['precision'],\n",
" 'recall': metrics['recall'],\n",
" 'f1': metrics['f1_score'],\n",
" 'fp': metrics['false_positives'],\n",
" 'fn': metrics['false_negatives'],\n",
" 'tn': metrics['true_negatives'],\n",
" 'tp': metrics['true_positives']\n",
" })\n",
" \n",
" # Group by balancing technique\n",
" balancing_comparison[balancing].append({\n",
" 'model': model_name,\n",
" 'params': param_str,\n",
" 'precision': metrics['precision'],\n",
" 'recall': metrics['recall'],\n",
" 'f1': metrics['f1_score'],\n",
" 'fp': metrics['false_positives'],\n",
" 'fn': metrics['false_negatives']\n",
" })\n",
" \n",
" # Group by parameter variations (for models with multiple param settings)\n",
" if param_str != 'default':\n",
" key = f\"{model_name}_{balancing}\"\n",
" parameter_comparison[key].append({\n",
" 'params': param_str,\n",
" 'precision': metrics['precision'],\n",
" 'recall': metrics['recall'],\n",
" 'f1': metrics['f1_score'],\n",
" 'fp': metrics['false_positives'],\n",
" 'fn': metrics['false_negatives']\n",
" })\n",
" \n",
" # 1. MODEL COMPARISON ANALYSIS\n",
" print(f\"\\n🤖 1. MODEL COMPARISON: How different algorithms affect confusion matrix\")\n",
" print(\"-\" * 70)\n",
" \n",
" for model_name, results in model_comparison.items():\n",
" if len(results) > 0:\n",
" avg_precision = np.mean([r['precision'] for r in results])\n",
" avg_recall = np.mean([r['recall'] for r in results])\n",
" avg_fp = np.mean([r['fp'] for r in results])\n",
" avg_fn = np.mean([r['fn'] for r in results])\n",
" \n",
" print(f\"\\n📊 {model_name.upper()} (averaged across all configurations):\")\n",
" print(f\" • Average Precision: {avg_precision:.4f} → {avg_fp:.0f} false positives on average\")\n",
" print(f\" • Average Recall: {avg_recall:.4f} → {avg_fn:.0f} false negatives on average\")\n",
" \n",
" # Find best and worst configurations for this model\n",
" best_f1 = max(results, key=lambda x: x['f1'])\n",
" worst_f1 = min(results, key=lambda x: x['f1'])\n",
" \n",
" print(f\" • Best config: {best_f1['balancing']} + {best_f1['params']}\")\n",
" print(f\" → Precision: {best_f1['precision']:.4f}, Recall: {best_f1['recall']:.4f}\")\n",
" print(f\" → Confusion: {best_f1['tp']} TP, {best_f1['fp']} FP, {best_f1['fn']} FN, {best_f1['tn']} TN\")\n",
" \n",
" if len(results) > 1:\n",
" print(f\" • Worst config: {worst_f1['balancing']} + {worst_f1['params']}\")\n",
" print(f\" → Precision: {worst_f1['precision']:.4f}, Recall: {worst_f1['recall']:.4f}\")\n",
" print(f\" → Shows {model_name} sensitivity to configuration\")\n",
" \n",
" # 2. BALANCING TECHNIQUE COMPARISON\n",
" print(f\"\\n⚖️ 2. BALANCING TECHNIQUE COMPARISON: How class balancing affects precision/recall\")\n",
" print(\"-\" * 80)\n",
" \n",
" for balancing, results in balancing_comparison.items():\n",
" if len(results) > 0:\n",
" avg_precision = np.mean([r['precision'] for r in results])\n",
" avg_recall = np.mean([r['recall'] for r in results])\n",
" avg_fp = np.mean([r['fp'] for r in results])\n",
" avg_fn = np.mean([r['fn'] for r in results])\n",
" \n",
" print(f\"\\n📊 {balancing.upper().replace('_', ' ')} (averaged across all models):\")\n",
" print(f\" • Average Precision: {avg_precision:.4f} → {avg_fp:.0f} false positives on average\")\n",
" print(f\" • Average Recall: {avg_recall:.4f} → {avg_fn:.0f} false negatives on average\")\n",
" \n",
" # Explain the balancing technique's typical behavior\n",
" if balancing == 'smote':\n",
" print(f\" 💡 SMOTE typically increases recall (catches more fraud) but may reduce precision\")\n",
" print(f\" → Synthetic samples help model learn minority class patterns\")\n",
" elif balancing == 'random_downsample':\n",
" print(f\" 💡 Downsampling often improves precision but may hurt recall\")\n",
" print(f\" → Balanced classes but less training data\")\n",
" elif balancing == 'class_weight':\n",
" print(f\" 💡 Class weighting balances precision/recall through loss function\")\n",
" print(f\" → Keeps all data but adjusts model's decision boundary\")\n",
" elif balancing == 'no_balancing':\n",
" print(f\" 💡 No balancing typically shows high precision, low recall\")\n",
" print(f\" → Model biased toward majority class (non-fraud)\")\n",
" \n",
" # 3. PARAMETER VARIATION IMPACT\n",
" print(f\"\\n🔧 3. PARAMETER VARIATION IMPACT: How hyperparameters change confusion matrix\")\n",
" print(\"-\" * 80)\n",
" \n",
" for key, results in parameter_comparison.items():\n",
" if len(results) > 1: # Only analyze if multiple parameter combinations\n",
" model_name, balancing = key.split('_', 1)\n",
" \n",
" print(f\"\\n📊 {model_name.upper()} + {balancing.upper()}:\")\n",
" \n",
" # Sort by F1 score\n",
" sorted_results = sorted(results, key=lambda x: x['f1'], reverse=True)\n",
" best = sorted_results[0]\n",
" worst = sorted_results[-1]\n",
" \n",
" print(f\" • Best parameters ({best['params']}):\")\n",
" print(f\" → Precision: {best['precision']:.4f}, Recall: {best['recall']:.4f}\")\n",
" print(f\" → Errors: {best['fp']} false positives, {best['fn']} false negatives\")\n",
" \n",
" print(f\" • Worst parameters ({worst['params']}):\")\n",
" print(f\" → Precision: {worst['precision']:.4f}, Recall: {worst['recall']:.4f}\")\n",
" print(f\" → Errors: {worst['fp']} false positives, {worst['fn']} false negatives\")\n",
" \n",
" # Calculate the impact of parameter tuning\n",
" precision_change = best['precision'] - worst['precision']\n",
" recall_change = best['recall'] - worst['recall']\n",
" fp_change = best['fp'] - worst['fp']\n",
" fn_change = best['fn'] - worst['fn']\n",
" \n",
" print(f\" 📈 Parameter tuning impact:\")\n",
" print(f\" → Precision change: {precision_change:+.4f}\")\n",
" print(f\" → Recall change: {recall_change:+.4f}\")\n",
" print(f\" → False positive change: {fp_change:+d} ({'better' if fp_change <= 0 else 'worse'})\")\n",
" print(f\" → False negative change: {fn_change:+d} ({'better' if fn_change <= 0 else 'worse'})\")\n",
" \n",
" # Business interpretation\n",
" if fp_change < 0 and fn_change < 0:\n",
" print(f\" ✅ WIN-WIN: Parameter tuning reduced both error types!\")\n",
" elif fp_change < 0:\n",
" print(f\" 🎯 PRECISION GAIN: Fewer false alarms (better customer experience)\")\n",
" elif fn_change < 0:\n",
" print(f\" 🔍 RECALL GAIN: Fewer missed frauds (better fraud detection)\")\n",
" else:\n",
" print(f\" ⚖️ TRADE-OFF: Overall F1 improved despite individual metric changes\")\n",
" \n",
" # 4. SUMMARY INSIGHTS\n",
" print(f\"\\n🎯 4. KEY INSIGHTS: Confusion Matrix Variations Across All Dimensions\")\n",
" print(\"-\" * 70)\n",
" \n",
" # Find overall best and worst performers\n",
" all_metrics = [r['metrics'] for r in all_results]\n",
" best_overall = max(all_results, key=lambda x: x['metrics']['f1_score'])\n",
" worst_overall = min(all_results, key=lambda x: x['metrics']['f1_score'])\n",
" \n",
" print(f\"\\n🏆 BEST OVERALL CONFIGURATION:\")\n",
" print(f\" • {best_overall['model_name']} + {best_overall['balancing_technique']} + {best_overall.get('param_string', 'default')}\")\n",
" print(f\" • Confusion Matrix: {best_overall['metrics']['true_positives']} TP, {best_overall['metrics']['false_positives']} FP, {best_overall['metrics']['false_negatives']} FN, {best_overall['metrics']['true_negatives']} TN\")\n",
" print(f\" • Precision: {best_overall['metrics']['precision']:.4f}, Recall: {best_overall['metrics']['recall']:.4f}\")\n",
" \n",
" print(f\"\\n📉 WORST OVERALL CONFIGURATION:\")\n",
" print(f\" • {worst_overall['model_name']} + {worst_overall['balancing_technique']} + {worst_overall.get('param_string', 'default')}\")\n",
" print(f\" • Confusion Matrix: {worst_overall['metrics']['true_positives']} TP, {worst_overall['metrics']['false_positives']} FP, {worst_overall['metrics']['false_negatives']} FN, {worst_overall['metrics']['true_negatives']} TN\")\n",
" print(f\" • Precision: {worst_overall['metrics']['precision']:.4f}, Recall: {worst_overall['metrics']['recall']:.4f}\")\n",
" \n",
" # Calculate total improvement potential\n",
" precision_improvement = best_overall['metrics']['precision'] - worst_overall['metrics']['precision']\n",
" recall_improvement = best_overall['metrics']['recall'] - worst_overall['metrics']['recall']\n",
" fp_improvement = worst_overall['metrics']['false_positives'] - best_overall['metrics']['false_positives']\n",
" fn_improvement = worst_overall['metrics']['false_negatives'] - best_overall['metrics']['false_negatives']\n",
" \n",
" print(f\"\\n📊 TOTAL IMPROVEMENT POTENTIAL (Best vs Worst):\")\n",
" print(f\" • Precision improvement: {precision_improvement:.4f}\")\n",
" print(f\" • Recall improvement: {recall_improvement:.4f}\")\n",
" print(f\" • False positives reduced by: {fp_improvement}\")\n",
" print(f\" • False negatives reduced by: {fn_improvement}\")\n",
" print(f\" • This demonstrates the critical importance of proper model selection,\")\n",
" print(f\" parameter tuning, and balancing technique choice!\")\n",
" \n",
"else:\n",
" print(\"⚠️ No experiment results found. Please run the experiments first.\")"
]
},
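{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: precision, recall, and F1 from confusion-matrix counts\n",
"\n",
"The analysis above reports TP/FP/FN/TN counts next to precision and recall. As a reminder of how they connect: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of the two. The helper `metrics_from_counts` in the next cell is a hypothetical illustration, not part of this notebook's evaluator, and the counts passed to it are made up."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def metrics_from_counts(tp, fp, fn):\n",
"    \"\"\"Return (precision, recall, f1) from confusion-matrix counts, using 0.0 where a ratio is undefined.\"\"\"\n",
"    precision = tp / (tp + fp) if (tp + fp) else 0.0\n",
"    recall = tp / (tp + fn) if (tp + fn) else 0.0\n",
"    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0\n",
"    return precision, recall, f1\n",
"\n",
"# Illustrative counts only (not results from this notebook)\n",
"demo_precision, demo_recall, demo_f1 = metrics_from_counts(tp=80, fp=20, fn=40)\n",
"print(f'Precision: {demo_precision:.4f}, Recall: {demo_recall:.4f}, F1: {demo_f1:.4f}')"
]
},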
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🏆 Best Model Selection & Final Evaluation\n",
"**Select the best performing model and conduct final evaluation**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 🏆 BEST MODEL SELECTION\n",
"# ================================\n",
"\n",
"def select_best_model(results):\n",
" \"\"\"\n",
" Select the best model based on F1 score and business considerations\n",
" \"\"\"\n",
" if not results:\n",
" print(\"❌ No results available for model selection\")\n",
" return None\n",
" \n",
" print(\"🏆 BEST MODEL SELECTION\")\n",
" print(\"=\" * 40)\n",
" \n",
" # Find best model by F1 score\n",
" best_result = max(results, key=lambda x: x['metrics']['f1_score'])\n",
" best_metrics = best_result['metrics']\n",
" \n",
" print(f\"\\n🥇 BEST PERFORMING MODEL:\")\n",
" print(f\" • Model: {best_result['model_name'].replace('_', ' ').title()}\")\n",
" print(f\" • Balancing: {best_result['balancing_technique'].replace('_', ' ').title()}\")\n",
" print(f\" • F1 Score: {best_metrics['f1_score']:.4f}\")\n",
" print(f\" • Precision: {best_metrics['precision']:.4f}\")\n",
" print(f\" • Recall: {best_metrics['recall']:.4f}\")\n",
" print(f\" • Accuracy: {best_metrics['accuracy']:.4f}\")\n",
" \n",
" # Business impact analysis\n",
" fp_cost = best_metrics['false_positives'] * 10\n",
" fn_cost = best_metrics['false_negatives'] * 100\n",
" total_cost = fp_cost + fn_cost\n",
" \n",
" print(f\"\\n💰 BUSINESS IMPACT:\")\n",
" print(f\" • False Positive Cost: ${fp_cost:,}\")\n",
" print(f\" • False Negative Cost: ${fn_cost:,}\")\n",
" print(f\" • Total Estimated Cost: ${total_cost:,}\")\n",
" \n",
" # Alternative recommendations\n",
" print(f\"\\n🎯 ALTERNATIVE CONSIDERATIONS:\")\n",
" \n",
" # Best precision model\n",
" best_precision = max(results, key=lambda x: x['metrics']['precision'])\n",
" if best_precision != best_result:\n",
" print(f\" • Best Precision: {best_precision['model_name'].title()} + {best_precision['balancing_technique'].title()} ({best_precision['metrics']['precision']:.4f})\")\n",
" print(f\" → Use if minimizing false alarms is critical\")\n",
" \n",
" # Best recall model\n",
" best_recall = max(results, key=lambda x: x['metrics']['recall'])\n",
" if best_recall != best_result:\n",
" print(f\" • Best Recall: {best_recall['model_name'].title()} + {best_recall['balancing_technique'].title()} ({best_recall['metrics']['recall']:.4f})\")\n",
" print(f\" → Use if catching all fraud is critical\")\n",
" \n",
" return best_result\n",
"\n",
"def save_best_model(best_result):\n",
" \"\"\"\n",
" Save the best model and its metadata\n",
" \"\"\"\n",
" if not best_result:\n",
" print(\"❌ No best model to save\")\n",
" return\n",
" \n",
" print(f\"\\n💾 SAVING BEST MODEL\")\n",
" print(\"=\" * 30)\n",
" \n",
" # Create model metadata\n",
" metadata = {\n",
" 'model_type': best_result['model_name'],\n",
" 'balancing_technique': best_result['balancing_technique'],\n",
" 'metrics': best_result['metrics'],\n",
" 'technique_info': best_result['technique_info'],\n",
" 'experiment_timestamp': pd.Timestamp.now().isoformat(),\n",
" 'configuration': {\n",
" 'models_tested': MODELS_TO_TEST,\n",
" 'balancing_tested': BALANCING_TECHNIQUES,\n",
" 'evaluation_config': EVALUATION_CONFIG\n",
" }\n",
" }\n",
" \n",
" # Save metadata\n",
" os.makedirs(config.MODELS_DIR, exist_ok=True)\n",
" \n",
" with open(config.MODEL_METADATA_PATH, 'w') as f:\n",
" json.dump(metadata, f, indent=4, default=str)\n",
" \n",
" print(f\"✅ Model metadata saved to {config.MODEL_METADATA_PATH}\")\n",
" \n",
" # Save experiment results\n",
" results_path = config.MODELS_DIR / 'experiment_results.json'\n",
" with open(results_path, 'w') as f:\n",
" json.dump(all_results, f, indent=4, default=str)\n",
" \n",
" print(f\"✅ Experiment results saved to {results_path}\")\n",
" \n",
" return metadata\n",
"\n",
"# Select and save best model\n",
"if 'all_results' in locals() and all_results:\n",
" best_model_result = select_best_model(all_results)\n",
" model_metadata = save_best_model(best_model_result)\n",
"else:\n",
" print(\"⚠️ No experiment results found. Please run the experiments first.\")"
]
},
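{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sketch: ranking configurations by estimated error cost\n",
"\n",
"`select_best_model` picks the highest F1 and then reports a cost estimate. A possible alternative, sketched in the next cell under the same assumed unit costs (10 per false positive, 100 per false negative), is to rank configurations by that cost directly. `estimated_cost`, the `DEMO_*` constants, and the demo records are hypothetical names and values introduced for illustration. With these made-up numbers the cheapest configuration is not the one with the highest F1, which is the situation where cost-based ranking matters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed unit costs; they mirror the placeholder 10 / 100 values in select_best_model\n",
"DEMO_FP_COST = 10\n",
"DEMO_FN_COST = 100\n",
"\n",
"def estimated_cost(metrics, fp_cost=DEMO_FP_COST, fn_cost=DEMO_FN_COST):\n",
"    \"\"\"Total cost of the errors recorded in one result's metrics dict.\"\"\"\n",
"    return metrics['false_positives'] * fp_cost + metrics['false_negatives'] * fn_cost\n",
"\n",
"# Hypothetical results (illustrative values only)\n",
"demo_cost_results = [\n",
"    {'model_name': 'random_forest', 'balancing_technique': 'class_weight',\n",
"     'metrics': {'f1_score': 0.84, 'false_positives': 30, 'false_negatives': 25}},\n",
"    {'model_name': 'logistic_regression', 'balancing_technique': 'smote',\n",
"     'metrics': {'f1_score': 0.78, 'false_positives': 120, 'false_negatives': 10}},\n",
"]\n",
"\n",
"# Compare the F1-based choice with the cost-based choice\n",
"demo_best_by_f1 = max(demo_cost_results, key=lambda r: r['metrics']['f1_score'])\n",
"demo_cheapest = min(demo_cost_results, key=lambda r: estimated_cost(r['metrics']))\n",
"\n",
"for label, result in [('Best F1', demo_best_by_f1), ('Lowest cost', demo_cheapest)]:\n",
"    cost = estimated_cost(result['metrics'])\n",
"    print(f\"{label}: {result['model_name']} + {result['balancing_technique']} (estimated cost ${cost:,})\")"
]
},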
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 📋 Executive Summary & Recommendations\n",
"**Key findings and actionable insights from the comprehensive analysis**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ================================\n",
"# 📋 EXECUTIVE SUMMARY\n",
"# ================================\n",
"\n",
"def generate_executive_summary():\n",
" \"\"\"\n",
" Generate comprehensive executive summary of all experiments\n",
" \"\"\"\n",
" print(\"📋 EXECUTIVE SUMMARY: FRAUD DETECTION MODEL EXPERIMENTS\")\n",
" print(\"=\" * 70)\n",
" \n",
" if not all_results:\n",
" print(\"❌ No results available for summary\")\n",
" return\n",
" \n",
" # Calculate summary statistics\n",
" total_experiments = len(all_results)\n",
" models_tested = len(set(r['model_name'] for r in all_results))\n",
" techniques_tested = len(set(r['balancing_technique'] for r in all_results))\n",
" \n",
" # Performance statistics\n",
" f1_scores = [r['metrics']['f1_score'] for r in all_results]\n",
" precisions = [r['metrics']['precision'] for r in all_results]\n",
" recalls = [r['metrics']['recall'] for r in all_results]\n",
" \n",
" print(f\"\\n🔬 EXPERIMENT OVERVIEW:\")\n",
" print(f\" • Total Experiments: {total_experiments}\")\n",
" print(f\" • Models Tested: {models_tested}\")\n",
" print(f\" • Balancing Techniques: {techniques_tested}\")\n",
" \n",
" print(f\"\\n📊 PERFORMANCE SUMMARY:\")\n",
" print(f\" • F1 Score Range: {min(f1_scores):.4f} - {max(f1_scores):.4f}\")\n",
" print(f\" • Average F1 Score: {np.mean(f1_scores):.4f} ± {np.std(f1_scores):.4f}\")\n",
" print(f\" • Precision Range: {min(precisions):.4f} - {max(precisions):.4f}\")\n",
" print(f\" • Recall Range: {min(recalls):.4f} - {max(recalls):.4f}\")\n",
" \n",
" # Best performers\n",
" best_f1 = max(all_results, key=lambda x: x['metrics']['f1_score'])\n",
" best_precision = max(all_results, key=lambda x: x['metrics']['precision'])\n",
" best_recall = max(all_results, key=lambda x: x['metrics']['recall'])\n",
" \n",
" print(f\"\\n🏆 TOP PERFORMERS:\")\n",
" print(f\" • Best F1: {best_f1['model_name'].title()} + {best_f1['balancing_technique'].title()} ({best_f1['metrics']['f1_score']:.4f})\")\n",
" print(f\" • Best Precision: {best_precision['model_name'].title()} + {best_precision['balancing_technique'].title()} ({best_precision['metrics']['precision']:.4f})\")\n",
" print(f\" • Best Recall: {best_recall['model_name'].title()} + {best_recall['balancing_technique'].title()} ({best_recall['metrics']['recall']:.4f})\")\n",
" \n",
" # Key insights\n",
" print(f\"\\n💡 KEY INSIGHTS:\")\n",
" \n",
" # Model insights\n",
" model_performance = {}\n",
" for result in all_results:\n",
" model = result['model_name']\n",
" if model not in model_performance:\n",
" model_performance[model] = []\n",
" model_performance[model].append(result['metrics']['f1_score'])\n",
" \n",
" best_avg_model = max(model_performance.keys(), key=lambda x: np.mean(model_performance[x]))\n",
" print(f\" • Best Average Model: {best_avg_model.title()} (avg F1: {np.mean(model_performance[best_avg_model]):.4f})\")\n",
" \n",
" # Balancing insights\n",
" balancing_performance = {}\n",
" for result in all_results:\n",
" technique = result['balancing_technique']\n",
" if technique not in balancing_performance:\n",
" balancing_performance[technique] = []\n",
" balancing_performance[technique].append(result['metrics']['f1_score'])\n",
" \n",
" best_avg_balancing = max(balancing_performance.keys(), key=lambda x: np.mean(balancing_performance[x]))\n",
" print(f\" • Best Average Balancing: {best_avg_balancing.title()} (avg F1: {np.mean(balancing_performance[best_avg_balancing]):.4f})\")\n",
" \n",
" # Business recommendations\n",
" print(f\"\\n🎯 BUSINESS RECOMMENDATIONS:\")\n",
" \n",
" if best_f1['metrics']['precision'] > 0.8 and best_f1['metrics']['recall'] > 0.8:\n",
" print(f\" ✅ RECOMMENDED: Deploy {best_f1['model_name'].title()} with {best_f1['balancing_technique'].title()}\")\n",
" print(f\" → Excellent balance of precision and recall\")\n",
" print(f\" → Low false alarms AND high fraud detection\")\n",
" elif best_f1['metrics']['precision'] > 0.9:\n",
" print(f\" 🎯 CONSERVATIVE APPROACH: High precision model recommended\")\n",
" print(f\" → Minimizes customer inconvenience from false alarms\")\n",
" print(f\" → Consider for customer-facing applications\")\n",
" elif best_f1['metrics']['recall'] > 0.9:\n",
" print(f\" 🔍 AGGRESSIVE APPROACH: High recall model recommended\")\n",
" print(f\" → Maximizes fraud detection\")\n",
" print(f\" → Consider for high-risk scenarios\")\n",
" else:\n",
" print(f\" ⚖️ BALANCED APPROACH: Consider business priorities\")\n",
" print(f\" → Evaluate cost of false positives vs false negatives\")\n",
" \n",
" print(f\"\\n🔄 NEXT STEPS:\")\n",
" print(f\" 1. Deploy best model to staging environment\")\n",
" print(f\" 2. Conduct A/B testing with current system\")\n",
" print(f\" 3. Monitor performance on live data\")\n",
" print(f\" 4. Collect feedback and retrain as needed\")\n",
" print(f\" 5. Consider ensemble methods for further improvement\")\n",
"\n",
"# Generate executive summary\n",
"if 'all_results' in locals() and all_results:\n",
" generate_executive_summary()\n",
"else:\n",
" print(\"⚠️ No experiment results found. Please run the experiments first.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 🎓 Experiment Conclusions\n",
"\n",
"This enhanced notebook provides a comprehensive framework for fraud detection model experimentation.\n",
"\n",
"### ✅ **What We Accomplished:**\n",
"1. **🔧 Flexible Configuration**: Easy parameter modification for different hypotheses\n",
"2. **🔄 Model Switching**: Systematic testing of multiple algorithms\n",
"3. **⚖️ Balancing Comparison**: SMOTE vs Downsampling vs Class Weighting analysis\n",
"4. **🎯 Detailed Analysis**: In-depth confusion matrix and precision/recall insights\n",
"5. **📊 Comprehensive Evaluation**: Systematic comparison framework\n",
"\n",
"### 🔍 **Key Learnings:**\n",
"- **Precision vs Recall Trade-offs**: Different balancing techniques affect this balance differently\n",
"- **Model Sensitivity**: Some models are more sensitive to class imbalance than others\n",
"- **Business Impact**: Cost analysis guides model selection beyond accuracy alone\n",
"- **Technique Effectiveness**: Each balancing approach has specific strengths and weaknesses\n",
"\n",
"### 🚀 **Future Enhancements:**\n",
"- Add ensemble methods (voting, stacking)\n",
"- Implement advanced sampling techniques (ADASYN, BorderlineSMOTE)\n",
"- Include feature selection experiments\n",
"- Add hyperparameter optimization with Bayesian methods\n",
"- Implement cross-validation for more robust evaluation\n",
"\n",
"### 📈 **Usage Instructions:**\n",
"1. **Modify Configuration**: Update the configuration section to test different hypotheses\n",
"2. **Run Experiments**: Execute all cells to run comprehensive experiments\n",
"3. **Analyze Results**: Review detailed analysis and confusion matrix insights\n",
"4. **Select Best Model**: Use business considerations to choose the optimal model\n",
"5. **Deploy & Monitor**: Implement the selected model with continuous monitoring\n",
"\n",
"This framework enables data scientists to systematically explore different approaches and make informed decisions based on comprehensive analysis rather than a single metric."
]
},
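{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cost-analysis point above can be sketched as follows. This is a minimal illustration, not the notebook's own cost function: the unit costs, the `total_cost` helper, and both confusion matrices are hypothetical values chosen for the example.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical unit costs; real values must come from the business\n",
"COST_FALSE_POSITIVE = 5.0    # e.g. manually reviewing a wrongly flagged transaction\n",
"COST_FALSE_NEGATIVE = 200.0  # e.g. an undetected fraudulent transaction\n",
"\n",
"def total_cost(cm, cost_fp=COST_FALSE_POSITIVE, cost_fn=COST_FALSE_NEGATIVE):\n",
"    # Layout follows scikit-learn's confusion_matrix: [[TN, FP], [FN, TP]]\n",
"    tn, fp, fn, tp = np.asarray(cm).ravel()\n",
"    return fp * cost_fp + fn * cost_fn\n",
"\n",
"# Made-up confusion matrices for two candidate models\n",
"candidates = {\n",
"    'high_precision': [[9900, 20], [30, 50]],\n",
"    'high_recall': [[9700, 220], [5, 75]],\n",
"}\n",
"\n",
"costs = {name: total_cost(cm) for name, cm in candidates.items()}\n",
"cheapest = min(costs, key=costs.get)\n",
"print(costs)\n",
"print(f'Lowest expected cost: {cheapest}')\n",
"```\n",
"\n",
"With these assumed costs the high-recall candidate is cheaper even though it raises more false alarms; a different cost ratio could reverse the ranking."
]
},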
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Class Imbalance Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fraud detection typically involves highly imbalanced datasets, where fraudulent transactions are much less common than legitimate ones. Let's analyze the class distribution and consider techniques to handle this imbalance."
]
},
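{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why accuracy alone can mislead on imbalanced data. The labels are made up for illustration (1% fraud) and are not taken from this dataset; the hypothetical baseline simply predicts the majority class for every transaction.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical labels: 990 legitimate (0) and 10 fraudulent (1) transactions\n",
"y_true = np.array([0] * 990 + [1] * 10)\n",
"\n",
"# A trivial baseline that always predicts 'not fraud'\n",
"y_pred = np.zeros_like(y_true)\n",
"\n",
"accuracy = (y_true == y_pred).mean()\n",
"true_positives = int(((y_true == 1) & (y_pred == 1)).sum())\n",
"recall = true_positives / int((y_true == 1).sum())\n",
"\n",
"print(f'Accuracy: {accuracy:.2%}')\n",
"print(f'Recall on fraud: {recall:.2%}')\n",
"```\n",
"\n",
"The baseline scores high on accuracy while catching no fraud at all, which is why precision, recall, and F1 are more informative here than accuracy."
]
},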
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check class distribution (sorted by class label so position matches label)\n",
"class_counts = y_train.value_counts().sort_index()\n",
"class_percentages = class_counts / len(y_train) * 100\n",
"\n",
"print('Class distribution in training data:')\n",
"for label, count in class_counts.items():\n",
"    print(f'Class {label}: {count} samples ({class_percentages[label]:.2f}%)')\n",
"\n",
"# Visualize class distribution\n",
"plt.figure(figsize=(10, 6))\n",
"sns.countplot(x=y_train)\n",
"plt.title('Class Distribution in Training Data')\n",
"plt.xlabel('Class (0 = Not Fraud, 1 = Fraud)')\n",
"plt.ylabel('Count')\n",
"\n",
"# Add count labels above each bar (bars are drawn in class-label order)\n",
"for i, (label, count) in enumerate(class_counts.items()):\n",
"    plt.text(i, count + 100, f'{count:,}\\n({class_percentages[label]:.2f}%)',\n",
"             ha='center', va='bottom', fontsize=12)\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling Class Imbalance with SMOTE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use Synthetic Minority Over-sampling Technique (SMOTE) to address the class imbalance by generating synthetic samples of the minority class."
]
},
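{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind those synthetic samples can be sketched in a few lines: pick a minority sample, pick one of its nearest minority neighbours, and interpolate between the two. This is a simplified illustration with made-up points and a hypothetical `synthesize` helper, not imbalanced-learn's implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"\n",
"# Hypothetical minority-class samples with two features\n",
"minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])\n",
"\n",
"def synthesize(samples, index, k, rng):\n",
"    # Interpolate between one sample and a random one of its k nearest neighbours\n",
"    x = samples[index]\n",
"    distances = np.linalg.norm(samples - x, axis=1)\n",
"    neighbours = np.argsort(distances)[1:k + 1]  # position 0 is the sample itself\n",
"    x_nn = samples[rng.choice(neighbours)]\n",
"    gap = rng.random()  # uniform in [0, 1)\n",
"    return x + gap * (x_nn - x)\n",
"\n",
"new_point = synthesize(minority, index=0, k=2, rng=rng)\n",
"print(new_point)\n",
"```\n",
"\n",
"Because each new point lies on the segment between two existing minority samples, interpolation is only meaningful on numeric features, so it should be applied after scaling and encoding and only to the training split."
]
},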
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import SMOTE and the preprocessing utilities used below\n",
"from imblearn.over_sampling import SMOTE\n",
"from sklearn.compose import ColumnTransformer\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"\n",
"# Create preprocessing pipeline for categorical and numerical features\n",
"preprocessor = ColumnTransformer(\n",
"    transformers=[\n",
"        ('num', StandardScaler(), numerical_cols),\n",
"        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)\n",
"    ])\n",
"\n",
"# Fit the preprocessor on the training data only, then transform it\n",
"print('Preprocessing training data...')\n",
"X_train_processed = preprocessor.fit_transform(X_train)\n",
"\n",
"# Apply SMOTE to the preprocessed training data (never to validation data)\n",
"print('Applying SMOTE to handle class imbalance...')\n",
"smote = SMOTE(random_state=42)\n",
"X_train_resampled, y_train_resampled = smote.fit_resample(X_train_processed, y_train)\n",
"\n",
"print(f'Original training data shape: {X_train_processed.shape}')\n",
"print(f'Resampled training data shape: {X_train_resampled.shape}')\n",
"\n",
"# Check class distribution after SMOTE (sorted by class label)\n",
"resampled_class_counts = pd.Series(y_train_resampled).value_counts().sort_index()\n",
"resampled_class_percentages = resampled_class_counts / len(y_train_resampled) * 100\n",
"\n",
"print('\\nClass distribution after SMOTE:')\n",
"for label, count in resampled_class_counts.items():\n",
"    print(f'Class {label}: {count} samples ({resampled_class_percentages[label]:.2f}%)')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model Training"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's train several machine learning models and compare their performance. We'll start with a simple model and then try more complex ones."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import models and evaluation metrics\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\n",
"from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report\n",
"\n",
"# Function to evaluate model performance\n",
"def evaluate_model(model, X_test, y_test, model_name):\n",
"    # Make predictions\n",
"    y_pred = model.predict(X_test)\n",
"    \n",
"    # Calculate metrics (precision, recall, and F1 refer to the fraud class, label 1)\n",
"    accuracy = accuracy_score(y_test, y_pred)\n",
"    precision = precision_score(y_test, y_pred)\n",
"    recall = recall_score(y_test, y_pred)\n",
"    f1 = f1_score(y_test, y_pred)\n",
"    \n",
"    # Print metrics\n",
"    print(f'\\n{model_name} Performance:')\n",
"    print(f'Accuracy: {accuracy:.4f}')\n",
"    print(f'Precision: {precision:.4f}')\n",
"    print(f'Recall: {recall:.4f}')\n",
"    print(f'F1 Score: {f1:.4f}')\n",
"    \n",
"    # Print confusion matrix (rows = true class, columns = predicted class)\n",
"    cm = confusion_matrix(y_test, y_pred)\n",
"    print('\\nConfusion Matrix:')\n",
"    print(cm)\n",
"    \n",
"    # Plot confusion matrix\n",
"    plt.figure(figsize=(8, 6))\n",
"    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)\n",
"    plt.xlabel('Predicted')\n",
"    plt.ylabel('True')\n",
"    plt.title(f'Confusion Matrix - {model_name}')\n",
"    plt.show()\n",
"    \n",
"    # Print classification report\n",
"    print('\\nClassification Report:')\n",
"    print(classification_report(y_test, y_pred))\n",
"    \n",
"    return {'accuracy': accuracy, 'precision': precision, 'recall': recall, 'f1': f1, 'confusion_matrix': cm}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1 Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train Logistic Regression model\n",
"print('Training Logistic Regression model...')\n",
"lr_model = LogisticRegression(random_state=42, max_iter=1000, class_weight='balanced')\n",
"lr_model.fit(X_train_resampled, y_train_resampled)\n",
"\n",
"# Preprocess validation data\n",
"X_val_processed = preprocessor.transform(X_val)\n",
"\n",
"# Evaluate model\n",
"lr_metrics = evaluate_model(lr_model, X_val_processed, y_val, 'Logistic Regression')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.2 Random Forest"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train Random Forest model\n",
"print('Training Random Forest model...')\n",
"rf_model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')\n",
"rf_model.fit(X_train_resampled, y_train_resampled)\n",
"\n",
"# Evaluate model\n",
"rf_metrics = evaluate_model(rf_model, X_val_processed, y_val, 'Random Forest')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.3 Gradient Boosting"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Train Gradient Boosting model\n",
"print('Training Gradient Boosting model...')\n",
"gb_model = GradientBoostingClassifier(n_estimators=100, random_state=42)\n",
"gb_model.fit(X_train_resampled, y_train_resampled)\n",
"\n",
"# Evaluate model\n",
"gb_metrics = evaluate_model(gb_model, X_val_processed, y_val, 'Gradient Boosting')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Model Comparison"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compare the performance of the different models to select the best one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a DataFrame to compare model performance\n",
"models = ['Logistic Regression', 'Random Forest', 'Gradient Boosting']\n",
"metrics = ['accuracy', 'precision', 'recall', 'f1']\n",
"\n",
"comparison_data = []\n",
"for metric in metrics:\n",
" comparison_data.append([\n",
" lr_metrics[metric],\n",
" rf_metrics[metric],\n",
" gb_metrics[metric]\n",
" ])\n",
"\n",
"comparison_df = pd.DataFrame(comparison_data, columns=models, index=metrics)\n",
"\n",
"# Display the comparison table\n",
"print('Model Performance Comparison:')\n",
"comparison_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize model comparison\n",
"plt.figure(figsize=(12, 8))\n",
"comparison_df.plot(kind='bar', figsize=(12, 8))\n",
"plt.title('Model Performance Comparison')\n",
"plt.xlabel('Metric')\n",
"plt.ylabel('Score')\n",
"plt.xticks(rotation=0)\n",
"plt.legend(title='Model')\n",
"plt.grid(axis='y')\n",
"\n",
"# Add value labels\n",
"for i, metric in enumerate(metrics):\n",
" for j, model in enumerate(models):\n",
" value = comparison_df.iloc[i, j]\n",
" plt.text(i + (j - 1) * 0.3, value + 0.01, f'{value:.4f}', ha='center', va='bottom', fontsize=9)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Feature Importance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's analyze which features are most important for the best performing model (Random Forest in this case)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get feature names after one-hot encoding\n",
"# For numerical features, the names remain the same\n",
"# For categorical features, we need to get the one-hot encoded feature names\n",
"\n",
"# Get the one-hot encoder from the preprocessor\n",
"ohe = preprocessor.named_transformers_['cat']\n",
"\n",
"# Get the one-hot encoded feature names\n",
"categorical_features = []\n",
"for i, category in enumerate(categorical_cols):\n",
" values = ohe.categories_[i]\n",
" for value in values:\n",
" categorical_features.append(f'{category}_{value}')\n",
"\n",
"# Combine with numerical feature names\n",
"feature_names = numerical_cols + categorical_features\n",
"\n",
"# Get feature importances from the Random Forest model\n",
"importances = rf_model.feature_importances_\n",
"\n",
"# Create a DataFrame for visualization\n",
"feature_importance = pd.DataFrame({\n",
" 'Feature': feature_names,\n",
" 'Importance': importances\n",
"}).sort_values('Importance', ascending=False)\n",
"\n",
"# Display the top 20 most important features\n",
"print('Top 20 Most Important Features:')\n",
"feature_importance.head(20)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize feature importance\n",
"plt.figure(figsize=(12, 10))\n",
"sns.barplot(x='Importance', y='Feature', data=feature_importance.head(20))\n",
"plt.title('Top 20 Feature Importance')\n",
"plt.xlabel('Importance')\n",
"plt.ylabel('Feature')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Save the Best Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's save the best performing model (Random Forest) for later use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a full pipeline with preprocessing and the best model\n",
"best_model = Pipeline(steps=[\n",
" ('preprocessor', preprocessor),\n",
" ('classifier', rf_model)\n",
"])\n",
"\n",
"# Save the model\n",
"import os\n",
"os.makedirs(config.MODELS_DIR, exist_ok=True)\n",
"joblib.dump(best_model, config.MODEL_PATH)\n",
"print(f'Model saved to {config.MODEL_PATH}')\n",
"\n",
"# Save model metadata\n",
"import json\n",
"metadata = {\n",
" 'model_type': 'RandomForestClassifier',\n",
" 'metrics': {\n",
" 'accuracy': float(rf_metrics['accuracy']),\n",
" 'precision': float(rf_metrics['precision']),\n",
" 'recall': float(rf_metrics['recall']),\n",
" 'f1': float(rf_metrics['f1'])\n",
" },\n",
" 'feature_importance': feature_importance.head(20).to_dict(orient='records'),\n",
" 'features': X_train.columns.tolist()\n",
"}\n",
"\n",
"with open(config.MODEL_METADATA_PATH, 'w') as f:\n",
" json.dump(metadata, f, indent=4)\n",
"\n",
"print(f'Model metadata saved to {config.MODEL_METADATA_PATH}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we trained and evaluated several machine learning models for fraud detection:\n",
"\n",
"1. **Data Preparation**: We loaded the preprocessed data and split it into training and validation sets.\n",
"\n",
"2. **Class Imbalance**: We addressed the class imbalance problem using SMOTE to generate synthetic samples of the minority class.\n",
"\n",
"3. **Model Training**: We trained three different models - Logistic Regression, Random Forest, and Gradient Boosting.\n",
"\n",
"4. **Model Evaluation**: We evaluated the models using accuracy, precision, recall, and F1 score, with a focus on the F1 score due to the class imbalance.\n",
"\n",
"5. **Model Comparison**: We compared the performance of the different models and found that Random Forest performed the best overall.\n",
"\n",
"6. **Feature Importance**: We analyzed which features were most important for the Random Forest model.\n",
"\n",
"7. **Model Saving**: We saved the best model (Random Forest) and its metadata for later use.\n",
"\n",
"The Random Forest model achieved good performance in detecting fraudulent transactions, with a balance between precision and recall as reflected in the F1 score. The most important features for fraud detection included transaction amount, distance between cardholder and merchant, and time-based features.\n",
"\n",
"Next steps could include:\n",
"- Fine-tuning the model hyperparameters using grid search or random search\n",
"- Trying more advanced models like XGBoost or neural networks\n",
"- Implementing the model in a production environment for real-time fraud detection"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}