First commit
Defined file structure and completed EDA
@@ -0,0 +1 @@
.venv/
@@ -0,0 +1,19 @@
FROM python:3.9-slim

WORKDIR /app

# Copy requirements first to leverage Docker cache
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Create necessary directories
RUN mkdir -p data/raw data/processed models

# Expose ports for API and Streamlit
EXPOSE 8000 8501

# Command to run both the API and Streamlit app
CMD ["sh", "-c", "uvicorn src.api.app:app --host 0.0.0.0 --port 8000 & streamlit run src/web/app.py --server.port 8501 --server.address 0.0.0.0"]
@@ -0,0 +1,119 @@
# Fraud Detection System

## Overview

This project analyzes transaction data, extracts insights through Exploratory Data Analysis (EDA), performs feature engineering, trains a machine learning model to classify fraudulent transactions, and deploys a simple API with a Web UI to predict fraud in real time.

## Dataset Description

The dataset contains transaction-level features covering the merchant, the transaction amount, the cardholder, and location. The key features are:

* **trans_date_trans_time** : Timestamp of the transaction.
* **cc_num** : Anonymized credit card number.
* **merchant** : Name of the merchant.
* **category** : Type of merchant.
* **amt** : Transaction amount.
* **first, last** : First and last name of the cardholder.
* **gender** : Gender of the cardholder.
* **street, city, state, zip** : Location details of the cardholder.
* **lat, long** : Latitude and longitude of the cardholder.
* **city_pop** : Population of the city.
* **job** : Job description of the cardholder.
* **dob** : Date of birth of the cardholder.
* **trans_num** : Unique transaction number.
* **unix_time** : Unix timestamp.
* **merch_lat, merch_long** : Latitude and longitude of the merchant.
* **is_fraud** : Target variable (1 for fraud, 0 for legitimate transactions).

|
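As a quick sanity check on the schema above, the following is a minimal sketch that reports which of the listed columns are missing from a CSV header. It assumes the raw data is a CSV file whose header uses exactly these column names; `missing_columns` is a hypothetical helper, not part of the project code.

```python
import csv
import io

# Column names taken from the dataset description above
EXPECTED_COLUMNS = [
    "trans_date_trans_time", "cc_num", "merchant", "category", "amt",
    "first", "last", "gender", "street", "city", "state", "zip",
    "lat", "long", "city_pop", "job", "dob", "trans_num", "unix_time",
    "merch_lat", "merch_long", "is_fraud",
]


def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    present = set(header)
    return [col for col in EXPECTED_COLUMNS if col not in present]


# Tiny in-memory header for illustration only (not real data)
sample = io.StringIO("trans_date_trans_time,cc_num,amt,is_fraud\n")
print(missing_columns(sample))
```

With a real file, the same function can be called on an open file handle, for example `with open(path, newline="") as f: missing_columns(f)`.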
## Tasks

### 1. Exploratory Data Analysis (EDA)

* Check for missing values and handle them appropriately.
* Analyze the distribution of transaction amounts.
* Identify correlations between different features.
* Visualize geographical patterns of fraudulent transactions.
* Investigate high-risk categories and merchants.

|
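Two of the EDA checks above (missing values and high-risk categories) can be sketched without any third-party library. This is only an illustration under assumptions: rows are dictionaries as produced by `csv.DictReader`, the helper names are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict


def count_missing(rows, columns):
    """Count empty or absent values per column."""
    return {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }


def fraud_rate_by_category(rows):
    """Map each merchant category to its share of fraudulent transactions."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row["category"]] += 1
        frauds[row["category"]] += int(row["is_fraud"])
    return {category: frauds[category] / totals[category] for category in totals}


# Made-up rows for illustration only (not taken from the dataset)
rows = [
    {"category": "groceries", "amt": "12.50", "is_fraud": "0"},
    {"category": "groceries", "amt": "", "is_fraud": "1"},
    {"category": "travel", "amt": "230.00", "is_fraud": "0"},
]
print(count_missing(rows, ["category", "amt"]))
print(fraud_rate_by_category(rows))
```

On the full dataset a pandas `groupby` would do the same job more conveniently; the plain-Python version is shown only to make the computation explicit.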
### 2. Feature Engineering

* Convert categorical variables into numerical representations.
* Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
* Normalize and scale numerical features.
* Extract time-based features (hour, day, weekday, month) from `trans_date_trans_time`.
* One-hot encode categorical features where necessary.

|
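The derived features listed above can be sketched in plain Python. The README does not say which distance measure to use, so the great-circle (haversine) distance below is one possible choice, not the project's confirmed method. The timestamp format `%Y-%m-%d %H:%M:%S`, the date-of-birth format `%Y-%m-%d`, and all function names are assumptions made for illustration.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def time_features(timestamp):
    """Extract hour, day, weekday (Monday=0) and month from a timestamp string."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")  # assumed format
    return {"hour": ts.hour, "day": ts.day, "weekday": ts.weekday(), "month": ts.month}


def age_in_years(dob, on_date):
    """Whole years between date of birth and a reference date."""
    born = datetime.strptime(dob, "%Y-%m-%d")  # assumed format
    ref = datetime.strptime(on_date, "%Y-%m-%d")
    had_birthday = (ref.month, ref.day) >= (born.month, born.day)
    return ref.year - born.year - (0 if had_birthday else 1)


print(time_features("2020-06-21 12:14:25"))
print(age_in_years("1990-06-22", "2020-06-21"))
print(round(haversine_km(40.0, -75.0, 41.0, -74.0), 1))
```

Computing age from calendar dates avoids the small drift that integer division of a day count by 365 introduces around leap years.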
### 3. Model Training

* Split data into training and testing sets.
* Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
* Train models using cross-validation and optimize hyperparameters.
* Evaluate models using accuracy, precision, recall, and F1-score.

|
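The four evaluation metrics named above all follow from the confusion-matrix counts. The sketch below computes them by hand to show why accuracy alone can mislead on a heavily imbalanced fraud dataset; the function name and the counts are hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: a model that labels every transaction as legitimate
# on 1,000 transactions of which 5 are fraudulent
always_legit = classification_metrics(tp=0, fp=0, fn=5, tn=995)
print(always_legit)
```

Here accuracy is high even though no fraud is caught (recall is zero), which is why precision, recall, and F1 on the fraud class are usually the more informative numbers for this task.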
### 4. API Deployment (Flask/FastAPI)

* Create an API that takes transaction details as input and predicts fraud.
* Use Flask or FastAPI to build an endpoint (`/predict`).
* Load the trained model and use it for inference.
* Deploy the API using Docker or a cloud service.

|
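Independent of whether Flask or FastAPI is chosen, the logic behind `/predict` is roughly: parse the request body, validate it, run the model, and return a JSON-serializable result. The following framework-agnostic sketch illustrates that flow. Everything in it is an assumption for illustration: the required fields, the response keys, the 0.5 threshold, and `StubModel`, which stands in for the trained model and is not the project's model.

```python
import json

REQUIRED_FIELDS = ("amt", "category", "merchant")  # hypothetical subset of inputs


class StubModel:
    """Hypothetical stand-in for the trained model loaded at startup."""

    def predict_proba(self, features):
        # Toy rule for illustration only: larger amounts look riskier
        return min(1.0, features["amt"] / 10_000)


def handle_predict(body, model, threshold=0.5):
    """Validate a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}

    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return 422, {"error": "missing fields", "fields": missing}

    try:
        amount = float(payload["amt"])
    except (TypeError, ValueError):
        return 422, {"error": "amt must be numeric"}

    probability = model.predict_proba({"amt": amount})
    return 200, {"fraud_probability": probability, "is_fraud": probability >= threshold}


print(handle_predict('{"amt": 9000, "category": "travel", "merchant": "m"}', StubModel()))
```

In a real deployment the web framework would call a function like this from its route handler, and the stub would be replaced by the model loaded from `models/`.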
### 5. Web UI for Fraud Prediction

* Develop a simple HTML/CSS/JavaScript frontend.
* Integrate the frontend with the API to take user input and display fraud predictions.
* Use a framework like Streamlit or Flask to build a minimal UI.

|
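Integrating the frontend with the API comes down to sending the user's input as JSON to `/predict` and reading the reply. The sketch below builds such a request with the standard library. The URL assumes the API listens on port 8000 as exposed in this commit's Dockerfile; the helper names and the `is_fraud` response key are hypothetical.

```python
import json
import urllib.request

# Assumed address: port 8000 is the API port exposed in the Dockerfile
API_URL = "http://localhost:8000/predict"


def build_predict_request(transaction, url=API_URL):
    """Build a JSON POST request carrying the transaction details."""
    body = json.dumps(transaction).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw_body):
    """Decode the API reply; `is_fraud` is a hypothetical response field."""
    return bool(json.loads(raw_body)["is_fraud"])


request = build_predict_request({"amt": 42.0, "category": "travel"})
print(request.get_method(), request.full_url)

# Sending is left out so the sketch runs without a live API. With the API up:
# with urllib.request.urlopen(request) as response:
#     print(parse_prediction(response.read()))
```

A Streamlit or Flask frontend would call `build_predict_request` with the form values and display the parsed result to the user.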
## Installation and Usage

### Prerequisites

Ensure you have Python 3 installed, along with the dependencies listed in `requirements.txt`.

## Project File Structure

```
│── data/                          # Folder for storing raw and processed datasets
│   ├── raw/                       # Original dataset files (all datasets are stored here)
│   ├── processed/                 # Processed/cleaned datasets
│── experiments/                   # Jupyter notebooks or scripts for EDA and model experimentation
│   ├── eda.ipynb                  # Exploratory Data Analysis notebook
│   ├── feature_engineering.ipynb  # Feature engineering experiments
│   ├── model_training.ipynb       # Model training experiments
│── models/                        # Folder for storing trained models and checkpoints
│   ├── fraud_model.pkl            # Serialized trained model
│   ├── model_metadata.json        # Metadata about the model
│── src/                           # Source code for model training, API, and frontend
│   ├── __init__.py                # Python package indicator
│   ├── config.py                  # Configuration settings
│   ├── data_preprocessing.py      # Data cleaning and feature engineering scripts
│   ├── model_training.py          # Script to train and save the model
│   ├── model_evaluation.py        # Model evaluation script
│   ├── predict.py                 # Script to make predictions
│   ├── api/                       # API folder (Flask/FastAPI)
│   │   ├── __init__.py
│   │   ├── app.py                 # FastAPI/Flask API for fraud detection
│   │   ├── inference.py           # Load model and predict
│   ├── web/                       # Frontend code for simple Web UI
│   │   ├── static/                # CSS, JS, images
│   │   ├── templates/             # HTML templates
│   │   ├── app.py                 # Streamlit or Flask-based frontend
│── README.md                      # Project documentation
│── requirements.txt               # List of required Python libraries
│── .gitignore                     # Files and folders to ignore in version control
│── Dockerfile                     # Docker setup for deployment (if needed)
│── deployment/                    # Scripts for deploying on cloud platforms
│   ├── docker-compose.yml         # Docker Compose setup
│   ├── cloud_run.sh               # Deployment script

```

### Explanation

* **`data/`** : Stores raw and processed datasets.
* **`experiments/`** : Jupyter notebooks for EDA, feature engineering, and model training experiments.
* **`models/`** : Stores trained models and related metadata.
* **`src/`** : Core source code, including data processing, model training, evaluation, API, and frontend.
* **`api/`** : Contains API-related scripts (Flask or FastAPI).
* **`web/`** : Contains the frontend code for user interaction.
* **`README.md`** : Documentation for setting up and running the project.
* **`requirements.txt`** : Dependencies for the project.
* **`Dockerfile` & `deployment/`** : For containerization and cloud deployment.
@@ -0,0 +1,14 @@
version: '3'

services:
  fraud-detection:
    build: .
    ports:
      - "8000:8000"  # API
      - "8501:8501"  # Streamlit
    volumes:
      - ./data:/app/data
      - ./models:/app/models
    environment:
      - PYTHONUNBUFFERED=1
    restart: unless-stopped
@@ -0,0 +1,159 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2c5baf8e",
   "metadata": {},
   "source": [
    "# 📊 Exploratory Data Analysis: Fraud Detection Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f3e6a97",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "df = pd.read_csv(\"fraudTest.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bcadae6",
   "metadata": {},
   "source": [
    "## 🧾 Basic Overview of the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "820cb0e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Shape:\", df.shape)\n",
    "print(\"\\nData Types:\\n\", df.dtypes)\n",
    "print(\"\\nMissing Values:\\n\", df.isnull().sum())\n",
    "print(\"\\nDuplicate Rows:\", df.duplicated().sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "caa22db9",
   "metadata": {},
   "source": [
    "## ⚖️ Class Balance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7fb75259",
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.countplot(data=df, x=\"is_fraud\")\n",
    "plt.title(\"Fraud vs Non-Fraud Transactions\")\n",
    "plt.show()\n",
    "\n",
    "fraud_ratio = df[\"is_fraud\"].mean()\n",
    "print(f\"Fraudulent transactions: {fraud_ratio:.4%}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "658e9cd2",
   "metadata": {},
   "source": [
    "## 📊 Statistical Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "202e2612",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.describe(include='all')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12d24a95",
   "metadata": {},
   "source": [
    "## 🔗 Correlation Matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c02acf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 8))\n",
    "sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=\".2f\", cmap=\"coolwarm\")\n",
    "plt.title(\"Feature Correlation Matrix\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fce8183a",
   "metadata": {},
   "source": [
    "## 💵 Transaction Amount Distribution by Fraud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea72b131",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 6))\n",
    "sns.boxplot(data=df, x='is_fraud', y='amt')\n",
    "plt.yscale('log')\n",
    "plt.title(\"Transaction Amount by Fraud Status\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7d7d378",
   "metadata": {},
   "source": [
    "## 🕒 Transaction Timing (Hourly)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f26f36f",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])\n",
    "df['hour'] = df['trans_date_trans_time'].dt.hour\n",
    "\n",
    "plt.figure(figsize=(12, 6))\n",
    "sns.histplot(data=df, x='hour', hue='is_fraud', multiple='stack', bins=24)\n",
    "plt.title(\"Transaction Hour Distribution\")\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,156 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# feature_engineering_experiments.ipynb\n",
    "\n",
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
    "from sklearn.model_selection import train_test_split\n",
    "from datetime import datetime\n",
    "\n",
    "# Load data\n",
    "df = pd.read_csv('../data/raw/fraudTrain.csv')\n",
    "\n",
    "# Basic preprocessing\n",
    "df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])\n",
    "df['dob'] = pd.to_datetime(df['dob'])\n",
    "\n",
    "# Experiment 1: Basic Features\n",
    "def create_basic_features(df):\n",
    "    # Time-based features\n",
    "    df['hour'] = df['trans_date_trans_time'].dt.hour\n",
    "    df['day_of_week'] = df['trans_date_trans_time'].dt.dayofweek\n",
    "    df['month'] = df['trans_date_trans_time'].dt.month\n",
    "    \n",
    "    # Age feature (approximate whole years at a fixed reference date)\n",
    "    df['dob'] = pd.to_datetime(df['dob'])\n",
    "    reference_date = pd.to_datetime('2020-06-21')\n",
    "    df['age'] = (reference_date - df['dob']).dt.days // 365\n",
    "    \n",
    "    # Distance between merchant and customer\n",
    "    # (Euclidean distance in degrees of lat/long, not kilometres)\n",
    "    df['distance'] = np.sqrt((df['merch_lat'] - df['lat'])**2 + (df['merch_long'] - df['long'])**2)\n",
    "    \n",
    "    # Categorical encoding\n",
    "    cat_cols = ['category', 'gender', 'state']\n",
    "    for col in cat_cols:\n",
    "        le = LabelEncoder()\n",
    "        df[col+'_encoded'] = le.fit_transform(df[col])\n",
    "    \n",
    "    return df\n",
    "\n",
    "# Experiment 2: Transaction Patterns\n",
    "def create_transaction_patterns(df):\n",
    "    # Transaction frequency per customer\n",
    "    trans_count = df.groupby('cc_num')['trans_num'].count().reset_index()\n",
    "    trans_count.columns = ['cc_num', 'trans_count']\n",
    "    df = df.merge(trans_count, on='cc_num', how='left')\n",
    "    \n",
    "    # Average transaction amount per customer\n",
    "    avg_amount = df.groupby('cc_num')['amt'].mean().reset_index()\n",
    "    avg_amount.columns = ['cc_num', 'avg_trans_amount']\n",
    "    df = df.merge(avg_amount, on='cc_num', how='left')\n",
    "    \n",
    "    # Difference from average amount\n",
    "    df['amt_diff_from_avg'] = df['amt'] - df['avg_trans_amount']\n",
    "    \n",
    "    return df\n",
    "\n",
    "# Experiment 3: Time-based Features\n",
    "def create_time_features(df):\n",
    "    # Time since last transaction (minutes)\n",
    "    df = df.sort_values(['cc_num', 'trans_date_trans_time'])\n",
    "    df['time_since_last'] = df.groupby('cc_num')['trans_date_trans_time'].diff().dt.total_seconds() / 60\n",
    "    \n",
    "    # Fill NA for first transactions\n",
    "    df['time_since_last'] = df['time_since_last'].fillna(24*60)  # Assume 24 hours if first transaction\n",
    "    \n",
    "    # Transaction velocity (transactions per hour)\n",
    "    # This is inf when time_since_last is 0; those rows are handled after the split\n",
    "    df['trans_velocity'] = 60 / df['time_since_last']\n",
    "    \n",
    "    return df\n",
    "\n",
    "# Experiment 4: Merchant Behavior\n",
    "def create_merchant_features(df):\n",
    "    # Merchant transaction count\n",
    "    merchant_counts = df['merchant'].value_counts().reset_index()\n",
    "    merchant_counts.columns = ['merchant', 'merchant_trans_count']\n",
    "    df = df.merge(merchant_counts, on='merchant', how='left')\n",
    "    \n",
    "    # Merchant fraud rate\n",
    "    # NOTE: computed on the full dataset before the split, so it leaks the target\n",
    "    # into the test rows; compute it on the training split only for a fair estimate\n",
    "    merchant_fraud = df.groupby('merchant')['is_fraud'].mean().reset_index()\n",
    "    merchant_fraud.columns = ['merchant', 'merchant_fraud_rate']\n",
    "    df = df.merge(merchant_fraud, on='merchant', how='left')\n",
    "    \n",
    "    return df\n",
    "\n",
    "# Apply all feature engineering steps\n",
    "df_features = create_basic_features(df)\n",
    "df_features = create_transaction_patterns(df_features)\n",
    "df_features = create_time_features(df_features)\n",
    "df_features = create_merchant_features(df_features)\n",
    "\n",
    "# Select final features\n",
    "features = ['amt', 'hour', 'day_of_week', 'month', 'age', 'distance',\n",
    "            'category_encoded', 'gender_encoded', 'state_encoded',\n",
    "            'trans_count', 'avg_trans_amount', 'amt_diff_from_avg',\n",
    "            'time_since_last', 'trans_velocity', 'merchant_trans_count',\n",
    "            'merchant_fraud_rate', 'city_pop']\n",
    "\n",
    "X = df_features[features]\n",
    "y = df_features['is_fraud']\n",
    "\n",
    "# Split data\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)\n",
    "\n",
    "# Replace infinities with NaN and drop those rows, then keep the matching\n",
    "# labels so X and y keep the same number of samples\n",
    "X_train = X_train.replace([np.inf, -np.inf], np.nan).dropna()\n",
    "X_test = X_test.replace([np.inf, -np.inf], np.nan).dropna()\n",
    "y_train = y_train.loc[X_train.index]\n",
    "y_test = y_test.loc[X_test.index]\n",
    "\n",
    "# Scale numerical features\n",
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "# Save processed data for modeling\n",
    "pd.DataFrame(X_train_scaled, columns=features).to_csv('X_train.csv', index=False)\n",
    "pd.DataFrame(X_test_scaled, columns=features).to_csv('X_test.csv', index=False)\n",
    "y_train.to_csv('y_train.csv', index=False)\n",
    "y_test.to_csv('y_test.csv', index=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -0,0 +1,215 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Class distribution in training set:\n",
      "is_fraud\n",
      "0    902418\n",
      "1      5254\n",
      "Name: count, dtype: int64\n",
      "\n",
      "Class distribution in test set:\n",
      "is_fraud\n",
      "0    386751\n",
      "1      2252\n",
      "Name: count, dtype: int64\n",
      "📊 Evaluating Baseline Models:\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\utils\\validation.py:1408: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "Found input variables with inconsistent numbers of samples: [907658, 907672]",
     "output_type": "error",
     "traceback": [
      "---------------------------------------------------------------------------",
      "ValueError                                Traceback (most recent call last)",
      "Cell In[5], line 80\n     78 print(\"📊 Evaluating Baseline Models:\")\n     79 for model in models:\n---> 80     evaluate_model(model, X_train, X_test, y_train, y_test)\n     82 # ⚖️ SMOTE Experiment\n     83 print(\"\\n📈 Experiment with SMOTE for class imbalance:\")\n",
      "Cell In[5], line 39, in evaluate_model(model, X_train, X_test, y_train, y_test)\n     38 def evaluate_model(model, X_train, X_test, y_train, y_test):\n---> 39     model.fit(X_train, y_train)\n     40     y_pred = model.predict(X_test)\n     41     y_prob = model.predict_proba(X_test)[:, 1]\n",
      "File c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\base.py:1389, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)\n   1382     estimator._validate_params()\n   1384 with config_context(\n   1385     skip_parameter_validation=(\n   1386         prefer_skip_nested_validation or global_skip_validation\n   1387     )\n   1388 ):\n-> 1389     return fit_method(estimator, *args, **kwargs)\n",
      "File c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\linear_model\\_logistic.py:1222, in LogisticRegression.fit(self, X, y, sample_weight)\n   1219 else:\n   1220     _dtype = [np.float64, np.float32]\n-> 1222 X, y = validate_data(\n   1223     self,\n   1224     X,\n   1225     y,\n   1226     accept_sparse=\"csr\",\n   1227     dtype=_dtype,\n   1228     order=\"C\",\n   1229     accept_large_sparse=solver not in [\"liblinear\", \"sag\", \"saga\"],\n   1230 )\n   1231 check_classification_targets(y)\n   1232 self.classes_ = np.unique(y)\n",
      "File c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\utils\\validation.py:2961, in validate_data(_estimator, X, y, reset, validate_separately, skip_check_array, **check_params)\n   2959         y = check_array(y, input_name=\"y\", **check_y_params)\n   2960     else:\n-> 2961         X, y = check_X_y(X, y, **check_params)\n   2962     out = X, y\n   2964 if not no_val_X and check_params.get(\"ensure_2d\", True):\n",
      "File c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\utils\\validation.py:1389, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)\n   1370 X = check_array(\n   1371     X,\n   1372     accept_sparse=accept_sparse,\n   (...)   1384     input_name=\"X\",\n   1385 )\n   1387 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)\n-> 1389 check_consistent_length(X, y)\n   1391 return X, y\n",
      "File c:\\Users\\babaw\\Documents\\Work\\Mana Knight Digital\\task_fraud_detection\\.venv\\Lib\\site-packages\\sklearn\\utils\\validation.py:475, in check_consistent_length(*arrays)\n    473 uniques = np.unique(lengths)\n    474 if len(uniques) > 1:\n--> 475     raise ValueError(\n    476         \"Found input variables with inconsistent numbers of samples: %r\"\n    477         % [int(l) for l in lengths]\n    478     )\n",
      "ValueError: Found input variables with inconsistent numbers of samples: [907658, 907672]"
     ]
    }
   ],
|
||||||
|
"source": [
|
||||||
|
"# model_training_experiment.ipynb\n",
"\n",
"# 📦 Import libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.metrics import (\n",
"    accuracy_score, precision_score, recall_score,\n",
"    f1_score, roc_auc_score, confusion_matrix,\n",
"    classification_report, roc_curve\n",
")\n",
"\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier\n",
"from xgboost import XGBClassifier\n",
"\n",
"from imblearn.over_sampling import SMOTE\n",
"from imblearn.pipeline import Pipeline as ImbPipeline\n",
"import joblib\n",
"\n",
"# 📂 Load processed data\n",
"X_train = pd.read_csv('X_train.csv')\n",
"X_test = pd.read_csv('X_test.csv')\n",
"y_train = pd.read_csv('y_train.csv')\n",
"y_test = pd.read_csv('y_test.csv')\n",
"\n",
"# 🧪 Check class distribution\n",
"print(\"Class distribution in training set:\")\n",
"print(y_train.value_counts())\n",
"print(\"\\nClass distribution in test set:\")\n",
"print(y_test.value_counts())\n",
"\n",
"# ⚙️ Evaluation Function\n",
"def evaluate_model(model, X_train, X_test, y_train, y_test):\n",
"    model.fit(X_train, y_train)\n",
"    y_pred = model.predict(X_test)\n",
"    y_prob = model.predict_proba(X_test)[:, 1]\n",
"\n",
"    print(f\"\\n🔍 Model: {model.__class__.__name__}\")\n",
"    print(\"Accuracy:\", accuracy_score(y_test, y_pred))\n",
"    print(\"Precision:\", precision_score(y_test, y_pred))\n",
"    print(\"Recall:\", recall_score(y_test, y_pred))\n",
"    print(\"F1 Score:\", f1_score(y_test, y_pred))\n",
"    print(\"ROC AUC:\", roc_auc_score(y_test, y_prob))\n",
"\n",
"    # Confusion Matrix\n",
"    cm = confusion_matrix(y_test, y_pred)\n",
"    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')\n",
"    plt.title('Confusion Matrix')\n",
"    plt.xlabel('Predicted')\n",
"    plt.ylabel('Actual')\n",
"    plt.show()\n",
"\n",
"    # ROC Curve\n",
"    fpr, tpr, _ = roc_curve(y_test, y_prob)\n",
"    plt.plot(fpr, tpr, label=\"ROC Curve\")\n",
"    plt.plot([0, 1], [0, 1], 'k--')\n",
"    plt.xlabel('False Positive Rate')\n",
"    plt.ylabel('True Positive Rate')\n",
"    plt.title('ROC Curve')\n",
"    plt.legend()\n",
"    plt.show()\n",
"\n",
"    return model\n",
"\n",
"# ⚗️ Baseline Models\n",
"models = [\n",
"    LogisticRegression(max_iter=1000, random_state=42),\n",
"    RandomForestClassifier(random_state=42),\n",
"    GradientBoostingClassifier(random_state=42),\n",
"    XGBClassifier(random_state=42, eval_metric='logloss')\n",
"]\n",
"\n",
"print(\"📊 Evaluating Baseline Models:\")\n",
"for model in models:\n",
"    evaluate_model(model, X_train, X_test, y_train, y_test)\n",
"\n",
"# ⚖️ SMOTE Experiment\n",
"print(\"\\n📈 Experiment with SMOTE for class imbalance:\")\n",
"smote_pipeline = ImbPipeline([\n",
"    ('smote', SMOTE(random_state=42)),\n",
"    ('model', LogisticRegression(max_iter=1000, random_state=42))\n",
"])\n",
"evaluate_model(smote_pipeline, X_train, X_test, y_train, y_test)\n",
"\n",
"# 🔍 Hyperparameter Tuning (XGBoost)\n",
"print(\"\\n🔧 Hyperparameter tuning for XGBoost:\")\n",
"param_grid = {\n",
"    'model__n_estimators': [100, 200],\n",
"    'model__max_depth': [3, 5, 7],\n",
"    'model__learning_rate': [0.01, 0.1],\n",
"    'model__subsample': [0.8, 1.0],\n",
"    'model__colsample_bytree': [0.8, 1.0]\n",
"}\n",
"\n",
"grid_pipeline = ImbPipeline([\n",
"    ('smote', SMOTE(random_state=42)),\n",
"    ('model', XGBClassifier(random_state=42, eval_metric='logloss'))\n",
"])\n",
"\n",
"cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)\n",
"grid_search = GridSearchCV(grid_pipeline, param_grid, cv=cv, scoring='roc_auc', n_jobs=-1, verbose=1)\n",
"grid_search.fit(X_train, y_train)\n",
"\n",
"print(\"Best parameters:\", grid_search.best_params_)\n",
"print(\"Best ROC AUC from CV:\", grid_search.best_score_)\n",
"\n",
"# 🏆 Evaluate Best Model\n",
"best_model = grid_search.best_estimator_\n",
"evaluate_model(best_model, X_train, X_test, y_train, y_test)\n",
"\n",
"# 🌟 Feature Importance\n",
"model_step = best_model.named_steps['model']\n",
"if hasattr(model_step, 'feature_importances_'):\n",
"    importances = model_step.feature_importances_\n",
"    features = X_train.columns\n",
"    feature_importance = pd.DataFrame({'Feature': features, 'Importance': importances})\n",
"    feature_importance = feature_importance.sort_values('Importance', ascending=False)\n",
"\n",
"    plt.figure(figsize=(12, 8))\n",
"    sns.barplot(x='Importance', y='Feature', data=feature_importance)\n",
"    plt.title('Feature Importance')\n",
"    plt.show()\n",
"\n",
"# 💾 Save Best Model\n",
"joblib.dump(best_model, 'best_fraud_detection_model.pkl')\n",
"print(\"✅ Best model saved as 'best_fraud_detection_model.pkl'\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
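The grid search in the notebook above can be sized before it is run. The following is a minimal sketch that only counts the work implied by `param_grid` and `StratifiedKFold(n_splits=3)`; the helper name `count_fits` is hypothetical and not part of the notebook.

```python
from itertools import product

# Same grid as the notebook's param_grid (values copied from the cell above).
param_grid = {
    "model__n_estimators": [100, 200],
    "model__max_depth": [3, 5, 7],
    "model__learning_rate": [0.01, 0.1],
    "model__subsample": [0.8, 1.0],
    "model__colsample_bytree": [0.8, 1.0],
}
N_SPLITS = 3  # matches StratifiedKFold(n_splits=3)


def count_fits(grid, n_splits):
    """Return (number of candidate settings, number of cross-validation fits)."""
    candidates = list(product(*grid.values()))
    return len(candidates), len(candidates) * n_splits


n_candidates, n_fits = count_fits(param_grid, N_SPLITS)
print(f"{n_candidates} candidates x {N_SPLITS} folds = {n_fits} fits")
```

Because SMOTE sits inside the pipeline, each of those fits also resamples its training folds, and `GridSearchCV` adds one final refit on the full training set since `refit` defaults to `True`.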
@@ -0,0 +1,15 @@
numpy
pandas
scikit-learn
imbalanced-learn  # provides imblearn (SMOTE) used in the training notebook
matplotlib
seaborn
fastapi
uvicorn
python-multipart
pydantic
joblib
xgboost
streamlit
requests  # imported directly by the Streamlit app
python-dotenv
@@ -0,0 +1,99 @@
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import pandas as pd
import numpy as np
import joblib
from pathlib import Path
from typing import Optional

from config import MODELS_DIR
from data_preprocessing import prepare_data

app = FastAPI(title="Fraud Detection API",
              description="API for detecting fraudulent transactions",
              version="1.0.0")

class Transaction(BaseModel):
    trans_date_trans_time: str
    cc_num: str
    merchant: str
    category: str
    amt: float
    first: str
    last: str
    gender: str
    street: str
    city: str
    state: str
    zip: str
    lat: float
    long: float
    city_pop: int
    job: str
    dob: str
    trans_num: str
    unix_time: int
    merch_lat: float
    merch_long: float

class PredictionResponse(BaseModel):
    is_fraud: bool
    fraud_probability: float
    confidence: str

def load_model():
    """Load the trained model and preprocessor."""
    try:
        model = joblib.load(MODELS_DIR / "fraud_model.joblib")
        preprocessor = joblib.load(MODELS_DIR / "preprocessor.joblib")
        return model, preprocessor
    except FileNotFoundError:
        raise HTTPException(status_code=500, detail="Model not found. Please train the model first.")

def get_confidence_level(probability: float) -> str:
    """Convert probability to confidence level."""
    if probability >= 0.9:
        return "Very High"
    elif probability >= 0.7:
        return "High"
    elif probability >= 0.5:
        return "Medium"
    else:
        return "Low"

@app.get("/")
async def root():
    return {"message": "Welcome to the Fraud Detection API"}

@app.post("/predict", response_model=PredictionResponse)
async def predict(transaction: Transaction):
    """Predict whether a transaction is fraudulent."""
    try:
        # Load model and preprocessor
        model, preprocessor = load_model()

        # Convert transaction to DataFrame
        transaction_dict = transaction.dict()
        df = pd.DataFrame([transaction_dict])

        # Prepare data for prediction
        X, _, _ = prepare_data(df, preprocessor=preprocessor)

        # Make prediction
        probability = model.predict_proba(X)[0, 1]
        is_fraud = probability >= 0.5

        return PredictionResponse(
            is_fraud=bool(is_fraud),
            fraud_probability=float(probability),
            confidence=get_confidence_level(probability)
        )
    except HTTPException:
        # Pass through errors raised by load_model() without re-wrapping them
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
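One point about `get_confidence_level` above: it grades the fraud probability itself, so a transaction the model scores at 2% fraud is reported with "Low" confidence even though the model is quite sure it is legitimate. The sketch below reproduces the thresholds from the API and adds a hypothetical variant, `symmetric_confidence`, that grades the predicted class instead. The variant is an assumption for illustration, not something the API defines.

```python
def get_confidence_level(probability: float) -> str:
    """Copy of the API's thresholds: labels the fraud probability."""
    if probability >= 0.9:
        return "Very High"
    elif probability >= 0.7:
        return "High"
    elif probability >= 0.5:
        return "Medium"
    else:
        return "Low"


def symmetric_confidence(probability: float) -> str:
    """Hypothetical variant: grade whichever class the model predicts."""
    # The predicted class has probability max(p, 1 - p), which is always >= 0.5.
    return get_confidence_level(max(probability, 1.0 - probability))


for p in (0.02, 0.55, 0.95):
    print(p, get_confidence_level(p), symmetric_confidence(p))
```

Which behavior is preferable depends on how the UI words the field: "confidence that this is fraud" matches the original function, while "confidence in the verdict" matches the variant.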
@@ -0,0 +1,26 @@
import os
from pathlib import Path

# Project paths
ROOT_DIR = Path(__file__).parent.parent
DATA_DIR = ROOT_DIR / "data"
RAW_DATA_DIR = DATA_DIR / "raw"
PROCESSED_DATA_DIR = DATA_DIR / "processed"
MODELS_DIR = ROOT_DIR / "models"

# Data files
TRAIN_DATA_PATH = RAW_DATA_DIR / "fraudTrain.csv"
TEST_DATA_PATH = RAW_DATA_DIR / "fraudTest.csv"

# Model parameters
RANDOM_STATE = 42
TEST_SIZE = 0.2

# Feature engineering parameters
CATEGORICAL_FEATURES = ['merchant', 'category', 'gender', 'job', 'state']
NUMERICAL_FEATURES = ['amt', 'lat', 'long', 'city_pop', 'merch_lat', 'merch_long']
TIME_FEATURES = ['trans_date_trans_time']

# API settings
API_HOST = "0.0.0.0"
API_PORT = 8000
@@ -0,0 +1,112 @@
import pandas as pd
import numpy as np
from datetime import datetime
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import joblib
from pathlib import Path

from config import (
    CATEGORICAL_FEATURES,
    NUMERICAL_FEATURES,
    TIME_FEATURES,
    PROCESSED_DATA_DIR,
    MODELS_DIR
)

def calculate_distance(lat1, lon1, lat2, lon2):
    """Calculate the Haversine distance between two points."""
    R = 6371  # Earth's radius in kilometers
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    return R * c

def extract_time_features(df):
    """Extract time-based features from transaction timestamp."""
    df['trans_date_trans_time'] = pd.to_datetime(df['trans_date_trans_time'])
    df['hour'] = df['trans_date_trans_time'].dt.hour
    df['day'] = df['trans_date_trans_time'].dt.day
    df['weekday'] = df['trans_date_trans_time'].dt.weekday
    df['month'] = df['trans_date_trans_time'].dt.month
    return df

def calculate_age(dob):
    """Calculate age from date of birth."""
    today = datetime.now()
    return today.year - pd.to_datetime(dob).dt.year

def preprocess_data(df):
    """Preprocess the input dataframe."""
    # Create a copy to avoid modifying the original
    df = df.copy()

    # Extract time features
    df = extract_time_features(df)

    # Calculate age
    df['age'] = calculate_age(df['dob'])

    # Calculate distance between user and merchant
    df['distance'] = calculate_distance(
        df['lat'], df['long'],
        df['merch_lat'], df['merch_long']
    )

    # Drop unnecessary columns
    columns_to_drop = ['trans_date_trans_time', 'first', 'last', 'street', 'city',
                       'zip', 'trans_num', 'unix_time', 'dob', 'cc_num']
    df = df.drop(columns=columns_to_drop, errors='ignore')

    return df

def create_preprocessing_pipeline():
    """Create and return a preprocessing pipeline."""
    numeric_transformer = Pipeline(steps=[
        ('scaler', StandardScaler())
    ])

    categorical_transformer = Pipeline(steps=[
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, NUMERICAL_FEATURES + ['age', 'distance', 'hour', 'day', 'weekday', 'month']),
            ('cat', categorical_transformer, CATEGORICAL_FEATURES)
        ])

    return preprocessor

def save_preprocessor(preprocessor, filename='preprocessor.joblib'):
    """Save the preprocessor to disk."""
    MODELS_DIR.mkdir(parents=True, exist_ok=True)
    joblib.dump(preprocessor, MODELS_DIR / filename)

def load_preprocessor(filename='preprocessor.joblib'):
    """Load the preprocessor from disk."""
    return joblib.load(MODELS_DIR / filename)

def prepare_data(df, preprocessor=None, fit=False):
    """Prepare data for model training or prediction."""
    # Preprocess the data
    df_processed = preprocess_data(df)

    # Separate features and target
    X = df_processed.drop(columns=['is_fraud'], errors='ignore')
    y = df_processed['is_fraud'] if 'is_fraud' in df_processed.columns else None

    # Transform features
    if preprocessor is None:
        preprocessor = create_preprocessing_pipeline()

    if fit:
        X_transformed = preprocessor.fit_transform(X)
        save_preprocessor(preprocessor)
    else:
        X_transformed = preprocessor.transform(X)

    return X_transformed, y, preprocessor
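The Haversine formula in `calculate_distance` above can be sanity-checked without NumPy. This is a minimal scalar sketch using only the standard library; the function name `haversine_km` is hypothetical, and the sample coordinates are the cardholder and merchant positions from the project's sample transaction.

```python
import math

EARTH_RADIUS_KM = 6371  # same radius as calculate_distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))


# Known case: a quarter of the equator is R * pi / 2.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 2))

# Cardholder vs. merchant position from the sample transaction.
print(haversine_km(36.0788, -81.1781, 36.011293, -82.048315))
```

The NumPy version in the module applies the same arithmetic element-wise, which is why it can take whole DataFrame columns as arguments.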
@@ -0,0 +1,103 @@
import json
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import xgboost as xgb
import joblib
from pathlib import Path

from config import (
    TRAIN_DATA_PATH,
    TEST_DATA_PATH,
    MODELS_DIR,
    RANDOM_STATE,
    TEST_SIZE
)
from data_preprocessing import prepare_data

def load_data():
    """Load and prepare the training and test data."""
    # Load data
    train_df = pd.read_csv(TRAIN_DATA_PATH)
    test_df = pd.read_csv(TEST_DATA_PATH)

    # Prepare training data
    X_train, y_train, preprocessor = prepare_data(train_df, fit=True)

    # Prepare test data
    X_test, y_test, _ = prepare_data(test_df, preprocessor=preprocessor)

    return X_train, y_train, X_test, y_test

def train_model(X_train, y_train):
    """Train the XGBoost model."""
    # Define model parameters
    params = {
        'objective': 'binary:logistic',
        'eval_metric': 'auc',
        'max_depth': 6,
        'learning_rate': 0.1,
        'n_estimators': 100,
        'subsample': 0.8,
        'colsample_bytree': 0.8,
        'random_state': RANDOM_STATE
    }

    # Create and train the model
    model = xgb.XGBClassifier(**params)
    model.fit(X_train, y_train)

    return model

def evaluate_model(model, X_test, y_test):
    """Evaluate the model performance."""
    # Make predictions
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    # Calculate metrics
    print("Classification Report:")
    print(classification_report(y_test, y_pred))

    print("\nConfusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    print("\nROC AUC Score:", roc_auc_score(y_test, y_pred_proba))

    return {
        'classification_report': classification_report(y_test, y_pred, output_dict=True),
        'confusion_matrix': confusion_matrix(y_test, y_pred).tolist(),
        'roc_auc_score': roc_auc_score(y_test, y_pred_proba)
    }

def save_model(model, metrics, filename='fraud_model.joblib'):
    """Save the trained model and its metrics."""
    MODELS_DIR.mkdir(parents=True, exist_ok=True)

    # Save the model
    joblib.dump(model, MODELS_DIR / filename)

    # Save metrics
    metrics_file = MODELS_DIR / 'model_metrics.json'
    with open(metrics_file, 'w') as f:
        json.dump(metrics, f)

def main():
    """Main function to train and evaluate the model."""
    print("Loading data...")
    X_train, y_train, X_test, y_test = load_data()

    print("Training model...")
    model = train_model(X_train, y_train)

    print("Evaluating model...")
    metrics = evaluate_model(model, X_test, y_test)

    print("Saving model and metrics...")
    save_model(model, metrics)

    print("Training completed successfully!")

if __name__ == "__main__":
    main()
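The training script above fits XGBoost without any class-imbalance handling, whereas the notebook experiments with SMOTE. One lighter-weight option, shown here only as a sketch and not as what this commit does, is XGBoost's `scale_pos_weight` parameter, for which the XGBoost documentation suggests the ratio of negative to positive examples as a typical starting value. The helper name `suggested_scale_pos_weight` and the toy label list are assumptions for illustration.

```python
def suggested_scale_pos_weight(labels):
    """Return count(negative) / count(positive) for binary 0/1 labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive (fraud) examples in labels")
    return negatives / positives


# Toy labels: 5 fraud cases among 1000 transactions (illustrative, not project data).
toy_labels = [0] * 995 + [1] * 5
weight = suggested_scale_pos_weight(toy_labels)
print(weight)
```

Under that assumption, the value could be added to the `params` dict in `train_model` as `'scale_pos_weight': weight`, computed from `y_train`. Whether it helps compared with SMOTE would need to be checked on the held-out test set.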
+130
@@ -0,0 +1,130 @@
import streamlit as st
import pandas as pd
import requests
import json
from datetime import datetime
import random

# API endpoint
API_URL = "http://localhost:8000/predict"

# Sample data for testing
SAMPLE_TRANSACTION = {
    "trans_date_trans_time": "2020-06-21 12:14:25",
    "cc_num": "1234567890123456",
    "merchant": "fraud_Rippin, Kub and Mann",
    "category": "misc_net",
    "amt": 4.97,
    "first": "Jennifer",
    "last": "Banks",
    "gender": "F",
    "street": "561 Perry Cove",
    "city": "Moravian Falls",
    "state": "NC",
    "zip": "28654",
    "lat": 36.0788,
    "long": -81.1781,
    "city_pop": 3495,
    "job": "Psychologist, counselling",
    "dob": "1988-03-09",
    "trans_num": "0b242abb623afc578575680df30655b9",
    "unix_time": 1371816885,
    "merch_lat": 36.011293,
    "merch_long": -82.048315
}

def main():
    st.title("Fraud Detection System")
    st.write("Enter transaction details to check for potential fraud.")

    # Create form for transaction details
    with st.form("transaction_form"):
        col1, col2 = st.columns(2)

        with col1:
            st.subheader("Transaction Details")
            trans_date = st.date_input("Transaction Date", datetime.now())
            trans_time = st.time_input("Transaction Time", datetime.now().time())
            merchant = st.text_input("Merchant", SAMPLE_TRANSACTION["merchant"])
            category = st.text_input("Category", SAMPLE_TRANSACTION["category"])
            amount = st.number_input("Amount", value=SAMPLE_TRANSACTION["amt"], min_value=0.0)

        with col2:
            st.subheader("Cardholder Details")
            first_name = st.text_input("First Name", SAMPLE_TRANSACTION["first"])
            last_name = st.text_input("Last Name", SAMPLE_TRANSACTION["last"])
            gender = st.selectbox("Gender", ["M", "F"], index=1)
            dob = st.date_input("Date of Birth", datetime.strptime(SAMPLE_TRANSACTION["dob"], "%Y-%m-%d"))
            job = st.text_input("Job", SAMPLE_TRANSACTION["job"])

        st.subheader("Location Details")
        col3, col4 = st.columns(2)

        with col3:
            street = st.text_input("Street", SAMPLE_TRANSACTION["street"])
            city = st.text_input("City", SAMPLE_TRANSACTION["city"])
            state = st.text_input("State", SAMPLE_TRANSACTION["state"])
            zip_code = st.text_input("ZIP Code", SAMPLE_TRANSACTION["zip"])
            lat = st.number_input("Latitude", value=SAMPLE_TRANSACTION["lat"])
            long = st.number_input("Longitude", value=SAMPLE_TRANSACTION["long"])
            city_pop = st.number_input("City Population", value=SAMPLE_TRANSACTION["city_pop"])

        with col4:
            merch_lat = st.number_input("Merchant Latitude", value=SAMPLE_TRANSACTION["merch_lat"])
            merch_long = st.number_input("Merchant Longitude", value=SAMPLE_TRANSACTION["merch_long"])

        submitted = st.form_submit_button("Check for Fraud")

    if submitted:
        # Prepare transaction data
        transaction = {
            "trans_date_trans_time": f"{trans_date} {trans_time}",
            "cc_num": str(random.randint(1000000000000000, 9999999999999999)),
            "merchant": merchant,
            "category": category,
            "amt": float(amount),
            "first": first_name,
            "last": last_name,
            "gender": gender,
            "street": street,
            "city": city,
            "state": state,
            "zip": zip_code,
            "lat": float(lat),
            "long": float(long),
            "city_pop": int(city_pop),
            "job": job,
            "dob": dob.strftime("%Y-%m-%d"),
            "trans_num": f"{random.getrandbits(128):032x}",
            "unix_time": int(datetime.combine(trans_date, trans_time).timestamp()),
            "merch_lat": float(merch_lat),
            "merch_long": float(merch_long)
        }

        try:
            # Send request to API; fail on HTTP error responses instead of parsing them
            response = requests.post(API_URL, json=transaction, timeout=10)
            response.raise_for_status()
            result = response.json()

            # Display results
            st.subheader("Fraud Detection Results")

            if result["is_fraud"]:
                st.error("⚠️ Fraudulent Transaction Detected!")
            else:
                st.success("✅ Legitimate Transaction")

            st.write(f"Fraud Probability: {result['fraud_probability']:.2%}")
            st.write(f"Confidence Level: {result['confidence']}")

            # Display additional information
            with st.expander("Transaction Details"):
                st.json(transaction)

        except requests.exceptions.RequestException as e:
            st.error(f"Error calling the API: {str(e)}")
            st.info("Please make sure the API server is running.")

if __name__ == "__main__":
    main()
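The `unix_time` field in the app above is built from a naive `datetime`, so `timestamp()` interprets it in the local timezone of whatever machine runs Streamlit. A minimal sketch of a timezone-explicit alternative follows; treating the form input as UTC is an assumption made here for illustration, and the helper name `to_unix_time_utc` is hypothetical.

```python
from datetime import date, datetime, time, timezone


def to_unix_time_utc(d: date, t: time) -> int:
    """Combine a date and a time, interpret them as UTC, and return epoch seconds."""
    return int(datetime.combine(d, t, tzinfo=timezone.utc).timestamp())


# The Unix epoch itself maps to 0, and one day plus one second later to 86401.
print(to_unix_time_utc(date(1970, 1, 1), time(0, 0, 0)))
print(to_unix_time_utc(date(1970, 1, 2), time(0, 0, 1)))
```

The current preprocessing drops `unix_time` before the model sees it, so this only matters if the field is later used as a feature or for logging.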