Fraud Detection System

Overview

This project aims to analyze transaction data, extract meaningful insights through Exploratory Data Analysis (EDA), perform feature engineering, train a machine learning model to classify fraudulent transactions, and deploy a simple API with a Web UI to predict fraud in real-time.

Dataset Description

The dataset consists of various features related to transactions, including details about the merchant, transaction amount, user details, and location. The key features are:

trans_date_trans_time : Timestamp of the transaction.
cc_num : Credit card number of the cardholder (anonymized).
merchant : Name of the merchant.
category : Type of merchant.
amt : Amount transferred.
first, last : First and last name of the cardholder.
gender : Gender of the cardholder.
street, city, state, zip : Location details of the cardholder.
lat, long : Latitude and longitude of the cardholder.
city_pop : Population of the city.
job : Job description of the cardholder.
dob : Date of birth of the cardholder.
trans_num : Unique transaction number.
unix_time : Unix timestamp.
merch_lat, merch_long : Latitude and longitude of the merchant.
is_fraud : Target variable (1 for fraud, 0 for legitimate transactions).

Tasks:

1. Exploratory Data Analysis (EDA)

Check for missing values and handle them appropriately.
Analyze the distribution of transaction amounts.
Identify correlations between different features.
Visualize geographical patterns of fraudulent transactions.
Investigate high-risk categories and merchants.
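
As a rough illustration of the first and last checks above, the sketch below counts missing values per column and computes the fraud rate per category using only the standard library. The sample rows and category values are made up for the example, and the helper names missing_counts and fraud_rate_by are hypothetical; in the notebooks the same checks would more likely be done with pandas (for instance isna().sum() and groupby).

```python
from collections import defaultdict

# Illustrative rows only (invented values); real data would be read
# from the files under data/raw/.
rows = [
    {"category": "grocery_pos", "amt": "54.10", "is_fraud": "0"},
    {"category": "shopping_net", "amt": "981.50", "is_fraud": "1"},
    {"category": "shopping_net", "amt": "", "is_fraud": "0"},
    {"category": "grocery_pos", "amt": "12.99", "is_fraud": "0"},
]


def missing_counts(rows):
    """Count empty or absent values per column."""
    counts = defaultdict(int)
    for row in rows:
        for col, val in row.items():
            if val is None or str(val).strip() == "":
                counts[col] += 1
    return dict(counts)


def fraud_rate_by(rows, key):
    """Share of fraudulent transactions for each distinct value of `key`."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        frauds[row[key]] += int(row["is_fraud"])
    return {value: frauds[value] / totals[value] for value in totals}


print(missing_counts(rows))
print(fraud_rate_by(rows, "category"))
```

Passing "merchant" instead of "category" to fraud_rate_by would give the per-merchant view, provided the rows carry that column.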

2. Feature Engineering

Convert categorical variables into numerical representations.
Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
Normalize and scale numerical features.
Extract time-based features (hour, day, weekday, month) from trans_date_trans_time.
One-hot encode categorical features where necessary.
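
The derived features listed above could be sketched roughly as follows. This is a minimal example, not the project's implementation: it assumes trans_date_trans_time is formatted as "YYYY-MM-DD HH:MM:SS" and dob as "YYYY-MM-DD" (the dataset description does not state the formats), uses the haversine formula as one possible definition of the distance between merchant and user, and the function names haversine_km and engineer are hypothetical.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def engineer(row):
    """Derive time-based features, cardholder age, and merchant distance."""
    # Assumed timestamp and date formats; adjust to match the raw files.
    ts = datetime.strptime(row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S")
    dob = datetime.strptime(row["dob"], "%Y-%m-%d")
    # Age in whole years at the time of the transaction.
    age = ts.year - dob.year - ((ts.month, ts.day) < (dob.month, dob.day))
    return {
        "hour": ts.hour,
        "day": ts.day,
        "weekday": ts.weekday(),  # Monday is 0, Sunday is 6
        "month": ts.month,
        "age": age,
        "distance_km": haversine_km(
            row["lat"], row["long"], row["merch_lat"], row["merch_long"]
        ),
    }


# Invented example row for illustration.
example = {
    "trans_date_trans_time": "2020-06-21 12:14:25",
    "dob": "1968-03-19",
    "lat": 0.0, "long": 0.0,
    "merch_lat": 1.0, "merch_long": 0.0,
}
features = engineer(example)
```

Transaction velocity is left out here because it needs the per-card transaction history rather than a single row.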

3. Model Training

Split data into training and testing sets.
Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
Train models using cross-validation and optimize hyperparameters.
Evaluate models using accuracy, precision, recall, and F1-score.
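
To make the four evaluation metrics concrete, here is a small standard-library sketch that computes them from true and predicted labels. The labels are invented, and the function name evaluate is hypothetical; in practice scikit-learn's metrics module would normally be used instead. Because fraud is typically rare, accuracy can look good even when most fraud cases are missed, which is why precision and recall are reported alongside it.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 2 fraud cases out of 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In this example the accuracy is 0.8 even though only one of the two fraud cases is caught, so precision and recall are both 0.5.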

4. API Deployment (Flask/FastAPI)

Create an API that takes transaction details as input and predicts fraud.
Use Flask or FastAPI to build an endpoint (/predict).
Load the trained model and use it for inference.
Deploy the API using Docker or a cloud service.
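
One way to think about the /predict endpoint is as a plain function that validates the request body and returns a status code plus a JSON-serializable response; a Flask or FastAPI route would then be a thin wrapper around it. The sketch below is framework-independent and rests on several assumptions: the required field list, the response keys, the 0.5 threshold, and the StubModel class with its score method are all hypothetical placeholders, not the project's trained model or its actual interface.

```python
import math

# Assumed minimal set of input fields; the real API may require more.
REQUIRED_FIELDS = ("amt", "lat", "long", "merch_lat", "merch_long")


class StubModel:
    """Placeholder standing in for the trained model loaded from models/."""

    def score(self, features):
        # Hypothetical rule for demonstration only: larger amounts score higher.
        amount = features[0]
        return 1.0 - math.exp(-amount / 1000.0)


def handle_predict(payload, model, threshold=0.5):
    """Validate a request payload and return (status_code, response_body)."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    try:
        features = [float(payload[field]) for field in REQUIRED_FIELDS]
    except (TypeError, ValueError):
        return 400, {"error": "fields must be numeric"}
    probability = model.score(features)
    return 200, {
        "fraud_probability": probability,
        "is_fraud": int(probability >= threshold),
    }


model = StubModel()
request_body = {"amt": 2000, "lat": 40.7, "long": -74.0,
                "merch_lat": 40.8, "merch_long": -74.1}
status, body = handle_predict(request_body, model)
```

Keeping validation and scoring in a function like this makes the logic testable without starting a web server, whichever framework ends up hosting the endpoint.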

5. Web UI for Fraud Prediction

Develop a simple HTML/CSS/JavaScript frontend.
Integrate the frontend with the API to take user input and display fraud predictions.
Alternatively, use a framework like Streamlit or Flask to build a minimal UI.

Installation and Usage

Prerequisites

Ensure you have Python 3.x installed, along with the dependencies listed in requirements.txt.

Project File Structure:

ds_task_fraud_detection/
│── data/                              # Folder for storing raw and processed datasets
│   ├── raw/                           # Original dataset files (You will find all the dataset here)
│   ├── processed/                     # Processed/cleaned datasets
│── experiments/                       # Jupyter notebooks or scripts for EDA and model experimentation
│   ├── eda.ipynb                      # Exploratory Data Analysis notebook
│   ├── feature_engineering.ipynb      # Feature engineering experiments
│   ├── model_training.ipynb           # Model training experiments
│── models/                            # Folder for storing trained models and checkpoints
│   ├── fraud_model.pkl                # Serialized trained model
│   ├── model_metadata.json            # Metadata about the model
│── src/                               # Source code for model training, API, and frontend
│   ├── __init__.py                    # Python package indicator
│   ├── config.py                      # Configuration settings
│   ├── data_preprocessing.py          # Data cleaning and feature engineering scripts
│   ├── model_training.py              # Script to train and save the model
│   ├── model_evaluation.py            # Model evaluation script
│   ├── predict.py                     # Script to make predictions
│   ├── api/                           # API folder (Flask/FastAPI)
│   │   ├── __init__.py
│   │   ├── app.py                     # FastAPI/Flask API for fraud detection
│   │   ├── inference.py               # Load model and predict
│   ├── web/                           # Frontend code for simple Web UI
│   │   ├── static/                    # CSS, JS, images
│   │   ├── templates/                 # HTML templates
│   │   ├── app.py                     # Streamlit or Flask-based frontend
│── README.md                          # Project documentation
│── requirements.txt                   # List of required Python libraries
│── .gitignore                         # Files and folders to ignore in version control
│── Dockerfile                         # Docker setup for deployment (if needed)
│── deployment/                        # Scripts for deploying on cloud platforms
│   ├── docker-compose.yml             # Docker Compose setup
│   ├── cloud_run.sh                   # Deployment script

Explanation:

data/ : Stores raw and processed datasets.
experiments/ : Jupyter notebooks for EDA, feature engineering, and model training experiments.
models/ : Stores trained models and related metadata.
src/ : Core source code, including data processing, model training, evaluation, API, and frontend.
src/api/ : Contains API-related scripts (Flask or FastAPI).
src/web/ : Contains the frontend code for user interaction.
README.md : Documentation for setting up and running the project.
requirements.txt : Dependencies for the project.
Dockerfile & deployment/ : For containerization and cloud deployment.
