2025-02-23 01:37:01 +06:00

Fraud Detection System

Overview

This project aims to analyze transaction data, extract meaningful insights through Exploratory Data Analysis (EDA), perform feature engineering, train a machine learning model to classify fraudulent transactions, and deploy a simple API with a Web UI to predict fraud in real-time.

Dataset Description

The dataset consists of various features related to transactions, including details about the merchant, transaction amount, user details, and location. The key features are:

  • trans_date_trans_time : Timestamp of the transaction.
  • cc_num : Credit card number (anonymized card identifier).
  • merchant : Name of the merchant.
  • category : Type of merchant.
  • amt : Amount transferred.
  • first, last : First and last name of the cardholder.
  • gender : Gender of the cardholder.
  • street, city, state, zip : Location details of the cardholder.
  • lat, long : Latitude and longitude of the cardholder.
  • city_pop : Population of the city.
  • job : Job description of the cardholder.
  • dob : Date of birth of the cardholder.
  • trans_num : Unique transaction number.
  • unix_time : Unix timestamp.
  • merch_lat, merch_long : Latitude and longitude of the merchant.
  • is_fraud : Target variable (1 for fraud, 0 for legitimate transactions).

Tasks:

1. Exploratory Data Analysis (EDA)

  • Check for missing values and handle them appropriately.
  • Analyze the distribution of transaction amounts.
  • Identify correlations between different features.
  • Visualize geographical patterns of fraudulent transactions.
  • Investigate high-risk categories and merchants.
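
As a rough illustration of the first and last checks above, the sketch below counts missing values per column and computes the fraud share per category. It uses only the standard library so it runs without any setup; in the notebooks the same checks would more likely be done with a dataframe library. The helper names (`missing_counts`, `fraud_rate_by`) and the sample rows are made up for the example and are not part of the project.

```python
from collections import Counter, defaultdict


def missing_counts(rows, columns):
    """Count empty or absent values per column."""
    counts = Counter()
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                counts[col] += 1
    return {col: counts[col] for col in columns}


def fraud_rate_by(rows, key):
    """Return {group: share of fraudulent rows} for a column such as 'category'."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        frauds[group] += int(row["is_fraud"])
    return {group: frauds[group] / totals[group] for group in totals}


# Tiny made-up sample; real rows would come from the files under data/raw/,
# for example via csv.DictReader.
sample_rows = [
    {"category": "grocery_pos", "amt": "12.50", "is_fraud": "0"},
    {"category": "grocery_pos", "amt": "310.00", "is_fraud": "1"},
    {"category": "travel", "amt": "", "is_fraud": "0"},
    {"category": "travel", "amt": "58.20", "is_fraud": "0"},
]

print(missing_counts(sample_rows, ["category", "amt"]))
print(fraud_rate_by(sample_rows, "category"))
```

Passing "merchant" instead of "category" to `fraud_rate_by` gives the same breakdown per merchant.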

2. Feature Engineering

  • Convert categorical variables into numerical representations.
  • Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
  • Normalize and scale numerical features.
  • Extract time-based features (hour, day, weekday, month) from trans_date_trans_time.
  • One-hot encode categorical features where necessary.
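
The derived features listed above can be sketched as follows: the cardholder-to-merchant distance (great-circle distance via the haversine formula), the cardholder's age at transaction time, and the time-based fields. This is a minimal sketch, not the project's implementation. It assumes that trans_date_trans_time looks like "2020-06-21 12:14:25" and that dob is an ISO date such as "1990-06-15"; the function names are hypothetical.

```python
import math
from datetime import date, datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of trans_date_trans_time


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def age_in_years(dob, on_date):
    """Whole years between the date of birth and the given date."""
    had_birthday = (on_date.month, on_date.day) >= (dob.month, dob.day)
    return on_date.year - dob.year - (0 if had_birthday else 1)


def time_features(ts):
    """Hour, day of month, weekday (Monday = 0) and month of a datetime."""
    return {"hour": ts.hour, "day": ts.day,
            "weekday": ts.weekday(), "month": ts.month}


def engineer(row):
    """Build the derived features for one transaction given as a dict of strings."""
    ts = datetime.strptime(row["trans_date_trans_time"], TIMESTAMP_FORMAT)
    features = time_features(ts)
    features["distance_km"] = haversine_km(
        float(row["lat"]), float(row["long"]),
        float(row["merch_lat"]), float(row["merch_long"]),
    )
    features["age"] = age_in_years(date.fromisoformat(row["dob"]), ts.date())
    return features
```

Transaction velocity is left out here because it needs the full history per cc_num rather than a single row.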

3. Model Training

  • Split data into training and testing sets.
  • Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
  • Train models using cross-validation and optimize hyperparameters.
  • Evaluate models using accuracy, precision, recall, and F1-score.
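
The evaluation metrics named above can be computed directly from the confusion counts. The sketch below does this in plain Python to show how the four numbers relate; a machine learning library would normally provide them. The function names and the example labels are illustrative only, and the 2% fraud share in the example is invented, not a property of the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means fraud."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def fraud_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example: 2 fraudulent transactions out of 100.
y_true = [1, 1] + [0] * 98
always_legit = [0] * 100             # never flags anything
one_hit = [1, 0] + [1] + [0] * 97    # one fraud caught, one false alarm

print(fraud_metrics(y_true, always_legit))
print(fraud_metrics(y_true, one_hit))
```

Both predictors reach the same accuracy in this example, but only the second has non-zero recall, which is why precision, recall, and F1-score are reported alongside accuracy.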

4. API Deployment (Flask/FastAPI)

  • Create an API that takes transaction details as input and predicts fraud.
  • Use Flask or FastAPI to build an endpoint (/predict).
  • Load the trained model and use it for inference.
  • Deploy the API using Docker or a cloud service.
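
The core of a /predict endpoint can be sketched independently of the web framework: validate the incoming payload, build the feature vector, and call the model. Everything below is an assumption made for illustration, including the field list, the `ThresholdModel` stand-in, and the function names. In the real project the model would be loaded from models/fraud_model.pkl, and a Flask or FastAPI route would wrap `predict_fraud` and turn its return value into an HTTP response.

```python
# Fields the hypothetical endpoint expects, with the type each is cast to.
REQUIRED_FIELDS = {
    "amt": float,
    "lat": float,
    "long": float,
    "merch_lat": float,
    "merch_long": float,
}


class ThresholdModel:
    """Stand-in for the trained model: flags amounts above a fixed limit."""

    def __init__(self, limit):
        self.limit = limit

    def predict(self, features):
        # The first feature is the transaction amount (see REQUIRED_FIELDS order).
        return [1 if row[0] > self.limit else 0 for row in features]


def parse_payload(payload):
    """Cast the required fields; return (values, errors)."""
    values, errors = {}, []
    for name, cast in REQUIRED_FIELDS.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
            continue
        try:
            values[name] = cast(payload[name])
        except (TypeError, ValueError):
            errors.append(f"invalid value for {name}")
    return values, errors


def predict_fraud(payload, model):
    """Return (status_code, response_body) for one transaction payload."""
    values, errors = parse_payload(payload)
    if errors:
        return 400, {"errors": errors}
    features = [[values[name] for name in REQUIRED_FIELDS]]
    return 200, {"is_fraud": int(model.predict(features)[0])}


model = ThresholdModel(limit=500.0)
print(predict_fraud(
    {"amt": "950.0", "lat": 1, "long": 2, "merch_lat": 3, "merch_long": 4},
    model,
))
print(predict_fraud({"amt": "abc"}, model))
```

Keeping validation and inference in a plain function like this makes the logic testable without starting a server, whichever framework ends up serving it.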

5. Web UI for Fraud Prediction

  • Develop a simple HTML/CSS/JavaScript frontend.
  • Integrate the frontend with the API to take user input and display fraud predictions.
  • Use a framework like Streamlit or Flask to build a minimal UI.

Installation and Usage

Prerequisites

Ensure you have Python 3.x installed, along with the dependencies listed in requirements.txt.

Project File Structure:

ds_task_fraud_detection/
│── data/                          # Folder for storing raw and processed datasets
│   ├── raw/                       # Original dataset files (all datasets are stored here)
│   ├── processed/                 # Processed/cleaned datasets
│── experiments/                   # Jupyter notebooks or scripts for EDA and model experimentation
│   ├── eda.ipynb                  # Exploratory Data Analysis notebook
│   ├── feature_engineering.ipynb  # Feature engineering experiments
│   ├── model_training.ipynb       # Model training experiments
│── models/                        # Folder for storing trained models and checkpoints
│   ├── fraud_model.pkl            # Serialized trained model
│   ├── model_metadata.json        # Metadata about the model
│── src/                           # Source code for model training, API, and frontend
│   ├── __init__.py                # Python package indicator
│   ├── config.py                  # Configuration settings
│   ├── data_preprocessing.py      # Data cleaning and feature engineering scripts
│   ├── model_training.py          # Script to train and save the model
│   ├── model_evaluation.py        # Model evaluation script
│   ├── predict.py                 # Script to make predictions
│   ├── api/                       # API folder (Flask/FastAPI)
│   │   ├── __init__.py
│   │   ├── app.py                 # FastAPI/Flask API for fraud detection
│   │   ├── inference.py           # Load model and predict
│   ├── web/                       # Frontend code for simple Web UI
│   │   ├── static/                # CSS, JS, images
│   │   ├── templates/             # HTML templates
│   │   ├── app.py                 # Streamlit or Flask-based frontend
│── README.md                      # Project documentation
│── requirements.txt               # List of required Python libraries
│── .gitignore                     # Files and folders to ignore in version control
│── Dockerfile                     # Docker setup for deployment (if needed)
│── deployment/                    # Scripts for deploying on cloud platforms
│   ├── docker-compose.yml         # Docker Compose setup
│   ├── cloud_run.sh               # Deployment script

Explanation:

  • data/ : Stores raw and processed datasets.
  • experiments/ : Jupyter notebooks for EDA, feature engineering, and model training experiments.
  • models/ : Stores trained models and related metadata.
  • src/ : Core source code, including data processing, model training, evaluation, API, and frontend.
  • api/ : Contains API-related scripts (Flask or FastAPI).
  • web/ : Contains the frontend code for user interaction.
  • README.md : Documentation for setting up and running the project.
  • requirements.txt : Dependencies for the project.
  • Dockerfile & deployment/ : For containerization and cloud deployment.