First commit
Defined file structure and completed EDA
@@ -0,0 +1,119 @@
# Fraud Detection System

## Overview

This project aims to analyze transaction data, extract meaningful insights through Exploratory Data Analysis (EDA), engineer features, train a machine learning model that classifies fraudulent transactions, and deploy a simple API with a Web UI that predicts fraud in real time.

## Dataset Description

The dataset consists of various features related to transactions, including details about the merchant, transaction amount, user details, and location. The key features are:

* **trans_date_trans_time** : Timestamp of the transaction.
* **cc_num** : Credit card number of the cardholder (anonymized).
* **merchant** : Name of the merchant.
* **category** : Category of the merchant.
* **amt** : Transaction amount.
* **first, last** : First and last name of the cardholder.
* **gender** : Gender of the cardholder.
* **street, city, state, zip** : Location details of the cardholder.
* **lat, long** : Latitude and longitude of the cardholder's location.
* **city_pop** : Population of the cardholder's city.
* **job** : Job description of the cardholder.
* **dob** : Date of birth of the cardholder.
* **trans_num** : Unique transaction number.
* **unix_time** : Transaction time as a Unix timestamp.
* **merch_lat, merch_long** : Latitude and longitude of the merchant.
* **is_fraud** : Target variable (1 for fraud, 0 for legitimate transactions).
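As a rough illustration of how these columns might be read, the sketch below parses a tiny made-up CSV sample into typed Python values. The sample rows, the `parse_row` helper, and the assumed timestamp format (`YYYY-MM-DD HH:MM:SS`) are illustrative assumptions, not part of the project code.

```python
import csv
import io
from datetime import datetime

# Made-up sample with a subset of the columns listed above.
# The values are illustrative and do not come from the real dataset.
SAMPLE_CSV = """trans_date_trans_time,cc_num,merchant,category,amt,lat,long,merch_lat,merch_long,dob,is_fraud
2019-01-01 00:00:18,1234567890123456,Example Store,grocery_pos,4.97,36.08,-81.18,36.01,-82.05,1988-03-09,0
2019-01-01 03:12:44,6543210987654321,Example Shop,shopping_net,981.20,40.71,-74.00,41.30,-73.20,1962-11-02,1
"""

FLOAT_COLUMNS = ("amt", "lat", "long", "merch_lat", "merch_long")


def parse_row(row: dict) -> dict:
    """Convert the string values of one CSV row into typed Python values."""
    parsed = dict(row)
    # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS".
    parsed["trans_date_trans_time"] = datetime.strptime(
        row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S"
    )
    parsed["dob"] = datetime.strptime(row["dob"], "%Y-%m-%d").date()
    for name in FLOAT_COLUMNS:
        parsed[name] = float(row[name])
    parsed["is_fraud"] = int(row["is_fraud"])
    return parsed


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(rows[0]["amt"], rows[1]["is_fraud"])
```

`cc_num` is deliberately left as a string here, since it is an identifier rather than a quantity to do arithmetic on.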

## Tasks

### 1. Exploratory Data Analysis (EDA)

* Check for missing values and handle them appropriately.
* Analyze the distribution of transaction amounts.
* Identify correlations between different features.
* Visualize geographical patterns of fraudulent transactions.
* Investigate high-risk categories and merchants.
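A few of the checks above can be sketched in plain Python. The records below are made up for illustration, and the helper names `missing_counts` and `fraud_rate_by_category` are hypothetical; the actual analysis in `eda.ipynb` may look quite different.

```python
from collections import defaultdict
from statistics import median

# Made-up records for illustration only; a missing amount is represented as None.
transactions = [
    {"category": "grocery_pos", "amt": 4.97, "is_fraud": 0},
    {"category": "grocery_pos", "amt": 310.50, "is_fraud": 1},
    {"category": "shopping_net", "amt": 981.20, "is_fraud": 1},
    {"category": "shopping_net", "amt": None, "is_fraud": 0},
    {"category": "gas_transport", "amt": 52.10, "is_fraud": 0},
    {"category": "gas_transport", "amt": 47.80, "is_fraud": 0},
]


def missing_counts(records):
    """Count None values per field."""
    counts = defaultdict(int)
    for record in records:
        for field, value in record.items():
            if value is None:
                counts[field] += 1
    return dict(counts)


def fraud_rate_by_category(records):
    """Return {category: share of transactions labeled as fraud}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for record in records:
        totals[record["category"]] += 1
        frauds[record["category"]] += record["is_fraud"]
    return {category: frauds[category] / totals[category] for category in totals}


# Skip missing amounts before summarizing the distribution.
amounts = [r["amt"] for r in transactions if r["amt"] is not None]
print("missing:", missing_counts(transactions))
print("median amount:", median(amounts))
print("fraud rate:", fraud_rate_by_category(transactions))
```

On the full dataset loaded into a pandas DataFrame `df`, the same two checks would typically be written as `df.isna().sum()` and `df.groupby("category")["is_fraud"].mean()`.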

### 2. Feature Engineering

* Convert categorical variables into numerical representations.
* Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
* Normalize and scale numerical features.
* Extract time-based features (hour, day, weekday, month) from `trans_date_trans_time`.
* One-hot encode categorical features where necessary.
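The derived features mentioned above could be computed along these lines. This is a minimal sketch: the function names are hypothetical, the haversine formula is one common choice for the merchant-to-cardholder distance, and the one-hour velocity window is an assumption rather than a project decision.

```python
import math
from datetime import date, datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def age_in_years(dob: date, on: date) -> int:
    """Whole years between the date of birth and the transaction date."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def time_features(ts: datetime) -> dict:
    """Hour, day of month, weekday (Monday = 0), and month of a timestamp."""
    return {"hour": ts.hour, "day": ts.day, "weekday": ts.weekday(), "month": ts.month}


def transactions_in_window(unix_times, now, window_seconds=3600):
    """Velocity: earlier transactions on one card within the window before `now`."""
    return sum(1 for t in unix_times if now - window_seconds <= t < now)


ts = datetime(2019, 1, 1, 0, 0, 18)
print(haversine_km(36.08, -81.18, 36.01, -82.05))
print(age_in_years(date(1988, 3, 9), ts.date()))
print(time_features(ts))
print(transactions_in_window([100, 2000, 3500], now=3800))
```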

### 3. Model Training

* Split data into training and testing sets.
* Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
* Train models using cross-validation and optimize hyperparameters.
* Evaluate models using accuracy, precision, recall, and F1-score.
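To make the evaluation metrics concrete, the sketch below computes them from scratch on made-up labels. The `classification_metrics` helper and the example predictions are illustrative assumptions; the label counts are not taken from the project's data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up imbalanced labels: 95 legitimate transactions, 5 fraudulent ones.
y_true = [0] * 95 + [1] * 5

# A "model" that never predicts fraud still reaches high accuracy.
always_legit = [0] * 100
print(classification_metrics(y_true, always_legit))

# A model that catches 4 of the 5 frauds at the cost of 2 false alarms.
better = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1
print(classification_metrics(y_true, better))
```

The first case shows why accuracy alone can mislead on imbalanced fraud data: precision, recall, and F1 expose that no fraud was caught. With scikit-learn installed, `sklearn.metrics.classification_report` reports the same quantities.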

### 4. API Deployment (Flask/FastAPI)

* Create an API that takes transaction details as input and predicts fraud.
* Use Flask or FastAPI to build an endpoint (`/predict`).
* Load the trained model and use it for inference.
* Deploy the API using Docker or a cloud service.
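The request handling behind a `/predict` endpoint might look roughly like the framework-agnostic sketch below, which a Flask or FastAPI route could wrap. Everything here is an assumption for illustration: the `REQUIRED_FIELDS` list, the response shape, the status codes, and especially `ThresholdModel`, a hypothetical stand-in for the trained model loaded from `models/fraud_model.pkl`.

```python
import json

# Assumed input fields; the real API may require a different set.
REQUIRED_FIELDS = ("amt", "category", "lat", "long", "merch_lat", "merch_long")


class ThresholdModel:
    """Hypothetical stand-in for the trained model: flags large amounts."""

    def __init__(self, threshold: float = 500.0):
        self.threshold = threshold

    def predict(self, payload: dict) -> int:
        return int(payload["amt"] > self.threshold)


def handle_predict(body: str, model) -> tuple:
    """Return (HTTP status code, JSON-serializable response) for a /predict request."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "request body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "request body must be a JSON object"}
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return 422, {"error": "missing fields", "fields": missing}
    if not isinstance(payload["amt"], (int, float)):
        return 422, {"error": "amt must be a number"}
    return 200, {"is_fraud": model.predict(payload)}


request_body = json.dumps({
    "amt": 981.20, "category": "shopping_net",
    "lat": 40.71, "long": -74.00, "merch_lat": 41.30, "merch_long": -73.20,
})
print(handle_predict(request_body, ThresholdModel()))
print(handle_predict("not json", ThresholdModel()))
```

Keeping validation and inference in a plain function like this makes the logic testable without starting a web server, whichever framework ends up hosting the endpoint.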

### 5. Web UI for Fraud Prediction

* Develop a simple HTML/CSS/JavaScript frontend.
* Integrate the frontend with the API to take user input and display fraud predictions.
* Use a framework like Streamlit or Flask to build a minimal UI.

## Installation and Usage

### Prerequisites

Ensure you have Python 3.x installed along with the dependencies listed in `requirements.txt`.

## Project File Structure

```
│── data/                          # Folder for storing raw and processed datasets
│   ├── raw/                       # Original dataset files (all datasets are stored here)
│   ├── processed/                 # Processed/cleaned datasets
│── experiments/                   # Jupyter notebooks or scripts for EDA and model experimentation
│   ├── eda.ipynb                  # Exploratory Data Analysis notebook
│   ├── feature_engineering.ipynb  # Feature engineering experiments
│   ├── model_training.ipynb       # Model training experiments
│── models/                        # Folder for storing trained models and checkpoints
│   ├── fraud_model.pkl            # Serialized trained model
│   ├── model_metadata.json        # Metadata about the model
│── src/                           # Source code for model training, API, and frontend
│   ├── __init__.py                # Python package indicator
│   ├── config.py                  # Configuration settings
│   ├── data_preprocessing.py      # Data cleaning and feature engineering scripts
│   ├── model_training.py          # Script to train and save the model
│   ├── model_evaluation.py        # Model evaluation script
│   ├── predict.py                 # Script to make predictions
│   ├── api/                       # API folder (Flask/FastAPI)
│   │   ├── __init__.py
│   │   ├── app.py                 # FastAPI/Flask API for fraud detection
│   │   ├── inference.py           # Load model and predict
│   ├── web/                       # Frontend code for simple Web UI
│   │   ├── static/                # CSS, JS, images
│   │   ├── templates/             # HTML templates
│   │   ├── app.py                 # Streamlit or Flask-based frontend
│── README.md                      # Project documentation
│── requirements.txt               # List of required Python libraries
│── .gitignore                     # Files and folders to ignore in version control
│── Dockerfile                     # Docker setup for deployment (if needed)
│── deployment/                    # Scripts for deploying on cloud platforms
│   ├── docker-compose.yml         # Docker Compose setup
│   ├── cloud_run.sh               # Deployment script
```

### Explanation

* **`data/`** : Stores raw and processed datasets.
* **`experiments/`** : Jupyter notebooks for EDA, feature engineering, and model training experiments.
* **`models/`** : Stores trained models and related metadata.
* **`src/`** : Core source code, including data processing, model training, evaluation, API, and frontend.
* **`src/api/`** : Contains API-related scripts (Flask or FastAPI).
* **`src/web/`** : Contains the frontend code for user interaction.
* **`README.md`** : Documentation for setting up and running the project.
* **`requirements.txt`** : Dependencies for the project.
* **`Dockerfile` & `deployment/`** : For containerization and cloud deployment.