# Fraud Detection System

## Overview

This project aims to analyze transaction data, extract meaningful insights through Exploratory Data Analysis (EDA), perform feature engineering, train a machine learning model to classify fraudulent transactions, and deploy a simple API with a Web UI to predict fraud in real time.

## Dataset Description

The dataset consists of various features related to transactions, including details about the merchant, transaction amount, user details, and location. The key features are:

* **trans_date_trans_time** : Timestamp of the transaction.
* **cc_num** : Credit card number (anonymized transaction number).
* **merchant** : Name of the merchant.
* **category** : Type of merchant.
* **amt** : Amount transferred.
* **first, last** : First and last name of the cardholder.
* **gender** : Gender of the cardholder.
* **street, city, state, zip** : Location details of the cardholder.
* **lat, long** : Latitude and longitude of the cardholder.
* **city_pop** : Population of the city.
* **job** : Job description of the cardholder.
* **dob** : Date of birth of the cardholder.
* **trans_num** : Unique transaction number.
* **unix_time** : Unix timestamp.
* **merch_lat, merch_long** : Latitude and longitude of the merchant.
* **is_fraud** : Target variable (1 for fraud, 0 for legitimate transactions).

# Tasks:

### 1. Exploratory Data Analysis (EDA)

* Check for missing values and handle them appropriately.
* Analyze the distribution of transaction amounts.
* Identify correlations between different features.
* Visualize geographical patterns of fraudulent transactions.
* Investigate high-risk categories and merchants.

### 2. Feature Engineering

* Convert categorical variables into numerical representations.
* Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
* Normalize and scale numerical features.
* Extract time-based features (hour, day, weekday, month) from `trans_date_trans_time`.
* One-hot encode categorical features where necessary.

### 3. Model Training

* Split data into training and testing sets.
* Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
* Train models using cross-validation and optimize hyperparameters.
* Evaluate models using accuracy, precision, recall, and F1-score.

### 4. API Deployment (Flask/FastAPI)

* Create an API that takes transaction details as input and predicts fraud.
* Use Flask or FastAPI to build an endpoint (`/predict`).
* Load the trained model and use it for inference.
* Deploy the API using Docker or a cloud service.

### 5. Web UI for Fraud Prediction

* Develop a simple HTML/CSS/JavaScript frontend.
* Integrate the frontend with the API to take user input and display fraud predictions.
* Use a framework like Streamlit or Flask to build a minimal UI.

## Installation and Usage

### Prerequisites

Ensure you have Python 3.x installed along with the required dependencies.

# Project File Structure:

```
│── data/                          # Folder for storing raw and processed datasets
│   ├── raw/                       # Original dataset files (you will find all the datasets here)
│   ├── processed/                 # Processed/cleaned datasets
│── experiments/                   # Jupyter notebooks or scripts for EDA and model experimentation
│   ├── eda.ipynb                  # Exploratory Data Analysis notebook
│   ├── feature_engineering.ipynb  # Feature engineering experiments
│   ├── model_training.ipynb       # Model training experiments
│── models/                        # Folder for storing trained models and checkpoints
│   ├── fraud_model.pkl            # Serialized trained model
│   ├── model_metadata.json        # Metadata about the model
│── src/                           # Source code for model training, API, and frontend
│   ├── __init__.py                # Python package indicator
│   ├── config.py                  # Configuration settings
│   ├── data_preprocessing.py      # Data cleaning and feature engineering scripts
│   ├── model_training.py          # Script to train and save the model
│   ├── model_evaluation.py        # Model evaluation script
│   ├── predict.py                 # Script to make predictions
│   ├── api/                       # API folder (Flask/FastAPI)
│   │   ├── __init__.py
│   │   ├── app.py                 # FastAPI/Flask API for fraud detection
│   │   ├── inference.py           # Load model and predict
│   ├── web/                       # Frontend code for simple Web UI
│   │   ├── static/                # CSS, JS, images
│   │   ├── templates/             # HTML templates
│   │   ├── app.py                 # Streamlit or Flask-based frontend
│── README.md                      # Project documentation
│── requirements.txt               # List of required Python libraries
│── .gitignore                     # Files and folders to ignore in version control
│── Dockerfile                     # Docker setup for deployment (if needed)
│── deployment/                    # Scripts for deploying on cloud platforms
│   ├── docker-compose.yml         # Docker Compose setup
│   ├── cloud_run.sh               # Deployment script
```

### Explanation:

* **`data/`** : Stores raw and processed datasets.
* **`experiments/`** : Jupyter notebooks for EDA, feature engineering, and model training experiments.
* **`models/`** : Stores trained models and related metadata.
* **`src/`** : Core source code, including data processing, model training, evaluation, API, and frontend.
* **`api/`** : Contains API-related scripts (Flask or FastAPI).
* **`web/`** : Contains the frontend code for user interaction.
* **`README.md`** : Documentation for setting up and running the project.
* **`requirements.txt`** : Dependencies for the project.
* **`Dockerfile` & `deployment/`** : For containerization and cloud deployment.
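
The feature engineering task mentions the distance between merchant and user, the age of the cardholder, and time-based features. A minimal standard-library sketch of those derivations for a single transaction might look like the following. The function names `haversine_km` and `derive_features` are illustrative, and the timestamp and `dob` formats are assumptions that may need adjusting to match the raw files.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, long) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def derive_features(row):
    """Derive time, age, and distance features from one transaction dict.

    Assumes 'trans_date_trans_time' looks like '2019-01-01 00:00:18'
    and 'dob' looks like '1988-03-09'.
    """
    ts = datetime.strptime(row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S")
    dob = datetime.strptime(row["dob"], "%Y-%m-%d")
    # Age in whole years at the time of the transaction.
    age = ts.year - dob.year - ((ts.month, ts.day) < (dob.month, dob.day))
    return {
        "hour": ts.hour,
        "day": ts.day,
        "weekday": ts.weekday(),  # Monday = 0
        "month": ts.month,
        "age": age,
        "distance_km": haversine_km(
            row["lat"], row["long"], row["merch_lat"], row["merch_long"]
        ),
    }
```

The same logic can be applied column-wise with a dataframe library once the approach is validated on a few rows.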
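
The feature engineering task also lists transaction velocity without defining it. One possible reading, assumed here purely for illustration, is the number of earlier transactions on the same card within a preceding time window. The function name `transaction_velocity` and the one-hour default window are hypothetical choices, not part of the project.

```python
from collections import defaultdict, deque


def transaction_velocity(transactions, window_seconds=3600):
    """Count earlier same-card transactions inside the preceding window.

    `transactions` is a list of (cc_num, unix_time) pairs in any order.
    Returns a list of counts aligned with the input order.
    """
    # Process in time order while remembering each row's original position.
    indexed = sorted(enumerate(transactions), key=lambda item: item[1][1])
    recent = defaultdict(deque)  # cc_num -> timestamps still inside the window
    counts = [0] * len(indexed)
    for original_index, (cc_num, ts) in indexed:
        window = recent[cc_num]
        # Drop timestamps older than the window.
        while window and window[0] < ts - window_seconds:
            window.popleft()
        counts[original_index] = len(window)
        window.append(ts)
    return counts
```

Because each count only looks backward in time, this version does not leak information from later transactions into earlier rows.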
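
For the train/test split in the model training task, fraud labels are usually rare, so a plain random split can leave very few fraud cases in the test set. The sketch below shows a stratified split by `is_fraud` using only the standard library. The helper name `stratified_split` and its parameters are hypothetical, and libraries such as scikit-learn offer equivalent functionality.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split row indices into train/test while preserving the label ratio.

    `labels` is a sequence of class labels (for example, is_fraud values).
    Returns (train_indices, test_indices), each sorted.
    """
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```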
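
The evaluation step names accuracy, precision, recall, and F1-score. A small sketch of how these are computed from binary predictions, with fraud (1) as the positive class, is given below. The function name `classification_metrics` is illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

As a usage note, a model that predicts "legitimate" for every row of a set with 2 fraud cases in 100 scores 0.98 accuracy but 0.0 recall, which is why precision, recall, and F1 are worth reporting alongside accuracy.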