This project analyzes credit card transaction data: it extracts insights through Exploratory Data Analysis (EDA), performs feature engineering, trains a machine learning model to classify fraudulent transactions, and deploys a simple API with a Web UI to predict fraud in real time.
## Dataset Description
The dataset consists of various features related to transactions, including details about the merchant, transaction amount, user details, and location. The key features are:
* **trans_date_trans_time** : Timestamp of the transaction.
* **cc_num** : Credit card number of the cardholder (anonymized).
* **merchant** : Name of the merchant.
* **category** : Type of merchant.
* **amt** : Transaction amount.
* **first, last** : First and last name of the cardholder.
* **gender** : Gender of the cardholder.
* **street, city, state, zip** : Location details of the cardholder.
* **lat, long** : Latitude and longitude of the cardholder.
* **city_pop** : Population of the city.
* **job** : Job description of the cardholder.
* **dob** : Date of birth of the cardholder.
* **trans_num** : Unique transaction number.
* **unix_time** : Unix timestamp.
* **merch_lat, merch_long** : Latitude and longitude of the merchant.
* **is_fraud** : Target variable (1 for fraud, 0 for legitimate transactions).
## Tasks
### 1. Exploratory Data Analysis (EDA)
* Check for missing values and handle them appropriately.
* Analyze the distribution of transaction amounts.
* Identify correlations between different features.
* Visualize geographical patterns of fraudulent transactions.
* Investigate high-risk categories and merchants.
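
As a rough illustration of the missing-value check and the high-risk category analysis, here is a dependency-free sketch. The helper names `missing_counts` and `fraud_rate_by_category` and the toy rows are assumptions made for this example, not part of the project; on the full dataset a pandas `groupby` would be the more usual tool.

```python
from collections import defaultdict


def missing_counts(rows, columns):
    """Count rows where each column is absent, None, or an empty string."""
    return {col: sum(1 for r in rows if r.get(col) in (None, "")) for col in columns}


def fraud_rate_by_category(rows):
    """Return {category: (n_transactions, fraud_rate)}, highest fraud rate first."""
    counts = defaultdict(lambda: [0, 0])  # category -> [total, fraud]
    for row in rows:
        tally = counts[row["category"]]
        tally[0] += 1
        tally[1] += int(row["is_fraud"])
    rates = {cat: (total, fraud / total) for cat, (total, fraud) in counts.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1][1], reverse=True))


# Toy rows for illustration only; the values are made up.
sample = [
    {"category": "grocery_pos", "amt": 52.10, "is_fraud": 0},
    {"category": "grocery_pos", "amt": 310.00, "is_fraud": 1},
    {"category": "shopping_net", "amt": 899.99, "is_fraud": 1},
    {"category": "shopping_net", "amt": 912.50, "is_fraud": 1},
    {"category": "gas_transport", "amt": None, "is_fraud": 0},
]

print(missing_counts(sample, ["amt", "category"]))
for category, (n, rate) in fraud_rate_by_category(sample).items():
    print(f"{category}: {n} transactions, fraud rate {rate:.2f}")
```

The same pattern applies to `merchant` by swapping the grouping key.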
### 2. Feature Engineering
* Convert categorical variables into numerical representations.
* Derive additional features like transaction velocity, distance between merchant and user, and age of the cardholder.
* Normalize and scale numerical features.
* Extract time-based features (hour, day, weekday, month) from `trans_date_trans_time`.
* One-hot encode categorical features where necessary.
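
Three of the derived features listed above (time-based fields, cardholder age, and merchant-to-user distance) could be computed along these lines. This is a minimal sketch: the function names are hypothetical, the timestamp format `YYYY-MM-DD HH:MM:SS` and the `dob` format `YYYY-MM-DD` are assumptions about the data, and the haversine great-circle formula is one reasonable choice of distance measure rather than a requirement of the project.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def engineer_features(row):
    """Derive time, age, and distance features from one transaction record."""
    # Assumed formats; adjust the patterns if the data differs.
    ts = datetime.strptime(row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S")
    dob = datetime.strptime(row["dob"], "%Y-%m-%d").date()
    # Age in whole years at the time of the transaction.
    age = ts.year - dob.year - ((ts.month, ts.day) < (dob.month, dob.day))
    return {
        "hour": ts.hour,
        "day": ts.day,
        "weekday": ts.weekday(),  # Monday = 0, Sunday = 6
        "month": ts.month,
        "age": age,
        "distance_km": haversine_km(
            row["lat"], row["long"], row["merch_lat"], row["merch_long"]
        ),
    }


# Made-up record for illustration.
example = {
    "trans_date_trans_time": "2020-06-21 12:14:25",
    "dob": "1968-03-19",
    "lat": 33.9659,
    "long": -80.9355,
    "merch_lat": 33.986391,
    "merch_long": -81.200714,
}
print(engineer_features(example))
```

Transaction velocity is not covered here, since it needs the full per-card transaction history rather than a single record.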
### 3. Model Training
* Split data into training and testing sets.
* Use classification algorithms like Logistic Regression, Random Forest, XGBoost, or Neural Networks.
* Train models using cross-validation and optimize hyperparameters.
* Evaluate models using accuracy, precision, recall, and F1-score.
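
To make the evaluation metrics concrete, the sketch below computes them from raw labels without any library. The function name `fraud_metrics` and the label vectors are invented for this example. It also shows why accuracy alone can mislead when fraud is rare: a model that never flags fraud still scores high accuracy while its recall is zero.

```python
def fraud_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 10 fraudulent transactions out of 1000.
y_true = [1] * 10 + [0] * 990

# A model that predicts "legitimate" for everything.
always_legit = [0] * 1000
print(fraud_metrics(y_true, always_legit))

# A model that catches 8 of the 10 frauds and raises 4 false alarms.
better = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986
print(fraud_metrics(y_true, better))
```

With these labels the first model reaches 0.99 accuracy with zero recall, while the second has slightly higher accuracy but a recall of 0.8, which is the difference that matters for fraud detection.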
### 4. API Deployment (Flask/FastAPI)
* Create an API that takes transaction details as input and predicts fraud.
* Use Flask or FastAPI to build an endpoint (`/predict`).
* Load the trained model and use it for inference.
* Deploy the API using Docker or a cloud service.
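
One possible shape for the `/predict` endpoint, using Flask, is sketched below. Everything beyond the endpoint path is an assumption for illustration: `StandInModel` is a hypothetical placeholder for the trained model, `REQUIRED_FIELDS` and the response keys are invented, and a real service would apply the same feature engineering used in training before calling the model.

```python
REQUIRED_FIELDS = ("amt", "category")  # assumed minimal input; extend as needed


class StandInModel:
    """Hypothetical placeholder for the trained model; flags large amounts."""

    def predict_proba(self, rows):
        # Mimics the scikit-learn convention: one [p_legit, p_fraud] pair per row.
        return [[0.1, 0.9] if row[0] > 1000 else [0.95, 0.05] for row in rows]


def score_transaction(model, payload, threshold=0.5):
    """Validate a transaction payload and return the model's fraud prediction."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    try:
        amt = float(payload["amt"])
    except (TypeError, ValueError):
        raise ValueError("amt must be a number")
    p_fraud = model.predict_proba([[amt]])[0][1]
    return {"fraud_probability": p_fraud, "is_fraud": int(p_fraud >= threshold)}


def create_app(model):
    """Build a Flask app exposing POST /predict for the given model."""
    # Imported here so the scoring logic can be tested without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        try:
            result = score_transaction(model, payload)
        except ValueError as exc:
            return jsonify({"error": str(exc)}), 400
        return jsonify(result)

    return app
```

To try it locally, call `create_app(StandInModel()).run(port=5000)` and POST a JSON body such as `{"amt": 2500, "category": "shopping_net"}` to `/predict`. Keeping the scoring in a plain function separate from the route makes it straightforward to unit test and to reuse from the Web UI.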
### 5. Web UI for Fraud Prediction
* Develop a simple frontend, either in plain HTML/CSS/JavaScript or with a framework such as Streamlit or Flask.
* Integrate the frontend with the API to take user input and display fraud predictions.
## Installation and Usage
### Prerequisites
Ensure you have Python 3.x installed along with the required dependencies.