🔍 Real-Time Fraud Detection System

A production-ready fraud detection pipeline built on RabbitMQ and a trained machine learning model. Transaction data is streamed through the model so that fraudulent activity is flagged in real time.

📋 Table of Contents

Files Structure
System Architecture
Installation
Running the System
Expected Results Flow
Use Real Fraud Cases
Application Logs

📁 Files Structure

FraudDetection/
├── src/
│   ├── producer.py           # Transaction data ingestion
│   ├── consumer.py           # ML processing engine
│   └── results_viewer.py     # Real-time results display
├── artifacts/
│   ├── model.joblib          # Trained ML model
│   └── preprocessor.joblib   # Data preprocessing pipeline
├── data/
│   └── new_applications.csv  # Sample transaction data
├── notebook_and_ppt/
│   └── models.ipynb          # Model training notebook
├── submissions/
│   └── *.csv                 # Model predictions
├── image/
│   └── system_architecture.jpg
├── requirements.txt          # Python dependencies
├── docker-compose.yml        # RabbitMQ infrastructure
└── README.md

System Architecture

The system is divided into two main phases: an offline Training Phase and an online Prediction Phase.

Training Phase

Data Loading & Merging: Loads raw data from multiple CSV files (train_transaction.csv, train_identity.csv)
Feature Engineering: Performs extensive feature engineering, including creating time-based features (Day, TransactionHours, DayofWeek), amount transformations (dollars, cents, log), email domain mapping, device categorization, and V-column selection. Feature selection parameters and domain mappings are calculated and saved.
Data Splitting: Data is split using GroupKFold and temporal validation to prevent data leakage and mimic realistic fraud detection scenarios.
Preprocessing: A preprocessing pipeline is defined to handle numerical features, categorical encoding, and feature scaling for the selected V-columns and engineered features.
Model Training: An XGBoost model is trained on the preprocessed fraud detection data, with class balancing to handle the imbalanced nature of fraud cases.
Artifact Saving: The trained preprocessor, model, and feature engineering parameters are saved to disk (model.joblib and preprocessor.joblib files) for use in the real-time prediction phase.
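
The splitting and class-balancing steps can be illustrated with a minimal sketch. It is not the notebook's code: it shows only a simple time-ordered holdout (not GroupKFold), and it computes the negative-to-positive ratio that is commonly passed to XGBoost as `scale_pos_weight`. The column names `TransactionDT` and `isFraud` and both helper functions are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def temporal_split(rows: List[Row], time_key: str,
                   valid_fraction: float = 0.2) -> Tuple[List[Row], List[Row]]:
    """Hold out the most recent rows so validation never precedes training."""
    ordered = sorted(rows, key=lambda row: row[time_key])
    cut = len(ordered) - int(len(ordered) * valid_fraction)
    return ordered[:cut], ordered[cut:]


def positive_class_weight(labels: List[int]) -> float:
    """Ratio of legitimate to fraudulent rows (a common scale_pos_weight value)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no fraud cases in the training labels")
    return negatives / positives


# Hypothetical rows; the column names are assumed, not taken from the notebook.
rows = [
    {"TransactionDT": 86400, "isFraud": 0},
    {"TransactionDT": 172800, "isFraud": 0},
    {"TransactionDT": 90000, "isFraud": 1},
    {"TransactionDT": 260000, "isFraud": 0},
    {"TransactionDT": 100000, "isFraud": 0},
]

train_rows, valid_rows = temporal_split(rows, "TransactionDT")
weight = positive_class_weight([row["isFraud"] for row in train_rows])
```

Holding out the latest transactions mirrors how the model is used after deployment, where it only ever scores transactions newer than its training data.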

Prediction Phase (Deployment Workflow)

1. Data Ingestion (Producer)

Reads transaction data from CSV (new_applications.csv)
Converts to JSON messages
Publishes to fraud_detection_queue
Rate-limited processing (1 tx/second)
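
The producer steps above can be sketched as follows. This is an illustration rather than the contents of producer.py: the message layout (`transaction_id`, `data`) and both function names are assumptions, and the publish step is injected as a callable so the sketch runs without a broker.

```python
import csv
import io
import json
import time
from typing import Callable, Iterable, Iterator


def rows_to_messages(csv_lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON message per CSV row, tagged tx_1, tx_2, ..."""
    for index, row in enumerate(csv.DictReader(csv_lines), start=1):
        yield json.dumps({"transaction_id": f"tx_{index}", "data": row})


def publish_all(messages: Iterable[str], publish: Callable[[str], None],
                delay_seconds: float = 1.0) -> int:
    """Send each message through `publish`, pausing to cap the rate."""
    sent = 0
    for body in messages:
        publish(body)
        sent += 1
        time.sleep(delay_seconds)  # 1.0 gives roughly one transaction per second
    return sent


# Stand-in for new_applications.csv; the columns here are hypothetical.
sample_csv = io.StringIO("TransactionAmt,ProductCD\n59.95,W\n120.00,C\n")

# With a RabbitMQ client such as pika (an assumption about producer.py),
# `publish` could wrap:
#   channel.basic_publish(exchange="", routing_key="fraud_detection_queue", body=body)
outbox = []
sent = publish_all(rows_to_messages(sample_csv), outbox.append, delay_seconds=0.0)
```

Keeping the publish call behind a plain callable makes the row-to-message logic easy to test without RabbitMQ running.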

2. ML Processing (Consumer)

# Feature Engineering (139+ features)
├── Time Features: Day, TransactionHours, DayofWeek
├── V-columns: 100+ anonymized features (selected subset)
├── Amount Features: dollars, cents, TransactionAmt_log
├── Identity Features: email domains, device types
└── Unique IDs: card+email combinations

# ML Model: XGBoost/LightGBM (trained on 500K+ transactions)
├── Input: 139 engineered features
├── Output: Fraud probability [0-1]
└── Threshold: 0.5 (configurable)
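
A minimal sketch of the time and amount features listed above. It assumes the raw timestamp is a `TransactionDT` value in seconds from some reference point, and it uses `log1p` for the log transform; neither detail is confirmed by this README, and the function names are hypothetical.

```python
import math
from typing import Dict, Union

SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def time_features(transaction_dt: int) -> Dict[str, int]:
    """Derive day index, hour of day, and day of week from a seconds offset."""
    day = transaction_dt // SECONDS_PER_DAY
    return {
        "Day": day,
        "TransactionHours": (transaction_dt // SECONDS_PER_HOUR) % 24,
        "DayofWeek": day % 7,  # relative to the unknown reference day
    }


def amount_features(amount: float) -> Dict[str, Union[int, float]]:
    """Split an amount into dollars and cents and add a log-scaled copy."""
    total_cents = round(amount * 100)  # round first to avoid float drift
    return {
        "dollars": total_cents // 100,
        "cents": total_cents % 100,
        "TransactionAmt_log": math.log1p(amount),  # assumed variant of the log transform
    }


features = {**time_features(90_000), **amount_features(59.95)}
```

Because the reference date is unknown, `DayofWeek` here is only a consistent 0 to 6 cycle, not a calendar weekday.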

Consumes from fraud_detection_queue
Feature Engineering:
- Drop the V-columns that were not selected during training
- Time-based features (Day, TransactionHours, DayofWeek)
- Amount features (dollars, cents, log transform)
- Email domain mapping
- Device categorization
- Unique identifier creation
Preprocessing: Transforms the engineered features using the loaded preprocessor.joblib
Prediction: Feeds the preprocessed data into the loaded model.joblib to predict the fraud status and fraud probability of each transaction.
Publishing: Results to fraud_results_queue
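
The final consumer steps, turning a fraud probability into a decision and a result message, might look like the sketch below. The 0.5 threshold comes from the configuration shown above; the message fields, the function name, and the choice of `>=` at the boundary are assumptions. In the real consumer the probability would come from the loaded model rather than being passed in by hand.

```python
import json

FRAUD_THRESHOLD = 0.5  # default from the README; configurable


def build_result(transaction_id: str, fraud_probability: float,
                 threshold: float = FRAUD_THRESHOLD) -> str:
    """Apply the decision threshold and serialize a result message."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be within [0, 1]")
    return json.dumps({
        "transaction_id": transaction_id,
        "fraud_probability": fraud_probability,
        "is_fraud": fraud_probability >= threshold,  # boundary handling is assumed
    })


flagged = json.loads(build_result("tx_1", 0.85))
cleared = json.loads(build_result("tx_2", 0.12))
stricter = json.loads(build_result("tx_1", 0.85, threshold=0.9))
```

Raising the threshold trades missed fraud for fewer false alarms, which is why it is worth keeping as a parameter rather than a hard-coded constant.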

3. Results Display (Viewer)

Consumes from fraud_results_queue
Real-time fraud alerts
Transaction details and confidence scores
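
As a rough illustration of the viewer's display step, the sketch below renders one result as a banner modeled on the sample output in this README. The function name and its arguments are hypothetical, and treating the displayed confidence as the fraud probability is an assumption.

```python
def format_result(transaction_id: str, fraud_probability: float, is_fraud: bool) -> str:
    """Render one result as a console banner."""
    bar = "=" * 60
    if is_fraud:
        status = "🚨 STATUS: FRAUD DETECTED"
        score = f"🎯 Confidence: {fraud_probability:.1%}"
    else:
        status = "✅ STATUS: LEGITIMATE TRANSACTION"
        score = f"🎯 Fraud Risk: {fraud_probability:.1%}"
    return "\n".join([
        bar,
        "🔍 FRAUD DETECTION RESULT",
        bar,
        f"Transaction ID: {transaction_id}",
        status,
        score,
        bar,
    ])


alert = format_result("tx_4", 0.926, True)
ok = format_result("tx_6", 0.0, False)
print(alert)
```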

📦 Installation

1. Clone Repository

git clone https://github.com/khnguyenn/FraudDetection
cd FraudDetection

2. Install Python Dependencies

pip install -r requirements.txt

3. Start RabbitMQ Infrastructure

docker run -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management

4. Verify RabbitMQ is Running

# Check containers
docker ps
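
Besides checking the container list, reachability of the two published ports can be probed from Python. This is an optional sketch using only the standard library; the helper name is hypothetical, and a successful TCP connection shows only that something is listening, not that RabbitMQ is fully started.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the docker run command above.
    for name, port in (("AMQP", 5672), ("management UI", 15672)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} port {port}: {state}")
```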

🚀 Running the System

Quick Start (3 Terminal Setup)

Terminal 1: Start Results Viewer

cd src
python results_viewer.py

Output: 🎯 Listening for results on 'fraud_results_queue'

Terminal 2: Start ML Consumer

cd src
python consumer.py

Output: 🎯 Starting fraud detection consumer...

Terminal 3: Send Transaction Data

cd src
python producer.py

Output: Sent transaction tx_1, tx_2, tx_3...

Expected Results Flow

Raw data (split into single rows) → Producer → Queue → Consumer (feature engineering, preprocessing, ML model) → Results Queue → Viewer

tx_1: → Feature Engineering → Model Prediction → 85% fraud → 🚨 FRAUD DETECTED
tx_2: → Feature Engineering → Model Prediction → 12% fraud → ✅ LEGITIMATE

Use Real Fraud Cases

SAMPLE_DATA_CSV = "../data/new_applications.csv"
# In producer.py, change this path to the data file you want to send

Expected Output with Real Fraud

============================================================
🔍 FRAUD DETECTION RESULT
============================================================
Transaction ID: tx_6
✅ STATUS: LEGITIMATE TRANSACTION
🎯 Fraud Risk: 0.0%
============================================================

============================================================
🔍 FRAUD DETECTION RESULT
============================================================
Transaction ID: tx_4
🚨 STATUS: FRAUD DETECTED
🎯 Confidence: 92.6%
============================================================

Application Logs

# Consumer logs
tail -f consumer.log

# Producer logs
tail -f producer.log
