This project implements a fraud detection system that combines several machine learning models with engineered features and a stacked ensemble.
- Data preprocessing and cleaning
- Advanced feature engineering
- Multiple ML models (Random Forest, XGBoost, Logistic Regression, Isolation Forest)
- Ensemble model with meta learner
- Comprehensive evaluation metrics
- Class imbalance handling
- Clone the Repository
    git clone <repository-url>
    cd <repository-directory>
- Install Dependencies
    pip install -r requirements.txt
- Run the Fraud Detection Analysis
  Use the run script:
    python run_fraud_detection.py
  Or, run with a custom sample size:
    python run_with_sample.py <sample_size>
- Random Forest: Accuracy 100.00%, F1-Score 99.91%, ROC-AUC 99.95%
- XGBoost: Accuracy 100.00%, F1-Score 99.85%, ROC-AUC 99.98%
- Logistic Regression: Accuracy 99.99%, F1-Score 99.60%, ROC-AUC 99.98% (Best Performer)
- Meta Ensemble: Accuracy 100.00%, F1-Score 99.85%, ROC-AUC 99.91%
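Fraud is usually a small share of all transactions, so accuracy alone says little, which is presumably why F1-Score and ROC-AUC are reported alongside it. The sketch below uses plain Python and made-up labels and scores (not the project's data) to show how the three metrics are computed, and how a model that never flags fraud still reaches high accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fraud) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Chance that a random fraud case outscores a random legitimate one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: 2 fraud cases among 100 transactions.
y_true = [1, 1] + [0] * 98
always_legit = [0] * 100                    # baseline that never flags fraud
scores = [0.9, 0.4] + [0.1] * 97 + [0.6]    # hypothetical model scores
y_pred = [int(s >= 0.5) for s in scores]    # threshold at 0.5

print("baseline:", accuracy(y_true, always_legit), f1_score(y_true, always_legit))
print("model:   ", accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

Both the baseline and the hypothetical model reach 0.98 accuracy here, but the baseline's F1 is 0.0 while the model's is 0.5, so the accuracy column is the least informative of the three.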
Top features by importance:
- amount_equals_old_balance (28.54%)
- step (25.51%)
- balance_diff_orig (11.74%)
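The feature names above hint at how such features might be derived. The following is a minimal sketch, assuming each transaction carries `amount`, `old_balance_orig`, and `new_balance_orig` fields. Those field names, the helper `engineer_features`, and the exact formulas are assumptions for illustration and are not taken from the project's code.

```python
import math


def engineer_features(tx):
    """Return a copy of a transaction dict with two derived features added.

    The input field names (amount, old_balance_orig, new_balance_orig)
    are hypothetical; adapt them to the actual dataset columns.
    """
    features = dict(tx)
    # 1 if the transferred amount equals the origin account's prior balance.
    features["amount_equals_old_balance"] = int(
        math.isclose(tx["amount"], tx["old_balance_orig"])
    )
    # Gap between the expected and the recorded post-transaction balance.
    features["balance_diff_orig"] = (
        tx["old_balance_orig"] - tx["amount"] - tx["new_balance_orig"]
    )
    return features


# Made-up transaction that drains the origin account completely.
tx = {"step": 1, "amount": 181.0, "old_balance_orig": 181.0, "new_balance_orig": 0.0}
print(engineer_features(tx))
```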
- Evaluation plots and feature importance plots have been saved:
  - fraud_detection_evaluation.png
  - feature_importance.png
The meta learner is a Gradient Boosting Classifier that refines the ensemble's predictions. It is trained on the predictions of the ensemble's models and outputs the final probability score.
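One common way to build a meta learner of this kind is stacking. The sketch below uses scikit-learn on synthetic data. The choice of base models, the use of out-of-fold probabilities, and all hyperparameters are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic, imbalanced stand-in for transaction data (roughly 5% positives).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Assumed base models; the project may use a different set.
base_models = [
    RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
]

# Out-of-fold probabilities: each training row is scored by a model
# that did not see that row during fitting, which limits leakage.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta learner maps base-model probabilities to a final probability.
meta_learner = GradientBoostingClassifier(random_state=0)
meta_learner.fit(meta_train, y_train)

# For new data, refit each base model on the full training set and
# pass its probabilities to the meta learner.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
final_proba = meta_learner.predict_proba(meta_test)[:, 1]
print(final_proba[:5])
```

scikit-learn's `StackingClassifier` wraps the same pattern in a single estimator; the manual version is shown here only to make the two stages visible.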