Detect phishing emails in real time using machine learning models trained on six merged datasets and deployed via a full-stack FastAPI app.
🔗 Live Demo: https://phishingdetection.net
- What It Does
- Tech Stack
- ML Models
- Sample Input
- Future Improvements
- Project Structure
- Acknowledgements
- Author
This project lets users paste real email content (sender, subject, body, etc.) and choose one of three machine learning models to classify it as Phishing or Legitimate.
It combines:
- 📊 Natural Language Processing (TF-IDF + NLTK)
- 🤖 ML models (Naive Bayes, Logistic Regression, SVM)
- 🌍 Live API deployment + responsive UI
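As a rough illustration of how the TF-IDF features and the three classifiers could fit together, here is a minimal scikit-learn sketch. It is not the project's training code: the toy emails, the `train_models` and `classify` helpers, the choice of `LinearSVC` for the SVM, and the use of scikit-learn's built-in English stop words in place of NLTK preprocessing are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy examples for illustration only; not taken from the project's dataset.
TEXTS = [
    "Click the link to claim your free prize now",
    "Verify your account password immediately to avoid suspension",
    "You won a free gift card, click here",
    "Meeting moved to 3pm, see updated agenda attached",
    "Here are the quarterly report figures you asked for",
    "Lunch on Friday? Let me know what time works",
]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate


def train_models(texts, labels):
    """Fit one TF-IDF + classifier pipeline per model (hypothetical helper)."""
    classifiers = {
        "naive_bayes": MultinomialNB(),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "svm": LinearSVC(),  # assumption: a linear SVM
    }
    models = {}
    for name, clf in classifiers.items():
        # Each pipeline owns its vectorizer here; the repository instead
        # stores a single shared tfidf_vectorizer.joblib.
        pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
        models[name] = pipeline.fit(texts, labels)
    return models


def classify(models, model_name, email_text):
    """Return the label string for one email using the chosen model."""
    prediction = models[model_name].predict([email_text])[0]
    return "Phishing" if prediction == 1 else "Legitimate"


if __name__ == "__main__":
    trained = train_models(TEXTS, LABELS)
    for name in trained:
        print(name, classify(trained, name, "Claim your free iPhone now"))
```

Wrapping the vectorizer and classifier in a single pipeline keeps the example short; a real deployment would more likely load the fitted vectorizer and models from disk with joblib.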
| Layer | Tech |
|---|---|
| Frontend | HTML, CSS, JavaScript |
| Backend | FastAPI, Uvicorn |
| ML/NLP | Scikit-learn, NLTK, joblib |
| Deployment | Render, Namecheap |
| Hosting Models | Hugging Face 🤗 |
Choose between:
- ✅ Support Vector Machine (Best Accuracy)
- ✅ Logistic Regression
- ✅ Multinomial Naive Bayes
Sender: freeiphone@gmail.com
Subject: Don't miss this chance!
Body: Click the link to claim your free iPhone 14 Pro Max.
Date: May 5, 2025
Model: SVM
✔️ Output: Phishing
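The sample above can be represented in code roughly as follows. This is a sketch under stated assumptions: the `EmailInput` class, the `MODEL_FILES` keys, and the way the fields are joined into one string are hypothetical and may differ from what `api/pipeline.py` actually does. Only the model file names come from the project structure.

```python
from dataclasses import dataclass

# File names taken from the models/ directory; the dictionary keys are assumed.
MODEL_FILES = {
    "SVM": "models/svm_model.joblib",
    "Logistic Regression": "models/logistic_regression_model.joblib",
    "Naive Bayes": "models/naive_bayes_model.joblib",
}


@dataclass
class EmailInput:
    """Hypothetical container for the fields a user pastes into the form."""
    sender: str
    subject: str
    body: str
    date: str
    model: str = "SVM"


def to_model_text(email: EmailInput) -> str:
    """Join the text fields into one string for vectorization (assumed order)."""
    return f"{email.sender} {email.subject} {email.body}"


def model_path(email: EmailInput) -> str:
    """Look up the model file for the chosen model, rejecting unknown names."""
    if email.model not in MODEL_FILES:
        raise ValueError(f"Unknown model: {email.model!r}")
    return MODEL_FILES[email.model]


sample = EmailInput(
    sender="freeiphone@gmail.com",
    subject="Don't miss this chance!",
    body="Click the link to claim your free iPhone 14 Pro Max.",
    date="May 5, 2025",
    model="SVM",
)
print(to_model_text(sample))
print(model_path(sample))
```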
- 📡 Public API & Documentation
  Provide a proper REST API endpoint with OpenAPI/Swagger documentation so developers can integrate the phishing detection system into their own applications.
- 🎨 Improve the UI
  Rebuild the frontend using a modern framework like React (possibly with Tailwind or Material UI) to create a more interactive and responsive user experience.
- 🔗 URL-Based Model
  Train and integrate a secondary model focused specifically on analyzing URLs for phishing characteristics such as domain structure, length, obfuscation, and suspicious keywords.
- 📈 Expand the Dataset
  Enhance the model's performance by collecting a larger and more diverse dataset of phishing and legitimate emails, improving generalization and reducing bias.
- 🧠 Improve Model Explainability
  Integrate explainable AI tools like SHAP or LIME to provide transparency into why the model classified an email as phishing or legitimate.
- 📬 Real-Time Email API Integration (Optional)
  Integrate with email providers like Gmail or Microsoft Outlook via API to allow live scanning of user inboxes (with permission) and flag suspicious messages in real time.
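To make the URL-based model idea more concrete, the sketch below shows one way the characteristics mentioned above (domain structure, length, obfuscation, suspicious keywords) might be turned into numeric features. Nothing here exists in the project yet: the `url_features` function, the feature names, and the keyword list are hypothetical choices for illustration.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed keyword list; a real model would derive this from labeled data.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update", "free")


def url_features(url: str) -> dict:
    """Extract simple hand-crafted features from a URL (hypothetical helper)."""
    host = urlparse(url).hostname or ""

    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    lowered = url.lower()
    return {
        "url_length": len(url),
        "dots_in_host": host.count("."),       # rough proxy for subdomain depth
        "hyphens_in_host": host.count("-"),
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,           # can hide the real destination
        "percent_signs": url.count("%"),       # percent-encoding as obfuscation
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


if __name__ == "__main__":
    print(url_features("https://secure-login.example.com/account"))
```

Features like these could be fed to any of the existing classifiers, though whether they improve detection would need to be measured on real data.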
EmailPhishingDetection/
├── api/
│ ├── main.py
│ └── pipeline.py
├── data/
│ └── phishing_email.csv
├── frontend/
│ └── static/
│     └── index.html
├── models/
│ ├── logistic_regression_model.joblib
│ ├── naive_bayes_model.joblib
│ ├── svm_model.joblib
│ └── tfidf_vectorizer.joblib
├── notebooks/
│ └── 01_training.ipynb
├── images/
├── .gitignore
├── LICENSE.md
├── README.md
├── render.yml
└── requirements.txt
This project uses the Phishing Email Dataset by Naser Abdullah Alam on Kaggle.
Please cite the following article if using this dataset:
Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19).
Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection.
ArXiv: https://arxiv.org/abs/2405.11619