Magenta91/Fraud_Classified_model
# Cyberfraud Classifier 🔒🤖

This project presents a Cyberfraud Classifier that detects fraudulent activity in online text using Natural Language Processing (NLP) and deep learning. It was developed as part of an ML project for SkillAI India to showcase the application of language models in cybersecurity.

## 🚀 Project Overview

Cyberfraud detection is a critical task in today's digital landscape, where scams, phishing, and other malicious activities are prevalent. Our goal was to build a model that analyzes text-based input, accurately determines whether it is fraudulent, and assigns it the correct label.

## 🧠 Model Used

We used BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art transformer-based model developed by Google. BERT understands the context of words in both directions, making it particularly effective at analyzing the complex and often deceptive patterns of fraudulent text.

### Why BERT?

- **Contextual understanding:** Unlike traditional models that read text in a single direction, BERT reads bidirectionally, allowing a deeper understanding of the subtle cues in fraud-related messages. This matters here because the model is designed to work on Hinglish text, which is prevalent in the subcontinent.
- **Pre-trained knowledge:** BERT is pre-trained on a massive corpus (Wikipedia + BooksCorpus), which helps it recognize general patterns and context even with a relatively small domain-specific dataset.
- **Transfer learning:** Fine-tuning BERT on our dataset gave better generalization and performance than traditional ML models and basic deep learning architectures such as RNNs or LSTMs.

## 📊 Performance

The model achieved 85% accuracy on the test set, demonstrating that it can effectively distinguish fraudulent text from legitimate communication.

## ⚠️ Limitations

Due to the large size of the fine-tuned BERT model, we were unable to upload the model files directly to the repository.
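To illustrate what fine-tuning adds on top of the encoder, the sketch below shows the classification head BERT-style models use: a single linear layer over the pooled `[CLS]` embedding followed by a softmax. This is a minimal illustration, not the notebook's code; the random vector stands in for a real BERT pooled output, and the two-label setup (fraud vs. legitimate) is an assumption about the label set.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LABELS = 2     # e.g. fraud vs. legitimate (label set is an assumption)
HIDDEN_SIZE = 768  # pooled-output dimension of BERT-base

# Stand-in for the pooled [CLS] embedding the fine-tuned encoder would
# produce for one message; here it is just a random vector.
pooled = rng.standard_normal(HIDDEN_SIZE)

# Classification head: one linear layer, as in standard BERT fine-tuning.
W = rng.standard_normal((NUM_LABELS, HIDDEN_SIZE)) * 0.02
b = np.zeros(NUM_LABELS)

logits = W @ pooled + b

# Softmax over the logits gives a probability per label.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

label = int(np.argmax(probs))
print(label, probs.round(3))
```

During fine-tuning, both this head and the encoder weights are updated by backpropagating a cross-entropy loss on the labeled messages.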
However, the code to load, train, and evaluate the model is fully included and documented for reproducibility.

## 📂 Contents

- `cyberfraud_classifier.ipynb`: Jupyter notebook containing the full data processing, model training, and evaluation pipeline.
- `requirements.txt`: All dependencies needed to run the notebook.
- `dataset/`: Reference to the data used (upload instructions or a sample, if applicable).

## 🛠️ How to Run

1. Clone the repo.
2. Install dependencies with `pip install -r requirements.txt`.
3. Run the notebook in a GPU-enabled environment for optimal performance.
4. Download a pre-trained BERT model via Hugging Face or TensorFlow Hub and fine-tune it using the notebook.
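The download-and-fine-tune step can be sketched with Hugging Face Transformers as below. The checkpoint name (`bert-base-multilingual-cased`, chosen because multilingual checkpoints tend to handle Hinglish better), the two-label setup, and the sample message are all assumptions for illustration; use the checkpoint and labels from the notebook.

```python
# Hedged sketch: loading a pre-trained BERT checkpoint for sequence
# classification. The checkpoint name and label count are assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

# A hypothetical Hinglish message of the kind the classifier targets.
inputs = tokenizer("Aapka account block ho gaya hai, OTP bhejo",
                   return_tensors="pt", truncation=True, padding=True)
outputs = model(**inputs)
print(outputs.logits.shape)  # one row of logits, one column per label
```

The freshly added classification head is randomly initialized, so the logits are meaningless until the model is fine-tuned on the labeled dataset via the notebook's training loop.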