Enterprise-grade malicious URL detection system powered by Hybrid AI Architecture (Bloom Filters + Machine Learning).
CyberSentinel is a high-performance security microservice designed to detect phishing, malware, and defacement URLs in real-time. Unlike traditional blacklists that rely solely on slow database lookups, this system employs a multi-layered hybrid architecture:
- Layer 1: Probabilistic Bloom Filters (In-Memory) - Instant rejection of known bad/good URLs (O(k) complexity).
- Layer 2: Database Confirmation - Zero-false-positive verification for flagged entities.
- Layer 3: AI/ML Inference Engine - Real-time analysis of unknown URLs using an ONNX-powered Neural Network/Random Forest model.
This approach ensures sub-millisecond latency for 99% of requests while maintaining the ability to detect zero-day threats that haven't yet been blacklisted.
Note: The dashboard features real-time telemetry and a "Cyberpunk" aesthetic.
- In-Memory Bloom Filters: Uses FNV-1a and MurmurHash2 double hashing to store more than 650,000 signatures in a compact bit array.
- ONNX Runtime Integration: Runs a pre-trained machine learning model directly in Node.js to classify unknown URLs based on lexical features.
- MongoDB Persistence: Serializes Bloom Filter state to disk, allowing fast re-hydration on server restart.
- Microsecond Latency: Bloom filter checks take ~0.05 ms (about 50 µs).
- LRU Caching: Frequently accessed results are cached in memory.
- Express Rate Limiting: Protects the API from DDoS and abuse.
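To make the Bloom filter layer concrete, here is a minimal sketch of a bit-array filter that uses double hashing. It is not the project's implementation: the names `BloomFilter`, `fnv1a`, and `djb2` are illustrative, and `djb2` stands in for MurmurHash2 as the second hash purely to keep the example short.

```javascript
// Minimal Bloom filter sketch. All names here are illustrative assumptions,
// not taken from the CyberSentinel source.

// 32-bit FNV-1a over the UTF-8 bytes of a string.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(str)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return hash >>> 0;
}

// djb2, used here only as a stand-in for MurmurHash2.
function djb2(str) {
  let hash = 5381;
  for (const byte of new TextEncoder().encode(str)) {
    hash = (Math.imul(hash, 33) + byte) | 0;
  }
  return hash >>> 0;
}

class BloomFilter {
  constructor(bitCount, hashCount) {
    this.bitCount = bitCount;   // m: number of bits in the array
    this.hashCount = hashCount; // k: number of probes per item
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // Double hashing: index_i = (h1 + i * h2) mod m, for i = 0..k-1.
  *indexes(item) {
    const h1 = fnv1a(item);
    const h2 = (djb2(item) | 1) >>> 0; // force odd so the step is never zero
    for (let i = 0; i < this.hashCount; i++) {
      yield (h1 + i * h2) % this.bitCount;
    }
  }

  add(item) {
    for (const idx of this.indexes(item)) {
      this.bits[idx >> 3] |= 1 << (idx & 7);
    }
  }

  // false means "definitely absent"; true means "possibly present".
  has(item) {
    for (const idx of this.indexes(item)) {
      if ((this.bits[idx >> 3] & (1 << (idx & 7))) === 0) return false;
    }
    return true;
  }
}

const filter = new BloomFilter(1 << 20, 7);
filter.add("http://suspicious-bank-login.com");
console.log(filter.has("http://suspicious-bank-login.com")); // true
```

Each lookup costs k hash probes regardless of how many signatures are stored, which is where the O(k) figure comes from. As a sizing estimate under an assumed 1% false-positive target, the standard formulas give roughly 6.2 million bits (under 1 MB) and 7 probes for 650,000 entries.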
The system extracts 14 lexical features from every URL for the AI model:
- URL length and special-character counts (`@`, `//`, `?`, etc.)
- Suspicious keyword presence (e.g., `login`, `verify`, `paypal`)
- IP check, HTTPS validity, and hex-encoding detection.
- Entropy and repetition analysis.
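The feature categories above could be computed along the following lines. This is a sketch of a subset only: the feature names, the keyword list, and the exact definitions are assumptions, not the 14 features the project's model was trained on, and the HTTPS check here only inspects the scheme.

```javascript
// Illustrative subset of lexical URL features. The names and definitions are
// assumptions for this sketch, not the project's actual 14-feature set.
const SUSPICIOUS_KEYWORDS = ["login", "verify", "paypal"];

// Shannon entropy in bits per character.
function shannonEntropy(text) {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;
  const counts = new Map();
  for (const ch of chars) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

function extractFeatures(url) {
  const lower = url.toLowerCase();
  // Strip the scheme, then keep everything up to the first "/", "?" or "#".
  const host = lower.replace(/^[a-z][a-z0-9+.-]*:\/\//, "").split(/[/?#]/)[0];
  const count = (pattern) => (url.match(pattern) || []).length;

  return {
    length: url.length,
    atCount: count(/@/g),
    doubleSlashCount: count(/\/\//g),
    questionMarkCount: count(/\?/g),
    hasSuspiciousKeyword: SUSPICIOUS_KEYWORDS.some((k) => lower.includes(k)) ? 1 : 0,
    hostIsIpv4: /^(\d{1,3}\.){3}\d{1,3}(:\d+)?$/.test(host) ? 1 : 0,
    usesHttps: lower.startsWith("https://") ? 1 : 0, // scheme check only
    hasHexEncoding: /%[0-9a-f]{2}/i.test(url) ? 1 : 0,
    entropy: shannonEntropy(url),
  };
}

console.log(extractFeatures("http://suspicious-bank-login.com"));
```

Encoding booleans as 0/1 keeps every feature numeric, so the object's values can be copied into a flat float array in a fixed order before being handed to a model.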
- Built with EJS and TailwindCSS.
- Features a "Glassmorphism" design with neon accents.
- Real-time Telemetry: Visualizes server-side (Bloom/ML) vs client-side network latency.
graph TD
A[Client Request] --> B{"LRU Cache?"}
B -- Yes --> C[Return Cached Result]
B -- No --> D{"Bloom Filter (Malicious)?"}
D -- Yes --> E["Check MongoDB (Verify Type)"]
E --> F[Return Malicious/Type]
D -- No --> G{"Bloom Filter (Benign)?"}
G -- Yes --> H[Return Safe]
G -- No --> I[Run ONNX AI Model]
I --> J[Feature Extraction]
J --> K["Inference (Phishing/Malware/Defacement)"]
K --> L[Return ML Prediction]
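
The flow above can be sketched as a single lookup function with each layer injected as a dependency. Everything named here (`LruCache`, `checkUrl`, and the `deps` fields) is hypothetical. The sketch also assumes that a Bloom filter hit with no matching database record is treated as a false positive and falls through to the remaining layers, which the diagram does not spell out.

```javascript
// Tiny LRU cache built on Map insertion order (hypothetical helper).
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      this.map.delete(this.map.keys().next().value); // evict least recently used
    }
  }
}

// deps (all hypothetical):
//   cache            object with get(url) / set(url, verdict)
//   maliciousFilter  object with has(url), e.g. a Bloom filter
//   benignFilter     object with has(url)
//   lookupType       async (url) => threat type string, or null if not in the database
//   classify         async (url) => ML prediction string
async function checkUrl(url, deps) {
  const cached = deps.cache.get(url);
  if (cached !== undefined) return cached;

  let verdict = null;
  if (deps.maliciousFilter.has(url)) {
    // The filter can report false positives, so confirm against the database.
    verdict = await deps.lookupType(url);
  }
  if (verdict == null) {
    // Assumption: an unconfirmed hit continues through the remaining layers.
    verdict = deps.benignFilter.has(url) ? "safe" : await deps.classify(url);
  }

  deps.cache.set(url, verdict);
  return verdict;
}
```

Because the filters only need a `has` method, a plain `Set` can stand in for them when exercising the function in tests.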
- Node.js (v18+ recommended for ONNX)
- MongoDB (Running locally or Atlas URI)
- Clone the repository

      git clone https://github.com/yourusername/cybersentinel.git
      cd cybersentinel

- Install dependencies

      npm install

- Configure environment: create a `.env` file in the root:

      PORT=3000
      MONGO_URI=mongodb://localhost:27017/urldetection

- Download/verify model: ensure `final_url_model.onnx` is present in the root directory.

- Start the server

      npm run start
      # OR for dev
      node index.js

The main endpoint using the full Hybrid Engine.
Request:
GET http://localhost:3000/check?url=http://suspicious-bank-login.com
Response:
{
"message": "phishing",
"responseTime": "12.45 ms"
}

Possible messages: safe, benign, phishing, malware, defacement.
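
A client call to this endpoint might look like the sketch below. The helper names `buildCheckUrl` and `queryCheckEndpoint` are made up for illustration, and the global `fetch` assumes Node.js 18 or later. The target URL is percent-encoded so that any `?` or `&` it contains is not parsed as part of the API's own query string.

```javascript
// Builds the request URL for the /check endpoint (hypothetical helper).
// URLSearchParams percent-encodes the target URL for us.
function buildCheckUrl(baseUrl, targetUrl) {
  const endpoint = new URL("/check", baseUrl);
  endpoint.searchParams.set("url", targetUrl);
  return endpoint.toString();
}

// Calls the endpoint and returns the parsed JSON body,
// expected to have the { message, responseTime } shape shown above.
async function queryCheckEndpoint(baseUrl, targetUrl) {
  const response = await fetch(buildCheckUrl(baseUrl, targetUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage, assuming the server is running locally on port 3000:
// queryCheckEndpoint("http://localhost:3000", "http://suspicious-bank-login.com")
//   .then((result) => console.log(result.message, result.responseTime));
```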
Bypasses caching and Bloom Filters to query the database directly. Used for performance comparison.
Response:
{
"message": "phishing",
"responseTime": "150.20 ms"
}

| Component | Tech | Usage |
|---|---|---|
| Runtime | Node.js | Core Execution Environment |
| Framework | Express v5 | API Routing & Middleware |
| Database | MongoDB | Persistent Storage for Signatures |
| AI Engine | ONNX Runtime | Running ML Models in Node |
| Algorithms | Bloom Filter | Probabilistic Data Structure |
| Hashing | FNV-1a, Murmur2 | Fast Non-Crypto Hashing |
| Frontend | EJS, Tailwind | Interactive Dashboard |
The application is deployment-ready for platforms like Render.
Note: Ensure the hosting environment supports onnxruntime-node binary dependencies.