A specialized framework designed to handle extreme class imbalance by treating minority classes as anomalies. This project demonstrates high-precision detection across network security, manufacturing quality control, and medical imaging domains using machine learning methods.
- Overview
- Class Imbalance Strategy
- Case Study: Isolation Forest
- Features
- System Requirements
- Dataset setup & Model Artifact Generation
- Installation
- Quick Start
- API Documentation
- Datasets
- Architecture
- Usage Examples
- Project Structure
- Performance Metrics
- Contributing
- License
- Author
This project addresses the imbalanced-data problem in machine learning. In real-world scenarios such as network intrusion detection or medical diagnosis, the anomaly is often extremely rare. This API uses ensemble anomaly detection to find those rare events without needing a balanced training set. It is a production anomaly detection API built with Flask and PyTorch, designed for integration into industrial inspection, medical X-ray analysis for early disease detection, and cybersecurity monitoring systems.
| Technique | Implementation | Benefit |
|---|---|---|
| One-Class SVM | Learns the boundary of "Normal" data. | Ignores the lack of minority samples. |
| Autoencoders | Reconstructs input; high error = anomaly. | Self-supervised; no labels required. |
| Isolation Forest | Isolates points in a tree structure. | Efficiently finds outliers in large data. |
The Theory: Anomalies are "few and different." In a tree-based structure, they are isolated much faster (shorter path) than normal points (longer path). This allows the API to detect attacks even if they were never seen during training.
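The "shorter path" idea can be illustrated without any library. The following is a minimal sketch in pure Python on one-dimensional toy data, not the project's implementation (which uses a trained Isolation Forest model); the helper names `isolation_depth` and `mean_depth` and all data values are assumptions made for illustration.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Number of random splits needed to separate `point` from the rest of `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the values that fall on the same side of the split as the point.
    same_side = [x for x in data if (x < split) == (point < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many independent random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Toy data: 256 "normal" values around 0 plus one value far away.
rng = random.Random(42)
normal = [rng.gauss(0.0, 1.0) for _ in range(256)]
data = normal + [12.0]

typical_point = sorted(normal)[len(normal) // 2]  # the median, deep inside the cluster
outlier_depth = mean_depth(12.0, data)
inlier_depth = mean_depth(typical_point, data)
print(f"outlier: {outlier_depth:.2f} splits, typical point: {inlier_depth:.2f} splits")
```

The outlier is usually cut off by the first or second split, while the median needs many more. A real Isolation Forest turns this average depth into a normalized anomaly score.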
- Multi-domain support - 3 independent detection modules
- Ensemble learning - Combines 4-7 models per domain using majority voting
- RESTful API - Clean Flask-based API with JSON responses
- Web interface - User-friendly HTML/CSS/JS frontend
- Real-time processing - Fast inference with processing time tracking
- Comprehensive results - Individual model outputs + ensemble predictions
- Deep Learning - ResNet34 for feature extraction, Autoencoders for anomaly detection
- Classical ML - Isolation Forest, One-Class SVM, Elliptic Envelope, LOF
- High Accuracy - 92-99% across different domains
- Detailed Metrics - Confidence scores, processing times, model agreement
- Secure uploads - File validation, size limits, automatic cleanup
- Logging - Comprehensive logging for debugging and monitoring
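The feature list does not say which checks the upload validation performs. As a rough sketch of what file validation with a size limit could look like, assuming a hypothetical extension whitelist and a 10 MiB cap (neither value is taken from this project):

```python
import os
from typing import Tuple

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}  # assumed whitelist
MAX_UPLOAD_BYTES = 10 * 1024 * 1024             # assumed 10 MiB limit

def validate_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Return (accepted, reason) for an uploaded file's name and size."""
    # Reject empty names and anything that carries a directory component.
    if filename in ("", ".", "..") or "/" in filename or "\\" in filename:
        return False, "invalid filename"
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, "unsupported file type"
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        return False, "file size out of range"
    return True, "ok"

print(validate_upload("chest_xray.png", 2048))
print(validate_upload("../secrets.png", 2048))
print(validate_upload("notes.txt", 2048))
```

Checking the name and size before the file reaches a model keeps malformed uploads out of the inference path.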
- OS: Linux, macOS, Windows 10+
- Python: 3.8 or higher
- RAM: 8GB (16GB recommended for X-ray module)
- Storage: 5GB for models and dependencies
- GPU: Optional; a CUDA-compatible GPU speeds up inference (PyTorch+cu117 build)
- Note: CUDA acceleration is used consistently across all pipelines (UNSW-NB15, MVTec AD, and NIH ChestXray14), particularly for CNN-based feature extraction and autoencoder inference. Device selection is handled dynamically (cuda if available, otherwise cpu), ensuring both efficiency and portability across systems.
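The dynamic device selection described in the note can be sketched as follows. The function name `select_device` is an assumption, and the fallback for a missing PyTorch install is added only so the snippet runs anywhere.

```python
def select_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to accelerate.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = select_device()
print(f"Running inference on: {device}")
```

In a pipeline, the returned string would typically be passed to `model.to(device)` and to each input tensor before inference.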
flask>=2.0.0
torch>=1.9.0
torchvision>=0.10.0
scikit-learn>=1.0.0
numpy>=1.21.0
pandas>=1.3.0
pillow>=8.3.0
scipy>=1.7.0
To generate the trained models and serialized artifacts (.pkl files) used by the anomaly detection API, all Python notebooks in the companion repository "MEng - Techniques to overcome class imbalance using anomaly/defect detection" must be executed end-to-end after the datasets have been set up in the correct directory structure.
- Download Datasets
  Download the following datasets from their official sources:
  - UNSW-NB15: Network intrusion detection dataset
  - MVTec AD: Industrial defect detection dataset
  - NIH ChestXray14: Medical imaging anomaly detection dataset
- Execute All Notebooks
  Each notebook must be run from top to bottom to:
  - Preprocess datasets
  - Train anomaly detection models
  - Calibrate decision thresholds and anomaly scores
  - Serialize trained models and preprocessing pipelines
This process generates .pkl files (e.g., trained models, PCA objects, scalers, encoders), which are saved locally and later loaded by the API for inference.
Examples of generated artifacts include:
- Isolation Forest models
- One-Class SVM models
- PCA transformers
- Feature scalers and encoders
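The save-then-load round trip behind these artifacts can be sketched with the standard `pickle` module. The file name `scaler.pkl`, the helper names, and the dictionary standing in for a fitted scaler are all hypothetical; the notebooks would pickle the fitted estimator objects themselves.

```python
import pickle
import tempfile
from pathlib import Path

def save_artifact(obj, path):
    """Serialize `obj` to `path`, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)

def load_artifact(path):
    """Load a previously pickled object (only use on files you trust)."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)

# Stand-in for a fitted scaler produced by a notebook.
scaler_params = {"mean": [0.5, 12.0], "scale": [0.1, 3.0]}

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "models" / "network" / "scaler.pkl"
    save_artifact(scaler_params, artifact_path)
    restored = load_artifact(artifact_path)

print(restored == scaler_params)
```

Unpickling executes code embedded in the file, so artifacts should only be loaded from sources you control. Pickled scikit-learn objects are also best loaded with the same library version that created them.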
git clone https://github.com/tanishq14
cd anomaly-detection-api
python -m venv anom_det
# Linux/macOS
source anom_det/bin/activate
# Windows
anom_det\Scripts\activate
pip install -r requirements.txt
Place your trained models in the models/ directory:
- models/network/*.pkl
- models/mvtec/*.pkl, *.pt
- models/xray/*.pkl, *.pt
python check_system.py
If all checks pass, you're ready to go!
python app.py
The API will be available at: http://localhost:5000
curl -X POST http://localhost:5000/api/predict/network \
  -H "Content-Type: application/json" \
  -d '{
    "dur": 0.5,
    "proto": "tcp",
    "service": "http",
    "state": "FIN",
    "spkts": 12,
    "dpkts": 10,
    "sbytes": 800,
    "dbytes": 15000,
    "rate": 40.0
  }'

curl -X POST http://localhost:5000/api/predict/mvtec \
  -F "file=@product_image.png"

curl -X POST http://localhost:5000/api/predict/xray \
  -F "file=@chest_xray.png"
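The network request can also be built from Python using only the standard library. This is a sketch that mirrors the curl call above. The helper name `build_network_request` is an assumption, and the actual send is left commented out because it needs a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"

def build_network_request(flow: dict) -> urllib.request.Request:
    """Build a JSON POST request for the network prediction endpoint."""
    body = json.dumps(flow).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/predict/network",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

flow = {
    "dur": 0.5, "proto": "tcp", "service": "http", "state": "FIN",
    "spkts": 12, "dpkts": 10, "sbytes": 800, "dbytes": 15000, "rate": 40.0,
}
request = build_network_request(flow)
print(request.get_method(), request.full_url)

# With the API running locally, the request could be sent like this:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.loads(response.read())
```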
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Homepage |
| GET | /network | Network detection UI |
| GET | /mvtec | MVTec inspection UI |
| GET | /xray | X-ray analysis UI |
| POST | /api/predict/network | Network intrusion detection |
| POST | /api/predict/mvtec | Product quality inspection |
| POST | /api/predict/xray | Chest X-ray analysis |
| GET | /api/health | Health check |
| GET | /api/models/info | Model information |
{
"success": true,
"timestamp": "2025-12-26T15:13:00.000000",
"api_version": "2.0",
"data": {
"ensemble": {
"prediction": "Normal",
"confidence": 95.5,
"votes": { "Normal": 3, "Anomaly": 1 }
},
"models": { ... },
"processing_time": "0.234s"
}
}
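A client might reduce a response of this shape to the fields it needs. The sketch below parses the example above; the `summarize` helper and its `agreement` field are illustrative, and the per-model outputs are left empty because the README elides them.

```python
import json

raw = """
{
  "success": true,
  "timestamp": "2025-12-26T15:13:00.000000",
  "api_version": "2.0",
  "data": {
    "ensemble": {
      "prediction": "Normal",
      "confidence": 95.5,
      "votes": { "Normal": 3, "Anomaly": 1 }
    },
    "models": {},
    "processing_time": "0.234s"
  }
}
"""

def summarize(response: dict) -> dict:
    """Extract the ensemble verdict and the share of models that agreed with it."""
    if not response.get("success"):
        raise ValueError("API reported a failed request")
    ensemble = response["data"]["ensemble"]
    votes = ensemble["votes"]
    total = sum(votes.values())
    agreement = votes.get(ensemble["prediction"], 0) / total if total else 0.0
    return {
        "prediction": ensemble["prediction"],
        "confidence": ensemble["confidence"],
        "agreement": agreement,
    }

summary = summarize(json.loads(raw))
print(summary)
```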
Network: UNSW-NB15
- Records: 2.5 million network flows
- Features: 44 dimensions
- Attack Types: 9 categories (DoS, Exploits, Reconnaissance, etc.)
- Split: 80% train, 20% test
MVTec: AD
- Images: 5,000+ high-resolution product images
- Categories: 15 product types
- Defects: Cracks, scratches, contamination, missing parts
- Split: Per-category train/test
X-ray: NIH Chest X-ray14
- Images: 112,120 frontal chest X-rays
- Conditions: 14 thoracic pathologies
- Classes: Multi-label classification
- Resolution: 224x224 (preprocessed)
User Input → Flask API → Pipeline Module → Ensemble Models → Prediction

Stages inside each pipeline: Validation & Preprocessing → Feature Extraction (if image) → Parallel Model Execution → Majority Voting Ensemble → JSON Response + Confidence
Network (4 models):
- Isolation Forest
- One-Class SVM
- Elliptic Envelope
- Local Outlier Factor (LOF)
MVTec (5 components):
- ResNet34 (feature extraction)
- Isolation Forest
- One-Class SVM
- Elliptic Envelope
- LOF
X-ray (7 models):
- Autoencoder (unsupervised)
- Isolation Forest (unsupervised)
- One-Class SVM (unsupervised)
- Elliptic Envelope (unsupervised)
- LOF (unsupervised)
- Decision Tree (supervised)
- K-Nearest Neighbors (supervised)
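Majority voting over the per-model labels can be sketched as follows. The function name, the model keys, and the tie-break rule are assumptions: with an even number of voters (as in the 4-model ensembles) a tie is possible, and this sketch resolves it toward "Anomaly", which may differ from what the project does.

```python
from collections import Counter

def majority_vote(predictions: dict, tie_label: str = "Anomaly") -> dict:
    """Combine per-model labels into one ensemble label by majority vote."""
    if not predictions:
        raise ValueError("at least one model prediction is required")
    votes = Counter(predictions.values())
    ranked = votes.most_common()
    top_label, top_count = ranked[0]
    # Assumed tie-break: when the top two labels draw, prefer the cautious label.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        top_label = tie_label
    return {
        "prediction": top_label,
        "votes": dict(votes),
        "agreement": votes[top_label] / len(predictions),
    }

clear = majority_vote({
    "isolation_forest": "Normal",
    "one_class_svm": "Normal",
    "elliptic_envelope": "Anomaly",
    "lof": "Normal",
})
tied = majority_vote({
    "isolation_forest": "Normal",
    "one_class_svm": "Anomaly",
    "elliptic_envelope": "Anomaly",
    "lof": "Normal",
})
print(clear["prediction"], clear["votes"])
print(tied["prediction"], tied["votes"])
```

Preferring "Anomaly" on a tie trades some false alarms for fewer missed detections, which is often the safer default when the rare class is the costly one.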
from modules import predict_network, predict_mvtec, predict_xray

# Network detection with preset
result = predict_network({'preset': 'normal_web_browsing'})
print(result['ensemble']['prediction'])  # 'Normal' or 'Attack'

# Product quality inspection
result = predict_mvtec('product_image.png')
if result['ensemble']['prediction'] == 'Anomaly':
    print(f"Defect detected! Confidence: {result['ensemble']['confidence']}%")

# Medical X-ray analysis
result = predict_xray('chest_xray.png')
for model, data in result['supervised_models'].items():
    print(f"{model}: {data['prediction']} ({data['confidence']:.1f}%)")

To use the web interface:
- Navigate to http://localhost:5000
- Select a detection module
- Upload image or enter data
- View comprehensive results with visualizations
anomaly-detection-api/
├── README.md                # This file
├── requirements.txt         # Python dependencies
├── app.py                   # Flask application
├── check_system.py          # System diagnostics
├── modules/                 # Pipeline modules
│   ├── __init__.py
│   ├── network_pipeline.py
│   ├── mvtec_pipeline.py
│   └── xray_pipeline.py
├── models/                  # Trained models
│   ├── network/
│   ├── mvtec/
│   └── xray/
├── templates/               # HTML templates
├── static/                  # CSS/JS assets
└── docs/                    # Documentation
| Domain | Imbalance Ratio | Accuracy | Precision | Recall | F1-Score | Models |
|---|---|---|---|---|---|---|
| Network | Extreme | 99.2% | 98.5% | 99.1% | 98.8% | 4 |
| MVTec | High | 95.8% | 94.2% | 96.1% | 95.1% | 4 |
| X-ray | Moderate | 92.3% | 91.8% | 92.7% | 92.2% | 7 |
Metrics calculated on respective test sets using ensemble predictions
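For readers checking the table, F1 is the harmonic mean of precision and recall. The sketch below recomputes the F1 column from the reported precision and recall, then uses a made-up confusion matrix (not project data) to show why accuracy alone says little on imbalanced test sets.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

# Precision and recall (in percent) as reported in the table above.
reported = {"Network": (98.5, 99.1), "MVTec": (94.2, 96.1), "X-ray": (91.8, 92.7)}
for domain, (precision, recall) in reported.items():
    print(domain, round(f1_score(precision, recall), 1))

# Hypothetical test set with 10 anomalies among 1,000 samples.
# A model that always answers "Normal" finds none of them (recall 0)
# yet still scores high accuracy.
baseline_accuracy = accuracy(tp=0, fp=0, fn=10, tn=990)
print(baseline_accuracy)
```

The recomputed F1 values round to the figures in the F1-Score column. The always-"Normal" baseline shows why precision and recall are reported alongside accuracy.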
# Basic check
python check_system.py

# Verbose output
python check_system.py --verbose

# Attempt fixes
python check_system.py --fix

# Run unit tests (if implemented)
pytest tests/
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Tanishq Rahul Shelke
- Masters in Engineering (MEng) - Machine Learning Engineer
- Focus: Anomaly Detection, Ensemble Methods, Deep Learning
- LinkedIn: Tanishq Shelke
- GitHub: Tanishq14
- UNSW-NB15 Dataset: University of New South Wales
- MVTec AD Dataset: MVTec Software GmbH
- NIH Chest X-ray14: National Institutes of Health
- PyTorch Team: For the deep learning framework
- Scikit-learn Team: For machine learning tools
For issues, questions, or suggestions:
- Open an Issue
If you find this project useful, please consider giving it a star!
Last Updated: February 1, 2026