
tanishq14/anomaly_detection_api


🤖 Techniques to Overcome Class Imbalance using Anomaly Detection


A framework for handling extreme class imbalance by treating the minority class as an anomaly. The project demonstrates high-precision detection in three domains (network intrusion detection, manufacturing quality control, and medical imaging) using ensembles of machine learning models.




🎯 Overview

This project addresses the imbalanced-data problem in machine learning. In real-world settings such as network intrusion detection or medical diagnosis, the "anomaly" class is often extremely rare. The API uses ensemble anomaly detection to find these rare events without requiring a balanced training set. It is a production-oriented anomaly detection API built with Flask and PyTorch, designed for integration into industrial inspection, medical X-ray screening for early disease detection, and cybersecurity monitoring systems.


🔧 Class Imbalance Strategy

| Technique | Implementation | Benefit |
|---|---|---|
| One-Class SVM | Learns the boundary of "Normal" data. | Ignores the lack of minority samples. |
| Autoencoders | Reconstructs input; high error = anomaly. | Self-supervised; no labels required. |
| Isolation Forest | Isolates points in a tree structure. | Efficiently finds outliers in large data. |
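The autoencoder rule ("high reconstruction error = anomaly") can be illustrated without PyTorch by swapping in PCA, which behaves like a linear autoencoder: it encodes to a lower-dimensional space and decodes back. This is a stand-in for illustration, not the project's autoencoder; the synthetic data and the 99th-percentile threshold are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic "normal" data: points near the line y = 2x (the only class we train on).
t = rng.uniform(-1.0, 1.0, size=500)
X_normal = np.column_stack([t, 2.0 * t + rng.normal(0.0, 0.05, size=500)])

# PCA with one component acts as a linear autoencoder: encode to 1-D, decode back to 2-D.
pca = PCA(n_components=1).fit(X_normal)


def reconstruction_error(X: np.ndarray) -> np.ndarray:
    """Per-sample mean squared error between the input and its reconstruction."""
    X_hat = pca.inverse_transform(pca.transform(X))
    return np.mean((X - X_hat) ** 2, axis=1)


# Calibrate the decision threshold on normal data only (assumed: 99th percentile).
threshold = np.percentile(reconstruction_error(X_normal), 99)

X_new = np.array([
    [0.5, 1.0],   # follows the normal pattern
    [1.0, -2.0],  # far from the pattern the model learned
])
is_anomaly = reconstruction_error(X_new) > threshold
```

No minority-class samples are used at any point. The threshold comes entirely from the distribution of errors on normal data.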

🔬 Case Study: Isolation Forest

The Theory: Anomalies are "few and different." In a tree-based structure, they are isolated much faster (shorter path) than normal points (longer path). This allows the API to detect attacks even if they were never seen during training.
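Here is a minimal scikit-learn sketch of the "few and different" idea. It trains `IsolationForest` on normal-looking data only and compares a central point with a distant one. In scikit-learn, `score_samples` returns the negative of the anomaly score from the original Isolation Forest paper, so a lower value means a shorter average path length. The synthetic data, point coordinates, and hyperparameters are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))  # "normal" observations only

forest = IsolationForest(n_estimators=200, random_state=42).fit(X_train)

X_test = np.array([
    [0.0, 0.0],  # inside the dense region: needs many random splits to isolate
    [6.0, 6.0],  # far from the data: isolated after only a few splits
])
scores = forest.score_samples(X_test)  # lower score = shorter average path = more anomalous
labels = forest.predict(X_test)        # +1 = inlier, -1 = outlier
```

The distant point was never seen during training and has no label, yet it is flagged. This is the property the text relies on for unseen attack types.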


✨ Features

Core Capabilities

  • Multi-domain support - 3 independent detection modules
  • Ensemble learning - Combines 4-7 models per domain using majority voting
  • RESTful API - Clean Flask-based API with JSON responses
  • Web interface - User-friendly HTML/CSS/JS frontend
  • Real-time processing - Fast inference with processing time tracking
  • Comprehensive results - Individual model outputs + ensemble predictions
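Majority voting over per-model outputs can be sketched as follows. The model names, the "agreement" field (the share of models that agree), and the rule that breaks ties toward "Anomaly" are all assumptions for illustration. The API's own `confidence` field is evidently computed differently, since the sample response below reports 95.5 for a 3-to-1 vote.

```python
from collections import Counter
from typing import Dict


def majority_vote(predictions: Dict[str, str], tie_breaker: str = "Anomaly") -> dict:
    """Combine per-model labels ("Normal"/"Anomaly") into one ensemble decision."""
    votes = Counter(predictions.values())
    top = max(votes.values())
    leaders = [label for label, n in votes.items() if n == top]
    # Assumed policy: on a tie (e.g. 2 vs 2 with four models), prefer the rare class.
    label = tie_breaker if tie_breaker in leaders else leaders[0]
    return {
        "prediction": label,
        "agreement": round(100.0 * top / len(predictions), 1),  # % of models that agree
        "votes": dict(votes),
    }


# Hypothetical outputs from the four network-domain models.
result = majority_vote({
    "isolation_forest": "Normal",
    "one_class_svm": "Normal",
    "elliptic_envelope": "Anomaly",
    "lof": "Normal",
})
```

With an even number of models, ties are possible. Resolving them toward "Anomaly" trades a few extra false alarms for fewer missed rare events.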

Technical Features

  • 🔥 Deep Learning - ResNet34 for feature extraction, Autoencoders for anomaly detection
  • 📊 Classical ML - Isolation Forest, One-Class SVM, Elliptic Envelope, LOF
  • 🎯 High Accuracy - 92-99% across different domains
  • 📈 Detailed Metrics - Confidence scores, processing times, model agreement
  • 🔒 Secure uploads - File validation, size limits, automatic cleanup
  • 📝 Logging - Comprehensive logging for debugging and monitoring

💻 System Requirements

Minimum Requirements

  • OS: Linux, macOS, Windows 10+
  • Python: 3.8 or higher
  • RAM: 8GB (16GB recommended for X-ray module)
  • Storage: 5GB for models and dependencies
  • GPU: Optional (a CUDA-compatible GPU speeds up inference; requires a CUDA build of PyTorch, e.g. cu117)
  • Note: CUDA acceleration is used consistently across all pipelines (UNSW-NB15, MVTec AD, and NIH ChestXray14), particularly for CNN-based feature extraction and autoencoder inference. Device selection is handled dynamically (cuda if available, otherwise cpu), ensuring both efficiency and portability across systems.

Python Dependencies

flask>=2.0.0
torch>=1.9.0
torchvision>=0.10.0
scikit-learn>=1.0.0
numpy>=1.21.0
pandas>=1.3.0
pillow>=8.3.0
scipy>=1.7.0


📁 Dataset Setup & Model Artifact Generation

To generate the trained models and serialized artifacts (.pkl files) used by the anomaly detection API, execute every Python notebook in the companion repository, "MEng - Techniques to overcome class imbalance using anomaly/defect detection", end-to-end after placing the datasets in the correct directory structure.

  1. Download Datasets

Download the following datasets from their official sources:

  • UNSW-NB15 – Network intrusion detection dataset

  • MVTec AD – Industrial defect detection dataset

  • NIH ChestXray14 – Medical imaging anomaly detection dataset

⚠️ Due to licensing restrictions, datasets are not included in this repository.

  2. Execute All Notebooks

Each notebook must be run from top to bottom to:

  • Preprocess datasets

  • Train anomaly detection models

  • Calibrate decision thresholds and anomaly scores

  • Serialize trained models and preprocessing pipelines

This process generates .pkl files (e.g., trained models, PCA objects, scalers, encoders), which are saved locally and later loaded by the API for inference.

Examples of generated artifacts include:

  • Isolation Forest models

  • One-Class SVM models

  • PCA transformers

  • Feature scalers and encoders
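The train-then-serialize flow above can be sketched with `pickle`. A preprocessing step and a detector are fitted together in the notebook, then loaded and applied in the same order at serving time. The file names (`scaler.pkl`, `isolation_forest.pkl`) and the temporary directory standing in for `models/network/` are hypothetical; the notebooks may use different names or `joblib`.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))

# Training side (notebook): fit the preprocessing step and the detector on the same data.
scaler = StandardScaler().fit(X_train)
model = IsolationForest(random_state=0).fit(scaler.transform(X_train))

# Hypothetical stand-in for models/network/.
artifact_dir = Path(tempfile.mkdtemp()) / "network"
artifact_dir.mkdir(parents=True)
for name, obj in {"scaler.pkl": scaler, "isolation_forest.pkl": model}.items():
    with open(artifact_dir / name, "wb") as f:
        pickle.dump(obj, f)

# Serving side (API): load both artifacts and apply them in the same order as in training.
with open(artifact_dir / "scaler.pkl", "rb") as f:
    loaded_scaler = pickle.load(f)
with open(artifact_dir / "isolation_forest.pkl", "rb") as f:
    loaded_model = pickle.load(f)

preds = loaded_model.predict(loaded_scaler.transform(X_train[:5]))
```

Serializing the scaler alongside the model ensures inference sees features on the same scale the detector was trained on.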


🚀 Installation

1. Clone the Repository

git clone https://github.com/tanishq14
cd anomaly-detection-api

2. Create Virtual Environment

python -m venv anom_det

Activate (Linux/Mac)

source anom_det/bin/activate

Activate (Windows)

anom_det\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Download/Train Models

Place your trained models in the models/ directory:

  • models/network/*.pkl
  • models/mvtec/*.pkl, *.pt
  • models/xray/*.pkl, *.pt

5. Verify Installation

python check_system.py

If all checks pass ✅, you're ready to go!


⚑ Quick Start

Start the API Server

python app.py

The API will be available at: http://localhost:5000

Test the API

Network Detection

curl -X POST http://localhost:5000/api/predict/network \
  -H "Content-Type: application/json" \
  -d '{
    "dur": 0.5,
    "proto": "tcp",
    "service": "http",
    "state": "FIN",
    "spkts": 12,
    "dpkts": 10,
    "sbytes": 800,
    "dbytes": 15000,
    "rate": 40.0
  }'

Image Analysis (MVTec/X-ray)

curl -X POST http://localhost:5000/api/predict/mvtec \
  -F "file=@product_image.png"

curl -X POST http://localhost:5000/api/predict/xray \
  -F "file=@chest_xray.png"
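For clients that prefer Python over curl, a standard-library sketch of the network request follows. It builds the same POST as the curl call above (endpoint path and payload taken from this README) but does not send it. `BASE_URL` assumes the default local server, and `build_network_request` is a hypothetical helper, not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumed: default local server address


def build_network_request(features: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to the network detection endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/predict/network",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_network_request({
    "dur": 0.5, "proto": "tcp", "service": "http", "state": "FIN",
    "spkts": 12, "dpkts": 10, "sbytes": 800, "dbytes": 15000, "rate": 40.0,
})

# With the server running, send it and decode the JSON response:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Separating request construction from sending keeps the payload testable without a live server.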

📚 API Documentation

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Homepage |
| GET | /network | Network detection UI |
| GET | /mvtec | MVTec inspection UI |
| GET | /xray | X-ray analysis UI |
| POST | /api/predict/network | Network intrusion detection |
| POST | /api/predict/mvtec | Product quality inspection |
| POST | /api/predict/xray | Chest X-ray analysis |
| GET | /api/health | Health check |
| GET | /api/models/info | Model information |

Response Format

All API responses follow this structure:

{
  "success": true,
  "timestamp": "2025-12-26T15:13:00.000000",
  "api_version": "2.0",
  "data": {
    "ensemble": {
      "prediction": "Normal",
      "confidence": 95.5,
      "votes": { "Normal": 3, "Anomaly": 1 }
    },
    "models": { ... },
    "processing_time": "0.234s"
  }
}
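A client-side sketch for consuming the response structure above. It checks `success`, reads the ensemble result, and converts the `"0.234s"` string into milliseconds. The assumption that failed calls still return JSON with `"success": false` is not confirmed by this README, and `summarize_response` is a hypothetical helper.

```python
import json


def summarize_response(raw: str) -> str:
    """Condense a JSON response in the documented format into a one-line summary."""
    payload = json.loads(raw)
    # Assumption: a failed request still returns JSON with "success": false.
    if not payload.get("success"):
        raise RuntimeError(f"API call failed: {payload}")
    data = payload["data"]
    ensemble = data["ensemble"]
    seconds = float(data["processing_time"].rstrip("s"))  # "0.234s" -> 0.234
    return f"{ensemble['prediction']} ({ensemble['confidence']:.1f}% confidence, {seconds * 1000:.0f} ms)"


# Sample mirroring the documented structure (per-model details left empty).
sample = json.dumps({
    "success": True,
    "api_version": "2.0",
    "data": {
        "ensemble": {"prediction": "Normal", "confidence": 95.5, "votes": {"Normal": 3, "Anomaly": 1}},
        "models": {},
        "processing_time": "0.234s",
    },
})
print(summarize_response(sample))
```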

📊 Datasets

Network: UNSW-NB15

  • Records: 2.5 million network flows
  • Features: 44 dimensions
  • Attack Types: 9 categories (DoS, Exploits, Reconnaissance, etc.)
  • Split: 80% train, 20% test

MVTec: MVTec AD

  • Images: 5,000+ high-resolution product images
  • Categories: 15 product types
  • Defects: Cracks, scratches, contamination, missing parts
  • Split: Per-category train/test

X-ray: NIH ChestXray14

  • Images: 112,120 frontal chest X-rays
  • Conditions: 14 thoracic pathologies
  • Classes: Multi-label classification
  • Resolution: 224x224 (preprocessed)

πŸ—οΈ Architecture

System Overview

User Input → Flask API → Pipeline Module → Ensemble Models → Prediction

Pipeline stages: Validation & Preprocessing → Feature Extraction (if image) → Parallel Model Execution → Majority Voting Ensemble → JSON Response + Confidence
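The pipeline stages can be sketched with a thread pool for the parallel model execution step. Validation, concurrent execution of every detector on the same input, and a vote are shown in sequence. Feature extraction is skipped (tabular input), and the rule-based detectors are hypothetical stand-ins for trained models. Threads are an assumed choice here; the project may parallelize differently, and CPU-bound models may benefit more from processes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A detector maps a preprocessed feature vector to "Normal" or "Anomaly".
Detector = Callable[[List[float]], str]


def run_pipeline(features: List[float], detectors: Dict[str, Detector]) -> dict:
    # Validation: reject empty or non-numeric input before any model runs.
    if not features or not all(isinstance(v, (int, float)) for v in features):
        raise ValueError("features must be a non-empty list of numbers")

    # Parallel model execution: every detector receives the same input.
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = {name: pool.submit(fn, features) for name, fn in detectors.items()}
        per_model = {name: fut.result() for name, fut in futures.items()}

    # Majority voting over the individual predictions (ties go to the first label seen).
    votes = Counter(per_model.values())
    label, _ = votes.most_common(1)[0]
    return {"ensemble": {"prediction": label, "votes": dict(votes)}, "models": per_model}


# Hypothetical rule-based detectors standing in for trained models.
detectors = {
    "rule_max": lambda x: "Anomaly" if max(x) > 10 else "Normal",
    "rule_sum": lambda x: "Anomaly" if sum(x) > 20 else "Normal",
    "rule_min": lambda x: "Anomaly" if min(x) < 0 else "Normal",
}
result = run_pipeline([1.0, 2.0, 15.0], detectors)
```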

Models by Domain

Network (4 models):

  • Isolation Forest
  • One-Class SVM
  • Elliptic Envelope
  • Local Outlier Factor (LOF)

MVTec (5 components):

  • ResNet34 (feature extraction)
  • Isolation Forest
  • One-Class SVM
  • Elliptic Envelope
  • LOF

X-ray (7 models):

  • Autoencoder (unsupervised)
  • Isolation Forest (unsupervised)
  • One-Class SVM (unsupervised)
  • Elliptic Envelope (unsupervised)
  • LOF (unsupervised)
  • Decision Tree (supervised)
  • K-Nearest Neighbors (supervised)
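The four classical detectors listed for each domain share one scikit-learn convention: `fit` on normal data, then `predict` returns +1 for inliers and -1 for outliers. One subtlety is that `LocalOutlierFactor` only supports `predict` on unseen samples when constructed with `novelty=True`. The hyperparameters and synthetic data below are assumptions, not the project's tuned values.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
X_normal = rng.normal(0.0, 1.0, size=(500, 3))  # train on the majority ("normal") class only

detectors = {
    "isolation_forest": IsolationForest(random_state=7),
    "one_class_svm": OneClassSVM(nu=0.05, gamma="scale"),
    "elliptic_envelope": EllipticEnvelope(contamination=0.05, random_state=7),
    # novelty=True is required for LOF to call predict() on new samples.
    "lof": LocalOutlierFactor(n_neighbors=20, novelty=True),
}
for model in detectors.values():
    model.fit(X_normal)

x_new = np.array([[8.0, 8.0, 8.0]])  # far from the training distribution

# All four return +1 (inlier) or -1 (outlier); map to the API's labels.
labels = {
    name: "Anomaly" if model.predict(x_new)[0] == -1 else "Normal"
    for name, model in detectors.items()
}
```

Because the outputs share one label convention, the per-model results can be passed directly to a majority vote.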

📝 Usage Examples

Python API

from modules import predict_network, predict_mvtec, predict_xray

# Network detection with preset
result = predict_network({'preset': 'normal_web_browsing'})
print(result['ensemble']['prediction'])  # 'Normal' or 'Attack'

# Product quality inspection
result = predict_mvtec('product_image.png')
if result['ensemble']['prediction'] == 'Anomaly':
    print(f"Defect detected! Confidence: {result['ensemble']['confidence']}%")

# Medical X-ray analysis
result = predict_xray('chest_xray.png')
for model, data in result['supervised_models'].items():
    print(f"{model}: {data['prediction']} ({data['confidence']:.1f}%)")

Web Interface

  1. Navigate to http://localhost:5000
  2. Select a detection module
  3. Upload image or enter data
  4. View comprehensive results with visualizations

📁 Project Structure

anomaly-detection-api/
β”œβ”€β”€ README.md           # This file
β”œβ”€β”€ requirements.txt    # Python dependencies
β”œβ”€β”€ app.py              # Flask application
β”œβ”€β”€ check_system.py     # System diagnostics
β”œβ”€β”€ modules/            # Pipeline modules
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ network_pipeline.py
β”‚ β”œβ”€β”€ mvtec_pipeline.py
β”‚ └── xray_pipeline.py
β”œβ”€β”€ models/             # Trained models
β”‚ β”œβ”€β”€ network/
β”‚ β”œβ”€β”€ mvtec/
β”‚ └── xray/
β”œβ”€β”€ templates/          # HTML templates
β”œβ”€β”€ static/             # CSS/JS assets
└── docs/               # Documentation

📈 Performance Metrics

| Domain | Imbalance Level | Accuracy | Precision | Recall | F1-Score | Models |
|---|---|---|---|---|---|---|
| Network | Extreme | 99.2% | 98.5% | 99.1% | 98.8% | 4 |
| MVTec | High | 95.8% | 94.2% | 96.1% | 95.1% | 4 |
| X-ray | Moderate | 92.3% | 91.8% | 92.7% | 92.2% | 7 |

Metrics calculated on respective test sets using ensemble predictions
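The table reports precision, recall, and F1 alongside accuracy because accuracy alone is misleading under class imbalance. The sketch below uses the standard definitions, with the minority class as the positive label. The 990/10 test set and both prediction vectors are made up for illustration and are not the project's data.

```python
def binary_metrics(y_true, y_pred, positive="Anomaly"):
    """Accuracy, precision, recall and F1 with the minority class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set: 990 normal samples and 10 anomalies (1% minority class).
y_true = ["Normal"] * 990 + ["Anomaly"] * 10

# A "detector" that always answers Normal: 99% accuracy, yet it finds no anomalies.
baseline = binary_metrics(y_true, ["Normal"] * 1000)

# A detector with 5 false alarms that catches 8 of the 10 anomalies.
y_pred = ["Normal"] * 985 + ["Anomaly"] * 5 + ["Anomaly"] * 8 + ["Normal"] * 2
detector = binary_metrics(y_true, y_pred)
```

The two accuracies differ by only 0.3 percentage points, but recall jumps from 0 to 0.8. This is why recall and F1 are the more informative columns for rare-event detection.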


🧪 Testing

Run system diagnostics:

Basic check

python check_system.py

Verbose output

python check_system.py --verbose

Attempt fixes

python check_system.py --fix

Run unit tests (if implemented):

pytest tests/

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👨‍💻 Author

Tanishq Rahul Shelke

  • Master of Engineering (MEng) - Machine Learning Engineer
  • Focus: Anomaly Detection, Ensemble Methods, Deep Learning
  • LinkedIn: Tanishq Shelke
  • GitHub: Tanishq14

🙏 Acknowledgments

  • UNSW-NB15 Dataset: University of New South Wales
  • MVTec AD Dataset: MVTec Software GmbH
  • NIH Chest X-ray14: National Institutes of Health
  • PyTorch Team: For the deep learning framework
  • Scikit-learn Team: For machine learning tools

📞 Support

For issues, questions, or suggestions:


⭐ If you find this project useful, please consider giving it a star!

Last Updated: February 1, 2026

About

This repository hosts a comprehensive, web-based Anomaly Detection platform designed to identify irregularities across three distinct domains: Network Security, Industrial Manufacturing, and Medical Imaging. Built with Flask REST API and powered by PyTorch and Scikit-learn.
