🤖 Techniques to Overcome Class Imbalance using Anomaly Detection

A specialized framework for handling extreme class imbalance by treating the minority class as an anomaly. The project demonstrates high-precision anomaly detection across three domains (network intrusion detection, manufacturing quality control, and medical imaging) using classical machine learning and deep learning methods.

📋 Table of Contents

Overview
Class Imbalance Strategy
Case Study: Isolation Forest
Features
System Requirements
Dataset Setup & Model Artifact Generation
Installation
Quick Start
API Documentation
Datasets
Architecture
Usage Examples
Project Structure
Performance Metrics
Testing
Contributing
License
Author
Acknowledgments
Support

🎯 Overview

This project addresses the imbalanced-data problem in machine learning. In real-world settings such as network intrusion detection or medical diagnosis, the anomalous class is often extremely rare. Instead of requiring a balanced training set, the API uses ensemble anomaly detection to find these rare events. It is a production anomaly detection API built with Flask and PyTorch, designed for integration into industrial inspection, early disease detection from chest X-rays, and cybersecurity monitoring systems.

🔧 Class Imbalance Strategy

Technique	Implementation	Benefit
One-Class SVM	Learns the boundary of "Normal" data.	Ignores the lack of minority samples.
Autoencoders	Reconstructs input; high error = anomaly.	Self-supervised; no labels required.
Isolation Forest	Isolates points in a tree structure.	Efficiently finds outliers in large data.
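The autoencoder row can be illustrated without a neural network. The sketch below swaps in PCA, which behaves like a linear autoencoder (a linear encoder/decoder pair trained with squared error learns the same subspace as the top principal components). It is fitted on normal data only, and the reconstruction error serves as the anomaly score. This is not the project's PyTorch autoencoder; the data and function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(x_normal: np.ndarray, n_components: int):
    """Fit a PCA-based 'linear autoencoder' on normal-only training data."""
    mean = x_normal.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x_normal - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Per-sample mean squared error between x and its reconstruction."""
    centered = x - mean
    codes = centered @ components.T   # encode: project onto the learned subspace
    recon = codes @ components        # decode: map back to input space
    return np.mean((centered - recon) ** 2, axis=1)


rng = np.random.default_rng(0)
# Hypothetical "normal" data: points close to a line through 3-D space.
t = rng.normal(size=(500, 1))
x_normal = np.hstack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(500, 3))

mean, components = fit_linear_autoencoder(x_normal, n_components=1)
err_normal = reconstruction_error(np.array([[1.0, 2.0, -1.0]]), mean, components)   # on the line
err_anomaly = reconstruction_error(np.array([[1.0, -2.0, 1.0]]), mean, components)  # off the line
print(err_normal, err_anomaly)
```

No anomalous samples are needed for fitting, which is exactly why this family of methods sidesteps the shortage of minority-class examples.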

🔬 Case Study: Isolation Forest

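The Theory: Anomalies are "few and different." Random splits separate them from the rest of the data quickly, so in an isolation tree they end up on a shorter path from the root than normal points, which need many more splits. The anomaly score is derived from the average path length over many random trees, which allows the API to detect attacks even if they were never seen during training.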
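The path-length intuition can be checked with a few lines of standard-library Python. This is a simplified 1-D sketch, assuming synthetic Gaussian "normal" data and one injected outlier; a real Isolation Forest also picks a random feature per split and subsamples the data for each tree.

```python
import random
import statistics


def isolation_path_length(point, data, rng, depth=0, max_depth=50):
    """Count random splits needed to isolate `point` in 1-D `data` (point included)."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    lo, hi = min(data), max(data)
    if lo == hi:  # only duplicates of the point remain
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the query point.
    same_side = [x for x in data if (x < split) == (point < split)]
    return isolation_path_length(point, same_side, rng, depth + 1, max_depth)


def average_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over many independent random trees."""
    rng = random.Random(seed)
    return statistics.mean(isolation_path_length(point, data, rng) for _ in range(n_trees))


gen = random.Random(42)
normal = [gen.gauss(0.0, 1.0) for _ in range(255)]
outlier = 8.0
data = normal + [outlier]
inlier = sorted(normal)[len(normal) // 2]  # a typical, central point

outlier_depth = average_path_length(outlier, data)
inlier_depth = average_path_length(inlier, data)
print(f"outlier: {outlier_depth:.2f} splits, inlier: {inlier_depth:.2f} splits")
```

The outlier is usually cut off within the first couple of splits, while the central point needs many more; a short average path is what the forest turns into a high anomaly score.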

✨ Features

Core Capabilities

Multi-domain support - 3 independent detection modules
Ensemble learning - Combines 4-7 models per domain using majority voting
RESTful API - Clean Flask-based API with JSON responses
Web interface - User-friendly HTML/CSS/JS frontend
Real-time processing - Fast inference with processing time tracking
Comprehensive results - Individual model outputs + ensemble predictions

Technical Features

🔥 Deep Learning - ResNet34 for feature extraction, Autoencoders for anomaly detection
📊 Classical ML - Isolation Forest, One-Class SVM, Elliptic Envelope, LOF
🎯 High Accuracy - 92-99% across different domains
📈 Detailed Metrics - Confidence scores, processing times, model agreement
🔒 Secure uploads - File validation, size limits, automatic cleanup
📝 Logging - Comprehensive logging for debugging and monitoring
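The "Secure uploads" feature can be sketched as an extension, size, and file-signature check. This is a minimal illustration, assuming PNG/JPEG inputs and a 10 MB limit; `validate_upload`, `ALLOWED_EXTENSIONS` and `MAX_UPLOAD_BYTES` are hypothetical names, not the project's actual code.

```python
from pathlib import Path
from typing import Tuple

# Hypothetical limits for illustration; the project's actual values are not documented here.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB

# File signatures ("magic numbers") of the allowed formats.
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
}


def validate_upload(filename: str, payload: bytes) -> Tuple[bool, str]:
    """Return (is_valid, reason) for an uploaded image."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported extension: {ext or '(none)'}"
    if not payload:
        return False, "empty file"
    if len(payload) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    # Reject files whose content does not match the claimed extension.
    if not payload.startswith(SIGNATURES[ext]):
        return False, "content does not match extension"
    return True, "ok"


print(validate_upload("product_image.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
print(validate_upload("chest_xray.png", b"\xff\xd8\xff\xe0"))  # JPEG bytes named .png
```

In Flask, the overall request size can additionally be capped with `app.config["MAX_CONTENT_LENGTH"]`, and `werkzeug.utils.secure_filename` sanitizes the client-supplied name before anything is written to disk.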

💻 System Requirements

Minimum Requirements

OS: Linux, macOS, Windows 10+
Python: 3.8 or higher
RAM: 8GB (16GB recommended for X-ray module)
Storage: 5GB for models and dependencies
GPU: Optional; a CUDA-compatible GPU speeds up inference (PyTorch build: cu117)
Note: When a CUDA device is available, it is used in all pipelines (UNSW-NB15, MVTec AD, and NIH ChestXray14), mainly for CNN-based feature extraction and autoencoder inference. The device is selected at runtime (cuda if available, otherwise cpu), so the same code runs on both GPU machines and CPU-only systems.
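Runtime device selection of this kind usually reduces to one `torch.cuda.is_available()` check. A minimal sketch, assuming a helper named `select_device` (hypothetical); the extra fallback for a missing PyTorch install is an addition for portability, not documented project behaviour.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: fall back to CPU if PyTorch is not installed.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = select_device()
print(f"Running inference on: {device}")
```

With PyTorch installed, the string can be passed to `torch.device(...)` and used with `model.to(device)` and `tensor.to(device)` so the CNN feature extractor and autoencoder run on the same device as their inputs.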

Python Dependencies

```text
flask>=2.0.0
torch>=1.9.0
torchvision>=0.10.0
scikit-learn>=1.0.0
numpy>=1.21.0
pandas>=1.3.0
pillow>=8.3.0
scipy>=1.7.0
```

📁 Dataset Setup & Model Artifact Generation

To generate the trained models and serialized artifacts (.pkl files) that the anomaly detection API loads, run every Python notebook in the companion repository "MEng - Techniques to overcome class imbalance using anomaly/defect detection" end-to-end, after placing the datasets in the expected directory structure.

Download Datasets

Download the following datasets from their official sources:

UNSW-NB15 – Network intrusion detection dataset
MVTec AD – Industrial defect detection dataset
NIH ChestXray14 – Medical imaging anomaly detection dataset

⚠️ Due to licensing restrictions, datasets are not included in this repository.

Execute All Notebooks

Each notebook must be run from top to bottom to:

Preprocess datasets
Train anomaly detection models
Calibrate decision thresholds and anomaly scores
Serialize trained models and preprocessing pipelines
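Threshold calibration for an anomaly detector is often done on held-out normal data alone: pick the score that only a chosen fraction of normal samples exceeds. A sketch, assuming "higher score = more anomalous" and a 5% target false-positive rate; the notebooks may calibrate differently, and `calibrate_threshold` is a hypothetical name.

```python
import numpy as np


def calibrate_threshold(normal_scores: np.ndarray, target_fpr: float = 0.05) -> float:
    """Score above which a sample is flagged as anomalous.

    Assumes higher score = more anomalous and that `normal_scores` come from
    held-out normal data, so roughly `target_fpr` of normal samples exceed it.
    """
    return float(np.quantile(normal_scores, 1.0 - target_fpr))


rng = np.random.default_rng(0)
# Hypothetical anomaly scores of a validation set containing only normal samples.
val_normal_scores = rng.normal(loc=0.0, scale=1.0, size=10_000)
threshold = calibrate_threshold(val_normal_scores, target_fpr=0.05)
flagged_rate = float(np.mean(val_normal_scores > threshold))
print(round(threshold, 2), round(flagged_rate, 3))
```

Note that scikit-learn's `score_samples` for `IsolationForest` and `OneClassSVM` returns higher values for *more normal* points, so those scores would be negated before using this convention.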

This process generates .pkl files (e.g., trained models, PCA objects, scalers, encoders), which are saved locally and later loaded by the API for inference.

Examples of generated artifacts include:

Isolation Forest models
One-Class SVM models
PCA transformers
Feature scalers and encoders
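Serializing the artifacts together keeps inference consistent with training. A minimal sketch using the standard `pickle` module, assuming a single dictionary bundle per domain; the bundle keys and file name are hypothetical, and the real notebooks would store fitted estimators, scalers, and PCA objects rather than plain values.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(path: Path, artifacts: dict) -> None:
    """Serialize a bundle of preprocessing objects, models and thresholds."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(artifacts, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_artifacts(path: Path) -> dict:
    """Load a bundle; only unpickle files you produced yourself (pickle can run code)."""
    with path.open("rb") as f:
        return pickle.load(f)


# Hypothetical bundle with plain values standing in for fitted objects.
bundle = {
    "feature_order": ["dur", "spkts", "dpkts", "sbytes", "dbytes", "rate"],
    "threshold": 0.42,
    "model_version": "example-1",
}

with tempfile.TemporaryDirectory() as tmp:
    artifact_path = Path(tmp) / "models" / "network" / "bundle.pkl"
    save_artifacts(artifact_path, bundle)
    restored = load_artifacts(artifact_path)

print(restored == bundle)
```

Storing the scaler, feature order, and threshold next to the model means the API applies exactly the transformation used during training. Pickled scikit-learn objects should be loaded with the same scikit-learn version that produced them.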

🚀 Installation

1. Clone the Repository

```bash
git clone https://github.com/tanishq14
cd anomaly-detection-api
```

2. Create Virtual Environment

```bash
python -m venv anom_det
```

Activate (Linux/Mac)

```bash
source anom_det/bin/activate
```

Activate (Windows)

```bat
anom_det\Scripts\activate
```

3. Install Dependencies

```bash
pip install -r requirements.txt
```

4. Download/Train Models

Place your trained models in the models/ directory:

models/network/: *.pkl
models/mvtec/: *.pkl, *.pt
models/xray/: *.pkl, *.pt

5. Verify Installation

```bash
python check_system.py
```

If all checks pass ✅, you're ready to go!
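A model-directory check of the kind a diagnostics script might run can be sketched with `pathlib`. This is a hypothetical illustration based on the models/ layout from step 4, not the actual contents of check_system.py; `EXPECTED_PATTERNS` and `missing_artifacts` are assumed names.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Expected artifact patterns per domain, following the models/ layout in step 4.
EXPECTED_PATTERNS = {
    "network": ["*.pkl"],
    "mvtec": ["*.pkl", "*.pt"],
    "xray": ["*.pkl", "*.pt"],
}


def missing_artifacts(models_dir: Path) -> Dict[str, List[str]]:
    """Map each domain to the glob patterns that matched no files."""
    missing: Dict[str, List[str]] = {}
    for domain, patterns in EXPECTED_PATTERNS.items():
        domain_dir = models_dir / domain
        if not domain_dir.is_dir():
            missing[domain] = list(patterns)
            continue
        absent = [p for p in patterns if not any(domain_dir.glob(p))]
        if absent:
            missing[domain] = absent
    return missing


# Demo: only the network model has been generated so far.
with tempfile.TemporaryDirectory() as tmp:
    models_dir = Path(tmp) / "models"
    (models_dir / "network").mkdir(parents=True)
    (models_dir / "network" / "isolation_forest.pkl").touch()
    report = missing_artifacts(models_dir)

print(report)
```

An empty result means every domain has at least one file for each expected pattern.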

⚡ Quick Start

Start the API Server

```bash
python app.py
```

The API will be available at: http://localhost:5000

Test the API

Network Detection

```bash
curl -X POST http://localhost:5000/api/predict/network \
  -H "Content-Type: application/json" \
  -d '{
    "dur": 0.5,
    "proto": "tcp",
    "service": "http",
    "state": "FIN",
    "spkts": 12,
    "dpkts": 10,
    "sbytes": 800,
    "dbytes": 15000,
    "rate": 40.0
  }'
```

Image Analysis (MVTec/X-ray)

```bash
curl -X POST http://localhost:5000/api/predict/mvtec \
  -F "file=@product_image.png"

curl -X POST http://localhost:5000/api/predict/xray \
  -F "file=@chest_xray.png"
```
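The network request can also be sent from Python using only the standard library. A sketch, assuming the server from Quick Start is running on localhost:5000 and accepts the JSON fields shown in the curl example; `build_network_request` and `predict_network_remote` are hypothetical helper names, not part of `modules`.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/api/predict/network"


def build_network_request(features: dict) -> urllib.request.Request:
    """Build a JSON POST request for the network endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_network_remote(features: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_network_request(features), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {
    "dur": 0.5, "proto": "tcp", "service": "http", "state": "FIN",
    "spkts": 12, "dpkts": 10, "sbytes": 800, "dbytes": 15000, "rate": 40.0,
}
request = build_network_request(sample)
print(request.get_method(), request.full_url)
# result = predict_network_remote(sample)  # uncomment once the API is running
```

Building the request separately from sending it makes the payload easy to inspect or log before it hits the API.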

📚 API Documentation

Endpoints

Method	Endpoint	Description
`GET`	`/`	Homepage
`GET`	`/network`	Network detection UI
`GET`	`/mvtec`	MVTec inspection UI
`GET`	`/xray`	X-ray analysis UI
`POST`	`/api/predict/network`	Network intrusion detection
`POST`	`/api/predict/mvtec`	Product quality inspection
`POST`	`/api/predict/xray`	Chest X-ray analysis
`GET`	`/api/health`	Health check
`GET`	`/api/models/info`	Model information

Response Format

All API responses follow this structure:

```
{
  "success": true,
  "timestamp": "2025-12-26T15:13:00.000000",
  "api_version": "2.0",
  "data": {
    "ensemble": {
      "prediction": "Normal",
      "confidence": 95.5,
      "votes": { "Normal": 3, "Anomaly": 1 }
    },
    "models": { ... },
    "processing_time": "0.234s"
  }
}
```
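A client can reduce this response to a one-line summary. A sketch, assuming failed requests set `"success": false` (the error shape is not documented above) and using an empty `"models"` object in place of the elided per-model details; `summarize_response` is a hypothetical name.

```python
import json


def summarize_response(raw: str) -> str:
    """Turn an API response body into a short human-readable summary."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise ValueError(f"API reported failure: {payload}")
    ensemble = payload["data"]["ensemble"]
    votes = ensemble["votes"]
    total = sum(votes.values())
    agreeing = votes.get(ensemble["prediction"], 0)
    return (f"{ensemble['prediction']} ({ensemble['confidence']}% confidence, "
            f"{agreeing}/{total} model votes)")


raw_response = """{
  "success": true,
  "timestamp": "2025-12-26T15:13:00.000000",
  "api_version": "2.0",
  "data": {
    "ensemble": {"prediction": "Normal", "confidence": 95.5,
                 "votes": {"Normal": 3, "Anomaly": 1}},
    "models": {},
    "processing_time": "0.234s"
  }
}"""
summary = summarize_response(raw_response)
print(summary)  # Normal (95.5% confidence, 3/4 model votes)
```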

📊 Datasets

Network: UNSW-NB15

Records: 2.5 million network flows
Features: 44 dimensions
Attack Types: 9 categories (DoS, Exploits, Reconnaissance, etc.)
Split: 80% train, 20% test
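With rare attacks, an unstratified 80/20 split can leave the test set with too few (or zero) minority samples. The README does not say whether its split is stratified; the sketch below shows one way to keep the class ratio identical in both parts, using hypothetical labels with a 1% anomaly rate. scikit-learn's `train_test_split(..., stratify=labels)` does the same job.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) with every class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 990 + [1] * 10  # hypothetical: 990 normal flows, 10 attacks
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in test_idx))
```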

MVTec: MVTec AD

Images: 5,000+ high-resolution product images
Categories: 15 product types
Defects: Cracks, scratches, contamination, missing parts
Split: Per-category train/test

X-ray: NIH Chest X-ray14

Images: 112,120 frontal chest X-rays
Conditions: 14 thoracic pathologies
Labels: Multi-label (a single image can carry several pathology labels)
Resolution: 224x224 (preprocessed)

🏗️ Architecture

System Overview

Request flow: User Input → Flask API → Pipeline Module → Ensemble Models → Prediction

Inside each pipeline module: Validation & Preprocessing → Feature Extraction (images only) → Parallel Model Execution → Majority Voting Ensemble → JSON Response + Confidence
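The majority-voting step can be sketched as a simple count over per-model labels, producing a `votes` object shaped like the one in the response format. The tie-breaking rule (ties go to "Anomaly", on the assumption that a missed anomaly costs more than a false alarm) and the model keys are assumptions, and the sketch does not attempt to reproduce how the API's `confidence` value is computed.

```python
from collections import Counter
from typing import Dict


def majority_vote(model_predictions: Dict[str, str]) -> dict:
    """Combine per-model labels ("Normal"/"Anomaly") into one ensemble decision."""
    votes = Counter(model_predictions.values())
    normal, anomaly = votes.get("Normal", 0), votes.get("Anomaly", 0)
    # Assumption: ties go to "Anomaly", so borderline cases are flagged for review.
    prediction = "Anomaly" if anomaly >= normal else "Normal"
    return {"prediction": prediction, "votes": {"Normal": normal, "Anomaly": anomaly}}


per_model = {
    "isolation_forest": "Normal",
    "one_class_svm": "Normal",
    "elliptic_envelope": "Anomaly",
    "local_outlier_factor": "Normal",
}
print(majority_vote(per_model))  # {'prediction': 'Normal', 'votes': {'Normal': 3, 'Anomaly': 1}}
```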

Models by Domain

Network (4 models):

Isolation Forest
One-Class SVM
Elliptic Envelope
Local Outlier Factor (LOF)

MVTec (5 components):

ResNet34 (feature extraction)
Isolation Forest
One-Class SVM
Elliptic Envelope
LOF

X-ray (7 models):

Autoencoder (unsupervised)
Isolation Forest (unsupervised)
One-Class SVM (unsupervised)
Elliptic Envelope (unsupervised)
LOF (unsupervised)
Decision Tree (supervised)
K-Nearest Neighbors (supervised)

📝 Usage Examples

Python API

```python
from modules import predict_network, predict_mvtec, predict_xray

# Network detection with a preset
result = predict_network({'preset': 'normal_web_browsing'})
print(result['ensemble']['prediction'])  # 'Normal' or 'Attack'

# Product quality inspection
result = predict_mvtec('product_image.png')
if result['ensemble']['prediction'] == 'Anomaly':
    print(f"Defect detected! Confidence: {result['ensemble']['confidence']}%")

# Medical X-ray analysis
result = predict_xray('chest_xray.png')
for model, data in result['supervised_models'].items():
    print(f"{model}: {data['prediction']} ({data['confidence']:.1f}%)")
```

Web Interface

Navigate to http://localhost:5000
Select a detection module
Upload image or enter data
View comprehensive results with visualizations

📁 Project Structure

anomaly-detection-api/
├── README.md           # This file
├── requirements.txt    # Python dependencies
├── app.py              # Flask application
├── check_system.py     # System diagnostics
├── modules/            # Pipeline modules
│ ├── __init__.py
│ ├── network_pipeline.py
│ ├── mvtec_pipeline.py
│ └── xray_pipeline.py
├── models/             # Trained models
│ ├── network/
│ ├── mvtec/
│ └── xray/
├── templates/          # HTML templates
├── static/             # CSS/JS assets
└── docs/               # Documentation

📈 Performance Metrics

Domain	Imbalance Level	Accuracy	Precision	Recall	F1-Score	Models
Network	Extreme	99.2%	98.5%	99.1%	98.8%	4
MVTec	High	95.8%	94.2%	96.1%	95.1%	4
X-ray	Moderate	92.3%	91.8%	92.7%	92.2%	7

Metrics calculated on respective test sets using ensemble predictions
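Under heavy imbalance, accuracy alone is misleading, which is why precision, recall, and F1 are reported alongside it. The sketch below uses hypothetical confusion-matrix counts (990 normal samples, 10 anomalies): a model that always answers "Normal" still reaches 99% accuracy while catching no anomalies at all.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 with the anomaly class as 'positive'."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical test set: 990 normal samples, 10 anomalies.
always_normal = classification_metrics(tp=0, fp=0, tn=990, fn=10)
detector = classification_metrics(tp=8, fp=5, tn=985, fn=2)
print(always_normal)
print(detector)
```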

🧪 Testing

Run system diagnostics:

Basic check

```bash
python check_system.py
```

Verbose output

```bash
python check_system.py --verbose
```

Attempt fixes

```bash
python check_system.py --fix
```

Run unit tests (if implemented):

```bash
pytest tests/
```

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Tanishq Rahul Shelke

Master of Engineering (MEng) - Machine Learning Engineer
Focus: Anomaly Detection, Ensemble Methods, Deep Learning
LinkedIn: Tanishq Shelke
GitHub: Tanishq14

🙏 Acknowledgments

UNSW-NB15 Dataset: University of New South Wales
MVTec AD Dataset: MVTec Software GmbH
NIH Chest X-ray14: National Institutes of Health
PyTorch Team: For the deep learning framework
Scikit-learn Team: For machine learning tools

📞 Support

For issues, questions, or suggestions:

Open an Issue

⭐ If you find this project useful, please consider giving it a star!

Last Updated: February 1, 2026
