Log Analysis & Anomaly Detection Using Machine Learning

This Python script implements a process resource monitoring and anomaly detection system built on K-Means clustering, PCA, and Isolation Forest. It groups system processes by their resource usage patterns and flags potential anomalies.

Features

Resource Monitoring: Tracks multiple system metrics including:
- CPU usage percentage
- Memory usage percentage
- Disk read/write rates (MB)
- Network sent/received rates (MB)
Advanced Analytics:
- K-Means clustering for behavioral profiling
- Principal Component Analysis (PCA) for dimensionality reduction
- Hybrid anomaly detection using:
  - Isolation Forest
  - Distance-based metrics
- Interactive 3D visualization of process clusters
Comprehensive Reporting:
- Detailed cluster analysis
- Anomaly detection results
- Performance metrics
- Interactive visualizations
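
As a rough illustration of the hybrid anomaly detection listed above, the sketch below blends two signals: the distance from a sample to its nearest K-Means centroid, and the Isolation Forest score. The function names, the equal weighting, and the synthetic data are assumptions made for this example and are not taken from the script itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import MinMaxScaler


def fit_hybrid_detector(X_train, n_clusters=3, random_state=0):
    """Fit both detectors plus min-max scalers on baseline (training) data."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X_train)
    iso = IsolationForest(random_state=random_state).fit(X_train)
    # Distance from each training sample to its nearest centroid.
    dist = kmeans.transform(X_train).min(axis=1).reshape(-1, 1)
    # score_samples is higher for normal points, so negate it.
    iso_raw = (-iso.score_samples(X_train)).reshape(-1, 1)
    return {
        "kmeans": kmeans,
        "iso": iso,
        "dist_minmax": MinMaxScaler().fit(dist),
        "iso_minmax": MinMaxScaler().fit(iso_raw),
    }


def hybrid_score(detector, X, weight=0.5):
    """Weighted blend of scaled centroid distance and scaled isolation score.

    The 0.5 weight is an arbitrary choice for this sketch. Values above 1
    mean the sample is more extreme than anything seen during fitting.
    """
    dist = detector["kmeans"].transform(X).min(axis=1).reshape(-1, 1)
    iso_raw = (-detector["iso"].score_samples(X)).reshape(-1, 1)
    dist_scaled = detector["dist_minmax"].transform(dist).ravel()
    iso_scaled = detector["iso_minmax"].transform(iso_raw).ravel()
    return weight * dist_scaled + (1.0 - weight) * iso_scaled


# Synthetic stand-in for six scaled resource features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 6))
detector = fit_hybrid_detector(X_train)

train_scores = hybrid_score(detector, X_train)
outlier_score = hybrid_score(detector, np.full((1, 6), 12.0))[0]
```

Fitting on baseline data and scoring new samples separately keeps a single extreme process from claiming its own centroid, which would otherwise hide it from the distance-based check.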

Requirements

# Core Libraries
numpy
pandas
joblib

# Visualization
matplotlib
plotly (provides plotly.express and plotly.graph_objects)

# Machine Learning
scikit-learn

Installation

Clone this repository or download the script
Install required packages:

pip install numpy pandas joblib matplotlib plotly scikit-learn

Usage

Prepare your process log data in CSV format with the following columns:
- cpu_percent
- memory_percent
- disk_read_mb
- disk_write_mb
- net_sent_mb
- net_recv_mb
Update the configuration section in the script:

DATA_PATH = 'process_log.csv'  # Path to your input data
SAVE_DIR = 'models'            # Directory to save models

Run the script:

python process_resource_clustering_k_means.py
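
Before running the script, it can help to confirm that the input file has the expected columns. The check below is a minimal sketch: `load_process_log` and `REQUIRED_COLUMNS` are hypothetical names for this example, and the two sample rows are made-up values used only to keep the snippet self-contained.

```python
import io

import pandas as pd

# Column names as listed in the Usage section.
REQUIRED_COLUMNS = [
    "cpu_percent",
    "memory_percent",
    "disk_read_mb",
    "disk_write_mb",
    "net_sent_mb",
    "net_recv_mb",
]


def load_process_log(source):
    """Read a process log CSV and verify that the required columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Coerce to numeric so malformed entries become NaN instead of strings.
    df[REQUIRED_COLUMNS] = df[REQUIRED_COLUMNS].apply(pd.to_numeric, errors="coerce")
    return df


# In-memory sample; in practice pass a file path such as 'process_log.csv'.
sample_csv = io.StringIO(
    "cpu_percent,memory_percent,disk_read_mb,disk_write_mb,net_sent_mb,net_recv_mb\n"
    "12.5,3.1,0.4,0.1,0.02,0.05\n"
    "88.0,41.7,120.3,64.8,5.10,9.75\n"
)
df = load_process_log(sample_csv)
```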

Script Structure

The script is organized into multiple cells, each handling a specific aspect of the analysis:

Setup & Configuration (Cell 1)
- Library imports
- Global configuration
- Utility functions
Data Loading & Inspection (Cell 2)
- Data loading
- Initial validation
- Schema verification
Exploratory Data Analysis (Cells 3-5)
- Feature distributions
- Correlation analysis
- Temporal behavior analysis
Data Preprocessing (Cells 6-7)
- Missing value handling
- Scaling
- PCA transformation
Model Development (Cells 8-10)
- K-Means model selection
- Model training
- Performance evaluation
Anomaly Detection (Cell 11)
- Hybrid approach implementation
- Distance-based detection
- Isolation Forest integration
Visualization (Cells 12-14)
- Interactive 3D cluster visualization
- Inference overlay
- Test case visualization
Reporting (Cells 15-17)
- Data export
- Summary metrics
- Interpretive analysis
- Anomaly intelligence report
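
The preprocessing and model selection stages outlined above might look roughly like the sketch below. It assumes median imputation, three PCA components (to match the 3D plots), and silhouette-based selection of k; the script may use different choices, and `select_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_k(X_reduced, k_values=range(2, 7), random_state=0):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_reduced)
        scores[k] = silhouette_score(X_reduced, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the six resource features, with one missing value.
X_raw, _ = make_blobs(n_samples=300, n_features=6, centers=3,
                      cluster_std=0.5, random_state=0)
X_raw[0, 0] = np.nan

# Missing value handling -> scaling -> PCA, in the order listed above.
preprocess = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    PCA(n_components=3, random_state=0),
)
X_reduced = preprocess.fit_transform(X_raw)

best_k, scores = select_k(X_reduced)
final_model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_reduced)
```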

Output

The script generates several outputs:

Trained models saved in the models directory:
- scaler.joblib: StandardScaler model
- pca.joblib: PCA model
- kmeans_final.joblib: Final K-Means model
- isolation_forest.joblib: Isolation Forest model
- dist_minmax.joblib: MinMax scaler for distances
Comprehensive CSV report:
- cluster_anomaly_summary.csv: Detailed analysis results
Interactive visualizations:
- 3D cluster plots
- Feature distribution plots
- Correlation heatmaps
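
The saved model files can be reloaded with joblib to assign clusters to new data. The following is a sketch under assumptions: `assign_clusters` is a hypothetical helper, the models are fitted on random data, and a temporary directory stands in for the models directory so the example runs on its own.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def assign_clusters(save_dir, X_new):
    """Reload scaler, PCA, and K-Means from disk and label new samples."""
    save_dir = Path(save_dir)
    scaler = joblib.load(save_dir / "scaler.joblib")
    pca = joblib.load(save_dir / "pca.joblib")
    kmeans = joblib.load(save_dir / "kmeans_final.joblib")
    # Apply the transforms in the same order used during training.
    return kmeans.predict(pca.transform(scaler.transform(X_new)))


# Fit stand-in models on random data with six feature columns.
rng = np.random.default_rng(0)
X_train = rng.random((100, 6))
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=3, random_state=0).fit(scaler.transform(X_train))
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(
    pca.transform(scaler.transform(X_train))
)

with tempfile.TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    joblib.dump(scaler, save_dir / "scaler.joblib")
    joblib.dump(pca, save_dir / "pca.joblib")
    joblib.dump(kmeans, save_dir / "kmeans_final.joblib")
    reloaded_labels = assign_clusters(save_dir, X_train[:5])
```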

Performance Metrics

The script evaluates clustering performance using multiple metrics:

Silhouette Score
Calinski-Harabasz Index
Davies-Bouldin Index
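
All three metrics are available in scikit-learn. The sketch below computes them for a K-Means result on synthetic data; `clustering_report` is a hypothetical helper, and the data is illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def clustering_report(X, labels):
    """Collect the three clustering quality metrics in one dictionary."""
    return {
        # Ranges from -1 to 1; higher means better-separated clusters.
        "silhouette": silhouette_score(X, labels),
        # Ratio of between- to within-cluster dispersion; higher is better.
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        # Average similarity of each cluster to its closest one; lower is better.
        "davies_bouldin": davies_bouldin_score(X, labels),
    }


X, _ = make_blobs(n_samples=300, n_features=6, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
report = clustering_report(X, labels)
```

Because the metrics point in different directions (two are maximized, one is minimized), comparing candidate models is easier when all three are reported side by side.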

License

This project is open-source and available under the MIT License.

Author

Swarag V S

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
