System Observability Platform

A backend observability platform designed to ingest system logs, perform ML-based anomaly detection, and expose operational metrics for Linux-based services.

This project demonstrates production-style backend engineering practices including modular service design, REST APIs, structured logging, ML integration, and containerized deployment.


Problem Statement

Modern systems generate large volumes of logs and operational signals. Manually inspecting these logs is inefficient and error-prone.

This project aims to:

  • Collect system logs centrally
  • Persist events reliably
  • Detect anomalies automatically using ML
  • Classify severity of events
  • Provide observability metrics for debugging and monitoring

Tech Stack

  • Python (Backend + ML)
  • Flask (REST APIs)
  • SQLite (Event storage)
  • Docker (Containerization)
  • Linux (Runtime environment)

Core Features

  • REST APIs for system log ingestion
  • Persistent event storage using SQLite
  • Modular Flask services with centralized exception handling
  • Structured logging for traceability
  • ML-based anomaly detection and severity classification
  • Dockerized deployment for reproducible environments
  • Runtime operational metrics for debugging and analysis
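
As an illustration of the structured logging feature, the sketch below emits one JSON object per log line using only the standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, the field names, and the `event_id` attribute are assumptions made for this example and are not taken from the repository.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical format)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attach an optional event id passed via `extra={"event_id": ...}`.
        event_id = getattr(record, "event_id", None)
        if event_id is not None:
            payload["event_id"] = event_id
        return json.dumps(payload)


def build_logger(name: str = "observability") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


logger = build_logger()
logger.info("event stored", extra={"event_id": 42})
```

Keeping every record on a single JSON line makes the output easy to filter with standard tools and to correlate with stored events through a shared identifier.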

High-Level Architecture

    Log Producer / Client
              |
              v
        Flask REST API
              |
              v
       Processing Layer
         |          |
         v          v
    SQLite Store  ML Engine
              |
              v
    Metrics & Monitoring


Repository Structure

system-observability-platform/
│
├── src/          # Application source code
├── docs/         # Architecture and documentation
├── logs/         # Generated logs
├── metrics/      # Runtime metrics
├── Dockerfile
├── app.py
└── README.md


Functional Flow

1. Log Ingestion

  • Clients send system events via REST endpoints
  • Payloads are validated
  • Events are persisted in SQLite
  • Structured logs are generated for traceability
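
The validation and persistence steps above can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the `events` table schema, the `ALLOWED_LEVELS` set, and the helper names `validate_event`, `init_db`, and `store_event` are all assumptions. A Flask route would call these helpers with the parsed JSON body.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed set of accepted levels; the actual service may accept others.
ALLOWED_LEVELS = {"debug", "info", "warning", "error", "critical"}


def validate_event(payload):
    """Return a cleaned event dict or raise ValueError describing the problem."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    message = payload.get("message")
    level = payload.get("level", "info")
    if not isinstance(message, str) or not message.strip():
        raise ValueError("'message' must be a non-empty string")
    if not isinstance(level, str) or level.lower() not in ALLOWED_LEVELS:
        raise ValueError("'level' must be one of: " + ", ".join(sorted(ALLOWED_LEVELS)))
    return {"message": message.strip(), "level": level.lower()}


def init_db(conn):
    """Create the (hypothetical) events table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               received_at TEXT NOT NULL,
               level TEXT NOT NULL,
               message TEXT NOT NULL
           )"""
    )


def store_event(conn, event):
    """Insert a validated event and return its row id."""
    received_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO events (received_at, level, message) VALUES (?, ?, ?)",
            (received_at, event["level"], event["message"]),
        )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")  # in-memory database for the example
init_db(conn)
event_id = store_event(
    conn, validate_event({"message": "CPU spike detected", "level": "warning"})
)
```

Parameterized queries (`?` placeholders) keep untrusted log messages from being interpreted as SQL.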

2. Anomaly Detection

  • Incoming events are passed to the ML pipeline
  • Features are extracted
  • Model detects abnormal patterns
  • Severity levels are assigned
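
The README does not name the model, so the sketch below uses a simple z-score check as a stand-in to show the shape of the pipeline: feature extraction, anomaly scoring against a baseline, and severity assignment. The features, the `KEYWORDS` tuple, the thresholds, and all function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

LEVEL_RANK = {"debug": 0, "info": 1, "warning": 2, "error": 3, "critical": 4}
KEYWORDS = ("fail", "timeout", "spike", "denied")  # illustrative only


def extract_features(event):
    """Turn an event dict into a small numeric feature vector."""
    message = event["message"].lower()
    return [
        float(LEVEL_RANK.get(event["level"], 1)),
        float(len(message)),
        float(sum(message.count(k) for k in KEYWORDS)),
    ]


def anomaly_score(features, baseline):
    """Largest absolute z-score of any feature against the baseline vectors."""
    score = 0.0
    for i, value in enumerate(features):
        column = [row[i] for row in baseline]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            # Constant feature in the baseline: any deviation is treated
            # as maximally unusual, no deviation as perfectly normal.
            z = 0.0 if value == mu else float("inf")
        else:
            z = abs(value - mu) / sigma
        score = max(score, z)
    return score


def classify_severity(score):
    """Map an anomaly score to a severity label (assumed thresholds)."""
    if score >= 3.0:
        return "high"
    if score >= 2.0:
        return "medium"
    return "low"


baseline_events = [
    {"message": "service started", "level": "info"},
    {"message": "request served", "level": "info"},
    {"message": "cache refreshed", "level": "info"},
    {"message": "heartbeat ok", "level": "debug"},
]
baseline = [extract_features(e) for e in baseline_events]

suspect = {"message": "disk write timeout, retry failed", "level": "error"}
severity = classify_severity(anomaly_score(extract_features(suspect), baseline))
```

A trained model, whichever one the project uses, would replace `anomaly_score`; the surrounding extract, score, classify structure stays the same.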

3. Observability

  • Runtime metrics are generated
  • Errors and performance data are logged
  • Together, these signals enable debugging and system health analysis
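
One way to generate the runtime metrics described above is a small in-process registry of counters and timings. The `Metrics` class, its method names, and the metric names used below are hypothetical; the repository may collect metrics differently.

```python
import threading
import time
from collections import Counter
from contextlib import contextmanager


class Metrics:
    """Thread-safe in-process counters and latency totals (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()
        self._latency_total = 0.0
        self._latency_samples = 0

    def increment(self, name, amount=1):
        with self._lock:
            self._counts[name] += amount

    @contextmanager
    def timed(self, name):
        """Count one call of `name`, record its duration, and count errors."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.increment(name + ".errors")
            raise
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self._counts[name + ".calls"] += 1
                self._latency_total += elapsed
                self._latency_samples += 1

    def snapshot(self):
        """Return a plain dict that can be logged or served as JSON."""
        with self._lock:
            avg = (
                self._latency_total / self._latency_samples
                if self._latency_samples
                else 0.0
            )
            return {"counts": dict(self._counts), "avg_latency_seconds": avg}


metrics = Metrics()
with metrics.timed("ingest"):
    metrics.increment("events_received")
```

Calling `metrics.snapshot()` periodically, or from a dedicated endpoint, gives a point-in-time view of request volume, error counts, and average latency.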

API Overview

POST /log

Accepts system log events.

Example:

curl -X POST http://localhost:5000/log \
  -H "Content-Type: application/json" \
  -d '{"message":"CPU spike detected","level":"warning"}'
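
The same request can be issued from Python with the standard library. The helper name `build_log_request` and the `base_url` default are assumptions for this sketch, and the response format is not documented here, so the send step is left commented out and prints whatever the service returns.

```python
import json
import urllib.request


def build_log_request(message, level, base_url="http://localhost:5000"):
    """Build (but do not send) a POST /log request with a JSON body."""
    body = json.dumps({"message": message, "level": level}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/log",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_log_request("CPU spike detected", "warning")

# Sending requires the service to be running, for example via Docker:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status, response.read().decode("utf-8"))
```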

Machine Learning Pipeline

  • Preprocessing of incoming log features
  • Anomaly detection model flags abnormal behavior
  • Severity classification assigns priority levels
  • ML inference is integrated directly into backend workflows

This simulates production-style ML + backend integration.

Run Locally (Docker)

Build Image

docker build -t observability .

Run Container

docker run -p 5000:5000 observability

The service will be available at:

http://localhost:5000

Engineering Highlights

  • Designed modular Flask backend following service-oriented patterns
  • Integrated ML inference into request pipelines
  • Implemented structured logging and centralized exception handling
  • Containerized application using Docker for consistent deployments
  • Applied observability concepts including metrics generation and event classification

Performance & Reliability Considerations

  • Lightweight SQLite storage for fast local persistence
  • Modular services allow future horizontal scaling
  • Docker ensures environment consistency
  • Structured logs simplify debugging
  • ML inference is embedded directly into backend flow

Key Learnings

  • Backend service architecture
  • REST API design
  • Linux observability fundamentals
  • ML integration in production workflows
  • Docker-based deployment strategies
  • Debugging distributed components

Planned Enhancements

  • Replace SQLite with PostgreSQL
  • Add Prometheus metrics exporter
  • Implement distributed tracing
  • Introduce async processing pipeline (for example, async ingestion using Celery)
  • Build monitoring dashboard UI
  • Add authentication and rate limiting
  • Introduce message queue for log ingestion

Author

Developed in 2024 as part of backend systems and observability engineering practice.
