This repository documents the design and implementation of a real-time data streaming platform for an ecommerce use case.
The project focuses on data engineering fundamentals: event-driven architecture, Kafka-based ingestion, observability, and infrastructure decisions that protect data producers at scale.
Ecommerce platforms generate continuous streams of events:
- page views
- cart interactions
- purchases
These events are:
- high volume
- bursty
- business critical
A key constraint guided this design:
producer integrations must never break, even as the platform behind them evolves.
All events enter the platform through a single, stable endpoint:
events.ecommerce-domain.com
Amazon Route 53 is used as a strategic routing layer to:
- decouple producers from backend infrastructure
- enable safe evolution of pipelines
- support failover and traffic spikes
Behind this entry point, Kafka handles durable ingestion and streaming.
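As a sketch of this routing layer, a weighted Route 53 record set can split producer traffic between ingestion tiers behind the stable endpoint, which is what enables safe cutovers and failover without touching producers. The zone ID, internal target names, and weights below are hypothetical placeholders, not the project's actual values.

```python
# Sketch: route the stable producer endpoint to backend ingestion tiers
# via a weighted Route 53 record set. Record targets and weights are
# illustrative placeholders.

def weighted_record_change(name, targets):
    """Build a Route 53 ChangeBatch splitting traffic across targets.

    `targets` is a list of (set_identifier, dns_name, weight) tuples.
    """
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "CNAME",
                    "TTL": 60,  # short TTL so cutover/failover takes effect quickly
                    "SetIdentifier": set_id,
                    "Weight": weight,
                    "ResourceRecords": [{"Value": dns_name}],
                },
            }
            for set_id, dns_name, weight in targets
        ]
    }

change_batch = weighted_record_change(
    "events.ecommerce-domain.com",
    [
        ("kafka-blue", "ingest-blue.internal.example.com", 90),   # current tier
        ("kafka-green", "ingest-green.internal.example.com", 10),  # canary tier
    ],
)
# Applied with boto3 (hosted-zone ID is hypothetical):
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z0EXAMPLE", ChangeBatch=change_batch)
```

Shifting the weights migrates traffic gradually while producers keep publishing to the same DNS name.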

The platform processes three core event categories:
- Page Views: high volume, append-only
- Cart Events: bursty, user-driven traffic
- Purchase Events: low volume, business critical
Each event type is defined using explicit schemas to enforce contracts between producers and consumers.
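A minimal sketch of such a contract, using a Python dataclass with explicit fields and validation; the field names here are illustrative, not the project's actual schema.

```python
from dataclasses import dataclass, asdict

# Sketch of an explicit event contract between producers and consumers.
# Field names and validation rules are illustrative.

@dataclass(frozen=True)
class PurchaseEvent:
    event_id: str
    user_id: str
    order_id: str
    amount_cents: int
    currency: str       # ISO-4217 code, e.g. "USD"
    occurred_at: str    # ISO-8601 timestamp

    def validate(self) -> "PurchaseEvent":
        if self.amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        if len(self.currency) != 3:
            raise ValueError("currency must be a 3-letter ISO-4217 code")
        return self

event = PurchaseEvent(
    "evt-1", "u-42", "ord-7", 1999, "USD", "2024-01-01T12:00:00Z"
).validate()
payload = asdict(event)  # dict ready for JSON serialization onto a topic
```

In practice a schema registry (e.g. Avro or JSON Schema) would enforce the same contract at publish time; the dataclass makes the idea concrete.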
- Producers publish events asynchronously
- Topics are partitioned based on access patterns
- Consumers are designed to be idempotent
- Consumer lag is treated as a first-class metric
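Two of the principles above can be sketched directly: key-based partitioning keeps one user's events ordered on a single partition, and a seen-ID check makes a consumer idempotent under at-least-once delivery. The hashing scheme and in-memory store are simplifications for illustration.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash so the same key always lands on the same partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

class IdempotentConsumer:
    """Skips duplicate deliveries by remembering processed event IDs."""

    def __init__(self):
        self.seen = set()     # in production this would be a durable store
        self.processed = []

    def handle(self, event_id: str, payload: dict) -> bool:
        if event_id in self.seen:  # redelivered message: no side effects
            return False
        self.seen.add(event_id)
        self.processed.append(payload)
        return True

consumer = IdempotentConsumer()
consumer.handle("evt-1", {"type": "purchase"})
consumer.handle("evt-1", {"type": "purchase"})  # duplicate: ignored
# consumer.processed now holds exactly one event
```

Because Kafka guarantees at-least-once delivery by default, idempotency on the consumer side is what turns redeliveries into no-ops rather than double-counted purchases.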
Kafka configuration reflects traffic patterns and business criticality rather than uniform defaults.
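As an illustration of topic-level tuning, the sketch below maps each event category to settings matched to its traffic shape; partition counts, retention windows, and replication factors are hypothetical, not the project's actual values.

```python
# Sketch: per-topic Kafka settings tuned to traffic shape and criticality.
# All values are illustrative placeholders.
TOPIC_CONFIG = {
    "page-views": {        # high volume, append-only: favor throughput
        "partitions": 24,
        "replication.factor": 2,
        "retention.ms": 3 * 24 * 60 * 60 * 1000,    # 3 days
    },
    "cart-events": {       # bursty, user-driven: headroom for spikes
        "partitions": 12,
        "replication.factor": 3,
        "retention.ms": 7 * 24 * 60 * 60 * 1000,    # 7 days
    },
    "purchases": {         # low volume, business critical: favor durability
        "partitions": 6,
        "replication.factor": 3,
        "min.insync.replicas": 2,                   # require durable acks
        "retention.ms": 30 * 24 * 60 * 60 * 1000,   # 30 days
    },
}
```

The point is the asymmetry: page views trade durability for throughput, while purchases pay for stricter replication and longer retention.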
Grafana dashboards track:
- ingestion rate
- consumer lag
- processing latency
- error rates
Observability is used not only for monitoring but also to inform routing and scaling decisions.
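The consumer-lag metric tracked above reduces to a simple computation: per partition, lag is the gap between the latest written offset and the consumer group's committed offset. The offsets below are hardcoded stand-ins for values fetched from the broker.

```python
# Sketch: consumer lag per partition and in total. In a real deployment
# the offsets come from Kafka (end offsets vs. committed group offsets);
# here they are hardcoded for illustration.

def consumer_lag(end_offsets: dict, committed_offsets: dict):
    """Return (lag per partition, total lag across all partitions)."""
    per_partition = {
        p: end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    }
    return per_partition, sum(per_partition.values())

per_partition, total = consumer_lag(
    end_offsets={0: 1500, 1: 900},
    committed_offsets={0: 1450, 1: 900},
)
# per_partition == {0: 50, 1: 0}; total == 50
```

A rising total under steady ingestion means consumers are falling behind, which is why lag, not just throughput, drives scaling decisions.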
Planned extensions include:
- Traffic simulation for load testing
- Schema versioning and compatibility checks
- Infrastructure as Code (Terraform / CloudFormation)
- Stream processing with Kafka Streams or Flink
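Of the items above, the schema versioning and compatibility checks can be sketched concretely: a new schema version may add optional fields, but it must keep every field the old version requires and must not start requiring fields that old producers never sent. The field sets below are illustrative.

```python
# Sketch of a backward-compatibility check for schema evolution.
# Field names are illustrative, not the project's actual schema.

def is_backward_compatible(old_required: set, new_required: set, new_all: set) -> bool:
    """True if consumers of the new schema can still read old events."""
    # Every field the old schema required must still exist, and the new
    # schema must not require anything old producers did not send.
    return old_required <= new_all and new_required <= old_required

old_required = {"event_id", "user_id", "occurred_at"}

# v2 adds an optional session_id field: compatible.
ok = is_backward_compatible(
    old_required,
    new_required={"event_id", "user_id", "occurred_at"},
    new_all={"event_id", "user_id", "occurred_at", "session_id"},
)

# v3 makes session_id required: old events lack it, so incompatible.
bad = is_backward_compatible(
    old_required,
    new_required={"event_id", "user_id", "occurred_at", "session_id"},
    new_all={"event_id", "user_id", "occurred_at", "session_id"},
)
```

A schema registry (e.g. Confluent Schema Registry) automates exactly this kind of check at publish time; the function above just makes the rule explicit.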
This project was built as a hands-on exercise to showcase data engineering skills through realistic system design and documented technical decisions.