Incident correlation engine for data teams
Connects test failures to the job runs that caused them:
- Automatically correlates lineage events with test results
- Links upstream failures to downstream impact
- Provides a unified view across your data stack
- Works with your existing OpenLineage infrastructure
# Start the server (Docker)
docker-compose -f deployments/docker/docker-compose.yml up -d
# Or build from source
make build
./build/correlator
# Ingest OpenLineage events (single event - standard OL API)
curl -X POST http://localhost:8080/api/v1/lineage \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d @event.json
# Ingest batch events
curl -X POST http://localhost:8080/api/v1/lineage/batch \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d @events.jsonStandard OpenLineage integrations (dbt, Airflow, GE) work out of the box:
export OPENLINEAGE_URL=http://localhost:8080
export OPENLINEAGE_API_KEY=your-api-keyCorrelator receives OpenLineage events and builds correlation views:
- Ingest - Receives OpenLineage events via HTTP API or Kafka
- Canonicalize - Normalizes dataset URNs for consistent matching
- Store - Persists events with idempotency handling
- Correlate - Builds materialized views linking tests → datasets → jobs
- Query - Exposes correlation data via API
The Problem: When data tests fail, teams spend significant time manually searching through logs, lineage graphs, and job histories to find the root cause.
What You Get: Automated correlation that connects test failures to their source, reducing time spent investigating incidents.
Key Benefits:
- Faster incident resolution: Automated correlation instead of manual investigation
- Unified view: One place to see test results, lineage, and job runs
- Standards-based: Built on OpenLineage — no vendor lock-in
Different tools can emit different dataset URNs for the same underlying table. For example, when tools use different schema prefixes or naming conventions, Correlator sees them as separate datasets and cannot correlate test failures across tools.
Solution: Configure dataset patterns in .correlator.yaml to map orphan URNs to their canonical form:
# .correlator.yaml
dataset_patterns:
# GE emits: postgresql://prod-db/marts.customers
# dbt emits: postgresql://prod-db/analytics.marts.customers
- pattern: "postgresql://prod-db/marts.{table}"
canonical: "postgresql://prod-db/analytics.marts.{table}"Pattern syntax:
{variable}— captures any characters except/{variable*}— captures any characters including/(for paths)- Literal characters match exactly
- First matching pattern wins (order matters)
Workflow:
- Deploy Correlator and run your data tools (dbt, GE, Airflow)
- Check the Correlation Health page for orphan datasets
- Apply suggested patterns or write custom ones in
.correlator.yaml - Restart Correlator — historical data is immediately resolved
See .correlator.yaml.example for a full configuration template.
| Variable | Description | Default |
|---|---|---|
CORRELATOR_CONFIG_PATH |
Path to YAML config file | .correlator.yaml |
CORRELATOR_AUTH_ENABLED |
Enable API key authentication | false |
CORRELATOR_SERVER_PORT |
HTTP server port | 8080 |
CORRELATOR_SERVER_LOG_LEVEL |
Log level (debug, info, warn, error) | info |
CORRELATOR_KAFKA_ENABLED |
Enable Kafka consumer for OL events | false |
CORRELATOR_KAFKA_BROKERS |
Comma-separated Kafka broker addresses | (required if enabled) |
CORRELATOR_KAFKA_TOPIC |
Kafka topic to consume from | openlineage.events |
CORRELATOR_KAFKA_GROUP |
Kafka consumer group ID | correlator |
See .env.example for all available configuration options.
This project follows Semantic Versioning with the following guidelines:
-
0.x.y versions (e.g., 0.1.0, 0.2.0) indicate initial development phase:
- The API is not yet stable and may change between minor versions
- Features may be added, modified, or removed without major version changes
- For production-critical systems, please pin a version that works in your environment
-
1.0.0 and above will indicate a stable API with semantic versioning guarantees:
- MAJOR version for incompatible API changes
- MINOR version for backwards-compatible functionality additions
- PATCH version for backwards-compatible bug fixes
The current version is in early development stage, so expect possible API changes until the 1.0.0 release.
- Development: docs/DEVELOPMENT.md - Local setup, testing, architecture
- Contributing: docs/CONTRIBUTING.md - Contribution guidelines
- Go 1.23+
- PostgreSQL 15+
- Docker (for local development)
- OpenLineage: https://openlineage.io/
- Issues: https://github.com/correlator-io/correlator/issues
- Discussions: https://github.com/correlator-io/correlator/discussions
Apache 2.0 - See LICENSE for details.