Kubernetes operator for streaming data between different sources and sinks with support for message transformations.
The operator watches DataFlow custom resources and creates processor pods. Each processor reads from a source, applies optional transformations, and writes to a sink. Configuration uses the type + config format:

    apiVersion: dataflow.dataflow.io/v1
    kind: DataFlow
    metadata:
      name: kafka-to-postgres
    spec:
      source:
        type: kafka
        config:
          brokers:
            - localhost:9092
          topic: input-topic
          consumerGroup: dataflow-group
      sink:
        type: postgresql
        config:
          connectionString: "postgres://user:pass@localhost:5432/db?sslmode=disable"
          table: output_table

| Source | Sink |
|---|---|
| Kafka | Kafka |
| PostgreSQL | PostgreSQL |
| — | ClickHouse |
| — | Trino |
| — | Nessie (Apache Iceberg) |
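
The type + config shape from the manifest and the support table above can be read together: `type` selects a connector and `config` carries that connector's settings. The Go sketch below shows one way such a spec could be modelled and checked against the table. It is illustrative only: the `Endpoint` struct and `validate` function are hypothetical, and the identifiers `clickhouse`, `trino`, and `nessie` are assumed spellings (only `kafka` and `postgresql` appear in the sample manifest). The operator's real types and validation may differ.

```go
package main

import "fmt"

// Endpoint mirrors the "type + config" shape used by spec.source and spec.sink.
// Hypothetical type for illustration; not the operator's actual CRD type.
type Endpoint struct {
	Type   string
	Config map[string]any
}

// Connector identifiers derived from the support table.
// Only "kafka" and "postgresql" are confirmed by the sample manifest;
// the other spellings are assumptions.
var (
	supportedSources = map[string]bool{"kafka": true, "postgresql": true}
	supportedSinks   = map[string]bool{
		"kafka":      true,
		"postgresql": true,
		"clickhouse": true,
		"trino":      true,
		"nessie":     true,
	}
)

// validate returns an error when the source or sink type is not in the table.
func validate(source, sink Endpoint) error {
	if !supportedSources[source.Type] {
		return fmt.Errorf("unsupported source type %q", source.Type)
	}
	if !supportedSinks[sink.Type] {
		return fmt.Errorf("unsupported sink type %q", sink.Type)
	}
	return nil
}

func main() {
	src := Endpoint{Type: "kafka", Config: map[string]any{"topic": "input-topic"}}
	dst := Endpoint{Type: "postgresql", Config: map[string]any{"table": "output_table"}}
	fmt.Println(validate(src, dst)) // prints <nil>

	// Trino appears in the table as a sink only, so using it as a source fails.
	fmt.Println(validate(Endpoint{Type: "trino"}, dst))
}
```

Keeping `config` as a free-form map is one possible design choice: it lets each connector define its own keys (for example `brokers` for Kafka, `connectionString` for PostgreSQL) without changing the outer structure.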

Supported transformations: Filter, Select, Remove, Mask, Flatten, Timestamp, SnakeCase, Router, Chain.
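
To give a feel for how transformations such as Select, Mask, SnakeCase, and Chain could compose, here is a minimal Go sketch. Everything in it is an assumption made for illustration: the `Message` and `Transform` types, the function names, and the exact semantics (for instance, masking with a fixed `***` placeholder) are hypothetical and are not taken from the operator's source.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Message is a decoded record (hypothetical type).
type Message map[string]any

// Transform maps one message to another (hypothetical type).
type Transform func(Message) Message

// selectFields keeps only the named fields.
func selectFields(fields ...string) Transform {
	return func(m Message) Message {
		out := Message{}
		for _, f := range fields {
			if v, ok := m[f]; ok {
				out[f] = v
			}
		}
		return out
	}
}

// mask replaces the values of the named fields with a fixed placeholder.
func mask(fields ...string) Transform {
	return func(m Message) Message {
		out := make(Message, len(m))
		for k, v := range m {
			out[k] = v
		}
		for _, f := range fields {
			if _, ok := out[f]; ok {
				out[f] = "***"
			}
		}
		return out
	}
}

// toSnake converts camelCase to snake_case. It is deliberately naive:
// every uppercase letter starts a new word.
func toSnake(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			b.WriteRune(unicode.ToLower(r))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// snakeCase renames every top-level key to snake_case.
func snakeCase() Transform {
	return func(m Message) Message {
		out := make(Message, len(m))
		for k, v := range m {
			out[toSnake(k)] = v
		}
		return out
	}
}

// chain applies the given transforms in order.
func chain(steps ...Transform) Transform {
	return func(m Message) Message {
		for _, step := range steps {
			m = step(m)
		}
		return m
	}
}

func main() {
	msg := Message{"userId": 42, "email": "a@example.com", "debug": true}
	pipeline := chain(selectFields("userId", "email"), mask("email"), snakeCase())
	fmt.Println(pipeline(msg)) // prints map[email:*** user_id:42]
}
```

Each step returns a new map instead of mutating its input, so the steps can be reordered or reused without side effects.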
- Kubernetes 1.24+
- Helm 3.0+
- kubectl
- Go 1.25+ (for local development)
- Docker and docker-compose (for local development)

Install with Helm:

    helm repo add dataflow-operator https://dataflow-operator.github.io/helm-charts
    helm repo update
    helm install dataflow-operator dataflow-operator/dataflow-operator

Local development:

- Start dependencies:

      docker-compose up -d

Available UIs:
- Kafka UI: http://localhost:8080
- ClickHouse: http://localhost:8123
- Run the operator:

      task run

Generate code and manifests:

    task generate    # DeepCopy, CRD, RBAC manifests
    task manifests   # CRD and RBAC only

If you encounter issues with task generate:

    go install sigs.k8s.io/controller-tools/cmd/controller-gen@latest
    task generate

Build the operator binary:

    task build   # builds bin/manager

Docker image (builds both operator and processor binaries):

    docker build -t dataflow:local .

Run the tests:

    # Unit tests
    task test
    # Integration tests (requires Docker; uses testcontainers)
    task test-integration

| Command | Description |
|---|---|
| task build | Build bin/manager |
| task run | Run operator locally |
| task generate | Generate DeepCopy and manifests |
| task manifests | Generate CRD and RBAC |
| task test | Unit tests |
| task test-integration | Integration tests (Docker required) |
| task install | Install CRDs into cluster |
| task uninstall | Remove CRDs from cluster |

Example manifests are in config/samples/:
- kafka-to-postgres.yaml: Kafka to PostgreSQL
- kafka-to-clickhouse.yaml: Kafka to ClickHouse
- kafka-to-trino.yaml: Kafka to Trino
- postgres-to-kafka-router.yaml: PostgreSQL to Kafka with Router
- clickhouse-to-clickhouse.yaml: ClickHouse to ClickHouse
See all samples: ls config/samples/
Apache License 2.0