KPipe is a lightweight Kafka processing library for modern Java that lets you build safe, high‑performance message pipelines using virtual threads and a functional API.
- Modern Java concurrency (virtual threads)
- Composable functional pipelines
- Safe at-least-once processing
- Backpressure to protect downstream systems
- High throughput with minimal framework overhead
It is designed for Kafka consumer services performing transformations, enrichment, or routing.
Create a pipeline and start a Kafka consumer in a few lines:
final var registry = new MessageProcessorRegistry("demo");
final var sanitizeKey = RegistryKey.json("sanitize");
registry.registerOperator(sanitizeKey, JsonMessageProcessor.removeFieldsOperator("password"));
final var stampKey = RegistryKey.json("stamp");
registry.registerOperator(stampKey, JsonMessageProcessor.addTimestampOperator("processedAt"));
final var pipeline = registry.pipeline(MessageFormat.JSON)
.add(sanitizeKey, stampKey)
.toSink(MessageSinkRegistry.JSON_LOGGING)
.build();
final var consumer = KPipeConsumer.<byte[], byte[]>builder()
.withProperties(kafkaProps)
.withTopic("users")
.withPipeline(pipeline)
.withRetry(3, Duration.ofSeconds(1))
.build();
// Use the runner to manage the consumer lifecycle
final var runner = KPipeRunner.builder(consumer).build();
runner.start();

KPipe handles:
- record processing
- retries
- metrics
- offset tracking
- safe commits
KPipe works well for:
- Kafka consumer microservices
- event enrichment pipelines
- lightweight transformation services
- I/O-bound processing (REST calls, database lookups)
- teams adopting modern Java concurrency
KPipe is not intended to replace large streaming frameworks. It focuses on simple, composable Kafka consumer pipelines.
Kafka Streams is powerful but introduces a full topology framework and state management layer that many services do not need.
KPipe focuses on code-first pipelines with minimal infrastructure overhead.
| Capability | Kafka Streams | Reactor Kafka | KPipe |
|---|---|---|---|
| Full stream processing framework | Yes | No | No |
| Lightweight consumer pipelines | Partial | Yes | Yes |
| Virtual-thread friendly | No | No | Yes |
| Functional pipeline API | Yes | Yes | Yes |
| Minimal dependencies | No | Yes | Yes |
KPipe sits between raw KafkaConsumer code and full streaming frameworks.
<dependency>
<groupId>io.github.eschizoid</groupId>
<artifactId>kpipe</artifactId>
<version>1.4.0</version>
</dependency>

// Gradle (Groovy)
implementation 'io.github.eschizoid:kpipe:1.4.0'

// Gradle (Kotlin)
implementation("io.github.eschizoid:kpipe:1.4.0")

// sbt
libraryDependencies += "io.github.eschizoid" % "kpipe" % "1.4.0"

KPipe is designed to be a lightweight, high-performance alternative to existing Kafka consumer libraries, focusing on modern Java 24+ features (Virtual Threads, Scoped Values) and predictable behavior.
Unlike traditional pipelines that often perform byte[] -> Object -> byte[] at every transformation step, KPipe
optimizes for throughput:
- Single Deserialization: Messages are deserialized once into a mutable representation (e.g., `Map` for JSON, `GenericRecord` for Avro) via the `MessagePipeline`.
- In-Place Transformations: A chain of `UnaryOperator` functions is applied to the same object.
- Single Serialization: The final object is serialized back to `byte[]` only once.
- Integrated Sinks: Typed sinks can be attached directly to the pipeline, receiving the object before final serialization.
This approach significantly reduces CPU overhead and GC pressure.
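A minimal sketch of this shape in plain Java may make it concrete. The key=value "wire format" and class below are hypothetical, toy stand-ins for KPipe's real JSON/Avro codecs; only the single deserialize → N transforms → single serialize structure is the point:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Toy illustration of the single-SerDe pipeline shape: deserialize once,
// mutate the same Map through every operator, serialize once at the end.
public class SingleSerDePipeline {

  private final List<UnaryOperator<Map<String, Object>>> steps = new ArrayList<>();

  public SingleSerDePipeline add(final UnaryOperator<Map<String, Object>> step) {
    steps.add(step);
    return this;
  }

  public byte[] process(final byte[] raw) {
    // 1. Single deserialization into a mutable Map
    Map<String, Object> doc = new LinkedHashMap<>();
    for (final String pair : new String(raw, StandardCharsets.UTF_8).split(",")) {
      final String[] kv = pair.split("=", 2);
      doc.put(kv[0], kv[1]);
    }
    // 2. In-place transformations: no intermediate byte[] between steps
    for (final var step : steps) {
      doc = step.apply(doc);
    }
    // 3. Single serialization back to byte[]
    final StringBuilder out = new StringBuilder();
    doc.forEach((k, v) -> {
      if (out.length() > 0) out.append(',');
      out.append(k).append('=').append(v);
    });
    return out.toString().getBytes(StandardCharsets.UTF_8);
  }

  public static void main(final String[] args) {
    final var pipeline = new SingleSerDePipeline()
        .add(m -> { m.remove("password"); return m; })        // sanitize
        .add(m -> { m.put("processedAt", "t0"); return m; }); // stamp
    final byte[] result = pipeline.process("user=ada,password=secret".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(result, StandardCharsets.UTF_8)); // user=ada,processedAt=t0
  }
}
```

Contrast this with a naive design that serializes between every step: each transform would pay a full parse and encode, multiplying CPU and allocation cost by the number of operators.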
KPipe uses Java Virtual Threads (Project Loom) for high-concurrency message processing.
- Efficient Resource Reuse: Heavy objects like `Schema.Parser`, `ByteArrayOutputStream`, and Avro encoders are cached per virtual thread using `ScopedValue`, which is significantly more lightweight than `ThreadLocal`.
- Optimization: `ScopedValue` allows KPipe to share these heavy resources across all transformations in a single pipeline without the memory leak risks or scalability bottlenecks of `ThreadLocal` in a virtual-thread-per-record model.
- Thread-Per-Record: Each message is processed in its own virtual thread, allowing KPipe to scale to millions of concurrent operations without the overhead of complex thread pools.
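The thread-per-record model can be sketched with the JDK's virtual-thread executor (Java 21+). This is an illustration of the concurrency shape only, not KPipe's internals:

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;

// Sketch of thread-per-record processing on virtual threads (Java 21+).
// Each record gets its own cheap virtual thread; blocking I/O inside a
// task parks the virtual thread without pinning an OS carrier thread.
public class ThreadPerRecord {

  public static int processAll(final List<String> records) {
    final var done = new ConcurrentLinkedQueue<String>();
    try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
      for (final String rec : records) {
        executor.submit(() -> done.add(rec.toUpperCase())); // one task = one virtual thread
      }
    } // close() blocks until all submitted tasks complete
    return done.size();
  }

  public static void main(final String[] args) {
    System.out.println(processAll(List.of("a", "b", "c"))); // 3
  }
}
```

Because virtual threads cost roughly a few hundred bytes each rather than a platform-thread stack, no pool sizing or tuning is needed for I/O-bound workloads.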
KPipe implements a Lowest Pending Offset strategy to ensure reliability even with parallel processing:
- Pluggable Offset Management: Use the `OffsetManager` interface to customize how offsets are stored (Kafka-based or external database).
- In-Flight Tracking: Every record's offset is tracked in a `ConcurrentSkipListSet` per partition (in `KafkaOffsetManager`).
- No-Gap Commits: Even if message 102 finishes before 101, offset 102 will not be committed until 101 is successfully processed.
- Crash Recovery: If the consumer crashes, it will resume from the last committed "safe" offset. While this may result in some records being re-processed (standard "at-least-once" behavior), it guarantees no message is ever skipped.
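The no-gap rule can be sketched in a few lines. This is a hypothetical tracker illustrating the idea, not KPipe's actual `KafkaOffsetManager`:

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "lowest pending offset" commit rule: the committable
// offset is the lowest offset still in flight, so a fast record can
// never cause a slower, earlier record to be skipped after a crash.
public class LowestPendingOffsetTracker {

  private final ConcurrentSkipListSet<Long> inFlight = new ConcurrentSkipListSet<>();
  private final AtomicLong highestStarted = new AtomicLong(-1);

  public void started(final long offset) {
    inFlight.add(offset);
    highestStarted.accumulateAndGet(offset, Math::max);
  }

  public void finished(final long offset) {
    inFlight.remove(offset);
  }

  /** Offset safe to commit: every offset below it has finished. */
  public long safeCommitOffset() {
    if (inFlight.isEmpty()) return highestStarted.get() + 1;
    return inFlight.first(); // a pending lower offset blocks the commit
  }

  public static void main(final String[] args) {
    final var tracker = new LowestPendingOffsetTracker();
    tracker.started(100);
    tracker.started(101);
    tracker.started(102);
    tracker.finished(102);                          // 102 finishes before 101
    System.out.println(tracker.safeCommitOffset()); // 100 — gap blocks the commit
    tracker.finished(100);
    System.out.println(tracker.safeCommitOffset()); // 101
    tracker.finished(101);
    System.out.println(tracker.safeCommitOffset()); // 103
  }
}
```

On crash and restart, the consumer resumes from `safeCommitOffset()`; records 101 and 102 may be re-processed, but nothing below the commit point can be lost.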
KPipe supports two modes of execution depending on your ordering and throughput requirements:
- Parallel Mode (Default): Best for stateless transformations (enrichment, masking). High throughput via virtual threads. Offsets are committed based on the lowest pending offset to ensure no gaps.
- Sequential Mode (`.withSequentialProcessing(true)`): Best for stateful transformations where order per partition is critical (e.g., balance updates, sequence-dependent events). In this mode, only one message per partition is processed at a time. Backpressure is supported and operates by monitoring the consumer lag (the difference between the partition end-offset and the consumer's current position).
While Kafka-based offset storage is the default, KPipe supports external storage (e.g., PostgreSQL) for exactly-once processing or specific architectural needs.
- Seek on Assignment: When partitions are assigned, fetch the last processed offset from your database and call `consumer.seek(partition, offset + 1)`.
- Update on Processed: Implement `markOffsetProcessed` to save the offset to the database.
public class PostgresOffsetManager<K, V> implements OffsetManager<K, V> {
private final Consumer<K, V> consumer;
// ... DB connection ...
@Override
public void markOffsetProcessed(final ConsumerRecord<K, V> record) {
// SQL: UPDATE offsets SET offset = ? WHERE partition = ?
}
@Override
public ConsumerRebalanceListener createRebalanceListener() {
return new ConsumerRebalanceListener() {
@Override
public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
for (var tp : partitions) {
long lastOffset = fetchFromDb(tp);
consumer.seek(tp, lastOffset + 1);
}
}
// ...
};
}
// ...
}

KPipe provides a robust, multi-layered error handling mechanism:
- Built-in Retries: Configure `.withRetry(maxRetries, backoff)` to automatically retry transient failures.
- Dead Letter Handling: Provide a `.withErrorHandler()` to redirect messages that fail after all retries to an error topic or database.
- Safe Pipelines: Use `MessageProcessorRegistry.withErrorHandling()` to wrap individual processors with default values or logging, preventing a single malformed message from blocking the partition.
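Retry-with-backoff semantics can be sketched as follows. This is a hypothetical illustration of the behavior, not KPipe's implementation, and `RetrySketch`/`withRetry` are names invented for this example:

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry transient failures up to maxRetries,
// sleeping for the backoff interval between attempts; if all attempts
// fail, rethrow so a dead-letter handler can take over.
public class RetrySketch {

  public static <T> T withRetry(final Callable<T> task, final int maxRetries, final Duration backoff)
      throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return task.call();
      } catch (final Exception e) {
        last = e;
        if (attempt < maxRetries) Thread.sleep(backoff.toMillis()); // back off before retrying
      }
    }
    throw last; // exhausted: hand off to the error handler / dead-letter path
  }

  public static void main(final String[] args) throws Exception {
    final int[] calls = {0};
    final String result = withRetry(() -> {
      if (++calls[0] < 3) throw new RuntimeException("transient");
      return "ok";
    }, 3, Duration.ofMillis(1));
    System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
  }
}
```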
When a downstream sink (database, HTTP API, another Kafka topic) is slow, KPipe can automatically pause Kafka polling to prevent unbounded resource consumption or excessive lag.
Backpressure uses two configurable watermarks (hysteresis) to avoid rapid pause/resume oscillation:
- High watermark — pause Kafka polling when the monitored metric reaches this value.
- Low watermark — resume Kafka polling when the metric drops to or below this value.
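The hysteresis logic amounts to a small state machine. The controller below is an illustrative sketch of the watermark behavior, not KPipe's actual backpressure component:

```java
// Sketch of watermark hysteresis: polling pauses at the high watermark
// and only resumes once the metric falls back to the low watermark,
// so the consumer does not flap between paused and resumed.
public class BackpressureController {

  private final long highWatermark;
  private final long lowWatermark;
  private boolean paused;

  public BackpressureController(final long highWatermark, final long lowWatermark) {
    this.highWatermark = highWatermark;
    this.lowWatermark = lowWatermark;
  }

  /** Returns true while polling should stay paused for the given metric value. */
  public boolean update(final long metric) {
    if (!paused && metric >= highWatermark) {
      paused = true;   // here a real controller would call consumer.pause(...)
    } else if (paused && metric <= lowWatermark) {
      paused = false;  // ...and consumer.resume(...)
    }
    return paused;
  }

  public static void main(final String[] args) {
    final var bp = new BackpressureController(10_000, 7_000);
    System.out.println(bp.update(9_999));  // false — below the high watermark
    System.out.println(bp.update(10_000)); // true  — pause
    System.out.println(bp.update(8_000));  // true  — still above the low watermark
    System.out.println(bp.update(7_000));  // false — resume
  }
}
```

The gap between the two watermarks is what prevents oscillation: a metric hovering around a single threshold would otherwise trigger a pause/resume cycle on every poll.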
KPipe automatically selects the optimal backpressure strategy based on your processing mode:
| Mode | Strategy | Metric Monitored | Use Case |
|---|---|---|---|
| Parallel (Default) | In-Flight | Total active virtual threads | Prevent memory exhaustion from too many concurrent tasks. |
| Sequential | Consumer Lag | Total unread messages in Kafka | Prevent the consumer from falling too far behind the producer. |
In parallel mode, multiple messages are processed concurrently using Virtual Threads. The backpressure controller monitors the number of messages currently "in-flight" (started but not yet finished).
- High Watermark Default: 10,000
- Low Watermark Default: 7,000
In sequential mode, messages are processed one by one to maintain strict ordering. Since only one message is ever in-flight, KPipe instead monitors the total consumer lag across all assigned partitions.
The lag is calculated using the formula:
lag = Σ (endOffset - position)
Where:
- `endOffset`: The highest available offset in a partition.
- `position`: The offset of the next record to be fetched by this consumer.
- High Watermark Default: 10,000
- Low Watermark Default: 7,000
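The formula above can be written out directly. The example uses hypothetical per-partition numbers rather than a live `KafkaConsumer` (which would supply them via `endOffsets()` and `position()`):

```java
import java.util.Map;

// Sketch of the lag formula: lag = sum over partitions of (endOffset - position).
public class ConsumerLag {

  /** Each value is {endOffset, position} for one partition. */
  public static long totalLag(final Map<Integer, long[]> offsets) {
    return offsets.values().stream()
        .mapToLong(p -> p[0] - p[1])
        .sum();
  }

  public static void main(final String[] args) {
    // partition 0: end 5_000, position 1_000 -> lag 4_000
    // partition 1: end 9_000, position 2_000 -> lag 7_000
    final long lag = totalLag(Map.of(
        0, new long[] {5_000, 1_000},
        1, new long[] {9_000, 2_000}));
    System.out.println(lag); // 11000
  }
}
```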
final var consumer = KPipeConsumer.<byte[], byte[]>builder()
.withProperties(kafkaProps)
.withTopic("events")
.withProcessor(pipeline)
// Enable backpressure with default watermarks (10k / 7k)
.withBackpressure()
// Or configure explicit watermarks:
// .withBackpressure(5_000, 3_000)
.build();

Backpressure is disabled by default and opt-in via `.withBackpressure()`.
Observability: backpressure events are logged (WARNING on pause, INFO on resume) and tracked via two dedicated metrics: `backpressurePauseCount` and `backpressureTimeMs`.
KPipe respects JVM signals and ensures timely shutdown without data loss:
- Interrupt Awareness: Interrupts trigger a coordinated shutdown sequence. They do not cause records to be skipped.
- Reliable Redelivery: If a record's processing is interrupted (e.g., during retry backoff or transformation), the offset is NOT marked as processed. This ensures it will be safely picked up by the next consumer instance, guaranteeing "at-least-once" delivery even during shutdown.
Extend the registry like this:
// Create a registry
final var registry = new MessageProcessorRegistry("myApp");
// Register a custom JSON operator for field transformations
final var uppercaseKey = RegistryKey.json("uppercase");
registry.registerOperator(uppercaseKey, map -> {
final var value = map.get("text");
if (value instanceof String text) map.put("text", text.toUpperCase());
return map;
});
// Built-in operators are also available
final var envKey = RegistryKey.json("addEnvironment");
registry.registerOperator(envKey,
JsonMessageProcessor.addFieldOperator("environment", "production"));
// Create a high-performance pipeline (single SerDe cycle)
final var pipeline = registry.pipeline(MessageFormat.JSON)
.add(envKey)
.add(uppercaseKey)
.add(RegistryKey.json("addTimestamp"))
.build();
// Use the pipeline with a consumer
final var consumer = KPipeConsumer.<byte[], byte[]>builder()
.withProperties(kafkaProps)
.withTopic("events")
.withPipeline(pipeline)
.withRetry(3, Duration.ofSeconds(1))
.build();
// Start processing messages
consumer.start();

Monitor your consumer with built-in metrics:
// Access consumer metrics
final var metrics = consumer.getMetrics();
final var log = System.getLogger("org.kpipe.metrics");
log.log(Level.INFO, "Messages received: " + metrics.get("messagesReceived"));
log.log(Level.INFO, "Successfully processed: " + metrics.get("messagesProcessed"));
log.log(Level.INFO, "Processing errors: " + metrics.get("processingErrors"));
log.log(Level.INFO, "Messages in-flight: " + metrics.get("inFlight"));
// Backpressure metrics (present only when withBackpressure() is configured)
log.log(Level.INFO, "Backpressure pauses: " + metrics.get("backpressurePauseCount"));
log.log(Level.INFO, "Time spent paused (ms): " + metrics.get("backpressureTimeMs"));

Configure automatic metrics reporting:
final var runner = KPipeRunner.builder(consumer)
.withMetricsReporters(List.of(ConsumerMetricsReporter.forConsumer(consumer::getMetrics)))
.withMetricsInterval(30_000)
.build();
runner.start();

The consumer supports graceful shutdown with in-flight message handling:
final var log = System.getLogger("org.kpipe.app.Shutdown");
// Initiate graceful shutdown with 5-second timeout
boolean allProcessed = runner.shutdownGracefully(5000);
if (allProcessed) log.log(Level.INFO, "All messages processed successfully before shutdown");
else log.log(Level.WARNING, "Shutdown completed with some messages still in flight");
// Register as JVM shutdown hook
Runtime.getRuntime().addShutdownHook(
new Thread(() -> runner.close())
);

The JSON processors provide operators (`UnaryOperator<Map<String, Object>>`) that can be composed into high-performance
pipelines:
final var registry = new MessageProcessorRegistry("myApp");
// Operators are pure functions that modify a Map
final var stampKey = RegistryKey.json("addTimestamp");
registry.registerOperator(stampKey, JsonMessageProcessor.addTimestampOperator("processedAt"));
final var sanitizeKey = RegistryKey.json("sanitize");
registry.registerOperator(sanitizeKey, JsonMessageProcessor.removeFieldsOperator("password", "ssn"));
// Metadata merging
final var metadata = Map.of("version", "1.0", "env", "prod");
final var metaKey = RegistryKey.json("addMetadata");
registry.registerOperator(metaKey, JsonMessageProcessor.mergeWithOperator(metadata));
// Build an optimized pipeline (one deserialization -> many transformations -> one serialization)
final var pipeline = registry.pipeline(MessageFormat.JSON)
.add(sanitizeKey)
.add(stampKey)
.add(metaKey)
.build();

The Avro processors provide operators (`UnaryOperator<GenericRecord>`) that work within optimized pipelines:
final var registry = new MessageProcessorRegistry("myApp", MessageFormat.AVRO);
// Add schema (automatically registers addSource_user and addTimestamp_user)
registry.addSchema("user", "com.kpipe.User", "schemas/user.avsc");
final var schema = AvroMessageProcessor.getSchema("user");
// Register manual operators
final var sanitizeKey = RegistryKey.avro("sanitize");
registry.registerOperator(sanitizeKey,
AvroMessageProcessor.removeFieldsOperator(schema, "password", "creditCard"));
// Transform fields
final var upperKey = RegistryKey.avro("uppercaseName");
registry.registerOperator(upperKey,
AvroMessageProcessor.transformFieldOperator(schema, "name", value -> {
if (value instanceof String text) return text.toUpperCase();
return value;
}));
// Build an optimized pipeline
// This pipeline handles deserialization, all operators, and serialization in one pass
final var avroFormat = ((AvroFormat) MessageFormat.AVRO).withDefaultSchema("user");
final var pipeline = registry.pipeline(avroFormat)
.add(sanitizeKey)
.add(upperKey)
.add(RegistryKey.avro("addTimestamp_user"))
.build();
// For data with magic bytes (e.g., Confluent Wire Format), specify an offset:
final var confluentPipeline = registry.pipeline(avroFormat)
.skipBytes(5)
.add(sanitizeKey)
.add(RegistryKey.avro("addTimestamp_user"))
.build();

For high-performance processing of Java records or POJOs, use the `PojoFormat` and `TypedPipelineBuilder`. This
leverages DSL-JSON annotation processing for near-native performance.
final var registry = new MessageProcessorRegistry("myApp");
// Define a custom operator for your record
final var userKey = RegistryKey.of("userTransform", UserRecord.class);
registry.registerOperator(userKey, user -> new UserRecord(user.id(), user.name().toUpperCase(), user.email()));
// Build an optimized POJO pipeline
final var pipeline = registry.pipeline(MessageFormat.pojo(UserRecord.class))
.add(userKey)
.build();

Message sinks provide destinations for processed messages. The `MessageSink` interface is a functional interface that
defines a single method:
@FunctionalInterface
public interface MessageSink<T> {
void accept(final T processedValue);
}

KPipe provides several built-in sinks:
// Create a JSON console sink (Map-typed)
final var jsonConsoleSink = new JsonConsoleSink<Map<String, Object>>();
// Create an Avro console sink (GenericRecord-typed)
final var avroConsoleSink = new AvroConsoleSink<GenericRecord>();
// Use a sink directly in the pipeline
final var pipeline = registry
.pipeline(MessageFormat.JSON)
.add(RegistryKey.json("sanitize"))
.toSink(jsonConsoleSink)
.build();

You can create custom sinks using lambda expressions:
// Create a custom sink that writes to a database
final MessageSink<Map<String, Object>> databaseSink = (processedMap) -> {
try {
// Write to database
databaseService.insert(processedMap);
// Log success
log.log(Level.INFO, "Successfully wrote message to database: " + processedMap.get("id"));
} catch (Exception e) {
log.log(Level.ERROR, "Failed to write message to database", e);
}
};

The `MessageSinkRegistry` provides a centralized repository for registering and retrieving message sinks:
// Create a registry
final var registry = new MessageSinkRegistry();
// Register sinks with explicit types
final var dbKey = RegistryKey.of("database", Map.class);
registry.register(dbKey, databaseSink);
// Use the sink by key in the pipeline
final var pipeline = registry.pipeline(MessageFormat.JSON)
.add(RegistryKey.json("enrich"))
.toSink(dbKey)
.build();

The registry provides utilities for adding error handling to sinks:
// Create a sink with error handling
final var safeSink = MessageSinkRegistry.withErrorHandling(riskySink);
// Register and use the wrapped sink
final var safeKey = RegistryKey.json("safeDatabase");
registry.register(safeKey, safeSink);
final var pipeline = registry.pipeline(MessageFormat.JSON)
.toSink(safeKey)
.build();

You can broadcast processed messages to multiple destinations simultaneously using `CompositeMessageSink`. Failures in
one sink (e.g., a database timeout) do not prevent other sinks from receiving the data.
// Create multiple sinks
final var postgresSink = new MyPostgresSink();
final var consoleSink = new JsonConsoleSink<Map<String, Object>>();
// Broadcast to both
final var compositeSink = new CompositeMessageSink<>(List.of(postgresSink, consoleSink));
// Use in pipeline
final var pipeline = registry.pipeline(MessageFormat.JSON).toSink(compositeSink).build();

The `KPipeRunner` provides a high-level management layer for Kafka consumers, handling lifecycle, metrics, and graceful
shutdown:
// Create a consumer runner with default settings
final var runner = KPipeRunner.builder(consumer).build();
// Start the consumer
runner.start();
// Wait for shutdown
runner.awaitShutdown();

The `KPipeRunner` supports extensive configuration options:
// Create a consumer runner with advanced configuration
final var runner = KPipeRunner.builder(consumer)
// Configure metrics reporting
.withMetricsReporters(List.of(ConsumerMetricsReporter.forConsumer(consumer::getMetrics)))
.withMetricsInterval(30_000) // Report metrics every 30 seconds
// Configure health checks
.withHealthCheck(KPipeConsumer::isRunning)
// Configure graceful shutdown
.withShutdownTimeout(10_000) // 10 seconds timeout for shutdown
.withShutdownHook(true) // Register JVM shutdown hook
// Configure custom start action
.withStartAction((c) -> {
log.log(Level.INFO, "Starting consumer");
c.start();
})
// Configure custom graceful shutdown
.withGracefulShutdown((c, timeoutMs) -> {
log.log(Level.INFO, "Initiating graceful shutdown with timeout: " + timeoutMs + "ms");
return KPipeRunner.performGracefulConsumerShutdown(c, timeoutMs);
})
.build();

The `KPipeRunner` manages the complete lifecycle of a consumer:
// Start the consumer (idempotent - safe to call multiple times)
runner.start();
// Check if the consumer is healthy
boolean isHealthy = runner.isHealthy();
// Wait for shutdown (blocks until shutdown completes)
boolean cleanShutdown = runner.awaitShutdown();
// Initiate shutdown
runner.close();

The `KPipeRunner` integrates with metrics reporting:
// Add multiple metrics reporters
final var runner = KPipeRunner.builder(consumer)
.withMetricsReporters(
List.of(
ConsumerMetricsReporter.forConsumer(consumer::getMetrics),
ProcessorMetricsReporter.forRegistry(processorRegistry)
)
)
.withMetricsInterval(60_000) // Report every minute
.build();

The `KPipeRunner` implements `AutoCloseable` for use with try-with-resources:
try (final var runner = KPipeRunner.builder(consumer).build()) {
runner.start();
// Application logic here
// Runner will be automatically closed when exiting the try block
}

Here's a concise example of a KPipe application:
public class KPipeApp implements AutoCloseable {
private static final System.Logger LOGGER = System.getLogger(KPipeApp.class.getName());
private final KPipeRunner<KPipeConsumer<byte[], byte[]>> runner;
static void main() {
// Load configuration from environment variables
final var config = AppConfig.fromEnv();
try (final var app = new KPipeApp(config)) {
app.start();
app.awaitShutdown();
} catch (final Exception e) {
LOGGER.log(Level.ERROR, "Fatal error in application", e);
System.exit(1);
}
}
public KPipeApp(final AppConfig config) {
// Create processor and sink registries
final var processorRegistry = new MessageProcessorRegistry(config.appName());
final var sinkRegistry = new MessageSinkRegistry();
final var commandQueue = new ConcurrentLinkedQueue<ConsumerCommand>();
// Create the functional consumer
final var functionalConsumer = KPipeConsumer.<byte[], byte[]>builder()
.withProperties(KafkaConsumerConfig.createConsumerConfig(config.bootstrapServers(), config.consumerGroup()))
.withTopic(config.topic())
.withPipeline(
processorRegistry
.pipeline(MessageFormat.JSON)
.add(RegistryKey.json("addSource"))
.add(RegistryKey.json("markProcessed"))
.add(RegistryKey.json("addTimestamp"))
.toSink(MessageSinkRegistry.JSON_LOGGING)
.build()
)
.withCommandQueue(commandQueue)
.withOffsetManagerProvider((consumer) ->
KafkaOffsetManager.builder(consumer)
.withCommandQueue(commandQueue)
.withCommitInterval(Duration.ofSeconds(30))
.build()
)
.withMetrics(true)
.build();
// Set up the consumer runner with metrics and shutdown hooks
runner = KPipeRunner.builder(functionalConsumer)
.withMetricsInterval(config.metricsInterval().toMillis())
.withShutdownTimeout(config.shutdownTimeout().toMillis())
.withShutdownHook(true)
.build();
}
public void start() {
runner.start();
}
public boolean awaitShutdown() {
return runner.awaitShutdown();
}
public void close() {
runner.close();
}
}

Key Components:
- Configuration from environment variables
- Processor and sink registries for message handling
- Processing pipeline with error handling
- Metrics reporting and graceful shutdown
To Run:
# Set configuration
export HEALTH_HTTP_ENABLED=true
export HEALTH_HTTP_HOST=0.0.0.0
export HEALTH_HTTP_PORT=8080
export HEALTH_HTTP_PATH=/health
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
export KAFKA_CONSUMER_GROUP=my-group
export KAFKA_TOPIC=json-events
export PROCESSOR_PIPELINE=addSource,markProcessed,addTimestamp
export METRICS_INTERVAL_SEC=30
export SHUTDOWN_TIMEOUT_SEC=5

- Java 24+ (Note: Ensure `--enable-preview` is used, as `ScopedValue` and Virtual Thread optimizations continue to evolve)
- Gradle (for building the project)
- kcat (for testing)
- Docker (for local Kafka setup)
Follow these steps to test the KPipe Kafka Consumer. KPipe includes a pre-configured docker-compose.yaml in the root
directory that starts a full local environment including Kafka, Zookeeper, and Confluent Schema Registry.
# Format code and build the library module
./gradlew clean :lib:spotlessApply :lib:build
# Format code and build the applications module
./gradlew :app:clean :app:spotlessApply :app:build
# Build the consumer app container and start all services
docker compose build --no-cache --build-arg MESSAGE_FORMAT=<json|avro|protobuf>
docker compose down -v
docker compose up -d
# Publish a simple JSON message to the json-topic
echo '{"message":"Hello world"}' | kcat -P -b kafka:9092 -t json-topic
# For complex JSON messages, use a file
cat test-message.json | kcat -P -b kafka:9092 -t json-topic
# Publish multiple test messages
for i in {1..10}; do echo "{\"id\":$i,\"message\":\"Test message $i\"}" | \
kcat -P -b kafka:9092 -t json-topic; done

If you want to use Avro with a schema registry, follow these steps:
# Register an Avro schema
curl -X POST \
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
--data "{\"schema\": $(cat lib/src/test/resources/avro/customer.avsc | jq tostring)}" \
http://localhost:8081/subjects/com.kpipe.customer/versions
# Read registered schema
curl -s http://localhost:8081/subjects/com.kpipe.customer/versions/latest | jq -r '.schema' | jq --indent 2 '.'
# Produce an Avro message using kafka-avro-console-producer
cat <<'JSON' | docker run -i --rm --network kpipe_default \
-v "$PWD/lib/src/test/resources/avro/customer.avsc:/tmp/customer.avsc:ro" \
confluentinc/cp-schema-registry:8.2.0 \
sh -ec 'kafka-avro-console-producer \
--bootstrap-server kafka:9092 \
--topic avro-topic \
--property schema.registry.url=http://schema-registry:8081 \
--property value.schema="$(cat /tmp/customer.avsc)"'
{"id":1,"name":"Mariano Gonzalez","email":{"string":"mariano@example.com"},"active":true,"registrationDate":1635724800000,"address":{"com.kpipe.customer.Address":{"street":"123 Main St","city":"Chicago","zipCode":"00000","country":"USA"}},"tags":["premium","verified"],"preferences":{"notifications":"email"}}
JSON

The Kafka consumer will:
- Connect to `localhost:9092`
- Subscribe to `avro-topic` | `json-topic` | `protobuf-topic`
- Compose the processing pipeline from configured processors
- Process each message concurrently using virtual threads
For maintainable pipelines, you can compose multiple pipelines or operators:
final var registry = new MessageProcessorRegistry("myApp");
// Create focused operator groups
final var securityKey = RegistryKey.json("security");
registry.registerOperator(securityKey,
JsonMessageProcessor.removeFieldsOperator("password", "creditCard"));
final var enrichmentKey = RegistryKey.json("enrichment");
registry.registerOperator(enrichmentKey,
JsonMessageProcessor.addTimestampOperator("processedAt"));
// Compose them into an optimized pipeline
final var fullPipeline = registry.pipeline(MessageFormat.JSON)
.add(securityKey)
.add(enrichmentKey)
.build();For the highest level of type safety, you can define your operators as an Enum that implements UnaryOperator<T>.
This allows for bulk registration and discoverability of standard processors:
public enum StandardProcessors implements UnaryOperator<Map<String, Object>> {
TIMESTAMP(JsonMessageProcessor.addTimestampOperator("ts")),
SOURCE(JsonMessageProcessor.addFieldOperator("src", "app"));
private final UnaryOperator<Map<String, Object>> op;
StandardProcessors(final UnaryOperator<Map<String, Object>> op) { this.op = op; }
@Override
public Map<String, Object> apply(final Map<String, Object> t) { return op.apply(t); }
}
// Bulk register all enum constants
registry.registerEnum(Map.class, StandardProcessors.class);
// Now they can be used by name in configuration
// PROCESSOR_PIPELINE=TIMESTAMP,SOURCE

KPipe provides a fluent `when()` operator directly in the `TypedPipelineBuilder`:
final var pipeline = registry
.pipeline(MessageFormat.JSON)
.when(
(map) -> "VIP".equals(map.get("level")),
(map) -> {
map.put("priority", "high");
return map;
},
(map) -> {
map.put("priority", "low");
return map;
}
)
.build();

Alternatively, for `byte[]`-level branching, use the static `MessageProcessorRegistry.when()` utility.
To skip a message in a pipeline, return null in your operator. KPipe will treat null as a signal to stop processing
the current record and will not send it to any downstream operators or sinks.
registry.registerOperator(RegistryKey.json("filter"), map -> {
if ("internal".equals(map.get("type"))) {
return null; // Skip this message
}
return map;
});

You can access `ConsumerRecord` headers within a custom sink to propagate tracing or metadata:
MessageSink<byte[], byte[]> tracingSink = (record, processedValue) -> {
final var traceId = record.headers().lastHeader("X-Trace-Id");
if (traceId != null) {
// Use traceId.value() for logging or downstream calls
}
};

- Message processors should be stateless and thread-safe.
- KPipe automatically handles resource reuse via `ScopedValue` (optimized for Virtual Threads). Avoid manual `ThreadLocal` usage.
- For processors with side effects (like database calls), ensure they are compatible with high-concurrency virtual threads.
- Register frequently used processor combinations as single processors
- For very large messages, consider streaming JSON processors
- Profile your processor pipeline to identify bottlenecks
KPipe is designed for high-throughput, low-overhead Kafka processing using modern Java features and pipeline optimizations. Performance depends on workload shape (I/O vs CPU bound), partitioning, and message size.
- Zero-Copy Magic Byte Handling: For Avro data (especially from Confluent Schema Registry), KPipe supports an `offset` parameter that allows skipping magic bytes and schema IDs without performing expensive `Arrays.copyOfRange` operations.
- DslJson Integration: Uses a high-performance JSON library to reduce parsing overhead and GC pressure.
Latest parallel benchmark snapshot (see benchmarks/README.md) shows a throughput edge for KPipe in that scenario, with
a higher allocation footprint than Confluent Parallel Consumer. Treat these as scenario-specific results, not universal
guarantees.
For systems like payment processors where the order of operations (Authorize → Capture) is vital:
- Consistent Partitioning: Ensure your producer uses a consistent key (e.g., `transaction_id` or `customer_id`). Kafka guarantees all messages with the same key land in the same partition.
- Safety: KPipe manages each partition as an independent, ordered sequence of offsets. As long as related events share a key, KPipe's commit strategy ensures they are handled reliably without skipping steps.
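Why a consistent key preserves order can be shown with a toy partitioner. Kafka's default partitioner actually hashes the serialized key with murmur2; the simplified `hashCode`-based version below is hypothetical, but the property is the same: equal keys always map to the same partition.

```java
// Toy illustration: equal keys deterministically map to one partition,
// so related events (Authorize, Capture) stay in a single ordered stream.
public class KeyPartitioning {

  public static int partitionFor(final String key, final int numPartitions) {
    // mask the sign bit so the modulo result is never negative
    return (key.hashCode() & 0x7fffffff) % numPartitions;
  }

  public static void main(final String[] args) {
    final int p1 = partitionFor("txn-42", 6); // Authorize event
    final int p2 = partitionFor("txn-42", 6); // Capture event, same key
    System.out.println(p1 == p2); // true — both land in the same partition
  }
}
```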
This library is inspired by the best practices from:
If you're a team using this library, feel free to:
- Register custom processors
- Add metrics/observability hooks
- Share improvements or retry strategies
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
