Merged
* Initial metrics implementation for tracking record in memory
* Buffer metrics before sending
* Fix metrics to account for pre-buffer in-memory data
* Add a metric which measures memory backlog
* Use int64 everywhere for recordsInMemory metric
* Use channel instead of atomics to avoid possible contention
* Log warnings if we drop metrics
* Combine the updates to reduce channel traffic overhead
* Reduce channel activity/overhead with a batching mechanic
* Use a smaller channel and log immediately if blocked
* Buffer metrics every second
* Add test of new metrics implementation
* Add config to filter metrics
* Make metrics config cleaner (#35)
* Make metrics config cleaner
* go fmt
Piotr Poniedziałek (pondzix)
approved these changes
Sep 11, 2025
This PR introduces features for managing the amount of data we pull into memory via kinsumer:
Configurable limit on getRecords requests
Self-explanatory.
maxConcurrentShards setting
Uses a semaphore to limit the number of shards we pull data from at once. Before this change, when there was a bottleneck in delivering data to the client, we would keep pulling data from the stream, causing excessive memory consumption.
This feature limits the number of shards one pod pulls from. Where there is no bottleneck, we cycle through shards continuously; where there is one, we cap the number of shards we handle at a time, and therefore the memory we consume.
Buffered data metric
Adds metrics recording the number of records in memory after pulling from Kinesis, and the approximate in-memory size of those records.
Additionally, allows the emitted metrics to be configured for cost control, since we don't necessarily need every metric that kinsumer outputs.
The other commits just factor things better and add tests; they sit later in the chain to avoid the need for a rebase, since much of this work was done concurrently.