
Add per-tenant TSDB cardinality status API endpoint #7332

Open
CharlieTLe wants to merge 3 commits into cortexproject:master from CharlieTLe:worktree-sleepy-sniffing-squid


Conversation

@CharlieTLe (Member)

Summary

  • Add /api/v1/status/tsdb endpoint that returns per-tenant TSDB cardinality statistics (series count by metric name, label value counts, memory usage by label, series count by label-value pair, min/max time)
  • Implement the full stack: protobuf definitions, ingester gRPC method, distributor aggregation, HTTP handler, and API route registration
  • Add API documentation for the new endpoint
  • Add integration tests that validate the full end-to-end flow in a Docker-based Cortex cluster
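A client call to the new endpoint could be sketched in Go as follows. This assumes the tenant is selected with Cortex's usual X-Scope-OrgID header; the helper name newTSDBStatusRequest and the base URL are hypothetical, and the response body is not decoded here because its exact shape is defined by the PR.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// newTSDBStatusRequest is a hypothetical helper that builds a GET request for
// the per-tenant TSDB status endpoint. The tenant is selected with the
// X-Scope-OrgID header; the limit query parameter is sent only when positive.
func newTSDBStatusRequest(baseURL, tenant string, limit int) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/api/v1/status/tsdb"
	if limit > 0 {
		q := url.Values{}
		q.Set("limit", strconv.Itoa(limit))
		u.RawQuery = q.Encode()
	}

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Scope-OrgID", tenant)
	return req, nil
}

func main() {
	// Placeholder address for a locally running Cortex instance.
	req, err := newTSDBStatusRequest("http://localhost:9009", "tenant-1", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("X-Scope-OrgID"))
	// Prints: GET http://localhost:9009/api/v1/status/tsdb?limit=5 tenant-1
}
```

The request can then be sent with any http.Client; omitting the limit leaves the choice of default to the server.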

Test plan

  • Unit tests for the ingester TSDBStatus gRPC method
  • Unit tests for the distributor TSDBStatus aggregation logic
  • Unit tests for the HTTP handler with various limit parameters
  • Integration test (TestTSDBStatus) that starts a single-binary Cortex cluster, pushes series with varying cardinality, and validates correct series counts, metric name breakdowns, label stats, and limit truncation
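The limit truncation checked by the integration test amounts to a top-N selection over name/count pairs. A minimal sketch, assuming a hypothetical stat type and topN helper (the PR's own types and tie-breaking may differ):

```go
package main

import (
	"fmt"
	"sort"
)

// stat is a hypothetical name/count pair, e.g. a metric name and its series count.
type stat struct {
	Name  string
	Value uint64
}

// topN returns the limit entries with the highest Value, breaking ties by Name
// so the output is deterministic. A non-positive limit returns every entry.
// The input slice is copied so the caller's ordering is left untouched.
func topN(stats []stat, limit int) []stat {
	out := make([]stat, len(stats))
	copy(out, stats)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Value != out[j].Value {
			return out[i].Value > out[j].Value
		}
		return out[i].Name < out[j].Name
	})
	if limit > 0 && limit < len(out) {
		out = out[:limit]
	}
	return out
}

func main() {
	series := []stat{
		{"http_requests_total", 120},
		{"up", 3},
		{"node_cpu_seconds_total", 480},
		{"go_goroutines", 3},
	}
	fmt.Println(topN(series, 2))
	// Prints: [{node_cpu_seconds_total 480} {http_requests_total 120}]
}
```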

🤖 Generated with Claude Code

@dosubot (bot) added the type/observability label (To help know what is going on inside Cortex) on Mar 6, 2026
CharlieTLe and others added 3 commits March 5, 2026 18:29
Expose TSDB head cardinality statistics through a new /api/v1/status/tsdb
endpoint, enabling users to identify which metrics, labels, and label-value
pairs contribute the most series. This follows the existing UserStats
fan-out pattern: each ingester calls Head.Stats(), the distributor aggregates
the responses across the replication set (dividing replicated counts by the
replication factor, RF), and an HTTP handler serves JSON with an optional
?limit=N parameter.
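The divide-by-RF aggregation described in this commit message can be sketched as follows. The ingesterStats type and aggregate function are hypothetical, not the PR's code, and the result is only an estimate whenever a series is not stored on exactly RF ingesters (for example during ring changes).

```go
package main

import "fmt"

// ingesterStats is a hypothetical per-ingester response: series count per metric name.
type ingesterStats map[string]uint64

// aggregate sums per-metric series counts from every ingester in the
// replication set, then divides by the replication factor, because each
// series is stored on replicationFactor ingesters and is counted once on each.
func aggregate(responses []ingesterStats, replicationFactor int) map[string]uint64 {
	total := make(map[string]uint64)
	for _, resp := range responses {
		for name, count := range resp {
			total[name] += count
		}
	}
	if replicationFactor > 1 {
		rf := uint64(replicationFactor)
		for name := range total {
			total[name] /= rf
		}
	}
	return total
}

func main() {
	// Four ingesters, RF=3: "up" has 2 series (6 replicas in total) and
	// "http_requests_total" has 4 series (12 replicas in total).
	responses := []ingesterStats{
		{"up": 2, "http_requests_total": 3},
		{"up": 2, "http_requests_total": 3},
		{"up": 1, "http_requests_total": 3},
		{"up": 1, "http_requests_total": 3},
	}
	fmt.Println(aggregate(responses, 3))
	// Prints: map[http_requests_total:4 up:2]
}
```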

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
Document the new TSDB cardinality status endpoint in the HTTP API
reference, including query parameters, example request/response,
and field descriptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
End-to-end test that starts a single-binary Cortex cluster, pushes
series with varying cardinality, and validates the TSDB status API
returns correct series counts, metric name breakdowns, and limit
truncation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>
@CharlieTLe force-pushed the worktree-sleepy-sniffing-squid branch from 1404aa1 to 2536082 on March 6, 2026 02:29
limit := int32(10)
if v := r.FormValue("limit"); v != "" {
	if n, err := strconv.Atoi(v); err == nil && n > 0 {
		// CodeQL flags this conversion: n is an int and has no upper bound check.
		limit = int32(n)
	}
}

Check failure

Code scanning / CodeQL

Incorrect conversion between integer types (severity: High)

Incorrect conversion of an integer with architecture-dependent bit size from strconv.Atoi to a lower bit size type int32 without an upper bound check.
@yeya24 (Contributor) left a comment


Thanks for the PR!
I think this is worth a design proposal, as it is not a simple API and we need to think about how to extend it long term. A few questions I have:

  • Is /api/v1/status/tsdb the right API to support, or should we have a more specific API for cardinality analysis? Even though the endpoint is the same, I see you are using a different response format from what Prometheus has: https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats.
  • What fields should we support? Fields like memoryInBytesByLabelName might make it hard to extend to long-term storage.
  • Is it right to expose the API via the distributor? In the worst case, this API could impact writes, which is what I want to avoid. Exposing it via the Querier seems safer, and we could even use Query Frontend caching in the future.

@CharlieTLe (Member, Author)

#7335


Labels

size/XL, type/observability (To help know what is going on inside Cortex)


2 participants