Add proposal for per-tenant TSDB status API #7335

Open
CharlieTLe wants to merge 1 commit into cortexproject:master from CharlieTLe:proposal/per-tenant-tsdb-status-api

@CharlieTLe
Member

Design proposal for the /api/v1/status/tsdb endpoint that exposes per-tenant TSDB head cardinality statistics. Covers architecture (Distributor fan-out to Ingesters), gRPC definitions, aggregation logic, Prometheus compatibility trade-offs, extensibility to long-term storage, and Distributor vs Querier routing alternatives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Charlie Le <charlie_le@apple.com>

Currently, Cortex tenants lack visibility into which metrics, labels, and label-value pairs contribute the most series in ingesters. Without this information, debugging high-cardinality issues requires operators to inspect TSDB internals directly on ingester instances, which is impractical in a multi-tenant, distributed environment.

Prometheus itself exposes a `/api/v1/status/tsdb` endpoint that provides cardinality statistics from the TSDB head. This proposal brings equivalent functionality to Cortex as a multi-tenant, distributed API.
Contributor

I am not a fan of the TSDB status API name. The Prometheus API might change and add more fields. Would a dedicated `api/v1/cardinality` endpoint be better?


## Out of Scope

- **Long-term storage cardinality analysis**: This endpoint only covers in-memory TSDB head data in ingesters. Analyzing cardinality across compacted blocks in object storage is a separate concern. A future long-term cardinality API could reuse portable fields (see [Extensibility](#extensibility-to-long-term-storage)) or introduce a separate endpoint.
Contributor

Do we plan to have a different API for long-term storage cardinality? We should aim for the same API endpoint, even though we don't have to design for it now.


Expose per-tenant TSDB head cardinality statistics via a REST API endpoint on the Cortex query path. The endpoint should:

1. Be compatible with the Prometheus `/api/v1/status/tsdb` response format.
Contributor

I am not sure this needs to be part of the goal. Does it need to be compatible?
I think our API response format is already incompatible today.

- **Authentication**: Requires `X-Scope-OrgID` header (standard Cortex tenant authentication).
- **Query Parameter**: `limit` (optional, default 10) - controls the number of top items returned per category.
Contributor

How about `start` and `end` parameters?
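
For reference, the request shape described by the two bullets above (tenant header plus optional `limit`) could look like the following Go sketch. The helper name `newTSDBStatusRequest`, the host `cortex.example:8080`, and the tenant ID are illustrative assumptions, not part of the proposal; a real deployment may also serve the path under a prefix, which the caller can include in the base URL.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// newTSDBStatusRequest (hypothetical helper) builds a GET request for the
// proposed /api/v1/status/tsdb endpoint. A limit <= 0 leaves the parameter
// out so that the server-side default (10 in the proposal) applies.
func newTSDBStatusRequest(baseURL, tenantID string, limit int) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	// Append the endpoint path, tolerating a trailing slash on the base URL.
	u.Path = strings.TrimSuffix(u.Path, "/") + "/api/v1/status/tsdb"

	q := url.Values{}
	if limit > 0 {
		q.Set("limit", strconv.Itoa(limit))
	}
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Standard Cortex tenant header, as named in the proposal.
	req.Header.Set("X-Scope-OrgID", tenantID)
	return req, nil
}

func main() {
	// Host and tenant ID are placeholders for illustration only.
	req, err := newTSDBStatusRequest("http://cortex.example:8080", "tenant-a", 20)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("X-Scope-OrgID:", req.Header.Get("X-Scope-OrgID"))
}
```

Any `start` and `end` parameters, if adopted, would be added to the query string in the same way as `limit`.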

```protobuf
message TSDBStatusResponse {
  uint64 num_series = 1;
  int64 min_time = 2;
  int64 max_time = 3;
  // ... remaining fields are not shown in this excerpt
}
```
Contributor

Do we need `min_time` and `max_time`? How do we aggregate them in the final response: `min(min_time)` and `max(max_time)` across ingesters?
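
To make the aggregation question concrete, here is a minimal Go sketch of the min-of-mins / max-of-maxes merge suggested in the comment. The type `headTimeRange` and the function `mergeTimeRanges` are hypothetical names that mirror the `min_time` and `max_time` fields of the message quoted above; whether the proposal aggregates this way is exactly what is being asked, so treat this as one possible answer rather than the chosen design.

```go
package main

import "fmt"

// headTimeRange (hypothetical) carries the min_time/max_time pair reported
// by a single ingester, in milliseconds since the epoch.
type headTimeRange struct {
	MinTime int64
	MaxTime int64
}

// mergeTimeRanges returns the envelope of all per-ingester ranges:
// the smallest MinTime and the largest MaxTime. ok is false when no
// ingester responded, so callers can tell "no data" from a real range.
func mergeTimeRanges(ranges []headTimeRange) (merged headTimeRange, ok bool) {
	if len(ranges) == 0 {
		return headTimeRange{}, false
	}
	merged = ranges[0]
	for _, r := range ranges[1:] {
		if r.MinTime < merged.MinTime {
			merged.MinTime = r.MinTime
		}
		if r.MaxTime > merged.MaxTime {
			merged.MaxTime = r.MaxTime
		}
	}
	return merged, true
}

func main() {
	perIngester := []headTimeRange{
		{MinTime: 2000, MaxTime: 4000},
		{MinTime: 1000, MaxTime: 3000},
		{MinTime: 1500, MaxTime: 5000},
	}
	merged, ok := mergeTimeRanges(perIngester)
	fmt.Println(merged.MinTime, merged.MaxTime, ok) // prints "1000 5000 true"
}
```

Unlike series counts, this merge is unaffected by replication, because duplicated samples do not widen the envelope.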


2. **`chunkCount` omitted**: Prometheus includes a `chunkCount` field (from `prometheus_tsdb_head_chunks`). In a distributed system with replication, chunk counts across ingesters cannot be meaningfully aggregated — chunks are an ingester-local storage detail, and summing/dividing by the replication factor does not produce a useful number.

**Open question**: Should we adopt the `headStats` wrapper to maintain client compatibility with Prometheus tooling? The trade-off is compatibility vs simplicity — the flat format is easier to consume for Cortex-specific clients, but adopting the Prometheus format would allow reuse of existing client libraries.
Contributor

Does any Prometheus tool consume this today? Why is compatibility a concern?
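
The difference between the two shapes in the open question can be seen in a small Go sketch. The struct names are hypothetical, and the flat field names (`numSeries`, `minTime`, `maxTime`) are assumed to follow Prometheus's camelCase JSON naming; `chunkCount` is left out, as the proposal describes, and the other fields Prometheus places inside `headStats` are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HeadStats holds the head-level numbers. JSON names are assumptions that
// mirror Prometheus's naming; chunkCount is intentionally absent.
type HeadStats struct {
	NumSeries uint64 `json:"numSeries"`
	MinTime   int64  `json:"minTime"`
	MaxTime   int64  `json:"maxTime"`
}

// FlatStatus embeds HeadStats, so encoding/json promotes its fields to the
// top level of the object (the "flat" format).
type FlatStatus struct {
	HeadStats
}

// WrappedStatus nests the same fields under a "headStats" key, matching the
// wrapper used by Prometheus.
type WrappedStatus struct {
	HeadStats HeadStats `json:"headStats"`
}

// mustJSON marshals v and panics on error; acceptable for a short example.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	hs := HeadStats{NumSeries: 42, MinTime: 1000, MaxTime: 5000}
	fmt.Println(mustJSON(FlatStatus{HeadStats: hs}))
	// {"numSeries":42,"minTime":1000,"maxTime":5000}
	fmt.Println(mustJSON(WrappedStatus{HeadStats: hs}))
	// {"headStats":{"numSeries":42,"minTime":1000,"maxTime":5000}}
}
```

A client written against the wrapped shape would look for `headStats` and find nothing in the flat shape, which is the compatibility cost being weighed.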

| `labelValueCountByLabelName` | No | Portable to block storage |
| `seriesCountByLabelValuePair` | No | Portable to block storage |
| `memoryInBytesByLabelName` | **Yes** | In-memory byte usage has no analogue in object storage |
| `minTime` / `maxTime` | **Yes** | Reflects head time range, not total storage |
Contributor

Do we need to add those head-specific fields?
