
feat: [BE] Add pipeline run stats endpoint with caching #177

@Mbeaulne

Description


Add a new GET /api/pipeline_runs/stats endpoint that returns aggregate run counts grouped by status. Supports the same filter_query format used by the existing GET /api/pipeline_runs/ list endpoint, so the stats cards on the homepage stay in sync with whatever filters the user has applied.

Filtering

The endpoint accepts filter_query — the same JSON predicate format already used by the list endpoint.

| Param | Type | Description |
| --- | --- | --- |
| `filter_query` | `string \| null` | JSON filter query (same format as the list endpoint) |
| `current_user` | `string \| null` | Resolves the `"me"` placeholder in filter predicates |

Supported filter predicates

These are the filters currently used on the homepage:

| Filter | Backend key | Predicate type | Example |
| --- | --- | --- | --- |
| Created by | `system/pipeline_run.created_by` | `value_equals` | `{"value_equals": {"key": "system/pipeline_run.created_by", "value": "me"}}` |
| Pipeline name | `system/pipeline_run.name` | `value_contains` | `{"value_contains": {"key": "system/pipeline_run.name", "value_substring": "training"}}` |
| Date range | `system/pipeline_run.date.created_at` | `time_range` | `{"time_range": {"key": "system/pipeline_run.date.created_at", "start_time": "...", "end_time": "..."}}` |
| Annotations | Custom annotation keys | `key_exists` / `value_contains` | `{"key_exists": {"key": "my_annotation"}}` |

Predicates are combined with "and" logic:

```json
{
  "and": [
    { "value_equals": { "key": "system/pipeline_run.created_by", "value": "me" } },
    { "time_range": { "key": "system/pipeline_run.date.created_at", "start_time": "2026-03-01T00:00:00Z", "end_time": "2026-03-31T23:59:59Z" } }
  ]
}
```

Examples

  • No filters → stats for all runs in the system
  • created_by: "me" → stats for my runs
  • created_by: "me" + date range → stats for my runs in the last 7 days
  • pipeline_name contains "training" → stats for all "training" runs
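As a concrete sketch of the "my runs in the last 7 days" case, the request might be built like this. The endpoint path and predicate format come from this PR; the base URL, the `current_user` value, and the use of the standard library for URL encoding are illustrative assumptions:

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# "My runs in the last 7 days": combine predicates with "and".
now = datetime.now(timezone.utc)
filter_query = {
    "and": [
        {"value_equals": {"key": "system/pipeline_run.created_by", "value": "me"}},
        {"time_range": {
            "key": "system/pipeline_run.date.created_at",
            "start_time": (now - timedelta(days=7)).isoformat(),
            "end_time": now.isoformat(),
        }},
    ]
}

# The stats endpoint takes the JSON-encoded predicate as a query
# parameter, just like the existing list endpoint.
query_string = urlencode({
    "filter_query": json.dumps(filter_query),
    "current_user": "alice",  # resolves the "me" placeholder (name is illustrative)
})
url = f"/api/pipeline_runs/stats?{query_string}"
print(url)
```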

Response shape

```json
{
  "total_runs": 1250,
  "counts_by_status": {
    "RUNNING": 3,
    "SUCCEEDED": 1100,
    "FAILED": 87,
    "CANCELLED": 50,
    "PENDING": 10
  },
  "cached_at": "2026-03-19T12:00:00Z"
}
```
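A minimal sketch of a typed response model mirroring this shape. The class and field names are assumptions based on this description, not the actual backend code:

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical response model for the stats endpoint; the real backend
# may use a different serialization layer.
@dataclass
class PipelineRunStats:
    total_runs: int
    counts_by_status: Dict[str, int]
    cached_at: str  # ISO-8601 timestamp of when the aggregate was computed

stats = PipelineRunStats(
    total_runs=1250,
    counts_by_status={"RUNNING": 3, "SUCCEEDED": 1100, "FAILED": 87,
                      "CANCELLED": 50, "PENDING": 10},
    cached_at="2026-03-19T12:00:00Z",
)
# The totals are self-consistent: total_runs is the sum of the per-status counts.
assert stats.total_runs == sum(stats.counts_by_status.values())
```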

Performance approach

Doing a full table scan on every request is not acceptable at scale. The solution:

  • Server-side in-memory cache with 30-60s TTL, keyed by the hash of the resolved filter_query
  • Runs a single `SELECT container_execution_status, COUNT(*) ... WHERE <filter_clauses> GROUP BY container_execution_status` query at most once per cache window
  • No impact on existing list queries — this is a completely separate endpoint
  • The existing filter_query_sql.py already builds WHERE clauses from filter predicates — reuse those same clauses for the stats query
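The caching approach above can be sketched as follows. This is illustrative only: the cache-key hashing, the `run_group_by_query` callable standing in for the single `GROUP BY` query, and all names are assumptions, not the actual implementation in `api_server_sql.py`:

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 30  # within the 30-60s window described above

# cache key -> (expiry timestamp, cached stats payload)
_stats_cache: Dict[str, Tuple[float, Dict[str, Any]]] = {}

def _cache_key(resolved_filter_query: Optional[dict]) -> str:
    # Hash the *resolved* filter (after "me" substitution), so two users
    # share a cache entry only when their effective filters are identical.
    canonical = json.dumps(resolved_filter_query, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def get_stats(
    resolved_filter_query: Optional[dict],
    run_group_by_query: Callable[[Optional[dict]], Dict[str, int]],
) -> Dict[str, Any]:
    """Return cached stats, hitting the database at most once per TTL window.

    run_group_by_query executes the single SELECT status, COUNT(*) ...
    GROUP BY status query and returns a {status: count} mapping
    (hypothetical interface).
    """
    key = _cache_key(resolved_filter_query)
    now = time.monotonic()
    entry = _stats_cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]  # fresh cache hit; no query runs
    counts = run_group_by_query(resolved_filter_query)
    payload = {
        "total_runs": sum(counts.values()),
        "counts_by_status": counts,
        "cached_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    _stats_cache[key] = (now + CACHE_TTL_SECONDS, payload)
    return payload

# Demo with a fake query function that counts how often the DB is hit.
calls = {"n": 0}
def fake_query(_fq):
    calls["n"] += 1
    return {"SUCCEEDED": 2, "FAILED": 1}

get_stats(None, fake_query)
get_stats(None, fake_query)  # served from cache within the TTL
print(calls["n"])  # prints 1: only one query ran for two requests
```

A per-process dict is the simplest form of this; a multi-worker deployment would need a shared cache or would simply tolerate one query per worker per window.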

Relevant files

  • api_server_sql.py — add new stats() method
  • api_router.py — register route
  • filter_query_sql.py — reuse existing WHERE clause builder for the stats query
  • filter_query_models.py — existing filter predicate models (no changes needed)
  • backend_types_sql.py — existing PipelineRun model, existing indexes support this query

Metadata

Labels: enhancement (New feature or request)