internet-latency-collector: add wheresitup job backlog observability by snormore · Pull Request #3203 · malbeclabs/doublezero

snormore · 2026-03-09T16:26:06Z

Summary

- Add a pending_jobs Prometheus gauge and in_progress_count/pending_jobs log fields to the wheresitup export summary, making job backlog accumulation visible for alerting and log analysis
- Add a WheresitupAPIResponseDuration histogram for tracking API call latency

Context

During the 2026-03-09 wheresitup slowdown (~09:01–10:50 UTC), ~60-90 of 435 jobs per cycle were completing too slowly, causing backlog growth from 435 to 1322 pending jobs. The only way to detect this was cross-referencing total_jobs with processed_count in logs after the fact. These changes make the backlog directly observable in both logs and Prometheus.
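The manual cross-referencing described above amounts to simple arithmetic on the summary fields. A minimal Go sketch, assuming the backlog can be approximated as total_jobs minus processed_count (the PR description does not state how pending_jobs is actually computed); `exportSummary` and `backlog` are hypothetical names, and the numbers are illustrative only:

```go
package main

import (
	"log/slog"
	"os"
)

// exportSummary is a hypothetical struct for this sketch; only the log field
// names (total_jobs, processed_count, in_progress_count, pending_jobs) come
// from the PR description.
type exportSummary struct {
	TotalJobs       int
	ProcessedCount  int
	InProgressCount int
}

// backlog assumes pending jobs are those tracked but not yet processed.
// It clamps at zero so a counting race never reports a negative backlog.
func backlog(s exportSummary) int {
	pending := s.TotalJobs - s.ProcessedCount
	if pending < 0 {
		return 0
	}
	return pending
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Illustrative numbers only: 435 jobs in a cycle, 360 of them processed.
	s := exportSummary{TotalJobs: 435, ProcessedCount: 360, InProgressCount: 75}

	// Emitting the derived value alongside the raw counts means nobody has
	// to subtract two log fields by hand after an incident.
	logger.Info("wheresitup export summary",
		"total_jobs", s.TotalJobs,
		"processed_count", s.ProcessedCount,
		"in_progress_count", s.InProgressCount,
		"pending_jobs", backlog(s),
	)
}
```

Setting the same derived value on a gauge each cycle would let an alert fire when the backlog stays above a threshold, rather than relying on after-the-fact log analysis.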

Testing Verification

- Existing wheresitup package tests pass
- No new tests added; changes are additive logging/metrics with no behavioral change

Add metrics and logging to make wheresitup service slowdowns easier to detect and diagnose. During the 2026-03-09 incident, ~60-90 jobs per cycle were completing too slowly, causing backlog accumulation that was only visible by cross-referencing total_jobs with processed_count.

- Add pending_jobs and in_progress_count to export summary logs
- Add WheresitupPendingJobs Prometheus gauge for alerting on backlog
- Add WheresitupAPIResponseDuration histogram for API call timing


snormore added skip-changelog and removed skip-changelog labels Mar 9, 2026

internet-latency-collector: add changelog entry for wheresitup observability

4123333

snormore marked this pull request as ready for review March 9, 2026 16:56

snormore requested a review from nikw9944 March 9, 2026 16:56

snormore wants to merge 2 commits into main from snor/wheresitup-observability
