
internet-latency-collector: add wheresitup job backlog observability#3203

Open
snormore wants to merge 2 commits into main from snor/wheresitup-observability

snormore commented Mar 9, 2026

Summary

  • Add a pending_jobs Prometheus gauge and in_progress_count/pending_jobs log fields to the wheresitup export summary, so that job backlog accumulation is visible for alerting and log analysis
  • Add a WheresitupAPIResponseDuration histogram to track API call latency

Context

During the 2026-03-09 wheresitup slowdown (~09:01–10:50 UTC), roughly 60-90 of the 435 jobs in each cycle completed too slowly, and the backlog grew from 435 to 1322 pending jobs. The only way to detect this was to cross-reference total_jobs with processed_count in the logs after the fact. These changes make the backlog directly observable in both logs and Prometheus.
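
The cross-referencing described above can be made explicit in a small sketch. This is an illustration under stated assumptions, not the PR's implementation: the exportSummary struct, the pendingJobs helper, the formula pending = total - processed, and the sample numbers are all hypothetical. Only the log field names (total_jobs, processed_count, in_progress_count, pending_jobs) come from the description.

```go
package main

import (
	"log/slog"
	"os"
)

// exportSummary holds hypothetical per-cycle counters. The struct is an
// assumption; its fields mirror the log field names mentioned in this PR.
type exportSummary struct {
	TotalJobs       int
	ProcessedCount  int
	InProgressCount int
}

// pendingJobs returns the jobs still outstanding after a cycle.
// Assumption: pending = total - processed, clamped at zero.
func pendingJobs(s exportSummary) int {
	if p := s.TotalJobs - s.ProcessedCount; p > 0 {
		return p
	}
	return 0
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	// Illustrative values only, not taken from incident logs.
	s := exportSummary{TotalJobs: 435, ProcessedCount: 360, InProgressCount: 75}

	// Emit the backlog as its own field so no after-the-fact arithmetic is needed.
	logger.Info("wheresitup export summary",
		"total_jobs", s.TotalJobs,
		"processed_count", s.ProcessedCount,
		"in_progress_count", s.InProgressCount,
		"pending_jobs", pendingJobs(s),
	)
}
```

Logging the derived value alongside its inputs lets a log query or alert key on pending_jobs directly while keeping the raw counters available for diagnosis.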

Testing Verification

  • Existing wheresitup package tests pass
  • No new tests added; the changes are additive logging/metrics with no behavioral change

Add metrics and logging to make wheresitup service slowdowns easier to
detect and diagnose. During the 2026-03-09 incident, ~60-90 jobs per
cycle were completing too slowly, causing backlog accumulation that was
only visible by cross-referencing total_jobs with processed_count.

- Add pending_jobs and in_progress_count to export summary logs
- Add WheresitupPendingJobs Prometheus gauge for alerting on backlog
- Add WheresitupAPIResponseDuration histogram for API call timing

snormore marked this pull request as ready for review March 9, 2026 16:56
snormore requested a review from nikw9944 March 9, 2026 16:56
