
Fix indexer #95

Open
anhthii wants to merge 2 commits into main from fix-indexer


anhthii commented Apr 3, 2026

Fix: Catchup worker not processing pending blocks & phantom range accumulation

Problem

Two related bugs in the catchup worker caused pending_blocks to accumulate without being processed, particularly affecting fast chains like Aptos:

  1. The catchup worker exited permanently after processing its initial ranges. When the regular worker later created new catchup ranges via skipAheadIfLagging, nothing picked them up until the service was restarted.

  2. loadCatchupProgress created phantom catchup ranges on reload. When no ranges existed in Consul, it created new ones, processed them, deleted them from Consul, then recreated them on the next reload. This infinite cycle inflated the in-memory catchup_pending_blocks counter, so the status API reported ~100 pending blocks even though Consul had zero catchup ranges.

Root Cause

  • runCatchup() returned as soon as blockRanges was empty, terminating the goroutine permanently.
  • loadCatchupProgress() had a dual responsibility: loading existing ranges AND creating new ones. On a reload after processing, it always found an empty store and created new ranges, causing a desync between Consul (the source of truth) and the in-memory status registry.
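To make the second root cause concrete, here is a minimal Go sketch of the pre-fix load path. `MemStore`, `legacyLoad`, `pendingBlocks`, and the fixed 100-block range are hypothetical stand-ins (an in-memory slice instead of Consul, a constant range instead of the real starting-block logic), not the indexer's actual code. It only illustrates the cycle: once the worker deletes a processed range, the next reload sees an empty store and manufactures a fresh one, so the same pending blocks are reported again although no new work appeared.

```go
package main

import "fmt"

// Range is a hypothetical catchup range of block heights [Start, End].
type Range struct{ Start, End uint64 }

// MemStore is a hypothetical in-memory stand-in for the Consul-backed store.
type MemStore struct{ ranges []Range }

// List returns a copy of the stored ranges.
func (s *MemStore) List() []Range { return append([]Range(nil), s.ranges...) }

// Save stores a new range.
func (s *MemStore) Save(r Range) { s.ranges = append(s.ranges, r) }

// DeleteAll removes every range, as the worker does after processing them.
func (s *MemStore) DeleteAll() { s.ranges = nil }

// legacyLoad sketches the pre-fix dual responsibility: load existing ranges,
// but if the store is empty, also create and save a new one.
func legacyLoad(s *MemStore, fresh Range) []Range {
	if len(s.List()) == 0 {
		s.Save(fresh) // creation path: the source of the phantom ranges
	}
	return s.List()
}

// pendingBlocks sums range sizes, loosely mirroring catchup_pending_blocks.
func pendingBlocks(ranges []Range) uint64 {
	var n uint64
	for _, r := range ranges {
		n += r.End - r.Start + 1
	}
	return n
}

func main() {
	s := &MemStore{}
	fresh := Range{Start: 1000, End: 1099} // hypothetical 100-block range
	for reload := 1; reload <= 3; reload++ {
		ranges := legacyLoad(s, fresh)
		// prints "pending=100" on every reload, even though each pass emptied the store
		fmt.Printf("reload %d: pending=%d\n", reload, pendingBlocks(ranges))
		s.DeleteAll() // the catchup worker finishes and deletes the ranges
	}
}
```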

Changes

  • runCatchup(): Instead of exiting when ranges are empty, the worker now polls the block store every 5 seconds for new ranges. It only exits on context cancellation (service shutdown).
  • loadCatchupProgress(): Removed range creation logic. The function now only loads existing ranges from the store. Range creation remains the sole responsibility of the regular worker (determineStartingBlock / skipAheadIfLagging).

Impact

  • Catchup ranges created mid-flight by the regular worker are now picked up without requiring a service restart.
  • The status API catchup_pending_blocks counter stays in sync with the actual Consul state.
  • Fast chains (Aptos, Solana) will correctly report a healthy status instead of degraded/slow caused by phantom pending blocks.

anhthii added 2 commits April 3, 2026 08:12
The catchup worker exited permanently when its initial ranges were
empty or completed. This meant ranges created later by the regular
worker's skipAheadIfLagging were never picked up until service restart,
causing pending blocks to accumulate without being processed.

Now the worker polls the block store every 5 seconds for new ranges
instead of returning.

loadCatchupProgress was creating new catchup ranges when none existed
in the store. This caused an infinite cycle: ranges were created,
processed, deleted from the store, then recreated on the next reload,
inflating the in-memory catchup_pending_blocks counter with phantom
ranges that no longer existed in Consul.

Range creation is the responsibility of the regular worker (via
determineStartingBlock or skipAheadIfLagging). The catchup worker
should only load and process existing ranges.

anhthii requested a review from vietddude April 3, 2026 01:39