Skip to content

Hard resync triggers infinite loop due to stale timing filters #191

@craigmillard86

Description

@craigmillard86

When WiFi congestion causes the audio queue to empty, the player triggers a hard resync but fails to reset timing filters. This
causes an infinite resync loop where audio never recovers.

Symptoms

  • Hundreds of RESYNCING HARD 1: age Xus, latency Yus log messages in rapid succession
  • Corrupted latency values (e.g., 1469934083866us = ~17 days)
  • Queue always shows 0 messages
  • TAS5805M reports repeated clock faults
  • Audio never recovers (or takes minutes to eventually recover by chance)

Trigger Sequence

  1. WiFi congestion occurs
  • Network latency spike (1+ seconds observed in testing)
  • No audio chunks arrive during spike
  1. Queue drains
  • Player task keeps consuming chunks but nothing arriving
  • consecutive_empty_count increments each empty read
  • Threshold hit after CONFIG_PLAYER_QUEUE_EMPTY_THRESHOLD (default 3) empty reads
  1. Hard resync triggers
  • HARD 2 resync fires: stops I2S, sets initialSync = 0
  • BUG: TimeFilter and median filters NOT reset
  1. WiFi recovers, but timing is corrupted
  • Chunks start arriving again
  • Age calculated using serverNow from get_diff_to_server()
  • TimeFilter was extrapolating clock drift during outage with no updates
  • Kalman filter extrapolated wildly → returns garbage value (17+ days offset)
  1. Infinite loop
  • Every chunk appears massively late due to corrupt diff2Server
  • HARD 1 resync fires → drops chunk, continues
  • Next chunk also has garbage age → HARD 1 again
  • Never recovers because filters never reset

Root Cause

When hard resync is triggered in player.c, the code resets initialSync but does not reset:

  • latencyTimeFilter (Kalman filter for time sync)
  • shortMedianFilter / miniMedianFilter (sync drift tracking)

The race condition is between WiFi recovery (chunks arriving) and TimeFilter state (extrapolating into garbage). If chunks arrive
before TimeFilter is reset, stale extrapolated values corrupt all subsequent timing calculations.

Why It Sometimes Recovered

If by chance a chunk's calculated age went negative (appeared "early"), the age < 0 code path does reset median filters. This allowed
partial recovery, but was luck-based and slow.

Affected Code

components/lightsnapcast/player.c:

  • HARD 1 resync path (~line 1535-1576)
  • HARD 2 resync path (~line 1810-1836)

Example Logs:

  [1B][0;32mI (1390379) PLAYER: initial sync age: 1us, chunk duration: 24000us[1B][0m
  [1B][0;33mW (1392630) TAS5805M: Faults detected: err0=0x00, err1=0x04, err2=0x00, ot_warn=0x00[1B][0m
  [1B][0;33mW (1392630) TAS5805M: tas5805m_decode_faults: Clock fault[1B][0m
  [1B][0;32mI (1392634) TAS5805M: Faults cleared[1B][0m
  [1B][0;33mW (1662579) PLAYER: RESYNCING HARD 2: age -110us, latency 1469934083866us, free 8426135, largest block 8126464, 0, rssi:
  -42[1B][0m
  [1B][0;33mW (1662856) TAS5805M: Faults detected: err0=0x00, err1=0x04, err2=0x00, ot_warn=0x00[1B][0m
  [1B][0;33mW (1662856) TAS5805M: tas5805m_decode_faults: Clock fault[1B][0m
  [1B][0;32mI (1662860) TAS5805M: Faults cleared[1B][0m
  [1B][0;33mW (1663333) PLAYER: RESYNCING HARD 1: age 1419276us, latency 1469934083864us, free 8426135, largest block 8126464, rssi:
  -42, left in queue 0[1B][0m
  [1B][0;33mW (1663826) PLAYER: RESYNCING HARD 1: age 1888074us, latency 1469934083863us, free 8426207, largest block 8126464, rssi:
  -41, left in queue 0[1B][0m
  [1B][0;33mW (1664073) PLAYER: RESYNCING HARD 1: age 2110976us, latency 1469934083862us, free 8412567, largest block 8126464, rssi:
  -41, left in queue 0[1B][0m
  [1B][0;33mW (1664080) PLAYER: RESYNCING HARD 1: age 2094070us, latency 1469934083862us, free 8403631, largest block 8126464, rssi:
  -41, left in queue 0[1B][0m
  [1B][0;33mW (1664093) PLAYER: RESYNCING HARD 1: age 2082371us, latency 1469934083862us, free 8388007, largest block 8126464, rssi:
  -41, left in queue 0[1B][0m
  [1B][0;33mW (1664106) PLAYER: RESYNCING HARD 1: age 2071777us, latency 1469934083862us, free 8389775, largest block 8126464, rssi:
  -41, left in queue 0[1B][0m
  ...
  [1B][0;33mW (1666677) PLAYER: RESYNCING HARD 1: age 11127us, latency 1469934083854us, free 8389711, largest block 8126464, rssi: -42,
   left in queue 0[1B][0m
  [1B][0;32mI (1666695) PLAYER: DMA completely loaded[1B][0m
  [1B][0;32mI (1666697) PLAYER: initial sync age: 1us, chunk duration: 24000us[1B][0m
  [1B][0;33mW (1667868) TAS5805M: Faults detected: err0=0x00, err1=0x04, err2=0x00, ot_warn=0x00[1B][0m
  [1B][0;33mW (1667868) TAS5805M: tas5805m_decode_faults: Clock fault[1B][0m
  [1B][0;32mI (1667872) TAS5805M: Faults cleared[1B][0m
  [1B][0;33mW (1669061) PLAYER: RESYNCING HARD 2: age 5214us, latency 1469934083845us, free 7937727, largest block 7733248, 105, rssi:
  -41[1B][0m
  [1B][0;32mI (1669063) PLAYER: DMA completely loaded[1B][0m
  [1B][0;32mI (1669116) PLAYER: initial sync age: 2us, chunk duration: 24000us[1B][0m
  [1B][0;33mW (1672881) TAS5805M: Faults detected: err0=0x00, err1=0x04, err2=0x00, ot_warn=0x00[1B][0m
  [1B][0;33mW (1672881) TAS5805M: tas5805m_decode_faults: Clock fault[1B][0m
  [1B][0;32mI (1672885) TAS5805M: Faults cleared[1B][0m

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions