fix: Graceful Shutdown Timeout for GTFS Manager #737
AhmedAlian7 wants to merge 2 commits into OneBusAway:main
aaronbrethorst left a comment
Ahmed, this is a well-targeted fix for a real problem — an indefinite hang during shutdown if a real-time feed goroutine gets stuck. The context-based timeout with select on wg.Wait() is the standard Go pattern for this, and the 25s timeout with K8s SIGTERM margin is a thoughtful choice.
Critical Issues (0 found)
None.
Important Issues (0 found)
None.
Suggestions (1 found)
- DB close is outside shutdownOnce, weakening idempotency (tracked in #750)
  Moving GtfsDB.Close() out of shutdownOnce.Do() means a second Shutdown() call will spawn an unnecessary goroutine and re-close the already-closed DB. Go's sql.DB.Close() handles this gracefully (returns nil), so this isn't a bug, and the idempotency test passes correctly. But it would be cleaner to guard the DB close with its own sync.Once and extract the duplicated close logic from both select branches into a single deferred call. Filed as #750 for follow-up.
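One possible shape for that follow-up is sketched below. It is only an illustration under stated assumptions: the manager type, the closeDBOnce field, the closeDB helper, and the countingCloser stand-in are hypothetical names, not code from this PR. The idea is that a dedicated sync.Once makes the close run exactly once, however many times Shutdown() is called.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// countingCloser is a hypothetical stand-in for the database handle.
// It records how many times Close is actually invoked.
type countingCloser struct {
	mu    sync.Mutex
	calls int
}

func (c *countingCloser) Close() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls++
	return nil
}

// manager is a hypothetical, trimmed-down stand-in for the GTFS manager.
type manager struct {
	db          io.Closer
	closeDBOnce sync.Once // guards the DB close independently of shutdownOnce
	closeErr    error     // result of the first (and only) Close call
}

// closeDB closes the database at most once and returns the same
// result on every call, so repeated shutdowns stay idempotent.
func (m *manager) closeDB() error {
	m.closeDBOnce.Do(func() {
		m.closeErr = m.db.Close()
	})
	return m.closeErr
}

func main() {
	db := &countingCloser{}
	m := &manager{db: db}

	_ = m.closeDB()
	_ = m.closeDB() // second call is a no-op

	fmt.Println("close calls:", db.calls) // prints "close calls: 1"
}
```

With a helper like this, both select branches could be replaced by a single deferred closeDB() call at the top of Shutdown().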
Strengths
- The select on the wg.Wait() channel vs ctx.Done() is the correct Go pattern for bounded waits on WaitGroups
- 25s timeout leaves a 5s safety margin inside K8s' default 30s SIGTERM window, which is good operational thinking
- DB is closed in both the clean and timeout paths, preventing resource leaks either way
- TestManagerShutdown_TimeoutPath with a stuck goroutine is a great test; it verifies the exact scenario that motivated this fix
- All 15+ call sites across 7 test files updated consistently; zero old-signature callers remain
- The warning log on timeout (not fatal) is the right severity; a shutdown timeout is informational, not something to crash over
Recommended Action
After fixing the merge conflict, we will merge as-is. Follow up on #750 when convenient.
Description
Manager.Shutdown() called wg.Wait() with no timeout. A stuck real-time feed goroutine would block shutdown indefinitely.
Solution
Updated Shutdown to accept a context.Context and race wg.Wait() against ctx.Done(). The production call site in app.go uses a 25s timeout, keeping a 5s safety margin inside Kubernetes' default 30s SIGTERM window. Timeouts log a warning instead of fatally erroring, preserving liveness.
Changes
- gtfs_manager.go: new signature Shutdown(ctx context.Context) error with context-select pattern
- app.go: 25s shutdown context with structured warning log on timeout
- … nil) and timeout path (context.DeadlineExceeded)
Tests
- go build ./...
- make test
- make lint
- go fmt ./...