Skip to content

fix: Graceful Shutdown Timeout for GTFS Manager#737

Open
AhmedAlian7 wants to merge 2 commits intoOneBusAway:mainfrom
AhmedAlian7:fix/graceful-shutdown-timeout
Open

fix: Graceful Shutdown Timeout for GTFS Manager#737
AhmedAlian7 wants to merge 2 commits intoOneBusAway:mainfrom
AhmedAlian7:fix/graceful-shutdown-timeout

Conversation

@AhmedAlian7
Copy link
Copy Markdown
Contributor

Description

Manager.Shutdown() called wg.Wait() with no timeout. A stuck real-time feed goroutine would block shutdown indefinitely.

Solution

Updated Shutdown to accept a context.Context and race wg.Wait() against ctx.Done(). The production call site in app.go uses a 25s timeout, keeping a 5s safety margin inside Kubernetes' default SIGTERM window. Timeouts
log a warning instead of fatally erroring, preserving liveness.

Changes

  • gtfs_manager.go new signature Shutdown(ctx context.Context) error with context-select pattern
  • app.go 25s shutdown context with structured warning log on timeout
  • 15+ call sites across 7 test files updated; zero uses of old 0-arg signature remain
  • 2 new unit tests: clean path (nil) and timeout path (context.DeadlineExceeded)

Tests

go build ./...   make test   make lint   go fmt ./...

@AhmedAlian7 AhmedAlian7 force-pushed the fix/graceful-shutdown-timeout branch from 84a115e to 4fd64a7 Compare March 17, 2026 23:34
Copy link
Copy Markdown
Member

@aaronbrethorst aaronbrethorst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahmed, this is a well-targeted fix for a real problem — an indefinite hang during shutdown if a real-time feed goroutine gets stuck. The context-based timeout with select on wg.Wait() is the standard Go pattern for this, and the 25s timeout with K8s SIGTERM margin is a thoughtful choice.

Critical Issues (0 found)

None.

Important Issues (0 found)

None.

Suggestions (1 found)

  1. DB close is outside shutdownOnce, weakening idempotency (tracked in #750)

    Moving GtfsDB.Close() out of shutdownOnce.Do() means a second Shutdown() call will spawn an unnecessary goroutine and re-close the already-closed DB. Go's sql.DB.Close() handles this gracefully (returns nil), so this isn't a bug — the idempotency test passes correctly. But it would be cleaner to guard the DB close with its own sync.Once and extract the duplicated close logic from both select branches into a single deferred call. Filed as #750 for follow-up.

Strengths

  • The select on wg.Wait() channel vs ctx.Done() is the correct Go pattern for bounded waits on WaitGroups
  • 25s timeout leaves a 5s safety margin inside K8s' default 30s SIGTERM window — good operational thinking
  • DB is closed in both the clean and timeout paths, preventing resource leaks either way
  • TestManagerShutdown_TimeoutPath with a stuck goroutine is a great test — it verifies the exact scenario that motivated this fix
  • All 15+ call sites across 7 test files updated consistently; zero old-signature callers remain
  • The warning log on timeout (not fatal) is the right severity — shutdown timeout is informational, not something to crash over

Recommended Action

After fixing the merge conflict, we will merge as-is. Follow up on #750 when convenient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: Add Timeout to Graceful Shutdown in GTFS Manager

2 participants