fix: resolve direction calculator caching bugs and prevent stale queries on hot-swap #797

Open
3rabiii wants to merge 2 commits into OneBusAway:main from 3rabiii:fix/direction-calculator-caching

@3rabiii 3rabiii commented Mar 26, 2026

Description

This PR addresses several reliability and correctness bugs within the AdvancedDirectionCalculator. Previously, transient database errors could permanently poison the direction cache, insufficient shape points defaulted to an invalid "East" orientation, and the calculator retained a stale database pointer after a GTFS hot-swap.

Changes Included

  • Prevent Transient Error Caching: computeFromShapes now returns (string, error). The loop tracks transient DB errors (lastTransientErr) and propagates them to the caller, ensuring we only cache valid directions or legitimate "no data" states, avoiding permanent cache poisoning during DB hiccups.
  • Fix Bogus Orientations: Explicitly return sql.ErrNoRows when there are fewer than 2 shape points. This prevents the caller from incorrectly interpreting a nil error as a valid 0 radian (East) orientation.
  • Add Error Visibility: Added slog.Warn for non-ErrNoRows errors in the orientation calculation loop to surface silent failures like DB connection drops or timeouts.
  • Resolve Stale Queries on Hot-Swap:
    • Added a sync.RWMutex to protect the queries pointer in AdvancedDirectionCalculator.
    • Added an UpdateQueries() method to atomically swap the DB pointer and safely evict the entire direction result cache.
    • Wired the calculator into gtfs.Manager during startup so that ForceUpdate can automatically trigger the refresh after a successful GTFS reload.

Verification

[screenshot]

fixes: #747

@fletcherw fletcherw left a comment

Hi Adel, this is a good improvement to the caching logic. I have some feedback but no large changes are needed.

adc.queriesMu.Unlock()

// Evict all cached directions so they are recomputed against the new DB.
adc.directionResults.Range(func(key, _ any) bool {

This can be written more simply as adc.directionResults.Clear().
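
For reference, a minimal sketch of what the simplified eviction could look like. The `calculator` and `queries` types are hypothetical stand-ins for AdvancedDirectionCalculator and the generated query handle, and `currentQueries` and `cachedCount` are illustrative helpers that are not in the PR. Note that `sync.Map.Clear` was added in Go 1.23, so this assumes the module targets that version or newer.

```go
package main

import (
	"fmt"
	"sync"
)

// queries stands in for the generated DB query handle (hypothetical type).
type queries struct{ name string }

// calculator is a simplified stand-in for AdvancedDirectionCalculator.
type calculator struct {
	queriesMu        sync.RWMutex
	queries          *queries
	directionResults sync.Map // stopID -> direction string
}

// UpdateQueries swaps the query handle under the write lock, then evicts
// every cached direction so results are recomputed against the new DB.
func (c *calculator) UpdateQueries(q *queries) {
	c.queriesMu.Lock()
	c.queries = q
	c.queriesMu.Unlock()

	c.directionResults.Clear() // replaces the Range+Delete loop; Go 1.23+
}

// currentQueries returns the handle under the read lock.
func (c *calculator) currentQueries() *queries {
	c.queriesMu.RLock()
	defer c.queriesMu.RUnlock()
	return c.queries
}

// cachedCount reports how many directions are currently cached.
func (c *calculator) cachedCount() int {
	n := 0
	c.directionResults.Range(func(_, _ any) bool {
		n++
		return true
	})
	return n
}

func main() {
	c := &calculator{queries: &queries{name: "old"}}
	c.directionResults.Store("stop-1", "N")

	c.UpdateQueries(&queries{name: "new"})
	fmt.Println(c.currentQueries().name, c.cachedCount())
}
```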


// Test with a non-existent stop
- direction := calc.computeFromShapes(ctx, "nonexistent")
+ direction, _ := calc.computeFromShapes(ctx, "nonexistent")

Rather than ignoring errors here and below, you should add assert.NoError(t, err) calls.

adc.directionResults.Store(stopID, computedDir)
}

return computedDir, nil

Here, even if computeFromShapes returned an error, you're returning computedDir and a nil error. If the intention is to ignore errors, you should leave a comment here explaining why it's safe to do so.

// a recovered database will be retried on the next request.
// Lifecycle note: This map grows indefinitely for the lifetime of the application.
// Unbounded growth is acceptable here because it is strictly bounded by the finite
// number of valid real-world stops, and computed directions remain stable across GTFS reloads.

I think this comment is out-of-date now since the directions can change after reloads. Either don't clear the directions on reload or update the comment.

Is there an easy way you can add a test for the new behavior you've added, verifying that a returned error value isn't cached?


dbConfig := newGTFSDBConfig(finalDBPath, manager.config)
if reopenedClient, reopenErr := gtfsdb.NewClient(dbConfig); reopenErr == nil {
manager.GtfsDB = reopenedClient

Do you need to call UpdateQueries here if the previous client was closed and then reopened?

Development

Successfully merging this pull request may close these issues.

Direction calculator: permanent negative cache on transient DB errors

2 participants