Bug fix: Preserve complete trips by resolving trip_ids from filters#66
Bug fix: Preserve complete trips by resolving trip_ids from filters#66invisiblefunnel wants to merge 1 commit intomasterfrom
Conversation
Unfortunately, the refactor for v1.0 introduced a bug where
filtering by a file other than trips.txt could cause
the view to be incomplete.
|
|
||
| view = {"stops.txt": {"stop_id": full_feed.stops.stop_id[0]}} | ||
| feed = ptg.load_feed(fixture("trimet-vermont-2018-02-06"), view=view) | ||
| assert feed.stops.stop_id.shape[0] == 72 |
There was a problem hiding this comment.
Only one stop is present without the fix.
There was a problem hiding this comment.
I don't think I fully understand, because if the view specifies a single stop_id, wouldn't it be expected that only that stop is present in the output?
There was a problem hiding this comment.
Hi Stuart! This is a bit confusing. The reason is because partridge should always preserve the referential integrity of trips. If we consider trips the "atomic unit" of GTFS, then filtering by stop_id means "show me all the trips that use this stop". To keep trips whole, the resulting feed will have all stops, stop_times, etc. associated with those trips.
There was a problem hiding this comment.
Oh, that makes sense. Thanks Danny! :)
|
@dget this is also good to merge! |
Pre-1.0 versions of partridge preserved relationally-complete trips by resolving each filter clause to a set of reachable trip_ids. This behavior was mistakenly changed in the 1.0 refactor. You could be affected if you use partridge to filter by stop_times.txt or an ancestor of stop_times.txt in the dependency graph. This use case seems uncommon in practice.
To reproduce the issue, filter stops.txt by some field and observe that only stops matching the filter are kept, potentially leaving trips with missing stops. See the test case for details.
TODO