fix: filter same-ID and same-playlist duplicates from find-duplicate-names#36
fix: filter same-ID and same-playlist duplicates from find-duplicate-names#36
Conversation
…names Exclude non-actionable duplicate pairs where: - Both tracks have the same Spotify ID, or - Both tracks appear in the exact same set of playlists These pairs cannot be meaningfully resolved (no "keep one, remove other" decision possible), and filtering them out makes the output actionable. Changes: - Add playlist-identity check in Pass 1 and Pass 2 of find_duplicate_names() - Add test_same_id_tracks_excluded() to verify same-ID filtering - Add test_same_playlist_tracks_excluded() to verify same-playlist filtering - Update TODO.md to mark task as completed with enhanced future roadmap All 321 tests pass; linting checks pass. Co-Authored-By: Claude Code <noreply@anthropic.com>
Remove HIGH PRIORITY task 'Fix find-duplicate-names to Filter Same-ID Dupes' as it has been implemented in PR #36. Co-Authored-By: Claude Code <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates find-duplicate-names to drop duplicate pairs that are not actionable: pairs with the same Spotify track ID and pairs where both tracks appear in the exact same set of playlists. It also adds regression tests to ensure these cases are excluded.
Changes:
- Add a “same playlist-set” filter in both pass 1 (same-prefix) and pass 2 (cross-prefix shared-artist) comparisons.
- Add unit tests covering same-ID exclusion and same-playlist-set exclusion.
- Update
TODO.mdroadmap by removing a completed item and adding future enhancement tasks.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
spotfm/spotify/dupes.py |
Adds filtering to exclude pairs that share identical playlist membership in both matching passes. |
tests/test_dupes.py |
Adds tests asserting same-ID and same-playlist-set pairs are not returned by find_duplicate_names. |
TODO.md |
Updates planning notes (remove completed task; add duration/categorization and SQL parameterization roadmap items). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address Copilot review feedback: - Precompute playlist_id_set for each candidate in get_fuzzy_match_candidates() - Use cached sets instead of computing frozenset() on-the-fly in Pass 1 & 2 - Move pair_key check before expensive playlist comparisons (Pass 1) - Eliminates O(n) set construction per pair comparison Improvements: - Reduces hot-path overhead in match-processing loops - Avoids repeated frozenset() calls for identical playlist comparisons - All 321 tests pass; linting passes Co-Authored-By: Claude Code <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -213,6 +215,7 @@ def get_fuzzy_match_candidates(excluded_playlist_ids=None, min_name_length=3): | |||
| "full_name": full_name, | |||
| "name_prefix": name_prefix, | |||
| "name_length": name_length, | |||
| "playlist_id_set": playlist_id_set, | |||
| } | |||
There was a problem hiding this comment.
get_fuzzy_match_candidates() now returns an additional playlist_id_set field, but the function docstring/return description still only mentions prefix/length metadata. Please update the docstring (and any documented return shape) to include playlist_id_set so callers/tests know it’s part of the candidate schema.
| "playlists": shared_playlists, | ||
| "full_name": "The Beatles - Come Together", | ||
| "name_prefix": "com", | ||
| "name_length": 14, |
There was a problem hiding this comment.
The mocked candidate for "Come Together" sets name_length to 14, but in production this field comes from LENGTH(t.name) (i.e., len(name), which is 12 for "Come Together"). To keep the mock aligned with real candidate schema and avoid future refactor brittleness, compute this with len(...) (or set the correct literal).
| "name_length": 14, | |
| "name_length": len("Come Together"), |
Summary
Filter out non-actionable duplicates from
find-duplicate-names:These pairs cannot be meaningfully resolved since there's no "keep one, remove other" decision possible.
Changes
Code Changes (spotfm/spotify/dupes.py)
Test Coverage (tests/test_dupes.py)
Documentation (TODO.md)
Testing
Related Issues
Implements and closes the HIGH PRIORITY task from TODO.md: "Fix find-duplicate-names to Filter Same-ID Dupes"