fix: filter same-ID and same-playlist duplicates from find-duplicate-names by jmlrt · Pull Request #36 · jmlrt/spotfm

jmlrt · 2026-03-16T10:15:32Z

Summary

Filter out non-actionable duplicates from find-duplicate-names:

- Tracks with the same Spotify ID
- Tracks appearing in the exact same set of playlists

These pairs cannot be meaningfully resolved since there's no "keep one, remove other" decision possible.
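The two exclusion rules can be sketched as a single predicate. This is a minimal illustration, not the code in `dupes.py`: the function name `is_actionable_pair` and the `id` and `playlists` keys (a list of playlist ID strings) are assumptions made for the example.

```python
def is_actionable_pair(track_a: dict, track_b: dict) -> bool:
    """Return True when a duplicate pair leaves a real keep/remove decision.

    Hypothetical helper: the "id" and "playlists" keys are assumed here.
    """
    # Same Spotify ID: both entries are the same track, nothing to resolve.
    if track_a["id"] == track_b["id"]:
        return False
    # Identical playlist membership: compare as sets so order and repeats
    # in the underlying lists do not matter.
    if frozenset(track_a["playlists"]) == frozenset(track_b["playlists"]):
        return False
    return True


same_id = is_actionable_pair(
    {"id": "t1", "playlists": ["p1"]},
    {"id": "t1", "playlists": ["p2"]},
)
same_playlists = is_actionable_pair(
    {"id": "t1", "playlists": ["p1", "p2"]},
    {"id": "t2", "playlists": ["p2", "p1"]},
)
different = is_actionable_pair(
    {"id": "t1", "playlists": ["p1"]},
    {"id": "t2", "playlists": ["p2"]},
)
print(same_id, same_playlists, different)  # False False True
```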

Changes

Code Changes (spotfm/spotify/dupes.py)

Pass 1: Added playlist-identity filter (lines 428-432)
- Skips pairs where both tracks have identical playlist sets
Pass 2: Added playlist-identity filter (lines 541-545)
- Applies same logic to cross-prefix comparisons

Test Coverage (tests/test_dupes.py)

test_same_id_tracks_excluded(): Verifies same-ID pairs are filtered
- Mocks candidates with identical track ID
- Confirms output is empty
test_same_playlist_tracks_excluded(): Verifies same-playlist pairs are filtered
- Mocks candidates in identical playlists
- Confirms output is empty
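The shape of these two tests might look roughly like the sketch below. `DupeFinder`, `get_candidates`, and the `id` and `playlists` keys are hypothetical stand-ins so the example runs on its own; the real tests mock the candidate source used by `find_duplicate_names`, and the stand-in treats every pair as a name match to keep the focus on the two filters.

```python
from itertools import combinations
from unittest.mock import patch


class DupeFinder:
    """Hypothetical stand-in for the duplicate finder under test."""

    def get_candidates(self):
        raise NotImplementedError("replaced by a mock in the tests")

    def find_duplicate_names(self):
        pairs = []
        for a, b in combinations(self.get_candidates(), 2):
            if a["id"] == b["id"]:
                continue  # same-ID pair: not actionable
            if set(a["playlists"]) == set(b["playlists"]):
                continue  # same playlist set: not actionable
            pairs.append((a["id"], b["id"]))
        return pairs


def run_with_candidates(candidates):
    """Run the finder with its candidate source mocked out."""
    finder = DupeFinder()
    with patch.object(finder, "get_candidates", return_value=candidates):
        return finder.find_duplicate_names()


def test_same_id_tracks_excluded():
    candidates = [
        {"id": "t1", "playlists": ["p1"]},
        {"id": "t1", "playlists": ["p2"]},
    ]
    assert run_with_candidates(candidates) == []


def test_same_playlist_tracks_excluded():
    shared_playlists = ["p1", "p2"]
    candidates = [
        {"id": "t1", "playlists": shared_playlists},
        {"id": "t2", "playlists": shared_playlists},
    ]
    assert run_with_candidates(candidates) == []
```

Under pytest both functions would be collected automatically; they can also be called directly.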

Documentation (TODO.md)

✅ Removed completed HIGH PRIORITY task: "Fix find-duplicate-names to Filter Same-ID Dupes"
Enhanced future roadmap with duration-based categorization and parameterized SQL migration

Testing

✅ All 321 tests pass
✅ Linting passed (ruff)
✅ New test cases added and passing
✅ Code coverage maintained at 79.46%

Related Issues

Implements and closes the HIGH PRIORITY task from TODO.md: "Fix find-duplicate-names to Filter Same-ID Dupes"

…names

Exclude non-actionable duplicate pairs where:
- Both tracks have the same Spotify ID, or
- Both tracks appear in the exact same set of playlists

These pairs cannot be meaningfully resolved (no "keep one, remove other" decision possible), and filtering them out makes the output actionable.

Changes:
- Add playlist-identity check in Pass 1 and Pass 2 of find_duplicate_names()
- Add test_same_id_tracks_excluded() to verify same-ID filtering
- Add test_same_playlist_tracks_excluded() to verify same-playlist filtering
- Update TODO.md to mark task as completed with enhanced future roadmap

All 321 tests pass; linting checks pass.

Co-Authored-By: Claude Code <noreply@anthropic.com>

Remove HIGH PRIORITY task 'Fix find-duplicate-names to Filter Same-ID Dupes' as it has been implemented in PR #36. Co-Authored-By: Claude Code <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates find-duplicate-names to drop duplicate pairs that are not actionable: pairs with the same Spotify track ID and pairs where both tracks appear in the exact same set of playlists. It also adds regression tests to ensure these cases are excluded.

Changes:

Add a “same playlist-set” filter in both pass 1 (same-prefix) and pass 2 (cross-prefix shared-artist) comparisons.
Add unit tests covering same-ID exclusion and same-playlist-set exclusion.
Update TODO.md roadmap by removing a completed item and adding future enhancement tasks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `spotfm/spotify/dupes.py` | Adds filtering to exclude pairs that share identical playlist membership in both matching passes. |
| `tests/test_dupes.py` | Adds tests asserting same-ID and same-playlist-set pairs are not returned by `find_duplicate_names`. |
| `TODO.md` | Updates planning notes (remove completed task; add duration/categorization and SQL parameterization roadmap items). |


spotfm/spotify/dupes.py

Address Copilot review feedback:
- Precompute playlist_id_set for each candidate in get_fuzzy_match_candidates()
- Use cached sets instead of computing frozenset() on-the-fly in Pass 1 & 2
- Move pair_key check before expensive playlist comparisons (Pass 1)
- Eliminates O(n) set construction per pair comparison

Improvements:
- Reduces hot-path overhead in match-processing loops
- Avoids repeated frozenset() calls for identical playlist comparisons
- All 321 tests pass; linting passes

Co-Authored-By: Claude Code <noreply@anthropic.com>
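A rough sketch of the optimization described in this commit, under stated assumptions: the helper names `add_playlist_id_sets` and `actionable_pairs`, the `id` and `playlists` keys, and the sorted-tuple form of `pair_key` are illustrative and not taken from `dupes.py`. The idea is to build each candidate's frozenset once, then do the cheap `pair_key` lookup before the set comparison.

```python
from itertools import combinations


def add_playlist_id_sets(candidates):
    """Attach a cached frozenset of playlist IDs to every candidate, once."""
    for candidate in candidates:
        candidate["playlist_id_set"] = frozenset(candidate["playlists"])
    return candidates


def actionable_pairs(candidates):
    """Yield-order list of pair keys that survive both filters."""
    seen = set()
    result = []
    for a, b in combinations(candidates, 2):
        pair_key = tuple(sorted((a["id"], b["id"])))
        # Cheap check first: skip pairs that were already handled.
        if pair_key in seen:
            continue
        seen.add(pair_key)
        if a["id"] == b["id"]:
            continue
        # Compare the cached sets; no frozenset() call per pair.
        if a["playlist_id_set"] == b["playlist_id_set"]:
            continue
        result.append(pair_key)
    return result


tracks = add_playlist_id_sets([
    {"id": "a", "playlists": ["p1"]},
    {"id": "b", "playlists": ["p1"]},
    {"id": "c", "playlists": ["p2"]},
])
pairs = actionable_pairs(tracks)
print(pairs)  # [('a', 'c'), ('b', 'c')]
```

With n candidates the sets are built n times in total, rather than twice for each of the roughly n²/2 pair comparisons.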

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


Copilot · 2026-03-21T09:18:46Z

spotfm/spotify/dupes.py

@@ -213,6 +215,7 @@ def get_fuzzy_match_candidates(excluded_playlist_ids=None, min_name_length=3):
                "full_name": full_name,
                "name_prefix": name_prefix,
                "name_length": name_length,
+                "playlist_id_set": playlist_id_set,
            }


get_fuzzy_match_candidates() now returns an additional playlist_id_set field, but the function docstring/return description still only mentions prefix/length metadata. Please update the docstring (and any documented return shape) to include playlist_id_set so callers/tests know it’s part of the candidate schema.
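One way to document the candidate shape is a `TypedDict`. This is a sketch only: the class name `FuzzyMatchCandidate` and the field types are assumptions, and it lists just the four keys visible in the diff above, not the full candidate dict.

```python
from typing import TypedDict


class FuzzyMatchCandidate(TypedDict):
    """Assumed shape of one candidate (partial; other keys omitted)."""

    full_name: str
    name_prefix: str
    name_length: int
    # Playlist IDs the track belongs to, precomputed once per candidate.
    playlist_id_set: frozenset[str]


example: FuzzyMatchCandidate = {
    "full_name": "The Beatles - Come Together",
    "name_prefix": "com",
    "name_length": len("Come Together"),
    "playlist_id_set": frozenset({"playlist_a"}),
}
```

A plain docstring entry for `playlist_id_set` would serve the same purpose if the project does not use typed dicts.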

Copilot · 2026-03-21T09:18:47Z

tests/test_dupes.py

+                "playlists": shared_playlists,
+                "full_name": "The Beatles - Come Together",
+                "name_prefix": "com",
+                "name_length": 14,


The mocked candidate for "Come Together" sets name_length to 14, but in production this field comes from LENGTH(t.name) (i.e., len(name), which is 13 for "Come Together"). To keep the mock aligned with the real candidate schema and avoid brittleness under future refactors, compute this with len(...) (or set the correct literal).
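The length in question can be checked directly. The variable names below are illustrative only.

```python
track_name = "Come Together"

# Derive the mock value from the name instead of hard-coding a literal,
# so the two cannot drift apart.
name_length = len(track_name)
print(name_length)  # 13
```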

Suggested change

-                "name_length": 14,
+                "name_length": len("Come Together"),

jmlrt and others added 2 commits March 16, 2026 11:12

chore: remove completed task from TODO.md

688a613


jmlrt marked this pull request as ready for review March 21, 2026 08:34

Copilot AI review requested due to automatic review settings March 21, 2026 08:34

Copilot started reviewing on behalf of jmlrt March 21, 2026 08:35

Copilot AI reviewed Mar 21, 2026


jmlrt requested a review from Copilot March 21, 2026 09:15

Copilot started reviewing on behalf of jmlrt March 21, 2026 09:15

Copilot AI reviewed Mar 21, 2026


jmlrt merged commit aef4c31 into main Mar 21, 2026
8 checks passed

jmlrt deleted the fix/duplicate-names-filtering branch March 21, 2026 09:19

jmlrt merged 3 commits into main from fix/duplicate-names-filtering
