Skip to content

#268 QuerySetのアルゴリズムの改善#894

Merged
izumin2000 merged 3 commits intorefactor/866-backendfrom
feature/268-queryset
Apr 7, 2026
Merged

#268 QuerySetのアルゴリズムの改善#894
izumin2000 merged 3 commits intorefactor/866-backendfrom
feature/268-queryset

Conversation

@izumin2000
Copy link
Copy Markdown
Owner

Summary

  • sitemap.py: values_list().iterator() でモデルインスタンスを生成せず ID のみ取得
  • migrate_imitates.py: Song.objects.iterator() でメモリ効率化
  • youtube.py: .distinct().iterator() でメモリ効率化
  • song_card.py: song.authors.all()[:2] でDB側 LIMIT 2 に
  • top.py: list(songInsL)[:-(new_count+1):-1]songInsL.order_by('-id')[:new_count] で全件ロードを排除しDB側 ORDER BY + LIMIT に変更

Test plan

  • トップページが正常に表示される(新着・未完成・AI・広告)
  • python manage.py sitemap が正常に完了する
  • python manage.py youtube が正常に完了する
  • python manage.py migrate_imitates --dry-run が正常に動作する

🤖 Generated with Claude Code

izumin2000 and others added 2 commits April 6, 2026 01:06
- sitemap.py: values_list().iterator()で必要カラムのみ取得し全件メモリロードを回避
  - Song: values_list('id', flat=True).iterator()
  - Author: values_list('id', 'name').iterator()
  - Article: values_list('article_id', flat=True).iterator()
- migrate_imitates.py: Song.objects.iterator()でQuerySetキャッシュを排除
- youtube.py: filter(...).distinct().iterator()でQuerySetキャッシュを排除
- song_card.py: song.authors.all()[:2]で合作判定に全件ロードが不要なためLIMIT 2に限定

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
list(songInsL)[:-(new_count + 1):-1]で全件ロードしていたのを
songInsL.order_by('-id')[:new_count]のDB側LIMIT処理に変更

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Code Review

Overall, these are solid memory efficiency improvements. The changes correctly push work to the database instead of loading unnecessary data into Python. A few notes below:


subekashi/views/top.py

The most impactful change. The old code loaded the entire songInsL queryset into a Python list and reversed it:

context["songInsL"] = list(songInsL)[:-(new_count + 1):-1]

The new code is correct and much more efficient:

context["songInsL"] = songInsL.order_by('-id')[:new_count]

One assumption to be aware of: order_by('-id') implies that higher IDs = newer songs, which is a reasonable assumption for auto-increment PKs. If there's ever a case where IDs are not monotonically increasing with creation time (e.g., bulk imports, migrations), this could return results in a surprising order. If Song has a created_at field, order_by('-created_at') would be more semantically correct. Worth checking.


subekashi/templatetags/song_card.py

# Before
authors_list = list(song.authors.all())
# After
authors_list = list(song.authors.all()[:2])

This is correct. get_author only needs to know if there are 2+ authors (for "合作" display) and uses at most [0]. The LIMIT 2 is safe and avoids fetching unnecessary rows.


subekashi/management/commands/sitemap.py

Good use of values_list(...).iterator() — avoids model instantiation entirely and keeps memory usage flat for large tables.

Minor: The article_id field name in values_list('article_id', flat=True) suggests Article.article_id is not the default Django id. This is fine as long as it matches the model field. The comment fix (<int:article_id><str:article_id>) is a welcome correction.


subekashi/management/commands/youtube.py and migrate_imitates.py

Adding .iterator() to long-running management commands is a good practice. Both are correct.

Note for youtube.py: iterator() disables the queryset cache, which is fine here since each song object is only referenced once within its iteration step.


Test Coverage

No tests are added or updated in this PR. The top.py change is the most semantically significant (different SQL query, explicit ordering), and a unit test asserting the ordering and count of songInsL in the context would add confidence here. Not a blocker, but worth considering.


Summary

File Assessment
top.py ✅ Correct, much more efficient — verify -id vs -created_at intent
song_card.py ✅ Safe and correct
sitemap.py ✅ Good improvement + comment fix
youtube.py ✅ Correct
migrate_imitates.py ✅ Correct

The changes are a net improvement. Recommend clarifying the ordering field assumption in top.py before merging.

@izumin2000 izumin2000 linked an issue Apr 7, 2026 that may be closed by this pull request
全Songオブジェクトをロードしてタイトルのみ取り出していた箇所を
values_list('title', flat=True)でtitleカラムのみ取得するよう変更

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

Code Review: QuerySetのアルゴリズムの改善

Overall this is a solid performance-oriented PR. The changes are all in the right direction — reducing full-table loads, avoiding unnecessary model instantiation, and delegating work to the database. Below are my findings, organized by file.


subekashi/views/top.py

Change: list(songInsL)[:-(new_count + 1):-1]songInsL.order_by('-id')[:new_count]

Correctness concern (potential breaking change):
The original expression used Python's negative-step slice to reverse the list and take the last new_count elements. This assumes songInsL is already ordered by some criteria (presumably ascending by id or insertion order) and returns the most recent ones in reverse (newest-first) order.

The replacement order_by('-id')[:new_count] achieves the same intent — newest first, limited to new_count — but only if the semantic of "newest" is indeed id descending. If songInsL was originally ordered by a field other than id (e.g., created_at), this change silently alters ordering behavior. It would be worth verifying what produces songInsL and confirming that sorting by -id is equivalent in this context.

Performance improvement: Confirmed — avoids loading all rows into Python memory. This is the correct approach.


subekashi/views/author.py

Change: ", ".join(song.title for song in songInsL)", ".join(songInsL.values_list('title', flat=True))

Correctness concern:
values_list(..., flat=True) returns a ValuesListQuerySet, which is iterable and works with str.join(). However, if songInsL has already been evaluated/cached elsewhere in the view (e.g., used in template context), this introduces a second database query for the same data. Verify whether songInsL is consumed again after this line.

Performance improvement: Confirmed — avoids loading full model instances just to access the title field.


subekashi/templatetags/song_card.py

Change: list(song.authors.all())list(song.authors.all()[:2])

Logic concern (subtle bug risk):
The subsequent code checks if len(authors_list) >= 2 to decide whether the song is a collaboration. With [:2], this check becomes: "are there at least 2 authors in the first 2 results?" — i.e., len == 2.

This is fine for the collaboration check itself. However, if any code later in the template tag relies on authors_list having all authors (e.g., for display purposes beyond the 2-author check), it will now silently receive a truncated list.

Review the full get_author function body to confirm that only the first author (or first two) are ever accessed from authors_list. If yes, this change is correct and safe. If not, the limit of [:2] is too aggressive.

Performance improvement: Confirmed — limits the DB query at the SQL level.


subekashi/management/commands/sitemap.py

Changes:

  • Song.objects.all()values_list('id', flat=True).iterator()
  • Author.objects.all()values_list('id', 'name').iterator()
  • Article.objects.filter(...).exclude(...) → same with values_list('article_id', flat=True).iterator()

Good practices confirmed:

  • Using values_list avoids constructing full ORM model instances for operations that only need a few fields.
  • Using .iterator() prevents the QuerySet cache from accumulating all rows in memory, which is especially important for management commands that may process large tables.

Minor comment update:
The comment for articles was updated from # /articles/<int:article_id> to # /articles/<str:article_id>. Verify that article_id is indeed a string type in the model. If it's actually an integer, this comment creates a misleading inconsistency with the Song URL comment which still says <int:song_id>.

No functional regressions found in this file.


subekashi/management/commands/migrate_imitates.py

Change: Song.objects.all()Song.objects.iterator()

Correct and safe. For a management command iterating over all songs to perform migrations, .iterator() is the standard Django recommendation to avoid loading all rows into memory at once. The internal logic (checking song.imitate, doing writes) is unaffected.

Note: If chunk_size tuning is ever needed for very large tables, Django's iterator(chunk_size=N) supports it.


subekashi/management/commands/youtube.py

Change: .distinct().distinct().iterator()

Correct and safe. Adding .iterator() here prevents QuerySet caching on what could be a large result set. The .distinct() is already present to handle the links__isnull=False filter on a reverse FK that may produce duplicates.


Test Coverage

The test plan in the PR description is manual. There are no automated tests added or modified. Given the nature of these changes (QuerySet optimization), it would be valuable to add:

  1. Unit/integration tests for get_author in song_card.py — specifically asserting behavior when a song has more than 2 authors, to guard against regressions from the [:2] limit.
  2. A regression test for top.py — verifying that the top-page view returns songs in the expected order and count under different is_shown_new cookie values.

Summary

File Assessment
top.py Approve with caveat — confirm -id ordering semantics match original behavior
author.py Approve with caveat — confirm no double-query on songInsL
song_card.py Approve with caveat — confirm authors_list is not used beyond index 1
sitemap.py Approve — clean and correct
migrate_imitates.py Approve — clean and correct
youtube.py Approve — clean and correct

The overall direction is excellent. The three caveats above are low-to-medium risk and should be verified before merging, especially the top.py ordering change which could silently affect what users see on the homepage.

Review generated by Claude

@izumin2000 izumin2000 merged commit 16a3885 into refactor/866-backend Apr 7, 2026
1 check passed
@izumin2000 izumin2000 deleted the feature/268-queryset branch April 7, 2026 00:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QuerySetのアルゴリズムの改善

1 participant