1
application/single_app/route_backend_public_documents.py
Like the internal list routes, this external public-docs list now loads all matching docs and paginates in Python after collapsing revisions. That can get expensive quickly for large workspaces and defeats Cosmos-side paging/sorting. Prefer filtering to current revisions directly in the query (with a legacy fallback for missing fields) so OFFSET/LIMIT can stay in Cosmos and the endpoint remains scalable.
GitHub Copilot: Read
Read skill cosmosdb-best-practices
I’m checking the actual public-documents list path against the internal list routes to see whether the suggestion is accurate and whether the current query shape is forcing Python-side pagination.
Read , lines 1 to 400
Searched for regex collapse|revision|OFFSET|LIMIT|current_revision|is_current|list_public|list.*documents (**/application/single_app/**/*.py), 200 results
Searched for files matching **/application/single_app/route_backend*_documents.py, 3 matches
I confirmed the public route is doing a full query, then select_current_documents, then Python slicing. I’m pulling the personal and group equivalents now to check whether the suggestion matches the existing pattern or if this route diverged.
Searched for regex def api_list_documents|def api_list_group_documents|SELECT \* FROM c WHERE|OFFSET|select_current_documents\(|sort_documents\( (**/application/single_app/route_backend*_documents.py), 23 results
Read , lines 1 to 260
Read , lines 1 to 260
Read , lines 280 to 360
I found the same pattern in the personal and group list endpoints, so the suggestion’s premise is directionally right on scalability, but it’s not a regression unique to this public route. I’m reading the exact list implementations and the revision helpers to see whether a Cosmos-side filter is safe with the project’s legacy revision fallback.
Read , lines 500 to 590
Read , lines 260 to 330
Read , lines 420 to 490
Read , lines 600 to 740
There’s already a stronger pattern elsewhere in the codebase: revision-aware queries fetch only the fields needed, then collapse with a current-version check plus a legacy fallback. I’m reading that helper because it’s probably the best evidence for or against the suggestion as written.
Read changed files in
Read , lines 7480 to 7575
Searched for text is_current_version|revision_family_id|search_visibility_state (**/application/single_app/**/*.py), 61 results
The main caveat is the legacy path: select_current_documents does more than check a boolean flag, because it groups by revision family and falls back to filename-based collapsing when revision metadata is missing. I’m checking the repo memory for prior revision behavior so I don’t overstate what a simple query rewrite can safely do.
Read memory
Searched for regex normalize_document_revision_families\( (**/application/single_app/**/*.py), 4 results
Read , lines 110 to 160
I have enough context to assess it, but I want exact anchors for the revision helpers before I answer so I can be precise about the caveat around legacy families and normalization.
Searched for regex def select_current_documents|def normalize_document_revision_families (**/application/single_app/functions_documents.py), 2 results
I mostly agree with it. The performance concern is real: the public list route in route_backend_public_documents.py fetches all matching documents, collapses revisions in Python, and then paginates in Python. For a large public workspace, that is exactly the kind of pattern that burns RU and latency unnecessarily. It is also not unique to this route; the personal and group list endpoints do the same thing in route_backend_documents.py and route_backend_group_documents.py.
The part I would tighten is the legacy fallback wording. A simple current-version predicate is not fully equivalent to the existing behavior, because select_current_documents in functions_documents.py does more than check a flag: it collapses legacy families using revision-family metadata and, when that is missing, a filename-based fallback. The list routes do not run the normalization pass that search runs from functions_search.py through normalize_document_revision_families in functions_documents.py. So if you only add something like “is current or missing field” to the Cosmos query, older unnormalized multi-revision families can still leak duplicates and break counts/paging semantics.
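To make that caveat concrete, here is a minimal sketch of why a flag-only predicate is not equivalent to a family-aware collapse. It is not the repo's select_current_documents: the function names are hypothetical, and the `file_name`, `version`, and `id` fields are assumptions for illustration (only `is_current_version` and `revision_family_id` appear in the searches above).

```python
from collections import defaultdict


def collapse_to_current(docs):
    """Hypothetical stand-in for a family-aware collapse (not the repo's code)."""
    families = defaultdict(list)
    for doc in docs:
        # Prefer explicit revision-family metadata; fall back to the file name
        # for legacy records that predate revision tracking.
        key = doc.get("revision_family_id") or ("legacy", doc.get("file_name"))
        families[key].append(doc)

    current = []
    for members in families.values():
        flagged = [d for d in members if d.get("is_current_version") is True]
        # Use the flagged revision when one exists; otherwise take the highest version.
        current.append(flagged[0] if flagged else max(members, key=lambda d: d.get("version", 0)))
    return current


def flag_or_missing(docs):
    """What a naive 'is current OR field missing' predicate would let through."""
    return [d for d in docs if d.get("is_current_version", True) is True]


# Two revisions of one legacy file, neither carrying revision metadata.
legacy_family = [
    {"id": "a1", "file_name": "spec.pdf", "version": 1},
    {"id": "a2", "file_name": "spec.pdf", "version": 2},
]

collapsed = collapse_to_current(legacy_family)
naive = flag_or_missing(legacy_family)
```

With this input the collapse keeps a single row for the family, while the naive predicate lets both revisions through, which is the duplicate leak described above.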
I would restate the suggestion as: move these list routes to a query-side fast path for normalized current revisions, but keep an explicit fallback or one-time normalization/migration path for legacy revision families. Also, if the API contract ever changes, Cosmos continuation tokens are a better long-term fit than deep OFFSET/LIMIT, though keeping filtering and sorting in Cosmos is still a good improvement within the current page/page_size contract.
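A rough sketch of what the query-side fast path could look like within the current page/page_size contract. The helper name and the `c.workspace_id` scope field are assumptions, not names from the repo, and the query only covers families that are already normalized; legacy families would still need the fallback or migration path.

```python
def build_current_docs_query(scope_filter, scope_params, page, page_size,
                             order_by="c._ts DESC"):
    """Build a Cosmos SQL query that filters, sorts, and pages server-side.

    Hypothetical helper: `scope_filter` is a trusted, parameterized predicate
    supplied by the route, never raw user input.
    """
    # Coerce to int so the inlined OFFSET/LIMIT values cannot carry injected text.
    page_size = int(page_size)
    offset = (int(page) - 1) * page_size
    query = (
        f"SELECT * FROM c WHERE {scope_filter} "
        "AND c.is_current_version = true "
        f"ORDER BY {order_by} "
        f"OFFSET {offset} LIMIT {page_size}"
    )
    return query, list(scope_params)


query, params = build_current_docs_query(
    "c.workspace_id = @workspace_id",                    # assumed field name
    [{"name": "@workspace_id", "value": "ws-123"}],
    page=3,
    page_size=10,
)
```

The returned pair is shaped for `container.query_items(query=..., parameters=...)` in the azure-cosmos SDK. Deep pages still pay for the skipped rows, which is why continuation tokens remain the better long-term fit.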
- I can rewrite this into a tighter PR review comment.
- I can sketch the safest implementation approach for all three list routes, not just the public one.
2
application/single_app/route_backend_group_documents.py
This group list path now loads all matching documents and paginates client-side after select_current_documents(...). That can be very costly for large groups and removes Cosmos-side pagination (ORDER BY ... OFFSET ... LIMIT ...). Since the data model now tracks current revision state, prefer a Cosmos filter for current docs (with legacy fallback) so paging/sorting stays server-side and scalable.
GitHub Copilot: I’m checking the Cosmos DB guidance and the relevant group document path first, then I’ll judge whether this is a real scalability regression or just a theoretical concern.
Read skill cosmosdb-best-practices
Read , lines 1 to 400
Read
Read memory
The route does appear to materialize the full match set before slicing, so I’m tracing whether the revision flags already support a server-side filter and whether other routes already solved this better.
Searched for regex select_current_documents|is_current_version|search_visibility_state (**/application/single_app/**/*.py), 35 results
Read , lines 1 to 320
Read , lines 1 to 420
I found the same pattern in the personal and public list routes, but the key question is whether the group route can now rely on persisted current-revision flags instead of collapsing in Python. I’m checking that path next.
Read , lines 460 to 610
Read , lines 430 to 760
Read , lines 150 to 330
I’ve confirmed the current route does full materialization and in-memory pagination. I’m checking whether that was an intentional compromise tied to legacy revision handling or whether it can be replaced cleanly with a server-side current-doc filter.
Read changed files in
Read , lines 220 to 340
Searched for regex ORDER BY c\._ts|OFFSET|LIMIT (**/application/single_app/**/*.py), 100 results
I haven’t found a server-side current-only query helper yet, which makes this look like a duplicated stopgap rather than a deliberate long-term design. I’m checking whether the persisted flags are reliable enough for a filtered query with a legacy fallback.
Ran terminal command: git log -n 5 -- application/single_app/route_backend_group_documents.py
Searched for regex is_current_version = true|search_visibility_state = 'active'|search_visibility_state = "active"|IS_DEFINED\(c\.is_current_version\)|NOT IS_DEFINED\(c\.is_current_version\) (**/application/single_app/**/*.py), no results
Read , lines 2990 to 3085
The history confirms this behavior came in with the recent document versioning change, so I’m pulling that diff to see whether the fallback logic really required the in-memory collapse.
Ran terminal command: git show 5feca22 -- application/single_app/route_backend_group_documents.py
Ran terminal command: git show 5feca22 -- application/single_app/route_backend_documents.py application/single_app/route_backend_public_documents.py
The diff confirms this was a direct tradeoff: the prior server-side count and page query were dropped solely to dedupe revisions in Python. I’m reading the related route diff now to see whether there’s already a better pattern elsewhere.
Read , lines 1 to 260
I’ve got enough to assess it. I’m grabbing exact anchors for the relevant lines so I can give you a precise recommendation instead of a vague “sounds good.”
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_group_documents.py), 4 results
Searched for regex "is_current_version": True|"search_visibility_state": "active"|existing_document\['is_current_version'\] = False|existing_document\['search_visibility_state'\] = 'archived' (**/application/single_app/functions_documents.py), 8 results
Searched for regex promoted_document\['is_current_version'\] = True|promoted_document\['search_visibility_state'\] = 'active' (**/application/single_app/functions_documents.py), 2 results
Searched for regex def normalize_document_revision_families|document_item\["is_current_version"\] = expected_current|document_item\["search_visibility_state"\] = "active"|document_item\["search_visibility_state"\] = "archived" (**/application/single_app/functions_documents.py), 5 results
Searched for regex legacy_q =|needs_legacy_update_check (**/application/single_app/route_backend_group_documents.py), 3 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_documents.py|application/single_app/route_backend_public_documents.py), no results
One last check: I’m confirming whether this same pagination tradeoff exists in the sibling list routes, because if it does, the best fix is probably a shared helper rather than a one-off group patch.
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_documents.py), 4 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_public_documents.py), 5 results
I agree with the suggestion. It’s a real scalability regression, not just a stylistic preference.
The problem is visible in route_backend_group_documents.py: the route now fetches every matching document, runs select_current_documents (from functions_documents.py) in Python, then computes total_count and slices the page in memory. That means RU cost, payload size, and app memory all grow with the full result set and full revision history, instead of with the requested page size. For large groups, that’s exactly the wrong side of the tradeoff.
The suggestion is also supported by the data model, all of it in functions_documents.py: new current revisions are persisted with is_current_version=True and search_visibility_state="active", prior revisions are archived, delete promotion restores the prior current revision, and there’s already a repair path in normalize_document_revision_families. So the route should prefer a Cosmos query filtered to current docs and keep ORDER BY ... OFFSET ... LIMIT on the server.
The one caveat is legacy data. A pure c.is_current_version = true filter can hide older docs that were never backfilled, and the route already knows legacy records may exist via its legacy_q / needs_legacy_update_check handling in route_backend_group_documents.py. So the right implementation is usually:
- If no legacy docs are present for the requested groups, use server-side count + paged query with a current-only filter.
- If legacy docs are present, fall back to the current materialize-and-collapse path until those groups are normalized.
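The two branches above could be sketched as one shared helper. Everything here is hypothetical: the helper name, the `c.group_id` field, and the idea of detecting legacy rows with NOT IS_DEFINED are assumptions, and `collapse` stands in for the existing select_current_documents. The stub container only exists so the sketch runs without Cosmos.

```python
def list_current_documents(container, scope_filter, parameters, page, page_size, collapse):
    """Fast path when no legacy rows exist in scope; otherwise materialize and collapse."""
    def run(query):
        return container.query_items(
            query=query, parameters=parameters, enable_cross_partition_query=True
        )

    base = f"FROM c WHERE {scope_filter}"
    page_size = int(page_size)
    offset = (int(page) - 1) * page_size

    # Legacy rows are assumed to be the ones that never received the flag.
    legacy_count = next(iter(run(
        f"SELECT VALUE COUNT(1) {base} AND NOT IS_DEFINED(c.is_current_version)"
    )), 0)

    if legacy_count == 0:
        current = f"{base} AND c.is_current_version = true"
        total = next(iter(run(f"SELECT VALUE COUNT(1) {current}")), 0)
        docs = list(run(
            f"SELECT * {current} ORDER BY c._ts DESC OFFSET {offset} LIMIT {page_size}"
        ))
        return docs, total, "fast"

    # Fallback: same behavior as today until the scope is normalized.
    current_docs = collapse(list(run(f"SELECT * {base}")))
    return current_docs[offset:offset + page_size], len(current_docs), "fallback"


class FakeContainer:
    """In-memory stub: returns canned result sets in call order."""
    def __init__(self, responses):
        self._responses = list(responses)

    def query_items(self, query, parameters=None, enable_cross_partition_query=False):
        return iter(self._responses.pop(0))


scope = "c.group_id = @gid"                       # assumed field name
params = [{"name": "@gid", "value": "g1"}]

fast = FakeContainer([[0], [42], [{"id": "d1"}]])
fast_docs, fast_total, fast_path = list_current_documents(
    fast, scope, params, page=1, page_size=10, collapse=lambda docs: docs
)

legacy = FakeContainer([[2], [{"id": "a1"}, {"id": "a2"}]])
slow_docs, slow_total, slow_path = list_current_documents(
    legacy, scope, params, page=1, page_size=10, collapse=lambda docs: docs[-1:]
)
```

The legacy check is itself a query, so in practice its result would probably be worth caching per scope rather than recomputing on every list call.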
One more thing: this same regression exists in the personal and public list routes too, at route_backend_documents.py and route_backend_public_documents.py. If you fix it, I’d do it as a shared helper rather than only in the group route.
If you want, I can implement that current-only server-side path with legacy fallback across all three routes.
3
application/single_app/route_backend_documents.py
These list endpoints now fetch all matching documents from Cosmos (SELECT * ... WHERE ...), then collapse/sort/paginate in Python. For users/groups with many documents this can become very expensive in RU, memory, and latency (and defeats server-side OFFSET/LIMIT). Since revision state is now explicitly modeled, consider filtering server-side for current revisions (e.g., c.is_current_version = true or c.search_visibility_state = 'active' with a legacy fallback for missing fields) so Cosmos can paginate efficiently.
GitHub Copilot: I’m checking the actual document list routes and the current revision logic first, then I’ll assess whether the suggestion is correct in this codebase and where the tradeoffs are.
Read
Read skill cosmosdb-best-practices
Read , lines 1 to 400
Read
Read memory
I’ve confirmed the repo already models current revisions explicitly. Next I’m tracing which list routes still fetch the full matching set and then collapse in Python, because that determines whether the suggestion is just valid in principle or actionable as-is here.
Searched for regex list_documents|select_current_documents|normalize_document_revision_families|OFFSET|LIMIT|page|pagination|sort (**/application/single_app/route_backend_documents.py), 53 results
Searched for regex def select_current_documents|def normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Read , lines 1 to 420
I found the personal documents list route that does exactly what the suggestion describes: broad fetch, then select_current_documents, sort, and slice in Python. I’m reading the surrounding implementation and the helper query code now to check whether a server-side filter is straightforward or whether legacy/shared-document rules make it more complicated.
Read , lines 400 to 620
Searched for regex _query_accessible_documents|get_all_documents|get_documents_by|shared_user_ids|shared_group_ids (**/application/single_app/functions_documents.py), 125 results
Read , lines 7480 to 7565
The suggestion is directionally right, but I want to be precise about the legacy edge case before I give you a yes/no. The current helper falls back to “latest per family or per legacy filename” when revision flags are missing, so a naive is_current_version = true filter would be incomplete unless those old docs have already been normalized.
Read , lines 420 to 520
Searched for regex normalize_document_revision_families\( (**/application/single_app/**/*.py), 4 results
Searched for regex @app.route\('/api/documents|@app.route\('/api/group|@app.route\('/api/public (**/application/single_app/*.py), 61 results
I’ve got the code path. I’m pulling one quick docs check on Cosmos pagination so I don’t hand-wave the RU tradeoff, because OFFSET/LIMIT and continuation tokens are not equivalent in Cosmos.
Calling documentation microsoft_docs_search...
Completed with input: {
"intent": "Find official Azure Cosmos DB guidance comparing OFFSET LIMIT and continuation tokens for efficient pagination and RU usage"
}
Read
I’m doing one last pass on the sibling group/public list routes. If they mirror this pattern, that strengthens the case for a shared helper rather than a one-off tweak in the personal route.
Searched for regex select_current_documents\(|matching_docs = list\(|offset = \(page - 1\) \* page_size|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_group_documents.py), 4 results
Searched for regex select_current_documents\(|matching_docs = list\(|offset = \(page - 1\) \* page_size|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_public_documents.py), 5 results
Read , lines 145 to 270
Read , lines 113 to 210
I’ve got enough to answer. One quick check remains: whether the existing “legacy upgrade” flow already normalizes revision flags. If it doesn’t, that’s the main reason I’d treat the suggestion as “good, but incomplete” instead of “ship it exactly as written.”
Read , lines 840 to 930
Searched for regex upgrade_legacy|legacy_update|normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Read , lines 6889 to 6975
Read , lines 6975 to 7015
I think the suggestion is good and worth acting on, but I would tighten the implementation details.
- The performance diagnosis is accurate. The personal list route in route_backend_documents.py fetches all matching documents, then collapses revisions, sorts, and paginates in Python. For large workspaces that is expensive in RU, memory, and latency.
- I would not replace that with only is_current_version = true yet. The current selection logic (select_current_documents in functions_documents.py) still supports legacy records by grouping on revision family when present, or on scope plus file name when it is not. The list route does not call normalize_document_revision_families, and the existing legacy upgrade flow in functions_documents.py does not backfill revision flags. So a strict current-only filter can hide valid legacy docs.
- The right transitional move is to push down as much filtering as possible: prefer current rows server-side, use search_visibility_state = 'active' as a compatibility check, and keep a fallback path for records missing both fields until data is normalized. That will reduce the candidate set substantially, even though shared-document access still keeps the personal query cross-partition.
- I would also tweak the pagination claim. Server-side pagination is better than materializing everything, but in Cosmos the efficient pagination primitive is continuation tokens, not large OFFSET/LIMIT pages. If the API must preserve page numbers and total_count, OFFSET/LIMIT can be a short-term step; it is not the best long-term Cosmos shape.
- The same pattern exists in the group and public list routes (route_backend_group_documents.py and route_backend_public_documents.py), so this is really a shared-query/helper problem, not just a personal-route issue.
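For the continuation-token point, a minimal sketch of token-based paging. It assumes the azure-cosmos pattern of `query_items(...).by_page(token)` with `continuation_token` read from the pager; `fetch_page` is a hypothetical helper, and the stub classes only mimic that interface so the sketch runs without Cosmos. Token support for cross-partition ORDER BY queries depends on the SDK version, so that would need verifying before relying on it for the personal route.

```python
def fetch_page(container, query, parameters, page_size, continuation_token=None):
    """Fetch one page plus the token for the next page (None when exhausted)."""
    pager = container.query_items(
        query=query,
        parameters=parameters,
        max_item_count=page_size,
    ).by_page(continuation_token)
    items = list(next(pager, []))
    return items, pager.continuation_token


class _FakePager:
    """Stub page iterator; the token is just the next start index as a string."""
    def __init__(self, items, size, start):
        self._items, self._size, self._pos = items, size, start
        self.continuation_token = None

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        page = self._items[self._pos:self._pos + self._size]
        self._pos += self._size
        self.continuation_token = str(self._pos) if self._pos < len(self._items) else None
        return iter(page)


class _FakePaged:
    def __init__(self, items, size):
        self._items, self._size = items, size

    def by_page(self, continuation_token=None):
        return _FakePager(self._items, self._size, int(continuation_token or 0))


class FakeContainer:
    def __init__(self, items):
        self._items = items

    def query_items(self, query, parameters=None, max_item_count=None, **kwargs):
        return _FakePaged(self._items, max_item_count)


container = FakeContainer(["a", "b", "c"])
first, token = fetch_page(container, "SELECT * FROM c", [], page_size=2)
second, end_token = fetch_page(container, "SELECT * FROM c", [], 2, token)
```

Unlike OFFSET/LIMIT, the client passes the opaque token back instead of a page number, so this shape implies a change to the page/page_size API contract and gives no total_count for free.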
If I were turning that suggestion into work, I’d do it in this order:
- Normalize or backfill revision markers for legacy documents.
- Move the personal, group, and public list routes to a shared current-documents query helper.
- Decide whether to keep page-number pagination temporarily or switch the API to continuation tokens.
If you want, I can turn that into a concrete refactor plan or implement the first pass.
GitHub Copilot: I’m checking the Cosmos guidance and the revision-handling caveat once, then I’ll tell you whether these comments are converging on the same fix or describing different work.
Read
Read skill cosmosdb-best-practices
I’ve loaded the general Cosmos best-practice guidance; I’m checking the repo’s explicit Cosmos instructions and the prior revision-state note to make sure the answer is grounded in this codebase’s constraints.
Read
Read memory
I’ve confirmed the shared theme; I’m pulling exact file anchors for the three list routes and the revision helper so the comparison is precise rather than just high-level.
Read , lines 150 to 280
Read , lines 500 to 590
Read , lines 110 to 220
Read , lines 420 to 520
I have the personal and public anchors; I’m grabbing the matching group-route block and the current-selection helper so the three routes can be compared apples to apples.
Read , lines 280 to 360
Read , lines 300 to 420
I have the comparison; I’m pulling exact line numbers for the shared fetch-collapse-slice pattern so I can answer with direct anchors instead of approximate references.
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_documents.py), 4 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_group_documents.py), 4 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_public_documents.py), 5 results
Searched for regex def select_current_documents|def normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Yes. They’re all converging on the same underlying resolution, even if they phrase it a little differently.
The shared problem is the same in all three list routes: fetch the full matching Cosmos result set, collapse revisions with select_current_documents, then paginate in Python. You can see that pattern in route_backend_documents.py, route_backend_group_documents.py, and route_backend_public_documents.py. The shared caveat is also the same: current-selection is not just a simple flag check, because select_current_documents in functions_documents.py collapses legacy families too, and normalize_document_revision_families in the same file is the normalization path that repairs revision metadata.
So the comments are aiming at one common fix direction:
- Move revision filtering closer to the Cosmos query so the list endpoints stop materializing everything first.
- Keep a legacy fallback or backfill path, because a pure is_current_version = true filter is not fully equivalent for older data.
- Ideally solve it once as a shared helper for personal, group, and public routes, rather than three separate one-off patches.
- Treat continuation tokens as the better long-term Cosmos pagination model, while server-side filtering/paging is the immediate improvement.
So: same resolution family, not three unrelated suggestions. The later comments are basically refining the earlier ones by adding the legacy-data constraint and the shared-helper angle. No code changed.
1
application/single_app/route_backend_public_documents.py
Like the internal list routes, this external public-docs list now loads all matching docs and paginates in Python after collapsing revisions. That can get expensive quickly for large workspaces and defeats Cosmos-side paging/sorting. Prefer filtering to current revisions directly in the query (with a legacy fallback for missing fields) so OFFSET/LIMIT can stay in Cosmos and the endpoint remains scalable.
GitHub Copilot: Read
Read skill cosmosdb-best-practices
I’m checking the actual public-documents list path against the internal list routes to see whether the suggestion is accurate and whether the current query shape is forcing Python-side pagination.
Read , lines 1 to 400
Searched for regex
collapse|revision|OFFSET|LIMIT|current_revision|is_current|list_public|list.*documents(**/application/single_app/**/*.py), 200 resultsSearched for files matching
**/application/single_app/route_backend*_documents.py, 3 matchesI confirmed the public route is doing a full query, then
select_current_documents, then Python slicing. I’m pulling the personal and group equivalents now to check whether the suggestion matches the existing pattern or if this route diverged.Searched for regex
def api_list_documents|def api_list_group_documents|SELECT \* FROM c WHERE|OFFSET|select_current_documents\(|sort_documents\((**/application/single_app/route_backend*_documents.py), 23 resultsRead , lines 1 to 260
Read , lines 1 to 260
Read , lines 280 to 360
I found the same pattern in the personal and group list endpoints, so the suggestion’s premise is directionally right on scalability, but it’s not a regression unique to this public route. I’m reading the exact list implementations and the revision helpers to see whether a Cosmos-side filter is safe with the project’s legacy revision fallback.
Read , lines 500 to 590
Read , lines 260 to 330
Read , lines 420 to 490
Read , lines 600 to 740
There’s already a stronger pattern elsewhere in the codebase: revision-aware queries fetch only the fields needed, then collapse with a current-version check plus a legacy fallback. I’m reading that helper because it’s probably the best evidence for or against the suggestion as written.
Read changed files in
Read , lines 7480 to 7575
Searched for text
is_current_version|revision_family_id|search_visibility_state(**/application/single_app/**/*.py), 61 resultsThe main caveat is the legacy path:
select_current_documentsdoes more than check a boolean flag, because it groups by revision family and falls back to filename-based collapsing when revision metadata is missing. I’m checking the repo memory for prior revision behavior so I don’t overstate what a simple query rewrite can safely do.Read memory
Searched for regex
normalize_document_revision_families\((**/application/single_app/**/*.py), 4 resultsRead , lines 110 to 160
I have enough context to assess it, but I want exact anchors for the revision helpers before I answer so I can be precise about the caveat around legacy families and normalization.
Searched for regex
def select_current_documents|def normalize_document_revision_families(**/application/single_app/functions_documents.py), 2 resultsI mostly agree with it. The performance concern is real: route_backend_public_documents.py fetches all matching documents, route_backend_public_documents.py collapses revisions in Python, and route_backend_public_documents.py paginates in Python. For a large public workspace, that is exactly the kind of pattern that burns RU and latency unnecessarily. It also is not unique to this route; the personal and group list endpoints do the same thing in route_backend_documents.py and route_backend_group_documents.py.
The part I would tighten is the legacy fallback wording. A simple current-version predicate is not fully equivalent to the existing behavior, because functions_documents.py does more than check a flag: it collapses legacy families using revision-family metadata and, when that is missing, a filename-based fallback. The list routes do not run the normalization pass that search does in functions_search.py through functions_documents.py. So if you only add something like “is current or missing field” to the Cosmos query, older unnormalized multi-revision families can still leak duplicates and break counts/paging semantics.
I would restate the suggestion as: move these list routes to a query-side fast path for normalized current revisions, but keep an explicit fallback or one-time normalization/migration path for legacy revision families. Also, if the API contract ever changes, Cosmos continuation tokens are a better long-term fit than deep OFFSET/LIMIT, though keeping filtering and sorting in Cosmos is still a good improvement within the current page/page_size contract.
2
application/single_app/route_backend_group_documents.py
This group list path now loads all matching documents and paginates client-side after select_current_documents(...). That can be very costly for large groups and removes Cosmos-side pagination (ORDER BY ... OFFSET ... LIMIT ...). Since the data model now tracks current revision state, prefer a Cosmos filter for current docs (with legacy fallback) so paging/sorting stays server-side and scalable.
GitHub Copilot: I’m checking the Cosmos DB guidance and the relevant group document path first, then I’ll judge whether this is a real scalability regression or just a theoretical concern.
Read skill cosmosdb-best-practices
Read , lines 1 to 400
Read
Read memory
The route does appear to materialize the full match set before slicing, so I’m tracing whether the revision flags already support a server-side filter and whether other routes already solved this better.
Searched for regex
select_current_documents|is_current_version|search_visibility_state(**/application/single_app/**/*.py), 35 resultsRead , lines 1 to 320
Read , lines 1 to 420
I found the same pattern in the personal and public list routes, but the key question is whether the group route can now rely on persisted current-revision flags instead of collapsing in Python. I’m checking that path next.
Read , lines 460 to 610
Read , lines 430 to 760
Read , lines 150 to 330
I’ve confirmed the current route does full materialization and in-memory pagination. I’m checking whether that was an intentional compromise tied to legacy revision handling or whether it can be replaced cleanly with a server-side current-doc filter.
Read changed files in
Read , lines 220 to 340
Searched for regex
ORDER BY c\._ts|OFFSET|LIMIT(**/application/single_app/**/*.py), 100 resultsI haven’t found a server-side current-only query helper yet, which makes this look like a duplicated stopgap rather than a deliberate long-term design. I’m checking whether the persisted flags are reliable enough for a filtered query with a legacy fallback.
Ran terminal command: git log -n 5 -- application/single_app/route_backend_group_documents.py
Searched for regex
is_current_version = true|search_visibility_state = 'active'|search_visibility_state = "active"|IS_DEFINED\(c\.is_current_version\)|NOT IS_DEFINED\(c\.is_current_version\)(**/application/single_app/**/*.py), no resultsRead , lines 2990 to 3085
The history confirms this behavior came in with the recent document versioning change, so I’m pulling that diff to see whether the fallback logic really required the in-memory collapse.
Ran terminal command: git show 5feca22 -- application/single_app/route_backend_group_documents.py
Ran terminal command: git show 5feca22 -- application/single_app/route_backend_documents.py application/single_app/route_backend_public_documents.py
The diff confirms this was a direct tradeoff: the prior server-side count and page query were dropped solely to dedupe revisions in Python. I’m reading the related route diff now to see whether there’s already a better pattern elsewhere.
Read , lines 1 to 260
I’ve got enough to assess it. I’m grabbing exact anchors for the relevant lines so I can give you a precise recommendation instead of a vague “sounds good.”
Searched for regex
matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\](**/application/single_app/route_backend_group_documents.py), 4 resultsSearched for regex
"is_current_version": True|"search_visibility_state": "active"|existing_document\['is_current_version'\] = False|existing_document\['search_visibility_state'\] = 'archived'(**/application/single_app/functions_documents.py), 8 resultsSearched for regex
promoted_document\['is_current_version'\] = True|promoted_document\['search_visibility_state'\] = 'active'(**/application/single_app/functions_documents.py), 2 resultsSearched for regex
def normalize_document_revision_families|document_item\["is_current_version"\] = expected_current|document_item\["search_visibility_state"\] = "active"|document_item\["search_visibility_state"\] = "archived"(**/application/single_app/functions_documents.py), 5 resultsSearched for regex
legacy_q =|needs_legacy_update_check(**/application/single_app/route_backend_group_documents.py), 3 resultsSearched for regex
matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\](**/application/single_app/route_backend_documents.py|application/single_app/route_backend_public_documents.py), no resultsOne last check: I’m confirming whether this same pagination tradeoff exists in the sibling list routes, because if it does, the best fix is probably a shared helper rather than a one-off group patch.
Searched for regex
matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\](**/application/single_app/route_backend_documents.py), 4 resultsSearched for regex
matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\](**/application/single_app/route_backend_public_documents.py), 5 resultsI agree with the suggestion. It’s a real scalability regression, not just a stylistic preference.
The problem is visible in route_backend_group_documents.py: the route now fetches every matching document, runs select_current_documents (functions_documents.py) in Python, then computes total_count and slices the page in memory. That means RU cost, payload size, and app memory all grow with the full result set and full revision history, instead of with the requested page size. For large groups, that’s exactly the wrong side of the tradeoff.
The suggestion is also supported by the data model. New current revisions are persisted with is_current_version=True and search_visibility_state="active" in functions_documents.py, prior revisions are archived in functions_documents.py, delete promotion restores the prior current revision in functions_documents.py, and there’s already a repair path (normalize_document_revision_families) in functions_documents.py. So the route should prefer a Cosmos query filtered to current docs and keep ORDER BY ... OFFSET ... LIMIT on the server.
The one caveat is legacy data. A pure c.is_current_version = true filter can hide older docs that were never backfilled, and the route already knows legacy records may exist via the legacy_q / needs_legacy_update_check logic in route_backend_group_documents.py. So the right implementation is usually:
One more thing: this same regression exists in the personal and public list routes too, at route_backend_documents.py and route_backend_public_documents.py. If you fix it, I’d do it as a shared helper rather than only in the group route.
If you want, I can implement that current-only server-side path with legacy fallback across all three routes.
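For illustration, here is a minimal sketch of what a current-only query with a legacy fallback could look like. It is not the project's code: the function name build_current_docs_query and the c.group_id field are assumptions made for the example, while is_current_version comes from the data model discussed above and _ts is the Cosmos system timestamp. Rows that predate the revision flags pass the NOT IS_DEFINED branch, so legacy families with several uploads of the same file would still need a collapse step afterwards.

```python
from typing import Any


def build_current_docs_query(
    group_id: str, page: int, page_size: int
) -> tuple[str, list[dict[str, Any]]]:
    """Build a Cosmos SQL query returning one page of current revisions.

    Hypothetical helper: the group_id field name is an assumption.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    offset = (page - 1) * page_size
    query = (
        "SELECT * FROM c "
        "WHERE c.group_id = @group_id "
        "AND (c.is_current_version = true "
        # Legacy fallback: keep rows written before revision flags existed.
        "OR NOT IS_DEFINED(c.is_current_version)) "
        "ORDER BY c._ts DESC "
        # Validated integers are inlined so paging stays on the server.
        f"OFFSET {int(offset)} LIMIT {int(page_size)}"
    )
    parameters = [{"name": "@group_id", "value": group_id}]
    return query, parameters
```

The returned pair can be passed to the container's query_items call as query and parameters. A separate SELECT VALUE COUNT(1) query with the same WHERE clause would be needed if the endpoint must keep returning total_count.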
3
application/single_app/route_backend_documents.py
These list endpoints now fetch all matching documents from Cosmos (SELECT * ... WHERE ...), then collapse/sort/paginate in Python. For users/groups with many documents this can become very expensive in RU, memory, and latency (and defeats server-side OFFSET/LIMIT). Since revision state is now explicitly modeled, consider filtering server-side for current revisions (e.g., c.is_current_version = true or c.search_visibility_state = 'active' with a legacy fallback for missing fields) so Cosmos can paginate efficiently.
GitHub Copilot: I’m checking the actual document list routes and the current revision logic first, then I’ll assess whether the suggestion is correct in this codebase and where the tradeoffs are.
Read
Read skill cosmosdb-best-practices
Read , lines 1 to 400
Read
Read memory
I’ve confirmed the repo already models current revisions explicitly. Next I’m tracing which list routes still fetch the full matching set and then collapse in Python, because that determines whether the suggestion is just valid in principle or actionable as-is here.
Searched for regex list_documents|select_current_documents|normalize_document_revision_families|OFFSET|LIMIT|page|pagination|sort (**/application/single_app/route_backend_documents.py), 53 results
Searched for regex def select_current_documents|def normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Read , lines 1 to 420
I found the personal documents list route that does exactly what the suggestion describes: broad fetch, then select_current_documents, sort, and slice in Python. I’m reading the surrounding implementation and the helper query code now to check whether a server-side filter is straightforward or whether legacy/shared-document rules make it more complicated.
Read , lines 400 to 620
Searched for regex _query_accessible_documents|get_all_documents|get_documents_by|shared_user_ids|shared_group_ids (**/application/single_app/functions_documents.py), 125 results
Read , lines 7480 to 7565
The suggestion is directionally right, but I want to be precise about the legacy edge case before I give you a yes/no. The current helper falls back to “latest per family or per legacy filename” when revision flags are missing, so a naive is_current_version = true filter would be incomplete unless those old docs have already been normalized.
Read , lines 420 to 520
Searched for regex normalize_document_revision_families\( (**/application/single_app/**/*.py), 4 results
Searched for regex @app.route\('/api/documents|@app.route\('/api/group|@app.route\('/api/public (**/application/single_app/*.py), 61 results
I’ve got the code path. I’m pulling one quick docs check on Cosmos pagination so I don’t hand-wave the RU tradeoff, because OFFSET/LIMIT and continuation tokens are not equivalent in Cosmos.
Calling documentation microsoft_docs_search...
Completed with input: {
"intent": "Find official Azure Cosmos DB guidance comparing OFFSET LIMIT and continuation tokens for efficient pagination and RU usage"
}
Read
I’m doing one last pass on the sibling group/public list routes. If they mirror this pattern, that strengthens the case for a shared helper rather than a one-off tweak in the personal route.
Searched for regex select_current_documents\(|matching_docs = list\(|offset = \(page - 1\) \* page_size|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_group_documents.py), 4 results
Searched for regex select_current_documents\(|matching_docs = list\(|offset = \(page - 1\) \* page_size|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_public_documents.py), 5 results
Read , lines 145 to 270
Read , lines 113 to 210
I’ve got enough to answer. One quick check remains: whether the existing “legacy upgrade” flow already normalizes revision flags. If it doesn’t, that’s the main reason I’d treat the suggestion as “good, but incomplete” instead of “ship it exactly as written.”
Read , lines 840 to 930
Searched for regex upgrade_legacy|legacy_update|normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Read , lines 6889 to 6975
Read , lines 6975 to 7015
I think the suggestion is good and worth acting on, but I would tighten the implementation details.
The performance diagnosis is accurate. The personal list route in route_backend_documents.py fetches all matching documents, then collapses revisions, sorts, and paginates in Python, all within that same route. For large workspaces that is expensive in RU, memory, and latency.
I would not replace that with only is_current_version = true yet. The current selection logic (select_current_documents in functions_documents.py) still supports legacy records by grouping on revision family when present, or on scope plus file name when it is not. The list route does not call the normalization helper (normalize_document_revision_families in functions_documents.py), and the existing legacy upgrade flow in functions_documents.py does not backfill revision flags. So a strict current-only filter can hide valid legacy docs.
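To make the legacy fallback concrete, here is a rough sketch of that kind of collapse. It is not the body of select_current_documents: the names pick_current and _rank and the fields revision_family_id, group_id, file_name, and version are assumptions for illustration, and only is_current_version is taken from the data model above.

```python
from typing import Any


def _rank(doc: dict[str, Any]) -> tuple[bool, int]:
    # An explicit current flag wins; otherwise the highest version wins.
    return (doc.get("is_current_version") is True, doc.get("version", 0))


def pick_current(docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one document per revision family, with a legacy fallback."""
    chosen: dict[tuple, dict[str, Any]] = {}
    for doc in docs:
        family = doc.get("revision_family_id")
        if family is not None:
            # Modern rows: group by the explicit revision family.
            key = ("family", family)
        else:
            # Legacy rows: no family id, so group by scope plus file name.
            key = ("legacy", doc.get("group_id"), doc.get("file_name"))
        best = chosen.get(key)
        if best is None or _rank(doc) > _rank(best):
            chosen[key] = doc
    return list(chosen.values())
```

The point of the sketch is that the legacy branch has no flag to filter on, which is why a query that only checks is_current_version = true cannot reproduce this behavior until the old rows are normalized.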
The right transitional move is to push down as much filtering as possible: prefer current rows server-side, use search_visibility_state = 'active' as a compatibility check, and keep a fallback path for records missing both fields until the data is normalized. That will reduce the candidate set substantially, even though shared-document access still keeps the personal query cross-partition.
I would also tweak the pagination claim. Server-side pagination is better than materializing everything, but in Cosmos the efficient pagination primitive is continuation tokens, not large OFFSET/LIMIT pages. If the API must preserve page numbers and total_count, OFFSET/LIMIT can be a short-term step; it is not the best long-term Cosmos shape.
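As a sketch of the continuation-token shape, assuming the azure-cosmos Python SDK, where query_items returns a pageable object exposing by_page(continuation_token): the helper name fetch_page is hypothetical, and the container is passed in so the function itself has no SDK import. Continuation-token support depends on the query shape and SDK version, so cross-partition ORDER BY queries in particular should be verified before relying on this.

```python
from typing import Any, Optional


def fetch_page(
    container: Any,
    query: str,
    parameters: list[dict[str, Any]],
    page_size: int,
    continuation_token: Optional[str] = None,
) -> tuple[list[dict[str, Any]], Optional[str]]:
    """Fetch one page of results plus the token needed to resume after it.

    `container` is expected to behave like an azure-cosmos ContainerProxy;
    partition key or cross-partition options are left to the caller.
    """
    pager = container.query_items(
        query=query,
        parameters=parameters,
        max_item_count=page_size,  # upper bound on items per page
    ).by_page(continuation_token)
    try:
        page = next(pager)
    except StopIteration:
        # No more results for this token.
        return [], None
    items = list(page)
    # The token is None once the final page has been read.
    return items, pager.continuation_token
```

With this shape the API would return the token to the client instead of a page number, which is why it does not fit an endpoint that must keep page and total_count without a contract change.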
The same pattern exists in the group and public list routes, so this is really a shared-query/helper problem, not just a personal-route issue: route_backend_group_documents.py and route_backend_public_documents.py.
If I were turning that suggestion into work, I’d do it in this order:
If you want, I can turn that into a concrete refactor plan or implement the first pass.
GitHub Copilot: I’m checking the Cosmos guidance and the revision-handling caveat once, then I’ll tell you whether these comments are converging on the same fix or describing different work.
Read
Read skill cosmosdb-best-practices
I’ve loaded the general Cosmos best-practice guidance; I’m checking the repo’s explicit Cosmos instructions and the prior revision-state note to make sure the answer is grounded in this codebase’s constraints.
Read
Read memory
I’ve confirmed the shared theme; I’m pulling exact file anchors for the three list routes and the revision helper so the comparison is precise rather than just high-level.
Read , lines 150 to 280
Read , lines 500 to 590
Read , lines 110 to 220
Read , lines 420 to 520
I have the personal and public anchors; I’m grabbing the matching group-route block and the current-selection helper so the three routes can be compared apples to apples.
Read , lines 280 to 360
Read , lines 300 to 420
I have the comparison; I’m pulling exact line numbers for the shared fetch-collapse-slice pattern so I can answer with direct anchors instead of approximate references.
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_documents.py), 4 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_group_documents.py), 4 results
Searched for regex matching_docs = list|select_current_documents\(|total_count = len\(current_docs\)|docs = current_docs\[offset:offset \+ page_size\] (**/application/single_app/route_backend_public_documents.py), 5 results
Searched for regex def select_current_documents|def normalize_document_revision_families|is_current_version|search_visibility_state (**/application/single_app/functions_documents.py), 27 results
Yes. They’re all converging on the same underlying resolution, even if they phrase it a little differently.
The shared problem is the same in all three list routes: fetch the full matching Cosmos result set, collapse revisions with select_current_documents, then paginate in Python. You can see that pattern in route_backend_documents.py, route_backend_group_documents.py, and route_backend_public_documents.py. The shared caveat is also the same: current-selection is not just a simple flag check, because select_current_documents in functions_documents.py collapses legacy families too, and normalize_document_revision_families in functions_documents.py is the normalization path that repairs revision metadata.
So the comments are aiming at one common fix direction:
So: same resolution family, not three unrelated suggestions. The later comments are basically refining the earlier ones by adding the legacy-data constraint and the shared-helper angle. No code changed.