Problem
GitHub's REST API limits page-based pagination to page 100 (10,000 items with per_page=100). Repositories with more than 10,000 issues/PRs (like mne-tools/mne-python with 13,000+) trigger an HTTP 422 error when requesting beyond page 100.
Current error:
HTTP 422 on page 101 for mne-tools/mne-python issues
This means the sync stops early and misses all issues/PRs beyond the 10,000-item limit.
Proposed Solution
Switch from page-based to cursor-based pagination, using either the REST API (since/after parameters, following the Link header) or GitHub's GraphQL API.
Option A: Use since parameter (REST API)
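For issues/PRs, use the since parameter with an ISO 8601 timestamp to paginate by last-updated time instead of page number. The since filter applies to the update time, not the creation time, so results need to be sorted by updated in ascending order. When a window of pages reaches the limit, restart at page 1 with since set to the newest updated_at seen so far. This avoids the page limit entirely.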
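A minimal sketch of the windowed since approach, assuming the listing is sorted by updated ascending and that since is treated as inclusive (hence the de-duplication by id). The names build_params, iter_issues_since, and fetch_page are hypothetical and not taken from github_sync.py; fetch_page stands in for whatever function performs the HTTP request.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Issue = Dict[str, Any]


def build_params(since: Optional[str], page: int, per_page: int = 100) -> Dict[str, Any]:
    """Query parameters for GET /repos/{owner}/{repo}/issues (hypothetical helper)."""
    params: Dict[str, Any] = {
        "state": "all",
        "sort": "updated",   # `since` filters on update time, so sort on it too
        "direction": "asc",
        "per_page": per_page,
        "page": page,
    }
    if since is not None:
        params["since"] = since
    return params


def iter_issues_since(
    fetch_page: Callable[[Optional[str], int], List[Issue]],
    since: Optional[str] = None,
    per_page: int = 100,
    max_pages: int = 100,
) -> Iterator[Issue]:
    """Yield every issue once, restarting the page counter before the page cap.

    `fetch_page(since, page)` is an assumed callable returning one page of
    issues sorted by `updated_at` ascending.
    """
    seen = set()  # ids already yielded; window boundaries overlap
    while True:
        newest = since
        for page in range(1, max_pages + 1):
            items = fetch_page(since, page)
            for item in items:
                newest = item["updated_at"]
                if item["id"] not in seen:
                    seen.add(item["id"])
                    yield item
            if len(items) < per_page:
                return  # short or empty page: nothing left to fetch
        if newest == since:
            # Every item in a full window shares one timestamp; cannot advance.
            raise RuntimeError("cannot advance `since` past a full window")
        since = newest  # open the next window at the newest timestamp seen
```

One caveat of this sketch: items updated while the sync is running can shift between pages, so a production version would probably re-check recent items on the next incremental sync. The seen set also grows with the repository; keeping only the ids at the window boundary would be enough.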
Option B: Switch to GraphQL API
GitHub's GraphQL API uses cursor-based pagination natively and has no page limit.
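A rough sketch of the cursor loop for Option B. The query uses the pageInfo/endCursor fields of GitHub's GraphQL connections; run_query is an assumed callable that posts the query and returns the "data" part of the response, and iter_issues_graphql is a hypothetical name. Note that in GraphQL, issues and pull requests are separate connections, so PRs would need a second, analogous query.

```python
from typing import Any, Callable, Dict, Iterator, Optional

ISSUES_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor,
           orderBy: {field: UPDATED_AT, direction: ASC}) {
      pageInfo { hasNextPage endCursor }
      nodes { number title updatedAt }
    }
  }
}
"""


def iter_issues_graphql(
    run_query: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    owner: str,
    name: str,
) -> Iterator[Dict[str, Any]]:
    """Follow endCursor until hasNextPage is false.

    `run_query(query, variables)` is an assumption: it should send the
    request and return the decoded "data" object of the response.
    """
    cursor: Optional[str] = None  # None requests the first page
    while True:
        data = run_query(ISSUES_QUERY, {"owner": owner, "name": name, "cursor": cursor})
        connection = data["repository"]["issues"]
        yield from connection["nodes"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return
        cursor = page_info["endCursor"]
```

Because the loop only depends on run_query, it can be exercised in tests with a stub that returns canned pages, without touching the network.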
Implementation
- Modify src/knowledge/github_sync.py sync functions
- Replace page=N iteration with cursor-based approach
- Keep backward compatibility with existing incremental sync logic
- Add tests for repos with >10,000 items
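For the tests, one option is an in-memory fake that reproduces the page cap, so no network access or real 13,000-issue repository is needed. The sketch below is only an illustration: FakeIssuesAPI, PageLimitError, and count_with_page_iteration are hypothetical names, and the fake raises an exception where the real API would return HTTP 422.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


class PageLimitError(Exception):
    """Stand-in for the HTTP 422 response described in this issue."""


class FakeIssuesAPI:
    """In-memory issue list that rejects pages beyond `max_page`."""

    def __init__(self, total: int, max_page: int = 100) -> None:
        start = datetime(2024, 1, 1, tzinfo=timezone.utc)
        self.max_page = max_page
        # One item per second, so updated_at values are unique and sorted.
        self.items: List[Dict[str, Any]] = [
            {
                "id": i,
                "updated_at": (start + timedelta(seconds=i)).strftime("%Y-%m-%dT%H:%M:%SZ"),
            }
            for i in range(total)
        ]

    def list_issues(
        self, page: int, per_page: int = 100, since: Optional[str] = None
    ) -> List[Dict[str, Any]]:
        if page > self.max_page:
            raise PageLimitError(f"HTTP 422 on page {page}")
        # Same-format UTC timestamps compare correctly as strings.
        rows = [it for it in self.items if since is None or it["updated_at"] >= since]
        offset = (page - 1) * per_page
        return rows[offset:offset + per_page]


def count_with_page_iteration(api: FakeIssuesAPI) -> int:
    """Naive page=N loop; expected to fail once a repo exceeds the cap."""
    count, page = 0, 1
    while True:
        rows = api.list_issues(page)
        if not rows:
            return count
        count += len(rows)
        page += 1
```

A regression test could first assert that count_with_page_iteration raises PageLimitError for FakeIssuesAPI(total=13_000), then assert that the new cursor-based sync returns all 13,000 items from the same fake.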
Context
Discovered during MNE community onboarding. mne-tools/mne-python has 13,000+ issues, exceeding GitHub's page-based pagination limit. The sync currently stops at 10,000 items silently (or errors on page 101).