perf(scheduler): strip id_token and refresh_token from scheduler cache#1429
Open
Fumeng24 wants to merge 11 commits into Wei-Shaw:main from
Conversation
The scheduler cache stores full Account JSON blobs, including OAuth tokens, in Redis. In multi-instance deployments with limited inter-server bandwidth, this causes excessive Redis traffic: each account entry is ~20KB due to large JWT tokens (id_token, refresh_token), and with 1000 accounts a full rebuild writes ~20MB to Redis. The gateway request path only needs access_token and api_key; id_token and refresh_token are consumed exclusively by background token-refresh services that read directly from the database.

This change:

- Adds marshalAccountForCache(), which strips heavy credential fields before serialization, using a shallow copy to avoid mutating the caller
- Applies it to SetAccount, SetSnapshot, and UpdateLastUsed
- Adds unit tests for the stripping logic

Expected impact: ~60-70% reduction in per-account Redis payload size, significantly reducing cross-server Redis bandwidth in multi-instance deployments.
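The shallow-copy-and-strip approach can be sketched as below. The Account struct is a minimal stand-in for the real scheduler model, and the field and helper names are illustrative, not the PR's exact code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Account is a trimmed stand-in for the real scheduler account model.
type Account struct {
	ID           int64  `json:"id"`
	AccessToken  string `json:"access_token,omitempty"`
	APIKey       string `json:"api_key,omitempty"`
	IDToken      string `json:"id_token,omitempty"`
	RefreshToken string `json:"refresh_token,omitempty"`
}

// marshalAccountForCache strips heavy credential fields before
// serialization. The shallow copy keeps the caller's struct intact,
// and omitempty drops the cleared fields from the JSON entirely.
func marshalAccountForCache(a *Account) ([]byte, error) {
	if a == nil {
		return nil, nil
	}
	cp := *a // shallow copy: the caller's Account is never mutated
	cp.IDToken = ""
	cp.RefreshToken = ""
	return json.Marshal(&cp)
}

func main() {
	acc := &Account{ID: 1, AccessToken: "at", APIKey: "k", IDToken: "big-jwt", RefreshToken: "big-jwt"}
	b, _ := marshalAccountForCache(acc)
	fmt.Println(string(b))   // id_token and refresh_token are gone
	fmt.Println(acc.IDToken) // caller's copy still holds the token
}
```

A shallow copy is enough here because only top-level string fields are cleared; nested slices and pointers are handled in a follow-up commit.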
Extends the cache bandwidth optimization to also strip:

- AccountGroups[].Group: full Group objects (only GroupID is needed for routing)
- AccountGroups[].Account: back-reference to the parent (circular/redundant)
- Groups: duplicate of AccountGroups[].Group (ops-only, not used in the gateway)

The gateway request path (isAccountInGroup) only reads AccountGroup.GroupID. The Groups slice is consumed only by ops monitoring services, which are disabled on non-primary instances. With 1000 accounts sharing the same groups, this eliminates massive duplication: each ~40-field Group object was previously embedded twice per account (in AccountGroups[].Group and Groups[]).

Adds unit tests for Group stripping and mutation safety.
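Because AccountGroups is a slice of structs with pointers, a plain shallow copy would share the slice with the caller, so this step has to rebuild the slice. A sketch, with stand-in types and illustrative names rather than the real models:

```go
package main

import "fmt"

// Minimal stand-ins for the real models.
type Group struct {
	ID   int64
	Name string
	// ... the real Group has ~40 fields
}

type AccountGroup struct {
	GroupID int64
	Group   *Group   // full object, ops-only
	Account *Account // back-reference to the parent account
}

type Account struct {
	ID            int64
	AccountGroups []AccountGroup
	Groups        []*Group // ops-only duplicate of AccountGroups[].Group
}

// stripGroupsForCache returns a cache-safe copy in which only GroupID
// survives, since the gateway path (isAccountInGroup) reads nothing
// else. A fresh slice is allocated so the caller's data is untouched.
func stripGroupsForCache(a *Account) *Account {
	cp := *a
	cp.Groups = nil
	if len(a.AccountGroups) > 0 {
		cp.AccountGroups = make([]AccountGroup, len(a.AccountGroups))
		for i, ag := range a.AccountGroups {
			cp.AccountGroups[i] = AccountGroup{GroupID: ag.GroupID}
		}
	}
	return &cp
}

func main() {
	g := &Group{ID: 7, Name: "pool-a"}
	a := &Account{ID: 1, Groups: []*Group{g}}
	a.AccountGroups = []AccountGroup{{GroupID: 7, Group: g, Account: a}}
	s := stripGroupsForCache(a)
	fmt.Println(s.AccountGroups[0].GroupID, s.Groups == nil) // routing data kept
	fmt.Println(a.AccountGroups[0].Group != nil)             // caller untouched
}
```

Nil-ing the Account back-reference also breaks the cycle that would otherwise make the struct unserializable or force ad-hoc handling in json.Marshal.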
…1357) Cherry-picked from upstream PR Wei-Shaw#1357. Adds a cancel_stream_on_client_disconnect config option to stop draining upstream responses when the client disconnects, saving bandwidth.
Cherry-picked from upstream PR Wei-Shaw#1382. Adds race-aware recovery for invalid_grant and a per-account mutex to prevent concurrent refreshes from causing false errors.
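A per-account mutex is commonly kept in a sync.Map so that refreshes of the same account serialize while different accounts still refresh concurrently. A minimal sketch, with assumed names rather than the cherry-picked code:

```go
package main

import (
	"fmt"
	"sync"
)

// refreshLocks hands out one mutex per account ID. Two goroutines
// refreshing the same account take turns, so the loser can re-read
// fresh state instead of reporting a spurious invalid_grant after the
// winner already rotated the refresh token.
var refreshLocks sync.Map // account ID -> *sync.Mutex

func accountLock(id int64) *sync.Mutex {
	m, _ := refreshLocks.LoadOrStore(id, &sync.Mutex{})
	return m.(*sync.Mutex)
}

func main() {
	var wg sync.WaitGroup
	refreshes := 0
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu := accountLock(42)
			mu.Lock()
			defer mu.Unlock()
			refreshes++ // safe: serialized per account
		}()
	}
	wg.Wait()
	fmt.Println(refreshes) // prints 10
}
```

LoadOrStore is the key call: it is atomic, so concurrent first callers for the same ID still end up sharing a single mutex.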
Cherry-picked from upstream PR Wei-Shaw#1391. Upgrades 429/529 responses from passthrough accounts into failover-capable errors instead of passing them directly to the client.
Cherry-picked from upstream PR Wei-Shaw#1358. Prevents stale codex usage snapshots from re-rate-limiting pool mode API key accounts after manual reset.
Cherry-picked from upstream PR Wei-Shaw#1399. Adds DB_MAX_OPEN_CONNS etc. env bindings with conservative defaults.
…o_system_logs_test Use strings.Contains from the stdlib instead of a custom helper that conflicts with the same-named function in ops_repo_system_logs_test.go.
- max_open_conns: 20 → 50
- max_idle_conns: 5 → 25 (50% ratio per project docs)
- conn_max_lifetime_minutes: 10 → 30
Port of upstream PR Wei-Shaw#734 with all 6 review issues fixed:

- Fix Wei-Shaw#1: AuthService calls ReferralService.ProcessReferralRegistration()
- Fix Wei-Shaw#2: GetOrCreateProfile re-fetches on user_id unique conflict
- Fix Wei-Shaw#3: Migration uses ON DELETE CASCADE for all foreign keys
- Fix Wei-Shaw#4: Single migration file (082_create_referral_tables.sql)
- Fix Wei-Shaw#5: grantRewardsInTx is unexported
- Fix Wei-Shaw#6: Validates non-negative reward values

Features:

- User referral codes (8-char, lazy-loaded)
- Dual rewards (inviter + invitee, configurable in admin settings)
- Admin referral stats endpoint
- Self-referral prevention, duplicate protection via DB constraints
- Raw SQL repository (no ent dependency for referral tables)

Backend only - frontend will be added in a follow-up commit.
Summary

The scheduler cache (sched:acc:{id}) stores full Account JSON blobs in Redis, including large OAuth tokens (id_token, refresh_token). In multi-instance deployments with limited inter-server bandwidth (e.g. 30 Mbps), this causes excessive Redis traffic. The gateway request path only needs access_token and api_key; id_token and refresh_token are consumed exclusively by background token-refresh services that read directly from the database.

Changes

- Adds a marshalAccountForCache() helper that strips id_token and refresh_token before serialization, using a shallow copy to avoid mutating the caller's data
- Applies it to the SetAccount, SetSnapshot, and UpdateLastUsed write paths

Impact

- access_token and api_key are preserved

Test plan

- Unit tests for marshalAccountForCache (nil, empty, strip, no-mutation)
- Observe Redis bandwidth with iftop -f "port 6379"