
perf(scheduler): strip id_token and refresh_token from scheduler cache#1429

Open
Fumeng24 wants to merge 11 commits into Wei-Shaw:main from Fumeng24:optimize/strip-scheduler-cache-credentials

Conversation

Fumeng24 commented Apr 1, 2026

Summary

The scheduler cache (sched:acc:{id}) stores full Account JSON blobs in Redis, including large OAuth tokens (id_token, refresh_token). In multi-instance deployments with limited inter-server bandwidth (e.g. 30 Mbps), this causes excessive Redis traffic:

  • Each account entry is ~20KB due to JWT tokens
  • With 1000 accounts, a full rebuild writes ~20MB to Redis
  • Observed ~19.5 Mbps of Redis upload traffic from scheduler cache writes alone

The gateway request path only needs access_token and api_key. The id_token and refresh_token are consumed exclusively by background token-refresh services that read directly from the database.

Changes

  • Add marshalAccountForCache() helper that strips id_token and refresh_token before serialization, using a shallow copy to avoid mutating the caller's data
  • Apply to SetAccount, SetSnapshot, and UpdateLastUsed write paths
  • Add unit tests for the stripping logic (nil account, empty credentials, strip verification, mutation safety)

Impact

  • ~60-70% reduction in per-account Redis payload size
  • Significantly reduces cross-server Redis bandwidth in multi-instance deployments
  • No impact on gateway functionality — access_token and api_key are preserved
  • No impact on token refresh — refresh services read credentials from PostgreSQL

Test plan

  • Unit tests for marshalAccountForCache (nil, empty, strip, no-mutation)
  • Verify gateway requests still work (access_token/api_key preserved in cache)
  • Verify token refresh still works (reads from DB, not cache)
  • Monitor Redis bandwidth reduction with iftop -f "port 6379"

Fumeng24 added 11 commits April 2, 2026 01:08
The scheduler cache stores full Account JSON blobs including OAuth tokens
in Redis. In multi-instance deployments with limited inter-server
bandwidth, this causes excessive Redis traffic — each account entry is
~20KB due to large JWT tokens (id_token, refresh_token), and with 1000
accounts a full rebuild writes ~20MB to Redis.

The gateway request path only needs access_token and api_key; id_token
and refresh_token are consumed exclusively by background token-refresh
services that read directly from the database.

This change:
- Adds marshalAccountForCache() that strips heavy credential fields
  before serialization, using a shallow copy to avoid mutating the caller
- Applies it to SetAccount, SetSnapshot, and UpdateLastUsed
- Adds unit tests for the stripping logic

Expected impact: ~60-70% reduction in per-account Redis payload size,
significantly reducing cross-server Redis bandwidth in multi-instance
deployments.

Extends the cache bandwidth optimization to also strip:
- AccountGroups[].Group: full Group objects (only GroupID needed for routing)
- AccountGroups[].Account: back-reference to parent (circular/redundant)
- Groups: duplicate of AccountGroups[].Group (ops-only, not used in gateway)

The gateway request path (isAccountInGroup) only reads
AccountGroup.GroupID. The Groups slice is only consumed by ops
monitoring services which are disabled on non-primary instances.

With 1000 accounts sharing the same groups, this eliminates massive
duplication — each ~40-field Group object was previously embedded twice
per account (in AccountGroups[].Group and Groups[]).

Adds unit tests for Group stripping and mutation safety.
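The Group stripping described in this commit could be sketched as below. The struct shapes are assumptions standing in for the real ent models, which carry far more fields:

```go
package main

// Trimmed stand-ins for the real models; field names are assumptions.
type Group struct {
	ID   int64
	Name string
	// ... the real Group has ~40 fields
}

type AccountGroup struct {
	GroupID int64
	Group   *Group   // embedded copy: routing only needs GroupID
	Account *Account // back-reference to parent: circular/redundant
}

type Account struct {
	ID            int64
	AccountGroups []AccountGroup
	Groups        []*Group // duplicate of AccountGroups[].Group
}

// stripGroupsForCache clears the embedded Group objects and
// back-references, keeping only the GroupID that isAccountInGroup reads
// on the request path. It copies the slice so the caller is untouched.
func stripGroupsForCache(a *Account) *Account {
	if a == nil {
		return nil
	}
	out := *a
	out.Groups = nil
	out.AccountGroups = make([]AccountGroup, len(a.AccountGroups))
	for i, ag := range a.AccountGroups {
		ag.Group = nil   // drop embedded Group object
		ag.Account = nil // drop circular back-reference
		out.AccountGroups[i] = ag
	}
	return &out
}
```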

…1357)

Cherry-picked from upstream PR Wei-Shaw#1357.
Adds cancel_stream_on_client_disconnect config option to stop draining
upstream responses when client disconnects, saving bandwidth.

Cherry-picked from upstream PR Wei-Shaw#1382.
Adds race-aware recovery for invalid_grant and per-account mutex
to prevent concurrent refresh causing false errors.

Cherry-picked from upstream PR Wei-Shaw#1391.
Upgrades 429/529 from passthrough accounts into failover-capable
errors instead of passing directly to client.

Cherry-picked from upstream PR Wei-Shaw#1358.
Prevents stale codex usage snapshots from re-rate-limiting
pool mode API key accounts after manual reset.

Cherry-picked from upstream PR Wei-Shaw#1399.
Adds DB_MAX_OPEN_CONNS etc. env bindings with conservative defaults.

…o_system_logs_test

Use strings.Contains from stdlib instead of custom helper that
conflicts with the same-named function in ops_repo_system_logs_test.go.

- max_open_conns: 20 → 50
- max_idle_conns: 5 → 25 (50% ratio per project docs)
- conn_max_lifetime_minutes: 10 → 30

Port of upstream PR Wei-Shaw#734 with all 6 review issues fixed:
- Fix #1: AuthService calls ReferralService.ProcessReferralRegistration()
- Fix #2: GetOrCreateProfile re-fetches on user_id unique conflict
- Fix #3: Migration uses ON DELETE CASCADE for all foreign keys
- Fix #4: Single migration file (082_create_referral_tables.sql)
- Fix #5: grantRewardsInTx is unexported
- Fix #6: Validates non-negative reward values

Features:
- User referral codes (8-char, lazy-loaded)
- Dual rewards (inviter + invitee, configurable in admin settings)
- Admin referral stats endpoint
- Self-referral prevention, duplicate protection via DB constraints
- Raw SQL repository (no ent dependency for referral tables)

Backend only - frontend will be added in a follow-up commit.
