Apply async-with pattern to ValkeyClient with retry-based disconnection #10755
Description
Objective
Refactor ValkeyClient usage to an async with context manager pattern that tracks the retry count internally and tears down the connection once consecutive failures exceed a threshold.
Details
- Wrap ValkeyClient operations in an async with context manager pattern
- Track retry_count internally; when the threshold is exceeded, disconnect the client rather than continuing to retry on a broken connection
- Migrate all ValkeyClient usage sites to the new async with pattern
- This change has a wide blast radius: all 14 domain-specific Valkey clients and their callers need to be updated
Implementation Reference
AgentClientPool (src/ai/backend/manager/clients/agent/pool.py) already uses a similar pattern built on acquire(), _record_failure(), and _record_success(), and can serve as a reference for the implementation.
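A minimal sketch of the pattern, in plain asyncio with no Valkey dependency. The class name RetryTrackingClientHolder, the connect factory, the ClosableClient protocol, and max_retries are all hypothetical. acquire(), _record_failure(), and _record_success() only borrow the method names mentioned for AgentClientPool; their signatures here are assumptions, not that class's actual API. "Exceeded" is read as more than max_retries consecutive failures, any success resets the count, and the next acquire() reconnects lazily after a teardown. The sketch does not coordinate with other tasks that may still be using the old client when it is closed.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, Generic, Protocol, TypeVar


class ClosableClient(Protocol):
    # Hypothetical minimal interface: anything with an async close().
    async def close(self) -> None: ...


C = TypeVar("C", bound=ClosableClient)


class RetryTrackingClientHolder(Generic[C]):
    """Hypothetical wrapper: hands out a client via `async with` and tears it
    down after more than `max_retries` consecutive failures."""

    def __init__(self, connect: Callable[[], Awaitable[C]], max_retries: int = 3) -> None:
        self._connect = connect
        self._max_retries = max_retries
        self._client: C | None = None
        self._retry_count = 0
        self._lock = asyncio.Lock()

    @asynccontextmanager
    async def acquire(self) -> AsyncIterator[C]:
        async with self._lock:
            if self._client is None:
                # Lazily (re)connect, e.g. after a previous teardown.
                self._client = await self._connect()
            client = self._client
        try:
            yield client
        except Exception:
            # The body of the `async with` block failed: count it, then re-raise.
            await self._record_failure()
            raise
        else:
            self._record_success()

    def _record_success(self) -> None:
        # Any success resets the consecutive-failure counter.
        self._retry_count = 0

    async def _record_failure(self) -> None:
        self._retry_count += 1
        if self._retry_count > self._max_retries:
            await self._disconnect()

    async def _disconnect(self) -> None:
        async with self._lock:
            client, self._client = self._client, None
            self._retry_count = 0
        if client is not None:
            await client.close()


class FakeClient:
    # Stand-in for a domain-specific Valkey client in this sketch.
    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def demo() -> None:
    async def connect() -> FakeClient:
        return FakeClient()

    holder = RetryTrackingClientHolder(connect, max_retries=2)
    first: FakeClient | None = None
    for _ in range(3):  # three consecutive failures, one more than max_retries
        try:
            async with holder.acquire() as client:
                first = client
                raise ConnectionError("simulated Valkey failure")
        except ConnectionError:
            pass
    assert first is not None and first.closed  # torn down on the 3rd failure
    async with holder.acquire() as client:
        assert client is not first  # a fresh connection was created
```

Running asyncio.run(demo()) exercises the teardown path. A migrated call site would wrap each operation in async with holder.acquire() as client: instead of calling the client directly, so that failures propagate through the context manager and are counted.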
Code References
- MonitoringValkeyClient: src/ai/backend/common/clients/valkey_client/client.py (~line 413)
- Domain-specific clients (14 total): src/ai/backend/common/clients/valkey_client/valkey_*/client.py
- Manager Valkey dependency: src/ai/backend/manager/dependencies/infrastructure/redis.py
- Agent Valkey dependency: src/ai/backend/agent/dependencies/infrastructure/redis.py
- AgentClientPool (reference only): src/ai/backend/manager/clients/agent/pool.py
Acceptance Criteria
- ValkeyClient operations are wrapped in an async with context manager
- Retry count is tracked internally and the connection is torn down when the threshold is exceeded
- All ValkeyClient usage sites are migrated to the new pattern
- Verified in both Standalone and Sentinel modes
- Tests covering the retry-based disconnection logic are included
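One way to cover the retry-based disconnection logic without a live Standalone or Sentinel deployment is to keep the disconnect decision in a pure object that the async wrapper consults. The sketch below assumes a hypothetical RetryPolicy class (not part of the codebase), reads "exceeded" as more than max_retries consecutive failures, and assumes that a success resets the count. The integration with the real clients would still need to be tested separately in both modes.

```python
import unittest
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical pure decision object: no I/O, so it can be unit-tested
    without a running Valkey server."""

    max_retries: int = 3
    retry_count: int = 0

    def record_success(self) -> None:
        # A successful operation clears the consecutive-failure count.
        self.retry_count = 0

    def record_failure(self) -> bool:
        """Return True when the caller should tear down the connection."""
        self.retry_count += 1
        if self.retry_count > self.max_retries:
            self.retry_count = 0  # start fresh for the next connection
            return True
        return False


class RetryPolicyTest(unittest.TestCase):
    def test_disconnects_only_after_threshold_is_exceeded(self) -> None:
        policy = RetryPolicy(max_retries=2)
        self.assertEqual(
            [policy.record_failure() for _ in range(3)], [False, False, True]
        )

    def test_success_resets_consecutive_failures(self) -> None:
        policy = RetryPolicy(max_retries=1)
        self.assertFalse(policy.record_failure())
        policy.record_success()
        self.assertFalse(policy.record_failure())

    def test_counter_restarts_after_teardown(self) -> None:
        policy = RetryPolicy(max_retries=0)
        self.assertTrue(policy.record_failure())
        self.assertEqual(policy.retry_count, 0)
```

Keeping the counter free of I/O makes each threshold case deterministic; the tests can be run with python -m unittest or collected by pytest.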
JIRA Issue: BA-5578