Harden Standalone/Sentinel reconnection logic #10753
Open
Description
Objective
Strengthen the reconnection logic for both Standalone and Sentinel modes so that recovery succeeds even in abnormal termination scenarios such as socket deletion or container restarts.
Details
- ValkeyStandaloneClient.need_reconnect() currently checks only whether client is None; it should verify the actual connection state
- Accurately reflect the GlideClient's internal connection state
- In Sentinel mode, detect connection loss beyond just master address changes
- Ensure reconnection succeeds after socket file deletion under /tmp
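
A minimal sketch of what a stronger check could look like. It is not the existing implementation: `Pingable`, `probe_needs_reconnect`, and `sentinel_needs_reconnect` are hypothetical names, and the sketch assumes only that the wrapped client exposes an awaitable `ping()`. The point is that a non-None client whose probe fails or times out is treated as broken, and that Sentinel mode combines that result with the master address comparison.

```python
import asyncio
from typing import Optional, Protocol, Tuple


class Pingable(Protocol):
    """Hypothetical minimal interface: any client with an awaitable ping()."""

    async def ping(self) -> object: ...


async def probe_needs_reconnect(
    client: Optional[Pingable], timeout: float = 1.0
) -> bool:
    """Return True when the client is missing or fails a liveness probe."""
    if client is None:
        return True
    try:
        # A dead socket or restarted container should surface here as an
        # exception or as a timeout instead of going unnoticed.
        await asyncio.wait_for(client.ping(), timeout=timeout)
    except Exception:
        return True
    return False


def sentinel_needs_reconnect(
    known_master: Optional[Tuple[str, int]],
    current_master: Optional[Tuple[str, int]],
    connection_broken: bool,
) -> bool:
    """Reconnect on a master address change OR on a broken connection."""
    return connection_broken or known_master != current_master
```

With this shape, a Sentinel check would pass the result of the liveness probe as `connection_broken`, so an unchanged master address no longer hides a lost connection.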
Reproduction Scenarios
- Stop and restart the Redis/Valkey container
- Delete the Valkey socket file under /tmp, then restart the container
- Lose only the operation client's connection
Code References
- ValkeyStandaloneClient: src/ai/backend/common/clients/valkey_client/client.py (~line 133)
- ValkeyStandaloneClient.need_reconnect(): same file (currently returns client is None)
- ValkeySentinelClient: same file (~line 226)
- ValkeySentinelClient.need_reconnect() and _get_master_address(): same file (~lines 327 and 348)
- MonitoringValkeyClient._reconnect(): same file (~line 551)
Acceptance Criteria
- need_reconnect() detects broken connections accurately, not just the case where client is None
- Reconnection succeeds after socket deletion + container restart
- Verified in both Standalone and Sentinel modes
- Tests covering each reproduction scenario are included
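
As a rough illustration of the "reconnection succeeds after socket deletion + container restart" criterion, the sketch below retries a connection factory with exponential backoff until the server is reachable again. `reconnect_with_retry` and its parameters are hypothetical and not part of the codebase; the `connect` callable stands in for whatever creates the real client. Catching `OSError` covers both a missing socket file (`FileNotFoundError`) and a refused connection (`ConnectionRefusedError`).

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def reconnect_with_retry(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base_delay: float = 0.1,
) -> T:
    """Call `connect` until it succeeds, doubling the delay after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return await connect()
        except OSError as e:
            # Covers FileNotFoundError (socket file deleted) and
            # ConnectionError subclasses (server still down).
            last_error = e
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * (2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A test for a reproduction scenario could pass a fake `connect` that fails a few times (socket missing, then container still starting) before returning a client, and assert that the helper eventually returns it.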
JIRA Issue: BA-5576