feat(BA-5578): refactor ValkeyClient to async with acquire() pattern #10766
Open
jopemachine wants to merge 9 commits into main
…ttern

Add an acquire() async context manager to AbstractValkeyClient that wraps operations and tracks consecutive connection failures. When the failure threshold is exceeded, the connection is torn down and reconnected rather than continuing to retry on a broken connection.

- Add _VALKEY_CONNECTION_ERRORS tuple for connection error detection
- Add acquire() base implementation to AbstractValkeyClient
- Override acquire() in MonitoringValkeyClient with failure tracking and threshold-based reconnection
- Migrate all 13 domain-specific Valkey clients to use the new pattern
- Add tests for retry-based disconnection logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
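The failure-tracking behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FakeConnection`, `FailureTrackingClient`, and `CONNECTION_ERRORS` are hypothetical stand-ins, and reconnecting as soon as the count reaches the threshold is an assumption about the exact comparison.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator

# Hypothetical stand-in for the PR's _VALKEY_CONNECTION_ERRORS tuple.
CONNECTION_ERRORS: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError)


class FakeConnection:
    """Illustrative connection object; not the real Valkey client."""

    def __init__(self) -> None:
        self.healthy = True

    async def ping(self) -> str:
        if not self.healthy:
            raise ConnectionError("connection lost")
        return "PONG"


class FailureTrackingClient:
    """Hypothetical client that counts consecutive connection failures."""

    def __init__(self, threshold: int = 3) -> None:
        self._threshold = threshold
        self._failure_count = 0
        self._conn = FakeConnection()
        self.reconnects = 0

    async def _reconnect(self) -> None:
        # Replace the broken connection with a fresh one.
        self._conn = FakeConnection()
        self.reconnects += 1

    @asynccontextmanager
    async def acquire(self) -> AsyncIterator[FakeConnection]:
        try:
            yield self._conn
        except CONNECTION_ERRORS:
            # Count the failure; reconnect once the streak reaches the threshold.
            self._failure_count += 1
            if self._failure_count >= self._threshold:
                self._failure_count = 0
                await self._reconnect()
            raise  # the caller still sees the original error
        else:
            # A successful operation ends the consecutive-failure streak.
            self._failure_count = 0
```

In this sketch, exceptions outside `CONNECTION_ERRORS` propagate without touching the counter, so only connection-level problems count toward a reconnect.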
Pull request overview
Refactors the Valkey client abstraction to standardize operations behind an async with ... acquire() context manager, enabling operation-level failure tracking and reconnect-on-threshold behavior, and migrates all domain Valkey clients to use the new pattern.
Changes:
- Added `AbstractValkeyClient.acquire()` async context manager and a `MonitoringValkeyClient.acquire()` override that counts consecutive operation connection failures and triggers `_reconnect()` at a threshold.
- Migrated domain-specific Valkey clients to use `async with self._client.acquire() as conn:` for all Valkey operations.
- Added unit tests validating failure counting, threshold behavior, and reconnection outcomes for `MonitoringValkeyClient.acquire()`.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/ai/backend/common/clients/valkey_client/client.py` | Introduces acquire() and implements operation failure tracking + threshold-based reconnection. |
| `tests/unit/common/clients/valkey_client/test_monitoring_valkey_client.py` | Adds test coverage for the new acquire() behavior and reconnection triggering. |
| `src/ai/backend/common/clients/valkey_client/valkey_volume_stats/client.py` | Migrates volume stats operations to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_stream/client.py` | Migrates stream operations (xgroup/xreadgroup/etc.) to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_stat/client.py` | Migrates stat/cache operations and batched execs to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_session/client.py` | Migrates session CRUD and time/ping to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_schedule/client.py` | Migrates scheduling state operations and batch execs to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_rate_limit/client.py` | Migrates rate-limit scripts and sorted-set operations to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_live/client.py` | Migrates live data operations, scans, and pipelines to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_leader/client.py` | Migrates leadership Lua script invocations to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_image/client.py` | Migrates image membership operations and scans to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_container_log/client.py` | Migrates list/log operations and batch execs to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_bgtask/client.py` | Migrates task metadata operations, scripts, and execs to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_artifact/client.py` | Migrates artifact progress tracking (get/exec/hincrby/hset/scan) to the acquire() pattern. |
| `src/ai/backend/common/clients/valkey_client/valkey_artifact_registries/client.py` | Migrates registry CRUD operations to the acquire() pattern. |
The ValkeyLeaderClient was refactored to use `async with self._client.acquire()`, but the test mock didn't support the async context manager protocol. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
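One way a test mock can support the async context manager protocol is sketched below, using only the standard library's `unittest.mock`. The helpers `make_client_mock` and `ping_via_acquire` are hypothetical names for illustration; the actual fix in this commit may look different.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_client_mock() -> tuple[MagicMock, AsyncMock]:
    """Build a mock whose acquire() can be used in `async with` (hypothetical helper)."""
    conn = AsyncMock()
    client = MagicMock()
    # MagicMock (Python 3.8+) provides __aenter__/__aexit__ as async mocks,
    # so acquire() can return it directly as an async context manager.
    ctx = client.acquire.return_value
    ctx.__aenter__.return_value = conn
    ctx.__aexit__.return_value = False  # do not swallow exceptions
    return client, conn


async def ping_via_acquire(client: MagicMock) -> str:
    # Mirrors the call shape used by the migrated domain clients.
    async with client.acquire() as conn:
        return await conn.ping()
```

A test can then set `conn.ping.return_value` and assert on `conn.ping.assert_awaited_once()` after running the coroutine.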
- Add MonitoringValkeyClientSpec dataclass to group configuration fields (reconnectable_exceptions, consecutive_failure_threshold, monitor_interval)
- Make AbstractValkeyClient.acquire() a proper abstract method
- Add acquire() implementations to ValkeyStandaloneClient and ValkeySentinelClient

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
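A spec dataclass grouping those three fields might look roughly like this. The field names come from the commit message; the field types, the `frozen=True` choice, and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringValkeyClientSpec:
    # Exception types that should count toward a reconnect (type is assumed).
    reconnectable_exceptions: tuple[type[BaseException], ...]
    # Consecutive failures tolerated before reconnecting (type is assumed).
    consecutive_failure_threshold: int
    # Seconds between monitor loop health checks (type is assumed).
    monitor_interval: float


# Example values are hypothetical, apart from the 10s interval mentioned in this PR.
spec = MonitoringValkeyClientSpec(
    reconnectable_exceptions=(ConnectionError, TimeoutError),
    consecutive_failure_threshold=3,
    monitor_interval=10.0,
)
```

Grouping the settings this way lets a constructor take one argument instead of three loosely related ones.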
Merge test_acquire_resets_counter_and_reconnects_at_threshold and test_acquire_reconnect_restores_connectivity into test_operation_client_recovery_while_monitor_healthy, which verifies the core BA-5577 scenario: operation client broken while monitor client is healthy, with counter reset and recovery assertions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ingValkeyClient

Serialize _reconnect() calls from acquire() and monitor loop with asyncio.Lock to prevent interleaving disconnect/connect operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename AbstractValkeyClient.client property to _raw_client (internal)
- Rename acquire() context manager to client() (public API)
- Replace _reconnect_lock with _needs_reconnect flag; monitor loop handles actual reconnection to avoid concurrent reconnect races
- Update all 14 subdomain clients, ValkeyCache, plugins, and tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of waiting up to monitor_interval (10s) for the next cycle, client() now sets a _reconnect_event that immediately wakes the monitor loop. Reconnection is still performed solely by the monitor loop, avoiding concurrent reconnect races.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
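The wake-up mechanism can be sketched with `asyncio.Event` and `asyncio.wait_for`. The `Monitor` class and its method names are hypothetical; only the idea of an event that cuts the interval wait short, with the monitor loop as the sole place that reconnects, is taken from the commit message.

```python
import asyncio


class Monitor:
    """Hypothetical monitor loop that reconnects on request or checks periodically."""

    def __init__(self, monitor_interval: float = 10.0) -> None:
        self._interval = monitor_interval
        self._reconnect_event = asyncio.Event()
        self.reconnects = 0

    def request_reconnect(self) -> None:
        # Called from the operation path; only signals, never reconnects itself.
        self._reconnect_event.set()

    async def _reconnect(self) -> None:
        self.reconnects += 1  # placeholder for the real teardown and reconnect

    async def run(self) -> None:
        while True:
            try:
                # Sleep for the interval, but wake immediately if signalled.
                await asyncio.wait_for(
                    self._reconnect_event.wait(), timeout=self._interval
                )
            except asyncio.TimeoutError:
                # No request arrived; a periodic health check would go here.
                continue
            self._reconnect_event.clear()
            await self._reconnect()
```

Because `_reconnect()` is awaited only inside `run()`, two reconnects cannot interleave even if several operations call `request_reconnect()` at once.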
Resolves BA-5578.

Summary
- Add `acquire()` async context manager to `AbstractValkeyClient` and `MonitoringValkeyClient` that wraps operations and tracks consecutive connection failures
- Migrate all 13 domain-specific Valkey clients to the `async with self._client.acquire() as conn:` pattern

Changes
- (`client.py`): Define `_VALKEY_CONNECTION_ERRORS` tuple, add `acquire()` base implementation to `AbstractValkeyClient`, override in `MonitoringValkeyClient` with `_operation_failure_count` tracking and threshold-based `_reconnect()` trigger
- Replace `self._client.client.xxx()` calls with the `async with self._client.acquire() as conn:` pattern
- Add `TestMonitoringValkeyClientAcquire` class with 8 test cases covering success/failure counting, threshold detection, reconnection, and error type tracking

Test plan
- `pants fmt/fix/lint --changed-since=HEAD` passes
- `pants check --changed-since=HEAD` (mypy) passes
- `pants test --changed-since=HEAD` passes

🤖 Generated with Claude Code