feat: replaces PersistentConnector monkey-patch with proper nixl_conn…#6913
dsocek wants to merge 1 commit into `ai-dynamo:main`
Walkthrough: This change refactors NIXL agent lifecycle management by introducing native remote reference counting and shared connection pooling in …
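As a rough illustration of "shared connection pooling", the sketch below creates one connection lazily and returns the same object to every concurrent caller, so concurrent requests do not each set up their own. It is a minimal sketch that assumes an asyncio-based caller. `SharedConnection` and `make_connection` are hypothetical names, and this is not the actual `_create_connection()` implementation in `nixl_connect`.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedConnection(Generic[T]):
    """Hypothetical helper: create one connection lazily and share it.

    Concurrent callers of get() queue on the same lock, so the factory
    runs at most once even when many requests arrive together.
    """

    def __init__(self, factory: Callable[[], Awaitable[T]]) -> None:
        self._factory = factory
        self._conn: Optional[T] = None
        self._lock = asyncio.Lock()

    async def get(self) -> T:
        # Fast path: the connection already exists, no locking needed.
        if self._conn is not None:
            return self._conn
        async with self._lock:
            # Re-check under the lock: another task may have created the
            # connection while this one was waiting.
            if self._conn is None:
                self._conn = await self._factory()
            return self._conn


async def main() -> None:
    created = 0

    async def make_connection() -> object:
        # Hypothetical stand-in for an expensive agent/connection setup.
        nonlocal created
        created += 1
        await asyncio.sleep(0.01)
        return object()

    shared = SharedConnection(make_connection)
    conns = await asyncio.gather(*(shared.get() for _ in range(8)))
    assert all(c is conns[0] for c in conns)
    print(created)  # 1


asyncio.run(main())
```

The second check under the lock is what keeps creation single: without it, every task that queued on the lock would call the factory again once it acquired the lock.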
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/bindings/python/src/dynamo/nixl_connect/__init__.py`:
- Around lines 609-616: The current `release_remote_ref` method treats missing names as a "last release" because it uses `self._remote_refs.get(name, 0) <= 1`. Change the logic in `release_remote_ref` to check presence first (`if name not in self._remote_refs: return False`) so unknown names are ignored. Then read the count: if `count <= 1`, pop the entry and return `True`; otherwise decrement `self._remote_refs[name]` and return `False`. Reference: `release_remote_ref`, `self._remote_refs`, and `self._remote_refs_lock` (ensure the presence check and the subsequent pop/decrement happen while holding the lock).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6b9551e9-ced9-4ba5-9166-35ec3fc75cfd
📒 Files selected for processing (2)
- `components/src/dynamo/common/multimodal/embedding_transfer.py`
- `lib/bindings/python/src/dynamo/nixl_connect/__init__.py`
…ect implementation

Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Branch updated from `02965ca` to `4d381e6`.
/ok to test 4d381e6
Overview:

Under concurrent multimodal workloads, `nixl_connect` has two issues: … `NIXL_ERR_NOT_FOUND`).

There is an existing workaround in `embedding_transfer.py` that monkey-patches the cleanup to a no-op and subclasses the connector. This prevents the crashes but leaks remote agents and affects all `nixl_connect` users globally.

This PR fixes both issues properly in `nixl_connect` itself and removes the workaround.

Details:
- `nixl_connect/__init__.py`: …
- `embedding_transfer.py`:
  - Removed the `PersistentConnector` subclass
  - Removed the `Remote._release` monkey-patch
  - Uses `nixl_connect.Connector()` directly

Where should the reviewer start?
- `nixl_connect/__init__.py`: `_create_connection()` and the new `acquire_remote_ref()` / `release_remote_ref()` methods
- `embedding_transfer.py`: removal of `PersistentConnector` and the monkey-patch

Related Issues:
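The Overview's remark that the old workaround "affects all `nixl_connect` users globally" follows from how monkey-patching works in Python. Assigning to a class attribute replaces the method for every instance in the process, not just the instances the patching module creates. Here is a self-contained illustration with a hypothetical `FakeRemote` class (not the real `nixl_connect` `Remote` type):

```python
class FakeRemote:
    """Hypothetical stand-in for a library class with cleanup logic."""

    released = []  # class-level log, only for this demonstration

    def __init__(self, name: str) -> None:
        self.name = name

    def _release(self) -> None:
        # Normal behaviour: record that the remote agent was torn down.
        FakeRemote.released.append(self.name)


def release_all(remotes):
    for r in remotes:
        r._release()


# A workaround in one module patches the *class* attribute...
original_release = FakeRemote._release
FakeRemote._release = lambda self: None  # cleanup becomes a no-op

# ...which changes behaviour for every instance, including ones created by
# unrelated callers that never opted into the workaround.
release_all([FakeRemote("a"), FakeRemote("b")])
print(FakeRemote.released)  # [] (nothing cleaned up, so agents leak)

# Restoring the original method brings cleanup back.
FakeRemote._release = original_release
release_all([FakeRemote("c")])
print(FakeRemote.released)  # ['c']
```

Fixing the lifetime bug inside the library, as this PR does, avoids the process-wide side effect entirely.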