Fix parallel MT ingestion returning base collection without tenant context #165
Merged
…t context

Parallel multi-tenant ingestion (introduced in 11e5f3c) never assigned the tenant-scoped collection back to the return variable. Callers that used the returned collection for len() or batch.wait_for_vector_indexing() on MT collections got:

"class X has multi-tenancy enabled, but request was without tenant"

The fix assigns the tenant-scoped collection inside the parallel futures loop, matching the sequential path, which already did this correctly.

Added regression tests:
- 3 unit tests verifying the return value identity for parallel MT, sequential MT, and non-MT collections
- 3 integration tests exercising len() and wait_for_vector_indexing() on the returned collection for single-tenant, MT sequential, and MT parallel
- CI workflow updated to run the new integration test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
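A minimal sketch of the bug pattern and the fix, under stated assumptions: `FakeCollection` is a hypothetical stand-in whose `with_tenant()` returns a new tenant-scoped handle and leaves the base object untouched (loosely modeled on the Weaviate v4 client, but nothing below calls the real client). `ingest_tenant` and `create_data_parallel` are illustrative names, not the project's code. Without the marked assignment, the function would return the base handle, which is exactly the object that triggers the "request was without tenant" error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeCollection:
    """Hypothetical stand-in for a collection handle (not the real client)."""

    def __init__(self, name, tenant=None):
        self.name = name
        self.tenant = tenant

    def with_tenant(self, tenant):
        # Returns a NEW tenant-scoped handle; the base object is not modified.
        return FakeCollection(self.name, tenant)


def ingest_tenant(collection, tenant):
    # Hypothetical worker: scope to one tenant, "ingest", return the scoped handle.
    _coll = collection.with_tenant(tenant)
    return _coll


def create_data_parallel(collection, tenants, parallel_workers=4):
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=parallel_workers) as executor:
        futures = [executor.submit(ingest_tenant, collection, t) for t in tenants]
        for future in as_completed(futures):
            _coll = future.result()
            # Mirrors the "existing lock" the PR mentions; in this sketch only
            # the submitting thread writes `collection`, so it is a safeguard.
            with lock:
                # The fix: hand the tenant-scoped handle back to the caller
                # instead of leaving `collection` pointing at the base object.
                collection = _coll
    return collection
```

With several tenants, this sketch returns whichever tenant's handle completes last, so the choice is nondeterministic; with no tenants (the non-MT case), the base collection comes back unchanged.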
Orca Security Scan Summary

| Check | Details |
|---|---|
| Infrastructure as Code | View in Orca |
| SAST | View in Orca |
| Secrets | View in Orca |
| Vulnerabilities | View in Orca |
The text2vec-transformers-model2vec module produces 256-dimensional vectors, not 384. Updated the hardcoded dimension expectations in both the parametrized config test and the named vectors test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
`create_data()` in parallel mode (`parallel_workers > 1`) returned the base collection object instead of a tenant-scoped one. Any caller using `len(collection)` or `collection.batch.wait_for_vector_indexing()` on a multi-tenant collection would fail with: `"class X has multi-tenancy enabled, but request was without tenant"`

The `ThreadPoolExecutor`-based parallel ingestion introduced in 11e5f3c never assigned the tenant-scoped `_coll` from the futures back to the `collection` return variable. The sequential path (line 972) already did this correctly.

The fix sets `collection = _coll` inside the parallel futures loop, under the existing lock.

Regression tests added
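The return-value identity checks that the unit tests listed next perform can be sketched with `unittest.mock`. This is a hedged illustration only: `create_data` here is a hypothetical, simplified, sequential-only stand-in rather than the project's function, and the mocks assume the real code obtains tenant handles through a `with_tenant()` call on the base collection.

```python
from unittest import mock


def create_data(base, tenants):
    """Hypothetical, simplified stand-in for the real create_data()."""
    collection = base
    for tenant in tenants:
        _coll = base.with_tenant(tenant)
        # ... objects would be inserted through _coll here ...
        collection = _coll  # hand the tenant-scoped handle back to the caller
    return collection


def test_mt_returns_tenant_scoped_collection():
    base = mock.MagicMock(name="base_collection")
    scoped = mock.MagicMock(name="scoped_collection")
    base.with_tenant.return_value = scoped

    result = create_data(base, tenants=["tenant-a"])

    # Identity checks: the caller must get the scoped handle, never the base one.
    assert result is scoped
    assert result is not base
    base.with_tenant.assert_called_once_with("tenant-a")


def test_non_mt_returns_base_collection():
    base = mock.MagicMock(name="base_collection")

    result = create_data(base, tenants=[])

    # Non-MT collections have no tenant context, so the base handle is correct.
    assert result is base
    base.with_tenant.assert_not_called()
```

Using `is` rather than `==` matters here: `MagicMock` instances are distinct objects, so an identity assertion fails loudly if the base handle leaks through.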
Unit tests (3 new in `test_data_manager.py`)
- `test_parallel_returns_tenant_scoped_collection`: verifies parallel mode returns a tenant-scoped collection, not the base one
- `test_sequential_returns_tenant_scoped_collection`: verifies sequential MT mode also returns a tenant-scoped collection
- `test_non_mt_returns_base_collection`: verifies non-MT collections correctly return the base collection

Integration tests (new file `test_create_data_return_collection.py`)
- `test_create_data_returns_usable_collection_single_tenant`: non-MT; `len()` and `wait_for_vector_indexing()` work on the returned collection
- `test_create_data_returns_usable_collection_multitenant_sequential`: MT sequential; the same operations work without the "request was without tenant" error
- `test_create_data_returns_usable_collection_multitenant_parallel`: MT parallel; the exact scenario that was broken

CI
- Updated `.github/workflows/main.yaml` to run `test_data_integration.py` and `test_create_data_return_collection.py` alongside `test_integration.py`

Test plan
weaviate-e2e-testsno longer fail with the tenant error🤖 Generated with Claude Code