Fix parallel MT ingestion returning base collection without tenant context#165

Merged
jfrancoa merged 2 commits into main from
jose/fix-parallel-mt-return-collection
Mar 27, 2026
@jfrancoa
Collaborator

Summary

  • Bug: create_data() in parallel mode (parallel_workers > 1) returned the base collection object instead of a tenant-scoped one. Any caller using len(collection) or collection.batch.wait_for_vector_indexing() on a multi-tenant collection would fail with: "class X has multi-tenancy enabled, but request was without tenant"
  • Root cause: Commit 11e5f3c introduced ThreadPoolExecutor-based parallel ingestion but never assigned the tenant-scoped _coll from futures back to the collection return variable. The sequential path (line 972) already did this correctly.
  • Fix: A one-line change that assigns collection = _coll inside the parallel futures loop, under the existing lock, so the returned object is tenant-scoped as in the sequential path.
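
The shape of the fix can be sketched as follows. This is a minimal illustration, not the project's actual code: `FakeCollection`, `ingest_tenant`, and `create_data_parallel` are hypothetical stand-ins, and only the `collection = _coll` assignment under the lock mirrors what the summary describes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FakeCollection:
    """Hypothetical stand-in for a collection handle (not the real client class)."""
    name: str
    tenant: Optional[str] = None

    def with_tenant(self, tenant: str) -> "FakeCollection":
        # Return a new handle bound to a single tenant.
        return FakeCollection(self.name, tenant)


def ingest_tenant(base: FakeCollection, tenant: str) -> FakeCollection:
    """Hypothetical worker: scope the handle to one tenant and return it."""
    _coll = base.with_tenant(tenant)
    # ... batch inserts into _coll would happen here ...
    return _coll


def create_data_parallel(
    base: FakeCollection, tenants: List[str], parallel_workers: int = 4
) -> FakeCollection:
    collection = base  # without the assignment below, this base handle is returned
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        futures = [pool.submit(ingest_tenant, base, t) for t in tenants]
        for future in as_completed(futures):
            _coll = future.result()
            with lock:
                collection = _coll  # the one-line fix: keep a tenant-scoped handle
    return collection


returned = create_data_parallel(FakeCollection("X"), ["t1", "t2", "t3"])
print(returned.tenant is not None)
```

In this sketch the returned handle belongs to whichever tenant's future completes last; the point is only that it carries a tenant, so tenant-requiring calls on it can succeed.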

Regression tests added

Unit tests (3 new in test_data_manager.py)

  • test_parallel_returns_tenant_scoped_collection — verifies parallel mode returns a tenant-scoped collection, not the base one
  • test_sequential_returns_tenant_scoped_collection — verifies sequential MT mode also returns tenant-scoped
  • test_non_mt_returns_base_collection — verifies non-MT collections correctly return the base collection
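
A rough sketch of what identity checks like these might look like is shown below. The `create_data` function here is a heavily simplified, hypothetical stand-in written for this example, and the mocks only model a `with_tenant` method; the real tests in test_data_manager.py may be structured differently.

```python
from concurrent.futures import ThreadPoolExecutor
from unittest.mock import MagicMock


def create_data(base, tenants=None, parallel_workers=1):
    """Hypothetical, simplified stand-in for the real create_data()."""
    if not tenants:
        return base  # non-MT: the base handle is the correct return value
    if parallel_workers > 1:
        with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
            scoped = list(pool.map(base.with_tenant, tenants))
    else:
        scoped = [base.with_tenant(t) for t in tenants]
    return scoped[-1]  # always a tenant-scoped handle for MT collections


def make_base():
    # Each with_tenant() call yields a distinct mock, so identity checks are meaningful.
    base = MagicMock(name="base_collection")
    base.with_tenant.side_effect = lambda t: MagicMock(name=f"scoped[{t}]")
    return base


def test_parallel_returns_tenant_scoped_collection():
    base = make_base()
    result = create_data(base, tenants=["t1", "t2"], parallel_workers=2)
    assert result is not base


def test_sequential_returns_tenant_scoped_collection():
    base = make_base()
    result = create_data(base, tenants=["t1", "t2"], parallel_workers=1)
    assert result is not base


def test_non_mt_returns_base_collection():
    base = make_base()
    result = create_data(base, tenants=None)
    assert result is base
    base.with_tenant.assert_not_called()
```

Checking object identity (`is` / `is not`) rather than equality keeps the tests independent of how the mocks compare to each other.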

Integration tests (new file test_create_data_return_collection.py)

  • test_create_data_returns_usable_collection_single_tenant — non-MT: len() and wait_for_vector_indexing() work on returned collection
  • test_create_data_returns_usable_collection_multitenant_sequential — MT sequential: same operations work without "request was without tenant" error
  • test_create_data_returns_usable_collection_multitenant_parallel — MT parallel: the exact scenario that was broken
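
To illustrate why the returned handle has to be tenant-scoped, here is a small self-contained stub. `StubCollection` is hypothetical and imitates only the tenant check and its error message; it does not talk to Weaviate, and the real integration tests run against a live instance.

```python
class StubCollection:
    """Hypothetical stand-in that mimics only the tenant check, not the real client."""

    def __init__(self, name, multi_tenant, tenant=None, store=None):
        self.name = name
        self.multi_tenant = multi_tenant
        self.tenant = tenant
        # Shared across handles so scoped handles see the same data.
        self._store = store if store is not None else {}

    def with_tenant(self, tenant):
        return StubCollection(self.name, self.multi_tenant, tenant, self._store)

    def _require_tenant(self):
        if self.multi_tenant and self.tenant is None:
            raise RuntimeError(
                f"class {self.name} has multi-tenancy enabled, "
                "but request was without tenant"
            )

    def insert(self, obj):
        self._require_tenant()
        self._store.setdefault(self.tenant, []).append(obj)

    def __len__(self):
        self._require_tenant()
        return len(self._store.get(self.tenant, []))


base = StubCollection("X", multi_tenant=True)
scoped = base.with_tenant("tenant-1")
for i in range(3):
    scoped.insert({"i": i})

print(len(scoped))  # works: the handle carries a tenant
try:
    len(base)  # the pre-fix return value behaved like this
except RuntimeError as err:
    print(err)
```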

CI

  • Updated .github/workflows/main.yaml to run test_data_integration.py and test_create_data_return_collection.py alongside test_integration.py

Test plan

  • All 42 unit tests pass (39 existing + 3 new)
  • Integration tests pass against a running Weaviate instance (CI will validate)
  • Verify the e2e recovery tests in weaviate-e2e-tests no longer fail with the tenant error

🤖 Generated with Claude Code

…t context

Parallel multi-tenant ingestion (introduced in 11e5f3c) never assigned the
tenant-scoped collection back to the return variable. Callers that used the
returned collection for len() or batch.wait_for_vector_indexing() on MT
collections got:

    "class X has multi-tenancy enabled, but request was without tenant"

The fix assigns the tenant-scoped collection inside the parallel futures
loop, matching the sequential path that already did this correctly.

Added regression tests:
- 3 unit tests verifying the return value identity for parallel MT,
  sequential MT, and non-MT collections
- 3 integration tests exercising len() and wait_for_vector_indexing() on
  the returned collection for single-tenant, MT sequential, and MT parallel
- CI workflow updated to run the new integration test file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

| Status | Check | High | Medium | Low | Info |
|--------|-------|------|--------|-----|------|
| Passed | Infrastructure as Code | 0 | 0 | 0 | 0 |
| Passed | SAST | 0 | 0 | 0 | 0 |
| Passed | Secrets | 0 | 0 | 0 | 0 |
| Passed | Vulnerabilities | 0 | 0 | 0 | 0 |

The text2vec-transformers-model2vec module produces 256-dimensional vectors,
not 384. Updated hardcoded dimension expectations in both the parametrized
config test and the named vectors test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jfrancoa jfrancoa merged commit dd41083 into main Mar 27, 2026
18 checks passed