Skip to content

Fix iterator locking#1497

Open
ianton-ru wants to merge 2 commits intoantalya-25.8from
bugfix/antalya-25.8/fix_iterator_wrappers
Open

Fix iterator locking#1497
ianton-ru wants to merge 2 commits intoantalya-25.8from
bugfix/antalya-25.8/fix_iterator_wrappers

Conversation

@ianton-ru
Copy link

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix issues found in #1436

Documentation entry for user-facing changes

Add has_concurrent_next method to iterator wrappers.
Add separate iterator thread in query thread group to collect ProfileEvents properly.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@ianton-ru
Copy link
Author

@codex review

@github-actions
Copy link

github-actions bot commented Mar 10, 2026

Workflow [PR], commit [55cef3c]

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d01eb1575f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

size_t estimatedKeysCount() override { return iterator->estimatedKeysCount(); }
std::optional<UInt64> getSnapshotVersion() const override { return iterator->getSnapshotVersion(); }

bool has_concurrent_next() const override { return iterator->has_concurrent_next(); }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep bucket-splitting iterator serialized

Propagating has_concurrent_next() from the wrapped iterator here allows StorageObjectStorageStableTaskDistributor::getMatchingFileFromIterator to call next() without its mutex, but ObjectIteratorSplitByBuckets::next mutates pending_objects_info via unsynchronized push/pop in IObjectIterator.cpp. When multiple replicas pull tasks concurrently (e.g., distributed reads with cluster_table_function_split_granularity=BUCKET and a concurrent source iterator), this introduces a data race that can corrupt task assignment or crash.

Useful? React with 👍 / 👎.

[this]()
[this, thread_group = DB::CurrentThread::getGroup()]()
{
ThreadGroupSwitcher switcher(thread_group, "IcebergKeysIterator");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use a valid ThreadGroupSwitcher thread name length

ThreadGroupSwitcher forwards thread_name to setThreadName, which rejects names longer than 15 bytes; "IcebergKeysIterator" is 19 bytes, so this path deterministically throws and enters the switcher’s exception handler every time. That means the new thread-group setup always takes the error path (with noisy logs and skipped normal switcher teardown logic), so the locking/profile-events fix does not run cleanly.

Useful? React with 👍 / 👎.

@ianton-ru
Copy link
Author

@codex review

@ianton-ru
Copy link
Author

@codex review, please!

@chatgpt-codex-connector
Copy link

Codex Review: Didn't find any major issues. Swish!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru
Copy link
Author

Disclaimer: This review was generated by an AI coding assistant (Cursor, Auto agent). It is not an official human review.


Audit summary for PR #1497 (Fix iterator locking) — both commits (d01eb15 + 55cef3c) reviewed.

Result: no remaining confirmed defects.

  • P1 (bucket iterator data race): Addressed in 55cef3cObjectIteratorSplitByBuckets::next() is now fully protected by a mutex; has_concurrent_next() correctly returns true while remaining thread-safe.
  • P2 (thread name length): Addressed in 55cef3c — thread name shortened from "IcebergKeysIterator" (19 bytes) to "IcebergKeys" (11 bytes), within the 15-byte limit.

Recommendation: Safe to merge from a concurrency and thread-naming perspective. Ensure any future refactors of ObjectIteratorSplitByBuckets::next() keep the mutex held for the entire method body (including the refill loop). Consider adding a TSAN test for cluster reads with cluster_table_function_split_granularity=BUCKET and an Iceberg source.

Copy link
Collaborator

@arthurpassos arthurpassos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tho it is sad we had to add yet another mutex

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants