[GLUTEN-11752] Fix AdaptiveSparkPlanExec accessibility in columnar write optimization#11753

Open
wangyum wants to merge 4 commits into apache:main from wangyum:GLUTEN-11752

@wangyum
Member

wangyum commented Mar 13, 2026

What changes are proposed in this pull request?

This PR fixes an issue where Gluten's columnar write optimization breaks the shuffle ID retrieval introduced in Spark PR #51432.

Spark expects to reach the shuffle IDs of AdaptiveSparkPlanExec by pattern matching on the root of the executed plan:

queryExecution.executedPlan match {
  case ae: AdaptiveSparkPlanExec =>
    ae.context.shuffleIds.asScala.keys
}

This PR refactors the wrapping logic to:

  1. Wrap aqe.inputPlan with genColumnarToCarrierRow() first → ColumnarToCarrierRow(inputPlan)
  2. Create a new AdaptiveSparkPlanExec with the wrapped child → AdaptiveSparkPlanExec(ColumnarToCarrierRow(...))
  3. Set supportsColumnar=false since the child is already columnar

This preserves AdaptiveSparkPlanExec in the plan hierarchy while maintaining the columnar write optimization benefits.
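
The difference between the two wrapping orders can be illustrated with a small, self-contained model. This is only a sketch: the case classes below are toy stand-ins that borrow the names from this PR, not the real Spark or Gluten classes, `shuffleIds` is simplified to a plain `Set[Int]` on the node itself rather than living on a context object, and `WrappingSketch`, `wrapOutside`, and `wrapInside` are hypothetical names introduced for illustration.

```scala
// Toy stand-ins for plan nodes; these are NOT the real Spark/Gluten classes.
sealed trait Plan
case class Leaf(name: String) extends Plan
case class ColumnarToCarrierRow(child: Plan) extends Plan
case class AdaptiveSparkPlanExec(
    inputPlan: Plan,
    shuffleIds: Set[Int],          // simplified; assumed shape for illustration
    supportsColumnar: Boolean) extends Plan

object WrappingSketch {
  // Mimics a caller that only inspects the root of the executed plan.
  def shuffleIdsAtRoot(executedPlan: Plan): Option[Set[Int]] = executedPlan match {
    case ae: AdaptiveSparkPlanExec => Some(ae.shuffleIds)
    case _                         => None
  }

  // Previous shape: the adaptive node sits underneath the conversion node,
  // so a root-only pattern match no longer sees it.
  def wrapOutside(aqe: AdaptiveSparkPlanExec): Plan =
    ColumnarToCarrierRow(aqe)

  // Shape described above: wrap the input plan first, then rebuild the
  // adaptive node around the wrapped child with supportsColumnar = false.
  def wrapInside(aqe: AdaptiveSparkPlanExec): Plan =
    aqe.copy(
      inputPlan = ColumnarToCarrierRow(aqe.inputPlan),
      supportsColumnar = false)
}
```

In this model, `shuffleIdsAtRoot` returns `None` for the result of `wrapOutside` and the original shuffle IDs for the result of `wrapInside`, which mirrors the behavior change this PR is after.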

How was this patch tested?

Unit test.

Was this patch authored or co-authored using generative AI tooling?

No.

Fixes #11752

@github-actions github-actions bot added the CORE works for Gluten Core label Mar 13, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@pan3793
Member

pan3793 commented Mar 13, 2026

Was it fixed by apache/spark#53620?

@zhouyuan
Member

@wangyum could you please do a rebase to fix the CK CI issue?
Cc @JkSelf

wangyum added 3 commits March 14, 2026 09:16
…mization

Refactor GlutenWriterColumnarRules to preserve AdaptiveSparkPlanExec in plan hierarchy,
enabling shuffle IDs retrieval that was broken by Spark PR #51432.

Original implementation wrapped AdaptiveSparkPlanExec with ColumnarToCarrierRow,
hiding it from external pattern matching. New approach wraps the input plan first,
then creates a new AdaptiveSparkPlanExec with the wrapped child and supportsColumnar=false.

Added GlutenWriterColumnarRulesSuite for Spark 4.0 and 4.1 to verify the fix.
@github-actions

Run Gluten Clickhouse CI on x86

@wangyum
Member Author

wangyum commented Mar 14, 2026

@pan3793

  1. Better compatibility - Maintaining plan structure consistent with Spark avoids similar issues with other code that depends on AdaptiveSparkPlanExec.
  2. Clearer semantics - Keeping AdaptiveSparkPlanExec as the outer node, with the conversion operator wrapped inside it, aligns better with Spark's design intent.
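
To make the compatibility point concrete, here is a minimal sketch of two kinds of consumer: one that only checks the root of the plan, and one that traverses the tree. The `Node`, `Scan`, `Adaptive`, and `Wrapper` types and the `RootMatchSketch` object are hypothetical and exist only for this illustration; they are not Spark or Gluten APIs.

```scala
// Hypothetical toy plan tree, for illustration only.
sealed trait Node { def children: Seq[Node] }
case class Scan(table: String) extends Node { val children: Seq[Node] = Nil }
case class Adaptive(child: Node) extends Node { val children: Seq[Node] = Seq(child) }
case class Wrapper(child: Node) extends Node { val children: Seq[Node] = Seq(child) }

object RootMatchSketch {
  // A consumer that only looks at the root node.
  def isAdaptiveAtRoot(plan: Node): Boolean = plan match {
    case _: Adaptive => true
    case _           => false
  }

  // A consumer that searches the whole tree, depth first.
  def containsAdaptive(plan: Node): Boolean = plan match {
    case _: Adaptive => true
    case other       => other.children.exists(containsAdaptive)
  }
}
```

With `Wrapper(Adaptive(Scan("t")))`, only the traversing consumer finds the adaptive node; with `Adaptive(Wrapper(Scan("t")))`, both do. Keeping the adaptive node at the root therefore works for either style of consumer without requiring changes on the consumer side.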

@github-actions

Run Gluten Clickhouse CI on x86

Development

Successfully merging this pull request may close these issues.

Fix AdaptiveSparkPlanExec wrapped by ColumnarToCarrierRow breaks shuffle IDs retrieval

3 participants