[BugFix] Fix PushDownContext shallow copy bug#5199
[BugFix] Fix PushDownContext shallow copy bug#5199songkant-aws wants to merge 9 commits intoopensearch-project:mainfrom
Conversation
Signed-off-by: Songkan Tang <songkant@amazon.com>
PR Reviewer Guide 🔍(Review updated until commit 05a2ed2)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 05a2ed2 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit af6c695
Suggestions up to commit 11db491
Suggestions up to commit 84892bcSuggestions up to commit f36c20a
Suggestions up to commit 7df5f94
|
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Persistent review updated to latest commit 21f8940 |
|
cc @qianheng-aws as proper reviewer |
| if (builder instanceof NestedAggregationBuilder nested) { | ||
| return copyNestedAggregationBuilder(nested); | ||
| } | ||
| return builder; |
There was a problem hiding this comment.
What if there is other missed or upcoming agg builder?
| } | ||
| } | ||
|
|
||
| private static final class TermsAggregationBuilderCopy extends TermsAggregationBuilder { |
There was a problem hiding this comment.
Can this be replaced with
source.shallowCopy(source, copySubAggregations(source), copyMetadataOrNull(source))
instead of defining these extended class?
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Persistent review updated to latest commit 9fe5192 |
|
Persistent review updated to latest commit 7df5f94 |
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Persistent review updated to latest commit f36c20a |
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Persistent review updated to latest commit 84892bc |
|
Persistent review updated to latest commit 11db491 |
|
Persistent review updated to latest commit af6c695 |
|
@qianheng-aws Now the fix is changed to lazily push down agg request at the createRequest phase. But it introduces some intermediate immutable spec object to record any logical pushdown operations. It touches more logic now but I think it could mitigate the current problem of AggPushDownAction's mutation during planning. |
Signed-off-by: Songkan Tang <songkant@amazon.com>
af6c695 to
05a2ed2
Compare
|
Persistent review updated to latest commit 05a2ed2 |
qianheng-aws
left a comment
There was a problem hiding this comment.
As commented here:https://github.com/opensearch-project/sql/pull/5199/changes#r2980342004, if the AggPushDownAction can be reusable, plenty of params can be removed, which are maintained all the way but only for creating AggPushDownAction in the end in build() method.
|
|
||
| private static LimitPushdownMode inferLimitPushdownMode( | ||
| List<AggregationBuilder> builders, @Nullable AggregationBuilder rootBuilder) { | ||
| if (builders.isEmpty()) { |
There was a problem hiding this comment.
Nit: seems no need to pass in param builders here. This can be replaced with rootBuilder==null.
There was a problem hiding this comment.
Yes, replace it with rootBuilder == null check now.
| // Treats leaf metric aggregations as limit-pushable because they produce a single row. | ||
| return LimitPushdownMode.LEAF_METRIC; | ||
| } | ||
| return inferLimitPushdownMode(inferKind(rootBuilder)); |
There was a problem hiding this comment.
Nit: pass in kind to avoid calling inferKind method here, it's already evaluated before calling inferLimitPushdownMode.
| extendedTypeMapping, | ||
| bucketNames, | ||
| bucketNames, | ||
| new AggPushDownAction(builderAndParser, extendedTypeMapping, bucketNames).getScriptCount(), |
There was a problem hiding this comment.
It's weird to new a AggPushDownAction here only to get the script count. Replace it with builderAndParser.getLeft().stream().mapToInt(AggPushDownAction::getScriptCount).sum()
There was a problem hiding this comment.
Done. It's now calling getScriptCount instead
| bucketNames, | ||
| scriptCount, | ||
| measureSortTarget, | ||
| inferLimitPushdownMode(measureSortTarget), |
There was a problem hiding this comment.
Why invoking infer again while measureSortTarget is unchanged here?
There was a problem hiding this comment.
This was incorrect. Now it will infer rewriteTarget in local method and based on rewritten AggKind to inferLimitPushdownMode again.
There was a problem hiding this comment.
This was incorrect. Now it will infer rewriteTarget in local method and based on rewritten AggKind to inferLimitPushdownMode again.
| AggregateAnalyzer.AggregateBuilderHelper helper = | ||
| new AggregateAnalyzer.AggregateBuilderHelper( | ||
| rowType, fieldTypes, cluster, bucketNullable, queryBucketSize); | ||
| Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser> builderAndParser = |
There was a problem hiding this comment.
Why should we analyze the aggregate again? Can we reuse the builderAndParser generated when pushing down the aggregate? Even more, we may be able to reuse the AggPushDownAction.
Since we pushdown everything lazily until this build() method, which will only be invoked once. It should be safe to reuse.
There was a problem hiding this comment.
As discussed offline. Ideally if we don't care about afterKey rewrite in agg pagination, then the builderAndParser can be reusable. I change the behavior to reuse passed builderAndParser to AggSpec.
However, I reviewed the whole process. I still think we still need AggSpec like object to hold planning stage agg state immutable copies(can't fully AggPushDownAction). AggPushDownAction is used at the final materialization of replayed PushDownOperations for agg. It will be better to separate them into two different objects. During optimizer planning stage, we still need to record some agg state changes in different RelSubsets. One for planning side agg state immutable objects, the other one is for replaying final operations to materialize final AggregationBuilder(lazily generate request builder).
As far as I can see, only these params are getting changed in the push down process for agg. |
Signed-off-by: Songkan Tang <songkant@amazon.com>
|
Failed to generate code suggestions for PR |
|
@qianheng-aws Now I have streamlined the AggSpec. And choose not to reuse AggPushDownAction because we need several immutable objects copies in different RelSubsets to represent different logical agg states per RelSubset. However, replaying pushdown operations to change internal agg parser is a one-time mutable state change, which is better lazily built at last. |
Description
PushdownContext.clone() / cloneWithoutSort() / cloneForAggregate() replays add(operation) to construct new context. Aggregation operations reuses the same
AggPushDownActioninstance in both old and new context due to shallow copy side effect happened during cloning PushdownContext. The following pushdown actions from other equivalent RelSubsets will modify this shared mutable action object, which could contaminate previous context's agg states.Related Issues
Resolves #5125
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.