[FLINK-39178][table-planner] Support cascaded delta join in planner#27734
[FLINK-39178][table-planner] Support cascaded delta join in planner#27734xuyangzhong wants to merge 1 commit intoapache:masterfrom
Conversation
| public boolean equals(Object o) { | ||
| if (o == null || getClass() != o.getClass()) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
nit: Missing this == o check for reference equality optimization and early return for same reference.
| StringBuilder sb = new StringBuilder(); | ||
| sb.append("binary: {"); | ||
| sb.append( | ||
| binary2BinaryJoinAssociation.entrySet().stream() |
There was a problem hiding this comment.
Multiple sorted() calls and string concatenations in hot path. Can we cache if called frequently?
There was a problem hiding this comment.
This method is only used to output debug information when errors occur, and it’s called almost never. I’m wondering if there’s no need to optimize it. What do you think?
| * | ||
| * <pre>{@code | ||
| * Top | ||
| * (Set1.a1 = Set2.c1 and Set1.b2 = Set2.d2) |
There was a problem hiding this comment.
I am not understanding this picture.
-I see bottom1 (A.a0 = B.b0) and bottom2 as (C.c0 = D.d0)
- what is A.a0 does it relate to Set1 or a1?
so less abstractly
(CustomerSet.name = OrderSet2.customerName and CustomerSet.age = OrderSet2.customerAge)
what would
- Bottom1 Bottom2
* (A.a0 = B.b0) (C.c0 = D.d0)
Look like as related to the top conditions.
There was a problem hiding this comment.
Considering that joins at different levels use different columns as join keys, using the actual column names would be harder to understand. So here I instead specify a simplified convention: each leaf table has columns named with its lowercase letter plus a number, e.g., A(a0, a1), B(b0, b1, b2), C(c0, c1), D(d0, d1, d2).
Do you think it would be easier to understand if I change it to the following?
/**
* Try to build lookup chain for delta join to do lookup.
*
* <p>Take the following join tree as example. Each leaf table has columns named with its
* lowercase letter and a number, e.g., A(a0, a1), B(b0, b1, b2), C(c0, c1), D(d0, d1, d2).
*
* <pre>{@code
* Top
* (a1 = c1 and b2 = d2)
* / \
* Bottom1 Bottom2
* (a0 = b0) (c0 = d0)
* / \ / \
* A(a0,a1) B(b0,b1,b2) C(c0,c1) D(d0,d1,d2)
*
* }</pre>
*
* <p>If Bottom1 is treated as stream side and Bottom2 is treated as lookup side, the lookup
* chain will be like this:
*
* <p>use A + B to lookup C with (a1 = c1) -> use C to lookup D with (c0 = d0).
*/
What is the purpose of the change
Support cascaded delta join in planner.
Note: runtime implement and new exec node will be adapted and introduced later.
Brief change log
DeltaJoinTreeto log the join info and calc info upstreamDeltaJoinAssociationto provide the info to build lookup chainDeltaJoinLookupChainto log the lookup path while there are multi tables in the lookup sideVerifying this change
New tests are added to verify this pr.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation