Skip to content

[FLINK-39178][table-planner] Support cascaded delta join in planner#27734

Open
xuyangzhong wants to merge 1 commit intoapache:masterfrom
xuyangzhong:cascaded_dt_comm
Open

[FLINK-39178][table-planner] Support cascaded delta join in planner#27734
xuyangzhong wants to merge 1 commit intoapache:masterfrom
xuyangzhong:cascaded_dt_comm

Conversation

@xuyangzhong
Copy link
Contributor

@xuyangzhong xuyangzhong commented Mar 4, 2026

What is the purpose of the change

Support cascaded delta join in planner.

Note: runtime implement and new exec node will be adapted and introduced later.

Brief change log

  • Add internal DeltaJoinTree to log the join info and calc info upstream
  • Add internal DeltaJoinAssociation to provide the info to build lookup chain
  • Add internal DeltaJoinLookupChain to log the lookup path while there are multi tables in the lookup side
  • Support LHS, RHS and Bushy cascaded delta join in planner
  • Add tests to verify this change

Verifying this change

New tests are added to verify this pr.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? later

@flinkbot
Copy link
Collaborator

flinkbot commented Mar 4, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

public boolean equals(Object o) {
if (o == null || getClass() != o.getClass()) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Missing this == o check for reference equality optimization and early return for same reference.

StringBuilder sb = new StringBuilder();
sb.append("binary: {");
sb.append(
binary2BinaryJoinAssociation.entrySet().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple sorted() calls and string concatenations in hot path. Can we cache if called frequently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is only used to output debug information when errors occur, and it’s called almost never. I’m wondering if there’s no need to optimize it. What do you think?

*
* <pre>{@code
* Top
* (Set1.a1 = Set2.c1 and Set1.b2 = Set2.d2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not understanding this picture.
-I see bottom1 (A.a0 = B.b0) and bottom2 as (C.c0 = D.d0)

  • what is A.a0 does it relate to Set1 or a1?

so less abstractly
(CustomerSet.name = OrderSet2.customerName and CustomerSet.age = OrderSet2.customerAge)
what would

  • Bottom1 Bottom2
    * (A.a0 = B.b0) (C.c0 = D.d0)

Look like as related to the top conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that joins at different levels use different columns as join keys, using the actual column names would be harder to understand. So here I instead specify a simplified convention: each leaf table has columns named with its lowercase letter plus a number, e.g., A(a0, a1), B(b0, b1, b2), C(c0, c1), D(d0, d1, d2).

Do you think it would be easier to understand if I change it to the following?

/**
     * Try to build lookup chain for delta join to do lookup.
     *
     * <p>Take the following join tree as example. Each leaf table has columns named with its
     * lowercase letter and a number, e.g., A(a0, a1), B(b0, b1, b2), C(c0, c1), D(d0, d1, d2).
     *
     * <pre>{@code
     *                     Top
     *            (a1 = c1 and b2 = d2)
     *            /                       \
     *       Bottom1                      Bottom2
     *      (a0 = b0)                    (c0 = d0)
     *    /         \                    /       \
     * A(a0,a1)  B(b0,b1,b2)        C(c0,c1)  D(d0,d1,d2)
     *
     * }</pre>
     *
     * <p>If Bottom1 is treated as stream side and Bottom2 is treated as lookup side, the lookup
     * chain will be like this:
     *
     * <p>use A + B to lookup C with (a1 = c1) -> use C to lookup D with (c0 = d0).
     */

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants