risingwave_frontend::optimizer

Module delta_join_solver

source
Expand description

The solver for delta join, which determines lookup order of a join plan. All collection types in this module should be BTree to ensure determinism between runs.

§Representation of Multi-Way Join

In this module, a* means lookup executor that looks-up the arrangement a of the current epoch. a means looks-up arrangement a of the previous epoch.

Delta joins only support inner equal join. The solver is based on the following formula (take 3-way join as an example):

d((A join1 B) join2 C)
= ((A + dA) join1 (B + dB)) join2 (C + dC) - A join1 B join2 C
= (A join1 B + A join1 dB + dA join1 (B + dB)) join2 (C + dC) - A join1 B join2 C
= A join1 B join2 (C + dC) + A join1 dB join2 (C + dC) + dA join1 (B + dB) join2 (C + dC) - A join1 B join2 C
= A join1 B join2 dC + A join1 dB join2 (C + dC) + dA join1 (B + dB) join2 (C + dC)
= dA join1 (B + dB) join2 (C + dC) + dB join1 A join2 (C + dC) + dC join2 B join1 A

join1 means A join B using condition #1,
join2 means B join C using condition #2.

Inner joins satisfy commutative law and associative laws, so we can switch them back and forth between joins.

… which generates the following look-up graph:

a -> b* -> c* -> output3
b -> a  -> c* -> output2
c -> b  -> a  -> output1

And the final output is output1 <concat> output2 <concat> output3. The concatenation is ordered: all items from output 1 must appear before output 2.

TODO: support dynamic filter and filter condition in lookups.

§What need the caller do?

After solving the delta join plan, caller will need to do a lot of things.

  • Use the correct index for every stream input. By looking at the first lookup fragment in each row, we can decide whether to use a.x or a.y as index for stream input.
  • Insert exchanges between lookups of different distribution. Generally, if the whole row is operating on the same join key, we only need to do 1v1 exchanges between lookups. However, it would be possible that a row of lookup first join a.x == b.x, then a.y == c.y. In this case, we will need to insert hash exchange between these two lookups.
  • Ensure the order of union. Always union from the last row to the first row.
  • Insert exchange before union. Still the case for a.x == b.x, then a.y == c.y, it is possible that every lookup path produces different distribution. We need to shuffle them before feeding data to union.

Structs§

  • Given a multi-way inner join plan, the solver will produces the most efficient way of getting those join done.
  • Represents whether left and right can be joined using a condition.
  • A row in the lookup plan, which includes the stream side arrangement, and all arrangements to be placed in the lookup node.
  • SolverEnv 🔒

Enums§

  • Decides how to place arrangements over lookup nodes. Given a 3-way join example:
  • Decides how to place stream inputs. Given a 3-way join example: