Fixing fallback in ORCA for correlated IN queries with no
projections from the inner side.
For the query `explain select * from foo where foo.a in (select foo.b from bar);`,
ORCA generates two different plans depending on the size of `bar`. If `bar` is a small table,
ORCA picks the plan below because broadcasting `bar` is cheaper:
```
Physical plan #1:
+--CPhysicalMotionGather(master) rows:1 width:76 rebinds:1 cost:1324032.133055 origin: [Grp:4, GrpExpr:19]
   +--CPhysicalCorrelatedInLeftSemiNLJoin("b" (1)) rows:1 width:76 rebinds:1 cost:1324032.133025 origin: [Grp:4, GrpExpr:18]
      |--CPhysicalFilter rows:1 width:38 rebinds:1 cost:431.000092 origin: [Grp:9, GrpExpr:1]
      |  |--CPhysicalTableScan "foo" ("foo") rows:1 width:38 rebinds:1 cost:431.000021 origin: [Grp:0, GrpExpr:1]
      |  +--CScalarCmp (=) origin: [Grp:6, GrpExpr:0]
      |     |--CScalarIdent "a" (0) origin: [Grp:2, GrpExpr:0]
      |     +--CScalarIdent "b" (1) origin: [Grp:5, GrpExpr:0]
      |--CPhysicalSpool rows:1 width:38 rebinds:1 cost:431.000026 origin: [Grp:1, GrpExpr:3]
      |  +--CPhysicalMotionBroadcast rows:1 width:38 rebinds:1 cost:431.000025 origin: [Grp:1, GrpExpr:2]
      |     +--CPhysicalTableScan "bar" ("bar") rows:1 width:38 rebinds:1 cost:431.000007 origin: [Grp:1, GrpExpr:1]
      +--CScalarConst (1) origin: [Grp:8, GrpExpr:0]
```
However, if `bar` is a large table, ORCA decorrelates the subquery into an inner join
and produces the plan below:
```
Physical plan #2:
+--CPhysicalMotionGather(master) rows:1 width:76 rebinds:1 cost:1324095.808700 origin: [Grp:4, GrpExpr:19]
   +--CPhysicalInnerNLJoin rows:1 width:76 rebinds:1 cost:1324095.808670 origin: [Grp:4, GrpExpr:14]
      |--CPhysicalFilter rows:1 width:38 rebinds:1 cost:431.000092 origin: [Grp:9, GrpExpr:1]
      |  |--CPhysicalTableScan "foo" ("foo") rows:1 width:38 rebinds:1 cost:431.000021 origin: [Grp:0, GrpExpr:1]
      |  +--CScalarCmp (=) origin: [Grp:6, GrpExpr:0]
      |     |--CScalarIdent "a" (0) origin: [Grp:2, GrpExpr:0]
      |     +--CScalarIdent "b" (1) origin: [Grp:5, GrpExpr:0]
      |--CPhysicalSpool rows:1 width:38 rebinds:1 cost:431.062210 origin: [Grp:32, GrpExpr:5]
      |  +--CPhysicalMotionBroadcast rows:1 width:38 rebinds:1 cost:431.062209 origin: [Grp:32, GrpExpr:6]
      |     +--CPhysicalLimit <empty> global rows:1 width:38 rebinds:1 cost:431.062155 origin: [Grp:32, GrpExpr:2]
      |        |--CPhysicalMotionGather(master) rows:1 width:38 rebinds:1 cost:431.062154 origin: [Grp:45, GrpExpr:2]
      |        |  +--CPhysicalLimit <empty> local rows:1 width:38 rebinds:1 cost:431.062150 origin: [Grp:45, GrpExpr:1]
      |        |     |--CPhysicalTableScan "bar" ("bar") rows:8192 width:38 rebinds:1 cost:431.057071 origin: [Grp:1, GrpExpr:1]
      |        |     |--CScalarConst (0) origin: [Grp:15, GrpExpr:0]
      |        |     +--CScalarConst (1) origin: [Grp:31, GrpExpr:0]
      |        |--CScalarConst (0) origin: [Grp:15, GrpExpr:0]
      |        +--CScalarConst (1) origin: [Grp:31, GrpExpr:0]
      +--CScalarConst (1) origin: [Grp:8, GrpExpr:0]
```
The translator successfully translates plan #2. However, it throws an exception while translating
plan #1 into a `subplan`, and the query falls back to the planner.
This PR fixes the translation of plan #1.
We added a check for `COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid`
in `CTranslatorExprToDXL::PdxlnCorrelatedNLJoin`. This function creates a scalar subplan
when a correlated NL join has a true join filter. The existing check handled only the
`CorrelatedInnerNLJoin` case, so we extended it to handle the `CorrelatedInLeftSemiNLJoin` case as well.
The translator now produces a correct subplan and the query no longer falls back:
```
 Gather Motion 3:1 (slice2; segments: 3) (cost=0.00..1324032.10 rows=2 width=8)
   -> Table Scan on foo (cost=0.00..1324032.10 rows=1 width=8)
        Filter: a = a AND ((subplan))
        SubPlan 1
          -> Result (cost=0.00..431.00 rows=1 width=1)
               -> Materialize (cost=0.00..431.00 rows=1 width=1)
                    -> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..431.00 rows=1 width=1)
                         -> Table Scan on bar (cost=0.00..431.00 rows=1 width=1)
 Settings: optimizer=on
 Optimizer status: PQO version 2.36.0
```
1. The plan produced above can be further optimized by inserting a limit
over `bar`. We will file a separate story to handle this.
2. We could not construct a NOT IN repro query with a similar
symptom (a `CPhysicalCorrelatedNotInLeftAntiSemiNLJoin` join with a
constant true filter), so no check has been added for the
`EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin` join type. ORCA always
decorrelates this type of NOT IN query into a `CPhysicalInnerNLJoin` with
the scalar comparison `a <> b`, for example:
`explain select * from foo where foo.a not in (select foo.b from bar)`
Added minidump tests:
1. `CorrelatedIN-LeftSemiJoin-True.mdp`: the repro query shown
above.
2. `CorrelatedIN-LeftSemiNotIn-True.mdp`: this previously caused ORCA to crash
in a DEBUG build. It now produces the correct plan.
[#147893491]
Signed-off-by: NJemish Patel <jpatel@pivotal.io>