Commit 989415a2 · Authors: Dhanashree Kashid and Jemish Patel · Committer: Jemish Patel

Fixing fallback in ORCA when we have a Correlated IN query with no projections from the inner side.

For the query `explain select * from foo where foo.a in (select foo.b from bar);`
ORCA generates two different plans depending on the size of `bar`. If `bar` is a small table,
ORCA picks the plan below because it is cheaper to broadcast `bar`.

```
Physical plan #1:
+--CPhysicalMotionGather(master)   rows:1   width:76  rebinds:1   cost:1324032.133055   origin: [Grp:4, GrpExpr:19]
   +--CPhysicalCorrelatedInLeftSemiNLJoin("b" (1))   rows:1   width:76  rebinds:1   cost:1324032.133025   origin: [Grp:4, GrpExpr:18]
      |--CPhysicalFilter   rows:1   width:38  rebinds:1   cost:431.000092   origin: [Grp:9, GrpExpr:1]
      |  |--CPhysicalTableScan "foo" ("foo")   rows:1   width:38  rebinds:1   cost:431.000021   origin: [Grp:0, GrpExpr:1]
      |  +--CScalarCmp (=)   origin: [Grp:6, GrpExpr:0]
      |     |--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
      |     +--CScalarIdent "b" (1)   origin: [Grp:5, GrpExpr:0]
      |--CPhysicalSpool   rows:1   width:38  rebinds:1   cost:431.000026   origin: [Grp:1, GrpExpr:3]
      |  +--CPhysicalMotionBroadcast    rows:1   width:38  rebinds:1   cost:431.000025   origin: [Grp:1, GrpExpr:2]
      |     +--CPhysicalTableScan "bar" ("bar")   rows:1   width:38  rebinds:1   cost:431.000007   origin: [Grp:1, GrpExpr:1]
      +--CScalarConst (1)   origin: [Grp:8, GrpExpr:0]
```
However, if `bar` is a large table, ORCA decorrelates the query into an inner join
and produces the plan below:

```
Physical plan #2:
+--CPhysicalMotionGather(master)   rows:1   width:76  rebinds:1   cost:1324095.808700   origin: [Grp:4, GrpExpr:19]
   +--CPhysicalInnerNLJoin   rows:1   width:76  rebinds:1   cost:1324095.808670   origin: [Grp:4, GrpExpr:14]
      |--CPhysicalFilter   rows:1   width:38  rebinds:1   cost:431.000092   origin: [Grp:9, GrpExpr:1]
      |  |--CPhysicalTableScan "foo" ("foo")   rows:1   width:38  rebinds:1   cost:431.000021   origin: [Grp:0, GrpExpr:1]
      |  +--CScalarCmp (=)   origin: [Grp:6, GrpExpr:0]
      |     |--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
      |     +--CScalarIdent "b" (1)   origin: [Grp:5, GrpExpr:0]
      |--CPhysicalSpool   rows:1   width:38  rebinds:1   cost:431.062210   origin: [Grp:32, GrpExpr:5]
      |  +--CPhysicalMotionBroadcast    rows:1   width:38  rebinds:1   cost:431.062209   origin: [Grp:32, GrpExpr:6]
      |     +--CPhysicalLimit <empty> global   rows:1   width:38  rebinds:1   cost:431.062155   origin: [Grp:32, GrpExpr:2]
      |        |--CPhysicalMotionGather(master)   rows:1   width:38  rebinds:1   cost:431.062154   origin: [Grp:45, GrpExpr:2]
      |        |  +--CPhysicalLimit <empty> local   rows:1   width:38  rebinds:1   cost:431.062150   origin: [Grp:45, GrpExpr:1]
      |        |     |--CPhysicalTableScan "bar" ("bar")   rows:8192   width:38  rebinds:1   cost:431.057071   origin: [Grp:1, GrpExpr:1]
      |        |     |--CScalarConst (0)   origin: [Grp:15, GrpExpr:0]
      |        |     +--CScalarConst (1)   origin: [Grp:31, GrpExpr:0]
      |        |--CScalarConst (0)   origin: [Grp:15, GrpExpr:0]
      |        +--CScalarConst (1)   origin: [Grp:31, GrpExpr:0]
      +--CScalarConst (1)   origin: [Grp:8, GrpExpr:0]
```

The translator successfully translates plan #2. However, it throws an exception while translating
plan #1 into a `subplan`, and the query falls back to the planner.
This PR fixes the translation of plan #1.
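
The fallback described above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `PlanShape`, `TranslateToDXL`, and `PlanQuery` are hypothetical names invented for illustration and are not ORCA or GPDB APIs. It shows only the control flow of "translation throws, so the planner is used instead".

```cpp
#include <stdexcept>
#include <string>

// Hypothetical plan shapes; only the operator names come from the plans above.
enum class PlanShape
{
	CorrelatedInLeftSemiNLJoin,	 // plan #1
	InnerNLJoin					 // plan #2
};

// Hypothetical translator: throws when asked to translate a plan shape it
// does not support. `semiJoinSupported` models whether the fix is present.
std::string
TranslateToDXL(PlanShape shape, bool semiJoinSupported)
{
	if (PlanShape::CorrelatedInLeftSemiNLJoin == shape && !semiJoinSupported)
	{
		throw std::runtime_error("unsupported correlated join translation");
	}
	return "dxl";
}

// Hypothetical driver: reports which optimizer ends up producing the plan.
// Any translation failure is treated as a fallback to the planner.
std::string
PlanQuery(PlanShape shape, bool semiJoinSupported)
{
	try
	{
		TranslateToDXL(shape, semiJoinSupported);
		return "ORCA";
	}
	catch (const std::exception &)
	{
		return "planner";
	}
}
```

In this model, `PlanQuery(PlanShape::CorrelatedInLeftSemiNLJoin, false)` reports the planner, while the same call with `true` reports ORCA. That is the behavior change this commit aims for on plan #1.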

We added a check for `COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid`
in `CTranslatorExprToDXL::PdxlnCorrelatedNLJoin`. This function creates a scalar subplan
when a correlated NL join has a constant-true join filter. The previous check handled only the
`CorrelatedInnerNLJoin` case, so we extended it to handle the `CorrelatedInLeftSemiNLJoin` case as well.
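
The extended condition can be isolated as a standalone predicate. The following is a minimal sketch, not the code in `PdxlnCorrelatedNLJoin`: the `JoinOp` enum and `ShouldBuildScalarSubplan` are hypothetical stand-ins for `COperator::EOperatorId` and the inline `if` condition, and the boolean parameter stands in for `CUtils::FScalarConstTrue(pexprScalar)`.

```cpp
#include <cassert>

// Hypothetical stand-in for COperator::EOperatorId; only the operator names
// are taken from the commit message, the enum itself is illustrative.
enum class JoinOp
{
	CorrelatedInnerNLJoin,
	CorrelatedInLeftSemiNLJoin,
	CorrelatedNotInLeftAntiSemiNLJoin
};

// Returns true when a scalar subplan should be built from the inner child:
// the join filter is a constant TRUE and the operator is one of the two
// correlated joins accepted after this commit.
bool
ShouldBuildScalarSubplan(bool joinFilterIsConstTrue, JoinOp op)
{
	return joinFilterIsConstTrue &&
		   (JoinOp::CorrelatedInnerNLJoin == op ||
			JoinOp::CorrelatedInLeftSemiNLJoin == op);
}
```

With this predicate, `ShouldBuildScalarSubplan(true, JoinOp::CorrelatedInLeftSemiNLJoin)` is true, whereas the pre-commit check would have rejected that operator. The NOT IN anti-semi-join operator remains excluded, as the commit message explains further down.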

This produces the correct subplan, and there is no fallback:
```
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..1324032.10 rows=2 width=8)
   ->  Table Scan on foo  (cost=0.00..1324032.10 rows=1 width=8)
         Filter: a = a AND ((subplan))
         SubPlan 1
           ->  Result  (cost=0.00..431.00 rows=1 width=1)
                 ->  Materialize  (cost=0.00..431.00 rows=1 width=1)
                       ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=1)
                             ->  Table Scan on bar  (cost=0.00..431.00 rows=1 width=1)
 Settings:  optimizer=on
 Optimizer status: PQO version 2.36.0
```

1. The plan produced above can be further optimized by inserting a limit
over `bar`. We will file a separate story to handle this.
2. We could not construct a repro query with NOT IN that shows the same
symptom (a `CPhysicalCorrelatedNotInLeftAntiSemiNLJoin` with a
constant-true filter); hence no check has been added for the
`EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin` join type. ORCA always
decorrelates this type of NOT IN query into a `CPhysicalInnerNLJoin` with
the scalar comparison `a <> b`:
`explain select * from foo where foo.a not in (select foo.b from bar)`

Added minidump tests:

1. `CorrelatedIN-LeftSemiJoin-True.mdp`: the repro query shown
above.
2. `CorrelatedIN-LeftSemiNotIn-True.mdp`: this previously caused a crash in
ORCA DEBUG builds. It now produces the correct plan.

[#147893491]
Signed-off-by: Jemish Patel <jpatel@pivotal.io>
Parent e973f96e
......@@ -3251,8 +3251,16 @@ CTranslatorExprToDXL::PdxlnCorrelatedNLJoin
COperator::EOperatorId eopid = pexpr->Pop()->Eopid();
CDXLNode *pdxlnCond = NULL;
// Create a subplan with a Boolean from the inner child if we have a Const True as a join condition.
// One scenario for this is when IN sublinks contain a projection from the outer table only such as:
// select * from foo where foo.a in (select foo.b from bar);
// If bar is a very small table, ORCA generates a CorrelatedInLeftSemiNLJoin with a Const true join filter
// and condition foo.a = foo.b is added as a filter on the table scan of foo. If bar is a large table,
// ORCA generates a plan with CorrelatedInnerNLJoin with a Const true join filter and a LIMIT over the
// scan of bar. The same foo.a = foo.b condition is also added as a filter on the table scan of foo.
if (CUtils::FScalarConstTrue(pexprScalar) &&
-COperator::EopPhysicalCorrelatedInnerNLJoin == eopid)
+(COperator::EopPhysicalCorrelatedInnerNLJoin == eopid || COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid))
{
// translate relational inner child expression
CDXLNode *pdxlnInnerChild = Pdxln
......
......@@ -82,6 +82,8 @@ const CHAR *rgszFileNames[] =
"../data/dxl/minidump/Correlated-LASJ-With-Outer-Expr.mdp",
"../data/dxl/minidump/Correlated-SemiJoin.mdp",
"../data/dxl/minidump/CorrelatedSemiJoin-True.mdp",
+"../data/dxl/minidump/CorrelatedIN-LeftSemiJoin-True.mdp",
+"../data/dxl/minidump/CorrelatedIN-LeftSemiNotIn-True.mdp",
"../data/dxl/minidump/Correlated-AntiSemiJoin.mdp",
"../data/dxl/minidump/CorrelatedAntiSemiJoin-True.mdp",
"../data/dxl/minidump/Correlation-With-Casting-1.mdp",
......