1. 19 7月, 2017 4 次提交
    • D
      Fixing fallback in ORCA when we have a Correlated IN query with no · 989415a2
      Dhanashree Kashid and Jemish Patel 提交于
      projections from the inner side.
      
      For the query: `explain select * from foo where foo.a in (select foo.b from bar);`
      ORCA generates 2 diff plans based on the size of `bar`.  If bar is a small table,
      then  ORCA picks the plan below as it is cheaper to broadcast bar.
      
      ```
      Physical plan #1:
      +--CPhysicalMotionGather(master)   rows:1   width:76  rebinds:1   cost:1324032.133055   origin: [Grp:4, GrpExpr:19]
         +--CPhysicalCorrelatedInLeftSemiNLJoin("b" (1))   rows:1   width:76  rebinds:1   cost:1324032.133025   origin: [Grp:4, GrpExpr:18]
            |--CPhysicalFilter   rows:1   width:38  rebinds:1   cost:431.000092   origin: [Grp:9, GrpExpr:1]
            |  |--CPhysicalTableScan "foo" ("foo")   rows:1   width:38  rebinds:1   cost:431.000021   origin: [Grp:0, GrpExpr:1]
            |  +--CScalarCmp (=)   origin: [Grp:6, GrpExpr:0]
            |     |--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
            |     +--CScalarIdent "b" (1)   origin: [Grp:5, GrpExpr:0]
            |--CPhysicalSpool   rows:1   width:38  rebinds:1   cost:431.000026   origin: [Grp:1, GrpExpr:3]
            |  +--CPhysicalMotionBroadcast    rows:1   width:38  rebinds:1   cost:431.000025   origin: [Grp:1, GrpExpr:2]
            |     +--CPhysicalTableScan "bar" ("bar")   rows:1   width:38  rebinds:1   cost:431.000007   origin: [Grp:1, GrpExpr:1]
            +--CScalarConst (1)   origin: [Grp:8, GrpExpr:0]
      ```
      However if bar is a large table, ORCA will decorrelate it into a inner join
      and produces a plan as below:
      
      ```
      Physical plan #2:
      +--CPhysicalMotionGather(master)   rows:1   width:76  rebinds:1   cost:1324095.808700   origin: [Grp:4, GrpExpr:19]
         +--CPhysicalInnerNLJoin   rows:1   width:76  rebinds:1   cost:1324095.808670   origin: [Grp:4, GrpExpr:14]
            |--CPhysicalFilter   rows:1   width:38  rebinds:1   cost:431.000092   origin: [Grp:9, GrpExpr:1]
            |  |--CPhysicalTableScan "foo" ("foo")   rows:1   width:38  rebinds:1   cost:431.000021   origin: [Grp:0, GrpExpr:1]
            |  +--CScalarCmp (=)   origin: [Grp:6, GrpExpr:0]
            |     |--CScalarIdent "a" (0)   origin: [Grp:2, GrpExpr:0]
            |     +--CScalarIdent "b" (1)   origin: [Grp:5, GrpExpr:0]
            |--CPhysicalSpool   rows:1   width:38  rebinds:1   cost:431.062210   origin: [Grp:32, GrpExpr:5]
            |  +--CPhysicalMotionBroadcast    rows:1   width:38  rebinds:1   cost:431.062209   origin: [Grp:32, GrpExpr:6]
            |     +--CPhysicalLimit <empty> global   rows:1   width:38  rebinds:1   cost:431.062155   origin: [Grp:32, GrpExpr:2]
            |        |--CPhysicalMotionGather(master)   rows:1   width:38  rebinds:1   cost:431.062154   origin: [Grp:45, GrpExpr:2]
            |        |  +--CPhysicalLimit <empty> local   rows:1   width:38  rebinds:1   cost:431.062150   origin: [Grp:45, GrpExpr:1]
            |        |     |--CPhysicalTableScan "bar" ("bar")   rows:8192   width:38  rebinds:1   cost:431.057071   origin: [Grp:1, GrpExpr:1]
            |        |     |--CScalarConst (0)   origin: [Grp:15, GrpExpr:0]
            |        |     +--CScalarConst (1)   origin: [Grp:31, GrpExpr:0]
            |        |--CScalarConst (0)   origin: [Grp:15, GrpExpr:0]
            |        +--CScalarConst (1)   origin: [Grp:31, GrpExpr:0]
            +--CScalarConst (1)   origin: [Grp:8, GrpExpr:0]
      ```
      
      Translator successfully translates Plan#2 however, it throws an exception while translating
      plan #1 into a `subplan` and falls back to planner.
      This PR fixes this to handle the translation for plan#1.
      
      We added a check for `COperator::EopPhysicalCorrelatedInLeftSemiNLJoin == eopid`
      in `CTranslatorExprToDXL::PdxlnCorrelatedNLJoin`. This function creates a scalar subplan
      if you have a correlated NL join with a true join filter. The current check only handled
      `CorrelatedInnerNLJoin` case and so we extended it to handle `CorrelatedInLeftSemiNLJoin` case as well.
      
      This produces correct subplan and there is no fallback.
      ```
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..1324032.10 rows=2 width=8)
         ->  Table Scan on foo  (cost=0.00..1324032.10 rows=1 width=8)
               Filter: a = a AND ((subplan))
               SubPlan 1
                 ->  Result  (cost=0.00..431.00 rows=1 width=1)
                       ->  Materialize  (cost=0.00..431.00 rows=1 width=1)
                             ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=1)
                                   ->  Table Scan on bar  (cost=0.00..431.00 rows=1 width=1)
       Settings:  optimizer=on
       Optimizer status: PQO version 2.36.0
      ```
      
      1. The plan produced above can be further optimized by inserting limit
      over `bar`. We will file a separate story to handle this.
      2. We could not generate a repro query with NOT IN with the similar
      symptom as this (-CPhysicalCorrelatedNotInLeftAntiSemiNLJoin join with
      Const true filter); hence no check has been added for
      `EopPhysicalCorrelatedNotInLeftAntiSemiNLJoin` join type. ORCA always
      decorraltes this type of NOT IN query into CPhysicalInnerNLJoin with
      scalar comparison as a <> b:
      `explain select * from foo where foo.a not in (select foo.b from bar)`
      
      Added minidump tests:
      
      1. `CorrelatedIN-LeftSemiJoin-True.mdp` This is the repro query as shown
      above.
      2. `CorrelatedIN-LeftSemiNotIn-True.mdp` This was previously causing a crash with
      ORCA in DEBUG build. Now it produces the correct plan.
      
      [#147893491]
      Signed-off-by: NJemish Patel <jpatel@pivotal.io>
      989415a2
    • B
      Bump ORCA version to 2.38 · e973f96e
      Bhunvesh Chaudhary 提交于
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      e973f96e
    • B
      Implement DXL Representation for VALUESSCAN [#147773843] · 8b95abb0
      Bhunvesh Chaudhary 提交于
      Postgres and thus (GPDB Planner) supports Values via an operator called ValueScan.
      ```
      explain SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter);
                                QUERY PLAN
      --------------------------------------------------------------
       Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=1 width=36)
       Optimizer status: legacy query optimizer
      (2 rows)
      ```
      However, inside Orca we expand each row in the Values list into a
      Result node that projects constants.
      Thus the above query with three rows having 2 columns is
      represented as a plan by GPORCA as an append with three Result nodes.
      Each of the Result nodes is a CTG with project elements.
      ```
      explain SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter);
                         QUERY PLAN
      -------------------------------------------------
       Append  (cost=0.00..0.00 rows=1 width=12)
         ->  Result  (cost=0.00..0.00 rows=1 width=12)
         ->  Result  (cost=0.00..0.00 rows=1 width=12)
         ->  Result  (cost=0.00..0.00 rows=1 width=12)
       Settings:  optimizer=on
       Optimizer status: PQO version 2.32.0
      (6 rows)
      ```
      
      This commit introduces a new value scan operator and instead of
      generating multiple result node, ORCA will now generate a value scan
      node. The resulting plan will look like:
      
      ```
                                                QUERY PLAN
      ----------------------------------------------------------------------------------------------
       Values Scan on "Values"  (cost=0.00..0.44 rows=37000 width=4)
       Optimizer status: PQO version 2.37.0
      (2 rows)
      ```
      
      This enhancement bring in significant performance improvement in total
      runtime of the queries involving high number of constant values.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      8b95abb0
    • B
      Add OSPrint for Const Table Get · 2c4e0a9d
      Bhunvesh Chaudhary 提交于
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      2c4e0a9d
  2. 15 7月, 2017 1 次提交
    • B
      Remove part oid (#186) · 2c139497
      Bhuvnesh 提交于
      * Donot generated PartOid Expression
      
      In GPDB, PartOidExpr is not used however ORCA still generates it.
      But HAWQ uses PartOid for sorting while inserting into Append Only
      Row / Parquet Partitioned tables.
      
      This patch uses Parquet Storage and Number of Partitions in a Append
      Only row partitioned table to decide if PartOid should be generated.
      In case of GPDB, Parquet storage is not supported and the GUC to control
      the number of partitions above which sort should be used is set to int
      max which is practically not feasible, so in case of GPDB PartOid expr
      will never be generated, however HAWQ can control the generation of
      PartOid based on the value of already existing GUCs in HAWQ.
      
      * Remove PartOid ProjElem from minidump files
      
      * Fixed CICGTest
      
      * Fix CDMLTest
      
      * Fix CDirectDispatchTest
      
      * Fix CPhysicalParallelUnionAllTest
      
      * Fix CCollapseProjectTest test
      
      * Fix parser for Partition Selector
      
      A Partition Selector node can have another partition selector node as
      its immediate child. In such cases, the current parsers fails. The patch
      fixes the issue
      
      * Fix PartTbl Test
      
      * PR Feedback Applied
      
      * Applied HSY feedback 1
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      
      * Bump ORCA to 2.37
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      2c139497
  3. 11 7月, 2017 4 次提交
    • V
      Update ORCA version · 73a7bffd
      Venkatesh Raghavan 提交于
      73a7bffd
    • V
      Convert Non-correlated EXISTS subquery to a LIMIT 1 AND a JOIN · e04ae39d
      Venkatesh Raghavan 提交于
      Enable GPORCA to generate better plans for non-correlated exists subquery in the WHERE clause
      
      Consider the following exists subquery, `(select * from bar)`. GPORCA generates an elaborate count based implementation of this subquery. If bar is a fact table, the count is going to be expensive.
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                          QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.79 rows=400324 width=8)
         ->  Nested Loop  (cost=0.00..1368250.86 rows=133442 width=8)
               Join Filter: true
               ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                     Filter: a = b
               ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                     ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                           ->  Result  (cost=0.00..438.57 rows=1 width=1)
                                 Filter: (count((count()))) > 0::bigint
                                 ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                       ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=8)
                                             ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                                   ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
       Optimizer status: PQO version 2.35.1
      (14 rows)
      ```
      Planner on the other hand uses LIMIT as shown in the INIT plan.
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                 QUERY PLAN
      ------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.03..13611.14 rows=1001 width=8)
         ->  Result  (cost=0.03..13611.14 rows=334 width=8)
               One-Time Filter: $0
               InitPlan  (slice3)
                 ->  Limit  (cost=0.00..0.03 rows=1 width=0)
                       ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..0.03 rows=1 width=0)
                             ->  Limit  (cost=0.00..0.01 rows=1 width=0)
                                   ->  Seq Scan on bar  (cost=0.00..11072.84 rows=332395 width=0)
               ->  Seq Scan on foo  (cost=0.00..13611.11 rows=334 width=8)
                     Filter: a = b
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (12 rows)
      ```
      
      While GPORCA doesnot support init-plan, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA will generate the following plan with LIMIT clause.
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                       QUERY PLAN
      ------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.73 rows=400324 width=8)
         ->  Nested Loop EXISTS Join  (cost=0.00..1368250.80 rows=133442 width=8)
               Join Filter: true
               ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                     Filter: a = b
               ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                     ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                           ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=1)
                                       ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                             ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
       Optimizer status: PQO version 2.35.1
      (12 rows)
      ```
      e04ae39d
    • B
      Bump ORCA version to 2.35.2 · 4ad9ce70
      Bhunvesh Chaudhary 提交于
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      4ad9ce70
    • B
      Check existing tag before publishing new artifacts · 9ff85c29
      Bhunvesh Chaudhary 提交于
      Each ORCA commit must BUMP the version. If the version is not
      bumped new releases will not be pushed to the ORCA repository.
      This commit adds the check to validate the version of the
      current commit with the tag version existing on the repository.
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      9ff85c29
  4. 06 7月, 2017 3 次提交
    • O
      Fix the behaviour of FBetterThan for Random vs Hashed · bbbbd699
      Omer Arap 提交于
      There is a tie breaker logic in FBetterThan function in CCostContext.
      According to the code if the distribution is Hashed rather than Random,
      it should be favored when the costs are equal.
      
      The code was checking both if the distribution spec in the same context
      is both equal to Hashed and Random which is false by default. It should
      check if comparing one is Hashed and compared one is Random for correct
      behavior.
      bbbbd699
    • B
      Bump ORCA version to 2.35.0 · 900a586f
      Bhuvnesh Chaudhary 提交于
      There are changes to regress tests in GPDB repository, so bumping up the
      minor version.
      900a586f
    • B
      Preprocess query to ensure Scalar Ident is on LHS · aa6754d1
      Bhuvnesh Chaudhary 提交于
      CScalarCmp and CScalarIsDistinctFrom operator must have the CScalarIdent operator
      on the LHS and CScalarConst operator on the RHS. If it's
      not the case the predicate is assigned a default selectivity
      due to which the Cardinality estimate is impacted.
      
      This patch fixes the issue by reordering the children of
      CScalarCmp and CScalarIsDistinctFrom operator if CScalarIdent operator is on
      the RHS on CScalarConst is on LHS. Also the Comparision operator is changed
      due to the reordering of the arguments, if supported. If the
      corresponding comparision operator does not exist or supported, the
      children are not reordered.
      Only cases of the type CONST = VAR are handled by this patch.
      Signed-off-by: NOmer Arap <oarap@pivotal.io>
      aa6754d1
  5. 01 7月, 2017 1 次提交
  6. 30 6月, 2017 2 次提交
    • J
      Enable Concourse caching of ccache · b46380ee
      Jesse Zhang 提交于
      Shiny new feature in Concourse 3.3.0
      (https://concourse.ci/running-tasks.html#caches)
      
      [ci skip]
      b46380ee
    • H
      Get rid of UlSafeLength() function. · 25c2b4dd
      Heikki Linnakangas 提交于
      It made the assumption that it's OK to call it on a NULL pointer, which
      isn't cool with all C++ compilers and options. I'm getting a bunch of
      warnings like this because of it:
      
      /home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.inl:382:3: warning: nonnull argument ‘this’ compared to NULL [-Wnonnull-compare]
         if (NULL == this)
         ^~
      
      There are a few other places that produce the same error, but one step at
      a time. This is most important because it's in an inline function, so this
      produces warnings also in any code that uses ORCA, like the translator code
      in GPDB's src/backend/gpopt/ directory, not just ORCA itself.
      
      Since the function is now gone, all references to it also need to be removed
      from the translator code outside ORCA.
      
      Bump up Orca version to 2.34.2
      25c2b4dd
  7. 28 6月, 2017 2 次提交
  8. 21 6月, 2017 1 次提交
  9. 20 6月, 2017 2 次提交
    • O
      Remove .inl files and merge implementation in .h · 80305df7
      Omer Arap 提交于
      The template classes in orca mostly do the implementation in
      `.inl` files while some implementation also exists in `.h` files.
      It makes it hard to traverse the code in .inl files since some
      IDEs do not recognize formatting. Therefore this commit moves
      the implementation to the `.h` files wherever is applicable.
      
      This commit does not port implementation from `.inl` where
      there exists `.cpp` implementation file as well as `.h` only has
      function definitions such as `CUtils.h`.
      80305df7
    • V
      Update Join Cardinality Estimation for Text/bpchar/varchar/char columns · 6567f566
      Venkatesh Raghavan 提交于
      Histogram intersection depends on the value of the bucket boundaries.
      For datatypes like text, varchar, etc. Orca currently uses a hash function
      to mark bucket boundaries. This function is slightly useful for equality
      with singleton buckets but nothing more. So the previous join stats computation
      based on histogram intersection is totally bogus. In this CL, we now modified
      it into a NDV (number of distinct values) based estimation.
      6567f566
  10. 10 6月, 2017 1 次提交
  11. 08 6月, 2017 1 次提交
  12. 18 5月, 2017 1 次提交
  13. 16 5月, 2017 2 次提交
  14. 15 5月, 2017 2 次提交
  15. 12 5月, 2017 1 次提交
  16. 09 5月, 2017 6 次提交
    • D
      Bump ORCA version to 2.30 · a4f89e1d
      Dhanashree Kashid 提交于
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      a4f89e1d
    • J
      Retain additional conditions while translating Correlated Nested Loop Join [#144468913] · 8a0552f0
      Jemish Patel and Jesse Zhang 提交于
      For a query using a correlated subselect as predicate, such as:
      
      ```
      CREATE TABLE partitioned_table (a int, pk int) DISTRIBUTED BY (a)
      	PARTITION BY range(pk) (end(5), end(10));
      CREATE TABLE other_table (c int, d int) DISTRIBUTED BY (c);
      INSERT INTO partitioned_table VALUES (1, 1), (2, 4), (3, 9);
      ANALYZE ROOTPARTITION partitioned_table;
      
      EXPLAIN SELECT pk from partitioned_table WHERE a > (SELECT d FROM other_table WHERE c = a) AND pk < 12;
      ```
      
      ORCA will generate a Correlated Nested Loop Left Outer Join, which
      should translate to a DXL Scan under a DXL SubPlan filter. However, the
      translation happened in the following order:
      
      0. Translate outer child (which has a filter) of correlated NLJ
      0. Build a DXL SubPlan using the inner child, which is intended to serve as an
         "additional filter condition" on top of the outer child
      0. Now based on the DXL plan for the outer child, we decide whether or not to
         use the additional condition (generated in #2 or #3) as a filter in the
         final result.
      0. If the outer child is a Physical Sequence, we discarded the condition
         assuming that filter condition is already present in the partition
         selector.
      0. This code to discard the subplan was added in
         e99325cc because, previously, we were always
         inserting additional filter as "the second child" based on the assumption
         that every DXL node has a filter child in the 2nd place. As it turned out,
         the DXL Sequence node is one counterexample: it has no filter, and it's
         second child is expected to be a partition selector.
      0. We didn't catch this error in e99325cc because the test cases had a trivial
         additional filter condition of a constant true, so dropping it didn't really
         raise any eyebrows. However, the generally correct approach should be to
         retain this additional condition, either as an additional DXL Result node on
         top of the DXL for the outer child, or for appropriate types of nodes,
         inline the condition into existing filters.
      
      This patch set fixes that.
      Signed-off-by: NJemish Patel <jpatel@pivotal.io>
      Signed-off-by: NEkta Khanna <ekhanna@pivotal.io>
      8a0552f0
    • J
      More consistent ref-counting [#144468913] · b2fdaf9b
      Jemish Patel and Jesse Zhang 提交于
      We ensure all three cases of PdxlnCorrelatedNLJoin take ownership of the
      DXLProperties object
      b2fdaf9b
    • J
      54565e06
    • J
    • D
      Move predicate combination into PdxlnResultFromNLJoinOuter [#144468913] · ce347b4a
      0. There are cases where the additional scalar condition cannot be combined
         with the original condition in the DXL plan. Case in point: when the outer
         child gets translated into a DXL Sequence, we cannot put the "combined
         condition" into the sequence.
      0. Deferring the combination of conditions also gives us an optimization
         opportunity to reduce double translations.
      ce347b4a
  17. 30 4月, 2017 1 次提交
    • J
      Extend GPDB ICG with optimizer to an hour · c940456a
      Jesse Zhang 提交于
      installcheck has over the last year gotten slowly bloated. This test
      runs about 32 minutes when nothing else happens in Concourse. Given we
      also run the planner ICG in parallel, we'd better err on the safe side
      and extend this to an hour.
      
      [ci skip]
      c940456a
  18. 26 4月, 2017 5 次提交