1. 21 Jul, 2017 (1 commit)
    • Cut down the runtime for CCostTest · 3a586479
      Haisheng Yuan authored
      Most of the time is spent running the 4 minidumps in the CalibratedCostModel
      test. The 4 minidumps are already covered by other test suites; moreover, the
      test just executes each minidump and doesn't even compare the generated plan.
      
      CalibratedCostModel is the default in GPDB right now, and most minidumps use
      the calibrated cost model, so they exercise CalibratedCostModel anyway.
      Remove EresUnittest_CalibratedCostModel, cutting the debug runtime of
      CCostTest from 814829 ms to 458 ms.
  2. 19 Jul, 2017 (2 commits)
  3. 15 Jul, 2017 (1 commit)
    • Remove part oid (#186) · 2c139497
      Bhuvnesh authored
      * Do not generate PartOid expression
      
      In GPDB, PartOidExpr is not used, yet ORCA still generates it. HAWQ, however,
      uses PartOid for sorting while inserting into Append Only Row / Parquet
      partitioned tables.
      
      This patch uses the storage type (Parquet) and the number of partitions of an
      Append Only row partitioned table to decide whether PartOid should be
      generated. GPDB does not support Parquet storage, and the GUC controlling the
      number of partitions above which a sort should be used is set to INT_MAX, a
      threshold that is never reached in practice; so in GPDB the PartOid
      expression is never generated, while HAWQ can still control its generation
      through its existing GUCs.
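      The decision described above can be sketched as a single predicate. This is a
      hypothetical illustration (the function name, parameters, and shape are not
      ORCA's actual API):

      ```cpp
      #include <cassert>
      #include <climits>

      // Hypothetical sketch: generate PartOid only for Parquet storage, or when
      // the partition count of an append-only row table exceeds the sort
      // threshold GUC. (Illustrative names, not ORCA's real interface.)
      bool FGeneratePartOid(bool fParquetStorage, int numPartitions, int sortThresholdGuc)
      {
          return fParquetStorage || numPartitions > sortThresholdGuc;
      }

      int main()
      {
          // GPDB: no Parquet storage and the GUC is INT_MAX, so never generated.
          assert(!FGeneratePartOid(false, 1000, INT_MAX));
          // HAWQ: Parquet storage, or many partitions, triggers PartOid generation.
          assert(FGeneratePartOid(true, 10, 100));
          assert(FGeneratePartOid(false, 200, 100));
          return 0;
      }
      ```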
      
      * Remove PartOid ProjElem from minidump files
      
      * Fixed CICGTest
      
      * Fix CDMLTest
      
      * Fix CDirectDispatchTest
      
      * Fix CPhysicalParallelUnionAllTest
      
      * Fix CCollapseProjectTest test
      
      * Fix parser for Partition Selector
      
      A Partition Selector node can have another Partition Selector node as its
      immediate child. In such cases, the current parser fails. This patch fixes
      the issue.
      
      * Fix PartTbl Test
      
      * PR Feedback Applied
      
      * Applied HSY feedback 1
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      
      * Bump ORCA to 2.37
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  4. 11 Jul, 2017 (3 commits)
    • Update ORCA version · 73a7bffd
      Venkatesh Raghavan authored
    • Convert Non-correlated EXISTS subquery to a LIMIT 1 AND a JOIN · e04ae39d
      Venkatesh Raghavan authored
      Enable GPORCA to generate better plans for non-correlated EXISTS subqueries in the WHERE clause.
      
      Consider the following EXISTS subquery: `(select * from bar)`. GPORCA generates an elaborate count-based implementation of this subquery. If bar is a fact table, the count is going to be expensive.
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                          QUERY PLAN
      ------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.79 rows=400324 width=8)
         ->  Nested Loop  (cost=0.00..1368250.86 rows=133442 width=8)
               Join Filter: true
               ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                     Filter: a = b
               ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                     ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                           ->  Result  (cost=0.00..438.57 rows=1 width=1)
                                 Filter: (count((count()))) > 0::bigint
                                 ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                       ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=8)
                                             ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                                   ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
       Optimizer status: PQO version 2.35.1
      (14 rows)
      ```
      The planner, on the other hand, uses LIMIT, as shown in the InitPlan:
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                 QUERY PLAN
      ------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.03..13611.14 rows=1001 width=8)
         ->  Result  (cost=0.03..13611.14 rows=334 width=8)
               One-Time Filter: $0
               InitPlan  (slice3)
                 ->  Limit  (cost=0.00..0.03 rows=1 width=0)
                       ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..0.03 rows=1 width=0)
                             ->  Limit  (cost=0.00..0.01 rows=1 width=0)
                                   ->  Seq Scan on bar  (cost=0.00..11072.84 rows=332395 width=0)
               ->  Seq Scan on foo  (cost=0.00..13611.11 rows=334 width=8)
                     Filter: a = b
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (12 rows)
      ```
      
      While GPORCA does not support init-plans, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA generates the following plan with a LIMIT clause.
      
      ```
      vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                       QUERY PLAN
      ------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.73 rows=400324 width=8)
         ->  Nested Loop EXISTS Join  (cost=0.00..1368250.80 rows=133442 width=8)
               Join Filter: true
               ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
                     Filter: a = b
               ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
                     ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                           ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=1)
                                       ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                             ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
       Optimizer status: PQO version 2.35.1
      (12 rows)
      ```
    • Bump ORCA version to 2.35.2 · 4ad9ce70
      Bhuvnesh Chaudhary authored
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  5. 06 Jul, 2017 (2 commits)
    • Fix the behaviour of FBetterThan for Random vs Hashed · bbbbd699
      Omer Arap authored
      FBetterThan in CCostContext contains tie-breaker logic: when costs are
      equal, a Hashed distribution should be favored over a Random one.
      
      The code, however, checked whether the distribution spec of the same context
      was equal to both Hashed and Random, which is always false. For correct
      behavior, it should check whether this context's spec is Hashed and the
      compared context's spec is Random.
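      The corrected tie-breaker can be sketched as follows. This is a minimal,
      hypothetical illustration (the enum and function names are not ORCA's actual
      API):

      ```cpp
      #include <cassert>

      // Hypothetical distribution kinds, standing in for ORCA's distribution specs.
      enum class EDistKind { Hashed, Random, Other };

      // Tie-breaker when costs are equal: prefer Hashed over Random.
      // The bug was effectively `mine == Hashed && mine == Random` (always false);
      // the fix compares this context's spec against the other context's spec.
      bool FBetterThanOnTie(EDistKind mine, EDistKind other)
      {
          return mine == EDistKind::Hashed && other == EDistKind::Random;
      }

      int main()
      {
          assert(FBetterThanOnTie(EDistKind::Hashed, EDistKind::Random));
          assert(!FBetterThanOnTie(EDistKind::Random, EDistKind::Hashed));
          assert(!FBetterThanOnTie(EDistKind::Hashed, EDistKind::Hashed));
          return 0;
      }
      ```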
    • Bump ORCA version to 2.35.0 · 900a586f
      Bhuvnesh Chaudhary authored
      There are changes to the regress tests in the GPDB repository, so bump the
      minor version.
  6. 01 Jul, 2017 (1 commit)
  7. 30 Jun, 2017 (1 commit)
    • Get rid of UlSafeLength() function. · 25c2b4dd
      Heikki Linnakangas authored
      It assumed that calling it on a NULL pointer is OK, which isn't acceptable
      to all C++ compilers and options. I'm getting a bunch of warnings like this
      because of it:
      
      /home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.inl:382:3: warning: nonnull argument ‘this’ compared to NULL [-Wnonnull-compare]
         if (NULL == this)
         ^~
      
      There are a few other places that produce the same warning, but one step at
      a time. This one matters most because it's in an inline function, so it also
      produces warnings in any code that uses ORCA, like the translator code in
      GPDB's src/backend/gpopt/ directory, not just ORCA itself.
      
      Since the function is now gone, all references to it also need to be removed
      from the translator code outside ORCA.
      
      Bump up Orca version to 2.34.2
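      The underlying issue is that calling a member function through a null
      pointer is undefined behavior in C++, so compilers may assume `this` is
      non-null and warn about (or optimize away) the `NULL == this` check. A
      hedged sketch of the well-defined alternative pattern, with hypothetical
      types (ORCA instead removed the function and updated its callers):

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical stand-in for a dynamic array class.
      template <typename T>
      struct DynamicArray
      {
          std::size_t size;
          std::size_t UlLength() const { return size; }
      };

      // The null check is done outside the object, through a free function,
      // where comparing the pointer against nullptr is well-defined.
      template <typename T>
      std::size_t SafeLength(const DynamicArray<T> *arr)
      {
          return (arr == nullptr) ? 0 : arr->UlLength();
      }

      int main()
      {
          DynamicArray<int> a{3};
          assert(SafeLength(&a) == 3);
          assert(SafeLength<int>(nullptr) == 0);
          return 0;
      }
      ```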
  8. 28 Jun, 2017 (1 commit)
  9. 20 Jun, 2017 (2 commits)
    • Remove .inl files and merge implementation into .h · 80305df7
      Omer Arap authored
      ORCA's template classes mostly keep their implementation in `.inl` files,
      while some implementation also lives in `.h` files. This makes the code hard
      to navigate, since some IDEs do not recognize the `.inl` formatting. This
      commit therefore moves the implementation into the `.h` files wherever
      applicable.
      
      It does not port implementations from `.inl` files where a `.cpp`
      implementation file already exists and the `.h` file only has function
      declarations, such as `CUtils.h`.
    • Update Join Cardinality Estimation for Text/bpchar/varchar/char columns · 6567f566
      Venkatesh Raghavan authored
      Histogram intersection depends on the values of the bucket boundaries. For
      datatypes like text, varchar, etc., ORCA currently uses a hash function to
      mark bucket boundaries. This is marginally useful for equality with
      singleton buckets but nothing more, so the previous join stats computation
      based on histogram intersection was totally bogus. This change replaces it
      with an NDV (number of distinct values) based estimation.
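      The NDV-based estimate can be sketched with the classic equi-join
      cardinality formula. This is an illustrative sketch of the idea, not ORCA's
      implementation:

      ```cpp
      #include <algorithm>
      #include <cassert>

      // Classic NDV-based equi-join cardinality estimate:
      // |R join S| ~= |R| * |S| / max(NDV(R.col), NDV(S.col)).
      double JoinCardinalityNDV(double rows1, double ndv1, double rows2, double ndv2)
      {
          return rows1 * rows2 / std::max(ndv1, ndv2);
      }

      int main()
      {
          // 1000 rows with 100 distinct values joined to 500 rows with 50
          // distinct values: 1000 * 500 / 100 = 5000 estimated output rows.
          assert(JoinCardinalityNDV(1000, 100, 500, 50) == 5000);
          return 0;
      }
      ```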
  10. 18 May, 2017 (1 commit)
  11. 15 May, 2017 (1 commit)
  12. 09 May, 2017 (1 commit)
  13. 26 Apr, 2017 (2 commits)
  14. 22 Apr, 2017 (3 commits)
  15. 21 Apr, 2017 (1 commit)
  16. 13 Apr, 2017 (1 commit)
  17. 12 Apr, 2017 (2 commits)
  18. 11 Apr, 2017 (1 commit)
  19. 05 Apr, 2017 (1 commit)
  20. 04 Apr, 2017 (1 commit)
    • Remove dead code about SharedScan · c6838425
      Haisheng Yuan authored
      While trying to understand how ORCA generates plans for CTEs using shared
      input scans, I found that the share input scan is generated during the CTE
      producer & consumer DXL node to PlannedStmt translation stage, not in the
      Expr to DXL stage inside ORCA. It turns out CDXLPhysicalSharedScan is not
      used anywhere, so remove all the related dead code.
  21. 01 Apr, 2017 (1 commit)
    • Refactor Expr to DXL translator for list partition selector predicates · cdc45d92
      Haisheng Yuan authored
      Previously, when the final plan was translated into DXL, ORCA used range
      predicates to represent the values of a list partition (with range start and
      end values equal to a single list value), which is error-prone and redundant
      for the query executor.
      
      In this patch, we translate the predicates of a list partition selector in
      DXL as follows (range-based partitions remain unchanged):
      1. PK op Scalar -> Scalar reverse(op) Any(PartListValues)
         For example, the partition selector predicate `pk1 < 5` is translated to
         `ArrayComp(5 > Any(PartListValues))`, meaning the partition is selected
         as long as any of the list partition values is less than 5.
      2. PK is (not) NULL -> PartListNullTest
      3. Propagation Expression ->
         Const1 = Any(PartListValues) or Const2 = Any(PartListValues) ...
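      Rule 1 mirrors the comparison operator because the scalar and the partition
      key swap sides. A small sketch of that operator reversal (hypothetical,
      string-based; not ORCA's actual representation):

      ```cpp
      #include <cassert>
      #include <string>

      // Sketch of rule 1: `PK op Scalar` becomes `Scalar reverse(op) Any(values)`,
      // so the comparison operator is mirrored when the operands swap sides.
      std::string ReverseOp(const std::string &op)
      {
          if (op == "<")  return ">";
          if (op == ">")  return "<";
          if (op == "<=") return ">=";
          if (op == ">=") return "<=";
          return op; // "=" and "<>" are their own mirrors
      }

      int main()
      {
          // `pk1 < 5` -> `5 > Any(PartListValues)`
          assert(ReverseOp("<") == ">");
          assert(ReverseOp(">=") == "<=");
          assert(ReverseOp("=") == "=");
          return 0;
      }
      ```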
      
      [#140699737]
      Closes #149
  22. 31 Mar, 2017 (2 commits)
  23. 29 Mar, 2017 (1 commit)
  24. 21 Mar, 2017 (1 commit)
  25. 17 Mar, 2017 (1 commit)
    • Fix bug where equality partition filters were not generated for multilevel partition tables · e99b9fef
      Haisheng Yuan authored
      The equality partition filter of the partition selector works well for
      single-level partition tables. But if the table has multilevel partitions,
      e.g. 2 levels with pk1 and pk2 as the partition keys for levels 1 and 2, and
      the query has an equality predicate such as pk1 = 2, then the level-2
      equality filter is null and `FEqPartFiltersAllLevels` returns false, causing
      the equality predicate to be put into PartFilters instead of PartEqFilters.
      This patch fixes that bug.
      
      [#141826453]
  26. 16 Mar, 2017 (1 commit)
  27. 08 Mar, 2017 (2 commits)
  28. 07 Mar, 2017 (1 commit)
  29. 23 Feb, 2017 (1 commit)