1. 12 Oct 2017, 1 commit
  2. 06 Oct 2017, 4 commits
  3. 05 Oct 2017, 1 commit
    • Fixing incorrect asserts in CXformUtils · bded4a81
      Authored by Venkatesh Raghavan
      There was a fairly classic bug / typo where the assertion would never
      fail because we put an enum member (non-zero) in a boolean context.
      
      Even though the method in question is generically named
      `TransformImplementBinaryOp`, it's actually only on the code path of
      transforming a physical nested loop (non-correlated, non-indexed) join.
      
      This commit adds back all types of eligible nested loop joins into the
      assertion.
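      The bug pattern can be sketched outside of ORCA; all names below are illustrative, not CXformUtils' actual identifiers:

      ```cpp
      #include <cassert>

      // Hypothetical join-type enum; the first enumerator is zero, the rest non-zero.
      enum EJoinType
      {
          EjtInner = 0,
          EjtNLJoin,
          EjtIndexNLJoin,
          EjtCorrelatedNLJoin
      };

      // Buggy predicate: an enum member used in a boolean context. EjtNLJoin is
      // non-zero, so the result is always true and an assert on it can never fail.
      bool FBuggyEligible(EJoinType ejt)
      {
          (void) ejt;
          return EjtNLJoin;  // meant to be a comparison, not the member itself
      }

      // Fixed predicate: enumerate the eligible nested loop join types explicitly.
      bool FFixedEligible(EJoinType ejt)
      {
          return EjtNLJoin == ejt || EjtIndexNLJoin == ejt || EjtCorrelatedNLJoin == ejt;
      }
      ```

      The fixed form is what "adds back all types of eligible nested loop joins into the assertion" amounts to.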
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
  4. 29 Sep 2017, 1 commit
  5. 27 Sep 2017, 2 commits
    • Update space size of a minidump · f6a453c1
      Authored by Omer Arap
    • Reorder preprocessing steps to avoid false trimming of existential subqueries [#150988530] · bec7b6af
      Authored by Omer Arap
      Orca removes outer references from order and grouping columns in a GbAgg. Orca also
      trims an existential subquery whose inner expression is a GbAgg with no grouping
      columns by replacing it with a Boolean constant. However, in some cases
      applying the first preprocessing step affects the latter and produces an
      unintended trimming of existential subqueries.
      
      This commit changes the order of these two preprocessing steps to avoid
      that complication.
      
      E.g.:
      ```
      SELECT * from foo where not exists (SELECT sum(bar.a) from bar where
      foo.a = bar.a GROUP BY foo.a);
      ```
      
      In this example the grouping column is an outer reference, and it is
      removed by `PexprRemoveSuperfluousOuterRefs`. The next preprocessing
      step, `PexprTrimExistentialSubqueries`, then sees that the GbAgg has no
      grouping columns and folds `NOT EXISTS` to `false`.
      
      Therefore we change the order and this fixes the problem.
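      The ordering interaction can be reduced to a minimal model; the flags and functions below are illustrative stand-ins for ORCA's expression preprocessing, not its real classes:

      ```cpp
      #include <cassert>

      // State of an existential subquery's inner GbAgg, reduced to three flags.
      struct SSubqueryState
      {
          bool fGroupingColsAllOuterRefs;  // e.g. GROUP BY foo.a inside the subquery
          bool fHasGroupingCols;
          bool fTrimmedToConstant;         // NOT EXISTS collapsed to a constant
      };

      // Models PexprRemoveSuperfluousOuterRefs: drops grouping columns that are
      // pure outer references.
      void RemoveSuperfluousOuterRefs(SSubqueryState *ps)
      {
          if (ps->fGroupingColsAllOuterRefs)
          {
              ps->fHasGroupingCols = false;
          }
      }

      // Models PexprTrimExistentialSubqueries: a GbAgg without grouping columns
      // always returns one row, so the existential test folds to a constant.
      void TrimExistentialSubqueries(SSubqueryState *ps)
      {
          if (!ps->fHasGroupingCols)
          {
              ps->fTrimmedToConstant = true;
          }
      }

      // Returns whether the subquery got (falsely) trimmed under a given pass order.
      bool FTrimmedUnderOrder(bool fRemoveFirst)
      {
          SSubqueryState s = {true, true, false};
          if (fRemoveFirst)
          {
              RemoveSuperfluousOuterRefs(&s);  // old order: trim pass then fires
              TrimExistentialSubqueries(&s);
          }
          else
          {
              TrimExistentialSubqueries(&s);   // fixed order: subquery stays intact
              RemoveSuperfluousOuterRefs(&s);
          }
          return s.fTrimmedToConstant;
      }
      ```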
      
      Bump version to 2.46.3
  6. 23 Sep 2017, 1 commit
    • Push ORCA src to bintray repository for conan · f395a029
      Authored by Bhuvnesh
      ORCA source will be pushed to the bintray repository for each version bump.
      Developers can use the conan dependency manager to build orca before
      building GPDB with minimal steps, without worrying about the version
      dependency.
      As we bump the orca version used by GPDB, we will ensure that
      conanfile.txt gets updated for each bump of ORCA.
      
      To build orca from GPDB:
      ```
      Step 1: cd <path_to_gpdb>/depends
      Step 2: env CONAN_CMAKE_GENERATOR=Ninja conan install -s build_type=Debug --build
      ```
      where build_type can be `Debug` or `Release`
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  7. 20 Sep 2017, 1 commit
  8. 19 Sep 2017, 3 commits
  9. 16 Sep 2017, 1 commit
    • Incorrect Decorrelation results in wrong plan · 94e509d0
      Authored by Bhuvnesh Chaudhary
      While attempting to decorrelate the subquery, we were
      incorrectly pulling up the join before calculating the window function
      over the results of the join. In cases where a subquery contains a
      window function and has outer references, we should not
      attempt to decorrelate it.
      
      Ex: select C.j from C where C.i in (select rank() over (order by B.i) from B where B.i=C.i) order by C.j;
      The above subquery has outer references, and the result of the window function is projected from the subquery.
      
      There are further optimizations that can be done for existential
      queries, but this PR fixes the plan.
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  10. 15 Sep 2017, 4 commits
    • GPORCA incorrectly collapsing FOJ with condition false · 30cfe889
      Authored by Venkatesh Raghavan
      Prior to this fix, the logic that calculated the max cardinality of each
      logical operator assumed that a Full Outer Join with condition false always
      returns an empty set. This was used by the following preprocessing step
      
      CExpressionPreprocessor::PexprPruneEmptySubtrees
      
      to eliminate such FOJ subtrees (since it thought they had zero output cardinality),
      replacing them with a const table get with zero tuples:
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                         QUERY PLAN
      ------------------------------------------------
       Result  (cost=0.00..0.00 rows=0 width=8)
         ->  Result  (cost=0.00..0.00 rows=0 width=8)
               One-Time Filter: false
       Optimizer status: PQO version 2.43.1
      (4 rows)
      ```
      This collapsing is incorrect. In this CL, the max cardinality logic has been
      fixed so that GPORCA generates a correct plan.
      
      After the fix:
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                                                                  QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..2207585.75 rows=35 width=8)
         ->  Result  (cost=0.00..2207585.75 rows=12 width=8)
               ->  Sequence  (cost=0.00..2207585.75 rows=12 width=8)
                     ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=1)
                           ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                     ->  Sequence  (cost=0.00..2207154.75 rows=12 width=8)
                           ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                       ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                           ->  Append  (cost=0.00..2206723.75 rows=12 width=8)
                                 ->  Nested Loop Left Join  (cost=0.00..882691.26 rows=10 width=68)
                                       Join Filter: false
                                       ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=34)
                                       ->  Result  (cost=0.00..0.00 rows=0 width=34)
                                             One-Time Filter: false
                                 ->  Result  (cost=0.00..1324032.49 rows=2 width=68)
                                       ->  Nested Loop Left Anti Semi Join  (cost=0.00..1324032.49 rows=2 width=34)
                                             Join Filter: false
                                             ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=34)
                                             ->  Materialize  (cost=0.00..431.00 rows=5 width=1)
                                                   ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=5 width=1)
                                                         ->  Result  (cost=0.00..431.00 rows=2 width=1)
                                                               ->  Shared Scan (share slice:id 1:0)  (cost=0.00..431.00 rows=2 width=1)
       Optimizer status: PQO version 2.43.1
      (25 rows)
      ```
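      The essence of the fix can be sketched as a cardinality bound; this is a hypothetical helper, not GPORCA's actual CMaxCard code:

      ```cpp
      #include <algorithm>
      #include <cassert>

      // Max-cardinality upper bound for a full outer join. The pre-fix logic
      // effectively returned 0 when the join condition was const false, which let
      // PexprPruneEmptySubtrees replace the whole subtree with an empty const
      // table get. In reality a FOJ with a false condition still emits every row
      // from both sides, null-extended.
      unsigned long UlMaxCardFOJ(unsigned long ulLeft, unsigned long ulRight)
      {
          // Either everything matches (cross product) or nothing does
          // (both inputs pass through unmatched); take the larger bound.
          return std::max(ulLeft * ulRight, ulLeft + ulRight);
      }
      ```

      With this bound, a FOJ subtree is only prunable when both children are provably empty.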
    • Bump Orca version to 2.44.0 · f55e6f70
      Authored by Omer Arap
    • Minidump updates · 8df82709
      Authored by Omer Arap
    • Only request stats of columns needed for cardinality estimation [#150424379] · 05a26924
      Authored by Omer Arap
      GPORCA should not spend time extracting column statistics that are not
      needed for cardinality estimation. This commit eliminates the overhead
      of requesting and generating statistics for columns that are not
      used in cardinality estimation.
      
      E.g.:
      `CREATE TABLE foo (a int, b int, c int);`
      
      For table foo, the query below only needs stats for column `a`, which
      is the distribution column, and column `c`, which is the column used in the
      where clause.
      `select * from foo where c=2;`
      
      However, prior to this commit, the statistics for column `b` were
      also calculated and passed for cardinality estimation. The only
      information the optimizer needs about column `b` is its `width`. For
      this tiny piece of information, we transferred the full statistics for
      that column.
      
      This commit and its counterpart commit in GPDB ensure that the column
      width information is passed and extracted in the `dxl:Relation` metadata
      information.
      
      Preliminary results for short-running queries show up to a 65x
      performance improvement.
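      The column-selection idea, sketched generically (function and parameter names are illustrative, not GPORCA's API):

      ```cpp
      #include <cassert>
      #include <set>
      #include <string>

      // Full statistics are requested only for columns that cardinality
      // estimation actually uses: distribution keys plus columns referenced in
      // predicates or joins. All other columns need just their width.
      std::set<std::string> FullStatsColumns(const std::set<std::string> &distributionCols,
                                             const std::set<std::string> &predicateCols)
      {
          std::set<std::string> needed = distributionCols;
          needed.insert(predicateCols.begin(), predicateCols.end());
          return needed;
      }
      ```

      For the `foo` example above, the needed set is {`a`, `c`}; column `b` is excluded and only its width travels with the relation metadata.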
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  11. 12 Sep 2017, 4 commits
    • Use col position from table descriptor for index key retrieval · 7ccedc81
      Authored by Jemish Patel
      Index keys are relative to the relation and the list of columns in
      `pdrgpcr` is relative to the table descriptor and does not include any
      dropped columns. Consider the case where we had 20 columns in a table. We
      create an index that covers col # 20 as one of its keys. Then we drop
      columns 10 through 15. Now the index key still points to col #20 but the
      column ref list in `pdrgpcr` will only have 15 elements in it and cause
      ORCA to crash with an `Out of Bounds` exception when
      `CLogical::PosFromIndex()` gets called.
      
      This commit fixes that issue by using the index key as the position to
      retrieve the column from the relation. We use the column's `attno` to get
      the column's current position `ulPos` relative to the table descriptor,
      and then use this `ulPos` to retrieve the `CColRef` of the index key
      column, as shown below:
      
      ```
      CColRef *pcr = (*pdrgpcr)[ulPos];
      ```
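      The core of the mapping can be sketched like this (an illustrative helper, not the actual CLogical code):

      ```cpp
      #include <cassert>
      #include <vector>

      // Attnos are stable across column drops, but the column-ref array
      // (pdrgpcr) is dense and excludes dropped columns. Indexing pdrgpcr
      // directly with an index key's attno can therefore run past the end.
      // The fix resolves the key's attno to its current dense position
      // (ulPos) first.
      int IPosFromAttno(const std::vector<int> &liveAttnos, int keyAttno)
      {
          for (size_t ul = 0; ul < liveAttnos.size(); ul++)
          {
              if (liveAttnos[ul] == keyAttno)
              {
                  return (int) ul;  // safe index into pdrgpcr
              }
          }
          return -1;  // the key column itself was dropped
      }
      ```

      For instance, with a six-column table whose columns 3 and 4 were dropped, the live attnos are {1, 2, 5, 6}: an index key on attno 6 maps to position 3, which is in bounds, whereas indexing the dense array with the raw attno would overflow it.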
      
      We also added the 2 test cases below to test for the above condition:
      
      1. `DynamicIndexGetDroppedCols`
      2. `LogicalIndexGetDroppedCols`
      
      Both of the above test cases use a table with 30 columns; create a btree
      index on 6 columns and then drop 7 columns so the table only has 23
      columns.
      
      This commit also bumps ORCA to version 2.43.1
    • Bump ORCA version · 9dec59e0
      Authored by Bhuvnesh Chaudhary
    • Added handling for winstar & winagg fields · c8b1e7f5
      Authored by Bhuvnesh Chaudhary
      With commit 387c485d, winstar and winagg
      fields were added to the WindowRef node, so this commit adds handling for
      them in ORCA.
  12. 09 Sep 2017, 6 commits
  13. 08 Sep 2017, 1 commit
  14. 07 Sep 2017, 1 commit
    • Enable Index Scan when leaf partitions are queried directly (#219) · 4019b25a
      Authored by Dhanashree Kashid
      Currently ORCA does not support index scan on leaf partitions when they are queried directly; it only supports index scan when we query the root table. This PR, along with the corresponding GPDB PR, adds support for using indexes when leaf partitions are queried directly.
      
      When a root table that has indexes (either homogenous/complete or
      heterogenous/partial) is queried; the Relcache Translator sends index
      information to ORCA. This enables ORCA to generate an alternative plan with
      Dynamic Index Scan on all partitions (in case of homogenous index) or a plan
      with partial scan i.e. Dynamic Table Scan on leaf partitions that don’t have
      indexes + Dynamic Index Scan on leaf partitions with indexes (in case of
      heterogeneous index).
      
      This is a two-step process in the Relcache Translator, as described below:
      
      Step 1 - Get the list of all index oids
      CTranslatorRelcacheToDXL::PdrgpmdidRelIndexes() performs this step; it
      only retrieves indexes on root and regular tables; for leaf partitions it bails
      out.
      
      Now for the root, the list of index oids is nothing but the index oids on its leaf
      partitions. For instance:
      
      ```
      CREATE TABLE foo ( a int, b int, c int, d int) DISTRIBUTED by (a) PARTITION
      BY RANGE(b) (PARTITION p1 START (1) END (10) INCLUSIVE, PARTITION p2 START (11)
      END (20) INCLUSIVE);
      
      CREATE INDEX complete_c on foo USING btree (c);
      CREATE INDEX partial_d on foo_1_prt_p2 using btree(d);
      ```
      The index list will look like { complete_c_1_prt_p1, partial_d }.
      
      For a complete index, the index oid of the first leaf partition is retrieved.
      If there are partial indexes, all the partial index oids are retrieved.
      
      Step 2 - Construct Index Metadata object
      CTranslatorRelcacheToDXL::Pmdindex() performs this step.
      
      For each index oid retrieved in Step #1 above, construct an index metadata
      object (CMDIndexGPDB) to be stored in the metadata cache, so that ORCA can get all
      the information about the index.
      Along with all other information about the index, CMDIndexGPDB also contains
      a flag fPartial which denotes whether the given index is homogenous (ORCA
      will apply it to all partitions selected by the partition selector) or heterogenous
      (the index will be applied only to the appropriate partitions).
      The process is as follows:
      ```
              Foreach oid in index oid list :
                      Get index relation (rel)
                      If rel is a leaf partition :
                              Get the root rel of the leaf partition
                              Get all the indexes on the root (this will be same list as step #1)
                              Determine if the current index oid is homogenous or heterogenous
                        Construct CMDIndexGPDB appropriately (with fPartial, part constraint,
                        defaultlevels info)
                      Else:
                              Construct a normal CMDIndexGPDB object.
      ```
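      The homogeneity decision in the loop above amounts to a coverage check; a hedged sketch over a hypothetical data model, not the relcache translator's real structures:

      ```cpp
      #include <cassert>
      #include <set>
      #include <string>

      // An index is homogenous (fPartial = false) when every leaf partition of
      // the root carries it; otherwise it is heterogenous/partial.
      bool FPartialIndex(const std::set<std::string> &leafPartitions,
                         const std::set<std::string> &partitionsWithIndex)
      {
          for (const std::string &leaf : leafPartitions)
          {
              if (0 == partitionsWithIndex.count(leaf))
              {
                  return true;  // some leaf lacks the index
              }
          }
          return false;  // present on every leaf
      }
      ```

      Against the `foo` example above, `complete_c` covers both `foo_1_prt_p1` and `foo_1_prt_p2` (homogenous), while `partial_d` covers only `foo_1_prt_p2` (heterogenous).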
      Now for leaf partitions there is no notion of homogenous or heterogenous
      indexes, since a leaf partition is like a regular table. Hence in Pmdindex()
      we should not check whether the index is complete or not.
      
      Additionally, whether a given index is homogenous or heterogenous needs to be
      decided from the perspective of the relation we are querying (such as the root or a
      leaf).
      
      Hence the right place for the fPartial flag is the relation metadata object
      (CMDRelationGPDB) and not the independent index metadata object (CMDIndexGPDB).
      This commit makes the following changes to support index scan on leaf partitions
      along with partial scans:
      
      Relcache Translator:
      
      In Step 1, retrieve the index information on the leaf partition and create a
      list of CMDIndexInfo objects which contain the index oid and the fPartial flag.
      Step 1 is the place where we know which relation we are querying, which enables us
      to determine whether or not the index is homogenous from the context of the
      relation.
      
      The relation metadata tag will look like the following after this change:
      
      Before:
      ```
              <dxl:Indexes>
                      <dxl:Index Mdid="0.17159874.1.0"/>
                      <dxl:Index Mdid="0.17159920.1.0"/>
              </dxl:Indexes>
      ```
      After:
      ```
              <dxl:IndexInfoList>
                      <dxl:IndexInfo Mdid="0.17159874.1.0" IsPartial="true"/>
                      <dxl:IndexInfo Mdid="0.17159920.1.0" IsPartial="false"/>
              </dxl:IndexInfoList>
      ```
      ORCA changes:
      
      1. A new class CMDIndexInfoGPDB has been created in ORCA which contains the index mdid and the fPartial flag. For external tables, normal tables and leaf partitions, the fPartial flag will always be false.
      2. CMDRelationGPDB will contain an array of CMDIndexInfoGPDB instead of a simple index mdid array.
      3. Add a new parse handler to parse IndexInfoList and IndexInfo and create an array of CMDIndexInfoGPDB.
      4. Update the existing minidumps to remove the fPartial flag from the Index metadata tag and associate it with the IndexInfo tag under the Relation metadata.
      5. Add new test scenarios for querying a leaf partition with a homogenous/heterogenous index on the root table.
  15. 02 Sep 2017, 4 commits
    • Bump ORCA version to 2.41.0 · 042471ca
      Authored by Dhanashree Kashid
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Update the minidumps · cba22e80
      Authored by Dhanashree Kashid
      Now we send the part constraint expression only in the cases below:
      
                  IsPartTable     Index   DefaultParts   ShouldSendPartConstraint
                  NO              -       -              -
                  YES             YES     YES/NO         YES
                  YES             NO      NO             NO
                  YES             NO      YES            YES (but only default levels info)
      
      This commit updates the minidumps accordingly.
      1. If the Relation tag has indices, keep the part constraint tag
      2. If the Relation tag has no indices and no default partitions, remove
      the entire part constraint tag
      3. If the Relation tag has no indices but has default partitions at any
      level, keep the part constraint tag but remove the scalar expression
      4. Regenerated the following stale minidumps:
         * DynamicIndexScan-Homogenous.mdp
         * DynamicIndexScan-Heterogenous-Union.mdp
         * DynamicBitmapTableScan-Basic.mdp
      5. Added four more test cases to CPartTblTest demonstrating the table
      above.
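      The decision table encodes directly as a predicate; a sketch, not the relcache translator's actual function, with the last row's "only default levels info" nuance surfaced through an output parameter:

      ```cpp
      #include <cassert>

      // Whether to send the part constraint, per the table above.
      bool FShouldSendPartConstraint(bool fPartTable, bool fHasIndex,
                                     bool fHasDefaultParts, bool *pfDefaultLevelsOnly)
      {
          *pfDefaultLevelsOnly = false;
          if (!fPartTable)
          {
              return false;  // row 1: not a partitioned table
          }
          if (fHasIndex)
          {
              return true;   // row 2: indexes present, default parts irrelevant
          }
          if (fHasDefaultParts)
          {
              *pfDefaultLevelsOnly = true;  // row 4: only default-levels info
              return true;
          }
          return false;      // row 3: no indexes, no default partitions
      }
      ```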
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      Signed-off-by: Omer Arap <oarap@pivotal.io>
    • Don't serialize part constraint expr if no indices · deb78d49
      Authored by Jemish Patel
      Do not serialize and de-serialize the part constraint expression
      when there are no indices on the partitioned rel.
      
      The relcache translator in GPDB will send an empty part constraint
      expression when the rel has no indices.
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Remove dead code · a26b4eb0
      Authored by Omer Arap
      We never send null part constraints from the relcache translator,
      hence we do not need to handle that case.
      
      This code was probably added to support older minidumps. There are
      a few very old minidump files which do not contain the part constraint
      tag in the relation tag.
      
      Now, with the fix on the relcache translator side in GPDB, the only case
      where we send a NULL part constraint is when there are no indices and no
      default partitions; we still don't need a null check for the part constraint in
      this case because `fDummyConstraint` will always be true.
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
  16. 29 Aug 2017, 1 commit
    • Convert IN subq to EXISTS subq with pred [#149683475] · 0037ed8f
      Authored by Omer Arap
      This commit adds a preprocessing step that rewrites the expression tree when
      an IN subquery has a project list that includes an outer
      reference but no columns from the project's relational
      child. This preprocessing helps ORCA decorrelate the subquery. Orca
      currently does not support directly decorrelating IN subqueries when there
      is an outer reference in the CLogicalProject. Converting the IN subquery
      into a predicate AND an EXISTS subquery helps Orca generate more alternatives
      with a decorrelated subquery.
      
      Below is an example of the preprocessing applied in this commit.
      
      Before preprocessing:
      ```
         +--CScalarSubqueryAny(=)["?column?" (17)]
            |--CLogicalProject
            |  |--CLogicalGet "bar" ("bar"), Columns: ["c" (9)]}
            |  +--CScalarProjectList
            |     +--CScalarProjectElement "?column?" (17)
            |        +--CScalarOp (+)
            |           |--CScalarIdent "b" (1)
            |           +--CScalarConst (1)
            +--CScalarIdent "a" (0)
      ```
      After:
      ```
         +--CScalarBoolOp (EboolopAnd)
            |--CScalarOp (=)
            |  |--CScalarIdent "a" (0)
            |  +--CScalarOp (+)
            |     |--CScalarIdent "b" (1)
            |     +--CScalarConst (1)
            +--CScalarSubqueryExists
               +--CLogicalGet "bar" ("bar"), Columns: ["c" (9)] }
      ```
      
      This commit bumps Orca version to 2.40.3
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  17. 28 Aug 2017, 1 commit
  18. 26 Aug 2017, 1 commit
  19. 24 Aug 2017, 1 commit
  20. 10 Aug 2017, 1 commit