1. 12 Oct 2017, 1 commit
  2. 06 Oct 2017, 4 commits
  3. 05 Oct 2017, 1 commit
    • Fixing incorrect asserts in CXformUtils · bded4a81
      Authored by Venkatesh Raghavan
      There was a fairly classic bug / typo where the assertion would never
      fail because we put an enum member (non-zero) in a boolean context.
      
      Even though the method in question is generically named
      `TransformImplementBinaryOp`, it's actually only on the code path of
      transforming a physical nested loop (non-correlated, non-indexed) join.
      
      This commit adds back all types of eligible nested loop joins into the
      assertion.
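      The bug pattern can be sketched outside of ORCA; all names below are illustrative, not CXformUtils' actual identifiers:

      ```cpp
      #include <cassert>

      // Hypothetical join-type enum; the first enumerator is zero, the rest non-zero.
      enum EJoinType
      {
          EjtInner = 0,
          EjtNLJoin,
          EjtIndexNLJoin,
          EjtCorrelatedNLJoin
      };

      // Buggy predicate: an enum member used in a boolean context. EjtNLJoin is
      // non-zero, so the result is always true and an assert on it can never fail.
      bool FBuggyEligible(EJoinType ejt)
      {
          (void) ejt;
          return EjtNLJoin;  // meant to be a comparison, not the member itself
      }

      // Fixed predicate: enumerate the eligible nested loop join types explicitly.
      bool FFixedEligible(EJoinType ejt)
      {
          return EjtNLJoin == ejt || EjtIndexNLJoin == ejt || EjtCorrelatedNLJoin == ejt;
      }
      ```

      The fixed form is what "adds back all types of eligible nested loop joins into the assertion" amounts to.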
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
  4. 29 Sep 2017, 1 commit
  5. 27 Sep 2017, 2 commits
    • Update space size of a minidump · f6a453c1
      Authored by Omer Arap
    • Reorder preprocessing steps to avoid false trimming of existential subqueries [#150988530] · bec7b6af
      Authored by Omer Arap
      Orca removes outer references from order and grouping columns in a GbAgg. Orca also
      trims an existential subquery whose inner expression is a GbAgg with no grouping
      columns by replacing it with a Boolean constant. However, in some cases
      applying the first preprocessing step affects the latter and produces an
      unintended trimming of existential subqueries.
      
      This commit changes the order of these two preprocessing steps to avoid
      that complication.
      
      E.g.:
      ```
      SELECT * from foo where not exists (SELECT sum(bar.a) from bar where
      foo.a = bar.a GROUP BY foo.a);
      ```
      
      In this example the grouping column is an outer reference, and it is
      removed by `PexprRemoveSuperfluousOuterRefs`. The next preprocessing
      step, `PexprTrimExistentialSubqueries`, then sees that the GbAgg has no
      grouping columns and folds `NOT EXISTS` to `false`.
      
      Therefore we change the order and this fixes the problem.
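      The ordering interaction can be reduced to a minimal model; the flags and functions below are illustrative stand-ins for ORCA's expression preprocessing, not its real classes:

      ```cpp
      #include <cassert>

      // State of an existential subquery's inner GbAgg, reduced to three flags.
      struct SSubqueryState
      {
          bool fGroupingColsAllOuterRefs;  // e.g. GROUP BY foo.a inside the subquery
          bool fHasGroupingCols;
          bool fTrimmedToConstant;         // NOT EXISTS collapsed to a constant
      };

      // Models PexprRemoveSuperfluousOuterRefs: drops grouping columns that are
      // pure outer references.
      void RemoveSuperfluousOuterRefs(SSubqueryState *ps)
      {
          if (ps->fGroupingColsAllOuterRefs)
          {
              ps->fHasGroupingCols = false;
          }
      }

      // Models PexprTrimExistentialSubqueries: a GbAgg without grouping columns
      // always returns one row, so the existential test folds to a constant.
      void TrimExistentialSubqueries(SSubqueryState *ps)
      {
          if (!ps->fHasGroupingCols)
          {
              ps->fTrimmedToConstant = true;
          }
      }

      // Returns whether the subquery got (falsely) trimmed under a given pass order.
      bool FTrimmedUnderOrder(bool fRemoveFirst)
      {
          SSubqueryState s = {true, true, false};
          if (fRemoveFirst)
          {
              RemoveSuperfluousOuterRefs(&s);  // old order: trim pass then fires
              TrimExistentialSubqueries(&s);
          }
          else
          {
              TrimExistentialSubqueries(&s);   // fixed order: subquery stays intact
              RemoveSuperfluousOuterRefs(&s);
          }
          return s.fTrimmedToConstant;
      }
      ```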
      
      Bump version to 2.46.3
  6. 23 Sep 2017, 1 commit
    • Push ORCA src to bintray repository for conan · f395a029
      Authored by Bhuvnesh
      ORCA source will be pushed to the bintray repository for each version bump.
      Developers can use the conan dependency manager to build orca before
      building GPDB with minimal steps, without worrying about the version
      dependency.
      As we bump the orca version used by GPDB, we will ensure that
      conanfile.txt gets updated for each bump of ORCA.
      
      To build orca from GPDB:
      ```
      Step 1: cd <path_to_gpdb>/depends
      Step 2: env CONAN_CMAKE_GENERATOR=Ninja conan install -s build_type=Debug --build
      ```
      where build_type can be `Debug` or `Release`
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  7. 20 Sep 2017, 1 commit
  8. 19 Sep 2017, 3 commits
  9. 16 Sep 2017, 1 commit
    • Incorrect Decorrelation results in wrong plan · 94e509d0
      Authored by Bhuvnesh Chaudhary
      While attempting to decorrelate the subquery, we were
      incorrectly pulling up the join before calculating the window function
      over the results of the join. In cases where a subquery contains a
      window function and has outer references, we should not
      attempt to decorrelate it.
      
      Ex: select C.j from C where C.i in (select rank() over (order by B.i) from B where B.i=C.i) order by C.j;
      The above subquery has outer references, and the result of the window function is projected from the subquery.
      
      There are further optimizations that can be done for existential
      queries, but this PR fixes the plan.
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  10. 15 Sep 2017, 4 commits
    • GPORCA incorrectly collapsing FOJ with condition false · 30cfe889
      Authored by Venkatesh Raghavan
      Prior to this fix, the logic that calculated the max cardinality of each
      logical operator assumed that a Full Outer Join with condition false always
      returns an empty set. This was used by the following preprocessing step
      
      CExpressionPreprocessor::PexprPruneEmptySubtrees
      
      to eliminate such FOJ subtrees (since it thought they had zero output cardinality),
      replacing them with a const table get with zero tuples:
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                         QUERY PLAN
      ------------------------------------------------
       Result  (cost=0.00..0.00 rows=0 width=8)
         ->  Result  (cost=0.00..0.00 rows=0 width=8)
               One-Time Filter: false
       Optimizer status: PQO version 2.43.1
      (4 rows)
      ```
      This collapsing is incorrect. In this CL, the max cardinality logic has been
      fixed so that GPORCA generates a correct plan.
      
      After the fix:
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                                                                  QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..2207585.75 rows=35 width=8)
         ->  Result  (cost=0.00..2207585.75 rows=12 width=8)
               ->  Sequence  (cost=0.00..2207585.75 rows=12 width=8)
                     ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=1)
                           ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                     ->  Sequence  (cost=0.00..2207154.75 rows=12 width=8)
                           ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                       ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                           ->  Append  (cost=0.00..2206723.75 rows=12 width=8)
                                 ->  Nested Loop Left Join  (cost=0.00..882691.26 rows=10 width=68)
                                       Join Filter: false
                                       ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=34)
                                       ->  Result  (cost=0.00..0.00 rows=0 width=34)
                                             One-Time Filter: false
                                 ->  Result  (cost=0.00..1324032.49 rows=2 width=68)
                                       ->  Nested Loop Left Anti Semi Join  (cost=0.00..1324032.49 rows=2 width=34)
                                             Join Filter: false
                                             ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=34)
                                             ->  Materialize  (cost=0.00..431.00 rows=5 width=1)
                                                   ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=5 width=1)
                                                         ->  Result  (cost=0.00..431.00 rows=2 width=1)
                                                               ->  Shared Scan (share slice:id 1:0)  (cost=0.00..431.00 rows=2 width=1)
       Optimizer status: PQO version 2.43.1
      (25 rows)
      ```
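      The essence of the fix can be sketched as a cardinality bound; this is a hypothetical helper, not GPORCA's actual CMaxCard code:

      ```cpp
      #include <algorithm>
      #include <cassert>

      // Max-cardinality upper bound for a full outer join. The pre-fix logic
      // effectively returned 0 when the join condition was const false, which let
      // PexprPruneEmptySubtrees replace the whole subtree with an empty const
      // table get. In reality a FOJ with a false condition still emits every row
      // from both sides, null-extended.
      unsigned long UlMaxCardFOJ(unsigned long ulLeft, unsigned long ulRight)
      {
          // Either everything matches (cross product) or nothing does
          // (both inputs pass through unmatched); take the larger bound.
          return std::max(ulLeft * ulRight, ulLeft + ulRight);
      }
      ```

      With this bound, a FOJ subtree is only prunable when both children are provably empty.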
    • Bump Orca version to 2.44.0 · f55e6f70
      Authored by Omer Arap
    • Minidump updates · 8df82709
      Authored by Omer Arap
    • Only request stats of columns needed for cardinality estimation [#150424379] · 05a26924
      Authored by Omer Arap
      GPORCA should not spend time extracting column statistics that are not
      needed for cardinality estimation. This commit eliminates the overhead
      of requesting and generating statistics for columns that are not
      used in cardinality estimation.
      
      E.g.:
      `CREATE TABLE foo (a int, b int, c int);`
      
      For table foo, the query below only needs stats for column `a`, which
      is the distribution column, and column `c`, which is the column used in the
      where clause.
      `select * from foo where c=2;`
      
      However, prior to this commit, the statistics for column `b` were
      also calculated and passed for cardinality estimation. The only
      information the optimizer needs about column `b` is its `width`. For
      this tiny piece of information, we transferred the full statistics for
      that column.
      
      This commit and its counterpart commit in GPDB ensure that the column
      width information is passed and extracted in the `dxl:Relation` metadata
      information.
      
      Preliminary results for short-running queries show up to a 65x
      performance improvement.
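      The column-selection idea, sketched generically (function and parameter names are illustrative, not GPORCA's API):

      ```cpp
      #include <cassert>
      #include <set>
      #include <string>

      // Full statistics are requested only for columns that cardinality
      // estimation actually uses: distribution keys plus columns referenced in
      // predicates or joins. All other columns need just their width.
      std::set<std::string> FullStatsColumns(const std::set<std::string> &distributionCols,
                                             const std::set<std::string> &predicateCols)
      {
          std::set<std::string> needed = distributionCols;
          needed.insert(predicateCols.begin(), predicateCols.end());
          return needed;
      }
      ```

      For the `foo` example above, the needed set is {`a`, `c`}; column `b` is excluded and only its width travels with the relation metadata.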
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  11. 12 Sep 2017, 4 commits
    • Use col position from table descriptor for index key retrieval · 7ccedc81
      Authored by Jemish Patel
      Index keys are relative to the relation and the list of columns in
      `pdrgpcr` is relative to the table descriptor and does not include any
      dropped columns. Consider the case where we had 20 columns in a table. We
      create an index that covers col # 20 as one of its keys. Then we drop
      columns 10 through 15. Now the index key still points to col #20 but the
      column ref list in `pdrgpcr` will only have 15 elements in it and cause
      ORCA to crash with an `Out of Bounds` exception when
      `CLogical::PosFromIndex()` gets called.
      
      This commit fixes that issue by using the index key as the position to
      retrieve the column from the relation. We use the column's `attno` to get
      the column's current position `ulPos` relative to the table descriptor,
      and then use this `ulPos` to retrieve the `CColRef` of the index key
      column, as shown below:
      
      ```
      CColRef *pcr = (*pdrgpcr)[ulPos];
      ```
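      The core of the mapping can be sketched like this (an illustrative helper, not the actual CLogical code):

      ```cpp
      #include <cassert>
      #include <vector>

      // Attnos are stable across column drops, but the column-ref array
      // (pdrgpcr) is dense and excludes dropped columns. Indexing pdrgpcr
      // directly with an index key's attno can therefore run past the end.
      // The fix resolves the key's attno to its current dense position
      // (ulPos) first.
      int IPosFromAttno(const std::vector<int> &liveAttnos, int keyAttno)
      {
          for (size_t ul = 0; ul < liveAttnos.size(); ul++)
          {
              if (liveAttnos[ul] == keyAttno)
              {
                  return (int) ul;  // safe index into pdrgpcr
              }
          }
          return -1;  // the key column itself was dropped
      }
      ```

      For instance, with a six-column table whose columns 3 and 4 were dropped, the live attnos are {1, 2, 5, 6}: an index key on attno 6 maps to position 3, which is in bounds, whereas indexing the dense array with the raw attno would overflow it.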
      
      We also added the 2 test cases below to test for the above condition:
      
      1. `DynamicIndexGetDroppedCols`
      2. `LogicalIndexGetDroppedCols`
      
      Both of the above test cases use a table with 30 columns; create a btree
      index on 6 columns and then drop 7 columns so the table only has 23
      columns.
      
      This commit also bumps ORCA to version 2.43.1
    • Bump ORCA version · 9dec59e0
      Authored by Bhuvnesh Chaudhary
    • Added handling for winstar & winagg fields · c8b1e7f5
      Authored by Bhuvnesh Chaudhary
      With commit 387c485d, winstar and winagg
      fields were added to the WindowRef node, so this commit adds handling for
      them in ORCA.
  12. 09 Sep 2017, 6 commits
  13. 08 Sep 2017, 1 commit
  14. 07 Sep 2017, 1 commit
    • Enable Index Scan when leaf partitions are queried directly (#219) · 4019b25a
      Authored by Dhanashree Kashid
      Currently ORCA does not support index scan on leaf partitions when they are queried directly; it only supports index scan when we query the root table. This PR, along with the corresponding GPDB PR, adds support for using indexes when leaf partitions are queried directly.
      
      When a root table that has indexes (either homogenous/complete or
      heterogenous/partial) is queried; the Relcache Translator sends index
      information to ORCA. This enables ORCA to generate an alternative plan with
      Dynamic Index Scan on all partitions (in case of homogenous index) or a plan
      with partial scan i.e. Dynamic Table Scan on leaf partitions that don’t have
      indexes + Dynamic Index Scan on leaf partitions with indexes (in case of
      heterogeneous index).
      
      This is a two-step process in the Relcache Translator, as described below:
      
      Step 1 - Get the list of all index oids
      CTranslatorRelcacheToDXL::PdrgpmdidRelIndexes() performs this step; it
      only retrieves indexes on root and regular tables; for leaf partitions it bails
      out.
      
      Now for the root, the list of index oids is nothing but the index oids on its leaf
      partitions. For instance:
      
      ```
      CREATE TABLE foo ( a int, b int, c int, d int) DISTRIBUTED by (a) PARTITION
      BY RANGE(b) (PARTITION p1 START (1) END (10) INCLUSIVE, PARTITION p2 START (11)
      END (20) INCLUSIVE);
      
      CREATE INDEX complete_c on foo USING btree (c);
      CREATE INDEX partial_d on foo_1_prt_p2 using btree(d);
      ```
      The index list will look like { complete_c_1_prt_p1, partial_d }.
      
      For a complete index, the index oid of the first leaf partition is retrieved.
      If there are partial indexes, all the partial index oids are retrieved.
      
      Step 2 - Construct Index Metadata object
      CTranslatorRelcacheToDXL::Pmdindex() performs this step.
      
      For each index oid retrieved in Step #1 above, construct an index metadata
      object (CMDIndexGPDB) to be stored in the metadata cache, so that ORCA can get all
      the information about the index.
      Along with all other information about the index, CMDIndexGPDB also contains
      a flag fPartial which denotes whether the given index is homogenous (ORCA
      will apply it to all partitions selected by the partition selector) or heterogenous
      (the index will be applied only to the appropriate partitions).
      The process is as follows:
      ```
              Foreach oid in index oid list :
                      Get index relation (rel)
                      If rel is a leaf partition :
                              Get the root rel of the leaf partition
                              Get all the indexes on the root (this will be same list as step #1)
                              Determine if the current index oid is homogenous or heterogenous
                        Construct CMDIndexGPDB appropriately (with fPartial, part constraint,
                        defaultlevels info)
                      Else:
                              Construct a normal CMDIndexGPDB object.
      ```
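      The homogeneity decision in the loop above amounts to a coverage check; a hedged sketch over a hypothetical data model, not the relcache translator's real structures:

      ```cpp
      #include <cassert>
      #include <set>
      #include <string>

      // An index is homogenous (fPartial = false) when every leaf partition of
      // the root carries it; otherwise it is heterogenous/partial.
      bool FPartialIndex(const std::set<std::string> &leafPartitions,
                         const std::set<std::string> &partitionsWithIndex)
      {
          for (const std::string &leaf : leafPartitions)
          {
              if (0 == partitionsWithIndex.count(leaf))
              {
                  return true;  // some leaf lacks the index
              }
          }
          return false;  // present on every leaf
      }
      ```

      Against the `foo` example above, `complete_c` covers both `foo_1_prt_p1` and `foo_1_prt_p2` (homogenous), while `partial_d` covers only `foo_1_prt_p2` (heterogenous).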
      Now for leaf partitions there is no notion of homogenous or heterogenous
      indexes, since a leaf partition is like a regular table. Hence in Pmdindex()
      we should not check whether the index is complete or not.
      
      Additionally, whether a given index is homogenous or heterogenous needs to be
      decided from the perspective of the relation we are querying (such as the root or a
      leaf).
      
      Hence the right place for the fPartial flag is the relation metadata object
      (CMDRelationGPDB) and not the independent index metadata object (CMDIndexGPDB).
      This commit makes the following changes to support index scan on leaf partitions
      along with partial scans:
      
      Relcache Translator:
      
      In Step 1, retrieve the index information on the leaf partition and create a
      list of CMDIndexInfo objects which contain the index oid and the fPartial flag.
      Step 1 is the place where we know which relation we are querying, which enables us
      to determine whether or not the index is homogenous from the context of the
      relation.
      
      The relation metadata tag will look like the following after this change:
      
      Before:
      ```
              <dxl:Indexes>
                      <dxl:Index Mdid="0.17159874.1.0"/>
                      <dxl:Index Mdid="0.17159920.1.0"/>
              </dxl:Indexes>
      ```
      After:
      ```
              <dxl:IndexInfoList>
                      <dxl:IndexInfo Mdid="0.17159874.1.0" IsPartial="true"/>
                      <dxl:IndexInfo Mdid="0.17159920.1.0" IsPartial="false"/>
              </dxl:IndexInfoList>
      ```
      ORCA changes:
      
      1. A new class CMDIndexInfoGPDB has been created in ORCA which contains the index mdid and the fPartial flag. For external tables, normal tables and leaf partitions, the fPartial flag will always be false.
      2. CMDRelationGPDB will contain an array of CMDIndexInfoGPDB instead of a simple index mdid array.
      3. Add a new parse handler to parse IndexInfoList and IndexInfo and create an array of CMDIndexInfoGPDB.
      4. Update the existing minidumps to remove the fPartial flag from the Index metadata tag and associate it with the IndexInfo tag under the Relation metadata.
      5. Add new test scenarios for querying a leaf partition with a homogenous/heterogenous index on the root table.
  15. 02 Sep 2017, 4 commits
    • Bump ORCA version to 2.41.0 · 042471ca
      Authored by Dhanashree Kashid
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Update the minidumps · cba22e80
      Authored by Dhanashree Kashid
      Now we send the part constraint expression only in the cases below:
      
                  IsPartTable     Index   DefaultParts   ShouldSendPartConstraint
                  NO              -       -              -
                  YES             YES     YES/NO         YES
                  YES             NO      NO             NO
                  YES             NO      YES            YES (but only default levels info)
      
      This commit updates the minidumps accordingly.
      1. If the Relation tag has indices, keep the part constraint tag
      2. If the Relation tag has no indices and no default partitions, remove
      the entire part constraint tag
      3. If the Relation tag has no indices but has default partitions at any
      level, keep the part constraint tag but remove the scalar expression
      4. Regenerated the following stale minidumps:
         * DynamicIndexScan-Homogenous.mdp
         * DynamicIndexScan-Heterogenous-Union.mdp
         * DynamicBitmapTableScan-Basic.mdp
      5. Added four more test cases to CPartTblTest demonstrating the table
      above.
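      The decision table encodes directly as a predicate; a sketch, not the relcache translator's actual function, with the last row's "only default levels info" nuance surfaced through an output parameter:

      ```cpp
      #include <cassert>

      // Whether to send the part constraint, per the table above.
      bool FShouldSendPartConstraint(bool fPartTable, bool fHasIndex,
                                     bool fHasDefaultParts, bool *pfDefaultLevelsOnly)
      {
          *pfDefaultLevelsOnly = false;
          if (!fPartTable)
          {
              return false;  // row 1: not a partitioned table
          }
          if (fHasIndex)
          {
              return true;   // row 2: indexes present, default parts irrelevant
          }
          if (fHasDefaultParts)
          {
              *pfDefaultLevelsOnly = true;  // row 4: only default-levels info
              return true;
          }
          return false;      // row 3: no indexes, no default partitions
      }
      ```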
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      Signed-off-by: Omer Arap <oarap@pivotal.io>
    • Don't serialize part constraint expr if no indices · deb78d49
      Authored by Jemish Patel
      Do not serialize and de-serialize the part constraint expression
      when there are no indices on the partitioned rel.
      
      The relcache translator in GPDB will send an empty part constraint
      expression when the rel has no indices.
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Remove dead code · a26b4eb0
      Authored by Omer Arap
      We never send null part constraints from the relcache translator,
      hence we do not need to handle that case.
      
      This code was probably added to support older minidumps. There are
      a few very old minidump files which do not contain the part constraint
      tag in the relation tag.
      
      Now, with the fix on the relcache translator side in GPDB, the only case
      where we send a NULL part constraint is when there are no indices and no
      default partitions; we still don't need a null check for the part constraint in
      this case because `fDummyConstraint` will always be true.
      
      [Ref #149769559]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
  16. 29 Aug 2017, 1 commit
    • Convert IN subq to EXISTS subq with pred [#149683475] · 0037ed8f
      Authored by Omer Arap
      This commit adds a preprocessing step that rewrites the expression tree when
      an IN subquery has a project list that includes an outer
      reference but no columns from the project's relational
      child. This preprocessing helps ORCA decorrelate the subquery. Orca
      currently does not support directly decorrelating IN subqueries when there
      is an outer reference in the CLogicalProject. Converting the IN subquery
      into a predicate AND an EXISTS subquery helps Orca generate more alternatives
      with a decorrelated subquery.
      
      Below is an example of the preprocessing applied in this commit.
      
      Before preprocessing:
      ```
         +--CScalarSubqueryAny(=)["?column?" (17)]
            |--CLogicalProject
            |  |--CLogicalGet "bar" ("bar"), Columns: ["c" (9)]}
            |  +--CScalarProjectList
            |     +--CScalarProjectElement "?column?" (17)
            |        +--CScalarOp (+)
            |           |--CScalarIdent "b" (1)
            |           +--CScalarConst (1)
            +--CScalarIdent "a" (0)
      ```
      After:
      ```
         +--CScalarBoolOp (EboolopAnd)
            |--CScalarOp (=)
            |  |--CScalarIdent "a" (0)
            |  +--CScalarOp (+)
            |     |--CScalarIdent "b" (1)
            |     +--CScalarConst (1)
            +--CScalarSubqueryExists
               +--CLogicalGet "bar" ("bar"), Columns: ["c" (9)] }
      ```
      
      This commit bumps Orca version to 2.40.3
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
  17. 28 Aug 2017, 1 commit
  18. 26 Aug 2017, 1 commit
  19. 24 Aug 2017, 1 commit
  20. 10 Aug 2017, 1 commit