1. 09 Nov 2017 (1 commit)
  2. 08 Nov 2017 (3 commits)
  3. 07 Nov 2017 (1 commit)
    • Merge conditions into corresponding Bitmap Index Probes · 39d0145b
      Sambitesh Dash authored
      There was a bug in the way Bitmap Index Probes were being merged.
      
      Consider the query below:
      
      SELECT * FROM foo WHERE b = 2 AND c >=6 AND c <= 6;
      
      Let's assume there are indexes on columns 'b' and 'c'. The Bitmap Index Probes
      on the second and third conditions ('c >= 6' and 'c <= 6') are mergeable because
      they are on the same indexed column. Due to the bug, Orca would instead merge the
      second Index Probe with the first Index Probe's condition ('b = 2'). This led to a
      wrong 'Recheck Condition' which, if selected as a filter, led to a wrong plan. This
      commit fixes that bug.
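      As a rough, self-contained illustration of the intended behavior (a hypothetical
      sketch, not the actual ORCA code; `ProbeCondition` and `MergeProbes` are invented
      names), a merge guard that only combines probes on the same indexed column might
      look like this:

      ```
      #include <iostream>
      #include <optional>
      #include <string>

      struct ProbeCondition {
          std::string column;     // indexed column the probe applies to
          std::string predicate;  // e.g. "c >= 6"
      };

      // Returns the merged probe when both conditions are on the same column,
      // otherwise std::nullopt so the probes stay separate.
      std::optional<ProbeCondition> MergeProbes(const ProbeCondition &a,
                                                const ProbeCondition &b) {
          if (a.column != b.column) {
              return std::nullopt;  // the guard the buggy code effectively skipped
          }
          return ProbeCondition{a.column, a.predicate + " AND " + b.predicate};
      }

      int main() {
          ProbeCondition p1{"b", "b = 2"}, p2{"c", "c >= 6"}, p3{"c", "c <= 6"};
          std::cout << std::boolalpha
                    << MergeProbes(p2, p3).has_value() << " "    // true: same column
                    << MergeProbes(p1, p2).has_value() << "\n";  // false: different columns
      }
      ```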
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      39d0145b
  4. 03 Nov 2017 (2 commits)
  5. 02 Nov 2017 (1 commit)
    • Slight README clean up (#250) · b40be5f7
      Jesse Zhang authored
      Cleans up the README and simplifies the build steps.
      
      1. Remove leftover mentions of `make` in the context of building ORCA
      1. Because ninja is parallel by default, remove mentions of how to
         parallelize the build
      
      * Simplify build steps in README
      
      Notably, we no longer require the two most-hated steps: mkdir the build
      directory, then cd into it. Instead, `cmake` will create the build
      directory itself if it doesn't exist.
      
      * [ci skip]
      This fixes #248
      b40be5f7
  6. 28 Oct 2017 (1 commit)
  7. 24 Oct 2017 (3 commits)
    • Bump ORCA Version · 18ebd876
      Dhanashree Kashid authored
      Bump ORCA version after commit b111104f
      18ebd876
    • Remove tautological undefined comparisons · b111104f
      Jesse Zhang authored
      A "tautological" comparison is a comparison that's only meaningfully
      necessary when we consider "undefined" C++ behaviors.
      
      For context, in well-formed C++ code:
      
        1. references can never be bound to NULL; and
        1. the `this` pointer in a member function can never be NULL
      
      Historically ORCA has relied on implementation-specific (undefined,
      actually) behavior where
      
        1. we might call a member function on a potentially NULL object with
        the `->` operator, or
        1. some callers may bind a (possibly-NULL) pointer to a reference with
        the '*' operator and try to print it into an IOStream
      
      While doing so gives the benefit of centralizing the check, depending on
      undefined behavior means we risk producing wrong code. Indeed, modern
      compilers aggressively optimize against undefined behavior: e.g. by
      eliminating the `NULL` checks, or by assuming the variable used for
      indexing an array is never out of bounds.
      
      Commit ee5ef334 is a "tick" towards
      reducing such undefined comparisons. This commit is the "tock" that
      eliminates them.
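
      As a rough, self-contained illustration (a hypothetical sketch, not the actual
      ORCA sources, although the guarded return mirrors the line GCC flags in the log
      below):

      ```
      #include <cstddef>
      #include <iostream>

      // Stand-in for a printable ORCA object such as CCTEMap.
      struct CPrintable {
          std::ostream &OsPrint(std::ostream &os) const { return os << "printable"; }
      };

      // Before: the guard is "tautological" because a reference can never be bound
      // to NULL in well-formed C++; GCC 6+ rejects it under -Werror=address.
      std::ostream &PrintGuarded(std::ostream &os, const CPrintable &obj) {
          return (NULL == &obj) ? os : obj.OsPrint(os);
      }

      // After: drop the guard; callers must check their pointers before forming a
      // reference, which is where the responsibility belongs.
      std::ostream &PrintUnguarded(std::ostream &os, const CPrintable &obj) {
          return obj.OsPrint(os);
      }

      int main() {
          CPrintable obj;
          PrintUnguarded(std::cout, obj) << "\n";
      }
      ```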
      
      For more context, GCC 6+ chokes without this change:
      
      Running on a macOS iMac:
      
      ```
      env CC='gcc-6' CXX='g++-6' cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -H. -Bbuild.gcc6.debug
      ninja -C build.gcc6.debug
      ```
      
      GCC produces this error:
      
      ```
      ninja: Entering directory `build.gcc6.debug'
      [548/1027] Building CXX object libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEMap.cpp.o
      FAILED: libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEMap.cpp.o
      ccache /usr/local/bin/g++-6  -Dgpopt_EXPORTS -I/usr/local/include -I../libgpos/include -I../libgpopt/include -I../libgpopt/../libgpcost/include -I../libgpopt/../libnaucrates/include -Ilibgpos/include -Wall -Werror -Wextra -pedantic-errors -Wno-variadic-macros -Wno-tautological-undefined-compare -fno-omit-frame-pointer -g -g3 -fPIC -MD -MT libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEMap.cpp.o -MF libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEMap.cpp.o.d -o libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEMap.cpp.o -c ../libgpopt/src/base/CCTEMap.cpp
      ../libgpopt/src/base/CCTEMap.cpp: In function 'gpos::IOstream& gpopt::operator<<(gpos::IOstream&, gpopt::CCTEMap&)':
      ../libgpopt/src/base/CCTEMap.cpp:432:18: error: the compiler can assume that the address of 'cm' will never be NULL [-Werror=address]
           return (NULL == &cm) ? os : cm.OsPrint(os);
                        ^
      At global scope:
      cc1plus: all warnings being treated as errors
      [549/1027] Building CXX object libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEReq.cpp.o
      FAILED: libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEReq.cpp.o
      ccache /usr/local/bin/g++-6  -Dgpopt_EXPORTS -I/usr/local/include -I../libgpos/include -I../libgpopt/include -I../libgpopt/../libgpcost/include -I../libgpopt/../libnaucrates/include -Ilibgpos/include -Wall -Werror -Wextra -pedantic-errors -Wno-variadic-macros -Wno-tautological-undefined-compare -fno-omit-frame-pointer -g -g3 -fPIC -MD -MT libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEReq.cpp.o -MF libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEReq.cpp.o.d -o libgpopt/CMakeFiles/gpopt.dir/src/base/CCTEReq.cpp.o -c ../libgpopt/src/base/CCTEReq.cpp
      ../libgpopt/src/base/CCTEReq.cpp: In function 'gpos::IOstream& gpopt::operator<<(gpos::IOstream&, gpopt::CCTEReq&)':
      ../libgpopt/src/base/CCTEReq.cpp:569:18: error: the compiler can assume that the address of 'cter' will never be NULL [-Werror=address]
           return (NULL == &cter) ? os : cter.OsPrint(os);
                        ^
      At global scope:
      cc1plus: all warnings being treated as errors
      [557/1027] Linking CXX shared library libnaucrates/libnaucrates.2.47.0.dylib
      ninja: build stopped: subcommand failed.
      ```
      b111104f
  8. 20 Oct 2017 (2 commits)
    • Add a configuration option to disable enforcement of constraints. · 04293f5a
      Heikki Linnakangas authored
      Add a new EnforceConstraints hint configuration option. If it's set,
      ORCA will not add Assert nodes to enforce CHECK and NOT NULL constraints
      on INSERT and UPDATE statements.
      
      This is useful for GPDB, which is prepared to enforce the constraints on
      its own. In theory, the optimizer could prove that the inserted rows always
      satisfy the constraints, in which case it could optimize away the checks,
      whereas the executor can't do that on its own. But ORCA doesn't currently
      attempt to do that. The reason I'd like to enforce the constraints on the
      GPDB side, instead of with Assertion nodes, is that the assertion node
      produces a different error message. That's annoying, because we then have
      to maintain alternative expected output files for tests that hit CHECK or
      NOT NULL constraints.
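
      As a hypothetical sketch of the idea (the flag name and types below are
      invented, not ORCA's actual hint API):

      ```
      #include <iostream>

      // Invented stand-in for ORCA's optimizer hint/configuration object.
      struct OptimizerHints {
          // When set, ORCA skips generating Assert nodes for CHECK / NOT NULL
          // constraints on INSERT and UPDATE, leaving enforcement to the executor.
          bool skip_constraint_enforcement = false;
      };

      struct DmlPlan { bool has_assert_nodes = false; };

      DmlPlan PlanDml(const OptimizerHints &hints) {
          DmlPlan plan;
          plan.has_assert_nodes = !hints.skip_constraint_enforcement;
          return plan;
      }

      int main() {
          OptimizerHints gpdb;                      // GPDB enforces constraints itself,
          gpdb.skip_constraint_enforcement = true;  // so ask ORCA not to add Asserts
          std::cout << std::boolalpha
                    << PlanDml(gpdb).has_assert_nodes << "\n";  // false
      }
      ```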
      
      Bump version number to 2.48.0, since this is an incompatible change.
      04293f5a
    • Use PROJECT_SOURCE_DIR instead of CMAKE_SOURCE_DIR · 46d3a54b
      Jesse Zhang authored
      This is semantically more precise, and it also enables ORCA to be included
      in other CMake projects.
      
      [ci skip]
      46d3a54b
  9. 18 Oct 2017 (3 commits)
  10. 12 Oct 2017 (3 commits)
    • Bump ORCA Version · 42aeca48
      Shreedhar Hardikar authored
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      42aeca48
    • Enable partition selection when a cast is present · 268a8e62
      Shreedhar Hardikar authored
      This is enabled by including CScalarCmp expressions that contain a CScalarCast
      in the types of expressions considered for partition filters. Observe
      that if the expression contains a CScalarCast over a CScalarIdent, the cast
      must be preserved all the way to the final plan. That is, we cannot
      "peek" through the cast and extract the identifier underneath it. For this
      reason, when it is an equality comparison with a cast, levelEqExprs can no
      longer be used.
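
      A hypothetical, much-simplified sketch of the "keep the cast" rule (the Expr
      struct and QualifiesAsPartFilter below are invented, not ORCA's expression
      classes):

      ```
      #include <iostream>
      #include <memory>
      #include <string>

      // Minimal stand-in for a scalar expression tree: Cmp(Cast(Ident)).
      struct Expr {
          enum Kind { Ident, Cast, Cmp } kind;
          std::string name;             // Ident: column name; Cmp: operator
          std::shared_ptr<Expr> child;  // Cast/Cmp: wrapped expression
      };

      // A comparison qualifies as a partition filter when its operand is the
      // partition key, either directly or under a cast. We only look through the
      // cast to decide eligibility; the filter expression itself keeps the cast.
      bool QualifiesAsPartFilter(const Expr &cmp, const std::string &part_key) {
          if (cmp.kind != Expr::Cmp || !cmp.child) return false;
          const Expr *operand = cmp.child.get();
          if (operand->kind == Expr::Cast && operand->child) operand = operand->child.get();
          return operand->kind == Expr::Ident && operand->name == part_key;
      }

      int main() {
          auto pk   = std::make_shared<Expr>(Expr{Expr::Ident, "pk", nullptr});
          auto cast = std::make_shared<Expr>(Expr{Expr::Cast, "::int8", pk});
          Expr cmp{Expr::Cmp, "=", cast};
          std::cout << std::boolalpha << QualifiesAsPartFilter(cmp, "pk") << "\n";  // true
      }
      ```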
      
      Also, convert a cast on list part filters to an array coerce.
      
      During Expr to DXL translation, construct a CDXLScalarArrayCoerceExpr
      operator on top of CDXLScalarPartListValues when a cast is present on
      top of the partition key in the partition filter expression.
      
      Also, refactor SplitPartPredicates() to make it easier to read; and
      refactor methods around PdxlnListFilterScCmp() by extracting out the
      generation of Part Key expression for LIST partition filters.
      
      However, ORCA won't be able to generate a partition filter from an
      expression of the form pk::int8 IS DISTINCT FROM 5. This is because
      IDF expressions are handled by the CConstraintInterval framework, which
      converts them to a corresponding CScalarCmp + CScalarNullTest. This
      framework cannot preserve cast information on the partkey since it stores
      only a CColRef.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
      268a8e62
    • Fix incorrect check in PciIntervalFromScalarIDF() · 160d60b2
      Shreedhar Hardikar authored
      When using CConstraintInterval to derive partition filters, we cannot
      use a casted Ident, since the cast information is lost. This tripped an
      assertion. Also implement FIdentIDFConst().
      160d60b2
  11. 06 Oct 2017 (4 commits)
  12. 05 Oct 2017 (1 commit)
    • Fixing incorrect asserts in CXformUtils · bded4a81
      Venkatesh Raghavan authored
      There was a fairly classic bug / typo where the assertion would never
      fail because we put an enum member (non-zero) in a boolean context.
      
      Even though the method in question is generically named
      `TransformImplementBinaryOp`, it's actually only on the code path of
      transforming a physical nested loop (non-correlated, non-indexed) join.
      
      This commit adds back all types of eligible nested loop joins into the
      assertion.
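
      A minimal, hypothetical reproduction of this bug class (not the actual
      CXformUtils code; the enum and function names are invented):

      ```
      #include <cassert>

      enum EJoinType { EjtInner = 1, EjtLeftOuter = 2, EjtNLJ = 3 };

      void TransformJoin(EJoinType ejt) {
          // Buggy pattern: EjtNLJ is a non-zero enum member, so the assertion is
          // always true no matter what ejt actually is.
          assert(EjtNLJ);
          // Intended check: the operator is one of the eligible join types.
          assert(ejt == EjtInner || ejt == EjtLeftOuter || ejt == EjtNLJ);
          (void)ejt;
      }

      int main() { TransformJoin(EjtInner); }
      ```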
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      bded4a81
  13. 29 Sep 2017 (1 commit)
  14. 27 Sep 2017 (2 commits)
    • Update space size of a minidump · f6a453c1
      Omer Arap authored
      f6a453c1
    • Reorder preprocessing steps to avoid false trimming of existential subqueries [#150988530] · bec7b6af
      Omer Arap authored
      Orca removes outer references from the grouping columns of a GbAgg. Orca also
      trims an existential subquery whose inner expression is a GbAgg with no grouping
      columns by replacing it with a Boolean constant. However, in some cases
      applying the first preprocessing step affects the latter and produces an
      unintended trimming of existential subqueries.

      This commit changes the order of these two preprocessing steps to avoid
      that complication.
      
      E.g.:
      ```
      SELECT * from foo where not exists (SELECT sum(bar.a) from bar where
      foo.a = bar.a GROUP BY foo.a);
      ```
      
      In this example the grouping column is an outer reference, so it is
      removed by `PexprRemoveSuperfluousOuterRefs`. The next preprocessing
      step, `PexprTrimExistentialSubqueries`, then sees that the GbAgg has no
      grouping columns and simplifies the `NOT EXISTS` to `false`.

      Reordering the two steps fixes the problem.
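
      As a toy, hypothetical sketch of why the ordering matters (the pass names
      below are simplified stand-ins, not the actual preprocessor functions):

      ```
      #include <iostream>

      // Toy model of the two preprocessing passes operating on a subquery.
      struct Subquery {
          int num_grouping_cols = 1;  // one grouping column: an outer reference
          bool trimmed_to_const = false;
      };

      void TrimExistentialSubqueries(Subquery &q) {
          // Only legal when the GbAgg truly has no grouping columns.
          if (q.num_grouping_cols == 0) q.trimmed_to_const = true;  // NOT EXISTS -> false
      }

      void RemoveSuperfluousOuterRefs(Subquery &q) { q.num_grouping_cols = 0; }

      int main() {
          Subquery old_order, new_order;
          // Old (buggy) order: outer refs removed first, so the trim fires wrongly.
          RemoveSuperfluousOuterRefs(old_order);
          TrimExistentialSubqueries(old_order);
          // New order: the trim runs first and leaves the subquery alone.
          TrimExistentialSubqueries(new_order);
          RemoveSuperfluousOuterRefs(new_order);
          std::cout << std::boolalpha << old_order.trimmed_to_const << " "  // true (wrong)
                    << new_order.trimmed_to_const << "\n";                  // false (correct)
      }
      ```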
      
      Bump version to 2.46.3
      bec7b6af
  15. 23 Sep 2017 (1 commit)
    • Push ORCA src to bintray repository for conan · f395a029
      Bhuvnesh authored
      The ORCA source will be pushed to the bintray repository for each version bump.
      Developers can use the conan dependency manager to build ORCA before
      building GPDB, with minimal steps and without worrying about version
      dependencies.
      Whenever we bump the ORCA version used by GPDB, we will ensure that
      conanfile.txt gets updated.
      
      To build ORCA from GPDB:
      ```
      cd <path_to_gpdb>/depends
      env CONAN_CMAKE_GENERATOR=Ninja conan install -s build_type=Debug --build
      ```
      where build_type can be `Debug` or `Release`
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
      f395a029
  16. 20 Sep 2017 (1 commit)
  17. 19 Sep 2017 (3 commits)
  18. 16 Sep 2017 (1 commit)
    • Incorrect Decorrelation results in wrong plan · 94e509d0
      Bhuvnesh Chaudhary authored
      While attempting to decorrelate the subquery, we were incorrectly pulling
      up the join before computing the window function over the results of the
      join. In cases where the subquery contains a window function and has outer
      references, we should not attempt to decorrelate it.
      
      Ex: select C.j from C where C.i in (select rank() over (order by B.i) from B where B.i=C.i) order by C.j;
      The above subquery has outer references, and the result of the window
      function is projected from the subquery.

      There are further optimizations that can be done for existential
      queries, but this PR fixes the plan.
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
      94e509d0
  19. 15 Sep 2017 (4 commits)
    • GPORCA incorrectly collapsing FOJ with condition false · 30cfe889
      Venkatesh Raghavan authored
      Prior to this fix, the logic that calculated the max cardinality for each
      logical operator assumed that a Full Outer Join with a false condition
      always returns an empty set. This was used by the following preprocessing step

      CExpressionPreprocessor::PexprPruneEmptySubtrees

      to eliminate the FOJ subtrees (since it thought they had zero output
      cardinality), replacing them with a const table get with zero tuples
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                         QUERY PLAN
      ------------------------------------------------
       Result  (cost=0.00..0.00 rows=0 width=8)
         ->  Result  (cost=0.00..0.00 rows=0 width=8)
               One-Time Filter: false
       Optimizer status: PQO version 2.43.1
      (4 rows)
      ```
      This collapsing is incorrect. In this CL, the max cardinality logic has been
      fixed to ensure that GPORCA generates a correct plan.
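
      As a hypothetical sketch of the corrected reasoning (not ORCA's actual max
      cardinality code; the function below is invented):

      ```
      #include <iostream>

      // A full outer join with a condition that is always false still returns
      // every row of both children, null-extended on the other side, so its max
      // cardinality is the sum of the children's max cardinalities, not zero.
      unsigned long MaxCardFojFalseCondition(unsigned long left_maxcard,
                                             unsigned long right_maxcard) {
          // Old (wrong) assumption: return 0, letting the preprocessor prune the FOJ.
          return left_maxcard + right_maxcard;
      }

      int main() {
          // e.g. two rows estimated on each side of the join
          std::cout << MaxCardFojFalseCondition(2, 2) << "\n";  // 4, not 0
      }
      ```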
      
      After the fix:
      
      ```
      vraghavan=# explain select * from foo a full outer join foo b on false;
                                                                  QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..2207585.75 rows=35 width=8)
         ->  Result  (cost=0.00..2207585.75 rows=12 width=8)
               ->  Sequence  (cost=0.00..2207585.75 rows=12 width=8)
                     ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=1)
                           ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                     ->  Sequence  (cost=0.00..2207154.75 rows=12 width=8)
                           ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=1)
                                 ->  Materialize  (cost=0.00..431.00 rows=2 width=1)
                                       ->  Table Scan on foo  (cost=0.00..431.00 rows=2 width=34)
                           ->  Append  (cost=0.00..2206723.75 rows=12 width=8)
                                 ->  Nested Loop Left Join  (cost=0.00..882691.26 rows=10 width=68)
                                       Join Filter: false
                                       ->  Shared Scan (share slice:id 2:0)  (cost=0.00..431.00 rows=2 width=34)
                                       ->  Result  (cost=0.00..0.00 rows=0 width=34)
                                             One-Time Filter: false
                                 ->  Result  (cost=0.00..1324032.49 rows=2 width=68)
                                       ->  Nested Loop Left Anti Semi Join  (cost=0.00..1324032.49 rows=2 width=34)
                                             Join Filter: false
                                             ->  Shared Scan (share slice:id 2:1)  (cost=0.00..431.00 rows=2 width=34)
                                             ->  Materialize  (cost=0.00..431.00 rows=5 width=1)
                                                   ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..431.00 rows=5 width=1)
                                                         ->  Result  (cost=0.00..431.00 rows=2 width=1)
                                                               ->  Shared Scan (share slice:id 1:0)  (cost=0.00..431.00 rows=2 width=1)
       Optimizer status: PQO version 2.43.1
      (25 rows)
      ```
      30cfe889
    • Bump Orca version to 2.44.0 · f55e6f70
      Omer Arap authored
      f55e6f70
    • Minidump updates · 8df82709
      Omer Arap authored
      8df82709
    • Only request stats of columns needed for cardinality estimation [#150424379] · 05a26924
      Omer Arap authored
      GPORCA should not spend time extracting column statistics that are not
      needed for cardinality estimation. This commit eliminates the overhead
      of unnecessarily requesting and generating statistics for columns that
      are not used in cardinality estimation.
      
      E.g.:
      `CREATE TABLE foo (a int, b int, c int);`

      For table foo, the query below only needs stats for column `a`, which is
      the distribution column, and column `c`, which is used in the where clause.
      `select * from foo where c=2;`

      However, prior to this commit, the column statistics for column `b` were
      also calculated and passed in for cardinality estimation. The only
      information the optimizer needs is the `width` of column `b`, yet we
      transferred all of the stats information for that column.
      
      This commit and its counterpart commit in GPDB ensure that the column
      width information is passed and extracted via the `dxl:Relation` metadata.

      Preliminary results for short-running queries show up to a 65x
      performance improvement.
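
      A hypothetical, simplified sketch of the idea (the types and function below
      are invented, not the actual GPORCA stats API):

      ```
      #include <iostream>
      #include <set>
      #include <string>
      #include <vector>

      struct ColumnStatsRequest { std::string column; bool full_stats; };

      // Request full statistics only for columns that cardinality estimation
      // actually uses (distribution and predicate columns); width-only otherwise.
      std::vector<ColumnStatsRequest>
      BuildStatsRequests(const std::vector<std::string> &all_cols,
                         const std::set<std::string> &used_in_estimation) {
          std::vector<ColumnStatsRequest> reqs;
          for (const auto &col : all_cols) {
              reqs.push_back({col, used_in_estimation.count(col) > 0});
          }
          return reqs;
      }

      int main() {
          // Mirrors the example: foo(a, b, c), distributed by a, filtered on c.
          auto reqs = BuildStatsRequests({"a", "b", "c"}, {"a", "c"});
          for (const auto &r : reqs)
              std::cout << r.column << (r.full_stats ? ": full stats\n" : ": width only\n");
      }
      ```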
      Signed-off-by: Jemish Patel <jpatel@pivotal.io>
      05a26924
  20. 12 Sep 2017 (2 commits)
    • Use col position from table descriptor for index key retrieval · 7ccedc81
      Jemish Patel authored
      Index keys are relative to the relation and the list of columns in
      `pdrgpcr` is relative to the table descriptor and does not include any
      dropped columns. Consider the case where we had 20 columns in a table. We
      create an index that covers col # 20 as one of its keys. Then we drop
      columns 10 through 15. Now the index key still points to col #20 but the
      column ref list in `pdrgpcr` will only have 15 elements in it, causing
      ORCA to crash with an `Out of Bounds` exception when
      `CLogical::PosFromIndex()` gets called.
      
      This commit fixes that issue by using the index key as the position to
      retrieve the column from the relation, then using that column's `attno`
      to get its current position `ulPos` relative to the table descriptor.
      This `ulPos` is then used to retrieve the `CColRef` of the index key
      column, as shown below:
      
      ```
      CColRef *pcr = (*pdrgpcr)[ulPos];
      ```
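
      A hypothetical, self-contained sketch of the attno-to-position mapping (the
      Column struct and PosFromIndexKey below are invented, not the actual ORCA code):

      ```
      #include <iostream>
      #include <vector>

      struct Column { int attno; bool dropped; };

      // Maps an index key (an attno relative to the relation, which still counts
      // dropped columns) to its position among the non-dropped columns of the
      // table descriptor; returns -1 if the attno no longer exists.
      int PosFromIndexKey(const std::vector<Column> &relation_cols, int key_attno) {
          int ulPos = 0;
          for (const auto &col : relation_cols) {
              if (col.dropped) continue;  // the table descriptor skips dropped cols
              if (col.attno == key_attno) return ulPos;
              ++ulPos;
          }
          return -1;
      }

      int main() {
          // 20-column relation with columns 10..15 dropped; the index covers col 20.
          std::vector<Column> cols;
          for (int attno = 1; attno <= 20; ++attno)
              cols.push_back({attno, attno >= 10 && attno <= 15});
          std::cout << PosFromIndexKey(cols, 20) << "\n";  // 13, safely within bounds
      }
      ```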
      
      We also added the two test cases below to cover this scenario:
      
      1. `DynamicIndexGetDroppedCols`
      2. `LogicalIndexGetDroppedCols`
      
      Both of the above test cases use a table with 30 columns; create a btree
      index on 6 columns and then drop 7 columns so the table only has 23
      columns.
      
      This commit also bumps ORCA to version 2.43.1
      7ccedc81
    • Bump ORCA version · 9dec59e0
      Bhuvnesh Chaudhary authored
      9dec59e0