- 15 Jul, 2017 1 commit
-
-
Committed by Bhuvnesh

* Do not generate the PartOid expression. In GPDB, PartOidExpr is not used, yet ORCA still generates it. HAWQ, however, uses PartOid for sorting while inserting into Append Only Row / Parquet partitioned tables. This patch uses the storage type (Parquet) and the number of partitions of an Append Only row partitioned table to decide whether PartOid should be generated. GPDB does not support Parquet storage, and its GUC controlling the number of partitions above which a sort should be used is set to INT_MAX, which is practically unreachable; so on GPDB the PartOid expr is never generated, while HAWQ can control its generation through its already existing GUCs (see the sketch after this list).
* Remove PartOid ProjElem from minidump files
* Fix CICGTest
* Fix CDMLTest
* Fix CDirectDispatchTest
* Fix CPhysicalParallelUnionAllTest
* Fix the CCollapseProjectTest test
* Fix the parser for Partition Selector: a Partition Selector node can have another Partition Selector node as its immediate child, and the current parser fails in such cases; this patch fixes that.
* Fix the PartTbl test
* Apply PR feedback
* Apply HSY feedback 1

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>

* Bump ORCA to 2.37

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
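Below is a minimal standalone sketch of that decision, with hypothetical names: `FGeneratePartOidExpr` and its parameters are illustrative, not the actual ORCA/HAWQ symbols or GUCs.

```cpp
#include <climits>
#include <iostream>

// Hypothetical model of the PartOid decision; not the actual ORCA code.
// On GPDB, fParquet is always false and sortThreshold is INT_MAX, so this
// never returns true; HAWQ tunes sortThreshold through its existing GUCs.
bool FGeneratePartOidExpr(bool fParquet, int numPartitions, int sortThreshold)
{
    return fParquet || numPartitions > sortThreshold;
}

int main()
{
    std::cout << std::boolalpha
              << FGeneratePartOidExpr(false, 1000, INT_MAX) << '\n'  // GPDB: false
              << FGeneratePartOidExpr(true, 1000, 100) << '\n';      // HAWQ Parquet: true
}
```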
-
- 11 Jul, 2017 4 commits
-
-
Committed by Venkatesh Raghavan
-
Committed by Venkatesh Raghavan

Enable GPORCA to generate better plans for non-correlated EXISTS subqueries in the WHERE clause. Consider the exists subquery `(select * from bar)`: GPORCA generates an elaborate count-based implementation of it, and if bar is a fact table, the count is going to be expensive.

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.79 rows=400324 width=8)
   ->  Nested Loop  (cost=0.00..1368250.86 rows=133442 width=8)
         Join Filter: true
         ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
               Filter: a = b
         ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
               ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                     ->  Result  (cost=0.00..438.57 rows=1 width=1)
                           Filter: (count((count()))) > 0::bigint
                           ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=8)
                                       ->  Aggregate  (cost=0.00..438.57 rows=1 width=8)
                                             ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
 Optimizer status: PQO version 2.35.1
(14 rows)
```

The planner, on the other hand, uses LIMIT, as shown in the init plan:

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice2; segments: 3)  (cost=0.03..13611.14 rows=1001 width=8)
   ->  Result  (cost=0.03..13611.14 rows=334 width=8)
         One-Time Filter: $0
         InitPlan  (slice3)
           ->  Limit  (cost=0.00..0.03 rows=1 width=0)
                 ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..0.03 rows=1 width=0)
                       ->  Limit  (cost=0.00..0.01 rows=1 width=0)
                             ->  Seq Scan on bar  (cost=0.00..11072.84 rows=332395 width=0)
         ->  Seq Scan on foo  (cost=0.00..13611.11 rows=334 width=8)
               Filter: a = b
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(12 rows)
```

While GPORCA does not support init plans, we can nevertheless generate a better plan by using LIMIT instead of count. After this PR, GPORCA generates the following plan with a LIMIT clause:

```
vraghavan=# explain select * from foo where foo.a = foo.b and exists (select * from bar);
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice3; segments: 3)  (cost=0.00..1368262.73 rows=400324 width=8)
   ->  Nested Loop EXISTS Join  (cost=0.00..1368250.80 rows=133442 width=8)
         Join Filter: true
         ->  Table Scan on foo  (cost=0.00..461.91 rows=133442 width=8)
               Filter: a = b
         ->  Materialize  (cost=0.00..438.57 rows=1 width=1)
               ->  Broadcast Motion 1:3  (slice2)  (cost=0.00..438.57 rows=3 width=1)
                     ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                           ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..438.57 rows=1 width=1)
                                 ->  Limit  (cost=0.00..438.57 rows=1 width=1)
                                       ->  Table Scan on bar  (cost=0.00..437.95 rows=332395 width=1)
 Optimizer status: PQO version 2.35.1
(12 rows)
```
-
Committed by Bhunvesh Chaudhary

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
Committed by Bhunvesh Chaudhary

Each ORCA commit must bump the version; if the version is not bumped, new releases will not be pushed to the ORCA repository. This commit adds a check that validates the version in the current commit against the tag version existing on the repository.

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
- 06 Jul, 2017 3 commits
-
-
Committed by Omer Arap

There is a tie-breaker in the FBetterThan function in CCostContext: when costs are equal, a Hashed distribution should be favored over a Random one. The code was checking whether the distribution spec of the same context equals both Hashed and Random, a conjunction that is always false. For correct behavior it should check whether this context's distribution is Hashed while the compared one's is Random.
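A self-contained sketch of the corrected tie-breaker; the enum and signature are illustrative stand-ins for CCostContext and CDistributionSpec, not the real API.

```cpp
#include <iostream>

enum class EDist { Hashed, Random, Other };  // simplified stand-in for CDistributionSpec

// Broken form tested this context's spec against both Hashed and Random,
// which can never both hold, so the tie-breaker never fired.
// Fixed form: this context must be Hashed while the other is Random.
bool FBetterThan(double cost, double otherCost, EDist dist, EDist otherDist)
{
    if (cost != otherCost)
        return cost < otherCost;
    return dist == EDist::Hashed && otherDist == EDist::Random;
}

int main()
{
    std::cout << std::boolalpha
              << FBetterThan(42.0, 42.0, EDist::Hashed, EDist::Random) << '\n';  // true
}
```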
-
Committed by Bhuvnesh Chaudhary

There are changes to the regress tests in the GPDB repository, so bump up the minor version.
-
Committed by Bhuvnesh Chaudhary

The CScalarCmp and CScalarIsDistinctFrom operators expect the CScalarIdent operator on the LHS and the CScalarConst operator on the RHS. When that is not the case, the predicate is assigned a default selectivity, which hurts the cardinality estimate. This patch fixes the issue by reordering the children of CScalarCmp and CScalarIsDistinctFrom when CScalarIdent is on the RHS and CScalarConst is on the LHS. Because of the reordering, the comparison operator is also swapped for its commuted form, where supported; if the corresponding comparison operator does not exist or is not supported, the children are not reordered. Only cases of the form CONST = VAR are handled by this patch.

Signed-off-by: Omer Arap <oarap@pivotal.io>
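As a rough standalone illustration of the reordering (the operator table here is an assumption; ORCA looks the commuted operator up in the metadata rather than hard-coding it):

```cpp
#include <string>

// Return the commuted comparison operator, or "" when none exists,
// in which case the children are left in their original order.
std::string StrCommutedOp(const std::string &op)
{
    if (op == "<")  return ">";
    if (op == ">")  return "<";
    if (op == "<=") return ">=";
    if (op == ">=") return "<=";
    if (op == "=" || op == "<>") return op;  // symmetric operators
    return "";
}

// Normalize CONST <op> VAR into VAR <commuted-op> CONST so that
// selectivity estimation sees the ident-on-the-left shape it expects.
bool FNormalize(std::string &op, bool &fConstOnLhs)
{
    if (!fConstOnLhs)
        return false;                 // already VAR <op> CONST
    std::string commuted = StrCommutedOp(op);
    if (commuted.empty())
        return false;                 // no commutator: keep as-is
    op = commuted;
    fConstOnLhs = false;              // children swapped
    return true;
}
```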
-
- 01 Jul, 2017 1 commit
-
-
Committed by Venkatesh Raghavan

Minor cleanup of duplicate code.
-
- 30 Jun, 2017 2 commits
-
-
Committed by Jesse Zhang
Shiny new feature in Concourse 3.3.0 (https://concourse.ci/running-tasks.html#caches) [ci skip]
-
Committed by Heikki Linnakangas

It made the assumption that it's OK to call it on a NULL pointer, which isn't cool with all C++ compilers and options. I'm getting a bunch of warnings like this because of it:

```
/home/heikki/gpdb/optimizer-main/libgpos/include/gpos/common/CDynamicPtrArray.inl:382:3: warning: nonnull argument ‘this’ compared to NULL [-Wnonnull-compare]
   if (NULL == this)
   ^~
```

There are a few other places that produce the same error, but one step at a time. This one matters most because it's in an inline function, so it produces warnings in any code that uses ORCA, like the translator code in GPDB's src/backend/gpopt/ directory, not just in ORCA itself. Since the function is now gone, all references to it also need to be removed from the translator code outside ORCA.

Bump up Orca version to 2.34.2.
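For context, a sketch of why the in-method check had to go and the caller-side pattern that replaces it; `PtrArray` and `UlSafeLength` are stand-ins, not ORCA's actual CDynamicPtrArray API.

```cpp
#include <cstddef>
#include <vector>

struct PtrArray
{
    std::vector<void *> items;
    // Calling a member function through a null pointer is undefined
    // behavior, so the compiler may assume `this` is non-null and flag
    // (or optimize away) an in-method `if (NULL == this)` check.
    size_t UlLength() const { return items.size(); }
};

// Callers that used to rely on the in-method null check now test the
// pointer themselves before dereferencing:
size_t UlSafeLength(const PtrArray *parr)
{
    return (parr != nullptr) ? parr->UlLength() : 0;
}
```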
-
- 28 Jun, 2017 2 commits
-
-
Committed by Bhuvnesh Chaudhary

Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
-
Committed by Omer Arap

This commit introduces a check that the CTE producer and its matching consumer execute in the same place: if the producer executes on the master, on all segments, or on a single segment, the matching consumer must execute there as well. In rare cases ORCA generates plans that violate this assumption; this commit detects such plans and falls back.

Signed-off-by: Venkatesh Raghavan <vraghavan@pivotal.io>
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
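A toy model of the check; the enum, function, and error are illustrative, not the actual ORCA classes, which inspect derived plan properties and raise a fallback exception.

```cpp
#include <stdexcept>

enum class EExecLocality { Master, AllSegments, SingleSegment };

// The consumer reads the producer's shared-scan output in-process, so
// both must be planned onto the same set of processes.
void ValidateCtePlacement(EExecLocality producer, EExecLocality consumer)
{
    if (producer != consumer)
    {
        // In ORCA, a violation like this triggers fallback to the planner.
        throw std::runtime_error(
            "CTE producer and consumer execute in different localities");
    }
}
```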
-
- 21 Jun, 2017 1 commit
-
-
Committed by Heikki Linnakangas
-
- 20 Jun, 2017 2 commits
-
-
Committed by Omer Arap

ORCA's template classes mostly keep their implementation in `.inl` files, while some implementation also lives in `.h` files. This makes the code hard to navigate, since some IDEs do not recognize the formatting of `.inl` files. This commit therefore moves the implementation into the `.h` files wherever applicable. It does not port implementation from `.inl` files for classes that already have a `.cpp` implementation file and whose `.h` holds only function declarations, such as `CUtils.h`.
-
Committed by Venkatesh Raghavan

Histogram intersection depends on the values of the bucket boundaries. For datatypes like text, varchar, etc., ORCA currently uses a hash function to mark bucket boundaries. That function is slightly useful for equality on singleton buckets but nothing more, so the previous join stats computation based on histogram intersection was totally bogus. In this CL, we modify it into an NDV-based (number of distinct values) estimation.
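For reference, the standard NDV-based equi-join estimate that this kind of change relies on, sketched in standalone form (a textbook formula, not necessarily ORCA's exact code):

```cpp
#include <algorithm>
#include <iostream>

// Estimated rows of R JOIN S ON R.a = S.b: on average each distinct
// value matches rows/NDV rows on the other side, capped by the larger NDV.
double DJoinCardEstimate(double rowsOuter, double ndvOuter,
                         double rowsInner, double ndvInner)
{
    return rowsOuter * rowsInner / std::max(ndvOuter, ndvInner);
}

int main()
{
    // 1M-row fact joined to a 10K-row dimension on a key with 10K
    // distinct values on both sides: ~1M output rows.
    std::cout << DJoinCardEstimate(1e6, 1e4, 1e4, 1e4) << '\n';
}
```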
-
- 10 Jun, 2017 1 commit
-
-
Committed by Jesse Zhang
This reverts commit 8c315ea4. [ci skip]
-
- 08 Jun, 2017 1 commit
-
-
Committed by Jemish Patel

[ci skip]

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
- 18 May, 2017 1 commit
-
-
Committed by Venkatesh Raghavan
-
- 16 May, 2017 2 commits
-
-
Committed by Jesse Zhang
[ci skip]
-
Committed by Jesse Zhang
Todd broke our CI in greenplum-db/gpdb@398534a9. [ci skip]
-
- 15 May, 2017 2 commits
-
-
Committed by Venkatesh Raghavan
-
Committed by Venkatesh Raghavan

* Make sure the intent of the traceflags is clear
* Remove double negation where possible
* Update comments
-
- 12 May, 2017 1 commit
-
-
Committed by C.J. Jameson
-
- 09 May, 2017 6 commits
-
-
Committed by Dhanashree Kashid

Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
-
For a query using a correlated subselect as a predicate, such as:

```
CREATE TABLE partitioned_table (a int, pk int)
  DISTRIBUTED BY (a)
  PARTITION BY range(pk) (end(5), end(10));
CREATE TABLE other_table (c int, d int) DISTRIBUTED BY (c);

INSERT INTO partitioned_table VALUES (1, 1), (2, 4), (3, 9);
ANALYZE ROOTPARTITION partitioned_table;

EXPLAIN SELECT pk from partitioned_table
WHERE a > (SELECT d FROM other_table WHERE c = a) AND pk < 12;
```

ORCA will generate a Correlated Nested Loop Left Outer Join, which should translate to a DXL Scan under a DXL SubPlan filter. However, the translation happened in the following order:

1. Translate the outer child (which has a filter) of the correlated NLJ.
2. Build a DXL SubPlan using the inner child, intended to serve as an "additional filter condition" on top of the outer child.
3. Based on the DXL plan for the outer child, decide whether or not to use the additional condition as a filter in the final result.
4. If the outer child was a Physical Sequence, we discarded the condition, assuming that the filter condition is already present in the partition selector.
5. The code to discard the subplan was added in e99325cc because, previously, we always inserted the additional filter as "the second child", on the assumption that every DXL node has a filter child in second place. As it turned out, the DXL Sequence node is a counterexample: it has no filter, and its second child is expected to be a partition selector.
6. We didn't catch this error in e99325cc because the test cases had a trivial additional filter condition of constant true, so dropping it didn't raise any eyebrows.

The generally correct approach, however, is to retain this additional condition, either as an additional DXL Result node on top of the DXL for the outer child or, for appropriate node types, by inlining the condition into existing filters. This patch set fixes that (see the sketch below).

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
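A schematic of the retained-condition logic; every type and helper here is a hypothetical stand-in for the DXL translator's classes, not the real API.

```cpp
// Hypothetical stand-ins for DXL translator types; illustrative only.
struct DxlCond { };

struct DxlNode
{
    bool fHasFilterSlot;   // e.g. false for a DXL Sequence
    DxlCond *filter;
    DxlNode *child;
};

// Merge the condition into the node's own filter slot.
DxlNode *InlineIntoFilter(DxlNode *node, DxlCond *cond)
{
    node->filter = cond;
    return node;
}

// Wrap the node in a Result node that carries the condition.
DxlNode *AddResultNode(DxlNode *node, DxlCond *cond)
{
    return new DxlNode{true, cond, node};
}

// Instead of discarding `cond` when the outer child is a Sequence,
// retain it: inline where a filter slot exists, otherwise wrap.
DxlNode *ApplyAdditionalCond(DxlNode *outer, DxlCond *cond)
{
    if (cond == nullptr)
        return outer;
    return outer->fHasFilterSlot ? InlineIntoFilter(outer, cond)
                                 : AddResultNode(outer, cond);
}
```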
-
We ensure that all three cases of PdxlnCorrelatedNLJoin take ownership of the DXLProperties object.
-
-
-
1. There are cases where the additional scalar condition cannot be combined with the original condition in the DXL plan. Case in point: when the outer child gets translated into a DXL Sequence, we cannot put the "combined condition" into the sequence.
2. Deferring the combination of conditions also gives us an optimization opportunity to reduce double translations.
-
- 30 Apr, 2017 1 commit
-
-
Committed by Jesse Zhang

installcheck has slowly bloated over the last year: this test now runs for about 32 minutes when nothing else is happening in Concourse. Given that we also run the planner ICG in parallel, we'd better err on the safe side and extend the timeout to an hour. [ci skip]
-
- 26 Apr, 2017 9 commits
-
-
Committed by Omer Arap

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Heikki Linnakangas

If an error occurs while serializing an exception's error context, don't recurse into Serialize: serializing the context is likely to just fail again, leading to infinite recursion. Also disable the abort-signal while serializing the error context, likewise to avoid recursing.
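A minimal model of such a re-entrancy guard (the flag and function are illustrative; GPOS keeps its own task-local error state):

```cpp
// Illustrative re-entrancy guard; not the actual GPOS implementation.
static thread_local bool fSerializingError = false;

void SerializeErrorContext()
{
    if (fSerializingError)
        return;                 // serialization already failed once:
                                // bail out instead of recursing
    fSerializingError = true;
    // ... format and write the error context; the abort-signal is
    //     suppressed in this window so a pending abort can't re-enter ...
    fSerializingError = false;
}
```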
-
Committed by Heikki Linnakangas

If an error occurs, the worker is no longer executing the given task. This caused problems later: the Task object had already been destroyed, but m_ptsk was left dangling.
-
Committed by Heikki Linnakangas
The hash table iterator holds a spinlock on the hash table, and there's a GPOS_ASSERT_NO_SPINLOCK in PvMalloc. Writing to the dumper stream can cause allocations, so avoid doing that while iterating.
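The shape of the fix, sketched with standard containers (illustrative only; the real code uses GPOS's hash-table iterator, spinlock, and dumper stream):

```cpp
#include <mutex>
#include <ostream>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<int, std::string> table;
std::mutex tableLock;  // stand-in for the iterator's spinlock

// Copy the entries out while holding the lock, then do the stream
// writes (which may allocate) only after the lock is released.
void DumpTable(std::ostream &os)
{
    std::vector<std::string> snapshot;
    {
        std::lock_guard<std::mutex> guard(tableLock);
        for (const auto &kv : table)
            snapshot.push_back(kv.second);
    }
    for (const auto &s : snapshot)
        os << s << '\n';  // safe: no lock held during I/O
}
```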
-
Committed by Heikki Linnakangas
This isn't strictly necessary, but considerably speeds up the dumping of large queries. This brought down the time needed to process and dump a 600 MB minidump from about 90 s to 30 s on my laptop.
-
Committed by Heikki Linnakangas
Instead of having a fixed-size buffer to serialize minidumps to, refactor the serialization functions to write to a stream. That simplifies the serialization functions, as they no longer need to reserve space in the buffer ahead of time (i.e. the UlpRequiredSpace() functions are gone), reduces memory usage when dumping small queries, and makes it possible to minidump large queries without running out of memory. This removes the arbitrary 16 MB limit on minidump size. I'm not sure it's a good idea to create multi-gigabyte minidumps in practice, but that's a case of "if it hurts, don't do it". At least it's now possible, if you need to.
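The gist of the refactoring in standard C++ terms (illustrative; ORCA uses its own stream abstraction, not std::ostream):

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Before: serialize into a caller-provided fixed buffer, which forced
// every node to pre-compute its size (UlpRequiredSpace-style) and
// imposed a hard cap. After: write directly to a growable stream, so
// there is no size estimation and no fixed limit.
void SerializeNode(std::ostream &os, const std::string &tag)
{
    os << "<dxl:" << tag << "/>";
}

int main()
{
    std::ostringstream minidump;  // grows as needed
    SerializeNode(minidump, "Plan");
    std::cout << minidump.str() << '\n';
}
```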
-
Committed by Omer Arap

Signed-off-by: Jemish Patel <jpatel@pivotal.io>
-
Committed by Jesse Zhang

Signed-off-by: Omer Arap <oarap@pivotal.io>
-
Committed by Omer Arap
-