Commits · a341621d6405b9a9f07acc8c25a8730f505ca7eb · Greenplum / Gpdb

21 Sep, 2018 1 commit

Introduce optimizer_enable_gather_on_segment_for_DML GUC · a341621d

Committed by Sambitesh Dash on Sep 13, 2018

When on, ORCA optimizes DML queries by enforcing a non-master gather
whenever possible. When off, a gather on master is enforced
instead.

The default value is on.

Also add new tests to ensure sane behavior when this optimization is
turned on and fix the existing tests.

Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

a341621d

21 Aug, 2018 1 commit

Do not create split update for relations excluded by constraints · 9b8dd4f4

Committed by Taylor Vesely on Aug 09, 2018

When the query_planner determines that a relation does not need
scanning due to constraint exclusion, it creates a 'dummy' plan for
that operation. Split update planning does not understand this
'dummy' plan shape and fails with an assertion.

Instead, because an excluded relation will never return tuples, do not
attempt to create a split update at all.

9b8dd4f4

03 Aug, 2018 1 commit

Revert "Merge with PostgreSQL 9.2beta2." · e0aa3ef2

Committed by Karen Huddleston on Aug 02, 2018

This reverts commit 4750e1b6.

e0aa3ef2

02 Aug, 2018 1 commit

Merge with PostgreSQL 9.2beta2. · 4750e1b6

Committed by Richard Guo on Aug 02, 2018

This is the final batch of commits from PostgreSQL 9.2 development,
up to the point where the REL9_2_STABLE branch was created, and 9.3
development started on the PostgreSQL master branch.

Notable upstream changes:

* Index-only scan was included in the batch of upstream commits. It
  allows queries to retrieve data only from indexes, avoiding heap access.

* Group commit was added to work effectively under heavy load. Previously,
  batching of commits became ineffective as the write workload increased,
  because of internal lock contention.

* A new fast-path lock mechanism was added to reduce the overhead of
  taking and releasing certain types of locks which are taken and released
  very frequently but rarely conflict.

* The new "parameterized path" mechanism was added. It allows inner index
  scans to use values from relations that are more than one join level up
  from the scan. This can greatly improve performance in situations where
  semantic restrictions (such as outer joins) limit the allowed join orderings.

* SP-GiST (Space-Partitioned GiST) index access method was added to support
  unbalanced partitioned search structures. For suitable problems, SP-GiST can
  be faster than GiST in both index build time and search time.

* Checkpoints now are performed by a dedicated background process. Formerly
  the background writer did both dirty-page writing and checkpointing. Separating
  this into two processes allows each goal to be accomplished more predictably.

* Custom plans are now supported for specific parameter values even when using
  prepared statements.

* API for FDW was improved to provide multiple access "paths" for their tables,
  allowing more flexibility in join planning.

* Security_barrier option was added for views to prevent optimizations that
  might allow view-protected data to be exposed to users.

* Range data type was added to store a lower and upper bound belonging to its
  base data type.

* CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
  SELECT query is planned during the execution of the utility. To conform to
  this change, GPDB executes the utility statement only on QD and dispatches
  the plan of the SELECT query to QEs.

Co-authored-by: Adam Lee <ali@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
Co-authored-by: Gang Xiong <gxiong@pivotal.io>
Co-authored-by: Haozhou Wang <hawang@pivotal.io>
Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Co-authored-by: Paul Guo <paulguo@gmail.com>
Co-authored-by: Richard Guo <guofenglinux@gmail.com>
Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>

4750e1b6

23 Jul, 2018 1 commit

Enable update on distribution column in legacy planner. · 6be0a32a

Committed by Zhenghua Lyu on Jul 23, 2018

Previously, the legacy planner could not update a distribution column, because the OLD tuple
and the NEW tuple may belong to different segments. We enable this by borrowing ORCA's
logic, namely, splitting each update operation into a delete and an insert. The delete operation is hashed
by the OLD tuple attributes, and the insert operation is hashed by the NEW tuple attributes. This change
includes the following items:
* Missing OLD attributes must be pushed down to the sub-plan tree so that they can be passed to the top Motion.
* In addition, if the result relation has oids, the oid must also be put in the targetlist.
* If the result relation is partitioned, it needs special treatment, because resultRelations holds the partition tables instead of the root table, whereas a normal Insert uses the root table.
* Special treatment for update triggers, because a trigger cannot be executed across segments.
* Special treatment in nodeModifyTable, so that it can process Insert/Delete for update purposes.
* Proper initialization of SplitUpdate.

There are still TODOs:
* Cost is not handled gracefully, because the SplitUpdate node is added after the plan is generated. A FIXME has already been added for this.
* For deletion, we could optimize by sending only the distribution columns instead of all columns.
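
The split described in this commit can be illustrated with a minimal Python sketch. It is not the planner's C implementation: `NUM_SEGMENTS`, `segment_for`, and `split_update` are hypothetical names, and a plain modulo stands in for the real distribution hash so the example stays deterministic.

```python
NUM_SEGMENTS = 3  # hypothetical cluster size, chosen only for illustration


def segment_for(row, dist_key):
    # Stand-in for the real distribution hash (assumption: integer key, modulo).
    return row[dist_key] % NUM_SEGMENTS


def split_update(old_row, new_values, dist_key):
    """Split one UPDATE into a DELETE routed by the OLD tuple and an
    INSERT routed by the NEW tuple."""
    new_row = {**old_row, **new_values}
    return [
        ("DELETE", segment_for(old_row, dist_key), old_row),
        ("INSERT", segment_for(new_row, dist_key), new_row),
    ]


# Updating the distribution column "a" from 1 to 5 moves the row
# from one segment to another in this toy setup.
actions = split_update({"a": 1, "b": 10}, {"a": 5}, dist_key="a")
for op, segment, row in actions:
    print(op, "on segment", segment, row)
```

The point of the sketch is only that the two halves of the update are routed independently, which is why the OLD attributes have to reach the top Motion.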


Author: Xiaoran Wang <xiwang@pivotal.io>
Author: Max Yang <myang@pivotal.io>
Author: Shujie Zhang <shzhang@pivotal.io>
Author: Zhenghua Lyu <zlv@pivotal.io>

6be0a32a

28 Mar, 2018 1 commit

Add GUC to enable / disable Join Associativity in ORCA · bb68b5c6

Committed by Bhuvnesh Chaudhary on Mar 26, 2018

This commit introduces a GUC `optimizer_enable_associativity` to enable
or disable join associativity. Join associativity increases the search
space, as it increases the number of groups needed to represent a join and its
associative counterpart, i.e. (A X B) X C ~ A X (B X C).
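
As a rough illustration of why the transform enlarges the search space, the following Python sketch models join trees as nested tuples and counts the distinct join groups. It is a toy model, not ORCA's memo structure; `associate` and `join_groups` are hypothetical helpers.

```python
def associate(expr):
    """Rewrite (X join Y) join Z into X join (Y join Z).
    Returns None when the left input is not itself a join."""
    left, right = expr
    if isinstance(left, tuple):
        x, y = left
        return (x, (y, right))
    return None


def join_groups(expr):
    """Return (relations, groups), where groups holds one frozenset of
    relation names per join node in the tree."""
    if isinstance(expr, str):
        return frozenset([expr]), set()
    left_rels, left_groups = join_groups(expr[0])
    right_rels, right_groups = join_groups(expr[1])
    rels = left_rels | right_rels
    return rels, left_groups | right_groups | {rels}


original = (("A", "B"), "C")
alternative = associate(original)

groups_without = join_groups(original)[1]
groups_with = groups_without | join_groups(alternative)[1]
print(len(groups_without), len(groups_with))
```

In this toy model the associative counterpart introduces one extra group, the join of B and C, that the original tree never needed.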

By default, this patch disables the join associativity transform; users
can enable it if required. A few plan changes
were observed due to this change. However, further evaluation of
the plan changes revealed that even though the cost of the resulting
plan increased, the execution time went down by 1-2 seconds.

The queries with plan changes join 3 tables,
i.e. A, B and C. If we increase the number of tuples returned by the
subquery which forms A', we see the old plan. But if the number of tuples in
relations B and C is significantly higher, the plan changes with the
patch yield faster execution times. This suggests that we may need to
tune the cost model to adapt to such cases.

The plan cost increase is 1000x compared to the old plans. This 1000x
factor is due to the value of `optimizer_nestloop_factor=1024`; if you
set the GUC `optimizer_nestloop_factor=1`, the plan before
and after the patch remains the same.

bb68b5c6

27 Sep, 2017 1 commit

Misc test answer file changes · 8bd49b1b

Committed by Ekta Khanna on Sep 15, 2017

Signed-off-by: Jemish Patel <jpatel@pivotal.io>

8bd49b1b

25 Sep, 2017 1 commit

Remove the concept of window "key levels". · b1651a43

Committed by Heikki Linnakangas on Sep 25, 2017

It wasn't very useful. ORCA and Postgres both just stack WindowAgg nodes
on top of each other, and no-one's been unhappy about that, so we might as
well do that, too. This reduces the difference between GPDB and the upstream
implementation, and will hopefully make it smoother to switch.

Rename the Window Plan node type to WindowAgg, to match upstream, now
that it is fairly close to the upstream version.

b1651a43

15 Sep, 2017 1 commit

Only request stats of columns needed for cardinality estimation [#150424379] · c5ade96d

Committed by Omer Arap on Aug 29, 2017

GPORCA should not spend time extracting column statistics that are not
needed for cardinality estimation. This commit eliminates the overhead
of requesting and generating statistics for columns that are not
used in cardinality estimation.

E.g:
`CREATE TABLE foo (a int, b int, c int);`

For table foo, the query below only needs stats for column `a`, which
is the distribution column, and column `c`, which is the column used in
the where clause.
`select * from foo where c=2;`

However, prior to this commit, the column statistics for column `b` were
also calculated and passed for cardinality estimation. The only
information the optimizer needs is the `width` of column `b`. For
this tiny piece of information, we transferred all the stats for that
column.

This commit and its counterpart commit in GPORCA ensure that the column
width information is passed and extracted in the `dxl:Relation` metadata
information.
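
The selection rule from the `foo` example can be sketched in a few lines of Python. This is only an illustration of the idea, not the GPDB or GPORCA code; `split_stats_request` is a hypothetical helper.

```python
def split_stats_request(all_cols, distribution_cols, predicate_cols):
    """Return (columns needing full statistics, columns needing only width)."""
    full = [c for c in all_cols if c in distribution_cols or c in predicate_cols]
    width_only = [c for c in all_cols if c not in full]
    return full, width_only


# Table foo (a, b, c), distributed by a, queried with "where c=2".
full_stats, width_only = split_stats_request(
    all_cols=["a", "b", "c"],
    distribution_cols={"a"},
    predicate_cols={"c"},
)
print("full statistics:", full_stats)
print("width only:", width_only)
```

For the example query, only `a` and `c` end up in the full-statistics list, while `b` contributes just its width.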

Preliminary results for short-running queries show up to a 65x
performance improvement.

Signed-off-by: Jemish Patel <jpatel@pivotal.io>

c5ade96d

12 Sep, 2017 1 commit

Refactor adding explicit distribution motion logic · 8f01bf79

Committed by Bhuvnesh Chaudhary on Sep 10, 2017

nMotionNodes tracks the number of Motions in a plan, and each
plan node maintains its own nMotionNodes. Counting the number of Motions in a plan node by
traversing the tree and adding up the nMotionNodes found in nested plans gives
an incorrect number of Motion nodes. So instead of using nMotionNodes, use
a boolean flag to track whether the subtree, excluding the initplans,
contains a Motion node.
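
A boolean check of this kind might look like the following Python sketch. It is a simplified model rather than the actual GPDB plan structures; `PlanNode` and `contains_motion` are hypothetical names, and initplans are modelled as a separate list that the traversal deliberately skips.

```python
class PlanNode:
    def __init__(self, kind, children=(), initplans=()):
        self.kind = kind
        self.children = list(children)
        self.initplans = list(initplans)  # never visited by contains_motion


def contains_motion(node):
    """True if the subtree rooted at node, ignoring initplans, has a Motion."""
    if node.kind == "Motion":
        return True
    return any(contains_motion(child) for child in node.children)


# A Motion below a join is found.
with_motion = PlanNode(
    "HashJoin",
    children=[PlanNode("Motion", children=[PlanNode("SeqScan")]), PlanNode("SeqScan")],
)

# A Motion that exists only inside an initplan is not counted.
only_in_initplan = PlanNode(
    "Result",
    children=[PlanNode("SeqScan")],
    initplans=[PlanNode("Motion", children=[PlanNode("SeqScan")])],
)

print(contains_motion(with_motion), contains_motion(only_in_initplan))
```

A yes/no answer avoids any summing of per-node counters, so there is nothing to get wrong when plans are nested.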

8f01bf79