1. 24 Nov 2017, 1 commit
  2. 21 Nov 2017, 1 commit
    •
      Supporting Join Optimization Levels in GPORCA · c8192690
      Committed by Bhuvnesh Chaudhary
      The concept of optimization levels is known in many enterprise
      optimizers. It enables users to control the degree of optimization
      that is employed. Optimization levels group transformations into
      bags of rules, where each bag is assigned a particular level. By
      default all rules are applied, but a user who wants to apply fewer
      rules is able to. That decision is based on domain knowledge: the
      user knows that even with fewer rules applied, the generated plan
      satisfies their needs.
      
      The Cascades optimizer, on which GPORCA is based, allows grouping
      transformation rules into optimization levels. This concept has
      also been extended to join ordering, allowing the user to pick the
      join order specified in the query, use a greedy approach, or use
      an exhaustive approach.
      
      Postgres-based planners use join_collapse_limit and
      from_collapse_limit to reduce the search space. While the
      objective of optimization levels for joins is also to reduce the
      search space, the way it does so is different: it requests the
      optimizer to apply, or not apply, a subset of rules, providing
      more flexibility to the customer. This is one of the most
      frequently requested features from our enterprise clients, who
      have a high degree of domain knowledge.
      
      This PR introduces this concept. In the immediate future we are
      planning to add different polynomial join ordering techniques with
      guaranteed bounds as part of the "Greedy" search.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      c8192690
  3. 16 Nov 2017, 1 commit
    •
      Remove hash partitioning support · 152d1223
      Committed by Daniel Gustafsson
      Hash partitioning was never fully implemented, and was never turned
      on by default. There has been no effort to complete the feature, so
      rather than carrying dead code this removes all support for hash
      partitioning. Should we ever want this feature, we will most likely
      start from scratch anyways.
      
      As a side effect of removing the unsupported MERGE/MODIFY
      commands, this previously accepted query is no longer legal:
      
      	create table t (a int, b int)
      	distributed by (a)
      	partition by range (b) (start () end(2));
      
      The syntax was the result of an incorrect rule in the parser,
      which made the start boundary optional for CREATE TABLE when it
      was only intended for MODIFY PARTITION.
      
      pg_upgrade was already checking for hash partitions, so no new
      check was required (upgrade would have been impossible anyway due
      to the hash algorithm change).
      152d1223
  4. 31 Oct 2017, 1 commit
  5. 30 Oct 2017, 1 commit
    •
      Fix a resgroup performance issue. · 0b85b9d0
      Committed by Ning Yu
      On low-end systems with 1~2 CPU cores, new queries in a cold
      resgroup can suffer from high latency when the overall load is
      very high.
      
      The root cause is that we used to set a very high cpu priority for
      the gpdb cgroups, so non-gpdb processes were scheduled with very
      low priority and high latency. GPDB processes are also affected by
      this, because postmaster and the other auxiliary processes are not
      put into the gpdb cgroups. Even the QD and QEs are not put into a
      gpdb cgroup until their transaction has begun.
      
      To fix this we made the following changes:
      * put postmaster and all its children processes into the toplevel
        gpdb cgroup;
      * provide a GUC to control the cgroup cpu priority for gpdb processes
        when resgroup is enabled;
      * set a lower cpu priority by default;
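      The priority GUC's effect can be sketched as below; the GUC name
      and the scaling formula are illustrative assumptions, not GPDB's
      actual implementation.

```python
# Hypothetical sketch: derive the toplevel gpdb cgroup's cpu.shares
# from a configurable priority GUC instead of a hard-coded very high
# value, so non-gpdb processes keep a reasonable scheduling weight.
def gpdb_cpu_shares(gp_resource_group_cpu_priority, base_shares=1024):
    """Scale the default cgroup share weight by the configured priority."""
    return base_shares * gp_resource_group_cpu_priority
```

      A lower default priority leaves more CPU weight for processes
      outside the gpdb cgroup.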
      0b85b9d0
  6. 14 Oct 2017, 1 commit
  7. 10 Oct 2017, 1 commit
    •
      Hide the two tuplesort implementations behind a common facade. · bbf40a8c
      Committed by Heikki Linnakangas
      We have two implementations of tuplesort: the "regular" one
      inherited from upstream, in tuplesort.c, and the GPDB-specific
      tuplesort_mk.c. We had modified all the callers to check the
      gp_enable_mk_sort GUC and deal with both of them. However, that
      makes merging with upstream difficult, and litters the code with
      boilerplate to check the GUC and call one of the two
      implementations.
      
      Simplify the callers, by providing a single API that hides the two
      implementations from the rest of the system. The API is the tuplesort_*
      functions, as in upstream. This requires some preprocessor trickery,
      so that tuplesort.c can use the tuplesort_* function names as is, but in
      the rest of the codebase, calling tuplesort_*() will call a "switcheroo"
      function that decides which implementation to actually call. While this
      is more lines of code overall, it keeps all the ugliness confined in
      tuplesort.h, not littered throughout the codebase.
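      The "switcheroo" can be sketched in miniature as follows; the
      Python functions stand in for the two C implementations, and the
      facade is the only place that consults the GUC.

```python
# Sketch of the facade pattern: callers use one tuplesort_* API and
# never check the GUC themselves. Implementations are stand-ins.
gp_enable_mk_sort = False  # the GUC consulted by the facade

def tuplesort_regular(rows):
    """Stand-in for upstream tuplesort.c."""
    return sorted(rows)

def tuplesort_mk(rows):
    """Stand-in for GPDB's tuplesort_mk.c (same result, different internals)."""
    return sorted(rows)

def tuplesort_performsort(rows):
    """Facade: picks the implementation based on the GUC."""
    impl = tuplesort_mk if gp_enable_mk_sort else tuplesort_regular
    return impl(rows)
```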
      bbf40a8c
  8. 09 Oct 2017, 1 commit
    •
      Decouple GUC max_resource_groups and max_connections. · cbd23ea2
      Committed by Richard Guo
      Previously there was a restriction that GUC 'max_resource_groups'
      could not be larger than 'max_connections'. This restriction could
      cause gpdb to fail to start if the two GUCs were not set properly.
      We decided to decouple these two GUCs and set a hard limit of 100
      for 'max_resource_groups'.
      cbd23ea2
  9. 26 Sep 2017, 1 commit
    •
      Convert GPDB-specific GUCs to the new "enum" type · 59165cfa
      Committed by Jacob Champion
      Several GUCs are simply enumerated strings that are parsed into integer
      types behind the scenes. As of 8.4, the GUC system recognizes a new
      type, enum, which will do this for us. Move as many as we can to the new
      system.
      
      As part of this,
      - gp_idf_deduplicate was changed from a char* string to an int, and new
        IDF_DEDUPLICATE_* macros were added for each option
      - password_hash_algorithm was changed to an int
      - for codegen_optimization_level, "none" is the default now when codegen
        is not enabled during compilation (instead of the empty string).
      
      A couple of GUCs that *could* be represented as enums
      (optimizer_minidump, gp_workfile_compress_algorithm) have been
      purposefully kept with the prior system because they require the GUC
      variable to be something other than an integer anyway.
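      What the enum GUC type does behind the scenes can be sketched as
      below: the string value is validated and mapped to an integer once
      by shared machinery, rather than by ad hoc string comparisons at
      every use site. The option table here is illustrative, not a real
      GPDB GUC's.

```python
# Hypothetical sketch of enum-GUC parsing: validate the string and
# return its integer code, failing loudly on an unknown value.
def parse_enum_guc(name, value, options):
    """Map a GUC's string value to its integer code."""
    if value not in options:
        valid = ", ".join(sorted(options))
        raise ValueError(
            f'invalid value for parameter "{name}": "{value}" (valid: {valid})')
    return options[value]
```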
      Signed-off-by: Jacob Champion <pchampion@pivotal.io>
      59165cfa
  10. 22 Sep 2017, 1 commit
    •
      Enable ORCA to be tracked by Mem Accounting · 669dd279
      Committed by Kavinder Dhaliwal
      Before this commit, all memory allocations made by ORCA/GPOS were
      a black box to GPDB. However, the groundwork was already in place
      to allow GPDB's Memory Accounting Framework to track memory
      consumption by ORCA. This commit introduces two new functions,
      Ext_OptimizerAlloc and Ext_OptimizerFree, which pass their
      parameters through to gp_malloc and gp_free and do some
      bookkeeping against the Optimizer memory account. This introduces
      very little overhead to the GPOS memory management framework.
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      669dd279
  11. 21 Sep 2017, 1 commit
    •
      Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Committed by Ashwin Agrawal
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      gp_replication.conf is not designed to work with filerep; for
      example, scripts like gpexpand will fail, since they modify the
      configuration files directly instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
      b7ce6930
  12. 20 Sep 2017, 1 commit
  13. 19 Sep 2017, 3 commits
  14. 14 Sep 2017, 1 commit
    •
      Remove unused ENABLE_LTRACE code. · d994b38e
      Committed by Heikki Linnakangas
      Although I'm not too familiar with SystemTap, I'm pretty sure that recent
      versions can do user space tracing better. I don't think anyone is using
      these hacks anymore, so remove them.
      d994b38e
  15. 07 Sep 2017, 4 commits
  16. 01 Sep 2017, 1 commit
  17. 29 Aug 2017, 1 commit
  18. 28 Aug 2017, 1 commit
  19. 24 Aug 2017, 1 commit
    •
      Add GUC 'memory_spill_ratio' for resource group. · 44373949
      Committed by Richard Guo
      The GUC 'memory_spill_ratio' can be set at multiple levels, in
      particular at the resource group level and at the session level,
      with the session setting overriding the resource group setting.
      
      When 'memory_spill_ratio' is set at the session level, semantic
      validation is not performed until the value is referenced by
      subsequent queries.
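      The two-level resolution can be sketched as below; the valid range
      used for the deferred check is illustrative, not GPDB's actual
      bound.

```python
# Sketch: a session-level setting, when present, overrides the
# group-level one; semantic validation is deferred to the point of use.
def effective_spill_ratio(group_ratio, session_ratio=None):
    """Resolve memory_spill_ratio for a query; validate only on use."""
    ratio = group_ratio if session_ratio is None else session_ratio
    if not 0 <= ratio <= 100:   # deferred semantic validation (illustrative range)
        raise ValueError("memory_spill_ratio out of range")
    return ratio
```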
      44373949
  20. 21 Aug 2017, 1 commit
    •
      Move ORCA invocation into standard_planner · d5dbbfd9
      Committed by Daniel Gustafsson
      The way ORCA was tied into the planner, running a planner_hook was
      not supported in the intended way. This commit moves ORCA into
      standard_planner() instead of planner() and leaves the hook for
      extensions to make use of, with or without ORCA. Since the
      intention of the optimizer GUC is to replace the planner in
      postgres while keeping the planning process, this allows planner
      extensions to cooperate with that.
      
      In order to reduce the Greenplum footprint in upstream postgres
      source files for future merges, the ORCA functions are moved to
      their own file.
      
      This commit also adds a memory accounting class for planner hooks,
      since they otherwise ran in the planner scope, as well as a test
      for using planner_hooks.
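      The resulting hook arrangement can be sketched as below: the entry
      point consults planner_hook, while standard_planner (which now
      contains the ORCA invocation) does the actual planning, so an
      extension's hook can wrap it and still get ORCA. Everything except
      the planner/standard_planner/planner_hook names is illustrative.

```python
# Sketch of the PostgreSQL planner_hook pattern after this change.
planner_hook = None      # an extension may assign a callable here

def standard_planner(query):
    return f"plan({query})"          # ORCA or the planner runs here

def planner(query):
    if planner_hook is not None:
        return planner_hook(query)   # the extension gets first crack
    return standard_planner(query)

def logging_hook(query):
    """An example extension hook that wraps standard_planner."""
    return "logged:" + standard_planner(query)
```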
      d5dbbfd9
  21. 15 Aug 2017, 1 commit
  22. 09 Aug 2017, 1 commit
    •
      Add debug info for interconnect network timeout · 9a9cd48b
      Committed by Pengzhou Tang
      It was very difficult to verify whether the interconnect was stuck
      in the resending phase, or whether there was UDP resending latency
      within the interconnect. To improve this, this commit records a
      debug message every Gp_interconnect_debug_retry_interval retries
      when gp_log_interconnect is set to DEBUG.
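      The rate-limited logging can be sketched as below; the GUC names
      follow the commit text, but the modulo logic is an assumption.

```python
# Sketch: emit one debug message every N resend attempts, and only
# when interconnect logging is at DEBUG level.
def should_log_retry(retry_count, debug_retry_interval, gp_log_interconnect):
    """Decide whether this resend attempt should produce a debug message."""
    if gp_log_interconnect != "DEBUG":
        return False
    return retry_count % debug_retry_interval == 0
```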
      9a9cd48b
  23. 03 Aug 2017, 1 commit
  24. 02 Aug 2017, 1 commit
    •
      Make memory spill in resource group take effect · 68babac4
      Committed by Richard Guo
      Resource group memory spill is similar to 'statement_mem' in
      resource queues; the difference is that memory spill is calculated
      according to the memory quota of the resource group.
      
      The related GUCs, variables, and functions shared by both resource
      queues and resource groups are moved into the resource manager
      namespace.
      
      The resource queue code relating to memory policy is also
      refactored in this commit.
      Signed-off-by: Pengzhou Tang <ptang@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      68babac4
  25. 13 Jul 2017, 1 commit
    •
      Add GUC to control number of blocks that a resync worker operates on · 2960bd7c
      Committed by Asim R P
      The GUC gp_changetracking_max_rows replaces a compile-time
      constant. A resync worker obtains at most
      gp_changetracking_max_rows changed blocks from the changetracking
      log at one time. Controlling this with a GUC makes it possible to
      exercise bugs in the resync logic around this area.
      2960bd7c
  26. 29 Jun 2017, 1 commit
    •
      Implement resgroup memory limit (#2669) · b5e1fb0a
      Committed by Ning Yu
      Implement resgroup memory limit.
      
      In a resgroup we divide the memory into several slots; the number
      depends on the concurrency setting of the resgroup. Each slot has
      a reserved quota of memory, and all the slots also share some
      shared memory which can be acquired preemptively.
      
      Some GUCs and resgroup options are defined to adjust the exact allocation
      policy:
      
      resgroup options:
      - memory_shared_quota
      - memory_spill_ratio
      
      GUCs:
      - gp_resource_group_memory_limit
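      The slot model can be sketched as below: a group's memory is split
      into per-slot reservations (one per concurrent transaction) plus a
      shared region sized by memory_shared_quota. The arithmetic is
      illustrative, not GPDB's exact accounting.

```python
# Sketch: divide a resource group's memory into slot quotas plus a
# preemptively shareable region.
def slot_layout(group_memory, concurrency, memory_shared_quota):
    """Return (per_slot_quota, shared_memory) for a resource group."""
    shared = int(group_memory * memory_shared_quota)
    per_slot = (group_memory - shared) // concurrency
    return per_slot, shared
```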
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      b5e1fb0a
  27. 24 Jun 2017, 1 commit
    •
      Enable xlogging for create fs objects on segments. · 9efec6b2
      Committed by Ashwin Agrawal
      In the case of --enable-segwalrep, write-ahead logging should not
      be skipped for anything, as the mirror relies on that mechanism to
      reconstruct objects. Write-ahead logging for these pieces was
      previously only performed on the master; with this commit it is
      enabled for segments as well.
      9efec6b2
  28. 22 Jun 2017, 1 commit
    •
      Eliminating alien nodes before execution (#2588) · 9b8f5c0b
      Committed by foyzur
      In GPDB the dispatcher dispatches the entire plan tree to each query executor (QX). Each QX deserializes the entire plan tree and starts execution from the root of the plan tree. This begins by calling InitPlan on the QueryDesc, which blindly calls ExecInitNode on the root of the plan.
      
      Unfortunately, this is wasteful, in terms of memory and CPU. Each QX is in charge of a single slice. There can be many slices. Looking into plan nodes that belong to other slices, and initializing (e.g., creating PlanState for such nodes) is clearly wasteful. For large plans, particularly planner plans, in the presence of partitions, this can add up to a significant waste.
      
      This PR proposes a fix to solve this problem. The idea is to find the local root for each slice and start ExecInitNode there.
      
      There are a few special cases:
      
      SubPlans are special, as they appear as expressions, but each such expression holds the root of a sub-plan tree. All the subplans are bundled in plannedstmt->subplans, but confusingly as Plan pointers (i.e., we save the root of the SubPlan expression's Plan tree). Therefore, to find the relevant subplans, we first need to find the relevant expressions and extract their roots, and then iterate over plannedstmt->subplans, calling ExecInitNode only on the ones reachable from some expression in the current slice.
      
      InitPlans are no better, as they can appear anywhere in the Plan tree. Walking from a local motion is not sufficient to find them, so we need to walk from the root of the plan tree and identify all the SubPlans. Note: unlike a regular subplan, an initplan may not appear in an expression as a subplan; rather, it appears as a parameter generator in some other part of the tree. We need to find these InitPlans and obtain the SubPlan for each one. We can then use the SubPlan's setParam to copy precomputed parameter values from estate->es_param_list_info to estate->es_param_exec_vals.
      
      We also found that the origSliceIdInPlan is highly unreliable and cannot be used as an indicator of a plan node's slice information. Therefore, we precompute each plan node's slice information to correctly determine if a Plan node is alien or not. This makes alien node identification more accurate. In successive PRs, we plan to use the alien memory account balance as a test to see if we successfully eliminated all aliens. We will also use the alien account balance to determine memory savings.
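      The core idea can be sketched as below: annotate each plan node
      with its slice and initialize only the subtree rooted at the
      current slice's local root. The node structure and slice
      assignment here are illustrative, not GPDB's.

```python
# Sketch: a QE finds its slice's local root and runs ExecInitNode
# only from there, skipping alien (other-slice) parts of the plan.
class Plan:
    def __init__(self, slice_id, children=()):
        self.slice_id = slice_id
        self.children = list(children)

def find_local_root(node, my_slice):
    """Depth-first search for the root of this slice's subtree."""
    if node.slice_id == my_slice:
        return node
    for child in node.children:
        found = find_local_root(child, my_slice)
        if found is not None:
            return found
    return None

def exec_init_nodes(root, my_slice):
    """Count the nodes a QE initializes when starting at its local root."""
    local = find_local_root(root, my_slice)
    def count(n):
        return 1 + sum(count(c) for c in n.children)
    return count(local) if local else 0
```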
      9b8f5c0b
  29. 07 Jun 2017, 2 commits
    •
      restore TCP interconnect · 353a937d
      Committed by Pengzhou Tang
      This commit restores the TCP interconnect and fixes some hang issues.
      
      * Restore the TCP interconnect code
      * Add a GUC, gp_interconnect_tcp_listener_backlog, to control the backlog parameter of the listen() call for TCP
      * Use memmove instead of memcpy, because the memory areas do overlap
      * Call checkForCancelFromQD() for the TCP interconnect if there is no data for a while; this prevents the QD from getting stuck
      * Revert the cancelUnfinished-related modification in 8d251945, otherwise some queries will get stuck
      * Move and rename the fault injector "cursor_qe_reader_after_snapshot" to make test cases pass under the TCP interconnect
      353a937d
    •
      Misc changes of gp_log_gang · 9d5b10ae
      Committed by Pengzhou Tang
      * Change the default level of gp_log_gang to off.
      * Log the query plan size at level TERSE; it's useful for debugging.
      9d5b10ae
  30. 25 May 2017, 1 commit
  31. 19 May 2017, 2 commits
    •
      Implement resource group cpu rate limitation. · 2650f728
      Committed by Pengzhou Tang
      Resource group cpu rate limitation is implemented with cgroups on
      Linux systems. When resource groups are enabled via GUC, we check
      whether cgroups are available and properly configured on the
      system. A sub-cgroup is created for each resource group; its cpu
      quota and share weight are set depending on the resource group
      configuration. Queries run under these cgroups, and their cpu
      usage is restricted by the cgroups.
      
      The cgroups directory structures:
      * /sys/fs/cgroup/{cpu,cpuacct}/gpdb: the toplevel gpdb cgroup
      * /sys/fs/cgroup/{cpu,cpuacct}/gpdb/*/: cgroup for each resource group
      
      The logic for cpu rate limitation:
      
      * in toplevel gpdb cgroup we set the cpu quota and share weight as:
      
          cpu.cfs_quota_us := cpu.cfs_period_us * 256 * gp_resource_group_cpu_limit
          cpu.shares := 1024 * ncores
      
      * for each sub group we set the cpu quota and share weight as:
      
          sub.cpu.cfs_quota_us := -1
          sub.cpu.shares := top.cpu.shares * sub.cpu_rate_limit
      
      The minimum and maximum cpu percentage for a sub cgroup:
      
          sub.cpu.min_percentage := gp_resource_group_cpu_limit * sub.cpu_rate_limit
          sub.cpu.max_percentage := gp_resource_group_cpu_limit
      
      The actual percentage depends on how busy the system is.
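      The per-group formulas above can be written out as executable
      arithmetic (a sketch; function names are illustrative):

```python
# Sketch of the sub-cgroup formulas from the commit message:
#   sub.cpu.shares         = top.cpu.shares * sub.cpu_rate_limit
#   sub.cpu.min_percentage = gp_resource_group_cpu_limit * sub.cpu_rate_limit
#   sub.cpu.max_percentage = gp_resource_group_cpu_limit
def sub_cpu_shares(top_shares, cpu_rate_limit):
    """Share weight for a resource group's sub-cgroup."""
    return int(top_shares * cpu_rate_limit)

def sub_cpu_percentage(gp_resource_group_cpu_limit, cpu_rate_limit):
    """Return (minimum, maximum) CPU percentage for a sub-cgroup."""
    minimum = gp_resource_group_cpu_limit * cpu_rate_limit
    maximum = gp_resource_group_cpu_limit
    return minimum, maximum
```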
      
      gp_resource_group_cpu_limit is a GUC introduced to control the
      share of cpu assigned to resgroups on each host.
      
          gpconfig -c gp_resource_group_cpu_limit -v '0.9'
      
      A new pipeline is created to perform the tests, as we need
      privileged permission to enable and set up cgroups on the system.
      Signed-off-by: Ning Yu <nyu@pivotal.io>
      2650f728
    •
      Make ICG tests pass when GPDB is compiled with disable-orca · 7e774f28
      Committed by Venkatesh Raghavan
      In the updated tests, we used functions like disable_xform and
      enable_xform to hint the optimizer to disallow/allow a particular
      physical node. However, these functions are only available when
      GPDB is built with GPORCA; the planner, on the other hand,
      accomplishes this via GUCs.
      
      To avoid using these functions in tests, I have introduced a
      couple of GUCs that mimic the same planner behavior, but now for
      GPORCA. This effort required adding an API inside GPORCA.
      7e774f28
  32. 15 May 2017, 1 commit
    •
      Streamline Orca Gucs · 9f2c838b
      Committed by Venkatesh Raghavan
      * Enable analyzing root partitions
      * Ensure that the name of the guc is clear
      * Remove double negation (where possible)
      * Update comments
      * Co-locate gucs that have similar purpose
      * Remove dead gucs
      * Classify them correctly so that they are no longer hidden
      9f2c838b
  33. 11 May 2017, 1 commit