1. 23 September 2017, 3 commits
    • Add a long living account for Relinquished Memory · 1822c826
      Kavinder Dhaliwal authored
      There are cases where, during execution, a Memory Intensive (MI)
      operator may not use all the memory that is allocated to it. This
      extra memory (quota - allocated) can be relinquished for other MI
      nodes to use during execution of the statement. For example:
      
      ->  Hash Join
            ->  HashAggregate
            ->  Hash
      In the above plan fragment the Hash Join operator has an MI operator
      in both its inner and outer subtree. If the Hash node uses much less
      memory than it was given as its quota, it will now call
      MemoryAccounting_DeclareDone(), and the difference between its quota
      and its allocated amount will be added to the allocated amount of the
      RelinquishedPool. This enables HashAggregate to request memory from
      the RelinquishedPool if it exhausts its quota, to prevent spilling.
      
      This PR adds two new APIs to the MemoryAccounting framework:
      
      MemoryAccounting_DeclareDone(): Add the difference between a memory
      account's quota and its allocated amount to the long living
      RelinquishedPool
      
      MemoryAccounting_RequestQuotaIncrease(): Retrieve all relinquished
      memory by incrementing an operator's operatorMemKb and setting the
      RelinquishedPool to 0
      
      Note: This PR introduces the facility for Hash to relinquish memory to
      the RelinquishedPool memory account and for the Agg operator
      (specifically HashAgg) to request an increase to its quota before it
      builds its hash table. This commit does not generally apply this
      paradigm to all MI operators.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
    • Cherry-pick 'ae47eb1' from upstream to fix Nested CTE errors (#3360) · 009b1809
      sambitesh authored
      Before this cherry-pick, the query below would have errored out:
      
      WITH outermost(x) AS (
        SELECT 1
        UNION (WITH innermost as (SELECT 2)
               SELECT * FROM innermost
               UNION SELECT 3)
      )
      SELECT * FROM outermost;
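
      With the fix, the query runs to completion and returns three rows:
      1, 2 and 3.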
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
    • Add gp_stat_replication view · 1546ec3b
      Taylor Vesely authored
      In order to view the primary segments' replication stream data from
      their pg_stat_replication view, we currently need to connect to each
      primary segment individually in utility mode. To make life easier, we
      introduce a function that fetches each primary segment's replication
      stream data and wrap it with a view named gp_stat_replication. It is
      now possible to view all the cluster's replication information from
      the master in a regular psql session.
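
      For example, assuming the view exposes a gp_segment_id column
      alongside the usual pg_stat_replication columns, one could run this
      from the master:

      SELECT gp_segment_id, application_name, state, sync_state
      FROM gp_stat_replication;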
      
      Authors: Taylor Vesely and Jimmy Yih
  2. 22 September 2017, 2 commits
  3. 21 September 2017, 9 commits
    • Fix bug in handling re-scan of a hash join. · f7101d98
      Heikki Linnakangas authored
      The WITH RECURSIVE test case in 'join_gp' would miss some rows, if
      the hash algorithm (src/backend/access/hash/hashfunc.c) was replaced
      with the one from PostgreSQL 8.4, or if statement_mem was lowered from
      1000 kB to 700 kB. This is what happened:
      
      1. A tuple belongs to batch 0, and is kept in memory during processing
         batch 0.
      
      2. The outer scan finishes, and we spill the inner batch 0 from memory
         to a file, with SpillFirstBatch, and start processing batch 1.
      
      3. While processing batch 1, the number of batches is increased, and
         the tuple that belonged to batch 0, and was already written to
         batch 0's file, is moved to a later batch.
      
      4. After the first scan is complete, the hash join is re-scanned
      
      5. We reload the batch file 0 into memory. While reloading, we encounter
         the tuple that now doesn't seem to belong to batch 0, and throw it
         away.
      
      6. We perform the rest of the re-scan. We have missed any matches to the
         tuple that was thrown away. It was not part of the later batch files,
         because in the first pass, it was handled as part of batch 0. But in
         the re-scan, it was not handled as part of batch 0, because nbatch was
         now larger, so it didn't belong there.
      
      To fix: when, while reloading a batch file, we see a tuple that
      actually belongs to a later batch file, we write it to that later
      file. To avoid adding it there multiple times if the hash join is
      re-scanned multiple times, whenever any tuples are moved while
      reloading a batch file, we destroy the batch file and re-create it
      with just the remaining tuples.
      
      This is made a bit complicated by the fact that BFZ temp files don't
      support appending to a file that has already been rewound for reading.
      So what we actually do is always re-create the batch file, even if
      there have been no changes to it. I left comments about that. Ideally,
      we would either support re-appending to BFZ files, or stop using BFZ
      workfiles for this altogether (I'm not convinced they're any better
      than plain BufFiles). But that can be done later.
      
      Fixes github issue #3284
    • Don't double-count inner tuples reloaded from file. · 429ff8c4
      Heikki Linnakangas authored
      ExecHashTableInsert also increments the counter, so we don't need to
      do it here. This is harmless AFAICS; the counter isn't used for
      anything but instrumentation at the moment, but it confused me while
      debugging.
    • Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Heikki Linnakangas authored
      Before, this only worked for cursors declared with DECLARE CURSOR. You
      got a "there is no parameter $0" error if you tried. This moves the
      decision on whether a plan is "simply updatable" from the parser to
      the planner. Doing it in the parser was awkward, because we only want
      to do it for queries that are used in a cursor, and for SPI queries we
      don't know that yet at that time.
      
      For some reason, the copy, out and read functions of CurrentOfExpr
      were missing the cursor_param field. While we're at it, reorder the
      code to match upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
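
      A minimal sketch of what now works (the table and function names are
      made up):

      CREATE TABLE accounts (id int, balance int) DISTRIBUTED BY (id);

      CREATE FUNCTION zero_first_fetched() RETURNS void AS $$
      DECLARE
          cur CURSOR FOR SELECT * FROM accounts;
          rec RECORD;
      BEGIN
          OPEN cur;
          FETCH cur INTO rec;
          -- this UPDATE used to fail with "there is no parameter $0"
          UPDATE accounts SET balance = 0 WHERE CURRENT OF cur;
          CLOSE cur;
      END;
      $$ LANGUAGE plpgsql;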
    • Remove now-unnecessary code from gp_read_error_log to dispatch the call. · 4035881e
      Heikki Linnakangas authored
      There was code in gp_read_error_log() to "manually" dispatch the call
      to all the segments, if it was executed in the dispatcher. This was
      previously necessary because, even though the function was marked with
      prodataaccess='s', the planner did not guarantee that it would be
      executed on the segments when called in the target list, as in
      "SELECT gp_read_error_log('tab')". Now that we have the EXECUTE ON ALL
      SEGMENTS syntax, and are more rigorous about enforcing it in the
      planner, this hack is no longer required.
    • Refactor resource group source code, part 2. · a2cf9bdf
      Ning Yu authored
      * resgroup: provide helper funcs for memory usage updates.
      
      We used to have complex and duplicated logic to update group and slot
      memory usage in different contexts; now we provide two helper
      functions to increase or decrease memory usage in the group and slot.
      
      Two badly named functions, `attachToSlot()` and `detachFromSlot()`,
      are now retired.
      
      * resgroup: provide helper function to unassign a dropped resgroup.
      
      * resgroup: move complex checks into helper functions.
      
      Many helper functions with descriptive names were added to improve the
      readability of a number of complex checks.
      
      Also added a pointer to the resource group slot in `self`.
      
      * resgroup: add helper functions for wait queue operations.
    • Make gp_replication.conf for USE_SEGWALREP only. · b7ce6930
      Ashwin Agrawal authored
      The intent of this extra configuration file is to control the
      synchronization between primary and mirror for WALREP.
      
      The gp_replication.conf file is not designed to work with filerep; for
      example, scripts like gp_expand will fail, since they directly modify
      the configuration files instead of going through initdb.
      Signed-off-by: Xin Zhang <xzhang@pivotal.io>
    • Take advantage of the new EXECUTE ON syntax in gp_toolkit. · 9a039e4f
      Heikki Linnakangas authored
      Also change a few regression tests to use the new syntax, instead of
      gp_toolkit's __gp_localid and __gp_masterid functions.
    • Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Heikki Linnakangas authored
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as IMMUTABLE, it
      could be executed anywhere; otherwise it was always executed on the
      master. For backwards-compatibility, this keeps that behavior for
      EXECUTE ON ANY (the default), so even if a function is marked as
      EXECUTE ON ANY, it will always be executed on the master unless it's
      IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER
      or ON ALL SEGMENTS functions in a query causes ORCA to fall back. This
      is the same as with the prodataaccess='s' hack that this replaces, but
      now that it is more user-visible, it would be nice to teach ORCA about
      it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
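
      A sketch of the new DDL (the function body is illustrative):

      CREATE FUNCTION seg_report() RETURNS SETOF text AS
      $$ SELECT 'port ' || current_setting('port') $$
      LANGUAGE sql EXECUTE ON ALL SEGMENTS;

      -- allowed: in the FROM clause; runs once on every primary segment
      SELECT * FROM seg_report();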
    • Fix multistage aggregation plan targetlists · 41640e69
      Bhuvnesh Chaudhary authored
      If an aggregation query uses aliases that match the table's actual
      columns, those aliases are propagated up from subqueries, and grouping
      is applied on a column alias, the aggregation plan may end up with
      inconsistent target lists, causing a crash.
      
      	CREATE TABLE t1 (a int) DISTRIBUTED RANDOMLY;
      	SELECT substr(a, 2) as a
      	FROM
      		(SELECT ('-'||a)::varchar as a
      			FROM (SELECT a FROM t1) t2
      		) t3
      	GROUP BY a;
  4. 20 September 2017, 6 commits
    • Dump more detailed info for memory usage in gp_resgroup_status · 2816fe67
      Pengzhou Tang authored
      This commit adds more detailed memory metrics to the 'memory_usage'
      column of gp_resgroup_status, including current/available memory usage
      for a group, current/available memory usage for a slot, and
      current/available memory usage for the shared part.
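
      The per-group breakdown can then be inspected with something like:

      SELECT rsgname, memory_usage
      FROM gp_toolkit.gp_resgroup_status;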
    • resource group: refine ResGroupSlotAcquire · 4646bbc6
      Gang Xiong authored
      Previously, waiters waiting on a dropped resource group needed to be
      reassigned to a new group; to achieve that, ResGroupSlotAcquire had
      been made complicated and hard to understand. This commit refines it.
      
      Author: Gang Xiong <gxiong@pivotal.io>
    • resgroup: Allow concurrency to be zero. · 77007ff6
      Pengzhou Tang authored
      Allow CREATE RESOURCE GROUP and ALTER RESOURCE GROUP to set
      concurrency to 0, so that after some time there will be no running
      queries and the resource group can be dropped. On drop, all pending
      queries will be moved to the new resource group assigned to the role;
      but if the role is also dropped, the pending queries will all be
      canceled. We also do not allow setting the concurrency of the admin
      group to zero: superusers are in the admin group and only superusers
      can alter resource groups, so if the concurrency of the admin group
      were ever set to zero, there would be no chance to set it again.
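
      For example (the cpu_rate_limit and memory_limit values are
      illustrative):

      CREATE RESOURCE GROUP rg_batch WITH
          (concurrency=0, cpu_rate_limit=10, memory_limit=10);
      ALTER RESOURCE GROUP rg_batch SET concurrency 0;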
      Signed-off-by: Ning Yu <nyu@pivotal.io>
    • Report error when 'COPY (SELECT ...) TO' with 'ON SEGMENT' · cbddcc86
      Ming LI authored
      Because we don't know the data location of the result of a SELECT
      query, ON SEGMENT is forbidden in this case.
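
      A sketch of a command that is now rejected (the file path is
      illustrative):

      COPY (SELECT * FROM t) TO '/tmp/out_<SEGID>.csv' ON SEGMENT;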
    • Remove the restriction on sum of memory_spill_ratio and memory_shared_quota. · c5a5780a
      Richard Guo authored
      This commit makes two changes:
      1. Remove the restriction that the sum of memory_spill_ratio and
      memory_shared_quota must be no larger than 100.
      2. Change the range of memory_spill_ratio to [0, 100].
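
      With the restriction removed, settings whose sum exceeds 100 are
      accepted, e.g. (the group name is illustrative):

      ALTER RESOURCE GROUP rg_batch SET memory_shared_quota 50;
      ALTER RESOURCE GROUP rg_batch SET memory_spill_ratio 80;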
    • Fix warning of passing const to non-const parameter. · f4417c50
      Hubert Zhang authored
      The function FaultInjectorIdentifierStringToEnum(faultName) passes a
      const string to a non-const parameter, which causes a build warning.
      On second thought, we already support injecting a fault by fault name
      without a corresponding fault identifier, so it's better to use the
      fault name instead of the fault enum identifier in the ereport.
  5. 19 September 2017, 4 commits
  6. 18 September 2017, 2 commits
    • Add sanity checks for unrecognized window frame options. · c7c158dd
      Heikki Linnakangas authored
      These shouldn't happen, but Coverity warned about them. GCC would also
      complain, but I've been compiling with -Wno-maybe-uninitialized
      lately, because of the noise.
      
      Actually, this isn't quite enough; ORCA also needs to mark GPOS_RAISE
      with the "noreturn" attribute, so that the compiler gets the hint.
      Opened https://github.com/greenplum-db/gporca/pull/234 about that.
    • Using fault name instead of enum as the key of fault hash table (#3249) · 4616d3ec
      Huan Zhang authored
      GPDB's fault injector uses the fault enum as the key of the fault hash
      table. If someone wants to inject a fault into a GPDB extension (a
      separate repo), she has to hard-code the extension-related fault enums
      into GPDB core code, which is not good practice. So we simply use the
      fault name as the hash key, removing the need to hard-code the fault
      enums. Note that the fault injector API doesn't change.
  7. 17 September 2017, 2 commits
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Heikki Linnakangas authored
      In GPDB, we have so far used a WindowFrame struct to represent the
      start and end window bounds in a ROWS/RANGE BETWEEN clause, while
      PostgreSQL uses the combination of a frameOptions bitmask and start
      and end expressions. Refactor to replace WindowFrame with the upstream
      representation.
    • Hardcode the "frame maker" function for LEAD and LAG. · 686aab95
      Heikki Linnakangas authored
      This removes pg_window.winframemakerfunc column. It was only used for
      LEAD/LAG, and only in the Postgres planner. Hardcode the same special
      handling for LEAD/LAG in planwindow.c instead, based on winkind.
      
      This is one step in refactoring the planner and executor further, to
      replace the GPDB implementation of window functions with the upstream
      one.
  8. 16 September 2017, 3 commits
    • Fix check for superuser_reserved_connections. · 06ea112c
      Heikki Linnakangas authored
      Upstream uses >= here. It was changed in GPDB to use > instead of >=,
      but I don't see how that's more correct or better. I tracked that
      change in the old pre-open-sourcing repository to this commit:
      
      commit f3e98a1ef5fc5915662077b137c563371ea1c0a4
      Date: Mon Apr 6 15:04:33 2009 -0800
      
         Fixed guc check for ReservedBackends.
      
         [git-p4: depot-paths = "//cdb2/main/": change = 33269]
      
      So there was no explanation there either of what the alleged problem
      was.
    • Fix CREATE TABLE AS VALUES ... DISTRIBUTED BY · 47936ab2
      Heikki Linnakangas authored
      Should call setQryDistributionPolicy() after applyColumnNames(), otherwise
      the column names specified in the CREATE TABLE cannot be used in the
      DISTRIBUTED BY clause. Add test case.
      
      Fixes github issue #3285.
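
      A sketch of the kind of statement this fixes (the names are
      illustrative):

      CREATE TABLE ctas_t (a, b) AS
      VALUES (1, 'one'), (2, 'two')
      DISTRIBUTED BY (a);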
    • Remove function isMemoryIntensiveFunction · 5c9b81ef
      Kavinder Dhaliwal authored
      Historically this function was used to special-case a few operators
      that were not considered memory-intensive. However, it now always
      returns true. This commit removes the function and also moves the case
      for T_FunctionScan in IsMemoryIntensiveOperator into the group that
      always returns true, as this is its current behavior.
  9. 15 September 2017, 9 commits
    • Make it possible to build without libbz2, also on non-Windows. · d6749c3c
      Heikki Linnakangas authored
      The bzip2 library is only used by the gfile/fstream code, used for external
      tables and gpfdist. The usage of bzip2 was in #ifndef WIN32 blocks, so it
      was only built on non-Windows systems.
      
      Instead of tying it to the platform, use a proper autoconf check and
      HAVE_LIBBZ2 flags. This makes it possible to build gpfdist with bzip2
      support on Windows, as well as building without bzip2 on non-Windows
      systems. That makes it easier to test the otherwise Windows-only codepaths
      on other platforms. --with-libbz2 is still the default, but you can now use
      --without-libbz2 if you wish.
      
      I'm sure that some regression tests will fail if you actually build the
      server without libbz2, but I'm not going to address that right now. We have
      similar problems with other features that are in principle optional, but
      cause some regression tests to fail.
      
      Also use "#ifdef HAVE_LIBZ" rather than "#ifndef WIN32" to enable/disable
      zlib support in gpfdist. Building the server still fails if you use
      --without-zlib, but at least you can build the client programs without
      zlib, also on non-Windows systems.
      
      Remove obsolete copy of bzlib.h from the repository while we're at it.
    • Fix stanullfrac computation on column with all-wide values. · 90bcf3fd
      Heikki Linnakangas authored
      If the sample of a column consists entirely of "too wide" values,
      which are left out of the sample when it's passed to the compute_stats
      function, we pass an empty sample to it. The default compute_stats
      gets confused by that, and computes the null fraction as 0 / 0 = NaN,
      so we end up storing NaN as stanullfrac.
      
      If all the values in the sample are wide values, then they're surely
      not NULLs, so the right thing to do is to store stanullfrac = 0. That
      is a bit inconsistent with the normal compute_stats function, which
      effectively treats too-wide values as not existing at all, which
      artificially inflates the null fraction. Another inconsistency is that
      we store stawidth=1024 in this special case, while the normal
      computation again ignores the wide values when computing stawidth. If
      we wanted to do something about that, we should adjust the normal
      computation to take those wide values better into account, but that's
      a different story; at least we no longer store NaN in stanullfrac.
      
      Fixes github issue #3259.
    • Stop supporting SQL type aliases in ALTER TYPE SET DEFAULT ENCODING. · b4f125bd
      Heikki Linnakangas authored
      This is a bit unfortunate, in case someone is using them. But as it
      happens, we haven't even mentioned the ALTER TYPE SET DEFAULT ENCODING
      command in the documentation, so there probably aren't many people using
      them, and you can achieve the same thing by using the normal, non-alias,
      names like "varchar" instead of "character varying".
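
      For example, the catalog type name still works (the storage options
      are illustrative), while the alias spelling is no longer accepted:

      ALTER TYPE varchar SET DEFAULT ENCODING
          (compresstype=zlib, compresslevel=1);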
    • Move extended grouping processing to after transforming window functions. · 0b246cec
      Heikki Linnakangas authored
      This way we don't need the weird half-transformation of WindowDefs. Makes
      things simpler.
    • Fix remaining equal* functions to not compare 'location' field. · c6613ddd
      Heikki Linnakangas authored
      The 'location' field is just to give better error messages. It should not
      be considered when testing whether two nodes are equal. (Note that the
      COMPARE_LOCATION_FIELD() macro that we now consistently use on the
      'location' field is a no-op.)
      
      I noticed this while working on a patch that would compare two ColumnRefs
      to see if they are equal, and could be collapsed to one.
    • Rewrite the way a DTM initialization error is logged, to retain file & lineno. · c6f931fe
      Heikki Linnakangas authored
      While working on the 8.4 merge, I had a bug that tripped an Insist
      inside a PG_TRY-CATCH. That was very difficult to track down, because
      of the way the error is logged here: using ereport() includes the
      filename and line number where the error is re-emitted, not the
      original place. So all I got was "Unexpected internal error" in the
      log, with a meaningless filename & lineno.
      
      This rewrites the way the error is reported so that it preserves the
      original filename and line number. It will also use the original error
      level and preserve all the other fields.
    • Fix crash when COPY reports an "unexpected message type" error · c7a382c6
      Ming LI authored
    • Fix Bug: spi_execute assert fail when there's no query mem · 5d6447ae
      Zhenghua Lyu authored
      A resource group can be configured such that a query's query memory is
      zero; in such cases, the query will use work memory instead. And since
      query_mem's type is uint64, we simply remove the assert in the SPI
      execution code.
    • Remove gp_fault_strategy catalog table and corresponding code. · f5b5c218
      Ashwin Agrawal authored
      Using the gp_segment_configuration catalog table, one can easily find
      whether mirrors exist or not; we do not need a special table to
      communicate the same. Earlier, gp_fault_strategy conveyed 'n' for a
      mirrorless system, 'f' for replication, and 's' for SAN mirrors. Since
      support for 's' was removed in 5.0, the only purpose gp_fault_strategy
      served was to indicate whether the system was mirrored. Hence we
      delete the gp_fault_strategy table and, at the required places, use
      gp_segment_configuration to find the required info.