1. 09 Aug 2017, 10 commits
  2. 08 Aug 2017, 3 commits
  3. 05 Aug 2017, 3 commits
    • XLOG_HINT integration with persistent table (PT) · a21da89e
      Authored by Ashwin Agrawal
      We add the PT information in addition to the backup block for proper recovery.
      With this in place, during XLog redo the blocks of dropped tables will not be
      restored, after consulting the PT.
      
      In order to fetch the PT information, we have to pass Relation to
      MarkBufferDirtyHint(). For that interface change, we refactored
      MarkBufferDirtyHint() with additional Relation parameter. The relation
      information is eventually passed to XLogSaveBufferForHint() to fetch the PT
      information when preparing the XLOG_HINT record.
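      As a minimal sketch of such an interface change (the exact GPDB
      signature, argument order, and call sites may differ):

          #include "postgres.h"
          #include "storage/buf.h"
          #include "utils/rel.h"

          /* Hypothetical sketch: Relation threaded through the dirty-hint path */
          extern void MarkBufferDirtyHint(Relation rel, Buffer buf);

          static void
          set_hint_bits_example(Relation rel, Buffer buf)
          {
              /*
               * After setting hint bits on the page, pass the Relation along
               * so that XLogSaveBufferForHint() can consult the persistent
               * table (PT) information when it prepares the XLOG_HINT record.
               */
              MarkBufferDirtyHint(rel, buf);
          }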
    • Refactor checksumming code to make it easier to use externally. · bad76320
      Authored by Tom Lane
      pg_filedump and other external utility programs are likely to want to be
      able to check Postgres page checksums.  To avoid messy duplication of code,
      move the checksumming functionality into an exported header file, much as
      we did a while back for the CRC code.
      
      In passing, get rid of an unportable assumption that a static char[] array
      will be word-aligned, and do some other minor code beautification.
      
      (cherry picked from commit f0421634)
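      A hedged sketch of how an external tool might use the exported header,
      assuming the 9.3-era pg_checksum_page() interface (pg_filedump's actual
      code may differ):

          #include "postgres.h"
          #include "storage/bufpage.h"
          #include "storage/checksum.h"
          #include "storage/checksum_impl.h"  /* brings in the implementation */

          /* Return true if the stored checksum matches the computed one. */
          static bool
          page_checksum_matches(char *page, BlockNumber blkno)
          {
              PageHeader phdr = (PageHeader) page;

              return phdr->pd_checksum == pg_checksum_page(page, blkno);
          }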
    • Introduce new page checksum algorithm and module. · 3fe41fe1
      Authored by Simon Riggs
      Isolate checksum calculation to its own module, so that bufpage
      knows little if anything about the details of the calculation.
      
      This implementation is a modified FNV-1a hash checksum, details
      of which are given in the new checksum.c header comments.
      
      Basic implementation only, so we fix the output value.
      
      Later related commits will add version numbers to pg_control,
      compiler optimization flags and memory barriers.
      
      Ants Aasma, reviewed by Jeff Davis and Simon Riggs
      
      (cherry picked from commit 43e7a668)
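      For flavor, a simplified sketch of the modified FNV-1a mixing step
      (illustrative only; the authoritative details are in the checksum.c
      header comments):

          #include <stdint.h>

          #define FNV_PRIME 16777619

          /*
           * Plain FNV-1a xors then multiplies; the modified variant adds a
           * shift-xor of the intermediate value for better bit diffusion.
           */
          static inline uint32_t
          checksum_comp(uint32_t checksum, uint32_t value)
          {
              uint32_t tmp = checksum ^ value;

              return tmp * FNV_PRIME ^ (tmp >> 17);
          }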
  4. 03 Aug 2017, 4 commits
    • Fix resource group memory overuse issue when increasing concurrency. · 94a08704
      Authored by Ning Yu
      A resource group may overuse memory in the case below:
      
      	CREATE RESOURCE GROUP rg_concurrency_test WITH
      	(concurrency=1, cpu_rate_limit=20, memory_limit=60,
      	 memory_shared_quota=0, memory_spill_ratio=10);
      	CREATE ROLE role_concurrency_test RESOURCE GROUP rg_concurrency_test;
      
      	11:SET ROLE role_concurrency_test;
      	11:BEGIN;
      
      	21:SET ROLE role_concurrency_test;
      	22:SET ROLE role_concurrency_test;
      	21&:BEGIN;
      	22&:BEGIN;
      
      	ALTER RESOURCE GROUP rg_concurrency_test SET CONCURRENCY 2;
      
      	11:END;
      
      The cause is that we did not check the overall memory quota usage in the
      past, so pending queries could be woken up as long as the concurrency
      limit was not reached; if the currently running transactions had already
      consumed all the memory quota in the resource group, the overall memory
      usage would then be exceeded.
      
      To fix this issue we now check both the concurrency limit and the memory
      quota usage when deciding whether to wake up pending queries, as sketched
      below.
      Signed-off-by: Zhenghua Lyu <zlv@pivotal.io>
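      A minimal sketch of the fixed wakeup decision, with hypothetical names
      (the real slot and quota accounting in the resgroup code is more
      involved):

          #include <stdbool.h>

          typedef struct ResGroup
          {
              int nRunning;          /* transactions currently running */
              int concurrencyLimit;  /* CONCURRENCY setting */
              int memQuotaUsed;      /* memory quota already granted */
              int memQuotaLimit;     /* total memory quota of the group */
          } ResGroup;

          static bool
          should_wake_pending_query(const ResGroup *g, int queryQuota)
          {
              if (g->nRunning >= g->concurrencyLimit)
                  return false;  /* no free concurrency slot */

              /* the old code stopped above; the fix adds this quota check */
              if (g->memQuotaUsed + queryQuota > g->memQuotaLimit)
                  return false;  /* waking up would overuse group memory */

              return true;
          }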
    • Add checksum verification on mirror of filerep resync · 51ff21af
      Authored by Xin Zhang
      Validate every BufferPool page sent to the mirror by the
      primary prior to writing.
      Signed-off-by: Taylor Vesely <tvesely@pivotal.io>
    • Allow I/O reliability checks using 16-bit checksums · ed0efd2a
      Authored by Simon Riggs
      Checksums are set immediately prior to flush out of shared buffers
      and checked when pages are read in again. Hint bit setting will
      require full page write when block is dirtied, which causes various
      infrastructure changes. Extensive comments, docs and README.
      
      A WARNING is emitted if a checksum fails on a non-all-zeroes page; an
      ERROR is then thrown, but it can be disabled with ignore_checksum_failure = on.
      
      Feature enabled by an initdb option, since transition from option off
      to option on is long and complex and has not yet been implemented.
      Default is not to use checksums.
      
      Checksum used is WAL CRC-32 truncated to 16-bits.
      
      Simon Riggs, Jeff Davis, Greg Smith
      Wide input and assistance from many community members. Thank you.
      
      (cherry picked from commit 96ef3b8f)
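      A sketch of the read-path behavior described above, with hypothetical
      helper names (upstream, the check lives in PageIsVerified() and the
      buffer manager):

          #include "postgres.h"

          /* ignore_checksum_failure is the real GUC added by this commit;
           * the two helpers below are hypothetical stand-ins. */
          extern bool ignore_checksum_failure;
          extern bool page_is_all_zeroes(const char *page);
          extern bool checksum_matches(char *page, uint32 blkno);

          static bool
          verify_page_on_read(char *page, uint32 blkno)
          {
              if (page_is_all_zeroes(page))
                  return true;            /* a never-written page is valid */

              if (checksum_matches(page, blkno))
                  return true;

              /* checksum failed on a non-all-zeroes page */
              elog(WARNING, "page verification failed");
              if (!ignore_checksum_failure)
                  elog(ERROR, "invalid page");
              return false;
          }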
    • Remove PageSetTLI and rename pd_tli to pd_checksum · 626df6b4
      Authored by Simon Riggs
      Remove use of PageSetTLI() from all page manipulation functions
      and adjust README to indicate change in the way we make changes
      to pages. Repurpose those bytes into the pd_checksum field and
      explain how that works in comments about page header.
      
      Refactoring ahead of actual feature patch which would make use
      of the checksum field, arriving later.
      
      Jeff Davis, with comments and doc changes by Simon Riggs
      Direction suggested by Robert Haas; many others providing
      review comments.
      
      (cherry picked from bb7cc262)
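      A sketch of the resulting page header layout (fields abridged; see
      bufpage.h for the authoritative definition):

          typedef struct PageHeaderData
          {
              PageXLogRecPtr pd_lsn;       /* LSN of last change to the page */
              uint16         pd_checksum;  /* checksum; these two bytes were pd_tli */
              uint16         pd_flags;     /* flag bits */
              /* ... pd_lower, pd_upper, pd_special, pd_pagesize_version ... */
          } PageHeaderData;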
  5. 02 Aug 2017, 1 commit
    • Make memory spill in resource group take effect · 68babac4
      Authored by Richard Guo
      Resource group memory spill is similar to 'statement_mem' in resource
      queue; the difference is that memory spill is calculated according to
      the memory quota of the resource group.
      
      The related GUCs, variables and functions shared by both resource queue
      and resource group are moved to the resource manager namespace.
      
      The resource queue code relating to memory policy is also refactored in
      this commit.
      Signed-off-by: Pengzhou Tang <ptang@pivotal.io>
      Signed-off-by: Ning Yu <nyu@pivotal.io>
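      As a rough worked example only (the actual formula lives in the resource
      manager code and may account for more factors), using the group settings
      from the earlier commit (memory_limit=60, memory_spill_ratio=10) on a
      hypothetical segment with 8000 MB available:

          static int
          example_memory_spill_mb(void)
          {
              int total_mem_mb   = 8000;                    /* hypothetical */
              int group_quota_mb = total_mem_mb * 60 / 100; /* memory_limit: 4800 MB */

              /* the resgroup analogue of statement_mem: a share of the
               * group's quota, sized by memory_spill_ratio */
              return group_quota_mb * 10 / 100;             /* 480 MB */
          }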
  6. 31 Jul 2017, 1 commit
    • Implement "COPY ... FROM ... ON SEGMENT" · e254287e
      Authored by Ming LI
      Support a COPY statement that imports data files on segments directly,
      in parallel. It can be used to import data files generated by "COPY ...
      TO ... ON SEGMENT".
      
      This commit also supports all the data file formats that "COPY ...
      TO" supports, processes reject limit numbers, and logs errors accordingly.
      
      Key workflow:
         a) For COPY FROM, nothing is changed by this commit: dispatch the
         modified COPY command to the segments first, then read the data file
         on the master and dispatch the data to the relevant segment for
         processing.
      
         b) For COPY FROM ON SEGMENT: on the QD, read a dummy data file, with
         the other parts unchanged; on each QE, first process the (empty) data
         stream dispatched from the QD, then re-run the same workflow to read
         and process the local segment data file.
      Signed-off-by: Ming LI <mli@pivotal.io>
      Signed-off-by: Adam Lee <ali@pivotal.io>
      Signed-off-by: Haozhou Wang <hawang@pivotal.io>
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
  7. 27 Jul 2017, 1 commit
    • Make SQL based fault injection function available to all tests. · b23680d6
      Authored by Asim R P
      The function gp_inject_fault() was defined in a test-specific contrib
      module (src/test/dtm).  It is moved to a dedicated contrib module,
      gp_inject_fault, so that all tests can make use of it.  Two pg_regress
      tests (dispatch and cursor) are modified to demonstrate the usage.  The
      function is modified so that it can inject a fault in any segment,
      specified by dbid.  The gpfaultinjector python script no longer needs to
      be invoked from SQL files.
      
      The new module is integrated into the top-level build so that it is
      included in make and make install.
  8. 25 Jul 2017, 1 commit
    • Fix resgroup ICW failures · 4165a543
      Authored by Ning Yu
      * Fix the resgroup assert failure on CREATE INDEX CONCURRENTLY syntax.
      
      When resgroup is enabled, an assertion failure is encountered with the
      case below:
      
          SET gp_create_index_concurrently TO true;
          DROP TABLE IF EXISTS concur_heap;
          CREATE TABLE concur_heap (f1 text, f2 text, dk text) distributed by (dk);
          CREATE INDEX CONCURRENTLY concur_index1 ON concur_heap(f2,f1);
      
      The root cause is that we assumed on the QD that a command is dispatched
      to the QEs whenever it is assigned to a resgroup, but this is false with
      the CREATE INDEX CONCURRENTLY syntax.
      
      To fix it we add the necessary checks and cleanup on the QEs.
      
      * Do not assign a resource group in the SIGUSR1 handler.
      
      When assigning a resource group on the master, we might call WaitLatch()
      to wait for a free slot. However, as WaitLatch() expects to be woken by
      the SIGUSR1 signal, it will wait endlessly when SIGUSR1 is blocked.
      
      One scenario is the catch-up handler: it is triggered and executed
      directly in the SIGUSR1 handler, so during its execution SIGUSR1 is
      blocked. Since the catch-up handler begins a transaction, it will try to
      assign a resource group and trigger the endless wait.
      
      To fix this we add a check to skip resource group assignment when
      running inside the SIGUSR1 handler. Signal handlers are supposed to be
      light, short and safe, so skipping resource group assignment in such a
      case is reasonable.
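      A minimal sketch of the guard, with hypothetical names:

          #include <stdbool.h>

          /* set on SIGUSR1 handler entry, cleared on exit (hypothetical flag) */
          extern bool in_sigusr1_handler;

          static void
          assign_resource_group_if_safe(void)
          {
              /*
               * WaitLatch() is woken by SIGUSR1; inside the SIGUSR1 handler
               * that signal is blocked, so waiting for a slot would hang.
               */
              if (in_sigusr1_handler)
                  return;  /* skip resource group assignment */

              /* ... normal path: take a slot, or WaitLatch() for one ... */
          }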
  9. 24 Jul 2017, 1 commit
    • Use non-blocking recv() in internal_cancel() · 23e5a5ee
      Authored by xiong-gang
      The issue of hanging on recv() in internal_cancel() has been reported
      several times: the socket status is shown as 'ESTABLISHED' on the
      master, while the peer process on the segment has already exited. We are
      not sure how exactly this happens, but we are able to reproduce the hang
      by dropping packets or rebooting the system on the segment.
      
      This patch uses poll() to do a non-blocking recv() in internal_cancel().
      The timeout of poll() is set to the max value of authentication_timeout
      to make sure the process on the segment has already exited before
      attempting another retry; we expect a retry on connect() to detect the
      network issue.
      Signed-off-by: Ning Yu <nyu@pivotal.io>
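      A standalone sketch of a poll()-guarded recv() (internal_cancel() itself
      lives in libpq's fe-connect.c; the names here are illustrative):

          #include <poll.h>
          #include <sys/types.h>
          #include <sys/socket.h>

          static ssize_t
          recv_with_timeout(int sock, void *buf, size_t len, int timeout_ms)
          {
              struct pollfd pfd;
              int           rc;

              pfd.fd = sock;
              pfd.events = POLLIN;

              /* wait until the socket is readable or the timeout expires */
              rc = poll(&pfd, 1, timeout_ms);
              if (rc <= 0)
                  return -1;  /* rc == 0: timeout; rc < 0: poll() error */

              return recv(sock, buf, len, 0);
          }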
  10. 22 Jul 2017, 2 commits
  11. 21 Jul 2017, 1 commit
    • Improve partition selection logging (#2796) · 038aa959
      Authored by Jesse Zhang
      Partition Selection is the process of determining at runtime ("execution
      time") which leaf partitions we can skip scanning. Three types of Scan
      operators benefit from partition selection: DynamicTableScan,
      DynamicIndexScan, and BitmapTableScan.
      
      Currently, there is a minimal amount of logging about which partitions
      are selected, and it is scattered between DynamicIndexScan and
      DynamicTableScan (and so we missed BitmapTableScan).
      
      This commit moves the logging into the PartitionSelector operator
      itself, when it exhausts its inputs. This also brings the nice side
      effect of more granular information: the log now attributes the
      partition selection to individual partition selectors.
  12. 19 Jul 2017, 1 commit
    • [#147774653] Implemented ValuesScan Operator in ORCA · 819107b7
      Authored by Bhuvnesh Chaudhary
      This commit introduces a new operator for ValuesScan. Earlier we
      generated a `UNION ALL` for cases where the VALUES lists passed are all
      constants; now a new operator, CLogicalConstTable, with an array of
      const tuples is generated.
      
      Once the plan is generated by ORCA, it is translated to a ValuesScan
      node in GPDB.
      
      This enhancement significantly improves the total runtime of queries
      involving a values scan with const values in ORCA.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  13. 18 Jul 2017, 1 commit
  14. 15 Jul 2017, 1 commit
    • Remove PartOidExpr, it's not used in GPDB. (#2481) · 941327cd
      Authored by Heikki Linnakangas
      * Remove PartOidExpr, it's not used in GPDB.
      
      The target lists of DML nodes that ORCA generates include a column for
      the target partition OID. It can then be referenced by PartOidExprs.
      ORCA uses these to allow sorting the tuples by partition before
      inserting them into the underlying table. That feature is used by HAWQ,
      where grouping tuples that go to the same output partition is cheaper.
      
      Since commit adfad608, which removed the gp_parquet_insert_sort GUC, we
      no longer do that in GPDB. GPDB can hold multiple result relations open
      at the same time, so there is no performance benefit to grouping the
      tuples first (or at least not enough benefit to counterbalance the cost
      of a sort).
      
      So remove the now unused support for PartOidExpr in the executor.
      
      * Bump ORCA version to 2.37
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      
      * Removed acceptedLeaf
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  15. 14 Jul 2017, 2 commits
    • During replay of AO XLOG records, keep track of missing AO/AOCO segment files · b659d047
      Authored by Jimmy Yih
      When a standby is shut down and restarted, WAL recovery starts from
      the last restartpoint. If we replay an AO write record which has a
      following drop record, the WAL replay of the AO write record will find
      that the segment file does not exist. To fix this, we piggyback on top
      of the heap solution of tracking invalid pages in the invalid_page_tab
      hash table. The hash table key struct uses a block number which, for
      AO's sake, we pretend is the segment file number for AO/AOCO
      tables. This solution will be revisited to possibly create a separate
      hash table for AO/AOCO tables with a proper key struct.
      
      Big thanks to Heikki for pointing out the issue.
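      For reference, a sketch of the key reuse based on the upstream
      invalid_page_tab key in xlogutils.c (GPDB field names may differ):

          typedef struct xl_invalid_page_key
          {
              RelFileNode node;    /* the relation */
              ForkNumber  forkno;  /* which fork */
              BlockNumber blkno;   /* heap: block number;
                                    * AO/AOCO: reused as segment file number */
          } xl_invalid_page_key;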
    • Replay of AO XLOG records · cc9131ba
      Authored by Ashwin Agrawal
      We generate AO XLOG records when --enable-segwalrep is configured. We
      should now replay those records on the mirror or during recovery. The
      replay is only performed in standby mode, since promotion will not
      execute until there are no more XLOG records to read from the WAL
      stream.
  16. 13 Jul 2017, 3 commits
    • Remove unreachable and unused code (#2611) · f4e50a64
      Authored by Daniel Gustafsson
      This removes code which is either unreachable, due to prior identical
      tests that break the codepath, or dead, due to always being true.
      Asserting that an unsigned integer is >= 0 will always be true, so it's
      pointless, as illustrated below.
      
      Per "logically dead code" gripes by Coverity
    • Use block number instead of LSN to batch changed blocks in filerep · abe13c79
      Authored by Asim R P
      The filerep resync logic that fetches changed blocks from the
      changetracking (CT) log is changed. The LSN is no longer used to filter
      out blocks from the CT log. If a relation's changed blocks exceed the
      threshold number of blocks that can be fetched at a time, the last
      fetched block number is remembered and used to form the subsequent
      batch.
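      A hypothetical sketch of the batching loop (the function names are
      stand-ins for the CT-log lookup and resync steps):

          #include <stdbool.h>

          extern bool ct_fetch_changed_blocks(void *rel, unsigned resume_from,
                                              int batch_size, void *blocks,
                                              unsigned *last_fetched);
          extern void resync_blocks(void *rel, void *blocks);

          static void
          resync_relation(void *rel, void *blocks, int batch_size)
          {
              unsigned last_fetched = 0;
              bool     more = true;

              while (more)
              {
                  /* resume from the last block number, not an LSN filter */
                  more = ct_fetch_changed_blocks(rel, last_fetched, batch_size,
                                                 blocks, &last_fetched);
                  resync_blocks(rel, blocks);
              }
          }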
    • Add GUC to control number of blocks that a resync worker operates on · 2960bd7c
      Authored by Asim R P
      The GUC gp_changetracking_max_rows replaces a compile-time constant. A
      resync worker obtains at most gp_changetracking_max_rows changed blocks
      from the changetracking log at one time. Controlling this with a GUC
      makes it possible to exercise bugs in the resync logic around this area.
  17. 11 Jul 2017, 4 commits
    • Speed up simple queries on AOCS tables, when only a few columns are needed. · 3c40df0a
      Authored by Heikki Linnakangas
      If you have a query like "SELECT COUNT(col1) FROM wide_table", where the
      table has dozens of columns, the overhead in aocs_getnext() just to figure
      out which columns need to be fetched becomes noticeable. Optimize it.
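      One plausible shape of such an optimization, sketched with hypothetical
      fields: precompute the projected column indexes once per scan, instead
      of rescanning a per-column bool array for every tuple:

          /* hypothetical scan-state fields */
          typedef struct AOCSScanState
          {
              int  num_proj_atts;
              int *proj_atts;   /* indexes of the columns to fetch */
          } AOCSScanState;

          static void
          precompute_projection(AOCSScanState *scan, const bool *proj, int natts)
          {
              scan->num_proj_atts = 0;
              for (int i = 0; i < natts; i++)
              {
                  /* done once at scan start, not per aocs_getnext() call */
                  if (proj[i])
                      scan->proj_atts[scan->num_proj_atts++] = i;
              }
          }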
    • Turn a couple of hazardous macros into inline functions. · 2395d803
      Authored by Heikki Linnakangas
      In aocsam.c, there's a block of code that does:
      
          if (...)
          {
              AOTupleIdInit_rowNum(...);
          }
          else
          {
              AOTupleIdInit_rowNum(...);
          }
      
      While hacking, I removed the seemingly unnecessary braces, turning that
      into just:
      
          if (...)
              AOTupleIdInit_rowNum(...);
          else
              AOTupleIdInit_rowNum(...);
      
      But then I got a compiler error, about 'else' without 'if'. I was baffled
      for a moment, until I looked at the definition of AOTupleIdInit_rowNum. The
      way it includes curly braces makes it not work in an if-else construct like
      above. These macros also have double-evaluation hazards.
      
      To make this more robust, turn the macros into static inline functions.
      Inline functions generally behave more sanely and are more readable than
      macros.
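      To spell out the pitfall with a generic example (not the actual GPDB
      macro):

          typedef struct Tuple { int rownum; } Tuple;  /* stand-in type */

          /* A macro whose body is a braced block: the ';' the caller writes
           * after the '}' terminates the if-statement, so a following 'else'
           * has no matching 'if' and fails to compile. */
          #define INIT_ROWNUM_BAD(t) { (t)->rownum = 0; }

          /* The robust replacement: behaves like a normal statement and
           * evaluates its argument exactly once. */
          static inline void
          init_rownum(Tuple *t)
          {
              t->rownum = 0;
          }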
    • Remove unused field. · 781ff5b7
      Authored by Heikki Linnakangas
      This does mean that we don't free the array quite as quickly as we used
      to, but it's a drop in the sea. The array is very small, there are much
      bigger data structures involved in every AOCS scan that are not freed as
      quickly, and it's freed at the end of the query in any case.
    • Add missing prototypes to silence compiler warnings. · 99c30c58
      Authored by Heikki Linnakangas
      Commit fa6c2d43 added two functions, but forgot to add prototypes for
      them.