- 09 Jan 2020, 1 commit
-
-
Committed by dyozie
-
- 08 Jan 2020, 3 commits
-
-
Committed by Paul Guo
We previously had code on the QE to terminate the connection when needed, to avoid potential data inconsistency. This is gpdb-specific, since the upstream code does not seem friendly to failover + data consistency. However, that introduced various abort or assertion failures, because some shmem exit callback functions are apparently not friendly to the current transaction state. Some stack examples are below. Originally I fixed them in those callback functions, but I found on gpdb6 that after I fixed one, another (in another callback function) came out; that is why I could collect so many gpdb 6 stacks below. I collected just one gpdb master stack, but there should be more there too if we kept fixing those callbacks one by one. Finally I decided to fix this by delaying the ereport(FATAL) in exec_mpp_dtx_protocol_command() instead, and let the QD retry 2PC to ensure data consistency. Note that 1PC retry is currently not implemented; that should come in another PR.

gpdb master (7) stack:
```
2  0x0000000000b48ddc in ExceptionalCondition (conditionName=0xe527e8 "!(ShmemAddrIsValid(nextElem))", errorType=0xe527bd "FailedAssertion", fileName=0xe527b2 "shmqueue.c", lineNumber=74) at assert.c:66
3  0x0000000000996311 in SHMQueueDelete (queue=0x7ff5e6676da8) at shmqueue.c:74
4  0x00000000009689de in SyncRepCleanupAtProcExit () at syncrep.c:436
5  0x00000000009a7b49 in ProcKill (code=1, arg=0) at proc.c:949
6  0x000000000098c001 in shmem_exit (code=1) at ipc.c:288
7  0x000000000098be5f in proc_exit_prepare (code=1) at ipc.c:212
8  0x000000000098bd64 in proc_exit (code=1) at ipc.c:104
9  0x0000000000b4a7d4 in errfinish (dummy=0) at elog.c:738
10 0x000000000096860e in SyncRepWaitForLSN (lsn=210148624, commit=1 '\001') at syncrep.c:303
11 0x000000000055c082 in RecordTransactionCommitPrepared (xid=638, gid=0x2c603ad "1575462785-0000000012", nchildren=0, children=0x2c6d2d0, nrels=0, rels=0x2c6d2d0, ndeldbs=0, deldbs=0x2c6d2d0, ninvalmsgs=0, invalmsgs=0x2c6d2d0, initfileinval=0 '\000') at twophase.c:2283
12 0x000000000055aae3 in FinishPreparedTransaction (gid=0x2c603ad "1575462785-0000000012", isCommit=1 '\001', raiseErrorIfNotFound=0 '\000') at twophase.c:1493
13 0x0000000000c4e4fe in performDtxProtocolCommitPrepared (gid=0x2c603ad "1575462785-0000000012", raiseErrorIfNotFound=0 '\000') at cdbtm.c:2037
14 0x0000000000c4e9d5 in performDtxProtocolCommand (dtxProtocolCommand=DTX_PROTOCOL_COMMAND_RECOVERY_COMMIT_PREPARED, gid=0x2c603ad "1575462785-0000000012", contextInfo=0x1220f20) at cdbtm.c:2215
```

gpdb 6 stacks:
```
2  0x0000000000ad9ea5 in ExceptionalCondition (conditionName=0xdbfddb "!(MyProc->syncRepState == 0)", errorType=0xdbfd28 "FailedAssertion", fileName=0xdbfcd0 "syncrep.c", lineNumber=130) at assert.c:66
3  0x000000000091ce81 in SyncRepWaitForLSN (XactCommitLSN=3400317528) at syncrep.c:130
4  0x000000000053991a in RecordTransactionCommit () at xact.c:1663
5  0x000000000053b0b2 in CommitTransaction () at xact.c:2756
6  0x000000000053c024 in CommitTransactionCommand () at xact.c:3646
7  0x00000000005c6c25 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4107
8  0x000000000093c353 in shmem_exit (code=1) at ipc.c:257
9  0x000000000093c248 in proc_exit_prepare (code=1) at ipc.c:214
10 0x000000000093c146 in proc_exit (code=1) at ipc.c:104
11 0x0000000000adb93d in errfinish (dummy=0) at elog.c:754
12 0x000000000091d2ef in SyncRepWaitForLSN (XactCommitLSN=3400294096) at syncrep.c:284
13 0x0000000000549d8e in EndPrepare (gxact=0x7f8a7d5fa0e0) at twophase.c:1241

3  0x0000000000ade6d1 in elog_finish (elevel=22, fmt=0xc3a898 "cannot abort transaction %u, it was already committed") at elog.c:1735
4  0x0000000000539d22 in RecordTransactionAbort (isSubXact=0 '\000') at xact.c:1923
5  0x000000000053b95c in AbortTransaction () at xact.c:3340
6  0x000000000053e0a7 in AbortOutOfAnyTransaction () at xact.c:5248
7  0x00000000005c68b9 in RemoveTempRelationsCallback (code=1, arg=0) at namespace.c:4088
8  0x000000000093c371 in shmem_exit (code=1) at ipc.c:257
9  0x000000000093c266 in proc_exit_prepare (code=1) at ipc.c:214
10 0x000000000093c164 in proc_exit (code=1) at ipc.c:104
11 0x0000000000adb94e in errfinish (dummy=0) at elog.c:754
12 0x000000000091d30d in SyncRepWaitForLSN (XactCommitLSN=19529538376) at syncrep.c:284
13 0x000000000053985a in RecordTransactionCommit () at xact.c:1663

2  0x0000000000adb9a9 in ExceptionalCondition (conditionName=0xdb2560 "!(entry->trans == ((void *)0))", errorType=0xdb2550 "FailedAssertion", fileName=0xdb216a "pgstat.c", lineNumber=842) at assert.c:66
3  0x00000000008d3391 in pgstat_report_stat (force=1 '\001') at pgstat.c:842
4  0x00000000008d65e8 in pgstat_beshutdown_hook (code=1, arg=0) at pgstat.c:2685
5  0x000000000093deba in shmem_exit (code=1) at ipc.c:290
6  0x000000000093dd1a in proc_exit_prepare (code=1) at ipc.c:214
7  0x000000000093dc18 in proc_exit (code=1) at ipc.c:104
8  0x0000000000add441 in errfinish (dummy=0) at elog.c:750
9  0x000000000091ee5c in SyncRepWaitForLSN (XactCommitLSN=225227432) at syncrep.c:333
10 0x0000000000549dd8 in EndPrepare (gxact=0x7f02508680e0) at twophase.c:1241

2  0x0000000000adb9c2 in ExceptionalCondition (conditionName=0xdcb458 "!(!((allPgXact[proc->pgprocno].xid) != ((TransactionId) 0)))", errorType=0xdcb408 "FailedAssertion", fileName=0xdcb3d9 "procarray.c", lineNumber=369) at assert.c:66
3  0x000000000093f614 in ProcArrayRemove (proc=0x7f4f1f5a05d0, latestXid=0) at procarray.c:369
4  0x00000000009586ec in RemoveProcFromArray (code=1, arg=0) at proc.c:904
5  0x000000000093ded3 in shmem_exit (code=1) at ipc.c:290
6  0x000000000093dd33 in proc_exit_prepare (code=1) at ipc.c:214
7  0x000000000093dc31 in proc_exit (code=1) at ipc.c:104
8  0x0000000000add45a in errfinish (dummy=0) at elog.c:750
9  0x000000000091ee75 in SyncRepWaitForLSN (XactCommitLSN=348629504) at syncrep.c:333
10 0x0000000000549dd8 in EndPrepare (gxact=0x7f4f1fa8cce0) at twophase.c:1241
11 0x000000000053b621 in PrepareTransaction () at xact.c:3115
```

Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
Reviewed-by: Asim R P <apraveen@pivotal.io>
Cherry-picked from 7b761730
-
Committed by Abhijit Subramanya
The modify_table_data_corrupt test failed due to a difference in the ORCA version string, so ignore it by adding the pattern to the isolation2 init_file.
-
Committed by Heikki Linnakangas
The code in EXPLAIN that displays the gang information of a node was correctly prepared to handle the case where a node is missing flow information, by looking at the child plan's flow instead. However, it's possible to have two such nodes on top of each other; we need to look at the grandchild's flow in that case. This only occurs with JSON/YAML format EXPLAIN, because in text format we only print the gang information on Motion nodes. Fixes https://github.com/greenplum-db/gpdb/issues/9359. Fix on 6X_STABLE only; 5X_STABLE didn't have JSON/YAML format output, and this works on master. I'm not entirely sure why this works on master, but I'm not going to spend time figuring that out right now, because I'm just about to refactor this so that it's not based on the Flow nodes at all (https://github.com/greenplum-db/gpdb/pull/9093). Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
-
- 07 Jan 2020, 3 commits
-
-
Committed by Abhijit Subramanya
-
Committed by Jesse Zhang
This commit fixes the new Clang 10 warnings around misleading indentation, in the same vein as commit b93a631f.
-
Committed by Paul Guo
Saw the below test failure (e.g. in test starve_case) a few times. It seems to be caused by test misc, which was run in parallel with other tests (including starve_case). It creates a table and an index in utility mode, which could easily introduce an oid conflict. Moving that test out of the parallel running group fixes the failures.
```
 create table starve (c int);
 CREATE
 create table starve_helper (name varchar, sessionid int);
-CREATE
+DETAIL: Key (oid)=(131128) already exists.
+ERROR: duplicate key value violates unique constraint "pg_type_oid_index"
```
(cherry picked from commit 2de2f3f7)
-
- 03 Jan 2020, 9 commits
-
-
Committed by Zhenghua Lyu
If a subquery's locus is general, we should keep it general here. And a general locus's numsegments should be the cluster size.
-
Committed by Zhenghua Lyu
A RecursiveUnion plannode contains two non-empty subplan trees, so the plannode's flow and locus should take both trees into consideration. Besides, there must be no Motion nodes between the RecursiveUnion plannode and the WorkTableScan plannode, because the execution of the WorkTableScan depends on the RecursiveUnion's data structure. We always use the cteplan's locus as the WTS's locus. Remember that a WTS path cannot be turned into replicated (meaning broadcast) when dealing with joins. In most cases that is OK, but a replicated table whose locus is CdbLocusType_SegmentGeneral cannot be taken as "everywhere"; we will gather it to SingleQE or redistribute it when joining. To avoid such cases, if the cteplan's locus is CdbLocusType_SegmentGeneral, we build the WTS path using SingleQE, and later in the function `set_recursive_union_flow` add a gather on top of the cteplan.
-
Committed by Ning Yu
A temp table's schema name is pg_temp_<session_id> in normal mode; in utility mode the name is pg_temp_<backend_id>. If the normal-mode session id happens to equal the utility-mode backend id, the two will conflict with each other and cause catalog corruption on the segment. To fix this we changed the name to pg_temp_0<backend_id> in utility mode; this still matches the pattern "pg_temp_[0-9]+", which is expected for temp schema names. (cherry picked from commit 9bde1b01)
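The naming trick above can be sketched as a toy model (Python for illustration only; the helper name `temp_schema_name` is made up here, not GPDB code):

```python
import re

def temp_schema_name(id_, utility_mode):
    """Old scheme: both modes produced pg_temp_<n>, so a normal-mode
    session id equal to a utility-mode backend id collided.
    New scheme: prefix the utility-mode name with an extra '0'."""
    return f"pg_temp_0{id_}" if utility_mode else f"pg_temp_{id_}"

# With the fix, session id 7 (normal mode) and backend id 7 (utility
# mode) no longer map to the same schema name...
normal = temp_schema_name(7, utility_mode=False)    # 'pg_temp_7'
utility = temp_schema_name(7, utility_mode=True)    # 'pg_temp_07'
assert normal != utility

# ...while both names still match the expected temp-schema pattern.
pattern = re.compile(r"^pg_temp_[0-9]+$")
assert pattern.match(normal) and pattern.match(utility)
```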
-
Committed by Huiliang.liu
Cherry-picked from gpdb master. gpload will run in GPDB 6 compatibility mode if importing gpVersion fails.
-
Committed by Ning Yu
In AddInvalidationMessage() a new chunk always has double the size of the last chunk, but once the size exceeds 1GB, the max allowed alloc size, an error like "invalid memory alloc request size 1342177300" will be thrown. Fixed by limiting the chunk size. (cherry picked from commit a268b387530ba4007d97a9c72e402546f48ce9bc) (cherry picked from commit 8fdf36d7)
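The capped-growth idea can be sketched as follows (a Python toy model, not the actual C code; the cap value mirrors PostgreSQL's MaxAllocSize of just under 1GB, and the helper name is an assumption):

```python
MAX_ALLOC_SIZE = 0x3fffffff  # just under 1GB, PostgreSQL's MaxAllocSize

def next_chunk_size(last_size):
    """Double the previous chunk size, but clamp at the maximum
    allocation size so the allocator never sees an oversized request."""
    return min(last_size * 2, MAX_ALLOC_SIZE)

# Uncapped doubling would eventually exceed the 1GB limit and error
# out; the clamped sequence converges to the cap instead.
size = 4096
for _ in range(30):
    size = next_chunk_size(size)
assert size == MAX_ALLOC_SIZE
```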
-
Committed by Zhenghua Lyu
Commit d95f351a did not add a splitupdate case. This commit adds such test cases.
-
Committed by Huiliang.liu
GPload: change the metadata query SQL to improve performance. The old query could take a long time if the catalog is large.
-
Committed by Ashwin Agrawal
The alter_db_set_tablespace test has scenarios that inject an error fault for content 0 and then run the ALTER DATABASE SET TABLESPACE command. Once the error is hit on content 0, the transaction is aborted. Depending on when the transaction gets aborted, it is unpredictable how far the command has progressed on the non-content-0 primaries. Only if the non-content-0 primaries have reached the point of directory copy will their abort record carry a database directory deletion record to be replayed on the mirror. The test was waiting for the directory deletion fault to be triggered on all the content mirrors. This expectation is incorrect and makes the test flaky depending on timing. Hence, modify the error scenarios to only wait for directory deletion for content 0, then wait for all the mirrors to replay all the currently generated wal records, and after that make sure the destination directory is empty. This should eliminate the flakiness from the test. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
Committed by Ashwin Agrawal
gpdeletesystem uses GpDirsExist() to check whether dump directories are present, so it can warn and avoid deleting the cluster; only with the "-f" option is it allowed to delete a cluster that has dump directories present. However, this function incorrectly checks for both files and directories named "*dump*", not just directories. So gpdeletesystem started failing after commit eb036ac1: FTS writes a file named `gpsegconfig_dump`, and GpDirsExist() incorrectly reports it as a backup directory being present and fails. Fix this by only checking for directories, not files. Fixes https://github.com/greenplum-db/gpdb/issues/8442 Reviewed-by: Asim R P <apraveen@pivotal.io>
-
- 02 Jan 2020, 1 commit
-
-
Committed by (Jerome)Junfeng Yang
The error code should not be set twice with different codes in `errfinish_and_return`, since the evaluation order of a function's parameters is compiler-dependent. If the final error code is ERRCODE_INTERNAL_ERROR, the file name and line number are printed out:
```
ERROR: Error on receive from SEG IP:PORT pid=PID: *** (cdbdispatchresult.c:487)
```
It's strange to print the file name and line number here. So, remove ERRCODE_INTERNAL_ERROR and only keep ERRCODE_GP_INTERCONNECTION_ERROR, which was also the error code before commit 143bb7c6. Reviewed-by: Paul Guo <paulguo@gmail.com> (cherry picked from commit 55d6415b)
-
- 01 Jan 2020, 1 commit
-
-
Committed by Ashwin Agrawal
The test should make sure the mirror has processed the drop database wal record before proceeding to check that the destination tablespace directory does not exist. It skipped the wait for content 0 in case of a panic after writing the wal record, which is incorrect. Add a wait for all the mirrors to process the wal record and only then perform the validation. This should fix the failures seen in CI with the below diff:
```
--- /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/alter_db_set_tablespace.out 2019-10-14 16:09:43.638372174 +0000
+++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/alter_db_set_tablespace.out 2019-10-14 16:09:43.714379108 +0000
@@ -1262,25 +1271,352 @@
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 4
 CONTEXT: PL/Python function "stat_db_objects"
-NOTICE: dboid dir for database alter_db does not exist on dbid = 5
-CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 6
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 7
 CONTEXT: PL/Python function "stat_db_objects"
 NOTICE: dboid dir for database alter_db does not exist on dbid = 8
 CONTEXT: PL/Python function "stat_db_objects"
- dbid | relfilenode_dboid_relative_path | size
-------+---------------------------------+------
-    1 |                                 |
-    2 |                                 |
-    3 |                                 |
-    4 |                                 |
-    5 |                                 |
-    6 |                                 |
-    7 |                                 |
-    8 |                                 |
-(8 rows)
+ dbid | relfilenode_dboid_relative_path |  size
+------+---------------------------------+--------
+    1 |                                 |
+    2 |                                 |
+    3 |                                 |
+    4 |                                 |
+    5 | 180273/112                      |  32768
+    5 | 180273/113                      |  32768
+    5 | 180273/12390                    |  65536
+    5 | 180273/12390_fsm                |  98304
<....choping output as very long...>
+    5 | 180273/PG_VERSION               |      4
+    5 | 180273/pg_filenode.map          |   1024
+    6 |                                 |
+    7 |                                 |
+    8 |                                 |
+(337 rows)
```
Reviewed-by: Asim R P <apraveen@pivotal.io>
-
- 31 Dec 2019, 1 commit
-
-
Committed by Paul Guo
This helps script handling by checking return values. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
- 30 Dec 2019, 1 commit
-
-
Committed by xiong-gang
When 'CopyReadLineText' finds a broken end-of-copy marker, it errors out without setting the current index in the buffer. If 'reject limit' is set, copy will then process the same line again.
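The failure mode above (raising an error before advancing the buffer cursor, so an error-tolerant caller re-reads the same line) can be illustrated with a toy sketch; this is a Python analogy with made-up names, not the actual COPY code:

```python
def read_lines(buf, reject_limit):
    """Toy line reader: the cursor must be advanced past a bad line
    *before* rejecting it, otherwise a reject-limit retry loop would
    see the same bad line again on the next iteration."""
    pos, good, rejected = 0, [], 0
    while pos < len(buf):
        end = buf.index("\n", pos)
        line = buf[pos:end]
        pos = end + 1                 # the fix: advance first, then reject
        if line.startswith("\\."):    # stands in for a broken EOC marker
            rejected += 1
            if rejected > reject_limit:
                raise ValueError("reject limit exceeded")
            continue
        good.append(line)
    return good, rejected

good, rejected = read_lines("a\n\\.x\nb\n", reject_limit=5)
assert good == ["a", "b"] and rejected == 1
```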
-
- 28 Dec 2019, 1 commit
-
-
Committed by Ashwin Agrawal
Based on reports from the field for GPDB, the 1 min default of the wal_sender_timeout GUC causes the primary to terminate the replication connection too often under heavy workloads. This causes the mirror to be marked down and WAL to pile up on the primary. This is mostly seen in configurations where fsync takes a long time on mirrors. Hence, a higher default value for this GUC helps avoid unnecessarily marking mirrors down. The only downside of this change is that when the connection between primary and mirror exists but the mirror doesn't respond for some reason, it will be detected a little later than with the previous 1 min timeout. But the 1 min timeout has the major downside that mirrors need to be manually recovered after being marked down, so it is desirable not to falsely break the connection due to the timeout. Increasing the timeout to 5 mins is just an educated guess, since it is hard to come up with a reasonable default, but bumping the value is desired based on the inputs. Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
-
- 27 Dec 2019, 2 commits
-
-
Committed by Zhenghua Lyu
The previous commit c9655e2d forgot to check tuple locality for update. This commit adds a tuple-locality check for ExecUpdate and refactors the test cases.
-
Committed by Chuck Litzell
-
- 26 Dec 2019, 3 commits
-
-
Committed by Ning Yu
ZSTD creates CCtx and DCtx with malloc() by default, and a NULL pointer is returned on OOM; the callers must check for NULL pointers. Also fixed a typo in a comment. Fixes: https://github.com/greenplum-db/gpdb/issues/9294 Reported-by: shellboy Reviewed-by: Zhenghua Lyu <zlv@pivotal.io> (cherry picked from commit d74aa39f)
-
Committed by Paul Guo
We recently started to control wal write bursts by calling SyncRepWaitForLSN() more frequently, and that change made the segspace test flaky. The segspace test case has an injected fault (exec_hashjoin_new_batch) with an interrupt event, which makes it easier for the test to see the cancel event in SyncRepWaitForLSN() and then sometimes produce additional output. Fix this by disabling the current cancel handling code when it is not a commit call of SyncRepWaitForLSN(). Here is the diff of the test failure:
```
 begin;
 insert into segspace_t1_created SELECT t1.* FROM segspace_test_hj_skew AS t1, segspace_test_hj_skew AS t2 WHERE t1.i1=t2.i2;
+DETAIL: The transaction has already changed locally, it has to be replicated to standby.
 ERROR: canceling MPP operation
+WARNING: ignoring query cancel request for synchronous replication to ensure cluster consistency
 rollback;
```
Cherry-picked from 84642c4b. Besides, add the commit parameter to SyncRepWaitForLSN() following the master code. Checked the related upstream patch; adding the new parameter in the current gpdb version should be fine.
-
Committed by Ning Yu
The alter_db_set_tablespace test has been flaky for a long time; one typical failure looks like this:
```
--- /regress/expected/alter_db_set_tablespace.out
+++ /regress/results/alter_db_set_tablespace.out
@@ -1204,21 +1213,348 @@
 NOTICE: dboid dir for database alter_db does not exist on dbid = 2
 NOTICE: dboid dir for database alter_db does not exist on dbid = 3
 NOTICE: dboid dir for database alter_db does not exist on dbid = 4
-NOTICE: dboid dir for database alter_db does not exist on dbid = 5
 NOTICE: dboid dir for database alter_db does not exist on dbid = 6
 NOTICE: dboid dir for database alter_db does not exist on dbid = 7
 NOTICE: dboid dir for database alter_db does not exist on dbid = 8
```
The test disables fts probing with fault injection, but it does not wait for the fault to be triggered. The other problem is that fts probing was disabled after the PANIC, which might not be in time. So the problem was that we were injecting the fault after the fts loop was already beyond the fault point, and then when the subsequent PANIC was caused, fts was still active. By manually triggering the fault, and then waiting to ensure that it is hit at least once, we can guarantee that the scenario described above doesn't happen. Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io> Reviewed-by: Taylor Vesely <tvesely@pivotal.io> Reviewed-by: Hubert Zhang <hzhang@pivotal.io> (cherry picked from commit 54e3af6d)
-
- 24 Dec 2019, 3 commits
-
-
Committed by (Jerome)Junfeng Yang
For the below external table:
```
CREATE EXTERNAL WEB TABLE web_ext (junk text)
execute 'echo hi' on master
FORMAT 'text' (delimiter 'OFF' null E'\\N' escape E'\\');
```
querying the table raises an unexpected error:
```
SELECT * FROM web_ext;
ERROR: using no delimiter is only supported for external tables
```
The external scan calls BeginCopyFrom to init CopyStateData. When `ProcessCopyOptions` runs in `BeginCopy`, the relation may be an external relation. The fix checks whether the relation is an external relation and, if so, sets the correct parameters for `ProcessCopyOptions`.
-
Committed by Ashwin Agrawal
Similar to commit 8c40565a, apply the same change to the commit_blocking_on_standby test as well. Checking sync_state is unnecessary and makes the test flaky; the only alternative would be to add retries, so not checking it is better.
-
Committed by Ashwin Agrawal
This test sometimes fails with the below diff:
```
 -- Sync state between master and standby must be restored at the end.
 select application_name, state, sync_state from pg_stat_replication;
  application_name |   state   | sync_state
 ------------------+-----------+------------
- gp_walreceiver   | streaming | sync
+ gp_walreceiver   | streaming | async
 (1 row)
```
The reason: the query may be executed in the window between when the standby is created and changes state to streaming, but before flush is set to a valid location based on a reply from the standby. pg_stat_get_wal_senders() reports sync_state as "async" if the flush location is an invalid pointer. Hence we sometimes get the above diff depending on timing. To fix this, remove the sync_state field from the above query in this test. Since in GPDB we always create the standby as sync-only, "state" already tells us what we wish to check here, namely whether the standby is up and running. Checking sync_state is unnecessary, so avoid it and make the test stable. If we had to keep sync_state, we would have to add unnecessary retry logic for this query.
-
- 23 Dec 2019, 3 commits
-
-
Committed by Zhenghua Lyu
transformRelOptions may return a null pointer in some cases; add the check in the function `add_partition_rule`.
-
Committed by Heikki Linnakangas
The Motion sender code has four different codepaths for serializing a tuple from the input slot:

1. Fetch MemTuple from slot, copy it out as it is.
2. Fetch MemTuple from slot, re-format it into a new MemTuple by fetching and inlining any toasted datums. Copy out the re-formatted MemTuple.
3. Fetch HeapTuple from slot, copy it out as it is.
4. Fetch HeapTuple from slot, copy out each attribute separately, fetching and inlining any toasted datums.

In addition to the above, there are "direct" versions of codepaths 1 and 3, used when the tuple fits in the caller-provided output buffer. As discussed in https://github.com/greenplum-db/gpdb/issues/9253, the fourth codepath is very inefficient if the input tuple contains datums that are compressed inline, but not toasted. We decompress such tuples before serializing, and in the worst case might need to recompress them again in the receiver if they are written out to a table. I tried to fix that in commit 4c7f6cf7, but it was broken and was reverted in commit 774613a8. This is a new attempt at fixing the issue. This commit removes codepath 4 altogether, so that if the input tuple is a HeapTuple with any toasted attributes, it is first converted to a MemTuple and codepath 2 is used to serialize it. That way, we have less code to test, and materializing a MemTuple is roughly as fast as the old code to write out the attributes of a HeapTuple one by one, except that the MemTuple codepath avoids the decompression of already-compressed datums. While we're at it, add some tests for the various codepaths through SerializeTuple().

To test the performance of the affected case, where the input tuple is a HeapTuple with toasted datums, I used this:
```
CREATE temporary TABLE foo (a text, b text, c text, d text, e text, f text, g text, h text, i text, j text, k text, l text, m text, n text, o text, p text, q text, r text, s text, t text, u text, v text, w text, x text, y text, z text, large text);
ALTER TABLE foo ALTER COLUMN large SET STORAGE external;
INSERT INTO foo SELECT 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', repeat('1234567890', 1000) FROM generate_series(1, 10000);
-- verify that the data is uncompressed, should be about 110 MB.
SELECT pg_total_relation_size('foo');
\o /dev/null
\timing on
SELECT * FROM foo; -- repeat a few times
```
The last select took about 380 ms on my laptop, with or without this patch. So the new codepath, where the input HeapTuple is converted to a MemTuple first, is about as fast as the old method. There might be small differences in the serialized size of the tuple, too, but I didn't explicitly measure that. If you have a toasted but not compressed datum, the input must be quite large, so small differences in the datum header sizes shouldn't matter much. If the input HeapTuple contains any compressed datums, this avoids the recompression, so even if converting to a MemTuple were somewhat slower in that case, it should still be much better than before. I kept the HeapTuple codepath for the case that there are no toasted datums. I'm not sure it's significantly faster than converting to a MemTuple either; the caller has to slot_deform_tuple() the received tuple before it can do much with it, and that is slower with HeapTuples than MemTuples. But that codepath is straightforward enough that getting rid of it wouldn't save much code, and I don't feel like doing the performance testing to justify it right now. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
Committed by Heikki Linnakangas
It cannot return NULL: it will either return a valid pointer, or the palloc() will ERROR out. Reviewed-by: Asim R P <apraveen@pivotal.io>
-
- 21 Dec 2019, 3 commits
-
-
Committed by ggbq
The master, with mode COPY_DISPATCH, incorrectly allocated the TupleTableSlot for a new ResultRelInfo of a partition on the per-tuple memory context. This can happen in the presence of ResultRelInfo::ri_partInsertMap. It causes a crash because the per-tuple context is reset for each tuple iteration; it should be using the per-query context. Reproduce the crash using the following SQL commands:
```
DROP TABLE IF EXISTS partition_test;
CREATE TABLE partition_test (id INT, tm TIMESTAMP)
DISTRIBUTED BY (id)
PARTITION BY RANGE(tm)
(
    PARTITION p2019 START ('2019-01-01'::TIMESTAMP) END ('2020-01-01'::TIMESTAMP),
    DEFAULT PARTITION extra
);
ALTER TABLE partition_test ADD COLUMN dd TIMESTAMP;
ALTER TABLE partition_test DROP COLUMN dd;
ALTER TABLE partition_test ADD COLUMN dd TEXT;
ALTER TABLE partition_test SPLIT DEFAULT PARTITION
    START ('2020-01-01'::TIMESTAMP) END ('2021-01-01'::TIMESTAMP)
    INTO (PARTITION p2020, DEFAULT PARTITION);
COPY (SELECT generate_series, '2020-12-20'::TIMESTAMP, 'ABCDEF' FROM generate_series(1, 10000)) TO '/tmp/partition_test.txt';
COPY partition_test FROM '/tmp/partition_test.txt';
```
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
-
Committed by Heikki Linnakangas
Commit 589c737e bumped the expected ORCA version number to 3.86, but forgot to update the error message.
-
- 20 Dec 2019, 5 commits
-
-
Committed by Hao Wu
In PR https://github.com/greenplum-db/gpdb/pull/9248 we set the default value of wal_keep_segments to 0, the same as upstream, because we have a replication slot to avoid removal of WAL files required by the mirror. That seems fine, but there is no replication slot for master/standby yet, so it is unsafe to remove WAL files that may be required by the standby. For now, until a replication slot is added for the master, set the default value of wal_keep_segments to 5. (cherry picked from commit 3ce78553)
-
Committed by Ashwin Agrawal
To cover both ALTER TABLE and CTAS, heap_insert() is the common place, so we felt it better to have the call in heap_insert() instead of spreading calls for just those two functionalities. Vacuum full uses the cluster code, so we placed a separate call for vacuum full, which also covers cluster. Lazy vacuum needed a separate call as well. Co-authored-by: Adam Lee <ali@pivotal.io> Reviewed-by: Paul Guo <pguo@pivotal.io> (cherry picked from commit 22b4073d)
-
Committed by Ashwin Agrawal
Reviewed-by: Asim R P <apraveen@pivotal.io> Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io> Reviewed-by: Paul Guo <pguo@pivotal.io> (cherry picked from commit eae1c6ef)
-
Committed by Ashwin Agrawal
Reviewed-by: Asim R P <apraveen@pivotal.io> Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io> Reviewed-by: Paul Guo <pguo@pivotal.io> (cherry picked from commit cf254d1d)
-
Committed by Ashwin Agrawal
On commit, transactions in GPDB wait for replication and make sure WAL is flushed up to the commit lsn on the mirror. While commit is the mandatory sync/wait point, waiting for replication at periodic intervals even before that may be desirable and more efficient, to act as a good citizen in the system. Consider for example a setup where the primary and mirror can write at 20GB/sec, while the network between them can only transfer 2GB/sec. If a CTAS for a large table is run in such a setup, it can generate WAL very aggressively on the primary, which cannot be transferred at that rate to the mirror; hence pending WAL builds up on the primary. This has two main consequences:

- new write transactions (even single-tuple I/U/D) see latency equivalent to the time it takes for the pending WAL to be shipped and flushed to the mirror
- the primary needs space to hold that much WAL, since WAL cannot be recycled until it has been shipped to the mirror

So, to improve the situation, instead of waiting for the mirror only at the commit point, we need a way to keep the primary from racing ahead with WAL generation, and instead move large transactions at a speed sustainable by the network and mirrors. This helps avoid bulk transactions starving concurrent transactions from committing due to sync rep. Add a global (backend-local) variable which tracks the amount of wal written by the transaction, and an interface `wait_to_avoid_large_repl_lag()` which can be called at strategic points to wait for replication. If a threshold amount of WAL (defined by a new GUC) has been written by the transaction, this interface calls SyncRepWaitForLSN() with an LSN equal to the cached value of the WAL flush point. Using this interface, transactions that generate a lot of WAL can wait for replication based on the amount of WAL they have written, well before reaching the commit point.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/3qMsyIj3ikA/bcioZv8wAQAJ
Reviewed-by: Asim R P <apraveen@pivotal.io> Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io> Reviewed-by: Paul Guo <pguo@pivotal.io> (cherry picked from commit 0aec3c8f)
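The throttling idea above can be sketched as a toy model (Python for illustration; the class name, the 64MB threshold, and the `sync_rep_wait` callback are assumptions standing in for the GUC and SyncRepWaitForLSN(), not the actual GPDB code):

```python
class WalThrottle:
    """Backend-local tracker: after every threshold bytes of WAL
    written by this transaction, wait for the mirror to catch up
    to the cached flush point, then start counting again."""

    def __init__(self, threshold_bytes, sync_rep_wait):
        self.threshold = threshold_bytes
        self.sync_rep_wait = sync_rep_wait  # stand-in for SyncRepWaitForLSN
        self.wal_written = 0
        self.flush_lsn = 0

    def record_wal_write(self, nbytes, lsn):
        self.wal_written += nbytes
        self.flush_lsn = max(self.flush_lsn, lsn)

    def wait_to_avoid_large_repl_lag(self):
        # Only wait once the transaction has written enough WAL;
        # then reset the counter for the next interval.
        if self.wal_written >= self.threshold:
            self.sync_rep_wait(self.flush_lsn)
            self.wal_written = 0

waits = []
t = WalThrottle(threshold_bytes=64 << 20, sync_rep_wait=waits.append)
for i in range(10):
    t.record_wal_write(16 << 20, lsn=i)   # 16MB of WAL per step
    t.wait_to_avoid_large_repl_lag()
assert len(waits) == 2   # waited after 64MB and again after 128MB
```

A bulk load thus pauses periodically instead of letting WAL pile up until commit, which is the behavior the commit message describes.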
-