Commits · 5e2c9e8b57ce7818d8554290a26f3276d46227ba · Greenplum / Gpdb

13 Aug, 2020 6 commits

Make Fault Injection sites cheaper, when no faults have been activated. · 5e2c9e8b

Authored by Heikki Linnakangas on Aug 13, 2020

Fault injection is expected to be *very* cheap; we even enable it in
production builds. That's why I was very surprised when I saw 'perf' report
that FaultInjector_InjectFaultIfSet() was consuming about 10% of CPU time
in a performance test I was running on my laptop. I tracked it to the
FaultInjector_InjectFaultIfSet() call in standard_ExecutorRun(). It gets
called for every tuple between 10000 and 1000000, on every segment.

Why is FaultInjector_InjectFaultIfSet() so expensive? It has a quick exit
in it, when no faults have been activated, but before reaching the quick
exit it calls strlen() on the arguments. That's not cheap. And the function
call isn't completely negligible on hot code paths, either.

To fix, turn FaultInjector_InjectFaultIfSet() into a macro that's only
a few instructions long in the fast path. That should be cheap enough.

Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
Reviewed-by: Asim R P <pasim@vmware.com>
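The idea of a macro with a cheap fast path can be illustrated with a minimal sketch. This is not the actual GPDB code: the names `num_active_faults`, `inject_fault_slow` and `INJECT_FAULT_IF_SET` are hypothetical, and the counter is a plain static variable here rather than anything in shared memory.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counter of activated faults (an assumption for this sketch). */
static int num_active_faults = 0;

/* Counts how often the slow path runs, so the effect is observable. */
static int slow_path_calls = 0;

/* Slow path: reached only when at least one fault has been activated. */
static int
inject_fault_slow(const char *fault_name)
{
    assert(fault_name != NULL);
    slow_path_calls++;
    /* Expensive work such as strlen() or a lookup belongs only here. */
    return strlen(fault_name) > 0;
}

/* Fast path: one integer comparison, no function call, no strlen(). */
#define INJECT_FAULT_IF_SET(fault_name) \
    ((num_active_faults > 0) ? inject_fault_slow(fault_name) : 0)
```

With no faults activated, a call site such as `INJECT_FAULT_IF_SET("executor_run")` reduces to a load and a compare, which is the property the commit message is after.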

Remove comment about an argument that was removed earlier. · 5edd8625
Authored by Heikki Linnakangas on Aug 13, 2020
```
'GpPolicy' argument was removed in commit c892d95f.
```

Remove forward declaration for non-existent function. · 42ba6b0a
Authored by Heikki Linnakangas on Aug 13, 2020

Fix query string truncation while dispatching to QE · 889ba39e

Authored by Polina Bungina on Aug 13, 2020

Executing a sufficiently long query that contains multi-byte characters can truncate the query string incorrectly. The truncation can cut a multi-byte character in half and, with log_min_duration_statement set to 0, write an invalid byte sequence to the segment logs. Such a broken character in the logs causes problems when fetching log information from the gp_toolkit.__gp_log_segment_ext table: queries fail with the error «ERROR: invalid byte sequence for encoding…».
This is caused by the buildGpQueryString function in `cdbdisp_query.c`, which prepares the query text for dispatch to QE. It does not take character length into account when truncation is necessary (that is, when the text is longer than QUERY_STRING_TRUNCATE_SIZE).
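Truncating on a character boundary can be sketched as follows. This is an illustration only, assuming UTF-8 input; the helper name `utf8_clip_len` is hypothetical and the sketch does not claim to reproduce how the commit itself fixes buildGpQueryString.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest byte length <= limit that does not split a UTF-8
 * character within the first len bytes of s.
 */
static size_t
utf8_clip_len(const char *s, size_t len, size_t limit)
{
    size_t clip;

    assert(s != NULL);
    if (len <= limit)
        return len;             /* nothing to truncate */

    clip = limit;
    /*
     * UTF-8 continuation bytes have the form 10xxxxxx. If the byte at the
     * cut position is one, the cut would split a character, so back up to
     * the byte that starts that character and cut before it.
     */
    while (clip > 0 && (((unsigned char) s[clip]) & 0xC0) == 0x80)
        clip--;
    return clip;
}
```

The caller would then copy only the first `utf8_clip_len(...)` bytes, so the truncated text always ends on a whole character.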

Coverity: Cleanup minor memory leak issue · c0f0ba08
Authored by Pengzhou Tang on Jul 29, 2020

Allow static partition selection for lossy casts in ORCA · 4bddeeff

Authored by Divyesh Vanjare on Jul 14, 2020

For a table partitioned by a timestamp column, a query such as
  SELECT * FROM my_table WHERE ts::date = '2020-05-10'
should scan only a few partitions.

ORCA previously supported only implicit casts for partition selection.
This commit extends ORCA to also support a subset of lossy (assignment)
casts that are order-preserving (increasing) functions. This improves
ORCA's ability to eliminate partitions and so produce faster plans.

To ensure correctness, the additional supported functions are captured
in an allow-list in gpdb::IsFuncAllowedForPartitionSelection(), which
includes some in-built lossy casts such as ts::date, float::int etc.

Details:
 - For list partitions, we compare our predicate with each distinct
   value in the list to determine if the partition has to be
   selected/eliminated. Hence, none of the operators need to be changed
   for list partition selection

 - For range partition selection, we check bounds of each partition and
   compare it with the predicates to determine if partition has to be
   selected/eliminated.

   A partition such as [1, 2) shouldn't be selected for float = 2.0, but
   should be selected for float::int = 2.  We therefore handle equality
   predicates differently when lossy casts are present (ub: upper
   bound, lb: lower bound)

   if (lossy cast on partition col):
     (lb::int <= 2) and (ub::int >= 2)
   else:
     ((lb <= 2 and inclusive lb) or (lb < 2))
     and
     ((ub >= 2 and inclusive ub ) or (ub > 2))

 - CMDFunctionGPDB now captures whether or not a cast is a lossy cast
   supported by ORCA for partition selection. This is then used in
   Expr2DXL translation to identify how partitions should be selected.
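The range-partition rule for equality predicates described above can be written out as a small sketch. It is not ORCA code: `RangePart`, `lossy_cast_to_int` and `part_selected_for_equality` are hypothetical names, and the cast helper is a simplified round-to-nearest stand-in that only handles non-negative values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one range partition's bounds. */
typedef struct RangePart
{
    double lb;              /* lower bound */
    bool   lb_inclusive;
    double ub;              /* upper bound */
    bool   ub_inclusive;
} RangePart;

/* Stand-in for a rounding float -> int cast; non-negative inputs only. */
static long
lossy_cast_to_int(double x)
{
    assert(x >= 0.0);
    return (long) (x + 0.5);
}

/* Decide whether a partition must be scanned for "col = value". */
static bool
part_selected_for_equality(const RangePart *p, double value, bool lossy_cast)
{
    if (lossy_cast)
        return lossy_cast_to_int(p->lb) <= (long) value &&
               lossy_cast_to_int(p->ub) >= (long) value;

    return ((p->lb <= value && p->lb_inclusive) || p->lb < value) &&
           ((p->ub >= value && p->ub_inclusive) || p->ub > value);
}
```

For the partition [1, 2), the non-lossy branch rejects `float = 2.0` because the upper bound is exclusive, while the lossy branch keeps the partition for `float::int = 2` because both cast bounds bracket the value.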

12 Aug, 2020 2 commits

Fix compilation without libuv's uv.h header. · 7858128f

Authored by Heikki Linnakangas on Aug 12, 2020

ic_proxy_backend.h includes libuv's uv.h header, and ic_proxy_backend.h
was being included in ic_tcp.c, even when compiling with
--disable-ic-proxy.

ic-proxy: support parallel backend registration to proxy · 608514c5

Authored by Hubert Zhang on Jul 30, 2020

Previously, when backends connected to a proxy, we needed to set up the
domain socket pipe and send the HELLO message (and receive the ack message)
in a blocking and non-parallel way. This made it hard for ICPROXY to
introduce check_for_interrupt during backend registration.

By utilizing the libuv loop, we can register backends in parallel. Note
that this is one of the steps to replace all the ic_tcp backend logic
currently reused by ic_proxy. In the future, we should use libuv to replace
all the backend logic, from registration to sending/receiving data.

Co-authored-by: Ning Yu <nyu@pivotal.io>

10 Aug, 2020 2 commits

Cleanup idle gangs after releasing lock table's partition lock · 22308d6a

Authored by ppggff on Aug 05, 2020

When the GUC parameter resource_cleanup_gangs_on_wait is on, the backend will clean up idle gangs before waiting for the resource queue lock. The cleanup operation involves network IO, so it takes a while.

In the current code, the cleanup operation still holds a partition lock that would normally only be held for a short period of time, which will prevent normal lock table operations by other backends. The cleanup operation should be moved to after releasing the partition lock and before the backend starts waiting.

ic-proxy: type checking in ic_proxy_new() · a3ef623d

Authored by Ning Yu on Aug 10, 2020

A typical mistake on allocating typed memory is as below:

    int64 *ptr = malloc(sizeof(int32));

To prevent this, ic_proxy_new() is now a typed allocator: it always
returns a pointer of the specified type, for example:

    int64 *p1 = ic_proxy_new(int64); /* good */
    int64 *p2 = ic_proxy_new(int32); /* bad, gcc will raise a warning */

Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
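One common way to build such a typed allocator is a macro that casts the result to a pointer of the requested type. The sketch below assumes that approach; `typed_new` and `roundtrip_int64` are hypothetical names, and the real ic_proxy_new() may be defined differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical typed allocator: the cast ties the pointer type to the
 * size that was allocated, so a mismatched assignment draws a compiler
 * diagnostic about incompatible pointer types.
 */
#define typed_new(type) ((type *) malloc(sizeof(type)))

/* Store a value in freshly allocated typed memory and read it back. */
static int64_t
roundtrip_int64(int64_t value)
{
    int64_t *p = typed_new(int64_t);    /* types match: no warning */
    int64_t  result;

    assert(p != NULL);
    *p = value;
    result = *p;
    free(p);
    return result;
}
```

Writing `int64_t *q = typed_new(int32_t);` instead would assign an `int32_t *` to an `int64_t *`, which is the mismatch the commit wants the compiler to flag.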

08 Aug, 2020 4 commits
docs - support proxies for GPDB interconnect (#10592) · 81f4406f

Authored by Mel Kiyama on Aug 07, 2020
```
* docs - support proxies for GPDB interconnect

-New topic in Admin. Guide.
-New GUC gp_interconnect_proxy_addresses
-Updated GUC gp_interconnect_type - new value PROXY

Also added note to gpexpand documents - do not use proxy during expand.

* docs - review comment updates

* docs - review comment updates.
-update IC proxy example function
-other minor edits.

* docs - add note about running ic-proxy configuration function.
```

docs - clarify resgroup status cpu max value (#10590) · efbfb61b

Authored by Lisa Owen on Aug 07, 2020

docs - add gpcheckcat aoseg_table test (#10579) · d82e89b4

Authored by Lisa Owen on Aug 07, 2020

docs - gp_fts_replication_attempt_count guc info (#10578) · 87c18746

Authored by Lisa Owen on Aug 07, 2020

07 Aug, 2020 5 commits

Simplify responsibility for 'rel' in SET DISTRIBUTED BY and EXTEND TABLE. · 8518ed4f

Authored by Heikki Linnakangas on Aug 07, 2020

SET DISTRIBUTED BY and EXTEND TABLE subcommands worked differently from
all other ALTER TABLE subcommands in who's responsible for closing the
relcache reference. In all other subcommands, ATRewriteCatalogs opens and
closes the 'rel', but for these two commands, the ATExec*() function
closed it. I don't see any good reason for that. There were very old
comments about forcing the relcache entry to be forgotten, but that
explanation doesn't make sense to me, and everything seems to work without
the early closing. Maybe it was needed a long time ago, but the code has
changed a lot since it was written. Simplify, by closing the relation in
ATRewriteCatalogs(), like with all other ALTER TABLE subcommands.

Reviewed-by: Asim R P <pasim@vmware.com>

ic-proxy: pass the copydml test · 0b46f726

Authored by Ning Yu on Aug 07, 2020

The copydml test creates both BEFORE and AFTER triggers on a table and
checks the execution order in the client output.  The original output
order is "BEFORE -> RESULT -> AFTER", which is the order produced in
ic-tcp mode; in ic-udpifc mode the order can become "RESULT
-> BEFORE -> AFTER", so it is nondeterministic.  There are 2 variants of
the answer file for these 2 orders, so the test can pass with either order.

In ic-proxy mode, the order can also be "BEFORE -> AFTER -> RESULT", so
we also need a 3rd variant of the answer file for this order.  With
this variant the test passes in ic-proxy mode, so we can
re-enable the copydml test, which was previously disabled on ic-proxy
pipeline jobs.

Reviewed-by: Asim R P <pasim@vmware.com>
Reviewed-by: Hubert Zhang <hzhang@pivotal.io>

Fix the flaky test of walreceiver (#10588) · ef949df2

Authored by Hao Wu on Aug 07, 2020

walreceiver is quite sensitive to any WAL write. After creating a table
and inserting some tuples, the test doesn't run a vacuum. There may be other
cases that cause some WAL traffic. One such WAL record is
RUNNING_XACTS, which is relevant to hot standby.

The last test_receive() only tests that there is no WAL traffic
after receiving the WAL records up to the current xlog location. There is
still a window in which a new WAL record can be transmitted to the walreceiver.
Running test_receive() is not meaningful, since the function
test_receive_and_verify() has already verified that all the WAL traffic
up to the latest xlog location is received. So, remove test_receive().

Reviewed-by: Paul Guo <paulguo@gmail.com>
Reviewed-by: (Jerome)Junfeng Yang <jeyang@pivotal.io>

Fix connection string for gpsd · e4bde277

Authored by Abhijit Subramanya on Aug 05, 2020

gpsd was failing because the connectionString that we passed to pgdb.connect
had the parameters in the wrong order. It started failing after upgrading to a
higher version of PyGreSQL. So use a dictionary instead in order to avoid
sending in the parameters incorrectly.

Co-authored-by: Ashwin Agrawal <aashwin@vmware.com>

docs - update pxf xrefs to v5.14 (#10591) · 7f79304d
Authored by Lisa Owen on Aug 06, 2020

06 Aug, 2020 4 commits

coverity: Fix 'unchecked return value' issues. (#10545) · af0fac18

Authored by Paul Guo on Aug 06, 2020

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Hao Wu <gfphoenix78@gmail.com>

Fix potential panic in visibility check code. (#10589) · 85811692

Authored by Paul Guo on Aug 06, 2020

We've seen a panic case on gpdb 6 with stack as below,

3 markDirty (isXmin=0 '\000', tuple=0x7effe221b3c0, relation=0x0, buffer=16058) at tqual.c:105
4 SetHintBits (xid=<optimized out>, infomask=1024, rel=0x0, buffer=16058, tuple=0x7effe221b3c0) at tqual.c:199
5 HeapTupleSatisfiesMVCC (relation=0x0, htup=<optimized out>, snapshot=0x15f0dc0 <CatalogSnapshotData>, buffer=16058) at tqual.c:1200
6 0x00000000007080a8 in systable_recheck_tuple (sysscan=sysscan@entry=0x2e85940, tup=tup@entry=0x2e859e0) at genam.c:462
7 0x000000000078753b in findDependentObjects (object=0x2e856e0, flags=<optimized out>, stack=0x0, targetObjects=0x2e85b40, pendingObjects=0x2e856b0,
depRel=0x7fff2608adc8) at dependency.c:793
8 0x00000000007883c7 in performMultipleDeletions (objects=objects@entry=0x2e856b0, behavior=DROP_RESTRICT, flags=flags@entry=0) at dependency.c:363
9 0x0000000000870b61 in RemoveRelations (drop=drop@entry=0x2e85000) at tablecmds.c:1313
10 0x0000000000a85e48 in ExecDropStmt (stmt=stmt@entry=0x2e85000, isTopLevel=isTopLevel@entry=0 '\000') at utility.c:1765
11 0x0000000000a87d03 in ProcessUtilitySlow (parsetree=parsetree@entry=0x2e85000,

The reason is that we pass a NULL relation to the visibility check code, which
might use the relation variable to determine whether the hint bit should be set.
Let's pass the correct relation variable even if it might not end up being used.

I'm not able to reproduce the issue locally, so I cannot provide a test case,
but this is surely a potential issue.

Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>

Print CTID when we detect data distribution wrong for UPDATE|DELETE. · 3faf0b51

Authored by Zhenghua Lyu on Aug 06, 2020

When an UPDATE or DELETE statement errors out because the CTID does
not belong to the local segment, we should also print the CTID
of the tuple so that it is much easier to locate the wrongly
distributed data via:
  `select * from t where gp_segment_id = xxx and ctid='(aaa,bbb)'`.

Docs - update component versions for 6.10 · 0a475980
Authored by David Yozie on Aug 05, 2020

05 Aug, 2020 2 commits
Coverity: fix some coverity issues (#10511) · cdc41e2b

Authored by Hao Wu on Aug 05, 2020
```
Mainly fixed the file handler issues.
```

docs - note how oss users get gpbackup (#10478) · fa7b4b4b

Authored by Lisa Owen on Aug 04, 2020
```
* docs - note how oss users get gpbackup

* small edit
```

04 Aug, 2020 2 commits

ic-proxy: correct SIGHUP handler · a181655b

Authored by Ning Yu on Aug 04, 2020

Fixed the bug that the SIGHUP handler was installed for SIGINT by
mistake, so the ic-proxy bgworkers would die on SIGHUP.

By correcting the signal name, now we could let the ic-proxy bgworkers
reload the postgresql.conf by executing "gpstop -u".

Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
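The kind of mistake fixed here is easy to make because the signal number is just an argument. A minimal POSIX sketch of installing a reload handler on the intended signal follows; `got_sighup`, `handle_sighup` and `install_reload_handler` are hypothetical names and do not come from the ic-proxy source.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler; a real worker would reload its configuration later. */
static volatile sig_atomic_t got_sighup = 0;

/* Only set a flag: handlers should do as little work as possible. */
static void
handle_sighup(int signo)
{
    (void) signo;
    got_sighup = 1;
}

/* Install the handler for SIGHUP itself, not for SIGINT. */
static void
install_reload_handler(void)
{
    void (*prev)(int) = signal(SIGHUP, handle_sighup);

    assert(prev != SIG_ERR);
    (void) prev;
}
```

If the first argument were `SIGINT` by mistake, SIGHUP would keep its default action, which terminates the process, matching the symptom described in the commit message.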

gpcheckcat: fix inconsistent issues reporting · 85103e37

Authored by Adam Lee on Jul 30, 2020

Note that since PyGreSQL 5.0 this method will return the values of array
type columns as Python lists.

ref: https://pygresql.org/contents/pg/query.html

03 Aug, 2020 7 commits

Coverity: Check return value of strcmp · fd498cf9
Authored by Hubert Zhang on Jul 29, 2020
```
return value of strcmp is not checked in some branches.
```

Coverity: Behave test misformat · 126ba1c6

Authored by Hubert Zhang on Jul 29, 2020

conn = dbconn.connect() should be aligned with if statement, or it
will never be executed.

ic-proxy: handle early coming BYE correctly · 79ff4e62

Authored by Ning Yu on Aug 02, 2020

In a query that contains multiple init/sub plans, the packets of the
second subplan might be received while the first is still being
processed in the ic-proxy mode, this is because in ic-proxy mode a local
host handshake is used instead of the global one.

To distinguish the packets of different subplans, especially for the
early coming ones, we must stop handling on the BYE immediately, and
pass any unhandled early coming pkts to the successor or the
placeholder.

This fixes the random hanging during the ICW parallel group of
qp_functions_in_from.  No new test is added.

Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
Co-authored-by: Ning Yu <nyu@pivotal.io>

Coverity: Sizeof not portable · cf23db49
Authored by Hubert Zhang on Jul 29, 2020
```
sizeof(HeapTuple *) should be sizeof(HeapTuple)
```
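In PostgreSQL-derived code `HeapTuple` is itself a pointer typedef, so an array of tuples should be sized by `sizeof(HeapTuple)`. A way to avoid naming the type at all is to size by the dereferenced destination pointer, as in this sketch; the struct layout and `alloc_tuple_array` are stand-ins invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in declarations mirroring the convention that HeapTuple is a pointer. */
typedef struct HeapTupleData
{
    int t_len;
} HeapTupleData;

typedef HeapTupleData *HeapTuple;

/* Allocate a zeroed array of n tuple pointers. */
static HeapTuple *
alloc_tuple_array(size_t n)
{
    HeapTuple *tuples;

    assert(n > 0);
    /* sizeof(*tuples) is sizeof(HeapTuple), never sizeof(HeapTuple *). */
    tuples = calloc(n, sizeof(*tuples));
    return tuples;
}
```

Both spellings usually yield the same number on mainstream platforms because all object pointers share a size there, which is why Coverity reports this as a portability issue rather than a crash.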

Coverity: Logically dead code · 38552bf7
Authored by Hubert Zhang on Jul 29, 2020
```
Remove dead code.
insertDesc is always NULL in ao_vacuum_rel_compact()
```

Coverity: Variable unused · e5c86775

Authored by Hubert Zhang on Jul 29, 2020

Remove unused variable.
For tuplesort.h, we don't support mksort-based cluster,
so we should just set is_mk_tuplesortstate to false.

Coverity: Identical code for different branches · f076205c
Authored by Hubert Zhang on Jul 29, 2020
```
Clean identical code in heap.c and analyze.c
```

01 Aug, 2020 5 commits

Remove comment about field that was removed earlier. · 42f796ca

Authored by Heikki Linnakangas on Jul 31, 2020

Remove incomplete support for dependency entries on pg_compression. · e6116093

Authored by Heikki Linnakangas on Jul 31, 2020
```
Also, the entry for ExtprotocolRelationid was in wrong place in
object_classes[]. It's a bit surprising that didn't cause any ill effects,
but let's fix it in any case.
```

Fix description of 'numeric_dec' function. · ddd4244f

Authored by Heikki Linnakangas on Jul 31, 2020

Remove stray unused struct. · fa94c56a

Authored by Heikki Linnakangas on Jul 31, 2020
```
It was added in commit 0138eed4, but was never used for anything.
```

Make function static. · c501ad2e

Authored by Heikki Linnakangas on Jul 31, 2020

31 Jul, 2020 1 commit

Remove fault injection from gpMgmt · 2f65547b

Authored by Tyler Ramer on Jul 29, 2020

The command execution framework shipped with a fault injection in
delivered code. See https://github.com/greenplum-db/gpdb/issues/10546
for execution details and implications.

It seems the fault injection framework was added in 2009, used
sparingly, and should be removed until it can be safely replaced.

Additionally, the "gppylib/test/regress" folder used the fault injector, but
the "check-regress" target seems not to have been called. This is evident
because pygresql regression checks are present, yet pygresql has not
been in master for some time and its absence has not caused any errors in
these tests.

Authored-by: Tyler Ramer <tramer@vmware.com>
