1. 21 April 2020, 2 commits
    • Fix getDtxCheckPointInfo to contain all committed transactions (#9940) · a8eb4dc1
      Hao Wu committed
      Transactions in shmCommittedGxactArray that are only half committed are omitted.
      The bug could cause data loss or inconsistency: if a transaction T1
      fails to commit prepared on a segment for some reason, even though T1
      has already been committed on the master and the other segments, T1 is
      not appended to the checkpoint record. DTX recovery therefore cannot
      retrieve the transaction and run recovery-commit-prepared, and the
      prepared transaction on that segment is aborted.
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
    • Change random_page_cost default value to 4 · 9c466e58
      Weinan WANG committed
      In upstream Postgres, the random_page_cost default value is 4. For some unknown reason, we bumped it up to 100, which skews the costs of bitmap index scans, index scans, and index-only scans. Performance tests on AO and AOCO tables show that lowering the value also improves those tables' scan plans.
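      For reference, a minimal sketch of how this setting can be inspected and overridden per session (the EXPLAIN target is just an arbitrary catalog table, used only for illustration):
      
      ```
      -- Show the current value of the planner cost parameter
      SHOW random_page_cost;
      -- Override it for the current session
      SET random_page_cost = 4;
      -- Re-check how index vs. sequential scans are costed
      EXPLAIN SELECT relname FROM pg_class WHERE oid = 'pg_class'::regclass;
      ```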
  2. 20 April 2020, 5 commits
    • Fix zero plan_node_id for BitmapOr/And in ORCA · 53a0b781
      Denis Smirnov committed
      According to plannode.h, "plan_node_id" should be unique across the
      entire final plan tree. But the ORCA DXL-to-PlanStatement translator
      returned uninitialized zero values for BitmapOr and BitmapAnd nodes.
      This behaviour differed from the Postgres planner and from all other
      node translations in this class. It is now fixed.
    • Do not push Volatile funcs below aggs · 885ca8a9
      Sambitesh Dash committed
      Consider the scenario below
      
      ```
      create table tenk1 (c1 int, ten int);
      create temp sequence ts1;
      explain select * from (select distinct ten from tenk1) ss where ten < 10 + nextval('ts1') order by 1;
      ```
      
      The filter outside the subquery is a candidate to be pushed below the
      'distinct' in the sub-query.  But since 'nextval' is a volatile function, we
      should not push it.
      
      Volatile functions give different results with each execution. We don't want
      aggs to use the result of a volatile function before it is necessary. We do this for
      all aggs - DISTINCT and GROUP BY.
      
      Also see commit 6327f25d.
    • Use a unicast IP address for interconnection (#9696) · 790c7bac
      Hao Wu committed
      * Use a unicast IP address for interconnection on the primary
      
      Currently, interconnect/UDP always binds the wildcard address to
      the socket, which makes all QEs on the same node share the same
      port space (up to 64K ports). For dense deployments, the UDP ports could run
      out, even if there are multiple IP addresses.
      To increase the total number of available ports for QEs on a node,
      we bind a single/unicast IP address to the socket for interconnect/UDP,
      instead of the wildcard address, so that segments with different IP addresses
      have different port spaces.
      To get the full benefit of this patch in alleviating port exhaustion, it is
      better to assign a different ADDRESS (gp_segment_configuration.address) to
      each segment, although this is not mandatory.
      
      Note: QD/mirror uses the primary's address value in
      gp_segment_configuration as the destination IP to connect to the
      primary.  So the primary returns the ADDRESS as its local address
      by calling `getsockname()`.
      
      * Fix the origin of the source IP address for backends
      
      The destination IP address uses the listenerAddr of the parent slice,
      but the source IP address to bind is harder to determine, because it is not
      stored on the segment, and the slice table is sent to the QEs after
      they have already bound the address and port. The origin of the source
      IP address differs by role:
      1. QD : by calling `cdbcomponent_getComponentInfo()`
      2. QE on master: by qdHostname dispatched by QD
      3. QE on segment: by the local address for QE of the TCP connection
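      As a quick sanity check (not part of the patch), the per-segment addresses this relies on can be inspected in gp_segment_configuration:
      
      ```
      -- Confirm that primaries advertise distinct addresses, so their
      -- interconnect/UDP ports come from separate port spaces
      SELECT content, role, address, port
      FROM gp_segment_configuration
      WHERE role = 'p'
      ORDER BY content;
      ```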
    • Fix a bug that reader gang always fail due to missing writer gang. (#9828) · 24f16417
      Paul Guo committed
      The reason is that a newly created reader gang would fail on the QE due to a missing
      writer gang process in the locking code, and a retry would fail again for the same
      reason, since the cached writer gang is still used because the QD does not know or
      check the real libpq network status. See below for the repro case.
      
      Fix this by checking the error message and resetting all gangs when that
      message is seen, similar to the code logic that checks the startup/recovery
      message in the gang-create function. We considered other fixes, e.g. checking the
      writer gang network status, but those turned out to be ugly after trying them.
      
      ```
      create table t1(f1 int, f2 text);
      <kill -9 one idle QE>
      
      insert into t1 values(2),(1),(5);
      ERROR:  failed to acquire resources on one or more segments
      DETAIL:  FATAL:  reader could not find writer proc entry, lock [0,1260] AccessShareLock 0 (lock.c:874)
       (seg0 192.168.235.128:7002)
      
      insert into t1 values(2),(1),(5);
      ERROR:  failed to acquire resources on one or more segments
      DETAIL:  FATAL:  reader could not find writer proc entry, lock [0,1260] AccessShareLock 0 (lock.c:874)
       (seg0 192.168.235.128:7002)
      
      <-- Above query fails again.
      ```
      
      The patch also removes the useless function GangOK(); this is not strictly related
      to the fix.
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Asim R P <apraveen@pivotal.io>
    • Fix flaky uao_crash_compaction_row test · b00916e6
      Pengzhou Tang committed
      This test creates an AO table and inserts data on all segments, then
      deletes some data on seg0 and seg1 and runs VACUUM on the AO
      table. During the vacuum, it suspends the QE on seg0 just as it starts
      the post-vacuum cleanup, then crashes seg0 and finally runs the
      post-crash validation checks using gp_toolkit.__gp_aoseg(), which
      checks the aoseg info on all segments.
      
      The VACUUM process on seg1 is in an uncertain state: it might have
      finished the post-vacuum cleanup (which is what the test expects) or might not
      have started it yet, so the aoseg info on seg1 is uncertain too.
      
      To resolve this, this commit adds a new fault injector at the point where all
      post-vacuum cleanup has been committed, and validates the aoseg info only after
      the vacuum process on seg1 has reached this point.
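      A rough sketch of that synchronization pattern with the gp_inject_fault extension follows; the fault name here is made up for illustration and is not the one used by the actual test:
      
      ```
      CREATE EXTENSION IF NOT EXISTS gp_inject_fault;
      
      -- Arm an (illustrative) fault at the point where post-vacuum cleanup commits on seg1
      SELECT gp_inject_fault('post_vacuum_cleanup_committed', 'skip', dbid)
      FROM gp_segment_configuration WHERE role = 'p' AND content = 1;
      
      -- Block until seg1 has actually reached that point before validating aoseg info
      SELECT gp_wait_until_triggered_fault('post_vacuum_cleanup_committed', 1, dbid)
      FROM gp_segment_configuration WHERE role = 'p' AND content = 1;
      ```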
  3. 18 April 2020, 10 commits
  4. 17 April 2020, 3 commits
    • Fix plan when segmentgeneral union all general locus · 00c2aa93
      Zhenghua Lyu committed
      Previously, the planner could not generate a plan when a replicated
      table was UNION ALL'ed with a general-locus scan. A typical case is:
      
        select a from t_replicate_table
        union all
        select * from generate_series(1, 10);
      
      The root cause is that the function `set_append_path_locus`
      deduces the whole append path's locus to be segmentgeneral,
      which is reasonable. However, the function `cdbpath_create_motion_path`
      fails to handle two cases:
        * segmentgeneral locus to segmentgeneral locus
        * general locus to segmentgeneral locus
      Neither of these locus changes actually needs a motion.
      
      This commit fixes this by:
        1. adding a check at the very beginning of `cdbpath_create_motion_path`:
           if the subpath's locus is the same as the target locus, just return;
        2. adding logic to handle general locus to segmentgeneral locus: just return.
    • Reconsider direct dispatch when Result node's contains gp_execution_segment() = <segid> · 509342bd
      Zhenghua Lyu committed
      Greenplum may use the trick of setting a Result plan's resconstantqual to
      an expression of the form gp_execution_segment() = <segid>. This trick is used
      to turn a general-locus path into a partitioned-locus path in the function
      set_append_path_locus. When this trick is used, we should reconsider the direct
      dispatch info in create_projection_plan to handle it correctly.
      
      This commit fixes github issue: https://github.com/greenplum-db/gpdb/issues/9874.
      For more details, refer to the issue.
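      For context (not part of the commit), gp_execution_segment() reports the segment that evaluates the expression; a common Greenplum idiom to see it once per primary segment is:
      
      ```
      -- Evaluate gp_execution_segment() on every primary segment
      SELECT gp_execution_segment() AS segid
      FROM gp_dist_random('gp_id');
      ```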
    • Improve efficiency in pg_lock_status() · 991273b2
      Fang Zheng committed
      Allocate the memory for CdbPgResults.pg_results with palloc0() instead
      of calloc(), and free the memory after use.
      
      The CdbPgResults.pg_results array that is returned from various dispatch
      functions is allocated by cdbdisp_returnResults() via calloc(), but in
      most cases the memory is not free()-ed after use.
      
      To avoid a memory leak, the array is now allocated with palloc0() and
      recycled with pfree().
      
      Also track which row and which result set are being processed in the function
      context of pg_lock_status(), so that an inefficient inner loop can be eliminated.
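      pg_lock_status() is the set-returning function behind the pg_locks view, so the improved path is exercised by an ordinary lock query, for example (gp_segment_id is the Greenplum-specific column indicating which segment each lock row came from):
      
      ```
      -- Each call to pg_lock_status() dispatches to the segments and
      -- returns cluster-wide lock rows through the pg_locks view
      SELECT gp_segment_id, locktype, mode, granted
      FROM pg_locks
      LIMIT 10;
      ```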
  5. 16 April 2020, 4 commits
  6. 15 April 2020, 6 commits
  7. 14 April 2020, 1 commit
  8. 13 April 2020, 1 commit
    • Fix a bug when setting DistributedLogShared->oldestXmin · 2759b6dc
      dh-cloud committed
      The shared oldestXmin (DistributedLogShared->oldestXmin) may be updated
      concurrently. It should only ever be set to a higher value, because a higher
      xmin can belong to a newer distributed log segment, and the older segments
      might already have been truncated.
      
      For example: txA and txB call DistributedLog_AdvanceOldestXmin concurrently.
      
      ```
      txA and txB: both hold shared DistributedLogTruncateLock.
      
      txA: set the DistributedLogShared->oldestXmin to XminA. TransactionIdToSegment(XminA) = 0009
      
      txB: set the DistributedLogShared->oldestXmin to XminB. TransactionIdToSegment(XminB) = 0008
      
      txA: truncate segment 0008, 0007...
      ```
      
      After that, DistributedLogShared->oldestXmin == XminB, which lies on the removed
      segment 0008. Subsequent GetSnapshotData() calls will fail because SimpleLruReadPage will error out.
  9. 10 April 2020, 3 commits
    • Add check for indexpath when bring_to_outer_query and bring_to_singleQE. · ef0a34d1
      Zhenghua Lyu committed
      Previously, the functions bring_to_outer_query and bring_to_singleQE
      relied on the path->param_info field to determine whether a path
      could be taken into consideration, since we cannot pass params across
      a motion node. But this is not enough: for example, an index path's
      param_info field might be NULL while its orderbyclauses reference some
      outer params. This commit fixes the issue by adding more checks
      for index paths.
      
      See Github Issue: https://github.com/greenplum-db/gpdb/issues/9733
      for details.
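      This is not the reproducer from the issue, just a hypothetical query shape that produces an index path whose ORDER BY references an outer parameter while param_info stays NULL (a GiST KNN ordering under a LATERAL subquery):
      
      ```
      CREATE TABLE pts (id int, p point) DISTRIBUTED BY (id);
      CREATE INDEX pts_p_gist ON pts USING gist (p);
      
      -- The inner index path is ordered by distance to the outer row's point
      SELECT a.id, nn.id
      FROM pts a,
           LATERAL (SELECT b.id FROM pts b ORDER BY b.p <-> a.p LIMIT 1) nn;
      ```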
    • Handle opfamilies/opclasses for distribution in ORCA · e7ec9f11
      Shreedhar Hardikar committed
      GPDB 6 introduced a mechanism to distribute tables on columns
      using a custom hash opclass, instead of using cdbhash. Before this
      commit, ORCA would ignore the distribution opclass, while the
      translator ensured that only queries in which all tables were distributed
      by either their default or default "legacy" opclasses were allowed.
      
      However, in case of tables distributed by legacy or default opclasses,
      but joined using a non-default opclass operator, ORCA would produce an
      incorrect plan, giving wrong results.
      
      This commit fixes that bug by introducing support for distributed tables
      using non-default opfamilies/opclasses. But, even though the support is
      implemented, it is not fully enabled at this time. The logic to fall back
      to the planner when the plan contains tables distributed with non-default,
      non-legacy opclasses remains. Our intention is to support it fully in
      the future.
      
      How does this work?
      For hash joins, capture the opfamily of each hash joinable operator. Use
      that to create hash distribution spec requests for either side of the
      join.  Scan operators derive a distribution spec based on opfamily
      (corresponding to the opclass) of each distribution column.  If there is
      a mismatch between distribution spec requested/derived, add a Motion
      Redistribute node using the distribution function from the requested
      hash opfamily.
      
      The commit consists of several sub-sections:
      - Capture distr opfamilies in CMDRelation and related classes
      
        For each distribution column of the relation, track the opfamily of
        "opclass" used in the DISTRIBUTED BY clause. This information is then
        relayed to CTableDescriptor & CPhysicalScan.
      
        Also support this in other CMDRelation subclasses: CMDRelationCTAS
        (via CLogicalCTAS) & CMDRelationExternalGPDB.
      
      - Capture the hash opfamily of CMDScalarOp using gpdb::GetCompatibleHashOpFamily()
        This is needed to determine distribution spec requests from joins.
      
      - Track hash opfamilies of join predicates
      
        This commit extends the caching of join keys in Hash/Merge joins by
        also caching the corresponding hash opfamilies of the '=' operators
        used in those predicates.
      
      - Track opfamily in CDistributionSpecHashed.
      
        This commit also constructs CDistributionSpecHashed with opfamily
        information that was previously cached in CScalarGroup in the case of
        HashJoins.
        It also includes the compatibility checks that reject distributions
        specs with mismatched opfamilies in order to produce Redistribute
        motions.
      
      - Capture default distribution (hash) opfamily in CMDType
      - Handle legacy opfamilies in CMDScalarOp & CMDType
      - Handle opfamilies in HashExprList Expr->DXL translation
      
      ORCA-side notes:
      1. To ensure correctness, equivalence classes can only be determined over
         a specific opfamily. For example, the expression `a = b` implies a &
         b belong to an equivalence class only for the opfamily `=` belongs to.
         Otherwise the expression `b |=| c` could be used to imply a & c belong to
         the same equivalence class, which is incorrect, as the opfamilies of `=`
         and `|=|` differ.
         For this commit, determine equivalence classes only for default opfamilies.
         This will ensure correct behavior for the majority of cases.
      2. This commit does *not* implement similar features for merge joins.
         That is left for future work.
      3. This commit introduces two traceflags:
         - EopttraceConsiderOpfamiliesForDistribution: If this is off,
           opfamilies are ignored and set to NULL. This mimics the behavior before
           this PR. Ctest MDPs are run this way.
         - EopttraceUseLegacyOpfamilies: Set if ANY distribution col in the
           query uses a legacy opfamily/opclass. MDCache getters will then
           return legacy opfamilies instead of the default opfamilies for all
           queries.
      
      What new information is captured from GPDB?
      1. Opfamily of each distribution column in CMDRelation,
         CMDRelationCtasGPDB & CMDRelationExternalGPDB
      2. Compatible hash opfamily of each CMDScalarOp using
         gpdb::GetCompatibleHashOpFamily()
      3. Default distribution (hash) opfamily of every type.
         This may be NULL for some types. Needed for certain operators (e.g.
         HashAgg) that request a distribution spec that cannot be inferred in
         any other way: cannot derive it, cannot get it from any scalar op,
         etc. See GetDefaultDistributionOpfamilyForType()
      4. Legacy opfamilies for types & scalar operators.
         Needed for supporting tables distributed by legacy opclasses.
      
      Other GPDB side changes:
      
      1. HashExprList no longer carries the type of the expression (it is
         inferred from the expr instead). However, it now carries the hash
         opfamily to use when deriving the distribution hash function. To
         maintain compatibility with older versions, the opfamily is used only
         if EopttraceConsiderOpfamiliesForDistribution is set; otherwise, the
         default hash distribution function of the expr's type is used.
      2. Don't worry about left & right types in get_compatible_hash_opfamily()
      3. Consider COERCION_PATH_RELABELTYPE as binary coercible for ORCA.
      4. EopttraceUseLegacyOpfamilies is set if any table is distributed by a
         legacy opclass.
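      A hypothetical illustration of the distribution opclasses discussed above, assuming the legacy opclass name cdbhash_int4_ops:
      
      ```
      -- t_default uses the default hash opclass for int4;
      -- t_legacy is pinned to the legacy cdbhash opclass
      CREATE TABLE t_default (a int, b int) DISTRIBUTED BY (a);
      CREATE TABLE t_legacy  (a int, b int) DISTRIBUTED BY (a cdbhash_int4_ops);
      
      -- A join between them hashes the same column under different opfamilies,
      -- which is the situation the Redistribute logic above has to handle
      EXPLAIN SELECT * FROM t_default d JOIN t_legacy l ON d.a = l.a;
      ```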
    • Revert "Fallback when citext op non-citext join predicate is present" · 0410b7ba
      Shreedhar Hardikar committed
      This reverts commit 3e45f064.
  10. 09 April 2020, 3 commits
    • Always generate merge gather for select statement with locking clause and sort clause. · 9c78d5af
      Zhenghua Lyu committed
      LockRowsPath clears the pathkeys info, since
      when other transactions concurrently update
      the same relation it cannot guarantee the order.
      Postgres will not consider a parallel path for a
      select statement with a locking clause (it sets parallel_safe
      to false and parallel_workers to 0 in the function
      create_lockrows_path). However, Greenplum contains many
      segments and is innately parallel. If we simply clear
      the pathkeys, then when we later need a gather we will
      not choose a merge gather, so even when there is no concurrent
      transaction the data is not in order. See Github issue:
      https://github.com/greenplum-db/gpdb/issues/9724.
      So just before the final gather, we save the pathkeys
      and then invoke create_lockrows_path. In the following
      gather, if we find that saved_pathkeys is not NIL, we
      create a merge gather.
      
      Another thing to mention is that the conditions under which the
      code can reach create_lockrows_path are very strict: the
      query has to be a top-level select statement, the range
      table has to be a normal heap table, only one table may be
      involved in the query, and there are many other conditions (please
      refer to `checkCanOptSelectLockingClause` for details). Per the
      above analysis, if the code reaches here and path->pathkeys
      is not NIL, the following gather has to be the final gather.
      This is very important because if it were not the final gather,
      it might be used by others as a subpath, and its non-NIL pathkeys
      would break the rules for a lockrows path. We need
      to keep the pathkeys in the final gather here.
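      A hypothetical query of the shape described above, a top-level select with both a sort clause and a locking clause on a single heap table, which should now get a merge gather:
      
      ```
      CREATE TABLE accounts (id int, balance numeric) DISTRIBUTED BY (id);
      
      BEGIN;
      -- With the fix, the gather above the LockRows node preserves the sort order
      SELECT id, balance FROM accounts ORDER BY id FOR UPDATE;
      COMMIT;
      ```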
    • Truncate AO relation correctly (#9806) · bb8d0305
      Hubert Zhang committed
      In the past, when CREATE TABLE and TRUNCATE TABLE were in the same
      transaction, Greenplum would call heap_truncate_one_rel() to truncate
      the relation. But AO tables have different segmenting logic,
      so some of their segment files could not be truncated to zero.
      
      Use ao_truncate_one_rel() instead of heap_truncate_one_rel() and
      handle the segmenting with ao_foreach_extent_file().
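      A hypothetical repro of the scenario described above: creating and truncating an append-optimized table inside the same transaction.
      
      ```
      BEGIN;
      CREATE TABLE ao_t (a int) WITH (appendonly = true) DISTRIBUTED BY (a);
      INSERT INTO ao_t SELECT generate_series(1, 1000);
      -- Previously this went through heap_truncate_one_rel() and could leave
      -- some AO segment files non-empty
      TRUNCATE ao_t;
      COMMIT;
      ```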
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
    • Update orca test pipeline · 79d4b3e9
      Ashuka Xue committed
      Add a missing resource so that the ORCA dev pipeline can be run for both
      master and 6X.
  11. 08 April 2020, 2 commits
    • Remove redundant 'hasError' flag in TeardownTCPInterconnect · a6ae448d
      Pengzhou Tang committed
      This flag duplicates 'forceEOS'; 'forceEOS' can also tell
      whether errors occurred or not.
    • Fix interconnect hung issue · ec1d9a70
      Pengzhou Tang committed
      We have hit the interconnect hang issue many times in many cases, all with
      the same pattern: the downstream interconnect motion senders keep
      sending tuples, blind to the fact that the upstream
      nodes have already finished and quit execution; the QD
      then has enough tuples and waits for all QEs to quit, which causes a
      deadlock.
      
      Many nodes may quit execution early, e.g. LIMIT, HashJoin, Nest
      Loop. To resolve the hang, they need to stop the interconnect
      stream explicitly by calling ExecSquelchNode(); however, we cannot
      do that for rescan cases, in which data might be lost, e.g. commit
      2c011ce4. For rescan cases, we tried using QueryFinishPending to
      stop the senders in commit 02213a73 and let the senders check this
      flag and quit. That commit has its own problems: firstly,
      QueryFinishPending can only be set by the QD, so it doesn't work for
      INSERT or UPDATE cases; secondly, that commit only lets the senders
      detect the flag and quit the loop in a rude way (without sending the EOS
      to the receiver), so the receiver may still be stuck receiving tuples.
      
      This commit first reverts the QueryFinishPending method.
      
      To resolve the hang, we move TeardownInterconnect ahead
      of cdbdisp_checkDispatchResult, so it is guaranteed to stop
      the interconnect stream before waiting on and checking the status
      of the QEs.
      
      For UDPIFC, TeardownInterconnect() removes the ic entries; any
      packets for this interconnect context will be treated as 'past'
      packets and be acked with the STOP flag.
      
      For TCP, TeardownInterconnect() closes all connections with its
      children; the children will treat any readable data in the
      connection, including the closure itself, as a STOP message.
      
      A test case is not included; both commit 2c011ce4 and commit 02213a73
      contain one.
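      For illustration only (the table names are made up), a query of the early-quit shape described above, where a LIMIT above a motion finishes after one tuple while the senders on the segments may still be streaming:
      
      ```
      SELECT *
      FROM big_t1 JOIN big_t2 ON big_t1.k = big_t2.k
      LIMIT 1;
      ```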