- 05 Nov 2020, 2 commits
-
-
Committed by Bhuvnesh Chaudhary
This commit does the following:
1. Extract config_primaries_for_replication to be used by both gpaddmirrors and gprecoverseg.
2. Add --hba-hostname handling.
3. gprecoverseg: add replication entries for primaries and add tests.
Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
-
Committed by Bhuvnesh Chaudhary
Co-authored-by: Kalen Krempely <kkrempely@vmware.com>
-
- 04 Nov 2020, 5 commits
-
-
Committed by Robert Mu
2019-12-04 23:16:23.075209 CST,"gpadmin","regression",p30552,th-790954624,"[local]",,2019-12-04 23:16:03 CST,8192,con143,cmd54,seg-1,,,x8192,sx1,"ERROR","XX000","unrecognized node type: 0 (copyfuncs.c:6424)",,,,,,"REFRESH MATERIALIZED VIEW m_aocs WITH NO DATA;",0,,"copyfuncs.c",6273,"Stack trace:
1    0xaf1f9c postgres errstart (elog.c:561)
2    0xaf4b83 postgres elog_finish (elog.c:1734)
3    0x7b15f7 postgres copyObject (copyfuncs.c:6424)
4    0x6c6ad1 postgres ExecRefreshMatView (matview.c:409)
5    0x997841 postgres <symbol not found> (utility.c:1743)
6    0x9969f4 postgres standard_ProcessUtility (utility.c:1071)
7    0x993bc5 postgres <symbol not found> (palloc.h:176)
8    0x9945c5 postgres <symbol not found> (pquery.c:1552)
9    0x995a21 postgres PortalRun (pquery.c:1022)
10   0x9908d4 postgres <symbol not found> (postgres.c:1791)
11   0x99359b postgres PostgresMain (postgres.c:5123)
12   0x541c32 postgres <symbol not found> (postmaster.c:4445)
13   0x87dcea postgres PostmasterMain (postmaster.c:1519)
14   0x543fbb postgres main (discriminator 1)
15   0x7f4ccb783505 libc.so.6 __libc_start_main + 0xf5
16   0x54485f postgres <symbol not found> + 0x54485f
Root cause:
1. The rule (a query tree with parentStmtType = PARENTSTMTTYPE_NONE) is saved in the pg_rewrite table when the matview relation is created.
2. ExecRefreshMatView uses the dataQuery pointer to get the rule (query tree) from the matview relation's relcache data.
3. ExecRefreshMatView sets dataQuery->parentStmtType = PARENTSTMTTYPE_REFRESH_MATVIEW.
4. The QD may receive a reset message (shared-inval-queue overflow) when make_new_heap is called, causing the QD to rebuild the backend's entire relcache, including the matview relation. When rebuilding the matview's relcache, oldRel's rule (parentStmtType = PARENTSTMTTYPE_REFRESH_MATVIEW) is found not equal to newRel's rule (parentStmtType = PARENTSTMTTYPE_NONE), so oldRel's rule (dataQuery) is released.
5. refresh_matview_datafill then uses the released dataQuery and reports the error.
(cherry picked from commit 474088cb)
-
Committed by 盏一
The autovacuum worker for template0 would exit with FATAL because Gp_session_role was still the dispatcher role, producing the message below.
2020-09-29 21:20:02.686827 CST,,,p19902,th-881792832,,,,0,,,seg2,,,,sx1,"FATAL","57P03","connections to primary segments are not allowed","This database instance is running as a primary segment in a Greenplum cluster and does not permit direct connections.","To force a connection anyway (dangerous!), use utility mode.",,,,,0,,"postinit.c",1151,
Fix this by setting Gp_session_role to the utility role.
-
Committed by Hans Zeller
* Avoid costing change for IN predicates on btree indexes

  Commit e5f1716 changed the way we handle IN predicates on indexes; it now uses a more efficient array comparison instead of treating it like an OR predicate. A side effect is that the cost function, CCostModelGPDB::CostBitmapTableScan, now goes through a different code path, using the "small NDV" or "large NDV" costing method. This produces very high cost estimates when the NDV increases beyond 2, so we basically never choose an index for these cases, although a btree index used in a bitmap scan isn't very sensitive to the NDV. To avoid this, we go back to the old formula we used before commit e5f1716. The fix is restricted to IN predicates on btree indexes, used in a bitmap scan.

* Add an MDP for a larger IN list, using a btree index on an AO table

* Misc. changes to the calibration test program
  - Added tests for btree indexes (btree_scan_tests).
  - Changed data distribution so that all column values range from 1...n.
  - Parameter values for test queries are now proportional to selectivity; a parameter value of 0 produces a selectivity of 0%.
  - Changed the logic to fake statistics somewhat; hopefully this will lead to more precise estimates. Incorporated the changes to the data distribution with no more 0 values. Added fake stats for unique columns.
  - Headers of tests now use semicolons to separate parts, to give a nicer output when pasting into Google Docs.
  - Some formatting changes.
  - Log fallbacks.
  - When using existing tables, the program now determines the table structure (heap or append-only) and the row count.
  - Split off two very slow tests into separate test units. These are not included when running "all" tests; they have to be run explicitly.
  - Add btree join tests, rename "bitmap_join_tests" to "index_join_tests", and run both bitmap and btree joins.
  - Update min and max parameter values to cover a range that includes, or at least is closer to, the cross-over between index and table scan.
  - Remove the "high NDV" tests, since the ranges in the general test now include both low and high NDV cases (<= and > 200).
  - Print out the selectivity of each query, if available.
  - Suppress standard deviation output when we execute queries only once.
  - Set the search path when connecting.
  - Decrease the parameter range when running bitmap scan tests on heap tables.
  - Run btree scan tests only on AO tables; they are not designed for testing index scans.

* Updates to the experimental cost model, new calibration

  1. Simplify some of the formulas; the calibration process seemed to justify that. We might have to revisit if problems come up. Changes:
     - Rewrite some of the formulas so the costs per row and costs per byte are easier to see.
     - Make the cost for the width directly proportional.
     - Unify the formula for scans and joins: use the same per-byte costs and make NDV-dependent costs proportional to num_rebinds * dNDV, except for the logic in item 3. That makes the cost for the new experimental cost model a simple linear formula:

       num_rebinds * (rows * c1 + rows * width * c2 + ndv * c3 + bitmap_union_cost + c4) + c5

       We have 5 constants, c1 ... c5:
       c1: cost per row (rows on one segment)
       c2: cost per byte
       c3: cost per distinct value (total NDV on all segments)
       c4: cost per rebind
       c5: initialization cost
       bitmap_union_cost: see item 3 below
  2. Recalibrate some of the cost parameters, using the updated calibration program src/backend/gporca/scripts/cal_bitmap_test.py.
  3. Add a cost penalty for bitmap index scans on heap tables. The added cost takes the form bitmap_union_cost = <base table rows> * (NDV-1) * c6. The reason for this is, as others have pointed out, that heap tables lead to much larger bit vectors, since their CTIDs are more spaced out than those of AO tables. The main factor seems to be the cost of unioning these bit vectors, and that cost is proportional to the number of bitmaps minus one and to the size of the bitmaps, which is approximated here by the number of rows in the table. Note that because we use (NDV-1) in the formula, this penalty does not apply to usual index joins, which have an NDV of 1 per rebind. This is consistent with what we see in measurements, and it also seems reasonable, since we don't have to union bitmaps in this case.
  4. Fix to select CostModelGPDB for the 'experimental' model, as we do in 5X.
  5. Calibrate the constants involved (c1 ... c6), using the calibration program and running experiments with heap and append-only tables on a laptop and also on a Linux cluster with 24 segments. Also run some other workloads for validation.
  6. Give a small initial advantage to bitmap scans, so they will be chosen over table scans for small tables. Otherwise, small queries will have more or less random plans, all of which cost around 431, the value of the initial cost. Added a 10% advantage for the bitmap scan.
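The linear formula described in item 1, together with the heap-table penalty from item 3, can be sketched directly. This is a minimal illustration; the constant defaults below are placeholders, not the calibrated values produced by the calibration program:

```python
def bitmap_scan_cost(num_rebinds, rows, width, ndv, base_table_rows,
                     is_heap, c1=1.0, c2=0.1, c3=2.0, c4=5.0, c5=10.0,
                     c6=0.01):
    """Sketch of the experimental cost model formula described above.

    c1: cost per row (rows on one segment), c2: cost per byte,
    c3: cost per distinct value (total NDV on all segments),
    c4: cost per rebind, c5: initialization cost, c6: heap bitmap-union
    factor. The default constant values are illustrative placeholders.
    """
    # Item 3: penalty for unioning bitmaps on heap tables. Because of the
    # (NDV - 1) factor, an ordinary index join (NDV of 1 per rebind)
    # pays no penalty.
    bitmap_union_cost = base_table_rows * (ndv - 1) * c6 if is_heap else 0.0
    return num_rebinds * (rows * c1 + rows * width * c2
                          + ndv * c3 + bitmap_union_cost + c4) + c5
```

With NDV = 1 the heap and AO costs coincide, matching the note that the penalty does not apply to usual index joins.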
-
Committed by Hans Zeller
* Use original join pred for DPE with index nested loop joins

  Dynamic partition selection is based on a join predicate. For index nested loop joins, however, we push the join predicate to the inner side and replace the join predicate with "true". This meant that we couldn't do DPE for index nested loop joins. This commit remembers the original join predicate in the index nested loop join, to be used in the generated filter map for DPE. The original join predicate needs to be passed through multiple layers.

* SPE for index preds

  Some of the xforms use the method CXformUtils::PexprRedundantSelectForDynamicIndex to duplicate predicates that could be used both as index predicates and as partition elimination predicates. The call was missing in some other xforms; added it.

* Changes to equivalent distribution specs with redundant predicates

  Adding redundant predicates causes some issues with generating equivalent distribution specs, to be used for the outer table of a nested index loop join. We want the equivalent spec to be expressed in terms of outer references, which are the columns of the outer table. By passing in the outer refs, we can ensure that we won't replace an outer ref in a distribution spec with a local variable from the original distribution spec.

  Also removed the asserts in CPhysicalFilter::PdsDerive that ensure the distribution spec is complete (consisting of only columns from the outer table) after we see a select node. Even without my changes, the asserts do not always hold, as this test case shows:

  drop table if exists foo, bar;
  create table foo(a int, b int, c int, d int, e int) distributed by(a,b,c);
  create table bar(a int, b int, c int, d int, e int) distributed by(a,b,c);
  create index bar_ixb on bar(b);
  set optimizer_enable_hashjoin to off;
  set client_min_messages to log;
  -- runs into assert
  explain select * from foo join bar on foo.a=bar.a and foo.b=bar.b where bar.c > 10 and bar.d = 11;

  Instead of the asserts, we now use the new method of passing in the outer refs to ensure that we move towards completion. We also know now that we can't always achieve a complete distribution spec, even without redundant predicates.

* MDP changes

  Various changes to MDPs:
  - New SPE filters used in plan
  - New redundant predicates (partitioning or on non-partitioning columns)
  - Plan space changes
  - Cost changes
  - Motion changes
  - Regenerated, because the plan switched to a hash join, so used a guc to force an index plan
  - Fixed lookup failures
  - Add an mdp where we try unsuccessfully to complete a distribution spec

* ICG result changes

  - A test used the 'experimental' cost model to force an index scan, but we now get the index scan even with the default cost model.
-
- 03 Nov 2020, 2 commits
-
-
Committed by xiong-gang
In function DropResourceGroup(), group->lockedForDrop is set to true by calling ResGroupCheckForDrop; however, it can only be set back to false inside dropResgroupCallback. This callback is registered at the end of DropResourceGroup, so if an error occurred between the two, group->lockedForDrop would remain true forever. Fix it by moving the callback registration ahead of the lock call. To prevent Assert(group->nRunning* > 0) when ResGroupCheckForDrop throws an error, return directly if group->lockedForDrop did not change. See:
```
gpconfig -c gp_resource_manager -v group
gpstop -r -a
psql
CPU_RATE_LIMIT=20, MEMORY_LIMIT=20, CONCURRENCY=50,
MEMORY_SHARED_QUOTA=80, MEMORY_SPILL_RATIO=20, MEMORY_AUDITOR=vmtracker
);
psql -U user_test
> \d
-- hang
```
Co-authored-by: dh-cloud <60729713+dh-cloud@users.noreply.github.com>
-
Committed by (Jerome)Junfeng Yang
On the QD, we track whether a QE wrote xlog in the libpq connection. The logic is: if a QE writes xlog, it sends a libpq message to the QD. But the message is sent in ReadyForQuery, so before the QE executes that function, it may already have sent results back to the QD. Then, when the QD processes the message, it does not see the new wrote_xlog value. This leaves the connection holding the previous dispatch's wrote_xlog value, which affects whether one-phase commit is chosen. The issue only happens when the QE flushes the libpq message before ReadyForQuery, so it is hard to find a test case to cover it. I found the issue while experimenting with sending extra information from QE to QD, which broke the gangsize test that shows the commit info. (cherry picked from commit 777b51cd)
-
- 02 Nov 2020, 2 commits
-
-
Committed by Ning Wu
The repo https://github.com/greenplum-db/greenplum-database-release has changed its default branch from master to main. This commit syncs up with that change.
Co-authored-by: Ning Wu <ningw@vmware.com>
Co-authored-by: Shaoqi Bai <bshaoqi@vmware.com>
-
Committed by Jialun
- create table ... as select ...
- create materialized view ... as select ...
This is a backport of commit 7ae210a1bf7e569a18cda32dcec3b55665a42ee7.
-
- 31 Oct 2020, 1 commit
-
-
Committed by David Yozie
-
- 30 Oct 2020, 3 commits
-
-
Committed by David Yozie
* Docs - update interconnect proxy discussion to cover hostname support
* Change gp_interconnect_type -> gp_interconnect_proxy_addresses in note
-
Committed by Lisa Owen
-
Committed by dh-cloud
Looking at the GP documents, there is no indication that the master dbid must be 1. However, during CREATE_QD_DB, gpinitsystem always writes "gp_dbid=1" into the file `internal.auto.conf`, even if we specify:
```
mdw~5432~/data/master/gpseg-1~2~-1
OR
mdw~5432~/data/master/gpseg-1~0~-1
```
But the catalog gp_segment_configuration can hold the correct master dbid value (2 or 0), and the mismatch causes gpinitsystem to hang. Users can run into this problem the first time they use gpinitsystem -I. We test dbid 0 here because PostmasterMain() simply checks dbid >= 0 (non-utility mode); it says:
> This value must be >= 0, or >= -1 in utility mode
So 0 appears to be a valid value.
Changes:
- use the specified master dbid field in CREATE_QD_DB.
Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
(cherry picked from commit 00ae3013)
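The intent of the fix can be sketched as: read the dbid out of the `~`-separated spec line instead of hardcoding 1. The field order hostname~port~datadir~dbid~content is inferred from the examples above, and `parse_qd_spec` / `internal_auto_conf` are hypothetical helper names, not actual gpinitsystem functions:

```python
def parse_qd_spec(line):
    # Assumed field order, taken from the examples in the commit message:
    # hostname~port~datadir~dbid~content
    hostname, port, datadir, dbid, content = line.strip().split("~")
    return {"hostname": hostname, "port": int(port), "datadir": datadir,
            "dbid": int(dbid), "content": int(content)}

def internal_auto_conf(spec):
    # The fix: write the specified dbid, not a hardcoded gp_dbid=1.
    return "gp_dbid={}\n".format(spec["dbid"])
```

For the first example line above, this yields gp_dbid=2, matching what gp_segment_configuration would record.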
-
- 29 Oct 2020, 2 commits
-
-
Committed by Lisa Owen
-
Committed by dh-cloud
If cdbcomponent_getCdbComponents() caught an error thrown by getCdbComponents, FtsNotifyProber would be called. But if this happened inside the FTS process, the FTS process would hang. Skip the FTS probe when running in the FTS process; with that change, under the same situation, the FTS process exits and is then restarted by the postmaster. (cherry picked from commit 3cf72f6c)
-
- 28 Oct 2020, 10 commits
-
-
Committed by Adam Lee
Otherwise it will raise an exception "command not run yet".
-
Committed by Adam Lee
gprecoverseg did not log the error message when pg_rewind fails; fix that to make DBA/field/developer's life easier. Before this:
```
20201022:15:19:10:011118 gprecoverseg:earth:adam-[INFO]:-Running pg_rewind on required mirrors
20201022:15:19:12:011118 gprecoverseg:earth:adam-[WARNING]:-Incremental recovery failed for dbid 2. You must use gprecoverseg -F to recover the segment.
20201022:15:19:12:011118 gprecoverseg:earth:adam-[INFO]:-Starting mirrors
20201022:15:19:12:011118 gprecoverseg:earth:adam-[INFO]:-era is 0406b847bf226356_201022151031
```
After this:
```
20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-Running pg_rewind on required mirrors
20201022:15:33:31:019577 gprecoverseg:earth:adam-[WARNING]:-pg_rewind: fatal: could not find common ancestor of the source and target cluster's timelines
20201022:15:33:31:019577 gprecoverseg:earth:adam-[WARNING]:-Incremental recovery failed for dbid 2. You must use gprecoverseg -F to recover the segment.
20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-Starting mirrors
20201022:15:33:31:019577 gprecoverseg:earth:adam-[INFO]:-era is 0406b847bf226356_201022151031
```
-
Committed by Jimmy Yih
Currently, when inserting into a table distributed by a bpchar column using the legacy bpchar hash operator, the row goes through jump consistent hashing instead of legacy modular hashing. This is because the cdblegacyhash_bpchar funcid is missing from the isLegacyCdbHashFunction check, which determines whether an attribute is using a legacy hash function. The funcids currently in that check come from the auto-generated fmgroids.h header file, which only creates a DEFINE for the pg_proc.prosrc field. Unfortunately, cdblegacyhash_bpchar is left out because its prosrc is cdblegacyhash_text. A proper fix would require a catalog change. To fix this issue in 6X_STABLE, we need to hardcode the cdblegacyhash_bpchar funcid 6148 into the isLegacyCdbHashFunction check. This should be fine since the GPDB 6X_STABLE catalog is frozen.
This issue was reported by GitHub user cobolbaby in the gpbackup repository while migrating GPDB 5X tables to GPDB 6X: https://github.com/greenplum-db/gpbackup/issues/425
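The two row-placement schemes the commit contrasts can be sketched as follows. The jump consistent hash below is the published Lamping-Veach algorithm; the bpchar hash-value computation itself (shared with cdblegacyhash_text) is not reproduced here, only the hash-to-segment mapping step:

```python
def jump_consistent_hash(key, num_segments):
    """Jump consistent hashing (Lamping & Veach): maps a 64-bit hash
    value to a segment, moving few keys when the segment count changes.
    This is the placement GPDB 6 uses for non-legacy distribution keys."""
    b, j = -1, 0
    while j < num_segments:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

def legacy_modular_hash(hash_value, num_segments):
    """Legacy (GPDB 5-style) placement: plain modulo over the segments."""
    return hash_value % num_segments
```

The bug means a legacy-hash bpchar key was routed through the first function instead of the second, so rows landed on different segments than a GPDB 5 table with the same data, which is what broke the gpbackup migration.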
-
Committed by Jesse Zhang
We started hitting this on Thursday, and there have been ongoing reports from the community about it as well. While upstream is figuring out a long-term solution [1], we've been advised [2] to pin to the previous release (v0.21.0) to avoid being blocked for hours at a time.
[1]: https://github.com/telia-oss/github-pr-resource/pull/238
[2]: https://github.com/telia-oss/github-pr-resource/pull/238#issuecomment-714830491
(cherry picked from commit f4bf9be6)
-
Committed by Bradford D. Boyle
Compiling with gcc 10 on Debian testing fails with the following error:
```
/usr/bin/ld: utils/misc/guc_gp.o:(.bss+0x308): multiple definition of `data_directory'; utils/misc/guc.o:(.bss+0x70): first defined here
```
-
Committed by Bradford D. Boyle
Some platforms do not have unversioned "python" available but do have versioned "python2". Configure should look for either "python" or "python2" when run with the option "--with-python". These changes were manually copied from the Postgres build system but omitted searching for "python3" since Greenplum does not have support for Python 3 yet.
-
Committed by Lisa Owen
-
Committed by Lisa Owen
-
Committed by David Kimura
(cherry picked from commit 91ed33c9)
-
Committed by David Kimura
This allows us to reduce code duplication across workload SQL scripts. (cherry picked from commit 8c204bd5)
-
- 27 Oct 2020, 3 commits
-
-
Committed by Xiaoran Wang
Greenplum only supports INSERT here, because UPDATE/DELETE requires the hidden column gp_segment_id, and there is also the "ModifyTable mixes distributed and entry-only tables" issue.
-
Committed by Chris Hajas
These assertions started getting tripped in the previous commit when adding tests, but they aren't related to the Epsilon change. Rather, we're calculating the frequency of a singleton bucket using two different methods, which causes the assertion to break down. The first method (calculating the upper_third) assumes the singleton has 1 NDV and that there is an even distribution across the NDVs. The second (in GetOverlapPercentage) calculates a "resolution" that is based on Epsilon and assumes the bucket contains some small Epsilon frequency. It results in the overlap percentage being too high; instead, it too should likely be based on the NDV. In practice, this won't have much impact unless the NDV is very small. Additionally, the conditional logic is based on the bounds, not the frequency. However, it would be good to align the two methods in the future so our statistics calculations are simpler to understand and predictable. For now, we'll remove the assertions and add a TODO. Once we align the methods, we should add these assertions back.
-
Committed by Chris Hajas
When merging statistics buckets for UNION and UNION ALL queries involving a column that maps to Double (e.g. floats, numerics, time-related types), we could end up in an infinite loop. This occurred if the bucket boundaries being compared were within a very small value, defined in Orca as Epsilon. While we considered two values equal if they were within Epsilon, we didn't apply that when computing whether datum1 < datum2. Therefore we'd get into a situation where a datum could be both equal to and less than another datum, which the logic wasn't able to handle.
The fix is to make sure we have a hard boundary for when we consider one datum less than another by including the epsilon logic in all datum comparisons. Now, two datums are equal if they are within epsilon, but datum1 is less than datum2 only if datum1 < datum2 - epsilon.
Also add some tests, since we didn't have any for types that map to Double.
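The fixed comparison rule can be sketched as follows; the epsilon value here is illustrative, not Orca's actual constant:

```python
EPSILON = 1e-10  # illustrative; Orca's actual Epsilon differs

def datums_equal(d1, d2, eps=EPSILON):
    """Two Double-mapped datums are considered equal within epsilon."""
    return abs(d1 - d2) <= eps

def datum_less(d1, d2, eps=EPSILON):
    """d1 is strictly less than d2 only if it is below d2 by more than
    epsilon -- the hard boundary the fix introduces, so a pair of datums
    can never be both 'equal' and 'less than', which is what allowed the
    bucket-merge loop to run forever."""
    return d1 < d2 - eps
```

With this rule, exactly one of equal / less / greater holds for any pair, so the merge loop always makes progress.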
-
- 26 Oct 2020, 1 commit
-
-
Committed by Xiaoran Wang
* Enable the postgres_fdw test in the ICW test. The postgres_fdw test is disabled by default; it is enabled in the gpdb pipelines.
-
- 23 Oct 2020, 4 commits
-
-
Committed by dh-cloud
The PostgreSQL libpq documentation says:
> Note that when PQconnectStart or PQconnectStartParams returns a
> non-null pointer, you must call PQfinish when you are finished
> with it, in order to dispose of the structure and any associated
> memory blocks. **This must be done even if the connection attempt
> fails or is abandoned**.
However, the cdbconn_disconnect() function did not call PQfinish when the status was CONNECTION_BAD, which can cause socket leaks (sockets stuck in the CLOSE_WAIT state).
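The contract can be sketched with a toy connection object (hypothetical names; libpq itself is C): disposal must happen regardless of connection status.

```python
class FakeConn:
    """Toy stand-in for a PGconn handle; counts un-freed handles."""
    open_handles = 0

    def __init__(self, ok):
        self.status = "CONNECTION_OK" if ok else "CONNECTION_BAD"
        self.finished = False
        FakeConn.open_handles += 1

    def finish(self):
        # Plays the role of PQfinish(): frees the structure and socket.
        if not self.finished:
            self.finished = True
            FakeConn.open_handles -= 1

def disconnect(conn):
    """Sketch of the fixed behavior: call finish() even when the status is
    CONNECTION_BAD, instead of returning early and leaking the socket
    (which the kernel then leaves in CLOSE_WAIT)."""
    if conn is not None:
        conn.finish()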
-
Committed by Paul Guo
Fix an orphaned prepared transaction case due to a race between the checkpointer and COMMIT PREPARED xlog recording.

On Greenplum, a checkpoint collects prepared transactions, including some that are actually already committed. If the COMMIT PREPARED xlog record is before checkpoint.redo, then after the segment reboots there will always be an orphaned (actually committed) prepared transaction in memory. That happens when the checkpointer collects the prepared transaction after the COMMIT PREPARED xlog is recorded but before gxact->valid is reset; see the code in FinishPreparedTransaction(). That can lead to various issues, e.g. dtx recovery would keep trying to abort it and then panic on the segment with a message like "cannot abort transaction 3285003, it was already committed (twophase.c:2205)".

Fix this by adding a new variable, committed, in gxact to specify whether the global transaction is committed or not. If it is committed, we surely do not need to log the gxact in the checkpoint xlog.

We could also fix this by delaying the checkpoint until after gxact->valid is reset in FinishPreparedTransaction(), but RecordTransactionCommitPrepared() -> SyncRepWaitForLSN() might be time-consuming or block for some time (locking, network lag, etc.), and thus could block the checkpointer for too long, which is surely not good. It also seems we could fix it by moving "gxact->valid = false" ahead of the delayChkpt reset, but that is ugly as well, and a bit risky for a stable release. Two other solutions were discussed previously: one is a locking mechanism, but that hurts OLTP performance; another is to remove the false-positive cases in RecoverPreparedTransactions(), but the related clog may have been removed by subsequent vacuum operations, so that is not reliable either.

Co-authored-by: Hao Wu <gfphoenix78@gmail.com>
Reviewed-by: Asim R P <pasim@vmware.com>
Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
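The race window and the new flag can be sketched as follows. The new committed flag is set once the COMMIT PREPARED record is written, before valid is reset, so a checkpoint landing in that window no longer collects the transaction. Names are simplified from the description above, not copied from the source:

```python
class GXact:
    """Sketch of the relevant in-memory prepared-transaction state."""
    def __init__(self, xid):
        self.xid = xid
        self.valid = True       # reset late in FinishPreparedTransaction()
        self.committed = False  # new flag: COMMIT PREPARED xlog written

def checkpoint_collect(gxacts):
    # Checkpointer side of the fix: skip transactions whose commit record
    # already exists, even though valid has not been reset yet. Before the
    # fix the filter was only `g.valid`, so a gxact in the race window was
    # wrongly logged and resurrected as "prepared" after a reboot.
    return [g.xid for g in gxacts if g.valid and not g.committed]
```

After a reboot, only transactions collected here are recovered as prepared, so skipping already-committed ones removes the orphan.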
-
Committed by Peter Eisentraut
Fixup for "Drop slot's LWLock before returning from SaveSlotToPath()"
Reported-by: Michael Paquier <michael@paquier.xyz>
(cherry picked from commit 72b2b9c52e3a86ae414fc07acf6db3de0776fc13)
-
Committed by Peter Eisentraut
When SaveSlotToPath() is called with elevel=LOG, the early exits didn't release the slot's io_in_progress_lock. This could result in a walsender being stuck on the lock forever. A possible way to get into this situation is if the offending code paths are triggered in a low disk space situation.
Author: Pavan Deolasee <pavan.deolasee@2ndquadrant.com>
Reported-by: Craig Ringer <craig@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/flat/56a138c5-de61-f553-7e8f-6789296de785%402ndquadrant.com
(cherry picked from commit ce28a43ffa89b49584e75d6bb9f8ae03a8e13151)
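The control-flow pattern of the fix can be sketched in a few lines: every exit path, including the early returns taken when the save is skipped or fails non-fatally, must release the lock. This is an illustrative model (here via try/finally), not the C code:

```python
import threading

io_in_progress_lock = threading.Lock()

def save_slot_to_path(slot_is_dirty):
    """Sketch of the fixed SaveSlotToPath() control flow: the early exit
    no longer leaves io_in_progress_lock held, so a walsender waiting on
    the slot cannot block forever."""
    io_in_progress_lock.acquire()
    try:
        if not slot_is_dirty:
            return False   # early exit -- lock still released by finally
        # ... serialize the slot state to disk; may fail with elevel=LOG ...
        return True
    finally:
        io_in_progress_lock.release()
```

The buggy shape was the same function with plain `release()` calls on only some paths, so the `return False` branch kept the lock.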
-
- 22 Oct 2020, 3 commits
-
-
Committed by David Yozie
-
Committed by Bhuvnesh Chaudhary
The parameters were incorrectly passed when gprecoverseg was invoked, causing gprecoverseg to fail.
Co-authored-by: Aleksey Kashin <kashinav@yandex-team.ru>
-
Committed by Jamie McAtamney
demo_cluster.sh: remove GPSEARCH
-
- 21 Oct 2020, 2 commits
-
-
Committed by Jinbao Chen
The result of NULL NOT IN a non-empty set is not true (it is NULL, which a WHERE clause treats as false). The result of NULL NOT IN an empty set is true. But if a non-empty inner set has partitioned locus, it will be divided into several per-segment subsets, and some of those subsets may be empty. Because NULL NOT IN an empty set evaluates to true, tuples that shouldn't exist in the result set can appear. The patch disables the partitioned locus of the inner table by removing the join clause from the redistribution_clauses.
This commit is cherry-picked from master commit f77bf087.
Co-authored-by: Hubert Zhang <hubertzhang@apache.org>
Co-authored-by: Richard Guo <riguo@pivotal.io>
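The NULL semantics driving the bug can be sketched with a three-valued evaluator (None models SQL NULL); this is an illustrative model, not planner code:

```python
def sql_not_in(value, values):
    """Three-valued SQL evaluation of `value NOT IN (values)`.

    Returns True, False, or None (unknown). A WHERE clause keeps a row
    only when the result is True, so NULL NOT IN (<non-empty>) filters
    the row out, while NULL NOT IN (<empty>) keeps it -- the asymmetry
    that breaks when a non-empty inner set is split into per-segment
    subsets, some of which are empty.
    """
    if not values:
        return True                  # vacuous AND over zero comparisons
    if value is None:
        return None                  # every comparison is unknown
    non_null = [v for v in values if v is not None]
    if value in non_null:
        return False                 # some comparison is definitely false
    if len(non_null) < len(values):
        return None                  # a NULL element leaves it unknown
    return True
```

Splitting a non-empty set across segments turns some `sql_not_in(None, subset)` calls into the empty-set case, flipping not-true into True for NULL keys, which is exactly the spurious-tuple bug described above.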
-
Committed by Richard Guo
When constructing plans for a list of rollups, the rollup subplan may be retrieved from root->simple_rel_array[i]->subplan, which is not always a SubqueryScan. So later, in function rebuild_append_simple_rel_and_rte(), we need to verify that the subplan is a SubqueryScan node before assigning a scanrelid to it.
Fixes issue #10813 and issue #10841.
Reviewed-by: Junfeng (Jerome) Yang <jeyang@pivotal.io>
-