- 20 March 2020, 1 commit
-
-
Committed by Lisa Owen

* docs - initial doc content for Greenplum R client beta
* use correct download fname, qualify version as Beta
* add more functions to the table, move html help info
-
- 19 March 2020, 6 commits
-
-
Committed by David Yozie
-
Committed by Lena Hunter

* clarifying pg_upgrade note
* first edits
* edits for menu and content
* menu changes
* further function edits
* new file for config reference
* editing links and configuration file reference
* fixing references
* further link edits
* move of logging section
* arranging sections
* fixing references
* intro edits
* small edit
* small improvements
-
Committed by Paul Guo

This fixes a flaky gp_replica_check test. PG supports bulk block extension; in that case some pages are extended and initialized but not WAL-logged. On the mirror those pages are just zero-filled, so we skip comparison for these pages in gp_replica_check.

The related upstream patch is:

    commit 719c84c1
    Author: Robert Haas <rhaas@postgresql.org>
    Date: Fri Apr 8 02:04:46 2016 -0400

        Extend relations multiple blocks at a time to improve scalability.

Co-authored-by: Hao Wu <gfphoenix78@gmail.com>
Reviewed-by: Asim R P <apraveen@pivotal.io>
Reviewed-by: Gang Xiong <gxiong@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
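The skip rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual gp_replica_check code; the function name and the use of the default 8 kB block size are assumptions.

```python
BLCKSZ = 8192  # PostgreSQL's default block size


def pages_match(primary_page: bytes, mirror_page: bytes) -> bool:
    """Compare a primary page against its mirror copy.

    Bulk-extended pages are initialized but not WAL-logged, so on the
    mirror they appear as all-zero pages; skip the comparison for those
    instead of reporting a spurious mismatch.
    """
    if mirror_page == bytes(BLCKSZ):  # all-zero page on the mirror
        return True  # skip: likely bulk-extended, not yet logged
    return primary_page == mirror_page
```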
-
Committed by Paul Guo

Do the final commit (notification) for one-phase commit before recording the local transaction commit on the QD. (#9762)

Previously that code ran after the local transaction commit was recorded. If the notification failed, elog(ERROR) would long-jump into transaction abort, which then panicked with: "cannot abort transaction %u, it was already committed".

Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
Reviewed-by: Asim R P <apraveen@pivotal.io>
-
Committed by Lisa Owen

* docs - update pxf supported platforms for v5.11
* combine 5.10 and 5.11 rows
-
Committed by David Yozie
-
- 18 March 2020, 4 commits
-
-
Committed by Jinbao Chen

A query in the test triggers a hang issue that cannot be fixed in a short time, so we revert the test for now. This reverts commit 148a20df.
-
Committed by Heikki Linnakangas

Fixes github issue https://github.com/greenplum-db/gpdb/issues/9760
-
Committed by Melanie

ALTER TABLE ADD COLUMN on a partition with storage type AOCO should not trigger a full table rewrite; only data corresponding to the new column should be written. There was a regression caused by an implementation detail of the ALTER TABLE machinery.

6c572399 in upstream Postgres (Postgres 9.1+ and GPDB6+) changed ALTER TABLE ADD COLUMN. Before 6c572399, ATPrepAddColumn() recursively processed child partitions during the "prep" phase (phase 2), appending subcmds to the AlteredTableInfo such that each partition table had a copy of all AlterTableCmds. After 6c572399, ATExecAddColumn() recursively processes children during the "exec" phase (phase 3), which happens after subcmds are populated.

The AOCO ADD COLUMN code did a sanity check for the presence of AlteredTableInfo->subcmds to determine whether we can write only the new column and avoid a full table rewrite. Child partitions were missing these subcmds, so ALTER TABLE ADD COLUMN always did a full table rewrite.

We initially considered moving the recursion back to the "prep" phase; other subcommand types (e.g. ALTER TABLE ALTER COLUMN TYPE) recurse during the "prep" phase. However, this would result in an unnecessary and potentially incorrect diff from upstream Postgres. Instead, we decoupled the logic for the table rewrite optimization from the contents of AlteredTableInfo->subcmds. Based on the ADD COLUMN subcommands of the root partition, we determine whether the optimization is *possible*. Then we recurse to all descendant partitions and, based on the storage type of each of those relations, set the flag indicating whether a full table rewrite is required.

Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
Reviewed-by: Asim R P <apraveen@pivotal.io>
Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
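The decoupled decision in the last paragraph can be sketched as follows. This is a conceptual Python sketch, not the actual ATExecAddColumn code; the data structure and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    storage: str  # 'heap', 'ao_row', or 'ao_column' (AOCO)
    children: list = field(default_factory=list)


def rewrite_decisions(optimization_possible: bool, root: Partition) -> dict:
    """Decide per partition whether ADD COLUMN needs a full table rewrite.

    Whether the column-only write is *possible* is decided once from the
    root's ADD COLUMN subcommands (optimization_possible); then each
    descendant's storage type determines whether it can actually use the
    optimization -- only AOCO relations can.
    """
    decisions = {}

    def recurse(part: Partition) -> None:
        can_optimize = optimization_possible and part.storage == 'ao_column'
        decisions[part.name] = not can_optimize  # True -> full rewrite
        for child in part.children:
            recurse(child)

    recurse(root)
    return decisions
```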
-
Committed by Jamie McAtamney

Previously, gpinitsystem was incorrectly filling the hostname field of each segment in gp_segment_configuration with the segment's address. This commit changes it to correctly resolve hostnames and update the catalog accordingly.

Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
-
- 17 March 2020, 4 commits
-
-
Committed by Heikki Linnakangas

There are two parts to this commit:

1. Instead of using ctids and "fake" ctids to disambiguate rows that have been duplicated by a Broadcast motion, generate a synthetic 64-bit rowid specifically for that purpose, just below the Broadcast. The old method had to generate subplan IDs, fake ctids etc. to assemble a combination of columns that formed the unique ID, and that entailed a lot of infrastructure, like the whole concept of "pseudo columns". The new method is much simpler, and it works the same way regardless of what's below the Broadcast. The new 64-bit rowids are generated by a new RowIdExpr expression node. The high 16 bits of the rowid are the segment ID, and the low bits are generated by a simple counter.

2. Replace the cdbpath_dedup_fixup() post-processing step on the Plan tree by adding the unique ID column (now a RowIdExpr) to the Path's pathtarget in the first place. cdbpath_motion_for_join() is the function that does that now. It seems like a logical place, since that's where any needed Motions are added on top of the join inputs. The responsibility for creating the Unique path moves from cdb_add_join_path() to the create_nestloop/hashjoin/mergejoin_path() functions.

This slightly changes the plan when one side of the join has Segment or SegmentGeneral locus. For example:

    select * from srf() r(a) where r.a in (select t.a/10 from tab t);

With this commit, that's implemented by running the FunctionScan and generating the row IDs on only one node, and redistributing/broadcasting the result. This is not exactly the same as before this PR: previously, if the function was stable or immutable, we assumed that an SRF returns rows in the same order on all nodes and relied on that when generating the "synthetic ctid" on each row. We no longer assume that, but force the function to run on a single node instead. That seems acceptable: this kind of plan is helpful when the other side of the join is much larger, so redistributing the smaller side is probably still OK. This also uses the same strategy when the inner side is a replicated table rather than an SRF. That was not supported before, but there was a TODO comment that hinted at the possibility.

Fixes https://github.com/greenplum-db/gpdb/issues/9741, and adds tests for it.

Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
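The rowid layout from part 1 can be illustrated like this. A minimal sketch: the 16-bit/48-bit split follows from "high 16 bits are the segment ID" in the description above, but the function name and bit constants here are my own, not from the RowIdExpr implementation.

```python
COUNTER_BITS = 48  # 64-bit rowid minus the 16 high bits for the segment ID


def make_rowid(segment_id: int, counter: int) -> int:
    """Build a synthetic 64-bit rowid: the high 16 bits hold the segment
    ID and the low bits hold a simple per-segment counter, so rows
    duplicated by a Broadcast motion remain distinguishable."""
    assert 0 <= segment_id < (1 << 16)
    assert 0 <= counter < (1 << COUNTER_BITS)
    return (segment_id << COUNTER_BITS) | counter
```

Because the segment ID occupies disjoint bits, two segments can never generate the same rowid even if their local counters collide.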
-
Committed by Paul Guo

On Postgres, a holdable cursor reruns the executor after rewinding, to put all tuples in a tuplestore during commit for later access. GPDB does not support backward scanning, so we just continue executing without rewinding the executor. This can lead to problems, because some executor nodes are not written or designed for the case where the plan has already emitted all tuples. Typical issues:

1. Assertion failure in execMotionSortedReceiver():
   FailedAssertion("!(!((heap)->bh_size == 0) && heap->bh_has_heap_property)", File: "binaryheap.c")
2. elog(ERROR) like:
   ERROR: cannot execute squelched plan node of type: 232 (execProcnode.c:887)

Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
-
Committed by Hao Wu

Fix a memory leak in the checkpointer process: `dtxCheckPointInfo` was not freed in the memory context of the checkpointer process.
-
Committed by Sambitesh Dash
-
- 16 March 2020, 2 commits
-
-
Committed by Daniel Gustafsson

pg_strdup will only ever return a properly allocated pointer, so there is no reason to check the returned pointer.

Reviewed-by: Paul Guo <pguo@pivotal.io>
-
Committed by Huiliang.liu

* Enable and enhance gpfdist SSL test cases
  1. Add multiple root CA test cases for gpfdist SSL
  2. Fix output file due to foreign table modification
-
- 15 March 2020, 1 commit
-
-
Committed by xiong-gang
-
- 14 March 2020, 3 commits
-
-
Committed by Ashuka Xue

Previously, analyzedb would error out and fail if a table was dropped while analyzedb was running. Now we silently skip dropped tables when determining the tables to analyze.
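Conceptually, the fix amounts to filtering the requested tables against what still exists in the catalog. A trivial sketch with hypothetical names, not the actual analyzedb code:

```python
def tables_to_analyze(requested, still_in_catalog):
    """Silently skip tables that were dropped after analyzedb started:
    keep only requested tables that still exist in the catalog,
    preserving the requested order."""
    existing = set(still_in_catalog)
    return [t for t in requested if t in existing]
```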
-
Committed by Adam Berlin

gpinitsystem did not quote the username while performing ALTER USER. When the username is a numeric value, the postgres parser gets upset unless the username is quoted. See https://www.postgresql.org/docs/9.4/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS for more details:

- SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_).
- There is a second kind of identifier: the delimited identifier or quoted identifier. It is formed by enclosing an arbitrary sequence of characters in double quotes (").

This commit:

- uses variable interpolation provided by psql to properly quote user-provided values.
- uses RETVAL to perform testing, due to commit d7b7a40a.

Co-authored-by: Jacob Champion <pchampion@pivotal.io>
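The delimited-identifier rule quoted above boils down to wrapping the name in double quotes and doubling any embedded double quotes. A sketch of that rule (the function name is mine; psql's `:"var"` interpolation does the equivalent for you):

```python
def quote_ident(name: str) -> str:
    """Produce a delimited (quoted) SQL identifier: wrap in double
    quotes and double any embedded double quotes, so that e.g. a
    purely numeric role name like 12345 parses as an identifier
    rather than as a number."""
    return '"' + name.replace('"', '""') + '"'
```

For example, `ALTER USER "12345" ...` is valid where `ALTER USER 12345 ...` is a syntax error.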
-
Committed by Mel Kiyama

* docs - new GUC plan_cache_mode
  --Add GUC
  --update PREPARE reference, Notes section
  Also, minor fix in guc_category_list
* docs - minor edits
-
- 13 March 2020, 3 commits
-
-
Committed by prajnamort

Fix GitHub issue #9524. Cause of this bug: both heap_create_with_catalog() and CopyRelationAcls() tried to write the ACL for partition child tables, which is not allowed. Instead, we leave the ACL NULL when heap_create_with_catalog() creates child relations. This should fix the issue.
-
Committed by Paul Guo

There was a discussion upstream about disable_cost sometimes being too small, https://www.postgresql.org/message-id/flat/CAO0i4_SSPV9TVxbbTRVLOnCyewopcc147fBZy%3Df2ABk15eHS%2Bg%40mail.gmail.com, but no conclusion was reached, although many solutions were discussed. This issue seems more urgent for Greenplum, since Greenplum is a distributed system and can handle much more data than single-node Postgres.

Recently we again encountered a user issue that is quite relevant: a merge join was chosen instead of a hash join. Admittedly, in that case there is a corner costing issue in the hashjoin code (which might be fixed, if uglily), but in general we wondered whether we should make disable_cost tunable, given that its small value has been known to be an issue at times. We expect disable_cost to eventually be gone on gpdb master, and we know we need to make hashjoin costing more accurate (though sometimes that is hard), but for now users are waiting for a solution on gpdb6, so we use the GUC solution temporarily.

Note: in real scenarios, please tune it temporarily and only when needed (typically when the enabled path's cost is near to or greater than disable_cost but we want the enabled path). Otherwise there can be side effects when planning "disabled" paths due to floating-point precision (e.g. some paths being ignored because their cost differences are lost).

Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
Reviewed-by: Ning Yu <nyu@pivotal.io>
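The precision side effect mentioned in the note can be demonstrated numerically. A sketch under the assumption that a disabled path is costed by simply adding disable_cost to its cost (the function name is hypothetical; the values just illustrate float64 behavior):

```python
def total_cost(base_cost: float, disabled: bool, disable_cost: float) -> float:
    """Planner-style costing sketch: a disabled path gets disable_cost
    added to its cost. With a huge disable_cost, real cost differences
    between two disabled paths can vanish in double precision."""
    return base_cost + (disable_cost if disabled else 0.0)


# With a very large disable_cost, two genuinely different disabled paths
# compare as equal (float64 carries ~15-16 significant decimal digits):
assert total_cost(5.0, True, 1e30) == total_cost(7.0, True, 1e30)

# With a smaller value such as 1e10, the difference survives and the
# planner can still rank "disabled" paths correctly:
assert total_cost(5.0, True, 1e10) != total_cost(7.0, True, 1e10)
```

This is why tuning the GUC too high can make the planner ignore some paths: their costs become numerically indistinguishable.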
-
Committed by Chris Hajas

Previously, running analyzedb with an input file (`analyzedb -f <config_file>`) containing a root partition would fail, because we did not properly populate the list of leaf partitions. The logic in analyzedb assumes that we enumerate leaf partitions from the root partition that the user passed in (either on the command line or in an input file). While we did this properly when the table was passed on the command line, for input files we looked up the bare table name rather than the schema-qualified name. This caused partitioned heap tables to fail when writing the report/status files at the end, and caused analyzedb to not track DML changes in partitioned AO tables. Now we properly check for the schema-qualified table name.
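The bug class described here is easy to reproduce in miniature: if the catalog of leaf partitions is keyed by schema-qualified name, a lookup on the bare table name silently finds nothing. A hypothetical sketch, not the actual analyzedb code:

```python
def leaf_partitions_for(requested_name, partition_catalog):
    """Enumerate leaf partitions for a requested root partition.

    The catalog is keyed by schema-qualified name ('schema.table');
    looking up a bare table name misses the entry entirely, which is
    the failure mode this commit fixed for input files."""
    return partition_catalog.get(requested_name, [])


catalog = {'public.sales': ['public.sales_1_prt_jan', 'public.sales_1_prt_feb']}
```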
-
- 12 March 2020, 7 commits
-
-
Committed by Mel Kiyama
-
Committed by Mel Kiyama

* docs - add deflate support for s3 protocol
* docs - review updates for s3 protocol compressed file information
* docs - clarified how s3 protocol recognizes gzip, deflate compressed files.
-
Committed by Mel Kiyama

* docs - CREATE FUNCTION - new EXECUTE ON INITPLAN attribute. Updated CREATE FUNCTION, ALTER FUNCTION, and Using Functions and Operators. This will be backported to 6X_STABLE.
  HTML output on temporary GPDB doc review site:
  https://docs-msk-gpdb6-dev.cfapps.io/7-0/ref_guide/sql_commands/CREATE_FUNCTION.html
  https://docs-msk-gpdb6-dev.cfapps.io/7-0/ref_guide/sql_commands/ALTER_FUNCTION.html
  https://docs-msk-gpdb6-dev.cfapps.io/7-0/admin_guide/query/topics/functions-operators.html
* docs - update EXECUTE ON INITPLAN attribute description based on review comments. Labeled the attribute as Beta.
* docs - removed Beta note, fixed minor errors and typos.
-
Committed by Mel Kiyama

Add examples noting what fails when migrating from GPDB 4 or 5 to 6, and some workarounds.
-
Committed by Ashwin Agrawal

As discussed in the gpdb-users thread [1], currently no mechanism exists to reclaim disk space for a table that is created and truncated or dropped iteratively in a plpython function. PostgreSQL 11 provides `plpy.commit()`, with which this can be achieved, but in its absence some mechanism is needed, as MADlib requires this functionality for GPDB6 and forward. Hence, as an interim workaround, this adds a GUC that allows performing an unsafe truncate instead of a safe truncate from the plpython execute function.

Setting the GUC can force unsafe truncation only if:
- we are inside a sub-transaction and not in the top transaction
- the table was created somewhere within this transaction's scope and not outside of it

The GUC is set in the plpython UDF, and if any `plpy.execute()` errors out, the top transaction rolls back as well. The GUC can't be set in the postgresql.conf file. Also added a description warning that the GUC is not for general-purpose use and is developer-only. Added a test that shows the scenario in simple form.

[1] https://groups.google.com/a/greenplum.org/d/msg/gpdb-users/YCtI4oUA3r0/t0CzhtL6AQAJ

Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
-
Committed by Ashuka Xue

This commit adds a GUC to enable/disable the ORCA traceflag introduced in ORCA commit "Allow only equality comparisons for Dynamic Partition Elimination".
-
Committed by Peter Eisentraut

This allows overriding the choice of custom or generic plan.

Author: Pavel Stehule <pavel.stehule@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRAGLaiEm8ur5DWEBo7qHRWTk9HxkuUAz00CZZtJj-LkCA%40mail.gmail.com
(cherry picked from commit f7cb2842)
-
- 11 March 2020, 1 commit
-
-
Committed by David Kimura

In Postgres / Greenplum, a literal (e.g. NULL or 'text') in a query with no obvious type gets the pseudo-type "unknown". A column that refers to such literals will also get the "unknown" type. We used to have logic to infer the type of unknown literals in a subquery based on the context. For example, the following query returns 124, as foo is inferred to be an integer:

```sql
select foo + 123 from (select '1' as foo) a;
```

As another example, Greenplum infers the type of NULL as int in the following "INSERT ... SELECT" with unknown-typed columns from a nested subquery:

```sql
-- CREATE TABLE foo (a int, b text, c int);
INSERT INTO foo SELECT * FROM ( '1', 'bbb', NULL ) bar(x,y,z);
```

While this user experience is nice, this cleverness is flawed. Specifically, given an operator expression (let's say an addition) where one operand has "unknown" type and the other operand has type T (let's say int), it's questionable that we can infer the unknown must be a T (an int). In fact, the number of operators that fit the pattern of "+(T, ?)" could be ambiguous, or nonexistent. Of course, there are scenarios where the types are clearer: the SELECT list that feeds into an INSERT or UPDATE has to be compatible (read: cast-ible) with the target columns. Similarly, the SELECT list that goes into a set operation (e.g. UNION) has to be compatible with the corresponding columns from other sub-queries. Upstream Postgres already handles a large part of these cases.

This commit removes the Greenplum-specific cleverness. See the updated tests for the behavior changes. Because this mostly changes parse analysis, expressions that are already stored in the catalog are _not_ impacted (read: we will be able to restore the output of pg_dump). Not that it will give us feature parity, but in Postgres 10, remaining unknown-typed literals are coerced to text; see Postgres commit 1e7c4bb0.

Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
-
- 10 March 2020, 3 commits
-
-
Committed by Heikki Linnakangas

It was only used for one message in gprecoverseg, and it doesn't seem important. The second argument to the function hasn't done anything since the removal of email and SNMP alerts in commit 65822b80. And the NULL checks for the arguments were pointless, because the function was marked as strict. But rather than clean those up, let's just remove it altogether.

Reviewed-by: Asim R P <apraveen@pivotal.io>
Reviewed-by: Jimmy Yih <jyih@pivotal.io>
-
Committed by Wang Hao
-
Committed by Heikki Linnakangas

We used to append the executor parameters to the array of external parameters in the QD before the query was dispatched, and copied them back into the array of executor parameters in the QE. For clarity, refactor so that external query parameters are kept separate from executor parameters at all times. The old comments talked about parameters with type PARAM_EXEC_REMOTE. There must have been some plan to use that with dispatched parameters, but it was never actually used.

Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
-
- 09 March 2020, 5 commits
-
-
Committed by Heikki Linnakangas

I did some digging; this code was added in 2011 in this commit:

    commit 796dcb358dc9dd40f2674373a2f542fd7c796e6a
    Date: Fri Oct 21 16:56:58 2011 -0800

        Add entries to subplan's targetlist for fixing setrefs of flow
        nodes in motion selectively. [JIRA: MPP-15086, MPP-15073]

        [git-p4: depot-paths = "//cdb2/main/": change = 99193]

Those JIRAs were duplicates of the same issue, which was an error in regression tests, in the 'DML_over_joins' test. 'DML_over_joins' has changed a lot since those days, but FWIW it's not failing now with this code removed. The error from the JIRA looked like this:

    select m.* from m,r,purchase_par where m.a = r.a and m.b = purchase_par.id + 1;
    ERROR: attribute number 2 exceeds number of columns 1 (execQual.c:626) (seg2 slice1 rh55-tst3:10100 pid=12142)

The code has changed a lot since, and I believe this isn't needed anymore. If the planner chooses to redistribute based on an expression, that expression is surely used somewhere above the Motion, and should therefore be in the targetlist already.

This fixes an error in the second test query that this adds. Before this commit, the hashed SubPlan appeared twice in the target list of the Seq Scan below the Redistribute Motion:

```sql
explain (verbose, costs off)
select * from foo
where (case when foo.i in (select a.i from baz a) then foo.i else null end)
      in (select b.i from baz b);
```

                                              QUERY PLAN
    -----------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice1; segments: 3)
       Output: foo.i, foo.j
       ->  Result
             Output: foo.i, foo.j
             ->  HashAggregate
                   Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                   Group Key: foo.ctid, foo.gp_segment_id
                   ->  Redistribute Motion 3:3  (slice2; segments: 3)
                         Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                         Hash Key: foo.ctid
                         ->  Hash Join
                               Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                               Hash Cond: (b.i = (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END))
                               ->  Seq Scan on subselect_gp.baz b
                                     Output: b.i, b.j
                               ->  Hash
                                     Output: foo.i, foo.j, ((hashed SubPlan 1)), foo.ctid, foo.gp_segment_id, (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                     ->  Redistribute Motion 3:3  (slice3; segments: 3)
                                           Output: foo.i, foo.j, ((hashed SubPlan 1)), foo.ctid, foo.gp_segment_id, (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                           Hash Key: (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                           ->  Seq Scan on subselect_gp.foo
                                                 Output: foo.i, foo.j, (hashed SubPlan 1), foo.ctid, foo.gp_segment_id, CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END
                                                 SubPlan 1
                                                   ->  Broadcast Motion 3:3  (slice4; segments: 3)
                                                         Output: a.i
                                                         ->  Seq Scan on subselect_gp.baz a
                                                               Output: a.i
     Optimizer: Postgres query optimizer
     Settings: optimizer=off
    (29 rows)

That failed at runtime with:

    ERROR: illegal rescan of motion node: invalid plan (nodeMotion.c:1232) (seg0 slice3 127.0.0.1:40000 pid=16712)
    HINT: Likely caused by bad NL-join, try setting enable_nestloop to off

Fixes: https://github.com/greenplum-db/gpdb/issues/9701
Reviewed-by: Richard Guo <riguo@pivotal.io>
-
Committed by Asim R P

Now that we no longer maintain modcount for append-optimized tables on the master, analyzedb was changed in 67321b1c to obtain the sum of modcounts from the segments. This commit adjusts the modcounts expected by tests accordingly. E.g. an expected modcount of 2 needs to be 6 now (sum of modcount from 3 segments). In another test, an expected modcount of 2 now becomes 3 (1 on seg0 + 2 on seg1 + 0 on seg2). Note that expecting a modcount to have a specific value makes the tests depend on the demo cluster configuration. That dependency already exists in some ICW tests, so it seems alright.
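The arithmetic behind the adjusted expectations is just a per-segment sum. A trivial sketch (function name is mine) reproducing both examples from the message:

```python
def table_modcount(per_segment_modcounts):
    """With modcount no longer maintained on the master, the table-level
    modcount is the sum of the values reported by each segment."""
    return sum(per_segment_modcounts)


# 3 segments, each reporting modcount 2 -> table-level modcount 6
assert table_modcount([2, 2, 2]) == 6

# 1 on seg0 + 2 on seg1 + 0 on seg2 -> 3
assert table_modcount([1, 2, 0]) == 3
```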
-
Committed by prajnamort

When we append General and Partitioned children together, consider General as Strewn (rather than SingleQE) to postpone the gather motion. Add a special projection path to General, so that only a single QE actually produces rows.
-
Committed by Heikki Linnakangas

'modcount' is no longer kept up to date in the QD node, so we need to sum it up across all the segments. The analyzedb tests on the concourse master pipeline were failing because modcount always came out as 0.
-
Committed by Heikki Linnakangas

Overview
--------

Manage the allocation of AO segfiles in QE nodes, instead of the master. That simplifies a lot of things: we no longer need to send the decision on which segfile to use from QD to QE, we no longer need to keep up-to-date tuple counts in the master, and so forth.

This gets rid of the in-memory hash table to track AO segment status. To lock a segfile for insertion or other operations, we now rely on tuple locks on the AO segment table instead.

The VACUUM code has been refactored. I tried to make the code structure closer to upstream, to reduce merge conflicts as we merge with upstream. The way the VACUUM phases are dispatched, and the phases needed, are different in the new regime. Some phases could use local transactions, but others still have to use distributed transactions; for simplicity we use distributed transactions for all of the phases.

Now that QE nodes choose the insertion target segfile independently, we could use the same algorithm for choosing the target segfile in utility mode, and wouldn't need to reserve segfile #0 for special operations anymore. But to keep things familiar, and to avoid any more churn in the regression tests, this keeps the historical behavior and still reserves segfile #0 for utility mode and CREATE TABLE AS.

Previously, SERIALIZABLE (and REPEATABLE READ) transactions were treated differently from transactions in READ COMMITTED mode. If an old serializable transaction was running, VACUUM refrained from recycling old AO segfiles after compaction, because the serializable transaction might lock the table later and try to read the rows that were compacted away. But that was not completely waterproof: a READ COMMITTED transaction that's still holding onto an old snapshot is no different from a serializable transaction. There's a gap between acquiring a snapshot and locking a table, so even if a READ COMMITTED backend is not holding a lock on a table right now, it might open it later. To fix that, remove the distinction between serializable and other transactions. Any open transaction will now prevent AO segfiles in awaiting-drop state from being removed. This commit adds an isolation2 test case called 'vacuum_self_function' to test one such case; it used to fail before this commit.

That is a user-visible change in behavior that's worth calling out explicitly: if there are concurrent transactions running, even in READ COMMITTED mode, and even if they don't access the table at all, VACUUM will no longer be able to recycle compacted segfiles. The next VACUUM will clean them up, after the old concurrent transactions are gone. Meanwhile, the awaiting-drop segfiles use up disk space and increase the risk that an insertion fails because all segfiles are in use.

This work was originally submitted as github PR #790; you might see references to that in the discussions.

Code change details
-------------------

When an AO segment is chosen as insertion target, its AO segment table row is locked for the transaction (by heap_lock_tuple, i.e. by stamping its XMAX). Later, when the insertion finishes, the tuple is updated with the new EOF (no change there). When choosing a new insertion target segfile, we scan the pg_aoseg table instead of the hash table, and use the xmin/xmax to determine which ones are safe to use. The process of choosing a target segment itself is not concurrency-safe, so the relation extension lock is held to protect it. See the new "Locking and snapshots" section in src/backend/access/appendonly/README.md for details.

tupcount and modcount are not kept up to date in the master. Remove all the code needed to send the totals back from QE to QD at the end of a command, including the GPDB-specific 'o' protocol FE/BE message.

Remove the machinery to track which PROCs are in a serializable transaction. It was not completely waterproof anyway, and there were some existing bugs in that area.

Move the responsibility of creating the AO segfile to the callers of appendonly_insert_init(). The caller must have locked, and created if necessary, the segfile by calling ChooseSegnoForWrite() or LockSegnoForWrite(). Previously, pg_aoseg was not locked in rewriting ALTER TABLE and other such commands that always used segfile #0. That was OK, because they held an AccessExclusiveLock on the table, but let's be consistent.

When selecting the insertion target for VACUUM compaction, prefer creating a new segfile over using an existing non-empty segfile. Existing empty segfiles are still choice #1, but let's avoid writing to non-empty existing files, because otherwise we cannot vacuum those segfiles in the same VACUUM cycle.

Warn if compaction fails because no insertion segfile could be chosen. This used to be an ERROR in the old code, but I think a WARNING is more appropriate.

In targetid_get_partition(), don't create duplicate ResultRelInfo entries. It's possible that the "result relation" is a partition of a partitioned table, rather than the root. Don't create a new ResultRelInfo hash entry for it in that case; use the parent ResultRelInfo. This was harmless before, but now the logic of when we need to increment the modcount was getting confused. Per the 'qp_dropped_cols' regression test.

Test change details
-------------------

src/test/regress/greenplum_schedule
  Run uao[cs]_catalog_tables tests separately. Because we are now more lazy about dropping AWAITING_DROP segments, the issue mentioned in the comment about serializable transactions now applies to all transactions.

uao_compaction/threshold, uaocs_compaction/threshold
  Adjust the uao[cs]_compaction/threshold test cases to make them pass. In the first few changed outputs, I'm not entirely sure why one segment chooses to insert the new tuples into segfile #2 while others pick segfile #1, but there's also nothing wrong with that. The point of this PR is that QE nodes can make the decision independently. In the last test with gp_toolkit.__gp_aovisimap_compaction_info(), all the test data is inserted on one QE node, so segfile 2 isn't allocated on the others.

isolation2/vacuum_self_function
  Add a test case for the non-SERIALIZABLE case discussed earlier. See also the discussion at https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ. This test errored out before this commit:

      ERROR: read beyond eof in table "ao" file "base/16384/16387.1", read position 0 (small offset 0), actual read length 0 (large read length 192) (cdbbufferedread.c:211)

  It works with this PR, but we don't drop old AO segments as aggressively in vacuum anymore. Partly because we're now more careful and correct, but also because we don't detect the case where the VACUUM is the oldest transaction in the system and nothing else is running; we miss the opportunity to drop compacted AO segments immediately in that case. The next VACUUM should drop them, though. Furthermore, this test is written so that if we tried to perform the AO compaction phase in a local rather than a distributed transaction, it would fail. (I know because I tried to do it that way at first.) This is "Problem 3" that I mentioned on the gpdb-dev thread.

isolation2/fsync_ao
  Adjust the "hit counts" in the fsync_ao test. The number of times the fsyncs happen has changed. To be honest, I'm not sure why, nor what the principled way to get the correct value would be, but as long as the counts are > 0, this looks like success to me.

isolation2/gdd/planner_insert_while_vacuum_drop
  Remove a GDD test that's no longer relevant. The test was testing a deadlock between QD and QE involving AO vacuum. We no longer take an AccessExclusiveLock in the QD, so the test case doesn't work. I'm not sure what an equivalent deadlock might be with the new AO vacuum, so just remove the test.

isolation2/vacuum_drop_phase_ao, isolation2/uao/insert_should_not_use_awaiting_drop
  These tests don't make sense in the QE-managed regime anymore. I couldn't figure out what they should look like with this commit, so remove them.

isolation2/uao/insert_policy
  Update the test case to cover a scenario where two transactions try to create the initial segment for a relation. We had a bug in that scenario during development, and it wasn't covered by any existing tests.

isolation2/uao/select_while_vacuum_serializable2
  The behavior is different now: a serializable transaction doesn't prevent the compaction phase from happening, only the recycling of the segfile. Adjust the expected output accordingly.

isolation2/uao/compaction_utility
  VACUUM in utility mode can compact now.

Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
Co-authored-by: Asim R P <apraveen@pivotal.io>
Co-authored-by: Xin Zhang <xzhang@pivotal.io>
Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
Reviewed-by: Taylor Vesely <tvesely@pivotal.io>
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/49MW3tBoH8s/W_bdyxZMAgAJ
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/lUwtNxaT3f0/Cg39CP8uAwAJ
-