1. 20 March 2020 (5 commits)
    • Docs - updates for PL/Container 3 beta (#9773) · e470a38b
      David Yozie committed
      * Docs - updates for PL/Container 3 beta
      
      * Update to account for beta being available for Ubuntu
      
      * Update install, uninstall tasks for 3 Beta
      
      * Qualify r image as beta
    • Enable external table's error log to be persistent for ETL. (#9757) · 04fdd0a6
      (Jerome)Junfeng Yang committed
      For ETL user scenarios, there are cases that frequently create and drop
      the same external table, and once the external table is dropped, all
      errors stored in its error log are lost.

      To make the error log persist across recreations of the same
      "dbname"."namespace"."table", introduce the "error_log_persistent"
      external table option. If the external table is created with
      `OPTIONS (error_log_persistent 'true')` and `LOG ERRORS`, its error log
      is named "dbid_namespaceid_tablename" under the "errlogpersistent"
      directory, and dropping the external table does not delete the error
      log.
      
      GPDB 5 and 6 still use pg_exttable's options to mark LOG ERRORS
      PERSISTENTLY, so keep the ability to load from
      OPTIONS(error_log_persistent 'true').
      
      Create a separate `gp_read_persistent_error_log` function to read the
      persistent error log. If the external table has been deleted, only the
      namespace owner has permission to read the error log.
      
      Create a separate `gp_truncate_persistent_error_log` function to delete
      the persistent error log. If the external table has been deleted, only
      the namespace owner has permission to delete the error log. It also
      supports wildcard input to delete the error logs belonging to a
      database or the whole cluster.
      
      If an external table created with `error_log_persistent` is dropped,
      and the same "dbname"."namespace"."table" external table is then
      created without the persistent error log option, errors are written to
      the normal error log. The persistent error log still exists.
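
      As a sketch of the naming scheme above (the helper name and the exact
      path separators are assumptions for illustration, not GPDB's actual
      code):

      ```c
      #include <stdio.h>

      /* Build the persistent error log name "dbid_namespaceid_tablename"
       * under the "errlogpersistent" directory, as described above. */
      static void
      persistent_errlog_path(char *buf, size_t buflen,
                             unsigned int dbid, unsigned int namespaceid,
                             const char *tablename)
      {
          snprintf(buf, buflen, "errlogpersistent/%u_%u_%s",
                   dbid, namespaceid, tablename);
      }
      ```

      Because the name is derived only from stable identifiers, recreating
      the same "dbname"."namespace"."table" maps back to the same file, which
      is why the log survives a DROP.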
      Reviewed-by: HaozhouWang <hawang@pivotal.io>
      Reviewed-by: Adam Lee <ali@pivotal.io>
    • Dump combocid message to dynamic shared memory instead of BufFile · 471653e7
      Weinan WANG committed
      For write-gang and read-gang combocid synchronization, I remove
      BufFile code and replace it with dynamic shared memory.
      
      - Remove the combocids array from SharedLocalSnapshotSlot. Always rely on
        the array sharing mechanism in combocid.c. 
      
      - Revert comboCids array to the way it is in the upstream; remove 'xmin'
        field.
      
      - Remove the changes to assertions referring to MPP-8317. As far as I can
        see, QE reader processes always need to have a correct view of all their
        "current" transactions, whether or not a cursor is running. We were not
        consistent in relaxing those assertions: we had relaxed the one in
        HeapTupleHeaderGetCmax(), for example, but not the one in
        HeapTupleHeaderGetCmin(). The relaxations must have become obsolete
        somewhere along the line.
      
      - In the DSM segment, use the same array format as in the backend-private
        comboCids array. Rename and move things around to make it more explicit
        that the shared array is a copy of the backend-private comboCids array.
      
      - Improve the dsm_attach() retry logic. We can detect the case that
        dsm_attach() fails because the QE writer process reallocated a new DSM
        segment, so check for that explicitly, remove the sleeps, and throw an
        error on other failures.
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
    • Fix fts probe response deserialization (#9770) · fe4bcafa
      David Kimura committed
      In probeRecordResponse(), columns are read using PQgetvalue() and cast
      to int pointers, and the values are then stored as bools. However,
      SendFtsResponse() writes only one byte per column, so we should not be
      casting to int pointers: we have no guarantee what the high bits will
      be, which could lead to unexpected behavior.
      
      The issue is that, depending on the underlying type of bool, we could
      inadvertently use the undefined high bits behind the int-pointer
      dereference to decide the value to store into the bool. Thus we could
      read unexpected values for isMirrorAlive, isInSync, isSyncRepEnabled,
      etc.
      
      For example, in this code `b` and `c` have different values:
      ```c
      int i = 0x1000;
      unsigned char c = *&i;  /* converted to the low byte: c == 0x00 */
      bool b = *&i;           /* C99 _Bool: any nonzero value becomes true */
      ```
      
      This was caught during the postgres 12 merge iteration, due to postgres
      11 commit 9a95a77d, which changed the preferred definition of bool to
      use stdbool.h instead of a typedef of unsigned char.

      The issue can be reproduced outside the merge branch by simply adding
      includes for `<stdbool.h>` in the fts.c and ftsprobe.h files.
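
      A minimal standalone sketch of that failure mode (a hedged
      illustration: the buffer contents are made up, memcpy stands in for the
      int-pointer cast, and reading the low byte assumes a little-endian
      machine):

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <string.h>

      /* Simulate reading a single written byte through an int-sized slot,
       * as the FTS probe response code effectively did. */
      static int
      read_byte_through_int_pointer(unsigned char written_byte)
      {
          unsigned char slot[sizeof(int)];
          memset(slot, 0xAB, sizeof(slot)); /* leftover garbage high bytes */
          slot[0] = written_byte;           /* the one byte actually written */

          int raw;
          memcpy(&raw, slot, sizeof(raw));  /* stands in for the *(int *) cast */
          return raw;
      }

      int main(void)
      {
          int raw = read_byte_through_int_pointer(0); /* sender meant "false" */

          unsigned char old_bool = raw; /* typedef unsigned char: low byte -> 0 */
          bool new_bool = raw;          /* C99 _Bool: any nonzero -> true */

          assert(old_bool == 0);        /* the intended value */
          assert(new_bool == true);     /* the surprising value */
          return 0;
      }
      ```

      With the old typedef the garbage bits were silently discarded; under
      stdbool.h they flip the stored value.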
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • docs - initial doc content for Greenplum R client beta (#9772) · db1160e8
      Lisa Owen committed
      * docs - initial doc content for Greenplum R client beta
      
      * use correct download fname, qualify version as Beta
      
      * add more functions to the table, move html help info
  2. 19 March 2020 (6 commits)
  3. 18 March 2020 (4 commits)
    • Revert "Fix system views pg_stat_* (#9565)" (#9758) · ee4093b5
      Jinbao Chen committed
      A query in the test triggers a hang issue that cannot be fixed in a
      short time, so we revert the commit for now.
      
      This reverts commit 148a20df.
    • Avoid ADD COLUMN full table rewrite for AOCO partitions · e707c19c
      Melanie committed
      ALTER TABLE ADD COLUMN to a partition with storage type AOCO should not                                   
      trigger a full table rewrite. Instead, only data corresponding to the     
      new column should be written.
      
      There was a regression caused by an implementation detail of the ALTER
      TABLE machinery: commit 6c572399 in upstream Postgres (Postgres 9.1+
      and GPDB 6+) changed ALTER TABLE ADD COLUMN.
      
      Before 6c572399, ATPrepAddColumn() recursively processed child                                         
      partitions during the "prep" phase (phase 2), appending subcmds to the     
      AlteredTableInfo such that each partition table had a copy of all     
      AlterTableCmds. After 6c572399, ATExecAddColumn() recursively     
      processes children during the "exec" phase (phase 3), which happens     
      after subcmds are populated.
      
      The AOCO ADD COLUMN code did a sanity check for the presence of                                           
      AlteredTableInfo->subcmds to determine if we can write only the new     
      column and avoid a full table rewrite. Child partitions were missing     
      these subcmds, so ALTER TABLE ADD COLUMN always did a full table     
      rewrite. 
      
      We initially considered moving the recursion back to the "prep" phase.                                    
      Other subcommand types (e.g. ALTER TABLE ALTER COLUMN TYPE) do recursion     
      during the "prep" phase. However, this would result in an unnecessary     
      and potentially incorrect diff from upstream Postgres.
      
      Instead, we decoupled the logic for the table rewrite optimization from                                   
      the contents of the AlteredTableInfo->subcmds.  Based on the ADD COLUMN     
      subcommands of the root partition, we determine if the optimization is       
      *possible*. Then we recurse to all descendant partitions, and, based on     
      the storage type of each of those relations, set the flag to indicate     
        whether or not the full table rewrite is required.
      
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>                                                      
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>     
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>     
      Reviewed-by: Asim R P <apraveen@pivotal.io>     
      Reviewed-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>     
    • gpinitsystem: update catalog with correct hostname · 03c7d557
      Jamie McAtamney committed
      Previously, gpinitsystem was incorrectly filling the hostname field of each
      segment in gp_segment_configuration with the segment's address. This commit
      changes it to correctly resolve hostnames and update the catalog accordingly.
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
  4. 17 March 2020 (4 commits)
    • Simplify the implementation of "dedup" semi-join paths. · 9628a332
      Heikki Linnakangas committed
      There are two parts to this commit:
      
      1. Instead of using ctids and "fake" ctids to disambiguate rows that have
         been duplicated by a Broadcast motion, generate a synthetic 64-bit
         rowid specifically for that purpose, just below the Broadcast. The old
         method had to generate subplan IDs, fake ctids etc. to have a
         combination of columns that formed the unique ID, and that entailed a
         lot of infrastructure, like the whole concept of "pseudo columns". The
         new method is much simpler, and it works the same way regardless of
         what's below the Broadcast. The new 64-bit rowids are generated by
         a new RowIdExpr expression node. The high 16 bits of the rowid are
         the segment ID, and the low bits are generated by a simple counter.
      
      2. Replace the cdbpath_dedup_fixup() post-processing step on the Plan
         tree, by adding the unique ID column (now a RowIdExpr) in the Path's
         pathtarget in the first place. cdbpath_motion_for_join() is the
         function that does that now. Seems like a logical place, since that's
         where any needed Motions are added on top of the join inputs. The
         responsibility for creating the Unique path is moved from
         cdb_add_join_path() to the create_nestloop/hashjoin/mergejoin_path()
         functions.
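
      The rowid layout from point 1 can be sketched as follows (the type and
      function names, and the 48-bit counter width, are illustrative
      assumptions rather than the actual RowIdExpr implementation):

      ```c
      #include <stdint.h>

      /* High 16 bits: segment ID; low 48 bits: a simple per-backend counter. */
      typedef struct
      {
          uint16_t segment_id;
          uint64_t counter;   /* only the low 48 bits are used */
      } RowIdState;

      static uint64_t
      next_rowid(RowIdState *state)
      {
          state->counter = (state->counter + 1) & ((UINT64_C(1) << 48) - 1);
          return ((uint64_t) state->segment_id << 48) | state->counter;
      }
      ```

      Each input row gets a distinct rowid below the Broadcast, so all copies
      of a row produced by the Broadcast share one rowid, which is exactly
      what the Unique path above it needs for deduplication.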
      
      This slightly changes the plan when one side of the join has Segment or
      SegmentGeneral locus. For example:
      
      select * from srf() r(a) where r.a in (select t.a/10 from tab t);
      
      With this commit, that's implemented by running the FunctionScan and
      generating the row IDs on only one node, and redistributing/broadcasting
      the result. This is not exactly the same as before this PR: previously, if
      the function was stable or immutable, we assumed that a SRF returns rows
      in the same order on all nodes and relied on that when generating the
      "synthetic ctid" on each row. We no longer assume that, but force the
      function to run on a single node instead. That seems acceptable: this
      kind of plan is helpful when the other side of the join is much larger,
      so redistributing the smaller side is probably still OK.
      
      This also uses the same strategy when the inner side is a replicated table
      rather than a SRF. That was not supported before, but there was a TODO
      comment that hinted at the possibility.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/9741, and adds tests
      for it.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
    • Avoid re-executing if all tuples of the plan have been emitted. (#9596) · 3d11e871
      Paul Guo committed
      On postgres, a holdable cursor reruns the executor after rewinding, to
      put all tuples into a tuplestore during commit for later access. GPDB,
      however, does not support backwards scanning, so we just continue
      executing without rewinding the executor. This can lead to issues,
      because some executor nodes were not written or designed for the case
      where the plan has already emitted all of its tuples.
      
      Typical issues are seen as below:
      
      1. assert failure (in execMotionSortedReceiver())
      FailedAssertion("!(!((heap)->bh_size == 0) && heap->bh_has_heap_property)", File: "binaryheap.c"
      
      2. elog(ERROR) like below.
      ERROR:  cannot execute squelched plan node of type: 232 (execProcnode.c:887)
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
    • Fix memory leak in checkpointer process. (#9754) · 11825f99
      Hao Wu committed
      `dtxCheckPointInfo` was not freed in the memory context of the
      checkpointer process.
    • Bump ORCA version to v3.95.0 · 6b0465d5
      Sambitesh Dash committed
  5. 16 March 2020 (2 commits)
  6. 15 March 2020 (1 commit)
  7. 14 March 2020 (3 commits)
  8. 13 March 2020 (3 commits)
    • Fix ACL error when creating partition table. · a47fe8f7
      prajnamort committed
      Fix GitHub Issue: #9524
      Cause of this bug: both heap_create_with_catalog() and
      CopyRelationAcls() tried to write the ACL for partition child tables,
      which is not allowed.

      Instead, we leave the ACL NULL when heap_create_with_catalog() creates
      the child relations. This should fix the issue.
    • Convert disable_cost to a guc (#9728) · 91108398
      Paul Guo committed
      There was a discussion upstream about disable_cost sometimes being too small,
      https://www.postgresql.org/message-id/flat/CAO0i4_SSPV9TVxbbTRVLOnCyewopcc147fBZy%3Df2ABk15eHS%2Bg%40mail.gmail.com
      but it reached no conclusion, although many solutions were discussed.
      The issue seems more urgent for Greenplum, since Greenplum is a
      distributed system and can handle much more data than single-node
      postgres.

      Recently we encountered another user issue that is quite relevant:
      merge join was chosen instead of hash join. Admittedly, that case
      involves a corner costing issue in the hashjoin code (which might be
      fixed, if uglily), but more generally we wondered whether we should
      tune disable_cost, given that a small disable_cost value has been known
      to be a problem at times.

      We expect disable_cost to eventually go away on gpdb master, and we
      know we need to make hashjoin costing more accurate (though that is
      sometimes hard), but for now users are waiting for a solution on gpdb6,
      so we use the GUC solution temporarily. Note that in real scenarios it
      should be tuned only when needed (typically when the enabled path's
      cost is close to or greater than disable_cost but we want the enabled
      path); otherwise planning of "disabled" paths may suffer side effects
      from limited precision (e.g. some paths being ignored).
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
    • Fix analyzedb with config file to work with partitioned tables · d1611944
      Chris Hajas committed
      Previously, running analyzedb with an input file (`analyzedb -f
      <config_file>`) containing a root partition would fail because we did
      not properly populate the list of leaf partitions. The logic in
      analyzedb assumes that we enumerate leaf partitions from the root
      partition that the user passed in (either on the command line or in an
      input file). While we did this properly for tables passed on the
      command line, for input files we looked up the bare table name rather
      than the schema-qualified name.

      This caused partitioned heap tables to fail when writing the
      report/status files at the end, and caused analyzedb to not track DML
      changes in partitioned AO tables. Now, we properly check for the
      schema-qualified table name.
  9. 12 March 2020 (7 commits)
  10. 11 March 2020 (1 commit)
    • Align unknown type handling with upstream (#9686) · 4c8dace3
      David Kimura committed
      In Postgres / Greenplum, a literal (e.g. NULL or 'text') in a query with
      no obvious type gets a pseudo-type "unknown". A column that refers to
      such literals will also get the "unknown" type.
      
      We used to have logic to infer type of unknown literals in a subquery
      based on the context. For example, the following query returns 124 as
      foo is inferred to be an integer:
      
      ```sql
      select foo + 123 from (select '1' as foo) a;
      ```
      
      As another example, Greenplum infers the type of NULL as int in the
      following "INSERT ... SELECT" with unknown-typed columns from a nested
      subquery:
      
      ```sql
      -- CREATE TABLE foo (a int, b text, c int);
      
      INSERT INTO foo
      SELECT * FROM (
        SELECT '1',
               'bbb',
               NULL
      ) bar(x,y,z);
      ```
      
      While this user experience is nice, this cleverness is flawed.
      Specifically, given an operator expression (let's say it's an addition)
      where one operand has "unknown" type, and the other operand has type T
      (let's say int), it's questionable that we can infer the unknown must be
      a T (an int). In fact, the number of operators that fit the pattern of
      "+(T, ?)" could be either ambiguous, or nonexistent.
      
      Of course, there are scenarios where types are clearer: the SELECT list
      that feeds into an INSERT or UPDATE has to be compatible (read:
      cast-ible) with the target columns. Similarly, the SELECT list that goes
      into a set operation (e.g. UNION) has to be compatible with
      corresponding columns from other sub-queries. Upstream Postgres already
      handles a large part of these cases.
      
      This commit removes the Greenplum-specific cleverness. See the updated
      tests for the behavior changes. Because this mostly changes
      parse-analysis, expressions that are already stored in the catalog are
      _not_ impacted (read: we will be able to restore the output of pg_dump).
      
      Not that it will give us feature parity, but in Postgres 10, we will
      coerce remaining unknown typed literals to text. See Postgres commit
      1e7c4bb0.
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  11. 10 March 2020 (3 commits)
  12. 09 March 2020 (1 commit)
    • Remove code to add SubPlans used in Motions to subplan's target list. · 1afbbbad
      Heikki Linnakangas committed
      I did some digging; this code was added in 2011 by this commit:
      
      commit 796dcb358dc9dd40f2674373a2f542fd7c796e6a
      Date:   Fri Oct 21 16:56:58 2011 -0800
      
          Add entries to subplan's targetlist for fixing setrefs of flow nodes in motion selectively.
      
          [JIRA: MPP-15086, MPP-15073]
      
          [git-p4: depot-paths = "//cdb2/main/": change = 99193]
      
      Those JIRAs were duplicates of the same issue, which was an error in
      regression tests, in the 'DML_over_joins' test. 'DML_over_joins' has
      changed a lot since those days, but FWIW, it's not failing now, with
      this code removed. The error from the JIRA looked like this:
      
          select m.* from m,r,purchase_par where m.a = r.a and m.b = purchase_par.id + 1;
          ERROR:  attribute number 2 exceeds number of columns 1 (execQual.c:626)  (seg2 slice1 rh55-tst3:10100 pid=12142)
      
      The code has changed a lot since, and I believe this isn't needed anymore.
      If the planner chooses to redistribute based on an expression, that
      expression is surely used somewhere above the Motion, and should therefore
      be in the targetlist already.
      
      This fixes an error in the second test query that this adds. Before this
      commit, the hashed SubPlan appeared twice in the target list of the Seq
      Scan below the Redistribute Motion:
      
      explain (verbose, costs off)
      select * from foo where
        (case when foo.i in (select a.i from baz a) then foo.i else null end) in
        (select b.i from baz b);
                                                                                          QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice1; segments: 3)
         Output: foo.i, foo.j
         ->  Result
               Output: foo.i, foo.j
               ->  HashAggregate
                     Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                     Group Key: foo.ctid, foo.gp_segment_id
                     ->  Redistribute Motion 3:3  (slice2; segments: 3)
                           Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                           Hash Key: foo.ctid
                           ->  Hash Join
                                 Output: foo.i, foo.j, foo.ctid, foo.gp_segment_id
                                 Hash Cond: (b.i = (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END))
                                 ->  Seq Scan on subselect_gp.baz b
                                       Output: b.i, b.j
                                 ->  Hash
                                       Output: foo.i, foo.j, ((hashed SubPlan 1)), foo.ctid, foo.gp_segment_id, (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                       ->  Redistribute Motion 3:3  (slice3; segments: 3)
                                             Output: foo.i, foo.j, ((hashed SubPlan 1)), foo.ctid, foo.gp_segment_id, (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                             Hash Key: (CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END)
                                             ->  Seq Scan on subselect_gp.foo
                                                   Output: foo.i, foo.j, (hashed SubPlan 1), foo.ctid, foo.gp_segment_id, CASE WHEN (hashed SubPlan 1) THEN foo.i ELSE NULL::integer END
                                                   SubPlan 1
                                                     ->  Broadcast Motion 3:3  (slice4; segments: 3)
                                                           Output: a.i
                                                           ->  Seq Scan on subselect_gp.baz a
                                                                 Output: a.i
       Optimizer: Postgres query optimizer
       Settings: optimizer=off
      (29 rows)
      
      That failed at runtime with:
      
      ERROR:  illegal rescan of motion node: invalid plan (nodeMotion.c:1232)  (seg0 slice3 127.0.0.1:40000 pid=16712) (nodeMotion.c:1232)
      HINT:  Likely caused by bad NL-join, try setting enable_nestloop to off
      
      Fixes: https://github.com/greenplum-db/gpdb/issues/9701
      Reviewed-by: Richard Guo <riguo@pivotal.io>