1. 24 Nov 2017 (5 commits)
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
      4319b7bb
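      The backported syntax can be sketched as follows (hypothetical table and
      column names, assuming the 9.0-era aggregates are available):

      ```sql
      -- Feed values to the aggregate in a well-defined order:
      SELECT array_agg(name ORDER BY name) FROM employees;

      -- DISTINCT is no longer limited to single-argument aggregates:
      SELECT string_agg(DISTINCT name, ', ') FROM employees;
      ```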
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
      bb6a757e
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a
        lot of queries that used to throw a "ROWS parameter cannot be negative"
        error are now passing. That error was an artifact of the way LEAD/LAG
        were implemented. Those queries contain window function calls like
        "LEAD(col1, col2 - col3)", and sometimes, with suitable values in col2
        and col3, the second argument went negative. That caused the error. The
        new implementation of LEAD/LAG is OK with a negative argument.
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The output
        will vary from one invocation to another, though, because the order is
        then not well defined. This is more annoying on GPDB than on PostgreSQL,
        because in GPDB the row order tends to vary, as the rows are spread
        out across the cluster and will arrive in the master in unpredictable
        order)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OID to the new ones.
      f62bd1c6
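      Two of the lifted limitations can be sketched with hypothetical queries
      (table and column names invented for illustration):

      ```sql
      -- rank() no longer requires an ORDER BY in the window; the output
      -- order is then not well defined:
      SELECT rank() OVER (), col1 FROM t;

      -- A computed LEAD offset that goes negative no longer errors out:
      SELECT LEAD(col1, col2 - col3) OVER (ORDER BY col1) FROM t;
      ```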
    • Extend the unknowns-are-same-as-known-inputs type resolution heuristic. · 0bf6f8b1
      Committed by Heikki Linnakangas
      This is a cherry-pick from PostgreSQL 9.2. This will help to make some
      existing test cases pass, after the big window functions rewrite that we've
      been working on, where a NULL argument to percentile_cont() is otherwise
      ambiguous. While we could include this in the big window functions rewrite,
      that is large enough already, and this is a rather orthogonal patch, so
      seems better to do this as a separate commit.
      
      Add a test case, because there was none included in the upstream commit.
      
      Upstream commit:
      
      commit 1a8b9fb5
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Nov 17 18:28:41 2011 -0500
      
          Extend the unknowns-are-same-as-known-inputs type resolution heuristic.
      
          For a very long time, one of the parser's heuristics for resolving
          ambiguous operator calls has been to assume that unknown-type literals are
          of the same type as the other input (if it's known).  However, this was
          only used in the first step of quickly checking for an exact-types match,
          and thus did not help in resolving matches that require coercion, such as
          matches to polymorphic operators.  As we add more polymorphic operators,
          this becomes more of a problem.  This patch adds another use of the same
          heuristic as a last-ditch check before failing to resolve an ambiguous
          operator or function call.  In particular this will let us define the range
          inclusion operator in a less limited way (to come in a follow-on patch).
      0bf6f8b1
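      Per the commit message above, the case this helps in GPDB is a NULL
      argument to percentile_cont(); a hedged sketch (table and column names
      invented):

      ```sql
      -- NULL is an unknown-type literal. With the heuristic, it can be
      -- matched against the known candidate argument types as a last-ditch
      -- step, instead of the call failing as ambiguous:
      SELECT percentile_cont(NULL) WITHIN GROUP (ORDER BY salary) FROM emp;
      ```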
    • Clean up qp_functions test case a bit. · 9451eb12
      Committed by Heikki Linnakangas
      It created a schema at the beginning, but the "set search_path" command
      was misspelled, so the schema was not actually used for anything. All the
      objects were created in public. Also, at the end, the "start/end_ignore"
      around the DROP commands were misspelled, so the output of the DROPs was
      not ignored, after all.
      
      Clean that up, by accepting the status quo. Continue to create the objects
      in public, but remove the useless schema. The objects might be useful
      for testing pg_dump/restore (particularly as part of pg_upgrade). And
      remove the misspelled ignore-comments, to avoid confusion.
      9451eb12
  2. 23 Nov 2017 (7 commits)
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      If, in step 1, the result was already focused on a single segment, we
      would make note that no Gather is needed, and not add one in step 4.
      However, the DISTINCT processing might add a Redistribute Motion node, so
      that the final result is not focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code stands,
      but we saw such a case on the "window functions rewrite" branch we've been
      working on. There, the sort order/distribution of the input can be changed
      to process window functions. But even if this isn't actively broken right
      now, it seems more robust to change the logic so that 'must_gather' means
      'at the end, the result must end up on a single node', instead of 'we must
      add a Gather node'. The test that this adds exercises this issue after
      the window functions rewrite, but right now it passes with or without
      these code changes. But might as well add it now.
      a5610212
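      The kind of plan shape involved can be sketched with a hypothetical
      query (the actual plan depends on the table's distribution key):

      ```sql
      -- DISTINCT may add a Redistribute Motion plus an aggregate, after
      -- which the result is no longer on a single segment, so the ORDER BY
      -- still needs a final Gather Motion to the master:
      EXPLAIN SELECT DISTINCT a FROM t ORDER BY a;
      ```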
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test, to a new 'gp_aggregates' test. But I forgot to add the new test file
      to the test schedule, so it was not run. Oops. Add it to the schedule now.
      The tests in 'gp_aggregates' cover this bug.
      898ced7c
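      The broken case was of this general shape (a hypothetical sketch):

      ```sql
      -- DISTINCT on top of a window function's output; planning this
      -- correctly requires distinct_pathkeys to be set:
      SELECT DISTINCT rank() OVER (ORDER BY b) FROM t;
      ```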
    • Add MMXLOG_REMOVE_HEAP_FILE and MMXLOG_REMOVE_APPENDONLY_FILE xlog records. · 0b29315a
      Committed by Abhijit Subramanya
      The redo code for mmxlog records looks at the segment file number of the files
      to be removed in order to decide whether to call the remove method for heap or
      appendonly. If the segment file number is zero, it assumes a heap file and
      calls smgrdounlink(). This is incorrect in the case of appendonly files,
      since smgrdounlink() will end up deleting all the files, including .1, .2 etc.
      
      This patch introduces two new mmxlog record types and removes the old record
      type for removing files so that the redo code can call the correct remove
      method for heap and appendonly files.
      0b29315a
    • Make atmsort recognize an ORDER BY in an OVER or WITHIN GROUP construct. · 7bc5d005
      Committed by Heikki Linnakangas
      This silences false regression failures caused by row order differences in
      some queries that use those constructs. While we don't have such false
      failures in the regression suite right now, the upcoming rewrite of the
      window functions implementation will change the plans for some queries, and
      would cause some.
      7bc5d005
    • Make test case predictable, by not using random() to produce test data. · fde41115
      Committed by Heikki Linnakangas
      Currently, all the test queries on this data error out, which is why it
      hasn't mattered. But we're about to make the queries work, in the window
      functions rewrite.
      fde41115
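      A common way to make such data deterministic (a generic sketch, not the
      actual test change) is to derive it from generate_series() instead of
      random():

      ```sql
      -- Every run produces identical rows:
      INSERT INTO t SELECT i, (i * 37) % 100 FROM generate_series(1, 1000) i;
      ```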
    • Change the OIDs of a few GPDB-added functions. · d324ea8e
      Committed by Heikki Linnakangas
      To avoid clashing with upstream functions with the same OIDs. We are about
      to merge the Window Functions patch from upstream, and the new window
      functions are clashing with these.
      d324ea8e
    • Update QueryFinishPending test for SharedInputScan · 27f58dd0
      Committed by Dhanashree Kashid
      The intent of these tests is to test fault injection for plans with a
      SharedInputScan over a Sort. While the planner currently generates a
      SharedInputScan for the window functions, after the window functions
      rewrite we're working on, it will not.
      
      To fulfill the intent, change the query to use a WITH clause and ORDER BY
      such that the planner generates a SharedInputScan over a Sort node.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      27f58dd0
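      The reworked query shape can be sketched like this (hypothetical names;
      the WITH clause is scanned twice so the planner can share one sorted
      input between the consumers):

      ```sql
      WITH s AS (SELECT a, b FROM t ORDER BY a)
      SELECT * FROM s
      UNION ALL
      SELECT * FROM s;
      ```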
  3. 22 Nov 2017 (12 commits)
  4. 21 Nov 2017 (10 commits)
    • Remove unnecessary serial columns from a few upstream tests. · ea42a1cf
      Committed by Heikki Linnakangas
      These had been added in GPDB a long time ago, presumably to make the test
      output repeatable. But they're not needed anymore. Remove them to match
      the upstream more closely, which helps when merging.
      ea42a1cf
    • cd60609f
    • Fix race condition between DROP IF EXISTS and RENAME again. · 4dc999fb
      Committed by Heikki Linnakangas
      If a DROP IF EXISTS is run concurrently with ALTER TABLE RENAME TO, it's
      possible that the table gets dropped on master, but not on the segments.
      This was fixed earlier already, by commit 4739cd22, but the fix got lost
      during the 8.4 merge. Fix it again, in the same fashion, and also add
      a test case so that it won't get lost again, at least not in exactly the
      same manner.
      
      Spotted by @kuien, github issue #3874. Thanks to @tvesely for the test
      case.
      4dc999fb
    • Supporting Join Optimization Levels in GPORCA · c8192690
      Committed by Bhuvnesh Chaudhary
      The concept of optimization levels is known in many enterprise
      optimizers. It enables the user to control the degree of optimization
      that is being employed. Optimization levels allow the grouping of
      transformations into bags of rules (where each is assigned a particular
      level). By default all rules are applied, but users who want to apply
      fewer rules are able to. They make this decision based on domain
      knowledge, knowing that even with fewer rules applied, the generated
      plan satisfies their needs.
      
      The Cascades optimizer, on which GPORCA is based, allows grouping of
      transformation rules into optimization levels. This concept has also
      been extended to join ordering, allowing the user to pick the join order
      via the query, use a greedy approach, or use an exhaustive approach.
      
      Postgres-based planners use join_collapse_limit and from_collapse_limit
      to reduce the search space. While the objective of optimization levels
      for joins is also to reduce the search space, the way it is done is
      different: the user requests the optimizer to apply, or not apply, a
      subset of rules, which provides more flexibility. This is one of the
      most frequently requested features from our enterprise clients, who
      have a high degree of domain knowledge.
      
      This PR introduces this concept. In the immediate future we are planning
      to add different polynomial join ordering techniques with guaranteed
      bound as part of the "Greedy" search.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      c8192690
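      In practice this surfaces as a session setting; the GUC name and values
      below are stated as an assumption, not taken from the commit text:

      ```sql
      -- Choose how GPORCA searches join orders for this session:
      SET optimizer_join_order = 'greedy';   -- or 'query' / 'exhaustive'
      SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id;
      ```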
    • Refactor dynamic index scans and bitmap scans, to reduce diff vs. upstream. · 198f701e
      Committed by Heikki Linnakangas
      Much of the code and structs used by index scans and bitmap index scans had
      been fused together and refactored in GPDB, to share code between dynamic
      index scans and regular ones. However, it would be nice to keep upstream
      code unchanged as much as possible. To that end, refactor the executor code
      for dynamic index scans and dynamic bitmap index scans, to reduce the diff
      vs upstream.
      
      The Dynamic Index Scan executor node is now a thin wrapper around the
      regular Index Scan node, even thinner than before. When a new Dynamic Index
      Scan node is initialized, we don't do much at that point. When the scan
      begins, we initialize an Index Scan node for the first partition, and
      return rows from it until it's exhausted. On next call, the underlying
      Index Scan is destroyed, and a new Index Scan node is created, for the next
      partition, and so on. Creating and destroying the IndexScanState for every
      partition adds some overhead, but it's not significant compared to all the
      other overhead of opening and closing the relations, building scan keys
      etc.
      
      Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper
      for regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is
      called, it initializes a BitmapIndexScanState for the current partition,
      and calls it. On ReScan, the BitmapIndexScan executor node for the old
      partition is shut down. A Dynamic Bitmap Index Scan differs from Dynamic
      Index Scan in that a Dynamic Index Scan is responsible for iterating
      through all the active partitions, while a Dynamic Bitmap Index Scan works
      as a slave for the Dynamic Bitmap Heap Scan node above it.
      
      It'd be nice to do a similar refactoring for heap scans, but that's for
      another day.
      198f701e
    • Oops, missed a comma. · 23feb165
      Committed by Heikki Linnakangas
      23feb165
    • Don't bother to update distributed clog on sub-commit. · 21ff60e0
      Committed by Heikki Linnakangas
      The distributed clog, like normal clog, will not be consulted for
      subtransactions that are part of a still-in-progress transaction, so there
      is no need to update it until we're ready to commit the top transaction.
      This is basically the same change that was done in the upstream for clog
      in commit 06da3c57. We're about to merge that change from the upstream as
      part of the PostgreSQL 8.4 merge, but we can make that change for the
      distributed log separately, to keep the actual merge commit smaller.
      21ff60e0
    • Fix indentation in smgr code. · 14cb19b4
      Committed by Heikki Linnakangas
      By running pgindent, and tidying a bunch of things manually.
      14cb19b4
    • Move some GPDB-specific code out of smgr.c and md.c. · 306b189d
      Committed by Heikki Linnakangas
      For clarity, and to make merging easier.
      
      The code to manage the hash table of "pending resync EOFs" for append-only
      tables is moved to smgr_ao.c. One notable change here is that the
      pendingDeletesPerformed flag is removed. It was used to track whether there
      are any pending deletes, or any pending AO table resyncs, but we might as
      well check the pending delete list and the pending syncs hash table
      directly; it's hardly any slower than checking a separate boolean.
      
      There are still plenty of GPDB changes in smgr.c, but this is a good step
      forward.
      306b189d
    • Remove unnecessary ORDER BYs from upstream tests. · cc6b462b
      Committed by Heikki Linnakangas
      These were added in GPDB a long time ago, probably before gpdiff.pl was
      introduced to mask row order differences in regression test output. But
      now that gpdiff.pl can do that, these are unnecessary. Revert to match
      the upstream more closely.
      
      This includes updates to the 'rules' and 'inherit' tests, although they
      are disabled and still don't pass after these changes.
      cc6b462b
  5. 19 Nov 2017 (1 commit)
    • Use the new 8.3 style when printing Access Privileges in psql. · ef408151
      Committed by Heikki Linnakangas
      A long time ago, when updating our psql version to 8.3 (or something higher),
      we had decided to keep the old single-line style when displaying access
      privileges, to avoid having to update regression tests. It's time to move
      forward, update the tests, and use the nicer 8.3 style for displaying
      access privileges.
      
      Also, \d on a view no longer prints the View Definition. You need to use
      the verbose \d+ option for that. (I'm not a big fan of that change myself:
      when I want to look at a view I'm almost always interested in the View
      Definition. But let's not second-guess decisions made almost 10 years ago
      in the upstream.)
      
      Note: psql still defaults to the "old-ascii" style when printing multi-line
      fields. The new style was introduced only later, in 9.0, so to avoid
      changing all the expected output files, we should stick to the old style
      until we reach that point in the merge. This commit only changes the style
      for Access privileges, which is different from the multi-line style.
      ef408151
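      The visible change in psql can be sketched as (output shapes
      abbreviated; object names invented):

      ```sql
      \d myview     -- no longer prints the View Definition
      \d+ myview    -- the verbose form still includes it
      \dp mytable   -- privileges now shown in the 8.3 style, not the old
                    -- single-line style
      ```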
  6. 17 Nov 2017 (3 commits)
  7. 16 Nov 2017 (2 commits)