Commits · 44367278ae907976ba895bad5f5dc5696b4063ea · Greenplum / Gpdb

30 Nov, 2017 2 commits
- Fix reversed flags to pull_up_clause(). · 44367278
  Committed by Heikki Linnakangas on Nov 30, 2017
```
Looks like you can't actually get here with any aggregates or placeholders
in the start/end offsets, or we would've gotten errors.
```
  44367278
- Remove some code that was duplicated by a mis-merge. · 8b60e44d
  Committed by Heikki Linnakangas on Nov 29, 2017
```
Harmless, but clearly useless and unintentional.
```
  8b60e44d
24 Nov, 2017 11 commits

Backport upstream comment updates · 122e817b

Committed by Heikki Linnakangas on Oct 07, 2017

commit 96f990e2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Jul 13 20:23:09 2011 -0400

    Update some comments to clarify who does what in targetlist creation.

    No code changes; just avoid blaming query_planner for things it doesn't
    really do.

122e817b

Backport upstream bugfix related to Window functions. · 411a033c

Committed by Heikki Linnakangas on Oct 07, 2017

The test case added to the regression suite actually seems to work on
GPDB even without this, but nevertheless seems like a good idea to pick
it now, since we have the code it affected. Also, I'm about to backport
more stuff that depends on this.

commit c1d9579d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue Jul 12 18:23:55 2011 -0400

Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.

Regular aggregate functions in combination with, or within the arguments
of, window functions are OK per spec; they have the semantics that the
aggregate output rows are computed and then we run the window functions
over that row set. (Thus, this combination is not really useful unless
there's a GROUP BY so that more than one aggregate output row is possible.)
The case without GROUP BY could fail, as recently reported by Jeff Davis,
because sloppy construction of the Agg node's targetlist resulted in extra
references to possibly-ungrouped Vars appearing outside the aggregate
function calls themselves. See the added regression test case for an
example.

Fixing this requires modifying the API of flatten_tlist and its underlying
function pull_var_clause. I chose to make pull_var_clause's API for
aggregates identical to what it was already doing for placeholders, since
the useful behaviors turn out to be the same (error, report node as-is, or
recurse into it). I also tightened the error checking in this area a bit:
if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
that was a long time ago, so complain instead of ignoring them.

Backpatch into 9.1. The failure exists in 8.4 and 9.0 as well, but seeing
that it only occurs in a basically-useless corner case, it doesn't seem
worth the risks of changing a function API in a minor release. There might
be third-party code using pull_var_clause.

411a033c
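
The three behaviors named in the message above (error, report the node as-is, or recurse into it) can be illustrated with a toy expression walker. This is a minimal sketch in Python, not the PostgreSQL or GPDB C code: `Behavior`, `Var`, `Aggref`, `OpExpr`, and `pull_vars` are hypothetical stand-ins for the planner's node types and for the flag a caller of pull_var_clause would pass.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


class Behavior(Enum):
    """Hypothetical flag: what to do when an aggregate node is found."""
    REJECT = "reject"    # raise an error
    INCLUDE = "include"  # report the node itself, do not look inside
    RECURSE = "recurse"  # skip the node and collect Vars from its argument


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Aggref:
    func: str
    arg: "Expr"


@dataclass(frozen=True)
class OpExpr:
    op: str
    args: tuple


Expr = Union[Var, Aggref, OpExpr]


def pull_vars(node: Expr, agg_behavior: Behavior) -> List[Expr]:
    """Collect Vars (and possibly Aggrefs) from a toy expression tree."""
    if isinstance(node, Var):
        return [node]
    if isinstance(node, Aggref):
        if agg_behavior is Behavior.REJECT:
            raise ValueError("unexpected aggregate in expression")
        if agg_behavior is Behavior.INCLUDE:
            return [node]
        return pull_vars(node.arg, agg_behavior)
    found: List[Expr] = []
    for arg in node.args:
        found.extend(pull_vars(arg, agg_behavior))
    return found


# Toy tree for the expression sum(a) + b
expr = OpExpr("+", (Aggref("sum", Var("a")), Var("b")))
```

With `Behavior.INCLUDE` the walker reports the `Aggref` and `b`; with `Behavior.RECURSE` it reports `a` and `b`; with `Behavior.REJECT` it raises.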

Cherry-pick change to pull_var_clause() API. · bd3ab7bd

Committed by Heikki Linnakangas on Oct 07, 2017

We would get this later in PostgreSQL 8.4, but I'm about to cherry-pick
more commits now that depend on this.

Upstream commit:

commit 1d97c19a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Sun Apr 19 19:46:33 2009 +0000

    Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
    Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
    term) seems to be to ignore the PlaceHolderVar and examine its argument
    instead.  In support of this, change the API of pull_var_clause() to allow
    callers to request recursion into PlaceHolderVars.  Currently
    estimate_num_groups() is the only customer for that behavior, but where
    there's one there may be others.

bd3ab7bd

Fix assertion failure from calling contain_agg_clause on raw parse tree · 7c0ceea1
Committed by Heikki Linnakangas on Oct 02, 2017
```
It assumes that any SubLinks have been processed already.
```
7c0ceea1

Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a

Committed by Heikki Linnakangas on Sep 29, 2017

This is similar to the old implementation, in that we use "+", "-" to
compute the boundaries.

Unfortunately it seems unlikely that this would be accepted in the
upstream, but at least we have that feature back in GPDB now, the way it
used to be. See discussion on pgsql-hackers about that:
https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us

14a9108a
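
To illustrate what computing the boundaries with "+" and "-" means for a RANGE frame, here is a small sketch in Python. It is an assumption-laden toy, not the GPDB executor: `range_frame` is a hypothetical helper, the ordering keys are plain integers, and the partition is assumed to be sorted ascending already.

```python
from typing import List


def range_frame(keys: List[int], current: int, preceding: int, following: int) -> List[int]:
    """Return the keys that fall into the frame of row `current`.

    Models RANGE BETWEEN `preceding` PRECEDING AND `following` FOLLOWING
    over a partition whose ordering keys are already sorted ascending.
    """
    low = keys[current] - preceding    # lower boundary, computed with "-"
    high = keys[current] + following   # upper boundary, computed with "+"
    return [k for k in keys if low <= k <= high]


keys = [1, 2, 4, 7, 8]
```

For the row with key 4, `range_frame(keys, 2, 2, 3)` covers the value range 2 to 7 and so selects the keys 2, 4 and 7, regardless of how many rows that happens to be.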

Support ordered-set (WITHIN GROUP) aggregates. · fd6212ce

Committed by Heikki Linnakangas on Sep 29, 2017

This is a backport from PostgreSQL 9.4. It brings back functionality that we
lost with the ripout & replace of the window function implementation.

I left out all the code and tests related to COLLATE, because we don't have
that feature. Will need to put that back when we merge collation support, in
9.1.

commit 8d65da1f
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Dec 23 16:11:35 2013 -0500

    Support ordered-set (WITHIN GROUP) aggregates.

    This patch introduces generic support for ordered-set and hypothetical-set
    aggregate functions, as well as implementations of the instances defined in
    SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
    percent_rank(), cume_dist()).  We also added mode() though it is not in the
    spec, as well as versions of percentile_cont() and percentile_disc() that
    can compute multiple percentile values in one pass over the data.

    Unlike the original submission, this patch puts full control of the sorting
    process in the hands of the aggregate's support functions.  To allow the
    support functions to find out how they're supposed to sort, a new API
    function AggGetAggref() is added to nodeAgg.c.  This allows retrieval of
    the aggregate call's Aggref node, which may have other uses beyond the
    immediate need.  There is also support for ordered-set aggregates to
    install cleanup callback functions, so that they can be sure that
    infrastructure such as tuplesort objects gets cleaned up.

    In passing, make some fixes in the recently-added support for variadic
    aggregates, and make some editorial adjustments in the recent FILTER
    additions for aggregates.  Also, simplify use of IsBinaryCoercible() by
    allowing it to succeed whenever the target type is ANY or ANYELEMENT.
    It was inconsistent that it dealt with other polymorphic target types
    but not these.

    Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
    and rather heavily editorialized upon by Tom Lane

Also includes this fixup:

commit cf63c641
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Dec 23 20:24:07 2013 -0500

    Fix portability issue in ordered-set patch.

    Overly compact coding in makeOrderedSetArgs() led to a platform dependency:
    if the compiler chose to execute the subexpressions in the wrong order,
    list_length() might get applied to an already-modified List, giving a
    value we didn't want.  Per buildfarm.

fd6212ce
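
As a rough illustration of what two of the ordered-set aggregates named above compute, the following Python sketch mirrors the semantics of percentile_disc() and percentile_cont() as I understand them from the PostgreSQL documentation. It is not the C implementation; it assumes a non-empty list of numeric inputs with no NULLs, and it sorts eagerly instead of using a tuplesort.

```python
import math
from typing import List


def percentile_disc(fraction: float, values: List[float]) -> float:
    """First value whose position in the ordering reaches `fraction`."""
    ordered = sorted(values)
    # 1-based row number; a fraction of 0 still selects the first row.
    rownum = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rownum - 1]


def percentile_cont(fraction: float, values: List[float]) -> float:
    """Value at `fraction`, interpolating between neighbouring rows."""
    ordered = sorted(values)
    pos = fraction * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


sample = [40, 10, 30, 20]
```

On `sample`, the discrete median picks an actual input value (20), while the continuous median interpolates between 20 and 30 and yields 25.0.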

Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag. · c70617a2

Committed by Heikki Linnakangas on Sep 29, 2017

This is a backport from the following commit from PostgreSQL 9.3. Needed
now, because subsequent backported commits depend on it.

commit 75b39e79
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Jan 21 20:25:26 2013 -0500

Add infrastructure for storing a VARIADIC ANY function's VARIADIC flag.

Originally we didn't bother to mark FuncExprs with any indication whether
VARIADIC had been given in the source text, because there didn't seem to be
any need for it at runtime. However, because we cannot fold a VARIADIC ANY
function's arguments into an array (since they're not necessarily all the
same type), we do actually need that information at runtime if VARIADIC ANY
functions are to respond unsurprisingly to use of the VARIADIC keyword.
Add the missing field, and also fix ruleutils.c so that VARIADIC ANY
function calls are dumped properly.

Extracted from a larger patch that also fixes concat() and format() (the
only two extant VARIADIC ANY functions) to behave properly when VARIADIC is
specified. This portion seems appropriate to review and commit separately.

Pavel Stehule

c70617a2

Centralize the logic for detecting misplaced aggregates, window funcs, etc. · fc8f849d

Committed by Heikki Linnakangas on Sep 29, 2017

This cherry-picks the following commit. This is needed because subsequent
commits depend on this one.

I took the EXPR_KIND_PARTITION_EXPRESSION value from PostgreSQL v10, where
it's also for partition-related things. Seems like a good idea, even though
our partitioning implementation is completely different.

commit eaccfded
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Aug 10 11:35:33 2012 -0400

Centralize the logic for detecting misplaced aggregates, window funcs, etc.

Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree. To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed. This allows removal of a large number of ad-hoc
checks scattered around the code base. The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.

Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.

Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.

In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone. (I didn't risk actually removing said dead code, though.)

Author: Heikki Linnakangas <hlinnakangas@pivotal.io>
Author: Ekta Khanna <ekhanna@pivotal.io>

fc8f849d
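
The idea of recording the expression kind in the parse state and checking it at the point where an aggregate is encountered can be sketched as follows. This is a simplified Python illustration, not the parser's C code: the `ExprKind` members are a hypothetical subset, and the error texts and the `transform_aggregate_call` helper are illustrative only.

```python
from enum import Enum, auto


class ExprKind(Enum):
    """Hypothetical subset of expression contexts; the real enum is larger."""
    SELECT_TARGET = auto()
    WHERE = auto()
    INDEX_EXPRESSION = auto()


# Contexts in which an aggregate call is rejected, with the message to raise.
AGG_NOT_ALLOWED = {
    ExprKind.WHERE: "aggregate functions are not allowed in WHERE",
    ExprKind.INDEX_EXPRESSION: "aggregate functions are not allowed in index expressions",
}


class ParseState:
    def __init__(self, expr_kind: ExprKind):
        # Records what kind of expression is currently being parsed.
        self.expr_kind = expr_kind


def transform_aggregate_call(pstate: ParseState, func_name: str) -> str:
    """Check the context once, right where the aggregate is seen."""
    message = AGG_NOT_ALLOWED.get(pstate.expr_kind)
    if message is not None:
        raise ValueError(message)
    return f"Aggref({func_name})"
```

Because the check lives in one place, no separate pass over the finished parse tree is needed to find a misplaced aggregate, and each context gets its own specific message.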

Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb

Committed by Heikki Linnakangas on Sep 24, 2017

This is functionality that was lost by the ripout & replace.

commit 34d26872
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Dec 15 17:57:48 2009 +0000

    Support ORDER BY within aggregate function calls, at long last providing a
    non-kluge method for controlling the order in which values are fed to an
    aggregate function.  At the same time eliminate the old implementation
    restriction that DISTINCT was only supported for single-argument aggregates.

    Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
    dropped null values of x unconditionally.  Now, it does so only if the
    agg transition function is strict; otherwise nulls are treated as DISTINCT
    normally would, ie, you get one copy.

    Andrew Gierth, reviewed by Hitoshi Harada

4319b7bb
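
The behavioral change for agg(DISTINCT x) described above can be pictured with a toy filter over the aggregate's inputs. This Python sketch is an assumption-based illustration only: `distinct_inputs` is a hypothetical helper, `None` stands in for SQL NULL, and the real executor deduplicates via sorting instead of a linear scan.

```python
from typing import List, Optional


def distinct_inputs(values: List[Optional[int]], transfn_is_strict: bool) -> List[Optional[int]]:
    """Return the values that would be fed to the transition function."""
    seen: List[Optional[int]] = []
    for value in values:
        if value is None and transfn_is_strict:
            continue  # a strict transition function never sees NULLs
        if value not in seen:
            seen.append(value)  # NULL is kept once, like any other value
    return seen
```

With a strict transition function the NULLs disappear; with a non-strict one exactly one NULL is passed through.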

Remove PercentileExpr. · bb6a757e

Committed by Heikki Linnakangas on Sep 22, 2017

This loses the functionality, and leaves all the regression tests that used
those functions failing.

The plan is to later backport the upstream implementation of those
functions from PostgreSQL 9.4. The feature is called "ordered set
aggregates" there.

bb6a757e

Wholesale rip out and replace Window planner and executor code · f62bd1c6

Committed by Heikki Linnakangas on Nov 23, 2017

This adds some limitations, and removes some functionality that the old
implementation had. These limitations will be lifted, and missing
functionality will be added back, in subsequent commits:

* You can no longer have variables in start/end offsets

* RANGE is not implemented (except for UNBOUNDED)

* If you have multiple window functions that require a different sort
  ordering, the planner is not smart about placing them in a way that
  minimizes the number of sorts.

This also lifts some limitations that the GPDB implementation had:

* LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a lot of
  queries that used to throw a "ROWS parameter cannot be negative" error
  are now passing. That error was an artifact of the way LEAD/LAG were
  implemented. Those queries contain window function calls like "LEAD(col1,
  col2 - col3)", and sometimes, with suitable values in col2 and col3, the
  second argument went negative. That caused the error. The new implementation
  of LEAD/LAG is OK with a negative argument.

* Aggregate functions with no prelimfn or invprelimfn are now supported as
  window functions

* Window functions, e.g. rank(), no longer require an ORDER BY. (The output
  will vary from one invocation to another, though, because the order is
  then not well defined. This is more annoying on GPDB than on PostgreSQL,
  because in GPDB the row order tends to vary because the rows are spread
  out across the cluster and will arrive in the master in unpredictable
  order)

* NTILE doesn't require the argument expression to be in PARTITION BY

* A window function's arguments may contain references to an outer query.

This changes the OIDs of the built-in window functions to match upstream.
Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
until those hard-coded values are fixed in ORCA, the ORCA translator code
contains a hack to map the old OID to the new ones.

f62bd1c6
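
The point about negative LEAD/LAG offsets in the list above can be made concrete with a toy version of LEAD over one partition. This is a minimal Python sketch under stated assumptions: `lead` is a hypothetical helper, the partition is a plain list, and `None` plays the role of the SQL default (NULL).

```python
from typing import List, Optional


def lead(rows: List[int], index: int, offset: int, default: Optional[int] = None) -> Optional[int]:
    """Value `offset` rows after row `index`, or `default` if out of range."""
    target = index + offset
    # Explicit bounds check, so a negative target never wraps around.
    if 0 <= target < len(rows):
        return rows[target]
    return default


partition = [10, 20, 30]
```

A negative offset simply looks backwards, so `lead(partition, 1, -1)` returns the same value a LAG by one row would, and an offset that leaves the partition falls back to the default.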

23 Nov, 2017 2 commits

Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212

Committed by Heikki Linnakangas on Nov 23, 2017

The old logic was:

1. Decide if we need to put a Gather motion on top of the plan
2. Add nodes to handle DISTINCT
3. Add nodes to handle ORDER BY.
4. Add Gather node, if we decided so in step 1.

In step 1, if the result was already focused on a single segment, we
would make note that no Gather is needed, and not add one in step 4.
However, the DISTINCT processing might add a Redistribute Motion node, so
that the final result is not focused on a single node.

I couldn't come up with a query where that would happen, as the code stands,
but we saw such a case on the "window functions rewrite" branch we've been
working on. There, the sort order/distribution of the input can be changed
to process window functions. But even if this isn't actively broken right
now, it seems more robust to change the logic so that 'must_gather' means
'at the end, the result must end up on a single node', instead of 'we must
add a Gather node'. The test that this adds exercises this issue after the
window functions rewrite, but right now it passes with or without these
code changes. But might as well add it now.

a5610212
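
The revised meaning of 'must_gather' (the result must end up on a single node at the end) can be sketched as a late decision made from the plan's final locus. The Python below is a toy model, not the planner code: `finish_plan`, the node names as strings, and the two-valued locus are all assumptions made for illustration.

```python
from typing import List, Tuple


def finish_plan(locus: str, must_gather: bool, distinct_needs_redistribute: bool) -> Tuple[List[str], str]:
    """Return the nodes stacked on top of the input plan and the final locus.

    `locus` is "single" when all rows already sit on one node, otherwise
    "distributed".
    """
    nodes: List[str] = []
    if distinct_needs_redistribute:
        nodes.append("Redistribute Motion")
        locus = "distributed"
    nodes.append("Unique")  # DISTINCT
    nodes.append("Sort")    # ORDER BY
    # Decide about the Gather only now, from where the rows actually are.
    if must_gather and locus != "single":
        nodes.append("Gather Motion")
        locus = "single"
    return nodes, locus
```

In this model an input that starts out on a single node still gets a Gather Motion when DISTINCT processing has redistributed the rows, which is the case the old four-step logic could miss.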

Fix DISTINCT with window functions. · 898ced7c

Committed by Heikki Linnakangas on Nov 23, 2017

The last 8.4 merge commit introduced support for DISTINCT with hashing,
and refactored the way grouping_planner() works with the path keys. That
broke DISTINCT with window functions, because the new distinct_pathkeys
field was not set correctly.

In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
test, to a new 'gp_aggregates' test. But I forgot to add the new test file
to the test schedule, so it was not run. Oops. Add it to the schedule now.
The tests in 'gp_aggregates' cover this bug.

898ced7c

21 Nov, 2017 1 commit

Refactor dynamic index scans and bitmap scans, to reduce diff vs. upstream. · 198f701e

Committed by Heikki Linnakangas on Nov 20, 2017

Much of the code and structs used by index scans and bitmap index scans had
been fused together and refactored in GPDB, to share code between dynamic
index scans and regular ones. However, it would be nice to keep upstream
code unchanged as much as possible. To that end, refactor the executor code
for dynamic index scans and dynamic bitmap index scans, to reduce the diff
vs upstream.

The Dynamic Index Scan executor node is now a thin wrapper around the
regular Index Scan node, even thinner than before. When a new Dynamic Index
Scan begins, we don't do much initialization at that point. When the scan
begins, we initialize an Index Scan node for the first partition, and
return rows from it until it's exhausted. On next call, the underlying
Index Scan is destroyed, and a new Index Scan node is created, for the next
partition, and so on. Creating and destroying the IndexScanState for every
partition adds some overhead, but it's not significant compared to all the
other overhead of opening and closing the relations, building scan keys
etc.

Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper
for regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is
called, it initializes a BitmapIndexScanState for the current partition,
and calls it. On ReScan, the BitmapIndexScan executor node for the old
partition is shut down. A Dynamic Bitmap Index Scan differs from Dynamic
Index Scan in that a Dynamic Index Scan is responsible for iterating
through all the active partitions, while a Dynamic Bitmap Index Scan works
as a slave for the Dynamic Bitmap Heap Scan node above it.

It'd be nice to do a similar refactoring for heap scans, but that's for
another day.

198f701e
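
The "thin wrapper" structure described above for the Dynamic Index Scan, with one short-lived regular scan per partition, can be sketched with a generator. This is a Python illustration under assumptions, not the executor code: `IndexScan` and `dynamic_index_scan` are hypothetical names, partitions are plain lists, and rows are integers so that `None` can signal exhaustion.

```python
from typing import Iterable, Iterator, List, Optional


class IndexScan:
    """Toy stand-in for a regular Index Scan over one partition."""

    def __init__(self, partition_rows: Iterable[int]):
        self._rows = iter(partition_rows)

    def next_row(self) -> Optional[int]:
        # Returns None once the partition is exhausted.
        return next(self._rows, None)


def dynamic_index_scan(partitions: List[List[int]]) -> Iterator[int]:
    """Thin wrapper: one short-lived IndexScan per partition, in order."""
    for partition_rows in partitions:
        scan = IndexScan(partition_rows)  # built when this partition starts
        row = scan.next_row()
        while row is not None:
            yield row
            row = scan.next_row()
        # `scan` is dropped here; the next iteration builds a fresh one.
```

The wrapper itself holds no scan state beyond the current partition, which is the property that keeps the regular scan code unchanged.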

11 Nov, 2017 1 commit

Align simplify_EXISTS_query with upstream · c823e7c6

Committed by Dhanashree Kashid on Oct 09, 2017

This function had diverged a lot from upstream after the subselect merge.
One of the main reasons is that upstream has a lot of restrictive checks
that prevent pull-up of EXISTS/NOT EXISTS. GPDB handles them
differently, producing a join/initplan or a one-time filter.

The cases that GPDB handles and for which we have not ported the checks
from upstream are as follows:

- AGG with limit count with/without offset
- HAVING clause without AGG
- AGG without HAVING clause

For other conditions, we bail out as upstream does. Hence the checks for
HAVING and aggregates inside simplify_EXISTS_query differ from upstream.
The rest of the code is similar to upstream.
Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>

c823e7c6

30 Oct, 2017 1 commit
- Retire gp_libpq_fe part 2, changing including path · 974c414e
  Committed by Adam Lee on Oct 23, 2017
```
Signed-off-by: Adam Lee <ali@pivotal.io>
```
  974c414e
21 Oct, 2017 1 commit

Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72

Committed by Heikki Linnakangas on Oct 20, 2017

If a CREATE TABLE AS query contained an ORDER BY, the planner put a Motion
node on top of the plan that focuses all the rows to a single node.
However, that was confused with the redistribute Motion that CREATE TABLE
AS is supposed to put on top, to distribute the rows according to
the DISTRIBUTED BY of the table. This used to work before commit
7e268107, because we used to not add an explicit Motion node on top of
the plan for ORDER BY, but we just changed the sort-order information in
the Flow.

I have a nagging feeling that the apply_motion code isn't dealing with
Motion on top of a Motion node correctly, because I would've expected to
get a plan like that without this fix. Perhaps apply_motion silently
refuses to add a Motion node on top of an existing Motion? That'd be a
silly plan, of course, and fortunately the planner doesn't create such
plans, so I'm not going to dig deeper into that right now.

The test case is a simplified version from one of the
"mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
those tests over from TINC to the main suite. We only run those tests
in the concourse pipeline with "set optimizer=on", so it didn't catch
this issue with optimizer=off.

Fixes github issue #3577.

c159ec72

20 Oct, 2017 1 commit

Refactor to use AOCS totalbyte code · 70feac1c

Committed by Daniel Gustafsson on Oct 20, 2017

We had code for getting the total bytes consumed by an AOCS table,
as well as other code implementing the very same thing, which called for a
common function. Extend GetAOCSTotalBytes() to deal with uncompressed or
compressed data and refactor callsites to use this function.

This also removes the need for a memory allocation in the codepaths
which only want to know the number of bytes.

70feac1c

17 Oct, 2017 1 commit

Fix assertion failure if a window function's PARTITION BY is constant. · e49c5e70

Committed by Heikki Linnakangas on Oct 17, 2017

If a window function had a PARTITION BY clause, but the planner was able
to deduce that it's a constant at runtime, we would still try to distribute
the rows according to the non-existent hash expression. Creating a hash
locus with no hash expressions tripped an assertion.

Fixes github issues #3423 and #3446. Backpatch to 5X_STABLE.

e49c5e70

13 Oct, 2017 1 commit

Remove superfluous pathkey canonicalization · 7913e231

Committed by Jesse Zhang on Oct 12, 2017

`make_pathkeys_for_sortclauses` with a `true` last argument promises to
canonicalize the returned path keys. We somehow cargo-culted a few
unnecessary `canonicalize_pathkeys` calls right after those.

This commit removes such superfluous calls to `canonicalize_pathkeys`.
Signed-off-by: Max Yang <myang@pivotal.io>

7913e231

12 Oct, 2017 1 commit
- Fix planner bug in handling LIMIT without an ORDER BY. · d325233c
  Committed by Heikki Linnakangas on Oct 12, 2017
```
This bug was introduced in commit 7e268107, which changed the way we
track the "current" ordering in the planner.
```
  d325233c
11 Oct, 2017 1 commit

Fix crash with a ROLLUP query. · 281ad5e9

Committed by Heikki Linnakangas on Oct 11, 2017

This was broken by commit 7e268107, which refactored the code that deals
with path keys and sorts in plangroupext.c. The new function
make_sort_from_pathkeys_and_groupingcol(), which replaced the old
make_sort_from_reordered_groupcols() function, didn't work quite the same
as the old function. I'm not sure what exactly went wrong there, but the
caller already has the column number and operator information at hand, so
we can use it to construct the Sort directly, without trying to re-find the
original target list entries of the sort columns.

Commit 7e268107 also neglected the comments in
make_sort_from_pathkeys_and_groupingcol(), but this commit removes the whole
function.

Fixes github issue #3447.

281ad5e9

10 Oct, 2017 1 commit

Hide the two tuplesort implementations behind a common facade. · bbf40a8c

Committed by Heikki Linnakangas on Oct 10, 2017

We have two implementations of tuplesort: the "regular" one inherited
from upstream, in tuplesort.c, and a GPDB-specific tuplesort_mk.c. We had
modified all the callers to check the gp_enable_mk_sort GUC, and deal with
both of them. However, that makes merging with upstream difficult, and
litters the code with the boilerplate to check the GUC and call one of
the two implementations.

Simplify the callers, by providing a single API that hides the two
implementations from the rest of the system. The API is the tuplesort_*
functions, as in upstream. This requires some preprocessor trickery,
so that tuplesort.c can use the tuplesort_* function names as is, but in
the rest of the codebase, calling tuplesort_*() will call a "switcheroo"
function that decides which implementation to actually call. While this
is more lines of code overall, it keeps all the ugliness confined in
tuplesort.h, not littered throughout the codebase.

bbf40a8c
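
The facade described above, where callers use one API and a "switcheroo" function picks the implementation, can be sketched as follows. This Python version only illustrates the dispatch idea; the real mechanism relies on C preprocessor macros in tuplesort.h. The `settings` dictionary is a hypothetical stand-in for the gp_enable_mk_sort GUC, and both toy implementations simply call `sorted`.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the gp_enable_mk_sort GUC.
settings: Dict[str, bool] = {"gp_enable_mk_sort": True}


def _sort_regular(rows: List[int]) -> Tuple[str, List[int]]:
    """Toy version of the implementation inherited from upstream."""
    return "tuplesort.c", sorted(rows)


def _sort_mk(rows: List[int]) -> Tuple[str, List[int]]:
    """Toy version of the GPDB-specific implementation."""
    return "tuplesort_mk.c", sorted(rows)


def tuplesort(rows: List[int]) -> Tuple[str, List[int]]:
    """The only entry point callers use; it picks the implementation."""
    impl = _sort_mk if settings["gp_enable_mk_sort"] else _sort_regular
    return impl(rows)
```

Callers never test the setting themselves; flipping `settings["gp_enable_mk_sort"]` changes which implementation runs without touching any call site.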

30 Sep, 2017 1 commit

Don't transform large array Const into ArrayExpr for Orca (#3406) · 1c82fac2

Committed by Haisheng Yuan on Sep 29, 2017

If the number of elements in the array Const is greater than
optimizer_array_expansion_threshold, return the original Const unmodified.
Expanding such an array would otherwise cause severe performance issues in the
Orca optimizer for arrays with a very large number of elements, e.g. 50K.

Fixes issue #3355
[#151340976]

1c82fac2
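
The threshold rule above amounts to a simple size check before expansion. The Python sketch below is a hedged illustration, not the translator code: `maybe_expand_array_const` is a hypothetical function, nodes are modelled as tagged tuples, and the `threshold` argument stands in for optimizer_array_expansion_threshold.

```python
from typing import List, Tuple


def maybe_expand_array_const(elements: List[int], threshold: int) -> Tuple[str, tuple]:
    """Expand an array constant only when it is small enough."""
    if len(elements) > threshold:
        # Too large: keep the single array constant as it is.
        return ("Const", tuple(elements))
    # Small enough: expand into one constant per element.
    return ("ArrayExpr", tuple(("Const", element) for element in elements))
```

An array with exactly `threshold` elements is still expanded in this sketch, since the message says the original is returned only when the count is greater than the threshold.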

28 Sep, 2017 1 commit

Fix unintentional fallthrough for SpecialJoinInfo · 98538492

Committed by Ekta Khanna on Sep 27, 2017

In copyFuncs(), the fallthrough for SpecialJoinInfo was unintentionally
introduced as part of the subselect merge. This commit fixes that and
adds support for new fields in copyFuncs(), outFunc() and equalFunc().

[Ref #151491231]
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

98538492

27 Sep, 2017 13 commits

Disable flattening of IN/EXISTS sublinks inside outer joins · fb1448d0

Committed by Tom Lane on Feb 27, 2009

commit 07b9936a
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Feb 27 23:30:29 2009 +0000

    Temporarily (I hope) disable flattening of IN/EXISTS sublinks that are within
    the ON clause of an outer join.  Doing so is semantically correct but results
    in de-optimizing queries that were structured to take advantage of the sublink
    style of execution, as seen in recent complaint from Kevin Grittner.  Since
    the user can get the other behavior by reorganizing his query, having the
    flattening happen automatically is just a convenience, and that doesn't
    justify breaking existing applications.  Eventually it would be nice to
    re-enable this, but that seems to require a significantly different approach
    to outer joins in the executor.

Added relevant test case.
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

fb1448d0

Don't assume a subquery's output is unique if there's a SRF in its tlist · e7ff3ef1

Committed by Ekta Khanna and Jemish Patel on Sep 18, 2017

Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Jul 8 14:03:32 2014 -0400

    While the x output of "select x from t group by x" can be presumed unique,
    this does not hold for "select x, generate_series(1,10) from t group by x",
    because we may expand the set-returning function after the grouping step.
    (Perhaps that should be re-thought; but considering all the other oddities
    involved with SRFs in targetlists, it seems unlikely we'll change it.)
    Put a check in query_is_distinct_for() so it's not fooled by such cases.

    Back-patch to all supported branches.

    David Rowley

(cherry picked from commit 2e7469dc8b3bac4fe0f9bd042aaf802132efde85)

e7ff3ef1
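
Why a set-returning function in the target list breaks the uniqueness of a grouped column can be shown with a toy evaluation of "select x, generate_series(1,n) from t group by x". The Python below is an illustrative sketch only: `group_by_x_with_series` is a hypothetical helper that models grouping followed by set-returning-function expansion.

```python
from typing import List, Tuple


def group_by_x_with_series(xs: List[int], series_len: int) -> List[Tuple[int, int]]:
    """Model GROUP BY x followed by expansion of generate_series(1, n)."""
    grouped = sorted(set(xs))  # GROUP BY x: one row per distinct x
    # The set-returning function is expanded after grouping, so every
    # grouped row turns into `series_len` output rows.
    return [(x, i) for x in grouped for i in range(1, series_len + 1)]


rows = group_by_x_with_series([1, 1, 2], 3)
x_column = [x for x, _ in rows]
```

Here `rows` has six entries while `x_column` contains only two distinct values, so the x output is no longer unique even though the query groups by x.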

Fix possible crash with nested SubLinks. · cb7e418d

Committed by Ekta Khanna on Sep 13, 2017

Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Tue Dec 10 16:10:36 2013 -0500

An expression such as WHERE (... x IN (SELECT ...) ...) IN (SELECT ...)
could produce an invalid plan that results in a crash at execution time,
if the planner attempts to flatten the outer IN into a semi-join.
This happens because convert_testexpr() was not expecting any nested
SubLinks and would wrongly replace any PARAM_SUBLINK Params belonging
to the inner SubLink.  (I think the comment denying that this case could
happen was wrong when written; it's certainly been wrong for quite a long
time, since very early versions of the semijoin flattening logic.)

Per report from Teodor Sigaev.  Back-patch to all supported branches.

(cherry picked from commit 884c6384a2db34f6a65573e6bfd4b71dfba0de90)

cb7e418d

Fix planner's handling of outer PlaceHolderVars within subqueries. · 45cbf64a

Committed by Ekta Khanna on Sep 12, 2017

commit 0a0ca1cb18a34e92ab549df171e174dcce7bf7a3
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat Mar 24 16:22:00 2012 -0400

Fix planner's handling of outer PlaceHolderVars within subqueries.

For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible. When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.

In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup. It looks like everyplace else that touches
PlaceHolderVars got it right, though.

Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.

45cbf64a

Remove is_simple_subquery() check in simplify_EXISTS_query() · 77f804f5

Committed by Shreedhar Hardikar on Sep 07, 2017

GPDB handles a lot of the cases that are restricted by
is_simple_subquery; the restrictions that are not handled are checked for
separately in convert_EXISTS_sublink_to_join().

Resulting from cascading ICG failures, we also fixed the following:

- initialize all the members of IncrementVarSublevelsUp_context
  properly.
- remove incorrect assertions brought in from upstream. In GPDB, these
  cases are handled.
- improve plans for NOT EXISTS sub-queries containing an aggregation
  without limits by creating a "false" one-time filter.
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

77f804f5

Remove dead code around JoinExpr::subqfromlist. · f16deabd

Committed by Shreedhar Hardikar on Sep 06, 2017

This was used to keep information about the subquery join tree for
pulled-up sublinks for use later in deconstruct_recurse(). With the
upstream subselect merge, a JoinExpr is constructed at pull-up time
itself, so this is no longer needed since the subquery join tree
information is available in the constructed JoinExpr.

Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
However, since GPDB differs from upstream by treating SEMI joins as
INNER join for internal join planning, this commit also updates
inner_join_rels correctly for SEMI joins (see regression test).

Also remove unused function declaration for not_null_inner_vars().
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

f16deabd

Do not commute inner/outer rels of JOIN_ANTI and JOIN_LASJ_NOTIN · 70be2285

Committed by Shreedhar Hardikar on Sep 07, 2017

This issue was discovered during the subselect merge, wherein the planner
incorrectly commutes anti joins.
`cdb_add_subquery_join_paths()` creates join paths for (rel1, rel2) and
(rel2, rel1) for all join types including JOIN_ANTI and JOIN_LASJ_NOTIN.
This produces wrong results since these joins are order-sensitive w.r.t
inner and outer relations (see new regression tests). So, do not add
(rel2, rel1) for JOIN_ANTI and JOIN_LASJ_NOTIN.

This commit also refactors cdb_add_subquery_join_paths() and
make_join_rel() to make it easier to control the commuting.
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

70be2285
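
The order sensitivity of anti joins mentioned above is easy to see on two small inputs. This is a minimal Python sketch, assuming single-column relations represented as lists and a hypothetical `anti_join` helper; it says nothing about how GPDB actually builds the join paths.

```python
from typing import List


def anti_join(outer: List[int], inner: List[int]) -> List[int]:
    """Return the outer rows that have no match in inner."""
    inner_keys = set(inner)
    return [row for row in outer if row not in inner_keys]


rel1 = [1, 2, 3]
rel2 = [2, 3, 4]
```

`anti_join(rel1, rel2)` keeps only 1, while `anti_join(rel2, rel1)` keeps only 4, so swapping the inner and outer relations changes the result.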

Handle pending merge FIXMEs from merging · cc208986

Committed by Shreedhar Hardikar on Sep 07, 2017

1. convert_IN_to_antijoin() should fail pull-up when left relids are not
   a subset of available_rels, otherwise we get wrong results. See
   regression tests in qp_correlated_query.sql.
2. convert_EXPR_to_join() is a GPDB-only function that already handles
   this case via ProcessSubqueryToJoin().
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

cc208986

Partial cherry-pick up of upstream commit 0dec322. · cdfc5616

Committed by Shreedhar Hardikar on Sep 07, 2017

commit 0dec3226ee905f94d0b9d6e2f274e13bbcaf5370
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jun 20 14:33:20 2011 -0400

    Fix thinko in previous patch for optimizing EXISTS-within-EXISTS.

    When recursing after an optimization in pull_up_sublinks_qual_recurse, the
    available_rels value passed down must include only the relations that are
    in the righthand side of the new SEMI or ANTI join; it's incorrect to pull
    up a sub-select that refers to other relations, as seen in the added test
    case.  Per report from BangarRaju Vadapalli.

NOTE: The second part of the upstream commit is not pulled in because that
produces inferior plans in GPDB by not pulling nested sublinks below NOT
EXISTS. That part is reverted later upstream in 9.2 anyway.

Also update regression tests.
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

cdfc5616

Fix pull_up_sublinks' failure to handle nested pull-up opportunities · 40082bd2

Committed by Tom Lane on May 02, 2011

commit f3f0f37068e06d01e88abbf3ed596664b139f7e2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon May 2 15:56:47 2011 -0400

Fix pull_up_sublinks' failure to handle nested pull-up opportunities.

After finding an EXISTS or ANY sub-select that can be converted to a
semi-join or anti-join, we should recurse into the body of the sub-select.
This allows cases such as EXISTS-within-EXISTS to be optimized properly.
The original coding would leave the lower sub-select as a SubLink, which
is no better and often worse than what we can do with a join. Per example
from Wayne Conrad.

Back-patch to 8.4. There is a related issue in older versions' handling
of pull_up_IN_clauses, but they're lame enough anyway about the whole area
that it seems not worth the extra work to try to fix.
Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>

40082bd2

Fix mishandling of whole-row Vars referencing a view or sub-select · 385bb3cb

Committed by Tom Lane on Jun 21, 2010

commit c4ac2ff765d9b68a3ff2a3461804489721770d06
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jun 21 00:14:54 2010 +0000

    Fix mishandling of whole-row Vars referencing a view or sub-select.
    If such a Var appeared within a nested sub-select, we failed to translate it
    correctly during pullup of the view, because the recursive call to
    replace_rte_variables_mutator was looking for the wrong sublevels_up value.
    Bug was introduced during the addition of the PlaceHolderVar mechanism.
    Per bug #5514 from Marcos Castedo.

385bb3cb

Fix an oversight in convert_EXISTS_sublink_to_join · d3ff95a1

Committed by Dhanashree Kashid on Aug 30, 2017

commit dcd647d7cf98e3393f919135f6e113e896781f60
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jan 18 18:17:52 2010 +0000

    Fix an oversight in convert_EXISTS_sublink_to_join: we can't convert an
    EXISTS that contains a WITH clause.  This would usually lead to a
    "could not find CTE" error later in planning, because the WITH wouldn't
    get processed at all.  Noted while playing with an example from Ken Marshall.

d3ff95a1

Move exprType,exprTypmod,expression_tree_walker and related routines · e65f963b

Committed by Tom Lane on Aug 25, 2008

  commit e5536e77
  Author: Tom Lane <tgl@sss.pgh.pa.us>
  Date:   Mon Aug 25 22:42:34 2008 +0000

      Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
      into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
      the backend.  There's probably more that should be done along this line,
      but this is a start anyway
Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>

e65f963b