1. 30 November 2017, 1 commit
  2. 24 November 2017, 7 commits
    • Backport upstream comment updates · 122e817b
      Committed by Heikki Linnakangas
      commit 96f990e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Jul 13 20:23:09 2011 -0400
      
          Update some comments to clarify who does what in targetlist creation.
      
          No code changes; just avoid blaming query_planner for things it doesn't
          really do.
    • Backport upstream bugfix related to Window functions. · 411a033c
      Committed by Heikki Linnakangas
      The test case added to the regression suite actually seems to work on
      GPDB even without this, but it nevertheless seems like a good idea to
      pick it now, since we have the code it affected. Also, I'm about to
      backport more stuff that depends on this.
      
      commit c1d9579d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 12 18:23:55 2011 -0400
      
          Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
      
          Regular aggregate functions in combination with, or within the arguments
          of, window functions are OK per spec; they have the semantics that the
          aggregate output rows are computed and then we run the window functions
          over that row set.  (Thus, this combination is not really useful unless
          there's a GROUP BY so that more than one aggregate output row is possible.)
          The case without GROUP BY could fail, as recently reported by Jeff Davis,
          because sloppy construction of the Agg node's targetlist resulted in extra
          references to possibly-ungrouped Vars appearing outside the aggregate
          function calls themselves.  See the added regression test case for an
          example.
      
          Fixing this requires modifying the API of flatten_tlist and its underlying
          function pull_var_clause.  I chose to make pull_var_clause's API for
          aggregates identical to what it was already doing for placeholders, since
          the useful behaviors turn out to be the same (error, report node as-is, or
          recurse into it).  I also tightened the error checking in this area a bit:
          if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
          that was a long time ago, so complain instead of ignoring them.
      
          Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
          that it only occurs in a basically-useless corner case, it doesn't seem
          worth the risks of changing a function API in a minor release.  There might
          be third-party code using pull_var_clause.
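
      For illustration, a minimal sketch of the failing shape (hypothetical
      table t with column x; not the actual regression case): with no GROUP
      BY, the aggregate produces a single row, and the window function runs
      over that one-row set.

          -- Aggregate within a window function's arguments, without GROUP BY.
          -- Sloppy Agg targetlist construction used to leak ungrouped
          -- references to x outside the aggregate call itself.
          SELECT rank() OVER (ORDER BY sum(x)) FROM t;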
    • Cherry-pick change to pull_var_clause() API. · bd3ab7bd
      Committed by Heikki Linnakangas
      We would get this later with the PostgreSQL 8.4 merge, but I'm about to
      cherry-pick more commits now that depend on this.
      
      Upstream commit:
      
      commit 1d97c19a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 19 19:46:33 2009 +0000
      
          Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
          Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
          term) seems to be to ignore the PlaceHolderVar and examine its argument
          instead.  In support of this, change the API of pull_var_clause() to allow
          callers to request recursion into PlaceHolderVars.  Currently
          estimate_num_groups() is the only customer for that behavior, but where
          there's one there may be others.
    • Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a
      Committed by Heikki Linnakangas
      This is similar to the old implementation, in that we use "+", "-" to
      compute the boundaries.
      
      Unfortunately it seems unlikely that this would be accepted in the
      upstream, but at least we have that feature back in GPDB now, the way it
      used to be. See discussion on pgsql-hackers about that:
      https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
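
      A sketch of what this re-enables (hypothetical table quotes, with
      columns ts and price); the frame boundary is computed with the
      ordering type's "+" and "-" operators:

          SELECT ts,
                 avg(price) OVER (ORDER BY ts
                                  RANGE BETWEEN interval '1 day' PRECEDING
                                        AND CURRENT ROW)
          FROM quotes;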
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
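
      A minimal sketch of the backported syntax (hypothetical table t,
      assuming array_agg is available):

          -- The order in which values are fed to the aggregate is now
          -- well-defined:
          SELECT array_agg(name ORDER BY name) FROM t;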
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a
        lot of queries that used to throw a "ROWS parameter cannot be negative"
        error are now passing. That error was an artifact of the way LEAD/LAG
        were implemented. Those queries contain window function calls like
        "LEAD(col1, col2 - col3)", and sometimes, with suitable values in col2
        and col3, the second argument went negative. That caused the error. The
        new implementation of LEAD/LAG is OK with a negative argument. (See the
        sketch after this entry.)
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The output
        will vary from one invocation to another, though, because the order is
        then not well defined. This is more annoying on GPDB than on PostgreSQL,
        because in GPDB the row order tends to vary as the rows are spread
        out across the cluster and arrive at the master in unpredictable
        order.)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OIDs to the new ones.
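
      The LEAD/LAG sketch mentioned above, with the hypothetical columns from
      the commit message; when col2 - col3 goes negative, this used to error
      out and now does not:

          SELECT lead(col1, col2 - col3) OVER (ORDER BY col1) FROM t;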
  3. 23 November 2017, 2 commits
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      If, in step 1, the result was already focused on a single segment, we
      would note that no Gather is needed, and not add one in step 4.
      However, the DISTINCT processing might add a Redistribute Motion node, so
      that the final result is no longer focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code stands,
      but we saw such a case on the "window functions rewrite" branch we've been
      working on. There, the sort order/distribution of the input can be changed
      to process window functions. But even if this isn't actively broken right
      now, it seems more robust to change the logic so that 'must_gather' means
      'at the end, the result must end up on a single node', instead of 'we must
      add a Gather node'. The test that this adds exercises this issue after
      the window functions rewrite, but right now it passes with or without
      these code changes. But we might as well add it now.
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test to a new 'gp_aggregates' test. But I forgot to add the new test file
      to the test schedule, so it was not run. Oops. Add it to the schedule now.
      The tests in 'gp_aggregates' cover this bug.
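
      The shape this fixes, as a minimal sketch (hypothetical table t):

          -- DISTINCT applied over window function output; planning this
          -- requires distinct_pathkeys to be set correctly.
          SELECT DISTINCT rank() OVER (ORDER BY x) FROM t;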
  4. 21 October 2017, 1 commit
    • Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72
      Committed by Heikki Linnakangas
      If a CREATE TABLE AS query contained an ORDER BY, the planner put a Motion
      node on top of the plan that focuses all the rows on a single node.
      However, that was confused with the redistribute motion that CREATE TABLE
      AS is supposed to put on top, to distribute the rows according to
      the DISTRIBUTED BY of the table. This used to work before commit
      7e268107, because we used to not add an explicit Motion node on top of
      the plan for ORDER BY; we just changed the sort-order information in
      the Flow.
      
      I have a nagging feeling that the apply_motion code isn't dealing with a
      Motion on top of a Motion node correctly, because I would've expected to
      get a plan like that without this fix. Perhaps apply_motion silently
      refuses to add a Motion node on top of an existing Motion? That'd be a
      silly plan, of course, and fortunately the planner doesn't create such
      plans, so I'm not going to dig deeper into that right now.
      
      The test case is a simplified version of one of the
      "mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
      those tests over from TINC to the main suite. We only run those tests
      in the concourse pipeline with "set optimizer=on", so it didn't catch
      this issue with optimizer=off.
      
      Fixes github issue #3577.
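
      A minimal sketch of the affected statement shape (hypothetical tables);
      the ORDER BY wants a focusing Motion, while the DISTRIBUTED BY wants a
      redistributing Motion on top:

          CREATE TABLE sorted_copy AS
              SELECT * FROM src ORDER BY b
          DISTRIBUTED BY (a);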
  5. 13 October 2017, 1 commit
    • Remove superfluous pathkey canonicalization · 7913e231
      Committed by Jesse Zhang
      `make_pathkeys_for_sortclauses` with a `true` last argument promises to
      canonicalize the returned path keys. We somehow cargo-culted a few
      unnecessary `canonicalize_pathkeys` calls immediately after those calls.
      
      This commit removes such superfluous calls to `canonicalize_pathkeys`.
      Signed-off-by: Max Yang <myang@pivotal.io>
  6. 12 October 2017, 1 commit
  7. 27 September 2017, 8 commits
    • Remove dead code around JoinExpr::subqfromlist. · f16deabd
      Committed by Shreedhar Hardikar
      This was used to keep information about the subquery join tree for
      pulled-up sublinks, for use later in deconstruct_recurse().  With the
      upstream subselect merge, a JoinExpr is constructed at pull-up time
      itself, so this is no longer needed, since the subquery join tree
      information is available in the constructed JoinExpr.
      
      Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
      However, since GPDB differs from upstream by treating SEMI joins as
      INNER joins for internal join planning, this commit also updates
      inner_join_rels correctly for SEMI joins (see regression test).
      
      Also remove unused function declaration for not_null_inner_vars().
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Improve pull_up_subqueries logic w.r.t PlaceHolderVar · da29e67a
      Committed by Ekta Khanna
      commit c59d8dd4
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Apr 28 21:31:16 2009 +0000
      
          Improve pull_up_subqueries logic so that it doesn't insert unnecessary
          PlaceHolderVar nodes in join quals appearing in or below the lowest
          outer join that could null the subquery being pulled up.  This improves
          the planner's ability to recognize constant join quals, and probably
          helps with detection of common sort keys (equivalence classes) as well.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Refrain from creating the planner's placeholder_list · 695c9fdf
      Committed by Ekta Khanna
      commit 31468d05
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Oct 22 20:17:52 2008 +0000
      
          Dept of better ideas: refrain from creating the planner's placeholder_list
          until vars are distributed to rels during query_planner() startup.  We don't
          really need it before that, and not building it early has some advantages.
          First, we don't need to put it through the various preprocessing steps, which
          saves some cycles and eliminates the need for a number of routines to support
          PlaceHolderInfo nodes at all.  Second, this means one less unused plan for any
          sub-SELECT appearing in a placeholder's expression, since we don't build
          placeholder_list until after sublink expansion is complete.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Add a concept of "placeholder" variables to the planner · 2b5c8201
      Committed by Bhuvnesh Chaudhary
      commit e6ae3b5d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Oct 21 20:42:53 2008 +0000
      
          Add a concept of "placeholder" variables to the planner.  These are variables
          that represent some expression that we desire to compute below the top level
          of the plan, and then let that value "bubble up" as though it were a plain
          Var (ie, a column value).
      
          The immediate application is to allow sub-selects to be flattened even when
          they are below an outer join and have non-nullable output expressions.
          Formerly we couldn't flatten because such an expression wouldn't properly
          go to NULL when evaluated above the outer join.  Now, we wrap it in a
          PlaceHolderVar and arrange for the actual evaluation to occur below the outer
          join.  When the resulting Var bubbles up through the join, it will be set to
          NULL if necessary, yielding the correct results.  This fixes a planner
          limitation that's existed since 7.1.
      
          In future we might want to use this mechanism to re-introduce some form of
          Hellerstein's "expensive functions" optimization, ie place the evaluation of
          an expensive function at the most suitable point in the plan tree.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
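
      A sketch of the kind of query this enables flattening for (hypothetical
      tables a and b): the constant 'b-side' must become NULL for a-rows with
      no match, so it is wrapped in a PlaceHolderVar and evaluated below the
      outer join.

          SELECT a.id, ss.tag
          FROM a LEFT JOIN (SELECT id, 'b-side' AS tag FROM b) ss
                 ON a.id = ss.id;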
    • Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e
      Committed by Ekta Khanna
      commit 19e34b62
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Aug 17 01:20:00 2008 +0000
      
          Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
          level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
          can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
          only to the nullable side of the outer join, so that it can effectively
          be pushed down into the nullable side.)  Per request from Kevin Grittner.
      
          In passing, fix a bug in the initial implementation of EXISTS pullup:
          it would Assert if the EXIST's WHERE clause used a join alias variable.
          Since we haven't yet flattened join aliases when this transformation
          happens, it's necessary to include join relids in the computed set of
          RHS relids.
      
      Ref [#142356521]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
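
      A sketch of the newly handled shape (hypothetical tables): an EXISTS at
      the top level of an inner join's ON clause can now be pulled up.

          SELECT *
          FROM a JOIN b
            ON a.id = b.id
           AND EXISTS (SELECT 1 FROM c WHERE c.x = b.x);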
    • Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Committed by Ekta Khanna
      After merging with e006a24a, Anti Semi Joins will
      be denoted by `JOIN_ANTI` instead of `JOIN_LASJ`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • CDBlize the cherry-pick e006a24a · 0feb1bd9
      Committed by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now `InClauseInfo` is not present, and we construct a
      `FlattenedSublink` instead in `convert_ANY_sublink_to_join()`. Later
      in the flow, we construct a `SpecialJoinInfo` from the `FlattenedSublink`
      in `deconstruct_sublink_quals_to_rel()`. Hence, add `try_join_unique` to
      both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Committed by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
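
      As a sketch of the query shapes involved (hypothetical tables a and b):

          -- Converted to a semi join:
          SELECT * FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

          -- Converted to an anti join:
          SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

          -- Also recognized as an anti join:
          SELECT * FROM a LEFT JOIN b ON b.id = a.id WHERE b.id IS NULL;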
  8. 25 September 2017, 1 commit
    • Remove row order information from Flow. · 7e268107
      Committed by Heikki Linnakangas
      A Motion node often needs to "merge" the incoming streams, to preserve the
      overall sort order. Instead of carrying sort order information throughout
      the later stages of planning, in the Flow struct, pass it as an argument
      directly to make_motion() and other functions, where a Motion node is
      created. This simplifies things.
      
      To make that work, we can no longer rely on apply_motion() to add the final
      Motion on top of the plan, when the (sub-)query contains an ORDER BY. That's
      because we no longer have that information available at apply_motion(). Add
      the Motion node in grouping_planner() instead, where we still have that
      information, as a path key.
      
      When I started to work on this, this also fixed a bug where the sortColIdx
      of a plan's Flow node could refer to the wrong resno. A test case for that
      is included. However, that case was since fixed by other coincidental
      changes to partition elimination, so now this is just refactoring.
  9. 21 September 2017, 2 commits
    • Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Committed by Heikki Linnakangas
      It only worked for cursors declared with DECLARE CURSOR before. You got
      a "there is no parameter $0" error if you tried. This moves the decision
      on whether a plan is "simply updatable" from the parser to the planner.
      Doing it in the parser was awkward, because we only want to do it for
      queries that are used in a cursor, and for SPI queries, we don't know it
      at that time yet.
      
      For some reason, the copy, out and read functions of CurrentOfExpr were missing
      the cursor_param field. While we're at it, reorder the code to match
      upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
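
      A minimal sketch of what now works (hypothetical table t and function
      name):

          CREATE FUNCTION bump() RETURNS void AS $$
          DECLARE
              cur CURSOR FOR SELECT * FROM t;
              rec RECORD;
          BEGIN
              OPEN cur;
              FETCH cur INTO rec;
              -- This used to fail with: there is no parameter $0
              UPDATE t SET x = x + 1 WHERE CURRENT OF cur;
              CLOSE cur;
          END;
          $$ LANGUAGE plpgsql;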
    • Add support for CREATE FUNCTION EXECUTE ON [MASTER | ALL SEGMENTS] · aa148d2a
      Committed by Heikki Linnakangas
      We already had a hack for the EXECUTE ON ALL SEGMENTS case, by setting
      prodataaccess='s'. This exposes the functionality to users via DDL, and adds
      support for the EXECUTE ON MASTER case.
      
      There was discussion on gpdb-dev about also supporting ON MASTER AND ALL
      SEGMENTS, but that is not implemented yet. There is no handy "locus" in the
      planner to represent that. There was also discussion about making a
      gp_segment_id column implicitly available for functions, but that is also
      not implemented yet.
      
      The old behavior was that if a function was marked as
      IMMUTABLE, it could be executed anywhere. Otherwise it was always executed
      on the master. For backwards-compatibility, this keeps that behavior for
      EXECUTE ON ANY (the default), so even if a function is marked as EXECUTE ON
      ANY, it will always be executed on the master unless it's IMMUTABLE.
      
      There is no support for these new options in ORCA. Using any ON MASTER or
      ON ALL SEGMENTS function in a query causes ORCA to fall back. This is the
      same as with the prodataaccess='s' hack that this replaces, but now that it
      is more user-visible, it would be nice to teach ORCA about it.
      
      The new options are only supported for set-returning functions, because for
      a regular function marked as EXECUTE ON ALL SEGMENTS, it's not clear how
      the results should be combined. ON MASTER would probably be doable, but
      there's no need for that right now, so punt.
      
      Another restriction is that a function with ON ALL SEGMENTS or ON MASTER can
      only be used in the FROM clause, or in the target list of a simple SELECT
      with no FROM clause. So "SELECT func()" is accepted, but "SELECT func() FROM
      foo" is not. "SELECT * FROM func(), foo" works, however. EXECUTE ON ANY
      functions, which is the default, work the same as before.
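
      A sketch of the new DDL and the placement rules (hypothetical function
      name):

          CREATE FUNCTION master_version() RETURNS SETOF text AS
          $$ SELECT version() $$ LANGUAGE sql EXECUTE ON MASTER;

          SELECT * FROM master_version();       -- allowed
          SELECT master_version();              -- allowed (no FROM clause)
          -- SELECT master_version() FROM foo;  -- rejected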
  10. 17 September 2017, 1 commit
    • Convert WindowFrame to frameOptions + start + end · ebf9763c
      Committed by Heikki Linnakangas
      In GPDB, we have so far used a WindowFrame struct to represent the start
      and end window bound, in a ROWS/RANGE BETWEEN clause, while PostgreSQL
      uses the combination of a frameOptions bitmask and start and end
      expressions. Refactor to replace the WindowFrame with the upstream
      representation.
  11. 12 September 2017, 1 commit
    • Split WindowSpec into separate before and after parse-analysis structs. · 789f443d
      Committed by Heikki Linnakangas
      In the upstream, two different structs are used to represent a window
      definition: WindowDef in the grammar, which is transformed into
      WindowClause during parse analysis. In GPDB, we've been using the same
      struct, WindowSpec, in both stages. Split it up, to match the upstream.
      
      The representation of the window frame, i.e. "ROWS/RANGE BETWEEN ..." was
      different between the upstream implementation and the GPDB one. We now use
      the upstream frameOptions+startOffset+endOffset representation in raw
      WindowDef parse node, but it's still converted to the WindowFrame
      representation for the later stages, so WindowClause still uses that. I
      will switch over the rest of the codebase to the upstream representation as
      a separate patch.
      
      Also, refactor WINDOW clause deparsing to be closer to upstream.
      
      One notable difference is that the old WindowSpec.winspec field corresponds
      to the winref field in WindowDef and WindowClause, except that the new
      'winref' is 1-based, while the old field was 0-based.
      
      Another noteworthy thing is that this forbids specifying "OVER (w
      ROWS/RANGE BETWEEN ...", if the window "w" already specified a window frame,
      i.e. a different ROWS/RANGE BETWEEN. There was one such case in the
      regression suite, in window_views, and this updates the expected output of
      that to be an error.
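
      The newly forbidden case, as a sketch (hypothetical table t):

          -- Error: "w" already specifies a frame, and the OVER clause
          -- tries to override it with a different one.
          SELECT sum(x) OVER (w ROWS BETWEEN UNBOUNDED PRECEDING
                                     AND CURRENT ROW)
          FROM t
          WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);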
  12. 06 September 2017, 1 commit
    • Ensure that stable functions in a prepared statement are re-evaluated. · ccca0af2
      Committed by Heikki Linnakangas
      If a prepared statement, or a cached plan for an SPI query e.g. from a
      PL/pgSQL function, contains stable functions, the stable functions were
      incorrectly evaluated only once at plan time, instead of on every execution
      of the plan. This happened to not be a problem in queries that contain any
      parameters, because in GPDB, they are re-planned on every invocation
      anyway, but non-parameter queries were broken.
      
      In the planner, before this commit, when simplifying expressions, we set
      the transform_stable_funcs flag to true for every query, and evaluated all
      stable functions at planning time. Change it to false, and also rename it
      back to 'estimate', as it's called in the upstream. That flag was changed
      back in 2010, in order to allow partition pruning to work with qual
      containing stable functions, like TO_DATE. I think back then, we always
      re-planned every query, so that was OK, but we do cache plans now.
      
      To avoid regressing to worse plans, change eval_const_expressions() so that
      it still does evaluate stable functions, even when the 'estimate' flag is
      off. But when it does so, mark the plan as "one-off", meaning that it must
      be re-planned on every execution. That gives the old, intended, behavior,
      that such plans are indeed re-planned, but it still allows plans that don't
      use stable functions to be cached.
      
      This seems to fix github issue #2661. Looking at the direct dispatch code
      in apply_motion(), I suspect there are more issues like this lurking there.
      There's a call to planner_make_plan_constant(), modifying the target list
      in place, and that happens during planning. But this at least fixes the
      non-direct dispatch cases, and is a necessary step for fixing any remaining
      issues.
      
      For some reason, the query now gets planned *twice* for every invocation.
      That's not ideal, but it was an existing issue for prepared statements with
      parameters, already. So let's deal with that separately.
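
      A sketch of the affected pattern (hypothetical table events): now() is
      stable, so it must be evaluated at execution time, not frozen into the
      cached plan.

          PREPARE recent AS
              SELECT * FROM events
              WHERE created_at > now() - interval '1 hour';
          EXECUTE recent;  -- must see the now() of this execution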
  13. 04 September 2017, 1 commit
  14. 01 September 2017, 1 commit
  15. 21 August 2017, 1 commit
    • Move ORCA invocation into standard_planner · d5dbbfd9
      Committed by Daniel Gustafsson
      The way ORCA was tied into the planner, running a planner_hook
      was not supported in the intended way. This commit moves ORCA
      into standard_planner() instead of planner(), and leaves the hook
      for extensions to make use of, with or without ORCA. Since the
      intention with the optimizer GUC is to replace the planner in
      postgres, while keeping the planning process, this allows
      planner extensions to co-operate with that.
      
      In order to reduce the Greenplum footprint in upstream postgres
      source files for future merges, the ORCA functions are moved to
      their own file.
      
      Also adds a memaccounting class for planner hooks since they
      otherwise ran in the planner scope, as well as a test for using
      planner_hooks.
  16. 09 August 2017, 1 commit
  17. 01 August 2017, 1 commit
    • Choose segment randomly to serve as singleton reader gang · 06f56fe8
      Committed by Pengzhou Tang
      This fixes a typo that caused segment 0 to always be assigned as
      the singleton reader. It existed for a long time with no
      functional issue, but may result in a performance issue somehow.
      
      Besides, root->config->cdbpath_segments is tuneable via the GUC
      gp_segments_for_planner, so gp_singleton_segindex may point to
      an invalid segment; we use the real segment count instead to
      avoid a mismatch.
  18. 17 June 2017, 1 commit
    • Merge 8.4 CTE (sans recursive) · 41c3b698
      This brought in postgres/postgres@44d5be0 pretty much wholesale, except:
      
      1. We leave `WITH RECURSIVE` for a later commit. The code is brought in,
          but kept dormant by us bailing early at the parser whenever there is
          a recursive CTE.
      2. We use `ShareInputScan` instead of `CteScan`. ShareInputScan is
          basically the parallel-capable `CteScan`. (See `set_cte_pathlist`
          and `create_ctescan_plan`)
      3. Consequently we do not put the sub-plan for the CTE in a
          pseudo-initplan: it is directly present in the main plan tree
          instead, hence we disable `SS_process_ctes` inside
          `subquery_planner`
      4. Another corollary is that all new operators (`CteScan`,
          `RecursiveUnion`, and `WorkTableScan`) are dead code right now. But
          they will come to life once we bring in the parallel implementation
          of `WITH RECURSIVE`
      
      In general this commit reduces the divergence between Greenplum and
      upstream.
      
      User visible changes:
      The merge in the parser enables a corner case previously treated as an
      error: you can now specify fewer columns in your `WITH` clause than the
      actual projected columns in the body subquery of the `WITH` (see the
      sketch after this entry).
      
      Original commit message:
      
      > Implement SQL-standard WITH clauses, including WITH RECURSIVE.
      >
      > There are some unimplemented aspects: recursive queries must use UNION ALL
      > (should allow UNION too), and we don't have SEARCH or CYCLE clauses.
      > These might or might not get done for 8.4, but even without them it's a
      > pretty useful feature.
      >
      > There are also a couple of small loose ends and definitional quibbles,
      > which I'll send a memo about to pgsql-hackers shortly.  But let's land
      > the patch now so we can get on with other development.
      >
      > Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
      >
      
      (cherry picked from commit 44d5be0e)
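
      The newly allowed corner case, as a sketch:

          -- Fewer alias columns in the WITH clause than the subquery
          -- projects; the second column keeps its own name, b.
          WITH w(a) AS (SELECT 1 AS a, 2 AS b)
          SELECT a, b FROM w;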
  19. 01 June 2017, 1 commit
    • Fixup subplans referring to same plan_id · d0aea184
      Committed by Bhuvnesh Chaudhary
      During parallelization of nodes in cdbparallelize, if there are
      any subplan nodes in the plan which refer to the same plan_id,
      the parallelization step breaks, as a node must be processed only
      once by it. This patch fixes the issue by generating a new
      subplan node in the glob's subplans list, and updating the plan_id
      of the subplan to refer to the newly created node.
  20. 11 May 2017, 1 commit
  21. 26 April 2017, 1 commit
    • Add expansion support to HHashTable to optimize HashAgg · 3bb360de
      Committed by Shreedhar Hardikar
      When creating an HHashTable, instead of using the available memory as the
      sole basis for determining the number of buckets, it now computes nbuckets
      as a function of the estimated groups/entries given by the planner. To
      prevent performance degradation when the statistics are off, the
      hash table expands by doubling the number of buckets and rehashing all
      the entries, until it is out of memory.
      If more space is needed, the HHashTable spills to disk as before, but it
      can now accurately allocate buckets when the spill files are reloaded,
      based on the number of entries spilled.
      
      This commit also makes other minor fixes:
        - Change calcHashAggTableSizes() signature to make it reusable
        - Keep track of in-memory entries in the HT
        - Add tests for when it overflows multiple times
        - Estimate the overhead per entry in the hash table more accurately
        - Refactor statistics collection for EXPLAIN ANALYZE
  22. 14 April 2017, 1 commit
    • Cherry-pick upstream commit eaf1b5d3 "SS_finalize_plan" · 4697811d
      Committed by Dhanashree Kashid and Jemish Patel
      This is a cherry-pick of the following upstream commit:
      
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Jul 10 02:14:03 2008 +0000
      
          Tighten up SS_finalize_plan's computation of valid_params to exclude Params of
          the current query level that aren't in fact output parameters of the current
          initPlans.  (This means, for example, output parameters of regular subplans.)
          To make this work correctly for output parameters coming from sibling
          initplans requires rejiggering the API of SS_finalize_plan just a bit:
          we need the siblings to be visible to it, rather than hidden as
          SS_make_initplan_from_plan had been doing.  This is really part of my response
          to bug #4290, but I concluded this part probably shouldn't be back-patched,
          since all that it's doing is to make a debugging cross-check tighter.
      
          (cherry picked from commit eaf1b5d3)
  23. 01 April 2017, 1 commit
    • Use PartitionSelectors for partition elimination, even without ORCA. · e378d84b
      Committed by Heikki Linnakangas
      The old mechanism was to scan the complete plan, searching for a pattern
      with a Join, where the outer side included an Append node. The inner
      side was duplicated into an InitPlan, with the pg_partition_oid aggregate
      to collect the OIDs of all the partitions that can match. That was
      inefficient and broken: if the duplicated plan was volatile, you might
      choose the wrong partitions. And scanning the inner side twice can obviously
      be slow, if there are a lot of tuples.
      
      Rewrite the way such plans are generated. Instead of using an InitPlan,
      inject a PartitionSelector node into the inner side of the join.
      
      Fixes github issues #2100 and #2116.
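
      A sketch of a plan shape that benefits (hypothetical tables, with sales
      partitioned by date_id): the PartitionSelector injected into the inner
      side of the join (the dates scan) lets the Append over the sales
      partitions skip partitions that cannot match.

          SELECT *
          FROM sales s JOIN dates d ON s.date_id = d.date_id
          WHERE d.year = 2017;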
  24. 31 March 2017, 1 commit
    • Remove unused Append.hasXslice fields and code to set it. · 068f0d53
      Committed by Heikki Linnakangas
      It was added back in 2010, as part of a patch to:
      
      commit d0ca3d8a4333db510ac68145c30ff917626d2037
      Author: kentj <a@b>
      Date:   Mon Jan 25 13:50:14 2010 -0800
      
          MPP-7734: initialize executor nodes for the active slice instead of the
          whole tree. Postponding the initialization of the subplans of an Append
          node to the time when it is processed.
          Send init gpmon packages for the subnodes of the Append node even
          though we don't initialize them.
      
          [git-p4: depot-paths = "//cdb2/main/": change = 43428]
      
      However, it was reverted only a few weeks later:
      
      commit 5cb7e64a093dfcc1bcbdca5ed74c261e9c56d3a3
      Author: kentj <a@b>
      Date:   Tue Feb 16 16:40:49 2010 -0800
      
          MPP-8031, MPP-7734: revert back the Append changes, since it breaks the
          explain analyze. It is hard to fix the explain analyze with the existing
          Append changes. It needs some more thinking.
      
          [git-p4: depot-paths = "//cdb2/main/": change = 45078]
      
      The revert removed all use of the flag, but left the flag behind. Remove
      it.
  25. 02 March 2017, 1 commit
    • Add a GUC, to produce a message at INFO level when ORCA falls back. · fff8e621
      Committed by Heikki Linnakangas
      We have a bunch of existing tests that test whether ORCA falls back. They
      set optimizer_log_failure='all' and client_min_messages='log', and grep
      the output for "Planner" or "Planner produced plan". That's error-prone.
      
      This new GUC makes that kind of tests easier and more robust. You can
      simply set the GUC, and if there are no extra INFO messages in the output,
      ORCA didn't fall back.
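
      Intended usage, as a sketch (the GUC name and INFO text shown here are
      assumptions, for illustration only):

          SET optimizer_trace_fallback = on;
          SELECT count(*) FROM my_table;
          -- INFO:  GPORCA failed to produce a plan, falling back to planner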