1. 31 August 2018, 1 commit
    • Rename "prelim function" to "combine function", to match upstream. · b8545d57
      Committed by Heikki Linnakangas
      The GPDB "prelim" functions did the same thing as the "combine"
      functions introduced in PostgreSQL 9.6. This commit includes just the
      catalog changes, to essentially search & replace "prelim" with
      "combine". I did not yet pick the planner and executor changes that
      were made as part of this in the upstream.
      
      Also replace the GPDB implementation of float8_amalg() and
      float8_regr_amalg(), with the upstream float8_combine() and
      float8_regr_combine(). They do the same thing, but let's use upstream
      functions where possible.
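      
      As a minimal sketch of the renamed concept, in PostgreSQL 9.6-style
      CREATE AGGREGATE syntax (the aggregate name and functions here are
      illustrative, not from this commit):
      
          -- Two-stage aggregate: COMBINEFUNC (the renamed "prelim" function)
          -- merges two partial transition states from different processes.
          CREATE AGGREGATE my_sum (int8) (
              SFUNC = int8pl,        -- per-row transition function
              STYPE = int8,          -- transition state type
              COMBINEFUNC = int8pl,  -- combines partial states
              INITCOND = '0'
          );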
      
      Upstream commits:
      commit a7de3dc5
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Wed Jan 20 13:46:50 2016 -0500
      
          Support multi-stage aggregation.
      
          Aggregate nodes now have two new modes: a "partial" mode where they
          output the unfinalized transition state, and a "finalize" mode where
          they accept unfinalized transition states rather than individual
          values as input.
      
          These new modes are not used anywhere yet, but they will be necessary
          for parallel aggregation.  The infrastructure also figures to be
          useful for cases where we want to aggregate local data and remote
          data via the FDW interface, and want to bring back partial aggregates
          from the remote side that can then be combined with locally generated
          partial aggregates to produce the final value.  It may also be useful
          even when neither FDWs nor parallelism are in play, as explained in
          the comments in nodeAgg.c.
      
          David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
          Linnakangas, Haribabu Kommi, and me.
      
      commit af025eed
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Fri Apr 8 13:44:50 2016 -0400
      
          Add combine functions for various floating-point aggregates.
      
          This allows parallel aggregation to use them.  It may seem surprising
          that we use float8_combine for both float4_accum and float8_accum
          transition functions, but that's because those functions differ only
          in the type of the non-transition-state argument.
      
          Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
  2. 03 August 2018, 1 commit
  3. 02 August 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap
        access (see the sketch after this list).
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints are now performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans for specific parameter values are now supported even when
        using prepared statements.
      
      * The FDW API was improved to let FDWs provide multiple access "paths" for
        their tables, allowing more flexibility in join planning.
      
      * A security_barrier option was added for views, to prevent optimizations
        that might allow view-protected data to be exposed to users.
      
      * Range data types were added, storing a lower and an upper bound belonging
        to a base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on QD and dispatches
        the plan of the SELECT query to QEs.
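      
      For example, an index-only scan shows up in a plan like the following
      sketch (table and index names are illustrative; the plan line is
      abridged):
      
          CREATE TABLE m (id int, v float8);
          CREATE INDEX m_id_idx ON m (id);
          VACUUM m;   -- set visibility-map bits so the heap can be skipped
          EXPLAIN SELECT id FROM m WHERE id < 100;
          --  ... Index Only Scan using m_id_idx on m ...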
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  4. 09 July 2018, 1 commit
  5. 12 May 2018, 1 commit
  6. 03 May 2018, 1 commit
    • Add Global Deadlock Detector. · 03915d65
      Committed by Zhenghua Lyu
      To prevent distributed deadlocks, Greenplum DB holds an exclusive table
      lock for UPDATE and DELETE commands, so concurrent updates of the same
      table are effectively disabled.
      
      We add a backend process that performs global deadlock detection, so
      that we no longer lock the whole table for UPDATE/DELETE; this helps
      improve the concurrency of Greenplum DB.
      
      The core idea of the algorithm is to divide locks into two types:
      
      - Persistent: the lock can only be released after the transaction is
        over (abort/commit)
      - Non-persistent: all other cases
      
      This PR's implementation adds a persistent flag to the LOCK, set
      according to these rules:
      
      - An xid lock is always persistent
      - A tuple lock is never persistent
      - A relation lock is persistent if the relation has been closed with
        the NoLock parameter; otherwise it is not persistent
      - Other types of locks are not persistent
      
      For more details, please refer to the code and the README.
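      
      As a sketch of the behavior this enables (the GUC name is from the GPDB
      docs; the table and rows are illustrative), two sessions can now update
      different rows of the same table concurrently, and a genuine
      cross-segment deadlock is detected and cancelled instead of waiting
      forever:
      
          -- Enable the detector, then restart the cluster:
          --   gpconfig -c gp_enable_global_deadlock_detector -v on
          -- Session 1:                            -- Session 2:
          BEGIN;                                   BEGIN;
          UPDATE t SET v = v + 1 WHERE id = 1;     UPDATE t SET v = v + 1 WHERE id = 2;
          UPDATE t SET v = v + 1 WHERE id = 2;     UPDATE t SET v = v + 1 WHERE id = 1;
          -- The detector cancels one of the transactions; the other commits.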
      
      There are several known issues to pay attention to:
      
      - This PR's implementation only considers locks that can be shown in
        the pg_locks view.
      - This PR's implementation does not support AO tables; we keep
        upgrading the locks for AO tables.
      - This PR's implementation does not take network waits into account,
        so we cannot detect the deadlock in GitHub issue #2837.
      - SELECT FOR UPDATE still locks the whole table.
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
  7. 02 May 2018, 1 commit
    • Re-enable MIN/MAX optimization. · 362fc756
      Committed by Heikki Linnakangas
      I'm not sure why it was disabled. It's not very hard to make it work,
      so let's do it. It might not be a very common query type, but if you
      happen to have a query where it helps, it helps a lot.
      
      This adds a GUC, gp_enable_minmax_optimization, to enable/disable the
      optimization. There's no such GUC in upstream, but we need at least a flag
      in PlannerConfig for it, so that we can disable the optimization for
      correlated subqueries, along with some other optimizer tricks. Seems best
      to also have a GUC for it, for consistency with other flags in
      PlannerConfig.
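      
      A sketch of the kind of query the optimization helps (the table and
      index are illustrative):
      
          SET gp_enable_minmax_optimization = on;
          -- With an index on events(id), MIN/MAX no longer scans the whole
          -- table; expect a plan that reads a single tuple from one end of
          -- the index, e.g. a Limit over an Index Scan:
          EXPLAIN SELECT max(id) FROM events;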
  8. 29 March 2018, 2 commits
    • Support replicated table in GPDB · 7efe3204
      Committed by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash or
      randomly. There are requirements for a new table type, called a replicated
      table, where all segments hold a full, duplicate copy of the table data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED
      to mark a replicated table, and a new locus type named CdbLocusType_SegmentGeneral
      to describe the distribution of tuples of a replicated table.
      CdbLocusType_SegmentGeneral implies that data is generally available on all
      segments but not available on the qDisp, so a plan node with this locus type can
      be flexibly planned to execute on either a single QE or all QEs. It is similar to
      CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral
      node can't be executed on the qDisp. To guarantee this, we try our best to add a
      gather motion on top of a CdbLocusType_SegmentGeneral node when planning motion
      for a join, even if the other rel has a bottleneck locus type. A problem is that
      such a motion may be redundant if the single QE is not eventually promoted to
      execute on the qDisp, so we need to detect such cases and omit the redundant
      motion at the end of apply_motion(). We don't reuse CdbLocusType_Replicated since
      it always implies a broadcast motion below it, and it's not easy to plan such a
      node as direct dispatch to avoid getting duplicate data.
      
      We don't support replicated tables with an inherit/partition-by clause yet;
      the main problem is that update/delete on multiple result relations can't
      work correctly yet. We can fix this later.
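      
      A sketch of the user-facing syntax this introduces (table names are
      illustrative):
      
          CREATE TABLE dim_country (code text, name text) DISTRIBUTED REPLICATED;
          -- Every segment holds a full copy, so a join against it needs no
          -- Motion on the replicated side:
          EXPLAIN SELECT * FROM facts f JOIN dim_country d ON f.code = d.code;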
      
      * Allow spi_* to access replicated tables on QEs
      
      Previously, GPDB didn't allow a QE to access non-catalog tables, because
      the data there is incomplete. We can remove this limitation now, as long
      as only replicated tables are accessed.
      
      One problem is that the QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we
      need to pass the policy info to the QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated tables
      
      Previously, we used the magic number -128 in the gp_distribution_policy
      table to identify replicated tables, which was quite a hack, so we add a
      new column in gp_distribution_policy to identify replicated and
      partitioned tables.
      
      This commit also abandons the old way of using a 1-length NULL list and
      a 2-length NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.
      
      Besides, this commit refactors the code to make the decision-making
      around distribution policy clearer.
      
      * Support COPY for replicated tables
      
      * Disable the row-ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to address
        queries like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t2), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
               ->  Hash Join
                     Hash Cond: t2.c2 = t1.c2
                     ->  Seq Scan on t2
                     ->  Hash
                           ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because
        ctid + gp_segment_id can't identify a tuple: in a replicated table, a
        logical row may have a different ctid and gp_segment_id on each
        segment. So we disable such plans for replicated tables temporarily.
        This is not the best way, because the rowid unique path may be cheaper
        than a normal hash semi join, so we left a FIXME for later optimization.
      
      * ORCA-related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog; do the
        same check as for other catalogs.
      
      * Support gpexpand on replicated tables && altering the dist policy of replicated tables
    • Remove FIXME about group_id in Distinct HashAgg · 2b25c663
      Committed by Dhanashree Kashid
      With the 8.4 merge, planner considers using HashAgg to implement
      DISTINCT. At the end of planning, we replace the expressions in the
      targetlist of certain operators (including Agg) into OUTER references
      in targetlist of its lefttree (see set_plan_refs() >
      set_upper_references()).
      But, as per the code, in the case when grouping() or group_id() are
      present in the target list of Agg, it skips the replacement and this is
      problematic in case the Agg is implementing DISTINCT.
      
      It seems that the Agg's targetlist need not compute grouping() or
      group_id() when its lefttree is computing it. In that case, it may
      simply refer to it. This would then also apply to other operators
      WindowAgg, Result & PartitionSelector.
      
      However, the Repeat node needs to compute these functions at each stage
      because group_id is derived from RepeatState::repeat_count. Thus, it
      connot be replaced by an OUTER reference.
      
      Hence, this commit removes the special case for these functions for all
      operators except Repeat. Then, a DISTINCT HashAgg produces the correct
      results.
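      
      A query of the affected shape, as a sketch (table and columns are
      illustrative): a DISTINCT implemented by HashAgg on top of an Agg that
      already computes the grouping functions.
      
          SELECT DISTINCT grouping(a), group_id(), a, sum(b)
          FROM t
          GROUP BY ROLLUP (a);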
      Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>
  9. 16 March 2018, 1 commit
    • Remove GPDB_84_MERGE_FIXME in planner.c and prepunion.c · 74546663
      Committed by Shreedhar Hardikar
      These were related to choosing the right arguments to send to GPDB's
      make_agg() and cost_agg() methods for queries containing DISTINCT or set
      operations.
      
      Hash aggregation, when used to implement a DISTINCT (in either form) in
      the query, is not related to grouping sets, and thus the arguments
      num_nullcols, input_grouping, grouping and rollup_gs_times should be 0.
      
      However, since SetOp uses the upstream TupleHashTable while HashAgg uses
      GPDB's HHashTable implementation, the hash table size calculations
      should be computed differently. This is fixed in this commit.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
  10. 09 February 2018, 1 commit
    • Refactor the way Semi-Join plans are constructed. · d4ce0921
      Committed by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such plan, add new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code
      could, in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top
      of the join tree. I'm not sure when that would be useful, though, and
      the regression suite doesn't seem to contain any such cases (with
      EXPLAIN). Or maybe I misunderstood the old code. In any case, I think
      this is good enough.
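      
      A sketch of the query shape involved (illustrative tables): a semi join
      that can either deduplicate the inner side first (JOIN_UNIQUE_INNER),
      or, with the new JOIN_DEDUP_SEMI, be planned as an inner join followed
      by a deduplication step:
      
          EXPLAIN SELECT * FROM t1 WHERE t1.c2 IN (SELECT c2 FROM t2);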
  11. 02 February 2018, 1 commit
    • Remove extra planner pass to remove "trivial" Result nodes. · c613cabf
      Committed by Heikki Linnakangas
      Instead, avoid creating such Result nodes in the first place, by making
      plan_pushdown_tlist() check if the Result node would have any work to do.
      
      With this, you get Result nodes in some cases where the old code could
      zap them away. But on the other hand, this can avoid inserting Result
      nodes not only on top of Appends, but on top of any node. This can be
      seen in the included expected output changes: some test queries lose a
      Result, some gain one. So performance-wise this is about a wash, but
      this is simpler.
      
      The reason to do this right now is that we ran into issues with the
      "zapping" code while working on the 9.0 merge. I'm sure we could fix those
      issues, but let's do this rather than spend time debugging and fixing the
      zapping code with the merge.
  12. 13 December 2017, 4 commits
    • Reword comment to avoid nested comments · 8105f067
      Committed by Daniel Gustafsson
      The comment added in 916f460f created a nested comment structure
      by accident, which triggered a warning in clang for -Wcomment. Reword
      the comment slightly to make the compiler happy.
      
      planner.c:194:15: warning: '/*' within block comment [-Wcomment]
               * support pl/* statements (relevant when they are planned on the segments).
                           ^
    • Fix storage test failures caused by 916f460f · 0d3ae2a0
      Committed by Shreedhar Hardikar
      The default value of Gp_role is set to GP_ROLE_DISPATCH, which means
      auxiliary processes inherit this value. FileRep does the same, but also
      executes queries using SPI on the segments, which means Gp_role ==
      GP_ROLE_DISPATCH is not a sufficient check for being the master QD.
      
      So, bring back the check on GpIdentity.
      
      Author: Asim R P <apraveen@pivotal.io>
      Author: Shreedhar Hardikar <shardikar@pivotal.io>
    • Rename querytree_safe_for_segment to querytree_safe_for_qe · 32f099fd
      Committed by Shreedhar Hardikar
      The original name was deceptive, because this check is also done for QE
      slices that run on the master. For example:
      
      EXPLAIN SELECT * FROM func1_nosql_vol(5), foo;
      
                                               QUERY PLAN
      --------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.30..1.37 rows=4 width=12)
         ->  Nested Loop  (cost=0.30..1.37 rows=2 width=12)
               ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=8)
               ->  Materialize  (cost=0.30..0.33 rows=1 width=4)
                     ->  Broadcast Motion 1:3  (slice1)  (cost=0.00..0.30 rows=3 width=4)
                           ->  Function Scan on func1_nosql_vol  (cost=0.00..0.26 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (8 rows)
      
      Note that in the plan, the function func1_nosql_vol() will be executed on a
      master slice with Gp_role as GP_ROLE_EXECUTE.
      
      Also, update output files
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
    • Ensure that ORCA is not called on any process other than the master QD · 916f460f
      Committed by Shreedhar Hardikar
      We don't want to use the optimizer for planning queries in SQL, pl/pgSQL
      etc. functions when that is done on the segments.
      
      ORCA excels in complex queries, most of which will access distributed
      tables. We can't run such queries from the segments slices anyway
      because they require dispatching a query within another - which is not
      allowed in GPDB. Note that this restriction also applies to non-QD
      master slices.  Furthermore, ORCA doesn't currently support pl/*
      statements (relevant when they are planned on the segments).
      
      For these reasons, restrict to using ORCA on the master QD processes
      only.
      
      Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.")
      and separate out the gporca fault injector tests into the newly added
      gporca_faults.sql, so that the rest can run in a parallel group.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
  13. 12 December 2017, 1 commit
    • Replace usage of deprecated error codes · fd0a1b75
      Committed by Daniel Gustafsson
      These error codes were marked as deprecated in September 2007, but the
      code didn't get the memo. Extend the deprecation into the code and
      actually replace the usage. Ten years seems like long enough notice, so
      also remove the renames; the odds of anyone using these in code that
      compiles against a 6X tree should be low (and easily fixed).
  14. 30 November 2017, 1 commit
  15. 24 November 2017, 7 commits
    • Backport upstream comment updates · 122e817b
      Committed by Heikki Linnakangas
      commit 96f990e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Jul 13 20:23:09 2011 -0400
      
          Update some comments to clarify who does what in targetlist creation.
      
          No code changes; just avoid blaming query_planner for things it doesn't
          really do.
    • Backport upstream bugfix related to Window functions. · 411a033c
      Committed by Heikki Linnakangas
      The test case added to the regression suite actually seems to work on
      GPDB even without this, but it nevertheless seems like a good idea to
      pick it now, since we have the code it affected. Also, I'm about to
      backport more stuff that depends on this.
      
      commit c1d9579d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 12 18:23:55 2011 -0400
      
          Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
      
          Regular aggregate functions in combination with, or within the arguments
          of, window functions are OK per spec; they have the semantics that the
          aggregate output rows are computed and then we run the window functions
          over that row set.  (Thus, this combination is not really useful unless
          there's a GROUP BY so that more than one aggregate output row is possible.)
          The case without GROUP BY could fail, as recently reported by Jeff Davis,
          because sloppy construction of the Agg node's targetlist resulted in extra
          references to possibly-ungrouped Vars appearing outside the aggregate
          function calls themselves.  See the added regression test case for an
          example.
      
          Fixing this requires modifying the API of flatten_tlist and its underlying
          function pull_var_clause.  I chose to make pull_var_clause's API for
          aggregates identical to what it was already doing for placeholders, since
          the useful behaviors turn out to be the same (error, report node as-is, or
          recurse into it).  I also tightened the error checking in this area a bit:
          if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
          that was a long time ago, so complain instead of ignoring them.
      
          Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
          that it only occurs in a basically-useless corner case, it doesn't seem
          worth the risks of changing a function API in a minor release.  There might
          be third-party code using pull_var_clause.
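      
      The corner case being fixed, as a sketch (table name is illustrative):
      a plain aggregate used inside a window function without GROUP BY, so
      the input collapses to a single aggregate row first:
      
          SELECT sum(salary), rank() OVER (ORDER BY sum(salary)) FROM emp;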
    • Cherry-pick change to pull_var_clause() API. · bd3ab7bd
      Committed by Heikki Linnakangas
      We would get this later in PostgreSQL 8.4, but I'm about to cherry-pick
      more commits now that depend on this.
      
      Upstream commit:
      
      commit 1d97c19a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 19 19:46:33 2009 +0000
      
          Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
          Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
          term) seems to be to ignore the PlaceHolderVar and examine its argument
          instead.  In support of this, change the API of pull_var_clause() to allow
          callers to request recursion into PlaceHolderVars.  Currently
          estimate_num_groups() is the only customer for that behavior, but where
          there's one there may be others.
    • Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a
      Committed by Heikki Linnakangas
      This is similar to the old implementation, in that we use "+" and "-"
      to compute the boundaries.
      
      Unfortunately it seems unlikely that this would be accepted in the
      upstream, but at least we have that feature back in GPDB now, the way it
      used to be. See discussion on pgsql-hackers about that:
      https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
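      
      A sketch of the re-enabled syntax (illustrative table; the frame bounds
      are computed with the type's "+"/"-" operators):
      
          SELECT ts,
                 avg(v) OVER (ORDER BY ts
                              RANGE BETWEEN '1 hour'::interval PRECEDING
                                    AND CURRENT ROW)
          FROM readings;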
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
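      
      For example (illustrative table), the backported syntax feeds values to
      the aggregate in a well-defined order:
      
          SELECT array_agg(name ORDER BY name DESC) FROM people;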
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and the missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offsets can now be negative. In qp_olap_windowerr, a lot of
        queries that used to throw a "ROWS parameter cannot be negative" error
        are now passing. That error was an artifact of the way LEAD/LAG were
        implemented. Those queries contain window function calls like
        "LEAD(col1, col2 - col3)", and sometimes, with suitable values in col2
        and col3, the second argument went negative. That caused the error.
        The new implementation of LEAD/LAG is OK with a negative argument
        (see the sketch after this list).
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The
        output will vary from one invocation to another, though, because the
        order is then not well defined. This is more annoying on GPDB than on
        PostgreSQL, because in GPDB the row order tends to vary, as the rows
        are spread out across the cluster and arrive at the master in
        unpredictable order.)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OIDs to the new ones.
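      
      A sketch of one lifted limitation (illustrative table): a negative LEAD
      offset now simply reads in the other direction, like LAG, instead of
      raising "ROWS parameter cannot be negative":
      
          SELECT day, lead(price, -1) OVER (ORDER BY day) FROM quotes;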
  16. 23 November 2017, 2 commits
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      In step 1, if the result was already focused on a single segment, we
      would note that no Gather is needed, and not add one in step 4. However,
      the DISTINCT processing might add a Redistribute Motion node, so that
      the final result is no longer focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code
      stands, but we saw such a case on the "window functions rewrite" branch
      we've been working on. There, the sort order/distribution of the input
      can be changed to process window functions. But even if this isn't
      actively broken right now, it seems more robust to change the logic so
      that 'must_gather' means 'at the end, the result must end up on a single
      node', instead of 'we must add a Gather node'. The test that this adds
      exercises this issue after the window functions rewrite, but right now
      it passes with or without these code changes. But we might as well add
      it now.
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test to a new 'gp_aggregates' test. But I forgot to add the new test
      file to the test schedule, so it was not run. Oops. Add it to the
      schedule now. The tests in 'gp_aggregates' cover this bug.
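      
      The broken query shape, roughly (illustrative table and columns):
      DISTINCT applied over a window function's output.
      
          SELECT DISTINCT dept, rank() OVER (PARTITION BY dept ORDER BY salary)
          FROM emp;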
  17. 21 October 2017, 1 commit
    • Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72
      Committed by Heikki Linnakangas
      If a CREATE TABLE AS query contained an ORDER BY, the planner put a
      Motion node on top of the plan that focuses all the rows on a single
      node. However, that was confused with the redistribute motion that
      CREATE TABLE AS is supposed to put at the top, to distribute the rows
      according to the DISTRIBUTED BY of the table. This used to work before
      commit 7e268107, because we used to not add an explicit Motion node on
      top of the plan for ORDER BY; we just changed the sort-order information
      in the Flow.
      
      I have a nagging feeling that the apply_motion code isn't dealing with a
      Motion on top of a Motion node correctly, because I would've expected to
      get a plan like that without this fix. Perhaps apply_motion silently
      refuses to add a Motion node on top of an existing Motion? That'd be a
      silly plan, of course, and fortunately the planner doesn't create such
      plans, so I'm not going to dig deeper into that right now.
      
      The test case is a simplified version of one of the
      "mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
      those tests over from TINC to the main suite. We only run those tests
      in the concourse pipeline with "set optimizer=on", so it didn't catch
      this issue with optimizer=off.
      
      Fixes github issue #3577.
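      
      The failing shape, as a sketch (illustrative table): the final Motion
      must redistribute by the target table's distribution key, not gather
      the rows for the ORDER BY:
      
          CREATE TABLE t2 AS SELECT * FROM t1 ORDER BY b DISTRIBUTED BY (a);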
  18. 13 October 2017, 1 commit
    • Remove superfluous pathkey canonicalization · 7913e231
      Committed by Jesse Zhang
      `make_pathkeys_for_sortclauses` with a `true` last argument promises to
      canonicalize the returned path keys. We somehow cargo-culted a few
      unnecessary `canonicalize_pathkeys` immediately after those calls.
      
      This commit removes such superfluous calls to `canonicalize_pathkeys`.
      Signed-off-by: Max Yang <myang@pivotal.io>
  19. 12 October 2017, 1 commit
  20. 27 September 2017, 8 commits
    • Remove dead code around JoinExpr::subqfromlist. · f16deabd
      Committed by Shreedhar Hardikar
      This was used to keep information about the subquery join tree of
      pulled-up sublinks, for use later in deconstruct_recurse(). With the
      upstream subselect merge, a JoinExpr is constructed at pull-up time
      itself, so this is no longer needed: the subquery join tree information
      is available in the constructed JoinExpr.
      
      Also with the merge, deconstruct_recurse() handles JOIN_SEMI JoinExprs.
      However, since GPDB differs from upstream by treating SEMI joins as
      INNER joins for internal join planning, this commit also updates
      inner_join_rels correctly for SEMI joins (see the regression test).
      
      Also remove unused function declaration for not_null_inner_vars().
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Improve pull_up_subqueries logic w.r.t PlaceHolderVar · da29e67a
      Committed by Ekta Khanna
      commit c59d8dd4
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Apr 28 21:31:16 2009 +0000
      
          Improve pull_up_subqueries logic so that it doesn't insert unnecessary
          PlaceHolderVar nodes in join quals appearing in or below the lowest
          outer join that could null the subquery being pulled up.  This improves
          the planner's ability to recognize constant join quals, and probably
          helps with detection of common sort keys (equivalence classes) as well.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Refrain from creating the planner's placeholder_list · 695c9fdf
      Committed by Ekta Khanna
      commit 31468d05
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Oct 22 20:17:52 2008 +0000
      
          Dept of better ideas: refrain from creating the planner's placeholder_list
          until vars are distributed to rels during query_planner() startup.  We don't
          really need it before that, and not building it early has some advantages.
          First, we don't need to put it through the various preprocessing steps, which
          saves some cycles and eliminates the need for a number of routines to support
          PlaceHolderInfo nodes at all.  Second, this means one less unused plan for any
          sub-SELECT appearing in a placeholder's expression, since we don't build
          placeholder_list until after sublink expansion is complete.
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Add a concept of "placeholder" variables to the planner · 2b5c8201
      Committed by Bhuvnesh Chaudhary
      commit e6ae3b5d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Oct 21 20:42:53 2008 +0000
      
          Add a concept of "placeholder" variables to the planner.  These are variables
          that represent some expression that we desire to compute below the top level
          of the plan, and then let that value "bubble up" as though it were a plain
          Var (ie, a column value).
      
          The immediate application is to allow sub-selects to be flattened even when
          they are below an outer join and have non-nullable output expressions.
          Formerly we couldn't flatten because such an expression wouldn't properly
          go to NULL when evaluated above the outer join.  Now, we wrap it in a
          PlaceHolderVar and arrange for the actual evaluation to occur below the outer
          join.  When the resulting Var bubbles up through the join, it will be set to
          NULL if necessary, yielding the correct results.  This fixes a planner
          limitation that's existed since 7.1.
      
          In future we might want to use this mechanism to re-introduce some form of
          Hellerstein's "expensive functions" optimization, ie place the evaluation of
          an expensive function at the most suitable point in the plan tree.
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Improve sublink pullup code to handle ANY/EXISTS sublinks · 1ddcb97e
      Committed by Ekta Khanna
      commit 19e34b62
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Aug 17 01:20:00 2008 +0000
      
          Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
          level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
          can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
          only to the nullable side of the outer join, so that it can effectively
          be pushed down into the nullable side.)  Per request from Kevin Grittner.
      
          In passing, fix a bug in the initial implementation of EXISTS pullup:
          it would Assert if the EXIST's WHERE clause used a join alias variable.
          Since we haven't yet flattened join aliases when this transformation
          happens, it's necessary to include join relids in the computed set of
          RHS relids.
      
      Ref [#142356521]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
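      
      A sketch of the newly handled shape (illustrative tables): an ANY
      sublink at the top level of a JOIN/ON clause can now be pulled up into
      a join:
      
          SELECT *
          FROM a JOIN b
            ON a.x = ANY (SELECT c.y FROM c WHERE c.k = b.k);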
    • Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Committed by Ekta Khanna
      After merging with e006a24a, Anti Semi Joins are denoted by `JOIN_ANTI`
      instead of `JOIN_LASJ`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • CDBlize the cherry-pick e006a24a · 0feb1bd9
      Committed by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`. It was getting
      set in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now, `InClauseInfo` is not present, and we construct `FlattenedSublink`
      instead in `convert_ANY_sublink_to_join()`. Later in the flow, we
      construct `SpecialJoinInfo` from `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, we add `try_join_unique`
      to both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Committed by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
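      
      A sketch of what the planner now recognizes (illustrative tables):
      
          -- Becomes a semi join:
          SELECT * FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id);
          -- Becomes an anti join:
          SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);
          -- A LEFT JOIN with a suitable IS NULL filter is also recognized
          -- as an anti join:
          SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.id IS NULL;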
  21. 25 September 2017, 1 commit
    • Remove row order information from Flow. · 7e268107
      Committed by Heikki Linnakangas
      A Motion node often needs to "merge" the incoming streams, to preserve the
      overall sort order. Instead of carrying sort order information throughout
      the later stages of planning, in the Flow struct, pass it as argument
      directly to make_motion() and other functions, where a Motion node is
      created. This simplifies things.
      
      To make that work, we can no longer rely on apply_motion() to add the final
      Motion on top of the plan, when the (sub-)query contains an ORDER BY. That's
      because we no longer have that information available at apply_motion(). Add
      the Motion node in grouping_planner() instead, where we still have that
      information, as a path key.
      
      When I started to work on this, it also fixed a bug where the sortColIdx
      of a plan's Flow node could refer to the wrong resno. A test case for
      that is included. However, that case has since been fixed by other
      coincidental changes to partition elimination, so now this is just
      refactoring.
  22. 21 September 2017, 1 commit
    • Fix CURRENT OF to work with PL/pgSQL cursors. · 91411ac4
      Committed by Heikki Linnakangas
      Before, it only worked for cursors declared with DECLARE CURSOR. You got
      a "there is no parameter $0" error if you tried it with anything else.
      This moves the decision on whether a plan is "simply updatable" from the
      parser to the planner. Doing it in the parser was awkward, because we
      only want to do it for queries that are used in a cursor, and for SPI
      queries, we don't know that yet at that time.
      For some reason, the copy, out and read functions of CurrentOfExpr were
      missing the cursor_param field. While we're at it, reorder the code to
      match upstream.
      
      This only makes the required changes to the Postgres planner. ORCA has never
      supported updatable cursors. In fact, it will fall back to the Postgres
      planner on any DECLARE CURSOR command, so that's why the existing tests
      have passed even with optimizer=off.
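      
      A sketch of what now works (illustrative table and function), with the
      cursor declared in PL/pgSQL rather than with DECLARE CURSOR:
      
          CREATE FUNCTION bump_first() RETURNS void AS $$
          DECLARE
              cur CURSOR FOR SELECT id, v FROM t;
              r   record;
          BEGIN
              OPEN cur;
              FETCH cur INTO r;
              -- This used to fail with: there is no parameter $0
              UPDATE t SET v = r.v + 1 WHERE CURRENT OF cur;
              CLOSE cur;
          END
          $$ LANGUAGE plpgsql;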