1. 28 September 2018, 4 commits
    • Order active window clauses for greater reuse of Sort nodes. · 3f0d46f7
      Committed by Daniel Gustafsson
      This is a backport of the below commit from postgres 12dev, which in turn
      is a patch that was influenced by an optimization from the previous version
      of the Greenplum Window code. The idea is to order the Sort nodes based on
      sort prefixes, such that sorts can be reused by subsequent nodes.
      
      As this uses EXPLAIN in the test output, a new expected file is added for
      ORCA output even though the patch only touches the postgres planner.
      
        commit 728202b6
        Author: Andrew Gierth <rhodiumtoad@postgresql.org>
        Date:   Fri Sep 14 17:35:42 2018 +0100
      
          Order active window clauses for greater reuse of Sort nodes.
      
          By sorting the active window list lexicographically by the sort clause
          list but putting longer clauses before shorter prefixes, we generate
          more chances to elide Sort nodes when building the path.
      
          Author: Daniel Gustafsson (with some editorialization by me)
          Reviewed-by: Alexander Kuzmenkov, Masahiko Sawada, Tom Lane
          Discussion: https://postgr.es/m/124A7F69-84CD-435B-BA0E-2695BE21E5C2%40yesql.se
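
      As a minimal sketch of the effect (hypothetical empsalary table), consider two
      window clauses where one's ORDER BY is a prefix of the other's:

      ```sql
      -- w1 needs a sort on (depname, salary, empno); w2 only needs (depname, salary),
      -- which is a prefix, so both WindowAgg nodes can be fed from a single Sort.
      SELECT avg(salary) OVER (PARTITION BY depname ORDER BY salary, empno) AS w1,
             rank()      OVER (PARTITION BY depname ORDER BY salary)        AS w2
      FROM empsalary;
      ```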
    • Remove unnecessary code for the first ORDER BY column in window agg. · e70f73e0
      Committed by Heikki Linnakangas
      The purpose of this code was to treat the first ORDER BY column in a
      window agg, like "ROW_NUMBER() OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING
      AND 2 FOLLOWING)", the same way as volatile expressions, and add it to
      the target list as is. That was to ensure that it would be available for
      computing the window bounds. But upstream commit a2099360, merged as
      part of the 9.3 merge, got rid of the distinction between volatile and
      non-volatile expressions, so we no longer need to treat the first ORDER BY
      column any differently either.
    • Move code marked with FIXME to make_windowInputTargetList(). · fa4a2ccb
      Committed by Heikki Linnakangas
      make_windowInputTargetList() seems like a better place for this code,
      as suggested by the FIXME comment that was left here in the 9.3 merge.
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      Committed by ZhangJackey
      There was an assumption in gpdb that a table's data is always distributed
      on all segments. However, this is not always true: for example, when a
      cluster is expanded from M segments to N (N > M), all the tables are still
      on M segments. To work around the problem we used to have to alter all the
      hash-distributed tables to randomly distributed to get correct query
      results, at the cost of bad performance.
      
      Now we support distributing a table's data on only a subset of the segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DMLs on tables distributed on M
      segments; joins between tables on M and N segments are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
  2. 27 September 2018, 1 commit
  3. 23 September 2018, 1 commit
  4. 22 September 2018, 2 commits
    • Revert "Add DEBUG mode to the explain_memory_verbosity GUC" · 984cd3b9
      Committed by Jesse Zhang
      Commit 825ca1e3 didn't seem to work well when we hooked up ORCA's memory
      system to memory accounting: we are tripping multiple asserts in
      regression tests. The regression test failures seem to suggest we are
      double-freeing somewhere (or accounting incorrectly). Reverting for now
      to get master back to green.
      
      This reverts commit 825ca1e3.
    • Add DEBUG mode to the explain_memory_verbosity GUC · 825ca1e3
      Committed by Taylor Vesely
      The memory accounting system generates a new memory account for every
      execution node initialized in ExecInitNode. The addresses of these memory
      accounts are stored in the shortLivingMemoryAccountArray. If the memory
      allocated for shortLivingMemoryAccountArray is full, we will repalloc
      the array with double the number of available entries.
      
      After creating approximately 67000000 memory accounts, it will need to
      allocate more than 1GB of memory to increase the array size, and throw
      an ERROR, canceling the running query.
      
      PL/pgSQL and SQL functions will create new executors/plan nodes that
      must be tracked by the memory accounting system. This level of detail is
      not necessary for tracking memory leaks, and creating a separate memory
      account for every executor will use a large amount of memory just to track
      these memory accounts.
      
      Instead of tracking millions of individual memory accounts, we
      consolidate any child executor account into a special 'X_NestedExecutor'
      account. If explain_memory_verbosity is set to 'detailed' or below,
      consolidate all child executors into this account.
      
      If more detail is needed for debugging, set explain_memory_verbosity to
      'debug', where, as was the previous behavior, every executor will be
      assigned its own MemoryAccountId.
      
      Originally we tried to remove nested execution accounts after they
      finish executing, but rolling over those accounts into an
      'X_NestedExecutor' account was impracticable to accomplish without the
      possibility of a future regression.
      
      If any accounts are created between nested executors that are not rolled
      over to an 'X_NestedExecutor' account, recording which accounts are
      rolled over can grow in the same way that the
      shortLivingMemoryAccountArray is growing today, and would also grow too
      large to reasonably fit in memory.
      
      If we were to iterate through the SharedHeaders every time that we
      finish a nested executor, it would not likely be very performant.
      
      While we were at it, convert some of the convenience macros dealing with
      memory accounting for executor / planner nodes into functions, and move
      them out of the memory accounting header files into the sole callers'
      compilation units.
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
      Co-authored-by: Adam Berlin <aberlin@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
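
      As a hedged usage sketch of the new GUC value (the function name is hypothetical;
      the GUC and its values are those described above):

      ```sql
      -- 'detailed' rolls nested executors into a single X_NestedExecutor account;
      -- 'debug' restores one MemoryAccountId per executor for deep debugging.
      SET explain_memory_verbosity = 'debug';
      EXPLAIN ANALYZE SELECT my_plpgsql_fn(i) FROM generate_series(1, 1000) AS g(i);
      ```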
  5. 05 September 2018, 3 commits
  6. 03 September 2018, 1 commit
  7. 31 August 2018, 1 commit
    • Rename "prelim function" to "combine function", to match upstream. · b8545d57
      Committed by Heikki Linnakangas
      The GPDB "prelim" functions did the same things as the "combine"
      functions introduced in PostgreSQL 9.6. This commit includes just the
      catalog changes, to essentially search & replace "prelim" with
      "combine". I did not pick the planner and executor changes that were
      made as part of this in the upstream, yet.
      
      Also replace the GPDB implementation of float8_amalg() and
      float8_regr_amalg() with the upstream float8_combine() and
      float8_regr_combine(). They do the same thing, but let's use upstream
      functions where possible.
      
      Upstream commits:
      commit a7de3dc5
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Wed Jan 20 13:46:50 2016 -0500
      
          Support multi-stage aggregation.
      
          Aggregate nodes now have two new modes: a "partial" mode where they
          output the unfinalized transition state, and a "finalize" mode where
          they accept unfinalized transition states rather than individual
          values as input.
      
          These new modes are not used anywhere yet, but they will be necessary
          for parallel aggregation.  The infrastructure also figures to be
          useful for cases where we want to aggregate local data and remote
          data via the FDW interface, and want to bring back partial aggregates
          from the remote side that can then be combined with locally generated
          partial aggregates to produce the final value.  It may also be useful
          even when neither FDWs nor parallelism are in play, as explained in
          the comments in nodeAgg.c.
      
          David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki
          Linnakangas, Haribabu Kommi, and me.
      
      commit af025eed
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Fri Apr 8 13:44:50 2016 -0400
      
          Add combine functions for various floating-point aggregates.
      
          This allows parallel aggregation to use them.  It may seem surprising
          that we use float8_combine for both float4_accum and float8_accum
          transition functions, but that's because those functions differ only
          in the type of the non-transition-state argument.
      
          Haribabu Kommi, reviewed by David Rowley and Tomas Vondra
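
      For context, a sketch of what a combine function is for, written in upstream
      PostgreSQL 9.6 CREATE AGGREGATE syntax (the aggregate below is hypothetical;
      this commit itself only renames the catalog entries):

      ```sql
      -- Two-stage aggregation: segments emit partial transition states, and the
      -- combine function (formerly GPDB's "prelim" function) merges them.
      CREATE AGGREGATE my_sum(float8) (
          sfunc       = float8pl,
          stype       = float8,
          combinefunc = float8pl
      );
      ```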
  8. 03 August 2018, 1 commit
  9. 02 August 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plans are now supported for specific parameter values even when using
        prepared statements.
      
      * The FDW API was improved to let FDWs provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * The security_barrier option was added for views to prevent optimizations that
        might allow view-protected data to be exposed to users.
      
      * Range data type was added to store a lower and upper bound belonging to its
        base data type.
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on QD and dispatches
        the plan of the SELECT query to QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
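
      A small, hedged illustration of two of the 9.2 features listed above (table and
      column names are hypothetical):

      ```sql
      -- tsrange is one of the new range types; the count below may be answered
      -- with an index-only scan on the primary-key index, avoiding heap access.
      CREATE TABLE events (id int PRIMARY KEY, during tsrange);
      SELECT count(*) FROM events WHERE id BETWEEN 100 AND 200;
      ```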
  10. 09 July 2018, 1 commit
  11. 12 May 2018, 1 commit
  12. 03 May 2018, 1 commit
    • Add Global Deadlock Detector. · 03915d65
      Committed by Zhenghua Lyu
      To prevent distributed deadlocks, Greenplum DB holds an exclusive table lock
      for UPDATE and DELETE commands, so concurrent updates to the same table are
      effectively disabled.
      
      We add a backend process to do global deadlock detection so that we do not
      lock the whole table while doing UPDATE/DELETE; this will help improve the
      concurrency of Greenplum DB.
      
      The core idea of the algorithm is to divide locks into two types:
      
      - Persistent: the lock can only be released after the transaction is over (abort/commit)
      - Non-persistent: all other cases
      
      This PR’s implementation adds a persistent flag to the LOCK, and the rules for setting it are:
      
      - Xid locks are always persistent
      - Tuple locks are never persistent
      - A relation lock is persistent if the relation has been closed with the NoLock
        parameter, otherwise it is not persistent
      - Other types of locks are not persistent
      
      For more details, please refer to the code and the README.
      
      There are several known issues to pay attention to:
      
      - This PR’s implementation only cares about the locks that can be shown
        in the view pg_locks.
      - This PR’s implementation does not support AO tables. We keep upgrading
        the locks for AO tables.
      - This PR’s implementation does not take network waits into account.
        Thus we cannot detect the deadlock of GitHub issue #2837.
      - SELECT FOR UPDATE still locks the whole table.
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      Co-authored-by: Ning Yu <nyu@pivotal.io>
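
      A hedged sketch of the behavior change (the GUC name is an assumption, see the
      README added by this commit; the table is hypothetical):

      ```sql
      -- Previously, session 2 would block on the table-level lock taken by session 1.
      -- With the global deadlock detector enabled, UPDATE/DELETE take row-level locks.
      SHOW gp_enable_global_deadlock_detector;                  -- assumed GUC name
      UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- session 1
      UPDATE accounts SET balance = balance + 10 WHERE id = 2;  -- session 2, not blocked
      ```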
  13. 02 May 2018, 1 commit
    • Re-enable MIN/MAX optimization. · 362fc756
      Committed by Heikki Linnakangas
      I'm not sure why it's been disabled. It's not very hard to make it work, so
      let's do it. Might not be a very common query type, but if you happen to
      have a query where it helps, it helps a lot.
      
      This adds a GUC, gp_enable_minmax_optimization, to enable/disable the
      optimization. There's no such GUC in upstream, but we need at least a flag
      in PlannerConfig for it, so that we can disable the optimization for
      correlated subqueries, along with some other optimizer tricks. Seems best
      to also have a GUC for it, for consistency with other flags in
      PlannerConfig.
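
      A minimal sketch of the class of query this helps (t1 from the examples above;
      the exact plan depends on available indexes):

      ```sql
      -- With the optimization enabled, min()/max() can be planned as a
      -- "LIMIT 1 over an ordered index scan" instead of a full scan plus aggregate.
      SET gp_enable_minmax_optimization = on;
      EXPLAIN SELECT min(c1) FROM t1;
      ```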
  14. 29 March 2018, 2 commits
    • Support replicated table in GPDB · 7efe3204
      Committed by Pengzhou Tang
      * Support replicated table in GPDB
      
      Currently, tables in GPDB are distributed across all segments by hash or
      randomly. There is a requirement to introduce a new table type, called a
      replicated table, where every segment holds the full, duplicate table data.
      
      To implement it, we added a new distribution policy named POLICYTYPE_REPLICATED to mark
      a replicated table, and a new locus type named CdbLocusType_SegmentGeneral to describe
      the distribution of tuples of a replicated table.  CdbLocusType_SegmentGeneral implies
      data is generally available on all segments but not on the qDisp, so a plan node with
      this locus type can be flexibly planned to execute on either a single QE or all QEs. It is
      similar to CdbLocusType_General; the only difference is that a CdbLocusType_SegmentGeneral
      node can't be executed on the qDisp. To guarantee this, we try our best to add a gather motion
      on top of a CdbLocusType_SegmentGeneral node when planning motions for a join, even if the other
      rel has a bottleneck locus type. A problem is that such a motion may be redundant if the single QE
      is not promoted to execute on the qDisp in the end, so we need to detect such cases and omit the
      redundant motion at the end of apply_motion(). We don't reuse CdbLocusType_Replicated since
      it always implies a broadcast motion below it, and it's not easy to plan such a node as direct
      dispatch to avoid getting duplicate data.
      
      We don't support replicated tables with an inherit/partition-by clause yet; the main
      problem is that update/delete on multiple result relations can't work correctly now.
      We can fix this later.
      
      * Allow spi_* to access replicated tables on QEs
      
      Previously, GPDB didn't allow a QE to access non-catalog tables because
      the data is incomplete; we can remove this limitation now if it only
      accesses replicated tables.
      
      One problem is that a QE needs to know whether a table is replicated.
      Previously, QEs didn't maintain the gp_distribution_policy catalog, so we
      need to pass the policy info to QEs for replicated tables.
      
      * Change schema of gp_distribution_policy to identify replicated table
      
      Previously, we used a magic number -128 in the gp_distribution_policy table
      to identify a replicated table, which is quite a hack, so we add a new column
      in gp_distribution_policy to identify replicated tables and partitioned
      tables.
      
      This commit also abandons the old way that used a 1-length-NULL list and
      a 2-length-NULL list to identify the DISTRIBUTED RANDOMLY and DISTRIBUTED
      FULLY clauses.
      
      Besides, this commit refactors the code to make the decision-making for
      the distribution policy clearer.
      
      * Support COPY for replicated tables
      
      * Disable the row-ctid unique path for replicated tables.
        Previously, GPDB used a special Unique path on rowid to address queries
        like "x IN (subquery)". For example, for
        select * from t1 where t1.c2 in (select c2 from t3), the plan looks
        like:
         ->  HashAggregate
               Group By: t1.ctid, t1.gp_segment_id
                  ->  Hash Join
                        Hash Cond: t2.c2 = t1.c2
                      ->  Seq Scan on t2
                      ->  Hash
                          ->  Seq Scan on t1
      
        Obviously, the plan is wrong if t1 is a replicated table, because ctid
        + gp_segment_id can't identify a tuple: in a replicated table, a logical
        row may have a different ctid and gp_segment_id on each segment. So we
        disable such plans for replicated tables temporarily. It's not the best
        way, because the rowid-unique path may be cheaper than a normal hash
        semi-join, so we left a FIXME for later optimization.
      
      * ORCA related fix
        Reported and added by Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
        Fall back to the legacy query optimizer for queries over replicated tables
      
      * Adapt pg_dump/gpcheckcat to replicated tables
        gp_distribution_policy is no longer a master-only catalog; do the
        same checks as for other catalogs.
      
      * Support gpexpand on replicated table && alter the dist policy of replicated table
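
      A brief, hedged sketch of the new table type (the fact table is hypothetical):

      ```sql
      -- Every segment stores a full copy of the replicated table, so joining it to
      -- a hash-distributed table needs no Motion on the replicated side.
      CREATE TABLE dim_conf (id int, val text) DISTRIBUTED REPLICATED;
      EXPLAIN SELECT * FROM facts f JOIN dim_conf d ON f.id = d.id;
      ```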
    • Remove FIXME about group_id in Distinct HashAgg · 2b25c663
      Committed by Dhanashree Kashid
      With the 8.4 merge, the planner considers using HashAgg to implement
      DISTINCT. At the end of planning, we replace the expressions in the
      targetlist of certain operators (including Agg) with OUTER references
      to the targetlist of its lefttree (see set_plan_refs() >
      set_upper_references()).
      But, as per the code, when grouping() or group_id() are
      present in the target list of the Agg, it skips the replacement, and this
      is problematic when the Agg is implementing DISTINCT.
      
      It seems that the Agg's targetlist need not compute grouping() or
      group_id() when its lefttree is computing it. In that case, it may
      simply refer to it. This would then also apply to the other operators:
      WindowAgg, Result & PartitionSelector.
      
      However, the Repeat node needs to compute these functions at each stage
      because group_id is derived from RepeatState::repeat_count. Thus, it
      cannot be replaced by an OUTER reference.
      
      Hence, this commit removes the special case for these functions for all
      operators except Repeat. Then, a DISTINCT HashAgg produces the correct
      results.
      Signed-off-by: Shreedhar Hardikar <shardikar@pivotal.io>
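
      A hedged sketch of the query shape in question (hypothetical table): a DISTINCT
      on top of a grouping-sets Agg whose target list contains grouping()/group_id():

      ```sql
      -- After this change, the DISTINCT HashAgg refers to grouping()/group_id()
      -- computed by its child instead of recomputing them, and returns correct results.
      SELECT DISTINCT c1, c2, grouping(c1), group_id()
      FROM t1
      GROUP BY GROUPING SETS ((c1), (c2));
      ```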
  15. 16 March 2018, 1 commit
    • Remove GPDB_84_MERGE_FIXME in planner.c and prepunion.c · 74546663
      Committed by Shreedhar Hardikar
      These were related to choosing the right arguments to send to GPDB's
      make_agg() and cost_agg() methods for queries containing DISTINCT or set
      operations.
      
      Hash aggregation, when used to implement a DISTINCT (in either form) in
      the query, is not related to grouping sets, and thus the arguments to
      num_nullcols, input_grouping, grouping and rollup_gs_times should be 0.
      
      However, since SetOp uses the upstream TupleHashTable while HashAgg uses
      GPDB's HHashTable implementation, the hash table size calculations
      should be computed differently. This is fixed in this commit.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
  16. 09 February 2018, 1 commit
    • Refactor the way Semi-Join plans are constructed. · d4ce0921
      Committed by Heikki Linnakangas
      This removes much of the GPDB machinery to handle "deduplication paths"
      within the planner. We will now use the upstream code to build JOIN_SEMI
      paths, as well as paths where the outer side of the join is first
      deduplicated (JOIN_UNIQUE_OUTER/INNER).
      
      The old style "join first and deduplicate later" plans can be better in
      some cases, however. To still be able to generate such plans, add a new
      JOIN_DEDUP_SEMI join type, which is transformed into JOIN_INNER followed
      by the deduplication step after the join, during planning.
      
      This new way of constructing these plans is simpler, and allows removing
      a bunch of code, and reverting some more code to the way it is in the
      upstream.
      
      I'm not sure if this can generate the same plans that the old code could,
      in all cases. In particular, I think the old "late deduplication"
      mechanism could delay the deduplication further, all the way to the top of
      the join tree. I'm not sure when that would be useful, though, and the
      regression suite doesn't seem to contain any such cases (with EXPLAIN). Or
      maybe I misunderstood the old code. In any case, I think this is good
      enough.
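
      A hedged example of the affected query class (tables from the examples above):

      ```sql
      -- The same IN sublink can now be planned as JOIN_SEMI, JOIN_UNIQUE_INNER/OUTER,
      -- or JOIN_DEDUP_SEMI (an inner join followed by a deduplication step).
      EXPLAIN SELECT * FROM t1 WHERE t1.c2 IN (SELECT c2 FROM t3);
      ```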
  17. 02 February 2018, 1 commit
    • Remove extra planner pass to remove "trivial" Result nodes. · c613cabf
      Committed by Heikki Linnakangas
      Instead, avoid creating such Result nodes in the first place, by making
      plan_pushdown_tlist() check if the Result node would have any work to do.
      
      With this, you get Result nodes in some cases where the old code could zap
      them away. But on the other hand, this can avoid inserting Result nodes, not
      only on top of Appends, but on top of any node. This can be seen in the
      included expected output changes: some test queries lose a Result, some
      gain one. So performance-wise this is about a wash, but this is simpler.
      
      The reason to do this right now is that we ran into issues with the
      "zapping" code while working on the 9.0 merge. I'm sure we could fix those
      issues, but let's do this rather than spend time debugging and fixing the
      zapping code with the merge.
  18. 13 December 2017, 4 commits
    • Reword comment to avoid nested comments · 8105f067
      Committed by Daniel Gustafsson
      The comment added in 916f460f created a nested comment structure
      by accident, which triggered a warning in clang for -Wcomment. Reword
      the comment slightly to make the compiler happy.
      
      planner.c:194:15: warning: '/*' within block comment [-Wcomment]
               * support pl/* statements (relevant when they are planned on the segments).
                           ^
    • Fix storage test failures caused by 916f460f · 0d3ae2a0
      Committed by Shreedhar Hardikar
      The default value of Gp_role is set to GP_ROLE_DISPATCH, which means
      auxiliary processes inherit this value. FileRep does the same, but also
      executes queries using SPI on the segment, which means Gp_role ==
      GP_ROLE_DISPATCH is not a sufficient check for the master QD.
      
      So, bring back the check on GpIdentity.
      
      Author: Asim R P <apraveen@pivotal.io>
      Author: Shreedhar Hardikar <shardikar@pivotal.io>
    • Rename querytree_safe_for_segment to querytree_safe_for_qe · 32f099fd
      Committed by Shreedhar Hardikar
      The original name was deceptive because this check is also done for QE
      slices that run on master. For example:
      
      EXPLAIN SELECT * FROM func1_nosql_vol(5), foo;
      
                                               QUERY PLAN
      --------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.30..1.37 rows=4 width=12)
         ->  Nested Loop  (cost=0.30..1.37 rows=2 width=12)
               ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=8)
               ->  Materialize  (cost=0.30..0.33 rows=1 width=4)
                     ->  Broadcast Motion 1:3  (slice1)  (cost=0.00..0.30 rows=3 width=4)
                           ->  Function Scan on func1_nosql_vol  (cost=0.00..0.26 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (8 rows)
      
      Note that in the plan, the function func1_nosql_vol() will be executed on a
      master slice with Gp_role as GP_ROLE_EXECUTE.
      
      Also, update output files.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
    • Ensure that ORCA is not called on any process other than the master QD · 916f460f
      Committed by Shreedhar Hardikar
      We don't want to use the optimizer for planning queries in SQL, pl/pgSQL
      etc. functions when that is done on the segments.
      
      ORCA excels in complex queries, most of which will access distributed
      tables. We can't run such queries from the segment slices anyway
      because they require dispatching a query within another - which is not
      allowed in GPDB. Note that this restriction also applies to non-QD
      master slices.  Furthermore, ORCA doesn't currently support pl/*
      statements (relevant when they are planned on the segments).
      
      For these reasons, restrict to using ORCA on the master QD processes
      only.
      
      Also revert commit d79a2c7f ("Fix pipeline failures caused by 0dfd0ebc.")
      and separate out gporca fault injector tests in newly added
      gporca_faults.sql so that the rest can run in a parallel group.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
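
      A hedged illustration of the scope of the change (names borrowed from the EXPLAIN
      example in the entry above; behavior described, not verified output):

      ```sql
      -- The top-level query on the master QD may still be planned by ORCA, but any
      -- SQL issued from inside pl/* functions running on segment (or non-QD master)
      -- slices is planned by the legacy planner after this change.
      SET optimizer = on;
      SELECT * FROM func1_nosql_vol(5), foo;
      ```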
  19. 12 December 2017, 1 commit
    • Replace usage of deprecated error codes · fd0a1b75
      Committed by Daniel Gustafsson
      These error codes were marked as deprecated in September 2007 but
      the code didn't get the memo. Extend the deprecation into the code
      and actually replace the usage. Ten years seems like long enough notice,
      so also remove the renames; the odds of anyone using these in code
      which compiles against a 6X tree should be low (and easily fixed).
  20. 30 November 2017, 1 commit
  21. 24 November 2017, 7 commits
    • Backport upstream comment updates · 122e817b
      Committed by Heikki Linnakangas
      commit 96f990e2
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Wed Jul 13 20:23:09 2011 -0400
      
          Update some comments to clarify who does what in targetlist creation.
      
          No code changes; just avoid blaming query_planner for things it doesn't
          really do.
    • Backport upstream bugfix related to Window functions. · 411a033c
      Committed by Heikki Linnakangas
      The test case added to the regression suite actually seems to work on
      GPDB even without this, but nevertheless seems like a good idea to pick
      it now, since we have the code it affected. Also, I'm about to backport
      more stuff that depends on this.
      
      commit c1d9579d
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Jul 12 18:23:55 2011 -0400
      
          Avoid listing ungrouped Vars in the targetlist of Agg-underneath-Window.
      
          Regular aggregate functions in combination with, or within the arguments
          of, window functions are OK per spec; they have the semantics that the
          aggregate output rows are computed and then we run the window functions
          over that row set.  (Thus, this combination is not really useful unless
          there's a GROUP BY so that more than one aggregate output row is possible.)
          The case without GROUP BY could fail, as recently reported by Jeff Davis,
          because sloppy construction of the Agg node's targetlist resulted in extra
          references to possibly-ungrouped Vars appearing outside the aggregate
          function calls themselves.  See the added regression test case for an
          example.
      
          Fixing this requires modifying the API of flatten_tlist and its underlying
          function pull_var_clause.  I chose to make pull_var_clause's API for
          aggregates identical to what it was already doing for placeholders, since
          the useful behaviors turn out to be the same (error, report node as-is, or
          recurse into it).  I also tightened the error checking in this area a bit:
          if it was ever valid to see an uplevel Var, Aggref, or PlaceHolderVar here,
          that was a long time ago, so complain instead of ignoring them.
      
          Backpatch into 9.1.  The failure exists in 8.4 and 9.0 as well, but seeing
          that it only occurs in a basically-useless corner case, it doesn't seem
          worth the risks of changing a function API in a minor release.  There might
          be third-party code using pull_var_clause.
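
      A hedged sketch of the query shape the upstream fix covers (hypothetical table):
      a plain aggregate with no GROUP BY combined with a window function, so the window
      function runs over the single aggregate output row:

      ```sql
      -- Valid per spec: aggregates are computed first, then window functions run
      -- over the aggregate output rows (here, exactly one row).
      SELECT sum(salary) AS total,
             rank() OVER (ORDER BY sum(salary)) AS r
      FROM empsalary;
      ```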
    • Cherry-pick change to pull_var_clause() API. · bd3ab7bd
      Committed by Heikki Linnakangas
      We would get this later in PostgreSQL 8.4, but I'm about to cherry-pick
      more commits now that depend on this.
      
      Upstream commit:
      
      commit 1d97c19a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Sun Apr 19 19:46:33 2009 +0000
      
          Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
          Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
          term) seems to be to ignore the PlaceHolderVar and examine its argument
          instead.  In support of this, change the API of pull_var_clause() to allow
          callers to request recursion into PlaceHolderVars.  Currently
          estimate_num_groups() is the only customer for that behavior, but where
          there's one there may be others.
    • Re-implement RANGE PRECEDING/FOLLOWING. · 14a9108a
      Committed by Heikki Linnakangas
      This is similar to the old implementation, in that we use "+", "-" to
      compute the boundaries.
      
      Unfortunately it seems unlikely that this would be accepted in the
      upstream, but at least we have that feature back in GPDB now, the way it
      used to be. See discussion on pgsql-hackers about that:
      https://www.postgresql.org/message-id/26801.1265656635@sss.pgh.pa.us
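
      A minimal sketch of the re-implemented feature (value-based frames, computed with
      the type's "+" and "-" operators):

      ```sql
      -- Each row's frame covers the rows whose x lies within [x - 2, x + 2].
      SELECT x,
             sum(x) OVER (ORDER BY x RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS s
      FROM generate_series(1, 10) AS t(x);
      ```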
    • Backport implementation of ORDER BY within aggregates, from PostgreSQL 9.0. · 4319b7bb
      Committed by Heikki Linnakangas
      This is functionality that was lost by the ripout & replace.
      
      commit 34d26872
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Tue Dec 15 17:57:48 2009 +0000
      
          Support ORDER BY within aggregate function calls, at long last providing a
          non-kluge method for controlling the order in which values are fed to an
          aggregate function.  At the same time eliminate the old implementation
          restriction that DISTINCT was only supported for single-argument aggregates.
      
          Possibly release-notable behavioral change: formerly, agg(DISTINCT x)
          dropped null values of x unconditionally.  Now, it does so only if the
          agg transition function is strict; otherwise nulls are treated as DISTINCT
          normally would, ie, you get one copy.
      
          Andrew Gierth, reviewed by Hitoshi Harada
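
      A minimal sketch of the backported syntax:

      ```sql
      -- The ORDER BY inside the aggregate call controls the order in which values
      -- are fed to the transition function.
      SELECT array_agg(x ORDER BY x DESC) FROM generate_series(1, 5) AS t(x);
      ```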
    • Remove PercentileExpr. · bb6a757e
      Committed by Heikki Linnakangas
      This loses the functionality, and leaves all the regression tests that used
      those functions failing.
      
      The plan is to later backport the upstream implementation of those
      functions from PostgreSQL 9.4. The feature is called "ordered set
      aggregates" there.
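
      For reference, a hedged sketch of the PostgreSQL 9.4 "ordered set aggregate"
      syntax that is planned as the replacement (hypothetical table; not available
      at this point):

      ```sql
      -- The upstream replacement for the removed percentile functionality.
      SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY c2) FROM t1;
      ```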
    • Wholesale rip out and replace Window planner and executor code · f62bd1c6
      Committed by Heikki Linnakangas
      This adds some limitations, and removes some functionality that the old
      implementation had. These limitations will be lifted, and missing
      functionality will be added back, in subsequent commits:
      
      * You can no longer have variables in start/end offsets
      
      * RANGE is not implemented (except for UNBOUNDED)
      
      * If you have multiple window functions that require a different sort
        ordering, the planner is not smart about placing them in a way that
        minimizes the number of sorts.
      
      This also lifts some limitations that the GPDB implementation had:
      
      * LEAD/LAG offset can now be negative. In the qp_olap_windowerr test, a lot of
        queries that used to throw a "ROWS parameter cannot be negative" error
        are now passing. That error was an artifact of the way LEAD/LAG were
        implemented. Those queries contain window function calls like "LEAD(col1,
        col2 - col3)", and sometimes with suitable values in col2 and col3, the
        second argument went negative. That caused the error. The new implementation
        of LEAD/LAG is OK with a negative argument.
      
      * Aggregate functions with no prelimfn or invprelimfn are now supported as
        window functions
      
      * Window functions, e.g. rank(), no longer require an ORDER BY. (The output
        will vary from one invocation to another, though, because the order is
        then not well defined. This is more annoying on GPDB than on PostgreSQL,
        because in GPDB the row order tends to vary because the rows are spread
        out across the cluster and will arrive at the master in unpredictable
        order.)
      
      * NTILE doesn't require the argument expression to be in PARTITION BY
      
      * A window function's arguments may contain references to an outer query.
      
      This changes the OIDs of the built-in window functions to match upstream.
      Unfortunately, the OIDs had been hard-coded in ORCA, so to work around that
      until those hard-coded values are fixed in ORCA, the ORCA translator code
      contains a hack to map the old OID to the new ones.
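
      A hedged sketch of one of the lifted limitations (a computed, possibly negative
      LEAD/LAG offset):

      ```sql
      -- A negative lag offset now simply looks forward instead of raising
      -- "ROWS parameter cannot be negative".
      SELECT x, lag(x, -1) OVER (ORDER BY x) FROM generate_series(1, 5) AS t(x);
      ```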
  22. 23 November 2017, 2 commits
    • Make 'must_gather' logic when planning DISTINCT and ORDER BY more robust. · a5610212
      Committed by Heikki Linnakangas
      The old logic was:
      
      1. Decide if we need to put a Gather motion on top of the plan
      2. Add nodes to handle DISTINCT
      3. Add nodes to handle ORDER BY.
      4. Add Gather node, if we decided so in step 1.
      
      If, in step 1, the result was already focused on a single segment, we
      would make a note that no Gather is needed, and not add one in step 4.
      However, the DISTINCT processing might add a Redistribute Motion node, so
      that the final result is not focused on a single node.
      
      I couldn't come up with a query where that would happen, as the code stands,
      but we saw such a case on the "window functions rewrite" branch we've been
      working on. There, the sort order/distribution of the input can be changed
      to process window functions. But even if this isn't actively broken right
      now, it seems more robust to change the logic so that 'must_gather' means
      'at the end, the result must end up on a single node', instead of 'we must
      add a Gather node'. The test that this adds exercises this issue after
      the window functions rewrite, but right now it passes with or without these
      code changes. But might as well add it now.
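
      A hedged sketch of the plan shape under discussion (t1 from the examples above):
      DISTINCT plus ORDER BY, where the final result must end up on a single node:

      ```sql
      EXPLAIN SELECT DISTINCT c2 FROM t1 ORDER BY c2;
      ```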
    • Fix DISTINCT with window functions. · 898ced7c
      Committed by Heikki Linnakangas
      The last 8.4 merge commit introduced support for DISTINCT with hashing,
      and refactored the way grouping_planner() works with the path keys. That
      broke DISTINCT with window functions, because the new distinct_pathkeys
      field was not set correctly.
      
      In commit 474f1db0, I moved some GPDB-added tests from the 'aggregates'
      test, to a new 'gp_aggregates' test. But I forgot to add the new test file
      to the test schedule, so it was not run. Oops. Add it to the schedule now.
      The tests in 'gp_aggregates' cover this bug.
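
      A hedged example of the combination that was broken (hypothetical columns on the
      t1 table used earlier):

      ```sql
      -- DISTINCT applied on top of window function output; planning this correctly
      -- requires distinct_pathkeys to be set properly.
      SELECT DISTINCT rank() OVER (PARTITION BY c1 ORDER BY c2) FROM t1;
      ```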
  23. 21 October 2017, 1 commit
    • Fix distribution of rows in CREATE TABLE AS and ORDER BY. · c159ec72
      Committed by Heikki Linnakangas
      If a CREATE TABLE AS query contained an ORDER BY, the planner put a Motion
      node on top of the plan that focuses all the rows on a single node.
      However, that was confused with the redistribute motion that CREATE TABLE
      AS is supposed to put at the top, to distribute the rows according to
      the DISTRIBUTED BY of the table. This used to work before commit
      7e268107, because we used to not add an explicit Motion node on top of
      the plan for ORDER BY, but we just changed the sort-order information in
      the Flow.
      
      I have a nagging feeling that the apply_motion code isn't dealing with
      Motion on top of a Motion node correctly, because I would've expected to
      get a plan like that without this fix. Perhaps apply_motion silently
      refuses to add a Motion node on top of an existing Motion? That'd be a
      silly plan, of course, and fortunately the planner doesn't create such
      plans, so I'm not going to dig deeper into that right now.
      
      The test case is a simplified version from one of the
      "mpp21090_drop_col_oids_dml_*" TINC tests. I noticed this while moving
      those tests over from TINC to the main suite. We only run those tests
      in the concourse pipeline with "set optimizer=on", so it didn't catch
      this issue with optimizer=off.
      
      Fixes github issue #3577.
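
      A hedged sketch of the affected statement shape (table names hypothetical; GPDB's
      CTAS accepts a DISTRIBUTED BY clause after the query):

      ```sql
      -- The final Motion must redistribute by (c1), not gather everything to one
      -- node just because of the ORDER BY.
      CREATE TABLE t1_sorted AS
          SELECT * FROM t1 ORDER BY c2
          DISTRIBUTED BY (c1);
      ```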