1. 27 Sep 2017, 15 commits
    • Replace JOIN_LASJ by JOIN_ANTI · 6e7b4722
      Authored by Ekta Khanna
      After merging with e006a24a, Anti Semi Join is
      denoted by `JOIN_ANTI` instead of `JOIN_LASJ`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Remove InClauseInfo and OuterJoinInfo · 8b63aafb
      Authored by Ekta Khanna
      Since `InClauseInfo` and `OuterJoinInfo` are now combined into
      `SpecialJoinInfo` after merging with e006a24a, this commit removes them
      from the relevant places.
      
      Access `join_info_list` instead of `in_info_list` and `oj_info_list`.
      
      Previously, `CdbRelDedupInfo` contained a list of `InClauseInfo`s. While
      making join decisions and during overall join processing, we traversed this
      list and invoked the cdb-specific functions `cdb_make_rel_dedup_info()` and
      `cdbpath_dedup_fixup()`.
      
      Since `InClauseInfo` is no longer available, `CdbRelDedupInfo` now contains
      a list of `SpecialJoinInfo`s. All the cdb-specific routines previously
      called for the `InClauseInfo` list are now called if `CdbRelDedupInfo` has
      a valid `SpecialJoinInfo` list and the join type in the `SpecialJoinInfo`
      is `JOIN_SEMI`. A new helper routine, `hasSemiJoin()`, traverses the
      `SpecialJoinInfo` list to check whether it contains a `JOIN_SEMI`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
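      The helper described above might look roughly like the sketch below. The
      types here are illustrative stand-ins (the real code walks a PostgreSQL
      `List` of the planner's full `SpecialJoinInfo` structs), so only the shape
      of the check is meant to match the commit:

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Stand-ins for the planner types: the real code uses PostgreSQL's
       * List cells and the full SpecialJoinInfo from the planner headers. */
      typedef enum { JOIN_INNER, JOIN_LEFT, JOIN_SEMI, JOIN_ANTI } JoinType;

      typedef struct SpecialJoinInfo
      {
          JoinType jointype;
          struct SpecialJoinInfo *next;   /* stand-in for a List link */
      } SpecialJoinInfo;

      /* Walk the SpecialJoinInfo list and report whether any entry is a
       * semi-join; this mirrors the check hasSemiJoin() is said to perform. */
      static int
      hasSemiJoin(const SpecialJoinInfo *sjinfo_list)
      {
          for (const SpecialJoinInfo *sj = sjinfo_list; sj != NULL; sj = sj->next)
          {
              if (sj->jointype == JOIN_SEMI)
                  return 1;
          }
          return 0;
      }
      ```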
    • Replace JOIN_IN by JOIN_SEMI in ORCA translator · db853de9
      Authored by Ekta Khanna
      After merging with e006a24a, the join type JOIN_IN has been renamed to
      JOIN_SEMI. This commit makes the corresponding changes in ORCA.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Add pullup decisions in `convert_ANY_sublink_to_join` · 7be2c0ad
      Authored by Ekta Khanna
      Add the CDB-specific pullup decisions from `convert_IN_to_join`.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Add pullup decisions in `convert_EXISTS_sublink_to_join` · d91f0efb
      Authored by Dhanashree Kashid
      After merging with e006a24a, this commit adds CDB-specific restrictions
      as follows:
      
      1. Add the CDB-specific pullup decisions from
      `convert_EXISTS_to_join` and `convert_NOT_EXISTS_to_join`.
      
      2. Before this cherry-pick, we used to generate extra quals
      for a NOT EXISTS query by calling `cdbpullup_expr()`
      in `convert_NOT_EXISTS_to_join()`.
      However, for the exact same query with EXISTS, we never generated these
      extra quals:
      ```
      create table foo(t text, n numeric, i int, v varchar(10)) distributed by (t);
      explain select * from foo t0 where not exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash Left Anti Semi Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
                                 Filter: (i + 1) IS NOT NULL  -> extra filter
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (10 rows)
      
      explain select * from foo t0 where exists (select 1 from foo t1 where t0.i=t1.i + 1);
                                                 QUERY PLAN
      -------------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=1.08..2.12 rows=4 width=19)
         ->  Hash EXISTS Join  (cost=1.08..2.12 rows=2 width=19)
               Hash Cond: t0.i = (t1.i + 1)
               ->  Seq Scan on foo t0  (cost=0.00..1.00 rows=1 width=19)
               ->  Hash  (cost=1.04..1.04 rows=1 width=4)
                     ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..1.04 rows=1 width=4)
                           ->  Seq Scan on foo t1  (cost=0.00..1.00 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (9 rows)
      
      ```
      With this commit, the combined pull-up code for EXISTS and NOT EXISTS
      does not generate the extra filters; restoring them is a future TODO.
      
      3. Use `is_simple_subquery` in `simplify_EXISTS_query` to check whether
      the subquery can be pulled up.
      
      Ref [#142355175]
      Signed-off-by: Dhanashree Kashid <dkashid@pivotal.io>
    • Remove old pullup functions after merging e006a24a · 2b5c1b9e
      Authored by Dhanashree Kashid
      With the new flow, we no longer need the following functions:
      
       - pull_up_IN_clauses
       - convert_EXISTS_to_join
       - convert_NOT_EXISTS_to_antijoin
       - not_null_inner_vars
       - safe_to_convert_NOT_EXISTS
       - convert_sublink_to_join
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • CDBlize the cherry-pick e006a24a · 0feb1bd9
      Authored by Ekta Khanna
      Original Flow:
      cdb_flatten_sublinks
      	+--> pull_up_IN_clauses
      		+--> convert_sublink_to_join
      
      New Flow:
      cdb_flatten_sublinks
      	+--> pull_up_sublinks
      
      This commit contains the relevant changes for the above flow.
      
      Previously, `try_join_unique` was part of `InClauseInfo`; it was set
      in `convert_IN_to_join()` and used in `cdb_make_rel_dedup_info()`.
      Now that `InClauseInfo` is gone, we construct a `FlattenedSublink`
      instead in `convert_ANY_sublink_to_join()`, and later in the flow we
      build a `SpecialJoinInfo` from the `FlattenedSublink` in
      `deconstruct_sublink_quals_to_rel()`. Hence, `try_join_unique` is added
      to both `FlattenedSublink` and `SpecialJoinInfo`.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
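      The hand-off of `try_join_unique` can be pictured with the sketch below.
      Both structs are trimmed to the single field under discussion, and the
      function name `build_sjinfo_from_sublink` is hypothetical; the real work
      happens inside `deconstruct_sublink_quals_to_rel()` alongside many other
      fields:

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <string.h>

      /* Hypothetical, trimmed-down versions of the two structs; the real
       * definitions carry many more fields (quals, relids, join type, ...). */
      typedef struct FlattenedSublink
      {
          bool try_join_unique;   /* set while flattening the ANY sublink */
      } FlattenedSublink;

      typedef struct SpecialJoinInfo
      {
          bool try_join_unique;   /* later read by cdb_make_rel_dedup_info() */
      } SpecialJoinInfo;

      /* Sketch of the hand-off performed when a SpecialJoinInfo is built
       * from a FlattenedSublink later in the planning flow. */
      static void
      build_sjinfo_from_sublink(const FlattenedSublink *fsl, SpecialJoinInfo *sjinfo)
      {
          memset(sjinfo, 0, sizeof(*sjinfo));
          sjinfo->try_join_unique = fsl->try_join_unique;
      }
      ```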
    • Implement SEMI and ANTI joins in the planner and executor. · fe2eb2c9
      Authored by Ekta Khanna
      commit e006a24a
      Author: Tom Lane <tgl@sss.pgh.pa.us>
      Date:   Thu Aug 14 18:48:00 2008 +0000
      
          Implement SEMI and ANTI joins in the planner and executor.  (Semijoins replace
          the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
          to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
          joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
          filters are recognized as being anti joins.  Unify the InClauseInfo and
          OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
          it becomes possible to associate a SpecialJoinInfo with every join attempt,
          which permits some cleanup of join selectivity estimation.  That needs to be
          taken much further than this patch does, but the next step is to change the
          API for oprjoin selectivity functions, which seems like material for a
          separate patch.  So for the moment the output size estimates for semi and
          especially anti joins are quite bogus.
      
      Ref [#142355175]
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
    • b68bcd89
    • docs - restructure admin guide top level topics (#3389) · 56cdb519
      Authored by Lisa Owen
      * admin guide working w/dbs section - pull some topics up a level
      * promote ddl, crud topics; move querying topic up
    • Force behave to use older version of its dependency · b4cfe392
      Authored by Nadeem Ghani
      By default, behave just grabs the latest parse_type package as part of
      its setup requirements. However, the newest parse_type release (0.4.2)
      uses a new packaging convention that is not supported by Python versions
      before 2.7.13. Since we are on Python 2.7.12, we break.
      
      Force the requirement to an older version as a hack to bypass this
      issue.
      Signed-off-by: Marbin Tan <mtan@pivotal.io>
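      The commit message does not show the exact requirement line, so the
      fragment below is only a guess at its shape: a version specifier that
      excludes the broken 0.4.2 release.

      ```
      # hypothetical pin: accept any parse_type older than the broken 0.4.2
      parse_type<0.4.2
      ```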
    • 2bc5401c
    • f1bfd7d9
    • e397112d
  2. 26 Sep 2017, 13 commits
  3. 25 Sep 2017, 8 commits
    • Remove the concept of window "key levels". · b1651a43
      Authored by Heikki Linnakangas
      It wasn't very useful. ORCA and Postgres both just stack WindowAgg nodes
      on top of each other, and no one has been unhappy about that, so we might
      as well do that too. This reduces the difference between GPDB and the
      upstream implementation, and will hopefully make the switch smoother.
      
      Rename the Window plan node type to WindowAgg, to match upstream, now
      that it is fairly close to the upstream version.
    • Rename WindowRef et al. to WindowFunc. · 9e82d83d
      Authored by Heikki Linnakangas
      To match upstream.
    • Avoid Division by Zero error. · ee7f7a9e
      Authored by Heikki Linnakangas
      This test case could throw either "ROWS parameter cannot be negative" or
      "Division By Zero", depending on which gets evaluated first. Remove the
      division by zero error to make the outcome more predictable.
    • Refactor test case for concurrent query cancellation/termination. · 286b431c
      Authored by Richard Guo
      This case tests cancelling/terminating queries concurrently while they
      are running or waiting in a resource group.
    • Remove row order information from Flow. · 7e268107
      Authored by Heikki Linnakangas
      A Motion node often needs to "merge" the incoming streams to preserve the
      overall sort order. Instead of carrying sort order information throughout
      the later stages of planning in the Flow struct, pass it as an argument
      directly to make_motion() and the other functions where a Motion node is
      created. This simplifies things.
      
      To make that work, we can no longer rely on apply_motion() to add the final
      Motion on top of the plan when the (sub-)query contains an ORDER BY. That's
      because we no longer have that information available at apply_motion(). Add
      the Motion node in grouping_planner() instead, where we still have that
      information as a path key.
      
      When I started to work on this, it also fixed a bug where the sortColIdx
      of a plan's Flow node could refer to the wrong resno. A test case for that
      is included. However, that case was since fixed by other coincidental
      changes to partition elimination, so now this is just refactoring.
    • Add pipeline support for AIX clients and loaders · 68362b41
      Authored by Peifeng Qiu
      Concourse doesn't support AIX natively, so we clone the repo at the
      corresponding commit on a remote machine, compile the packages, and
      download them back to the concourse container as output.
      
      Testing the client and loader on a platform without a gpdb server is
      another challenge. We set up a GPDB server on the concourse container,
      just like most installcheck tests, and use an SSH tunnel to forward
      ports from and to the remote host. This way both the CL tools and the
      GPDB server believe they are on the same machine, and the test can run
      normally.
    • Report COPY PROGRAM's error output · 2b51c16b
      Authored by Adam Lee
      Replace popen() with popen_with_stderr(), which is also used by external
      web tables, to collect the stderr output of the program.
      
      Since popen_with_stderr() forks a `sh` process it is almost always
      successful, so this commit catches errors that happen in fwrite().
      
      Also pass variables the same way external web tables do.
      Signed-off-by: Xiaoran Wang <xiwang@pivotal.io>
    • Fix cgroup mount point detection in gpconfig. · 37e3e66d
      Authored by Zhenghua Lyu
      The previous code used the python package psutil to get the system's
      mount information, which reads the content of /etc/mtab. In some
      environments /etc/mtab does not contain the cgroup mount point
      information, so this commit scans /proc/self/mounts to find the cgroup
      mount point instead.
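      Each line of /proc/self/mounts has the form "device mountpoint fstype
      options dump pass", so the detection amounts to scanning for a line whose
      fstype is "cgroup". The actual fix lives in gpconfig's Python code; the C
      sketch below, with the hypothetical name `find_cgroup_mount`, parses a
      captured copy of the file's contents just to illustrate the idea:

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Scan mounts text (the contents of /proc/self/mounts) line by line
       * and copy the mount point of the first "cgroup" filesystem into buf.
       * Returns buf on success, NULL if no cgroup mount is present. */
      static const char *
      find_cgroup_mount(const char *mounts, char *buf, size_t buflen)
      {
          char dev[64], mnt[128], fstype[32];
          const char *line = mounts;

          while (line != NULL && *line != '\0')
          {
              /* "device mountpoint fstype ..." -- three leading tokens */
              if (sscanf(line, "%63s %127s %31s", dev, mnt, fstype) == 3 &&
                  strcmp(fstype, "cgroup") == 0)
              {
                  snprintf(buf, buflen, "%s", mnt);
                  return buf;
              }
              line = strchr(line, '\n');   /* advance to the next line */
              if (line != NULL)
                  line++;
          }
          return NULL;
      }
      ```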
  4. 23 Sep 2017, 4 commits
    • Coverity fix: elog string formatting · d4a707c7
      Authored by Kavinder Dhaliwal
    • Add a long living account for Relinquished Memory · 1822c826
      Authored by Kavinder Dhaliwal
      There are cases where, during execution, a Memory Intensive (MI)
      operator may not use all the memory allocated to it. The extra memory
      (quota - allocated) can then be relinquished for other MI nodes to use
      during execution of the statement. For example:
      
      ->  Hash Join
               ->  HashAggregate
               ->  Hash
      
      In the above plan fragment the Hash Join operator has an MI operator in
      both its inner and outer subtrees. If the Hash node uses much less
      memory than it was given as its quota, it will now call
      MemoryAccounting_DeclareDone(), and the difference between its quota
      and its allocated amount is added to the allocated amount of the
      RelinquishedPool. This enables HashAggregate to request memory from the
      RelinquishedPool if it exhausts its quota, to prevent spilling.
      
      This PR adds two new APIs to the MemoryAccounting framework:
      
      MemoryAccounting_DeclareDone(): add the difference between a memory
      account's quota and its allocated amount to the long living
      RelinquishedPool.
      
      MemoryAccounting_RequestQuotaIncrease(): retrieve all relinquished
      memory by incrementing an operator's operatorMemKb and resetting the
      RelinquishedPool to 0.
      
      Note: this PR introduces the facility for Hash to relinquish memory to
      the RelinquishedPool memory account and for the Agg operator
      (specifically HashAgg) to request an increase to its quota before it
      builds its hash table. This commit does not generally apply this
      paradigm to all MI operators.
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
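      The two APIs can be modeled with the minimal sketch below. The function
      names mirror the commit message, but the struct, the global pool
      variable, and the bodies are illustrative assumptions, not the actual
      MemoryAccounting implementation:

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Illustrative long-living pool: unused quota donated by finished
       * operators accumulates here until another operator claims it. */
      static uint64_t relinquishedPoolKb = 0;

      /* Trimmed-down stand-in for a memory account. */
      typedef struct MemoryAccount
      {
          uint64_t quotaKb;       /* memory quota assigned by the planner */
          uint64_t allocatedKb;   /* memory actually used by the operator */
      } MemoryAccount;

      /* Operator is done allocating: donate its unused quota to the pool. */
      static void
      MemoryAccounting_DeclareDone(const MemoryAccount *acct)
      {
          if (acct->quotaKb > acct->allocatedKb)
              relinquishedPoolKb += acct->quotaKb - acct->allocatedKb;
      }

      /* Another operator claims the whole pool on top of its own quota;
       * returns the increased operatorMemKb and empties the pool. */
      static uint64_t
      MemoryAccounting_RequestQuotaIncrease(uint64_t operatorMemKb)
      {
          uint64_t granted = operatorMemKb + relinquishedPoolKb;
          relinquishedPoolKb = 0;
          return granted;
      }
      ```

      In the Hash Join example above, Hash would call DeclareDone() after
      building its table, and HashAgg would call RequestQuotaIncrease() before
      building its own, picking up whatever Hash left unused.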
    • Cherry-pick 'ae47eb1' from upstream to fix Nested CTE errors (#3360) · 009b1809
      Authored by sambitesh
      Before this cherry-pick, the query below would have errored out:
      
      WITH outermost(x) AS (
        SELECT 1
        UNION (WITH innermost as (SELECT 2)
               SELECT * FROM innermost
               UNION SELECT 3)
      )
      SELECT * FROM outermost;
      Signed-off-by: Melanie Plageman <mplageman@pivotal.io>
    • Update 5.json with catalog changes (amgetmulti -> amgetbitmap) · 4daa7c5f
      Authored by Tom Meyer
      To update 5.json, we ran:
      
      cat src/include/catalog/*.h | perl src/backend/catalog/process_foreign_keys.pl > gpMgmt/bin/gppylib/data/5.json
      Signed-off-by: Jacob Champion <pchampion@pivotal.io>