1. 21 Nov 2017, 16 commits
    • Remove unnecessary serial columns from a few upstream tests. · ea42a1cf
      Committed by Heikki Linnakangas
      These had been added in GPDB a long time ago, presumably to make the test
      output repeatable. But they're not needed anymore. Remove them to match
      the upstream more closely, which helps when merging.
    • Remove obsolete ax_pthread.m4 file. · d068edba
      Committed by Heikki Linnakangas
      It has been superseded by acx_pthread.m4, and was unused.
    • cd60609f
    • Fix race condition between DROP IF EXISTS and RENAME again. · 4dc999fb
      Committed by Heikki Linnakangas
      If a DROP IF EXISTS is run concurrently with ALTER TABLE RENAME TO, it's
      possible that the table gets dropped on master, but not on the segments.
      This was fixed earlier already, by commit 4739cd22, but the fix got lost
      during the 8.4 merge. Fix it again, in the same fashion, and also add
      a test case so that it won't get lost again, at least not in exactly the
      same manner.
      
      Spotted by @kuien, github issue #3874. Thanks to @tvesely for the test
      case.
    • Supporting Join Optimization Levels in GPORCA · c8192690
      Committed by Bhuvnesh Chaudhary
      The concept of optimization levels is known in many enterprise
      optimizers. It lets the user control the degree of optimization that is
      being employed. Optimization levels allow the grouping of
      transformations into bags of rules, where each rule is assigned a
      particular level. By default all rules are applied, but a user who
      wants to apply fewer rules is able to. This decision is made based on
      domain knowledge: the user knows that even with fewer rules applied,
      the generated plan satisfies their needs.
      
      The Cascades optimizer, on which GPORCA is based, allows grouping of
      transformation rules into optimization levels. This concept of
      optimization levels has also been extended to join ordering, allowing
      the user to pick the join order via the query, use a greedy approach,
      or use an exhaustive approach.
      
      Postgres-based planners use join_limit and from_limit to reduce the
      search space. While the objective of join optimization levels is also
      to reduce the search space, the way it does so is different: it asks
      the optimizer to apply or not apply a subset of rules, providing more
      flexibility to the customer. This is one of the most frequently
      requested features from our enterprise clients, who have a high degree
      of domain knowledge.
      
      This PR introduces this concept. In the immediate future we are planning
      to add different polynomial join ordering techniques with a guaranteed
      bound as part of the "Greedy" search.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
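The rule-bagging idea above can be sketched in a few lines (a toy illustration with hypothetical rule names and level assignments; the actual GPORCA rule sets are not part of this commit message):

```python
# Toy illustration of optimization levels: each transformation rule is
# assigned a level, and the optimizer only applies rules whose level is
# at or below the user's chosen optimization level.
# All rule names here are hypothetical, not actual GPORCA identifiers.

RULES = [
    ("InnerJoinCommutativity", 1),
    ("JoinAssociativity", 2),
    ("PushSelectBelowJoin", 1),
    ("ExhaustiveJoinOrder", 3),
]

def active_rules(opt_level):
    """Return the names of rules enabled at the given optimization level."""
    return [name for name, level in RULES if level <= opt_level]

# At the default (highest) level every rule is applied; lowering the
# level prunes the bag of rules, shrinking the search space.
print(active_rules(3))
print(active_rules(1))
```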
    • 56972aca
    • 7852dd15
    • K
    • Refactor dynamic index scans and bitmap scans, to reduce diff vs. upstream. · 198f701e
      Committed by Heikki Linnakangas
      Much of the code and structs used by index scans and bitmap index scans had
      been fused together and refactored in GPDB, to share code between dynamic
      index scans and regular ones. However, it would be nice to keep upstream
      code unchanged as much as possible. To that end, refactor the executor code
      for dynamic index scans and dynamic bitmap index scans, to reduce the diff
      vs upstream.
      
      The Dynamic Index Scan executor node is now a thin wrapper around the
      regular Index Scan node, even thinner than before. When a new Dynamic Index
      Scan node is initialized, we don't do much at that point. When the scan
      begins, we initialize an Index Scan node for the first partition, and
      return rows from it until it's exhausted. On next call, the underlying
      Index Scan is destroyed, and a new Index Scan node is created, for the next
      partition, and so on. Creating and destroying the IndexScanState for every
      partition adds some overhead, but it's not significant compared to all the
      other overhead of opening and closing the relations, building scan keys
      etc.
      
      Similarly, a Dynamic Bitmap Index Scan executor node is just a thin wrapper
      for regular Bitmap Index Scan. When MultiExecDynamicBitmapIndexScan() is
      called, it initializes a BitmapIndexScanState for the current partition,
      and calls it. On ReScan, the BitmapIndexScan executor node for the old
      partition is shut down. A Dynamic Bitmap Index Scan differs from Dynamic
      Index Scan in that a Dynamic Index Scan is responsible for iterating
      through all the active partitions, while a Dynamic Bitmap Index Scan works
      as a slave for the Dynamic Bitmap Heap Scan node above it.
      
      It'd be nice to do a similar refactoring for heap scans, but that's for
      another day.
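The per-partition lifecycle described above can be modeled roughly like this (a toy Python sketch, not the actual executor code; `IndexScan` merely stands in for the inner IndexScanState):

```python
# Toy model of the Dynamic Index Scan wrapper: the inner "Index Scan"
# is created for one partition at a time, drained, destroyed, and then
# recreated for the next partition. All names are illustrative only.

class IndexScan:
    def __init__(self, partition_rows):
        self.rows = iter(partition_rows)

    def next_row(self):
        return next(self.rows, None)

class DynamicIndexScan:
    def __init__(self, partitions):
        self.partitions = iter(partitions)
        self.inner = None

    def next_row(self):
        while True:
            if self.inner is None:
                part = next(self.partitions, None)
                if part is None:
                    return None               # all partitions exhausted
                self.inner = IndexScan(part)  # init scan for next partition
            row = self.inner.next_row()
            if row is not None:
                return row
            self.inner = None                 # destroy inner scan, move on

scan = DynamicIndexScan([[1, 2], [], [3]])
rows = []
while (r := scan.next_row()) is not None:
    rows.append(r)
print(rows)  # [1, 2, 3]
```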
    • Oops, missed a comma. · 23feb165
      Committed by Heikki Linnakangas
    • Don't bother to update distributed clog on sub-commit. · 21ff60e0
      Committed by Heikki Linnakangas
      The distributed clog, like normal clog, will not be consulted for
      subtransactions that are part of a still-in-progress transaction, so there
      is no need to update it until we're ready to commit the top transaction.
      This is basically the same change that was done in the upstream for clog
      in commit 06da3c57. We're about to merge that change from the upstream as
      part of the PostgreSQL 8.4 merge, but we can make that change for the
      distributed log separately, to keep the actual merge commit smaller.
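The deferred-update idea can be sketched as follows (an illustrative model only; the real code updates the distributed clog's SLRU pages, not a dict):

```python
# Toy model: subtransaction commits are only recorded locally; the
# distributed log is written once, at top-level commit, covering the
# top xid and all sub-xids at the same time. Nobody consults the
# distributed log for a subxact of a still-in-progress transaction,
# so deferring the write is safe.

class Transaction:
    def __init__(self, xid):
        self.xid = xid
        self.subxids = []
        self.distributed_log = {}  # stands in for the distributed clog

    def sub_commit(self, subxid):
        # No distributed-log write here, just remember the subxid.
        self.subxids.append(subxid)

    def top_commit(self):
        # One batched update when the top transaction commits.
        for xid in [self.xid] + self.subxids:
            self.distributed_log[xid] = "committed"

tx = Transaction(100)
tx.sub_commit(101)
tx.sub_commit(102)
assert tx.distributed_log == {}      # nothing written on sub-commit
tx.top_commit()
print(sorted(tx.distributed_log))    # [100, 101, 102]
```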
    • Fix indentation in smgr code. · 14cb19b4
      Committed by Heikki Linnakangas
      By running pgindent, and tidying a bunch of things manually.
    • Move some GPDB-specific code out of smgr.c and md.c. · 306b189d
      Committed by Heikki Linnakangas
      For clarity, and to make merging easier.
      
      The code to manage the hash table of "pending resync EOFs" for append-only
      tables is moved to smgr_ao.c. One notable change here is that the
      pendingDeletesPerformed flag is removed. It was used to track whether there
      are any pending deletes or any pending AO table resyncs, but we might as
      well check the pending delete list and the pending syncs hash table
      directly; it's hardly any slower than checking a separate boolean.
      
      There are still plenty of GPDB changes in smgr.c, but this is a good step
      forward.
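The pendingDeletesPerformed removal amounts to replacing a separately maintained boolean with direct emptiness checks, roughly in this spirit (a Python sketch, not the actual C code):

```python
# Before: a separate flag had to be kept in sync with two containers.
# After: just check the containers directly; an emptiness test on a
# list and a hash table is hardly slower than reading a boolean, and
# there is no flag that can drift out of sync.

pending_deletes = []   # stands in for the pending-delete list
pending_syncs = {}     # stands in for the AO "pending resync EOFs" hash

def have_pending_work():
    # Replaces the old pendingDeletesPerformed flag.
    return bool(pending_deletes) or bool(pending_syncs)

assert not have_pending_work()
pending_syncs["aoseg_1"] = 1024
assert have_pending_work()
```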
    • Remove unnecessary ORDER BYs from upstream tests. · cc6b462b
      Committed by Heikki Linnakangas
      These were added in GPDB a long time ago, probably before gpdiff.pl was
      introduced to mask row order differences in regression test output. But
      now that gpdiff.pl can do that, these are unnecessary. Revert to match
      the upstream more closely.
      
      This includes updates to the 'rules' and 'inherit' tests, although they
      are disabled and still don't pass after these changes.
    • I broke the build · ec4c2db4
      Committed by Jesse Zhang
      Fix it for realz. :(
    • Bump ORCA version to 2.49.1 · 240734df
      Committed by Jesse Zhang
  2. 20 Nov 2017, 3 commits
  3. 19 Nov 2017, 1 commit
    • Use the new 8.3 style when printing Access Privileges in psql. · ef408151
      Committed by Heikki Linnakangas
      A long time ago, when updating our psql version to 8.3 (or something higher),
      we had decided to keep the old single-line style when displaying access
      privileges, to avoid having to update regression tests. It's time to move
      forward, update the tests, and use the nicer 8.3 style for displaying
      access privileges.
      
      Also, \d on a view no longer prints the View Definition. You need to use
      the verbose \d+ option for that. (I'm not a big fan of that change myself:
      when I want to look at a view I'm almost always interested in the View
      Definition. But let's not second-guess decisions made almost 10 years ago
      in the upstream.)
      
      Note: psql still defaults to the "old-ascii" style when printing multi-line
      fields. The new style was introduced only later, in 9.0, so to avoid
      changing all the expected output files, we should stick to the old style
      until we reach that point in the merge. This commit only changes the style
      for Access privileges, which is different from the multi-line style.
  4. 18 Nov 2017, 2 commits
  5. 17 Nov 2017, 6 commits
  6. 16 Nov 2017, 12 commits
    • Fix _optimizer file for HASH partition removal · 456e59e0
      Committed by Daniel Gustafsson
      This was lost in commit 152d1223 which has the corresponding
      update for the postgres planner output.
    • Refactor and simplify coerce_partition_value() · 62f45b47
      Committed by Daniel Gustafsson
      Avoid the extra string copy by using the StringInfo buffer and
      simplify the code a bit now that HASH partitions are removed.
    • Remove PARTTYPE_REFERENCE · c71d40f9
      Committed by Daniel Gustafsson
      The REFERENCE partition type was never used anywhere except being
      defined in the parttyp enum. Remove.
    • Remove hash partitioning support · 152d1223
      Committed by Daniel Gustafsson
      Hash partitioning was never fully implemented, and was never turned
      on by default. There has been no effort to complete the feature, so
      rather than carrying dead code this removes all support for hash
      partitioning. Should we ever want this feature, we will most likely
      start from scratch anyways.
      
      As an effect from removing the unsupported MERGE/MODIFY commands,
      this previously accepted query is no longer legal:
      
      	create table t (a int, b int)
      	distributed by (a)
      	partition by range (b) (start () end(2));
      
      The syntax was an effect of an incorrect rule in the parser which
      made the start boundary optional for CREATE TABLE when it was only
      intended for MODIFY PARTITION.
      
      pg_upgrade was already checking for hash partitions so no new check
      was required (upgrade would've been impossible anyways due to hash
      algorithm change).
    • Fix clash on two test tables named 'foo'. · 739db3a2
      Committed by Heikki Linnakangas
      A Concourse job just failed with a diff like this:
      
      *** ./expected/partition_pruning.out	2017-11-16 08:38:51.463996042 +0000
      --- ./results/partition_pruning.out	2017-11-16 08:38:51.691996042 +0000
      ***************
      *** 5785,5790 ****
      --- 5786,5793 ----
      
        -- More Equality and General Predicates ---
        drop table if exists foo;
      + ERROR:  "foo" is not a base table
      + HINT:  Use DROP EXTERNAL TABLE to remove an external table.
        create table foo(a int, b int)
        partition by list (b)
        (partition p1 values(1,3), partition p2 values(4,2), default partition other);
      
      That's because both 'partition_pruning' and 'portals_updatable' use a test
      table called 'foo', and they are run concurrently in the test schedule. If
      the above 'drop table' happens to run at just the right time, when the table
      is created but not yet dropped in 'portals_updatable', you get that error.
      The test table in 'partition_pruning' is created in a different schema,
      so creating two tables with the same name wouldn't actually be a problem, but
      the 'drop table' sees the table created in the public schema, too.
      
      To fix, rename the table in 'portals_updatable', and also remove the above
      unnecessary 'drop table'. Either one would be enough to fix the race
      condition, but might as well do both.
    • Small refactoring of ALTER ROLE .. RESOURCE QUEUE · 0a4af85c
      Committed by Daniel Gustafsson
      Remove commented-out code which is incorrect: resqueue comes from
      a defelem, which in turn is parsed with the IDENT rule and thus is
      guaranteed to be lowercased. Also collapse the two identical if
      statements into one, since the latter would never fire due to the
      condition being changed by the former. This removes indentation and
      simplifies the code a bit.

      Also concatenate the error message, making it easier to grep for, and
      end the error hint with a period (following the common convention in
      postgres code).
    • resgroup: Fix memory quota calculation · c1cdb99d
      Committed by xiong-gang
      If a QD exits while the transaction is still active, the QD will abort
      the transaction and destroy all gangs, without waiting for the QEs to
      finish the transaction. In this case, a new transaction (in the resource
      group slot woken up by this QD) may start executing before the old QEs
      exit, so we cannot use the number of currently running QEs to calculate
      the memory quota.
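The hazard can be illustrated with made-up numbers (the formulas and figures here are illustrative only, not GPDB's actual resource group accounting):

```python
# Toy numbers: suppose a resource group has 1000 MB to hand out.

group_memory_mb = 1000
planned_qes = 4            # QEs the new transaction will actually use

def quota_buggy(running_qes):
    # Dividing by the count of *currently running* QEs: old QEs from
    # the aborted transaction that haven't exited yet are still
    # counted, deflating each new QE's share.
    return group_memory_mb // running_qes

def quota_fixed():
    # Dividing by the planned QE count is independent of stragglers.
    return group_memory_mb // planned_qes

# 4 new QEs plus 4 lingering QEs from the aborted transaction:
print(quota_buggy(8))   # 125 -- too small
print(quota_fixed())    # 250
```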
    • Remove unused behave step · a6348eb7
      Committed by Daniel Gustafsson
      Any trailing commas were intended to be removed from the partition
      string, but due to a typo the fixed string was saved into a new
      variable instead. However, Chris Hajas realized that the step was
      no longer in use, so remove it rather than fix it.
    • Remove attempt to make EXPLAIN print something even on error. · ca4c29d4
      Committed by Heikki Linnakangas
      Compared to upstream, there were a bunch of changes to the error handling
      in EXPLAIN, to try to catch errors and produce EXPLAIN output even when an
      error happens. I don't understand how it was supposed to work, but I don't
      remember it ever doing anything too useful. And it has the strange effect
      that if an ERROR happens during EXPLAIN ANALYZE, it is reported to the
      client as merely a NOTICE. So, revert this to the way it is in the
      upstream.
      
      Also reorder the end-of-EXPLAIN steps so that the "Total runtime" is
      printed at the very end, like in the upstream. I don't see any reason to
      differ from upstream on that, and this makes the "Total runtime" numbers
      more comparable with PostgreSQL's, for what it's worth.
      
      I dug up the commit that made these changes in the old git repository, from
      before GPDB was open sourced:
      
      ---
      commit 1413010e71cb8eae860160ac9d5246216b2a80b4
      Date:   Thu Apr 12 12:34:18 2007 -0800
      
          MPP-1177. EXPLAIN ANALYZE now can usually produce a report,
          perhaps a partial one, even if the query encounters a runtime
          error.  The Slice Statistics indicate 'ok' for slices whose worker
          processes all completed successfully; otherwise it shows how many
          workers returned errors, were canceled, or were not dispatched.
      
          The error message from EXPLAIN in such cases will appear as a
          NOTICE instead of an ERROR.  The client is first sent a successful
          completion response, followed by the NOTICE.  (psql displays the
          NOTICE before the EXPLAIN report, however.)  Although presented to
          the client as a NOTICE, the backend handles it just like an ERROR
          with regard to logging, rollback, etc.
      ---
      
      I couldn't figure out under what circumstances that would be helpful,
      and couldn't come up with an example. Perhaps it used to work differently
      when it was originally committed, but not anymore? I also looked up the
      MPP-1177 ticket in the old bug tracking system. The description for that
      was:
      
      ---
      EXPLAIN ANALYZE should report on the amount of data spilled to temporary
      workfiles on disk, and how much work_mem would be required to complete
      the query without spilling to disk.
      ---
      
      Unfortunately, that description, and the comments that followed it, didn't
      say anything about suppressing errors or being able to print out EXPLAIN
      output even if an error happens. So I don't know why that change was made
      as part of MPP-1177. It was seemingly for a different issue.
    • Silence warnings about deprecated bison flags. · b4d323c6
      Committed by Heikki Linnakangas
      In Makefile.global, we set BISONFLAGS to "-Wno-deprecated". Don't override
      that in gpmapreduce's Makefile. Put the -d flag directly on the bison
      invocation's command line, like it's done e.g. in src/backend/parser.
    • 9d85a367
    • Retire the gpcloud_pipeline.yml · f10a8ca9
      Committed by Adam Lee
      The `kick-off` script is good enough for the gpcloud tests, so retire
      the gpcloud_pipeline.