1. 14 Mar 2019 (2 commits)
  2. 15 Feb 2019 (1 commit)
    • Revert eager dispatch of index creation during ALTER TABLE · fecd245a
      Committed by Jesse Zhang
      This reverts the following commits:
      commit 0ee987e64 - "Don't dispatch index creations too eagerly in ALTER TABLE."
      commit 28dd0152 - "Enable alter table column with index (#6286)"
      
      The motivation of commit 0ee987e64 was to stop eager dispatch of index creation
      during ALTER TABLE and instead perform a single dispatch. Doing so prevents
      "relation already exists" errors when altering data types on indexed columns,
      for example:
      
          ALTER TABLE foo ALTER COLUMN test TYPE integer;
          ERROR:  relation "foo_test_key" already exists
      
      Unfortunately, without eager dispatch of index creation the QEs can choose a
      different name for a relation than was chosen on the QD. Eager dispatch was the
      only mechanism we had to ensure a deterministic and consistent index name
      between the QE and QD in some scenarios. In the absence of another mechanism we
      must revert this commit.
      
      This commit also rolls back commit 28dd0152, which enabled altering data types
      on indexed columns and required commit 0ee987e64.
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
      fecd245a
  3. 01 Feb 2019 (2 commits)
    • Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
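
      For illustration, the extended syntax might be used like this (the table,
      column, and operator class names here are only placeholders):

          CREATE TABLE orders (
              order_id int,
              amount   numeric
          ) DISTRIBUTED BY (order_id my_int_hash_ops);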
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, the legacy integer operator classes cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops are all part of the
      cdbhash_integer_ops operator family.
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that, for example, collocation between
      old and new tables works. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet should do that.)
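
      As a sketch of how the GUC is meant to interact with table creation (the
      table names are illustrative):

          -- Force the legacy cdbhash_*_ops opclasses for tables that don't
          -- specify an opclass explicitly.
          SET gp_use_legacy_hashops = on;
          CREATE TABLE legacy_dist (id int) DISTRIBUTED BY (id);

          -- With the GUC off, the default hash opclasses are used instead.
          SET gp_use_legacy_hashops = off;
          CREATE TABLE default_dist (id int) DISTRIBUTED BY (id);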
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to the planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbade that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely on PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
      242783ae
    • Rename gp_distribution_policy.attrnums to distkey, and make it int2vector. · 69ec6926
      Committed by Heikki Linnakangas
      This is in preparation for adding operator classes as a new column
      (distclass) to gp_distribution_policy. This naming is consistent with
      pg_index.indkey/indclass. Change the datatype to int2vector, also for
      consistency with pg_index, and some other catalogs that store attribute
      numbers, and because int2vector is slightly more convenient to work with
      in the backend. Move the column to the end of the table, so that all the
      variable-length and nullable columns are at the end, which makes it
      possible to reference the other columns directly in Form_gp_policy.
      
      Add a backend function, pg_get_table_distributedby(), to deparse the
      DISTRIBUTED BY definition of a table into a string. This is similar to
      pg_get_indexdef_columns(), pg_get_functiondef() etc. functions that we
      have. Use the new function in psql and pg_dump, when connected to a GPDB6
      server.
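
      A hypothetical call to the new function, assuming a table named 'foo'
      (the exact output format is whatever the backend deparses):

          SELECT pg_get_table_distributedby('foo'::regclass);
          -- expected to return a string such as: DISTRIBUTED BY (a)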
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Peifeng Qiu <pqiu@pivotal.io>
      Co-authored-by: Adam Lee <ali@pivotal.io>
      69ec6926
  4. 25 Jan 2019 (1 commit)
    • Remove YYYYMMDDHH24MISS timestamp support · 71ec37a9
      Committed by Daniel Gustafsson
      Greenplum had support for parsing YYYYMMDDHH24MISS timestamps, which
      upstream did not.  This is however problematic, since such strings cannot
      be parsed unambiguously. The following example shows a situation where
      parsing produces a result that is unlikely to be what the user was
      expecting:
      
        postgres=# set datestyle to 'mdy';
        SET
        postgres=# select '13081205132018'::timestamp;
              timestamp
        ---------------------
         1308-12-05 13:20:18
        (1 row)
      
      The original intent was to aid ETL jobs from Teradata which only
      supported this format. This is no longer true according to Teradata
      documentation.
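
      After this change, such a string is expected to be rejected rather than
      silently misinterpreted (illustrative):

          -- expected to raise an error instead of returning 1308-12-05 13:20:18
          select '13081205132018'::timestamp;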
      
      This retires the functionality (which was highlighted during the
      merge process), aligning the code with upstream, and adds a negative
      test for it; the corresponding documentation changes in the release
      notes are already done. The existing test for this in qp_misc_jiras
      was removed, along with a test for a format supported in upstream
      that was already covered by existing suites.
      Reviewed-by: Rob Eckhardt <reckhardt@pivotal.io>
      71ec37a9
  5. 10 Jan 2019 (1 commit)
    • Don't use TIDs with high offset numbers in AO tables. · c249ac7a
      Committed by Heikki Linnakangas
      Change the mapping of AO segfilenum+rownum to an ItemPointer, so that we
      avoid using ItemPointer.ip_posid values higher than 32768. Such offsets
      are impossible on heap tables, because you can't fit that many tuples on
      a page. In GiST, since PostgreSQL 9.1, we have taken advantage of that by
      using 0xfffe (65534) to mark special "invalid" GiST tuples. We can tolerate
      that, because those invalid tuples can only appear on internal pages, so
      they cannot be confused with AO TIDs, which only appear on leaf pages. But
      later versions of PostgreSQL will also use those high values for other
      similar magic purposes, so it seems better to keep clear of them, even
      if we could make it work.
      
      To allow binary upgrades of indexes that already contain AO tids with high
      offsets, we still allow and handle those, too, in the code to fetch AO
      tuples. Also relax the sanity check in GiST code, to not confuse those high
      values with invalid tuples.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6227
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      c249ac7a
  6. 29 Dec 2018 (1 commit)
    • Call executor nodes the same, whether generated by planner or ORCA. · 455b9a19
      Committed by Heikki Linnakangas
      We used to call some node types different names in EXPLAIN output,
      depending on whether the plan was generated by ORCA or the Postgres
      planner. Also, a Bitmap Heap Scan used to be called differently, when the
      table was an AO or AOCS table, but only in planner-generated plans. There
      was some historical justification for this, because they used to
      be different executor node types, but commit db516347 removed the last
      such differences.
      
      Full list of renames:
      
      Table Scan -> Seq Scan
      Append-only Scan -> Seq Scan
      Append-only Columnar Scan -> Seq Scan
      Dynamic Table Scan -> Dynamic Seq Scan
      Bitmap Table Scan -> Bitmap Heap Scan
      Bitmap Append-Only Row-Oriented Scan -> Bitmap Heap Scan
      Bitmap Append-Only Column-Oriented Scan -> Bitmap Heap Scan
      Dynamic Bitmap Table Scan -> Dynamic Bitmap Heap Scan
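
      For example, a plain scan of an append-only table should now show up
      under the unified name (illustrative table definition):

          CREATE TABLE ao_example (id int) WITH (appendonly=true) DISTRIBUTED BY (id);
          EXPLAIN SELECT * FROM ao_example;
          -- the scan node is expected to appear as "Seq Scan on ao_example",
          -- whether the plan came from the planner or from ORCA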
      455b9a19
  7. 13 Dec 2018 (2 commits)
    • Update alter indexed column test to be more relevant · ac5cc7c8
      Committed by Daniel Gustafsson
      This updates the testcase for altering an indexed column to be more
      relevant now that commit 28dd0152 has changed the constraint which
      was previously blocking this.
      
      The comment had also been wrong since the above-referenced commit, so fix
      that too.
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
      ac5cc7c8
    • Reporting cleanup for GPDB specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum-specific error handling via ereport()/elog() calls was
      in need of a unification effort, as some parts of the code were using a
      different messaging style from others (and from upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus making the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter, and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with too detailed
        errmsg() content broken up into details and hints where possible.
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages due to long lines since it makes grepping
        for error messages harder when debugging. This is also the de facto
        standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low hanging fruit has
        been dealt with. Also convert a few elog() calls which are user
        facing to ereport().
      
      * Update the testfiles to match the new messages.
      
      Spelling and wording is mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      56540f11
  8. 12 Dec 2018 (1 commit)
    • Enable alter table column with index (#6286) · 28dd0152
      Committed by Jinbao Chen
      * Make sure ALTER TABLE preserves index tablespaces.
      
      When rebuilding an existing index, ALTER TABLE correctly kept the
      physical file in the same tablespace, but it messed up the pg_class
      entry if the index had been in the database's default tablespace
      and "default_tablespace" was set to some non-default tablespace.
      This led to an inaccessible index.
      
      Fix by fixing pg_get_indexdef_string() to always include a tablespace
      clause, whether or not the index is in the default tablespace.  The
      previous behavior was installed in commit 537e92e4, and I think it just
      wasn't thought through very clearly; certainly the possible effect of
      default_tablespace wasn't considered.  There's some risk in changing the
      behavior of this function, but there are no other call sites in the core
      code.  Even if it's being used by some third party extension, it's fairly
      hard to envision a usage that is okay with a tablespace clause being
      appended some of the time but can't handle it being appended all the time.
      
      Back-patch to all supported versions.
      
      Code fix by me, investigation and test cases by Michael Paquier.
      
      Discussion: <1479294998857-5930602.post@n3.nabble.com>
      
      * Enable alter table column with index
      
      Originally, we disabled altering a table column that has an index, because
      CREATE INDEX is dispatched immediately, which unfortunately breaks the
      ALTER work queue; the workaround was to disable the operation altogether.
      In PostgreSQL 9.1, the is_alter_table parameter was introduced to
      'DefineIndex', so we now have a chance to re-enable this feature.
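
      An illustrative sequence that this re-enables (names are made up):

          CREATE TABLE foo (id int, test int) DISTRIBUTED BY (id);
          CREATE INDEX foo_test_idx ON foo (test);
          -- Previously rejected because the index on "test" would have been
          -- dispatched eagerly, breaking the ALTER work queue; now expected
          -- to succeed and rebuild the index.
          ALTER TABLE foo ALTER COLUMN test TYPE bigint;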
      28dd0152
  9. 15 Nov 2018 (1 commit)
    • Don't store total relpages/reltuples in root partition's pg_class. · 443e3631
      Committed by Heikki Linnakangas
      For consistency with upstream. Sum up the child reltuples at planning
      time in ORCA, instead. This adds some overhead to planning partitioned
      tables with lots of partitions with ORCA, but we hardly care about the
      startup time of ORCA, anyway.
      
      This effectively reverts recent commit 3f3869d6. The tests that were
      added to vacuum_gp in that commit don't make much sense anymore, but
      instead of removing them altogether, rewrite them into something that is
      marginally useful. We probably have tests for reltuples/relpages elsewhere,
      in 'analyze' test file at least, but it seems good to have a smaller
      summary of the intended behavior, in the form of a test like this.
      Reviewed-by: Venkatesh Raghavan <vraghavan@pivotal.io>
      443e3631
  10. 25 Oct 2018 (1 commit)
  11. 28 Sep 2018 (1 commit)
  12. 11 Sep 2018 (2 commits)
  13. 05 Sep 2018 (1 commit)
    • Re-enable the CDB planner for DISTINCT-qualified aggregates. · 35e60338
      Committed by Heikki Linnakangas
      In the Window functions rewrite, I had accidentally disabled the creation
      of the multi-phase DQA plans. As a result, queries containing DISTINCT-
      qualified aggregates, e.g. "SUM(DISTINCT n)", generated naive single-phase
      plans, which can be much less efficient.
      
      In PostgreSQL, DISTINCT aggregates are planned similarly to ordered
      aggregates, using a tuplesort inside the Agg node. Because of that, the
      AggClauseCosts.numOrderedAggs count includes both DISTINCT and ordered
      aggregates. With cdb_grouping_planner, we fall back to that method for
      ordered aggregates, but for DISTINCT aggregates, it can generate smarter
      plans. However, because it checked numOrderedAggs to see whether the plan
      contained any ordered aggregates, it got fooled and also fell back if there
      were any DISTINCT aggregates. To fix, add a separate counter for true
      ordered aggregates that doesn't include DISTINCTs.
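
      A query of the affected kind, purely for illustration:

          -- A DISTINCT-qualified aggregate; with the fix, the CDB planner can
          -- again produce a multi-phase DQA plan instead of a naive
          -- single-phase one.
          EXPLAIN SELECT sum(DISTINCT n) FROM some_table;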
      
      The DQA-planner code had bitrotted somewhat while it was not being used.
      Nowadays, the Aggref.args list contains TargetEntrys instead of plain
      expressions. And when constructing subqueries on the fly, the RelOptInfos
      created for the RTE_SUBQUERY rtable entries need to have the 'subplan'
      and 'subroot' fields filled in. Those changes were made to the similar code
      for the normal, non-DISTINCT 2-phase aggs during the merges, but the
      DQA code was neglected, because it was unused.
      
      We have tests for DQAs, but apparently none of the tests use EXPLAIN, so
      although you still got correct results, we did not notice that the plans
      regressed. The expected output of a few existing EXPLAIN tests changes but
      they were fairly simple cases where the naive plan actually seems fine as
      well. Add EXPLAINs to a few queries in the gp_dqa test, so that we will be
      alarmed if this happens again.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5627
      35e60338
  14. 18 Aug 2018 (1 commit)
  15. 15 Aug 2018 (1 commit)
  16. 03 Aug 2018 (1 commit)
  17. 02 Aug 2018 (1 commit)
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plan was supported for specific parameter values even when using
        prepared statements.
      
      * API for FDW was improved to provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * The security_barrier option was added for views, to prevent optimizations
        that might allow view-protected data to be exposed to users.
      
      * Range data types were added to store a lower and an upper bound belonging
        to their base data type (see the small example after this list).
      
      * CTAS (CREATE TABLE AS/SELECT INTO) is now treated as a utility statement. The
        SELECT query is planned during the execution of the utility. To conform to
        this change, GPDB executes the utility statement only on the QD and dispatches
        the plan of the SELECT query to the QEs.
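
      As a small taste of the range types mentioned above (values are arbitrary):

          SELECT int4range(1, 10) @> 5;                    -- containment: true
          SELECT numrange(0.0, 1.0) * numrange(0.5, 2.0);  -- intersection: [0.5,1.0)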
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  18. 26 Jul 2018 (1 commit)
    • ORCA now mimics planner when it comes to empty stats · 3f561d42
      Committed by Omer Arap
      When no stats were available for a table, ORCA was treating it as an
      empty table while planning. The planner, on the other hand, uses the GUC
      `gp_enable_relsize_collection` to obtain the estimated size of the table, but
      no other statistics. This commit enables ORCA to have the same behavior as
      the planner when the GUC is set.
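
      A sketch of the behavior described (the table name is illustrative; the
      GUC is the one named above):

          -- With the GUC off, ORCA treats a table without statistics as empty.
          -- With it on, both ORCA and the planner estimate the table's size
          -- from the relation's size on disk instead.
          SET gp_enable_relsize_collection = on;
          EXPLAIN SELECT count(*) FROM freshly_loaded_table;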
      Signed-off-by: Sambitesh Dash <sdash@pivotal.io>
      3f561d42
  19. 21 Jul 2018 (1 commit)
    • Remove tests for gp_setwith_alter_storage. · f11dc049
      Committed by Ashwin Agrawal
      This feature is not ready for prime time yet, so there is no point in testing
      it by enabling the GUC. Hence, remove the tests for now; whenever we expose
      this feature in the future, add and enable tests for it again.
      
      Github issue #5300 tracks fully enabling the feature.
      f11dc049
  20. 01 Mar 2018 (1 commit)
    • Give a better error message, if preparing an xact fails. · b3c50e40
      Committed by Heikki Linnakangas
      If an error happens in the prepare phase of two-phase commit, relay the
      original error back to the client, instead of the fairly opaque
      "Abort [Prepared] broadcast failed to one or more segments" message you
      got previously. A lot of things happen during the prepare phase that
      can legitimately fail, such as checking deferred constraints, as in the
      'constraints' regression test. But even without that, there can be
      triggers, ON COMMIT actions, etc., any of which can fail.
      
      This commit consists of several parts:
      
      * Pass 'true' for the 'raiseError' argument when dispatching the prepare
        dtx command in doPrepareTransaction(), so that the error is emitted to
        the client.
      
      * Bubble up an ErrorData struct, with as many fields intact as possible,
        to the caller when dispatching a dtx command (instead of constructing
        a message in a StringInfo), so that we can re-throw the message to
        the client with its original formatting.
      
      * Don't throw an error in performDtxProtocolCommand(), if we try to abort
        a prepared transaction that doesn't exist. That is business-as-usual,
        if a transaction throws an error before finishing the prepare phase.
      
      * Suppress the "NOTICE: Releasing segworker groups to retry broadcast."
        message, when aborting a prepared transaction.
      
      Put together, the effect is that if an error happens during the prepare
      phase, the client receives a message that is largely indistinguishable
      from the message you'd get if the same failure happened while running a
      normal statement.
      
      Fixes github issue #4530.
      b3c50e40
  21. 18 Jan 2018 (1 commit)
    • Fix whitespace in tests, mostly in expected output. · 06a2bb64
      Committed by Heikki Linnakangas
      Commit ce3153fa, about to be merged from PostgreSQL 9.0 soon, removes
      the -w option from pg_regress's "diff" invocation. That commit will fix
      all the PostgreSQL regression tests to pass without it, but we need to
      also fix all the GPDB tests. That's what this commit does.
      06a2bb64
  22. 13 Dec 2017 (1 commit)
    • Rename querytree_safe_for_segment to querytree_safe_for_qe · 32f099fd
      Committed by Shreedhar Hardikar
      The original name was deceptive because this check is also done for QE
      slices that run on master. For example:
      
      EXPLAIN SELECT * FROM func1_nosql_vol(5), foo;
      
                                               QUERY PLAN
      --------------------------------------------------------------------------------------------
       Gather Motion 3:1  (slice2; segments: 3)  (cost=0.30..1.37 rows=4 width=12)
         ->  Nested Loop  (cost=0.30..1.37 rows=2 width=12)
               ->  Seq Scan on foo  (cost=0.00..1.01 rows=1 width=8)
               ->  Materialize  (cost=0.30..0.33 rows=1 width=4)
                     ->  Broadcast Motion 1:3  (slice1)  (cost=0.00..0.30 rows=3 width=4)
                           ->  Function Scan on func1_nosql_vol  (cost=0.00..0.26 rows=1 width=4)
       Settings:  optimizer=off
       Optimizer status: legacy query optimizer
      (8 rows)
      
      Note that in the plan, the function func1_nosql_vol() will be executed on a
      master slice with Gp_role as GP_ROLE_EXECUTE.
      
      Also, update output files
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      32f099fd
  23. 24 Nov 2017 (4 commits)
    • Disable ORCA in failing tests and add FIXMEs · 76aa7355
      Committed by Heikki Linnakangas
      Disabled ORCA in following tests, updated the answer files and added
      appropriate FIXMEs.
      
      - qp_olap_windowerr
        There are a lot of failures in qp_olap_windowerr; hence disable ORCA
        wholesale for it.
      
      - bfv_aggregate
        Previously ORCA was falling back for the query with array_agg and ORDER
        BY. Now it generates a plan; however, it discards the ORDER BY in the
        subquery, causing nondeterministic order. Fix the SQL by adding ORDER BY
        inside the aggregate.
      
      - eagerfree
        ORCA falls back to planner; update the expected output
      
      - gporca.sql:
        Turn off ORCA for the queries with DQA, RANK() with OVER() and COUNT(*)
        with OVER(). Add appropriate FIXME and update the answer file.
      
      - bfv_partition_plans_optimizer.out
        For queries containing EXCEPT [ALL] or INTERSECT [ALL], ORCA falls
        back, and the planner plan produced does not contain a "Partition
        Selector". Temporarily update the answer file until the issue is fixed.
      
      Author: Melanie Plageman <mplageman@pivotal.io>
      Author: Dhanashree Kashid <dkashid@pivotal.io>
      Author: Venkatesh Raghavan <vraghavan@pivotal.io>
      Author: Jacob Champion <pchampion@pivotal.io>
      Author: Ekta Khanna <ekhanna@pivotal.io>
      Author: Jesse Zhang <sbjesse@gmail.com>
      76aa7355
    • Updating out files for qp_misc_jiras · 1b3ba000
      Committed by Ekta Khanna
      Prior to the window functions merge from Postgres, the generated plans
      used HashAgg. Added a FIXME to fix the plan diffs.
      Signed-off-by: Haisheng Yuan <hyuan@pivotal.io>
      1b3ba000
    • Fix expected output, for more precise results. · 831f65e3
      Committed by Heikki Linnakangas
      The imprecision in these results was caused by the lossiness of the floating
      point +/- when using the inverse-transition function. The new implementation
      doesn't use inverse transition functions, so you get a more accurate result.
      
      See https://github.com/greenplum-db/gpdb/issues/3508.
      831f65e3
    • The queries work now, adjust expected output. · 95b53ad6
      Committed by Heikki Linnakangas
      We have lifted some limitations that the old implementation had.
      95b53ad6
  24. 21 Nov 2017 (1 commit)
  25. 01 Nov 2017 (1 commit)
  26. 15 May 2017 (1 commit)
    • Streamline Orca Gucs · 9f2c838b
      Committed by Venkatesh Raghavan
      * Enable analyzing root partitions
      * Ensure that the name of the guc is clear
      * Remove double negation (where possible)
      * Update comments
      * Co-locate gucs that have similar purpose
      * Remove dead gucs
      * Classify them correctly so that they are no longer hidden
      9f2c838b
  27. 15 Feb 2017 (1 commit)
    • Port Bugbuster trigger test to ICW · 74a1efd7
      Committed by Daniel Gustafsson
      Since there is no GPDB specific trigger suite in ICW, and this bug
      was originally part of the Rio sprint back in the day, I added it
      to qp_misc_jiras. The suite already contained a version of the test,
      so I extended it to include the segment redistribution check as
      well rather than duplicating what is essentially the same test. This
      test relies on the rows being allocated to segment id 0, which
      should hold true for test setups with 2/3 segments. Should this be
      a problem going forward the test should be rewritten to check for
      all rows being co-located on a single segment but for now the more
      readable version is kept.
      
      It's worth noting that the expected output in the Bugbuster suite
      records the incorrect behaviour, but inside ignore blocks,
      so this bug has never actually been tested for.
      74a1efd7
  28. 19 Dec 2016 (1 commit)
    • Make NOTICE for table distribution consistent · bbed4116
      Committed by Daniel Gustafsson
      The different kinds of NOTICE messages regarding table distribution
      were using a mix of upper and lower case for 'DISTRIBUTED BY'. Make
      them consistent by using upper case for all messages and update the
      test files, and atmsort regexes, to match.
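
      For example, creating a table without an explicit distribution clause
      emits a NOTICE along these lines (column name is illustrative):

          CREATE TABLE t (a int);
          -- NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column
          -- named 'a' as the Greenplum Database data distribution key for this
          -- table.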
      bbed4116
  29. 23 Nov 2016 (2 commits)
    • Speed up qp_misc_jiras test case. · fc1806c2
      Committed by Heikki Linnakangas
      Use smaller test tables for tests that are not sensitive to the size of
      the table.
      fc1806c2
    • Speedup test case for an ancient bug with relation extension over 1 GB. · 6300ced7
      Committed by Heikki Linnakangas
      The original bug was that when a relation grew over 1 GB, we did not
      create the file to back the relation, and the next extension threw
      a "No such file or directory" error. The bug was in _mdmir_openseg(),
      and this block that's there now, was added to fix it:
      
          if ( createIfDoesntExist )
          {
              MirrorDataLossTrackingState mirrorDataLossTrackingState;
              int64        mirrorDataLossTrackingSessionNum;
      
              mirrorDataLossTrackingState =
                  FileRepPrimary_GetMirrorDataLossTrackingSessionNum(&mirrorDataLossTrackingSessionNum);
      
              MirroredBufferPool_Create(&mirroredOpen,
                                        &reln->smgr_rnode,
                                        segno,
                                        /* relationName */ NULL, // Ok to be NULL -- we don't know the name here.
                                        mirrorDataLossTrackingState,
                                        mirrorDataLossTrackingSessionNum,
                                        &primaryError,
                                        &mirrorDataLossOccurred);
          }
      
      If you remove that, the original problem reappears, and this new test
      case will catch it, as well as the old one.
      
      Before this change, the test index is bloated to over 3 GB in size. With this,
      we "only" create a heap that's just above 1 GB, and we create that in a much
      more efficient manner. That greatly reduces the load during testing.
      
      Thanks to Asim Praveen for digging up the original bug that this was trying
      to test!
      6300ced7
  30. 22 Nov 2016 (3 commits)