1. 06 Apr 2019, 1 commit
    • Do not modify a cached plan statement during ExecuteTruncate. · e3cf4f26
      Committed by Adam Berlin
      User-defined functions cache their plans; therefore, if we modify the
      plan during execution, we risk having invalid data during the next
      execution of the cached plan.
      
      ExecuteTruncate modifies the plan's relations when declaring partitions
      that need truncation.
      
      Instead, we copy the list of relations that need truncation and add the
      partition relations to the copy.
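      A minimal repro-style sketch of the scenario (table and function names
      are illustrative, not from the commit):

          -- Repeated calls to a PL/pgSQL function reuse its cached plan, so the
          -- TRUNCATE statement's relation list must not be modified in place.
          CREATE TABLE part_tab (id int, val int)
              DISTRIBUTED BY (id)
              PARTITION BY RANGE (val) (START (0) END (100) EVERY (50));

          CREATE FUNCTION truncate_parts() RETURNS void AS $$
          BEGIN
              TRUNCATE part_tab;  -- partitions get added to the statement's relation list
          END;
          $$ LANGUAGE plpgsql;

          SELECT truncate_parts();  -- first call caches the plan
          SELECT truncate_parts();  -- second call must see the original, unmodified plan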
  2. 04 Apr 2019, 1 commit
  3. 14 Mar 2019, 2 commits
  4. 13 Mar 2019, 3 commits
    • Correctly expand writable external table · 2c9cd895
      Committed by Zhang Shujie
      A writable external table has an entry in gp_distribution_policy, so it
      has a numsegments field. The previous code skipped all external tables,
      so their numsegments fields were never updated. This commit fixes this
      by:
        1. Adding a column to the status_detail table to record whether the
           table is a writable external table, and invoking the correct SQL to
           expand such tables.
        2. Supporting `ALTER EXTERNAL TABLE <tab> EXPAND TABLE` for writable
           external tables.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
    • Add check on exchange partition (#7049) · 8f641583
      Committed by Jinbao Chen
      
      On sql "alter table parttab exchange partition for (1) with table y",
      we checked if the table has same columns with the partion table. But
      on sql "alter table parttab alter partition for (1) exchange partition
      for (1) with table x", we forgot the check. Add the check back.
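      A hedged sketch of the two forms (table names follow the message above);
      both now verify that the exchanged table's columns match the partitioned
      table:

          ALTER TABLE parttab EXCHANGE PARTITION FOR (1) WITH TABLE y;  -- was already checked
          ALTER TABLE parttab ALTER PARTITION FOR (1)
              EXCHANGE PARTITION FOR (1) WITH TABLE x;                  -- check added by this commit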
    • Check permission first to avoid accessing an unavailable field. · 4f0f9b87
      Committed by Jialun Du
      If the target is not a table, the command must error out. So it is better
      to do the permission check first; otherwise the logic may access fields
      that are nullable for non-table objects and crash.
  5. 12 Mar 2019, 4 commits
    • Prevent check constraint drop on leaf partitions · ac12504d
      Committed by Taylor Vesely
      This commit is part of Add Partitioned Indexes #7047.
      
      Don't allow check constraints to be dropped on partition leaves. Only
      allow them to be dropped from the root.
      
      This restriction had been relaxed on PRIMARY KEY and UNIQUE constraints
      after adding INTERNAL_AUTO dependencies to constraints. However, we need
      to maintain the restriction on check constraints because they are
      currently not protected by pg_depend.
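      A hedged illustration of the resulting behavior (table and constraint
      names are made up):

          ALTER TABLE sales_1_prt_jan DROP CONSTRAINT sales_amt_check;  -- rejected: leaf partition
          ALTER TABLE sales DROP CONSTRAINT sales_amt_check;            -- allowed: drop from the root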
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
    • Update exchange/split partition for partitioned indexes · f7971a6a
      Committed by Taylor Vesely
      This commit is part of Add Partitioned Indexes #7047.
      
      During exchange and split partition, update the pg_depend relationships
      and exchange the constraint names.
      
      AttachPartitionEnsureIndexes was cherry-picked from:
      commit 8b08f7d4
      Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
      Date:   Fri Jan 19 11:49:22 2018 -0300
      
          Local partitioned indexes
      
          When CREATE INDEX is run on a partitioned table, create catalog entries
          for an index on the partitioned table (which is just a placeholder since
          the table proper has no data of its own), and recurse to create actual
          indexes on the existing partitions; create them in future partitions
          also.
      
          As a convenience gadget, if the new index definition matches some
          existing index in partitions, these are picked up and used instead of
          creating new ones.  Whichever way these indexes come about, they become
          attached to the index on the parent table and are dropped alongside it,
          and cannot be dropped in isolation unless they are detached first.
      
          To support pg_dump'ing these indexes, add commands
              CREATE INDEX ON ONLY <table>
          (which creates the index on the parent partitioned table, without
          recursing) and
              ALTER INDEX ATTACH PARTITION
          (which is used after the indexes have been created individually on each
          partition, to attach them to the parent index).  These reconstruct prior
          database state exactly.
      
          Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
                  Langote, Jesper Pedersen, Simon Riggs, David Rowley
          Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
    • Add Partitioned Indexes · 9d40d472
      Committed by Taylor Vesely
      This commit adds partitioned indexes from upstream Postgres. It is mostly
      cherry-picked from Postgres 11, plus bug fixes from Postgres 12.
      
      Differences from upstream:
       - Postgres has two additional relkinds, RELKIND_PARTITIONED_TABLE and
         RELKIND_PARTITIONED_INDEX, which have no on-disk storage. Greenplum does
         not have these additional relkinds; thus, partitioned indexes have
         physical storage.
       - CREATE INDEX ON ONLY <table> DDL has not yet been implemented
       - ALTER INDEX ATTACH PARTITION DDL has not yet been implemented
      
      Constraint changes:
       - Constraints and their backing index have the same names. Thus, partitions of
         a table no longer share the same constraint name, and are instead related to
         their parent via INTERNAL_AUTO dependencies.
      
      Index changes:
       - Child partition indexes can no longer be directly dropped, and must be
         dropped from their root. This includes mid-level and leaf indexes.
       - Adding indexes to mid-level partitions cascades to their children.
      
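      A hedged sketch of the resulting behavior (table and index names are
      illustrative):

          CREATE INDEX sales_region_idx ON sales (region);  -- recurses to every partition
          DROP INDEX sales_1_prt_jan_region_idx;            -- rejected: child indexes are dropped via the root
          DROP INDEX sales_region_idx;                      -- drops the index on the root and all partitions
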
      These changes are mostly cherry-picked from:
      commit 8b08f7d4
      Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
      Date:   Fri Jan 19 11:49:22 2018 -0300
      
          Local partitioned indexes
      
          When CREATE INDEX is run on a partitioned table, create catalog entries
          for an index on the partitioned table (which is just a placeholder since
          the table proper has no data of its own), and recurse to create actual
          indexes on the existing partitions; create them in future partitions
          also.
      
          As a convenience gadget, if the new index definition matches some
          existing index in partitions, these are picked up and used instead of
          creating new ones.  Whichever way these indexes come about, they become
          attached to the index on the parent table and are dropped alongside it,
          and cannot be dropped in isolation unless they are detached first.
      
          To support pg_dump'ing these indexes, add commands
              CREATE INDEX ON ONLY <table>
          (which creates the index on the parent partitioned table, without
          recursing) and
              ALTER INDEX ATTACH PARTITION
          (which is used after the indexes have been created individually on each
          partition, to attach them to the parent index).  These reconstruct prior
          database state exactly.
      
          Reviewed-by: (in alphabetical order) Peter Eisentraut, Robert Haas, Amit
                  Langote, Jesper Pedersen, Simon Riggs, David Rowley
          Discussion: https://postgr.es/m/20171113170646.gzweigyrgg6pwsg4@alvherre.pgsql
      
      Changes were also cherry-picked from the following Postgres commits:
        eb7ed3f3 - Allow UNIQUE indexes on partitioned tables
        ae366aa5 - Detach constraints when partitions are detached
        19184fcc - Simplify coding to detach constraints when detaching partition
        c7d43c4d - Correct attach/detach logic for FKs in partitions
        17f206fb - Set pg_class.relhassubclass for partitioned indexes
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
    • ALTER TABLE SET DISTRIBUTED RANDOMLY should check for primary key or unique index · c55fe2d6
      Committed by Jimmy Yih
      Currently, a randomly distributed table cannot be created with a primary
      key or unique index. We should enforce this restriction for ALTER TABLE
      SET DISTRIBUTED RANDOMLY as well. This was caught by the gpcheckcat
      distribution_policy check.
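      A hedged example of the now-rejected sequence (table name is
      illustrative):

          CREATE TABLE t (id int, PRIMARY KEY (id)) DISTRIBUTED BY (id);
          ALTER TABLE t SET DISTRIBUTED RANDOMLY;  -- now errors out because of the primary key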
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
  6. 11 Mar 2019, 4 commits
    • Move partition type definition into errorpath · a4f623ae
      Committed by Daniel Gustafsson
      Saving the partitioning type into a string was done with an elaborate
      switch() only for the string to be consumed in a single errorpath.
      Rather than spending cycles and memory for a rare error, inline the
      partition type string handling into the errorpath.
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
    • Turn partition add check into an assertion · 23670b93
      Committed by Daniel Gustafsson
      get_part_rule() must return NULL for this addition to make any sense to
      continue with, but when it did not we silently ignored it via a rather
      odd codepath. Since get_part_rule() is defined to always error out on
      the conditions we don't want, add an assertion for this and reindent the
      code to rely on it doing the right thing.

      Also discard unused return values.
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
    • Gracefully error out on incorrect partition type add · 460789a0
      Committed by Daniel Gustafsson
      When altering a partitioned table and adding an incorrectly specified
      partition, an assertion was hit rather than gracefully erroring out.
      Make sure that the requested partition matches the underlying table
      definition before continuing down into the altering code. This also
      adds a testcase.
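      A hedged sketch of the kind of mismatch involved, assuming for
      illustration a LIST-style spec issued against a RANGE-partitioned table
      (names and spec are made up):

          -- rank_tab is partitioned by RANGE; this now raises a regular error
          -- instead of tripping an assertion
          ALTER TABLE rank_tab ADD PARTITION badpart VALUES ('x');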
      
      Reported-by: Kalen Krempely in #6967
      Reviewed-by: Paul Guo <pguo@pivotal.io>
      Reviewed-by: Georgios Kokolatos <gkokolatos@pivotal.io>
    • Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
      Committed by Ning Yu
      This method was introduced to improve data redistribution performance
      during gpexpand phase 2; however, per benchmark results the effect does
      not meet our expectations.  For example, when expanding a table from 7
      segments to 8 segments the reshuffle method is only 30% faster than the
      traditional CTAS method, and when expanding from 4 to 8 segments
      reshuffle is even 10% slower than CTAS.  When there are indexes on the
      table the reshuffle performance can be worse, and an extra VACUUM is
      needed to actually free the disk space.  According to our experiments,
      the bottleneck of the reshuffle method is the tuple deletion operation,
      which is much slower than the insertion operation used by CTAS.
      
      The reshuffle method does have some benefits: it requires less extra
      disk space, and it requires less network bandwidth (similar to the CTAS
      method with the new JCH reduce method, but less than CTAS + MOD).  It
      can also be faster in some cases, but as we cannot automatically
      determine when it is faster, it is not easy to benefit from it in
      practice.
      
      On the other hand, the reshuffle method is less tested; it may have bugs
      in corner cases, so it is not production ready yet.

      Given that, we decided to retire it entirely for now; we might add it
      back in the future if we can get rid of the slow deletion or find a
      reliable way to automatically choose between the reshuffle and CTAS
      methods.
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  7. 15 Feb 2019, 2 commits
    • Recursively create partitioned indexes · f27b2a50
      Committed by Taylor Vesely
      Pull from upstream Postgres to make DefineIndex recursively create partitioned
      indexes. Instead of creating an individual IndexStmt for every partition,
      create indexes by recursing on the partition children.  This aligns index
      creation with upstream in preparation for adding INTERNAL_AUTO relationships
      between partition indexes.
      
       * The QD will now choose the same name for partition indexes as Postgres.
       * Update tests to reflect the partition index name changes.
       * The changes to DefineIndex are mostly cherry-picked from Postgres commit:
         8b08f7d4
       * transformIndexStmt and its callers have been aligned with Postgres
         REL9_4_STABLE
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
    • Revert eager dispatch of index creation during ALTER TABLE · fecd245a
      Committed by Jesse Zhang
      This reverts the following commits:
      commit 0ee987e64 - "Don't dispatch index creations too eagerly in ALTER TABLE."
      commit 28dd0152 - "Enable alter table column with index (#6286)"
      
      The motivation of commit 0ee987e64 is to stop eager dispatch of index creation
      during ALTER TABLE, and instead perform a single dispatch. Doing so prevents
      index name already exists errors when altering data types on indexed columns
      such as:
      
          ALTER TABLE foo ALTER COLUMN test TYPE integer;
          ERROR:  relation "foo_test_key" already exists
      
      Unfortunately, without eager dispatch of index creation the QEs can choose a
      different name for a relation than was chosen on the QD. Eager dispatch was the
      only mechanism we had to ensure a deterministic and consistent index name
      between the QE and QD in some scenarios. In the absence of another mechanism we
      must revert this commit.
      
      This commit also rolls back commit 28dd0152, which enabled altering data
      types on indexed columns and which required commit 0ee987e64.
      Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: David Krieger <dkrieger@pivotal.io>
  8. 13 Feb 2019, 1 commit
  9. 01 Feb 2019, 1 commit
    • Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, all legacy integer operator classes, cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops, are all part of the
      cdbhash_integer_ops operator family.
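      A brief sketch of the syntax described above (table names are
      illustrative; cdbhash_int4_ops is the legacy opclass named earlier):

          CREATE TABLE t_default (intcol int4, txt text)
              DISTRIBUTED BY (intcol);                    -- default hash opclass for int4
          CREATE TABLE t_legacy (intcol int4, txt text)
              DISTRIBUTED BY (intcol cdbhash_int4_ops);   -- legacy GPDB hashing for this column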
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables work for example. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet, should do that.)
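      A minimal sketch of the GUC's effect, assuming it is a boolean session
      setting as described above:

          SET gp_use_legacy_hashops = on;
          CREATE TABLE t_upgraded (id int4) DISTRIBUTED BY (id);  -- picks the legacy cdbhash opclass implicitly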
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbid that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely in PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
  10. 25 Jan 2019, 1 commit
  11. 23 Jan 2019, 1 commit
  12. 10 Jan 2019, 1 commit
    • Resolve FIXME for validatepart by passing relpersistence of root · 263355cf
      Committed by Melanie Plageman
      MergeAttributes was used in atpxPart_validate_spec to get the schema and
      constraints to make a new leaf partition as part of ADD or SPLIT
      PARTITION. It was likely used as a convenience, since it already
      existed, and seems like the wrong function for the job.
      
      Previously, atpxPart_validate_spec simply hard-coded false for the
      relation persistence, since the parameter was simply `isTemp`. Once the
      options for relation persistence were expanded to include unlogged, this
      parameter was changed to take a relpersistence. In the part of
      MergeAttributes we actually hit when calling it from here (we pass in
      the schema as NIL and therefore hit only half of the MergeAttributes
      code), the `supers` parameter is that of the parent partition and
      includes its relpersistence. So, by passing in the relpersistence of the
      parent, the checks we do around relpersistence are redundant: we are
      comparing the parent's relpersistence to its own. However, this function
      is only called when we are making a new relation which, because we don't
      allow a different persistence to be specified for the child, would just
      use the relpersistence of the parent anyway; passing in a hard-coded
      value would incorrectly assume that we are always creating a permanent
      relation.
      
      Since MergeAttributes was overkill, we wrote a new helper function,
      SetSchemaAndConstraints, to get the schema and constraints of a
      relation. This function doesn't do many of the special validation checks
      that may be required by callers using it in the context of partitioned
      tables (so user beware); however, it is probably only useful in the
      context of partitioned tables anyway, because it assumes constraints
      will be cooked, which wouldn't be the case for all relations. We split
      it into two smaller inline functions for clarity. We also felt this
      would be a useful helper function in general, so we extern'd it.
      
      This commit also sets the relpersistence that is used to make the leaf
      partition when adding a new partition or splitting an existing partition.
      
      makeRangeVar is a function from upstream which is basically a
      constructor. It sets relpersistence in the RangeVar to a hard-coded
      value of RELPERSISTENCE_PERMANENT. However, because we use the root
      partition to get the constraints and column information for the new
      leaf, after we use the default construction of the RangeVar, we need to
      set the relpersistence to that of the parent.
      
      This commit specifically only sets it back for the case in which we are
      adding a partition with `ADD PARTITION` or through `SPLIT PARTITION`.
      
      Without this commit, a leaf partition of an unlogged table created
      through `ADD PARTITION` or `SPLIT PARTITION` would incorrectly have its
      relpersistence set to permanent.
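      A hedged illustration of the fixed behavior (names are made up):

          CREATE UNLOGGED TABLE usales (id int, yr int)
              DISTRIBUTED BY (id)
              PARTITION BY RANGE (yr) (START (2017) END (2019) EVERY (1));
          ALTER TABLE usales ADD PARTITION y2019 START (2019) END (2020);
          -- the new leaf now inherits the root's relpersistence (unlogged)
          -- instead of being created as a permanent relation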
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
  13. 08 Jan 2019, 1 commit
  14. 27 Dec 2018, 1 commit
    • Do not expose gp_segment_id to users for replicated table (#6342) · b120194a
      Committed by Tang Pengzhou
      * Do not expose system columns to users for replicated table
      
      For a replicated table, all replicas on the segments should always be
      the same, which makes gp_segment_id ambiguous for replicated tables.
      Instead of making an effort to keep gp_segment_id the same on all
      segments, I would like to just hide gp_segment_id for replicated tables
      to make things simpler.

      This commit only makes gp_segment_id invisible to users at the
      parse-tree transformation stage. In the underlying storage, each segment
      still stores a different value for the system column gp_segment_id, and
      operations like SPLITUPDATE might also use gp_segment_id to do an
      explicit redistribution motion; this is fine as long as the user-visible
      columns have the same data.
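      A hedged sketch of the user-visible change (table name is illustrative;
      the exact error text is not reproduced here):

          CREATE TABLE rep_t (id int) DISTRIBUTED REPLICATED;
          SELECT gp_segment_id, count(*) FROM rep_t GROUP BY 1;  -- now rejected: gp_segment_id is hidden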
      
      * Fixup reshuffle* test cases
      
      The reshuffle.* cases used to use gp_segment_id to test that a
      replicated table is actually expanded to new segments; now that
      gp_segment_id is invisible to users, they report errors.

      Because a replicated table holds a full copy of the data on each segment
      (triple data in the test cluster), we can get enough information from
      the total count even without a GROUP BY on gp_segment_id. It is not 100%
      accurate but should be enough.
      
      * Do a dependency check when converting a table to a replicated table

      System columns of a replicated table should not be exposed to users, and
      we do the check at the parser stage. However, if users create views or
      rules that involve the system columns of a hash-distributed table and
      the table is later converted to a replicated table, they can still
      access the system columns of the replicated table through those views
      and rules, because these bypass the check in the parser stage. To
      resolve this, we add a dependency check when altering a table to a
      replicated table; users need to drop the views or rules first.

      I tried to add a recheck in later stages like the planner or executor,
      but system columns like CTID are used internally for basic DML such as
      UPDATE/DELETE, and those columns are added even before the rewrite of
      views and rules, so adding a recheck would block basic DML too.
  15. 25 Dec 2018, 1 commit
    • Make SendTuple do de-toasting for heap as well as memtuples (#6527) · bf5e3d5d
      Committed by David Kimura
      * Make SendTuple do de-toasting for heap as well as memtuples
      
      Motion node didn't consistently handle where MemTuples and HeapTuples
      are de-toasted. This change delays de-toasting of MemTuple in motion
      sender until SerializeTupleDirect(), which is where HeapTuples are
      de-toasted.
      
      This change also corrects a bug in de-toasting of memtuples in motion
      sender.  De-toasting of memtuples marked the slot containing the
      de-toasted memtuple as virtual.  When a TupleTableSlot is marked
      virtual, its values array should point to addresses within the memtuple.
      The bug was that the values array was populated *before* de-toasting and
      the memory used by the toasted memtuple was freed.  Subsequent usages of
      this slot suffered from dereferencing invalid memory through the
      pointers in values array.
      
      This change corrects the issue by simplifying ExecFetchSlotMemTuple() so
      that it no longer manages the memory required for de-toasting, which
      also aligns the code structure more closely with
      ExecFetchSlotMinimalTuple() in Postgres.
      An alternative solution that we considered is to continue de-toasting in
      ExecFetchSlotMemTuple() and clear the virtual bit before freeing the
      tuple. A disadvantage of that approach is that it modifies an existing
      slot and complicates the contract of memtuple and virtual table slots.
      Also, it doesn't match with where HeapTuples are de-toasted so the
      contract becomes less clear.
      
      The bug is hard to discover because the motion node in the executor is
      the only place where a memtuple is de-toasted, just before sending it
      over the interconnect. Only a few plan nodes make use of a tuple that
      has already been returned to the parent node, e.g. Unique (DISTINCT ON ())
      and SetOp (INTERSECT / EXCEPT). Example plan that manifests the bug:
      
       Gather Motion 3:1
         Merge Key: a, b
         ->  Unique
               Group By: a, b
               ->  Sort
                     Sort Key (Distinct): a, b
                     ->  Seq Scan
      
      The Unique node uses the previous tuple de-toasted by the Gather Motion sender
      to compare with the new tuple obtained from the Sort node for eliminating
      duplicates.
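      A hedged example of a query shape that could produce a plan like the one
      above (table and column names are made up; column b is assumed to hold
      values large enough to be toasted):

          SELECT DISTINCT a, b FROM toasty ORDER BY a, b;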
      Co-authored-by: Asim R P <apraveen@pivotal.io>
  16. 19 Dec 2018, 2 commits
  17. 15 Dec 2018, 1 commit
  18. 13 Dec 2018, 1 commit
    • Reporting cleanup for GPDB specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum specific error handling via ereport()/elog() calls was
      in need of a unification effort, as some parts of the code were using a
      different messaging style from others (and from upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus makes the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter, and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with too detailed
        errmsg() content broken up into details and hints where possible.
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages due to long lines since it makes grepping
        for error messages harder when debugging. This is also the de facto
        standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low hanging fruit has
        been dealt with. Also convert a few elog() calls which are user
        facing to ereport().
      
      * Update the testfiles to match the new messages.
      
      Spelling and wording is mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
  19. 12 Dec 2018, 1 commit
    • Enable alter table column with index (#6286) · 28dd0152
      Committed by Jinbao Chen
      * Make sure ALTER TABLE preserves index tablespaces.
      
      When rebuilding an existing index, ALTER TABLE correctly kept the
      physical file in the same tablespace, but it messed up the pg_class
      entry if the index had been in the database's default tablespace
      and "default_tablespace" was set to some non-default tablespace.
      This led to an inaccessible index.
      
      Fix by fixing pg_get_indexdef_string() to always include a tablespace
      clause, whether or not the index is in the default tablespace.  The
      previous behavior was installed in commit 537e92e4, and I think it just
      wasn't thought through very clearly; certainly the possible effect of
      default_tablespace wasn't considered.  There's some risk in changing the
      behavior of this function, but there are no other call sites in the core
      code.  Even if it's being used by some third party extension, it's fairly
      hard to envision a usage that is okay with a tablespace clause being
      appended some of the time but can't handle it being appended all the time.
      
      Back-patch to all supported versions.
      
      Code fix by me, investigation and test cases by Michael Paquier.
      
      Discussion: <1479294998857-5930602.post@n3.nabble.com>
      
      * Enable alter table column with index
      
      Originally, we disabled altering a table column that has an index,
      because the index creation is dispatched immediately, which unfortunately
      breaks the ALTER work queue; so as a workaround we disabled altering
      such columns. In PostgreSQL 9.1 the is_alter_table parameter was
      introduced to DefineIndex(), so we now have a chance to re-enable this
      feature.
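      A hedged sketch of the re-enabled operation (names are illustrative):

          CREATE TABLE foo (id int, test int) DISTRIBUTED BY (id);
          CREATE INDEX foo_test_idx ON foo (test);
          ALTER TABLE foo ALTER COLUMN test TYPE bigint;  -- previously rejected because test is indexed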
  20. 04 Dec 2018, 1 commit
  21. 27 Nov 2018, 1 commit
    • Do not generate ReshuffleExpr for replicated table · e12d8cab
      Committed by Zhenghua Lyu
      When we expand a partially distributed replicated table via `ALTER TABLE
      t EXPAND TABLE`, internally we use the split-update framework to
      implement the expansion. That framework was originally designed for
      hash-distributed tables. For a replicated table we do not need the
      ReshuffleExpr (a filter condition) at all, because we need to transfer
      all the data in a replicated table.
  22. 24 Nov 2018, 1 commit
  23. 23 Nov 2018, 1 commit
    • Implement EXPAND syntax · cfe3f386
      Committed by Pengzhou Tang
      Implement "ALTER TABLE table EXPAND TABLE" to expand tables.
      
      "Expanding" and "Set Distributed by" are actually two different kind of
      operations on tables, old gpexpand used to use "Set Distributed by" to
      expand tables for historical reasons and our early version of expand
      were also squashed into "Set Distributed by", this make code hard to
      hard to understand and concept confused.
      
      This commit divide "Expanding" and "Set Distributed by" totally and
      implement "Expanding" with new syntax. We have two method to implement
      data movement, one is CTAS, another is RESHUFFLE, depend on how much
      data need to move. If tuples to move is less than 10000, choose
      RESHUFFLE, or if scale to move is less than 30% choose RESHUFFLE,
      otherwise, choose CTAS
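      A minimal usage sketch (table name is illustrative; the statement is
      meant to be issued by gpexpand rather than typed by end users):

          ALTER TABLE sales EXPAND TABLE;  -- moves data so the table covers the newly added segments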
      
      For partitioned tables, we disallow expanding a leaf partition
      separately, because the root partition cannot have a different
      numsegments than its leaf partitions. SELECT/UPDATE would be fine with
      an inconsistent numsegments, but INSERT would cause trouble by inserting
      data into unexpected places.
      
      The new syntax is only supposed to be used by gpexpand and is not
      exposed to normal users, so there is no need to update the documentation.
  24. 22 Nov 2018, 1 commit
    • Remove dead code in ATExecSetDistributedBy · 50f2e3bb
      Committed by Zhenghua Lyu
      This commit is the first step to refactor ATExecSetDistributedBy. Its
      main purpose is to remove some dead code in this function and during
      the process we find some helper functions can also be simplified so
      the simplification is also in this commit.
      
      According to MPP-7770, we should disable changing storage options for
      now. It is ugly to just throw an error when encountering the
      `appendonly` option without removing the code, so this commit removes
      all the related logic.

      Because the WITH clause can only contain reshuffle|reorganize, we only
      need new_rel_opts if the table itself is AO/AOCO; there is no need to
      deduce it from the WITH clause.

      We also remove the unnecessary checks at the start of this function,
      because these checks have already been done in `ATPrepCmd`.
      
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
  25. 19 Nov 2018, 1 commit
    • Correct behavior for reshuffling partition tables · 8bf413d6
      Committed by ZhangJackey
      The previous code issued UPDATE statements for the root and its child
      partitions when reshuffling a partitioned table. This not only involves
      redundant work but also leads to an error when reshuffling a two-level
      partitioned table (because the mid-level partitions have no data).

      This commit does the following work:

      * Only issue the UPDATE statement for leaf partitions or
        non-partitioned tables.
      * Refactor the reshuffle test cases. We remove the Python UDF code and
        use `gp_execute_on_server` and `gp_dist_random` to test replicated
        tables.
      
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  26. 12 Nov 2018, 2 commits
    • Fix reshuffle issue on inheritance table · 4b404797
      Committed by Pengzhou Tang
      When reshuffling an inheritance table whose children have a different
      gppolicy, we found that the policy of the child tables was changed to
      the same policy as the parent table on all segments. E.g., the parent
      table mix_base_tbl is distributed by (c1) and its child mix_child_a is
      distributed by (c2); after a reshuffle, the policy of mix_child_a was
      changed to be distributed by (c1) on all segments. This is not by
      design: reshuffle is only expected to change the numsegments property.
    • Fix incorrect assertion within adjust_modifytable_flow · c6dba855
      Committed by Pengzhou Tang
      Previously, when we tried to update a partitioned table or an inherited
      table, we assumed that the parent table and all child tables have the
      same number of segments in the gp_distribution_policy catalog; otherwise
      we let the query fail.

      The assumption is incorrect: updating target relations of mixed sizes
      can actually work. We only need to set the flow size of the ModifyTable
      node to the maximum size of all target relations and take care of the
      motion output size for each target relation individually.

      It is very common for the parent table and a child table to have
      different numbers of segments in gp_distribution_policy; gpexpand may
      expand the child table first and then the parent table.
  27. 07 Nov 2018, 1 commit
    • Adjust GANG size according to numsegments · 6dd2759a
      Committed by ZhangJackey
      Now that we have partial tables and a flexible gang API, we can allocate
      gangs according to numsegments.

      With commit 4eb65a53, GPDB supports tables distributed on partial
      segments, and with the series of commits (a3ddac06, 576690f2), GPDB
      supports a flexible gang API. Now is a good time to combine both new
      features. The goal is to create gangs only on the necessary segments for
      each slice. This commit also improves singleQE gang scheduling and does
      some code cleanup. However, if ORCA is enabled, the behavior is just
      like before.
      
      The outline of this commit is:
      
        * Modify the FillSliceGangInfo API so that gang_size is truly flexible.
        * Remove numOutputSegs and outputSegIdx fields in motion node. Add a new
           field isBroadcast to mark if the motion is a broadcast motion.
        * Remove the global variable gp_singleton_segindex and make singleQE
           segment_id randomly(by gp_sess_id).
        * Remove the field numGangMembersToBeActive in Slice because it is now
           exactly slice->gangsize.
        * Modify the message printed if the GUC Test_print_direct_dispatch_info
           is set.
        * Explicitly BEGIN create a full gang now.
        * format and remove destSegIndex
        * The isReshuffle flag in ModifyTable is useless, because it only is used
           when we want to insert tuple to the segment which is out the range of
           the numsegments.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  28. 05 Nov 2018, 1 commit
    • Make more use of make_dist_clause(). · c0d1dbd6
      Committed by Heikki Linnakangas
      We had duplicated code in a few places, to reconstruct a DistributedBy
      clause from policy of an existing relation. Use the existing function
      to do that.
      
      Rename the function to make_distributedby_for_rel(). That's a more
      descriptive name.
      Reviewed-by: Ning Yu <nyu@pivotal.io>