1. 14 Mar 2019, 1 commit
  2. 11 Mar 2019, 1 commit
    • Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
      Committed by Ning Yu
      This method was introduced to improve the data redistribution
      performance during gpexpand phase 2; however, per benchmark results the
      effect does not meet our expectations.  For example, when expanding a
      table from 7 segments to 8 segments the reshuffle method is only 30%
      faster than the traditional CTAS method, and when expanding from 4 to 8
      segments reshuffle is even 10% slower than CTAS.  When there are indexes
      on the table the reshuffle performance can be worse, and an extra VACUUM
      is needed to actually free the disk space.  According to our experiments,
      the bottleneck of the reshuffle method is the tuple deletion operation,
      which is much slower than the insertion operation used by CTAS.
      
      The reshuffle method does have some benefits: it requires less extra
      disk space and less network bandwidth (similar to the CTAS method with
      the new JCH reduce method, but less than CTAS + MOD).  It can also be
      faster in some cases; however, as we cannot automatically determine when
      it is faster, it is not easy to benefit from it in practice.
      
      On the other hand, the reshuffle method is less tested; it may have bugs
      in corner cases, so it is not production ready yet.
      
      Given that, we decided to retire it entirely for now; we might add it
      back in the future if we can get rid of the slow deletion or find a
      reliable way to automatically choose between the reshuffle and CTAS
      methods.
      
      Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      1c262c6e
  3. 27 Feb 2019, 1 commit
    • Refactor NUMSEGMENTS related macros (#7028) · d28b7057
      Committed by Jialun
      - Retire GP_POLICY_ALL_NUMSEGMENTS and GP_POLICY_ENTRY_NUMSEGMENTS,
        unifying them to getgpsegmentCount
      - Retire GP_POLICY_MINIMAL_NUMSEGMENTS and GP_POLICY_RANDOM_NUMSEGMENTS
      - Change the NUMSEGMENTS-related macros from variable-style macros to
        function-style macros
      - Change the default return value of getgpsegmentCount to 1, which
        represents a single PostgreSQL instance in utility mode
      - Change __GP_POLICY_INVALID_NUMSEGMENTS to GP_POLICY_INVALID_NUMSEGMENTS
      d28b7057
  4. 06 Feb 2019, 1 commit
  5. 01 Feb 2019, 1 commit
    • Use normal hash operator classes for data distribution. · 242783ae
      Committed by Heikki Linnakangas
      Replace the use of the built-in hashing support for built-in datatypes, in
      cdbhash.c, with the normal PostgreSQL hash functions. Now is a good time
      to do this, since we've already made the change to use jump consistent
      hashing in GPDB 6, so we'll need to deal with the upgrade problems
      associated with changing the hash functions, anyway.
      
      It is no longer enough to track which columns/expressions are used to
      distribute data. You also need to know the hash function used. For that,
      a new field is added to gp_distribution_policy, to record the hash
      operator class used for each distribution key column. In the planner,
      a new opfamily field is added to DistributionKey, to track that throughout
      the planning.
      
      Normally, if you do "CREATE TABLE ... DISTRIBUTED BY (column)", the
      default hash operator class for the datatype is used. But this patch
      extends the syntax so that you can specify the operator class explicitly,
      like "... DISTRIBUTED BY (column opclass)". This is similar to how an
      operator class can be specified for each column in CREATE INDEX.
      
      To support upgrade, the old hash functions have been converted to special
      (non-default) operator classes, named cdbhash_*_ops. For example, if you
      want to use the old hash function for an integer column, you could do
      "DISTRIBUTED BY (intcol cdbhash_int4_ops)". The old hard-coded whitelist
      of operators that have "compatible" cdbhash functions has been replaced
      by putting the compatible hash opclasses in the same operator family. For
      example, the legacy integer operator classes cdbhash_int2_ops,
      cdbhash_int4_ops and cdbhash_int8_ops are all part of the
      cdbhash_integer_ops operator family.
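      
      A short SQL sketch of the syntax described above (table and column names
      are illustrative):
      
      ```sql
      -- Uses the default hash opclass for the datatype:
      CREATE TABLE t_new (c1 int, c2 text) DISTRIBUTED BY (c1);
      
      -- Explicitly requests the legacy (pre-jump-consistent-hash) hashing:
      CREATE TABLE t_legacy (c1 int, c2 text) DISTRIBUTED BY (c1 cdbhash_int4_ops);
      ```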
      
      This removes the pg_database.hashmethod field. The hash method is now
      tracked on a per-table and per-column basis, using the opclasses, so it's
      not needed anymore.
      
      To help with upgrade from GPDB 5, this introduces a new GUC called
      'gp_use_legacy_hashops'. If it's set, CREATE TABLE uses the legacy hash
      opclasses, instead of the default hash opclasses, if the opclass is not
      specified explicitly. pg_upgrade will set the new GUC, to force the use of
      legacy hashops, when restoring the schema dump. It will also set the GUC
      on all upgraded databases, as a per-database option, so any new tables
      created after upgrade will also use the legacy opclasses. It seems better
      to be consistent after upgrade, so that collocation between old and new
      tables works, for example. The idea is that some time after the upgrade, the
      admin can reorganize all tables to use the default opclasses instead. At
      that point, he should also clear the GUC on the converted databases. (Or
      rather, the automated tool that hasn't been written yet, should do that.)
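      
      For illustration only, a sketch of how the GUC can be applied (database
      and table names here are placeholders, not taken from the patch itself):
      
      ```sql
      -- Session level: subsequent CREATE TABLE picks the legacy hash
      -- opclasses when none is specified explicitly.
      SET gp_use_legacy_hashops = on;
      CREATE TABLE t_upgraded (c1 int) DISTRIBUTED BY (c1);
      
      -- Per-database, the way pg_upgrade marks upgraded databases:
      ALTER DATABASE mydb SET gp_use_legacy_hashops = on;
      ```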
      
      ORCA doesn't know about hash operator classes, or the possibility that we
      might need to use a different hash function for two columns with the same
      datatype. Therefore, it cannot produce correct plans for queries that mix
      different distribution hash opclasses for the same datatype, in the same
      query. There are checks in the Query->DXL translation, to detect that
      case, and fall back to planner. As long as you stick to the default
      opclasses in all tables, we let ORCA create the plan without any regard
      to them, and use the default opclasses when translating the DXL plan to a
      Plan tree. We also allow the case that all tables in the query use the
      "legacy" opclasses, so that ORCA works after pg_upgrade. But a mix of the
      two, or using any non-default opclasses, forces ORCA to fall back.
      
      One curiosity with this is the "int2vector" and "aclitem" datatypes. They
      have a hash opclass, but no b-tree operators. GPDB 4 used to allow them
      as DISTRIBUTED BY columns, but we forbid that in GPDB 5, in commit
      56e7c16b. Now they are allowed again, so you can specify an int2vector
      or aclitem column in DISTRIBUTED BY, but it's still pretty useless,
      because the planner still can't form EquivalenceClasses on it, and will
      treat it as "strewn" distribution, and won't co-locate joins.
      
      Abstime, reltime, tinterval datatypes don't have default hash opclasses.
      They are being removed completely on PostgreSQL v12, and users shouldn't
      be using them in the first place, so instead of adding hash opclasses for
      them now, we accept that they can't be used as distribution key columns
      anymore. Add a check to pg_upgrade, to refuse upgrade if they are used
      as distribution keys in the old cluster. Do the same for 'money' datatype
      as well, although that's not being removed in upstream.
      
      The legacy hashing code for anyarray in GPDB 5 was actually broken. It
      could produce a different hash value for two arrays that are considered
      equal, according to the = operator, if there were differences in e.g.
      whether the null bitmap was stored or not. Add a check to pg_upgrade, to
      reject the upgrade if array types were used as distribution keys. The
      upstream hash opclass for anyarray works, though, so it is OK to use
      arrays as distribution keys in new tables. We just don't support binary
      upgrading them from GPDB 5. (See github issue
      https://github.com/greenplum-db/gpdb/issues/5467). The legacy hashing of
      'anyrange' had the same problem, but that was new in GPDB 6, so we don't
      need a pg_upgrade check for that.
      
      This also tightens the checks in ALTER TABLE ALTER COLUMN and CREATE UNIQUE
      INDEX, so that you can no longer create a situation where a non-hashable
      column becomes the distribution key. (Fixes github issue
      https://github.com/greenplum-db/gpdb/issues/6317)
      
      Discussion: https://groups.google.com/a/greenplum.org/forum/#!topic/gpdb-dev/4fZVeOpXllQ
      Co-authored-by: Mel Kiyama <mkiyama@pivotal.io>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
      Co-authored-by: Pengzhou Tang <ptang@pivotal.io>
      Co-authored-by: Chris Hajas <chajas@pivotal.io>
      Reviewed-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Simon Gao <sgao@pivotal.io>
      Reviewed-by: Jesse Zhang <jzhang@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      Reviewed-by: Yandong Yao <yyao@pivotal.io>
      242783ae
  6. 25 Jan 2019, 1 commit
    • Remove GPDB_92_MERGE_FIXME from prepunion.c · 669893be
      Committed by Alexandra Wang
      The GPDB_92_MERGE_FIXME asking whether a deep copy of the subroot is
      needed or a memcpy suffices can be removed: all we care about from the
      subroot is `parse->rtable`, so creating a deep copy of it is unnecessary.
      
      This commit also removes the `Assert()`, which is valid in upstream but
      not in GPDB, since we create a new copy of the subplan if two SubPlans
      refer to the same initplan. Therefore, when we set references for
      subquery scans in plans with copies of subplans referring to the same
      initplan, we cannot directly Assert that the RelOptInfo's subplan is the
      same as the subquery scan's subplan.
      
      Added a test case for this, which will ensure we do not merge the Assert
      back from upstream in future merges.
      Co-authored-by: Ekta Khanna <ekhanna@pivotal.io>
      669893be
  7. 15 Jan 2019, 1 commit
    • Assign plan_node_id for ModifyTable, MergeAppend (#6695) · 82fb3b5a
      Committed by Wang Hao
      Some plan node types, such as ModifyTable and MergeAppend, were not
      covered by assign_plannode_id(), so their child nodes were not assigned a
      proper plan_node_id. The plan_node_id is required by gpmon and the
      instrumentation code for monitoring purposes; without a proper
      plan_node_id, the consistency of the monitoring data is broken.
      
      This commit refactors assign_plannode_id() to use plan_tree_walker. As a
      result, ModifyTable, MergeAppend, and potentially Sequence are covered.
      Another advantage of using plan_tree_walker is that when new node types
      are introduced we no longer need to touch assign_plannode_id();
      plan_tree_walker takes care of them.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5247
      Reviewed-by: Ning Yu <nyu@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      82fb3b5a
  8. 19 Dec 2018, 1 commit
    • Fix split node's flow type. · 104c2b1c
      Committed by Zhenghua Lyu
      Split Update is used for UPDATE statements that touch a hash-distributed
      table's distribution columns. A Redistribute Motion has to be added above
      the split node in the plan, and this was achieved by marking the split
      node's flow as strewn. However, if the subplan's flow is entry, we should
      not mark it strewn.
      104c2b1c
  9. 14 Dec 2018, 1 commit
  10. 13 Dec 2018, 1 commit
    • Reporting cleanup for GPDB specific errors/messages · 56540f11
      Committed by Daniel Gustafsson
      The Greenplum-specific error handling via ereport()/elog() calls was in
      need of a unification effort, as some parts of the code were using a
      different messaging style than others (and than upstream). This aims at
      bringing many of the GPDB error calls in line with the upstream error
      message writing guidelines and thus makes the user experience of
      Greenplum more consistent.
      
      The main contributions of this patch are:
      
      * errmsg() messages shall start with a lowercase letter, and not end
        with a period. errhint() and errdetail() shall be complete sentences
        starting with capital letter and ending with a period. This attempts
        to fix this on as many ereport() calls as possible, with too detailed
        errmsg() content broken up into details and hints where possible.
      
      * Reindent ereport() calls to be more consistent with the common style
        used in upstream and most parts of Greenplum:
      
      	ereport(ERROR,
      			(errcode(<CODE>),
      			 errmsg("short message describing error"),
      			 errhint("Longer message as a complete sentence.")));
      
      * Avoid breaking messages due to long lines since it makes grepping
        for error messages harder when debugging. This is also the de facto
        standard in upstream code.
      
      * Convert a few internal error ereport() calls to elog(). There are
        no doubt more that can be converted, but the low hanging fruit has
        been dealt with. Also convert a few elog() calls which are user
        facing to ereport().
      
      * Update the testfiles to match the new messages.
      
      Spelling and wording is mostly left for a follow-up commit, as this was
      getting big enough as it was. The most obvious cases have been handled
      but there is work left to be done here.
      
      Discussion: https://github.com/greenplum-db/gpdb/pull/6378
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      56540f11
  11. 03 Dec 2018, 2 commits
    • Change representation of hash filter in Result from List to array. · 302a2aa8
      Committed by Heikki Linnakangas
      For consistency: this is how we represent column indexes e.g. in Sort,
      Unique, MergeAppend and many other plan types.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      302a2aa8
    • Stop abusing Result's hash filter for running a plan on arbitrary segment. · 3c89b2b4
      Committed by Heikki Linnakangas
      ORCA generated plans where the "hash filter" in the Result node was set
      to an empty set of columns. That meant "discard all the rows, on all
      segments, except one segment". This is used at least with set-returning
      functions, where we don't care where the function is executed, but it only
      needs to be executed once. (The planner creates a one-to-many Redistribute
      Motion plan in that scenario, which makes a lot more sense to me, but
      doing the same in ORCA would require more invasive surgery than what I'm
      capable of.)
      
      Instead of executing the subplan and throwing away the result one row at
      a time, use a Result plan with a One-Off Filter. That's more efficient.
      Also, it allows removing the Result.hashFilter boolean flag, because the
      weird case of a hashFilter with zero columns is gone. You can check
      "hashList != NIL" directly now.
      
      The old method would always choose the same segment, which seems bad for
      load distribution. The way it was chosen seemed totally accidental too:
      we initialized the cdbhash object to the initial constant value, and
      then reduced that into the target segment number, using the jump
      consistent hash algorithm. We computed that for every row, but the result
      was always the same. On a three-node cluster, the target was always
      segment 1. Now, we pick a segment at random when generating the plan.
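      
      As a concrete, illustrative example of the kind of query affected:
      
      ```sql
      -- A set-returning function with no table access only needs to run once
      -- somewhere in the cluster; under ORCA this is now planned as a Result
      -- with a One-Off Filter on a randomly chosen segment, instead of a
      -- zero-column hash filter that always picked the same segment.
      SELECT * FROM generate_series(1, 3);
      ```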
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      3c89b2b4
  12. 27 Nov 2018, 1 commit
  13. 23 Nov 2018, 3 commits
    • Remove unnecessary #includes. · f30355fa
      Committed by Heikki Linnakangas
      f30355fa
    • 3e18f878
    • Fix a bug with replicated tables · 0e461e16
      Committed by Pengzhou Tang
      Previously, when creating a join path between a CdbLocusType_SingleQE
      path and a CdbLocusType_SegmentGeneral path, we always added a motion on
      top of the CdbLocusType_SegmentGeneral path, so that even if the join
      path is promoted to execute on the QD, the CdbLocusType_SegmentGeneral
      path can still be executed on the segments:
      
                     join (CdbLocusType_SingleQE)
                       /                \
                      /                  \
          CdbLocusType_SingleQE     Gather Motion
                                          \
                                    CdbLocusType_SegmentGeneral
      
      For example, joining replicated_table with the subquery
      (select * from partitioned_table limit 1) as t1:
      
      Nested Loop
        ->  Gather Motion 1:1
              ->  Seq Scan on replicated_table
        ->  Materialize
              ->  Subquery Scan on t1
                    ->  Limit
                          ->  Gather Motion 3:1
                                ->  Limit
                                      ->  Seq Scan on partitioned_table
      
      replicated_table only stores tuples on the segments, so without the
      gather motion the seq scan of replicated_table would not provide any
      tuples when the join is executed on the QD.
      
      There is another problem: if the join path is not promoted to the QD,
      the gather motion might be redundant. For example, with the join above
      wrapped as a subquery:
      
        (select * from replicated_table, (select * from
        partitioned_table limit 1) t1) sub1;
      
      Gather Motion 3:1
        ->  Nested Loop
              ->  Seq Scan on partitioned_table_2
              ->  Materialize
                    ->  Broadcast Motion 1:3
                          ->  Nested Loop
                                ->  Gather Motion 1:1 (redundant motion)
                                      ->  Seq Scan on replicated_table
                                ->  Materialize
                                      ->  Subquery Scan on t1
                                            ->  Limit
                                                  ->  Gather Motion 3:1
                                                        ->  Limit
                                                              ->  Seq Scan on partitioned_table
      
      So in apply_motion_mutator() we omit such a redundant motion if it is
      not gathered to the top slice (QD). sliceDepth == 0 means the top slice;
      however, sliceDepth is shared by both init plans and the main plan, so
      if the main plan has increased sliceDepth, an init plan may omit its
      gather motion unexpectedly, which produces wrong results.
      
      The fix is simply to reset sliceDepth for init plans.
      0e461e16
  14. 22 Nov 2018, 3 commits
    • Fix confusion with distribution keys of queries with FULL JOINs. · a25e2cd6
      Committed by Heikki Linnakangas
      There was some confusion on how NULLs are distributed, when CdbPathLocus
      is of Hashed or HashedOJ type. The comment in cdbpathlocus.h suggested
      that NULLs can be on any segment. But the rest of the code assumed that
      that's true only for HashedOJ, and that for Hashed, all NULLs are stored
      on a particular segment. There was a comment in cdbgroup.c that said "Or
      would HashedOJ ok, too?"; the answer to that is "No!". Given the comment
      in cdbpathlocus.h, I'm not surprised that the author was not very sure
      about that. Clarify the comments in cdbpathlocus.h and cdbgroup.c on that.
      
      There were a few cases where we got that actively wrong. repartitionPlan()
      function is used to inject a Redistribute Motion into queries used for
      CREATE TABLE AS and INSERT, if the "current" locus didn't match the target
      table's policy. It did not check for HashedOJ. Because of that, if the
      query contained FULL JOINs, NULL values might end up on all segments. Code
      elsewhere, particularly in cdbgroup.c, assumes that all NULLs in a table
      are stored on a single segment, identified by the cdbhash value of a NULL
      datum. Fix that, by adding a check for HashedOJ in repartitionPlan(), and
      forcing a Redistribute Motion.
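      
      A sketch of the kind of statement affected (table names made up for
      illustration):
      
      ```sql
      CREATE TABLE fj_a (id int) DISTRIBUTED BY (id);
      CREATE TABLE fj_b (id int) DISTRIBUTED BY (id);
      
      -- The FULL JOIN result has a HashedOJ locus: NULL-extended rows may end
      -- up on any segment, so inserting it into a hash-distributed table must
      -- add a Redistribute Motion rather than assume the Hashed convention
      -- that all NULLs live on one particular segment.
      CREATE TABLE fj_out AS
      SELECT a.id AS a_id, b.id AS b_id
      FROM fj_a a FULL JOIN fj_b b USING (id);
      ```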
      
      CREATE TABLE AS had a similar problem, in the code to decide which
      distribution key to use, if the user didn't specify DISTRIBUTED BY
      explicitly. The default behaviour is to choose a distribution key that
      matches the distribution of the query, so that we can avoid adding an
      extra Redistribute Motion. After fixing repartitionPlan, there was no
      correctness problem, but if we chose the key based on a HashedOJ locus,
      there is no performance benefit because we'd need a Redistribute Motion
      anyway. So modify the code that chooses the CTAS distribution key to
      ignore HashedOJ.
      
      While we're at it, refactor the code to choose the CTAS distribution key,
      by moving it to a separate function. It had become ridiculously deeply
      indented.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/6154, and adds tests.
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      a25e2cd6
    • Cosmetic fixes in the code to determine distribution key for CTAS. · a5fa3110
      Committed by Heikki Linnakangas
      Fix indentation. In the code to generate a NOTICE, remove an if() for a
      condition that we had already checked earlier in the function, and use a
      StringInfo for building the string.
      a5fa3110
    • New extension to debug partially distributed tables · 3119009a
      Committed by Ning Yu
      Introduced a new debugging extension gp_debug_numsegments to get / set
      the default numsegments when creating tables.
      
      gp_debug_get_create_table_default_numsegments() gets the default
      numsegments.
      
      gp_debug_set_create_table_default_numsegments(text) sets the default
      numsegments in text format; valid values are:
      - 'FULL': all the segments;
      - 'RANDOM': pick a random set of segments each time;
      - 'MINIMAL': the minimal set of segments;
      
      gp_debug_set_create_table_default_numsegments(integer) sets the default
      numsegments directly; the valid range is [1, gp_num_contents_in_cluster].
      
      gp_debug_reset_create_table_default_numsegments(text) or
      gp_debug_reset_create_table_default_numsegments(integer) resets the
      default numsegments to the specified value, and that value can be
      reused later.
      
      gp_debug_reset_create_table_default_numsegments() resets the default
      numsegments to the value passed last time; if there has been no previous
      call, the value is 'FULL'.
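      
      An illustrative usage sketch, using the function names listed above:
      
      ```sql
      CREATE EXTENSION gp_debug_numsegments;
      
      SELECT gp_debug_get_create_table_default_numsegments();
      SELECT gp_debug_set_create_table_default_numsegments('MINIMAL');
      CREATE TABLE t_partial (c1 int) DISTRIBUTED BY (c1); -- created on the minimal segment set
      SELECT gp_debug_reset_create_table_default_numsegments();
      ```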
      
      Refactored ICG test partial_table.sql to create partial tables with this
      extension.
      3119009a
  15. 13 Nov 2018, 1 commit
    • Support 'copy (select statement) to file on segment' (#6077) · bad6cebc
      Committed by Jinbao Chen
      In 'copy (select statement) to file', we generate a query plan, set its
      dest receiver to copy_dest_receiver, and run the dest receiver on the QD.
      In 'copy (select statement) to file on segment', we modify the query
      plan, delete the Gather Motion, and let the dest receiver run on the QEs.
      
      Change 'isCtas' in Query to 'parentStmtType' to be able to mark the type
      of the upper utility statement. Add a CopyIntoClause node to store the
      copy information, and add copyIntoClause to PlannedStmt.
      
      In PostgreSQL we don't need to make a different query plan for a query
      contained in a utility statement, but in Greenplum we do. So we use a
      field to indicate whether the query is contained in a utility statement,
      and the type of that utility statement.
      
      The behavior of 'copy (select statement) to file on segment' is actually
      very similar to 'SELECT ... INTO ...' and 'CREATE TABLE ... AS SELECT ...'.
      We use the distribution policy inherent in the query result as the final
      data distribution policy; if there is none, we use the first column in
      the target list as the key and redistribute. The only difference is that
      we use 'copy_dest_receiver' instead of 'intorel_dest_receiver'.
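      
      A sketch of the new syntax (the path is a placeholder; with ON SEGMENT
      each QE writes its own output file, and in GPDB the file name
      conventionally embeds a <SEGID> token):
      
      ```sql
      -- The dest receiver runs on the QEs instead of the QD.
      COPY (SELECT c1, c2 FROM sales WHERE c2 > 0)
      TO '/tmp/sales_<SEGID>.out' ON SEGMENT;
      ```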
      bad6cebc
  16. 07 Nov 2018, 1 commit
    • Adjust GANG size according to numsegments · 6dd2759a
      Committed by ZhangJackey
      Now we have partial tables and a flexible gang API, so we can allocate
      gangs according to numsegments.
      
      With commit 4eb65a53, GPDB supports tables distributed on a subset of
      segments, and with the series of commits (a3ddac06, 576690f2) GPDB
      supports a flexible gang API. Now is a good time to combine both new
      features: the goal is to create gangs only on the necessary segments for
      each slice. This commit also improves singleQE gang scheduling and does
      some code cleanup. However, if ORCA is enabled, the behavior is the same
      as before.
      
      The outline of this commit is:
      
        * Modify the FillSliceGangInfo API so that gang_size is truly flexible.
        * Remove the numOutputSegs and outputSegIdx fields in the motion node.
           Add a new field isBroadcast to mark whether the motion is a
           broadcast motion.
        * Remove the global variable gp_singleton_segindex and make the
           singleQE segment_id random (derived from gp_sess_id).
        * Remove the field numGangMembersToBeActive in Slice because it is now
           exactly slice->gangsize.
        * Modify the message printed if the GUC Test_print_direct_dispatch_info
           is set.
        * An explicit BEGIN now creates a full gang.
        * Format code and remove destSegIndex.
        * The isReshuffle flag in ModifyTable is useless, because it is only
           used when we want to insert a tuple into a segment that is outside
           the range of numsegments.
      
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      6dd2759a
  17. 06 Nov 2018, 1 commit
  18. 29 Oct 2018, 2 commits
    • Remove memory context argument from GpPolicyFetch and friends. · 6d17d31f
      Committed by Heikki Linnakangas
      Most callers were passing CurrentMemoryContext, so this makes most callers
      slightly simpler. The few places that needed to pass a different context
      now switch to the correct one before calling the GpPolicy*() function.
      Reviewed-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      6d17d31f
    • Allow reshuffling tables with update triggers · 15ee1437
      Committed by Pengzhou Tang
      Previously, when updating a table with update triggers on its
      distribution column, GPDB reported an error like "ERROR: UPDATE on
      distributed key column not allowed on relation with update triggers".
      The current GPDB executor does not support statement-level update
      triggers and also skips row-level update triggers, because a split
      update actually consists of a delete and an insert; so if the result
      relation has update triggers, GPDB rejects the command and errors out,
      since the triggers would not be functional.
      
      There is an exception for 'ALTER TABLE SET WITH (RESHUFFLE)': RESHUFFLE
      also uses a split-update node internally to rebalance/expand the table.
      However, from the user's point of view ALTER TABLE should not fire any
      kind of trigger, so we don't need to error out the way the UPDATE
      command does.
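      
      A sketch of the distinction (table and column names are illustrative):
      
      ```sql
      -- Still rejected: the relation has an update trigger and the UPDATE
      -- touches the distribution key, which would require a split update.
      UPDATE t_with_trigger SET dist_col = dist_col + 1;
      
      -- Allowed after this commit: RESHUFFLE uses a split update internally,
      -- but ALTER TABLE is not expected to fire any triggers.
      ALTER TABLE t_with_trigger SET WITH (RESHUFFLE);
      ```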
      15ee1437
  19. 23 Oct 2018, 1 commit
    • Table data should be reshuffled to new segments · f4f4bdcc
      Committed by ZhangJackey
      Each table has a `numsegments` attribute in the
      GP_DISTRIBUTION_POLICY table; it indicates that the table's
      data is distributed on the first N segments. In the common case,
      `numsegments` equals the total segment count of the cluster.
      
      When we add new segments to the cluster, `numsegments` no
      longer equals the actual segment count in the cluster, so we
      need to reshuffle the table data to all segments in two steps:
      
        * Reshuffle the table data to all segments
        * Update `numsegments`
      
      It is easy to update `numsegments`, so we focus on how to
      reshuffle the table data. There are three types of tables in
      Greenplum, and they are reshuffled in different ways.
      For a hash-distributed table, we reshuffle data based on an
      UPDATE statement. Updating the hash keys of the table
      will generate a plan like:
      
      	Update
      		->Redistribute Motion
      			->SplitUpdate
      				->SeqScan
      We cannot use this plan to reshuffle table data directly.
      The problem is that we need to know the segment count
      when the Motion node computes the destination segment. When
      we compute the destination segment of the deleted tuple, we
      need the old segment count, which equals `numsegments`;
      on the other hand, we need to use the new segment count to
      compute the destination segment for the inserted tuple.
      So we add a new operator, Reshuffle, to compute the
      destination segment. It records O and N (O is the count
      of old segments and N is the count of new segments), and
      the plan is adjusted like this:
      
      	Update
      		->Explicit Motion
      			->Reshuffle
      				->SplitUpdate
      					->SeqScan
      
      It can compute the destination segments directly with O and
      N. At the same time we change the Motion type to Explicit,
      so it can send a tuple to the destination segment which we
      computed in the Reshuffle node.
      
      With the hash method changed to the jump consistent hash, not
      all the table data needs to be reshuffled, so we add a new
      ReshuffleExpr to filter the tuples that need to be
      reshuffled. This expression computes the destination
      segment ahead of time; if the destination segment is the
      current segment, the tuple does not need to be reshuffled.
      With the ReshuffleExpr the plan is adjusted like this:
      
      	Update
      		->Explicit Motion
      			->Reshuffle
      				->SplitUpdate
      					->SeqScan
      						|-ReshuffleExpr
      
      When we want to reshuffle a table, we use the SQL `ALTER
      TABLE xxx SET WITH (RESHUFFLE)`. Internally it generates a
      new UpdateStmt parse tree, similar to the parse tree
      generated by the SQL `UPDATE xxx SET
      xxx.aaa = COALESCE(xxx.aaa...) WHERE ReshuffleExpr`. We set
      a reshuffle flag in the UpdateStmt, so we can distinguish
      a common update from the reshuffling.
      
      In conclusion, we reshuffle a hash-distributed table with the
      Reshuffle node and the ReshuffleExpr: the ReshuffleExpr filters
      the tuples that need to be reshuffled and the Reshuffle node
      does the real reshuffling work. We can use the same framework
      to implement reshuffling of randomly distributed tables and
      replicated tables.
      
      For a randomly distributed table there are no hash keys; each
      old segment needs to reshuffle (N - O) / N of its data to the
      new segments. In the ReshuffleExpr we generate a random value
      in [0, N); if the value falls in [O, N), the tuple needs to be
      reshuffled, so the SeqScan node returns this tuple to the
      Reshuffle node. The Reshuffle node then generates a random
      value in [O, N), which determines which new segment the tuple
      is inserted into.
      
      For a replicated table, the table data is the same on all the old
      segments, so there is no need to delete any tuples; we only
      need to copy the tuples that are on the old segments to the new
      segments. Therefore the ReshuffleExpr does not filter any tuples,
      and in the Reshuffle node we discard the tuple that is generated
      for deleting and only return the inserting tuple to the motion.
      Let me illustrate this with an example:
      
      If there are 3 old segments in the cluster and we add 4 new
      segments, the segment IDs of the old segments are (0,1,2) and
      the segment IDs of the new segments are (3,4,5,6). When
      reshuffling the replicated table, seg#0 is responsible for
      copying data to seg#3 and seg#6, seg#1 is responsible for
      copying data to seg#4, and seg#2 is responsible for copying
      data to seg#5.
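      
      Putting it together, a minimal usage sketch after expanding the
      cluster (the table name is illustrative):
      
      ```sql
      -- Reshuffle one table's data onto the newly added segments and update
      -- its numsegments, using the machinery described above.
      ALTER TABLE sales SET WITH (RESHUFFLE);
      
      -- numsegments should now match the cluster's segment count.
      SELECT localoid::regclass, numsegments
      FROM gp_distribution_policy
      WHERE localoid = 'sales'::regclass;
      ```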
      
      
      Co-authored-by: Ning Yu <nyu@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      f4f4bdcc
  20. 20 Oct 2018, 1 commit
    • Refactor the way Split Update nodes are constructed in the planner. · faf0ec6b
      Committed by Heikki Linnakangas
      One specialty in a Split Update is that the node needs the *old* values
      for all the distribution key columns, to compute the distribution hash for
      each old row, so that they can be deleted. That was previously handled at
      the time when the SplitUpdate node was created, by adding any missing
      Vars for the old values to the subplan's target list, pushing them down
      through joins and any other plan nodes, all the way down to the Scan
      node for that relation. That seemed complicated and fragile.
      
      The reason to tackle this right now is that we were seeing failures related
      to this, while working on the PostgreSQL 9.4 merge. It added a test case,
      where a Split Update was done through a security barrier view. The security
      barrier view added a SubqueryScan to the plan tree, and the mechanism to
      push through the old attributes couldn't cope with that. I'm sure we
      could've hacked that to make it work, but this refactoring seems like a
      better long term fix.
      
      This patch makes it the responsibility of preprocess_targetlist(), to ensure
      that the old values are made available to the top of the tree, if a Split
      Update is needed. preprocess_targetlist() seems like the appropriate place,
      because it already does that for columns that are not modified by the
      UPDATE.
      
      Now that we are making the decision on whether to do a split update in
      preprocess_targetlist() already, add a flag to PlannerInfo to remember that
      decision, until the point where the ModifyTable node is added to the top of
      the plan tree.
      
      Also add a test case, for an inherited table where some children have a
      different distribution key, and an UPDATE on some of the children require a
      Split Update, and others don't. That was causing me trouble at one point
      during the development, and I'm not sure if there was any existing test to
      cover that.
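      
      A sketch of the inheritance scenario mentioned above (names are
      illustrative; the actual test may differ):
      
      ```sql
      CREATE TABLE upd_parent (a int, b int) DISTRIBUTED BY (a);
      CREATE TABLE upd_child () INHERITS (upd_parent) DISTRIBUTED BY (b);
      
      -- For upd_child the UPDATE modifies its distribution key (b), so a
      -- Split Update is needed for that child; for upd_parent it is not.
      UPDATE upd_parent SET b = b + 1;
      ```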
      faf0ec6b
  21. 28 Sep 2018, 1 commit
    • Allow tables to be distributed on a subset of segments · 4eb65a53
      Committed by ZhangJackey
      There was an assumption in GPDB that a table's data is always
      distributed on all segments; however, this is not always true. For
      example, when a cluster is expanded from M segments to N (N > M), all
      the tables are still on M segments. To work around the problem we used
      to have to alter all the hash-distributed tables to randomly distributed
      to get correct query results, at the cost of bad performance.
      
      Now we support table data to be distributed on a subset of segments.
      
      A new column `numsegments` is added to the catalog table
      `gp_distribution_policy` to record how many segments a table's data is
      distributed on.  By doing so we can allow DML on M-segment tables;
      joins between M-segment and N-segment tables are also supported.
      
      ```sql
      -- t1 and t2 are both distributed on (c1, c2),
      -- one on 1 segment, the other on 2 segments
      select localoid::regclass, attrnums, policytype, numsegments
          from gp_distribution_policy;
       localoid | attrnums | policytype | numsegments
      ----------+----------+------------+-------------
       t1       | {1,2}    | p          |           1
       t2       | {1,2}    | p          |           2
      (2 rows)
      
      -- t1 and t1 have exactly the same distribution policy,
      -- join locally
      explain select * from t1 a join t1 b using (c1, c2);
                         QUERY PLAN
      ------------------------------------------------
       Gather Motion 1:1  (slice1; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Seq Scan on t1 b
       Optimizer: legacy query optimizer
      
      -- t1 and t2 are both distributed on (c1, c2),
      -- but as they have different numsegments,
      -- one has to be redistributed
      explain select * from t1 a join t2 b using (c1, c2);
                                QUERY PLAN
      ------------------------------------------------------------------
       Gather Motion 1:1  (slice2; segments: 1)
         ->  Hash Join
               Hash Cond: a.c1 = b.c1 AND a.c2 = b.c2
               ->  Seq Scan on t1 a
               ->  Hash
                     ->  Redistribute Motion 2:1  (slice1; segments: 2)
                           Hash Key: b.c1, b.c2
                           ->  Seq Scan on t2 b
       Optimizer: legacy query optimizer
      ```
      4eb65a53
  22. 27 Sep 2018, 1 commit
  23. 23 Sep 2018, 1 commit
  24. 19 Sep 2018, 1 commit
    • Fix "could not find pathkey item to sort" error with MergeAppend plans. · 1722adb8
      Committed by Heikki Linnakangas
      When building a Sort node to represent the ordering that is preserved
      by a Motion node, in make_motion(), the call to make_sort_from_pathkeys()
      would sometimes fail with "could not find pathkey item to sort". This
      happened when the ordering was over a UNION ALL operation. When building
      Motion nodes for MergeAppend subpaths, the path keys that represented the
      ordering referred to the items in the append rel's target list, not the
      subpaths. In create_merge_append_plan(), where we do a similar thing for
      each subpath, we correctly passed the 'relids' argument to
      prepare_sort_from_pathkeys(), so that prepare_sort_from_pathkeys() can
      match the target list entries of the append relation with the entries of
      the subpaths. But when creating the Motion nodes for each subpath, we
      were passing NULL as 'relids' (via make_sort_from_pathkeys()).
      
      At a high level, the fix is straightforward: we need to pass the correct
      'relids' argument to prepare_sort_from_pathkeys(), in
      cdbpathtoplan_create_motion_plan(). However, the current code structure
      makes that not so straightforward, so this required some refactoring of
      the make_motion() and related functions:
      
      Previously, make_motion() and make_sorted_union_motion() would take a path
      key list as argument, to represent the ordering, and it called
      make_sort_from_pathkeys() to extract the sort columns, operators etc.
      After this patch, those functions take arrays of sort columns, operators,
      etc. directly as arguments, and the caller is expected to do the call to
      make_sort_from_pathkeys() to get them, or build them through some other
      means. In cdbpathtoplan_create_motion_plan(), call
      prepare_sort_from_pathkeys() directly, rather than the
      make_sort_from_pathkeys() wrapper, so that we can pass the 'relids'
      argument. Because prepare_sort_from_pathkeys() is marked as 'static', move
      cdbpathtoplan_create_motion_plan() from cdbpathtoplan.c to createplan.c,
      so that it can call it.
      
      Add test case. It's a slightly reduced version of a query that we already
      had in 'olap_group' test, but seems better to be explicit. Revert the
      change in expected output of 'olap_group', made in commit 28087f4e,
      which memorized the error in the expected output.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/5695.
      Reviewed-by: Pengzhou Tang <ptang@pivotal.io>
      Reviewed-by: Melanie Plageman <mplageman@pivotal.io>
      1722adb8
  25. 18 Sep 2018, 1 commit
  26. 15 Sep 2018, 1 commit
  27. 10 Sep 2018, 1 commit
  28. 03 Sep 2018, 1 commit
  29. 21 Aug 2018, 1 commit
    • Do not create split update for relations excluded by constraints · 9b8dd4f4
      Committed by Taylor Vesely
      When the query_planner determines that a relation does not need
      scanning due to constraint exclusion, it will create a 'dummy' plan for
      that relation. When we plan a split update, it does not understand this
      'dummy' plan shape and will fail with an assertion.
      
      Instead, because an excluded relation will never return tuples, do not
      attempt to create a split update for it at all.
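      
      An illustrative sketch of the situation (assuming constraint exclusion
      applies to the relation):
      
      ```sql
      CREATE TABLE excl_t (a int, b int CHECK (b < 0)) DISTRIBUTED BY (b);
      
      -- The WHERE clause contradicts the CHECK constraint, so the planner
      -- treats the relation as a dummy rel; no split update is built for it
      -- and the statement simply updates zero rows instead of asserting.
      UPDATE excl_t SET b = b + 1 WHERE b > 10;
      ```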
      9b8dd4f4
  30. 03 Aug 2018, 1 commit
  31. 02 Aug 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Committed by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
      * Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap access.
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
      * Checkpoints now are performed by a dedicated background process. Formerly
        the background writer did both dirty-page writing and checkpointing. Separating
        this into two processes allows each goal to be accomplished more predictably.
      
      * Custom plan was supported for specific parameter values even when using
        prepared statements.
      
      * API for FDW was improved to provide multiple access "paths" for their tables,
        allowing more flexibility in join planning.
      
      * The security_barrier option was added for views to prevent optimizations
        that might allow view-protected data to be exposed to users.
      
      * Range data type was added to store a lower and upper bound belonging to its
        base data type.
      
      * CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility statement.
        The SELECT query is planned during the execution of the utility. To conform
        to this change, GPDB executes the utility statement only on the QD and
        dispatches the plan of the SELECT query to the QEs.
      Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
      4750e1b6
  32. 23 Jul 2018, 1 commit
    • Enable update on distribution column in legacy planner. · 6be0a32a
      Committed by Zhenghua Lyu
      Previously, we could not update a distribution column with the legacy
      planner, because the OLD tuple and the NEW tuple may belong to different
      segments. We enable this by borrowing ORCA's logic, namely splitting each
      update operation into a delete and an insert (sketched below). The delete
      is hashed by the OLD tuple's attributes, and the insert is hashed by the
      NEW tuple's attributes. This change includes the following items:
      * We need to push missing OLD attributes down to the subplan tree so that
        they can be passed up to the top Motion.
      * In addition, if the result relation has OIDs, we also need to put the
        oid in the targetlist.
      * If the result relation is partitioned, we need special treatment
        because resultRelations contains the partition tables instead of the
        root table, as is also true for a normal Insert.
      * Special treatment for update triggers, because triggers cannot be
        executed across segments.
      * Special treatment in nodeModifyTable, so that it can process the
        Insert/Delete pair that implements the update.
      * Proper initialization of SplitUpdate.
      
      There are still TODOs:
      * We don't handle cost gracefully, because we add the SplitUpdate node
        after the plan is generated. A FIXME has been added for this.
      * For deletion, we could optimize by sending just the distribution
        columns instead of all columns.
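      
      A minimal SQL sketch of the user-visible change (the table name is
      illustrative):
      
      ```sql
      CREATE TABLE accounts (id int, region int) DISTRIBUTED BY (region);
      
      -- Previously rejected by the legacy planner; now planned as a
      -- SplitUpdate (delete hashed by the OLD region, insert hashed by the
      -- NEW region) underneath a motion that routes each half to the right
      -- segment.
      UPDATE accounts SET region = region + 1 WHERE id = 42;
      ```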
      
      
      Author: Xiaoran Wang <xiwang@pivotal.io>
      Author: Max Yang <myang@pivotal.io>
      Author: Shujie Zhang <shzhang@pivotal.io>
      Author: Zhenghua Lyu <zlv@pivotal.io>
      6be0a32a
  33. 11 Jul 2018, 1 commit
  34. 29 May 2018, 1 commit
    • Support RETURNING for replicated tables. · fb7247b9
      Committed by Ning Yu
      * rpt: reorganize data when ALTER from/to replicated.
      
      There was a bug where altering a table from/to replicated had no effect;
      the root cause is that we neither changed gp_distribution_policy nor
      reorganized the data.
      
      Now we perform the data reorganization by creating a temp table with the
      new distribution policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):
      
      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;
      
      A new motion type, EXPLICIT GATHER MOTION, is introduced in the EXPLAIN
      output; in this motion type data is received from one explicit sender.
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider below query:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan is like below:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      * rpt: add test case with both PRIMARY KEY and UNIQUE.
      
      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to protect this feature during future
      development.
      
      (cherry picked from commit 72af4af8)
      fb7247b9