1. 07 Jul 2018 (1 commit)
    • J
      Do not automatically create an array type for child partitions · a8f5a045
      Jimmy Yih committed
      As part of the Postgres 8.3 merge, all heap tables now automatically
      create an array type. The array type will usually be created with
      typname '_<heap_name>', since the automatically created composite type
      already takes the typname '<heap_name>'. If typname '_<heap_name>' is
      taken, the logic keeps prepending underscores until there is no
      collision (truncating the end if the typname exceeds NAMEDATALEN, 64).
      This might be an oversight in upstream Postgres, since creating a large
      number of heap tables with similar names can produce so many typname
      collisions that no further heap tables with similar names can be
      created. This is very noticeable with Greenplum heap partition tables,
      because Greenplum automatically names child partitions with similar
      names instead of having the user name each child partition.
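
      For illustration, a minimal sketch in Greenplum's legacy partition
      syntax (table and column names are hypothetical): each auto-named child
      such as sales_1_prt_1 gets a composite type named after it and, after
      the 8.3 merge, would also get an array type prefixed with an
      underscore; with long, similar table names those generated typnames
      collide once truncated to NAMEDATALEN.

      	CREATE TABLE sales (id int, sale_date date)
      	DISTRIBUTED BY (id)
      	PARTITION BY RANGE (sale_date)
      	( START (date '2018-01-01') INCLUSIVE
      	  END   (date '2019-01-01') EXCLUSIVE
      	  EVERY (INTERVAL '1 month') );

      	-- Before this change each child showed up twice: as the composite
      	-- type (sales_1_prt_1) and as the array type (_sales_1_prt_1).
      	SELECT typname FROM pg_type WHERE typname LIKE '%sales_1_prt%';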
      
      To prevent typname collision failures when creating a heap partition
      table with a large number of child partitions, we will now stop
      automatically creating the array type for child partitions.
      
      References:
      https://www.postgresql.org/message-id/flat/20070302234016.GF3665%40fetter.org
      https://github.com/postgres/postgres/commit/bc8036fc666a8f846b1d4b2f935af7edd90eb5aa
      a8f5a045
  2. 06 Jul 2018 (2 commits)
    • J
      Fix schema in rangefuncs_cdb ICG test · 1a8bd0ad
      Jimmy Yih committed
      The schema is named differently from the one used in the search_path,
      so all the tables, views, functions, etc. were incorrectly being
      created in the public schema.
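
      A minimal sketch of this kind of mismatch (the schema names here are
      hypothetical, not the ones in the test):

      	CREATE SCHEMA rangefuncs_cdb_schema;
      	SET search_path TO rangefuncs_cdb, public;  -- does not match the schema above
      	CREATE TABLE t1 (a int);                    -- silently lands in public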
      1a8bd0ad
    • O
      Remove deduplication in hyperloglog code · 9c456084
      Omer Arap committed
      We had significant code duplication between the hyperloglog extension
      and the utility library used in the analyze-related code. This commit
      removes that duplication as well as a significant amount of dead code.
      It also fixes some compiler warnings and some Coverity issues.

      This commit also puts the hyperloglog functions in a separate schema
      which is not modifiable by non-superusers.
      Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      9c456084
  3. 04 Jul 2018 (1 commit)
  4. 30 Jun 2018 (2 commits)
    • S
      Remove irrelevant comments from sql test file · a8f6260e
      Shreedhar Hardikar committed
      a8f6260e
    • S
      Fix 'no parameter found for initplan subquery' · f50e5daf
      Shreedhar Hardikar committed
      The issue happens because of constant folding in the testexpr of the
      SUBPLAN expression node. The testexpr may be reduced to a const, and
      any PARAMs previously used in the testexpr disappear; however, the
      subplan still remains.

      This behavior is similar in upstream Postgres 10 and may be a
      performance consideration. Leaving that aside for now, the constant
      folding produces an elog(ERROR) when the plan has subplans and no
      PARAMs are used. This check in `addRemoteExecParamsToParamList()` uses
      `context.params`, which computes the PARAMs used in the plan, and
      `nIntPrm = list_length(root->glob->paramlist)`, which is the number of
      PARAMs declared/created.
      Given the ERROR messages generated, the above check makes no sense,
      especially since it won't even trip for the InitPlan bug (mentioned in
      the comments) as long as there is at least one PARAM in the query.

      This commit removes this check since it doesn't correctly capture the
      intent.

      In theory, it could be replaced by one specifically aimed at InitPlans,
      that is, find all the param ids used by InitPlans and then make sure
      they are used in the plan. But we already do this and remove any
      unused initplans in `remove_unused_initplans()`. So I don't see the
      point of adding that.
      
      Fixes #2839
      f50e5daf
  5. 29 Jun 2018 (3 commits)
    • O
      Fix incremental analyze for non-matching attnums · ef39e0d0
      Omer Arap committed
      To merge stats in incremental analyze for the root partition, we use
      the leaf tables' statistics. In commit b28d0297,
      we fixed an issue where a child's attnum does not match the root
      table's attnum for the same column. The test added for that fix also
      exposed a bug in the analyze code.

      This commit fixes the issue in analyze using a fix similar to
      b28d0297.
      ef39e0d0
    • O
      Fix querying stats for largest child · b28d0297
      Omer Arap committed
      Previously, we would use the root table's information to acquire stats
      from the `syscache`, which returns no result. The reason it returns no
      result is that we query the syscache using the `inh` field, which is
      set to true for the root table and false for the leaf tables.

      Another issue which is not evident is the possibility of mismatching
      `attnum`s for the root and leaf tables after specific scenarios. When
      we delete a column and then split a partition, unchanged and old
      partitions preserve the old attnums, while newly created partitions
      get increasing attnums with no gaps. If we query the syscache using
      the root's attnum for that column, we would get wrong stats for that
      specific column. Passing the root's `inh` hid the issue of having
      wrong stats.

      This commit fixes the issue by getting the attribute name using the
      root's attnum and using it to acquire the correct attnum for the
      largest leaf partition.
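
      A sketch of how the attnums diverge (table, partition, and column
      names are hypothetical; the generated child names follow the usual
      <table>_1_prt_<name> convention):

      	CREATE TABLE pt (a int, b int, c int)
      	DISTRIBUTED BY (a)
      	PARTITION BY RANGE (c) (START (0) END (20) EVERY (10));

      	ALTER TABLE pt DROP COLUMN b;
      	ALTER TABLE pt SPLIT PARTITION FOR (1) AT (5)
      	  INTO (PARTITION p_low, PARTITION p_high);

      	-- The root and the untouched partition keep a dropped-column gap at
      	-- attnum 2, while the newly created children are built without it,
      	-- so column c no longer has the same attnum everywhere.
      	SELECT attrelid::regclass, attname, attnum
      	FROM pg_attribute
      	WHERE attrelid IN ('pt'::regclass, 'pt_1_prt_p_low'::regclass)
      	  AND attnum > 0 AND NOT attisdropped;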
      b28d0297
    • A
      Perform analyze on specific table in spilltodisk test. · 37c75753
      Ashwin Agrawal committed
      No need for a database-wide analyze; only the specific table needs to
      be analyzed for the test.
      37c75753
  6. 27 Jun 2018 (1 commit)
  7. 20 Jun 2018 (1 commit)
  8. 19 Jun 2018 (4 commits)
    • O
      Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Omer Arap committed
      This commit introduces an end-to-end scalable solution to generate
      statistics for root partitions. This is done by merging the
      statistics of the leaf partition tables to generate the statistics of
      the root partition. The ability to merge leaf table statistics for the
      root table makes analyze incremental and stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
      Incremental analyze still creates a sample for each partition, as in
      the previous version. While analyzing the sample and generating
      statistics for the partition, it also creates a `hyperloglog_counter`
      data structure, adds the sample values to it, and records metadata
      such as the number of multiples and the sample size. Once the entire
      sample is processed, analyze saves the `hyperloglog_counter` as a byte
      array in the `pg_statistic` catalog table. We reserve a slot for the
      `hyperloglog_counter` in the table and mark it with a dedicated
      statistic kind, `STATISTIC_KIND_HLL`. We keep the
      `hyperloglog_counter` in the catalog only for the leaf partitions. If
      the user chooses to run a FULL scan for HLL, we mark the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
      Once all the leaf partitions are analyzed, we analyze the root
      partition. Initially, we check whether all the partitions have been
      analyzed properly and have all their statistics available in the
      `pg_statistic` catalog table. A partition with no tuples is considered
      analyzed even though it has no entry in the catalog. If for some
      reason a single partition is not analyzed, we fall back to the
      original analyze algorithm, which requires acquiring a sample for the
      root partition and calculating statistics from that sample.

      Merging the null fraction and average width from leaf partition
      statistics is trivial, so we calculate them first. The remaining
      statistics are:
      
      - Number of distinct values (NDV)
      
      - Most common values (MCV), and their frequencies termed as most common
      frequency (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
      Hyperloglog provides the functionality to merge multiple
      `hyperloglog_counter`s into one and calculate the number of distinct
      values using the aggregated `hyperloglog_counter`. This aggregated
      `hyperloglog_counter` is sufficient on its own only if the user
      chooses to run a full scan for hyperloglog. In the sample-based
      approach, deriving the number of distinct values without the
      hyperloglog algorithm is not possible. Hyperloglog enables us to merge
      the `hyperloglog_counter`s from each partition and calculate the NDV
      on the merged `hyperloglog_counter` with an acceptable error rate.
      However, it does not give us the ultimate NDV of the root partition;
      it gives us the NDV of the union of the samples from each partition.

      The rest of the NDV interpolation relies on the formula used in
      Postgres, which depends on four metrics: the NDV in the sample, the
      number of multiple values in the sample, the sample size, and the
      total number of rows in the table. Using these values, the algorithm
      calculates the approximate NDV for the table. While merging the
      statistics from the leaf partitions, hyperloglog lets us accurately
      generate the NDV of the sample, the sample size, and the total rows;
      however, the number of multiples in the accumulated sample is unknown,
      since we do not have access to the accumulated sample at this point.
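
      For reference, the sample-based estimator Postgres applies here can be
      written roughly as below (Haas-Stokes/Duj1 form; the symbols are ours,
      not the source's): d is the NDV of the sample, f1 = d - nMultiple is
      the number of values seen exactly once, n is the sample size, and N is
      the total row count.

      	\hat{D} = \frac{n \cdot d}{\, n - f_1 + f_1 \cdot n / N \,}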
      
      _Number of Multiples_
      
      Our approach to estimating the number of multiples in the aggregated
      sample for the root (which itself is unavailable) requires the NDV,
      number of multiples, and size of each leaf sample. The NDV of each
      sample is trivial to calculate using the partition's
      `hyperloglog_counter`. The number of multiples and the sample size of
      each partition are saved in the partition's `hyperloglog_counter`
      during leaf statistics gathering, to be used in the merge.

      Estimating the number of multiples in the aggregate sample for the
      root partition is a two-step process. First, we estimate the number of
      values that reside in more than one partition's sample. Then, we
      estimate the number of multiples that exist uniquely in a single
      partition. Finally, we add these values to estimate the overall number
      of multiples in the aggregate sample of the root partition.

      To count the values that exist uniquely in one single partition, we
      utilize hyperloglog functionality: we can easily estimate how many
      values appear only in a specific partition _i_. We call the NDV of the
      aggregate of all partitions `NDV_all`, and the NDV of the aggregate of
      all partitions except _i_ `NDV_minus_i`. The difference between
      `NDV_all` and `NDV_minus_i` gives the number of values that appear in
      only that one partition. The rest of the values contribute to the
      overall number of multiples in the root's aggregated sample; we call
      their count `nMultiple_inter`, the number of values that appear in
      more than one partition.

      However, that is not enough: even if a value resides in only one
      partition's sample, that partition might contain multiple copies of
      it. We need a way to account for these values as well, which is why
      we also track the number of multiples within each partition sample.
      We already know the number of multiples inside a partition sample,
      but we need to normalize this value by the proportion of the values
      unique to that partition sample to the number of distinct values of
      that partition sample. The normalized value is partition sample i's
      contribution to the overall calculation of nMultiple.

      Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
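
      A compact restatement of the estimate described above (the symbols are
      ours: d_i is the NDV of leaf i's sample and m_i its recorded number of
      multiples):

      	\begin{align*}
      	u_i &= \mathrm{NDV}_{\mathrm{all}} - \mathrm{NDV}_{\mathrm{minus}\,i}
      	    && \text{values unique to leaf sample } i \\
      	\mathit{nMultiple}_{\mathrm{inter}} &= \mathrm{NDV}_{\mathrm{all}} - \sum_i u_i
      	    && \text{values appearing in more than one leaf sample} \\
      	\mathit{normalized\_m}_i &= m_i \cdot u_i / d_i
      	    && \text{leaf $i$'s multiples, scaled by its unique fraction} \\
      	\mathit{nMultiple}_{\mathrm{root}} &= \mathit{nMultiple}_{\mathrm{inter}} + \sum_i \mathit{normalized\_m}_i
      	\end{align*}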
      
      **Merging MCVs:**
      
      We utilize the merge functionality imported from Greenplum 4.3. The
      algorithm is straightforward: we convert each MCV's frequency into a
      count and add the counts up when a value appears in more than one
      partition. After every candidate's count has been calculated, we sort
      the candidate values and pick the top ones, as bounded by
      `default_statistics_target`. 4.3 blindly picked the values with the
      highest counts; we instead incorporate the same logic used in current
      Greenplum and Postgres and test whether a value is a real MCV.
      Therefore, even after the merge, the logic fully aligns with Postgres.
      
      **Merging Histograms:**
      
      One of the main novel contributions of this commit is how we merge the
      histograms from the leaf partitions. In 4.3, a priority queue is used
      to merge the histograms from the leaf partitions. However, that
      approach is naive and loses important statistical information: in
      Postgres, the histogram is calculated over the values that did not
      qualify as MCVs, and the 4.3 merge logic did not take this into
      consideration, so significant statistical information was lost while
      merging the MCV values.

      We introduce a novel approach that feeds the MCVs from the leaf
      partitions that did not qualify as root MCVs into the histogram merge
      logic. To fully utilize the previously implemented priority queue
      logic, we treat each non-qualified MCV as the histogram of a so-called
      `dummy` partition. To be more precise, if an MCV m1 does not qualify,
      we create a histogram [m1, m1] with a single bucket whose size is the
      count of this non-qualified MCV. When we merge the histograms of the
      leaf partitions and these dummy partitions, the merged histogram does
      not lose any statistical information.
      Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
      9c1b1ae3
    • A
      Fix COPY TO ON SEGMENT processed counting · cb63e543
      Adam Lee committed
      The processed variable should not be reset while looping all partitions.
      cb63e543
    • A
      Fix COPY TO IGNORE EXTERNAL PARTITIONS · f118f4bd
      Adam Lee committed
      BeginCopy() returns a brand new CopyState, but the value of
      skip_ext_partition, set after it, was being ignored.

      It's a simple boolean on struct CopyStmt; there is no need to wrap it
      in options.
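
      Roughly the statement shape affected (the table name is illustrative);
      with the fix, external leaf partitions of a partitioned table are
      actually skipped:

      	COPY sales TO '/tmp/sales.csv' IGNORE EXTERNAL PARTITIONS;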
      f118f4bd
    • A
      Update .gitignore files · e0e8f475
      Adam Lee committed
      To have a clean `git status` output.
      e0e8f475
  9. 16 Jun 2018 (1 commit)
    • A
      Fix incorrect modification of storageAttributes.compress. · 7c82d50f
      Ashwin Agrawal committed
      For a CO table, storageAttributes.compress only conveys whether block
      compression should be applied. RLE is performed as stream compression
      within the block, and hence storageAttributes.compress being true or
      false doesn't relate to RLE at all. So, with rle_type compression,
      storageAttributes.compress is true for compression levels > 1, where
      block compression is performed along with stream compression. For
      compress level = 1, storageAttributes.compress is always false, as no
      block compression is applied. Since RLE doesn't relate to
      storageAttributes.compress, there is no reason to modify it based on
      rle_type compression.
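
      For context, a sketch of the two cases (the table definitions are
      illustrative): at compresslevel 1 only stream (RLE) compression inside
      the block is used, while higher levels add block compression on top.

      	CREATE TABLE co_rle1 (a int, b text)
      	WITH (appendonly=true, orientation=column,
      	      compresstype=rle_type, compresslevel=1)
      	DISTRIBUTED BY (a);

      	CREATE TABLE co_rle3 (a int, b text)
      	WITH (appendonly=true, orientation=column,
      	      compresstype=rle_type, compresslevel=3)
      	DISTRIBUTED BY (a);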
      
      Also, the problem manifests more due to the fact that in the
      datumstream layer the AppendOnlyStorageAttributes in DatumStreamWrite
      (`acc->ao_attr.compress`) is used to decide the block type, whereas
      the cdb storage layer functions use the AppendOnlyStorageAttributes
      from AppendOnlyStorageWrite
      (`idesc->ds[i]->ao_write->storageAttributes.compress`). Due to this
      difference, changing just one of them, and unnecessarily at that, is
      bound to cause issues during insert.

      So, remove the unnecessary and incorrect update to
      AppendOnlyStorageAttributes.

      The test case showcases the failing scenario without the patch.
      7c82d50f
  10. 14 Jun 2018 (1 commit)
  11. 11 Jun 2018 (2 commits)
  12. 09 Jun 2018 (2 commits)
  13. 08 Jun 2018 (2 commits)
    • A
      Reduce runtimes for some qp_* tests. · 94c30fdb
      Ashwin Agrawal committed
      Before:
           qp_functions             ... ok (76.24 sec)  (diff:0.06 sec)
           qp_gist_indexes4         ... ok (88.46 sec)  (diff:0.07 sec)
           qp_with_clause           ... ok (130.70 sec)  (diff:0.32 sec)
      
      After:
           qp_functions             ... ok (4.49 sec)  (diff:0.06 sec)
           qp_gist_indexes4         ... ok (16.18 sec)  (diff:0.06 sec)
           qp_with_clause           ... ok (54.41 sec)  (diff:0.30 sec)
      94c30fdb
    • B
      Add tests to verify that dummy joins are created · fab372cb
      Bhuvnesh Chaudhary committed
      For semi-join queries, if the constraints can eliminate a scanned
      relation, the resulting relation should be marked as dummy and the
      join using it should be a dummy join.
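
      A sketch of the pattern being tested (the tables and the exact
      predicate are illustrative): the CHECK constraint contradicts the
      subquery's filter, so the scan, and the semi join built on it, can be
      planned as dummy.

      	CREATE TABLE foo (a int) DISTRIBUTED BY (a);
      	CREATE TABLE bar (b int CHECK (b > 100)) DISTRIBUTED BY (b);
      	SET constraint_exclusion = on;
      	EXPLAIN SELECT * FROM foo WHERE a IN (SELECT b FROM bar WHERE b < 10);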
      fab372cb
  14. 07 Jun 2018 (1 commit)
    • P
      Fix hang issue due to result node don't squelch outer node explicitly · 2c011ce4
      Pengzhou Tang committed
      For a Result node with a one-time filter, if its outer plan is not
      empty and contains a motion node, it needs to squelch the outer node
      explicitly when the one-time filter check is false. This is especially
      necessary for a motion node under it: ExecSquelchNode() forces a stop
      message so the interconnect senders don't get stuck resending or
      polling for ACKs.
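
      A hypothetical query shape that can produce such a plan: the
      InitPlan-only qual becomes a one-time filter on a Result node sitting
      above a Motion, and when the filter evaluates to false the Motion
      subtree below still has to be squelched.

      	EXPLAIN SELECT * FROM dist_tab
      	WHERE (SELECT count(*) FROM ctl_tab) > 10;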
      2c011ce4
  15. 06 Jun 2018 (5 commits)
    • P
      Disable GDD scan for dispatch tests · fd54a398
      Pengzhou Tang committed
      Dispatch tests don't expect backends created by other tests or by
      auxiliary processes like FTS and GDD; this commit disables GDD as well
      to make the dispatch tests stable.
      fd54a398
    • J
      Fix autovacuum test after fault injector refactor · 96499060
      Jimmy Yih committed
      A recent change to the fault injector framework made the simple form
      "gp_inject_fault(faultname, type, db_id)" stop working with the
      wait_until_triggered fault type. To get around this, we should
      properly use "gp_wait_until_triggered_fault()" instead.
      
      Reference:
      https://github.com/greenplum-db/gpdb/commit/723e58481ad706d4c8f4f7af1325be2dcd36c985
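
      A hedged sketch of the intended usage (the fault name is a placeholder
      and the argument list is assumed from the framework change referenced
      above):

      	SELECT gp_inject_fault('some_autovacuum_fault', 'skip', dbid)
      	  FROM gp_segment_configuration WHERE role = 'p' AND content = -1;
      	-- wait until the fault has actually been hit once, then continue
      	SELECT gp_wait_until_triggered_fault('some_autovacuum_fault', 1, dbid)
      	  FROM gp_segment_configuration WHERE role = 'p' AND content = -1;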
      96499060
    • A
      Optimize and correct copy_append_only_data(). · b3aff72d
      Ashwin Agrawal committed
      Alter tablespace needs to copy all the underlying files of a table
      from one tablespace to another. For AO/CO tables this was implemented,
      when persistent tables were removed, using a full directory scan to
      find the files to copy. This gets very inefficient and its performance
      varies with the number of files present in the directory. Instead, use
      the same optimization logic as `mdunlink_ao()`, leveraging the known
      file layout of AO/CO tables.

      Also, the old logic had a couple of bugs:
      - it missed copying the base or .0 file, which means data loss if the table had been altered in the past;
      - it xlogged even for temp tables.

      These are fixed as well with this patch. Additional tests are added to
      cover those missing scenarios. Also, the AO-specific code is moved to
      aomd.c, out of tablecmds.c, to reduce conflicts with upstream.
      b3aff72d
    • A
      Improve query_finish_pending test, cut down 100 secs. · 94f716d4
      Ashwin Agrawal committed
      Commit 07ee8008 added a test section in
      query_finish_pending.sql to validate that a query can be canceled when
      the cancel signal arrives faster than the query is dispatched. It uses
      a sleep fault for this.

      But the test was incorrect due to the usage of "begin": the begin
      sleeps for 50 secs instead of the actual select query sleeping. Also,
      since the fault always triggers, the reset fault sleeps for an
      additional 50 secs. Instead, remove the begin and just set the end
      occurrence to 1. Verified that the modified test fails/hangs without
      the fix and passes/completes in a couple of secs with the fix.
      94f716d4
    • A
      Drop role if exists in bfv_partition test. · 9b13e7df
      Ashwin Agrawal committed
      bfv_partition tests fail if ICW is run n times after creating the
      cluster, as the role is not dropped. With this commit the test can be
      run n times successfully without re-creating the cluster.

      Along the way, also remove the suppression of warnings in role.sql.
      9b13e7df
  16. 05 Jun 2018 (2 commits)
    • A
      SPI 64 bit changes for pl/Python (#4154) · ce22b327
      Andreas Scherbaum committed
      SPI 64 bit changes for pl/Python
      
      Includes fault injection tests
      ce22b327
    • J
      Implement CPUSET (#5023) · 0c0782fe
      Jialun committed
      * Implement CPUSET, a new way of managing CPU resources in resource
      groups that can reserve the specified cores exclusively for a
      specified resource group. This ensures that there are always CPU
      resources available for a group that has CPUSET set. The most common
      scenario is allocating fixed cores for short queries.

      - One can use it by executing CREATE RESOURCE GROUP xxx WITH (
        cpuset='0-1', xxxx), where 0-1 are the cores reserved for this
        group, or ALTER RESOURCE GROUP SET CPUSET '0,1' to modify the
        value (see the sketch after this list).
      - The CPUSET value is a comma-separated list of tuples, where each
        tuple is a single core number or an interval of core numbers,
        e.g. 0,1,2-3. All the cores in CPUSET must be available in the
        system, and the core numbers of different groups cannot overlap.
      - CPUSET and CPU_RATE_LIMIT are mutually exclusive. One cannot
        create a resource group with both CPUSET and CPU_RATE_LIMIT,
        but a group can be freely switched between them with ALTER
        operations; that is, if one has been set, the other is disabled.
      - The CPU cores are returned to GPDB when the group is dropped,
        when the CPUSET value is changed, or when CPU_RATE_LIMIT is set.
      - If some cores have been allocated to a resource group, then the
        CPU_RATE_LIMIT of other groups only indicates a percentage of
        the remaining CPU cores.
      - If GPDB is busy and all the other cores, i.e. those not
        allocated exclusively to any resource group through CPUSET,
        have already been used up, the cores in CPUSET are still not
        handed out to other groups.
      - The CPU cores in CPUSET are exclusive only at the GPDB level;
        other non-GPDB processes in the system may still use them.
      - Add test cases for this new feature; the test environment must
        contain at least two CPU cores, so we upgrade the instance_type
        configuration of the resource_group jobs.

      * - Handle the case where the cgroup directory cpuset/gpdb
        does not exist
      - Implement pg_dump support for cpuset & memory_auditor
      - Fix a typo
      - Change the default cpuset value from an empty string to -1,
        because the code in 5X assumes that all resource group default
        values are integers; a non-integer value would make the
        system fail to start
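
      A usage sketch based on the syntax described above (the group name and
      the concurrency/memory_limit values are illustrative):

      	-- reserve cores 0 and 1 exclusively for short queries
      	CREATE RESOURCE GROUP rg_short_queries
      	  WITH (concurrency=5, cpuset='0-1', memory_limit=10);

      	-- change the reserved cores
      	ALTER RESOURCE GROUP rg_short_queries SET CPUSET '0,1';

      	-- switch back to a share-based limit; this releases the cores
      	ALTER RESOURCE GROUP rg_short_queries SET CPU_RATE_LIMIT 10;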
      0c0782fe
  17. 04 Jun 2018 (1 commit)
    • D
      Make resource queues object addressable · ace7a3e9
      Daniel Gustafsson committed
      In order to be able to set comments on resource queues, they must
      be object addressable, so fix this by implementing object addressing.
      Also add a small test for commenting on a resource queue.
      ace7a3e9
  18. 01 Jun 2018 (1 commit)
    • T
      Add tests for GPDB specific collation creation · e0845912
      Taylor Vesely committed
      Unlike upstream, GPDB needs to keep collations in-sync between multiple
      databases. Add tests for GPDB specific collation behavior.
      
      These tests need to import a system locale, so add a @syslocale@ variable to
      gpstringstubs.pl in order to test the creation/deletion of collations from
      system locales.
      Co-authored-by: Jim Doty <jdoty@pivotal.io>
      e0845912
  19. 30 May 2018 (1 commit)
    • T
      Refine the fault injector framework (#5013) · 723e5848
      Tang Pengzhou committed
      * Refine the fault injector framework

      * Add a counting feature so a fault can be triggered N times.
      * Add a simpler version named gp_inject_fault_infinite.
      * Refine and clean up the code, including renaming sleepTimes
        to extraArg so it can be used by other fault types.

      Three functions are now provided:

      1. gp_inject_fault(faultname, type, ddl, database, tablename,
      					start_occurrence, end_occurrence, extra_arg, db_id)
      start_occurrence: nth occurrence at which the fault starts triggering
      end_occurrence: nth occurrence at which the fault stops triggering;
      -1 means the fault keeps triggering until it is reset.

      2. gp_inject_fault(faultname, type, db_id)
      A simpler version for a fault triggered only once.

      3. gp_inject_fault_infinite(faultname, type, db_id)
      A simpler version for a fault that keeps triggering until it is reset.
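
      A usage sketch of the full form (the fault name is an example and dbid
      1 is assumed to be the master): trigger the fault only on its 2nd
      through 4th occurrences.

      	SELECT gp_inject_fault('checkpoint', 'skip', '', '', '', 2, 4, 0, 1);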
      
      * Fix the bgwriter_checkpoint case

      * Use gp_inject_fault_infinite here instead of gp_inject_fault so the
        pg_proc cache entry containing gp_inject_fault_infinite is loaded
        before the checkpoint and the following gp_inject_fault_infinite
        doesn't dirty the buffer again.
      * Add a matchsubs entry to ignore 5 or 6 hits of fsync_counter.

      * Fix flaky twophase_tolerance_with_mirror_promotion test

      * Use different sessions for Scenario 2 and Scenario 3 because
        the gang of session 2 is no longer valid.
      * Wait for the wanted fault to be triggered so no unexpected error occurs.

      * Add more segment status info to identify errors quickly

      Some cases are right behind FTS test cases. If the segments are not
      in the desired status, those test cases fail unexpectedly; this
      commit adds more debug info at the beginning of the test cases to
      help identify issues quickly.

      * Enhance cases to skip FTS probe for sure

      * Do the FTS probe request twice to guarantee the FTS error is triggered
      723e5848
  20. 29 May 2018 (3 commits)
    • N
      Update RETURNING test cases of replicated tables. · 97fff0e1
      Ning Yu committed
      Some error messages were updated during the 9.1 merge; update the
      answers for the RETURNING test cases of replicated tables.
      97fff0e1
    • N
      Support RETURNING for replicated tables. · fb7247b9
      Ning Yu committed
      * rpt: reorganize data when ALTER from/to replicated.

      There was a bug where altering a table from/to replicated had no
      effect; the root cause is that we neither changed
      gp_distribution_policy nor reorganized the data.

      Now we perform the data reorganization by creating a temp table with
      the new dist policy and transferring all the data to it.
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):

      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;

      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type data is received from one explicit
      sender.
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider below query:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan is like below:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      * rpt: add test case with both PRIMARY KEY and UNIQUE.

      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to keep this feature working during
      future development.
      
      (cherry picked from commit 72af4af8)
      fb7247b9
    • N
      Preserve persistence when reorganizing temp tables. · 0ce07109
      Ning Yu committed
      When altering a table's distribution policy we might need to
      reorganize the data by creating a __temp__ table, copying the data to
      it, and then swapping the underlying relation files.  However, we
      always created the __temp__ table as permanent, so when the original
      table is a temp table the underlying files cannot be found by later
      queries.
      
      	CREATE TEMP TABLE t1 (c1 int, c2 int) DISTRIBUTED BY (c1);
      	ALTER TABLE t1 SET DISTRIBUTED BY (c2);
      	SELECT * FROM t1;
      0ce07109
  21. 28 May 2018 (2 commits)
    • N
      Revert "Support RETURNING for replicated tables." · a74875cd
      Ning Yu committed
      This reverts commit 72af4af8.
      a74875cd
    • N
      Support RETURNING for replicated tables. · 72af4af8
      Ning Yu committed
      * rpt: reorganize data when ALTER from/to replicated.

      There was a bug where altering a table from/to replicated had no
      effect; the root cause is that we neither changed
      gp_distribution_policy nor reorganized the data.

      Now we perform the data reorganization by creating a temp table with
      the new dist policy and transferring all the data to it.
      
      
      * rpt: support RETURNING for replicated tables.
      
      This is to support the syntax below (suppose foo is a replicated table):

      	INSERT INTO foo VALUES(1) RETURNING *;
      	UPDATE foo SET c2=c2+1 RETURNING *;
      	DELETE FROM foo RETURNING *;

      A new motion type, EXPLICIT GATHER MOTION, is introduced in EXPLAIN
      output; in this motion type data is received from one explicit
      sender.
      
      
      * rpt: fix motion type under explicit gather motion.
      
      Consider below query:
      
      	INSERT INTO foo SELECT f1+10, f2, f3+99 FROM foo
      	  RETURNING *, f1+112 IN (SELECT q1 FROM int8_tbl) AS subplan;
      
      We used to generate a plan like this:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Gather Motion 3:1  (slice1; segments: 1)
      	                ->  Seq Scan on int8_tbl
      
      A gather motion is used for the subplan, which is wrong and will cause a
      runtime error.
      
      A correct plan is like below:
      
      	Explicit Gather Motion 3:1  (slice2; segments: 3)
      	  ->  Insert
      	        ->  Seq Scan on foo
      	        SubPlan 1  (slice2; segments: 3)
      	          ->  Materialize
      	                ->  Broadcast Motion 3:3  (slice1; segments: 3)
      	                      ->  Seq Scan on int8_tbl
      
      
      * rpt: add test case with both PRIMARY KEY and UNIQUE.

      On a replicated table we can set both PRIMARY KEY and UNIQUE
      constraints; test cases are added to keep this feature working during
      future development.
      72af4af8
  22. 24 May 2018 (1 commit)
    • J
      Minimize the time sensitivity in autovacuum regression test · f437fe4f
      Jimmy Yih committed
      To verify that autovacuum actually freezes template0, we used to just
      busy wait for about two minutes, expecting to observe the change of
      pg_database.datfrozenxid. While this "usually works", it's too sensitive
      to the amount of time it takes to vacuum freeze template0. Specifically,
      in some of our very I/O-deprived environments, this process sometimes
      takes slightly longer than two minutes.
      
      This patch introduces a fault injector to help us observe the expected
      vacuuming. The wait-in-a-loop is still there, but the bulk of the
      uncertain timing is now before the loop, not during the loop.
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jimmy Yih <jyih@pivotal.io>
      f437fe4f