1. 14 Mar 2019, 1 commit
  2. 12 Mar 2019, 1 commit
  3. 11 Mar 2019, 1 commit
    • Retire the reshuffle method for table data expansion (#7091) · 1c262c6e
      Authored by Ning Yu
This method was introduced to improve data redistribution
      performance during gpexpand phase 2; however, benchmark results show
      that it does not meet our expectations.  For example, when expanding
      a table from 7 segments to 8 the reshuffle method is only 30% faster
      than the traditional CTAS method, and when expanding from 4 to 8
      segments reshuffle is even 10% slower than CTAS.  When there are
      indexes on the table the reshuffle performance can be worse still,
      and an extra VACUUM is needed to actually free the disk space.
      According to our experiments, the bottleneck of the reshuffle method
      is the tuple deletion operation, which is much slower than the
      insertion operation used by CTAS.
      
The reshuffle method does have some benefits: it requires less extra
      disk space, and it also requires less network bandwidth (similar to
      the CTAS method with the new JCH reduce method, but less than CTAS +
      MOD).  It can also be faster in some cases, but as we cannot
      automatically determine when it is faster, it is hard to benefit
      from it in practice.
      
On the other hand, the reshuffle method is less tested and may have
      bugs in corner cases, so it is not production ready yet.
      
Given all this, we decided to retire it entirely for now; we might
      add it back in the future if we can get rid of the slow deletion or
      find a reliable way to choose automatically between the reshuffle
      and CTAS methods.
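
      Conceptually, the retained CTAS method rebuilds the table rather
      than moving tuples in place; a simplified sketch (table and column
      names are illustrative, not the exact gpexpand internals):

      ```
      -- After new segments are added, rebuild the table so that rows
      -- hash across all segments:
      CREATE TABLE t_expanded AS SELECT * FROM t DISTRIBUTED BY (id);
      DROP TABLE t;
      ALTER TABLE t_expanded RENAME TO t;
      ```

      The reshuffle method instead deleted the moved tuples from their old
      segments and re-inserted them on the new ones, and that deletion is
      the slow path described above.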
      
Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/8xknWag-SkI/5OsIhZWdDgAJ
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  4. 07 Dec 2018, 1 commit
    • Suppress a warning generated on inherited tables. · 8a8b7e5a
      Authored by Ning Yu
The following WARNING is generated by ANALYZE when some sample tuples
      come from segments outside the [0, numsegments-1] range; however,
      this does not indicate that the data distribution is wrong.  Take
      inherited tables for example: when a child table has a greater
      numsegments than its parent, this WARNING is raised, and it is
      expected.  This can happen routinely in the random_numsegments
      pipeline job, so ignore this WARNING.
      
          WARNING:  table "patest0" contains rows in segment 2,
                    which is outside the # of segments for the table's policy
                    (2 segments)
      
      Added this pattern to init_file to ignore it.
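
      A sketch of such an init_file entry (the exact regex is assumed, not
      copied from the commit):

          -- start_matchignore
          m/^WARNING:  table ".*" contains rows in segment \d+, which is outside the # of segments/
          -- end_matchignore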
  5. 30 Nov 2018, 1 commit
  6. 29 Nov 2018, 1 commit
    • Provide test hook to randomize default numsegments. · 968dfc41
      Authored by Ning Yu
With this hook loaded, CREATE TABLE creates tables with a random
      numsegments, using the gp_debug_numsegments extension.
      
It can be enabled via a make command like this:
      
          make installcheck EXTRA_REGRESS_OPTS=--prehook=randomize_create_table_default_numsegments
      
However, as plans can differ with random numsegments, it is
      recommended to also ignore the plan diffs, so the make command
      becomes:
      
          make installcheck EXTRA_REGRESS_OPTS="--prehook=randomize_create_table_default_numsegments --ignore-plans"
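
      Under the hood the prehook relies on the gp_debug_numsegments
      extension mentioned above; roughly like this (the function name and
      the 'random' argument are assumptions, not verified against the
      extension):

      ```
      CREATE EXTENSION gp_debug_numsegments;
      SELECT gp_debug_set_create_table_default_numsegments('random');
      CREATE TABLE t1 (c1 int);  -- numsegments is now picked at random
      ```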
  7. 23 Nov 2018, 1 commit
    • Reduce differences between reshuffle tests · 2eef2ba2
      Authored by Ning Yu
There are 3 reshuffle tests: the ao one, the co one, and the heap
      one.  They share almost the same cases but differ in table names and
      CREATE TABLE options.  There are also some differences introduced
      when adding regression tests, which were added to one file but not
      the others.
      
      We want to keep the differences between these tests minimal, so that
      a regression test for ao also covers the similar case for heap, and
      once we understand one of the test files we have almost the same
      knowledge of the others.
      
Here is a list of changes to these tests:
      - reduce differences in table names by using schemas;
      - reduce differences in CREATE TABLE options by setting default
        storage options (see the sketch below);
      - simplify the creation of partially distributed tables by using the
        gp_debug_numsegments extension;
      - copy some regression tests to all the tests;
      - retire the no-longer-used helper function;
      - move the tests into an existing parallel test group.
      
The pg_regress test framework provides some @@ tokens for ao/co
      tests; however, we still cannot merge the ao and co tests into one
      file, as WITH (OIDS) is supported by ao but not by co.
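
      For the default storage options item, the idea is roughly this (a
      sketch; gp_default_storage_options is the GPDB GUC, the schema and
      table names are illustrative):

      ```
      SET gp_default_storage_options = 'appendonly=true, orientation=column';
      -- a plain CREATE TABLE now yields a CO table, with no per-test WITH clause:
      CREATE TABLE reshuffle_co.t1 (a int, b int);
      ```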
  8. 22 Sep 2018, 1 commit
    • Change pretty-printing of expressions in EXPLAIN to match upstream. · 4c54c894
      Authored by Heikki Linnakangas
We had changed this in GPDB, to print fewer parens. That's fine and
      dandy, but it hardly seems worth it to carry a diff vs upstream for
      this. Which format is better is a matter of taste. The extra parens
      make some expressions clearer, but OTOH, it's unnecessarily verbose
      for simple expressions. Let's follow the upstream on this.
      
      These changes were made to GPDB back in 2006, as part of backporting
      two EXPLAIN-related patches from PostgreSQL 8.2. But I didn't see any
      explanation for this particular change in output in that commit
      message.
      
      It's nice to match upstream, to make merging easier. However, this won't
      make much difference to that: almost all EXPLAIN plans in regression
      tests are different from upstream anyway, because GPDB needs Motion nodes
      for most queries. But every little helps.
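
      For illustration (a made-up filter, not from the commit), the
      upstream format we now follow prints

          Filter: ((a > 1) AND (b < 10))

      where the old GPDB format printed

          Filter: a > 1 AND b < 10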
  9. 11 Aug 2018, 1 commit
    • Adding GiST support for GPORCA · ec3693e6
      Authored by Ashuka Xue
Prior to this commit, there was no support for GiST indexes in
      GPORCA.  For queries involving GiST indexes, ORCA was selecting Table
      Scan paths as the optimal plan.  These plans could take 300+ times
      longer than Planner, which generated an index scan plan using the
      GiST index.
      
      Example:
      ```
      CREATE TABLE gist_tbl (a int, p polygon);
      CREATE TABLE gist_tbl2 (b int, p polygon);
      CREATE INDEX poly_index ON gist_tbl USING gist(p);
      
      INSERT INTO gist_tbl SELECT i, polygon(box(point(i, i+2),point(i+4,
      i+6))) FROM generate_series(1,50000)i;
      INSERT INTO gist_tbl2 SELECT i, polygon(box(point(i+1, i+3),point(i+5,
      i+7))) FROM generate_series(1,50000)i;
      
      ANALYZE;
      ```
      With the query `SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE
      gist_tbl.p <@ gist_tbl2.p;`, we see a performance increase with the
      support of GiST.
      
      Before:
      ```
      EXPLAIN SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
                                                           QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=0.00..171401912.12 rows=1 width=8)
         ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..171401912.12 rows=1 width=8)
               ->  Aggregate  (cost=0.00..171401912.12 rows=1 width=8)
                     ->  Nested Loop  (cost=0.00..171401912.12 rows=335499869 width=1)
                           Join Filter: gist_tbl.p <@ gist_tbl2.p
                           ->  Table Scan on gist_tbl2  (cost=0.00..432.25 rows=16776 width=101)
                           ->  Materialize  (cost=0.00..530.81 rows=49997 width=101)
                                 ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..525.76 rows=49997 width=101)
                                       ->  Table Scan on gist_tbl  (cost=0.00..432.24 rows=16666 width=101)
       Optimizer status: PQO version 2.65.1
      (10 rows)
      
      Time: 170.172 ms
      SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
       count
      -------
       49999
      (1 row)
      
      Time: 546028.227 ms
      ```
      
      After:
      ```
      EXPLAIN SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
                                                        QUERY PLAN
      ---------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=0.00..21749053.24 rows=1 width=8)
         ->  Gather Motion 3:1  (slice2; segments: 3)  (cost=0.00..21749053.24 rows=1 width=8)
               ->  Aggregate  (cost=0.00..21749053.24 rows=1 width=8)
                     ->  Nested Loop  (cost=0.00..21749053.24 rows=335499869 width=1)
                           Join Filter: true
                           ->  Broadcast Motion 3:3  (slice1; segments: 3)  (cost=0.00..526.39 rows=50328 width=101)
                                 ->  Table Scan on gist_tbl2  (cost=0.00..432.25 rows=16776 width=101)
                           ->  Bitmap Table Scan on gist_tbl  (cost=0.00..21746725.48 rows=6667 width=1)
                                 Recheck Cond: gist_tbl.p <@ gist_tbl2.p
                                 ->  Bitmap Index Scan on poly_index  (cost=0.00..0.00 rows=0 width=0)
                                       Index Cond: gist_tbl.p <@ gist_tbl2.p
       Optimizer status: PQO version 2.65.1
      (12 rows)
      
      Time: 617.489 ms
      
      SELECT count(*) FROM gist_tbl, gist_tbl2 WHERE gist_tbl.p <@ gist_tbl2.p;
       count
      -------
       49999
      (1 row)
      
      Time: 7779.198 ms
      ```
      
      GiST support was implemented by sending over GiST index information to
      GPORCA in the metadata using a new index enum specifically for GiST.
Signed-off-by: Bhuvnesh Chaudhary <bchaudhary@pivotal.io>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  10. 03 Aug 2018, 1 commit
  11. 02 Aug 2018, 1 commit
    • Merge with PostgreSQL 9.2beta2. · 4750e1b6
      Authored by Richard Guo
      This is the final batch of commits from PostgreSQL 9.2 development,
      up to the point where the REL9_2_STABLE branch was created, and 9.3
      development started on the PostgreSQL master branch.
      
      Notable upstream changes:
      
* Index-only scan was included in the batch of upstream commits. It
        allows queries to retrieve data only from indexes, avoiding heap
        access (see the example after this list).
      
      * Group commit was added to work effectively under heavy load. Previously,
        batching of commits became ineffective as the write workload increased,
        because of internal lock contention.
      
      * A new fast-path lock mechanism was added to reduce the overhead of
        taking and releasing certain types of locks which are taken and released
        very frequently but rarely conflict.
      
      * The new "parameterized path" mechanism was added. It allows inner index
        scans to use values from relations that are more than one join level up
        from the scan. This can greatly improve performance in situations where
        semantic restrictions (such as outer joins) limit the allowed join orderings.
      
      * SP-GiST (Space-Partitioned GiST) index access method was added to support
        unbalanced partitioned search structures. For suitable problems, SP-GiST can
        be faster than GiST in both index build time and search time.
      
* Checkpoints are now performed by a dedicated background process.
        Formerly the background writer did both dirty-page writing and
        checkpointing. Separating this into two processes allows each goal
        to be accomplished more predictably.
      
* Custom plans are now supported for specific parameter values even
        when using prepared statements.
      
* The FDW API was improved so that FDWs can provide multiple access
        "paths" for their tables, allowing more flexibility in join
        planning.
      
* The security_barrier option was added for views, to prevent
        optimizations that might allow view-protected data to be exposed
        to users.
      
      * The range data type was added to store a lower and upper bound
        belonging to its base data type.
      
* CTAS (CREATE TABLE AS / SELECT INTO) is now treated as a utility
        statement. The SELECT query is planned during the execution of the
        utility. To conform to this change, GPDB executes the utility
        statement only on the QD and dispatches the plan of the SELECT
        query to the QEs.
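
      As a small example of the first item above, an index-only scan can
      now serve a query entirely from the index (object names assumed):

      ```
      CREATE INDEX t_a_idx ON t (a);
      VACUUM t;  -- keeps the visibility map current
      EXPLAIN SELECT a FROM t WHERE a < 100;  -- may show an Index Only Scan
      ```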
Co-authored-by: Adam Lee <ali@pivotal.io>
      Co-authored-by: Alexandra Wang <lewang@pivotal.io>
      Co-authored-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Co-authored-by: Asim R P <apraveen@pivotal.io>
      Co-authored-by: Daniel Gustafsson <dgustafsson@pivotal.io>
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Co-authored-by: Haozhou Wang <hawang@pivotal.io>
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Jesse Zhang <sbjesse@gmail.com>
      Co-authored-by: Jinbao Chen <jinchen@pivotal.io>
      Co-authored-by: Joao Pereira <jdealmeidapereira@pivotal.io>
      Co-authored-by: Melanie Plageman <mplageman@pivotal.io>
      Co-authored-by: Paul Guo <paulguo@gmail.com>
      Co-authored-by: Richard Guo <guofenglinux@gmail.com>
      Co-authored-by: Shujie Zhang <shzhang@pivotal.io>
      Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
      Co-authored-by: Zhenghua Lyu <zlv@pivotal.io>
  12. 19 Jun 2018, 1 commit
    • Utilize hyperloglog and merge utilities to derive root table statistics · 9c1b1ae3
      Authored by Omer Arap
This commit introduces an end-to-end, scalable solution for
      generating statistics of root partitions, by merging the statistics
      of the leaf partition tables.  The ability to merge leaf table
      statistics for the root table makes analyze very incremental and
      stable.
      
      **CHANGES IN LEAF TABLE STATS COLLECTION:**
      
Incremental analyze creates a sample for each partition, as in the
      previous version.  While analyzing the sample and generating
      statistics for the partition, it also creates a `hyperloglog_counter`
      data structure, adds the sample values to it, and records metadata
      such as the number of multiples and the sample size.  Once the
      entire sample is processed, analyze saves the `hyperloglog_counter`
      as a byte array in the `pg_statistic` catalog table.  We reserve a
      slot for the `hyperloglog_counter` in the table and mark it with a
      specific statistic kind, `STATISTIC_KIND_HLL`.  We keep the
      `hyperloglog_counter` in the catalog only for the leaf partitions.
      If the user chooses to run a FULL scan for HLL, we mark the kind as
      `STATISTIC_KIND_FULLHLL`.
      
      **MERGING LEAF STATISTICS**
      
Once all the leaf partitions are analyzed, we analyze the root
      partition.  First, we check that all the partitions have been
      analyzed properly and that all their statistics are available in the
      `pg_statistic` catalog table.  A partition with no tuples is
      considered analyzed even though it has no catalog entry.  If for
      some reason a single partition is not analyzed, we fall back to the
      original analyze algorithm, which acquires a sample of the root
      partition and calculates statistics based on that sample.
      
Merging the null fraction and average width from leaf partition
      statistics is trivial and poses no significant challenge, so we
      calculate them first.  The remaining statistics are:
      
      - Number of distinct values (NDV)
      
- Most common values (MCV) and their frequencies, termed most common
      frequencies (MCF)
      
      - Histograms that represent the distribution of the data values in the
      table
      
      **Merging NDV:**
      
Hyperloglog provides functionality to merge multiple
      `hyperloglog_counter`s into one and to calculate the number of
      distinct values from the aggregated `hyperloglog_counter`.  This
      aggregated counter alone is sufficient only if the user chooses to
      run a full scan for hyperloglog.  In the sample-based approach,
      deriving the number of distinct values without the hyperloglog
      algorithm is not possible.  Hyperloglog enables us to merge the
      `hyperloglog_counter`s from each partition and calculate the NDV on
      the merged counter with an acceptable error rate.  However, it does
      not give us the ultimate NDV of the root partition; it provides the
      NDV of the union of the samples from each partition.
      
The rest of the NDV interpolation is based on the formula used in
      Postgres, which depends on four metrics: the NDV in the sample, the
      number of multiple values in the sample, the sample size, and the
      total rows in the table.  Using these values the algorithm
      calculates the approximate NDV for the table.  While merging the
      statistics from the leaf partitions, hyperloglog lets us accurately
      derive the NDV of the sample, the sample size and the total rows;
      however, the number of multiples in the accumulated sample is
      unknown, since we do not have access to the accumulated sample at
      this point.
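
      For reference, the Postgres estimator in question (the Duj1
      estimator in analyze.c; notation mine) is

          D ≈ (n * d) / (n - f1 + f1 * n / N)

      where n is the sample size, N the total row count, d the NDV
      observed in the sample, and f1 = d - nMultiple the number of values
      seen exactly once in the sample.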
      
      _Number of Multiples_
      
Our approach to estimating the number of multiples in the (itself
      unavailable) aggregated sample for the root requires the NDV, the
      number of multiples, and the size of each leaf sample.  The NDV of
      each sample is trivial to calculate from the partition's
      `hyperloglog_counter`.  The number of multiples and the sample size
      of each partition are saved in the partition's `hyperloglog_counter`
      during leaf statistics gathering, to be used in the merge.
      
Estimating the number of multiples in the aggregate sample for the
      root partition is a two-step process.  First, we accurately estimate
      the number of values that reside in more than one partition's
      sample.  Then, we estimate the number of multiples that exist
      uniquely in a single partition.  Finally, we add these values to
      estimate the overall number of multiples in the aggregate sample of
      the root partition.
      
To count the number of values that exist in one single partition
      only, we utilize hyperloglog functionality: we can easily estimate
      how many values appear only in a specific partition _i_.  We call
      the NDV of the aggregate over all partitions `NDV_all`, and the NDV
      of the aggregate over all partitions except _i_ `NDV_minus_i`.  The
      difference between `NDV_all` and `NDV_minus_i` gives the number of
      values that appear in partition _i_ alone.  The remaining values all
      contribute to the overall number of multiples in the root's
      aggregated sample; we call their count `nMultiple_inter`, the number
      of values that appear in more than one partition.
      
However, that is not enough: even if a value resides in only one
      partition, that partition might contain multiple instances of it,
      and we need a way to account for these values as well.  Recall that
      we also track the number of multiples that exist uniquely in each
      partition sample.  We already know the number of multiples inside a
      partition sample; we only need to normalize this value by the
      proportion of the number of values unique to the partition sample to
      the number of distinct values of the partition sample.  The
      normalized value is partition sample _i_'s contribution to the
      overall calculation of the nMultiple.
      
Finally, `nMultiple_root` is the sum of `nMultiple_inter` and the
      `normalized_m_i` of each partition sample.
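
      In symbols, the steps above amount to the following (notation mine,
      summarizing the description rather than quoting the implementation):

          nUnique_i       = NDV_all - NDV_minus_i        -- values found only in sample i
          nMultiple_inter = NDV_all - sum_i(nUnique_i)   -- values in more than one sample
          normalized_m_i  = m_i * (nUnique_i / NDV_i)    -- partition i's own multiples, scaled
          nMultiple_root  = nMultiple_inter + sum_i(normalized_m_i)

      where m_i and NDV_i are the number of multiples and the NDV of
      partition i's sample.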
      
      **Merging MCVs:**
      
We utilize the merge functionality imported from Greenplum DB 4.3.
      The algorithm is simple: we convert each MCV's frequency into a
      count, and add the counts up for values that appear in more than one
      partition.  After every candidate's count has been calculated, we
      sort the candidate values and pick the top ones, up to the number
      defined by `default_statistics_target`.  4.3 blindly picked the
      values with the highest counts; we instead incorporate the same
      logic used in current Greenplum and Postgres and test whether a
      value is a real MCV.  Therefore, even after the merge, the logic
      fully aligns with Postgres.
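
      In other words (notation mine):

          count_i(v) = mcv_freq_i(v) * reltuples_i   -- frequency back to a count
          count(v)   = sum_i(count_i(v))             -- total across partitions

      The candidates are then sorted by count(v) and at most
      default_statistics_target values are kept, each still subject to the
      usual Postgres test of whether it is common enough to be an MCV.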
      
      **Merging Histograms:**
      
One of the main novel contributions of this commit is how we merge
      the histograms from the leaf partitions.  In 4.3 a priority queue is
      used to merge the leaf partition histograms.  However, that approach
      is naive and loses important statistical information: in Postgres,
      the histogram is calculated over the values that did not qualify as
      MCVs, and the 4.3 merge logic did not take this into consideration,
      so significant statistical information was lost when merging the MCV
      values.
      
We introduce a novel approach that feeds the leaf-partition MCVs that
      did not qualify as root MCVs into the histogram merge logic.  To
      fully reuse the previously implemented priority queue logic, we
      treat non-qualified MCVs as the histograms of so-called `dummy`
      partitions.  To be more precise, for a non-qualified MCV m1 we
      create a histogram [m1, m1] with a single bucket whose size is the
      count of that MCV.  When we merge the histograms of the leaf
      partitions together with these dummy partitions, the merged
      histogram does not lose any statistical information.
Signed-off-by: Jesse Zhang <sbjesse@gmail.com>
      Signed-off-by: Ekta Khanna <ekhanna@pivotal.io>
  13. 09 Nov 2017, 1 commit
    • Fix cases which are unpredictable (#3797) · 195aaf54
      Authored by Adam Lee
      * Several small fixes of the tests
      
1, ignore two generated test files.
      2, remove the string containing unpredictable segment numbers.
      3, drop tables in the external_table case, so it can be run multiple
      times in a row.
      
      * Fix cases which are unpredictable
      
      > commit 3bbedbe9
      > Author: Heikki Linnakangas <hlinnakangas@pivotal.io>
      > Date:   Thu Nov 2 10:04:58 2017 +0200
      >
      >     Wake up faster, if a segment returns an error.
      >     Previously, if a segment reported an error after starting up the
      >     interconnect, it would take up to 250 ms for the main thread in the QD
      >     process to wake up and poll the dispatcher connections, and to see that
      >     there was an error. Shorten that time, by waking up immediately if the
      >     QD->QE libpq socket becomes readable while we're waiting for data to
      >     arrive in a Motion node.
      >     This isn't a complete solution, because this will only wake up if one
      >     arbitrarily chosen connection becomes readable, and we still rely on
      >     polling for the others. But this greatly speeds up many common scenarios.
      >     In particular, the "qp_functions_in_select" test now runs in under 5 s
      >     on my laptop, when it took about 60 seconds before.
      
      > Before this commit, the master would only check every 250 ms if one of the
      > segments had reported an error. Now it wakes up and cancels the whole query as
      > soon as it receives an error from the first segment. That makes it more likely
      > that the other segments have not yet reached the same number of errors as what
      > is memorized in the expected output.
      
These two cases check:
      
      1, when selecting from a CTE fails because one of the CTE's external
      tables reached the error limit, how many errors happened in the
      CTE's other external table, which would not reach the limit.
      
      2, when selecting from an external table with two locations mapped
      to two segments each, whether the other segment also reaches the
      reject limit once one segment has reached it.
      
We could not predict these two results without special test files,
      even before that commit, actually.  This commit removes the CTE case
      and checks that at least one segment failed in case
      readable_query26.
  14. 18 Oct 2017, 1 commit
  15. 14 Aug 2017, 1 commit
    • Make ICW pass when resgroup is enabled. · e1eed831
      Authored by Ning Yu
      * resgroup: increase max slots for isolation tests.
      * ICW: ignore resgroup related warnings.
      * ICW: try to load resgroup variant of answers when resgroup enabled.
      * ICW: provide resgroup variant of answers.
      * ICW: check whether resqueue is enabled in UDF.
* ICW: substitute username in gpconfig output.
      * ICW: explicitly set max_connections.
      * isolation2: increase resgroup concurrency for max_concurrency tests.
  16. 09 Aug 2017, 2 commits
    • Replace gpfaultinject binary with gp_inject_fault extension in tests. · 5104ca08
      Authored by Heikki Linnakangas
This replaces all places in regression tests where the gpfaultinject
      binary was used with the SQL-callable function in the new
      gp_inject_fault extension. The SQL function is more forgiving about
      the dev environment, and doesn't need gpfaultinject to be in $PATH,
      for starters. Also, it's just good to harmonize and have just one
      way of injecting faults.
      
      More uses of gpfaultinject remain in the TINC tests, so we cannot get rid
      of it any time soon, but this is a step in that direction, anyway.
    • Enhance gp_inject_fault. · f52fbe57
      Authored by Heikki Linnakangas
      * Turn it into an extension, for easier installation.
      
* Add a simpler variant of the gp_inject_fault function, with fewer
        options. This is applicable to almost all the calls in the
        regression suite, so it's nice to make them less verbose.
      
      * Change the dbid argument from smallint to int4. For convenience, so that
        you don't need a cast when calling the function.
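
      A sketch of the simpler variant in use (the fault name and the exact
      three-argument signature are assumptions based on the description
      above):

      ```
      CREATE EXTENSION gp_inject_fault;
      -- inject a fault on the primary segment with content id 0:
      SELECT gp_inject_fault('checkpoint', 'skip', dbid)
        FROM gp_segment_configuration
       WHERE role = 'p' AND content = 0;
      ```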
  17. 04 Aug 2017, 1 commit
    • Set correct errcode for COPY .. ON SEGMENT ereport · f160a47d
      Authored by Daniel Gustafsson
Using errcode 0 causes ereport() to treat the error as an internal
      error and print the filename/line. Since this is a user-facing
      error it should have a proper errcode to avoid this. This also
      allows the gpdiff rule to be removed.
  18. 10 May 2017, 1 commit
  19. 03 May 2017, 1 commit
    • Support COPY ON SEGMENT command · 49b12f18
      Authored by Adam Lee
Support a COPY statement that exports the table directly from each
      segment to a local file, in parallel.
      
      This commit adds the keyword "on segment" to save the copied file on
      the segments instead of on the master.
      
      Two placeholders are used, "<SEG_DATA_DIR>" and "<SEGID>", which are
      replaced with the segment data directory and the segment id.
      
      E.g.
      
      ```
      COPY tbl TO '/tmp/<SEG_DATA_DIR>filename<SEGID>.txt' ON SEGMENT;
      ```
Signed-off-by: Yuan Zhao <yuzhao@pivotal.io>
      Signed-off-by: Haozhou Wang <hawang@pivotal.io>
      Signed-off-by: Adam Lee <ali@pivotal.io>
  20. 28 Feb 2017, 1 commit
  21. 27 Jan 2017, 1 commit
  22. 24 Jan 2017, 2 commits
    • Move patterns used only by a particular test out of global init_file. · ae5bccd1
      Authored by Heikki Linnakangas
      This reduces the risk of accidentally masking out messages in a test that's
      not supposed to produce such messages in the first place, and is just
      nicer in general, IMHO.
      
      While we're at it, add a brief comment to init_file to explain what it's
      for. Also, remove a few more matchsubs from atmsort.pm that seem to be
      unused.
    • Rewrite table redistribution with dropped types in ALTER TABLE · 62d66c06
      Authored by Heikki Linnakangas
When a table that has an attribute whose type has been dropped goes
      through the ALTER TABLE command queue, a "hidden" type is created,
      and immediately dropped, during the ALTER TABLE processing for table
      redistribution. This emits several NOTICEs, which can be confusing
      to the user as the name is autogenerated and the DROP TYPE may have
      happened at an earlier time. Below is an example of the output:
      
        create table <tablename> (a integer, b <typename>);
        drop type <typename>;
        ...
        alter table <tablename> set with(reorganize = true) distributed randomly;
        NOTICE:  return type pg_atsdb_<oid>_2_3 is only a shell
        NOTICE:  argument type pg_atsdb_<oid>_2_3 is only a shell
        NOTICE:  drop cascades to function pg_atsdb_<oid>_2_3_out(pg_atsdb_<oid>_2_3)
        NOTICE:  drop cascades to function pg_atsdb_<oid>_2_3_in(cstring)
      
      The reason for adding the hidden types is that the redistribution is
      performed with a CTAS doing SELECT *. To fix, change the way the CTAS is
      done, to not create hidden types.
      
      The temp table that we create still needs to include dropped columns at the
      same positions as the old one. Otherwise, when we swap the relation files,
      a tuple's representation on-disk won't match the catalogs. However, we
      cannot easily re-construct a dropped column with the same attlen, attalign,
      etc. as the original dropped column. Instead, create it as if it was an
      INT4 column, and just before swapping the relation files, update the
      attlen, attalign fields in pg_attribute entries of the dropped columns to
      match that of INT4. That way, the original table's catalog entries match
      that of the temp table.
      
      Alternatively, we could build the temp table without the dropped columns,
      and remove them from pg_attribute altogether. However, we'd need to update
      the attnum field of all following columns, and cascade that change to at
      least pg_attrdef and pg_depend. That seems more complicated.
      
      Also remove output from expected testfiles and perform minor cleanups.
      
      Original patch by Daniel Gustafsson, with the int4-placeholder mechanism
      added by me.
  23. 19 Dec 2016, 2 commits
    • Downgrade buffer capacity WARNING to LOG · d355e78e
      Authored by Daniel Gustafsson
While it should be rare (and the original ticket referred to
      indicates that it is), it's perfectly legal for a UDP buffer to
      fill up. Set the message level to LOG rather than WARNING.
    • Make NOTICE for table distribution consistent · bbed4116
      Authored by Daniel Gustafsson
      The different kinds of NOTICE messages regarding table distribution
      were using a mix of upper and lower case for 'DISTRIBUTED BY'. Make
      them consistent by using upper case for all messages and update the
      test files, and atmsort regexes, to match.
  24. 18 Nov 2016, 2 commits
    • Use proper error code for errors. · 0bf31cd6
      Authored by Heikki Linnakangas
Attach a suitable error code to many errors that were previously
      reported as "internal errors". GPDB's elog.c prints a source file
      name and line number for any internal error, which is a bit ugly for
      errors that are not in fact unexpected internal errors but
      user-facing errors that happen as a result of e.g. an invalid query.
      To make sure we don't accumulate more of these, adjust the regression tests
      to not ignore the source file and line number in error messages. There are
      a few exceptions, which are listed explicitly.
    • Remove unnecessary ignore-directive for COptTasks.cpp. · 459db892
      Authored by Heikki Linnakangas
      With commit 61972775, we use a proper SQLSTATE for the errors that
      needed this before.
  25. 19 Sep 2016, 1 commit
    • Make dispatch testcases stable and independent of compile configuration · 1c5c12a6
      Authored by Pengzhou Tang
When the process_startup_packets fault is triggered, gp_debug_linger
      is not yet set to 0, which causes an annoying "HINT: process xxxx"
      message to show up in the output file and makes the tests unstable.
      This commit changes the fault injection location to
      send_qe_details_init_backend, where gp_debug_linger has already been
      set to 0, so no hint message is generated in the output file.
  26. 13 Sep 2016, 1 commit
  27. 06 Sep 2016, 1 commit
  28. 05 Sep 2016, 1 commit
  29. 20 Aug 2016, 1 commit
  30. 17 Aug 2016, 1 commit
  31. 16 Jul 2016, 1 commit
    • Clean up gpdiff's init_file and built-in ignore- and subs- patterns · c425899d
      Authored by Heikki Linnakangas
      * Anchor all the ERROR, WARNING etc. messages to beginning of line, with
        "/^..."
      
* Remove obsolete substitutions, for error messages that don't
        appear anywhere in the code anymore.
      
      * Remove redundant replacements of source line numbers in error messages,
        like "(xact.c:%d)". There is a special rule that replaces all of those
        with (SOMEFILE:SOMEFUNC).
      
      * Replace case-insensitive rules with case-sensitive ones.
      
* Replace sloppy use of "\s+" with the actual amount of whitespace
        in the error messages.
      
      * Remove unnecessary "s/.../" lines from the matchignore block in init_file.
        You don't need those with "matchignore", only with "matchsubs".
      
Aside from being tidier, these changes make the diffing
      significantly faster. There are fewer regular expressions to parse,
      and the remaining ones are faster to evaluate.
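
      For example, the anchoring change turns a pattern like the first
      line below into the second (patterns illustrative):

          m/ERROR:  could not connect/     -- matches anywhere in a line
          m/^ERROR:  could not connect/    -- matches only at line start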
  32. 22 Jun 2016, 1 commit
    • Avoid throwing an error in bfz_close() while aborting a transaction · 5f5ef9e0
      Authored by Daniel Gustafsson
When bfz_close() is called in the codepath of a transaction abort,
      we must avoid throwing even more errors unless the situation calls
      for it. For bfz_close() it's fine to lower the ereport level to
      WARNING in this case. Longer term we should move this, and other,
      codepaths away from calling unlink() directly and instead use the
      provided API, but this closes a current issue in ICG, so it is
      better to fix it immediately and refactor all call sites once we
      have a clean ICG.
  33. 12 Mar 2016, 1 commit
    • Add more installcheck test coverage for AO/CO tables. · 7d0325e4
      Authored by Jimmy Yih
Most of these test additions are inspired by Pivotal's internal
      testing and needed to be added to the open source installcheck to
      give the community more test coverage of AO/CO tables.  This commit
      mostly adds extra coverage for indexes and partition tables.
  34. 11 Mar 2016, 1 commit
  35. 08 Mar 2016, 1 commit
  36. 05 Dec 2015, 1 commit
    • Fix the case where VACUUM FULL on an appendonly table would cause its · 65400193
      Authored by Abhijit Subramanya
      auxiliary tables to not get shrunk and generate a notice to the user.
      
The AppendOnlyCompaction_IsRelationEmpty() function incorrectly
      assumed that the column number of the tupcount column was the same
      in the pg_aoseg and pg_aocsseg tables. This caused it to incorrectly
      return true even when the CO relation was not empty. This method is
      used in vacuum to determine whether the auxiliary relations need to
      be vacuumed. Due to the bug, vacuum would update the pg_aocsseg
      relation and vacuum it within the same transaction, and hence
      generate the NOTICE that it can't shrink the relation because a
      transaction is already in progress, and it would not shrink the
      relation.
      
Also make sure that we vacuum the auxiliary relations in only two
      cases:
      1. Vacuum cleanup phase
      2. Relation is empty and we are in the prepare phase
      Otherwise we will end up with the same issue as above if some of the
      segments have zero rows.