1. 24 Jun 2020, 5 commits
    • Check whether the directory exists when deleting the tablespace (#10305) · b1b99c43
      Jinbao Chen committed
      If the tablespace directory does not exist, we would raise an error
      at transaction commit, and an error during commit causes a panic.
      Check that the tablespace directory exists up front so the panic
      is avoided.
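      A minimal sketch of the idea, assuming a `location` path already
      resolved for the tablespace (the helper name and error wording are
      illustrative, not the actual patch):
      
      ```c
      #include <sys/stat.h>
      
      /* Sketch: validate the directory while a plain ERROR is still safe,
       * rather than discovering the problem at commit time, where an
       * error is promoted to PANIC. */
      static void
      check_tablespace_directory(const char *location)
      {
          struct stat st;
      
          if (stat(location, &st) < 0 || !S_ISDIR(st.st_mode))
              ereport(ERROR,
                      (errcode(ERRCODE_UNDEFINED_OBJECT),
                       errmsg("tablespace directory \"%s\" does not exist",
                              location)));
      }
      ```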
    • Only apply transformGroupedWindows() with ORCA. (#10306) · e52dd032
      Heikki Linnakangas committed
      * Only apply transformGroupedWindows() with ORCA.
      
      The Postgres planner doesn't need it. Move the code to do it, so that it's
      only used before passing a tree to ORCA. This doesn't change anything with
      ORCA, but with the Postgres planner, it has some benefits:
      
      * Some cases that gave incorrect results before this patch now run
        correctly (e.g. the `regress/olap_window_seq` test)
      * Fixes github issue #10143.
      
      * Make transformGroupedWindows walk the entire tree
      
      The transformGroupedWindows function now recursively transforms any
      Query node in the tree that has both window functions and groupby or
      aggregates.
      
      Also fixed a pre-existing bug where we put a subquery in the target
      list of such a Query node into the upper query, Q'. This meant that
      any outer references to the scope of Q' no longer had the correct
      varattno. The fix is to place the subquery into the target list of
      the lower query, Q'' instead, which has the same range table as the
      original query Q. Therefore, the varattnos to outer references to the
      scope of Q (now Q'') don't need to be updated. Note that varlevelsup to
      scopes above Q still need to be adjusted, since we inserted a new
      scope Q'. (See comments in code for explanations of Q, Q', Q'').
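      A hedged sketch of the recursive walk described above, using the
      standard node-walker idiom (the `transform_grouped_windows` entry
      point named here is an assumption, not the actual code):
      
      ```c
      /* Sketch: recursively transform every Query node that has both
       * window functions and grouping/aggregation, including Queries
       * buried in sublinks. */
      static bool
      grouped_windows_walker(Node *node, void *context)
      {
          if (node == NULL)
              return false;
          if (IsA(node, Query))
          {
              Query *qry = (Query *) node;
      
              if (qry->hasWindowFuncs && (qry->groupClause != NIL || qry->hasAggs))
                  transform_grouped_windows(qry);   /* hypothetical entry point */
              return query_tree_walker(qry, grouped_windows_walker, context, 0);
          }
          return expression_tree_walker(node, grouped_windows_walker, context);
      }
      ```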
      Co-authored-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Co-authored-by: Hans Zeller <hzeller@vmware.com>
      Co-authored-by: Abhijit Subramanya <asubramanya@pivotal.io>
    • Improve handling of target lists of window queries (#10309) · 33c4582e
      Hans Zeller committed
      This fixes two bugs related to handling queries with window functions and refactors the related code.
      
      ORCA can't handle expressions on window functions like rank() over() - 1 in a target list. To avoid these, we split Query blocks that contain them into two. The new lower Query computes the window functions, the new upper Query computes the expressions.
      
      We use three mutators and walkers to help with this process:
      
      1. Increase the varlevelsup of outer references in the new lower Query, since we have now inserted a new scope above it.
      2. Split expressions on window functions into the window functions themselves (for the lower scope) and expressions with a Var substituted for the WindowFunc (for the upper scope). Also adjust the varattno for Vars that now appear in the upper scope.
      3. Increase the ctelevelsup for any RangeTblEntries in the lower scope.
      
      The bugs we saw were related to these mutators: the second one didn't recurse correctly into the required types of subqueries, and the third one didn't always increment the query level correctly. The refactor hopefully simplifies this code somewhat. For details, see the individual commit messages. A sketch of the second mutator follows below.
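      A hedged sketch of the second mutator (`SplitContext` and
      `add_to_lower_targetlist` are hypothetical helpers, and the RTE
      index 1 for the lower query is an assumption):
      
      ```c
      /* Sketch: move each WindowFunc into the lower query's target list
       * and replace it in the upper scope with a Var referring to it. */
      static Node *
      window_split_mutator(Node *node, SplitContext *ctx)
      {
          if (node == NULL)
              return NULL;
          if (IsA(node, WindowFunc))
          {
              WindowFunc *wfunc = (WindowFunc *) node;
              /* compute the window function in the lower query Q'' ... */
              AttrNumber  resno = add_to_lower_targetlist(ctx->lower_query,
                                                          (Expr *) wfunc);
      
              /* ... and reference it from the upper query Q' */
              return (Node *) makeVar(1 /* RTE of lower query */, resno,
                                      wfunc->wintype, -1, wfunc->wincollid, 0);
          }
          return expression_tree_mutator(node, window_split_mutator, (void *) ctx);
      }
      ```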
      
      Note: In the 6X_STABLE branch, we currently have a temporary check that triggers a fallback to planner when we see window queries with outer refs in them. When this code gets merged into 6X, we will remove the temporary check. See #10265.
      
      * Add test cases
      * Refactor: Renaming misc variables and methods
      * Refactor RunIncrLevelsUpMutator
      
      Made multiple changes to how we use the mutator:
      
      1. Start the call with a method from gpdbwrappers.h, for two reasons:
         a) execute the needed wrapping code for GPDB calls
         b) avoid calling the walker function on the top node, since we don't
            want to increment the query level when we call the method on a
            query node
      
      2. Now that we don't have to worry anymore about finding a top-level
         query node, simplify the logic to recurse into subqueries by simply
         doing that when we encounter a Query node further down. Remove the
         code dealing with sublinks, RTEs, CTEs.
      
      3. From inside the walker functions, call GPDB methods without going
         through the wrapping layer again.
      
      4. Let the mutator code make a copy of the target entry instead of
         creating one before calling the mutator.
      
      * Refactor RunWindowProjListMutator, fix bug
      
      Same as the previous commit, this time RunWindowProjListMutator gets
      refactored, with the same list of changes to how we use the mutator.
      This change should also fix one of the bugs we have seen: this
      mutator did not recurse into derived tables that were inside scalar
      subqueries in the select list.
      
      
      * Refactor RunFixCTELevelsUpMutator, fix bug
      
      Converted this mutator into a walker, since only walkers visit RTEs, which
      makes things a lot easier.
      
      Fixed a bug where we incremented the CTE levels for scalar subqueries
      that went into the upper-level query.
      
      Otherwise, same types of changes as in previous two commits.
      
      * Refactor and reorder code
      
      Slightly modified the flow in methods CQueryMutators::ConvertToDerivedTable
      and CQueryMutators::NormalizeWindowProjList
      
      * Remove obsolete methods
      * Update expected files
    • Drop -Wno-variadic-macros, which is inapplicable. · 3a84a379
      Jesse Zhang committed
      ORCA actively uses variadic macros (__VA_ARGS__), and we used to
      suppress a warning about them out of pedantry (they are a widely
      available language extension, but not part of the C++98 standard).
      Now that variadic macros are part of standard C++11, and we mandate
      C++14, drop the warning suppression.
    • Avoid non-transactional modification of relfrozenxid during CLUSTER · 7f7fa498
      Andrey Borodin committed
      CLUSTER calls vac_update_relstats(), which modifies pg_class in
      place (non-transactionally). If the CLUSTER command aborts, these
      changes cannot be rolled back, leaving behind an inaccurate
      relfrozenxid and other fields.
      
      Non-transactional updates to reltuples, relpages and relallvisible
      are fine, but not to relfrozenxid and relminmxid. Hence, this
      commit avoids updating relfrozenxid and relminmxid in place for
      CLUSTER.
      
      Fixes https://github.com/greenplum-db/gpdb/issues/10150.
      Reviewed-by: Ashwin Agrawal <aashwin@vmware.com>
  2. 23 Jun 2020, 3 commits
    • Fix pgbench --tablespace option. · f6ec65f7
      Heikki Linnakangas committed
      The CREATE TABLE commands constructed in pgbench had the DISTRIBUTED BY
      and TABLESPACE options the wrong way 'round, so that you got a syntax
      error. For example:
      
      $ pgbench postgres -i --tablespace "pg_default"
      creating tables...
      ERROR:  syntax error at or near "tablespace"
      LINE 1: ...22)) with (appendonly=false) DISTRIBUTED BY (bid) tablespace...
                                                                   ^
      Put the clauses in the right order.
      
      We have no test coverage for this at the moment, but PostgreSQL v11 adds
      a test for this (commit ed8a7c6f). I noticed this while looking at test
      failures with the PostgreSQL v12 merge.
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Fix tupdesc dangling pointer segfault in HashAgg · 41ce55bf
      Denis Smirnov committed
      This problem manifests itself with a HashAgg on top of a
      DynamicIndexScan node and can cause a segmentation fault.
      
      1. A HashAgg node initializes a tuple descriptor for its hash
      slot using a reference from input tuples (coming from
      DynamicIndexScan through a Sequence node).
      2. At the end of every partition index scan in DynamicIndexScan
      we unlink and free unused memory chunks and reset the partition's
      memory context. This destroys all objects in the context,
      including the partition index tuple descriptor used by the
      HashAgg node.
      
      As a result, HashAgg is left with a dangling pointer when
      DynamicIndexScan switches to a new index partition, which can
      cause a segfault.
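      A hedged sketch of the kind of fix this implies (the slot field
      name is illustrative, not the actual patch): give the hash slot its
      own copy of the descriptor instead of borrowing the input's.
      
      ```c
      /* Sketch: make the hash slot own its tuple descriptor, so a reset
       * of the partition's memory context cannot leave it dangling. */
      TupleDesc inputDesc = ExecGetResultType(outerPlanState(aggstate));
      
      ExecSetSlotDescriptor(aggstate->hashslot,
                            CreateTupleDescCopy(inputDesc));
      ```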
    • Make cdbpullup_missingVarWalker also consider PlaceHolderVar. · 2cb36320
      Zhenghua Lyu committed
      When the planner adds a Redistribute Motion above a subplan, it
      invokes `cdbpullup_findEclassInTargetList` to make sure the
      distribution key can be computed from the subplan's target list.
      When the distribution key is an expression built on PlaceHolderVar
      elements of the target list, the function
      `cdbpullup_missingVarWalker` did not handle it correctly.
      
      For example, when distkey is:
      
      ```sql
      CoalesceExpr [coalescetype=23 coalescecollid=0 location=586]
              [args]
                      PlaceHolderVar [phrels=0x00000040 phid=1 phlevelsup=0]
                              [phexpr]
                                      CoalesceExpr [coalescetype=23 coalescecollid=0 location=49]
                                              [args] Var [varno=6 varattno=1 vartype=23 varnoold=6 varoattno=1]
      ```
      
      and targetlist is:
      
      ```
      TargetEntry [resno=1]
              Var [varno=2 varattno=1 vartype=23 varnoold=2 varoattno=1]
      TargetEntry [resno=2]
              Var [varno=2 varattno=2 vartype=23 varnoold=2 varoattno=2]
      TargetEntry [resno=3]
              PlaceHolderVar [phrels=0x00000040 phid=1 phlevelsup=0]
                      [phexpr]
                              CoalesceExpr [coalescetype=23 coalescecollid=0 location=49]
                                      [args] Var [varno=6 varattno=1 vartype=23 varnoold=6 varoattno=1]
      TargetEntry [resno=4]
              PlaceHolderVar [phrels=0x00000040 phid=2 phlevelsup=0]
                      [phexpr]
                              CoalesceExpr [coalescetype=23 coalescecollid=0 location=78]
                                      [args] Var [varno=6 varattno=2 vartype=23 varnoold=6 varoattno=2]
      ```
      
      Previously the walker only considered Var nodes, which made
      `cdbpullup_missingVarWalker` fail on such target lists.
      
      See Github issue: https://github.com/greenplum-db/gpdb/issues/10315 for
      details.
      
      This commit fixes the issue by considering PlaceHolderVar in function
      `cdbpullup_missingVarWalker`.
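      A hedged sketch of the adjusted walker (a simplification of the
      idea, not the exact patch):
      
      ```c
      /* Sketch: treat a PlaceHolderVar that appears verbatim in the
       * target list as "found", instead of recognizing only Vars. */
      static bool
      cdbpullup_missingVarWalker(Node *node, void *targetlist)
      {
          if (node == NULL)
              return false;
          if (IsA(node, Var) || IsA(node, PlaceHolderVar))
          {
              if (tlist_member((Expr *) node, (List *) targetlist))
                  return false;   /* computable from the target list */
              if (IsA(node, Var))
                  return true;    /* a Var missing from the target list */
              /* for a PlaceHolderVar, fall through and check its contents */
          }
          return expression_tree_walker(node, cdbpullup_missingVarWalker,
                                        targetlist);
      }
      ```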
  3. 22 Jun 2020, 2 commits
    • Fix parameterized paths · 9cc1da61
      Richard Guo committed
      This patch fixes two issues related to parameterized path logic on
      master.
      
      1. When generating a unique row ID on the outer/inner side for
      JOIN_DEDUP_SEMI/JOIN_DEDUP_SEMI_REVERSE joins, we need to pass the
      param info of the outer/inner path to the projection path. Otherwise
      we would have problems when deciding whether a join clause is
      movable to this join rel.
      
      2. We should not pick a parameterized path when its required outer
      rels are beyond a Motion, since we cannot pass a param through a
      Motion.
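      A hedged sketch of the second point (the `rels_beyond_motion` set
      is a hypothetical stand-in for however the caller knows which rels
      sit on the far side of a Motion):
      
      ```c
      /* Sketch: skip parameterized paths whose required outer rels could
       * only be supplied from across a Motion. */
      if (path->param_info != NULL &&
          bms_overlap(path->param_info->ppi_req_outer, rels_beyond_motion))
          continue;           /* reject this path */
      ```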
      
      Fixes issue #10012
      Reviewed-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Jinbao Chen <jinchen@pivotal.io>
    • Fix flaky appendonly test. · f860ff0c
      (Jerome)Junfeng Yang committed
      This fixes the error:
      ```
      ---
      /tmp/build/e18b2f02/gpdb_src/src/test/regress/expected/appendonly.out
      2020-06-16 08:30:46.484398384 +0000
      +++ /tmp/build/e18b2f02/gpdb_src/src/test/regress/results/appendonly.out
      2020-06-16 08:30:46.556404454 +0000
      @@ -709,8 +709,8 @@
         SELECT oid FROM pg_class WHERE relname='tenk_ao2'));
             case    | objmod | last_sequence | gp_segment_id
              -----------+--------+---------------+---------------
            + NormalXid |      0 | 1-2900        |             1
              NormalXid |      0 | >= 3300       |             0
            - NormalXid |      0 | >= 3300       |             1
              NormalXid |      0 | >= 3300       |             2
              NormalXid |      1 | zero          |             0
              NormalXid |      1 | zero          |             1
      ```
      
      The flakiness occurs because under ORCA a `CREATE TABLE` statement
      without `DISTRIBUTED BY` treats the table as randomly distributed,
      while the planner treats it as distributed by the table's first
      column.
      
      ORCA:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL
      policy entry.
      ```
      
      Planner:
      ```
      CREATE TABLE tenk_ao2 with(appendonly=true, compresslevel=0,
      blocksize=262144) AS SELECT * FROM tenk_heap;
      NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s)
      named 'unique1' as the Greenplum Database data distribution key for this
      table.
      ```
      
      So the data distribution for table tenk_ao2 is not as expected.
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
  4. 19 Jun 2020, 3 commits
    • Fix cursor snapshot dump xid issue · 32a3a4db
      Weinan WANG committed
      For the cursor snapshot dump we need to record both the distributed
      and the local xid. So far we only recorded the distributed xid in
      the dump, and the dump-read function incorrectly assigned the
      distributed xid to the local xid.
      
      Fix it.
    • Re-enable test segwalrep/dtx_recovery_wait_lsn (#10320) · fe26d931
      Paul Guo committed
      Enable and refactor the test isolation2:segwalrep/dtx_recovery_wait_lsn.
      
      The test was disabled in 791f3b01 because of concern that changes to
      the line numbers in sql_isolation_testcase.py would break the answer
      file. Refactor the test to ease that concern, then enable it again.
      Co-authored-by: Gang Xiong <gxiong@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Avoid generating core files during testing. (#10304) · 4a61357c
      Paul Guo committed
      Some negative tests need to panic and thus end up generating core
      files if the system is configured for core dumps. Long ago we
      optimized away core file generation in some cases. We have now found
      other scenarios that can be optimized as well.
      
      1. Avoid core file generation with setrlimit() in the FATAL
      fault-injection code. Sometimes FATAL is upgraded to PANIC (e.g. in
      a critical section, or on failure while doing QD prepare-related
      work), so we can avoid generating a core file for this scenario too.
      Note that even if the FATAL is not upgraded, it is mostly fine to
      suppress the core file since the process will quit soon. With this
      change we avoid two core files in test isolation2:crash_recovery_dtm.
      (See the sketch after this list.)
      
      2. We previously sanity-checked dbid/segidx in QE:HandleFtsMessage()
      and panicked on inconsistency when cassert is enabled. But we really
      do not need to panic: the root cause of such a failure is quite
      straightforward, the call stack is quite simple (PostgresMain() ->
      HandleFtsMessage()), and that part of the code does not involve
      shared memory, so there is no shared-memory mess to worry about
      (otherwise we might want a core file to check). Downgrade the log
      level to FATAL. This avoids 6 core files in test
      isolation2:segwalrep/recoverseg_from_file.
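      A hedged sketch of the setrlimit() idea from point 1 (the helper
      name is illustrative, not the actual function in the patch):
      
      ```c
      #include <sys/resource.h>
      
      /* Sketch: turn off core dumps for this process before raising an
       * injected FATAL that may be promoted to PANIC. */
      static void
      disable_core_dump(void)
      {
          struct rlimit rl = {0, 0};      /* zero soft and hard limits */
      
          if (setrlimit(RLIMIT_CORE, &rl) != 0)
              elog(WARNING, "could not disable core dump: %m");
      }
      ```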
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
  5. 18 Jun 2020, 4 commits
    • Fix CASE WHEN IS NOT DISTINCT FROM clause incorrect dump. (#10298) · 3b2aed6e
      (Jerome)Junfeng Yang committed
      Dumping a 'CASE WHEN (arg1) IS NOT DISTINCT FROM (arg2)' clause
      loses arg1. For example:
      ```
      CREATE OR REPLACE VIEW xxxtest AS
      SELECT
          CASE
          WHEN 'I will disappear' IS NOT DISTINCT FROM ''::text
          THEN 'A'::text
          ELSE 'B'::text
          END AS t;
      ```
      The dump loses 'I will disappear':
      
      ```
      SELECT
          CASE
          WHEN IS NOT DISTINCT FROM ''::text
          THEN 'A'::text
          ELSE 'B'::text
          END AS t;
      ```
    • Fix a flaky test for gdd/dist-deadlock-upsert (#10302) · a3f34ae7
      Hao Wu committed
      * Fix a flaky test for gdd/dist-deadlock-upsert
      
      When the GDD probe runs is nondeterministic, but it is important for
      the test gdd/dist-deadlock-upsert. If the GDD probe runs immediately
      after the two inter-deadlocked transactions start, one of the
      transactions is killed. The isolation2 framework considers a
      transaction blocked only if it has not finished within 0.5 seconds,
      so if the killed transaction aborts too early, the test framework
      sees no deadlock.
      Analyzed-by: Gang Xiong <gxiong@pivotal.io>
      
      * rm sleep
    • resgroup: fix the cpu value of the per host status view · e0d78729
      Ning Yu committed
      Resource groups do not distinguish per-segment CPU usage: the CPU
      usage reported by a segment is actually the total CPU usage of all
      the segments on the host. This is by design, not a bug. However, the
      gp_toolkit.gp_resgroup_status_per_host view reports the CPU usage as
      the sum over all the segments on the same host, so the reported
      per-host CPU usage is actually N times the real usage, where N is
      the number of segments on that host.
      Fixed by reporting the avg() instead of the sum().
      
      Tests are not provided, as resgroup/resgroup_views has never
      verified CPU usage because CPU usage is unstable on the pipelines.
      However, I have verified the fix manually.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
    • Enable brin in ao/aocs table (#9537) · 46d9e26a
      Jinbao Chen committed
      We merged BRIN from PostgreSQL 9.5, but Greenplum did not enable
      BRIN on AO/AOCS tables.
      
      The reason BRIN cannot be used directly on an AO/AOCS table is that
      its storage layout differs from a heap table's. A heap table has a
      single physical file whose block numbers are contiguous, and the
      revmap in BRIN is an array spanning multiple consecutive blocks;
      that layout does not make sense for an AO/AOCS table.
      
      An AO/AOCS table has 128 segment files, and the block numbers in
      these segments are spread over the entire value range. If we used a
      flat array to record information about every block, the array would
      be far too large.
      
      So we introduce an upper-level structure to solve this problem: an
      array that records the block numbers of the revmap blocks. The
      revmap blocks are not contiguous; when we need a new revmap block,
      we just extend the relation by one block and record its block number
      in the upper-level array.
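      A hedged sketch of the two-level lookup this describes (function
      and variable names are illustrative, not the actual code):
      
      ```c
      /* Sketch: find the physical block that holds the revmap entry for
       * a given table block, going through the upper-level array. */
      static BlockNumber
      ao_revmap_block(BlockNumber tblBlk, uint32 entries_per_revmap_page,
                      const BlockNumber *upper_level)
      {
          /* which logical revmap page covers this block? */
          BlockNumber idx = tblBlk / entries_per_revmap_page;
      
          /* the upper level records where that revmap page physically lives */
          return upper_level[idx];
      }
      ```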
      Reviewed-by: Asim R P <pasim@vmware.com>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
      Reviewed-by: xiong-gang <gxiong@pivotal.io>
      Reviewed-by: Adam Lee <adam8157@gmail.com>
  6. 17 Jun 2020, 6 commits
    • Disallow changing the distribution policy to REPLICATED for partition tables (#10313) · 78cccb81
      Hao Wu committed
      This patch fixes https://github.com/greenplum-db/gpdb/issues/10224.
      A replicated table is not allowed to be a partition table, so an
      existing partition table must not have its distribution policy
      altered to REPLICATED.
      Reported-by: Heikki Linnakangas <hlinnakangas@pivotal.io>
      Reviewed-by: Zhenghua Lyu <zlv@pivotal.io>
    • GetLatestSnapshot on QEs always returns without a distributed snapshot. · d8f4a45f
      Zhenghua Lyu committed
      Greenplum tests the visibility of heap tuples first using the
      distributed snapshot. Distributed snapshots are generated on the QD
      and then dispatched to the QEs. Some utility statements need to work
      under the latest snapshot when executing, so they invoke the
      function `GetLatestSnapshot` on the QEs. But remember: we cannot get
      the latest distributed snapshot there.
      
      The subtle cases are ALTER TABLE and ALTER DOMAIN statements, which
      on the QD take a snapshot in PortalRun and then try to acquire locks
      on the target table in ProcessUtilitySlow. Here is the key point:
        1. acquiring the lock might block behind other transactions
        2. the statement is later woken up to continue
        3. by the time it continues, the world has changed, because the
           transactions that blocked it have finished
      
      Previously, the QD did not take a new snapshot before dispatching a
      utility statement to the QEs, so the distributed snapshot did not
      reflect that "world change". This can lead to bugs. For example, if
      the first transaction rewrites the whole heap, and the second
      (ALTER TABLE or ALTER DOMAIN) statement continues with a distributed
      snapshot under which txn1 has not yet committed, it will see no
      tuples in the new heap!
      
      This commit fixes the issue by taking a local snapshot when
      `GetLatestSnapshot` is invoked on a QE.
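      A hedged sketch of the shape of the fix, as a fragment at the top
      of GetLatestSnapshot() (the local-snapshot helper named here is
      hypothetical, not the actual GPDB call):
      
      ```c
      /* Sketch: on a QE the distributed snapshot was fixed at dispatch
       * time and cannot be refreshed, so return a purely local snapshot. */
      if (Gp_role == GP_ROLE_EXECUTE)
          return GetLocalLatestSnapshot();   /* hypothetical local-only helper */
      ```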
      
      See Github issue: https://github.com/greenplum-db/gpdb/issues/10216
      Co-authored-by: Hubert Zhang <hzhang@pivotal.io>
    • Remove dtx_recovery_wait_lsn test · 791f3b01
      Tyler Ramer committed
      The test addressed in this commit was added in commit f3df8b18. It
      fails for an entirely unrelated reason: a modification of
      sql_isolation_testcase.py changed the line numbers it expects.
      
      I find this test very fragile for that reason, and because we are
      relying on an execution failure in isolation2 Python code to test
      the database code. This means any refactoring of isolation2 will
      cause this test to fail, which should not be the case.
      
      I looked into adding an ignore for the exact lines, but isolation2
      wants a matching ignore in the input SQL file, which makes the test
      useless, because we are looking for an exact exception from
      isolation2 for a valid SQL input. Isolation2 doesn't give us the
      framework to ignore just some messages on the output side. Using an
      isolation2 init modification would still just ignore the actual
      problem, only in a different file.
      
      This fix should be considered temporary work to get the pipeline
      green while a better solution is determined.
      Authored-by: Tyler Ramer <tramer@pivotal.io>
    • Update isolation2 expected output considering changes in pg · 1131c5a9
      Tyler Ramer committed
      The updated PyGreSQL pg connection makes the output of isolation2
      SQL tests more similar to psql. We therefore revert some of the
      changes made in commit 20b3aa3a to be more in line with the usual
      psql output. Notably, trailing zeroes on floats are trimmed.
      Co-authored-by: Tyler Ramer <tramer@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
    • Refactor dbconn · 330db230
      Tyler Ramer committed
      One reason pygresql was previously modified was that it did not
      handle closing a connection very gracefully. In the process of
      updating pygresql, we wrapped the connection it provides with a
      ClosingConnection function, which gracefully closes the connection
      when the "with dbconn.connect as conn" syntax is used.
      
      This did, however, expose issues where a cursor might have been
      created as the result of a dbconn.execSQL() call, which seems to
      hold the connection open if not specifically closed.
      
      It is therefore necessary to remove the ability to get a cursor from
      dbconn.execSQL(). To highlight this difference, and to make future
      use of this library easy, I've cleaned up and clarified the dbconn
      execution code, to include the following features.
      
      - dbconn.execSQL() closes the cursor as part of the function; it
        returns no rows
      - function dbconn.query() is added, which behaves like
        dbconn.execSQL() except that it returns a cursor
      - function dbconn.execQueryforSingleton() is renamed
        dbconn.querySingleton()
      - function dbconn.execQueryforSingletonRow() is renamed
        dbconn.queryRow()
      Authored-by: Tyler Ramer <tramer@pivotal.io>
    • Update PyGreSQL from 4.0.0 to 5.1.2 · f5758021
      Tyler Ramer committed
      This commit updates PyGreSQL from 4.0.0 to 5.1.2, which requires
      numerous changes to take advantage of the major result-syntax change
      that PyGreSQL 5 implemented. Of note, cursor and query objects
      automatically cast returned values to appropriate Python types: a
      list of ints, for example, instead of a string like "{1,2}". This is
      the bulk of the changes.
      
      Updating to PyGreSQL 5.1.2 provides numerous benefits, including the
      following:
      
      - CVE-2018-1058 was addressed in PyGreSQL 5.1.1
      
      - We can save notices in the pgdb module, rather than relying on
      importing the pg module, thanks to the new "set_notices()"
      
      - PyGreSQL 5 supports Python 3
      
      - Thanks to a change in the cursor, using a "with" syntax guarantees
        a "commit" on the close of the with block.
      
      This commit is a starting point for additional changes, including
      refactoring the dbconn module.
      
      Additionally, since isolation2 uses pygresql, some pl/python scripts
      were updated, and isolation2 SQL output is further decoupled from
      pygresql. The output of a psql command should be similar enough to
      isolation2's pg output that minimal or no modification is needed to
      ensure gpdiff can recognize the output.
      Co-authored-by: Tyler Ramer <tramer@pivotal.io>
      Co-authored-by: Jamie McAtamney <jmcatamney@pivotal.io>
  7. 16 Jun 2020, 5 commits
    • Properly mark null return from combine functions · 736898ad
      Jesse Zhang committed
      We had a bug in a few of the combine functions: if the combine
      function returned a NULL, it didn't set fcinfo->isnull = true. This
      led to a segfault inside the serial function when we spilled in the
      final hashagg of a two-stage agg. So, properly mark NULL outputs
      from the combine functions.
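      A hedged sketch of the pattern (the function shown is an
      illustrative combine function, not necessarily one of the ones
      patched):
      
      ```c
      /* Sketch: return NULL via PG_RETURN_NULL(), which sets
       * fcinfo->isnull = true, instead of returning a bare zero Datum. */
      Datum
      example_combine(PG_FUNCTION_ARGS)
      {
          bytea *state = PG_ARGISNULL(0) ? NULL : PG_GETARG_BYTEA_P(0);
      
          if (state == NULL)
              PG_RETURN_NULL();       /* marks fcinfo->isnull = true */
      
          PG_RETURN_BYTEA_P(state);
      }
      ```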
      Co-authored-by: Denis Smirnov <sd@arenadata.io>
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
    • Fix double deduction of FREEABLE_BATCHFILE_METADATA · 66a0cb4d
      Jesse Zhang committed
      Earlier, we always deducted FREEABLE_BATCHFILE_METADATA inside
      closeSpillFile(), regardless of whether the spill file was already
      suspended. This deduction is already performed inside
      suspendSpillFiles(). The double accounting drives
      hashtable->mem_for_metadata negative, and we get:
      
      FailedAssertion("!(hashtable->mem_for_metadata > 0)", File: "execHHashagg.c", Line: 2141)
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
    • Fix assert condition in spill_hash_table() · 067bb350
      Jesse Zhang committed
      This commit fixes the following assertion failure, reported in
      https://github.com/greenplum-db/gpdb/issues/9902:
      
      FailedAssertion("!(hashtable->nbuckets > spill_set->num_spill_files)", File: "execHHashagg.c", Line: 1355)
      
      hashtable->nbuckets can actually end up equal to
      spill_set->num_spill_files, which triggers the failure. This is
      because:
      
      hashtable->nbuckets is set from HashAggTableSizes->nbuckets, which
      can end up equal to gp_hashagg_default_nbatches. Refer:
      nbuckets = Max(nbuckets, gp_hashagg_default_nbatches);
      
      Also, spill_set->num_spill_files is set from
      HashAggTableSizes->nbatches, which is in turn set to
      gp_hashagg_default_nbatches.
      
      Thus, the two quantities can be equal.
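      A hedged sketch of the corrected assertion, allowing equality:
      
      ```c
      /* Sketch: nbuckets may legitimately equal num_spill_files, so the
       * assertion must be >= rather than >. */
      Assert(hashtable->nbuckets >= spill_set->num_spill_files);
      ```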
      Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
    • Increase retry count for pg_rewind tests' replication promotion and streaming. (#10292) · a3d8302a
      (Jerome)Junfeng Yang committed
      Increase the retry count to prevent test failures; most of the time
      the failure is due to slow processing.
    • Fix ICW test if GPDB compiled without ORCA · 9aa2b26c
      Chris Hajas committed
      We need to ignore the output when enabling/disabling an ORCA xform:
      if the server is not compiled with ORCA there will be a diff, and we
      don't really care about this output.
      
      Additionally, clean up unnecessary/excessive setting of GUCs.
      
      Some of these GUCs were on by default or only intended for a
      specific test. Explicitly setting them caused them to appear at the
      end of `explain verbose` plans, making the expected output harder to
      match when the server was built with or without ORCA.
  8. 15 Jun 2020, 4 commits
    • Retry more for replication synchronization waiting to avoid isolation2 test flakiness. (#10281) · ca360700
      Paul Guo committed
      Some test cases have been failing due to too few retries. Increase
      them, and also create some common UDFs for reuse.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
      Reviewed-by: Ashwin Agrawal <aagrawal@pivotal.io>
    • Fix flakiness of "select 1" output after master reset due to injected panic fault before_read_command (#10275) · 02ad1fc4
      Paul Guo committed
      
      Several tests inject a panic in before_read_command to trigger a
      master reset. Previously we ran "select 1" after the fault-injection
      query to verify it, but the output is sometimes nondeterministic,
      i.e. sometimes we do not see the line
      
      PANIC:  fault triggered, fault name:'before_read_command' fault type:'panic'
      
      This was actually observed in test crash_recovery_redundant_dtx, per
      its commit message and test comment. That test ignores the output of
      "select 1", but we probably still want the output to verify the
      fault is encountered.
      
      It is still mysterious why the PANIC message is sometimes missing. I
      spent some time digging but reckon I cannot root-cause it in a short
      time. One guess is that the PANIC message was sent to the frontend
      in errfinish(), but the kernel-buffered data was dropped after
      abort() due to ereport(PANIC); another guess is something wrong
      related to the libpq protocol (not saying it is a libpq bug). In any
      case, it does not deserve much more time for the tests alone, so
      simply mask the PANIC message to make the test result deterministic
      without affecting the test's purpose.
      Reviewed-by: Hubert Zhang <hzhang@pivotal.io>
    • Move to a resource group with memory_limit 0 · 37a19376
      xiong-gang committed
      When moving a query to a resource group whose memory_limit is 0,
      the available memory is the currently available global shared
      memory.
    • Fix a recursive AbortTransaction issue · b5c4fdc0
      xiong-gang committed
      When an error happens after ProcArrayEndTransaction, we recurse back
      into AbortTransaction; we need to make sure this does not generate
      extra WAL records or fail the assertions.
  9. 13 Jun 2020, 2 commits
  10. 12 Jun 2020, 1 commit
    • Create external table fdw extension under gpcontrib. (#10187) · d86f32e5
      (Jerome)Junfeng Yang committed
      Remove pg_exttable.h, since the catalog no longer exists. Move the
      function declarations from pg_exttable.h into external.h. Extract
      related code into external.c, which holds all the code that cannot
      be moved into an external-table FDW extension.
      
      Also move the external table ORCA interface into external.c as a
      workaround; maybe provide an ORCA FDW routine in the future.
      
      Extract the external table's execution logic into the external table
      FDW extension.
      
      Create the gp_exttable_fdw extension during gpinitsystem to allow
      creating system external tables.
  11. 11 Jun 2020, 3 commits
    • Revert "Fix flaky test exttab1" · f538f4b6
      Hubert Zhang committed
      This reverts commit 026e4595, which broke a PXF test case. We need
      to handle that first.
    • Fix flaky test terminate_in_gang_creation · 63b5adf9
      Hubert Zhang committed
      The test case restarts all primaries and expects the old session to
      fail on its next query, since gangs are cached. But the restart may
      take more than 18 seconds, which is the maximum idle time cached QEs
      may live; in that case the new query in the old session simply
      fetches a new gang without the expected errors. Set
      gp_vmem_idle_resource_timeout to 0 to fix this flaky test.
      Reviewed-by: Paul Guo <pguo@pivotal.io>
    • Fix flaky test exttab1 · 026e4595
      Hubert Zhang committed
      The flaky case happens when selecting from an external table with
      the option "fill missing fields": inspecting the QE with gdb shows
      the value is sometimes not false there. In ProcessCopyOptions we
      used intVal(defel->arg) to parse the boolean value, which is not
      correct; use defGetBoolean instead.
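      A hedged sketch of what the fix looks like in ProcessCopyOptions()
      (the option-handling branch shown is illustrative):
      
      ```c
      /* Sketch: a boolean DefElem argument is not always an integer node,
       * so defGetBoolean() must be used rather than intVal(). */
      else if (strcmp(defel->defname, "fill_missing_fields") == 0)
          cstate->fill_missing = defGetBoolean(defel);  /* was intVal(defel->arg) */
      ```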
  12. 10 Jun 2020, 2 commits
    • Add GUC write_to_gpfdist_timeout (#10214) · ab737132
      Huiliang.liu committed
      * Add GUC write_to_gpfdist_timeout
      
      write_to_gpfdist_timeout controls the timeout (in seconds) for writing data to a gpfdist server. The default value is 300; the valid range is [1, 7200].
      
      Set CURLOPT_TIMEOUT to write_to_gpfdist_timeout.
      On any error, retry with a doubled interval, and return a SQL ERROR once write_to_gpfdist_timeout is reached.
      
      Add regression test for GUC writable_external_table_timeout
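      A hedged sketch of wiring the GUC into the curl handle used for the
      write (the handle variable is illustrative; CURLOPT_TIMEOUT is the
      real libcurl option):
      
      ```c
      /* Sketch: bound each write to gpfdist by the GUC, in seconds. */
      curl_easy_setopt(curl_handle, CURLOPT_TIMEOUT,
                       (long) write_to_gpfdist_timeout);
      ```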
    • Fix test_gpdb slack command pipeline to work with new master changes · 6a979eec
      Chris Hajas committed
      The Python changes required new images, so we now need a pipeline
      for slack commands separate from 6X. We also no longer need
      libsigar.