Commits · c65b58c9bcaaf488a6403d4e5f7bd1c88918ae46 · Greenplum / Gpdb

13 Jan, 2018 40 commits

Fix yet more mock calls to GpDB.initFromString() · c65b58c9
Committed by Heikki Linnakangas on Jan 04, 2018

c65b58c9
Remove obsolete mock calls from gpstart test. · 47da2cc7
Committed by Heikki Linnakangas on Jan 04, 2018
```
These functions don't exist anymore.
```
47da2cc7

Disable gp_replica_check for sequences. · a47e7e8b

Committed by Heikki Linnakangas on Jan 04, 2018

There are some "known" differences that are currently not masked out.
I think the 'last_value' field is WAL-logged to a higher value than what's
written to disk in the master.

This was not visible before, when the demo cluster didn't have a standby
for the QD node. In QE nodes, sequences are not used.

a47e7e8b

Make gp_replica_check respond to query cancel faster. · 70558a97
Committed by Heikki Linnakangas on Jan 04, 2018

70558a97
Fix tinc can not load test case · 34234d01
Committed by Xiaoran Wang on Jan 04, 2018
```
Signed-off-by: NMax Yang <myang@pivotal.io>
```
34234d01

Remove pg_filespace from query in GPDBConfig. · 1e4faf8f

Committed by Ashwin Agrawal on Jan 03, 2018

Also, remove the call to GpRecover() in test_mpp23395. Ideally this test
needs to be rewritten in isolation2, but this should at least get it
green.

1e4faf8f

Ignore walsender slots that are not currently used in gp_replica_check. · 7d3463c2
Committed by Heikki Linnakangas on Jan 04, 2018
```
Now that max_wal_senders is not pegged at 1, it's normal for there to be
unused slots.
```
7d3463c2
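
The idea in the commit above can be pictured as filtering out idle walsender slots before checking them. This is only a sketch: `WalSenderSlot`, the convention that `pid == 0` marks an unused slot, and `active_walsenders` are assumptions made for illustration, not names taken from gp_replica_check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WalSenderSlot:
    """Hypothetical stand-in for one walsender slot."""
    pid: int          # assumed convention: 0 means the slot is unused
    state: str = ""


def active_walsenders(slots: List[WalSenderSlot]) -> List[WalSenderSlot]:
    """Return only the slots that are currently in use."""
    return [slot for slot in slots if slot.pid != 0]
```

With more than one slot configured, a caller would inspect only the slots returned here, so an empty slot is not treated as a problem.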
Fix more calls to GpDB.initFromString() · fe0f7008
Committed by Jacob Champion on Jan 03, 2018
```
Follow-up to the previous commit.
```
fe0f7008

Fix more test code that constructs a mock "gparray". · 49dc51dc
Committed by Heikki Linnakangas on Jan 03, 2018

49dc51dc
Fix the way this test gets location of datadir. · d0c73f51
Committed by Heikki Linnakangas on Jan 03, 2018
```
It's now in gp_segment_configuration.
```
d0c73f51

Fix test cases, now that datadir is in gp_segment_configuration. · 4ddc2c8d
Committed by Heikki Linnakangas on Jan 03, 2018

4ddc2c8d
Fix unit test for gparray.py. · 6c73147a
Committed by Heikki Linnakangas on Jan 03, 2018
```
I removed some fields from here earlier, update unit test accordingly.
```
6c73147a

Remove obsolete filespace stuff from gpactivatestandby · f3cad0a7
Committed by Heikki Linnakangas on Jan 03, 2018

f3cad0a7

Remove unit tests for removed gpfilespace utility. · 34d22eb8
Committed by Heikki Linnakangas on Jan 03, 2018

34d22eb8

Remove obsolete filespace stuff from buildMirrorSegments.py. · 708c3b21
Committed by Heikki Linnakangas on Jan 03, 2018

708c3b21

Fix another "gpinitstandby -F" reference in behave tests. · f08faecf
Committed by Heikki Linnakangas on Jan 03, 2018

f08faecf
Attempt to fix behave test for 'gpinitsystem'. · 96ba67bb
Committed by Heikki Linnakangas on Jan 03, 2018
```
The gpinitsystem -F option is now just path to the main data dir, without
a filespace name.
```
96ba67bb

Fix 'tablespace' test to work with ORCA. · 32fbf1e0

Committed by Heikki Linnakangas on Jan 03, 2018

With ORCA, the CREATE TABLE AS used in the test created a randomly
distributed table, while with the Postgres planner, it was distributed by
the only column. A randomly distributed table cannot have an index, so you
got an error with that. Fix by forcing the distribution.

32fbf1e0

Update tests, now that QD node has a standby too, by default. · b0966a5e

Committed by Heikki Linnakangas on Jan 03, 2018

I'm actually not sure why the QD now suddenly has a standby in the default
demo cluster. That seems to have happened as a side-effect of the filespace
removal commit. That change was supposedly made in commit 67c9ab91b9
already, on Dec 20th, but it seems that it didn't actually happen until now,
for some reason. In any case, that is what we want, so adjust the tests
to cope.

b0966a5e

Bump up max_wal_senders, to fix src/test/walrep/ tests. · 32a6b271

Committed by Heikki Linnakangas on Jan 03, 2018

The test uses a replication connection, so it didn't work with
max_wal_senders=1 if the QD node has a standby. And it seems like a good
idea to have some headroom, anyway. At some point, we ought to make this a
true GUC like in the upstream, but this will do for now.

32a6b271

Remove reference to removed 'filespace' python module. · f5a565c9
Committed by Heikki Linnakangas on Jan 03, 2018
```
There are more references in other tools, but we're not testing those yet.
```
f5a565c9

Embed dbid in tablespace path, to avoid clash between servers on same host. · 72e20655

Committed by Heikki Linnakangas on Jan 03, 2018

This is a backport of upstream commit 22817041, from PostgreSQL 9.0, which
added the server version number in the path. But in GPDB, also include the
gp_dbid in the path. This makes it possible to use the same tablespace path
on multiple servers running on the same host, without clashing.

Also includes cherry-picks of the small upstream cleanup commits 5c82ccb1,
a6f56efc, and c282b36d.

Re-enable upstream 'tablespace' regression test. It works now, even when
all the nodes are running on same host.

72e20655
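
To illustrate why embedding the dbid avoids clashes, here is a minimal sketch of how a per-server directory under a shared tablespace location could be derived. The exact layout (`<location>/<dbid>/GPDB_<major>_<catversion>`), the function name `tablespace_version_dir`, and the version values in the usage note are assumptions for the example; the commit message only says that the path includes the version number and gp_dbid.

```python
import os


def tablespace_version_dir(location: str, dbid: int,
                           major_version: str, catalog_version: int) -> str:
    """Build a per-server directory under a shared tablespace location.

    The layout used here is an assumption made for this sketch, not taken
    from the commit itself.
    """
    version_dir = "GPDB_{}_{}".format(major_version, catalog_version)
    return os.path.join(location, str(dbid), version_dir)
```

Two servers on the same host, for example dbid 2 and dbid 3, called with the same `location`, made-up major version "6" and made-up catalog version 123456789, get different directories, so neither overwrites the other's files.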

Remove filespaces. · 5a3a39bc

Committed by Heikki Linnakangas on Dec 20, 2017

Remove the concept of filespaces, revert tablespaces to work the same as
in upstream.

There are some leftovers in management tools. I don't know how to test all
that, and I was afraid of touching things I can't run. Also, we may need
to create replacements for some of those things on top of tablespaces, to
make the management of tablespaces easier, and it might be easier to modify
the existing tools than write them from scratch. (Yeah, you could always
look at the git history, but still.)

Per the discussion on gpdb-dev mailing list, the plan is to cherry-pick
commit 16d8e594 from PostgreSQL 9.2, to make it possible to have a
different path for a tablespace in the primary and its mirror. But that's
not included in this commit yet.

TODO: Make temp_tablespaces work.
TODO: Make pg_dump do something sensible, when dumping from a GPDB 5 cluster
that uses filespaces. Same with pg_upgrade.

Discussion: https://groups.google.com/a/greenplum.org/d/msg/gpdb-dev/sON4lraPEqg/v3lkM587BAAJ

5a3a39bc

Fix queries used in gpexpand test, gp_persistent_filespace_node is no more. · 29cb79a9
Committed by Heikki Linnakangas on Jan 02, 2018

29cb79a9

Temporarily make failed TINC tests more verbose. · 2e70f925

Committed by Heikki Linnakangas on Jan 02, 2018

In order to debug the MM_gpexpand_2 test failure. TINC prints the full
exception in the TINC log, I think, but I have no access to the log on
the CI machine, and I have not been able to run these tests locally.

2e70f925

Make newly-added gp_bloat_diag test less sensitive. · c91e320c

Committed by Heikki Linnakangas on Jan 02, 2018

The test was sensitive to the number of pages in the pg_rewrite system
table's index, for no good reason. Also, don't create a new database for
it, to speed it up.

c91e320c

Remove LWLockWaitCancel(). · 67795819
Committed by Heikki Linnakangas on Jan 02, 2018
```
It was only used by the filerep code. Now that that's gone, this was just
dead code.
```
67795819

Remove dead/unreferenced db state code (#4225) · 145eb5d6

Committed by Jacob Champion on Jan 01, 2018

The DB_IN_STANDBY_NEW_TLI_SET state doesn't really seem to do anything
anymore, as of commit 813b817cc. Remove it entirely to get rid of an
assertion during standby tests. Also remove multipass function
declarations; they're gone.

145eb5d6

Disable newly added 2PC test as it is failing in CI. · 6494b02c

Committed by Ashwin Agrawal on Jan 01, 2018

This test passed locally, and also in the PR pipeline and a forked pipeline multiple
times, but it fails intermittently in the main CI pipeline, hence disabling
it. The failure happens when the master connects to the segments for the first time
after the PANIC, and it fails with

```
+LOG:  could not connect to segment: initialization of segworker group failed (cdbgang.c:235)
+LOG:  could not connect to segment: initialization of segworker group failed (cdbgang.c:235)

2018-01-02 02:44:16.225927 UTC,"gpadmin","isolation2test",p33565,th-1615808736,"[local]",,2018-01-02 02:44:16 UTC,0,con640,,seg-1,,,,sx1,"FATAL","XX000","DTM initialization: failure during startup recovery, retry failed, check segment status (cdbtm.c:1537)",,"Process 33565 will wait for gp_debug_linger=120 seconds before termination.
Note that its locks and other resources will not be released until then.",,,,,0,,"cdbtm.c",1537,"Stack trace:
1    0x9c1afb postgres errstart + 0x1db
2    0x9c3ca9 postgres elog_finish + 0xb9
3    0xadedea postgres initTM (cdbtm.c:1536)
4    0x9dac77 postgres InitPostgres + 0x857
5    0x8867a7 postgres PostgresMain + 0x207
6    0x81c97d postgres <symbol not found> (postmaster.c:0)
7    0x81eea2 postgres PostmasterMain + 0xc42
8    0x73f0a1 postgres main (main.c:206)
9    0x7f909b4fbd1d libc.so.6 __libc_start_main + 0xfd
10   0x4bf2b5 postgres <symbol not found> + 0x4bf2b5

```

Investigating; will re-enable this newly added test once we find out why the first
connection from the master is failing.

6494b02c

Revert "cs_walrep_1: disable gpactivatestandby tests for now" · c28fd064

Committed by Jacob Champion on Jan 01, 2018

Unfortunately the cluster crashes anyway two tests later. Rather than
comment out half the tests to get a fake green, put this set of tests
back. We'll just have to solve this one problem at a time.

This reverts commit 5982a72614492916187ca27fc660d7cc7e3b69e1.

c28fd064

cs_walrep_1: disable gpactivatestandby tests for now · 1384a094

Committed by Jacob Champion on Jan 01, 2018

The promotion logic that gpactivatestandby relies on doesn't work yet,
and when these tests fail, they leave the cluster completely unusable.

1384a094

Rewrite a 2PC test in isolation2. · c20ac186

Committed by Ashwin Agrawal on Dec 29, 2017

This test in TINC is very shaky, as it brings down the primary and mirror and hence
affects gp_segment_configuration.

The test intends to fail the broadcast of COMMIT PREPARED to one segment and hence
trigger a PANIC in the master during phase 2 of 2PC. The master's recovery
cycle should correctly broadcast COMMIT PREPARED again, because the master should
find the distributed commit record in its xlog during recovery. Verify that the
transaction is committed after recovery. This scenario used to create cluster
inconsistency due to a bug that is now fixed: the transaction used to get committed on all
segments except the one where the COMMIT PREPARED broadcast failed before
recovery. The master used to miss sending COMMIT PREPARED across the restart and
instead aborted the transaction after querying in-doubt prepared transactions from
the segments.

c20ac186
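
The recovery behaviour that the test above verifies can be summarised with a small sketch. It is a simplified model, not the GPDB implementation: the function name `recovery_actions`, the use of plain strings as transaction identifiers, and the set standing in for "distributed commit records found in the master's xlog" are all assumptions for illustration.

```python
from typing import Dict, Iterable, Set


def recovery_actions(in_doubt_gids: Iterable[str],
                     committed_in_master_xlog: Set[str]) -> Dict[str, str]:
    """Decide what to broadcast for each in-doubt prepared transaction.

    A transaction whose distributed commit record was found in the master's
    xlog must be committed everywhere; any other in-doubt transaction is
    rolled back.
    """
    actions = {}
    for gid in in_doubt_gids:
        if gid in committed_in_master_xlog:
            actions[gid] = "COMMIT PREPARED"
        else:
            actions[gid] = "ROLLBACK PREPARED"
    return actions
```

In this model, the old bug corresponds to treating a transaction as absent from `committed_in_master_xlog` after the restart, so that it was aborted on the one segment that had not yet received COMMIT PREPARED.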

Add retry in isolation2 test framework for database restart. · b26fe4eb

Committed by Ashwin Agrawal on Dec 29, 2017

To support writing tests where a session can cause a PANIC of the master, add retry
logic while establishing a connection in isolation2. This helps to keep the tests
simple.

b26fe4eb
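
A retry loop of the kind described above might look like the following sketch. It is not the isolation2 code: `connect_with_retry`, its parameters, and the default attempt count and delay are hypothetical, and a real harness would catch its database driver's specific error type instead of `Exception`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 10,
                       delay_seconds: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except Exception as exc:  # assumption: any error means "not up yet"
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay_seconds)  # give the server time to restart
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting; a test that PANICs the master would simply open its next session through this helper.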

Adding new faultinjector at start of FinishPreparedTransaction. · 9586ea99
Committed by Ashwin Agrawal on Dec 29, 2017

9586ea99

Add retries using grace period for declaring mirror down. · cd647b1f

Committed by Ashwin Agrawal on Dec 28, 2017

If fts detects a primary as down, it retries n times before marking it down. But
a mirror gets marked as down if its connection to the primary has not been made or
was lost. This surfaced as a problem mostly during cluster start (gpstart), where
the sequence is to start the primary and mirror, followed by the master. In many instances,
when the master probed the primary, the mirror connection was yet to be made, and hence an up
mirror in the configuration unnecessarily got marked down, even if just a few secs later
the mirror established its connection to the primary.

So, to avoid such situations and make it a little more resilient against minor network
glitches, add a variable to record when initialization or disconnection
happened. Using it, the fts probe can now find out how long the mirror has not
shown up. Only if the mirror didn't show up for the allowed period (30 secs for now)
is it reported as down; otherwise fts is requested to retry the probe. This logic doesn't
affect the regular flow and also avoids any waiting in utilities for specific states
after cluster restart.

cd647b1f
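
The grace-period decision described above can be sketched as a small pure function. The 30-second value comes from the commit message; the function name, the string results, and the choice to report "down" once exactly 30 seconds have elapsed are assumptions for this illustration.

```python
MIRROR_GRACE_PERIOD_SECONDS = 30  # the "allowed period" from the commit message


def mirror_probe_result(mirror_connected: bool, marker_time: float, now: float,
                        grace_period: float = MIRROR_GRACE_PERIOD_SECONDS) -> str:
    """Classify a mirror for one probe.

    marker_time is when the primary was initialized or last lost its mirror
    connection; both timestamps are in seconds on the same clock.
    """
    if mirror_connected:
        return "up"
    if now - marker_time < grace_period:
        return "retry"   # still within the grace period: ask for another probe
    return "down"        # mirror has been missing for the whole grace period
```

During gpstart, a probe arriving 5 seconds after the primary came up would yield "retry" rather than marking the mirror down.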

gpstop -u should not specifically check for "No such process" · bb0b9cb5

Committed by Ashwin Agrawal on Dec 27, 2017

If the postmaster.pid file is present, reload fails with the error "No such
process". But if postmaster.pid is not present, then the error returned is
"pg_ctl: PID file "......../postmaster.pid" does not exist". So, it's better not
to check for any particular error message, but to report that the segments failed to be
reloaded.

bb0b9cb5
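
As a sketch of the approach, failure can be decided from the exit status alone, with the error text kept only for reporting. `ReloadResult` and `failed_reloads` are hypothetical names, and relying on a non-zero return code is an assumption of this example rather than a description of the gpstop code.

```python
from typing import Iterable, List, NamedTuple


class ReloadResult(NamedTuple):
    """Hypothetical record of one segment's reload attempt."""
    segment: str
    return_code: int
    stderr: str


def failed_reloads(results: Iterable[ReloadResult]) -> List[str]:
    """Return the segments whose reload failed, whatever the error text was."""
    return [r.segment for r in results if r.return_code != 0]
```

Both error messages quoted in the commit message would be reported the same way here, because the message text is never inspected.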

Restore logic to skip databases that cannot be connected to for oldest database. · 52884f32

Committed by Ashwin Agrawal on Dec 27, 2017

GPDB skips databases that cannot be connected to when computing the oldest
database in vac_truncate_clog(). Make write_database_file(), which was
reverted to the upstream version, do the same. This helps to get the storage tests green for now.

Later, we can figure out how to uniformly remove this code from vac_truncate_clog() and
write_database_file() if a better solution is found for the original issue for which
this check was added.

52884f32
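
The skipping logic can be illustrated with a simplified model. `DatabaseRow` and `oldest_connectable_database` are hypothetical names, and treating "cannot be connected to" as `datallowconn` being false is an assumption of this sketch. The XID comparison follows the wraparound-aware rule used by PostgreSQL's TransactionIdPrecedes.

```python
from typing import Iterable, NamedTuple, Optional

FIRST_NORMAL_XID = 3  # XIDs below this are special in PostgreSQL


class DatabaseRow(NamedTuple):
    """Subset of pg_database columns used by this sketch."""
    datname: str
    datfrozenxid: int
    datallowconn: bool


def xid_precedes(a: int, b: int) -> bool:
    """Modulo-2^32 XID comparison for normal XIDs, plain comparison otherwise."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000  # the signed 32-bit difference is negative


def oldest_connectable_database(rows: Iterable[DatabaseRow]) -> Optional[DatabaseRow]:
    """Return the row with the oldest datfrozenxid, skipping unconnectable ones."""
    oldest = None
    for row in rows:
        if not row.datallowconn:
            continue  # skipped, as described in the commit message
        if oldest is None or xid_precedes(row.datfrozenxid, oldest.datfrozenxid):
            oldest = row
    return oldest
```

Applying the same filter in both code paths keeps their notion of "oldest database" consistent, which is the point of the commit.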

gparray only add standbyMaster's host if not same as master. · 6554324b
Committed by Ashwin Agrawal on Dec 27, 2017

6554324b

Add walrep specific states to gparray.py. · f812f6d6

Committed by Ashwin Agrawal on Dec 27, 2017

With walrep we have a new state 'n', not in sync. So, add the valid states
corresponding to it to let some tests pass. A lot more cleanup of
this area needs to happen to remove filerep-specific states, but that's work for a different
commit.

f812f6d6

Remove tests for cross check between gp_relation_node and pg_aoseg · d97dfc1e

Committed by Ashwin Agrawal on Dec 27, 2017

Since the gp_relation_node table no longer exists, there is no point testing whether an ERROR is
reported if pg_aoseg and gp_relation_node are not in sync.

d97dfc1e