Commit 0b73bd6c, authored by Kalen Krempely, committed by David Krieger

FTS: Ensure fresh results from a manual probe request

This patch addresses two main scenarios:
1) Allowing multiple probes, both internal and external, to reuse the same
results when appropriate (i.e., piggybacking on previous results). Multiple
requests should share the same results if they all arrive after the results of
the previous probe and before the start of a new FTS loop.

2) Ensuring fresh results from an external probe. When a request arrives while
a probe is in progress, it should get fresh results rather than "piggybacking"
on the current results.

We use logic similar to the checkpointer code to detect whether a probe is in
progress, using a probe start tick and a probe end tick. To request a probe, we
send a signal requesting FTS results, wait for a new loop to start, and then
wait again for that loop to finish. This implementation uses a busy-wait loop
with a short sleep. In the future, we can leverage the upstream condition
variable implementation, which enables us to signal multiple FTS notify
processes.

This was done via a manual cherry-pick from
a674b6b3025b9dc56c4cb34b3330f8b7bc1bf757.
Co-authored-by: Soumyadeep Chakraborty <sochakraborty@pivotal.io>
Co-authored-by: Kalen Krempely <kkrempely@pivotal.io>
Co-authored-by: David Krieger <dkrieger@pivotal.io>
Co-authored-by: Taylor Vesely <tvesely@pivotal.io>
Co-authored-by: Alexandra Wang <lewang@pivotal.io>
Co-authored-by: Jimmy Yih <jyih@pivotal.io>
Parent fdecccf4
@@ -82,7 +82,7 @@ FtsShmemInit(void)
shared->ControlLock = LWLockAssign();
ftsControlLock = shared->ControlLock;
shared->fts_probe_info.fts_statusVersion = 0;
shared->fts_probe_info.status_version = 0;
shared->pm_launch_walreceiver = false;
}
}
@@ -99,21 +99,51 @@ ftsUnlock(void)
LWLockRelease(ftsControlLock);
}
/* see src/backend/fts/README */
void
FtsNotifyProber(void)
{
Assert(Gp_role == GP_ROLE_DISPATCH);
uint8 probeTick = ftsProbeInfo->probeTick;
int32 initial_started;
int32 started;
int32 done;
SpinLockAcquire(&ftsProbeInfo->lock);
initial_started = ftsProbeInfo->start_count;
SpinLockRelease(&ftsProbeInfo->lock);
/* signal fts-probe */
SendPostmasterSignal(PMSIGNAL_WAKEN_FTS);
/* sit and spin */
while (probeTick == ftsProbeInfo->probeTick)
SIMPLE_FAULT_INJECTOR("ftsNotify_before");
/* Wait for a new fts probe to start. */
for (;;)
{
SpinLockAcquire(&ftsProbeInfo->lock);
started = ftsProbeInfo->start_count;
SpinLockRelease(&ftsProbeInfo->lock);
if (started != initial_started)
break;
CHECK_FOR_INTERRUPTS();
pg_usleep(50000);
}
/* Wait until current probe in progress is completed */
for (;;)
{
SpinLockAcquire(&ftsProbeInfo->lock);
done = ftsProbeInfo->done_count;
SpinLockRelease(&ftsProbeInfo->lock);
if (done - started >= 0)
break;
CHECK_FOR_INTERRUPTS();
pg_usleep(50000);
}
}
/*
@@ -127,7 +157,7 @@ FtsIsSegmentDown(CdbComponentDatabaseInfo *dBInfo)
if (dBInfo->config->segindex == MASTER_SEGMENT_ID)
return false;
return FTS_STATUS_IS_DOWN(ftsProbeInfo->fts_status[dBInfo->config->dbid]);
return FTS_STATUS_IS_DOWN(ftsProbeInfo->status[dBInfo->config->dbid]);
}
/*
@@ -160,5 +190,5 @@ FtsTestSegmentDBIsDown(SegmentDatabaseDescriptor **segdbDesc, int size)
uint8
getFtsVersion(void)
{
return ftsProbeInfo->fts_statusVersion;
return ftsProbeInfo->status_version;
}
Fault Tolerance Service (FTS)
=============================
This document illustrates the mechanism of a GPDB component called
Fault Tolerance Service (FTS):
1. This section explains how the FTS probe process is started. The
FTS probe process runs only on the Gp_entry_postmaster (master
node). It starts as a background worker process managed by the
BackgroundWorker structure (see
src/include/postmaster/bgworker.h). Greenplum sets up a group of
GP background processes through an array, PMAuxProcList. Each
entry in that array represents a GP background process.
Two function pointers are important members of the
BackgroundWorker structure. One points to the main entry function
of the GP background process. The other points to the function
that determines whether the process should be started. For FTS,
these two functions are FtsProbeMain() and FtsProbeStartRule(),
respectively. This is hard-coded in postmaster.c.
#define MaxPMAuxProc 6
static BackgroundWorker PMAuxProcList[MaxPMAuxProc];
In the postmaster, we check the following condition:
Gp_entry_postmaster && Gp_role == GP_ROLE_DISPATCH
The FTS probe process is started when the condition is true.
In the initialization phase, we register one BackgroundWorker entry
for each GP background process in the postmaster's private
structure BackgroundWorkerList. When we do this, the above
condition is checked to decide whether FTS should be registered.
See load_auxiliary_libraries() for more detail.
Later, the postmaster tries to start the processes that have been
registered in the BackgroundWorkerList, which includes the FTS
probe process. If the first attempt to start a particular process
fails, or a process goes down for some reason and needs to be
brought up again, the postmaster restarts it in its main loop. On
every iteration, it checks the status of these processes and acts
accordingly.
2. This section explains how FTS probes are initiated. Probes are
either triggered at a regular interval (tunable via the
gp_fts_probe_interval GUC) or triggered on demand by certain
internal components, by tests, or by a user through the FTS probe
triggering function gp_request_fts_probe_scan().
The FTS probe process runs in an infinite loop and performs one
round of polling per iteration to get the health status of all
segments. At each iteration, it waits on a latch with a timeout to
block itself for a while. Thus, two types of events can trigger
polling: the timeout on the latch expiring, or another process
setting the latch.
Certain components running on the master node may interrupt FTS
from its wait to trigger a probe immediately. This is referred to
as notifying FTS. The dispatcher is one such component: for
example, it can notify FTS if it encounters an error while creating
a gang. See the callers of FtsNotifyProber() for more cases.
3. On the master node, the FTS probe process gets the configuration
from the catalog table gp_segment_configuration, which describes
the status of each segment and also shows whether each one has a
mirror. For each unique content (or segindex) value, there is a
primary segment and possibly a mirror segment. The two form a pair:
they have the same content (or segindex) value but different
dbids.
FTS probes only the primary segments. In their responses, primary
segments report their own status as well as their mirror's status.
When a primary segment is found to be down, FTS promotes its
mirror, but only if the mirror was in sync with the primary. If the
mirror is out of sync, this is considered a "double failure" and
FTS does nothing; the cluster is unusable in this case.
If FTS finds any change while probing segments, it updates the
segment configuration. The dispatcher then uses the new
configuration to create gangs.
Thus FTS both reads and writes the catalog table.
4. On the master node, each round of polling is done through a
chain of calls:
ftsConnect()
ftsPoll()
ftsSend()
ftsReceive()
processRetry()
processResponse().
The FTS probe process connects to each primary segment node (or to
the mirror segment when failover occurs) over TCP/IP. It sends
requests to the segments and waits for their responses. Once a
response is received, it updates the catalog tables
gp_segment_configuration and gp_configuration_history, as well as
the relevant in-memory structures.
5. On the segment node, requests from the master FTS probe process
are received in the main loop of PostgresMain().
ProcessStartupPacket() is called first to make sure the connection
is for FTS requests, so that the Postgres process spawned for it
becomes an FTS handler (am_ftshandler = true). It then accepts the
request and processes the 'Q' type message using
HandleFtsMessage(). This function handles three kinds of
requests:
- Probe
- Sync
- Promote
6. SIGUSR2 is now ignored by FTS. As with other background
processes, the postmaster uses SIGTERM to stop FTS.
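The start-rule mechanism from item 1 (a static table of auxiliary processes, each holding an entry-point pointer and a start-rule pointer, filtered at registration time) can be sketched in self-contained C. Everything here is hypothetical: `AuxProcEntry`, `register_aux_processes()`, and the `sketch_*` globals only stand in for `BackgroundWorker`, `load_auxiliary_libraries()`, `Gp_entry_postmaster`, and `Gp_role`; they are not the GPDB declarations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Gp_role and Gp_entry_postmaster. */
typedef enum { ROLE_UTILITY, ROLE_DISPATCH, ROLE_EXECUTE } SketchRole;
static bool sketch_entry_postmaster = false;
static SketchRole sketch_role = ROLE_UTILITY;

/* Hypothetical analogue of one BackgroundWorker table entry. */
typedef struct AuxProcEntry
{
    const char *name;
    void      (*main_fn)(void);    /* process entry point */
    bool      (*start_rule)(void); /* should this process be started? */
} AuxProcEntry;

static void fts_probe_main_stub(void)
{
    /* the probe loop would run here */
}

/* Same condition as quoted in item 1. */
static bool fts_probe_start_rule_stub(void)
{
    return sketch_entry_postmaster && sketch_role == ROLE_DISPATCH;
}

static const AuxProcEntry aux_procs[] = {
    {"ftsprobe", fts_probe_main_stub, fts_probe_start_rule_stub},
};

/*
 * Copy every table entry whose start rule currently holds into
 * 'registered' (at most 'cap' entries); returns how many were kept.
 */
static size_t register_aux_processes(const AuxProcEntry **registered, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(aux_procs) / sizeof(aux_procs[0]); i++)
    {
        if (n < cap && aux_procs[i].start_rule())
            registered[n++] = &aux_procs[i];
    }
    return n;
}
```

With `sketch_entry_postmaster` set and `sketch_role == ROLE_DISPATCH`, the FTS entry is registered; any other role leaves the list empty, which is the filtering the start rule exists to perform.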
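Item 2's wait step, which ends either when the timeout expires or when another process sets the latch, can be approximated on POSIX with the self-pipe technique. This is a swapped-in illustration, not PostgreSQL's `WaitLatch()`: `wait_latch_or_timeout()`, `set_latch()`, and `WakeReason` are hypothetical names, and "setting the latch" is modeled as writing one byte into a pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical wake-up reasons, loosely modeled on WL_TIMEOUT / WL_LATCH_SET. */
typedef enum { WAKE_TIMEOUT, WAKE_LATCH_SET, WAKE_ERROR } WakeReason;

/* "Set" the latch: the notifying process writes one byte into the pipe. */
static int set_latch(int write_fd)
{
    char b = 1;

    return write(write_fd, &b, 1) == 1 ? 0 : -1;
}

/*
 * Block until the latch is set or timeout_ms elapses. If the latch was set,
 * the pending bytes are consumed, which resets it for the next wait.
 */
static WakeReason wait_latch_or_timeout(int read_fd, int timeout_ms)
{
    struct pollfd pfd = {.fd = read_fd, .events = POLLIN, .revents = 0};
    char buf[16];
    int rc;

    /* Retry if a signal interrupts poll() (this restarts the full timeout). */
    do
        rc = poll(&pfd, 1, timeout_ms);
    while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return WAKE_ERROR;
    if (rc == 0)
        return WAKE_TIMEOUT;

    /* Reset: one read drains up to sizeof(buf) queued notifications. */
    if (read(read_fd, buf, sizeof buf) < 0)
        return WAKE_ERROR;
    return WAKE_LATCH_SET;
}
```

Draining the pipe after a wakeup plays a role similar to the `ResetLatch()` call in the FTS loop: several notifications that arrive during one wait collapse into a single wakeup (up to the buffer size in this sketch).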
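The failover rule in item 3 reduces to a small decision table. A minimal sketch, assuming the probe has already determined whether the primary is alive and whether its mirror was in sync; `FtsAction` and `decide_failover()` are hypothetical names, and the status bookkeeping that `updateConfiguration()` performs (for example, marking a down mirror) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of probing one primary/mirror pair. */
typedef enum
{
    FTS_ACTION_NONE,           /* primary is up: no failover needed */
    FTS_ACTION_PROMOTE_MIRROR, /* primary down, mirror was in sync */
    FTS_ACTION_DOUBLE_FAULT    /* primary down, mirror out of sync */
} FtsAction;

/* Decision rule from item 3: only an in-sync mirror is promoted. */
static FtsAction decide_failover(bool primary_alive, bool mirror_in_sync)
{
    if (primary_alive)
        return FTS_ACTION_NONE;
    if (mirror_in_sync)
        return FTS_ACTION_PROMOTE_MIRROR;
    /* Double failure: FTS leaves the configuration alone; cluster unusable. */
    return FTS_ACTION_DOUBLE_FAULT;
}
```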
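Item 6 amounts to a signal disposition: SIGUSR2 is ignored and SIGTERM asks the process to stop. The sketch below uses plain POSIX `sigaction()` rather than PostgreSQL's own signal wrappers, and its handler only records the request in a flag that a main loop would poll; `install_prober_signals()` and `shutdown_requested` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler, polled by a (hypothetical) main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

static void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1; /* async-signal-safe: just record the request */
}

/* SIGUSR2 is ignored; SIGTERM requests an orderly stop. Returns 0 on success. */
static int install_prober_signals(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = SIG_IGN;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = handle_sigterm;
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;

    return 0;
}
```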
FTS Probe Request
=================
Currently, there are three ways to trigger an FTS probe, two internal and one
external:
1. An internal regular FTS probe that is configurable with gp_fts_probe_interval
2. An internal FTS probe triggered by the query dispatcher
3. An external manual FTS probe from gp_request_fts_probe_scan()
The following diagram illustrates the FTS loop. The upper portion of the loop
represents a probe in progress, and the lower portion represents a completed
probe awaiting its next trigger, including the gp_fts_probe_interval timeout.
Results can be requested from this loop at any time through any of the three
mechanisms above.
                  poll segments
              +---------<--------+
              |                  | <-----+ request4
              |      upper       |
              |                  |
              |                  ^
          done|                  |start
              |                  |
              v      lower       |
              |                  |
              |                  | <-----+ request1, request2, request3
              +----------->------+
                    waitLatch
There are two main scenarios to consider:
1) Allowing multiple probes, both internal and external, to reuse the same
results when appropriate (i.e., piggybacking on previous results). This is
depicted as requests 1, 2, and 3, which should share the same results because
they arrive after the results of the previous probe and before the start of a
new FTS loop, that is, in the lower portion.
2) Ensuring fresh results from an external probe. This is depicted as request
4, which arrives while a probe is in progress. This request should get fresh
results rather than reuse the current results (i.e., "piggybacking").
Our implementation addresses these concerns with a probe start tick and a
probe end tick. We send a signal requesting FTS results, wait for a new loop to
start, and then wait for that loop to finish.
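The start/done tick protocol can be modeled deterministically, without threads or spinlocks. In this sketch, `probe_request_begin()` and `probe_request_satisfied()` are hypothetical, non-blocking counterparts of the two busy-wait loops in `FtsNotifyProber()`, and `probe_loop_start()` / `probe_loop_done()` correspond to the counter updates at the top and bottom of `FtsLoop()`. Unlike the patch, which uses `int32` counters, the sketch keeps them as `uint32_t` and casts the difference to `int32_t`, so the wraparound-tolerant comparison avoids signed overflow (the cast is implementation-defined but wraps as two's complement on mainstream compilers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared counters; the patch keeps these in FtsProbeInfo under a spinlock. */
typedef struct
{
    uint32_t start_count;
    uint32_t done_count;
} ProbeCounters;

/* Prober side: top and bottom of one loop iteration. */
static void probe_loop_start(ProbeCounters *c) { c->start_count++; }
static void probe_loop_done(ProbeCounters *c) { c->done_count = c->start_count; }

/* Requester side: one per backend that wants fresh results. */
typedef struct
{
    uint32_t initial_started; /* start_count when the request was made */
    uint32_t started;         /* first new start_count observed */
    bool     saw_new_start;
} ProbeRequest;

static void probe_request_begin(ProbeRequest *r, const ProbeCounters *c)
{
    r->initial_started = c->start_count;
    r->started = 0;
    r->saw_new_start = false;
}

/* Returns true once a loop that started after the request has finished. */
static bool probe_request_satisfied(ProbeRequest *r, const ProbeCounters *c)
{
    if (!r->saw_new_start)
    {
        if (c->start_count == r->initial_started)
            return false;           /* step 1: no new loop has started yet */
        r->started = c->start_count;
        r->saw_new_start = true;
    }
    /* step 2: wraparound-tolerant "done_count >= started" */
    return (int32_t) (c->done_count - r->started) >= 0;
}
```

Requests made while the loop is waiting all latch onto the next loop that starts and complete together when it finishes; a request made while a loop is already running ignores that loop's completion and waits for the following one. That is the piggyback versus fresh-result distinction described above.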
@@ -174,16 +174,16 @@ CdbComponentDatabases *readCdbComponentInfoAndUpdateStatus(MemoryContext probeCo
if (!SEGMENT_IS_ALIVE(segInfo))
FTS_STATUS_SET_DOWN(segStatus);
ftsProbeInfo->fts_status[segInfo->config->dbid] = segStatus;
ftsProbeInfo->status[segInfo->config->dbid] = segStatus;
}
/*
* Initialize fts_stausVersion after populating the config details in
* shared memory for the first time after FTS startup.
*/
if (ftsProbeInfo->fts_statusVersion == 0)
if (ftsProbeInfo->status_version == 0)
{
ftsProbeInfo->fts_statusVersion++;
ftsProbeInfo->status_version++;
writeGpSegConfigToFTSFiles();
}
@@ -318,8 +318,14 @@ void FtsLoop()
CHECK_FOR_INTERRUPTS();
SIMPLE_FAULT_INJECTOR("ftsLoop_before_probe");
probe_start_time = time(NULL);
SpinLockAcquire(&ftsProbeInfo->lock);
ftsProbeInfo->start_count++;
SpinLockRelease(&ftsProbeInfo->lock);
/* Need a transaction to access the catalogs */
StartTransactionCommand();
@@ -378,15 +384,20 @@ void FtsLoop()
writeGpSegConfigToFTSFiles();
CommitTransactionCommand();
ftsProbeInfo->fts_statusVersion++;
ftsProbeInfo->status_version++;
}
}
/* free current components info and free ip addr caches */
cdbcomponent_destroyCdbComponents();
SIMPLE_FAULT_INJECTOR("ftsLoop_after_probe");
/* Notify any waiting backends about probe cycle completion. */
ftsProbeInfo->probeTick++;
SpinLockAcquire(&ftsProbeInfo->lock);
ftsProbeInfo->done_count = ftsProbeInfo->start_count;
SpinLockRelease(&ftsProbeInfo->lock);
/* check if we need to sleep before starting next iteration */
elapsed = time(NULL) - probe_start_time;
@@ -397,6 +408,8 @@ void FtsLoop()
WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
timeout * 1000L);
SIMPLE_FAULT_INJECTOR("ftsLoop_after_latch");
ResetLatch(&MyProc->procLatch);
/* emergency bailout if postmaster has died */
......
@@ -939,14 +939,14 @@ updateConfiguration(CdbComponentDatabaseInfo *primary,
Assert(ftsProbeInfo);
ftsLock();
if (IsPrimaryAlive)
FTS_STATUS_SET_UP(ftsProbeInfo->fts_status[primary->config->dbid]);
FTS_STATUS_SET_UP(ftsProbeInfo->status[primary->config->dbid]);
else
FTS_STATUS_SET_DOWN(ftsProbeInfo->fts_status[primary->config->dbid]);
FTS_STATUS_SET_DOWN(ftsProbeInfo->status[primary->config->dbid]);
if (IsMirrorAlive)
FTS_STATUS_SET_UP(ftsProbeInfo->fts_status[mirror->config->dbid]);
FTS_STATUS_SET_UP(ftsProbeInfo->status[mirror->config->dbid]);
else
FTS_STATUS_SET_DOWN(ftsProbeInfo->fts_status[mirror->config->dbid]);
FTS_STATUS_SET_DOWN(ftsProbeInfo->status[mirror->config->dbid]);
ftsUnlock();
}
......
@@ -32,9 +32,11 @@
typedef struct FtsProbeInfo
{
volatile uint8 fts_statusVersion;
volatile uint8 probeTick;
volatile uint8 fts_status[FTS_MAX_DBS];
volatile uint8 status_version;
volatile uint8 status[FTS_MAX_DBS];
volatile slock_t lock;
volatile int32 start_count;
volatile int32 done_count;
} FtsProbeInfo;
#define FTS_MAX_TRANSIENT_STATE 100
......
@@ -103,3 +103,7 @@ BEGIN
);
END;
$$ language plpgsql;
create or replace function master() returns setof gp_segment_configuration as $$
select * from gp_segment_configuration where role='p' and content=-1;
$$ language sql;
-- See src/backend/fts/README for background information
--
-- This tests two scenarios:
-- 1) Piggyback Test
-- Ensure multiple probe requests come in before the start of a new ftsLoop,
-- then all those requests share the same result.
--
-- 2) Fresh Result Test
-- Ensure fresh results when a probe request occurs during an in progress
-- ftsLoop.
--
-- It is useful to remember that the FtsLoop and each FtsNotifyProbe are
-- individual processes. Careful use of fault injectors are needed to have
-- complete and consistent control over the flow of the two independent
-- processes - the ftsLoop and FtsNotifyProber's.
--
-- fts_probe_stats is only queried when the ftsLoop
-- is stopped at a known location to ensure a consistent view of the stats.
--
-- NOTE: you must add '--load-extension=gp_inject_fault' to the commandline
-- for a manual test.
include: helpers/server_helpers.sql;
select gp_inject_fault2('all', 'reset', 1, hostname, port) from master();
create temp table fts_probe_results(seq serial, seq_name varchar(20),
current_started int, expected_start_delta int,
current_done int, expected_done_delta int);
-- create extension only on master since the fts process is only on master
create or replace function fts_probe_stats() returns table (
start_count int,
done_count int,
status_version int2
)
as '/@abs_builddir@/../regress/regress.so', 'gp_fts_probe_stats' language c execute on master reads sql data;
create or replace view get_raw_stats as
select
seq,
seq_name,
current_started,
expected_start_delta,
current_started - min(current_started) over () as actual_start_delta, -- actual_start_delta = current_started - initial_started
current_done,
expected_done_delta,
current_done - min(current_done) over () as actual_done_delta -- actual_done_delta = current_done - initial_done
from fts_probe_results order by seq;
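The `actual_start_delta` and `actual_done_delta` columns of `get_raw_stats` subtract the column minimum, taken over every recorded row, from each row. Because `start_count` and `done_count` only increase during the test, that minimum is the value captured in the 'initial' row, so each delta counts the FTS loops started or finished since the test began. A hypothetical C helper, `compute_deltas()`, performs the same arithmetic:

```c
#include <assert.h>
#include <stddef.h>

/*
 * C rendering of "current - min(current) over ()": subtract the smallest
 * value in the column from every row.
 */
static void compute_deltas(const int *current, int *delta, size_t n)
{
    if (n == 0)
        return;

    int min = current[0];
    for (size_t i = 1; i < n; i++)
        if (current[i] < min)
            min = current[i];

    for (size_t i = 0; i < n; i++)
        delta[i] = current[i] - min;
}
```

Applied to the `current_started` values 58, 58, 59, 59, 60 and the `current_done` values 58, 58, 58, 59, 60 from the raw stats in the expected output, it yields start deltas 0, 0, 1, 1, 2 and done deltas 0, 0, 0, 1, 2, which match the expected deltas.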
create or replace view get_stats as
select seq, seq_name,
expected_start_delta, actual_start_delta,
expected_done_delta, actual_done_delta
from get_raw_stats order by seq desc limit 1;
drop function if exists insert_expected_stats(int, int);
create or replace function insert_expected_stats(seq_name varchar(20), expected_start_delta int, expected_done_delta int) returns void as $$
INSERT INTO fts_probe_results (seq_name, current_started, expected_start_delta, current_done, expected_done_delta) /* inside a function */
SELECT seq_name, /* inside a function */
start_count AS current_started, /* inside a function */
expected_start_delta, /* inside a function */
done_count AS current_done, /* inside a function */
expected_done_delta /* inside a function */
FROM fts_probe_stats(); /* inside a function */
$$ language sql volatile;
-- ensure the internal regular probes do not affect our test
!\retcode gpconfig -c gp_fts_probe_interval -v 3600;
!\retcode gpstop -u;
-- ensure there is no in progress ftsLoop after reloading the gp_fts_probe_interval
select gp_request_fts_probe_scan();
select insert_expected_stats('initial', 0, 0);
select * from get_stats;
-- piggyback test: start multiple probes
select gp_inject_fault_infinite2('ftsNotify_before', 'suspend', 1, hostname, port) from master();
select gp_inject_fault_infinite2('ftsLoop_after_latch', 'suspend', 1, hostname, port) from master();
select gp_inject_fault_infinite2('ftsLoop_before_probe', 'suspend', 1, hostname, port) from master();
1&: select gp_request_fts_probe_scan();
2&: select gp_request_fts_probe_scan();
3&: select gp_request_fts_probe_scan();
-- piggyback: ensure the probe requests are at a known starting location
select gp_wait_until_triggered_fault2('ftsNotify_before', 3, 1, hostname, port) from master();
-- piggyback: ensure the ftsLoop is triggered only once
select gp_wait_until_triggered_fault2('ftsLoop_after_latch', 1, 1, hostname, port) from master();
select gp_inject_fault2('ftsLoop_after_latch', 'resume', 1, hostname, port) from master();
-- piggyback: ensure the ftsLoop is at a known starting location
select gp_wait_until_triggered_fault2('ftsLoop_before_probe', 1, 1, hostname, port) from master();
select insert_expected_stats('top_of_ftsLoop', 0, 0);
select * from get_stats;
select gp_inject_fault2('ftsNotify_before', 'resume', 1, hostname, port) from master();
-- piggyback: trap the probe requests inside the ftsLoop
select gp_inject_fault_infinite2('ftsLoop_after_probe', 'suspend', 1, hostname, port) from master();
select gp_inject_fault2('ftsLoop_before_probe', 'resume', 1, hostname, port) from master();
select gp_wait_until_triggered_fault2('ftsLoop_after_probe', 1, 1, hostname, port) from master();
select insert_expected_stats('bottom_of_ftsLoop', 1, 0);
select * from get_stats;
-- fresh result test: issue a new probe request during the in progress piggyback ftsLoop
select gp_inject_fault2('ftsLoop_before_probe', 'reset', 1, hostname, port) from master();
select gp_inject_fault_infinite2('ftsLoop_before_probe', 'suspend', 1, hostname, port) from master();
4&: select gp_request_fts_probe_scan();
-- piggyback: resume the suspended piggyback ftsLoop
select gp_inject_fault2('ftsLoop_after_probe', 'resume', 1, hostname, port) from master();
1<:
2<:
3<:
-- fresh result: ensure the next ftsLoop iteration is at a known starting location
select gp_wait_until_triggered_fault2('ftsLoop_before_probe', 1, 1, hostname, port) from master();
-- piggyback: query the probe stats before the start of the 'fresh result' ftsLoop
select insert_expected_stats('piggyback_result', 1, 1);
select * from get_stats;
-- fresh result: resume the suspended ftsLoop
select gp_inject_fault2('ftsLoop_before_probe', 'resume', 1, hostname, port) from master();
4<:
select insert_expected_stats('fresh_result', 2, 2);
select * from get_stats;
-- show all raw stats for debugging
-- start_ignore
select * from get_raw_stats;
-- end_ignore
-- reset the internal regular probe interval
!\retcode gpconfig -r gp_fts_probe_interval;
!\retcode gpstop -u;
@@ -164,6 +164,7 @@ test: segwalrep/dtm_recovery_on_standby
test: segwalrep/commit_blocking_on_standby
test: pg_basebackup
test: pg_basebackup_with_tablespaces
test: fts_manual_probe
# Reindex tests
test: reindex/abort_reindex
......
-- See src/backend/fts/README for background information
--
-- This tests two scenarios:
-- 1) Piggyback Test
-- Ensure multiple probe requests come in before the start of a new ftsLoop,
-- then all those requests share the same result.
--
-- 2) Fresh Result Test
-- Ensure fresh results when a probe request occurs during an in progress
-- ftsLoop.
--
-- It is useful to remember that the FtsLoop and each FtsNotifyProbe are
-- individual processes. Careful use of fault injectors are needed to have
-- complete and consistent control over the flow of the two independent
-- processes - the ftsLoop and FtsNotifyProber's.
--
-- fts_probe_stats is only queried when the ftsLoop
-- is stopped at a known location to ensure a consistent view of the stats.
--
-- NOTE: you must add '--load-extension=gp_inject_fault' to the commandline
-- for a manual test.
include: helpers/server_helpers.sql;
CREATE
select gp_inject_fault2('all', 'reset', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
create temp table fts_probe_results(seq serial, seq_name varchar(20), current_started int, expected_start_delta int, current_done int, expected_done_delta int);
CREATE
-- create extension only on master since the fts process is only on master
create or replace function fts_probe_stats() returns table ( start_count int, done_count int, status_version int2 ) as '/@abs_builddir@/../regress/regress.so', 'gp_fts_probe_stats' language c execute on master reads sql data;
CREATE
create or replace view get_raw_stats as select seq, seq_name, current_started, expected_start_delta, current_started - min(current_started) over () as actual_start_delta, -- actual_start_delta = current_started - initial_started current_done, expected_done_delta, current_done - min(current_done) over () as actual_done_delta -- actual_done_delta = current_done - initial_done from fts_probe_results order by seq;
CREATE
create or replace view get_stats as select seq, seq_name, expected_start_delta, actual_start_delta, expected_done_delta, actual_done_delta from get_raw_stats order by seq desc limit 1;
CREATE
drop function if exists insert_expected_stats(int, int);
DROP
create or replace function insert_expected_stats(seq_name varchar(20), expected_start_delta int, expected_done_delta int) returns void as $$ INSERT INTO fts_probe_results (seq_name, current_started, expected_start_delta, current_done, expected_done_delta) /* inside a function */ SELECT seq_name, /* inside a function */ start_count AS current_started, /* inside a function */ expected_start_delta, /* inside a function */ done_count AS current_done, /* inside a function */ expected_done_delta /* inside a function */ FROM fts_probe_stats(); /* inside a function */ $$ language sql volatile;
CREATE
-- ensure the internal regular probes do not affect our test
!\retcode gpconfig -c gp_fts_probe_interval -v 3600;
-- start_ignore
20190730:11:15:27:045870 gpconfig:office-5-75:dkrieger-[INFO]:-completed successfully with parameters '-c gp_fts_probe_interval -v 3600'
-- end_ignore
(exited with code 0)
!\retcode gpstop -u;
-- start_ignore
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Starting gpstop with args: -u
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Gathering information and validating the environment...
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Obtaining Greenplum Master catalog information
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Obtaining Segment details from master...
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 7.0.0-alpha.0+dev.575.g59811832fc build dev'
20190730:11:15:27:045929 gpstop:office-5-75:dkrieger-[INFO]:-Signalling all postmaster processes to reload
-- end_ignore
(exited with code 0)
-- ensure there is no in progress ftsLoop after reloading the gp_fts_probe_interval
select gp_request_fts_probe_scan();
gp_request_fts_probe_scan
---------------------------
t
(1 row)
select insert_expected_stats('initial', 0, 0);
insert_expected_stats
-----------------------
(1 row)
select * from get_stats;
seq | seq_name | expected_start_delta | actual_start_delta | expected_done_delta | actual_done_delta
-----+----------+----------------------+--------------------+---------------------+-------------------
1 | initial | 0 | 0 | 0 | 0
(1 row)
-- piggyback test: start multiple probes
select gp_inject_fault_infinite2('ftsNotify_before', 'suspend', 1, hostname, port) from master();
gp_inject_fault_infinite2
---------------------------
Success:
(1 row)
select gp_inject_fault_infinite2('ftsLoop_after_latch', 'suspend', 1, hostname, port) from master();
gp_inject_fault_infinite2
---------------------------
Success:
(1 row)
select gp_inject_fault_infinite2('ftsLoop_before_probe', 'suspend', 1, hostname, port) from master();
gp_inject_fault_infinite2
---------------------------
Success:
(1 row)
1&: select gp_request_fts_probe_scan(); <waiting ...>
2&: select gp_request_fts_probe_scan(); <waiting ...>
3&: select gp_request_fts_probe_scan(); <waiting ...>
-- piggyback: ensure the probe requests are at a known starting location
select gp_wait_until_triggered_fault2('ftsNotify_before', 3, 1, hostname, port) from master();
gp_wait_until_triggered_fault2
--------------------------------
Success:
(1 row)
-- piggyback: ensure the ftsLoop is triggered only once
select gp_wait_until_triggered_fault2('ftsLoop_after_latch', 1, 1, hostname, port) from master();
gp_wait_until_triggered_fault2
--------------------------------
Success:
(1 row)
select gp_inject_fault2('ftsLoop_after_latch', 'resume', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
-- piggyback: ensure the ftsLoop is at a known starting location
select gp_wait_until_triggered_fault2('ftsLoop_before_probe', 1, 1, hostname, port) from master();
gp_wait_until_triggered_fault2
--------------------------------
Success:
(1 row)
select insert_expected_stats('top_of_ftsLoop', 0, 0);
insert_expected_stats
-----------------------
(1 row)
select * from get_stats;
seq | seq_name | expected_start_delta | actual_start_delta | expected_done_delta | actual_done_delta
-----+----------------+----------------------+--------------------+---------------------+-------------------
2 | top_of_ftsLoop | 0 | 0 | 0 | 0
(1 row)
select gp_inject_fault2('ftsNotify_before', 'resume', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
-- piggyback: trap the probe requests inside the ftsLoop
select gp_inject_fault_infinite2('ftsLoop_after_probe', 'suspend', 1, hostname, port) from master();
gp_inject_fault_infinite2
---------------------------
Success:
(1 row)
select gp_inject_fault2('ftsLoop_before_probe', 'resume', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
select gp_wait_until_triggered_fault2('ftsLoop_after_probe', 1, 1, hostname, port) from master();
gp_wait_until_triggered_fault2
--------------------------------
Success:
(1 row)
select insert_expected_stats('bottom_of_ftsLoop', 1, 0);
insert_expected_stats
-----------------------
(1 row)
select * from get_stats;
seq | seq_name | expected_start_delta | actual_start_delta | expected_done_delta | actual_done_delta
-----+-------------------+----------------------+--------------------+---------------------+-------------------
3 | bottom_of_ftsLoop | 1 | 1 | 0 | 0
(1 row)
-- fresh result test: issue a new probe request during the in progress piggyback ftsLoop
select gp_inject_fault2('ftsLoop_before_probe', 'reset', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
select gp_inject_fault_infinite2('ftsLoop_before_probe', 'suspend', 1, hostname, port) from master();
gp_inject_fault_infinite2
---------------------------
Success:
(1 row)
4&: select gp_request_fts_probe_scan(); <waiting ...>
-- piggyback: resume the suspended piggyback ftsLoop
select gp_inject_fault2('ftsLoop_after_probe', 'resume', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
1<: <... completed>
gp_request_fts_probe_scan
---------------------------
t
(1 row)
2<: <... completed>
gp_request_fts_probe_scan
---------------------------
t
(1 row)
3<: <... completed>
gp_request_fts_probe_scan
---------------------------
t
(1 row)
-- fresh result: ensure the next ftsLoop iteration is at a known starting location
select gp_wait_until_triggered_fault2('ftsLoop_before_probe', 1, 1, hostname, port) from master();
gp_wait_until_triggered_fault2
--------------------------------
Success:
(1 row)
-- piggyback: query the probe stats before the start of the 'fresh result' ftsLoop
select insert_expected_stats('piggyback_result', 1, 1);
insert_expected_stats
-----------------------
(1 row)
select * from get_stats;
seq | seq_name | expected_start_delta | actual_start_delta | expected_done_delta | actual_done_delta
-----+------------------+----------------------+--------------------+---------------------+-------------------
4 | piggyback_result | 1 | 1 | 1 | 1
(1 row)
-- fresh result: resume the suspended ftsLoop
select gp_inject_fault2('ftsLoop_before_probe', 'resume', 1, hostname, port) from master();
gp_inject_fault2
------------------
Success:
(1 row)
4<: <... completed>
gp_request_fts_probe_scan
---------------------------
t
(1 row)
select insert_expected_stats('fresh_result', 2, 2);
insert_expected_stats
-----------------------
(1 row)
select * from get_stats;
seq | seq_name | expected_start_delta | actual_start_delta | expected_done_delta | actual_done_delta
-----+--------------+----------------------+--------------------+---------------------+-------------------
5 | fresh_result | 2 | 2 | 2 | 2
(1 row)
-- show all raw stats for debugging
-- start_ignore
select * from get_raw_stats;
seq | seq_name | current_started | expected_start_delta | actual_start_delta | current_done | expected_done_delta | actual_done_delta
-----+-------------------+-----------------+----------------------+--------------------+--------------+---------------------+-------------------
1 | initial | 58 | 0 | 0 | 58 | 0 | 0
2 | top_of_ftsLoop | 58 | 0 | 0 | 58 | 0 | 0
3 | bottom_of_ftsLoop | 59 | 1 | 1 | 58 | 0 | 0
4 | piggyback_result | 59 | 1 | 1 | 59 | 1 | 1
5 | fresh_result | 60 | 2 | 2 | 60 | 2 | 2
(5 rows)
-- end_ignore
-- reset the internal regular probe interval
!\retcode gpconfig -r gp_fts_probe_interval;
-- start_ignore
20190730:11:15:35:045960 gpconfig:office-5-75:dkrieger-[INFO]:-completed successfully with parameters '-r gp_fts_probe_interval'
-- end_ignore
(exited with code 0)
!\retcode gpstop -u;
-- start_ignore
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Starting gpstop with args: -u
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Gathering information and validating the environment...
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Obtaining Greenplum Master catalog information
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Obtaining Segment details from master...
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 7.0.0-alpha.0+dev.575.g59811832fc build dev'
20190730:11:15:35:046018 gpstop:office-5-75:dkrieger-[INFO]:-Signalling all postmaster processes to reload
-- end_ignore
(exited with code 0)
@@ -32,6 +32,7 @@
#include "cdb/memquota.h"
#include "cdb/cdbdisp_query.h"
#include "cdb/cdbdispatchresult.h"
#include "cdb/cdbfts.h"
#include "cdb/cdbgang.h"
#include "cdb/cdbvars.h"
#include "cdb/ml_ipc.h"
@@ -107,6 +108,8 @@ extern Datum gp_get_next_oid(PG_FUNCTION_ARGS);
/* Broken output function, for testing */
extern Datum broken_int4out(PG_FUNCTION_ARGS);
/* fts tests */
extern Datum gp_fts_probe_stats(PG_FUNCTION_ARGS);
/* Triggers */
@@ -860,6 +863,49 @@ describe(PG_FUNCTION_ARGS)
PG_RETURN_POINTER(tupdesc);
}
PG_FUNCTION_INFO_V1(gp_fts_probe_stats);
Datum
gp_fts_probe_stats(PG_FUNCTION_ARGS)
{
TupleDesc tupdesc;
int32 start_count = 0;
int32 done_count = 0;
uint8 status_version = 0;
Assert(GpIdentity.dbid == MASTER_DBID);
/* Read the probe counters under the lock */
SpinLockAcquire(&ftsProbeInfo->lock);
start_count = ftsProbeInfo->start_count;
done_count = ftsProbeInfo->done_count;
status_version = ftsProbeInfo->status_version;
SpinLockRelease(&ftsProbeInfo->lock);
/* Build a result tuple descriptor */
tupdesc = CreateTemplateTupleDesc(3, false);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "start_count", INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "done_count", INT4OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 3, "status_version", INT2OID, -1, 0);
tupdesc = BlessTupleDesc(tupdesc);
{
Datum values[3];
bool nulls[3];
HeapTuple tuple;
Datum result;
MemSet(values, 0, sizeof(values));
MemSet(nulls, false, sizeof(nulls));
values[0] = Int32GetDatum(start_count);
values[1] = Int32GetDatum(done_count);
values[2] = UInt8GetDatum(status_version);
tuple = heap_form_tuple(tupdesc, values, nulls);
result = HeapTupleGetDatum(tuple);
PG_RETURN_DATUM(result);
}
}
PG_FUNCTION_INFO_V1(project);
Datum
project(PG_FUNCTION_ARGS)
......