Commit c0b4d5bc authored by Bhuvnesh Chaudhary, committed by Bhuvnesh

Fix xids on segments

Problem: After upgrading a database (5->6 or 6->6), executing CREATE
TABLE reports the warning below.
```
WARNING:  database with OID 0 must be vacuumed within 147483647
transactions  (seg2 127.0.0.1:25434 pid=97928)
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in
that database.
```
If you then run VACUUM FREEZE, it reports the error below:
```
template1=# vacuum freeze;
ERROR:  found xmin 711 from before relfrozenxid 800  (seg0
127.0.0.1:50433 pid=39596)
```
Running VACUUM FREEZE on the upgraded tables reports the same error.

When upgrading a GPDB database, we first upgrade the master and then the
segments. The steps relevant to this problem are:
```
1. Create a new cluster
2. Upgrade the master of the new cluster
    2.a) Execute copy_clog_xlog_xid, which runs pg_resetxlog to set
    the new control data files using values from the old master
    control data file.
    2.b) Execute set_frozenxids, which updates the catalog with
    datfrozenxid, datminmxid, relfrozenxid, relminmxid based on
    the corresponding old master
    2.c) Restore the schema
    2.d) Vacuum Freeze - This updates the value of `checkpoint's
    oldestXID` and `checkpoint's oldestXID's DB` too.
5. Upgrade segment
    5.a) Copy master catalog to the segment
    5.b) Execute copy_clog_xlog_xid, which runs pg_resetxlog to set
    the new control data files using values from the old master
    control data file.
```

The warning below comes from the segments:
```
WARNING:  database with OID 0 must be vacuumed within 147483647
transactions  (seg2 127.0.0.1:25434 pid=97928)
HINT:  To avoid a database shutdown, execute a database-wide VACUUM in
that database.
```
If you look at the controldata files of a segment after upgrade, the
variables below have default values, whereas on the master they have
valid values (because VACUUM FREEZE was run on the master, which updates
them):
pg_controldata <path_of_segment_data_dir>
```
Latest checkpoint's oldestXID:        2294968074
Latest checkpoint's oldestXID's DB:   0
```
The above values are invalid.
When pg_resetxlog is run to set `Latest checkpoint's NextXID`, it sets
nextXid and puts default values in oldestXid and oldestXidDB, on the
assumption that autovacuum will take care of them. Autovacuum is
disabled in Greenplum, so these values are never updated, and the
warning `WARNING:  database with OID 0 must be vacuumed` is observed.
Refer to the code below:
```
	if (set_xid != 0)
	{
		ControlFile.checkPointCopy.nextXid = set_xid;

		/*
		 * For the moment, just set oldestXid to a value that will force
		 * immediate autovacuum-for-wraparound.  It's not clear whether adding
		 * user control of this is useful, so let's just do something that's
		 * reasonably safe.  The magic constant here corresponds to the
		 * maximum allowed value of autovacuum_freeze_max_age.
		 */
		ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
		if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
			ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
		ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
	}
```
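
As a worked example of the arithmetic in that snippet (a sketch, not the server code; the helper names `default_oldest_xid` and `xids_until_wrap` are made up for illustration): with unsigned 32-bit wraparound, a hypothetical next XID of 778 reproduces the `oldestXID` shown in the segment's controldata. Assuming the wrap limit is placed half the XID space past `oldestXid`, the remaining headroom also matches the number in the warning.

```c
#include <stdint.h>

#define FIRST_NORMAL_XID 3u           /* FirstNormalTransactionId */
#define MAX_XID          0xFFFFFFFFu  /* MaxTransactionId */

/* Mirrors the pg_resetxlog snippet: unsigned 32-bit arithmetic wraps around. */
uint32_t
default_oldest_xid(uint32_t set_xid)
{
	uint32_t oldest = set_xid - 2000000000u;

	if (oldest < FIRST_NORMAL_XID)
		oldest += FIRST_NORMAL_XID;
	return oldest;
}

/*
 * Simplified wraparound headroom: the wrap limit sits half the XID space
 * past oldestXid. Assumption: any adjustment the server makes for special
 * XIDs near the wrap point is ignored here.
 */
uint32_t
xids_until_wrap(uint32_t next_xid, uint32_t oldest_xid)
{
	uint32_t wrap_limit = oldest_xid + (MAX_XID >> 1);

	return wrap_limit - next_xid;
}
```

Here `default_oldest_xid(778)` evaluates to 2294968074, and `xids_until_wrap(778, 2294968074)` evaluates to 147483647, so the headroom is always 2147483647 - 2000000000 until a vacuum advances `oldestXid`.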

For the segments, a VACUUM FREEZE should be executed after pg_resetxlog
is performed to update these values.

Further, even if you try to run VACUUM, it complains:
```
ERROR:  found xmin 711 from before relfrozenxid 800  (seg0
127.0.0.1:50433 pid=39596)
```
During the upgrade of a segment, the following steps are performed:
    1. Copy the upgraded master catalog
    2. Execute copy_clog_xlog_xid, which runs pg_resetxlog to set the
    new control data files using values from the old master control
    data file.

No step updates the relfrozenxid of the relations in pg_class on the
target cluster segments to match the values in the source cluster
segment. A GPDB cluster has multiple databases acting as master or
segment, and the values of these variables can differ between them. On
the master, the restore scripts contain UPDATE statements for
relfrozenxid and relminmxid, and the same catalog is copied to the
segments. However, we should also update the target segment database
values with the values of the corresponding source segment database.
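
To illustrate why a master-derived relfrozenxid trips the check on a segment, here is a minimal sketch of a circular XID comparison. It is written from the error message rather than copied from the server; `xmin_before_relfrozenxid` is a hypothetical helper name, and the comparison assumes normal XIDs are ordered modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIRST_NORMAL_XID 3u  /* FirstNormalTransactionId */

/* Circular "a is older than b" test for 32-bit transaction IDs. */
bool
xid_precedes(uint32_t a, uint32_t b)
{
	/* Special XIDs (below FirstNormalTransactionId) compare as plain integers. */
	if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
		return a < b;

	/*
	 * Normal XIDs compare modulo 2^32 via the sign of the difference.
	 * The unsigned-to-signed cast relies on two's-complement behavior,
	 * which mainstream compilers provide.
	 */
	return (int32_t) (a - b) < 0;
}

/* Hypothetical helper: would the sanity check reject this tuple? */
bool
xmin_before_relfrozenxid(uint32_t xmin, uint32_t relfrozenxid)
{
	return xmin >= FIRST_NORMAL_XID && xid_precedes(xmin, relfrozenxid);
}
```

With the values from the error above, `xmin_before_relfrozenxid(711, 800)` is true. Once relfrozenxid is lowered to a value no greater than the oldest xmin on that segment (for example 700), the same call returns false.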

This PR primarily adds the following two things:
1. Execute VACUUM FREEZE on segments (and the required steps before it)
    1.a) Execute set_frozenxids: After pg_resetxlog has been run the
2. Update the segments' relfrozenxid, relminmxid, datfrozenxid,
datminmxid
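
The diff below picks the minimum multixact ID differently depending on the old cluster's version, because datminmxid does not exist in GPDB5. A small sketch of that selection follows; `choose_minmxid` is a hypothetical name, and the version numbers in the usage note are illustrative values for an 8.3-based and a 9.4-based cluster.

```c
#include <stdint.h>

/* Major version from a numeric server version: 80323 -> 803, 90424 -> 904. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/*
 * Old clusters at or below 8.3 have no datminmxid column, so fall back to
 * the next multixact ID read from the old control data.
 */
uint32_t
choose_minmxid(int old_version_num, uint32_t chkpnt_nxtmulti, uint32_t datminmxid)
{
	return (GET_MAJOR_VERSION(old_version_num) <= 803) ? chkpnt_nxtmulti : datminmxid;
}
```

For example, `choose_minmxid(80323, 5, 9)` yields 5 (the control data value), while `choose_minmxid(90424, 5, 9)` yields 9 (the catalog value).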
Parent db609938
......@@ -14,7 +14,7 @@ OBJS = check.o controldata.o dump.o exec.o file.o function.o info.o \
OBJS += greenplum/aotable.o greenplum/gpdb4_heap_convert.o greenplum/version_gp.o \
greenplum/check_gp.o greenplum/file_gp.o greenplum/reporting.o \
greenplum/aomd_filehandler.o greenplum/option_gp.o \
-greenplum/controldata_gp.o \
+greenplum/controldata_gp.o greenplum/frozenxids_gp.o \
greenplum/old_tablespace_file_contents.o greenplum/old_tablespace_file_parser.o \
greenplum/tablespace_gp.o greenplum/info_gp.o greenplum/old_tablespace_file_gp.o \
greenplum/server_gp.o greenplum/greenplum_cluster_info.o
......
......@@ -42,11 +42,19 @@ reset_system_identifier(void)
* schema has been restored to allow the data to be visible on the segments.
* All databases need to be frozen including those where datallowconn is false.
*
-* Note: No further updates should occur after freezing the master data
-* directory.
+* On master and segments, vacuuming will also update the checkpoint's oldestXID and
+* checkpoint's oldestXID's DB which was set to default (triggering autovacuum)
+* when pg_resetxlog was executed to update the checkpoint's NextXID,
+* otherwise vacuuming the tables will generate warnings requesting the user to
+* vacuum the tables.
+*
+* Note:
+* In postgres autovacuum is enabled and will be automatically triggered
+* once the checkpoint's oldestXID is updated by pg_resetxlog, but in GPDB vacuum
+* has to be triggered manually.
*/
void
-freeze_master_data(void)
+freeze_all_databases(void)
{
PGconn *conn;
PGconn *conn_template1;
......@@ -61,7 +69,7 @@ freeze_master_data(void)
TransactionId txid_after;
int32 txns_from_freeze;
-prep_status("Freezing all rows in new master after pg_restore");
+prep_status("Freezing all rows in all databases");
/* Temporarily allow connections to all databases for vacuum freeze */
conn_template1 = connectToServer(&new_cluster, "template1");
......
/*
* frozenxids_gp.c
*
* functions for restoring frozenxid from old cluster
*
* Copyright (c) 2016-Present, Pivotal Software Inc
*/
#include "postgres_fe.h"
#include "pg_upgrade_greenplum.h"
/*
* Segment database contains data, and the tuples should not have
* any entry with a xmin > relfrozenxid for the table in pg_class.
* Instead of vacuum freezing the entire data, update the relfrozenxid
* of the relation in pg_class with the datafrozenxid from the corresponding
* database in the old cluster. This ensures that the xmin is not > relfrozenxid
* for any of the tuple.
* Also, update the new segment database with the datafrozenxid
* from the old cluster as that indicates the lowest xid available.
*
* In GPDB5 datminmxid does not exist, so use the chkpnt_nxtmulti to update
* the value in the GPDB6 cluster.
*/
void
update_segment_db_xids(void)
{
Assert(!is_greenplum_dispatcher_mode());
int dbnum;
PGconn *conn;
prep_status("Updating xid's in new cluster segment databases");
for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
DbInfo *active_db = &old_cluster.dbarr.dbs[dbnum];
uint32 datfrozenxid = active_db->datfrozenxid;
uint32 datminmxid = active_db->datminmxid;
conn = connectToServer(&new_cluster, active_db->db_name);
PQclear(executeQueryOrDie(conn,
"set allow_system_table_mods=true"));
PQclear(executeQueryOrDie(conn,
"UPDATE pg_catalog.pg_database "
"SET datfrozenxid = '%u', datminmxid = '%u' "
"WHERE datname = '%s'",
datfrozenxid,
(GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
old_cluster.controldata.chkpnt_nxtmulti : datminmxid,
active_db->db_name));
/*
* include heap, materialized view, temporary/toast and AO tables
* exclude relations with external storage as well as AO and CO tables
* The logic here should keep consistent with function
* should_have_valid_relfrozenxid().
* Notes: if we ever backport this to Greenplum 5X, remove 'm' first
* and then replace 'M' with 'm', because 'm' used to be RELKIND
* visimap in 4.3/5X, not matview
*
*/
PQclear(executeQueryOrDie(conn,
"UPDATE pg_catalog.pg_class "
"SET relfrozenxid = '%u'"
"WHERE (relkind IN ('r', 'm', 't') "
"AND NOT relfrozenxid = 0) "
"OR (relkind IN ('t', 'o', 'b', 'M'))",
datfrozenxid));
/*
* update heap, materialized view, TOAST/temporary and AO tables
*/
PQclear(executeQueryOrDie(conn,
"UPDATE pg_catalog.pg_class "
"SET relminmxid = '%u' "
"WHERE relkind IN ('r', 'm', 't', 'o', 'b', 'M')",
(GET_MAJOR_VERSION(old_cluster.major_version) <= 803) ?
old_cluster.controldata.chkpnt_nxtmulti : datminmxid));
PQfinish(conn);
}
check_ok();
}
\ No newline at end of file
......@@ -75,9 +75,11 @@ bool is_show_progress_mode(void);
void validate_greenplum_options(void);
/* pg_upgrade_greenplum.c */
-void freeze_master_data(void);
+void freeze_all_databases(void);
void reset_system_identifier(void);
+/* frozenxids_gp.c */
+void update_segment_db_xids(void);
/* aotable.c */
......
......@@ -377,6 +377,8 @@ get_db_infos(ClusterInfo *cluster)
i_oid,
i_spclocation;
char query[QUERY_ALLOC];
int i_datafrozenxid;
int i_datminmxid = 0;
/*
* greenplum specific indexes
......@@ -384,7 +386,7 @@ get_db_infos(ClusterInfo *cluster)
int i_tablespace_oid;
snprintf(query, sizeof(query),
-"SELECT d.oid, d.datname, t.oid as tablespace_oid, %s "
+"SELECT d.oid, d.datname, t.oid as tablespace_oid, %s , datfrozenxid %s "
"FROM pg_catalog.pg_database d "
" LEFT OUTER JOIN pg_catalog.pg_tablespace t "
" ON d.dattablespace = t.oid "
......@@ -394,7 +396,9 @@ get_db_infos(ClusterInfo *cluster)
/* 9.2 removed the spclocation column */
/* GPDB_XX_MERGE_FIXME: spclocation was removed in 6.0 cycle */
(GET_MAJOR_VERSION(cluster->major_version) <= 803) ?
-"t.spclocation" : "pg_catalog.pg_tablespace_location(t.oid) AS spclocation");
+"t.spclocation" : "pg_catalog.pg_tablespace_location(t.oid) AS spclocation",
+(GET_MAJOR_VERSION(cluster->major_version) <= 803) ?
+" ": ", datminmxid");
res = executeQueryOrDie(conn, "%s", query);
......@@ -402,6 +406,9 @@ get_db_infos(ClusterInfo *cluster)
i_datname = PQfnumber(res, "datname");
i_spclocation = PQfnumber(res, "spclocation");
i_tablespace_oid = PQfnumber(res, "tablespace_oid");
i_datafrozenxid = PQfnumber(res, "datfrozenxid");
if (GET_MAJOR_VERSION(cluster->major_version) > 803)
i_datminmxid = PQfnumber(res, "datminmxid");
ntups = PQntuples(res);
dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
......@@ -410,6 +417,9 @@ get_db_infos(ClusterInfo *cluster)
{
dbinfos[tupnum].db_oid = atooid(PQgetvalue(res, tupnum, i_oid));
dbinfos[tupnum].db_name = pg_strdup(PQgetvalue(res, tupnum, i_datname));
dbinfos[tupnum].datfrozenxid = strtoul(PQgetvalue(res, tupnum, i_datafrozenxid), NULL, 10);
if (GET_MAJOR_VERSION(cluster->major_version) > 803)
dbinfos[tupnum].datminmxid = strtoul(PQgetvalue(res, tupnum, i_datminmxid), NULL, 10);
snprintf(dbinfos[tupnum].db_tablespace, sizeof(dbinfos[tupnum].db_tablespace), "%s",
determine_db_tablespace_path(
cluster,
......
......@@ -170,6 +170,41 @@ main(int argc, char **argv)
create_new_objects();
}
else
{
/*
* Restore scripts contains statements to update relfrozenxid and relminxmid
* for the relations according to the master, and the same data is copied to the
* segments but on segments those should reflect the values from the corresponding
* segment database. So, update the xids on the segments for user and catalog tables.
* If this step is not done on segment, subsequent vacuum freeze can complain that
* the xmin <some low number> from before relfrozenxid <some higher number>
*/
set_frozenxids(false);
}
/*
* vacuum freeze the database before restoring the ao segment tables
* catalog data on segments. The catalog copied from the master indicates
* that the files have 0 EOF and will not go further to open the files
* in Prepare phase which will otherwise result in error as the physical
* files are not yet copied from the old segment.
*/
freeze_all_databases();
/*
* vacuum freeze is done prior to copying / linking the data. The xmin
* of the tuples (yet to be copied/linked) for the user created tables can be
* lower than the relfrozenxid updated with vacuum freeze.
* So, it's safe / better to update the relfrozenxid, relminmxid for the
* relations using datfrozenxid which is the lowest available relfrozenxid
* for all the relation in the source database and datminmxid which is the minimum
* of relminmxid for all the relations in source database. This ensures that the
* xmin of the tuples will not be higher than relfrozenxid for the relation.
* Otherwise, vacuuming those tables once data is copied/linked will error out.
*/
if (!is_greenplum_dispatcher_mode())
update_segment_db_xids();
/*
* In a segment, the data directory already contains all the objects,
......@@ -181,12 +216,6 @@ main(int argc, char **argv)
*/
restore_aosegment_tables();
-if (is_greenplum_dispatcher_mode())
-{
-/* freeze master data *right before* stopping */
-freeze_master_data();
-}
stop_postmaster(false);
/*
......@@ -669,6 +698,7 @@ copy_subdir_files(char *subdir)
check_ok();
}
static void
copy_clog_xlog_xid(void)
{
......
......@@ -277,6 +277,8 @@ typedef struct
char *db_name; /* database name */
char db_tablespace[MAXPGPATH]; /* database default tablespace
* path */
uint32 datfrozenxid;
uint32 datminmxid;
RelInfoArr rel_arr; /* array of all user relinfos */
} DbInfo;
......
-- should be able to vacuum freeze the tables
VACUUM FREEZE vf_tbl_heap;
VACUUM
VACUUM FREEZE vf_tbl_ao;
VACUUM
VACUUM FREEZE vf_tbl_aoco;
VACUUM
-- should be able to create a new table without any warnings related to vacuum
CREATE TABLE upgraded_vf_tbl_heap (LIKE vf_tbl_heap);
CREATE
INSERT INTO upgraded_vf_tbl_heap SELECT * FROM vf_tbl_heap;
INSERT 10
VACUUM FREEZE upgraded_vf_tbl_heap;
VACUUM
SELECT * FROM upgraded_vf_tbl_heap;
a | b
----+----
8 | 8
9 | 9
10 | 10
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
(10 rows)
CREATE TABLE vf_tbl_heap (a int, b int);
CREATE
INSERT INTO vf_tbl_heap SELECT i, i FROM GENERATE_SERIES(1,10)i;
INSERT 10
VACUUM FREEZE vf_tbl_heap;
VACUUM
CREATE TABLE vf_tbl_ao (a int, b int) WITH (appendonly=true);
CREATE
CREATE INDEX vf_tbl_ao_idx1 ON vf_tbl_ao(b);
CREATE
INSERT INTO vf_tbl_ao SELECT i, i FROM GENERATE_SERIES(1,10)i;
INSERT 10
VACUUM FREEZE vf_tbl_ao;
VACUUM
CREATE TABLE vf_tbl_aoco (a int, b int) WITH (appendonly=true, orientation=column);
CREATE
CREATE INDEX vf_tbl_aoco_idx1 ON vf_tbl_aoco(b);
CREATE
INSERT INTO vf_tbl_aoco SELECT i, i FROM GENERATE_SERIES(1,10)i;
INSERT 10
VACUUM FREEZE vf_tbl_aoco;
VACUUM
......@@ -10,3 +10,4 @@ test: gphdfs
test: mismatched_aopartition_indexes
test: different_name_index_backed_constraint
test: mismatched_partition_schemas
test: vacuum_freeze_tables
......@@ -3,3 +3,4 @@ test: upgraded_exchange_partition_heap_table
test: upgraded_partitioned_heap_table_with_differently_sized_dropped_columns
test: upgraded_partitioned_heap_table_with_differently_aligned_dropped_columns
test: upgraded_partitioned_heap_table_with_differently_aligned_varlen_dropped_columns
test: upgraded_vacuum_freeze_tables
-- should be able to vacuum freeze the tables
VACUUM FREEZE vf_tbl_heap;
VACUUM FREEZE vf_tbl_ao;
VACUUM FREEZE vf_tbl_aoco;
-- should be able to create a new table without any warnings related to vacuum
CREATE TABLE upgraded_vf_tbl_heap (LIKE vf_tbl_heap);
INSERT INTO upgraded_vf_tbl_heap SELECT * FROM vf_tbl_heap;
VACUUM FREEZE upgraded_vf_tbl_heap;
SELECT * FROM upgraded_vf_tbl_heap;
CREATE TABLE vf_tbl_heap (a int, b int);
INSERT INTO vf_tbl_heap SELECT i, i FROM GENERATE_SERIES(1,10)i;
VACUUM FREEZE vf_tbl_heap;
CREATE TABLE vf_tbl_ao (a int, b int) WITH (appendonly=true);
CREATE INDEX vf_tbl_ao_idx1 ON vf_tbl_ao(b);
INSERT INTO vf_tbl_ao SELECT i, i FROM GENERATE_SERIES(1,10)i;
VACUUM FREEZE vf_tbl_ao;
CREATE TABLE vf_tbl_aoco (a int, b int) WITH (appendonly=true, orientation=column);
CREATE INDEX vf_tbl_aoco_idx1 ON vf_tbl_aoco(b);
INSERT INTO vf_tbl_aoco SELECT i, i FROM GENERATE_SERIES(1,10)i;
VACUUM FREEZE vf_tbl_aoco;