Managing Bloat in the Database
Greenplum Database heap tables use the PostgreSQL Multiversion Concurrency Control (MVCC)
storage implementation. A deleted or updated row is logically deleted from the database, but a
non-visible image of the row remains in the table. These deleted rows, also called expired
rows, are tracked in a free space map. Running VACUUM marks the expired rows
as free space that is available for reuse by subsequent inserts.
If the free space map is not large enough to accommodate all of the expired rows, the
VACUUM command is unable to reclaim space for expired rows that overflowed
the free space map. The disk space may only be recovered by running VACUUM
FULL, which locks the table, copies rows one-by-one to the beginning of the file,
and truncates the file. This is an expensive operation that can take an exceptional amount of
time to complete on a large table, so it should be used only on smaller tables. Attempting
to kill a VACUUM FULL operation that is underway can disrupt the system.
It is very important to run VACUUM after large UPDATE and
DELETE operations to avoid the necessity of ever running VACUUM
FULL.
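For example, after a large delete you can immediately make the space available for reuse (a
minimal sketch; the sales table and the predicate are hypothetical):
gpadmin=# DELETE FROM sales WHERE order_date < '2010-01-01';
gpadmin=# VACUUM sales;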
If the free space map overflows and it is necessary to recover the space, it is recommended
to use the CREATE TABLE...AS SELECT command to copy the table to a new, compact table. Then
drop the original table and rename the copied table, as shown in Removing Bloat from
Database Tables below.
It is normal for tables that have frequent updates to have a small or moderate amount of
expired rows and free space that will be reused as new data is added. But when the table is
allowed to grow so large that active data occupies just a small fraction of the space, the
table has become significantly "bloated." Bloated tables require more disk storage and
additional I/O that can slow down query execution.
Bloat affects heap tables, system catalogs, and indexes.
Running the VACUUM statement on tables regularly prevents them from growing
too large. If a table does become significantly bloated, the VACUUM FULL
statement (or an alternative procedure) must be used to compact the file. If a large table
becomes significantly bloated, it is better to use one of the alternative methods described
in Removing Bloat from Database Tables to remove the bloat.
Never run VACUUM FULL <database_name>, and do not
run VACUUM FULL on large tables in a Greenplum Database system.
Detecting Bloat
The statistics collected by the ANALYZE statement can be used to calculate
the expected number of disk pages required to store a table. The difference between the
expected number of pages and the actual number of pages is a measure of bloat. The
gp_toolkit schema provides a gp_bloat_diag view that
identifies table bloat by comparing the ratio of expected to actual pages. To use it, make
sure statistics are up to date for all of the tables in the database, then run the following
SQL:
gpadmin=# SELECT * FROM gp_toolkit.gp_bloat_diag;
 bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
----------+------------+------------+-------------+-------------+---------------------------------------
    21488 | public     | t1         |          97 |           1 | significant amount of bloat suspected
(1 row)
The results include only tables with moderate or significant bloat. Moderate bloat is
reported when the ratio of actual to expected pages is greater than four and less than ten.
Significant bloat is reported when the ratio is greater than ten.
The gp_toolkit.gp_bloat_expected_pages view lists the actual number of
used pages and expected number of used pages for each database object.
gpadmin=# SELECT * FROM gp_toolkit.gp_bloat_expected_pages LIMIT 5;
btdrelid | btdrelpages | btdexppages
----------+-------------+-------------
10789 | 1 | 1
10794 | 1 | 1
10799 | 1 | 1
5004 | 1 | 1
7175 | 1 | 1
(5 rows)
The btdrelid column is the object ID of the table. The
btdrelpages column reports the number of pages the table uses; the
btdexppages column is the number of pages expected. Again, the numbers
reported are based on the table statistics, so be sure to run ANALYZE on
tables that have changed.
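To see object names instead of numeric IDs, you can join the view against the
pg_class and pg_namespace system catalogs, for example (a sketch based on the
view columns shown above):
gpadmin=# SELECT n.nspname, c.relname, b.btdrelpages, b.btdexppages
          FROM gp_toolkit.gp_bloat_expected_pages b
          JOIN pg_class c ON c.oid = b.btdrelid
          JOIN pg_namespace n ON n.oid = c.relnamespace
          ORDER BY b.btdrelpages - b.btdexppages DESC
          LIMIT 5;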
Removing Bloat from Database Tables
The VACUUM command adds expired rows to the free space map so that
the space can be reused. When VACUUM is run regularly on a table that is
frequently updated, the space occupied by the expired rows can be promptly reused,
preventing the table file from growing larger. It is also important to run
VACUUM before the free space map is filled. For heavily updated tables,
you may need to run VACUUM at least once a day to prevent the table from
becoming bloated.
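One way to confirm that routine maintenance is actually running is to query the Greenplum
pg_stat_operations view, which records actions such as VACUUM and ANALYZE (a sketch;
verify that the view and these columns exist in your Greenplum version):
gpadmin=# SELECT objname, actionname, statime
          FROM pg_stat_operations
          WHERE actionname = 'VACUUM'
          ORDER BY statime DESC;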
When a table is significantly bloated, it is better to run VACUUM
before running ANALYZE. Because ANALYZE uses
block-level sampling, a table with a high ratio of blocks containing no valid rows can cause
ANALYZE to set the reltuples column of the
pg_class system catalog to an inaccurate value or 0, which can lead to
poorly optimized queries. The VACUUM command produces a more accurate count
and when run after ANALYZE will correct an inaccurate row count
estimate.
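For a bloated table, the recommended ordering therefore looks like the following (mytable
is a hypothetical table name):
gpadmin=# VACUUM mytable;
gpadmin=# ANALYZE mytable;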
When a table accumulates significant bloat, running the VACUUM command is
insufficient. For small tables, running VACUUM FULL <table_name> can
reclaim space used by rows that overflowed the free space map and reduce the size of the
table file. However, a VACUUM FULL statement is an expensive operation that
requires an ACCESS EXCLUSIVE lock and may take an exceptionally long and
unpredictable amount of time to finish. Rather than run VACUUM FULL on a
large table, an alternative method is required to remove the bloat from it. Note that
every method for removing bloat from large tables is resource intensive and should be done
only under extreme circumstances.
The first method to remove bloat from a large table is to create a copy of the table
excluding the expired rows, drop the original table, and rename the copy. This method uses
the CREATE TABLE <table_name> AS SELECT statement to create the new
table, for example:
gpadmin=# CREATE TABLE mytable_tmp AS SELECT * FROM mytable;
gpadmin=# DROP TABLE mytable;
gpadmin=# ALTER TABLE mytable_tmp RENAME TO mytable;
A second way to remove bloat from a table is to redistribute the table, which rebuilds the
table without the expired rows. Follow these steps:
- Make a note of the table's distribution columns.
- Change the table's distribution policy to random:
  ALTER TABLE mytable SET WITH (REORGANIZE=false) DISTRIBUTED RANDOMLY;
  This changes the distribution policy for the table, but does not move any data. The
  command should complete instantly.
- Change the distribution policy back to its initial setting:
  ALTER TABLE mytable SET WITH (REORGANIZE=true) DISTRIBUTED BY (<original distribution columns>);
  This step redistributes the data. Since the table was previously distributed with the
  same distribution key, the rows are simply rewritten on the same segment, excluding
  expired rows. A combined example follows these steps.
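Put together, the procedure looks like the following sketch, assuming a hypothetical table
mytable that was originally distributed by a column named id:
gpadmin=# ALTER TABLE mytable SET WITH (REORGANIZE=false) DISTRIBUTED RANDOMLY;
gpadmin=# ALTER TABLE mytable SET WITH (REORGANIZE=true) DISTRIBUTED BY (id);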
Removing Bloat from Indexes
The VACUUM command only recovers space from tables. To recover the space
from indexes, recreate them using the REINDEX command.
To rebuild all indexes on a table run REINDEX TABLE <table_name>;. To
rebuild a particular index, run REINDEX INDEX <index_name>;.
REINDEX does not update the reltuples and
relpages statistics for the index, so it is important to
ANALYZE the table to update these statistics after reindexing.
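For example (mytable and mytable_idx are hypothetical names):
gpadmin=# REINDEX TABLE mytable;
gpadmin=# REINDEX INDEX mytable_idx;
gpadmin=# ANALYZE mytable;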
Removing Bloat from System Catalogs
Greenplum Database system catalogs are also heap tables and can become bloated over time.
As database objects are created, altered, or dropped, expired rows are left in the system
catalogs. Using gpload to load data contributes to the bloat since
gpload creates and drops external tables. (Rather than use
gpload, it is recommended to use gpfdist to load data.)
Bloat in the system catalogs increases the time required to scan the tables, for example,
when creating explain plans. System catalogs are scanned frequently and if they become
bloated, overall system performance is degraded.
It is recommended to run VACUUM on the system catalog nightly and at least
weekly. At the same time, running REINDEX SYSTEM removes bloat from the
indexes. Alternatively, you can reindex system tables using the reindexdb
utility with the -s (--system) option. After reindexing,
it is also important to run ANALYZE, because the REINDEX
command rebuilds indexes with empty statistics.
The following script runs VACUUM, REINDEX, and
ANALYZE on the system
catalogs.
#!/bin/bash
DBNAME="<database_name>"
# Build a SQL fragment that lists every heap table (relkind='r') in pg_catalog
SYSTABLES="' pg_catalog.' || relname || ';' from pg_class a, pg_namespace b \
where a.relnamespace=b.oid and b.nspname='pg_catalog' and a.relkind='r'"
# Generate a VACUUM statement for each system catalog table and run them all
psql -tc "SELECT 'VACUUM' || $SYSTABLES" $DBNAME | psql -a $DBNAME
# Rebuild the system catalog indexes, then refresh catalog statistics
reindexdb -s -d $DBNAME
analyzedb -s pg_catalog -d $DBNAME
If the system catalogs become significantly bloated, you must perform an intensive system
catalog maintenance procedure. The CREATE TABLE AS SELECT and
redistribution key methods for removing bloat cannot be used with system catalogs. You must
instead run VACUUM FULL during a scheduled downtime period. During this
period, stop all catalog activity on the system; VACUUM FULL takes
exclusive locks against the system catalog. Running VACUUM regularly can
prevent the need for this more costly procedure.
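For example, during the maintenance window you might compact catalog tables one at a time (a
sketch; pg_attribute and pg_class are shown only as illustrations of catalog tables that
commonly accumulate bloat):
gpadmin=# VACUUM FULL pg_catalog.pg_attribute;
gpadmin=# VACUUM FULL pg_catalog.pg_class;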
Removing Bloat from Append-Optimized Tables
Append-optimized tables are handled much differently than heap tables. Although
append-optimized tables allow updates, inserts, and deletes, they are not optimized for
update and delete operations, and it is recommended to avoid those operations on
append-optimized tables. If you heed this advice and use append-optimized tables for
load-once/read-many workloads, VACUUM on an append-optimized table runs
almost instantaneously.
If you do run UPDATE or DELETE commands on an
append-optimized table, expired rows are tracked in an auxiliary bitmap instead of the free
space map. VACUUM is the only way to recover the space. Running
VACUUM on an append-optimized table with expired rows compacts a table by
rewriting the entire table without the expired rows. However, no action is performed
unless the percentage of expired rows in the table exceeds the value of the
gp_appendonly_compaction_threshold configuration parameter, which is 10
(10%) by default. The threshold is checked on each segment, so it is possible that a
VACUUM statement will compact an append-only table on some segments and
not others. Compacting append-only tables can be disabled by setting the
gp_appendonly_compaction server configuration parameter to off.
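For example, to force compaction of an append-optimized table whose expired-row percentage
is below the default threshold, you can lower the threshold for the session before
vacuuming (a sketch; my_ao_table is a hypothetical table name):
gpadmin=# SHOW gp_appendonly_compaction_threshold;
gpadmin=# SET gp_appendonly_compaction_threshold = 5;
gpadmin=# VACUUM my_ao_table;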