Expands an existing Greenplum Database across new hosts in the array.
Description
The gpexpand utility performs system expansion in two phases: segment
initialization and then table redistribution.
In the initialization phase, gpexpand runs with an input file that
specifies data directories, dbid values, and other characteristics of the
new segments. You can create the input file manually, or by following the prompts in an
interactive interview.
If you choose to create the input file using the interactive interview, you can optionally
use the -f option to specify a file containing a list of expansion hosts. If your platform
or command shell limits the length of the list of hostnames that you can type when prompted
in the interview, specifying the hosts with -f may be mandatory.
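As a minimal sketch of the host-file route, the following shell snippet writes a list of expansion hosts and passes it to the interview with -f. The file name and the host names sdw5 and sdw6 are hypothetical, and the one-host-per-line layout is an assumption. Some releases may also require further options (for example, one naming the database that stores the expansion schema); consult the gpexpand reference for your version.

```shell
# Hypothetical file name and host names; replace with your own.
HOSTS_FILE="${HOSTS_FILE:-./new_hosts_file}"

# Assumed layout: one expansion host name per line.
cat > "$HOSTS_FILE" <<'EOF'
sdw5
sdw6
EOF

# Start the interactive interview with the host list preloaded.
# The guard keeps the snippet harmless on machines without Greenplum installed.
if command -v gpexpand >/dev/null 2>&1; then
    gpexpand -f "$HOSTS_FILE"
else
    echo "gpexpand not found; would run: gpexpand -f $HOSTS_FILE"
fi
```

Keeping the host list in a file also avoids any shell limit on how much text can be typed at the interview prompt.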
In addition to initializing the segments, the initialization phase performs these
actions:
- Creates an expansion schema to store the status of the expansion
operation, including detailed status for tables.
- Changes the distribution policy for all tables to DISTRIBUTED
RANDOMLY. The original distribution policies are later restored in the
redistribution phase.
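The status information kept in the expansion schema can be inspected with ordinary SQL. The sketch below assumes that the schema is named gpexpand, that it contains tables named status and status_detail, and that it lives in the database named by the hypothetical EXPANSION_DB variable; all of these can differ between releases, so verify them against your installation.

```shell
# Database that holds the expansion schema; "postgres" is only an assumed default.
EXPANSION_DB="${EXPANSION_DB:-postgres}"

# Assumed object names: gpexpand.status (overall) and gpexpand.status_detail (per table).
OVERALL_SQL="SELECT * FROM gpexpand.status;"
DETAIL_SQL="SELECT * FROM gpexpand.status_detail;"

# Run the queries only where psql is available.
if command -v psql >/dev/null 2>&1; then
    psql -d "$EXPANSION_DB" -c "$OVERALL_SQL" || echo "query failed: check database and schema names"
    psql -d "$EXPANSION_DB" -c "$DETAIL_SQL"  || echo "query failed: check database and schema names"
else
    echo "psql not found; would run against $EXPANSION_DB:"
    echo "  $OVERALL_SQL"
    echo "  $DETAIL_SQL"
fi
```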
Data redistribution should be performed during low-use hours. Redistribution can be divided
into batches over an extended period.
To begin the redistribution phase, you must run gpexpand with either the
-d (duration) or -e (end time) option. Until the
specified end time or duration is reached, the utility will redistribute tables in the
expansion schema. Each table is reorganized using ALTER TABLE commands to
rebalance the tables across new segments, and to set tables to their original distribution
policy. If gpexpand completes the reorganization of all tables before the
specified duration elapses or the end time is reached, it displays a success message and exits.
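The two ways of bounding a redistribution run might look like the following sketch. The hh:mm:ss duration format and the 'YYYY-MM-DD hh:mm:ss' end-time format are assumptions, as are the example values; older releases may additionally require an option naming the database that stores the expansion schema, which is omitted here. The end time is computed with GNU date.

```shell
# Run redistribution for at most four hours (assumed format: hh:mm:ss).
DURATION="04:00:00"

# Alternatively, stop at a fixed time: 06:00 tomorrow
# (assumed format: 'YYYY-MM-DD hh:mm:ss'; requires GNU date).
END_TIME="$(date -d 'tomorrow 06:00' '+%Y-%m-%d %H:%M:%S')"

# The guard keeps the snippet harmless on machines without Greenplum installed.
if command -v gpexpand >/dev/null 2>&1; then
    gpexpand -d "$DURATION"
    # or: gpexpand -e "$END_TIME"
else
    echo "gpexpand not found; would run: gpexpand -d $DURATION"
    echo "or:                            gpexpand -e '$END_TIME'"
fi
```

Scheduling a bounded run like this in each low-use window is one way to spread redistribution over several batches.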
This utility uses secure shell (SSH) connections between systems to perform its tasks.
In large Greenplum Database deployments, cloud deployments, or deployments with a large
number of segments per host, this utility may exceed the host's maximum threshold for
unauthenticated connections. Consider updating the SSH MaxStartups
configuration parameter to increase this threshold. For more information about SSH
configuration options, refer to the SSH documentation for your Linux distribution.
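As a rough sketch, the snippet below checks whether MaxStartups is set explicitly on a host. The configuration path is the usual OpenSSH location but is an assumption, and the values shown in the closing comment are illustrative only, not a recommendation.

```shell
# Usual OpenSSH server configuration path; adjust for your distribution.
SSHD_CONFIG="${SSHD_CONFIG:-/etc/ssh/sshd_config}"

# Print any explicit MaxStartups line; if there is none, sshd uses its built-in default.
if grep -Ei '^[[:space:]]*MaxStartups' "$SSHD_CONFIG" 2>/dev/null; then
    MAXSTARTUPS_SET=yes
else
    MAXSTARTUPS_SET=no
    echo "No explicit MaxStartups setting found in $SSHD_CONFIG"
fi

# To raise the limit, an administrator would add or edit a line such as the
# following (illustrative values in start:rate:full form) and then reload sshd:
#   MaxStartups 200:30:1000
```

The change must be applied on every host that receives the utility's SSH connections, and takes effect only after the SSH daemon reloads its configuration.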