Description
The gptransfer utility copies database objects from a source
Greenplum Database system to a destination system. You can perform one of the
following types of operations:
- Copy a Greenplum Database system with the --full
option.
This option copies all user created databases in a source system to a
different destination system. If you specify the --full
option, you must specify both a source and destination system. The
destination system cannot contain any user-defined databases, only the
default databases postgres, template0, and template1.
- Copy a set of user-defined database tables to a destination
system. The -f and -t options copy a
specified set of user-defined tables and table data, and re-create the table
indexes. The -d option copies all user-defined tables and table
data from a specified database, and re-creates the table indexes.
If the
destination system is the same as the source system, you must also specify a
destination database with the --dest-database option. When
you specify a destination database, the source database tables are copied
into the specified destination database.
For partitioned tables, you
can specify the --partition-transfer or the
--partition-transfer-non-partition-target option with
the -f option to copy specific leaf child partitions of
partitioned tables from a source database. The leaf child partitions are the
lowest level partitions of a partitioned table. For the
--partition-transfer option, the destination tables are
leaf child partitions. For the
--partition-transfer-non-partition-target option, the
destination tables are non-partitioned tables.
If an invalid set of gptransfer options is specified, or if a
specified source table or database does not exist, gptransfer
returns an error and quits. No data is copied.
To copy database objects between Greenplum Database systems, the
gptransfer utility uses:
- The Greenplum Database utility gpfdist on the
source database system. The gpfdists protocol is not
supported.
- Writable external tables on the source database system and
readable external tables on the destination database system.
- Named pipes that transfer the data between a writable external
table and a readable external table.
When data is copied into the destination system, it is redistributed across the Greenplum
Database segments of the destination system. This is the flow of data when
gptransfer copies database data:
writable external table > gpfdist > named pipe > gpfdist > readable
external table
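The named pipe stage of this flow can be pictured with a small, self-contained Python sketch. It is only an illustration of how a named pipe hands rows from a writer to a reader on a POSIX system. It does not involve gpfdist or external tables, it is not how gptransfer is implemented, and the function names and the pipe file name are hypothetical.

```python
import os
import tempfile
import threading


def send_rows(pipe_path, rows):
    """Writer side: stands in for the process that feeds the pipe."""
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with open(pipe_path, "w") as pipe:
        for row in rows:
            pipe.write(row + "\n")


def transfer_through_pipe(rows):
    """Pass rows through a named pipe and return what the reader received."""
    with tempfile.TemporaryDirectory() as work_dir:
        pipe_path = os.path.join(work_dir, "transfer.pipe")  # hypothetical name
        os.mkfifo(pipe_path)  # POSIX only
        writer = threading.Thread(target=send_rows, args=(pipe_path, rows))
        writer.start()
        # Reader side: stands in for the process that drains the pipe.
        with open(pipe_path) as pipe:
            received = [line.rstrip("\n") for line in pipe]
        writer.join()
    return received
```

No row is stored in a regular file on disk: the reader consumes each row as the writer produces it, which is the property that makes named pipes suitable for streaming table data.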
For information about transferring data with gptransfer, see
"Migrating Data with Gptransfer" in the Greenplum Database Administrator
Guide.
Notes
The gptransfer utility efficiently transfers tables with large
amounts of data. Because of the overhead required to set up parallel transfers, the
utility is not recommended for transferring tables with small amounts of data. It
might be more efficient to copy the schema and smaller tables to the destination
database using other methods, such as the SQL COPY command, and
then use gptransfer to transfer large tables in batches.
When copying database data between different Greenplum Database systems,
gptransfer requires a text file that lists all the source
segment host names and IP addresses. Specify the name and location of the file with
the --source-map-file option. If the file is missing or not all
segment hosts are listed, gptransfer returns an error and quits.
See the description of the option for file format information.
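Before running a transfer, it can help to confirm that the map file lists every source segment host. The following Python sketch is one way to do that. It assumes one host_name,ip_address pair per line, which is an assumption made here for illustration; confirm the actual format in the --source-map-file option description. The function names are hypothetical and this is not gptransfer's own validation code.

```python
def parse_source_map(text):
    """Return {host_name: ip_address} from the text of a host map file.

    Assumes one "host_name,ip_address" pair per line (an assumption;
    check the --source-map-file option description for the real format).
    """
    hosts = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {line_number}: expected host_name,ip_address")
        hosts[parts[0]] = parts[1]
    return hosts


def missing_hosts(source_map, segment_hosts):
    """Return the segment hosts that the map does not list, sorted by name."""
    return sorted(set(segment_hosts) - set(source_map))
```

An empty result from missing_hosts means that every expected segment host appears in the map.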
The source and destination Greenplum Database segment hosts must be able to
communicate with each other. To verify that the segment hosts can communicate, you
can use a tool such as the Linux netperf utility.
If a filespace has been created for a source Greenplum Database system, a
corresponding filespace must exist on the target system.
SSH keys must be exchanged between the two systems before using
gptransfer. The gptransfer utility connects to
the source system with SSH to create the named pipes and start the
gpfdist instances. You can use the Greenplum Database
gpssh-exkeys utility with a list of all the source and
destination primary hosts to exchange keys between Greenplum Database hosts.
Source and destination systems must be able to access the gptransfer
work directory. The default directory is the user's home directory. You can specify
a different directory with the --work-base-dir option.
The gptransfer utility does not move configuration files such as
postgresql.conf and pg_hba.conf. You must set up
the destination system configuration separately.
The gptransfer utility does not move external objects such as
Greenplum Database extensions, third party jar files, and shared object files. You
must install the external objects separately.
The gptransfer utility does not move dependent database objects
unless you specify the --full option. For example, if a table has a
default value on a column that is a user-defined function, that function must exist
in the destination system database when using the -t,
-d, or -f options.
If you move a set of database tables with the -d,
-t, or -f option, and the destination table or
database does not exist, gptransfer creates it. The utility
re-creates any indexes on tables before copying data.
If a table exists on the destination system and one of the options
--skip-existing, --truncate, or
--drop is not specified, gptransfer returns an
error and quits.
If an error occurs while a table is being copied, or if table validation
fails, gptransfer continues copying the other specified tables.
After gptransfer finishes, it displays a list of tables where an
error occurred, writes the names of tables that failed into a text file, and then
prints the name of the file. You can use this file with the
gptransfer -f option to retry copying tables.
The name of the file that contains the list of tables where errors occurred is
failed_migrated_tables_yyyymmdd_hhmmss.txt.
The yyyymmdd_hhmmss value is a time stamp of when the
gptransfer process was started. The file is created in the
directory where gptransfer is executed.
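A script that wraps gptransfer might need to work out this file name. The following Python sketch builds a name in the documented pattern from a start time. The helper name is hypothetical, and the exact moment that gptransfer records as its start time is not specified here, so treat the result as a pattern to look for rather than a guaranteed match.

```python
from datetime import datetime


def failed_tables_file_name(started_at):
    """Build failed_migrated_tables_yyyymmdd_hhmmss.txt for a start time."""
    return "failed_migrated_tables_" + started_at.strftime("%Y%m%d_%H%M%S") + ".txt"


# Example with a fixed start time of 2024-01-02 03:04:05.
print(failed_tables_file_name(datetime(2024, 1, 2, 3, 4, 5)))
# failed_migrated_tables_20240102_030405.txt
```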
After gptransfer completes copying database objects, the utility
compares the row count of each table copied to the destination databases with the
table in the source database. The utility returns the validation results for each
table. You can disable the table row count validation by specifying the
--no-final-count option.
If the number of rows does not match, the table is not added to the file that lists
the tables where transfer errors occurred.
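Because tables with mismatched row counts are not written to that file, you might want to record them yourself. The following Python sketch compares two sets of row counts that you have already collected, for example with SELECT count(*) on each system. The function name and the dictionary layout are hypothetical, and this is not how gptransfer performs its own validation.

```python
def mismatched_tables(source_counts, dest_counts):
    """Return the tables whose destination row count differs from the source.

    Both arguments map a table name to a row count. A table that is
    missing from dest_counts is reported as mismatched.
    """
    return sorted(
        table
        for table, source_rows in source_counts.items()
        if dest_counts.get(table) != source_rows
    )
```

The returned names could be written one per line to a text file and reviewed before deciding whether to copy those tables again.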
The gp_external_max_segs server configuration parameter controls the
number of segment instances that can access a single gpfdist
instance simultaneously. Setting a low value might affect
gptransfer performance. For information about the parameter,
see the Greenplum Database Reference Guide.
Limitation for the Source and Destination Systems
If you are copying data from a system with a larger number of segments to a
system with fewer segment hosts, the total number of primary segments on
the destination system must be greater than or equal to the total number of
segment hosts on the source system.
For example, assume a destination system has a total of 24 primary
segments. This means that the source system cannot have more than 24 segment hosts.
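The limitation reduces to a single comparison, sketched here in Python. The function and parameter names are hypothetical; obtain the two counts from your own system configuration.

```python
def destination_can_receive(source_segment_hosts, dest_primary_segments):
    """Return True when the destination has at least as many primary
    segments as the source has segment hosts."""
    return dest_primary_segments >= source_segment_hosts


# A destination with 24 primary segments accepts a source with 24 hosts,
# but not a source with 25 hosts.
print(destination_can_receive(24, 24))  # True
print(destination_can_receive(25, 24))  # False
```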
When you copy data from a source Greenplum Database system with a larger number
of primary segment instances than on the destination system, the data transfer
might be slower when compared to a transfer where the source system has fewer
segment instances than the destination system. The gptransfer
utility uses a different configuration of named pipes and
gpfdist instances in the two situations.
Examples
This command copies the table public.t1 from the database
db1 and all tables in the database db2 to the
system mytest2.
gptransfer -t db1.public.t1 -d db2 --dest-host=mytest2 \
--source-map-file=gp-source-hosts --truncate
If the databases db1 and db2 do not exist on the
system mytest2, they are created. If any of the source tables exist
on the destination system, gptransfer truncates the table and
copies the data from the source to the destination table.
This command copies leaf child partition tables from a source system to a destination
system.
gptransfer -f input_file --partition-transfer --source-host=source_host \
--source-user=source_user --source-port=source_port --dest-host=dest_host \
--dest-user=dest_user --dest-port=dest_port --source-map-file=host_map_file
This line in input_file copies a leaf child partition from the
source system to the destination system.
srcdb.people.person_1_prt_experienced, destdb.public.employee_1_prt_seniors
The line assumes partitioned tables in the source and destination systems similar to
the following tables.
- In the people schema of the srcdb database of the source
system, a partitioned table with a leaf child partition table
person_1_prt_experienced. This CREATE
TABLE command creates a partitioned table with the leaf child
partition table.
CREATE TABLE person(id int, title char(1))
DISTRIBUTED BY (id)
PARTITION BY list (title)
(PARTITION experienced VALUES ('S'),
PARTITION entry_level VALUES ('J'),
DEFAULT PARTITION other );
- In the public schema of the destdb database of the destination
system, a partitioned table with a leaf child partition table
public.employee_1_prt_seniors. This CREATE
TABLE command creates a partitioned table with the leaf child
partition table.
CREATE TABLE employee(id int, level char(1))
DISTRIBUTED BY (id)
PARTITION BY list (level)
(PARTITION seniors VALUES ('S'),
PARTITION juniors VALUES ('J'),
DEFAULT PARTITION other );
This example uses Python regular expressions in a filter file to specify the set of
tables to transfer. This command specifies the -f option with the
filter file /tmp/filter_file to limit the tables that are
transferred.
gptransfer -f /tmp/filter_file --source-port 5432 --source-host test4 \
--source-user gpadmin --dest-user gpadmin --dest-port 5432 --dest-host test1 \
--source-map-file /home/gpadmin/source_map_file
These are the contents of /tmp/filter_file.
"test1.arc/.*/./.*/"
"test1.c/(..)/y./.*/"
In the first line, the regular expressions for the schemas, arc/.*/,
and for the tables, /.*/, limit the transfer to all tables in
schemas whose names start with arc.
In the second line, the regular expressions for the schemas,
c/(..)/y, and for the tables, /.*/, limit the
transfer to all tables in schemas whose names are four characters long,
start with c, and end with y, for example,
crty.
When the command is run, tables in the database test1 that satisfy
either condition are transferred to the destination database.
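To preview which schema names the two filter lines select, the expressions can be tried in Python directly. The following sketch rewrites the schema expressions by hand as the plain Python patterns arc.* and c(..)y; it does not parse the filter file syntax. It assumes that each pattern must match the whole name, which is consistent with the four-character description above but is not confirmed here. The function name and sample names are hypothetical.

```python
import re

# Schema patterns rewritten by hand from the two filter file lines.
SCHEMA_PATTERNS = [re.compile(r"arc.*"), re.compile(r"c(..)y")]
# Both filter lines use the same table pattern, which accepts any table name.
TABLE_PATTERN = re.compile(r".*")


def is_selected(schema, table):
    """Return True when schema.table satisfies either filter line.

    Uses fullmatch, on the assumption that a pattern must match the
    entire schema or table name.
    """
    if TABLE_PATTERN.fullmatch(table) is None:
        return False
    return any(pattern.fullmatch(schema) for pattern in SCHEMA_PATTERNS)


for schema in ["archive", "crty", "carry", "public"]:
    print(schema, is_selected(schema, "t1"))
```

In this sketch, archive and crty are selected, while carry (five characters) and public are not.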