diff --git a/gpdb-doc/dita/GPAdminGuide.ditamap b/gpdb-doc/dita/GPAdminGuide.ditamap
index a8db968c7bc2ccc775a8412df9f580679c1721e4..f624fcad199546b6e0e3b9934210884331528761 100644
--- a/gpdb-doc/dita/GPAdminGuide.ditamap
+++ b/gpdb-doc/dita/GPAdminGuide.ditamap
@@ -86,7 +86,7 @@
-
+
diff --git a/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml b/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml
index 4d26e418f91ee72293c160ecce3decca387c681e..99399662a1b7aa254caa46d3616c9d6d1b06ba1d 100644
--- a/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml
+++ b/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml
@@ -10,289 +10,9 @@
another Greenplum database. You can migrate the entire contents of a database, or just
selected tables. The clusters can have different Greenplum Database versions. For example, you
can use gpcopy to migrate data from Greenplum 5 to Greenplum 6.
- The gpcopy utility is available only in the commercial release of
- Pivotal Greenplum Database.
- The gpcopy interface includes options to transfer one or more full
- databases, or one or more database tables. A full database transfer includes the database
- schema, table data, indexes, views, roles, user-defined functions, resource queues, and
- resource groups. If a copied table or database does not exist in the destination cluster,
- gpcopy creates it automatically, along with indexes as necessary.
- Configuration files, including postgresql.conf and
- pg_hba.conf, must be transferred manually by an administrator. Extensions
- installed in the database with gppkg, such as MADlib and programming language
- extensions, must be installed in the destination database by an administrator.
- gpcopy is a command-line tool that includes these features:
+ The gpcopy utility is available as a separate download for the
+ commercial release of Pivotal Greenplum Database. See the Pivotal gpcopy
+ Documentation.
The gpcopy utility copies objects from databases in a source Greenplum
Database system to databases in a destination Greenplum Database system.
- The gpcopy utility is available only in the commercial release of
- Pivotal Greenplum Database.
-
- Synopsis
- gpcopy
- { --full |
- { --dbname database1[, database2 ... ]
- [ --dest-dbname dest-db1[, dest-db2 ... ] ] } |
- --include-table db.schema.table[, db.schema1.table1 ... ]
- [ --dest-table db.schema.table[, db.schema1.table1 ... ] |
- --include-table-file table-file1
- [ --include-table-file table-file2] ... ] |
- --include-table-json json-table-file1
- [ --include-table-json json-table-file2] ... ] }
- [ --metadata-only ]
- [ --exclude-table db.schema.table[, db.schema1.table1 ... ] ]
- [ --exclude-table-file table-file1 ]
- [ --exclude-table-file table-file2 ] ... ] ]
-
- { --dest-host dest_host [ --dest-port dest_port ]
- [ --dest-user dest_user ] }
- [ --source-host source_host [ --source-port source_port ]
- [ --source-user source_user ] ]
- [ --jobs int ]
- [ --on-segment-threshold int ]
- [ --parallelize-leaf-partitions ]
- [ --data-port-range lower_port-upper_port ]
-
- { --skip-existing | --truncate | --drop | --append }
- [ --analyze ]
- [ --no-compression ]
- [ --no-distribution-check ]
- [ --truncate-source-after [--yes ] ]
- [ --validate type ]
-
- [ --dry-run ]
- [ --quiet | --debug ]
-
-gpcopy --version
-gpcopy --help
-
-
- Description
- The gpcopy utility copies database objects from a source Greenplum
- Database system to a destination system. You can perform one of the following types
- of copy operations:
-
- - Copy a Greenplum Database system with the --full
- option. This option copies all database objects including, tables, table data,
- indexes, views, users, roles, functions, and resource queues for all
- user-defined databases to a different destination system.
- - Copy a set of user-defined database tables to a destination
- system.
- - The --dbname option copies all user-defined tables,
- table data, and re-creates the table indexes from specified
- databases.
- - The --include-table
- --include-table-file, or
- --include-table-json option copies a specified set
- of user-defined tables, table data, and re-creates the table indexes.
- - The --exclude-table and
- --exclude-table-file options exclude a specified
- set of user-defined tables and table data to be copied.
-
- - Copy only the database schemas with the --metadata-only
- option.
-
- When running gpcopy, you must specify the data to copy from the
- source database and how to manage data in the destination database.
- If you specify the option --truncate-source-after, you must also
- specify the --validate option. When
- --truncate-source-after is specified, gpcopy
- truncates the source table after the table data is copied and destination table data
- has been validated.
-
-
- Prerequisites
- The user IDs connecting to the source and destination Greenplum Database systems, the
- --source-user and --dest-user, must have
- appropriate access to the systems.
- If Kerberos authentication is enabled for Greenplum Database, gpcopy
- can authenticate with Kerberos. Run the kinit command to obtain a
- ticket-granting ticket from the KDC server before you execute
- gpcopy. The PGKRBSRVNAME environment variable
- specifies the Kerberos service name for Greenplum Database. If your Greenplum
- Database service name is different than the default (postgres) set
- the PGKRBSRVNAME environment variable with the correct service name
- before you run gpcopy. See for information about enabling
- Kerberos authentication with Greenplum Database.
- The source and destination Greenplum Database segment hosts need to be able to
- communicate with each other. To ensure that the segment hosts can communicate, you
- can use a tool such as the Linux netperf utility.
- When the --full option is specified, resource groups and table
- spaces are copied, however, the utility does not configure the destination system.
- For example, you must configure the system to use resource groups and create the
- host directories for the tablespaces.
-
-
- Options for Choosing Data to Copy
- gpcopy provides a range of options to define the scope of data that
- is copied. You can choose options to perform a full Greenplum cluster migration,
- copy specific databases or tables, or only portions of a table using a SQL query.
- Additional options enable you to exclude certain tables from being copied, or to
- change the destination database into which a table's data is copied. The special
- --metadata-only instructs gpcopy to create the
- necessary schema for the selected source tables, but copy no table data.
- You must use at least one of the options --full,
- --dbname, --include-table,
- --include-table-file, or --include-table-json.
- Use other options as needed to exclude data from the copy or to change the
- destination database for copied tables.
-
-
- --full
- This option performs a migration of a Greenplum Database source system to a
- destination system.
- A migration copies all database objects including, tables, indexes, views,
- roles, functions, user-defined types (UDT), resource queues, and resource
- groups for all user-defined databases. The default databases,
- postgres, template0, and
- template1, are not copied.
- This option cannot be specified with the --dbname,
- --include-table, --include-table-file,
- or --include-table-json options.
-
-
- --dbname database
- A source database to copy. To copy multiple databases to the destination
- system specify a comma-separated list of databases with no spaces between
- the names. All the user-defined tables and table data are copied to the
- destination system.
- If the source database does not exist, gpcopy returns an
- error and quits. If a destination database does not exist a database is
- created.
- Not valid with the --full,
- --include-table, --include-table-file, or
- --include-table-json options.
- Alternatively, you can copy a set of tables with the
- --include-table, --include-table-file,
- or --include-table-json option.
-
-
- --dest-dbname database
- To copy a database to a different destination database, specify the name of
- the destination database. For multiple databases, specify a comma-separated
- list of databases with no spaces between the names. The number of database
- names must match the number of names specified in the
- --dbname option. The utility copies the source
- databases to the destination databases in the listed order. In this example,
- db1 is copied to destdb1,
- db2 is copied to destdb2, and
- db3 is copied to db3.
- gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop
- If the source database does not exist, gpcopy returns an
- error and quits. If a destination database does not exist a database is
- created.
- Valid only with the --dbname option.
-
-
- --include-table
- db.schema.table
- One or more tables from the source database system to copy. You must provide
- fully-qualified table names
- (database.schema.table).
- You cannot specify views or system catalog tables. To copy multiple tables,
- include a comma-separated list of table names or use regular expressions to
- describe a set of tables. You can optionally use
- --dest-table to change the databases into which tables
- in --include-table are copied.
- You can use Go language regular expressions in the database, schema, and
- table portions of the fully-qualified table name to define a set of input
- tables. The regular expression pattern must be enclosed in slashes
- (/RE_pattern/). For example,
- --include-table mytest.public.demo/.*/ specifies all
- tables that begin with demo in the mytest
- database in the public schema.
- The following two examples for the --include-table option
- are equivalent. They both specify a set of tables that begins with demo and
- ends with zero or more
- digits.--include-table testdb.schema1.demo/[0-9]*/
---include-table testdb.schema1./demo[0-9]*/
- Regular expression capture groups in the database portion of the
- fully-qualified name can be referenced in --dest-table to
- change the destination database for a table. (Using capture groups to change
- the schema or table name is not currently supported.)
- If the source table does not exist, gpcopy returns an error
- and quits.
- If the destination table or database does not exist, it is created. Only the
- table and table data are copied, not dependent objects. Indexes are
- re-created only if the --drop option is specified.
- Dependent objects are not copied.
- This option is not allowed with the options: --full,
- --dbname, --include-table-file, or
- --include-table-json.
-
-
- --dest-table
- db.schema.table
- (Optional.) Changes the database where tables defined with
- --include-table are copied. Greenplum does not
- currently support changing the destination schema or table name.
- You must provide fully-qualified table names
- (database.schema.table).
- Specify multiple tables using either a comma-separated list or by
- referencing regular expression capture groups that were defined with
- --include-table. If you use a comma-separated list of
- tables with --include-table, use the same number position
- in the --dest-table list to change the destination database
- of the corresponding table. For example, to move only the second table
- provided with an option
- like--include-table mytest.public.table1,mytest.public.table2use
- an option similar
- to:--dest-table production.public.table1,production.public.table2
- If you use Go language regular expressions capture groups in
- --include-table to define a set of tables, you can
- reference the capture groups in --dest-table to rename the
- destination database. Capture groups referenced in
- --include-table or --dest-table must
- use forward slash delimiters (/) to indicate regular expression processing.
- For example, to reference the capture group defined in the database string
- of--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/use
- an option similar
- to:--dest-table productiondb/$1/.myschema/$1/.mytable/$1/Note
- that the capture group numbering starts at 1 and restarts for each component
- of the fully-qualified table name (database, schema, and table). Therefore,
- in the above example each capture group is referenced with
- /$1/ in --dest-table. You cannot
- rename the destination schema or table name components.
- If regular expression rules would cause more than one source table name to
- be remapped to the same destination table, then the
- --skip-existing, --truncate,
- --drop, or --append options determine
- how gpcopy handles subsequent copy requests to the existing
- table.
-
-
- --include-table-file table-file
- The location and name of a text file that defines the tables and data to
- copy. To use multiple files, specify this option for each
- file.--include-table-file <path_to_file1> --include-table-file <path_to_file2>
- In the text file, specify a single fully qualified table per line
- (database.schema.table). You cannot specify views or system
- catalog tables.
- You can use Go language regular expression syntax to select multiple tables.
- See the --dest-table option for information about using
- regular expressions to select tables.
- This option cannot be specified with the --full,
- --dbname, or --include-table
- options.
-
-
- --include-table-json json-table-file
- The location and name of a JSON-format file that defines the tables and data
- to copy. In contrast to the text file used with
- --include-table-file, the JSON file can include a
- destination table name used to change the database into which the table data
- is copied.
- The JSON file that you provide must define one or more objects with
- key-value pairs to describe a source table, an optional query of the source
- table data to copy, and an optional destination table to indicate the
- database where the data is copied. Place the name-value pairs for multiple
- table objects in a JSON array. For
- example:[
- {
- "source": "database.schema.table",
- "sql": "query"
- "dest": "database.schema.table"
- },
- {
- ... }
-]
- You cannot specify views or system catalog tables as source tables.
- Any query that you provide must reference a single source table and output
- to the same columns of the source table.
- If the query includes an ORDER BY clause, then the target
- Greenplum cluster must be the same size (the same number of segments) as the
- source cluster.
- Keep in mind that certain characters must be escaped in strings in order for
- them to be parsed in JSON, specifically:
-
- Character
- Escape Sequence
-
-
- Backspace
- \b
-
-
- Form Feed
- \f
-
-
- Newline
- \n
-
-
- Tab
- \t
-
-
- Double Quote
- \"
-
-
- Backslash
- \\
-
-
- If the file cannot be parsed as JSON, gpcopy exits with an
- error.
- You can use Go language regular expression syntax to select multiple tables.
- Capture groups defined in the source: key can be referenced
- in the dest: key to change the destination database where
- the data is copied. For
- example:[
- {
- "source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/",
- "dest": "productiondb/$1/.myschema/$1/.mytable/$1/"
- },
-]
- See the --dest-table option for more information about
- using regular expressions and capture groups. If you use regular
- expressions to copy multiple tables, you cannot include the
- sql: key.
-
- This option cannot be specified with the --full,
- --dbname, --include-table options. You
- cannot use this option with --parallelize-leaf-partitions
- if the JSON file includes a sql: key that queries a
- partitioned table.
-
-
- --metadata-only
- Create only the schemas specified by the command. Data is not
- transferred.
- If specified with the --full option,
- gpcopy replicates the complete database schema,
- including all tables, indexes, views, user-defined types (UDT), and
- user-defined functions (UDF) for the source databases. No data is
- transferred.
- If you specify databases with the --dbname option or tables
- with the --include-table
- --include-table-file, or
- --include-table-json options, gpcopy
- creates only the tables and indexes. No data is transferred.
- This option cannot be used with --truncate option.
-
-
- --exclude-table
- db.schema.table
- A table from the source database system to exclude from transfer. The fully
- qualified table name must be specified
- (database.schema.table).
- To exclude multiple tables, specify a comma-separated list of table
- names.
- A set of tables can be specified using the Go language regular expression
- syntax. See the --include-table option for information
- about using regular expressions.
- Only the specified tables are excluded, not dependent objects. You cannot
- specify views or system catalog tables.
- This option must be specified with one of these options:
- --full, --dbname,
- --include-table, --include-table-file,
- or --include-table-json. If the option
- --exclude-table results in no table to copy, the
- database or schema is not created in the destination system.
-
-
- --exclude-table-file table-file
- The location and name of file containing a list of fully qualified table
- names to exclude from copying to the destination system. In the text file,
- specify a single fully qualified table per line
- (database.schema.table).
- To specify multiple files, specify this option for each
- file.--exclude-table-file <path_to_file1> --exclude-table-file <path_to_file2>
- In the file, a set of tables can be specified using the Go language regular
- expression syntax. See the --include-table option for information
- about using regular expressions.
- If a source table does not exist, gpcopy displays a
- warning.
- Only the specified tables are excluded. You cannot specify views or system
- catalog tables.
- This option must be specified with one of these options:
- --full, --dbname,
- --include-table, --include-table-file,
- or --include-table-json. If the option
- --exclude-table results in no tables to copy, the
- database or schema is not created in the destination system.
-
-
-
-
- Connection Options
- The following options specify connection information for the destination and source
- Greenplum clusters. Only --dest-host is required.
- --jobs, --on-segment-threshold, and
- --parallelize-leaf-partitions affect the number of simultaneous
- connections used for data transfer. --data-port-range defines the
- ports used for data transfer to destination segments or the destination
- master.gpcopy does not currently support SSL
- encryption for its connections.
-
-
- --dest-host dest_host
- Required. The destination Greenplum Database master segment hostname or IP
- address.
-
-
- --dest-port dest_port
- The destination Greenplum Database master segment port number. If
- --dest-port is not specified, then the default is
- 5432.
-
-
- --dest-user dest_user
- The user ID that is used to connect to the destination Greenplum master. If
- not specified, the default is gpadmin.
-
-
- --source-host source_host
- The source Greenplum Database master segment host name or IP address. If not
- specified, the default host is the system running gpcopy
- (127.0.0.1).
-
-
- --source-port source_port
- The source Greenplum Database master port number. If not specified, the
- default is 5432.
-
-
- --source-user source_user
- The user ID that is used to connect to the source Greenplum Database system.
- If not specified, the default is gpadmin.
-
-
- --jobs int
- The number processes that gpcopy runs in parallel. The
- default is 4. The range is from 1 to 64.
- The option --jobs produces
- 2*n+1 database connections. The
- default value, 4, creates 9 connections.
- If you increase this option, ensure that Greenplum Database systems are
- configured with a sufficient maximum concurrent connection value to
- accommodate the gpcopy connections and other concurrent
- connections such as user connections. See the Greenplum Database server
- configuration parameter max_connections.
-
-
- --on-segment-threshold int
- Specifies the number of rows that determines when gpcopy
- copies a table using the Greenplum Database source and destination master
- instead of the source and destination segment instances. The default value
- is 10000 rows. If a table contains 10000 rows or less, the table is copied
- using the Greenplum Database master.
- The value -1 disables copying tables using the master. All
- tables are copied using the segment instances.
- For smaller tables, copying tables using the Greenplum Database master is
- more efficient than using segment instances.
-
-
- --parallelize-leaf-partitions
- If specified, the utility copies the leaf partition tables of a partitioned
- table in parallel. The default is to copy the partitioned table as a single
- table based on the root partition table.
- If the --validate option is also specified, the utility
- validates each leaf partition table during the copy process and then
- validates the entire partitioned table.
- This option cannot be specified with --include-table-json
- if the JSON file includes a sql: key that queries a
- partitioned table.
-
-
- --data-port-range
- lower_port-upper_port
- A range of port numbers to use on Greenplum Database destination hosts for
- data transfer. This applies to destination segment hosts or, if data is
- transferred using the master segment, only the master segment host.
- gpcopy uses the first available port specified in the
- range (inclusive). lower_port must be greater than or
- equal to 1024 (to avoid reserved system ports), and
- upper_port must be a greater value.
- The number of ports specified by the range must be greater than or equal to
- the number of parallel processes created with --jobs, if
- specified.
- If --data-port-range is not specified, then
- gpcopy uses any available port.
-
-
-
-
- Options for Configuring how Data is Copied
- gpcopy provides additional options that affect the way data is
- copied between systems. You must use one of these options to specify how to manage
- data in the destination database: --skip-existing,
- --truncate, --drop, or
- --append. Other options can be included as necessary, for
- example, to perform additional data validation.
-
-
- --skip-existing
- Specify this option to skip copying a table from the source database if the
- table already exists in the destination database.
- At most, only one of the options can be specified:
- --skip-existing, --truncate,
- --drop, or --append.
-
-
- --truncate
- Specify this option to truncate the table that is in the destination
- database if it already exists.
- At most, only one of the options can be specified:
- --skip-existing, --truncate,
- --drop, or --append.
-
-
- --drop
- Specify this option to drop the table that is in the destination database if
- it already exists. Before copying table data, gpcopy drops
- the table and creates it again.
- At most, only one of the options can be specified:
- --skip-existing, --truncate,
- --drop, or --append.
-
-
- --append
- Append data to the table in the destination database if it already exists.
- At most, only one of the options can be specified:
- --skip-existing, --truncate,
- --drop, or --append.
-
-
- --analyze
- Run the ANALYZE command on non-system tables. The default
- is to not run the ANALYZE command. The operation is
- performed for each table after the table data is copied.
-
-
- --no-compression
- If specified, data is transferred without compression. By default,
- gpcopy compresses data during transfer from the source
- to the destination database when copying data to a different host.
- The utility does not compress data when copying data to the same host.
-
-
- --no-distribution-check
- Specify this option to disable table data distribution checking. By default,
- gpcopy performs data distribution checking to ensure
- data is distributed to segment instances correctly. If distribution checking
- fails, the table copy fails.
- The utility does not support table data distribution checking when copying a
- partitioned table that is defined with a leaf table that is an external
- table or if a leaf table is defined with a distribution policy that is
- different from the root partitioned table. Before you
- perform a gpcopy operation with the
- --no-distribution-check option, ensure that you
- have a backup of the destination database and that the distribution
- policies of the tables that are being copied are the same in the source
- and destination database. Copying data into segment instances with
- incorrect data distribution can cause incorrect query results and can
- cause database corruption.
-
-
- --truncate-source-after
- Specify this option to truncate the table that is in the source database
- after gpcopy copies the table and validates the table data
- in the destination database.
- If you specify this option, you must also specify the
- --validate option.
-
- --yes
- Optional. Automatic confirmation to truncate source table data
- after copying and validating table data. The prompt to truncate
- source tables does not appear. The default is to prompt to
- confirm truncating source table data after copying and
- validating the data.
-
-
-
-
- --validate type
- Perform data validation on the table data in the destination database after
- the table data is copied. These are the supported type of validation.
- - count - compares row counts between source and
- destination table data.
- - md5xor - calculates the MD5 value of all rows, then
- performs an XOR over the MD5 values
-
- If you specify the --append option, and the destination
- table contains data, validation fails for the table.
- If validation for a table fails, gpcopy rolls back the
- destination table.
-
-
-
-
- Additional Options
-
-
- --dry-run
- When you specify this option, gpcopy generates a list of
- the migration operations that would have been performed with the specified
- options. The data is not migrated.
- The information is displayed at the command line and written to the log
- file.
-
-
- --quiet
- If specified, suppress status messages at the command prompt. The messages
- are sent only to the log file. Higher level messages such as warning and
- error messages are still displayed.
- This option cannot be specified with the --debug
- option.
-
-
- --debug
- If specified, debug messages are displayed at the command prompt.
- This option cannot be specified with the --quiet
- option.
-
-
- --version
- Displays the version of this utility.
-
-
- --help
- Displays the online help.
-
-
-
-
- Notes
- If a gpcopy command specifies an invalid option, or specifies a
- source table or database that does not exist, the utility returns an error and
- quits. No data is copied.
- The source table data can change while the data is being copied. A lock is not
- acquired on the source table when data is copied.
- The utility cannot copy a row with a width greater than 1GB (a PostgreSQL
- limitation).
- If you copy a set of database tables with the --dbname,
- --include-table, --include-table-file, or
- --include-table-json options, and the destination database does
- not exist, the utility creates the database before copying the tables. If the
- destination database exists, the utility creates the tables in the database if
- required.
- The gpcopy utility does not copy dependent database objects unless
- you specify the --full option. For example, if a table has a
- default value on a column that is a user-defined function, that function must exist
- in the destination system database when using the --dbname,
- --include-table, --include-table-file, or
- --include-table-json options.
- When copying tables, sequences defined for tables are considered table data and are
- copied. A sequence is copied if a table is created with a serial column or if a
- sequence is specified as a default value. The sequences are reset if you specify the
- --truncate option.
- The utility re-creates table indexes only with --full or
- --drop options.
- The gpcopy utility does not copy configuration files such as
- postgresql.conf and pg_hba.conf. You must set
- up the destination system configuration separately.
- The gpcopy utility does not copy external objects such as Greenplum
- Database extensions, third party jar files, and shared object files. You must
- install the external objects separately.
- gpcopy does not currently support SSL encryption for its
- connections.
-
- Specifying Table Names with Special Characters
- When you list a table with the option --include-table
- or --exclude-table, and the table name or schema name contains
- single quote ('), double quote (") , or
- backslash (\), you must escape the character with a backslash
- (\). These are the escaped characters:
- \', \", \\. If the table
- or schema name contains a period (.), comma
- (,), or a space character, you must enclose the entire name in
- single quotes (') and enclose the table and schema in double
- quotes (") at the command prompt shell. In this example, the
- fully qualified name of the table is
- testdb.schema'1.table"test.--include-table 'testdb."schema\'1"."table\"test"'
- For fully qualified table names listed in a file that is used with the
- options --include-table-file,
- --include-table-json, or
- --exclude-table-file, if the table name or schema name
- contains a period (.), the name must be enclosed in double
- quotes ("). In this example, the table
- table.test is in the testdb, and belongs
- to the schema
- schema.1.testdb."schema.1"."table.test"
-
-
- Copying Partitioned Tables
- When copying data for a partitioned table, if a leaf partition has been exchanged
- with an external table, that leaf partition is created, but data is not copied.
- If you specify copying leaf partitions of a partitioned table with an option such
- as --include-table or --exclude-table,
- gpcopy creates the partitioned table if it does not exist.
- Only the data for the specified leaf partitions are added to the partitioned
- table. Specifying individual leaf partitions is useful when the entire
- partitioned tabled does not need to be copied.
- gpcopy does not support table data distribution checking when
- copying a partitioned table that is defined with a leaf table that is an
- external table or if a leaf table is defined with a distribution policy that is
- different from the root partitioned table. You can copy those tables in a
- gpcopy operation and specify the option
- --no-distribution-check to disable checking of data
- distribution.
- Before you perform a gpcopy operation with the
- --no-distribution-check option, ensure that you have a
- backup of the destination database and that the distribution policies of the
- tables that are being copied are the same in the source and destination
- database. Copying data into segment instances with incorrect data distribution
- can cause incorrect query results and can cause database corruption.
-
-
- Handling gpcopy Errors
- When gpcopy encounters errors and quits or is cancelled by the
- user, current copy operations on tables in the destination database are rolled
- back. Copy operations that have completed are not rolled back.
- If an error occurs during the process of copying a table, or table validation
- fails, gpcopy continues copying the other specified tables.
- After gpcopy finishes, it displays a list of tables where
- errors occurred or validation failed and displays a gpcopy
- command. You can use the provided command to retry copying the failed tables.
- The gpcopy utility logs messages in log file
- gpcopy_date.log in the
- ~/gpAdminLogs directory on the master host. If you run
- multiple gpcopy commands on the same day, the utility appends
- messages to that day's log file.
- After gpcopy completes, it displays a summary of the operations
- performed. If the utility fails to copy tables, they are highlighted in summary,
- and there is a gpcopy command provided in summary for user to
- just copy the failed tables. The information is displayed at the command prompt
- and in the gpcopy log file.
- After resolving the issues that caused the copy operations to fail, you can run
- the provided command to copy the tables that failed in the previous
- gpcopy command.
-
-
- Database Connections Created by gpcopy
- The option --jobs produces
- 2*n+1 database connections. The default
- value 4, creates 9 connections.
- If you increase this option, ensure that both source and destination Greenplum
- Database is configured with a sufficient maximum concurrent connection value to
- accommodate the gpcopy connections and other concurrent
- connections such as user connections. See the Greenplum Database server
- configuration parameter max_connections in
- the Greenplum Database Reference Guide.
-
-
-
- Examples
- This command
- copies all user created databases in a source system to a destination system with
- the --full option. And drops the table and creates it again if it
- already exists in the destination.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --full --drop
- This command copies the specified databases in a source system to a
- destination system with the --dbname option. The
- --truncate option truncates the table data before copying table
- data from the source table.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --dbname database1, database2 --truncate
- This command copies the specified tables in a source system to a
- destination system with the --include-table option. The
- --skip-existing option skips the table if it already exists in
- the destination database.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --include-table database.schema.table1, database.schema.table2 --skip-existing
- This command copies the tables from the source database to the destination
- system, excluding the tables specified in the specified tables in
- /home/gpuser/mytables with
- --exclude-table-file option. The --truncate
- option truncates tables that already exist in the destination system. With the
- options --analyze and --validate count, the
- utility performs an ANALYZE operation on the copied tables, and validates the copied
- table data by comparing row counts between source and destination tables.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --dbname database1 --exclude-table-file /home/gpuser/mytables \
- --truncate --analyze --validate count
- This command specifies the --full and
- --metadata-only options to copy the complete database schema,
- including all tables, indexes, views, user-defined types (UDT), and user-defined
- functions (UDF) from all the source databases. No data is copied, The
- --drop option specifies that the table is dropped in the
- destination database before it is created again if the table exists in both the
- source and destination database.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --full --metadata-only --drop
- This command copies the specified databases in a source system to a
- destination system with the --dbname option and specifies 8
- parallel processes with the --jobs option. The command specifies
- the --truncate option to truncate the table and create it again if
- it already exists in the destination database, and uses uses ports in the range
- 2000-2010 for the parallel process connections.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --dbname database1, database2 --truncate --jobs 8 --data-port-range 2000-2010
- This command
- copies the specified database in a source system to a destination system with the
- --dbname option and specifies 16 parallel processes with the
- --jobs option. The --truncate option truncates
- the table and creates it again if it already exists in the destination database. The
- --truncate-source-after option truncates the tables in source
- database after that table data has been validated in the destination database.
- gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
- --dest-host demohost --dest-port 1234 --dest-user gpuser \
- --dbname database1 --truncate --jobs 16 --truncate-source-after --validate count
- In the previous example, if --truncate was not specified and the
- destination table contained data, validation would fail.
- This is an example table file that uses regular expressions.
-
- "test1.arc/.*/./.*/"
-"test1.c/(..)/y./.*/"
-
- In the first line, the regular expressions for the schemas, arc/.*/,
- and for the tables, /.*/, limit the transfer to all tables with the
- schema names that start with arc.
- In the second line, the regular expressions for the schemas,
- c/(..)/y, and for the tables, /.*/, limit the
- transfer to all tables with the schema names that are four characters long and that
- start with c and end with y, for example,
- crty.
- When the command is run, tables in the database test1 that satisfy
- either condition are copied to the destination database.
-
-
- See Also
- For information about migrating data, see the Greenplum Database Administrator
- Guide.
-
+ The gpcopy utility is available as a separate download for the
+ commercial release of Pivotal Greenplum Database. See the Pivotal
+ gpcopy Documentation.