diff --git a/gpdb-doc/dita/GPAdminGuide.ditamap b/gpdb-doc/dita/GPAdminGuide.ditamap index a8db968c7bc2ccc775a8412df9f580679c1721e4..f624fcad199546b6e0e3b9934210884331528761 100644 --- a/gpdb-doc/dita/GPAdminGuide.ditamap +++ b/gpdb-doc/dita/GPAdminGuide.ditamap @@ -86,7 +86,7 @@ - + diff --git a/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml b/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml index 4d26e418f91ee72293c160ecce3decca387c681e..99399662a1b7aa254caa46d3616c9d6d1b06ba1d 100644 --- a/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml +++ b/gpdb-doc/dita/admin_guide/managing/gpcopy-migrate.xml @@ -10,289 +10,9 @@ another Greenplum database. You can migrate the entire contents of a database, or just selected tables. The clusters can have different Greenplum Database versions. For example, you can use gpcopy to migrate data from Greenplum 5 to Greenplum 6.

- The gpcopy utility is available only in the commercial release of - Pivotal Greenplum Database. -

The gpcopy interface includes options to transfer one or more full - databases, or one or more database tables. A full database transfer includes the database - schema, table data, indexes, views, roles, user-defined functions, resource queues, and - resource groups. If a copied table or database does not exist in the destination cluster, - gpcopy creates it automatically, along with indexes as necessary.

-

Configuration files, including postgresql.conf and - pg_hba.conf, must be transferred manually by an administrator. Extensions - installed in the database with gppkg, such as MADlib and programming language - extensions, must be installed in the destination database by an administrator.

-

gpcopy is a command-line tool that includes these features:

    -
  • gpcopy can migrate data between Greenplum Database systems where the - source and destination systems are configured with a different number of segment - instances.
  • -
  • gpcopy provides detailed reporting and summary information about all - aspects of the copy operation.
  • -
  • -

    gpcopy allows the source table data to change while the data is being - copied. A lock is not acquired on the source table when data is copied.

    -
  • -
  • The gpcopy utility includes the - --truncate-source-after option to help migrate data from one Pivotal - Greenplum Database system to another on the same hardware, requiring minimal free space - available.
  • -

+ The gpcopy utility is available as a separate download for the + commercial release of Pivotal Greenplum Database. See the Pivotal gpcopy + Documentation. - - Prerequisites - -

The source and destination Greenplum Database systems must already exist, have network - access between all hosts, and have master host and primary segment hosts in both - systems.

-

gpcopy is dependent on the pg_dump, - pg_dumpall, and psql utilities installed with Greenplum - Database. In most cases, you run gpcopy from a Greenplum Database cluster, - so the dependencies are automatically met. If you need to run gpcopy on a - remote server, such as an ETL system, copy the gpcopy binary and install a - compatible Greenplum Clients package to meet the gpcopy - dependencies.

-

gpcopy supports migrating data from a Greenplum 5 cluster to a Greenplum - Database 6 cluster.

-

gpcopy does not currently support SSL encryption for its connections.

- -
- - Limitations for the Source and Destination Systems - -

If you are copying data between Greenplum Database clusters having different versions, each - cluster must have gpcopy installed locally. gpcopy is - installed with Pivotal Greenplum Database starting with version 5.9.

-

gpcopy transfers data from user databases only; the - postgres, template0, and template1 - databases cannot be transferred. Administrators must transfer configuration files manually - and install extensions into the destination database with gppkg.

-

gpcopy cannot copy a row that is larger than 1GB in size.

-

gpcopy does not support table data distribution checking when copying a - partitioned table that is defined with a leaf table that is an external table or if a leaf - table is defined with a distribution policy that is different from the root partitioned - table. You can copy those tables in a gpcopy operation and specify the - --no-distribution-check option to disable checking of data distribution.

- Before you perform a gpcopy operation with the - --no-distribution-check option, ensure that you have a backup of the - destination database and that the distribution policies of the tables that are being copied - are the same in the source and destination database. Copying data into segment instances - with incorrect data distribution can cause incorrect query results and can cause database - corruption. -

When transferring data between databases, you can run only one instance of - gpcopy at a time. Running multiple, concurrent instances of - gpcopy is not supported.

- -
- - Configuring Parallel Jobs - -

The degree of parallelism when running gpcopy is determined the option - --jobs. The option controls the number processes that - gpcopy runs in parallel. The default is 4. The range is from 1 to 64.

-

The --jobs value, n, produces - 2*n+1 database connections. For example, the default - --jobs value of 4 creates 9 connections.

-

If you increase this option, ensure that the Greenplum Database systems are configured with - a sufficient maximum concurrent connection value to accommodate the gpcopy - connections and any other concurrent connections (such as user connections) that you - require. See the Greenplum Database server configuration parameter - max_connections.

- -
- - Validating Copied Data - -

By default, gpcopy does not validate the data transferred. You can request - validation using the --validate=type option. You must include - --validate=type if you specify the - --truncate-source-after option. The validation type can be one of - the following:

    -
  • count - compares the row counts between the source and destination - tables.
  • -
  • md5xor - validates by selecting all rows of the source and - destination tables, converting all columns in a row to text, and then calculating the - md5 value of each row. gpcopy then performs an XOR over the MD5 values - to ensure that all rows were successfully copied for the table.
  • -

- Avoid using --append with either validation option. If you use - --append and the source table already includes rows, then either - validation method will fail due to the different number of rows in the source and - destination tables. - -
- - Addressing Failed Data Transfers - -

When gpcopy encounters errors and quits or is cancelled by the user, - current copy operations on tables in the destination database are rolled back. Copy - operations that have completed are not rolled back.

-

If an error occurs during the process of copying a table, or table validation fails, - gpcopy continues copying the other specified tables. After - gpcopy finishes, it displays a list of tables where errors occurred or - validation failed and displays a gpcopy command. You can use the provided - command to retry copying the failed tables.

-

The gpcopy utility logs messages in log file - gpcopy_date.log in the - ~/gpAdminLogs directory on the master host. If you run multiple - gpcopy commands on the same day, the utility appends messages to that - day's log file.

-

After gpcopy completes, it displays a summary of the operations performed. - If the utility fails to copy tables, they are highlighted in summary, and there is a - gpcopy command provided in summary for user to just copy the failed - tables. The information is displayed at the command prompt and in the - gpcopy log file.

-

After resolving the issues that caused the copy operations to fail, you can run the - provided command to copy the tables that failed in the previous gpcopy - command.

- -
- - Performing a Basic Data Migration - -

Follow this procedure to migrate data from one Greenplum Database system to another with - gpcopy:

    -
  1. Start both the source and destination clusters.
  2. -
  3. Perform a full database backup in the source Greenplum Database system. See .
  4. -
  5. As a best practice, source the greenplum_path.sh file in the source - Greenplum 5 installation, so that you execute gpcopy from the source - system. For - example:$ source /usr/local/greenplum-db-5.20.0/greenplum_path.sh
  6. -
  7. Use gpcopy with the --full option to migrate your - data to the destination system. A full migration automatically copies all database - objects including tables, indexes, views, roles, functions, user defined types (UDT), - resource queues, and resource groups for all user defined databases. Include the - --drop to drop any destination tables that may already exist - (recreating es as necessary). For example: - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --full --drop --validate count

    With - the above command, the utility drops tables in the destination database - (--drop option) and uses the row count of the source and - destination tables to validate the data transfer (--validate count - option). The other gpcopy options specify the source and destination - Greenplum Database system master hosts, ports, and the User ID to use to connect to - the Greenplum Database systems. While the example command performs a full system - copy, consider migrating only portions of the your data at a time, so that you can - reduce downtime while addressing table errors or validation failures that may occur - during the copy operation.

  8. -
  9. The gpcopy utility does not copy configuration files such - as postgresql.conf and - pg_hba.conf. You must set up the destination system - configuration as necessary to match the source system.
  10. -
  11. The gpcopy utility does not copy external objects such as - Greenplum Database extensions, third party jar files, and shared object files. You must - recreate these external objects as necessary to match the source system.
  12. -
  13. After migrating data you may need to modify SQL scripts, administration scripts, and - user-defined functions as necessary to account for changes in Greenplum Database version - 6.0. See the Pivotal Greenplum 6.0 Release - Notes for features or changes that may necessitate post-migration tasks.
  14. -

-

See the gpcopy reference - page for complete syntax and usage information.

- -
- - Migrating Data Between Clusters that Share Hardware - -

In order to migrate data between two clusters on the same hardware, you should have enough - free disk space to accommodate over 5 times the original data set. This enables you to - maintain 2 full copies of the primary and mirror data sets (on the source and destination - systems), plus the original backup data in ASCII format.

-

If you attempt to migrate on the same system but you run out of disk space, the - gpcopy utility provides the --truncate-source-after - option to help you complete the operation with only a minimum of free disk space. The - --truncate-source-after option instructs the utility to truncate each - source table after successfully copying the table data to the destination cluster and - validating that the copy succeeded.

-

If you choose to use --truncate-source-after, consider the following:

    -
  • Using the --truncate-source-after option does not allow for an easy - rollback to the source system to its original condition if errors occur or validation - checks fail during the gpcopy operation. Table errors or validation - failures during the gpcopy operation can leave some tables remaining in - the source cluster, while other tables may be empty (having been truncated after being - copied to the new cluster). Back up all source data before using gpcopy - with --truncate-source-after.
  • -
  • Migrating data with --truncate-source-after still requires an amount - of free disk space equal to the sum of the largest tables that you will migrate in a - single batch using gpcopy. For example, with a --jobs - setting of 5, you must ensure that you have free space equal to the sum of the 5 largest - tables copied in the batch. The procedure below provides sample commands to determine - the largest table sizes.
  • -
  • You must use the --validate option with - --truncate-source-after to ensure that data is successfully copied - before source tables are truncated.
  • -

-

If you attempt to use the instructions in - to migrate systems that use the same hardware, but you do not have the required free - space:

    -
  1. Start both the source and destination clusters.
  2. -
  3. Perform a full database backup in the source Greenplum Database system. See .
  4. -
  5. Determine if you have enough free space to migrate your data using - --truncate-source-after. Migrating data "in-place" requires an amount - of free disk space equal to the sum of the largest tables that you will migrate in a - single batch using gpcopy. For example, if you want to use a - --jobs setting of 5, ensure that you have free space equal to the sum - of the 5 largest tables copied in the batch.

    The following query lists the largest 5 - tables in your source system; modify the query as needed depending on the - --jobs setting you intend to - use:gpadmin=# SELECT n.nspname, c.relname, c.relstorage, pg_relation_size(c.oid) -FROM -pg_class c JOIN pg_namespace n ON (c.relnamespace=n.oid) -JOIN pg_catalog.gp_distribution_policy p ON (c.oid = p.localoid) -WHERE -n.nspname NOT IN ('gpexpand', 'pg_bitmap', 'information_schema', 'gp_toolkit') -AND n.nspname NOT LIKE 'pg_temp_%%' AND c.relstorage <> 'v' -ORDER BY 4 DESC LIMIT 5; -

    Either - free enough disk space to cover the sum of the table sizes shown in the above query, - or consider using a smaller --jobs value to reduce the free space - requirements.

  6. -
  7. As a best practice, source the greenplum_path.sh file in the source - Greenplum 5 installation, so that you execute gpcopy from the source - system. For - example:$ source /usr/local/greenplum-db-5.20.0/greenplum_path.sh
  8. -
  9. Use a gpcopy with the --truncate-source-after and - --validate options to migrate your data to the destination system. A - full migration automatically copies all database objects including tables, es, views, - roles, functions, user defined types (UDT), resource queues, and resource groups for all - user defined databases. Include the --drop to drop any destination - tables that may already exist (recreating es as necessary). The - --truncate-source-after truncates each source table, only after - copying and validating the table data in the destination system. For example: - gpcopy --source-host my_host --source-port 1234 --source-user gpuser \ ---dest-host my_host --dest-port 1235 --dest-user gpuser --full --drop \ --truncate-source-after --analyze --validate count

    The - above command performs a full database copy, first dropping tables in the destination - database (--drop option) if they already exist. - gpcopy truncates each table in the source system only after - successfully copying and validating the table data in the destination system. The - other gpcopy options specify the source and destination Greenplum - Database system master hosts, ports, and the User ID to use to connect to the - Greenplum Database systems. - While the example command performs a full system copy, consider migrating only - portions of the your data at a time, so that you can reduce downtime while - addressing table errors or validation failures that may occur during the copy - operation.

  10. -
  11. The gpcopy utility does not copy configuration files such - as postgresql.conf and - pg_hba.conf. You must set up the destination system - configuration as necessary to match the source system.
  12. -
  13. The gpcopy utility does not copy external objects such as - Greenplum Database extensions, third party jar files, and shared object files. You must - recreate these external objects as necessary to match the source system.
  14. -
  15. After migrating data you may need to modify SQL scripts, administration scripts, and - user-defined functions as necessary to account for changes in Greenplum Database version - 6.0. See the Pivotal Greenplum 6.0 Release - Notes for features and changes that may necessitate post-migration tasks.
  16. -

-

See the gpcopy reference - page for complete syntax and usage information.

- -
diff --git a/gpdb-doc/dita/admin_guide/managing/partII.xml b/gpdb-doc/dita/admin_guide/managing/partII.xml index 1bd24b7d0139a4b4fbaffc6baf73e3bafc2aaae7..d2a780d4b84a17f9a3071da5c981adec913b8f53 100644 --- a/gpdb-doc/dita/admin_guide/managing/partII.xml +++ b/gpdb-doc/dita/admin_guide/managing/partII.xml @@ -16,7 +16,7 @@ />
  • -
  • +
  • diff --git a/gpdb-doc/dita/utility_guide/admin_utilities/gpcopy.xml b/gpdb-doc/dita/utility_guide/admin_utilities/gpcopy.xml index 5a0a20af04f0fd316949452640dc6c117f42b46d..cff7b6f9b556fea85ee03e4518f9d2b24d99e9d8 100644 --- a/gpdb-doc/dita/utility_guide/admin_utilities/gpcopy.xml +++ b/gpdb-doc/dita/utility_guide/admin_utilities/gpcopy.xml @@ -7,864 +7,9 @@

    The gpcopy utility copies objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system.

    - The gpcopy utility is available only in the commercial release of - Pivotal Greenplum Database. -
    - Synopsis - gpcopy - { --full | - { --dbname database1[, database2 ... ] - [ --dest-dbname dest-db1[, dest-db2 ... ] ] } | - --include-table db.schema.table[, db.schema1.table1 ... ] - [ --dest-table db.schema.table[, db.schema1.table1 ... ] | - --include-table-file table-file1 - [ --include-table-file table-file2] ... ] | - --include-table-json json-table-file1 - [ --include-table-json json-table-file2] ... ] } - [ --metadata-only ] - [ --exclude-table db.schema.table[, db.schema1.table1 ... ] ] - [ --exclude-table-file table-file1 ] - [ --exclude-table-file table-file2 ] ... ] ] - - { --dest-host dest_host [ --dest-port dest_port ] - [ --dest-user dest_user ] } - [ --source-host source_host [ --source-port source_port ] - [ --source-user source_user ] ] - [ --jobs int ] - [ --on-segment-threshold int ] - [ --parallelize-leaf-partitions ] - [ --data-port-range lower_port-upper_port ] - - { --skip-existing | --truncate | --drop | --append } - [ --analyze ] - [ --no-compression ] - [ --no-distribution-check ] - [ --truncate-source-after [--yes ] ] - [ --validate type ] - - [ --dry-run ] - [ --quiet | --debug ] - -gpcopy --version -gpcopy --help -
    -
    - Description -

    The gpcopy utility copies database objects from a source Greenplum - Database system to a destination system. You can perform one of the following types - of copy operations:

    -
      -
    • Copy a Greenplum Database system with the --full - option. This option copies all database objects including, tables, table data, - indexes, views, users, roles, functions, and resource queues for all - user-defined databases to a different destination system.
    • -
    • Copy a set of user-defined database tables to a destination - system.
        -
      • The --dbname option copies all user-defined tables, - table data, and re-creates the table indexes from specified - databases.
      • -
      • The --include-table - --include-table-file, or - --include-table-json option copies a specified set - of user-defined tables, table data, and re-creates the table indexes.
      • -
      • The --exclude-table and - --exclude-table-file options exclude a specified - set of user-defined tables and table data to be copied.
      • -
    • -
    • Copy only the database schemas with the --metadata-only - option.
    • -
    -

    When running gpcopy, you must specify the data to copy from the - source database and how to manage data in the destination database.

      -
    • You must use one and only one of these options to specify the data to be - copied from the source database: --full, - --dbname, --include-table, - --include-table-file, or - --include-table-json.
    • -
    • You must use one of these options to specify how to manage data in the - destination database: --skip-existing, - --truncate, --drop, or - --append.

      If you specify both the - --append and --validation options, - validation of source table data fails if a destination table contains - data.

    • -

    -

    If you specify the option --truncate-source-after, you must also - specify the --validate option. When - --truncate-source-after is specified, gpcopy - truncates the source table after the table data is copied and destination table data - has been validated.

    -
    -
    - Prerequisites -

    The user IDs connecting to the source and destination Greenplum Database systems, the - --source-user and --dest-user, must have - appropriate access to the systems.

    -

    If Kerberos authentication is enabled for Greenplum Database, gpcopy - can authenticate with Kerberos. Run the kinit command to obtain a - ticket-granting ticket from the KDC server before you execute - gpcopy. The PGKRBSRVNAME environment variable - specifies the Kerberos service name for Greenplum Database. If your Greenplum - Database service name is different than the default (postgres) set - the PGKRBSRVNAME environment variable with the correct service name - before you run gpcopy. See for information about enabling - Kerberos authentication with Greenplum Database.

    -

    The source and destination Greenplum Database segment hosts need to be able to - communicate with each other. To ensure that the segment hosts can communicate, you - can use a tool such as the Linux netperf utility.

    -

    When the --full option is specified, resource groups and table - spaces are copied, however, the utility does not configure the destination system. - For example, you must configure the system to use resource groups and create the - host directories for the tablespaces.

    -
    -
    - Options for Choosing Data to Copy -

    gpcopy provides a range of options to define the scope of data that - is copied. You can choose options to perform a full Greenplum cluster migration, - copy specific databases or tables, or only portions of a table using a SQL query. - Additional options enable you to exclude certain tables from being copied, or to - change the destination database into which a table's data is copied. The special - --metadata-only instructs gpcopy to create the - necessary schema for the selected source tables, but copy no table data.

    -

    You must use at least one of the options --full, - --dbname, --include-table, - --include-table-file, or --include-table-json. - Use other options as needed to exclude data from the copy or to change the - destination database for copied tables.

    - - - --full - This option performs a migration of a Greenplum Database source system to a - destination system. - A migration copies all database objects including, tables, indexes, views, - roles, functions, user-defined types (UDT), resource queues, and resource - groups for all user-defined databases. The default databases, - postgres, template0, and - template1, are not copied. - This option cannot be specified with the --dbname, - --include-table, --include-table-file, - or --include-table-json options. - - - --dbname database - A source database to copy. To copy multiple databases to the destination - system specify a comma-separated list of databases with no spaces between - the names. All the user-defined tables and table data are copied to the - destination system. - If the source database does not exist, gpcopy returns an - error and quits. If a destination database does not exist a database is - created. - Not valid with the --full, - --include-table, --include-table-file, or - --include-table-json options. - Alternatively, you can copy a set of tables with the - --include-table, --include-table-file, - or --include-table-json option. - - - --dest-dbname database - To copy a database to a different destination database, specify the name of - the destination database. For multiple databases, specify a comma-separated - list of databases with no spaces between the names. The number of database - names must match the number of names specified in the - --dbname option. The utility copies the source - databases to the destination databases in the listed order. In this example, - db1 is copied to destdb1, - db2 is copied to destdb2, and - db3 is copied to db3. - gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop - If the source database does not exist, gpcopy returns an - error and quits. If a destination database does not exist a database is - created. - Valid only with the --dbname option. - - - --include-table - db.schema.table - One or more tables from the source database system to copy. You must provide - fully-qualified table names - (database.schema.table). - You cannot specify views or system catalog tables. To copy multiple tables, - include a comma-separated list of table names or use regular expressions to - describe a set of tables. You can optionally use - --dest-table to change the databases into which tables - in --include-table are copied. - You can use Go language regular expressions in the database, schema, and - table portions of the fully-qualified table name to define a set of input - tables. The regular expression pattern must be enclosed in slashes - (/RE_pattern/). For example, - --include-table mytest.public.demo/.*/ specifies all - tables that begin with demo in the mytest - database in the public schema. - The following two examples for the --include-table option - are equivalent. They both specify a set of tables that begins with demo and - ends with zero or more - digits.--include-table testdb.schema1.demo/[0-9]*/ ---include-table testdb.schema1./demo[0-9]*/ - Regular expression capture groups in the database portion of the - fully-qualified name can be referenced in --dest-table to - change the destination database for a table. (Using capture groups to change - the schema or table name is not currently supported.) - If the source table does not exist, gpcopy returns an error - and quits. - If the destination table or database does not exist, it is created. Only the - table and table data are copied, not dependent objects. Indexes are - re-created only if the --drop option is specified. - Dependent objects are not copied. - This option is not allowed with the options: --full, - --dbname, --include-table-file, or - --include-table-json. - - - --dest-table - db.schema.table - (Optional.) Changes the database where tables defined with - --include-table are copied. Greenplum does not - currently support changing the destination schema or table name. - You must provide fully-qualified table names - (database.schema.table). - Specify multiple tables using either a comma-separated list or by - referencing regular expression capture groups that were defined with - --include-table. If you use a comma-separated list of - tables with --include-table, use the same number position - in the --dest-table list to change the destination database - of the corresponding table. For example, to move only the second table - provided with an option - like--include-table mytest.public.table1,mytest.public.table2use - an option similar - to:--dest-table production.public.table1,production.public.table2 - If you use Go language regular expressions capture groups in - --include-table to define a set of tables, you can - reference the capture groups in --dest-table to rename the - destination database. Capture groups referenced in - --include-table or --dest-table must - use forward slash delimiters (/) to indicate regular expression processing. - For example, to reference the capture group defined in the database string - of--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/use - an option similar - to:--dest-table productiondb/$1/.myschema/$1/.mytable/$1/Note - that the capture group numbering starts at 1 and restarts for each component - of the fully-qualified table name (database, schema, and table). Therefore, - in the above example each capture group is referenced with - /$1/ in --dest-table. You cannot - rename the destination schema or table name components. - If regular expression rules would cause more than one source table name to - be remapped to the same destination table, then the - --skip-existing, --truncate, - --drop, or --append options determine - how gpcopy handles subsequent copy requests to the existing - table. - - - --include-table-file table-file - The location and name of a text file that defines the tables and data to - copy. To use multiple files, specify this option for each - file.--include-table-file <path_to_file1> --include-table-file <path_to_file2> - In the text file, specify a single fully qualified table per line - (database.schema.table). You cannot specify views or system - catalog tables. - You can use Go language regular expression syntax to select multiple tables. - See the --dest-table option for information about using - regular expressions to select tables. - This option cannot be specified with the --full, - --dbname, or --include-table - options. - - - --include-table-json json-table-file - The location and name of a JSON-format file that defines the tables and data - to copy. In contrast to the text file used with - --include-table-file, the JSON file can include a - destination table name used to change the database into which the table data - is copied. - The JSON file that you provide must define one or more objects with - key-value pairs to describe a source table, an optional query of the source - table data to copy, and an optional destination table to indicate the - database where the data is copied. Place the name-value pairs for multiple - table objects in a JSON array. For - example:[ - { - "source": "database.schema.table", - "sql": "query" - "dest": "database.schema.table" - }, - { - ... } -] - You cannot specify views or system catalog tables as source tables. - Any query that you provide must reference a single source table and output - to the same columns of the source table. - If the query includes an ORDER BY clause, then the target - Greenplum cluster must be the same size (the same number of segments) as the - source cluster. - Keep in mind that certain characters must be escaped in strings in order for - them to be parsed in JSON, specifically: - - Character - Escape Sequence - - - Backspace - \b - - - Form Feed - \f - - - Newline - \n - - - Tab - \t - - - Double Quote - \" - - - Backslash - \\ - - - If the file cannot be parsed as JSON, gpcopy exits with an - error. - You can use Go language regular expression syntax to select multiple tables. - Capture groups defined in the source: key can be referenced - in the dest: key to change the destination database where - the data is copied. For - example:[ - { - "source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/", - "dest": "productiondb/$1/.myschema/$1/.mytable/$1/" - }, -] - See the --dest-table option for more information about - using regular expressions and capture groups. If you use regular - expressions to copy multiple tables, you cannot include the - sql: key. - - This option cannot be specified with the --full, - --dbname, --include-table options. You - cannot use this option with --parallelize-leaf-partitions - if the JSON file includes a sql: key that queries a - partitioned table. - - - --metadata-only - Create only the schemas specified by the command. Data is not - transferred. - If specified with the --full option, - gpcopy replicates the complete database schema, - including all tables, indexes, views, user-defined types (UDT), and - user-defined functions (UDF) for the source databases. No data is - transferred. - If you specify databases with the --dbname option or tables - with the --include-table - --include-table-file, or - --include-table-json options, gpcopy - creates only the tables and indexes. No data is transferred. - This option cannot be used with --truncate option. - - - --exclude-table - db.schema.table - A table from the source database system to exclude from transfer. The fully - qualified table name must be specified - (database.schema.table). - To exclude multiple tables, specify a comma-separated list of table - names. - A set of tables can be specified using the Go language regular expression - syntax. See the --include-table option for information - about using regular expressions. - Only the specified tables are excluded, not dependent objects. You cannot - specify views or system catalog tables. - This option must be specified with one of these options: - --full, --dbname, - --include-table, --include-table-file, - or --include-table-json. If the option - --exclude-table results in no table to copy, the - database or schema is not created in the destination system. - - - --exclude-table-file table-file - The location and name of file containing a list of fully qualified table - names to exclude from copying to the destination system. In the text file, - specify a single fully qualified table per line - (database.schema.table). - To specify multiple files, specify this option for each - file.--exclude-table-file <path_to_file1> --exclude-table-file <path_to_file2> - In the file, a set of tables can be specified using the Go language regular - expression syntax. See the --include-table option for information - about using regular expressions. - If a source table does not exist, gpcopy displays a - warning. - Only the specified tables are excluded. You cannot specify views or system - catalog tables. - This option must be specified with one of these options: - --full, --dbname, - --include-table, --include-table-file, - or --include-table-json. If the option - --exclude-table results in no tables to copy, the - database or schema is not created in the destination system. - - -
    -
    - Connection Options -

    The following options specify connection information for the destination and source - Greenplum clusters. Only --dest-host is required. - --jobs, --on-segment-threshold, and - --parallelize-leaf-partitions affect the number of simultaneous - connections used for data transfer. --data-port-range defines the - ports used for data transfer to destination segments or the destination - master.gpcopy does not currently support SSL - encryption for its connections.

    - - - --dest-host dest_host - Required. The destination Greenplum Database master segment hostname or IP - address. - - - --dest-port dest_port - The destination Greenplum Database master segment port number. If - --dest-port is not specified, then the default is - 5432. - - - --dest-user dest_user - The user ID that is used to connect to the destination Greenplum master. If - not specified, the default is gpadmin. - - - --source-host source_host - The source Greenplum Database master segment host name or IP address. If not - specified, the default host is the system running gpcopy - (127.0.0.1). - - - --source-port source_port - The source Greenplum Database master port number. If not specified, the - default is 5432. - - - --source-user source_user - The user ID that is used to connect to the source Greenplum Database system. - If not specified, the default is gpadmin. - - - --jobs int - The number processes that gpcopy runs in parallel. The - default is 4. The range is from 1 to 64. - The option --jobs produces - 2*n+1 database connections. The - default value, 4, creates 9 connections. - If you increase this option, ensure that Greenplum Database systems are - configured with a sufficient maximum concurrent connection value to - accommodate the gpcopy connections and other concurrent - connections such as user connections. See the Greenplum Database server - configuration parameter max_connections. - - - --on-segment-threshold int - Specifies the number of rows that determines when gpcopy - copies a table using the Greenplum Database source and destination master - instead of the source and destination segment instances. The default value - is 10000 rows. If a table contains 10000 rows or less, the table is copied - using the Greenplum Database master. - The value -1 disables copying tables using the master. All - tables are copied using the segment instances. - For smaller tables, copying tables using the Greenplum Database master is - more efficient than using segment instances. - - - --parallelize-leaf-partitions - If specified, the utility copies the leaf partition tables of a partitioned - table in parallel. The default is to copy the partitioned table as a single - table based on the root partition table. - If the --validate option is also specified, the utility - validates each leaf partition table during the copy process and then - validates the entire partitioned table. - This option cannot be specified with --include-table-json - if the JSON file includes a sql: key that queries a - partitioned table. - - - --data-port-range - lower_port-upper_port - A range of port numbers to use on Greenplum Database destination hosts for - data transfer. This applies to destination segment hosts or, if data is - transferred using the master segment, only the master segment host. - gpcopy uses the first available port specified in the - range (inclusive). lower_port must be greater than or - equal to 1024 (to avoid reserved system ports), and - upper_port must be a greater value. - The number of ports specified by the range must be greater than or equal to - the number of parallel processes created with --jobs, if - specified. - If --data-port-range is not specified, then - gpcopy uses any available port. - - -
    -
    - Options for Configuring how Data is Copied -

    gpcopy provides additional options that affect the way data is - copied between systems. You must use one of these options to specify how to manage - data in the destination database: --skip-existing, - --truncate, --drop, or - --append. Other options can be included as necessary, for - example, to perform additional data validation.

    - - - --skip-existing - Specify this option to skip copying a table from the source database if the - table already exists in the destination database. - At most, only one of the options can be specified: - --skip-existing, --truncate, - --drop, or --append. - - - --truncate - Specify this option to truncate the table that is in the destination - database if it already exists. - At most, only one of the options can be specified: - --skip-existing, --truncate, - --drop, or --append. - - - --drop - Specify this option to drop the table that is in the destination database if - it already exists. Before copying table data, gpcopy drops - the table and creates it again. - At most, only one of the options can be specified: - --skip-existing, --truncate, - --drop, or --append. - - - --append - Append data to the table in the destination database if it already exists. - At most, only one of the options can be specified: - --skip-existing, --truncate, - --drop, or --append. - - - --analyze - Run the ANALYZE command on non-system tables. The default - is to not run the ANALYZE command. The operation is - performed for each table after the table data is copied. - - - --no-compression - If specified, data is transferred without compression. By default, - gpcopy compresses data during transfer from the source - to the destination database when copying data to a different host. - The utility does not compress data when copying data to the same host. - - - --no-distribution-check - Specify this option to disable table data distribution checking. By default, - gpcopy performs data distribution checking to ensure - data is distributed to segment instances correctly. If distribution checking - fails, the table copy fails. - The utility does not support table data distribution checking when copying a - partitioned table that is defined with a leaf table that is an external - table or if a leaf table is defined with a distribution policy that is - different from the root partitioned table. Before you - perform a gpcopy operation with the - --no-distribution-check option, ensure that you - have a backup of the destination database and that the distribution - policies of the tables that are being copied are the same in the source - and destination database. Copying data into segment instances with - incorrect data distribution can cause incorrect query results and can - cause database corruption. - - - --truncate-source-after - Specify this option to truncate the table that is in the source database - after gpcopy copies the table and validates the table data - in the destination database. - If you specify this option, you must also specify the - --validate option. - - --yes - Optional. Automatic confirmation to truncate source table data - after copying and validating table data. The prompt to truncate - source tables does not appear. The default is to prompt to - confirm truncating source table data after copying and - validating the data. - - - - - --validate type - Perform data validation on the table data in the destination database after - the table data is copied. These are the supported type of validation.
      -
    • count - compares row counts between source and - destination table data.
    • -
    • md5xor - calculates the MD5 value of all rows, then - performs an XOR over the MD5 values
    • -
    - If you specify the --append option, and the destination - table contains data, validation fails for the table. - If validation for a table fails, gpcopy rolls back the - destination table. -
    -
    -
    -
    - Additional Options - - - --dry-run - When you specify this option, gpcopy generates a list of - the migration operations that would have been performed with the specified - options. The data is not migrated. - The information is displayed at the command line and written to the log - file. - - - --quiet - If specified, suppress status messages at the command prompt. The messages - are sent only to the log file. Higher level messages such as warning and - error messages are still displayed. - This option cannot be specified with the --debug - option. - - - --debug - If specified, debug messages are displayed at the command prompt. - This option cannot be specified with the --quiet - option. - - - --version - Displays the version of this utility. - - - --help - Displays the online help. - - -
    -
    - Notes -

    If a gpcopy command specifies an invalid option, or specifies a - source table or database that does not exist, the utility returns an error and - quits. No data is copied.

    -

    The source table data can change while the data is being copied. A lock is not - acquired on the source table when data is copied.

    -

    The utility cannot copy a row with a width greater than 1GB (a PostgreSQL - limitation).

    -

    If you copy a set of database tables with the --dbname, - --include-table, --include-table-file, or - --include-table-json options, and the destination database does - not exist, the utility creates the database before copying the tables. If the - destination database exists, the utility creates the tables in the database if - required.

    -

    The gpcopy utility does not copy dependent database objects unless - you specify the --full option. For example, if a table has a - default value on a column that is a user-defined function, that function must exist - in the destination system database when using the --dbname, - --include-table, --include-table-file, or - --include-table-json options.

    -

    When copying tables, sequences defined for tables are considered table data and are - copied. A sequence is copied if a table is created with a serial column or if a - sequence is specified as a default value. The sequences are reset if you specify the - --truncate option.

    -

    The utility re-creates table indexes only with --full or - --drop options.

    -

    The gpcopy utility does not copy configuration files such as - postgresql.conf and pg_hba.conf. You must set - up the destination system configuration separately.

    -

    The gpcopy utility does not copy external objects such as Greenplum - Database extensions, third party jar files, and shared object files. You must - install the external objects separately.

    -

    gpcopy does not currently support SSL encryption for its - connections.

    - -

    Specifying Table Names with Special Characters

    -

    When you list a table with the option --include-table - or --exclude-table, and the table name or schema name contains - single quote ('), double quote (") , or - backslash (\), you must escape the character with a backslash - (\). These are the escaped characters: - \', \", \\. If the table - or schema name contains a period (.), comma - (,), or a space character, you must enclose the entire name in - single quotes (') and enclose the table and schema in double - quotes (") at the command prompt shell. In this example, the - fully qualified name of the table is - testdb.schema'1.table"test.--include-table 'testdb."schema\'1"."table\"test"'

    -

    For fully qualified table names listed in a file that is used with the - options --include-table-file, - --include-table-json, or - --exclude-table-file, if the table name or schema name - contains a period (.), the name must be enclosed in double - quotes ("). In this example, the table - table.test is in the testdb, and belongs - to the schema - schema.1.testdb."schema.1"."table.test"

    -
    - -

    Copying Partitioned Tables

    -

    When copying data for a partitioned table, if a leaf partition has been exchanged - with an external table, that leaf partition is created, but data is not copied.

    -

    If you specify copying leaf partitions of a partitioned table with an option such - as --include-table or --exclude-table, - gpcopy creates the partitioned table if it does not exist. - Only the data for the specified leaf partitions are added to the partitioned - table. Specifying individual leaf partitions is useful when the entire - partitioned tabled does not need to be copied.

    -

    gpcopy does not support table data distribution checking when - copying a partitioned table that is defined with a leaf table that is an - external table or if a leaf table is defined with a distribution policy that is - different from the root partitioned table. You can copy those tables in a - gpcopy operation and specify the option - --no-distribution-check to disable checking of data - distribution.

    - Before you perform a gpcopy operation with the - --no-distribution-check option, ensure that you have a - backup of the destination database and that the distribution policies of the - tables that are being copied are the same in the source and destination - database. Copying data into segment instances with incorrect data distribution - can cause incorrect query results and can cause database corruption. -
    - -

    Handling gpcopy Errors

    -

    When gpcopy encounters errors and quits or is cancelled by the - user, current copy operations on tables in the destination database are rolled - back. Copy operations that have completed are not rolled back.

    -

    If an error occurs during the process of copying a table, or table validation - fails, gpcopy continues copying the other specified tables. - After gpcopy finishes, it displays a list of tables where - errors occurred or validation failed and displays a gpcopy - command. You can use the provided command to retry copying the failed tables.

    -

    The gpcopy utility logs messages in log file - gpcopy_date.log in the - ~/gpAdminLogs directory on the master host. If you run - multiple gpcopy commands on the same day, the utility appends - messages to that day's log file.

    -

    After gpcopy completes, it displays a summary of the operations - performed. If the utility fails to copy tables, they are highlighted in summary, - and there is a gpcopy command provided in summary for user to - just copy the failed tables. The information is displayed at the command prompt - and in the gpcopy log file.

    -

    After resolving the issues that caused the copy operations to fail, you can run - the provided command to copy the tables that failed in the previous - gpcopy command.

    -
    - -

    Database Connections Created by gpcopy

    -

    The option --jobs produces - 2*n+1 database connections. The default - value 4, creates 9 connections.

    -

    If you increase this option, ensure that both source and destination Greenplum - Database is configured with a sufficient maximum concurrent connection value to - accommodate the gpcopy connections and other concurrent - connections such as user connections. See the Greenplum Database server - configuration parameter max_connections in - the Greenplum Database Reference Guide.

    -
    -
    -
    - Examples -

    This command - copies all user created databases in a source system to a destination system with - the --full option. And drops the table and creates it again if it - already exists in the destination.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --full --drop -

    This command copies the specified databases in a source system to a - destination system with the --dbname option. The - --truncate option truncates the table data before copying table - data from the source table.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --dbname database1, database2 --truncate -

    This command copies the specified tables in a source system to a - destination system with the --include-table option. The - --skip-existing option skips the table if it already exists in - the destination database.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --include-table database.schema.table1, database.schema.table2 --skip-existing -

    This command copies the tables from the source database to the destination - system, excluding the tables specified in the specified tables in - /home/gpuser/mytables with - --exclude-table-file option. The --truncate - option truncates tables that already exist in the destination system. With the - options --analyze and --validate count, the - utility performs an ANALYZE operation on the copied tables, and validates the copied - table data by comparing row counts between source and destination tables.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --dbname database1 --exclude-table-file /home/gpuser/mytables \ - --truncate --analyze --validate count -

    This command specifies the --full and - --metadata-only options to copy the complete database schema, - including all tables, indexes, views, user-defined types (UDT), and user-defined - functions (UDF) from all the source databases. No data is copied, The - --drop option specifies that the table is dropped in the - destination database before it is created again if the table exists in both the - source and destination database.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --full --metadata-only --drop -

    This command copies the specified databases in a source system to a - destination system with the --dbname option and specifies 8 - parallel processes with the --jobs option. The command specifies - the --truncate option to truncate the table and create it again if - it already exists in the destination database, and uses uses ports in the range - 2000-2010 for the parallel process connections.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --dbname database1, database2 --truncate --jobs 8 --data-port-range 2000-2010 -

    This command - copies the specified database in a source system to a destination system with the - --dbname option and specifies 16 parallel processes with the - --jobs option. The --truncate option truncates - the table and creates it again if it already exists in the destination database. The - --truncate-source-after option truncates the tables in source - database after that table data has been validated in the destination database.

    - gpcopy --source-host mytest --source-port 1234 --source-user gpuser \ - --dest-host demohost --dest-port 1234 --dest-user gpuser \ - --dbname database1 --truncate --jobs 16 --truncate-source-after --validate count -

    In the previous example, if --truncate was not specified and the - destination table contained data, validation would fail.

    -

    This is an example table file that uses regular expressions.

    -

    - "test1.arc/.*/./.*/" -"test1.c/(..)/y./.*/" -

    -

    In the first line, the regular expressions for the schemas, arc/.*/, - and for the tables, /.*/, limit the transfer to all tables with the - schema names that start with arc.

    -

    In the second line, the regular expressions for the schemas, - c/(..)/y, and for the tables, /.*/, limit the - transfer to all tables with the schema names that are four characters long and that - start with c and end with y, for example, - crty.

    -

    When the command is run, tables in the database test1 that satisfy - either condition are copied to the destination database.

    -
    -
    - See Also -

    For information about migrating data, see the Greenplum Database Administrator - Guide.

    -
    + The gpcopy utility is available as a separate download for the + commercial release of Pivotal Greenplum Database. See the Pivotal + gpcopy Documentation.