Commit 5f05af7d authored by dyozie

Docs: Reducing the amount of pivotal-specific gpcopy docs in the oss repo

Parent d7d09b27
@@ -86,7 +86,7 @@
<chapter href="admin_guide/managing/backup.ditamap" format="ditamap"/>
<chapter href="admin_guide/expand/expand.ditamap" navtitle="Expanding a Greenplum System"
format="ditamap"/>
<chapter href="admin_guide/managing/gpcopy-migrate.xml"/>
<chapter href="admin_guide/managing/gpcopy-migrate.xml" otherprops="pivotal"/>
<chapter href="admin_guide/managing/monitor.xml" navtitle="Monitoring a Greenplum System"/>
<chapter href="admin_guide/managing/maintain.xml" navtitle="Routine System Maintenance Tasks"/>
<chapter href="admin_guide/monitoring/monitoring.dita"/>
@@ -10,289 +10,9 @@
another Greenplum database. You can migrate the entire contents of a database, or just
selected tables. The clusters can have different Greenplum Database versions. For example, you
can use <codeph>gpcopy</codeph> to migrate data from Greenplum 5 to Greenplum 6.</p>
<note> The <codeph>gpcopy</codeph> utility is available only in the commercial release of
Pivotal Greenplum Database.</note>
<p>The <codeph>gpcopy</codeph> interface includes options to transfer one or more full
databases, or one or more database tables. A full database transfer includes the database
schema, table data, indexes, views, roles, user-defined functions, resource queues, and
resource groups. If a copied table or database does not exist in the destination cluster,
<codeph>gpcopy</codeph> creates it automatically, along with indexes as necessary.</p>
<p>Configuration files, including <codeph>postgresql.conf</codeph> and
<codeph>pg_hba.conf</codeph>, must be transferred manually by an administrator. Extensions
installed in the database with <codeph>gppkg</codeph>, such as MADlib and programming language
extensions, must be installed in the destination database by an administrator. </p>
<p><codeph>gpcopy</codeph> is a command-line tool that includes these features:<ul
id="ul_ibs_vsp_zdb">
<li><codeph>gpcopy</codeph> can migrate data between Greenplum Database systems where the
source and destination systems are configured with a different number of segment
instances.</li>
<li><codeph>gpcopy</codeph> provides detailed reporting and summary information about all
aspects of the copy operation.</li>
<li>
<p><codeph>gpcopy</codeph> allows the source table data to change while the data is being
copied. A lock is not acquired on the source table when data is copied. </p>
</li>
<li>The <codeph>gpcopy</codeph> utility includes the
<codeph>--truncate-source-after</codeph> option to help migrate data from one Pivotal
          Greenplum Database system to another on the same hardware while requiring only minimal
          free disk space.</li>
</ul></p>
<note> The <codeph>gpcopy</codeph> utility is available as a separate download for the
commercial release of Pivotal Greenplum Database. See the <xref
href="https://gpdb.docs.pivotal.io/gpcopy" format="html" scope="external">Pivotal gpcopy
Documentation</xref>.</note>
</body>
<topic id="topic_psq_dsp_zdb">
<title>Prerequisites</title>
<body>
<p>The source and destination Greenplum Database systems must already exist, have network
access between all hosts, and have master host and primary segment hosts in both
systems.</p>
<p><codeph>gpcopy</codeph> is dependent on the <codeph>pg_dump</codeph>,
<codeph>pg_dumpall</codeph>, and <codeph>psql</codeph> utilities installed with Greenplum
Database. In most cases, you run <codeph>gpcopy</codeph> from a Greenplum Database cluster,
so the dependencies are automatically met. If you need to run <codeph>gpcopy</codeph> on a
remote server, such as an ETL system, copy the <codeph>gpcopy</codeph> binary and install a
compatible <xref href="https://network.pivotal.io/products/pivotal-gpdb" scope="external"
format="html">Greenplum Clients</xref> package to meet the <codeph>gpcopy</codeph>
dependencies.</p>
<p><codeph>gpcopy</codeph> supports migrating data from a Greenplum 5 cluster to a Greenplum
Database 6 cluster.</p>
<p><codeph>gpcopy</codeph> does not currently support SSL encryption for its connections.</p>
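      <p>As a quick sanity check, you can confirm that the <codeph>gpcopy</codeph> dependencies
        resolve on your <codeph>PATH</codeph> before starting a migration. This is a minimal
        sketch; the installation path shown is only an example, so substitute the path of your own
        Greenplum installation:</p>
      <codeblock>$ source /usr/local/greenplum-db/greenplum_path.sh
$ which pg_dump pg_dumpall psql   # all three utilities must resolve
$ gpcopy --version                # confirms the gpcopy binary is available</codeblock>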
</body>
</topic>
<topic id="topic_qwl_2rp_zdb">
<title>Limitations for the Source and Destination Systems</title>
<body>
<p>If you are copying data between Greenplum Database clusters having different versions, each
cluster must have <codeph>gpcopy</codeph> installed locally. <codeph>gpcopy</codeph> is
installed with Pivotal Greenplum Database starting with version 5.9.</p>
<p><codeph>gpcopy</codeph> transfers data from user databases only; the
<codeph>postgres</codeph>, <codeph>template0</codeph>, and <codeph>template1</codeph>
databases cannot be transferred. Administrators must transfer configuration files manually
and install extensions into the destination database with <codeph>gppkg</codeph>.</p>
<p><codeph>gpcopy</codeph> cannot copy a row that is larger than 1GB in size.</p>
<p><codeph>gpcopy</codeph> does not support table data distribution checking when copying a
partitioned table that is defined with a leaf table that is an external table or if a leaf
table is defined with a distribution policy that is different from the root partitioned
table. You can copy those tables in a <codeph>gpcopy</codeph> operation and specify the
<codeph>--no-distribution-check</codeph> option to disable checking of data distribution. </p>
<note type="warning">Before you perform a <codeph>gpcopy</codeph> operation with the
<codeph>--no-distribution-check</codeph> option, ensure that you have a backup of the
destination database and that the distribution policies of the tables that are being copied
are the same in the source and destination database. Copying data into segment instances
with incorrect data distribution can cause incorrect query results and can cause database
corruption.</note>
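      <p>For example, a hypothetical partitioned table with an external leaf partition could be
        copied with distribution checking disabled as shown below. The host, port, database, and
        table names are placeholders for your environment:</p>
      <codeblock>gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
    --dest-host demohost --dest-port 1234 --dest-user gpuser \
    --include-table salesdb.public.sales_part --truncate \
    --no-distribution-check</codeblock>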
<p>When transferring data between databases, you can run only one instance of
<codeph>gpcopy</codeph> at a time. Running multiple, concurrent instances of
<codeph>gpcopy</codeph> is not supported.</p>
</body>
</topic>
<topic id="topic_ay1_frp_zdb">
<title>Configuring Parallel Jobs</title>
<body>
      <p>The degree of parallelism when running <codeph>gpcopy</codeph> is determined by the
          <codeph>--jobs</codeph> option. The option controls the number of processes that
<codeph>gpcopy</codeph> runs in parallel. The default is 4. The range is from 1 to 64. </p>
<p>The <codeph>--jobs</codeph> value, <varname>n</varname>, produces
<codeph>2*<varname>n</varname>+1</codeph> database connections. For example, the default
<codeph>--jobs</codeph> value of 4 creates 9 connections.</p>
<p>If you increase this option, ensure that the Greenplum Database systems are configured with
a sufficient maximum concurrent connection value to accommodate the <codeph>gpcopy</codeph>
connections and any other concurrent connections (such as user connections) that you
require. See the Greenplum Database server configuration parameter
<codeph>max_connections</codeph>.</p>
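      <p>For example, before raising <codeph>--jobs</codeph> to 8 (which opens 2*8+1 = 17
        database connections), you might confirm the current connection limit and then run the
        copy. This is a sketch only; the host names, port, user, and database are
        placeholders:</p>
      <codeblock>$ psql -h demohost -p 1234 -U gpuser -d postgres -c 'SHOW max_connections;'
$ gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
    --dest-host demohost --dest-port 1234 --dest-user gpuser \
    --dbname salesdb --truncate --jobs 8 --validate count</codeblock>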
</body>
</topic>
<topic id="topic_nd3_2sp_zdb">
<title>Validating Copied Data</title>
<body>
<p>By default, <codeph>gpcopy</codeph> does not validate the data transferred. You can request
validation using the <codeph>--validate=<i>type</i></codeph> option. You must include
<codeph>--validate=<i>type</i></codeph> if you specify the
<codeph>--truncate-source-after</codeph> option. The validation <i>type</i> can be one of
the following:<ul id="ul_hf1_k21_xdb">
<li><codeph>count</codeph> - compares the row counts between the source and destination
tables.</li>
<li><codeph>md5xor</codeph> - validates by selecting all rows of the source and
destination tables, converting all columns in a row to text, and then calculating the
md5 value of each row. <codeph>gpcopy</codeph> then performs an XOR over the MD5 values
to ensure that all rows were successfully copied for the table.</li>
</ul></p>
<note>Avoid using <codeph>--append</codeph> with either validation option. If you use
          <codeph>--append</codeph> and the destination table already includes rows, then either
validation method will fail due to the different number of rows in the source and
destination tables. </note>
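      <p>For example, the following command copies a single database and validates each table with
        the <codeph>md5xor</codeph> method; the host and database names are placeholders:</p>
      <codeblock>gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
    --dest-host demohost --dest-port 1234 --dest-user gpuser \
    --dbname salesdb --truncate --validate md5xor</codeblock>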
</body>
</topic>
<topic id="topic_ytw_2sp_zdb">
<title>Addressing Failed Data Transfers</title>
<body>
<p>When <codeph>gpcopy</codeph> encounters errors and quits or is cancelled by the user,
current copy operations on tables in the destination database are rolled back. Copy
operations that have completed are not rolled back. </p>
<p>If an error occurs during the process of copying a table, or table validation fails,
<codeph>gpcopy</codeph> continues copying the other specified tables. After
<codeph>gpcopy</codeph> finishes, it displays a list of tables where errors occurred or
validation failed and displays a <codeph>gpcopy</codeph> command. You can use the provided
command to retry copying the failed tables. </p>
<p>The <codeph>gpcopy</codeph> utility logs messages in log file
<codeph>gpcopy_<varname>date</varname>.log</codeph> in the
<codeph>~/gpAdminLogs</codeph> directory on the master host. If you run multiple
<codeph>gpcopy</codeph> commands on the same day, the utility appends messages to that
day's log file. </p>
<p>After <codeph>gpcopy</codeph> completes, it displays a summary of the operations performed.
        If the utility fails to copy tables, they are highlighted in the summary, and the summary
        includes a <codeph>gpcopy</codeph> command that you can run to copy only the failed
tables. The information is displayed at the command prompt and in the
<codeph>gpcopy</codeph> log file. </p>
<p>After resolving the issues that caused the copy operations to fail, you can run the
provided command to copy the tables that failed in the previous <codeph>gpcopy</codeph>
command. </p>
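      <p>For example, to review the day's log for failed tables after a run (the exact date format
        in the log file name may differ in your environment):</p>
      <codeblock>$ ls ~/gpAdminLogs/gpcopy_*.log
$ grep -iE 'error|fail' ~/gpAdminLogs/gpcopy_&lt;date>.log</codeblock>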
</body>
</topic>
<topic id="topic_ekp_rmp_zdb">
<title>Performing a Basic Data Migration</title>
<body>
<p>Follow this procedure to migrate data from one Greenplum Database system to another with
<codeph>gpcopy</codeph>:<ol id="ol_zc4_b34_b2b">
<li>Start both the source and destination clusters.</li>
<li>Perform a full database backup in the source Greenplum Database system. See <xref
href="backup-gpbackup.xml#topic_yrr_hqw_sbb"/>.</li>
<li>As a best practice, source the <codeph>greenplum_path.sh</codeph> file in the source
Greenplum 5 installation, so that you execute <codeph>gpcopy</codeph> from the source
system. For
example:<codeblock>$ source /usr/local/greenplum-db-5.20.0/greenplum_path.sh</codeblock></li>
<li>Use <codeph>gpcopy</codeph> with the <codeph>--full</codeph> option to migrate your
data to the destination system. A full migration automatically copies all database
objects including tables, indexes, views, roles, functions, user defined types (UDT),
          resource queues, and resource groups for all user-defined databases. Include the
            <codeph>--drop</codeph> option to drop any destination tables that may already exist
          (recreating indexes as necessary). For example:
<codeblock>gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--full --drop --validate count</codeblock><p>With
the above command, the utility drops tables in the destination database
(<codeph>--drop</codeph> option) and uses the row count of the source and
destination tables to validate the data transfer (<codeph>--validate count</codeph>
option). The other <codeph>gpcopy</codeph> options specify the source and destination
Greenplum Database system master hosts, ports, and the User ID to use to connect to
the Greenplum Database systems. <note>While the example command performs a full system
            copy, consider migrating only portions of your data at a time, so that you can
reduce downtime while addressing table errors or validation failures that may occur
during the copy operation.</note></p></li>
<li>The <systemoutput>gpcopy</systemoutput> utility does not copy configuration files such
as <systemoutput>postgresql.conf</systemoutput> and
<systemoutput>pg_hba.conf</systemoutput>. You must set up the destination system
configuration as necessary to match the source system.</li>
<li>The <systemoutput>gpcopy</systemoutput> utility does not copy external objects such as
Greenplum Database extensions, third party jar files, and shared object files. You must
recreate these external objects as necessary to match the source system. </li>
<li>After migrating data you may need to modify SQL scripts, administration scripts, and
user-defined functions as necessary to account for changes in Greenplum Database version
6.0. See the <xref scope="external" format="html"
href="https://gpdb.docs.pivotal.io/latest/relnotes/">Pivotal Greenplum 6.0 Release
Notes</xref> for features or changes that may necessitate post-migration tasks.</li>
</ol></p>
<p>See the <xref href="../../utility_guide/admin_utilities/gpcopy.xml">gpcopy reference
page</xref> for complete syntax and usage information. </p>
</body>
</topic>
<topic id="topic_pyc_hpp_zdb">
<title>Migrating Data Between Clusters that Share Hardware</title>
<body>
<p>In order to migrate data between two clusters on the same hardware, you should have enough
free disk space to accommodate over 5 times the original data set. This enables you to
maintain 2 full copies of the primary and mirror data sets (on the source and destination
systems), plus the original backup data in ASCII format. </p>
<p>If you attempt to migrate on the same system but you run out of disk space, the
<codeph>gpcopy</codeph> utility provides the <codeph>--truncate-source-after</codeph>
option to help you complete the operation with only a minimum of free disk space. The
<codeph>--truncate-source-after</codeph> option instructs the utility to truncate each
source table after successfully copying the table data to the destination cluster and
validating that the copy succeeded.</p>
<p>If you choose to use <codeph>--truncate-source-after</codeph>, consider the following:<ul
id="ul_g5k_22q_zdb">
<li>Using the <codeph>--truncate-source-after</codeph> option does not allow for an easy
          rollback of the source system to its original condition if errors occur or validation
checks fail during the <codeph>gpcopy</codeph> operation. Table errors or validation
failures during the <codeph>gpcopy</codeph> operation can leave some tables remaining in
the source cluster, while other tables may be empty (having been truncated after being
copied to the new cluster). Back up all source data before using <codeph>gpcopy</codeph>
with <codeph>--truncate-source-after</codeph>.</li>
<li>Migrating data with <codeph>--truncate-source-after</codeph> still requires an amount
of free disk space equal to the sum of the largest tables that you will migrate in a
single batch using <codeph>gpcopy</codeph>. For example, with a <codeph>--jobs</codeph>
setting of 5, you must ensure that you have free space equal to the sum of the 5 largest
tables copied in the batch. The procedure below provides sample commands to determine
the largest table sizes.</li>
<li>You must use the <codeph>--validate</codeph> option with
<codeph>--truncate-source-after</codeph> to ensure that data is successfully copied
before source tables are truncated.</li>
</ul></p>
<p>If you attempt to use the instructions in <xref href="#topic_ekp_rmp_zdb" format="dita"/>
to migrate systems that use the same hardware, but you do not have the required free
space:<ol id="ul_zdm_bm4_b2b">
<li>Start both the source and destination clusters. </li>
<li>Perform a full database backup in the source Greenplum Database system. See <xref
href="backup-gpbackup.xml#topic_yrr_hqw_sbb"/>.</li>
<li>Determine if you have enough free space to migrate your data using
<codeph>--truncate-source-after</codeph>. Migrating data "in-place" requires an amount
of free disk space equal to the sum of the largest tables that you will migrate in a
single batch using <codeph>gpcopy</codeph>. For example, if you want to use a
<codeph>--jobs</codeph> setting of 5, ensure that you have free space equal to the sum
of the 5 largest tables copied in the batch. <p>The following query lists the largest 5
tables in your source system; modify the query as needed depending on the
<codeph>--jobs</codeph> setting you intend to
use:<codeblock>gpadmin=# SELECT n.nspname, c.relname, c.relstorage, pg_relation_size(c.oid)
FROM
pg_class c JOIN pg_namespace n ON (c.relnamespace=n.oid)
JOIN pg_catalog.gp_distribution_policy p ON (c.oid = p.localoid)
WHERE
n.nspname NOT IN ('gpexpand', 'pg_bitmap', 'information_schema', 'gp_toolkit')
AND n.nspname NOT LIKE 'pg_temp_%%' AND c.relstorage &lt;> 'v'
ORDER BY 4 DESC LIMIT 5;
</codeblock></p><p>Either
free enough disk space to cover the sum of the table sizes shown in the above query,
or consider using a smaller <codeph>--jobs</codeph> value to reduce the free space
requirements.</p></li>
<li>As a best practice, source the <codeph>greenplum_path.sh</codeph> file in the source
Greenplum 5 installation, so that you execute <codeph>gpcopy</codeph> from the source
system. For
example:<codeblock>$ source /usr/local/greenplum-db-5.20.0/greenplum_path.sh</codeblock></li>
        <li>Use <codeph>gpcopy</codeph> with the <codeph>--truncate-source-after</codeph> and
            <codeph>--validate</codeph> options to migrate your data to the destination system. A
          full migration automatically copies all database objects including tables, indexes, views,
          roles, functions, user-defined types (UDT), resource queues, and resource groups for all
          user-defined databases. Include the <codeph>--drop</codeph> option to drop any destination
          tables that may already exist (recreating indexes as necessary). The
            <codeph>--truncate-source-after</codeph> option truncates each source table only after
copying and validating the table data in the destination system. For example:
<codeblock>gpcopy --source-host my_host --source-port 1234 --source-user gpuser \
--dest-host my_host --dest-port 1235 --dest-user gpuser --full --drop \
--truncate-source-after --analyze --validate count</codeblock><p>The
above command performs a full database copy, first dropping tables in the destination
database (<codeph>--drop</codeph> option) if they already exist.
<codeph>gpcopy</codeph> truncates each table in the source system only after
successfully copying and validating the table data in the destination system. The
other <codeph>gpcopy</codeph> options specify the source and destination Greenplum
Database system master hosts, ports, and the User ID to use to connect to the
Greenplum Database systems.
<note>While the example command performs a full system copy, consider migrating only
              portions of your data at a time, so that you can reduce downtime while
addressing table errors or validation failures that may occur during the copy
operation.</note></p></li>
<li>The <systemoutput>gpcopy</systemoutput> utility does not copy configuration files such
as <systemoutput>postgresql.conf</systemoutput> and
<systemoutput>pg_hba.conf</systemoutput>. You must set up the destination system
configuration as necessary to match the source system.</li>
<li>The <systemoutput>gpcopy</systemoutput> utility does not copy external objects such as
Greenplum Database extensions, third party jar files, and shared object files. You must
recreate these external objects as necessary to match the source system. </li>
<li>After migrating data you may need to modify SQL scripts, administration scripts, and
user-defined functions as necessary to account for changes in Greenplum Database version
6.0. See the <xref scope="external" format="html"
href="https://gpdb.docs.pivotal.io/latest/relnotes/">Pivotal Greenplum 6.0 Release
Notes</xref> for features and changes that may necessitate post-migration tasks.</li>
</ol></p>
<p>See the <xref href="../../utility_guide/admin_utilities/gpcopy.xml">gpcopy reference
page</xref> for complete syntax and usage information. </p>
</body>
</topic>
</topic>
@@ -16,7 +16,7 @@
/></li>
<li><xref href="backup-main.xml#backup-main"/></li>
<li><xref href="../expand/expand-main.xml"/></li>
<li><xref href="gpcopy-migrate.xml"/></li>
<li otherprops="pivotal"><xref href="gpcopy-migrate.xml"/></li>
<li><xref href="../ddl/ddl.xml#topic1"/></li>
<li><xref href="maintain.xml#topic1"/></li>
</ul></p>
@@ -7,864 +7,9 @@
<body>
<p>The <codeph>gpcopy</codeph> utility copies objects from databases in a source Greenplum
Database system to databases in a destination Greenplum Database system. </p>
<note> The <codeph>gpcopy</codeph> utility is available only in the commercial release of
Pivotal Greenplum Database.</note>
<section id="section2">
<title>Synopsis</title>
<codeblock><b>gpcopy</b>
{ <b>--full</b> |
{ <b>--dbname</b> <varname>database1</varname>[, <varname>database2</varname> ... ]
[ <b>--dest-dbname</b> <varname>dest-db1</varname>[, <varname>dest-db2</varname> ... ] ] } |
<b>--include-table</b> <varname>db</varname>.<varname>schema</varname>.<varname>table</varname>[, <varname>db</varname>.<varname>schema1</varname>.<varname>table1</varname> ... ]
[ <b>--dest-table</b> <varname>db</varname>.<varname>schema</varname>.<varname>table</varname>[, <varname>db</varname>.<varname>schema1</varname>.<varname>table1</varname> ... ] |
<b>--include-table-file</b> <varname>table-file1</varname>
[ <b>--include-table-file</b> <varname>table-file2</varname>] ... ] |
<b>--include-table-json</b> <varname>json-table-file1</varname>
[ <b>--include-table-json</b> <varname>json-table-file2</varname>] ... ] }
[ <b>--metadata-only</b> ]
[ <b>--exclude-table</b> <varname>db</varname>.<varname>schema</varname>.<varname>table</varname>[, <varname>db</varname>.<varname>schema1</varname>.<varname>table1</varname> ... ] ]
[ <b>--exclude-table-file</b> <varname>table-file1</varname> ]
[ <b>--exclude-table-file</b> <varname>table-file2</varname> ] ... ] ]
{ <b>--dest-host</b> <varname>dest_host</varname> [ <b>--dest-port</b> <varname>dest_port</varname> ]
[ <b>--dest-user</b> <varname>dest_user</varname> ] }
[ <b>--source-host</b> <varname>source_host</varname> [ <b>--source-port</b> <varname>source_port</varname> ]
[ <b>--source-user</b> <varname>source_user</varname> ] ]
[ <b>--jobs</b> <varname>int</varname> ]
[ <b>--on-segment-threshold</b> <varname>int</varname> ]
[ <b>--parallelize-leaf-partitions</b> ]
[ <b>--data-port-range</b> <varname>lower_port</varname>-<varname>upper_port</varname> ]
{ <b>--skip-existing</b> | <b>--truncate</b> | <b>--drop</b> | <b>--append</b> }
[ <b>--analyze</b> ]
[ <b>--no-compression</b> ]
[ <b>--no-distribution-check</b> ]
[ <b>--truncate-source-after</b> [<b>--yes</b> ] ]
[ <b>--validate</b> <varname>type</varname> ]
[ <b>--dry-run</b> ]
[ <b>--quiet</b> | <b>--debug</b> ]
<b>gpcopy --version</b>
<b>gpcopy</b> <b>--help</b></codeblock>
</section>
<section id="section3">
<title>Description</title>
<p>The <codeph>gpcopy</codeph> utility copies database objects from a source Greenplum
Database system to a destination system. You can perform one of the following types
of copy operations: </p>
<ul>
<li id="pk138459">Copy a Greenplum Database system with the <codeph>--full</codeph>
                option. This option copies all database objects including tables, table data,
indexes, views, users, roles, functions, and resource queues for all
user-defined databases to a different destination system. </li>
<li id="pk138461">Copy a set of user-defined database tables to a destination
system. <ul id="ul_ly3_v5c_vdb">
<li>The <codeph>--dbname</codeph> option copies all user-defined tables,
table data, and re-creates the table indexes from specified
databases.</li>
                        <li>The <codeph>--include-table</codeph>,
<codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph> option copies a specified set
of user-defined tables, table data, and re-creates the table indexes. </li>
<li>The <codeph>--exclude-table</codeph> and
<codeph>--exclude-table-file</codeph> options exclude a specified
                            set of user-defined tables and table data from being copied. </li>
</ul></li>
<li>Copy only the database schemas with the <codeph>--metadata-only</codeph>
option.</li>
</ul>
<p>When running <codeph>gpcopy</codeph>, you must specify the data to copy from the
source database and how to manage data in the destination database.<ul
id="ul_ehn_fvx_vdb">
<li>You must use one and only one of these options to specify the data to be
copied from the source database: <codeph>--full</codeph>,
<codeph>--dbname</codeph>, <codeph>--include-table</codeph>,
<codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph>. </li>
<li>You must use one of these options to specify how to manage data in the
destination database: <codeph>--skip-existing</codeph>,
<codeph>--truncate</codeph>, <codeph>--drop</codeph>, or
<codeph>--append</codeph>.<p>If you specify both the
                            <codeph>--append</codeph> and <codeph>--validate</codeph> options,
validation of source table data fails if a destination table contains
data. </p></li>
</ul></p>
<p>If you specify the option <codeph>--truncate-source-after</codeph>, you must also
specify the <codeph>--validate</codeph> option. When
<codeph>--truncate-source-after</codeph> is specified, <codeph>gpcopy</codeph>
truncates the source table after the table data is copied and destination table data
has been validated. </p>
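            <p>For example, the following hypothetical commands each combine one source-selection
                option with one destination-mode option; the second also shows that
                    <codeph>--truncate-source-after</codeph> must be paired with
                    <codeph>--validate</codeph>. Host, database, and table names are
                placeholders:</p>
            <codeblock># copy one table, skipping it if it already exists in the destination
gpcopy --source-host mdw1 --dest-host mdw2 \
    --include-table salesdb.public.orders --skip-existing

# copy a database, truncating each source table after it is copied and validated
gpcopy --source-host mdw1 --dest-host mdw2 \
    --dbname salesdb --truncate --truncate-source-after --validate count</codeblock>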
</section>
<section>
<title>Prerequisites</title>
<p>The user IDs connecting to the source and destination Greenplum Database systems, the
<codeph>--source-user</codeph> and <codeph>--dest-user</codeph>, must have
appropriate access to the systems.</p>
<p>If Kerberos authentication is enabled for Greenplum Database, <codeph>gpcopy</codeph>
can authenticate with Kerberos. Run the <codeph>kinit</codeph> command to obtain a
ticket-granting ticket from the KDC server before you execute
<codeph>gpcopy</codeph>. The <codeph>PGKRBSRVNAME</codeph> environment variable
specifies the Kerberos service name for Greenplum Database. If your Greenplum
                Database service name is different from the default (<codeph>postgres</codeph>), set
the <codeph>PGKRBSRVNAME</codeph> environment variable with the correct service name
before you run <codeph>gpcopy</codeph>. See <xref
href="../../admin_guide/kerberos.xml#topic1"/> for information about enabling
Kerberos authentication with Greenplum Database.</p>
<p>The source and destination Greenplum Database segment hosts need to be able to
communicate with each other. To ensure that the segment hosts can communicate, you
can use a tool such as the Linux <codeph>netperf</codeph> utility. </p>
            <p>When the <codeph>--full</codeph> option is specified, resource groups and
                tablespaces are copied; however, the utility does not configure the destination system.
For example, you must configure the system to use resource groups and create the
host directories for the tablespaces. </p>
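            <p>For example, if Kerberos authentication is enabled and your Greenplum service name is
                not the default, you might run commands similar to the following before invoking
                    <codeph>gpcopy</codeph>. This is a sketch only; the principal, service name,
                hosts, and database are placeholders:</p>
            <codeblock>$ kinit gpadmin@EXAMPLE.COM          # obtain a ticket-granting ticket from the KDC
$ export PGKRBSRVNAME=my-gp-service  # only needed if the service name is not 'postgres'
$ gpcopy --source-host mdw1 --dest-host mdw2 --dbname salesdb --truncate --validate count</codeblock>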
</section>
<section id="section6">
<title>Options for Choosing Data to Copy</title>
<p><codeph>gpcopy</codeph> provides a range of options to define the scope of data that
is copied. You can choose options to perform a full Greenplum cluster migration,
copy specific databases or tables, or only portions of a table using a SQL query.
Additional options enable you to exclude certain tables from being copied, or to
change the destination database into which a table's data is copied. The special
                <codeph>--metadata-only</codeph> option instructs <codeph>gpcopy</codeph> to create the
            necessary schema for the selected source tables without copying any table data.</p>
<p>You must use at least one of the options <codeph>--full</codeph>,
<codeph>--dbname</codeph>, <codeph>--include-table</codeph>,
<codeph>--include-table-file</codeph>, or <codeph>--include-table-json</codeph>.
Use other options as needed to exclude data from the copy or to change the
destination database for copied tables.</p>
<parml>
<plentry>
<pt>--full</pt>
<pd>This option performs a migration of a Greenplum Database source system to a
destination system. </pd>
                    <pd>A migration copies all database objects including tables, indexes, views,
roles, functions, user-defined types (UDT), resource queues, and resource
groups for all user-defined databases. The default databases,
<codeph>postgres</codeph>, <codeph>template0</codeph>, and
<codeph>template1</codeph>, are not copied.</pd>
<pd>This option cannot be specified with the <codeph>--dbname</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>,
or <codeph>--include-table-json</codeph> options.</pd>
</plentry>
<plentry>
<pt>--dbname <varname>database</varname></pt>
<pd>A source database to copy. To copy multiple databases to the destination
                        system, specify a comma-separated list of databases with no spaces between
the names. All the user-defined tables and table data are copied to the
destination system. </pd>
<pd>If the source database does not exist, <codeph>gpcopy</codeph> returns an
                        error and quits. If a destination database does not exist, it is
                        created. </pd>
<pd>Not valid with the <codeph>--full</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph> options.</pd>
<pd>Alternatively, you can copy a set of tables with the
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>,
or <codeph>--include-table-json</codeph> option.</pd>
</plentry>
<plentry>
<pt>--dest-dbname <varname>database</varname></pt>
<pd>To copy a database to a different destination database, specify the name of
the destination database. For multiple databases, specify a comma-separated
list of databases with no spaces between the names. The number of database
names must match the number of names specified in the
<codeph>--dbname</codeph> option. The utility copies the source
databases to the destination databases in the listed order. In this example,
<codeph>db1</codeph> is copied to <codeph>destdb1</codeph>,
<codeph>db2</codeph> is copied to <codeph>destdb2</codeph>, and
<codeph>db3</codeph> is copied to <codeph>db3</codeph>.
<codeblock>gpcopy --dest-host mdw-2 --dbname=db1,db2,db3 --dest-dbname=destdb1,destdb2,db3 --drop</codeblock></pd>
<pd>If the source database does not exist, <codeph>gpcopy</codeph> returns an
                        error and quits. If a destination database does not exist, it is
                        created. </pd>
<pd>Valid only with the <codeph>--dbname</codeph> option.</pd>
</plentry>
<plentry id="include-file-gpcopy">
<pt>--include-table
<varname>db</varname>.<varname>schema</varname>.<varname>table</varname></pt>
<pd>One or more tables from the source database system to copy. You must provide
fully-qualified table names
(<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>).
You cannot specify views or system catalog tables. To copy multiple tables,
include a comma-separated list of table names or use regular expressions to
describe a set of tables. You can optionally use
<codeph>--dest-table</codeph> to change the databases into which tables
in <codeph>--include-table</codeph> are copied.</pd>
<pd>You can use Go language regular expressions in the database, schema, and
table portions of the fully-qualified table name to define a set of input
tables. The regular expression pattern must be enclosed in slashes
(<codeph>/<varname>RE_pattern</varname>/</codeph>). For example,
<codeph>--include-table mytest.public.demo/.*/</codeph> specifies all
tables that begin with <codeph>demo</codeph> in the <codeph>mytest</codeph>
database in the <codeph>public</codeph> schema.</pd>
<pd>The following two examples for the <codeph>--include-table</codeph> option
are equivalent. They both specify a set of tables that begins with demo and
ends with zero or more
digits.<codeblock>--include-table testdb.schema1.demo/[0-9]*/
--include-table testdb.schema1./demo[0-9]*/</codeblock></pd>
<pd>Regular expression capture groups in the database portion of the
fully-qualified name can be referenced in <codeph>--dest-table</codeph> to
change the destination database for a table. (Using capture groups to change
the schema or table name is not currently supported.)</pd>
<pd>If the source table does not exist, <codeph>gpcopy</codeph> returns an error
and quits. </pd>
<pd>If the destination table or database does not exist, it is created. Only the
table and table data are copied, not dependent objects. Indexes are
re-created only if the <codeph>--drop</codeph> option is specified.
Dependent objects are not copied. </pd>
<pd>This option is not allowed with the options: <codeph>--full</codeph>,
<codeph>--dbname</codeph>, <codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph>. </pd>
</plentry>
<plentry id="dest-table-gpcopy">
<pt>--dest-table
<varname>db</varname>.<varname>schema</varname>.<varname>table</varname></pt>
<pd>(Optional.) Changes the database where tables defined with
<codeph>--include-table</codeph> are copied. Greenplum does not
currently support changing the destination schema or table name. </pd>
<pd>You must provide fully-qualified table names
(<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>).
Specify multiple tables using either a comma-separated list or by
referencing regular expression capture groups that were defined with
<codeph>--include-table</codeph>. If you use a comma-separated list of
tables with <codeph>--include-table</codeph>, use the same number position
in the <codeph>--dest-table</codeph> list to change the destination database
of the corresponding table. For example, to move only the second table
provided with an option
like<codeblock>--include-table mytest.public.table1,mytest.public.table2</codeblock>use
an option similar
to:<codeblock>--dest-table production.public.table1,production.public.table2</codeblock></pd>
<pd>If you use Go language regular expressions capture groups in
<codeph>--include-table</codeph> to define a set of tables, you can
reference the capture groups in <codeph>--dest-table</codeph> to rename the
destination database. Capture groups referenced in
<codeph>--include-table</codeph> or <codeph>--dest-table</codeph> must
use forward slash delimiters (/) to indicate regular expression processing.
For example, to reference the capture group defined in the database string
of<codeblock>--include-table testdb/(\d+)/.myschema/(\d+)/.mytable/(\d+)/</codeblock>use
an option similar
to:<codeblock>--dest-table productiondb/$1/.myschema/$1/.mytable/$1/</codeblock>Note
that the capture group numbering starts at 1 and restarts for each component
of the fully-qualified table name (database, schema, and table). Therefore,
in the above example each capture group is referenced with
<codeph>/$1/</codeph> in <codeph>--dest-table</codeph>. You cannot
rename the destination schema or table name components.</pd>
<pd>If regular expression rules would cause more than one source table name to
be remapped to the same destination table, then the
<codeph>--skip-existing</codeph>, <codeph>--truncate</codeph>,
<codeph>--drop</codeph>, or <codeph>--append</codeph> options determine
how <codeph>gpcopy</codeph> handles subsequent copy requests to the existing
table.</pd>
</plentry>
<plentry>
<pt>--include-table-file <varname>table-file</varname></pt>
<pd>The location and name of a text file that defines the tables and data to
copy. To use multiple files, specify this option for each
file.<codeblock>--include-table-file &lt;<varname>path_to_file1</varname>> --include-table-file &lt;<varname>path_to_file2</varname>></codeblock></pd>
<pd>In the text file, specify a single fully qualified table per line
(<i>database.schema.table</i>). You cannot specify views or system
catalog tables.</pd>
<pd>You can use Go language regular expression syntax to select multiple tables.
See the <codeph><xref href="#topic1/dest-table-gpcopy" format="dita"
>--dest-table</xref></codeph> option for information about using
regular expressions to select tables.</pd>
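                    <pd>For example, a hypothetical table file and the corresponding invocation
                        might look like the following; the file path, host names, and table names
                        are placeholders:
                        <codeblock>$ cat /home/gpadmin/include_tables.txt
testdb.public.customers
testdb.schema1./demo[0-9]*/

$ gpcopy --source-host mdw1 --dest-host mdw2 \
    --include-table-file /home/gpadmin/include_tables.txt --truncate</codeblock></pd>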
<pd>This option cannot be specified with the <codeph>--full</codeph>,
<codeph>--dbname</codeph>, or <codeph>--include-table</codeph>
options.</pd>
</plentry>
<plentry>
<pt>--include-table-json <varname>json-table-file</varname></pt>
<pd>The location and name of a JSON-format file that defines the tables and data
to copy. In contrast to the text file used with
<codeph>--include-table-file</codeph>, the JSON file can include a
destination table name used to change the database into which the table data
is copied.</pd>
<pd>The JSON file that you provide must define one or more objects with
key-value pairs to describe a source table, an optional query of the source
table data to copy, and an optional destination table to indicate the
database where the data is copied. Place the name-value pairs for multiple
table objects in a JSON array. For
example:<codeblock>[
{
"source": "<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>",
"sql": "<varname>query</varname>"
"dest": "<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>"
},
{
... }
]</codeblock></pd>
<pd>You cannot specify views or system catalog tables as source tables.</pd>
<pd>Any query that you provide must reference a single source table and output
to the same columns of the source table.</pd>
<pd>If the query includes an <codeph>ORDER BY</codeph> clause, then the target
Greenplum cluster must be the same size (the same number of segments) as the
source cluster.</pd>
<pd>Keep in mind that certain characters must be escaped in strings in order for
them to be parsed in JSON, specifically:<simpletable frame="all"
relcolwidth="1.0* 1.0*" id="simpletable_lp3_g5r_xhb">
<sthead>
<stentry>Character</stentry>
<stentry>Escape Sequence</stentry>
</sthead>
<strow>
<stentry>Backspace</stentry>
<stentry>\b</stentry>
</strow>
<strow>
<stentry>Form Feed</stentry>
<stentry>\f</stentry>
</strow>
<strow>
<stentry>Newline</stentry>
<stentry>\n</stentry>
</strow>
<strow>
<stentry>Tab</stentry>
<stentry>\t</stentry>
</strow>
<strow>
<stentry>Double Quote</stentry>
<stentry>\"</stentry>
</strow>
<strow>
<stentry>Backslash</stentry>
<stentry>\\</stentry>
</strow>
</simpletable></pd>
<pd>If the file cannot be parsed as JSON, <codeph>gpcopy</codeph> exits with an
error.</pd>
<pd>You can use Go language regular expression syntax to select multiple tables.
Capture groups defined in the <codeph>source:</codeph> key can be referenced
in the <codeph>dest:</codeph> key to change the destination database where
the data is copied. For
example:<codeblock>[
{
"source": "testdb/(\\d+)/.myschema/(\\d+)/.mytable/(\\d+)/",
"dest": "productiondb/$1/.myschema/$1/.mytable/$1/"
  }
]</codeblock></pd>
<pd>See the <codeph><xref href="#topic1/dest-table-gpcopy" format="dita"
>--dest-table</xref></codeph> option for more information about
using regular expressions and capture groups. <note>If you use regular
expressions to copy multiple tables, you cannot include the
<codeph>sql:</codeph> key.</note>
</pd>
<pd>This option cannot be specified with the <codeph>--full</codeph>,
                            <codeph>--dbname</codeph>, or <codeph>--include-table</codeph> options. You
cannot use this option with <codeph>--parallelize-leaf-partitions</codeph>
if the JSON file includes a <codeph>sql:</codeph> key that queries a
partitioned table.</pd>
</plentry>
<plentry>
<pt>--metadata-only</pt>
<pd>Create only the schemas specified by the command. Data is not
transferred.</pd>
<pd>If specified with the <codeph>--full</codeph> option,
<codeph>gpcopy</codeph> replicates the complete database schema,
including all tables, indexes, views, user-defined types (UDT), and
user-defined functions (UDF) for the source databases. No data is
transferred. </pd>
<pd>If you specify databases with the <codeph>--dbname</codeph> option or tables
                        with the <codeph>--include-table</codeph>,
<codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph> options, <codeph>gpcopy</codeph>
creates only the tables and indexes. No data is transferred.</pd>
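                    <pd>For example, the following hypothetical command re-creates only the schema
                        of one database in the destination system without transferring any data;
                        the host and database names are placeholders:
                        <codeblock>gpcopy --source-host mdw1 --dest-host mdw2 \
    --dbname testdb --metadata-only --drop</codeblock></pd>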
                    <pd>This option cannot be used with the <codeph>--truncate</codeph> option.</pd>
</plentry>
<plentry>
<pt>--exclude-table
<varname>db</varname>.<varname>schema</varname>.<varname>table</varname></pt>
<pd>A table from the source database system to exclude from transfer. The fully
qualified table name must be specified
(<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>). </pd>
<pd>To exclude multiple tables, specify a comma-separated list of table
names.</pd>
<pd>A set of tables can be specified using the Go language regular expression
syntax. See the <codeph><xref href="#topic1/include-file-gpcopy"
format="dita">--include-table</xref></codeph> option for information
about using regular expressions.</pd>
<pd>Only the specified tables are excluded, not dependent objects. You cannot
specify views or system catalog tables. </pd>
<pd>This option must be specified with one of these options:
<codeph>--full</codeph>, <codeph>--dbname</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>,
or <codeph>--include-table-json</codeph>. If the option
                            <codeph>--exclude-table</codeph> results in no tables to copy, the
database or schema is not created in the destination system.</pd>
</plentry>
<plentry>
<pt>--exclude-table-file <varname>table-file</varname></pt>
                    <pd>The location and name of a file containing a list of fully qualified table
names to exclude from copying to the destination system. In the text file,
specify a single fully qualified table per line
(<varname>database</varname>.<varname>schema</varname>.<varname>table</varname>).
To specify multiple files, specify this option for each
file.<codeblock>--exclude-table-file &lt;<varname>path_to_file1</varname>> --exclude-table-file &lt;<varname>path_to_file2</varname>></codeblock></pd>
<pd>In the file, a set of tables can be specified using the Go language regular
expression syntax. See the <codeph><xref href="#topic1/include-file-gpcopy"
format="dita">--include-table</xref></codeph> option for information
about using regular expressions.</pd>
<pd>If a source table does not exist, <codeph>gpcopy</codeph> displays a
warning. </pd>
<pd>Only the specified tables are excluded. You cannot specify views or system
catalog tables.</pd>
<pd>This option must be specified with one of these options:
<codeph>--full</codeph>, <codeph>--dbname</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>,
or <codeph>--include-table-json</codeph>. If the option
                            <codeph>--exclude-table-file</codeph> results in no tables to copy, the
database or schema is not created in the destination system.</pd>
</plentry>
</parml>
</section>
<section>
<title>Connection Options</title>
<p>The following options specify connection information for the destination and source
Greenplum clusters. Only <codeph>--dest-host</codeph> is required.
<codeph>--jobs</codeph>, <codeph>--on-segment-threshold</codeph>, and
<codeph>--parallelize-leaf-partitions</codeph> affect the number of simultaneous
connections used for data transfer. <codeph>--data-port-range</codeph> defines the
ports used for data transfer to destination segments or the destination
master.<note><codeph>gpcopy</codeph> does not currently support SSL
encryption for its connections.</note></p>
<parml>
<plentry>
<pt>--dest-host <varname>dest_host</varname></pt>
<pd>Required. The destination Greenplum Database master segment hostname or IP
address.</pd>
</plentry>
<plentry>
<pt>--dest-port <varname>dest_port</varname></pt>
<pd>The destination Greenplum Database master segment port number. If
<codeph>--dest-port</codeph> is not specified, then the default is
5432.</pd>
</plentry>
<plentry>
<pt>--dest-user <varname>dest_user</varname></pt>
<pd>The user ID that is used to connect to the destination Greenplum master. If
not specified, the default is gpadmin. </pd>
</plentry>
<plentry>
<pt>--source-host <varname>source_host</varname></pt>
<pd>The source Greenplum Database master segment host name or IP address. If not
specified, the default host is the system running <codeph>gpcopy</codeph>
(127.0.0.1).</pd>
</plentry>
<plentry>
<pt>--source-port <varname>source_port</varname></pt>
<pd>The source Greenplum Database master port number. If not specified, the
default is 5432.</pd>
</plentry>
<plentry>
<pt>--source-user <varname>source_user</varname></pt>
<pd>The user ID that is used to connect to the source Greenplum Database system.
If not specified, the default is gpadmin.</pd>
</plentry>
<plentry>
<pt>--jobs <varname>int</varname></pt>
                    <pd>The number of processes that <codeph>gpcopy</codeph> runs in parallel. The
default is 4. The range is from 1 to 64.</pd>
<pd>The option <codeph>--jobs</codeph> produces
<codeph>2*<varname>n</varname>+1</codeph> database connections. The
default value, 4, creates 9 connections.</pd>
<pd>If you increase this option, ensure that Greenplum Database systems are
configured with a sufficient maximum concurrent connection value to
accommodate the <codeph>gpcopy</codeph> connections and other concurrent
connections such as user connections. See the Greenplum Database server
configuration parameter <codeph>max_connections</codeph>.</pd>
</plentry>
<plentry>
<pt>--on-segment-threshold <varname>int</varname></pt>
<pd>Specifies the number of rows that determines when <codeph>gpcopy</codeph>
copies a table using the Greenplum Database source and destination master
instead of the source and destination segment instances. The default value
is 10000 rows. If a table contains 10000 rows or less, the table is copied
using the Greenplum Database master. </pd>
<pd>The value <codeph>-1</codeph> disables copying tables using the master. All
tables are copied using the segment instances.</pd>
<pd>For smaller tables, copying tables using the Greenplum Database master is
more efficient than using segment instances.</pd>
</plentry>
<plentry>
<pt>--parallelize-leaf-partitions</pt>
<pd>If specified, the utility copies the leaf partition tables of a partitioned
table in parallel. The default is to copy the partitioned table as a single
table based on the root partition table. </pd>
<pd>If the <codeph>--validate</codeph> option is also specified, the utility
validates each leaf partition table during the copy process and then
validates the entire partitioned table.</pd>
<pd>This option cannot be specified with <codeph>--include-table-json</codeph>
if the JSON file includes a <codeph>sql:</codeph> key that queries a
partitioned table.</pd>
</plentry>
<plentry>
<pt>--data-port-range
<varname>lower_port</varname>-<varname>upper_port</varname></pt>
<pd>A range of port numbers to use on Greenplum Database destination hosts for
data transfer. This applies to destination segment hosts or, if data is
transferred using the master segment, only the master segment host.
<codeph>gpcopy</codeph> uses the first available port specified in the
range (inclusive). <varname>lower_port</varname> must be greater than or
equal to 1024 (to avoid reserved system ports), and
<varname>upper_port</varname> must be a greater value. </pd>
<pd>The number of ports specified by the range must be greater than or equal to
the number of parallel processes created with <codeph>--jobs</codeph>, if
specified.</pd>
<pd>If <codeph>--data-port-range</codeph> is not specified, then
<codeph>gpcopy</codeph> uses any available port.</pd>
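                    <pd>For example, the following hypothetical command restricts data transfer to
                        ports 20000-20010 on the destination hosts, a range large enough for the 8
                        parallel jobs requested; the host and database names are placeholders:
                        <codeblock>gpcopy --source-host mdw1 --dest-host mdw2 \
    --dbname salesdb --truncate --jobs 8 \
    --data-port-range 20000-20010</codeblock></pd>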
</plentry>
</parml>
</section>
<section>
<title>Options for Configuring how Data is Copied</title>
<p><codeph>gpcopy</codeph> provides additional options that affect the way data is
copied between systems. You must use one of these options to specify how to manage
data in the destination database: <codeph>--skip-existing</codeph>,
<codeph>--truncate</codeph>, <codeph>--drop</codeph>, or
<codeph>--append</codeph>. Other options can be included as necessary, for
example, to perform additional data validation.</p>
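            <p>For example, the two hypothetical commands below differ only in how existing
                destination tables are handled: the first skips tables that already exist, while
                the second drops and re-creates them before copying, analyzing and validating the
                result. The host and database names are placeholders:</p>
            <codeblock>gpcopy --source-host mdw1 --dest-host mdw2 --dbname salesdb --skip-existing

gpcopy --source-host mdw1 --dest-host mdw2 --dbname salesdb --drop --analyze --validate count</codeblock>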
<parml>
<plentry>
<pt>--skip-existing</pt>
<pd>Specify this option to skip copying a table from the source database if the
table already exists in the destination database. </pd>
<pd>At most, only one of the options can be specified:
<codeph>--skip-existing</codeph>, <codeph>--truncate</codeph>,
<codeph>--drop</codeph>, or <codeph>--append</codeph>.</pd>
</plentry>
<plentry>
<pt>--truncate</pt>
<pd>Specify this option to truncate the table that is in the destination
database if it already exists.</pd>
<pd>At most, only one of the options can be specified:
<codeph>--skip-existing</codeph>, <codeph>--truncate</codeph>,
<codeph>--drop</codeph>, or <codeph>--append</codeph>.</pd>
</plentry>
<plentry>
<pt>--drop</pt>
<pd>Specify this option to drop the table that is in the destination database if
it already exists. Before copying table data, <codeph>gpcopy</codeph> drops
the table and creates it again. </pd>
<pd>At most, only one of the options can be specified:
<codeph>--skip-existing</codeph>, <codeph>--truncate</codeph>,
<codeph>--drop</codeph>, or <codeph>--append</codeph>.</pd>
</plentry>
<plentry>
<pt>--append</pt>
<pd>Append data to the table in the destination database if it already exists. </pd>
<pd>At most, only one of the options can be specified:
<codeph>--skip-existing</codeph>, <codeph>--truncate</codeph>,
<codeph>--drop</codeph>, or <codeph>--append</codeph>.</pd>
</plentry>
<plentry>
<pt>--analyze</pt>
<pd>Run the <codeph>ANALYZE</codeph> command on non-system tables. The default
is to not run the <codeph>ANALYZE</codeph> command. The operation is
performed for each table after the table data is copied.</pd>
</plentry>
<plentry>
<pt>--no-compression</pt>
<pd>If specified, data is transferred without compression. By default,
<codeph>gpcopy</codeph> compresses data during transfer from the source
to the destination database when copying data to a different host.</pd>
<pd>The utility does not compress data when copying data to the same host.</pd>
</plentry>
<plentry>
<pt>--no-distribution-check</pt>
<pd>Specify this option to disable table data distribution checking. By default,
<codeph>gpcopy</codeph> performs data distribution checking to ensure
data is distributed to segment instances correctly. If distribution checking
fails, the table copy fails. </pd>
<pd>The utility does not support table data distribution checking when copying a
partitioned table that is defined with a leaf table that is an external
table or if a leaf table is defined with a distribution policy that is
different from the root partitioned table. <note type="warning">Before you
perform a <codeph>gpcopy</codeph> operation with the
<codeph>--no-distribution-check</codeph> option, ensure that you
have a backup of the destination database and that the distribution
policies of the tables that are being copied are the same in the source
and destination database. Copying data into segment instances with
incorrect data distribution can cause incorrect query results and can
cause database corruption.</note></pd>
</plentry>
<plentry>
<pt>--truncate-source-after</pt>
<pd>Specify this option to truncate the table that is in the source database
after <codeph>gpcopy</codeph> copies the table and validates the table data
in the destination database. </pd>
<pd>If you specify this option, you must also specify the
<codeph>--validate</codeph> option.<parml>
<plentry>
<pt>--yes</pt>
<pd>Optional. Automatic confirmation to truncate source table data
after copying and validating table data. The prompt to truncate
source tables does not appear. The default is to prompt to
confirm truncating source table data after copying and
validating the data.</pd>
</plentry>
</parml></pd>
</plentry>
<plentry>
<pt>--validate <varname>type</varname></pt>
<pd>Perform data validation on the table data in the destination database after
                        the table data is copied. These are the supported types of validation.<ul
id="ul_hf1_k21_xdb">
<li><codeph>count</codeph> - compares row counts between source and
destination table data.</li>
<li><codeph>md5xor</codeph> - calculates the MD5 value of all rows, then
                            performs an XOR over the MD5 values.</li>
</ul></pd>
<pd>If you specify the <codeph>--append</codeph> option, and the destination
table contains data, validation fails for the table.</pd>
<pd>If validation for a table fails, <codeph>gpcopy</codeph> rolls back the
destination table.</pd>
</plentry>
</parml>
</section>
<section>
<title>Additional Options</title>
<parml>
<plentry>
<pt>--dry-run</pt>
<pd>When you specify this option, <codeph>gpcopy</codeph> generates a list of
the migration operations that would have been performed with the specified
options. The data is not migrated. </pd>
<pd>The information is displayed at the command line and written to the log
file.</pd>
</plentry>
<plentry>
<pt>--quiet</pt>
                    <pd>If specified, suppresses status messages at the command prompt. The messages
are sent only to the log file. Higher level messages such as warning and
error messages are still displayed. </pd>
<pd>This option cannot be specified with the <codeph>--debug</codeph>
option.</pd>
</plentry>
<plentry>
<pt>--debug</pt>
<pd>If specified, debug messages are displayed at the command prompt.</pd>
<pd>This option cannot be specified with the <codeph>--quiet</codeph>
option.</pd>
</plentry>
<plentry>
<pt>--version</pt>
<pd>Displays the version of this utility.</pd>
</plentry>
<plentry>
<pt>--help</pt>
<pd>Displays the online help. </pd>
</plentry>
</parml>
</section>
<section id="notes_gpcopy">
<title>Notes</title>
<p>If a <codeph>gpcopy</codeph> command specifies an invalid option, or specifies a
source table or database that does not exist, the utility returns an error and
quits. No data is copied. </p>
<p>The source table data can change while the data is being copied. A lock is not
acquired on the source table when data is copied. </p>
<p>The utility cannot copy a row with a width greater than 1GB (a PostgreSQL
limitation). </p>
<p>If you copy a set of database tables with the <codeph>--dbname</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph> options, and the destination database does
not exist, the utility creates the database before copying the tables. If the
destination database exists, the utility creates the tables in the database if
required.</p>
<p>The <codeph>gpcopy</codeph> utility does not copy dependent database objects unless
you specify the <codeph>--full</codeph> option. For example, if a table has a
default value on a column that is a user-defined function, that function must exist
in the destination system database when using the <codeph>--dbname</codeph>,
<codeph>--include-table</codeph>, <codeph>--include-table-file</codeph>, or
<codeph>--include-table-json</codeph> options. </p>
<p>When copying tables, sequences defined for tables are considered table data and are
copied. A sequence is copied if a table is created with a serial column or if a
sequence is specified as a default value. The sequences are reset if you specify the
<codeph>--truncate</codeph> option.</p>
<p>The utility re-creates table indexes only with <codeph>--full</codeph> or
<codeph>--drop</codeph> options. </p>
<p>The <codeph>gpcopy</codeph> utility does not copy configuration files such as
<codeph>postgresql.conf</codeph> and <codeph>pg_hba.conf</codeph>. You must set
up the destination system configuration separately. </p>
<p>The <codeph>gpcopy</codeph> utility does not copy external objects such as Greenplum
Database extensions, third party jar files, and shared object files. You must
install the external objects separately. </p>
<p><codeph>gpcopy</codeph> does not currently support SSL encryption for its
connections.</p>
<sectiondiv>
<p><b>Specifying Table Names with Special Characters</b></p>
<p dir="ltr">When you list a table with the option <codeph>--include-table</codeph>
or <codeph>--exclude-table</codeph>, and the table name or schema name contains
                single quote (<codeph>'</codeph>), double quote (<codeph>"</codeph>), or
backslash (<codeph>\</codeph>), you must escape the character with a backslash
(<systemoutput>\</systemoutput>). These are the escaped characters:
<codeph>\'</codeph>, <codeph>\"</codeph>, <codeph>\\</codeph>. If the table
or schema name contains a period (<codeph>.</codeph>), comma
(<codeph>,</codeph>), or a space character, you must enclose the entire name in
single quotes (<codeph>'</codeph>) and enclose the table and schema in double
quotes (<codeph>"</codeph>) at the command prompt shell. In this example, the
fully qualified name of the table is
<codeph>testdb.schema'1.table"test</codeph>.<codeblock>--include-table 'testdb."schema\'1"."table\"test"'</codeblock></p>
<p dir="ltr">For fully qualified table names listed in a file that is used with the
options <codeph>--include-table-file</codeph>,
<codeph>--include-table-json</codeph>, or
<codeph>--exclude-table-file</codeph>, if the table name or schema name
contains a period (<codeph>.</codeph>), the name must be enclosed in double
quotes (<systemoutput>"</systemoutput>). In this example, the table
                    <codeph>table.test</codeph> is in the <codeph>testdb</codeph> database, and belongs
to the schema
<codeph>schema.1</codeph>.<codeblock>testdb."schema.1"."table.test"</codeblock></p>
</sectiondiv>
<sectiondiv>
<p><b>Copying Partitioned Tables</b></p>
<p>When copying data for a partitioned table, if a leaf partition has been exchanged
with an external table, that leaf partition is created, but data is not copied. </p>
          <p>If you specify copying leaf partitions of a partitioned table with an option such
            as <codeph>--include-table</codeph> or <codeph>--exclude-table</codeph>,
            <codeph>gpcopy</codeph> creates the partitioned table if it does not exist.
            Only the data for the specified leaf partitions is added to the partitioned
            table. Specifying individual leaf partitions is useful when the entire
            partitioned table does not need to be copied.</p>
          <p><codeph>gpcopy</codeph> does not support data distribution checking when
            copying a partitioned table that has a leaf partition that is an external
            table, or a leaf partition whose distribution policy differs from that of
            the root partitioned table. You can copy those tables in a
            <codeph>gpcopy</codeph> operation by specifying the
            <codeph>--no-distribution-check</codeph> option to disable data
            distribution checking. </p>
<note type="warning">Before you perform a <codeph>gpcopy</codeph> operation with the
<codeph>--no-distribution-check</codeph> option, ensure that you have a
backup of the destination database and that the distribution policies of the
tables that are being copied are the same in the source and destination
database. Copying data into segment instances with incorrect data distribution
can cause incorrect query results and can cause database corruption.</note>
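          <p>For example, a command similar to the following copies a single leaf partition
            with distribution checking disabled. The leaf partition name is a placeholder;
            substitute the fully qualified name of the leaf partition in your system. The
            host, port, and user values match the placeholder values used in the examples
            in this topic.</p>
          <codeblock>gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --include-table database.schema.sales_1_prt_p2023 --no-distribution-check</codeblock>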
</sectiondiv>
<sectiondiv>
<p><b>Handling gpcopy Errors</b></p>
<p>When <codeph>gpcopy</codeph> encounters errors and quits or is cancelled by the
user, current copy operations on tables in the destination database are rolled
back. Copy operations that have completed are not rolled back. </p>
<p>If an error occurs during the process of copying a table, or table validation
fails, <codeph>gpcopy</codeph> continues copying the other specified tables.
After <codeph>gpcopy</codeph> finishes, it displays a list of tables where
errors occurred or validation failed and displays a <codeph>gpcopy</codeph>
command. You can use the provided command to retry copying the failed tables. </p>
<p>The <codeph>gpcopy</codeph> utility logs messages in log file
<codeph>gpcopy_<varname>date</varname>.log</codeph> in the
<codeph>~/gpAdminLogs</codeph> directory on the master host. If you run
multiple <codeph>gpcopy</codeph> commands on the same day, the utility appends
messages to that day's log file. </p>
          <p>After <codeph>gpcopy</codeph> completes, it displays a summary of the operations
            performed. If the utility fails to copy any tables, they are highlighted in the
            summary, and the summary includes a <codeph>gpcopy</codeph> command that you can
            use to copy only the failed tables. The information is displayed at the command
            prompt and in the <codeph>gpcopy</codeph> log file. </p>
<p>After resolving the issues that caused the copy operations to fail, you can run
the provided command to copy the tables that failed in the previous
<codeph>gpcopy</codeph> command. </p>
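          <p>For example, if copying the table <codeph>database1.public.table1</codeph> (a
            hypothetical name) failed, the summary might include a retry command similar to
            the following; the exact command that <codeph>gpcopy</codeph> displays may
            differ.</p>
          <codeblock>gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --include-table database1.public.table1 --truncate</codeblock>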
</sectiondiv>
<sectiondiv>
<p><b>Database Connections Created by gpcopy</b></p>
          <p>The <codeph>--jobs</codeph> option produces
            <codeph>2*<varname>n</varname>+1</codeph> database connections, where
            <varname>n</varname> is the option value. The default value, 4, creates 9
            connections.</p>
          <p>If you increase this option, ensure that both the source and destination Greenplum
            Database systems are configured with a maximum concurrent connection value that is
            sufficient to accommodate the <codeph>gpcopy</codeph> connections and other
            concurrent connections such as user connections. See the Greenplum Database server
            configuration parameter <codeph><xref
                href="../../ref_guide/config_params/guc-list.xml#max_connections"
                type="dita">max_connections</xref></codeph><ph otherprops="op-print"> in
            the <cite>Greenplum Database Reference Guide</cite></ph>.</p>
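          <p>For example, running <codeph>gpcopy</codeph> with <codeph>--jobs 8</codeph>
            creates 2*8+1 = 17 database connections, in addition to any other concurrent
            connections. One way to check the current maximum on the source and destination
            masters is with <codeph>psql</codeph>; the hosts, ports, and user shown are the
            placeholder values used in the examples in this topic.</p>
          <codeblock>psql -h mytest -p 1234 -U gpuser -c 'SHOW max_connections;'
psql -h demohost -p 1234 -U gpuser -c 'SHOW max_connections;'</codeblock>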
</sectiondiv>
</section>
<section id="section7">
<title>Examples</title>
<p dir="ltr" id="docs-internal-guid-12cd43d8-6521-a9e1-6f15-4eeab6c1f3df">This command
copies all user created databases in a source system to a destination system with
the <codeph>--full</codeph> option. And drops the table and creates it again if it
already exists in the destination.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--full --drop</codeblock>
<p dir="ltr">This command copies the specified databases in a source system to a
destination system with the <codeph>--dbname</codeph> option. The
<codeph>--truncate</codeph> option truncates the table data before copying table
data from the source table.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1, database2 --truncate</codeblock>
<p dir="ltr">This command copies the specified tables in a source system to a
destination system with the <codeph>--include-table</codeph> option. The
<codeph>--skip-existing</codeph> option skips the table if it already exists in
the destination database.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--include-table database.schema.table1, database.schema.table2 --skip-existing</codeblock>
<p dir="ltr">This command copies the tables from the source database to the destination
system, excluding the tables specified in the specified tables in
<codeph>/home/gpuser/mytables</codeph> with
<codeph>--exclude-table-file</codeph> option. The <codeph>--truncate</codeph>
option truncates tables that already exist in the destination system. With the
options <codeph>--analyze</codeph> and <codeph>--validate count</codeph>, the
utility performs an ANALYZE operation on the copied tables, and validates the copied
table data by comparing row counts between source and destination tables.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1 --exclude-table-file /home/gpuser/mytables \
--truncate --analyze --validate count</codeblock>
<p dir="ltr">This command specifies the <codeph>--full</codeph> and
<codeph>--metadata-only</codeph> options to copy the complete database schema,
including all tables, indexes, views, user-defined types (UDT), and user-defined
functions (UDF) from all the source databases. No data is copied, The
<codeph>--drop</codeph> option specifies that the table is dropped in the
destination database before it is created again if the table exists in both the
source and destination database.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--full --metadata-only --drop</codeblock>
<p dir="ltr">This command copies the specified databases in a source system to a
destination system with the <codeph>--dbname</codeph> option and specifies 8
parallel processes with the <codeph>--jobs</codeph> option. The command specifies
the <codeph>--truncate</codeph> option to truncate the table and create it again if
it already exists in the destination database, and uses uses ports in the range
2000-2010 for the parallel process connections.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1, database2 --truncate --jobs 8 --data-port-range 2000-2010</codeblock>
<p dir="ltr" id="docs-internal-guid-cbf0fcbb-893d-f28a-6bc9-3caae975b2e3">This command
copies the specified database in a source system to a destination system with the
<codeph>--dbname</codeph> option and specifies 16 parallel processes with the
<codeph>--jobs</codeph> option. The <codeph>--truncate</codeph> option truncates
the table and creates it again if it already exists in the destination database. The
<codeph>--truncate-source-after</codeph> option truncates the tables in source
database after that table data has been validated in the destination database.</p>
<codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
--dest-host demohost --dest-port 1234 --dest-user gpuser \
--dbname database1 --truncate --jobs 16 --truncate-source-after --validate count</codeblock>
<p>In the previous example, if <codeph>--truncate</codeph> was not specified and the
destination table contained data, validation would fail.</p>
<p>This is an example table file that uses regular expressions.</p>
<p>
<codeblock>"test1.arc/.*/./.*/"
"test1.c/(..)/y./.*/"</codeblock>
</p>
      <p>In the first line, the regular expressions for the schema, <codeph>arc/.*/</codeph>,
        and for the table, <codeph>/.*/</codeph>, limit the transfer to all tables in
        schemas whose names start with <codeph>arc</codeph>.</p>
      <p>In the second line, the regular expressions for the schema,
        <codeph>c/(..)/y</codeph>, and for the table, <codeph>/.*/</codeph>, limit the
        transfer to all tables in schemas whose names are four characters long, start
        with <codeph>c</codeph>, and end with <codeph>y</codeph>, for example,
        <codeph>crty</codeph>. </p>
<p>When the command is run, tables in the database <codeph>test1</codeph> that satisfy
either condition are copied to the destination database.</p>
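      <p>For example, if the table file shown above is saved as
        <codeph>/home/gpuser/regex-tables</codeph> (a hypothetical path) and supplied with
        the <codeph>--include-table-file</codeph> option, a command similar to the
        following copies the matching tables.</p>
      <codeblock dir="ltr">gpcopy --source-host mytest --source-port 1234 --source-user gpuser \
   --dest-host demohost --dest-port 1234 --dest-user gpuser \
   --include-table-file /home/gpuser/regex-tables</codeblock>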
</section>
<section id="section8">
<title>See Also</title>
<p>For information about migrating data, see the <i>Greenplum Database Administrator
Guide</i>.</p>
</section>
<note> The <codeph>gpcopy</codeph> utility is available as a separate download for the
commercial release of Pivotal Greenplum Database. See the <xref
href="https://gpdb.docs.pivotal.io/gpcopy" format="html" scope="external">Pivotal
gpcopy Documentation</xref>.</note>
</body>
</topic>