From 58b61bd3a2e247d4d728940f4f9ded666d29c023 Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Fri, 15 Sep 2017 13:53:31 -0700 Subject: [PATCH] docs: COPY command - add PROGRAM clause (#3297) * docs: COPY command add PROGRAM clause * docs: copy - edits from review comments. --- .../dita/ref_guide/config_params/guc-list.xml | 22 +++++-- gpdb-doc/dita/ref_guide/sql_commands/COPY.xml | 64 ++++++++++++++----- .../utility_guide/client_utilities/psql.xml | 12 ++-- 3 files changed, 71 insertions(+), 27 deletions(-) diff --git a/gpdb-doc/dita/ref_guide/config_params/guc-list.xml b/gpdb-doc/dita/ref_guide/config_params/guc-list.xml index b838fe17b0..f8aa1433b5 100644 --- a/gpdb-doc/dita/ref_guide/config_params/guc-list.xml +++ b/gpdb-doc/dita/ref_guide/config_params/guc-list.xml @@ -1166,8 +1166,7 @@ DEBUG5 -

DEBUG4

DEBUG3

DEBUG2

DEBUG1

LOG - NOTICE

WARNING

ERROR

FATAL

PANIC

+

DEBUG4

DEBUG3

DEBUG2

DEBUG1

LOG

NOTICE

WARNING

ERROR

FATAL

PANIC

NOTICE master

session

reload

@@ -3616,7 +3615,8 @@

If the value is false, the distribution policy is not checked. The data added to the table might violate the table distribution policy for the segment instance. - Manual redistribution of table data might be required.

+ Manual redistribution of table data might be required. See the ALTER TABLE + clause WITH REORGANIZE.

The parameter can be set for a database system or a session. The parameter cannot be set for a specific database.

@@ -4878,7 +4878,9 @@ gp_resource_group_cpu_limit - Resource group-based workload management is an experimental feature and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases. + Resource group-based workload management is an experimental feature and + is not intended for use in a production environment. Experimental features are subject to + change without notice in future releases.

Identifies the maximum percentage of system CPU resources to allocate to resource groups on each Greenplum Database segment node.

@@ -4907,7 +4909,9 @@ gp_resource_group_memory_limit - Resource group-based workload management is an experimental feature and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases. + Resource group-based workload management is an experimental feature and + is not intended for use in a production environment. Experimental features are subject to + change without notice in future releases.

Identifies the maximum percentage of system memory resources to allocate to resource groups on each Greenplum Database segment node.

@@ -4936,7 +4940,9 @@ gp_resource_manager - Resource group-based workload management is an experimental feature and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases. + Resource group-based workload management is an experimental feature and + is not intended for use in a production environment. Experimental features are subject to + change without notice in future releases.

Identifies the resource management scheme currently enabled in the Greenplum Database cluster. The default scheme is workload management using resource queues.

@@ -7213,7 +7219,9 @@ max_resource_groups - Resource group-based workload management is an experimental feature and is not intended for use in a production environment. Experimental features are subject to change without notice in future releases. + Resource group-based workload management is an experimental feature and + is not intended for use in a production environment. Experimental features are subject to + change without notice in future releases.

Sets the maximum number of resource groups that you can create in a Greenplum Database system. Resource groups are defined system-wide.

diff --git a/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml b/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml index db3d097797..9e0b2f2f86 100644 --- a/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml +++ b/gpdb-doc/dita/ref_guide/sql_commands/COPY.xml @@ -7,7 +7,7 @@

Copies data between a file and a table.

Synopsis - COPY table [(column [, ...])] FROM {'file' | STDIN} + COPY table [(column [, ...])] FROM {'file' | PROGRAM 'command' | STDIN}      [ [WITH] [ON SEGMENT] [BINARY] @@ -23,7 +23,7 @@      [[LOG ERRORS]        SEGMENT REJECT LIMIT count [ROWS | PERCENT] ] -COPY {table [(column [, ...])] | (query)} TO {'file' | STDOUT} +COPY {table [(column [, ...])] | (query)} TO {'file' | PROGRAM 'command' | STDOUT}       [ [WITH] [ON SEGMENT] [BINARY] @@ -123,6 +123,26 @@ COPY {table [(column [, ...])] | (query)} The absolute path name of the input or output file. + + PROGRAM 'command' + Specify a command to execute. The command must be specified from + the viewpoint of the Greenplum Database master host system, and must be executable by + the Greenplum Database administrator user (gpadmin). The COPY + FROM command reads its input from the standard output of the command; the + COPY TO command writes its output to the standard input of the + command. + The command is invoked by a shell. When passing arguments to the + shell, strip or escape any characters that have a special meaning for the shell. + For security reasons, it is best to use a fixed command string, or at least avoid + passing any user input in the string. + When ON SEGMENT is specified, the command must be executable on all + Greenplum Database primary segment hosts by the Greenplum Database administrator user + (gpadmin). The command is executed by each Greenplum segment + instance. The <SEGID> string literal is required in the + command. + See the ON SEGMENT clause for information about command syntax + requirements and the data that is copied when the clause is specified. + STDIN Specifies that input comes from the client application. The ON @@ -170,6 +190,10 @@ COPY {table [(column [, ...])] | (query)} instance. + When the PROGRAM command clause is specified, the + <SEGID> string literal is required in the + command; the <SEG_DATA_DIR> string literal is + optional. See Examples. 
For a COPY FROM...ON SEGMENT command, the table distribution policy is checked when data is copied into the table. By default, an error is returned if a data row violates the table distribution policy. You can disable the distribution policy @@ -378,7 +402,8 @@ COPY {table [(column [, ...])] | (query)} COPY FROM...ON SEGMENT was run.

If you run COPY FROM...ON SEGMENT and the server configuration parameter gp_enable_segment_copy_checking is false, manual - redistribution of table data might be required. + redistribution of table data might be required. See the ALTER TABLE clause + WITH REORGANIZE.

When you specify the LOG ERRORS clause, Greenplum Database captures errors that occur while reading the external table data. You can view and manage the captured error log data.

@@ -583,12 +608,9 @@ COPY {table [(column [, ...])] | (query)} isolation mode and log errors:

COPY sales FROM '/home/usr1/sql/sales_data' LOG ERRORS SEGMENT REJECT LIMIT 10 ROWS; -

To copy segment data for later use, use the ON SEGMENT argument. Use of - the COPY TO ON SEGMENT argument takes the form:

-

COPY - table TO - '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON - SEGMENT;

+

To copy segment data for later use, use the ON SEGMENT clause. Use of the + COPY TO ON SEGMENT command takes the form:

+ COPY table TO '<SEG_DATA_DIR>/gpdumpname<SEGID>_suffix' ON SEGMENT;

The <SEGID> is required. However, you can substitute an absolute path for the <SEG_DATA_DIR> string literal in the path.

When you pass in the string literal <SEG_DATA_DIR> and @@ -597,14 +619,10 @@ COPY {table [(column [, ...])] | (query)}

For example, if you have mytable with the segments and mirror segments like this:contentid | dbid | file segment location - 0 | 1 |/home/usr1/data1/gpsegdir0 - + 0 | 1 | /home/usr1/data1/gpsegdir0 0 | 3 | /home/usr1/data_mirror1/gpsegdir0 - 1 | 4 | /home/usr1/data2/gpsegdir1 - - 1 | 2 | /home/usr1/data_mirror2/gpsegdir1 -running + 1 | 2 | /home/usr1/data_mirror2/gpsegdir1 running the command:COPY mytable TO '<SEG_DATA_DIR>/gpbackup<SEGID>.txt' ON SEGMENT; would result in the following @@ -624,6 +642,22 @@ COPY {table [(column [, ...])] | (query)} necessary.Tools such as gpfdist can be used to restore data. The backup/restore tools will not work with files that were manually generated with COPY TO ON SEGMENT.

+

This example copies the data from the lineitem table and uses the + PROGRAM clause to write the data to the + /tmp/lineitem.csv file with the cat utility. The + file is placed on the Greenplum Database + master. COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem.csv' CSV;
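The PROGRAM clause description advises escaping shell special characters and preferring a fixed command string. The following Python sketch shows one way a client script might build such a statement when the output path is not fixed. The helper names `sql_literal` and `copy_to_program` are hypothetical and not part of Greenplum Database; the sketch assumes the table name is a trusted, fixed identifier and that the path contains no backslashes.

```python
import shlex


def sql_literal(text: str) -> str:
    """Wrap text in a SQL string literal, doubling embedded single quotes.

    Assumption: the text contains no backslashes, so escape-string
    handling in the server does not change its meaning.
    """
    return "'" + text.replace("'", "''") + "'"


def copy_to_program(table: str, out_path: str) -> str:
    """Build a COPY ... TO PROGRAM statement that pipes rows to cat.

    The table name is assumed to be a trusted, fixed identifier.
    The output path is quoted for the shell first, then the whole
    command is quoted again as a SQL string literal.
    """
    command = "cat > " + shlex.quote(out_path)
    return f"COPY {table} TO PROGRAM {sql_literal(command)} CSV;"


plain = copy_to_program("lineitem", "/tmp/lineitem.csv")
spaced = copy_to_program("lineitem", "/tmp/my file.csv")
print(plain)
print(spaced)
```

A path with only safe characters passes through unchanged, while a path with a space is wrapped in shell quotes that are then doubled inside the SQL literal. A fixed command string remains the safer choice whenever it is possible.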

+

This example uses the PROGRAM and ON SEGMENT clauses to + copy data to files on the segment hosts. On the segment hosts, the COPY + command replaces <SEGID> with the segment content ID to create a file + for each segment instance on the segment + host. COPY LINEITEM TO PROGRAM 'cat > /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;

+

This example uses the PROGRAM and ON SEGMENT clauses to + copy data from files on the segment hosts. The COPY command replaces + <SEGID> with the segment content ID when copying data from the files. + On the segment hosts, there must be a file for each segment instance where the file name + contains the segment content ID on the segment host. + COPY LINEITEM_4 FROM PROGRAM 'cat /tmp/lineitem_program<SEGID>.csv' ON SEGMENT CSV;
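To illustrate the <SEGID> replacement described in the two ON SEGMENT examples, here is a minimal Python sketch that models it as plain text substitution. This is an assumption made for illustration, not the server's actual implementation; the function name `expand_segment_command` and the content IDs 0 and 1 are hypothetical.

```python
def expand_segment_command(command: str, content_id: int) -> str:
    """Return the command one segment instance would run.

    Modeled as a plain text replacement of the <SEGID> string literal
    with the segment content ID (an illustrative assumption).
    """
    if "<SEGID>" not in command:
        # The documentation states that <SEGID> is required with ON SEGMENT.
        raise ValueError("the <SEGID> string literal is required with ON SEGMENT")
    return command.replace("<SEGID>", str(content_id))


template = "cat > /tmp/lineitem_program<SEGID>.csv"

# Hypothetical two-segment system with content IDs 0 and 1.
commands = [expand_segment_command(template, cid) for cid in (0, 1)]
for cmd in commands:
    print(cmd)
```

Each segment instance ends up with its own file name, which is why a later COPY FROM PROGRAM ... ON SEGMENT needs one matching file per segment instance on each segment host.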

Compatibility diff --git a/gpdb-doc/dita/utility_guide/client_utilities/psql.xml b/gpdb-doc/dita/utility_guide/client_utilities/psql.xml index 85323fd597..db0bcdcf28 100644 --- a/gpdb-doc/dita/utility_guide/client_utilities/psql.xml +++ b/gpdb-doc/dita/utility_guide/client_utilities/psql.xml @@ -336,9 +336,9 @@ testdb=# \copy {table [(column_list)] | - (query)} {from | to} {filename | stdin | stdout - | pstdin | pstdout} [with] [binary] [oids] [delimiter [as] - 'character'] [null [as] 'string'] [csv [header] + (query)} {from | to} {'filename' | stdin | + stdout | pstdin | pstdout} [with] [binary] [oids] [delimiter [as] + 'character'] [null [as] 'string'] [csv [header] [quote [as] 'character'] [escape [as] 'character'] [force quote column_list] [force not null column_list]] Performs a frontend (client) copy. This is an operation that runs an SQL @@ -497,8 +497,10 @@ testdb=# Lists all database roles, or only those that match pattern. - \dx [extension_pattern] | \dx+ [extension_pattern] - Lists all installed extensions, or only those that match the pattern. \dx and \dx+ are functionally equivalent. + \dx [extension_pattern] | \dx+ + [extension_pattern] + Lists all installed extensions, or only those that match the pattern. + \dx and \dx+ are functionally equivalent. \e | \edit [filename] -- GitLab