From b960d9811097f066128ec615b6e320f9a1970c7f Mon Sep 17 00:00:00 2001 From: Mel Kiyama Date: Wed, 25 Apr 2018 14:38:22 -0700 Subject: [PATCH] docs: gpbackup/gprestore S3 plugin (#4881) * docs: gpbackup/gprestore S3 plugin -add gpbackup/gprestore --plugin-config option -add S3 plugin information -other minor fixes: add index as object, support table data and metadata for --jobs > 1 PR for 5X_STABLE Will be ported to MAIN * docs: review updates for gpbackup/gprestore S3 plugin -moved S3 links to Notes section -changed name from S3 plugin to S3 storage plugin -removed draft comments * docs: gpbackup s3 plugin change binary plugin name to gpbackup_s3_plugin * docs: s3 plugin - fix typo --- .../admin_guide/managing/backup-gpbackup.xml | 50 ++++--- .../admin_guide/managing/backup-s3-plugin.xml | 140 ++++++++++++++++++ .../dita/admin_guide/managing/backup.ditamap | 1 + .../admin_utilities/gpbackup.xml | 49 +++--- .../admin_utilities/gprestore.xml | 45 +++--- 5 files changed, 215 insertions(+), 70 deletions(-) create mode 100644 gpdb-doc/dita/admin_guide/managing/backup-s3-plugin.xml diff --git a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml index 1c7f8ceb66..f47e838214 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml +++ b/gpdb-doc/dita/admin_guide/managing/backup-gpbackup.xml @@ -53,8 +53,8 @@ than 1).
  • You cannot use the --exclude-table-file with --leaf-partition-data. Although you can specify leaf partition names - in a file specified with --exclude-table-file, gpbackup - ignores the partition names.
  • + in a file specified with --exclude-table-file, + gpbackup ignores the partition names.
  • Incremental backups are not supported.
  • @@ -90,6 +90,7 @@
  • Sequences
  • Comments
  • Tables
  • +
  • Indexes
  • Owners
  • Writable External Tables (DDL only)
  • Readable External Tables (DDL only)
  • @@ -183,7 +184,8 @@ gpbackup_20180105112754_metadata.sql gpbackup_20180105112754_toc.yaml

    To consolidate all backup files into a single directory, include the - --backup-dir option. Note that you must specify an absolute path with this + --backup-dir option. Note that you must specify an absolute path with + this option:$ gpbackup --dbname demo --backup-dir /home/gpadmin/backups 20171103:15:31:56 gpbackup:gpadmin:0ee2f5fb02c9:017586-[INFO]:-Starting backup of database demo ... @@ -220,8 +222,8 @@ $ gprestore --timestamp 20171103152558 --create-db 20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Restoring post-data metadata from /gpmaster/gpsne-1/backups/20171103/20171103152558/gpbackup_20171103152558_postdata.sql 20171103:15:45:45 gprestore:gpadmin:0ee2f5fb02c9:017714-[INFO]:-Post-data metadata restore complete

    If you specified a custom --backup-dir to consolidate the backup files, - include the same --backup-dir option when using gprestore - to locate the backup + include the same --backup-dir option when using + gprestore to locate the backup files:$ dropdb demo $ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db 20171103:15:51:02 gprestore:gpadmin:0ee2f5fb02c9:017819-[INFO]:-Restore Key = 20171103153156 @@ -230,9 +232,9 @@ $ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --

    gprestore does not attempt to restore global metadata for the Greenplum System by default. If this is required, include the --with-globals argument.

    -

    By default, gprestore uses 1 connection to restore table data. If you - have a large backup set, you can improve performance of the restore by increasing the - number of parallel connections with the --jobs option. For +

    By default, gprestore uses 1 connection to restore table data and + metadata. If you have a large backup set, you can improve performance of the restore by + increasing the number of parallel connections with the --jobs option. For example:$ gprestore --backup-dir /home/gpadmin/backups/ --timestamp 20171103153156 --create-db --jobs 8

    Test the number of parallel connections with your backup set to determine the ideal number for fast data recovery.

    @@ -308,8 +310,8 @@ $ gpbackup --dbname demo --exclude-schema twitter

    or multiple --exclude-schema options. For example:$ gpbackup --dbname demo --include-schema wikipedia --include-schema twitter

    To filter the individual tables that are included in a backup set, or excluded from a - backup set, specify individual tables with the --include-table option or the - --exclude-table option. The table must be schema qualified, + backup set, specify individual tables with the --include-table option or + the --exclude-table option. The table must be schema qualified, <schema-name>.<table-name>. The individual table filtering options can be specified multiple times. However, --include-table and --exclude-table cannot both be used in the same command.

    @@ -330,11 +332,11 @@ water.tonic

    example:$ gpbackup --dbname demo --include-table-file /home/gpadmin/table-list.txt

    You can combine -include schema with --exclude-table or --exclude-table-file for a backup. This example uses - --include-schema with --exclude-table to back up a schema - except for a single table.

    + --include-schema with --exclude-table to back up a + schema except for a single table.

    $ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses -

    You cannot combine --include-schema with --include-table or - --include-table-file, and you cannot combine +

    You cannot combine --include-schema with --include-table + or --include-table-file, and you cannot combine --exclude-schema with any table filtering option such as --exclude-table or --include-table.

    Filtering by Leaf Partition

    By default, @@ -377,8 +379,8 @@ CREATE TABLE

    To name:public.sales_1_prt_oct17 public.sales_1_prt_nov17 public.sales_1_prt_dec17

    Then - specify the file with the --include-table-file option to generate one data - file per leaf + specify the file with the --include-table-file option to generate one + data file per leaf partition:$ gpbackup --dbname demo --include-table-file last-quarter.txt --leaf-partition-data

    When you specify --leaf-partition-data, gpbackup generates one data file per leaf partition when backing up a partitioned table. For example, this command @@ -386,10 +388,10 @@ public.sales_1_prt_dec17

    Then partition:$ gpbackup --dbname demo --include-table public.sales --leaf-partition-data

    When leaf partitions are backed up, the leaf partition data is backed up along with the metadata for the entire partitioned table.

    You cannot use the - --exclude-table-file option with --leaf-partition-data. - Although you can specify leaf partition names in a file specified with - --exclude-table-file, gpbackup ignores the partition - names.
    + --exclude-table-file option with + --leaf-partition-data. Although you can specify leaf partition names in a + file specified with --exclude-table-file, gpbackup + ignores the partition names.
    Filtering with gprestore

    After creating a backup set with gpbackup, you can filter the schemas @@ -407,8 +409,8 @@ public.sales_1_prt_dec17

    Then gprestore does not create roles or set the owner of the tables. The utility restores table indexes and rules. Triggers are also restored but are not supported in Greenplum Database. -

  • The file that you specify with --include-table-file cannot include a - leaf partition name, as it can when you specify this option with +
  • The file that you specify with --include-table-file cannot include + a leaf partition name, as it can when you specify this option with gpbackup. If you specified leaf partitions in the backup set, specify the partitioned table to restore the leaf partition data.

    When restoring a backup set that contains data from some leaf partitions of a partitioned table, the @@ -675,8 +677,8 @@ public.sales_1_prt_dec17

    Then <seg_dir>/backups/YYYYMMDD/YYYYMMDDHHMMSS/.

    If you specify a custom backup directory, segment data files are copied to this same file path as a subdirectory of the backup directory. If you include the - --leaf-partition-data option, gpbackup creates one data - file for each leaf partition of a partitioned table, instead of just one table for + --leaf-partition-data option, gpbackup creates one + data file for each leaf partition of a partitioned table, instead of just one table for file.

    Each data file uses the file name format gpbackup_<content_id>_<YYYYMMDDHHMMSS>_<oid>.gz where:

      + + + Using the S3 Storage Plugin with gpbackup and gprestore + + The S3 storage plugin for gpbackup and + gprestore is an experimental feature and is not intended for + use in a production environment. Experimental features are subject to change without notice in + future releases. +

      The S3 storage plugin application lets you use an Amazon Simple Storage Service (Amazon S3) + location to store and retrieve backups when you run gpbackup and + gprestore. Amazon S3 provides secure, durable, highly-scalable object + storage.

      +

      To use the S3 storage plugin application, you specify the location of the plugin and the + Amazon Web Services (AWS) login and backup location in a configuration file. When you run + gpbackup or gprestore, you specify the configuration file + with the option --plugin-config. For information about the configuration + file, see S3 Storage Plugin Configuration File Format.

      +

      If you perform a backup operation with the gpbackup option + --plugin-config, you must also specify the --plugin-config + option when you restore the backup with gprestore.
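The pairing of the two commands can be sketched as a dry run. This is only an illustration: the helper names `backup_cmd` and `restore_cmd`, the configuration file path, and the timestamp are hypothetical placeholders, and the script prints the commands instead of running them. The only options used are ones this documentation describes.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, does not execute gpbackup/gprestore.
# PLUGIN_CONFIG is a placeholder path to an S3 storage plugin configuration file.
PLUGIN_CONFIG=/home/gpadmin/s3-test-config.yaml

# Build the backup command for a database name (hypothetical helper).
backup_cmd() {
    printf 'gpbackup --dbname %s --single-data-file --plugin-config %s\n' "$1" "$PLUGIN_CONFIG"
}

# Build the matching restore command for a backup timestamp (hypothetical helper).
# The same --plugin-config value is passed because the backup was made with it.
restore_cmd() {
    printf 'gprestore --timestamp %s --plugin-config %s\n' "$1" "$PLUGIN_CONFIG"
}

backup_cmd demo
restore_cmd 20180105112754
```

Keeping the configuration path in one variable makes it harder to back up with one plugin configuration and accidentally restore without it.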

      +
      + S3 Storage Plugin Configuration File Format +

      The configuration file specifies the absolute path to the Greenplum Database S3 storage + plugin executable, AWS connection credentials, and S3 location.

      +

      The S3 storage plugin configuration file uses the YAML 1.1 document format and implements its own + schema for specifying the location of the Greenplum Database S3 storage plugin, the AWS + connection credentials, and the S3 location.

      +

      The configuration file must be a valid YAML document. The gpbackup and + gprestore utilities process the configuration file in order and use + indentation (spaces) to determine the document hierarchy and the relationships of the + sections to one another. The use of white space is significant. White space should not be + used simply for formatting purposes, and tabs should not be used at all.

      +

      This is the structure of an S3 storage plugin configuration file.

      + executablepath: <absolute-path-to-gpbackup_s3_plugin> +options: + region: <aws-region> + aws_access_key_id: <aws-user-id> + aws_secret_access_key: <aws-user-id-key> + bucket: <s3-bucket> + backupdir: <s3-location> + + + executablepath + Required. Absolute path to the plugin executable. For example, the Pivotal Greenplum + Database installation location is $GPHOME/bin/gpbackup_s3_plugin. + + + + options + Required. Begins the S3 storage plugin options section. + + region + Required. The AWS region. + + + aws_access_key_id + Required. The AWS S3 ID to access the S3 bucket location that stores + backup files. + + + aws_secret_access_key + Required. AWS S3 passcode for the S3 ID to access the S3 bucket + location. + + + bucket + Required. The name of the S3 bucket in the AWS region. The bucket must + exist. + + + backupdir + Required. The S3 location for backups. During a backup operation, the + plugin creates the S3 location if it does not exist in the S3 bucket. + + + + + + + +
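Before passing a configuration file to --plugin-config, it can help to check it against the rules above. The following is a minimal sketch, assuming a plain text search is good enough: `check_s3_plugin_config` is a hypothetical helper, not part of gpbackup, and it does not parse YAML or validate the values. It only rejects tab characters and confirms that each required key appears.

```shell
#!/bin/sh
# Hypothetical pre-flight check for an S3 storage plugin configuration file.
# This is a text-level check only; it is not a YAML parser.
check_s3_plugin_config() {
    config=$1

    # Tabs must not be used in the configuration file.
    if grep -q "$(printf '\t')" "$config"; then
        echo "error: $config contains tab characters" >&2
        return 1
    fi

    # Every key listed in the structure above is required.
    for key in executablepath options region aws_access_key_id \
               aws_secret_access_key bucket backupdir; do
        if ! grep -Eq "^[[:space:]]*${key}:" "$config"; then
            echo "error: missing required key: $key" >&2
            return 1
        fi
    done
    return 0
}

# Usage (path is a placeholder):
#   check_s3_plugin_config /home/gpadmin/s3-test-config.yaml && echo "config looks complete"
```

The check does not confirm that the bucket exists or that the credentials are valid; those are only verified when the plugin contacts Amazon S3.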
      +
      + Example +

      This is an example S3 storage plugin configuration file that is used in the next + gpbackup example command. The name of the file is + s3-test-config.yaml.

      + executablepath: $GPHOME/bin/gpbackup_s3_plugin +options: + region: us-west-2 + aws_access_key_id: test-s3-user + aws_secret_access_key: asdf1234asdf + bucket: gpdb-backup + backupdir: test/backup3 +

      This gpbackup example backs up the database demo using the S3 storage + plugin. The absolute path to the S3 storage plugin configuration file is + /home/gpadmin/s3-test-config.yaml. + gpbackup --dbname demo --single-data-file --plugin-config /home/gpadmin/s3-test-config.yaml

      +

      The S3 storage plugin writes the backup files to this S3 location in the AWS region + us-west-2.

      +

      + gpdb-backup/test/backup3/backups/YYYYMMDD/YYYYMMDDHHMMSS/ +

      +
      +
      + Notes +

      The S3 storage plugin application must be in the same location on every Greenplum Database + host. The configuration file is required only on the master host.

      +

      When running gpbackup, the --plugin-config option is + supported only with --single-data-file or + --metadata-only.

      +

      When you perform a backup with the S3 storage plugin, the plugin stores the backup files in + this location in the S3 bucket.

      + <backupdir>/backups/<datestamp>/<timestamp> +

      Where backupdir is the location you specified in the S3 configuration + file, and datestamp and timestamp are the backup date + and time stamps.
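To locate a particular backup in the bucket, the path can be derived from the configured backupdir and the backup timestamp. This sketch assumes the datestamp is the first eight digits (YYYYMMDD) of the 14-digit timestamp, which matches the YYYYMMDD/YYYYMMDDHHMMSS layout shown in the example; `s3_backup_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: build <backupdir>/backups/<datestamp>/<timestamp>.
# Assumes <datestamp> is the YYYYMMDD prefix of the YYYYMMDDHHMMSS timestamp.
s3_backup_prefix() {
    backupdir=$1
    timestamp=$2
    datestamp=$(printf '%s' "$timestamp" | cut -c1-8)
    printf '%s/backups/%s/%s\n' "$backupdir" "$datestamp" "$timestamp"
}

# With backupdir test/backup3 from the example configuration file:
s3_backup_prefix test/backup3 20180105112754
```

Prefixing the result with the bucket name gives the full location, in the same form as the gpdb-backup/test/backup3/backups/... path in the example.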

      +

      Using Amazon S3 to back up and restore data requires an Amazon AWS account with access to + the Amazon S3 bucket. These are the Amazon S3 bucket permissions required for backing up and + restoring data.

        +
      • Upload/Delete for the S3 user ID that uploads the + files
      • +
      • Open/Download and View for + the S3 user ID that accesses the files
      • +

      +

      For information about Amazon S3, see Amazon S3.

        +
      • For information about Amazon S3 regions and endpoints, see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region.
      • +
      • For information about S3 buckets and folders, see the Amazon S3 documentation https://aws.amazon.com/documentation/s3/.
      • +

      +
      + +
      diff --git a/gpdb-doc/dita/admin_guide/managing/backup.ditamap b/gpdb-doc/dita/admin_guide/managing/backup.ditamap index d72651b1d2..19c3adf719 100644 --- a/gpdb-doc/dita/admin_guide/managing/backup.ditamap +++ b/gpdb-doc/dita/admin_guide/managing/backup.ditamap @@ -15,6 +15,7 @@ + diff --git a/gpdb-doc/dita/utility_guide/admin_utilities/gpbackup.xml b/gpdb-doc/dita/utility_guide/admin_utilities/gpbackup.xml index f02d7c4a11..2866b8ad45 100644 --- a/gpdb-doc/dita/utility_guide/admin_utilities/gpbackup.xml +++ b/gpdb-doc/dita/utility_guide/admin_utilities/gpbackup.xml @@ -20,6 +20,7 @@ [--leaf-partition-data] [--metadata-only] [--no-compression] + [--plugin-config config_file_location] [--quiet] [--single-data-file] [--verbose] @@ -60,11 +61,10 @@ Options - -dbname + --dbname database_name Required. Specifies the database to back up. - --backup-dir directory @@ -78,7 +78,6 @@ specify a custom backup directory, files are copied to these paths in subdirectories of the backup directory. - --compression-level level @@ -86,18 +85,15 @@ files. The default is 1. Note that gpbackup uses compression by default. - --data-only Optional. Backs up only the table data into CSV files, but does not backup metadata files needed to recreate the tables and other database objects. - --debug Optional. Displays verbose debug messages during operation. - --exclude-schema schema_name @@ -123,7 +119,6 @@ See for more information. - --exclude-table-file file_name @@ -141,7 +136,6 @@ See for more information. - --include-schema schema_name @@ -154,7 +148,6 @@ href="../../admin_guide/managing/backup-gpbackup.xml#topic_et4_b5d_tbb"/> for more information. - --include-table schema.table @@ -172,7 +165,6 @@ See for more information. - --include-table-file file_name @@ -191,7 +183,6 @@ See for more information. - --leaf-partition-data Optional. 
For partitioned tables, creates one data file per leaf partition instead of @@ -201,23 +192,33 @@ combination with --exclude-table-file or --exclude-table. - --metadata-only Optional. Creates only the metadata files (DDL) needed to recreate the database objects, but does not back up the actual table data. - --no-compression Optional. Do not compress the table data CSV files. - + + --plugin-config + config_file_location + Optional. Specify the location of the gpbackup plugin configuration file, a + YAML-formatted text file. The file contains configuration information for the plugin + application that gpbackup uses during the backup operation. + This option is supported only with --single-data-file or + --metadata-only. + If you specify the --plugin-config option when you back up a + database, you must specify this option with configuration information for a + corresponding plugin application when you restore the database from the backup. + For information about using the S3 storage plugin application, see . + --quiet Optional. Suppress all non-warning, non-error log messages. - --single-data-file Optional. Create a single data file on each segment host for all tables backed up on @@ -225,20 +226,17 @@ for each table that is backed up on the segment.If you use the --single-data-file option to combine table backups into a single file per segment, you cannot set the gprestore option - -jobs to a value higher than 1 to perform a parallel restore + --jobs to a value higher than 1 to perform a parallel restore operation. - --verbose Optional. Print verbose log messages. - --version Optional. Print the version number and exit. - --with-stats Optional. Include query plan statistics in the backup set. @@ -262,18 +260,18 @@
      Examples

      Backup all schemas and tables in the "demo" database, including global Greenplum Database - system objects statistics:$ gpbackup -dbname demo

      + system objects statistics:$ gpbackup --dbname demo

      Backup all schemas and tables in the "demo" database except for the "twitter" - schema:$ gpbackup -dbname demo --exclude-schema twitter

      + schema:$ gpbackup --dbname demo --exclude-schema twitter

      Backup only the "twitter" schema in the "demo" - database:$ gpbackup -dbname demo --include-schema twitter

      + database:$ gpbackup --dbname demo --include-schema twitter

      Backup all schemas and tables in the "demo" database, including global Greenplum Database system objects and query statistics, and copy all backup files to the /home/gpadmin/backup - directory:$ gpbackup -dbname demo --with-stats --backup-dir /home/gpadmin/backup

      + directory:$ gpbackup --dbname demo --with-stats --backup-dir /home/gpadmin/backup

      This example uses --include-schema with --exclude-table to back up a schema except for a single table.

      - $ gpbackup -dbname demo --include-schema mydata --exclude-table mydata.addresses + $ gpbackup --dbname demo --include-schema mydata --exclude-table mydata.addresses

      You cannot use the option --exclude-schema with a table filtering option such as --include-table.

      @@ -281,7 +279,8 @@ See Also

      ,

      + href="../../admin_guide/managing/backup-gpbackup.xml" format="dita"/> and

  • diff --git a/gpdb-doc/dita/utility_guide/admin_utilities/gprestore.xml b/gpdb-doc/dita/utility_guide/admin_utilities/gprestore.xml index bc791ddcc8..829f7260dd 100644 --- a/gpdb-doc/dita/utility_guide/admin_utilities/gprestore.xml +++ b/gpdb-doc/dita/utility_guide/admin_utilities/gprestore.xml @@ -20,6 +20,7 @@ [--include-table schema.table] [--include-table-file file_name] [--jobs int] + [--plugin-config config_file_location] [--quiet] [--redirect-db database_name] [--verbose] @@ -57,8 +58,8 @@ gprestore. By default, only database objects in the backup set are restored.

    Performance of restore operations can be improved by creating multiple parallel connections - to restore table data. By default gprestore uses 1 connection, but you can - increase this number with the --jobs option for large restore + to restore table data and metadata. By default gprestore uses 1 connection, + but you can increase this number with the --jobs option for large restore operations.

    When a restore operation completes, gprestore returns a status code. See .

    @@ -82,7 +83,6 @@ <seg_dir>/backups/YYYYMMDD/YYYYMMDDhhmmss/ directory of each segment host. - --backup-dir directory @@ -96,19 +96,16 @@ this option when you specify a custom backup directory with gpbackup. - --create-db Optional. Creates the database before restoring the database object metadata. The database is created by cloning the empty standard system database template0. - --debug Optional. Displays verbose debug messages during operation. - --exclude-schema schema_name @@ -142,7 +139,6 @@ You cannot combine this option with the option --exclude-schema, or another a table filtering option such as --include-table. - --include-schema schema_name @@ -156,8 +152,9 @@ in the database. You cannot use this option if objects in the backup set have dependencies on multiple schemas. + See Filtering the Contents of a Backup or Restore for more information. - --include-table schema.table @@ -187,47 +184,52 @@ See for more information. - --jobs int Optional. Specifies the number of parallel connections to use when restoring table - data. By default, gprestore uses 1 connection. Increasing this number - can improve the speed of restoring data.If you used the gpbackup - --single-data-file option to combine table backups into a single file per - segment, you cannot set --jobs to a value higher than 1 to perform a - parallel restore operation. + data and metadata. By default, gprestore uses 1 connection. Increasing + this number can improve the speed of restoring data. If you used the + gpbackup --single-data-file option to combine table backups into a + single file per segment, you cannot set --jobs to a value higher than + 1 to perform a parallel restore operation. + + + --plugin-config + config_file_location + Optional. Specify the location of the gpbackup plugin configuration file, a + YAML-formatted text file. The file contains configuration information for the plugin + application that gprestore uses during the restore operation. 
+ If you specify the --plugin-config option when you back up a + database, you must specify this option with configuration information for a + corresponding plugin application when you restore the database from the backup. + For information about using the S3 storage plugin application, see . - --quiet Optional. Suppress all non-warning, non-error log messages. - --redirect-db database_name Optional. Restore to the specified database_name instead of to the database that was backed up. - --verbose Optional. Print verbose log messages. - --version Optional. Print the version number and exit. - --with-globals Optional. Restores Greenplum Database system objects in the backup set, in addition to database objects. See . - --with-stats Optional. Restore query plan statistics from the backup set. @@ -272,7 +274,8 @@ $ gprestore --include-schema wikipedia --backup-dir /home/gpadmin/backups/ --tim See Also

    ,

    + href="../../admin_guide/managing/backup-gpbackup.xml" format="dita"/> and

    -- GitLab