Commit e57af1d6 authored by Lisa Owen, committed by dyozie

docs - add pxf server cfg topic, rework other cfg topics, misc edits (#7672)

* docs - add pxf server cfg topic, rework other cfg topics, misc edits

* some of the edits requested by francisco

* add Hive back in to list of JDBC SQL dbs

* sneaking in a misc unrelated formatting fix

* edits requested by david
Parent 63567533
@@ -37,7 +37,7 @@ Also during initialization, PXF populates a user configuration directory that yo
| keytabs/ | The default location for the PXF service Kerberos principal keytab file. |
| lib/ | The default PXF user runtime library directory. |
| logs/ | The PXF runtime log file directory. Includes `pxf-service.log` and the Tomcat-related log `catalina.out`. The `logs/` directory and log files are readable only by the `gpadmin` user. |
| servers/ | The server configuration directory; each subdirectory identifies the name of a server. The default server is named `default`. The Greenplum Database administrator may configure other servers. |
| templates/ | The configuration directory for connector server template files. |
Refer to [Initializing PXF](init_pxf.html) and [Starting PXF](cfginitstart_pxf.html#start_pxf) for detailed information about the PXF initialization and startup commands and procedures.
@@ -118,7 +118,7 @@ The PXF Hadoop connectors expose the following profiles to read, and in many cas
| [Hive](hive_pxf.html) | stored as Parquet | Hive | n/a |
| [HBase](hbase_pxf.html) | Any | HBase | n/a |
You provide the profile name when you specify the `pxf` protocol on a `CREATE EXTERNAL TABLE` command to create a Greenplum Database external table that references a Hadoop file, directory, or table. For example, the following command creates an external table that uses the default server and specifies the profile named `hdfs:text`:
``` sql
CREATE EXTERNAL TABLE pxf_hdfs_text(location text, month text, num_orders int, total_sales float8)
......
---
title: About PXF Server Configuration
---
This topic provides an overview of PXF server configuration. To configure a server, refer to the topic specific to the connector that you want to configure.
You read data from, or write data to, an external data store via a PXF connector. To access an external data store, you must provide the server location. You may also be required to provide client access credentials and other external data store-specific properties. PXF simplifies configuring access to external data stores by:
- Supporting file-based configuration
- Providing connector-specific template configuration files
A PXF *Server* definition is simply a named configuration that provides access to a specific external data store as a specific user. A PXF server name is the name of a directory residing in `$PXF_CONF/servers/`. The information that you provide in a server configuration is connector-specific. PXF provides a server template file for each connector; this template identifies the minimum set of properties that you must configure to use the connector.
You configure a server definition for each external data store that you will permit Greenplum Database users to access. For example, if you require access to two Hadoop clusters, you will create a PXF Hadoop server configuration for each cluster. If you require access to an Oracle and a MySQL database, you will create a PXF JDBC server configuration for each database/user combination.
After you configure a PXF server, you publish the server name to Greenplum Database users as appropriate. A user only needs to provide the server name when they create an external table that accesses the external data store. PXF obtains the external data source location and access credentials from the server configuration directory identified by the server name.
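For example, a deployment that accesses one Hadoop cluster and one MySQL database might contain the following server directories; the `hdp3` and `mysql-testdb` names are hypothetical:
``` shell
gpadmin@gpmaster$ ls $PXF_CONF/servers
default  hdp3  mysql-testdb
```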
## <a id="cfgproc"></a>Server Configuration Procedure
When you configure a PXF connector, you add a named PXF server configuration for the connector. The configuration tasks may include the following (a shell sketch of these steps follows the list):
1. Determine if you are configuring the `default` PXF server, or choose a new name for the server configuration.
2. Create the directory `$PXF_CONF/servers/<server_name>`.
3. Copy template or other configuration files to the new server directory.
4. Fill in appropriate values for the properties in the template file.
5. Add additional properties and values if required for your environment.
6. Add other configuration information supported by the connector.
7. Synchronize the server configuration to the Greenplum Database cluster.
8. Publish the PXF server names to your Greenplum Database end users as appropriate.
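For example, a minimal sketch of these steps for a hypothetical JDBC server named `mysql-testdb` follows; the server name and the values that you edit are examples only:
``` shell
# Create the server configuration directory (step 2)
gpadmin@gpmaster$ mkdir $PXF_CONF/servers/mysql-testdb
# Copy the connector template into the new directory (step 3)
gpadmin@gpmaster$ cp $PXF_CONF/templates/jdbc-site.xml $PXF_CONF/servers/mysql-testdb/
# Fill in property values and add any additional properties (steps 4-6)
gpadmin@gpmaster$ vi $PXF_CONF/servers/mysql-testdb/jdbc-site.xml
# Synchronize the configuration to the Greenplum Database cluster (step 7)
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```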
## <a id="default"></a>About The Default Server
PXF defines a special server named `default`. When you initialize PXF, it automatically creates a `$PXF_CONF/servers/default/` directory. This directory, initially empty, identifies the default PXF server configuration. You can configure and assign the default PXF server to any external data source. For example, you may choose to assign the PXF default server to a Hadoop cluster, or to a MySQL database that you frequently access as a specific user.
**Note**: You *must* configure a Hadoop server as the PXF `default` server when your Hadoop cluster utilizes Kerberos authentication.
PXF automatically uses the `default` server configuration if you omit the `SERVER=<server_name>` setting in the `CREATE EXTERNAL TABLE` command `LOCATION` clause.
## <a id="templates"></a>Server Template Files
PXF provides a template configuration file for each connector. These server template configuration files, located in the `$PXF_CONF/templates/` directory, identify the minimum set of properties that you must configure to use the connector.
```
gpadmin@gpmaster$ ls $PXF_CONF/templates
adl-site.xml hbase-site.xml jdbc-site.xml s3-site.xml
core-site.xml hdfs-site.xml mapred-site.xml wasbs-site.xml
gs-site.xml hive-site.xml minio-site.xml yarn-site.xml
```
For example, the contents of the `s3-site.xml` template file follow:
``` pre
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.fast.upload</name>
<value>true</value>
</property>
</configuration>
```
<div class="note">You specify credentials to PXF in clear text in configuration files.</div>
The template files for the Hadoop connectors are not intended to be modified and used for configuration, as they only provide an example of the information needed. Instead of modifying the Hadoop templates, you will copy several Hadoop `*-site.xml` files from the Hadoop cluster to your PXF Hadoop server configuration.
## <a id="using"></a>Using a Server Configuration
To access an external data store, the Greenplum Database user specifies the server name in the `CREATE EXTERNAL TABLE` command `LOCATION` clause `SERVER=<server_name>` option. The `<server_name>` that the user provides identifies the server configuration directory from which PXF obtains the configuration and credentials to access the external data store.
For example, the following command accesses an S3 object store using the server configuration defined in `$PXF_CONF/servers/s3srvcfg`:
<pre>
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
LOCATION ('pxf://BUCKET/dir/file.txt?PROFILE=s3:text&<b>SERVER=s3srvcfg</b>')
FORMAT 'TEXT' (delimiter=E',');
</pre>
PXF automatically uses the `default` server configuration when no `SERVER=<server_name>` setting is provided.
For example, if the `default` server configuration identifies a Hadoop cluster, the following example command references the HDFS file located at `/path/to/file.txt`:
<pre>
CREATE EXTERNAL TABLE pxf_ext_hdfs(location text, miles int)
LOCATION ('pxf://path/to/file.txt?PROFILE=hdfs:text')
FORMAT 'TEXT' (delimiter=E',');
</pre>
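A user then queries the external table as they would any other Greenplum Database table; for example:
``` sql
SELECT * FROM pxf_ext_hdfs;
```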
<div class="note info">A Greenplum Database user who queries or writes to an external table accesses the external data store with the credentials configured for the server associated with the external table.</div>
## <a id="srv-cfg-update"></a>Adding or Updating Server Configurations
If you add new, or update existing, PXF server configurations on the Greenplum Database master host, you must re-sync the PXF configuration to the Greenplum Database cluster:
``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
## <a id="override"></a>Overriding the Server Configuration
Some PXF connectors (S3, JDBC) allow you to override a server configuration by directly specifying certain properties via custom options in the `CREATE EXTERNAL TABLE` command `LOCATION` clause. Refer to [Overriding the S3 Server Configuration](access_objstore.html#s3_override) and [Overriding the JDBC Server Configuration](jdbc_cfg.html#override) for additional information.
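For example, the following sketch overrides an S3 server configuration, assuming the S3 connector's `accesskey` and `secretkey` custom options as described in the linked topic; the bucket, path, and key values are placeholders:
``` sql
CREATE EXTERNAL TABLE pxf_s3_override(name text, orders int)
  LOCATION ('pxf://BUCKET/dir/file.txt?PROFILE=s3:text&accesskey=YOURKEY&secretkey=YOURSECRET')
FORMAT 'TEXT' (delimiter=E',');
```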
<div class="note warning">When you override a server configuration, the properties and their values are visible as part of the external table definition. Do not use this method to pass credentials in a production environment.</div>
@@ -10,24 +10,42 @@ PXF is compatible with Cloudera, Hortonworks Data Platform, MapR, and generic Ap
Configuring PXF Hadoop connectors involves copying configuration files from your Hadoop cluster to the Greenplum Database master host. If you are using the MapR Hadoop distribution, you must also copy certain JAR files to the master host. Before you configure the PXF Hadoop connectors, ensure that you can copy files from hosts in your Hadoop cluster to the Greenplum Database master.
In this procedure, you copy Hadoop configuration files to the `$PXF_CONF/servers/default` directory on the Greenplum Database master host. You may also copy libraries to `$PXF_CONF/lib` for MapR support. You then synchronize the PXF configuration on the master host to the standby master and segment hosts. (PXF creates the `$PXF_CONF/*` directories when you run `pxf cluster init`.)
**Note**: After you complete the configuration procedure, you will have configured the PXF default Hadoop server. End users need not provide a `SERVER` option in a `CREATE EXTERNAL TABLE` command when they access the default Hadoop server configuration.
## <a id="client-pxf-config-steps"></a>Procedure
Perform the following procedure to configure the desired PXF Hadoop-related connectors on the Greenplum Database master host. After you configure the connectors, you will use the `pxf cluster sync` command to copy the PXF configuration to the Greenplum Database cluster.
In this procedure, you use the `default`, or create a new, PXF server configuration. You copy Hadoop configuration files to the server configuration directory on the Greenplum Database master host. You may also copy libraries to `$PXF_CONF/lib` for MapR support. You then synchronize the PXF configuration on the master host to the standby master and segment hosts. (PXF creates the `$PXF_CONF/*` directories when you run `pxf cluster init`.)
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Identify the name of your PXF Hadoop server configuration. If your Hadoop cluster is Kerberized, you must use the `default` PXF server.
3. If you are not using the `default` PXF server, create the `$PXF_CONF/servers/<server_name>` directory. For example, use the following command to create a Hadoop server configuration named `hdp3`:
``` shell
gpadmin@gpmaster$ mkdir $PXF_CONF/servers/hdp3
```
4. Change to the server directory. For example:
```shell
gpadmin@gpmaster$ cd $PXF_CONF/servers/default
```
Or,
```shell
gpadmin@gpmaster$ cd $PXF_CONF/servers/hdp3
```
5. PXF requires information from `core-site.xml` and other Hadoop configuration files. Copy the `core-site.xml`, `hdfs-site.xml`, `mapred-site.xml`, and `yarn-site.xml` Hadoop configuration files from your Hadoop cluster NameNode host to the current host using your tool of choice. Your file paths may differ based on the Hadoop distribution in use. For example, these commands use `scp` to copy the files:
``` shell
gpadmin@gpmaster$ scp hdfsuser@namenode:/etc/hadoop/conf/core-site.xml .
gpadmin@gpmaster$ scp hdfsuser@namenode:/etc/hadoop/conf/hdfs-site.xml .
gpadmin@gpmaster$ scp hdfsuser@namenode:/etc/hadoop/conf/mapred-site.xml .
@@ -70,10 +88,10 @@ Perform the following procedure to configure the desired PXF Hadoop-related conn
## <a id="client-cfg-update"></a>Updating Hadoop Configuration
If you update your Hadoop, Hive, or HBase configuration while the PXF service is running, you must copy the updated configuration to the `$PXF_CONF/servers/<server_name>` directory and re-sync the PXF configuration to your Greenplum Database cluster. For example:
``` shell
gpadmin@gpmaster$ cd $PXF_CONF/servers/<server_name>
gpadmin@gpmaster$ scp hiveuser@hivehost:/etc/hive/conf/hive-site.xml .
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
......
@@ -90,7 +90,7 @@ Use the following syntax to create a Greenplum Database external table that refe
``` sql
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<hbase-table-name>?PROFILE=HBase[&SERVER=<server_name>]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```
@@ -100,6 +100,7 @@ HBase connector-specific keywords and values used in the [CREATE EXTERNAL TABLE]
|-------|-------------------------------------|
| \<hbase&#8209;table&#8209;name\> | The name of the HBase table. |
| PROFILE | The `PROFILE` keyword must specify `HBase`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| FORMAT | The `FORMAT` clause must specify `'CUSTOM' (FORMATTER='pxfwritable_import')`. |
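For example, the following hypothetical command reads an HBase table named `orders` through a server configuration named `hbasesrv`; the table, server, and column names are illustrative only, and assume a column family `cf` with an `item` qualifier:
``` sql
CREATE EXTERNAL TABLE pxf_hbase_orders(recordkey bytea, "cf:item" text)
  LOCATION ('pxf://orders?PROFILE=HBase&SERVER=hbasesrv')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```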
......
@@ -72,7 +72,7 @@ Use the `hdfs:avro` profile to read Avro-format data in HDFS. The following synt
``` sql
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-file>?PROFILE=hdfs:avro[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```
@@ -82,6 +82,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;file\> | The absolute path to the directory or file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:avro`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\> | \<custom-option\>s are discussed below.|
| FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `hdfs:avro` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. |
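For example, a minimal read sketch follows; it assumes the `pxf_avro.avro` file used in the example later in this topic, an illustrative HDFS path, and maps only the two top-level primitive fields:
``` sql
CREATE EXTERNAL TABLE pxf_avro_simple(id bigint, username text)
  LOCATION ('pxf://data/pxf_examples/pxf_avro.avro?PROFILE=hdfs:avro&SERVER=default')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```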
@@ -201,6 +202,7 @@ Perform the following steps to create a sample Avro data file conforming to the
Perform the following operations to create and query an external table that references the `pxf_avro.avro` file that you added to HDFS in the previous section. When creating the table:
- Use the PXF default server.
- Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent Greenplum Database types (bigint and text).
- Map the remaining complex fields to type text.
- Explicitly set the record, map, and collection delimiters using the `hdfs:avro` profile custom options.
......
@@ -178,7 +178,7 @@ Use the `hdfs:json` profile to read JSON-format files from HDFS. The following s
``` sql
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-file>?PROFILE=hdfs:json[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```
@@ -188,6 +188,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;file\> | The absolute path to the directory or file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:json`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\> | \<custom-option\>s are discussed below.|
| FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `hdfs:json` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. |
@@ -201,7 +202,7 @@ PXF supports single- and multi-line JSON records. When you want to read multi-l
## <a id="jsonexample1"></a>Example: Reading a JSON File with Single Line Records
Use the following [CREATE EXTERNAL TABLE](../ref_guide/sql_commands/CREATE_EXTERNAL_TABLE.html) SQL command to create a readable external table that references the single-line-per-record JSON data file and uses the PXF default server.
``` sql
CREATE EXTERNAL TABLE singleline_json_tbl(
......
@@ -63,7 +63,7 @@ Use the following syntax to create a Greenplum Database external table that refe
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-dir>
    ?PROFILE=hdfs:parquet[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export');
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
```
@@ -74,6 +74,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;dir\> | The absolute path to the directory in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:parquet`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | \<custom-option\>s are described below.|
| FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). |
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. |
@@ -86,7 +87,7 @@ The PXF `hdfs:parquet` profile supports encoding- and compression-related write
| COMPRESSION_CODEC | The compression codec alias. Supported compression codecs for writing Parquet data include: `snappy`, `gzip`, `lzo`, and `uncompressed`. If this option is not provided, PXF compresses the data using `snappy` compression. |
| ROWGROUP_SIZE | A Parquet file consists of one or more row groups, a logical partitioning of the data into rows. `ROWGROUP_SIZE` identifies the size (in bytes) of the row group. The default row group size is `8 * 1024 * 1024` bytes. |
| PAGE_SIZE | A row group consists of column chunks that are divided up into pages. `PAGE_SIZE` is the size (in bytes) of such a page. The default page size is `1024 * 1024` bytes. |
| DICTIONARY\_PAGE\_SIZE | Dictionary encoding is enabled by default when PXF writes Parquet files. There is a single dictionary page per column, per row group. `DICTIONARY_PAGE_SIZE` is similar to `PAGE_SIZE`, but for the dictionary. The default dictionary page size is `512 * 1024` bytes. |
| PARQUET_VERSION | The Parquet version; values `v1` and `v2` are supported. The default Parquet version is `v1`. |
**Note**: You must explicitly specify `uncompressed` if you do not want PXF to compress the data.
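As an illustration, the following writable-table sketch sets two of these options; the HDFS path is an example, and the columns mirror the example schema discussed below:
``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_parquet_gzip(location text, month text, number_of_orders int, total_sales float8)
  LOCATION ('pxf://data/pxf_examples/pxf_parquet?PROFILE=hdfs:parquet&COMPRESSION_CODEC=gzip&PARQUET_VERSION=v2')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```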
@@ -104,7 +105,7 @@ This example utilizes the data schema introduced in [Example: Reading Text Data
| number\_of\_orders | int |
| total\_sales | float8 |
In this example, you create a Parquet-format writable external table that uses the default PXF server to reference Parquet-format data in HDFS, insert some data into the table, and then create a readable external table to read the data.
1. Use the `hdfs:parquet` profile to create a writable external table. For example:
......
@@ -39,7 +39,7 @@ Use the following syntax to create a Greenplum Database external table that refe
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-dir>
    ?PROFILE=hdfs:SequenceFile[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (<formatting-properties>)
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
```
@@ -50,6 +50,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;dir\> | The absolute path to the directory in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:SequenceFile`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\> | \<custom-option\>s are described below.|
| FORMAT | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). |
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. |
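As an illustration, the following hypothetical writable table names a custom serialization class via the `DATA-SCHEMA` custom option (one of the \<custom-option\>s described below); the HDFS path is an example, and the class name (fully qualified, if packaged) refers to the `PxfExample_CustomWritable` class developed in the example later in this topic:
``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_seqfile_write(location text, month text, number_of_orders integer, total_sales real)
  LOCATION ('pxf://data/pxf_examples/pxf_seqfile?PROFILE=hdfs:SequenceFile&DATA-SCHEMA=PxfExample_CustomWritable')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```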
@@ -79,7 +80,7 @@ Use the HDFS connector `hdfs:SequenceFile` profile when you want to read or writ
### <a id="write_seqfile_example" class="no-quick-link"></a>Example: Writing Binary Data to HDFS
In this example, you create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples. You will then use this class to access a writable external table that you create with the `hdfs:SequenceFile` profile and that uses the default PXF server.
Perform the following procedure to create the Java class and writable table.
......
@@ -34,7 +34,7 @@ Use the `hdfs:text` profile when you read plain text delimited or .csv data wher
``` sql
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-file>?PROFILE=hdfs:text[&SERVER=<server_name>]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
```
@@ -44,12 +44,13 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;file\> | The absolute path to the directory or file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:text`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-hdfs-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-hdfs-file\> references comma-separated value data. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
### <a id="profile_text_query"></a>Example: Reading Text Data on HDFS
Perform the following procedure to create a sample text file, copy the file to HDFS, and use the `hdfs:text` profile and the default PXF server to create two PXF external tables to query the data:
1. Create an HDFS directory for PXF example data files. For example:
@@ -128,7 +129,7 @@ Use the `hdfs:text:multi` profile to read plain text data with delimited single-
``` sql
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-file>?PROFILE=hdfs:text:multi[&SERVER=<server_name>]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
```
@@ -138,12 +139,13 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;file\> | The absolute path to the directory or file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:text:multi`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-hdfs-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-hdfs-file\> references comma-separated value data. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
### <a id="profile_textmulti_query"></a>Example: Reading Multi-Line Text Data on HDFS
Perform the following steps to create a sample text file, copy the file to HDFS, and use the PXF `hdfs:text:multi` profile and the default PXF server to create a Greenplum Database readable external table to query the data:
1. Create a second delimited plain text file:
@@ -218,7 +220,7 @@ Use the following syntax to create a Greenplum Database writable external table
CREATE WRITABLE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-hdfs-dir>
    ?PROFILE=hdfs:text[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
```
@@ -229,6 +231,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<path&#8209;to&#8209;hdfs&#8209;dir\> | The absolute path to the directory in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify `hdfs:text`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\> | \<custom-option\>s are described below.|
| FORMAT | Use `FORMAT` `'TEXT'` to write plain, delimited text to \<path-to-hdfs-dir\>.<br> Use `FORMAT` `'CSV'` to write comma-separated value text to \<path-to-hdfs-dir\>. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
@@ -263,7 +266,7 @@ This example also optionally uses the Greenplum Database external table named `p
#### <a id="write_hdfstextsimple_proc" class="no-quick-link"></a>Procedure
Perform the following procedure to create Greenplum Database writable external tables utilizing the same data schema as described above, one of which will employ compression. You will use the PXF `hdfs:text` profile and the default PXF server to write data to the underlying HDFS directory. You will also create a separate, readable external table to read the data that you wrote to the HDFS directory.
1. Create a Greenplum Database writable external table utilizing the data schema described above. Write to the HDFS directory `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
......
@@ -183,7 +183,7 @@ Use the following syntax to create a Greenplum Database external table that refe
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<hive-db-name>.<hive-table-name>
    ?PROFILE=Hive|HiveText|HiveRC|HiveORC|HiveVectorizedORC[&SERVER=<server_name>][&DELIMITER=<delim>])
FORMAT 'CUSTOM|TEXT' (FORMATTER='pxfwritable_import' | delimiter='<delim>')
```
@@ -194,6 +194,7 @@ Hive connector-specific keywords and values used in the [CREATE EXTERNAL TABLE](
| \<hive&#8209;db&#8209;name\> | The name of the Hive database. If omitted, defaults to the Hive database named `default`. |
| \<hive&#8209;table&#8209;name\> | The name of the Hive table. |
| PROFILE | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, `HiveRC`, `HiveORC`, or `HiveVectorizedORC`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| DELIMITER | The custom options `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set. \<delim\> must be a single ASCII character or specified in hexadecimal representation. |
| FORMAT (`Hive`, `HiveORC`, and `HiveVectorizedORC` profiles) | The `FORMAT` clause must specify `'CUSTOM'`. The `CUSTOM` format requires the built-in `pxfwritable_import` `formatter`. |
| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in the '\<delim\>' formatting option. |
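For example, the following sketch reads a hypothetical Hive table `sales_info` with the `HiveText` profile; note that the comma delimiter is specified twice, once as the `DELIMITER` custom option (in hexadecimal) and once in the formatting option:
``` sql
CREATE EXTERNAL TABLE pxf_hivetext_sales(location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
FORMAT 'TEXT' (delimiter=E',');
```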
......
@@ -62,7 +62,7 @@ Perform the following procedure to initialize PXF on each segment host in your G
gpadmin@gpmaster$ PXF_CONF=/usr/local/greenplum-pxf $GPHOME/pxf/bin/pxf cluster init
```
The `init` command creates the PXF web application and initializes the internal PXF configuration. The `init` command also creates the `$PXF_CONF` user configuration directory if it does not exist, and populates the `conf` and `templates` directories with user-customizable configuration templates. If `$PXF_CONF` exists, PXF updates only the `templates` directory.
**Note**: The PXF service runs only on the segment hosts. However, `pxf cluster init` also sets up the PXF user configuration directories on the Greenplum Database master and standby master hosts.
@@ -14,17 +14,17 @@ Your Greenplum Database deployment consists of a master node and multiple segmen
## <a id="more"></a> About Connectors, Servers, and Profiles
*Connector* is a generic term that encapsulates the implementation details required to read from or write to an external data store. PXF provides built-in connectors to Hadoop (HDFS, Hive, HBase), object stores (Azure, Google Cloud Storage, Minio, S3), and SQL databases (via JDBC).
A PXF *Server* is a named configuration for a connector. A server definition provides the information required for PXF to access an external data source. This configuration information is data-store-specific, and may include server location, access credentials, and other relevant properties.
The Greenplum Database administrator will configure at least one server definition for each external data store that they will allow Greenplum Database users to access, and will publish the available server names as appropriate.
You specify a `SERVER=<server_name>` setting when you create the external table to identify the server configuration from which to obtain the configuration and credentials to access the external data store.
The default PXF server is named `default` (reserved), and when configured provides the location and access information for the external data source in the absence of a `SERVER=<server_name>` setting.
Finally, a PXF *profile* is a named mapping identifying a specific data format or protocol supported by a specific external data store. PXF supports text, Avro, JSON, RCFile, Parquet, SequenceFile, and ORC data formats, and the JDBC protocol, and provides several built-in profiles as discussed in the following section.
## <a id="create_external_table"></a>Creating an External Table
@@ -51,7 +51,7 @@ PXF may require additional information to read or write certain data formats. Yo
|-------------------------|-------------------------------------|
| \<path&#8209;to&#8209;data\> | A directory, file name, wildcard pattern, table name, etc. The syntax of \<path-to-data\> is dependent upon the external data source. |
| PROFILE=\<profile_name\> | The profile that PXF uses to access the data. PXF supports profiles that access text, Avro, JSON, RCFile, Parquet, SequenceFile, and ORC data in [Hadoop services](access_hdfs.html), [object stores](access_objstore.html), and [other SQL databases](jdbc_pxf.html). |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | Additional options and their values supported by the profile or the server. |
| FORMAT&nbsp;\<value\>| PXF profiles support the `TEXT`, `CSV`, and `CUSTOM` formats. |
| \<formatting&#8209;properties\> | Formatting properties supported by the profile; for example, the `FORMATTER` or `delimiter`. |
......
@@ -2,7 +2,7 @@
title: Configuring the JDBC Connector (Optional)
---
You can use PXF to access an external SQL database including MySQL, Oracle, PostgreSQL, Hive, and Apache Ignite. This topic describes how to configure the PXF JDBC Connector to access these external data sources.
*If you do not plan to use the PXF JDBC Connector, then you do not need to perform this procedure.*
...@@ -15,7 +15,7 @@ To access data in an external SQL database with the PXF JDBC Connector, you must ...@@ -15,7 +15,7 @@ To access data in an external SQL database with the PXF JDBC Connector, you must
In previous releases of Greenplum Database, you may have specified the JDBC driver class name, database URL, and client credentials via options in the `CREATE EXTERNAL TABLE` command. PXF now supports file-based server configuration for the JDBC Connector. This configuration, described below, allows you to specify these options and credentials in a file. In previous releases of Greenplum Database, you may have specified the JDBC driver class name, database URL, and client credentials via options in the `CREATE EXTERNAL TABLE` command. PXF now supports file-based server configuration for the JDBC Connector. This configuration, described below, allows you to specify these options and credentials in a file.
**Note**: PXF external tables that you previously created with directly specified JDBC connection options will continue to work. If you want to move these tables to use JDBC file-based server configuration, you must create a server configuration, drop the external tables, and then recreate the tables specifying an appropriate `SERVER=<servercfg>` clause. **Note**: PXF external tables that you previously created with directly specified JDBC connection options will continue to work. If you want to move these tables to use JDBC file-based server configuration, you must create a server configuration, drop the external tables, and then recreate the tables specifying an appropriate `SERVER=<server_name>` clause.
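
For example, the following hypothetical sequence migrates such a table to file-based server configuration. The table definition and the server name `pgsrvcfg` are placeholders; substitute the names used in your environment:

``` sql
DROP EXTERNAL TABLE pxf_jdbc_tbl;
CREATE EXTERNAL TABLE pxf_jdbc_tbl(id int, name text)
  LOCATION ('pxf://public.mytable?PROFILE=Jdbc&SERVER=pgsrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```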
### <a id="cfg_jar"></a>JDBC Driver JAR Registration ### <a id="cfg_jar"></a>JDBC Driver JAR Registration
...@@ -23,6 +23,8 @@ The PXF JDBC Connector is installed with the `postgresql-8.4-702.jdbc4.jar` JAR ...@@ -23,6 +23,8 @@ The PXF JDBC Connector is installed with the `postgresql-8.4-702.jdbc4.jar` JAR
### <a id="cfg_server"></a>JDBC Server Configuration ### <a id="cfg_server"></a>JDBC Server Configuration
When you configure the PXF JDBC Connector, you add at least one named PXF server configuration for the connector as described in [About PXF Server Configuration](cfg_server.html#cfgproc).
PXF provides a template configuration file for the JDBC Connector. This server template configuration file, located in `$PXF_CONF/templates/jdbc-site.xml`, identifies properties that you can configure to establish a connection to the external SQL database. The template also includes optional properties that you can set before executing query or insert commands in the external database session. PXF provides a template configuration file for the JDBC Connector. This server template configuration file, located in `$PXF_CONF/templates/jdbc-site.xml`, identifies properties that you can configure to establish a connection to the external SQL database. The template also includes optional properties that you can set before executing query or insert commands in the external database session.
The required properties in the `jdbc-site.xml` server template file follow: The required properties in the `jdbc-site.xml` server template file follow:
...@@ -136,30 +138,8 @@ Ensure that the JDBC driver for the external SQL database supports any property ...@@ -136,30 +138,8 @@ Ensure that the JDBC driver for the external SQL database supports any property
You can override the JDBC server configuration by directly specifying certain JDBC properties via custom options in the `CREATE EXTERNAL TABLE` command `LOCATION` clause. Refer to [Overriding the JDBC Server Configuration](jdbc_pxf.html#jdbc_override) for additional information. You can override the JDBC server configuration by directly specifying certain JDBC properties via custom options in the `CREATE EXTERNAL TABLE` command `LOCATION` clause. Refer to [Overriding the JDBC Server Configuration](jdbc_pxf.html#jdbc_override) for additional information.
## <a id="cfg_server_proc"></a>Configuration Procedure
When you configure the PXF JDBC Connector to access an external SQL database, you add at least one named PXF server configuration for the connector. You:
1. Choose a name for the server configuration.
2. Create the directory `$PXF_CONF/servers/<server_name>`.
3. Copy the PXF `jdbc-site.xml` template configuration file to the new server directory.
4. Fill in appropriate values for the properties in the template file.
5. Synchronize the server configuration to the Greenplum Database cluster.
6. Publish the PXF server name(s) to your Greenplum Database end users as appropriate.
The Greenplum Database user specifies the `<server_name>` in the `CREATE EXTERNAL TABLE` `LOCATION` clause `SERVER` option to access the external SQL database. For example, if you created a server configuration and named the server directory `pgsrvcfg`:
<pre>
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
LOCATION ('pxf://schema.tblname?PROFILE=Jdbc&<b>SERVER=pgsrvcfg</b>')
FORMAT 'TEXT' (delimiter=E',');
</pre>
<div class="note info">A Greenplum Database user who queries or writes to an external table that specifies a server name accesses the external SQL database with the credentials configured for that server.</div>
While not recommended, you can override a JDBC server configuration by directly specifying the driver, database URL, and/or user credentials via custom options in the `CREATE EXTERNAL TABLE` command `LOCATION` clause. Refer to [Overriding the JDBC Server Configuration](jdbc_pxf.html#jdbc_override) for additional information. ### <a id="cfg_proc" class="no-quick-link"></a>Example Configuration Procedure
### <a id="cfg_proc" class="no-quick-link"></a>Example
Ensure that you have initialized PXF before you configure a JDBC Connector server. Ensure that you have initialized PXF before you configure a JDBC Connector server.
...@@ -218,11 +198,3 @@ In this procedure, you name and add a PXF JDBC server configuration for a Postgr ...@@ -218,11 +198,3 @@ In this procedure, you name and add a PXF JDBC server configuration for a Postgr
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
``` ```
## <a id="client-cfg-update"></a>Adding or Updating JDBC Server Configuration
If you add a new, or update an existing, JDBC server configuration on the Greenplum Database master host, you must re-sync the PXF configuration to the Greenplum Database cluster:
``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
...@@ -21,7 +21,7 @@ specific language governing permissions and limitations ...@@ -21,7 +21,7 @@ specific language governing permissions and limitations
under the License. under the License.
--> -->
Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC connector. The JDBC connector is a JDBC client that can read data from and write data to SQL databases including MySQL, Oracle, PostgreSQL, Apache Ignite, and Hive. Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC connector. The JDBC connector is a JDBC client that can read data from and write data to SQL databases including MySQL, Oracle, PostgreSQL, Hive, and Apache Ignite.
This section describes how to use the PXF JDBC connector to access data in an external SQL database, including how to create and query or insert data into a PXF external table that references a table in an external database. This section describes how to use the PXF JDBC connector to access data in an external SQL database, including how to create and query or insert data into a PXF external table that references a table in an external database.
...@@ -63,7 +63,7 @@ Use the following syntax to create a Greenplum Database external table that refe ...@@ -63,7 +63,7 @@ Use the following syntax to create a Greenplum Database external table that refe
<pre> <pre>
CREATE [READABLE | WRITABLE] EXTERNAL TABLE &lt;table_name> CREATE [READABLE | WRITABLE] EXTERNAL TABLE &lt;table_name>
( &lt;column_name> &lt;data_type> [, ...] | LIKE &lt;other_table> ) ( &lt;column_name> &lt;data_type> [, ...] | LIKE &lt;other_table> )
LOCATION ('pxf://&lt;external-table-name>?<b>PROFILE=Jdbc[&SERVER=&lt;servername>]</b>[&&lt;custom-option>=&lt;value>[...]]') LOCATION ('pxf://&lt;external-table-name>?<b>PROFILE=Jdbc[&SERVER=&lt;server_name>]</b>[&&lt;custom-option>=&lt;value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export'); FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export');
</pre> </pre>
...@@ -73,7 +73,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -73,7 +73,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<external&#8209;table&#8209;name\> | The full name of the external table. Depending on the external SQL database, this may include a schema name and a table name. | | \<external&#8209;table&#8209;name\> | The full name of the external table. Depending on the external SQL database, this may include a schema name and a table name. |
| PROFILE | The `PROFILE` keyword value must specify `Jdbc`. | | PROFILE | The `PROFILE` keyword value must specify `Jdbc`. |
| SERVER=\<servername\> | The named server configuration that PXF uses to access the external SQL database. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | \<custom-option\> is profile-specific. `Jdbc` profile-specific options are discussed in the next section.| | \<custom&#8209;option\>=\<value\> | \<custom-option\> is profile-specific. `Jdbc` profile-specific options are discussed in the next section.|
| FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports the built-in `'pxfwritable_import'` `FORMATTER` function for read operations and the built-in `'pxfwritable_export'` function for write operations. | | FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports the built-in `'pxfwritable_import'` `FORMATTER` function for read operations and the built-in `'pxfwritable_export'` function for write operations. |
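
For example, the following sketch creates a writable external table that inserts rows into a table in the external database identified by a server configuration. The table, column, and server names are assumptions for illustration only:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_jdbc_out(id int, amount float8)
  LOCATION ('pxf://public.orders?PROFILE=Jdbc&SERVER=pgsrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```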
......
...@@ -51,7 +51,7 @@ The following syntax creates a Greenplum Database readable external table that r ...@@ -51,7 +51,7 @@ The following syntax creates a Greenplum Database readable external table that r
``` sql ``` sql
CREATE EXTERNAL TABLE <table_name> CREATE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:avro&SERVER=<server_name>[&<custom-option>=<value>[...]]') LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:avro[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
``` ```
...@@ -61,7 +61,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -61,7 +61,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. | | \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. |
| PROFILE=\<objstore\>:avro | The `PROFILE` keyword must identify the specific object store. For example, `s3:avro`. | | PROFILE=\<objstore\>:avro | The `PROFILE` keyword must identify the specific object store. For example, `s3:avro`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | Avro-specific custom options are described in the [PXF HDFS Avro documentation](hdfs_avro.html#customopts). | | \<custom&#8209;option\>=\<value\> | Avro-specific custom options are described in the [PXF HDFS Avro documentation](hdfs_avro.html#customopts). |
| FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `<objstore>:avro` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. | | FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `<objstore>:avro` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. |
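
For example, the following sketch reads Avro data from an object store; the bucket, file path, column list, and server name `s3srvcfg` are placeholders:

``` sql
CREATE EXTERNAL TABLE pxf_s3_avro(id bigint, username text)
  LOCATION ('pxf://BUCKET/pxf_examples/tweets.avro?PROFILE=s3:avro&SERVER=s3srvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```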
......
...@@ -8,56 +8,10 @@ You can use PXF to access Azure Data Lake, Azure Blob Storage, Google Cloud Stor ...@@ -8,56 +8,10 @@ You can use PXF to access Azure Data Lake, Azure Blob Storage, Google Cloud Stor
## <a id="about_cfg"></a>About Object Store Configuration ## <a id="about_cfg"></a>About Object Store Configuration
To access data in an object store, you must provide a server location and client credentials. PXF provides a template configuration file for each Hadoop and object store connector. These server template configuration files, located in the `$PXF_CONF/templates/` directory, identify the minimum set of properties that you must configure to use the connector. To access data in an object store, you must provide a server location and client credentials. When you configure a PXF object store connector, you add at least one named PXF server configuration for the connector as described in [About PXF Server Configuration](cfg_server.html#cfgproc).
``` PXF provides a template configuration file for each object store connector. These template files are located in the `$PXF_CONF/templates/` directory.
gpadmin@gpmaster$ ls $PXF_CONF/templates
adl-site.xml hbase-site.xml jdbc-site.xml s3-site.xml
core-site.xml hdfs-site.xml mapred-site.xml wasbs-site.xml
gs-site.xml hive-site.xml minio-site.xml yarn-site.xml
```
For example, the contents of the `s3-site.xml` template file follow:
``` pre
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3a.fast.upload</name>
<value>true</value>
</property>
</configuration>
```
<div class="note">You specify object store credentials to PXF in clear text in configuration files.</div>
When you configure a PXF object store connector, you add at least one named PXF server configuration for the connector. You:
1. Choose a name for the server configuration.
2. Create the directory `$PXF_CONF/servers/<server_name>`.
3. Copy the PXF template configuration file corresponding to the object store to the new server directory.
4. Fill in appropriate values for the properties in the template file.
5. Add additional properties and values if required for your environment.
6. Synchronize the server configuration to the Greenplum Database cluster.
7. Publish the PXF server names to your Greenplum Database end users as appropriate.
The Greenplum Database user specifies the server name in the `CREATE EXTERNAL TABLE` `LOCATION` clause `SERVER` option to access the object store. For example:
<pre>
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
LOCATION ('pxf://BUCKET/dir/file.txt?PROFILE=s3:text&<b>SERVER=s3srvcfg</b>')
FORMAT 'TEXT' (delimiter=E',');
</pre>
<div class="note info">A Greenplum Database user who queries or writes to an external table that specifies a server name accesses the object store with the credentials configured for that server.</div>
### <a id="abs_cfg"></a>Azure Blob Storage Server Configuration ### <a id="abs_cfg"></a>Azure Blob Storage Server Configuration
...@@ -220,11 +174,11 @@ To set these properties in the `s3-site.xml` file: ...@@ -220,11 +174,11 @@ To set these properties in the `s3-site.xml` file:
To enable SSE-C for a specific S3 bucket, use the property name variants that include the bucket name as described in the SSE-KMS example. To enable SSE-C for a specific S3 bucket, use the property name variants that include the bucket name as described in the SSE-KMS example.
## <a id="cfg_proc"></a>Example Configuration Procedure ## <a id="cfg_proc"></a>Example Server Configuration Procedure
Ensure that you have initialized PXF before you configure an object store connector. Ensure that you have initialized PXF before you configure an object store connector server.
In this procedure, you name and add a PXF server configuration in the `$PXF_CONF/servers` directory on the Greenplum Database master host for each object store connector that you plan to use. You then use the `pxf cluster sync` command to sync the server configuration(s) to the Greenplum Database cluster. In this procedure, you name and add a PXF server configuration in the `$PXF_CONF/servers` directory on the Greenplum Database master host for the Google Cloud Storage (GCS) connector. You then use the `pxf cluster sync` command to sync the server configuration to the Greenplum Database cluster.
1. Log in to your Greenplum Database master node: 1. Log in to your Greenplum Database master node:
...@@ -232,48 +186,40 @@ In this procedure, you name and add a PXF server configuration in the `$PXF_CONF ...@@ -232,48 +186,40 @@ In this procedure, you name and add a PXF server configuration in the `$PXF_CONF
$ ssh gpadmin@<gpmaster> $ ssh gpadmin@<gpmaster>
``` ```
2. PXF includes connectors to the Azure Data Lake, Azure Blob Storage, Google Cloud Storage, Minio, and S3 object stores. Identify the PXF object store connectors that you want to configure. 2. Choose a name for the server. You will provide the name to end users that need to reference files in the object store.
3. For each object store connector that you configure:
1. Choose a name for the server. You will provide the name to end users that need to reference files in the object store.
**Note**: The server name `default` is reserved. 3. Create the `$PXF_CONF/servers/<server_name>` directory. For example, use the following command to create a server configuration for a Google Cloud Storage server named `gs_public`:
2. Create the `$PXF_CONF/servers/<server_name>` directory. For example, use the following command if you are creating a server configuration for Google Cloud Storage and you want to name the server `gs_public`: ``` shell
gpadmin@gpmaster$ mkdir $PXF_CONF/servers/gs_public
``` shell ```
gpadmin@gpmaster$ mkdir $PXF_CONF/servers/gs_public
```
3. Copy the PXF template file for the object store to the server configuration directory. For example: 4. Copy the PXF template file for GCS to the server configuration directory. For example:
``` shell ``` shell
gpadmin@gpmaster$ cp $PXF_CONF/templates/gs-site.xml $PXF_CONF/servers/gs_public/ gpadmin@gpmaster$ cp $PXF_CONF/templates/gs-site.xml $PXF_CONF/servers/gs_public/
``` ```
4. Open the template server configuration file in the editor of your choice, and provide appropriate property values for your environment. For example, if your Google Cloud Storage key file is located in `/home/gpadmin/keys/gcs-account.key.json`: 5. Open the template server configuration file in the editor of your choice, and provide appropriate property values for your environment. For example, if your Google Cloud Storage key file is located in `/home/gpadmin/keys/gcs-account.key.json`:
``` pre ``` pre
<?xml version="1.0" encoding="UTF-8"?> <?xml version="1.0" encoding="UTF-8"?>
<configuration> <configuration>
<property> <property>
<name>google.cloud.auth.service.account.enable</name> <name>google.cloud.auth.service.account.enable</name>
<value>true</value> <value>true</value>
</property> </property>
<property> <property>
<name>google.cloud.auth.service.account.json.keyfile</name> <name>google.cloud.auth.service.account.json.keyfile</name>
<value>/home/gpadmin/keys/gcs-account.key.json</value> <value>/home/gpadmin/keys/gcs-account.key.json</value>
</property> </property>
<property> <property>
<name>fs.AbstractFileSystem.gs.impl</name> <name>fs.AbstractFileSystem.gs.impl</name>
<value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value> <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property> </property>
</configuration> </configuration>
``` ```
5. Save your changes and exit the editor. 6. Save your changes and exit the editor.
6. Repeat Step 3 to configure the next object store connector.
4. Use the `pxf cluster sync` command to copy the new server configurations to the Greenplum Database cluster. For example: 7. Use the `pxf cluster sync` command to copy the new server configuration to the Greenplum Database cluster. For example:
...@@ -281,12 +227,3 @@ In this procedure, you name and add a PXF server configuration in the `$PXF_CONF ...@@ -281,12 +227,3 @@ In this procedure, you name and add a PXF server configuration in the `$PXF_CONF
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
``` ```
## <a id="client-cfg-update"></a>Adding or Updating Object Store Configuration
If you add or update the object store server configuration on the Greenplum Database master host, you must re-sync the PXF configuration to the Greenplum Database cluster:
``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
...@@ -50,7 +50,7 @@ The following syntax creates a Greenplum Database readable external table that r ...@@ -50,7 +50,7 @@ The following syntax creates a Greenplum Database readable external table that r
``` sql ``` sql
CREATE EXTERNAL TABLE <table_name> CREATE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:json&SERVER=<server_name>[&<custom-option>=<value>[...]]') LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:json[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
``` ```
...@@ -60,7 +60,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -60,7 +60,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. | | \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. |
| PROFILE=\<objstore\>:json | The `PROFILE` keyword must identify the specific object store. For example, `s3:json`. | | PROFILE=\<objstore\>:json | The `PROFILE` keyword must identify the specific object store. For example, `s3:json`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | JSON supports the custom option named `IDENTIFIER` as described in the [PXF HDFS JSON documentation](hdfs_json.html#customopts). | | \<custom&#8209;option\>=\<value\> | JSON supports the custom option named `IDENTIFIER` as described in the [PXF HDFS JSON documentation](hdfs_json.html#customopts). |
| FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `<objstore>:json` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. | | FORMAT 'CUSTOM' | Use `FORMAT` `'CUSTOM'` with the `<objstore>:json` profile. The `CUSTOM` `FORMAT` requires that you specify `(FORMATTER='pxfwritable_import')`. |
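
For example, the following sketch reads JSON records from an object store and uses dot notation to project a nested field; the bucket, file path, field names, and server name are placeholders:

``` sql
CREATE EXTERNAL TABLE pxf_s3_json(created_at text, id_str text, "user.name" text)
  LOCATION ('pxf://BUCKET/pxf_examples/tweets.json?PROFILE=s3:json&SERVER=s3srvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```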
......
...@@ -55,7 +55,7 @@ Use the following syntax to create a Greenplum Database external table that refe ...@@ -55,7 +55,7 @@ Use the following syntax to create a Greenplum Database external table that refe
CREATE [WRITABLE] EXTERNAL TABLE <table_name> CREATE [WRITABLE] EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-dir> LOCATION ('pxf://<path-to-dir>
?PROFILE=<objstore>:parquet&SERVER=<server_name>[&<custom-option>=<value>[...]]') ?PROFILE=<objstore>:parquet[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export'); FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export');
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY]; [DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
``` ```
...@@ -66,7 +66,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -66,7 +66,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. | | \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. |
| PROFILE=\<objstore\>:parquet | The `PROFILE` keyword must identify the specific object store. For example, `s3:parquet`. | | PROFILE=\<objstore\>:parquet | The `PROFILE` keyword must identify the specific object store. For example, `s3:parquet`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | Parquet-specific custom write options are described in the [PXF HDFS Parquet documentation](hdfs_parquet.html#customopts). | | \<custom&#8209;option\>=\<value\> | Parquet-specific custom write options are described in the [PXF HDFS Parquet documentation](hdfs_parquet.html#customopts). |
| FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). | | FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). |
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. | | DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. |
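
For example, the following sketch writes Parquet data to a directory in an object store; the bucket, directory, column list, and server name are placeholders:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_s3_parquet_w(location text, total_sales float8)
  LOCATION ('pxf://BUCKET/pxf_examples/pq_out?PROFILE=s3:parquet&SERVER=s3srvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export')
DISTRIBUTED BY (location);
```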
......
...@@ -48,7 +48,7 @@ Use the following syntax to create a Greenplum Database external table that refe ...@@ -48,7 +48,7 @@ Use the following syntax to create a Greenplum Database external table that refe
CREATE [WRITABLE] EXTERNAL TABLE <table_name> CREATE [WRITABLE] EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-dir> LOCATION ('pxf://<path-to-dir>
?PROFILE=<objstore>:SequenceFile&SERVER=<server_name>[&<custom-option>=<value>[...]]') ?PROFILE=<objstore>:SequenceFile[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export')
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY]; [DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
``` ```
...@@ -59,7 +59,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -59,7 +59,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. | | \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. |
| PROFILE=\<objstore\>:SequenceFile | The `PROFILE` keyword must identify the specific object store. For example, `s3:SequenceFile`. | | PROFILE=\<objstore\>:SequenceFile | The `PROFILE` keyword must identify the specific object store. For example, `s3:SequenceFile`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | SequenceFile-specific custom options are described in the [PXF HDFS SequenceFile documentation](hdfs_seqfile.html#customopts). | | \<custom&#8209;option\>=\<value\> | SequenceFile-specific custom options are described in the [PXF HDFS SequenceFile documentation](hdfs_seqfile.html#customopts). |
| FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). | | FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). |
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. | | DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so will avoid extra motion of data between segments on the load operation. |
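
For example, the following sketch writes SequenceFile data to a directory in an object store. The bucket, directory, server name, and the `DATA-SCHEMA` serialization class are placeholders; refer to the SequenceFile documentation linked above for the custom options that your data requires:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_s3_seqfile_w(location text, total_sales float8)
  LOCATION ('pxf://BUCKET/pxf_examples/seq_out?PROFILE=s3:SequenceFile&SERVER=s3srvcfg&DATA-SCHEMA=com.example.CustomWritable')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export')
DISTRIBUTED BY (location);
```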
......
...@@ -46,7 +46,7 @@ The following syntax creates a Greenplum Database readable external table that r ...@@ -46,7 +46,7 @@ The following syntax creates a Greenplum Database readable external table that r
``` sql ``` sql
CREATE EXTERNAL TABLE <table_name> CREATE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:text&SERVER=<server_name>[&<custom-option>=<value>[...]]') LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:text[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>'); FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
``` ```
...@@ -56,7 +56,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -56,7 +56,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. | | \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. |
| PROFILE=\<objstore\>:text | The `PROFILE` keyword must identify the specific object store. For example, `s3:text`. | | PROFILE=\<objstore\>:text | The `PROFILE` keyword must identify the specific object store. For example, `s3:text`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-file\> references comma-separated value data. | | FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-file\> references comma-separated value data. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. | | delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
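
For example, the following sketch reads comma-separated data from a file in an object store; the bucket, file path, and server name are placeholders:

``` sql
CREATE EXTERNAL TABLE pxf_s3_text(location text, month text, num_orders int)
  LOCATION ('pxf://BUCKET/pxf_examples/orders.csv?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'CSV' (delimiter=',');
```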
...@@ -149,7 +149,7 @@ Use the `<objstore>:text:multi` profile to read plain text data with delimited s ...@@ -149,7 +149,7 @@ Use the `<objstore>:text:multi` profile to read plain text data with delimited s
``` sql ``` sql
CREATE EXTERNAL TABLE <table_name> CREATE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:text:multi&SERVER=<server_name>[&<custom-option>=<value>[...]]') LOCATION ('pxf://<path-to-file>?PROFILE=<objstore>:text:multi[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>'); FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
``` ```
...@@ -159,7 +159,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -159,7 +159,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. | | \<path&#8209;to&#8209;file\> | The absolute path to the directory or file in the object store. |
| PROFILE=\<objstore\>:text:multi | The `PROFILE` keyword must identify the specific object store. For example, `s3:text:multi`. | | PROFILE=\<objstore\>:text:multi | The `PROFILE` keyword must identify the specific object store. For example, `s3:text:multi`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-file\> references comma-separated value data. | | FORMAT | Use `FORMAT` `'TEXT'` when \<path-to-file\> references plain text delimited data.<br> Use `FORMAT` `'CSV'` when \<path-to-file\> references comma-separated value data. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. | | delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
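
For example, the following sketch reads colon-delimited records that may include embedded line feeds; the bucket, directory, and server name are placeholders:

``` sql
CREATE EXTERNAL TABLE pxf_s3_textmulti(address text)
  LOCATION ('pxf://BUCKET/pxf_examples/addresses?PROFILE=s3:text:multi&SERVER=s3srvcfg')
FORMAT 'CSV' (delimiter ':');
```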
...@@ -248,7 +248,7 @@ Use the following syntax to create a Greenplum Database writable external table ...@@ -248,7 +248,7 @@ Use the following syntax to create a Greenplum Database writable external table
CREATE WRITABLE EXTERNAL TABLE <table_name> CREATE WRITABLE EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> ) ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<path-to-dir> LOCATION ('pxf://<path-to-dir>
?PROFILE=<objstore>:text&SERVER=<server_name>[&<custom-option>=<value>[...]]') ?PROFILE=<objstore>:text[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>'); FORMAT '[TEXT|CSV]' (delimiter[=|<space>][E]'<delim_value>');
[DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY]; [DISTRIBUTED BY (<column_name> [, ... ] ) | DISTRIBUTED RANDOMLY];
``` ```
...@@ -259,7 +259,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid ...@@ -259,7 +259,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------| |-------|-------------------------------------|
| \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. | | \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. |
| PROFILE=\<objstore\>:text | The `PROFILE` keyword must identify the specific object store. For example, `s3:text`. | | PROFILE=\<objstore\>:text | The `PROFILE` keyword must identify the specific object store. For example, `s3:text`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. | | SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. Optional; PXF uses the `default` server if not specified. |
| \<custom&#8209;option\>=\<value\> | \<custom-option\>s are described below.| | \<custom&#8209;option\>=\<value\> | \<custom-option\>s are described below.|
| FORMAT | Use `FORMAT` `'TEXT'` to write plain, delimited text to \<path-to-dir\>.<br> Use `FORMAT` `'CSV'` to write comma-separated value text to \<path-to-dir\>. | | FORMAT | Use `FORMAT` `'TEXT'` to write plain, delimited text to \<path-to-dir\>.<br> Use `FORMAT` `'CSV'` to write comma-separated value text to \<path-to-dir\>. |
| delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. | | delimiter | The delimiter character in the data. For `FORMAT` `'CSV'`, the default \<delim_value\> is a comma `,`. Preface the \<delim_value\> with an `E` when the value is an escape sequence. Examples: `(delimiter=E'\t')`, `(delimiter ':')`. |
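
For example, the following sketch writes comma-delimited text to a directory in an object store; the bucket, directory, column list, and server name are placeholders:

``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_s3_text_w(location text, total_sales float8)
  LOCATION ('pxf://BUCKET/pxf_examples/text_out?PROFILE=s3:text&SERVER=s3srvcfg')
FORMAT 'TEXT' (delimiter=',')
DISTRIBUTED BY (location);
```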
......
...@@ -10,6 +10,8 @@ When Kerberos is enabled for your HDFS filesystem, PXF, as an HDFS client, requi ...@@ -10,6 +10,8 @@ When Kerberos is enabled for your HDFS filesystem, PXF, as an HDFS client, requi
Before you configure PXF for access to a secure HDFS filesystem, ensure that you have: Before you configure PXF for access to a secure HDFS filesystem, ensure that you have:
- Configured the Hadoop connectors using the default PXF server configuration.
- Initialized, configured, and started PXF as described in [Configuring PXF](instcfg_pxf.html), including enabling PXF and Hadoop user impersonation. - Initialized, configured, and started PXF as described in [Configuring PXF](instcfg_pxf.html), including enabling PXF and Hadoop user impersonation.
- Enabled Kerberos for your Hadoop cluster per the instructions for your specific distribution and verified the configuration. - Enabled Kerberos for your Hadoop cluster per the instructions for your specific distribution and verified the configuration.
......
...@@ -63,7 +63,7 @@ When PXF user impersonation is enabled (the default), you must configure the Hadoo ...@@ -63,7 +63,7 @@ When PXF user impersonation is enabled (the default), you must configure the Hadoo
``` ```
4. After changing `core-site.xml`, you must restart Hadoop for your changes to take effect. 4. After changing `core-site.xml`, you must restart Hadoop for your changes to take effect.
5. Copy the updated `core-site.xml` file to the PXF Hadoop configuration directory `$PXF_CONF/servers/default` on the master, standby master, and on each Greenplum Database segment host. 5. Copy the updated `core-site.xml` file to the PXF Hadoop server configuration directory `$PXF_CONF/servers/<server_name>` on the master and synchronize the configuration to the standby master and each Greenplum Database segment host.
## <a id="hive"></a>Hive User Impersonation ## <a id="hive"></a>Hive User Impersonation
......