Commit 74a13a66 authored by Lisa Owen, committed by David Yozie

docs - pxf jdbc server file-based config (#7269)

* docs - pxf jdbc server file-based config

* add apache ignite to list of dbs supported by jdbc connector

* misc edits requested by david
Parent 46f6f470
......@@ -34,6 +34,7 @@
</ul>
</li>
<li><a href="/6-0/pxf/objstore_cfg.html" format="markdown">Configuring Connectors to Azure, Google Cloud Storage, Minio, and S3 Object Stores (Optional)</a> </li>
<li><a href="/6-0/pxf/jdbc_cfg.html" format="markdown">Configuring the JDBC Connector (Optional)</a> </li>
</ul>
</li>
<li><a href="/6-0/pxf/upgrade_pxf.html" format="markdown">Upgrading PXF</a></li>
......
......@@ -16,5 +16,7 @@ To configure PXF, you must:
3. If you plan to use the PXF connectors to access the Azure, Google Cloud Storage, Minio, or S3 object store(s), you must perform the configuration procedure described in [Configuring Connectors to Azure, Google Cloud Storage, Minio, and S3 Object Stores](objstore_cfg.html).
4. If you plan to use the PXF JDBC Connector to access an external SQL database, perform the configuration procedure described in [Configuring the JDBC Connector](jdbc_cfg.html).
5. [Start PXF](cfginitstart_pxf.html).
---
title: Configuring the JDBC Connector (Optional)
---
You can use PXF to access external SQL databases, including MySQL, ORACLE, PostgreSQL, Apache Ignite, and Hive. This topic describes how to configure the PXF JDBC Connector to access these external data sources.
*If you do not plan to use the PXF JDBC Connector, then you do not need to perform this procedure.*
## <a id="about_cfg"></a>About JDBC Configuration
To access data in an external SQL database with the PXF JDBC Connector, you must:
- Register a compatible JDBC driver JAR file
- Specify the JDBC driver class name, database URL, and client credentials
### <a id="cfg_jar"></a>JDBC Driver JAR Registration
The PXF JDBC connector is installed with the `postgresql-8.4-702.jdbc4.jar` JAR file. If you require a different JDBC driver, ensure that you install the JDBC driver JAR file for the external SQL database in the `$PXF_CONF/lib` directory on each segment host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF JAR Dependencies](reg_jar_depend.html) for additional information.
### <a id="cfg_server"></a>JDBC Server Configuration
PXF provides a template configuration file for the JDBC Connector. This server template configuration file, located in `$PXF_CONF/templates/jdbc-site.xml`, identifies the minimum set of properties that you must configure to use the PXF JDBC Connector.
The properties in the `jdbc-site.xml` server template file follow:
| Property | Description | Value |
|----------------|--------------------------------------------|-------|
| jdbc.driver | Class name of the JDBC driver. | The JDBC driver Java class name; for example, `org.postgresql.Driver`. |
| jdbc.url | The URL that the JDBC driver uses to connect to the database. | The database connection URL (database-specific); for example, `jdbc:postgresql://phost:pport/pdatabase`. |
| jdbc.user | The database user name. | The user name for connecting to the database. |
| jdbc.password | The password for `jdbc.user`. | The password for connecting to the database. |
<div class="note">When you configure a PXF JDBC server, you specify the external database user credentials to PXF in clear text in a configuration file.</div>
When you configure the PXF JDBC Connector to access an external SQL database, you add at least one named PXF server configuration for the connector. You:
1. Choose a name for the server configuration.
2. Create the directory `$PXF_CONF/servers/<server_name>`.
3. Copy the PXF `jdbc-site.xml` template configuration file to the new server directory.
4. Fill in appropriate values for the properties in the template file.
5. Synchronize the server configuration to each Greenplum Database segment host.
6. Publish the PXF server name(s) to your Greenplum Database end users as appropriate.
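The steps above can be sketched as a shell function that creates the server directory and writes its `jdbc-site.xml` directly from arguments. It does not copy and hand-edit the template. This is a minimal sketch under stated assumptions. The function name `create_jdbc_server` is hypothetical, `PXF_CONF` is assumed to be set, and the function writes only the four properties listed in the template table, without escaping XML special characters in the values.

``` shell
# Hypothetical helper (not part of PXF): writes
# $PXF_CONF/servers/<server_name>/jdbc-site.xml with the four template properties.
# Values are written verbatim; XML special characters are NOT escaped.
create_jdbc_server() {
  local server_name=$1 driver=$2 url=$3 user=$4 password=$5

  # The server name "default" is reserved.
  if [ "$server_name" = "default" ]; then
    echo "error: the server name 'default' is reserved" >&2
    return 1
  fi

  local server_dir="$PXF_CONF/servers/$server_name"
  mkdir -p "$server_dir" || return 1

  cat > "$server_dir/jdbc-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>jdbc.driver</name>
        <value>$driver</value>
    </property>
    <property>
        <name>jdbc.url</name>
        <value>$url</value>
    </property>
    <property>
        <name>jdbc.user</name>
        <value>$user</value>
    </property>
    <property>
        <name>jdbc.password</name>
        <value>$password</value>
    </property>
</configuration>
EOF
}
```

For example, `create_jdbc_server pg_user1_testdb org.postgresql.Driver jdbc:postgresql://pgserverhost:5432/testdb user1 changeme` produces the same file as the manual procedure later in this topic. You still synchronize it to the segment hosts with `pxf cluster sync`.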
The Greenplum Database user specifies the `<server_name>` in the `SERVER` option of the `CREATE EXTERNAL TABLE` `LOCATION` clause to access the external SQL database. For example, if you created a server configuration and named the server directory `pgsrvcfg`:
<pre>
CREATE EXTERNAL TABLE pxf_ext_tbl(name text, orders int)
LOCATION ('pxf://schema.tblname?PROFILE=Jdbc&<b>SERVER=pgsrvcfg</b>')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
</pre>
<div class="note info">A Greenplum Database user who queries or writes to an external table that specifies a server name accesses the external SQL database with the credentials configured for that server.</div>
While not recommended, you can override a JDBC server configuration by directly specifying the driver, database URL, and/or user credentials via custom options in the `CREATE EXTERNAL TABLE` command `LOCATION` clause. Refer to [Accessing an SQL Database with JDBC](jdbc_pxf.html) for additional information.
## <a id="cfg_proc"></a>Example Configuration Procedure
Ensure that you have initialized PXF before you configure a JDBC Connector server.
In this procedure, you name and add a PXF JDBC server configuration for a PostgreSQL database and synchronize the server configuration(s) to all segment hosts.
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Choose a name for the JDBC server. You will provide this name to the Greenplum users whom you allow to access tables in the external SQL database as the configured user.
**Note**: The server name `default` is reserved.
3. Create the `$PXF_CONF/servers/<server_name>` directory. For example, use the following command to create a JDBC server configuration named `pg_user1_testdb`:
``` shell
gpadmin@gpmaster$ mkdir $PXF_CONF/servers/pg_user1_testdb
```
4. Copy the PXF JDBC server template file to the server configuration directory. For example:
``` shell
gpadmin@gpmaster$ cp $PXF_CONF/templates/jdbc-site.xml $PXF_CONF/servers/pg_user1_testdb/
```
5. Open the template server configuration file in the editor of your choice, and provide appropriate property values for your environment. For example, if you are configuring access to a PostgreSQL database named `testdb` on a PostgreSQL instance running on the host named `pgserverhost` for the user named `user1`:
``` xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>jdbc.driver</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>jdbc.url</name>
        <value>jdbc:postgresql://pgserverhost:5432/testdb</value>
    </property>
    <property>
        <name>jdbc.user</name>
        <value>user1</value>
    </property>
    <property>
        <name>jdbc.password</name>
        <value>changeme</value>
    </property>
</configuration>
```
6. Save your changes and exit the editor.
7. Use the `pxf cluster sync` command to copy the new server configurations to each Greenplum Database segment host. For example:
``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
## <a id="client-cfg-update"></a>Adding or Updating JDBC Server Configuration
If you add a new, or update an existing, JDBC server configuration on the Greenplum Database master host, you must re-sync the PXF configuration to the Greenplum Database cluster:
``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
```
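Before you re-sync, you may want to confirm that each JDBC server configuration names the minimum set of properties from the `jdbc-site.xml` template. The following is a minimal sketch under stated assumptions. The function name `check_jdbc_servers` is hypothetical, and it checks only that each property name appears in the file, not that its value is valid. It does not run `pxf cluster sync` itself.

``` shell
# Hypothetical pre-sync check (not part of PXF): every
# $PXF_CONF/servers/*/jdbc-site.xml must name the four template properties.
# Reports each missing property on stderr and returns 1 if any are missing.
check_jdbc_servers() {
  local site prop status=0
  for site in "$PXF_CONF"/servers/*/jdbc-site.xml; do
    [ -f "$site" ] || continue   # the glob matched no files
    for prop in jdbc.driver jdbc.url jdbc.user jdbc.password; do
      # -F: match the property name literally (the "." is not a wildcard)
      if ! grep -qF "<name>$prop</name>" "$site"; then
        echo "$site: missing $prop" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

For example, you might run `check_jdbc_servers && $GPHOME/pxf/bin/pxf cluster sync` so that the sync happens only when every JDBC server directory passes the check.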
......@@ -21,11 +21,11 @@ specific language governing permissions and limitations
under the License.
-->
Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC Connector. The JDBC connector is a JDBC client. It can read data from and write data to SQL databases including MySQL, ORACLE, PostgreSQL, and Hive.
Some of your data may already reside in an external SQL database. PXF provides access to this data via the PXF JDBC Connector. The JDBC connector is a JDBC client. It can read data from and write data to SQL databases including MySQL, ORACLE, PostgreSQL, Apache Ignite, and Hive.
This section describes how to use the PXF JDBC connector to access data in an external SQL database, including how to create and query or insert data into a PXF external table that references a table in an external database.
**Note**: The JDBC Connector does not guarantee consistency when writing to an external SQL database. Be aware that if an `INSERT` operation fails, some data may be written to the external database table. If you require consistency for writes, consider writing to a staging table in the external database, and loading to the target table only after verifying the write operation.
<div class="note">The JDBC Connector does not guarantee consistency when writing to an external SQL database. Be aware that if an <code>INSERT</code> operation fails, some data may be written to the external database table. If you require consistency for writes, consider writing to a staging table in the external database, and loading to the target table only after verifying the write operation.</div>
## <a id="prereq"></a>Prerequisites
......@@ -35,10 +35,7 @@ Before you access an external SQL database using the PXF JDBC connector, ensure
- You can identify the PXF user configuration directory (`$PXF_CONF`).
- Connectivity exists between all Greenplum Database segment hosts and the external SQL database.
- You have configured your external SQL database for user access from all Greenplum Database segment hosts.
The PXF JDBC connector is installed with the `postgresql-8.4-702.jdbc4.jar` JAR file. If you require a different JDBC JAR file(s), ensure that:
- You have installed the JDBC driver JAR files for the external SQL database in the `$PXF_CONF/lib` directory on each segment host. Be sure to install JDBC driver JAR files that are compatible with your JRE version. See [Registering PXF JAR Dependencies](reg_jar_depend.html) for additional information.
- You have registered any JDBC driver JAR dependencies, and you have created one or more named PXF JDBC Connector server configurations as described in [Configuring the PXF JDBC Connector](jdbc_cfg.html).
## <a id="datatypes"></a>Data Types Supported
......@@ -62,12 +59,12 @@ To access data in an external SQL database, you create a readable or writable Gr
Use the following syntax to create a Greenplum Database external table that references an external SQL database table and uses the JDBC connector to read or write data:
``` sql
<pre>
CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<external-table-name>?PROFILE=Jdbc[&<custom-option>=<value>[...]]')
LOCATION ('pxf://<external-table-name>?<b>PROFILE=Jdbc[&SERVER=&lt;servername>]</b>[&&lt;custom-option>=&lt;value>[...]]')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'|'pxfwritable_export');
```
</pre>
The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guide/sql_commands/CREATE_EXTERNAL_TABLE.html) command are described in the table below.
......@@ -75,6 +72,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
|-------|-------------------------------------|
| \<external&#8209;table&#8209;name\> | The full name of the external table. Depends on the external SQL database, may include a schema name and a table name. |
| PROFILE | The `PROFILE` keyword value must specify `Jdbc`. |
| SERVER=\<servername\> | The named server configuration that PXF uses to access the external SQL database. |
| \<custom&#8209;option\> | \<custom-option\> is profile-specific. `Jdbc` profile-specific options are discussed in the next section.|
| FORMAT 'CUSTOM' | The JDBC `CUSTOM` `FORMAT` supports the built-in `'pxfwritable_import'` `FORMATTER` function for read operations and the built-in `'pxfwritable_export'` function for write operations. |
......@@ -85,14 +83,28 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
You include JDBC connector custom options in the `LOCATION` URI, prefacing each option with an ampersand `&`.
The `Jdbc` profile supports the following \<custom-option\> values:
The `Jdbc` profile supports the following connection-related \<custom-option\>s:
| Custom Option Name | jdbc-site.xml Property Name | Description
|----------------------|-----------------------------|--------|
| JDBC_DRIVER | jdbc.driver | The JDBC driver class name. (Required) |
| DB_URL | jdbc.url | The external database URL. Depends on the external SQL database, typically includes at least the hostname, port, and database name. (Required) |
| USER | jdbc.user | The database user name. Required if `PASS` is provided. |
| PASS | jdbc.password | The database password for `USER`. Required if `USER` is provided. |
<div class="note">If you provide a <code>SERVER=&lt;servername></code> in the <code>CREATE EXTERNAL TABLE</code> <code>LOCATION</code> clause, any connection option that you include in the <code>LOCATION</code> clause overrides the value that you specified in the <code>&lt;servername></code>'s <code>jdbc-site.xml</code> configuration file.</div>
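The override rule in the note above can be illustrated with a small shell sketch. It looks up a property in a server's `jdbc-site.xml` and prefers a value passed as a `LOCATION` connection option; the option-to-property mapping is the one in the table above, for example `USER` to `jdbc.user`. The helper names are hypothetical. The `awk` lookup assumes the one-element-per-line layout of the `jdbc-site.xml` template. It is not a general XML parser and does not reflect how PXF itself reads the file.

``` shell
# Hypothetical helper: print the <value> that follows <name>$2</name> in file $1.
# Assumes <name> and <value> sit on separate lines, as in the jdbc-site.xml template.
site_property() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && match($0, /<value>.*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Hypothetical helper: a non-empty LOCATION option value ($1) overrides the
# property named $3 in the server's jdbc-site.xml ($2).
effective_setting() {
  if [ -n "$1" ]; then
    printf '%s\n' "$1"
  else
    site_property "$2" "$3"
  fi
}
```

For example, `effective_setting "pguser1" "$PXF_CONF/servers/pgsrvcfg/jdbc-site.xml" jdbc.user` prints `pguser1`. Passing an empty first argument prints the `jdbc.user` value from the file instead.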
Example JDBC \<custom-option\> connection strings:
``` pre
&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://gpmaster:5432/pgtestdb&USER=pguser1&PASS=changeme
&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://mysqlhost:3306/testdb&USER=user1&PASS=changeme
```
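A `LOCATION` URI combines the external table name, the `Jdbc` profile, an optional `SERVER`, and any custom options, each prefaced with `&`. The following minimal shell sketch assembles such a URI and enforces the rule that `USER` and `PASS` are provided together. The function name `jdbc_location` is hypothetical, and option values are appended as-is without URL encoding.

``` shell
# Hypothetical helper: build a PXF JDBC LOCATION URI.
# Usage: jdbc_location <external-table-name> <server-or-empty> [OPTION=value ...]
jdbc_location() {
  local uri="pxf://$1?PROFILE=Jdbc" has_user=0 has_pass=0 opt
  # Add SERVER only when a server name is given.
  [ -n "$2" ] && uri="$uri&SERVER=$2"
  shift 2
  for opt in "$@"; do
    case $opt in
      USER=*) has_user=1 ;;
      PASS=*) has_pass=1 ;;
    esac
    uri="$uri&$opt"   # each custom option is prefaced with an ampersand
  done
  # USER requires PASS, and PASS requires USER.
  if [ "$has_user" -ne "$has_pass" ]; then
    echo "error: USER and PASS must be specified together" >&2
    return 1
  fi
  printf '%s\n' "$uri"
}
```

For example, `jdbc_location public.forpxf_table1 pgsrvcfg` prints `pxf://public.forpxf_table1?PROFILE=Jdbc&SERVER=pgsrvcfg`. That is the URI form used with a named server configuration.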
Additional `CREATE EXTERNAL TABLE` \<custom-option\>s supported by the `Jdbc` profile include:
| Option Name | Operation | Description
|---------------|------------|--------|
| JDBC_DRIVER | Read, Write | The JDBC driver class name. (Required) |
| DB_URL | Read, Write | The external database URL. Depends on the external SQL database, typically includes at least the hostname, port, and database name. (Required) |
| USER | Read, Write | The database user name. Required if `PASS` is provided. |
| PASS | Read, Write | The database password for `USER`. Required if `USER` is provided. |
| BATCH_SIZE | Write | Integer identifying the number of `INSERT` operations to batch to the external SQL database. PXF always validates a `BATCH_SIZE` option, even when provided on a read operation. Batching is enabled by default. |
| POOL_SIZE | Write | Enable thread pooling on `INSERT` operations and identify the number of threads in the pool. Thread pooling is disabled by default. |
| PARTITION_BY | Read | The partition column, \<column-name\>:\<column-type\>. You may specify only one partition column. The JDBC connector supports `date`, `int`, and `enum` \<column-type\> values. A null `PARTITION_BY` defaults to a single fragment. |
......@@ -100,12 +112,6 @@ The `Jdbc` profile supports the following \<custom-option\> values:
| INTERVAL | Read | Required when `PARTITION_BY` is specified and of the `int` or `date` type. The interval, \<interval-value\>[:\<interval-unit\>], of one fragment. Specify the size of the fragment in \<interval-value\>. If the partition column is a `date` type, use the \<interval-unit\> to specify `year`, `month`, or `day`. |
| QUOTE_COLUMNS | Read | Controls whether PXF should quote column names when constructing an SQL query to the external database. Specify `true` to force PXF to quote all column names; PXF does not quote column names if any other value is provided. If `QUOTE_COLUMNS` is not specified (the default), PXF automatically quotes *all* column names in the query when *any* column name:<br>- includes special characters, or <br>- is mixed case and the external database does not support unquoted mixed case identifiers. |
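The default `QUOTE_COLUMNS` behavior described in the table can be sketched as a decision function. This illustrates the documented rule only and is not PXF's implementation. The function name `needs_quoting` is hypothetical. "Special character" is assumed here to mean anything other than a letter, digit, or underscore. Whether the external database supports unquoted mixed-case identifiers is passed in as a flag.

``` shell
# Hypothetical sketch of the default rule when QUOTE_COLUMNS is not specified.
# Usage: needs_quoting <mixed_case_ok: true|false> <column-name>...
# Prints "true" if any column name triggers quoting of all names, else "false".
needs_quoting() {
  local mixed_case_ok=$1 col
  shift
  for col in "$@"; do
    # Assumption: a special character is anything outside letters, digits, "_".
    case $col in
      *[![:alnum:]_]*) echo true; return ;;
    esac
    # Mixed case matters only if the database lacks unquoted mixed-case support.
    if [ "$mixed_case_ok" = false ]; then
      case $col in
        *[[:upper:]]*[[:lower:]]*|*[[:lower:]]*[[:upper:]]*) echo true; return ;;
      esac
    fi
  done
  echo false
}
```

Because one qualifying name causes *all* names to be quoted, the function stops at the first match rather than deciding per column.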
Example JDBC \<custom-option\> connection strings:
``` pre
&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://gpmaster:5432/pgtestdb&USER=pguser1&PASS=changeme
&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://mysqlhost:3306/testdb&USER=user1&PASS=changeme
```
#### <a id="batching"></a>Batching Insert Operations (Write)
......@@ -160,6 +166,7 @@ In this example, you:
- Create a PostgreSQL database and table, and insert data into the table
- Create a PostgreSQL user and assign all privileges on the table to the user
- Configure the PXF JDBC Connector to access the PostgreSQL database
- Create a PXF readable external table that references the PostgreSQL table
- Read the data in the PostgreSQL table
- Create a PXF writable external table that references the PostgreSQL table
......@@ -211,18 +218,10 @@ Perform the following steps to create a PostgreSQL table named `forpxf_table1` i
7. Update the PostgreSQL configuration to allow user `pxfuser1` to access `pgtestdb` from each Greenplum Database segment host. This configuration is specific to your PostgreSQL environment. You will update the `/var/lib/pgsql/pg_hba.conf` file and then restart the PostgreSQL server.
8. Construct the JDBC connection string, substituting your PostgreSQL server hostname and port number. For example:
``` pre
&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pserver:5432/pgtestdb&USER=pxfuser1&PASS=changeme
```
Save this string for use later.
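The `pg_hba.conf` change in step 7 depends on your PostgreSQL environment. As a hedged sketch, the following shell function prints one `host` entry per Greenplum Database segment host. It assumes that you supply the segment host IPv4 addresses yourself and that `md5` password authentication suits your environment. The function name `pg_hba_entries` and the example addresses are hypothetical.

``` shell
# Hypothetical helper: print pg_hba.conf entries that allow <user> to reach
# <database> from each segment host IPv4 address. Assumes md5 authentication.
# Usage: pg_hba_entries <database> <user> <ip>...
pg_hba_entries() {
  local database=$1 user=$2 ip
  shift 2
  for ip in "$@"; do
    # Record format: host  <database>  <user>  <address>/32  <auth-method>
    printf 'host\t%s\t%s\t%s/32\tmd5\n' "$database" "$user" "$ip"
  done
}
```

For example, you might review the output of `pg_hba_entries pgtestdb pxfuser1 192.0.2.11 192.0.2.12` before appending it to `pg_hba.conf` and restarting the PostgreSQL server.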
#### <a id="ex_jdbconfig"></a>Configure the JDBC Connector
#### <a id="ex_jdbconfig"></a>Configure PXF
You must download the PostgreSQL driver JAR file to your system, copy the JAR file to the PXF user configuration directory, and then restart PXF.
You must create a JDBC server configuration for PostgreSQL, download the PostgreSQL driver JAR file to your system, copy the JAR file to the PXF user configuration directory, synchronize the PXF configuration, and then restart PXF.
1. Log in to the Greenplum Database master node:
......@@ -230,6 +229,30 @@ You must download the PostgreSQL driver JAR file to your system, copy the JAR fi
$ ssh gpadmin@<gpmaster>
```
2. Create a JDBC server configuration for PostgreSQL as described in [Example Configuration Procedure](jdbc_cfg.html#cfg_proc), naming the server/directory `pgsrvcfg`. The `jdbc-site.xml` file contents should look similar to the following (substitute your PostgreSQL host system for `pgserverhost`):
``` xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>jdbc.driver</name>
        <value>org.postgresql.Driver</value>
    </property>
    <property>
        <name>jdbc.url</name>
        <value>jdbc:postgresql://pgserverhost:5432/pgtestdb</value>
    </property>
    <property>
        <name>jdbc.user</name>
        <value>pxfuser1</value>
    </property>
    <property>
        <name>jdbc.password</name>
        <value>changeme</value>
    </property>
</configuration>
```
3. [Download](https://jdbc.postgresql.org/download.html) a PostgreSQL JDBC driver JAR file and note the location of the downloaded file.
4. Copy the JDBC driver JAR file to `$PXF_CONF/lib` on the Greenplum Database master host. For example:
......@@ -254,12 +277,10 @@ Perform the following procedure to create a PXF external table that references t
``` sql
gpadmin=# CREATE EXTERNAL TABLE pxf_tblfrompg(id int)
LOCATION ('pxf://public.forpxf_table1?PROFILE=Jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pserver:5432/pgtestdb&USER=pxfuser1&PASS=changeme')
LOCATION ('pxf://public.forpxf_table1?PROFILE=Jdbc&SERVER=pgsrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
```
Substitute the `DB_URL` string that you constructed in the previous exercise.
2. Display all rows of the `pxf_tblfrompg` table:
``` sql
......@@ -280,12 +301,10 @@ Perform the following procedure to insert some data into the `forpxf_table1` Pos
``` sql
gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_writeto_postgres(id int)
LOCATION ('pxf://public.forpxf_table1?PROFILE=Jdbc&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://pserver:5432/pgtestdb&USER=pxfuser1&PASS=changeme')
LOCATION ('pxf://public.forpxf_table1?PROFILE=Jdbc&SERVER=pgsrvcfg')
FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```
Again, substitute the `DB_URL` string that you constructed in the previous exercise.
4. Insert some data into the `pxf_writeto_postgres` table. For example:
``` sql
......
......@@ -12,9 +12,9 @@ To access data in an object store, you must provide a server location and client
```
gpadmin@gpmaster$ ls $PXF_CONF/templates
adl-site.xml hbase-site.xml mapred-site.xml wasbs-site.xml
core-site.xml hdfs-site.xml minio-site.xml yarn-site.xml
gs-site.xml hive-site.xml s3-site.xml
adl-site.xml hbase-site.xml jdbc-site.xml s3-site.xml
core-site.xml hdfs-site.xml mapred-site.xml wasbs-site.xml
gs-site.xml hive-site.xml minio-site.xml yarn-site.xml
```
For example, the contents of the `s3-site.xml` template file follow:
......
......@@ -59,7 +59,7 @@ After you upgrade to the new version of Greenplum Database, perform the followin
4. **If you are upgrading from Greenplum Database version 5.14 or earlier**:
1. If you updated the `pxf-env.sh` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-env.sh`. For example::
1. If you updated the `pxf-env.sh` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-env.sh`. For example:
``` shell
gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
......