Commit e648cf45 authored by Lisa Owen, committed by David Yozie

docs - pxf CLI refactor, add ref pages for pxf, pxf cluster (#6401)

* docs - pxf CLI refactor, add ref pages for pxf, pxf cluster

* display pxf cluster first in ref landing page

* remedy -> solution

* remove java 1.7

* remove commands that source greenplum_path.sh

* edits requested by david

* fix link

* update OSS book for new ref pages, add jdbc, edits
Parent 6b1d7a34
......@@ -12,7 +12,7 @@
<a href="/600/pxf/intro_pxf.html" format="markdown">PXF Architecture</a>
</li>
<li>
<a href="/600/pxf/about_pxf_dir.html" format="markdown">About the PXF Installation and Configuration Directories</a>
</li>
<li class="has_submenu">
<a href="/600/pxf/instcfg_pxf.html" format="markdown">Configuring PXF</a>
......@@ -65,9 +65,19 @@
<li>
<a href="/600/pxf/hbase_pxf.html" format="markdown">Accessing HBase Table Data with PXF</a>
</li>
<li>
<a href="/600/pxf/jdbc_pxf.html" format="markdown">Accessing an External SQL Database with PXF (JDBC)</a>
</li>
<li>
<a href="/600/pxf/troubleshooting_pxf.html" format="markdown">Troubleshooting PXF</a>
</li>
<li class="has_submenu">
<a href="/600/pxf/ref/pxf-ref.html" format="markdown">PXF Utility Reference</a>
<ul>
<li><a href="/600/pxf/ref/pxf-cluster.html" format="markdown">pxf cluster</a></li>
<li><a href="/600/pxf/ref/pxf.html" format="markdown">pxf</a></li>
</ul>
</li>
<li class="has_submenu">
<a href="/600/pxf/sdk/dev_overview.html" format="markdown">Using the PXF Java SDK</a>
<ul>
......
......@@ -144,6 +144,12 @@
<p>
<xref href="pgbouncer-admin.xml#topic1"/>
</p>
<p>
<xref href="../../pxf/ref/pxf.html" format="html" scope="peer">pxf</xref>
</p>
<p>
<xref href="../../pxf/ref/pxf-cluster.html" format="html" scope="peer">pxf cluster</xref>
</p>
</stentry>
</strow>
</simpletable>
......
......@@ -49,6 +49,8 @@
<topicref href="admin_utilities/pgbouncer.xml"/>
<topicref href="admin_utilities/pgbouncer-ini.xml"/>
<topicref href="admin_utilities/pgbouncer-admin.xml"/>
<topicref href="../pxf/ref/pxf.html" scope="peer" navtitle="pxf"/>
<topicref href="../pxf/ref/pxf-cluster.html" scope="peer" navtitle="pxf cluster"/>
</topicref>
<topicref href="cli_ref.xml">
<topicref href="client_utilities/ClientUtilitySummary.xml" linking="targetonly">
......
......@@ -23,55 +23,60 @@ under the License.
You must initialize and start PXF before you can use the framework.
PXF provides two management commands:
- `pxf cluster` - manage all PXF service instances in the Greenplum Database cluster
- `pxf` - manage the PXF service instance on a specific Greenplum Database host
The [`pxf cluster`](ref/pxf-cluster.html) command supports `init`, `start`, and `stop` subcommands. When you run a `pxf cluster` subcommand, you perform the operation on all segment hosts in the Greenplum Database cluster.
The [`pxf`](ref/pxf.html) command supports `init`, `start`, `stop`, `restart`, and `status` operations. These operations run locally. That is, if you want to start or stop the PXF agent on a specific Greenplum Database segment host, you log in to the host and run the command.
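For example, the two utilities might be invoked as follows; this sketch assumes the default Greenplum installation layout, and the segment host name is illustrative:

``` shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start    # acts on all segment hosts
gpadmin@seghost1$ $GPHOME/pxf/bin/pxf status           # acts on this host only
```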
## <a id="init_pxf"></a>Initializing PXF
You must explicitly initialize the PXF service instance. This one-time initialization creates the PXF service web application and generates PXF configuration files and templates.
PXF supports both internal and user-customizable configuration properties. Initializing PXF generates PXF internal configuration files, setting default properties specific to your configuration. Initializing PXF also generates configuration file templates for user-customizable settings such as custom profiles and PXF runtime and logging settings.
PXF internal configuration files are located in `$GPHOME/pxf/conf`. You identify the PXF user configuration directory at initialization time via an environment variable named `$PXF_CONF`. If you do not set `$PXF_CONF` prior to initializing PXF, PXF may prompt you to accept or decline the default user configuration directory, `$HOME/pxf`, during the initialization process.
**Note**: The `gpadmin` user must have permission to either create, or write to, the specified `$PXF_CONF` directory.
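For example, if you plan to use `/etc/pxf/usercfg` (the directory used in the examples below) as `$PXF_CONF`, one way to satisfy this permission requirement is to pre-create the directory and assign it to `gpadmin`; this sketch assumes you have `sudo` privileges on the host:

``` shell
$ sudo mkdir -p /etc/pxf/usercfg
$ sudo chown -R gpadmin:gpadmin /etc/pxf/usercfg
```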
During initialization, PXF creates the `$PXF_CONF` directory if necessary, and then populates it with subdirectories and template files. Refer to [PXF User Configuration Directories](about_pxf_dir.html#usercfg) for a list of these directories and their contents.
### <a id="init-pxf-prereq" class="no-quick-link"></a>Prerequisites
Before initializing PXF in your Greenplum Database cluster, be sure to identify the filesystem location for the PXF user configuration directory, `$PXF_CONF`.
### <a id="init-pxf-steps" class="no-quick-link"></a>Procedure
Perform the following procedure to initialize PXF on each segment host in your Greenplum Database cluster.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Create a text file that lists your Greenplum Database segment hosts, one host name per line. For example, a file named `seghostfile` may include:
``` pre
seghost1
seghost2
seghost3
```
4. Run the `pxf cluster init` command to initialize the PXF service on the master and on each segment host. For example, the following command specifies `/etc/pxf/usercfg` as the PXF user configuration directory for initialization.
``` shell
gpadmin@gpmaster$ PXF_CONF=/etc/pxf/usercfg $GPHOME/pxf/bin/pxf cluster init
```
The `init` command creates the PXF web application and initializes the internal PXF configuration. The `init` command also creates the `$PXF_CONF` user configuration directory if it does not exist, and populates the directory with user-customizable configuration templates.
**Note**: The PXF service runs only on the segment hosts. However, `pxf cluster init` also sets up the PXF user configuration directories on the Greenplum Database master host.
## <a id="start_pxf"></a>Starting PXF
After initializing PXF, you must start PXF on each segment host in your Greenplum Database cluster. The PXF service, once started, runs as the `gpadmin` user on default port 5888. Only the `gpadmin` user can start and stop the PXF service.
If you want to change the default PXF configuration, you must update the configuration before you start PXF.
......@@ -91,51 +96,99 @@ The `pxf-env.sh` file exposes the following PXF runtime configuration parameters
| PXF_KEYTAB | The absolute path to the PXF service Kerberos principal keytab file. | $PXF_CONF/keytabs/pxf.service.keytab |
| PXF_PRINCIPAL | The PXF service Kerberos principal. | gpadmin/\_HOST@EXAMPLE.COM |
You must propagate any changes that you make to `pxf-env.sh`, `pxf-log4j.properties`, or `pxf-profiles.xml` to each Greenplum Database segment host, and (re)start PXF on each host.
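For example, a sketch of the propagate-and-restart sequence for an updated `pxf-env.sh` might look like this; it assumes `PXF_CONF=/etc/pxf/usercfg` and a `seghostfile` that lists your segment hosts one per line:

``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-env.sh =:/etc/pxf/usercfg/conf/pxf-env.sh
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
```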
### <a id="start_pxf_prereq" class="no-quick-link"></a>Prerequisites
Before you start PXF in your Greenplum Database cluster, ensure that:
- Your Greenplum Database cluster is up and running.
- You have previously initialized PXF.
### <a id="start_pxf_proc" class="no-quick-link"></a>Procedure
Perform the following procedure to start PXF on each segment host in your Greenplum Database cluster.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
3. Run the `pxf cluster start` command to start PXF on each segment host. For example:
```shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
```
## <a id="stop_pxf"></a>Stopping PXF
If you must stop PXF, for example if you are upgrading PXF, you must stop PXF on each segment host in your Greenplum Database cluster. Only the `gpadmin` user can stop the PXF service.
### <a id="stop_pxf_prereq" class="no-quick-link"></a>Prerequisites
Before you stop PXF in your Greenplum Database cluster, ensure that your Greenplum Database cluster is up and running.
### <a id="stop_pxf_proc" class="no-quick-link"></a>Procedure
Perform the following procedure to stop PXF on each segment host in your Greenplum Database cluster.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
3. Run the `pxf cluster stop` command to stop PXF on each segment host. For example:
```shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
```
## <a id="restart_pxf"></a>Restarting PXF
If you must restart PXF, for example if you updated PXF user configuration files in `$PXF_CONF/conf`, you can stop, and then start, PXF in your Greenplum Database cluster.
Only the `gpadmin` user can restart the PXF service.
### <a id="restart_pxf_prereq" class="no-quick-link"></a>Prerequisites
Before you restart PXF in your Greenplum Database cluster, ensure that your Greenplum Database cluster is up and running.
### <a id="restart_pxf_proc" class="no-quick-link"></a>Procedure
Perform the following procedure to restart PXF in your Greenplum Database cluster.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Restart PXF:
```shell
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
```
## <a id="status_pxf"></a>Displaying PXF Status
If you want to display PXF status, you must explicitly request the status of the PXF service instance on each segment host in your Greenplum Database cluster.
Only the `gpadmin` user can request the status of the PXF service.
Perform the following procedure to request PXF status on each segment host in your Greenplum Database cluster.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Use the `gpssh` command and a `seghostfile` to run the `pxf status` command on each segment host:
```shell
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf status"
```
......@@ -10,20 +10,19 @@ PXF is compatible with Cloudera, Hortonworks Data Platform, and generic Apache H
PXF bundles all of the JAR files on which it depends, including those for Hadoop services, and loads these JARs at runtime. Configuring PXF Hadoop connectors involves copying configuration files from your Hadoop cluster to each Greenplum Database segment host. Before you configure PXF Hadoop, Hive, and HBase connectors, ensure that you can copy configuration files from HDFS, Hive, and HBase hosts in your Hadoop cluster to the Greenplum Database master host.
In this procedure, you copy Hadoop configuration files to the `$PXF_CONF/servers/default` directory on each Greenplum Database segment host. PXF creates this directory when you run `pxf cluster init`.
## <a id="client-pxf-config-steps"></a>Procedure
Perform the following procedure to configure the desired PXF Hadoop-related connectors on each segment host in your Greenplum Database cluster.
You will use the `gpscp` utility where possible to copy files to multiple hosts.
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. PXF requires information from `core-site.xml` and other Hadoop configuration files. Copy relevant configuration from your Hadoop cluster to each Greenplum Database segment host.
......@@ -84,5 +83,5 @@ You will use the `gpssh` and `gpscp` utilities where possible to run a command o
## <a id="client-cfg-update"></a>Updating Hadoop Configuration
If you update your Hadoop, Hive, or HBase configuration while the PXF service is running, you must copy the updated `.xml` file(s) to each Greenplum Database segment host and restart PXF on each host.
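For example, if you updated `core-site.xml`, the copy-and-restart sequence might be sketched as follows, assuming `PXF_CONF=/etc/pxf/usercfg` and a `seghostfile` listing your segment hosts:

``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile core-site.xml =:/etc/pxf/usercfg/servers/default/core-site.xml
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
```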
......@@ -348,11 +348,10 @@ public class PxfExample_CustomWritable implements Writable {
$ scp pxfex-customwritable.jar gpadmin@gpmaster:/home/gpadmin
```
4. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
5. Copy the `pxfex-customwritable.jar` JAR file to the user runtime library directory on each Greenplum Database segment host, and note the location. For example, if `PXF_CONF=/etc/pxf/usercfg`:
......@@ -361,11 +360,7 @@ public class PxfExample_CustomWritable implements Writable {
gpadmin@gpmaster$ gpscp -v -f seghostfile /home/gpadmin/pxfex-customwritable.jar =:/etc/pxf/usercfg/lib/pxfex-customwritable.jar
```
6. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf).
5. Use the PXF `SequenceWritable` profile to create a Greenplum Database writable external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when you create the writable table.
......
......@@ -2,7 +2,7 @@
title: Installing Java for PXF
---
PXF is a Java service. It requires a Java 1.8 installation on each Greenplum Database segment host.
*If an appropriate version of PXF is already installed on each Greenplum Database segment host, you need not perform the procedure in this topic.*
......@@ -15,11 +15,10 @@ Ensure that you have access to, or superuser permissions to install, Java versio
Perform the following procedure to install Java on the master and on each segment host in your Greenplum Database cluster. You will use the `gpssh` utility where possible to run a command on multiple hosts.
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Create a text file that lists your Greenplum Database segment hosts, one host name per line. For example, a file named `seghostfile` may include:
......
......@@ -223,11 +223,10 @@ Perform the following steps to create a PostgreSQL table named `forpxf_table1` i
You must download the PostgreSQL driver JAR file to your system, copy the JAR file to the PXF user configuration directory, and then restart PXF.
1. Log in to the Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. [Download](https://jdbc.postgresql.org/download.html) a PostgreSQL JDBC driver JAR file and note the location of the downloaded file.
......@@ -238,11 +237,7 @@ You must download the PostgreSQL driver JAR file to your system, copy the JAR fi
gpadmin@gpmaster$ gpscp -v -f seghostfile postgresql-42.2.5.jar =:/etc/pxf/usercfg/lib/postgresql-42.2.5.jar
```
6. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf).
#### <a id="ex_readjdbc"></a>Read from the PostgreSQL Table
......
......@@ -67,6 +67,11 @@ The Greenplum Platform Extension Framework (PXF) provides parallel, high through
This topic details the service- and database- level logging configuration procedures for PXF. It also identifies some common PXF errors and describes how to address PXF memory issues.
- [PXF Utility Reference](ref/pxf-ref.html)
The PXF utility reference.
- [Using the PXF Java SDK](sdk/dev_overview.html)
The PXF SDK provides the Java classes and interfaces that you use to add support for external data stores and new data formats and data access APIs to Greenplum Database. This topic describes how to set up your PXF development environment, use the PXF API, and deploy your extension.
......@@ -89,11 +89,10 @@ Perform the following steps to configure PXF for a secure HDFS. You will perform
**Perform the following steps on each Greenplum Database segment host**:
1. Log in to the segment host. For example:
``` shell
$ ssh gpadmin@<seghost>
```
2. Install the Kerberos client packages on **each** Greenplum Database segment host if they are not already installed. You must have superuser permissions to install operating system packages. For example:
......
......@@ -12,11 +12,10 @@ As an alternative, you can disable PXF user impersonation. With user impersonati
Perform the following procedure to turn PXF user impersonation on or off in your Greenplum Database cluster. If you are configuring PXF for the first time, user impersonation is enabled by default. You need not perform this procedure.
1. Log in to your Greenplum Database master node as the administrative user:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Recall the location of the PXF user configuration directory (`$PXF_CONF`). Open the `$PXF_CONF/conf/pxf-env.sh` configuration file in a text editor. For example:
......@@ -37,11 +36,8 @@ Perform the following procedure to turn PXF user impersonation on or off in your
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-env.sh =:/etc/pxf/usercfg/conf/pxf-env.sh
```
5. If you have previously started PXF, restart it on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf) to apply the new setting.
## <a id="hadoop"></a>Configure Hadoop Proxying
......
---
title: pxf cluster
---
Manage the PXF service instance on all Greenplum Database segment hosts.
## <a id="topic1__section2"></a>Synopsis
``` pre
pxf cluster <command>
```
where `<command>` is:
``` pre
help
init
start
stop
```
## <a id="topic1__section3"></a>Description
The `pxf cluster` utility manages the PXF service instance on all Greenplum Database segment hosts. You can initialize, start, and stop the PXF service instance on all segment hosts.
The `pxf cluster` command requires a running Greenplum Database cluster.
If you want to manage the PXF service instance on a specific segment host, use the `pxf` utility. See [`pxf`](pxf.html#topic1).
## <a id="commands"></a>Commands
<dt>help</dt>
<dd>Display the `pxf cluster` help message and then exit.</dd>
<dt>init</dt>
<dd>Initialize the PXF service instance on the master and on all segment hosts. When you initialize PXF across your Greenplum Database cluster, you must identify the PXF user configuration directory via an environment variable named `$PXF_CONF`. If you do not set `$PXF_CONF` prior to initializing PXF, PXF returns an error.</dd>
<dt>start</dt>
<dd>Start the PXF service instance on all segment hosts.</dd>
<dt>stop </dt>
<dd>Stop the PXF service instance on all segment hosts.</dd>
## <a id="topic1__section5"></a>Examples
Stop the PXF service instance on all segment hosts:
``` shell
$ $GPHOME/pxf/bin/pxf cluster stop
```
## <a id="topic1__section6"></a>See Also
[`pxf`](pxf.html#topic1)
---
title: PXF Utility Reference
---
The Greenplum Platform Extension Framework (PXF) includes the following utility reference pages:
- [pxf cluster](pxf-cluster.html)
- [pxf](pxf.html)
---
title: pxf
---
Manage the PXF service instance on a segment host.
## <a id="topic1__section2"></a>Synopsis
``` pre
pxf <command>
```
where `<command>` is:
``` pre
cluster
help
init
restart
start
status
stop
version
```
## <a id="topic1__section3"></a>Description
The `pxf` utility manages the PXF service instances on Greenplum Database segment hosts.
You can initialize, start, stop, and restart the PXF service instance on a specific segment host. You can also display the status of the PXF service instance running on the host.
To initialize, start, and stop the PXF service instance on all segment hosts in the Greenplum Database cluster, use the `pxf cluster` command. See [`pxf cluster`](pxf-cluster.html#topic1).
## <a id="commands"></a>Commands
<dt>cluster</dt>
<dd>Manage the PXF service instance on all Greenplum Database segment hosts. See [`pxf cluster`](pxf-cluster.html#topic1).</dd>
<dt>help</dt>
<dd>Display the `pxf` management utility help message and then exit.</dd>
<dt>init</dt>
<dd>Initialize the PXF service instance on the host. When you initialize PXF, you must identify the PXF user configuration directory via an environment variable named `$PXF_CONF`. If you do not set `$PXF_CONF` prior to initializing PXF, PXF prompts you to accept or decline the default user configuration directory, `$HOME/pxf`, during the initialization process. See [Options](pxf.html#options).</dd>
<dt>restart</dt>
<dd>Restart the PXF service instance running on the segment host.</dd>
<dt>start</dt>
<dd>Start the PXF service instance on the segment host.</dd>
<dt>status</dt>
<dd>Display the status of the PXF service instance running on the segment host.</dd>
<dt>stop </dt>
<dd>Stop the PXF service instance running on the segment host.</dd>
<dt>version </dt>
<dd>Display the PXF version and then exit.</dd>
## <a id="options"></a>Options
The `pxf init` command takes the following option:
<dt>-y </dt>
<dd>Do not prompt, use the default `$PXF_CONF` directory location if the environment variable is not set.</dd>
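For example, the following sketch initializes PXF without prompting; because `$PXF_CONF` is not set, PXF uses the default `$HOME/pxf` user configuration directory (the host name is illustrative):

``` shell
gpadmin@seghost1$ $GPHOME/pxf/bin/pxf init -y
```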
## <a id="topic1__section5"></a>Examples
Start the PXF service instance on the local segment host:
``` shell
$ $GPHOME/pxf/bin/pxf start
```
## <a id="topic1__section6"></a>See Also
[`pxf cluster`](pxf-cluster.html#topic1)
......@@ -52,7 +52,7 @@ Perform the following procedure to set up your PXF development environment. This
Exercises in this guide reference your work directory. You may consider adding `$PXFDEV_BASE` to your `.bash_profile` or equivalent shell initialization script.
2. If not already present on your development system, install Java Development Kit version 1.8. You must have superuser permissions to install operating system packages. For example, to install the JDK on a CentOS development system:
``` shell
root@devsystem$ sudo yum install java-1.8.0-openjdk-1.8.0*
......
......@@ -29,9 +29,10 @@ For example, if `seghostfile` contains a list, one-host-per-line, of the segment
``` shell
gpadmin@gpmaster$ gpscp my-connector.jar -v -f seghostfile =:/etc/pxf/usercfg/lib
gpadmin@gpmaster$ gpscp connector-dependency.jar -v -f seghostfile =:/etc/pxf/usercfg/lib
```
Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](../cfginitstart_pxf.html#restart_pxf).
The administrator must also install any third-party commands or other components used by the connector on all Greenplum Database segment hosts and ensure that these programs are executable by the `gpadmin` operating system user.
......@@ -98,11 +99,10 @@ Before attempting this exercise, ensure that you have:
Perform the following procedure to deploy the *Demo* connector and to verify that you deployed the connector successfully:
1. Log in to the Greenplum Database master node as an administrative user:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Copy the *Demo* connector JAR file that you previously built to the Greenplum Database master host. For example, to copy the JAR file to the `/tmp` directory, replace `PXFDEV_BASE` with the absolute path to your PXF development work area:
......@@ -117,11 +117,7 @@ Perform the following procedure to deploy the *Demo* connector and to verify tha
gpadmin@gpmaster$ gpscp -v -f seghostfile /tmp/my-demo-connector.jar =:/etc/pxf/usercfg/lib/my-demo-connector.jar
```
9. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](../cfginitstart_pxf.html#restart_pxf).
11. Verify that you correctly deployed the *Demo* connector by creating and accessing Greenplum Database readable and writable external tables that specify the *Demo* connector plug-ins:
......
......@@ -79,9 +79,10 @@ For example, if `seghostfile` contains a list, one-host-per-line, of the segment
``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-profiles.xml =:/etc/pxf/usercfg/conf/pxf-profiles.xml
```
Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](../cfginitstart_pxf.html#restart_pxf).
## <a id="verify_profile_reg"></a>Verifying Profile Registration
To verify that you registered and deployed a new profile correctly, you create a Greenplum Database external table specifying the profile name and invoke `SELECT` and/or `INSERT` commands on the table to test read and write operations on the external data source.
......@@ -99,11 +100,10 @@ Before attempting this exercise, ensure that you have completed the [Example: De
Perform the following procedure to define and register a read profile and a write profile for your *Demo* connector and verify that you deployed the profiles successfully:
1. Log in to the Greenplum Database master node as an administrative user:
``` shell
$ ssh gpadmin@<gpmaster>
```
2. Open the `pxf-profiles.xml` file in the editor of your choosing. For example:
......@@ -149,11 +149,7 @@ Perform the following procedure to define and register a read profile and a writ
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-profiles.xml =:/etc/pxf/usercfg/conf/pxf-profiles.xml
```
7. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](../cfginitstart_pxf.html#restart_pxf).
8. Verify that you correctly deployed the *Demo* connector profiles by creating and accessing Greenplum Database external tables:
......
......@@ -33,7 +33,7 @@ Summary of terms:
PXF in Greenplum Database has two components:
- A C shared library that is loaded into Greenplum Database when the `CREATE EXTENSION pxf` command is invoked on a database.
- A Java service, referred to as the PXF agent, a single JVM process located on each Greenplum Database segment host. You start the PXF agent when you run `pxf start` on the segment host.
- A Java service, referred to as the PXF agent, a single JVM process located on each Greenplum Database segment host. You start the PXF agent when you run `pxf cluster start`.
Operations on Greenplum Database external tables are first routed to the PXF C shared library extension then on to the PXF agent.
......
......@@ -27,11 +27,11 @@ The following table describes some errors you may encounter while using PXF:
| Error Message | Discussion |
|-------------------------------|---------------------------------|
| Protocol "pxf" does not exist | **Cause**: The `pxf` extension was not registered.<br>**Remedy**: Create (enable) the PXF extension for the database as described in the PXF [Enable Procedure](using_pxf.html#enable-pxf-ext).|
| Invalid URI pxf://\<path-to-data\>: missing options section | **Cause**: The `LOCATION` URI does not include the profile or other required options.<br>**Remedy**: Provide the profile and required options in the URI. |
| org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://\<namenode\>:8020/\<path-to-file\> | **Cause**: The HDFS file you specified in \<path-to-file\> does not exist. <br>**Remedy**: Provide the path to an existing HDFS file. |
| NoSuchObjectException(message:\<schema\>.\<hivetable\> table not found) | **Cause**: The Hive table you specified with \<schema\>.\<hivetable\> does not exist. <br>**Remedy**: Provide the name of an existing Hive table. |
| Failed to connect to \<segment-host\> port 5888: Connection refused (libchurl.c:944) (\<segment-id\> slice\<N\> \<segment-host\>:40000 pid=\<process-id\>)<br> ... |**Cause**: PXF is not running on \<segment-host\>.<br>**Remedy**: Restart PXF on \<segment-host\>. |
| Protocol "pxf" does not exist | **Cause**: The `pxf` extension was not registered.<br>**Solution**: Create (enable) the PXF extension for the database as described in the PXF [Enable Procedure](using_pxf.html#enable-pxf-ext).|
| Invalid URI pxf://\<path-to-data\>: missing options section | **Cause**: The `LOCATION` URI does not include the profile or other required options.<br>**Solution**: Provide the profile and required options in the URI. |
| org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://\<namenode\>:8020/\<path-to-file\> | **Cause**: The HDFS file you specified in \<path-to-file\> does not exist. <br>**Solution**: Provide the path to an existing HDFS file. |
| NoSuchObjectException(message:\<schema\>.\<hivetable\> table not found) | **Cause**: The Hive table you specified with \<schema\>.\<hivetable\> does not exist. <br>**Solution**: Provide the name of an existing Hive table. |
| Failed to connect to \<segment-host\> port 5888: Connection refused (libchurl.c:944) (\<segment-id\> slice\<N\> \<segment-host\>:40000 pid=\<process-id\>)<br> ... |**Cause**: PXF is not running on \<segment-host\>.<br>**Solution**: Restart PXF on \<segment-host\>. |
| *ERROR*: failed to acquire resources on one or more segments<br>*DETAIL*: could not connect to server: Connection refused<br>&nbsp;&nbsp;&nbsp;&nbsp;Is the server running on host "\<segment-host\>" and accepting<br>&nbsp;&nbsp;&nbsp;&nbsp;TCP/IP connections on port 40000?(seg\<N\> \<segment-host\>:40000) | **Cause**: The Greenplum Database segment host \<segment-host\> is down. |
| org.apache.hadoop.security.AccessControlException: Permission denied: user=<user>, access=READ, inode=&quot;<filepath>&quot;:<user>:<group>:-rw------- | **Cause**: The Greenplum Database user that executed the PXF operation does not have permission to access the underlying Hadoop service (HDFS or Hive). See [About PXF User Impersonation](pxfuserimpers.html). |
......@@ -40,38 +40,43 @@ Enabling more verbose logging may aid PXF troubleshooting efforts. PXF provides
### <a id="pxfsvclogmsg"></a>Service-Level Logging
PXF utilizes `log4j` for service-level logging. PXF-service-related log messages are captured in a log file specified by PXF's `log4j` properties file, `$PXF_CONF/conf/pxf-log4j.properties`. The default PXF logging configuration will write `INFO` and more severe level logs to `$PXF_CONF/logs/pxf-service.log`. You can configure the logging level and log file location.
PXF utilizes `log4j` for service-level logging. PXF-service-related log messages are captured in a log file specified by PXF's `log4j` properties file, `$PXF_CONF/conf/pxf-log4j.properties`. The default PXF logging configuration will write `INFO` and more severe level logs to `$PXF_CONF/logs/pxf-service.log`. You can configure the logging level and the log file location.
PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`:
PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging and examine the output:
``` shell
#log4j.logger.org.greenplum.pxf=DEBUG
```
1. Uncomment the following line in `pxf-log4j.properties`:
Copy the `pxf-log4j.properties` file to each segment host and restart the PXF service on *each* Greenplum Database segment host. For example, if `PXF_CONF=/etc/pxf/usercfg`:
``` shell
#log4j.logger.org.greenplum.pxf=DEBUG
```
``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-log4j.properties :=/ect/pxf/usercfg/conf/pxf-log4j.properties
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```
2. Copy the `pxf-log4j.properties` file to each segment host. For example, if `PXF_CONF=/etc/pxf/usercfg`:
With `DEBUG` level logging now enabled, perform your PXF operations; for example, create and query an external table. (Make note of the time; this will direct you to the relevant log messages in `$PXF_CONF/logs/pxf-service.log`.)
``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-log4j.properties =:/etc/pxf/usercfg/conf/pxf-log4j.properties
```
``` shell
$ date
Wed Oct 4 09:30:06 MDT 2017
$ psql -d <dbname>
```
3. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf).
``` sql
dbname=> CREATE EXTERNAL TABLE hdfstest(id int, newid int)
LOCATION ('pxf://data/dir/hdfsfile?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
dbname=> SELECT * FROM hdfstest;
<select output>
```
4. With `DEBUG` level logging now enabled, you can perform your PXF operations. Be sure to make note of the time; this will direct you to the relevant log messages in `$PXF_CONF/logs/pxf-service.log`.
Examine/collect the log messages from `pxf-service.log`.
``` shell
$ date
Wed Oct 4 09:30:06 MDT 2017
$ psql -d <dbname>
```
4. Create and query an external table. For example:
``` sql
dbname=> CREATE EXTERNAL TABLE hdfstest(id int, newid int)
LOCATION ('pxf://data/dir/hdfsfile?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
dbname=> SELECT * FROM hdfstest;
<select output>
```
5. Finally, examine/collect the log messages from `pxf-service.log`.
**Note**: `DEBUG` logging is quite verbose and has a performance impact. Remember to turn off PXF service `DEBUG` logging after you have collected the desired information.
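The uncomment-and-verify step can be scripted. The sketch below works on a throwaway copy under `/tmp` so it is safe to run anywhere; the real file lives at `$PXF_CONF/conf/pxf-log4j.properties`:

``` shell
# Work on a scratch copy; the real file is $PXF_CONF/conf/pxf-log4j.properties
cat > /tmp/pxf-log4j.properties <<'EOF'
#log4j.logger.org.greenplum.pxf=DEBUG
EOF

# Strip the leading '#' to activate the DEBUG logger line
sed -i 's/^#\(log4j\.logger\.org\.greenplum\.pxf=DEBUG\)/\1/' /tmp/pxf-log4j.properties

# Confirm the line is now uncommented
grep '^log4j\.logger\.org\.greenplum\.pxf=DEBUG' /tmp/pxf-log4j.properties
```

After verifying the edit on the real file, copy it to the segment hosts and restart PXF as described in the procedure above.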
......@@ -125,11 +130,10 @@ Each PXF agent running on a segment host is configured with a default maximum Ja
Perform the following procedure to increase the heap size for the PXF agent running on each segment host in your Greenplum Database cluster.
1. Log in to your Greenplum Database master node and set up the environment:
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
```
2. Recall the location of the PXF user configuration directory ($PXF_CONF). Edit the `$PXF_CONF/conf/pxf-env.sh` file. For example:
......@@ -147,14 +151,10 @@ Perform the following procedure to increase the heap size for the PXF agent runn
4. Copy the updated `pxf-env.sh` file to each Greenplum Database segment host. For example, if `seghostfile` contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster and `PXF_CONF=/etc/pxf/usercfg`:
``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-env.sh =:/ect/pxf/usercfg/conf/pxf-env.sh
gpadmin@gpmaster$ gpscp -v -f seghostfile $PXF_CONF/conf/pxf-env.sh =:/etc/pxf/usercfg/conf/pxf-env.sh
```
5. Restart PXF on each Greenplum Database segment host. For example:
``` shell
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```
5. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf).
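The `pxf-env.sh` edit in step 3 amounts to uncommenting and raising the `PXF_JVM_OPTS` setting. The sketch below operates on a scratch copy; the commented default line shown is an assumption, so check your own `$PXF_CONF/conf/pxf-env.sh` for the exact text:

``` shell
# Scratch copy standing in for $PXF_CONF/conf/pxf-env.sh;
# the default line below is an assumption -- verify it against your file
cat > /tmp/pxf-env.sh <<'EOF'
# export PXF_JVM_OPTS="-Xmx2g -Xms1g"
EOF

# Raise the maximum heap size to 3GB and activate the setting
sed -i 's|^# export PXF_JVM_OPTS=.*|export PXF_JVM_OPTS="-Xmx3g -Xms1g"|' /tmp/pxf-env.sh

grep '^export PXF_JVM_OPTS' /tmp/pxf-env.sh
```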
### <a id="pxf-threadcfg"></a>Another Option for Resource-Constrained PXF Segment Hosts
......@@ -164,11 +164,10 @@ The Tomcat default maximum number of threads is 300. Decrease the maximum number
Perform the following procedure to decrease the maximum number of Tomcat threads for the PXF agent running on each segment host in your Greenplum Database deployment.
1. Log in to your Greenplum Database master node and set up the environment:
1. Log in to your Greenplum Database master node:
``` shell
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
```
2. Edit the `$GPHOME/pxf/pxf-service/conf/server.xml` file. For example:
......@@ -192,11 +191,7 @@ Perform the following procedure to decrease the maximum number of Tomcat threads
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/pxf-service/conf/server.xml =:/usr/local/greenplum-db/pxf/pxf-service/conf/server.xml
```
5. Restart PXF on each Greenplum Database segment host. For example:
``` shell
$ gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```
5. Restart PXF on each Greenplum Database segment host as described in [Restarting PXF](cfginitstart_pxf.html#restart_pxf).
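The `server.xml` edit in step 2 amounts to lowering the `maxThreads` attribute on the Tomcat `Connector` element. A minimal sketch follows; the port and the other attribute values are illustrative, so keep whatever your existing `Connector` element specifies and change only `maxThreads`:

``` xml
<!-- Fragment of server.xml; only the maxThreads value is the point here -->
<Connector port="5888" protocol="HTTP/1.1"
           maxThreads="100"
           connectionTimeout="20000" />
```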
## <a id="pxf-timezonecfg"></a>Addressing PXF JDBC Connector Time Zone Errors
......
......@@ -21,28 +21,15 @@ The PXF upgrade procedure has two parts. You perform one procedure before, and o
Perform this procedure before you upgrade to a new version of Greenplum Database:
1. Log in to the Greenplum Database master node and set up the environment. For example:
1. Log in to the Greenplum Database master node. For example:
``` shell
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
```
2. Create a text file that lists your Greenplum Database segment hosts, one host name per line. For example, a file named `seghostfile` may include:
3. Stop PXF on each segment host as described in [Stopping PXF](cfginitstart_pxf.html#stop_pxf).
``` pre
seghost1
seghost2
seghost3
```
3. Run the `pxf stop` command to stop PXF on each segment host. For example:
``` shell
$ gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf stop"
```
4. **If you are upgrading from Greenplum Database version 5.13 or earlier**:
4. **If you are upgrading from Greenplum Database version 5.14 or earlier**:
1. Back up the *PXF.from* configuration files found in the `$GPHOME/pxf/conf/` directory. These files should be the same on all segment hosts, so you need only copy from one of the hosts. For example:
......@@ -60,18 +47,17 @@ Perform this procedure before you upgrade to a new version of Greenplum Database
After you upgrade to the new version of Greenplum Database, perform the following procedure to upgrade and configure the *PXF.to* software:
1. Log in to the Greenplum Database master node and set up the environment. For example:
1. Log in to the Greenplum Database master node. For example:
``` shell
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
```
2. Initialize PXF on each segment host as described in [Initializing PXF](cfginitstart_pxf.html#init_pxf).
3. PXF user impersonation is on by default in Greenplum Database version 5.5.0 and later. If you are upgrading from an older *PXF.from* version, you must configure user impersonation for the underlying Hadoop services. Refer to [Configuring User Impersonation and Proxying](pxfuserimpers.html) for instructions, including the configuration procedure to turn off PXF user impersonation.
4. **If you are upgrading from Greenplum Database version 5.13 or earlier**:
4. **If you are upgrading from Greenplum Database version 5.14 or earlier**:
1. If you updated the `pxf-env.sh` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-env.sh`, and copy the updated `pxf-env.sh` to all segment hosts. For example, if `seghostfile` contains a list, one-host-per-line, of the segment hosts in your Greenplum Database cluster and `PXF_CONF=/etc/pxf/usercfg`:
......@@ -86,8 +72,8 @@ After you upgrade to the new version of Greenplum Database, perform the followin
3. If you updated the `pxf-log4j.properties` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-log4j.properties` and copy the updated file to all segment hosts. Refer to Step 4a above for a similar `gpscp` command.
4. If you updated the `pxf-public.classpath` configuration file in your *PXF.from* installation, copy every JAR referenced in the file to `$PXF_CONF/lib` on each segment host. Refer to Step 4a above for a similar `gpscp` command.
5. If you added additional JAR files to your *PXF.from* installation, copy them to `$PXF_CONF/lib` on all segment hosts.
5. Starting in Greenplum Database version 5.14, PXF requires that the Hadoop configuration files reside in the `$PXF_CONF/servers/default` directory. If you configured PXF Hadoop connectors in your *PXF.from* installation, copy the Hadoop configuration files in `/etc/<hadoop_service>/conf` to `$PXF_CONF/servers/default` on the Greenplum Database master and on all segment hosts.
5. Starting in Greenplum Database version 5.14, the default Kerberos keytab file location for PXF is `$PXF_CONF/keytabs`. If you previously configured PXF for secure HDFS and the PXF keytab file is located in a *PXF.from* installation directory (for example, `$GPHOME/pxf/conf`), consider relocating the keytab file to `$PXF_CONF/keytabs`. Alternatively, update the `PXF_KEYTAB` property setting in the `$PXF_CONF/conf/pxf-env.sh` file to reference your keytab file. Be sure to propagate any updated files to each segment host.
5. Starting in Greenplum Database version 5.15, PXF requires that the Hadoop configuration files reside in the `$PXF_CONF/servers/default` directory. If you configured PXF Hadoop connectors in your *PXF.from* installation, copy the Hadoop configuration files in `/etc/<hadoop_service>/conf` to `$PXF_CONF/servers/default` on the Greenplum Database master and on all segment hosts.
5. Starting in Greenplum Database version 5.15, the default Kerberos keytab file location for PXF is `$PXF_CONF/keytabs`. If you previously configured PXF for secure HDFS and the PXF keytab file is located in a *PXF.from* installation directory (for example, `$GPHOME/pxf/conf`), consider relocating the keytab file to `$PXF_CONF/keytabs`. Alternatively, update the `PXF_KEYTAB` property setting in the `$PXF_CONF/conf/pxf-env.sh` file to reference your keytab file. Be sure to propagate any updated files to each segment host.
6. Start PXF on each segment host as described in [Starting PXF](cfginitstart_pxf.html#start_pxf).
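Relocating a keytab file as described in the upgrade steps above can look like the following sketch; all of the paths and the keytab file name here are placeholders standing in for your *PXF.from* configuration directory and `$PXF_CONF/keytabs`:

``` shell
# Placeholder paths standing in for $GPHOME/pxf/conf and $PXF_CONF/keytabs
PXF_CONF=/tmp/pxf_usercfg
OLD_CONF=/tmp/old_pxf_conf
mkdir -p "$PXF_CONF/keytabs" "$OLD_CONF"
touch "$OLD_CONF/pxf.service.keytab"   # stand-in for your real keytab

# Move the keytab into the new default location
mv "$OLD_CONF/pxf.service.keytab" "$PXF_CONF/keytabs/"

ls "$PXF_CONF/keytabs"
```

If you relocate the keytab rather than updating `PXF_KEYTAB` in `pxf-env.sh`, remember to perform the same move on every segment host.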