Unverified commit 4f3d753a authored by David Yozie, committed by GitHub

Docs: Adding info for pxf user impersonation feature (#4509)

* Docs: Adding first take on PXF user impersonation feature

* typo fix from Shivram

* adding user impersonation section to install overview. Also renaming to reflect that this is a procedure that should be followed

* more feedback from Alex

* remove HBase from troubleshooting description

* a few more fixes from Alex
Parent c64080b4
@@ -14,6 +14,9 @@
<li class="has_submenu">
<a href="/docs/550/pxf/instcfg_pxf.html" format="markdown">Installing and Configuring PXF</a>
<ul>
<li>
<a href="/docs/550/pxf/pxfuserimpers.html">Configuring User Impersonation and Proxying</a>
</li>
<li>
<a href="/docs/550/pxf/client_instcfg.html" format="markdown">Installing and Configuring Hadoop Clients for PXF</a>
</li>
......
@@ -2,6 +2,10 @@
title: Installing and Configuring PXF
---
PXF accesses Hadoop services on behalf of Greenplum Database end users. By default, PXF tries to access data source services (HDFS, Hive, HBase) using the identity of the Greenplum Database user account that logs into Greenplum Database. To support this functionality, you must configure proxy settings for Hadoop, as well as for Hive and HBase if you intend to use those PXF connectors. Follow the procedures in:
- **[Configuring User Impersonation and Proxying](pxfuserimpers.html)**
The Greenplum Platform Extension Framework (PXF) provides connectors to Hadoop, Hive, and HBase data stores. To use these PXF connectors, you must install Hadoop, Hive, and HBase clients on each Greenplum Database segment host as described in this one-time installation and configuration procedure:
- **[Installing and Configuring Hadoop Clients for PXF](client_instcfg.html)**
......
---
title: Configuring User Impersonation and Proxying
---
PXF accesses Hadoop services on behalf of Greenplum Database end users. By default, PXF tries to access data source services (HDFS, Hive, HBase) using the identity of the Greenplum Database user account that logs into Greenplum Database and performs an operation using a PXF connector profile. Keep in mind that PXF uses only the _login_ identity of the user when accessing Hadoop services. For example, if a user logs into Greenplum Database as the user `jane` and then executes `SET ROLE` or `SET SESSION AUTHORIZATION` to assume a different user identity, all PXF requests still use the identity `jane` to access Hadoop services.
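A minimal sketch of this behavior, assuming a hypothetical database `testdb`, role `analyst`, and PXF external table `pxf_hdfs_sales`:
``` shell
# Illustration only: the database, role, and external table names are placeholders.
$ psql -d testdb -U jane -c "SET ROLE analyst; SELECT count(*) FROM pxf_hdfs_sales;"
# PXF still contacts HDFS as "jane" (the login identity), not as "analyst".
```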
With the default PXF configuration, you must explicitly configure each Hadoop data source (HDFS, Hive, HBase) to allow the PXF process owner (usually `gpadmin`) to act as a proxy for impersonating users or groups. See [Configuring Hadoop Proxying](#hadoop), [Hive User Impersonation](#hive), and [HBase User Impersonation](#hbase).
As an alternative, you can disable PXF user impersonation. With user impersonation disabled, PXF executes all Hadoop service requests as the PXF process owner (usually `gpadmin`). This matches the behavior of earlier PXF releases, but it provides no way to control access to Hadoop services on a per-Greenplum-Database-user basis. It also requires that the `gpadmin` user have access to all HDFS files and directories, and to all Hive and HBase tables, that are accessed via PXF external tables. See [Configure PXF User Impersonation](#pxf_cfg_proc) for information about disabling user impersonation.
## <a id="pxf_cfg_proc"></a>Configure PXF User Impersonation
Perform the following procedure to turn PXF user impersonation on or off in your Greenplum Database cluster. User impersonation is enabled by default.
1. Log in to your Greenplum Database master node as the administrative user and set up the environment:
``` shell
$ ssh gpadmin@<gpmaster>
gpadmin@gpmaster$ . /usr/local/greenplum-db/greenplum_path.sh
```
2. Open the `$GPHOME/pxf/conf/pxf-env.sh` file in a text editor. For example:
``` shell
gpadmin@gpmaster$ vi $GPHOME/pxf/conf/pxf-env.sh
```
3. Locate the `PXF_USER_IMPERSONATION` setting in the `pxf-env.sh` file. Set the value to `true` to turn PXF user impersonation on, or `false` to turn it off. For example:
``` shell
PXF_USER_IMPERSONATION="true"
```
4. Copy the updated `pxf-env.sh` file to each Greenplum Database segment host. For example, if `seghostfile` contains a list of the segment hosts in your Greenplum Database cluster, one host per line:
``` shell
gpadmin@gpmaster$ gpscp -v -f seghostfile $GPHOME/pxf/conf/pxf-env.sh =:/usr/local/greenplum-db/pxf/conf/pxf-env.sh
```
5. Restart PXF on each Greenplum Database segment host to apply the new setting. For example:
``` shell
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart"
```
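After restarting, you can optionally confirm that the updated setting reached every segment host. One way to check, reusing the same `seghostfile`:
``` shell
# Optional check: display the impersonation setting on each segment host
gpadmin@gpmaster$ gpssh -e -v -f seghostfile "grep PXF_USER_IMPERSONATION /usr/local/greenplum-db/pxf/conf/pxf-env.sh"
```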
## <a id="hadoop"></a>Configure Hadoop Proxying
When PXF user impersonation is enabled (the default), you must configure the Hadoop `core-site.xml` configuration file to permit user impersonation for PXF. Follow these steps:
1. Open the `core-site.xml` configuration file using a text editor, or use Ambari to add or edit the property values described in this procedure.
2. Set the property `hadoop.proxyuser.<name>.hosts` to specify the list of PXF host names from which proxy requests are permitted. Substitute the name of the PXF process owner (generally `gpadmin`) for `<name>`, and provide multiple host names as a comma-separated list. For example:
``` xml
<property>
<name>hadoop.proxyuser.gpadmin.hosts</name>
<value>pxfhost1,pxfhost2,pxfhost3</value>
</property>
```
3. Set the property `hadoop.proxyuser.<name>.groups` to specify the list of HDFS groups that PXF can impersonate. You should limit this list to only those groups that require access to HDFS data from PXF. For example:
``` xml
<property>
<name>hadoop.proxyuser.gpadmin.groups</name>
<value>group1,group2</value>
</property>
```
4. After changing `core-site.xml`, restart Hadoop for your changes to take effect.
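Depending on how your Hadoop cluster is managed, you may be able to apply proxy-user changes without a full restart. The following is a sketch, assuming command-line access as the HDFS superuser; it is not a substitute for your distribution's documented procedure:
``` shell
# Ask the NameNode and ResourceManager to re-read the proxy-user
# (superuser group) settings without restarting the daemons.
$ hdfs dfsadmin -refreshSuperUserGroupsConfiguration
$ yarn rmadmin -refreshSuperUserGroupsConfiguration
```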
## <a id="hive"></a>Hive User Impersonation
The PXF Hive connector uses the Hive MetaStore to determine the HDFS locations of Hive tables, and then accesses the underlying HDFS files directly. No specific impersonation configuration is required for Hive, because the Hadoop proxy configuration in `core-site.xml` also applies to Hive access.
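Because the Hive connector ultimately reads the files beneath a table's HDFS location, a practical sanity check is to confirm that the impersonated users (or their groups) can read those files. A sketch, using a hypothetical warehouse path:
``` shell
# List ownership and permissions on a Hive table's HDFS location
# (the path below is a placeholder; adjust it for your Hive warehouse layout)
$ hdfs dfs -ls /apps/hive/warehouse/sales_part
```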
## <a id="hbase"></a>HBase User Impersonation
In order for user impersonation to work with HBase, you must enable the `AccessController` coprocessor in the HBase configuration and restart the cluster. See [61.3 Server-side Configuration for Simple User Access Operation](http://hbase.apache.org/book.html#hbase.secure.configuration) in the Apache HBase Reference Guide for the required `hbase-site.xml` configuration settings.
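With the `AccessController` coprocessor enabled, HBase enforces per-user table permissions, so the impersonated Greenplum Database users also need appropriate grants. A sketch using the HBase shell, with a hypothetical user and table name:
``` shell
# Grant read ('R') permission on a hypothetical HBase table to a hypothetical user
$ echo "grant 'jane', 'R', 'orders'" | hbase shell
```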
@@ -32,7 +32,8 @@ The following table describes some errors you may encounter while using PXF:
| org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://\<namenode\>:8020/\<path-to-file\> | **Cause**: The HDFS file you specified in \<path-to-file\> does not exist. <br>**Remedy**: Provide the path to an existing HDFS file. |
| NoSuchObjectException(message:\<schema\>.\<hivetable\> table not found) | **Cause**: The Hive table you specified with \<schema\>.\<hivetable\> does not exist. <br>**Remedy**: Provide the name of an existing Hive table. |
| Failed to connect to \<segment-host\> port 5888: Connection refused (libchurl.c:944) (\<segment-id\> slice\<N\> \<segment-host\>:40000 pid=\<process-id\>)<br> ... |**Cause**: PXF is not running on \<segment-host\>.<br>**Remedy**: Restart PXF on \<segment-host\>. |
| *ERROR*: failed to acquire resources on one or more segments<br>*DETAIL*: could not connect to server: Connection refused<br>&nbsp;&nbsp;&nbsp;&nbsp;Is the server running on host "\<segment-host\>" and accepting<br>&nbsp;&nbsp;&nbsp;&nbsp;TCP/IP connections on port 40000?(seg\<N\> \<segment-host\>:40000) | **Cause**: The Greenplum Database segment host \<segment-host\> is down. |
| org.apache.hadoop.security.AccessControlException: Permission denied: user=\<user\>, access=READ, inode=&quot;\<filepath\>&quot;:\<user\>:\<group\>:-rw------- | **Cause**: The Greenplum Database user that executed the PXF operation does not have permission to access the underlying Hadoop service (HDFS or Hive). See [Configuring User Impersonation and Proxying](pxfuserimpers.html). |
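For the connection-refused errors above, a quick first check is whether the PXF agent is actually running on the affected segment host. A sketch, assuming the default installation path and that your PXF version provides a `status` subcommand:
``` shell
# Check whether the PXF agent is running on a given segment host
$ ssh gpadmin@<segment-host> "/usr/local/greenplum-db/pxf/bin/pxf status"
```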
## <a id="pxf-logging"></a>PXF Logging
Enabling more verbose logging may aid PXF troubleshooting efforts. PXF provides two categories of message logging: service-level and client-level.
......