Commit 6b8c0fc3 · Author: Lisa Owen · Committer: David Yozie

docs - clarify pxf filter partitioning support for hive (#7580)

* docs - clarify pxf filter partitioning support for hive

* clarify the hadoop cfg update content

* remove cluster start

* edits requested by david

* remove disable statement per shivram
Parent 2310b8f7
......@@ -70,11 +70,11 @@ Perform the following procedure to configure the desired PXF Hadoop-related conn
## <a id="client-cfg-update"></a>Updating Hadoop Configuration
If you update your Hadoop, Hive, or HBase configuration while the PXF service is running, you must re-sync the PXF configuration to your Greenplum Database cluster and restart PXF on each segment host in the cluster. For example:
If you update your Hadoop, Hive, or HBase configuration while the PXF service is running, you must copy the updated configuration to the `$PXF_CONF/servers/default` directory and re-sync the PXF configuration to your Greenplum Database cluster. For example:
``` shell
gpadmin@gpmaster$ cd $PXF_CONF/servers/default
gpadmin@gpmaster$ scp hiveuser@hivehost:/etc/hive/conf/hive-site.xml .
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster stop
gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster start
```
......@@ -34,25 +34,26 @@ SET gp_external_enable_filter_pushdown TO 'on';
PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown:
- Hive Connector
- Hive Connector, all profiles
- HBase Connector
- JDBC Connector
PXF filter pushdown can be used with these data types:
PXF filter pushdown can be used with these data types (connector-specific):
- `INT`
- `INT2`, `INT4`, `INT8`
- `CHAR`, `TEXT`
- `FLOAT`
- `NUMERIC`
- `BOOL`
- `CHAR`, `TEXT`
- `DATE`, `TIMESTAMP` (JDBC connector only)
PXF filter pushdown can be used with these operators:
You can use PXF filter pushdown with these operators:
- `<`, `<=`, `>=`, `>`
- `<>`, `=`
- `IN` operator on arrays of `INT` and `TEXT`
- `LIKE` (only for `TEXT` fields) (JDBC connector only)
- `AND`, `OR`
- `IN` operator on arrays of `INT` and `TEXT` (JDBC connector only)
- `LIKE` (`TEXT` fields, JDBC connector only)
To summarize, all of the following criteria must be met for filter pushdown to occur:
......@@ -61,3 +62,5 @@ To summarize, all of the following criteria must be met for filter pushdown to o
* The external data source that you are accessing must support pushdown. For example, HBase and Hive support pushdown.
* For queries on external tables that you create with the `pxf` protocol, the underlying PXF connector must also support filter pushdown. For example, the PXF Hive, HBase, and JDBC connectors support pushdown.
- Refer to Hive [Partition Filter Pushdown](hive_pxf.html#partitionfiltering) for more information about Hive support for this feature.
......@@ -29,7 +29,7 @@ The PXF Hive connector reads data stored in a Hive table. This section describes
Before working with Hive table data using PXF, ensure that you have met the PXF Hadoop [Prerequisites](access_hdfs.html#hadoop_prereq).
If you plan to use PXF filter pushdown with Hive, ensure that the `hive-site.xml` configuration parameter `hive.metastore.integral.jdo.pushdown` exists and is set to `true`.
*If you plan to use PXF filter pushdown with Hive integral types*, ensure that the configuration parameter `hive.metastore.integral.jdo.pushdown` exists and is set to `true` in both your Hadoop cluster's `hive-site.xml` **and** `$PXF_CONF/servers/default/hive-site.xml`. Refer to [Updating Hadoop Configuration](client_instcfg.html#client-cfg-update) for more information.
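One way to confirm the setting on the PXF side is to inspect the copied `hive-site.xml` directly. The following is a minimal sketch, not part of PXF: the helper name `has_integral_pushdown` is hypothetical, and it assumes the usual `hive-site.xml` layout in which the `<value>` element sits on the same line as, or the line directly after, the matching `<name>` element.

``` shell
# Hypothetical helper (not a PXF command): succeeds when the given
# hive-site.xml sets hive.metastore.integral.jdo.pushdown to true.
# Assumption: <value> is on the same line as <name> or on the next line.
has_integral_pushdown() {
    grep -F -A 1 '<name>hive.metastore.integral.jdo.pushdown</name>' "$1" 2>/dev/null \
        | grep -q '<value>[[:space:]]*true[[:space:]]*</value>'
}

# Check the copy that PXF reads; a missing file is reported as "not set".
if has_integral_pushdown "$PXF_CONF/servers/default/hive-site.xml"; then
    echo "integral pushdown is set to true in the PXF copy of hive-site.xml"
else
    echo "integral pushdown is not set to true in the PXF copy of hive-site.xml"
fi
```

You can run the same check against the `hive-site.xml` on the Hive host. If the two files disagree, copy the updated file and re-sync as described in Updating Hadoop Configuration.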
## <a id="hive_fileformats"></a>Hive Data Formats
......@@ -661,13 +661,18 @@ In the following example, you will create and populate a Hive table stored in OR
## <a id="partitionfiltering"></a>Partition Filter Pushdown
The PXF Hive connector supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising a Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a query on a PXF external table using a `WHERE` clause that refers to a specific partition column in a partitioned Hive table.
The PXF Hive connector supports Hive partition pruning and the Hive partition directory structure. This enables partition exclusion on selected HDFS files comprising a Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a query on a PXF external table using a `WHERE` clause that refers to a specific partition column in a partitioned Hive table.
To take advantage of PXF partition filtering pushdown, the Hive and PXF partition field names must be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the Greenplum Database side, impacting performance.
The PXF Hive Connector partition filtering support for Hive string and integral types is described below:
- The relational operators `=`, `<`, `<=`, `>`, `>=`, and `<>` are supported on string types.
- The relational operators `=` and `<>` are supported on integral types. To use partition filtering with Hive integral types, you must update the Hive configuration as described in the [Prerequisites](#prereq).
- The logical operators `AND` and `OR` are supported when used with the relational operators mentioned above.
- The `LIKE` string operator is not supported.
<div class="note">The PXF Hive connector filters only on partition columns, not on other table attributes. Additionally, filter pushdown is supported only for those data types and operators identified in <a href="filter_push.html">About Filter Pushdown</a>. Disable filter pushdown when your query includes unsupported operators and data types.</div>
To take advantage of PXF partition filtering pushdown, the Hive and PXF partition field names must be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the Greenplum Database side, impacting performance.
To use PXF filter pushdown with Hive, the `hive-site.xml` configuration parameter `hive.metastore.integral.jdo.pushdown` must be set to `true`.
<div class="note">The PXF Hive connector filters only on partition columns, not on other table attributes. Additionally, filter pushdown is supported only for those data types and operators identified above.</div>
PXF filter pushdown is enabled by default. You configure PXF filter pushdown as described in [About Filter Pushdown](filter_push.html).