Commit 052251a7 authored by David Yozie

Docs: Edit pxf filter pushdown docs (#5219)

* fix problematic xrefs

* Consistency edits for list items

* edit, relocate filter pushdown section

* minor edits to guc description

* remove note about non-support in Hive doc

* Edits from Lisa's review

* Adding note about experimental status of HBase connector pushdown

* Adding note about experimental status of Hive connector pushdown

* Revert "Adding note about experimental status of Hive connector pushdown"

This reverts commit 43dfe51526e19983835f7cbd25d540d3c0dec4ba.

* Revert "Adding note about experimental status of HBase connector pushdown"

This reverts commit 3b143de058c7403c2bc141c11c61bf227c2abf3a.

* restoring HBase, Hive pushdown support

* slight wording change

* adding xref
Parent 673bcf22
......@@ -58,29 +58,28 @@
<p>Greenplum Database provides readable and writable external tables:</p>
<ul>
<li id="du210036">Readable external tables for data loading. Readable external tables
support:
<ul>
<li id="du210101">Basic extraction, transformation, and loading (ETL) tasks
common in data warehousing;</li>
<li id="du210102">Reading external table data in parallel
from multiple Greenplum database segment instances to optimize large load operations;</li>
<li id="du210103">Filter pushdown (if a query contains WHERE clause,
it may be passed to the external data source). See <xref href="../../ref_guide/config_params/guc-list.xml#gp_external_enable_filter_pushdown"></xref>.
Note that this feature is currently supported only by the <codeph>pxf</codeph> protocol (see <xref href="g-pxf-protocol.xml"></xref>).</li>
</ul>
<p>Readable external tables allow only <codeph>SELECT</codeph> operations.</p>
support: <ul>
<li id="du210101">Basic extraction, transformation, and loading (ETL) tasks common in
data warehousing</li>
<li id="du210102">Reading external table data in parallel from multiple Greenplum
database segment instances, to optimize large load operations</li>
<li id="du210103">Filter pushdown. If a query contains a <codeph>WHERE</codeph> clause,
it may be passed to the external data source. See <xref
href="../../ref_guide/config_params/guc-list.xml#gp_external_enable_filter_pushdown"
/>. Note that this feature is currently supported only with the
<codeph>pxf</codeph> protocol (see <xref href="g-pxf-protocol.xml"/>).</li>
</ul><p>Readable external tables allow only <codeph>SELECT</codeph> operations.</p>
</li>
<li id="du220433">Writable external tables for data unloading. Writable external tables support:
<ul>
<li id="du220433">Writable external tables for data unloading. Writable external tables
support: <ul>
<li id="du220434">Selecting data from database tables to insert into the writable
external table;</li>
external table</li>
<li id="du220435">Sending data to an application as a stream of data. For example,
unload data from Greenplum Database and send it to an application that connects to
another database or ETL tool to load the data elsewhere;</li>
another database or ETL tool to load the data elsewhere</li>
<li id="du210321">Receiving output from Greenplum parallel MapReduce
calculations.</li>
</ul>
<p>Writable external tables allow only <codeph>INSERT</codeph> operations.</p>
</ul><p>Writable external tables allow only <codeph>INSERT</codeph> operations.</p>
</li>
</ul>
<p>External tables can be file-based or web-based. External tables using the
......
......@@ -3783,7 +3783,9 @@
<title>gp_external_enable_filter_pushdown</title>
<body>
<p>Enable filter pushdown when reading data from external tables. If pushdown fails, a query
will be executed without it, and GPDB will apply the same constraints to the result.</p>
is executed without pushing filters to the external data source (instead, Greenplum Database
applies the same constraints to the result). See <xref
href="../../admin_guide/external/g-external-tables.xml#topic3"/> for more information.</p>
<table id="enable_filter_pushdown_table">
<tgroup cols="3">
<colspec colnum="1" colname="col1" colwidth="1*"/>
......
......@@ -349,8 +349,6 @@ When choosing an ORC-supporting profile, consider the following:
- Does not support column projection.
- Does not support complex types or the timestamp data type.
**Note**: The `HiveORC` and `HiveVectorizedORC` profiles do not currently support predicate pushdown.
### <a id="hive_hiveorc_example" class="no-quick-link"></a>Example: Using the HiveORC Profile
In the following example, you will create a Hive table stored in ORC format and use the `HiveORC` profile to query this Hive table.
......
......@@ -88,6 +88,37 @@ To write data to an external data store with PXF, you create an external table w
GRANT INSERT ON PROTOCOL pxf TO bill;
```
## <a id="filter-pushdown"></a>Configuring Filter Pushdown
PXF supports filter pushdown. When filter pushdown is enabled, the constraints from the `WHERE` clause of a `SELECT` query can be extracted and passed to the external data source for filtering. This process can improve query performance, and can also reduce the amount of data that is transferred to Greenplum Database.
You enable or disable filter pushdown for all external table protocols, including `pxf`, by setting the `gp_external_enable_filter_pushdown` configuration parameter. The default value is `off`; set it to `on` to enable filter pushdown.
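As a minimal sketch, assuming the parameter can be changed at the session level in your release, you could enable filter pushdown for the current session and confirm the setting as follows:

```sql
-- Enable filter pushdown for the current session only
-- (assumption: this parameter is session-settable in your release).
SET gp_external_enable_filter_pushdown = on;

-- Display the current value of the parameter.
SHOW gp_external_enable_filter_pushdown;
```

A session-level change leaves the default untouched for other sessions, which can be convenient when comparing query behavior with pushdown on and off.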
**Note:** Some data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter pushdown for the query constraints, the query is instead executed without filter pushdown (the data is first transferred to Greenplum Database and then filtered).
PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following connectors support filter pushdown:
* HBase
* Hive
PXF filter pushdown can be used with these data types:
* `INT`, array of `INT`
* `FLOAT`
* `NUMERIC`
* `BOOL`
* `CHAR`, `TEXT`, array of `TEXT`
* `DATE`, `TIMESTAMP`
PXF filter pushdown can be used with these operators:
* `<`, `<=`, `>=`, `>`
* `<>`, `=`
* `IN`
* `LIKE` (only for `TEXT` fields)
To summarize, all of the following criteria must be met for filter pushdown to occur:
* The Greenplum Database protocol that is used to access external data must support filter pushdown. The `pxf` external table protocol supports pushdown.
* The external data source that is being accessed must support pushdown. For example, both HBase and Hive support pushdown.
* For queries that use the `pxf` protocol, the underlying PXF connector must support filter pushdown. For example, the HBase and Hive connectors support pushdown.
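The following sketch shows queries that would appear to meet these criteria. The external table name `sales_hive`, its columns, and the Hive table `default.sales_info` are hypothetical and used only for illustration. The `LOCATION` and `FORMAT` clauses assume the `Hive` profile syntax of your PXF release.

```sql
-- Hypothetical readable external table that uses the pxf protocol
-- and the Hive connector (table and column names are illustrative).
CREATE EXTERNAL TABLE sales_hive (location text, month text, number_of_orders int, total_sales float8)
  LOCATION ('pxf://default.sales_info?PROFILE=Hive')
  FORMAT 'custom' (formatter='pxfwritable_import');

-- Filter pushdown is off by default; turn it on for this session.
SET gp_external_enable_filter_pushdown = on;

-- Candidate for pushdown: INT column compared with the >= operator.
SELECT location, total_sales FROM sales_hive WHERE number_of_orders >= 100;

-- Candidate for pushdown: TEXT column compared with the LIKE operator.
SELECT location, total_sales FROM sales_hive WHERE month LIKE 'J%';
```

Both queries return the same rows whether or not the filter is pushed down. When pushdown does not occur, Greenplum Database applies the constraint after the data is transferred.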
## <a id="built-inprofiles"></a> PXF Profiles
PXF is installed with HDFS, Hive, and HBase connectors that provide a number of built-in profiles. These profiles simplify and unify access to external data sources of varied formats. You provide the profile name when you specify the `pxf` protocol on a `CREATE EXTERNAL TABLE` command to create a Greenplum Database external table referencing an external data store.
......@@ -171,33 +202,3 @@ Greenplum Database passes the parameters in the `LOCATION` string as headers to
**Note:** When you create a PXF external table, you cannot use the `HEADER` option in your `FORMAT` specification.
## <a id="filter-pushdown"></a>Filter pushdown
PXF supports filter pushdown. When executing a `SELECT` query, the constraints from its `WHERE` clause can be extracted and passed to an external data source. This can speed up such queries and also reduce the amount of data transferred to Greenplum Database.
To enable this feature (not only for PXF, but for all external data accessors in Greenplum database), set the GUC parameter `gp_external_enable_filter_pushdown` to `on` (default value is `off`).
Not all external data sources support this feature. If such a source is accessed, the query will be executed without pushdown. Also, not all data types and operators are supported.
The PXF accesses data sources using different connectors, and the connector may not support the pushdown feature. The following connectors do support pushdown:
* HBase
* Hive
Supported data types:
* `INT`, array of `INT`;
* `FLOAT`;
* `NUMERIC`;
* `BOOL`;
* `CHAR`, `TEXT`, array of `TEXT`;
* `DATE`, `TIMESTAMP`.
Supported operators:
* `<`, `<=`, `>=`, `>`;
* `<>`, `=`;
* `IN`;
* `LIKE` (only for `TEXT` fields).
So, the pushdown feature works when:
* The GPDB protocol to access external data supports pushdown (PXF does), and
* The PXF connector supports pushdown, and
* The external data source (database, ...) supports pushdown.