Commit ac917ce0, authored by Ivan Leskin, committed by David Yozie

Extra docs for the pushdown feature (#5193)

* Extra docs for gp_external_enable_filter_pushdown

Add extra documentation for 'gp_external_enable_filter_pushdown' and the pushdown feature in PXF extension.

* Minor doc text fixes

Minor documentation text fixes, proposed by @dyozie.

* Clarify the pushdown support by PXF

Add the following information:
* List the PXF connectors that support pushdown;
* State that GPDB PXF extension supports pushdown;
* Add a list of conditions that need to be fulfilled for the pushdown feature to work when PXF protocol is used.

* Correct the list of PXF connectors with pushdown

* State that Hive and HBase PXF connectors support filter predicate pushdown;
* Remove references to JDBC and Apache Ignite PXF connectors, as proposed by @dyozie (these are not officially supported by Greenplum).
Parent db95eb96
......@@ -58,18 +58,30 @@
<p>Greenplum Database provides readable and writable external tables:</p>
<ul>
<li id="du210036">Readable external tables for data loading. Readable external tables
support basic extraction, transformation, and loading (ETL) tasks common in data
warehousing. Greenplum Database segment instances read external table data in parallel
to optimize large load operations. You cannot modify readable external tables. </li>
<li id="du220433">Writable external tables for data unloading. Writable external tables support:<ul>
support:
<ul>
<li id="du210101">Basic extraction, transformation, and loading (ETL) tasks
common in data warehousing;</li>
<li id="du210102">Reading external table data in parallel
from multiple Greenplum Database segment instances to optimize large load operations;</li>
<li id="du210103">Filter pushdown (if a query contains a WHERE clause,
it may be passed to the external data source). See <xref href="../../../ref_guide/config_params/guc-list.xml#gp_external_enable_filter_pushdown"></xref>.
Note that this feature is currently supported only by the <codeph>pxf</codeph> protocol (see <xref href="g-pxf-protocol.xml"></xref>).</li>
</ul>
<p>Readable external tables allow only <codeph>SELECT</codeph> operations.</p>
</li>
<li id="du220433">Writable external tables for data unloading. Writable external tables support:
<ul>
<li id="du220434">Selecting data from database tables to insert into the writable
external table.</li>
external table;</li>
<li id="du220435">Sending data to an application as a stream of data. For example,
unload data from Greenplum Database and send it to an application that connects to
another database or ETL tool to load the data elsewhere. </li>
another database or ETL tool to load the data elsewhere;</li>
<li id="du210321">Receiving output from Greenplum parallel MapReduce
calculations.</li>
</ul><p>Writable external tables allow only <codeph>INSERT</codeph> operations.</p></li>
</ul>
<p>Writable external tables allow only <codeph>INSERT</codeph> operations.</p>
</li>
</ul>
<p>External tables can be file-based or web-based. External tables using the
<codeph>file://</codeph> protocol are read-only tables.</p>
......
......@@ -5,7 +5,7 @@
<shortdesc>Data managed by your organization may already reside in external sources. The Greenplum Platform Extension Framework (PXF) provides access to this external data via built-in connectors that map an external data source to a Greenplum Database table definition.</shortdesc>
<body>
<p>PXF is installed with HDFS, Hive, and HBase connectors. These connectors enable you to read external HDFS file system and Hive and HBase table data stored in text, Avro, JSON, RCFile, Parquet, SequenceFile, and ORC formats.</p>
<note>PXF does not currently support filter predicate pushdown in the HDFS, Hive, and HBase connectors.</note>
<note>PXF does not currently support filter predicate pushdown for the HDFS connector.</note>
<p>The Greenplum Platform Extension Framework includes a protocol C library and a Java service. After you configure and initialize PXF, you start a single PXF JVM process on each Greenplum Database segment host. This long-running process concurrently serves multiple query requests.</p>
<p>For detailed information about the architecture of and using PXF, refer to the <xref href="../../pxf/overview_pxf.html" type="topic" format="html">Greenplum Platform Extension Framework (PXF)</xref> documentation.</p>
</body>
......
......@@ -6,8 +6,6 @@ Apache HBase is a distributed, versioned, non-relational database on Hadoop.
The PXF HBase connector reads data stored in an HBase table. This section describes how to use the PXF HBase connector.
**Note**: PXF does not yet support predicate pushdown to HBase.
## <a id="hbase_prereq"></a>Prerequisites
Before working with HBase table data, ensure that you have:
......
......@@ -169,3 +169,35 @@ Greenplum Database passes the parameters in the `LOCATION` string as headers to
| \<formatting-properties\> | Formatting properties supported by the profile; for example, the `formatter` or `delimiter`.   |
**Note:** When you create a PXF external table, you cannot use the `HEADER` option in your `FORMAT` specification.
## <a id="filter-pushdown"></a>Filter pushdown
PXF supports filter pushdown. When executing a `SELECT` query, the constraints from its `WHERE` clause can be extracted and passed to an external data source. This can speed up such queries and also reduce the amount of data transferred to Greenplum Database.
To enable this feature, set the GUC parameter `gp_external_enable_filter_pushdown` to `on` (the default value is `off`). The setting applies to all external data accessors in Greenplum Database, not only to PXF.
Not all external data sources support this feature. If a query accesses a source that does not support it, the query is executed without pushdown. Also, not all data types and operators are supported.
PXF accesses data sources through connectors, and not every connector supports the pushdown feature. The following connectors support pushdown:
* HBase
* Hive
Supported data types:
* `INT`, array of `INT`;
* `FLOAT`;
* `NUMERIC`;
* `BOOL`;
* `CHAR`, `TEXT`, array of `TEXT`;
* `DATE`, `TIMESTAMP`.
Supported operators:
* `<`, `<=`, `>=`, `>`;
* `<>`, `=`;
* `IN`;
* `LIKE` (only for `TEXT` fields).
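
The two lists above can be read as a simple eligibility rule for a single `column <operator> constant` predicate. The following Python sketch only restates those lists. It is not PXF code, the function name `can_push_down` and the `INT[]` / `TEXT[]` spellings for array types are assumptions made for illustration, and it does not model which operators are valid for array columns.

```python
# Illustrative model only: these sets restate the lists above; this is not PXF code.
SUPPORTED_TYPES = {
    "INT", "INT[]", "FLOAT", "NUMERIC", "BOOL",
    "CHAR", "TEXT", "TEXT[]", "DATE", "TIMESTAMP",
}
SUPPORTED_OPERATORS = {"<", "<=", ">=", ">", "<>", "=", "IN", "LIKE"}


def can_push_down(column_type: str, operator: str) -> bool:
    """Return True if the column type and operator both appear in the lists above."""
    column_type = column_type.upper()
    operator = operator.upper()
    if column_type not in SUPPORTED_TYPES or operator not in SUPPORTED_OPERATORS:
        return False
    # LIKE is listed as supported only for TEXT fields.
    if operator == "LIKE" and column_type != "TEXT":
        return False
    return True
```

For instance, under this model a `TEXT` column compared with `LIKE` qualifies, while a `CHAR` column compared with `LIKE` does not.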
In summary, the pushdown feature works when all of the following are true:
* The GPDB protocol used to access external data supports pushdown (PXF does), and
* The PXF connector supports pushdown, and
* The external data source (for example, a database) supports pushdown.
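
As a rough illustration, the conditions listed above, together with the `gp_external_enable_filter_pushdown` parameter described earlier, can be combined into one check. This is a minimal Python sketch under stated assumptions: the class `PushdownContext`, its field names, and the function `pushdown_applies` are hypothetical and do not correspond to any Greenplum or PXF API.

```python
from dataclasses import dataclass

# Taken from the text above: only the pxf protocol, and only the
# Hive and HBase connectors, are described as supporting pushdown.
PROTOCOLS_WITH_PUSHDOWN = {"pxf"}
CONNECTORS_WITH_PUSHDOWN = {"hive", "hbase"}


@dataclass
class PushdownContext:
    # All field names are hypothetical; they mirror the conditions above.
    guc_enabled: bool              # gp_external_enable_filter_pushdown is on
    protocol: str                  # for example "pxf" or "file"
    connector: str                 # for example "Hive", "HBase", "HDFS"
    source_supports_filters: bool  # the external data source accepts filters


def pushdown_applies(ctx: PushdownContext) -> bool:
    """Return True only if every condition for pushdown holds."""
    return (
        ctx.guc_enabled
        and ctx.protocol.lower() in PROTOCOLS_WITH_PUSHDOWN
        and ctx.connector.lower() in CONNECTORS_WITH_PUSHDOWN
        and ctx.source_supports_filters
    )
```

If any single condition fails (for example, the HDFS connector is used, or the parameter is left at `off`), the query still runs, but the `WHERE` clause is evaluated in Greenplum Database instead of at the source.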