diff --git a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb
index c5d65c506fd5519f85a825ec875126bc92bdcef1..0915d790789bc1f536ab3cbf3dbe5926fce14f0c 100644
--- a/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb
+++ b/gpdb-doc/book/master_middleman/source/subnavs/pxf-subnav.erb
@@ -10,6 +10,7 @@ Introduction to PXF
  •
diff --git a/gpdb-doc/markdown/pxf/col_project.html.md.erb b/gpdb-doc/markdown/pxf/col_project.html.md.erb
new file mode 100644
index 0000000000000000000000000000000000000000..89cc8122c99a0f84e542bb35b0106f9e16da3c33
--- /dev/null
+++ b/gpdb-doc/markdown/pxf/col_project.html.md.erb
@@ -0,0 +1,22 @@
+---
+title: About Column Projection in PXF
+---
+
+PXF supports column projection, and it is always enabled. With column projection, only the columns required by a `SELECT` query on an external table are returned from the external data source. This can improve query performance and reduce the amount of data transferred to Greenplum Database.
+
+**Note:** Some external data sources do not support column projection. If a query accesses a data source that does not support column projection, the query runs without it, and the data is filtered after it is transferred to Greenplum Database.
+
+Column projection is automatically enabled for the `pxf` external table protocol. PXF accesses external data sources using different connectors, and whether column projection is supported also depends on the specific connector implementation. The following PXF connector and profile combinations support column projection on read operations:
+
+- PXF Hive Connector, `HiveORC` profile
+- PXF JDBC Connector, `Jdbc` profile
+- PXF Hadoop and Object Store Connectors, `hdfs:parquet`, `adl:parquet`, `gs:parquet`, `s3:parquet`, and `wasbs:parquet` profiles
+
+**Note:** PXF may disable column projection in cases where it cannot successfully serialize a query filter; for example, when the `WHERE` clause resolves to a `boolean` type.
+
+To summarize, all of the following criteria must be met for column projection to occur:
+
+* The external data source that you are accessing must support column projection. For example, Hive supports column projection for ORC-format data, and certain SQL databases support column projection.
+* The underlying PXF connector and profile implementation must also support column projection. For example, the PXF Hive and JDBC connector profiles identified above support column projection, as do the PXF connectors that support reading Parquet data.
+* PXF must be able to serialize the query filter.
+
diff --git a/gpdb-doc/markdown/pxf/intro_pxf.html.md.erb b/gpdb-doc/markdown/pxf/intro_pxf.html.md.erb
index e9c59fb616d217fc40a2183c48566c19de865c8f..38fcbacdb519f8ca380f14d465ec06f2886bb43d 100644
--- a/gpdb-doc/markdown/pxf/intro_pxf.html.md.erb
+++ b/gpdb-doc/markdown/pxf/intro_pxf.html.md.erb
@@ -58,3 +58,10 @@ PXF may require additional information to read or write certain data formats. Yo
 
 **Note:** When you create a PXF external table, you cannot use the `HEADER` option in your formatter specification.
 
+## Other PXF Features
+
+Certain PXF connectors and profiles support filter pushdown and column projection. Refer to the following topics for detailed information about this support:
+
+- [About PXF Filter Pushdown](filter_push.html)
+- [About Column Projection in PXF](col_project.html)
+
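The three criteria in the new `col_project.html.md.erb` topic can be restated as a small decision function. The Python sketch below is a hypothetical model written only to illustrate the documented rules; it is not PXF code. The names `projection_applies`, `rows_transferred`, and `query_result`, and the representation of external rows as dictionaries, are assumptions. It shows that projection applies only when all three conditions hold, and that the `SELECT` result is the same either way: only the amount of data sent to Greenplum Database changes.

```python
"""Hypothetical model of the PXF column-projection rules (illustration only).

This restates the criteria in col_project.html.md.erb; it is NOT PXF's
implementation, and none of these names are PXF APIs.
"""

# Connector/profile combinations that the topic lists as supporting
# column projection on read operations.
PROJECTION_PROFILES = frozenset({
    "HiveORC",
    "Jdbc",
    "hdfs:parquet",
    "adl:parquet",
    "gs:parquet",
    "s3:parquet",
    "wasbs:parquet",
})


def projection_applies(source_supports: bool, profile: str,
                       filter_serializable: bool) -> bool:
    """Return True only when all three documented criteria are met."""
    return (source_supports
            and profile in PROJECTION_PROFILES
            and filter_serializable)


def rows_transferred(rows, wanted, *, projected: bool):
    """Model the rows as they would be sent to Greenplum Database.

    rows:   list of dicts, one per external row (assumed representation).
    wanted: column names that the SELECT query needs.
    With projection, only the wanted columns are sent; without it,
    every column is sent.
    """
    if projected:
        return [{col: row[col] for col in wanted} for row in rows]
    return [dict(row) for row in rows]


def query_result(rows, wanted, *, projected: bool):
    """Greenplum keeps only the wanted columns, so the result is identical."""
    sent = rows_transferred(rows, wanted, projected=projected)
    return [{col: row[col] for col in wanted} for row in sent]
```

For example, `projection_applies(True, "s3:parquet", False)` is `False` because the filter could not be serialized, so `rows_transferred(..., projected=False)` sends every column even though `query_result` returns only the requested ones.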