From 54dee6cebe3e13e414a51fe65d77190dcf12001d Mon Sep 17 00:00:00 2001 From: Shivram Mani Date: Mon, 24 Sep 2018 13:35:23 -0700 Subject: [PATCH] Move PXF server from apache/hawq to the new greenplum/pxf repo (#5798) The PXF client in GPDB uses PXF libraries from the Apache HAWQ repo. These PXF libraries will continue to be developed in a new PXF repo, greenplum-db/pxf, which is in the process of being open sourced in the next few days. The PXF extension and gpdb-pxf client code will continue to remain in the gpdb repo. The following changes are included in this PR: Transition from the old PXF namespace org.apache.hawq.pxf to org.greenplum.pxf (there is a separate PR in the PXF repo to address the package namespace refactor, greenplum-db/pxf#5). Doc updates to reflect the new PXF repo and the new package namespace. --- README.md | 3 +- gpAux/extensions/pxf/README.md | 33 +++++++----------- gpAux/extensions/pxf/expected/setup.out | 16 ++++----- gpAux/extensions/pxf/sql/setup.sql | 18 +++++----- .../markdown/common/gpdb-features.html.md.erb | 2 +- .../markdown/pxf/overview_pxf.html.md.erb | 2 +- .../markdown/pxf/sdk/build_conn.html.md.erb | 22 +++++------- .../pxf/sdk/deploy_profile.html.md.erb | 24 ++++++------- .../markdown/pxf/sdk/dev_overview.html.md.erb | 2 +- gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb | 6 ++-- .../pxf/troubleshooting_pxf.html.md.erb | 4 +-- gpdb-doc/markdown/pxf/using_pxf.html.md.erb | 34 +++++++++---------- 12 files changed, 76 insertions(+), 90 deletions(-) diff --git a/README.md b/README.md index b2f8853f24..33d9bf8d39 100644 --- a/README.md +++ b/README.md @@ -170,7 +170,8 @@ make distclean ### Building GPDB with PXF PXF is an extension framework for GPDB to enable fast access to external hadoop datasets. -Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information. +Refer to [PXF extension](gpAux/extensions/pxf/README.md) for more information. 
+ Currently, GPDB is built with PXF by default (--enable-pxf is on). In order to build GPDB without pxf, simply invoke `./configure` with additional option `--disable-pxf`. PXF requires curl, so `--enable-pxf` is not compatible with the `--without-libcurl` option. diff --git a/gpAux/extensions/pxf/README.md b/gpAux/extensions/pxf/README.md index d1f1c3ca25..e4be220794 100644 --- a/gpAux/extensions/pxf/README.md +++ b/gpAux/extensions/pxf/README.md @@ -5,15 +5,6 @@ PXF consists of a server side JVM based component and a C client component which This module only includes the PXF C client and the build instructions only builds the client. Using the 'pxf' protocol with external table, GPDB can query external datasets via PXF service that runs alongside GPDB segments. -## Table of Contents - -* Usage -* Initialize and start GPDB cluster -* Enable PXF extension -* Run unit tests -* Run regression tests -======= - ## Usage ### Enable PXF extension in GPDB build process. @@ -54,23 +45,25 @@ Additional instructions on building and starting a GPDB cluster can be found in the top-level [README.md](../../../README.md) ("_Build the database_" section). -## Create and use PXF external table -If you wish to simply test GPDB and PXF without hadoop, you can use the Demo Profile. -The Demo profile demonstrates how GPDB can parallely the external data via the PXF agents. The data served is -static data from the PXF agents themselves. +### Install PXF Server +Please refer to [PXF Development](https://github.com/greenplum-db/pxf/blob/master/README.md) for instructions to set up PXF. +You will need one PXF server agent per Segment host. + +### Create and use PXF external table +If you wish to simply test-drive the PXF extension without accessing any external data source, you can skip starting the Hadoop components (while installing the PXF Server) and simply use the Demo Profile. 
+ +The Demo profile demonstrates how GPDB segments can, in parallel, access static data served by the PXF service(s). ``` # CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) \ LOCATION ('pxf://localhost:5888/tmp/dummy1' \ -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' \ -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' \ -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') \ +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' \ +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' \ +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') \ FORMAT 'TEXT' (DELIMITER ','); ``` -Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to setup PXF. -Once you install and run PXF server alongside the GPDB segments, you can select data from the demo PXF profile: ``` # SELECT * from pxf_read_test order by a; @@ -90,11 +83,11 @@ If you wish to use PXF with Hadoop, you will need to integrate with Hdfs or Hive -## Run regression tests +### Run regression tests ``` make installcheck ``` This will connect to the running database, and run the regression -tests located in the `regress` directory. +tests located in the `regress` directory. 
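The `pxf://` LOCATION string in the Demo example above is an ordinary URI whose query portion carries the Fragmenter, Accessor, and Resolver class names. As an illustrative sketch only (Python is used purely for illustration here; nothing below is part of PXF itself), a standard URI parser recovers those pieces:

``` python
from urllib.parse import urlparse, parse_qs

# The Demo-profile LOCATION string from the example above.
location = ("pxf://localhost:5888/tmp/dummy1"
            "?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter"
            "&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor"
            "&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver")

parsed = urlparse(location)
# parse_qs returns lists of values; each option appears once here.
options = {key: values[0] for key, values in parse_qs(parsed.query).items()}

# The path identifies the external data; the query names the plug-in classes.
print(parsed.path)            # /tmp/dummy1
print(options["FRAGMENTER"])  # org.greenplum.pxf.api.examples.DemoFragmenter
```

This is only meant to make the anatomy of the LOCATION clause explicit; in practice GPDB parses the string and forwards the parameters to the PXF service.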
\ No newline at end of file diff --git a/gpAux/extensions/pxf/expected/setup.out b/gpAux/extensions/pxf/expected/setup.out index bee92950eb..8d81c3468e 100644 --- a/gpAux/extensions/pxf/expected/setup.out +++ b/gpAux/extensions/pxf/expected/setup.out @@ -9,20 +9,20 @@ DROP EXTENSION pxf; CREATE EXTENSION pxf; CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT) LOCATION ('pxf:///tmp/pxf?' 
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a); CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a); INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; diff --git a/gpAux/extensions/pxf/sql/setup.sql b/gpAux/extensions/pxf/sql/setup.sql index 3b75c616ee..426cef659a 100644 --- a/gpAux/extensions/pxf/sql/setup.sql +++ b/gpAux/extensions/pxf/sql/setup.sql @@ -13,23 +13,23 @@ CREATE EXTENSION pxf; CREATE EXTERNAL TABLE pxf_read_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE pxf_readcustom_test (a TEXT, b TEXT, c TEXT) LOCATION ('pxf://tmp/dummy1' -'?FRAGMENTER=org.apache.hawq.pxf.api.examples.DemoFragmenter' -'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoResolver') +'?FRAGMENTER=org.greenplum.pxf.api.examples.DemoFragmenter' +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoResolver') FORMAT 'CUSTOM' (formatter='pxfwritable_import'); CREATE WRITABLE EXTERNAL TABLE pxf_write_test (a int, b TEXT) LOCATION ('pxf:///tmp/pxf?' 
-'&ACCESSOR=org.apache.hawq.pxf.api.examples.DemoFileWritableAccessor' -'&RESOLVER=org.apache.hawq.pxf.api.examples.DemoTextResolver') +'&ACCESSOR=org.greenplum.pxf.api.examples.DemoFileWritableAccessor' +'&RESOLVER=org.greenplum.pxf.api.examples.DemoTextResolver') FORMAT 'TEXT' (DELIMITER ',') DISTRIBUTED BY (a); CREATE TABLE origin (a int, b TEXT) DISTRIBUTED BY (a); -INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; \ No newline at end of file +INSERT INTO origin SELECT i, 'data_' || i FROM generate_series(10,99) AS i; diff --git a/gpdb-doc/markdown/common/gpdb-features.html.md.erb b/gpdb-doc/markdown/common/gpdb-features.html.md.erb index 46e8248dd8..92b7b2257c 100644 --- a/gpdb-doc/markdown/common/gpdb-features.html.md.erb +++ b/gpdb-doc/markdown/common/gpdb-features.html.md.erb @@ -52,6 +52,6 @@ the Greenplum hosts. - The deprecated gpcheck management utility and its replacement gpsupport are only supported with Pivotal Greenplum Database. -- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and Apache HAWQ (incubating) repositories. +- To use the Greenplum Platform Extension Framework (PXF) with open source Greenplum Database, you must separately build and install the PXF server software. Refer to the build instructions in the PXF README files in the Greenplum Database and PXF repositories. - Suggestions to contact Pivotal Technical Support in this documentation are intended only for Pivotal Greenplum Database customers. 
diff --git a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb b/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb index e8e942d75b..87e046a3f6 100644 --- a/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/overview_pxf.html.md.erb @@ -21,7 +21,7 @@ specific language governing permissions and limitations under the License. --> -The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. This Greenplum Database extension is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from Apache HAWQ (incubating). +The Greenplum Platform Extension Framework (PXF) provides parallel, high throughput data access and federated queries across heterogeneous data sources via built-in connectors that map a Greenplum Database external table definition to an external data source. PXF has its roots in the Apache HAWQ project. - **[PXF Architecture](intro_pxf.html)** diff --git a/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb b/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb index 4e688af1c3..f14f1d1750 100644 --- a/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/build_conn.html.md.erb @@ -111,14 +111,14 @@ Before building the *Demo* connector, ensure that you have: Perform the following procedure to create a local copy of the *Demo* connector source code, update package names, configure compile-time dependencies, and use `gradle` to build the connector. -1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Apache HAWQ (incubating) `incubator-hawq` `github` repository. For example: +1. Download the PXF *Demo* connector source code. You can obtain the PXF source code from the Greenplum PXF `github` repository. 
For example: ``` shell user@devsystem$ cd $PXFDEV_BASE - user@devsystem$ git clone https://github.com/apache/incubator-hawq.git + user@devsystem$ git clone https://github.com/greenplum-db/pxf.git ``` - The `clone` operation creates a directory named `incubator-hawq/` in your current working directory. + The `clone` operation creates a directory named `pxf/` in your current working directory. 2. Create a project directory for your copy of the source code and navigate to that directory. For example: @@ -134,24 +134,18 @@ Perform the following procedure to create a local copy of the *Demo* connector s user@devsystem$ cp $PXFDEV_BASE/pxf-api-.jar libs/ ``` -4. The source code for the PXF *Demo* connector is located in the `incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example: +4. The source code for the PXF *Demo* connector is located in the `pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples` directory of the repository you cloned in Step 1. Copy this code to your work area. For example: ``` shell user@devsystem$ mkdir -p src/main/java/org/greenplum/pxf/example/demo user@devsystem$ cd src/main/java/org/greenplum/pxf/example/demo - user@devsystem$ cp $PXFDEV_BASE/incubator-hawq/pxf/pxf-api/src/main/java/org/apache/hawq/pxf/api/examples/* . + user@devsystem$ cp $PXFDEV_BASE/pxf/server/pxf-api/src/main/java/org/greenplum/pxf/api/examples/* . ``` -5. The original PXF *Demo* connector resides in the `org.apache.hawq.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example: +5. The original PXF *Demo* connector resides in the `org.greenplum.pxf.api.examples` package. Your *Demo* connector resides in a package named `org.greenplum.pxf.example.demo`. 
Update the package name in your local copy of the *Demo* connector source code. You can edit the files, run a script, etc. For example: ``` shell - user@devsystem$ sed -i.bak s/"org.apache.hawq.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g *.java - ``` - - The `sed` command above creates a backup of each file. Remove the backup files. For example: - - ``` shell - user@devsystem$ rm *.bak + user@devsystem$ find . -name '*.java' -exec sed -i '' s/"org.greenplum.pxf.api.examples"/"org.greenplum.pxf.example.demo"/g {} + ``` 6. Initialize a `gradle` Java library project for your *Demo* connector. For example: @@ -200,7 +194,7 @@ Perform the following procedure to create a local copy of the *Demo* connector s testCompile 'junit:junit:4.12' compile 'commons-logging:commons-logging:1.1.3' - compile 'org.apache.hawq.pxf.api:pxf-api:3.3.0.0' + compile 'org.greenplum.pxf.api:pxf-api:4.0.0' } diff --git a/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb b/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb index 6f00da4700..95cc130ee8 100644 --- a/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/deploy_profile.html.md.erb @@ -28,7 +28,7 @@ The profile \ provides a simple mapping to the \ classes. The G ### Specifying the Plug-in Class Names -The profile \ identify the fully-qualified names of the Java classes that PXF will use to split (\), read and/or write (\), and deserialize/serialize (\) the external data. +The profile \ identify the fully-qualified names of the Java classes that PXF will use to split (\), read and/or write (\), and deserialize/serialize (\) the external data. When you define a profile that supports a read operation from an external data store, you must provide one each of \, \, and \ plug-in class names. You must provide both an \ and a \ plug-in class name for a profile that supports a write operation to an external data store. 
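The package-rename step above is sensitive to `sed` dialect differences (`-i ''` is BSD `sed` syntax; GNU `sed` takes `-i` with no argument). As a hedged, illustrative alternative that is not part of the official procedure, the same rename — from the Demo sources' `org.greenplum.pxf.api.examples` package to your `org.greenplum.pxf.example.demo` package, as described in step 5 — can be done portably with a few lines of Python:

``` python
import os

OLD_PKG = "org.greenplum.pxf.api.examples"  # package shipped with the Demo sources
NEW_PKG = "org.greenplum.pxf.example.demo"  # package used by your copy of the connector

def rename_package(root: str) -> int:
    """Rewrite OLD_PKG -> NEW_PKG in every .java file under root; return files changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            if OLD_PKG in text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(text.replace(OLD_PKG, NEW_PKG))
                changed += 1
    return changed
```

Run it from your work area, e.g. `rename_package("src/main/java/org/greenplum/pxf/example/demo")`; it reports how many files it rewrote.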
@@ -43,9 +43,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For delimited single line records from plain text files on HDFS - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.LineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver @@ -56,9 +56,9 @@ You can re-use a single plug-in class name in multiple profile definitions. For It is not splittable (non parallel) and slower than HdfsTextSimple. - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.QuotedLineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.QuotedLineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver ``` @@ -154,7 +154,7 @@ Perform the following procedure to define and register a read profile and a writ ``` shell gpadmin@gpmaster$ gpssh -e -v -f seghostfile "/usr/local/greenplum-db/pxf/bin/pxf restart" ``` - + 8. Verify that you correctly deployed the *Demo* connector profiles by creating and accessing Greenplum Database external tables: 1. Connect to a database in which you created the PXF extension as the `gpadmin` user. For example, to connect to a database named `pxf_exampledb`: @@ -162,7 +162,7 @@ Perform the following procedure to define and register a read profile and a writ ``` shell gpadmin@gpmaster$ psql -d pxf_exampledb -U gpadmin ``` - + 2. Create a readable Greenplum external table specifying the `DemoReadLocalFS` profile name. For example: ``` sql @@ -171,12 +171,12 @@ Perform the following procedure to define and register a read profile and a writ FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE ``` - + 3. 
Query the `demo_tbl_read_wp` table: ``` sql pxf_exampledb=# SELECT * from demo_tbl_read_wp; - a | b | c + a | b | c ----------------+--------+-------- fragment2 row1 | value1 | value2 fragment2 row2 | value1 | value2 @@ -195,7 +195,7 @@ Perform the following procedure to define and register a read profile and a writ FORMAT 'TEXT' (DELIMITER ','); CREATE EXTERNAL TABLE ``` - + 5. Write some text data into the `demo_tbl_write_wp` table. For example: ``` sql diff --git a/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb b/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb index 0670b125b1..4e180935c5 100644 --- a/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/dev_overview.html.md.erb @@ -3,7 +3,7 @@ title: Using the PXF Java SDK --- The Greenplum Platform Extension Framework (PXF) SDK provides the Java classes and interfaces that you use to create connectors to new external data sources, data formats, and data access APIs from Greenplum Database. You can extend PXF functionality *without changing Greenplum Database* when you use the PXF Java SDK. -PXF in Greenplum Database is based on [PXF](https://cwiki.apache.org/confluence/display/HAWQ/PXF) from the open source Apache HAWQ (incubating) project. You can contribute to PXF development via the [Greenplum Database](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) and the [Apache HAWQ (incubating)](https://github.com/apache/incubator-hawq/tree/master/pxf) open source `github` repositories. +PXF in Greenplum Database has its roots in the Apache HAWQ project. You can contribute to Greenplum PXF development via the open source github repositories for [PXF Server Plugins](https://github.com/greenplum-db/pxf) and the [Greenplum PXF Extension/Client](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf). 
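The external-table DDL used in the profile verification steps above follows a fixed shape: a `pxf://` LOCATION whose query portion names the profile. As an illustrative sketch (the `make_pxf_ddl` helper is hypothetical, not part of PXF or GPDB, and the `tmp/dummy1` path is an assumed example value; Python is used only for illustration), that DDL could be assembled like this:

``` python
def make_pxf_ddl(table, columns, path, profile, fmt="TEXT", delimiter=","):
    """Assemble a CREATE EXTERNAL TABLE statement for the pxf protocol.

    The query portion of the LOCATION URI carries the profile name.
    """
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    location = f"pxf://{path}?PROFILE={profile}"
    return (f"CREATE EXTERNAL TABLE {table} ({cols}) "
            f"LOCATION ('{location}') "
            f"FORMAT '{fmt}' (DELIMITER '{delimiter}');")

# Hypothetical usage mirroring the DemoReadLocalFS verification table.
ddl = make_pxf_ddl(
    table="demo_tbl_read_wp",
    columns=[("a", "TEXT"), ("b", "TEXT"), ("c", "TEXT")],
    path="tmp/dummy1",       # assumed path, for illustration only
    profile="DemoReadLocalFS",
)
print(ddl)
```

The point is only the shape of the statement; in practice you would type the DDL directly in `psql` as shown in the procedure.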
## Topic Overview diff --git a/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb b/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb index 113ce65bc4..402e60fa7a 100644 --- a/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb +++ b/gpdb-doc/markdown/pxf/sdk/pxfapi.html.md.erb @@ -29,19 +29,17 @@ The PXF API exposes the following interfaces: | `WriteAccessor` | Writes `OneRow` records to the external data source. | -Refer to the [PXF API JavaDocs](http://hawq.incubator.apache.org/docs/pxf/javadoc/) for detailed information about the classes and interfaces exposed by the API. - ## General PXF API Information ### Package Name -The PXF API base package name is `org.apache.hawq.pxf.api`. All PXF API classes and interfaces reside in this package. +The PXF API base package name is `org.greenplum.pxf.api`. All PXF API classes and interfaces reside in this package. ### JAR File You need the PXF API JAR file to develop with the PXF SDK. This file is named `pxf-api-.jar`, where `` is a dot-separated 4 digit version number. For example: ``` shell -pxf-api-3.3.0.0.jar +pxf-api-4.0.0.jar ``` *PXF JAR files are not currently available from a remote repository.* You can obtain the PXF API JAR file from your Greenplum Database installation here: diff --git a/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb b/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb index 2af377833d..08b1e21e2b 100644 --- a/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/troubleshooting_pxf.html.md.erb @@ -45,7 +45,7 @@ PXF utilizes `log4j` for service-level logging. PXF-service-related log messages PXF provides more detailed logging when the `DEBUG` level is enabled. To configure PXF `DEBUG` logging, uncomment the following line in `pxf-log4j.properties`: ``` shell -#log4j.logger.org.apache.hawq.pxf=DEBUG +#log4j.logger.org.greenplum.pxf=DEBUG ``` Copy the `pxf-log4j.properties` file to each segment host and restart the PXF service on *each* Greenplum Database segment host. 
For example: @@ -74,7 +74,7 @@ dbname=> SELECT * FROM hdfstest; Examine/collect the log messages from `pxf-service.log`. **Note**: `DEBUG` logging is quite verbose and has a performance impact. Remember to turn off PXF service `DEBUG` logging after you have collected the desired information. - + ### Client-Level Logging diff --git a/gpdb-doc/markdown/pxf/using_pxf.html.md.erb b/gpdb-doc/markdown/pxf/using_pxf.html.md.erb index 60bda934e7..fe533240e2 100644 --- a/gpdb-doc/markdown/pxf/using_pxf.html.md.erb +++ b/gpdb-doc/markdown/pxf/using_pxf.html.md.erb @@ -34,7 +34,7 @@ You must explicitly enable the PXF extension in each Greenplum Database in which **Note**: You must have Greenplum Database administrator privileges to create an extension. - + ### Enable Procedure Perform the following procedure for **_each_** database in which you want to use PXF: @@ -50,11 +50,11 @@ Perform the following procedure for **_each_** database in which you want to use ``` sql database-name=# CREATE EXTENSION pxf; ``` - + Creating the `pxf` extension registers the `pxf` protocol and the call handlers required for PXF to access external data. ### Disable Procedure - + When you no longer want to use PXF on a specific database, you must explicitly disable the PXF extension for that database: 1. Connect to the database as the `gpadmin` user: @@ -62,30 +62,30 @@ When you no longer want to use PXF on a specific database, you must explicitly d ``` shell gpadmin@gpmaster$ psql -d -U gpadmin ``` - + 2. Drop the PXF extension: ``` sql database-name=# DROP EXTENSION pxf; ``` - + The `DROP` command fails if there are any currently defined external tables using the `pxf` protocol. Add the `CASCADE` option if you choose to forcibly remove these external tables. ## Granting Access to PXF -To read external data with PXF, you create an external table with the `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol. 
You must specifically grant `SELECT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. +To read external data with PXF, you create an external table with the `CREATE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `SELECT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. To grant a specific role access to the `pxf` protocol, use the `GRANT` command. For example, to grant the role named `bill` read access to data referenced by an external table created with the `pxf` protocol: ``` sql -GRANT SELECT ON PROTOCOL pxf TO bill; +GRANT SELECT ON PROTOCOL pxf TO bill; ``` To write data to an external data store with PXF, you create an external table with the `CREATE WRITABLE EXTERNAL TABLE` command that specifies the `pxf` protocol. You must specifically grant `INSERT` permission to the `pxf` protocol to all non-`SUPERUSER` Greenplum Database roles that require such access. For example: ``` sql -GRANT INSERT ON PROTOCOL pxf TO bill; +GRANT INSERT ON PROTOCOL pxf TO bill; ``` ## Configuring Filter Pushdown @@ -99,7 +99,7 @@ SHOW gp_external_enable_filter_pushdown; SET gp_external_enable_filter_pushdown TO 'on'; ``` -**Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. If a query accesses a data source that does not support filter push-down for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database). +**Note:** Some external data sources do not support filter pushdown. Also, filter pushdown may not be supported with certain data types or operators. 
If a query accesses a data source that does not support filter push-down for the query constraints, the query is instead executed without filter pushdown (the data is filtered after it is transferred to Greenplum Database). PXF accesses data sources using different connectors, and filter pushdown support is determined by the specific connector implementation. The following PXF connectors support filter pushdown: @@ -157,13 +157,13 @@ A PXF profile definition includes the name of the profile, a description, and th ``` xml HdfsTextSimple - This profile is suitable for using when reading + This profile is suitable for using when reading delimited single line records from plain text files on HDFS - org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter - org.apache.hawq.pxf.plugins.hdfs.LineBreakAccessor - org.apache.hawq.pxf.plugins.hdfs.StringPassResolver + org.greenplum.pxf.plugins.hdfs.HdfsDataFragmenter + org.greenplum.pxf.plugins.hdfs.LineBreakAccessor + org.greenplum.pxf.plugins.hdfs.StringPassResolver ``` @@ -175,7 +175,7 @@ A PXF profile definition includes the name of the profile, a description, and th You use PXF to access data stored on external systems. Depending upon the external data store, this access may require that you install and/or configure additional components or services for the external data store. For example, to use PXF to access a file stored in HDFS, you must install a Hadoop client on each Greenplum Database segment host. -PXF depends on JAR files and other configuration information provided by these additional components. The `$GPHOME/pxf/conf/pxf-private.classpath` and `$GPHOME/pxf/conf/pxf-public.classpath` configuration files identify PXF JAR dependencies. In most cases, PXF manages the `pxf-private.classpath` file, adding entries as necessary based on your Hadoop distribution and optional Hive and HBase client installations. +PXF depends on JAR files and other configuration information provided by these additional components. 
The `$GPHOME/pxf/conf/pxf-private.classpath` and `$GPHOME/pxf/conf/pxf-public.classpath` configuration files identify PXF JAR dependencies. In most cases, PXF manages the `pxf-private.classpath` file, adding entries as necessary based on your Hadoop distribution and optional Hive and HBase client installations. Should you need to add additional JAR dependencies for PXF, for example a JDBC driver JAR file, you must add them to the `pxf-public.classpath` file on each segment host, and then restart PXF on each host. @@ -193,11 +193,11 @@ FORMAT '[TEXT|CSV|CUSTOM]' (); The `LOCATION` clause in a `CREATE EXTERNAL TABLE` statement specifying the `pxf` protocol is a URI that identifies the path to, or other information describing, the location of the external data. For example, if the external data store is HDFS, the \ would identify the absolute path to a specific HDFS file. If the external data store is Hive, \ would identify a schema-qualified Hive table name. -Use the query portion of the URI, introduced by the question mark (?), to identify the PXF profile name. +Use the query portion of the URI, introduced by the question mark (?), to identify the PXF profile name. -You will provide profile-specific information using the optional &\=\ component of the `LOCATION` string and formatting information via the \ component of the string. The custom options and formatting properties supported by a specific profile are identified later in usage documentation. +You will provide profile-specific information using the optional &\=\ component of the `LOCATION` string and formatting information via the \ component of the string. The custom options and formatting properties supported by a specific profile are identified later in usage documentation. -Greenplum Database passes the parameters in the `LOCATION` string as headers to the PXF Java service. +Greenplum Database passes the parameters in the `LOCATION` string as headers to the PXF Java service. Table 1. 
Create External Table Parameter Values and Descriptions -- GitLab