Commit baee6ca9 authored by Shivram Mani, committed by GitHub

README updates for PXF build/setup instructions (#3135)

* README updates for PXF setup instructions

* PXF README updates for PXF setup instructions

* test update

* README updates for PXF setup instructions

* PXF README updates for PXF setup instructions

* note on curl dependency

* minor update
Parent 1b46025d
@@ -16,7 +16,7 @@ brew install apr # gpperfmon
brew install apr-util # gpperfmon
brew link --force apr
brew link --force apr-util
brew install json-c # pxf
# Installing Golang
mkdir -p ~/go/src
......
@@ -194,6 +194,14 @@ to be available in the environment. [Build and Install](#buildOrca) the latest ORCA
If you want to build GPDB without ORCA, pass the `--disable-orca` flag to configure.
### Building GPDB with PXF
PXF is an extension framework for GPDB that enables access to external Hadoop datasets.
Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information.
Currently, GPDB is built with PXF by default. If you don't need PXF, pass the `--disable-pxf` flag to configure.
PXF also requires curl version > 7.21.3. On most CentOS 6 environments, which ship curl 7.19, compilation will fail because of this dependency.
You can either upgrade your curl version or simply disable PXF during the build.
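Since a too-old curl only shows up as a compile failure, it can help to check the installed version before configuring. A minimal sketch of such a check (the 7.21.3 threshold comes from the paragraph above; `sort -V` is used here as the version comparator):

```shell
# PXF needs curl strictly newer than 7.21.3 (see above).
# sort -V orders version strings numerically, so whichever of the two
# versions sorts first is the older one.
required="7.21.3"
have="$(curl --version | awk 'NR==1{print $2}')"
if [ "$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)" = "$required" ] \
   && [ "$have" != "$required" ]; then
    echo "curl $have is new enough for PXF"
else
    echo "curl $have is too old; configure GPDB with --disable-pxf"
fi
```

On a CentOS 6 box with curl 7.19 this prints the "too old" branch, matching the failure mode described above.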
```
# Clean environment
make distclean
......
# The PXF extension client for GPDB
# The PXF extension for GPDB
PXF is an extensible framework that allows GPDB or any other parallel database to query external datasets. The framework is built in Java and provides built-in connectors for accessing data of various formats (text, sequence files, ORC, etc.) that exist inside HDFS files, Hive tables, HBase tables and many more stores.
This module includes the PXF C client; using the 'pxf' protocol with external tables, GPDB can query external datasets via the PXF service that runs alongside the GPDB segments.
PXF is an extension framework that allows GPDB or any other database to query external distributed datasets. The framework is built in Java and provides built-in connectors for accessing data of various formats (text, sequence files, Avro, ORC, etc.) that may exist inside HDFS files, Hive tables, HBase tables and many more stores.
PXF consists of a server-side JVM-based component and a C client component, which serves as the means for GPDB to interact with the PXF service.
This module includes only the PXF C client, and the build instructions build only the client.
Using the 'pxf' protocol with external tables, GPDB can query external datasets via the PXF service that runs alongside the GPDB segments.
## Table of Contents
@@ -66,13 +68,9 @@ LOCATION ('pxf://localhost:51200/tmp/dummy1' \
FORMAT 'TEXT' (DELIMITER ',');
```
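As a hedged sketch of trying the example end to end: the LOCATION clause above points at `/tmp/dummy1`, so a small comma-delimited file at that path gives the demo table something to read (the row values below are invented for illustration):

```shell
# Create sample comma-delimited data at the path named in the LOCATION
# clause above (/tmp/dummy1). The rows here are made up for illustration.
printf '1,first\n2,second\n3,third\n' > /tmp/dummy1

# With a GPDB cluster and the PXF service running, the table could then
# be queried via psql (not run here, since it needs a live cluster):
#   psql -d postgres -c "SELECT * FROM pxf_read_test ORDER BY a;"
```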
If you wish to use PXF with Hadoop, instructions will be made available shortly.
Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to setup PXF.
Once you also install and run the PXF server on the machines where GPDB segments run, you can select data from the demo PXF profile:
Once you install and run the PXF server alongside the GPDB segments, you can select data from the demo PXF profile:
```
# SELECT * from pxf_read_test order by a;
@@ -88,6 +86,10 @@ Once you also install and run the PXF server on the machines where GPDB segments run
```
If you wish to use PXF with Hadoop, you will need to integrate with HDFS or Hive; refer to the doc above for steps to install them.
## Run regression tests
```
......