Commit baee6ca9 authored by Shivram Mani, committed by GitHub

README updates for PXF build/setup instructions (#3135)

* README updates for PXF setup instructions

* PXF README updates for PXF setup instructions

* test update

* README updates for PXF setup instructions

* PXF README updates for PXF setup instructions

* note on curl dependency

* minor update
Parent 1b46025d
@@ -16,7 +16,7 @@ brew install apr # gpperfmon
brew install apr-util # gpperfmon
brew link --force apr
brew link --force apr-util
brew install json-c # pxf
# Installing Golang
mkdir -p ~/go/src
......
@@ -194,6 +194,14 @@ to be available in the environment. [Build and Install](#buildOrca) the latest ORCA
If you want to build GPDB without ORCA, pass the `--disable-orca` flag to configure.
### Building GPDB with PXF
PXF is an extension framework for GPDB that enables access to external Hadoop datasets.
Refer to [PXF extension](https://github.com/greenplum-db/gpdb/tree/master/gpAux/extensions/pxf) for more information.
Currently, GPDB is built with PXF by default. If you don't need PXF, pass the `--disable-pxf` flag to configure.
PXF also requires curl version > 7.21.3. On most CentOS 6 environments, which ship curl 7.19, compilation will fail because of this dependency.
You can either upgrade your curl version or simply disable PXF during the build.
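Since a too-old curl only shows up as a compile failure, it can help to check the installed version before configuring. A minimal sketch of such a check (the 7.21.3 threshold comes from the paragraph above; `sort -V` is used here as the version comparator):

```shell
# PXF needs curl strictly newer than 7.21.3 (see above).
# sort -V orders version strings numerically, so whichever of the two
# versions sorts first is the older one.
required="7.21.3"
have="$(curl --version | awk 'NR==1{print $2}')"
if [ "$(printf '%s\n%s\n' "$required" "$have" | sort -V | head -n1)" = "$required" ] \
   && [ "$have" != "$required" ]; then
    echo "curl $have is new enough for PXF"
else
    echo "curl $have is too old; configure GPDB with --disable-pxf"
fi
```

On a CentOS 6 box with curl 7.19 this prints the "too old" branch, matching the failure mode described above.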
```
# Clean environment
make distclean
......
# The PXF extension client for GPDB
# The PXF extension for GPDB
PXF is an extensible framework that allows GPDB or any other parallel database to query external datasets. The framework is built in Java and provides built-in connectors for accessing data of various formats (text, sequence files, ORC, etc.) that exist inside HDFS files, Hive tables, HBase tables and many more stores.
This module includes the PXF C client; using the 'pxf' protocol with external tables, GPDB can query external datasets via the PXF service that runs alongside the GPDB segments.
PXF is an extension framework that allows GPDB or any other database to query external distributed datasets. The framework is built in Java and provides built-in connectors for accessing data of various formats (text, sequence files, Avro, ORC, etc.) that may exist inside HDFS files, Hive tables, HBase tables and many more stores.
PXF consists of a server-side JVM-based component and a C client component, which serves as the means for GPDB to interact with the PXF service.
This module includes only the PXF C client, and the build instructions build only the client.
Using the 'pxf' protocol with external tables, GPDB can query external datasets via the PXF service that runs alongside the GPDB segments.
## Table of Contents
@@ -66,13 +68,9 @@ LOCATION ('pxf://localhost:51200/tmp/dummy1' \
FORMAT 'TEXT' (DELIMITER ',');
```
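As a hedged sketch of trying the example end to end: the LOCATION clause above points at `/tmp/dummy1`, so a small comma-delimited file at that path gives the demo table something to read (the row values below are invented for illustration):

```shell
# Create sample comma-delimited data at the path named in the LOCATION
# clause above (/tmp/dummy1). The rows here are made up for illustration.
printf '1,first\n2,second\n3,third\n' > /tmp/dummy1

# With a GPDB cluster and the PXF service running, the table could then
# be queried via psql (not run here, since it needs a live cluster):
#   psql -d postgres -c "SELECT * FROM pxf_read_test ORDER BY a;"
```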
If you wish to use PXF with Hadoop, instructions will be made available shortly.
Please refer to [PXF Setup](https://cwiki.apache.org/confluence/display/HAWQ/PXF+Build+and+Install) for instructions to setup PXF.
Once you also install and run the PXF server on the machines where GPDB segments run, you can select data from the demo PXF profile:
Once you install and run the PXF server alongside the GPDB segments, you can select data from the demo PXF profile:
```
# SELECT * from pxf_read_test order by a;
@@ -88,6 +86,10 @@ Once you also install and run the PXF server on the machines where GPDB segments run
```
If you wish to use PXF with Hadoop, you will need to integrate with HDFS or Hive; refer to the doc above for steps to install them.
## Run regression tests
```
......