<shortdesc>You can use the Greenplum Platform Extension Framework (PXF) <codeph>pxf://</codeph> protocol to access data on external HDFS and Hive systems.</shortdesc>
<body>
<p>The PXF <codeph>pxf</codeph> protocol is packaged as a Greenplum Database extension. The <codeph>pxf</codeph> protocol supports reading from HDFS, Hive, and HBase data stores. You can also write text and binary data to HDFS with the <codeph>pxf</codeph> protocol.</p>
<p>When you use the <codeph>pxf</codeph> protocol to query HDFS, Hive, or HBase systems, you specify the HDFS file or Hive or HBase table that you want to access. PXF requests the data from the data store and delivers the relevant portions in parallel to each Greenplum Database segment instance serving the query.</p>
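<p>As a minimal sketch of how you specify an HDFS file in this way, the following statements define a readable external table over a text file and then query it. The table name, column list, and HDFS path are hypothetical, and the <codeph>HdfsTextSimple</codeph> profile name is an assumption that may differ in your PXF version.</p>
<codeblock>-- Hypothetical table, columns, and HDFS path, for illustration only.
CREATE EXTERNAL TABLE pxf_hdfs_sales (location text, month text, num_orders int, total_sales float8)
  LOCATION ('pxf://data/pxf_examples/sales.txt?PROFILE=HdfsTextSimple')
  FORMAT 'TEXT' (delimiter=E',');

-- Each segment instance serving the query receives its portion of the data from PXF.
SELECT location, sum(total_sales) FROM pxf_hdfs_sales GROUP BY location;</codeblock>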
<p>You must explicitly initialize and start PXF before you can use the <codeph>pxf</codeph> protocol to read external data. You must also grant permissions to the <codeph>pxf</codeph> protocol and enable PXF in each database in which you want to create external tables to access external data.</p>
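<p>The database-level steps might look like the following sketch, run by an administrator in each database where external tables are needed. The role name <codeph>bill</codeph> is hypothetical. Initializing and starting the PXF service itself happens outside of SQL and is not shown.</p>
<codeblock>-- Enable PXF in the current database.
CREATE EXTENSION pxf;

-- Allow a hypothetical role to create readable external tables that use the pxf protocol.
GRANT SELECT ON PROTOCOL pxf TO bill;

-- Allow the same role to create writable external tables.
GRANT INSERT ON PROTOCOL pxf TO bill;</codeblock>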
<p>For detailed information about configuring and using PXF and the <codeph>pxf</codeph> protocol, refer to <xref href="pxf-overview.xml" type="topic">Accessing External Data with PXF</xref>.</p>
<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="topic_u14_wtd_dbb">
<title>Accessing External Data with PXF</title>
<shortdesc>Data managed by your organization may already reside in external sources. The Greenplum Platform Extension Framework (PXF) provides access to this external data via built-in connectors that map an external data source to a Greenplum Database table definition.</shortdesc>
<body>
<p>PXF is installed with HDFS, Hive, and HBase connectors. These connectors enable you to read external HDFS file system and Hive and HBase table data stored in text, Avro, RCFile, Parquet, SequenceFile, and ORC formats.</p>
<p>The Greenplum Platform Extension Framework includes a protocol C library and a Java service. After you configure and initialize PXF, you start a single PXF JVM process on each Greenplum Database segment host. This long-running process concurrently serves multiple query requests.</p>
@@ -35,11 +35,11 @@ The Greenplum Platform Extension Framework (PXF) provides parallel, high through
This topic describes the procedure that you must perform to upgrade PXF when you install a new version of Greenplum Database.
- **[Using PXF to Read and Write External Data](using_pxf.html)**
This topic describes important PXF procedures and concepts, including enabling PXF for use in a database and PXF protocol and external table definitions.
This topic details the service- and database-level logging configuration procedures for PXF. It also identifies some common PXF errors and describes how to address PXF memory issues.