Commit 0837cd6a authored by Lisa Owen, committed by David Yozie

docs - new pxf parquet write options (#7240)

* docs - new pxf parquet write options

* add bytes

* line edit: the PXF -> PXF

* Line edit: which -> that
Parent 23a11d3a
......@@ -21,7 +21,7 @@ specific language governing permissions and limitations
under the License.
-->
Use the PXF HDFS connector to read and write Parquet-format data. This section describes how to read and write HDFS files that are stored in Parquet format, including how to create, query, and insert into an external table that references files in the HDFS data store.
Use the PXF HDFS connector to read and write Parquet-format data. This section describes how to read and write HDFS files that are stored in Parquet format, including how to create, query, and insert into external tables that reference files in the HDFS data store.
PXF currently supports reading and writing primitive Parquet data types only.
......@@ -77,15 +77,19 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so avoids extra motion of data between segments during the load operation. |
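To illustrate the distribution-policy guidance above, here is a minimal sketch; the `sales_source` table, its columns, and the HDFS path are hypothetical and not part of the documented example:

``` sql
-- Source heap table distributed on the "location" column (hypothetical)
CREATE TABLE sales_source (location text, month text, total_sales double precision)
  DISTRIBUTED BY (location);

-- Writable external table declared with the same distribution column,
-- so the INSERT ... SELECT below avoids redistributing rows between segments
CREATE WRITABLE EXTERNAL TABLE sales_parquet_w (location text, month text, total_sales double precision)
  LOCATION ('pxf://data/pxf_examples/sales?PROFILE=hdfs:parquet')
  FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export')
  DISTRIBUTED BY (location);

INSERT INTO sales_parquet_w SELECT * FROM sales_source;
```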
<a id="customopts"></a>
The PXF `hdfs:parquet` profile supports write compression. You specify the compression codec via a custom option in the `CREATE EXTERNAL TABLE` `LOCATION` clause. The `hdfs:parquet` profile supports the following custom options:
The PXF `hdfs:parquet` profile supports encoding- and compression-related write options. You specify these write options in the `CREATE WRITABLE EXTERNAL TABLE` `LOCATION` clause. The `hdfs:parquet` profile supports the following custom options:
| Option | Value Description |
| Write Option | Value Description |
|-------|-------------------------------------|
| COMPRESSION_CODEC | The compression codec alias. Supported compression codecs for writing Parquet data include: `snappy`, `gzip`, `lzo`, and `uncompressed`. If this option is not provided, PXF compresses the data using snappy compression. |
| COMPRESSION_CODEC | The compression codec alias. Supported compression codecs for writing Parquet data include: `snappy`, `gzip`, `lzo`, and `uncompressed`. If this option is not provided, PXF compresses the data using `snappy` compression. |
| ROWGROUP_SIZE | A Parquet file consists of one or more row groups, a logical partitioning of the data into rows. `ROWGROUP_SIZE` identifies the size (in bytes) of the row group. The default row group size is `8 * 1024 * 1024` bytes. |
| PAGE_SIZE | A row group consists of column chunks that are divided up into pages. `PAGE_SIZE` is the size (in bytes) of such a page. The default page size is `1024 * 1024` bytes. |
| DICTIONARY_PAGE_SIZE | Dictionary encoding is enabled by default when PXF writes Parquet files. There is a single dictionary page per column, per row group. `DICTIONARY_PAGE_SIZE` is similar to `PAGE_SIZE`, but for the dictionary. The default dictionary page size is `512 * 1024` bytes. |
| PARQUET_VERSION | The Parquet version; values `v1` and `v2` are supported. The default Parquet version is `v1`. |
**Note**: You must explicitly specify `uncompressed` if you do not want PXF to compress the data.
Parquet files that you write to HDFS have the following naming format: `<file>.<compress_extension>.parquet`, for example `1547061635-0000004417_0.gz.parquet`.
Parquet files that you write to HDFS with PXF have the following naming format: `<file>.<compress_extension>.parquet`, for example `1547061635-0000004417_0.gz.parquet`.
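As a minimal sketch (separate from the documented example that follows), the write options are appended to the `LOCATION` URI as name/value pairs; the table name and HDFS path below are hypothetical:

``` sql
-- Write gzip-compressed Parquet with a 16 * 1024 * 1024 byte row group size
CREATE WRITABLE EXTERNAL TABLE pxf_parquet_gzip_w (location text, month text, total_sales double precision)
  LOCATION ('pxf://data/pxf_examples/pxf_parquet?PROFILE=hdfs:parquet&COMPRESSION_CODEC=gzip&ROWGROUP_SIZE=16777216')
  FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
```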
## <a id="parquet_write"></a> Example
......@@ -126,7 +130,7 @@ In this example, you create a Parquet-format writable external table that refere
4. Query the readable external table `read_pxf_parquet`:
``` sql
gpadmin=# SELECT * FROM read_pxf_parquet ORDER BY total_sales;
postgres=# SELECT * FROM read_pxf_parquet ORDER BY total_sales;
```
``` pre
......
......@@ -21,7 +21,7 @@ specific language governing permissions and limitations
under the License.
-->
The PXF object store connectors support reading and writing Parquet-format data. This section describes how to use PXF to access Parquet-format data in an object store, including how to create and query an external table that references a Parquet file in the store.
The PXF object store connectors support reading and writing Parquet-format data. This section describes how to use PXF to access Parquet-format data in an object store, including how to create and query external tables that reference a Parquet file in the store.
<div class="note">PXF Parquet write support is a Beta feature.</div>
......@@ -67,7 +67,7 @@ The specific keywords and values used in the [CREATE EXTERNAL TABLE](../ref_guid
| \<path&#8209;to&#8209;dir\> | The absolute path to the directory in the object store. |
| PROFILE=\<objstore\>:parquet | The `PROFILE` keyword must identify the specific object store. For example, `s3:parquet`. |
| SERVER=\<server_name\> | The named server configuration that PXF uses to access the data. |
| \<custom&#8209;option\>=\<value\> | Parquet-specific custom options are described in the [PXF HDFS Parquet documentation](hdfs_parquet.html#customopts). |
| \<custom&#8209;option\>=\<value\> | Parquet-specific custom write options are described in the [PXF HDFS Parquet documentation](hdfs_parquet.html#customopts). |
| FORMAT 'CUSTOM' | Use `FORMAT` '`CUSTOM`' with `(FORMATTER='pxfwritable_export')` (write) or `(FORMATTER='pxfwritable_import')` (read). |
| DISTRIBUTED BY | If you plan to load the writable external table with data from an existing Greenplum Database table, consider specifying the same distribution policy or \<column_name\> on the writable external table as that defined for the table from which you plan to load the data. Doing so avoids extra motion of data between segments during the load operation. |
......@@ -77,7 +77,7 @@ If you are accessing an S3 object store, you can provide S3 credentials directly
Refer to the [Example](hdfs_parquet.html#parquet_write) in the PXF HDFS Parquet documentation for a Parquet write/read example. Modifications that you must make to run the example with an object store include:
- Using the `CREATE EXTERNAL TABLE` syntax and `LOCATION` keywords and settings described above for the writable external table. For example, if your server name is `s3srvcfg`:
- Using the `CREATE WRITABLE EXTERNAL TABLE` syntax and `LOCATION` keywords and settings described above for the writable external table. For example, if your server name is `s3srvcfg`:
``` sql
CREATE WRITABLE EXTERNAL TABLE pxf_tbl_parquet_s3 (location text, month text, number_of_orders int, total_sales double precision)
......
......@@ -238,7 +238,7 @@ To run this example, you must:
## <a id="write_text"></a>Writing Text Data
The "<objstore>:text" profiles support writing single line plain text data to an object store. When you create a writable external table with the PXF, you specify the name of a directory. When you insert records into a writable external table, the block(s) of data that you insert are written to one or more files in the directory that you specified.
The "\<objstore\>:text" profiles support writing single line plain text data to an object store. When you create a writable external table with PXF, you specify the name of a directory. When you insert records into a writable external table, the block(s) of data that you insert are written to one or more files in the directory that you specified.
**Note**: External tables that you create with a writable profile can only be used for `INSERT` operations. If you want to query the data that you inserted, you must create a separate readable external table that references the directory.
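As a sketch of this write-then-read workflow, assuming the `s3:text` profile, the `s3srvcfg` server configuration named earlier, and a hypothetical bucket and directory:

``` sql
-- Writable external table: INSERTs write text files under the named directory (hypothetical bucket/path)
CREATE WRITABLE EXTERNAL TABLE events_text_w (event_id int, event_name text)
  LOCATION ('pxf://mybucket/pxf_examples/events?PROFILE=s3:text&SERVER=s3srvcfg')
  FORMAT 'TEXT' (delimiter=',');

INSERT INTO events_text_w VALUES (1, 'login'), (2, 'logout');

-- A separate readable external table over the same directory is required to query the inserted rows
CREATE EXTERNAL TABLE events_text_r (event_id int, event_name text)
  LOCATION ('pxf://mybucket/pxf_examples/events?PROFILE=s3:text&SERVER=s3srvcfg')
  FORMAT 'TEXT' (delimiter=',');

SELECT * FROM events_text_r;
```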
......