Commit 2190c6d5 authored by Kostas Kloudas

[FLINK-10803][docs] Update the StreamingFileSink documentation for S3.

Parent 16e05206
@@ -24,16 +24,25 @@ under the License.
 -->
 
 This connector provides a Sink that writes partitioned files to filesystems
-supported by the Flink `FileSystem` abstraction. Since in streaming the input
-is potentially infinite, the streaming file sink writes data into buckets. The
-bucketing behaviour is configurable but a useful default is time-based
+supported by the [Flink `FileSystem` abstraction]({{ site.baseurl }}/ops/filesystems.html).
+
+<span class="label label-danger">Important Note</span>: For S3, the `StreamingFileSink`
+supports only the [Hadoop-based](https://hadoop.apache.org/) FileSystem implementation, not
+the implementation based on [Presto](https://prestodb.io/). In case your job uses the
+`StreamingFileSink` to write to S3 but you want to use the Presto-based one for checkpointing,
+it is advised to explicitly use *"s3a://"* (for Hadoop) as the scheme for the target path of
+the sink and *"s3p://"* (for Presto) for checkpointing. Using *"s3://"* for both the sink
+and checkpointing may lead to unpredictable behavior, as both implementations "listen" to that scheme.
+
+Since in streaming the input is potentially infinite, the streaming file sink writes data
+into buckets. The bucketing behaviour is configurable but a useful default is time-based
 bucketing where we start writing a new bucket every hour and thus get
 individual files that each contain a part of the infinite output stream.
 
 Within a bucket, we further split the output into smaller part files based on a
 rolling policy. This is useful to prevent individual bucket files from getting
 too big. This is also configurable but the default policy rolls files based on
-file size and a timeout, i.e if no new data was written to a part file.
+file size and a timeout, *i.e.* if no new data was written to a part file.
 
 The `StreamingFileSink` supports both row-wise encoding formats and
 bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
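
To make the scheme recommendation in the note above concrete, here is a minimal sketch. It assumes both the `flink-s3-fs-presto` and `flink-s3-fs-hadoop` filesystems are on the classpath and uses a hypothetical bucket named `my-bucket`: checkpoints are directed at an *s3p://* URI, while the sink's target path uses *s3a://*.

```java
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Checkpoints go through the Presto-based S3 FileSystem, so the "s3p://" scheme is used here.
env.setStateBackend(new FsStateBackend("s3p://my-bucket/checkpoints"));
env.enableCheckpointing(60 * 1000); // checkpoint every minute

// The StreamingFileSink only supports the Hadoop-based S3 FileSystem,
// so the sink's target path uses the "s3a://" scheme.
Path sinkTargetPath = new Path("s3a://my-bucket/streaming-output");
```

Equivalently, the checkpoint location can be configured in `flink-conf.yaml` via `state.checkpoints.dir`; the important point is only that the two schemes stay distinct.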
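
And here is a sketch of building the sink itself with a row-wise encoder and an explicit rolling policy, under the same assumptions (hypothetical bucket `my-bucket`, an upstream `DataStream<String>` called `input`). Part files roll when they reach a size limit, after a rollover interval, or after a period with no new data; note that the rolling-policy builder is named `create()` in the Flink release this commit targets and `builder()` in later releases. Bulk formats such as Parquet are created analogously via `StreamingFileSink.forBulkFormat`.

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

DataStream<String> input = ...; // some upstream source

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("s3a://my-bucket/streaming-output"), new SimpleStringEncoder<String>("UTF-8"))
    .withRollingPolicy(
        DefaultRollingPolicy
            .create()
            .withMaxPartSize(128 * 1024 * 1024)     // roll part files at 128 MB ...
            .withRolloverInterval(60 * 60 * 1000)   // ... or after one hour ...
            .withInactivityInterval(10 * 60 * 1000) // ... or after 10 minutes without new data
            .build())
    .build();

input.addSink(sink);
```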