From 2190c6d5bda2465993bae1d1486bf259a8c5a0ee Mon Sep 17 00:00:00 2001
From: Kostas Kloudas
Date: Wed, 7 Nov 2018 16:27:02 +0100
Subject: [PATCH] [FLINK-10803][docs] Update the StreamingFileSink documentation for S3.

---
 docs/dev/connectors/streamfile_sink.md | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/docs/dev/connectors/streamfile_sink.md b/docs/dev/connectors/streamfile_sink.md
index aea66c3cc48..8f50675ccbc 100644
--- a/docs/dev/connectors/streamfile_sink.md
+++ b/docs/dev/connectors/streamfile_sink.md
@@ -24,16 +24,25 @@ under the License.
 -->
 
 This connector provides a Sink that writes partitioned files to filesystems
-supported by the Flink `FileSystem` abstraction. Since in streaming the input
-is potentially infinite, the streaming file sink writes data into buckets. The
-bucketing behaviour is configurable but a useful default is time-based
+supported by the [Flink `FileSystem` abstraction]({{ site.baseurl }}/ops/filesystems.html).
+
+Important Note: For S3, the `StreamingFileSink` supports only the
+[Hadoop-based](https://hadoop.apache.org/) FileSystem implementation, not the one
+based on [Presto](https://prestodb.io/). If your job uses the `StreamingFileSink`
+to write to S3 but relies on the Presto-based implementation for checkpointing, it
+is advised to explicitly use *"s3a://"* (for Hadoop) as the scheme for the sink's
+target path and *"s3p://"* (for Presto) for checkpointing. Using *"s3://"* for both
+may lead to unpredictable behavior, as both implementations "listen" to that scheme.
+
+Since in streaming the input is potentially infinite, the streaming file sink writes data
+into buckets. The bucketing behaviour is configurable but a useful default is time-based
 bucketing where we start writing a new bucket every hour and thus get
 individual files that each contain a part of the infinite output stream.
 
 Within a bucket, we further split the output into smaller part files based on a
 rolling policy. This is useful to prevent individual bucket files from getting
 too big. This is also configurable but the default policy rolls files based on
-file size and a timeout, i.e if no new data was written to a part file.
+file size and a timeout, *i.e.*, if no new data was written to a part file.
 
 The `StreamingFileSink` supports both row-wise encoding formats and
 bulk-encoding formats, such as [Apache Parquet](http://parquet.apache.org).
--
GitLab
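
For illustration, a minimal sketch of the scheme split the patched note recommends: the sink writes through the Hadoop-based *"s3a://"* FileSystem while checkpoints go through the Presto-based *"s3p://"* one. The bucket and path names are hypothetical, and the sketch assumes the Flink 1.7-era `StreamingFileSink` and `FsStateBackend` APIs that this patch targets; it is not part of the patch itself.

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SchemesExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints use the Presto-based S3 FileSystem via the "s3p://" scheme
        // (bucket name is hypothetical).
        env.enableCheckpointing(60_000L);
        env.setStateBackend(new FsStateBackend("s3p://my-bucket/checkpoints"));

        // The sink writes through the Hadoop-based S3 FileSystem via "s3a://",
        // the only S3 implementation the StreamingFileSink supports.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3a://my-bucket/output"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("streaming-file-sink-s3");
    }
}
```

Because both the Hadoop and Presto implementations register themselves for the plain *"s3://"* scheme, spelling out *"s3a://"* and *"s3p://"* pins each path to exactly one implementation, which is the behavior the note above is warning about.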