未验证 提交 f3da8755 编写于 作者: 片刻小哥哥's avatar 片刻小哥哥 提交者: GitHub

Merge pull request #43 from apachecn/feature/flink_1.7_doc_zh_17

17 完成
# Generating Timestamps / Watermarks
# Generating Timestamps / Watermarks 生成时间戳/水印
This section is relevant for programs running on **event time**. For an introduction to _event time_, _processing time_, and _ingestion time_, please refer to the [introduction to event time](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html).
本节与**事件时间**上运行的程序相关。有关_event time_, _processing time_, 和 _ingestion time_ 的介绍,请参阅[介绍事件时间](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html)
To work with _event time_, streaming programs need to set the _time characteristic_ accordingly.
要使用 _event time_,流程序需要相应地设置 _time characteristic_ 。
......@@ -24,26 +24,26 @@ env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
## Assigning Timestamps
## Assigning Timestamps 分配时间戳
In order to work with _event time_, Flink needs to know the events’ _timestamps_, meaning each element in the stream needs to have its event timestamp _assigned_. This is usually done by accessing/extracting the timestamp from some field in the element.
为了处理 _event time_,Flink需要知道事件 _timestamps_ ,这意味着流中的每个元素都需要有其事件时间戳 _assigned_。这通常是通过从元素中的某个字段访问/提取时间戳来完成的。
Timestamp assignment goes hand-in-hand with generating watermarks, which tell the system about progress in event time.
时间戳分配与生成水印同时进行,水印可以告诉系统事件时间的进展。
There are two ways to assign timestamps and generate watermarks:
有两种分配时间戳和生成水印的方法:
1. Directly in the data stream source
2. Via a timestamp assigner / watermark generator: in Flink, timestamp assigners also define the watermarks to be emitted
1. 直接在数据流源中
2. 通过时间戳分配器/水印生成器:在FLink中,时间戳分配器还定义要发射的水印
Attention Both timestamps and watermarks are specified as milliseconds since the Java epoch of 1970-01-01T00:00:00Z.
注意时间戳和水印都被指定为自1970-01-01T00:00:00Z的Java时期以来的毫秒。
### Source Functions with Timestamps and Watermarks
### Source Functions with Timestamps and Watermarks 具有时间戳和水印的源函数
Stream sources can directly assign timestamps to the elements they produce, and they can also emit watermarks. When this is done, no timestamp assigner is needed. Note that if a timestamp assigner is used, any timestamps and watermarks provided by the source will be overwritten.
流源可以直接为它们生成的元素分配时间戳,还可以发出水印。完成此操作后,不需要时间戳分配器。注意,如果使用时间戳分配程序,则源提供的任何时间戳和水印都将被覆盖。
To assign a timestamp to an element in the source directly, the source must use the `collectWithTimestamp(...)` method on the `SourceContext`. To generate watermarks, the source must call the `emitWatermark(Watermark)` function.
要将时间戳直接分配给源中的一个元素,源必须在 `SourceContext`上使用 `collectWithTimestamp(...)`方法。要生成水印,源必须调用`emitWatermark(Watermark)`函数。
Below is a simple example of a _(non-checkpointed)_ source that assigns timestamps and generates watermarks:
以下是分配时间戳并生成水印的 _(non-checkpointed)_ Source的简单示例:
......@@ -80,13 +80,13 @@ override def run(ctx: SourceContext[MyType]): Unit = {
### Timestamp Assigners / Watermark Generators
### Timestamp Assigners / Watermark Generators 时间戳分配器/水印生成器
Timestamp assigners take a stream and produce a new stream with timestamped elements and watermarks. If the original stream had timestamps and/or watermarks already, the timestamp assigner overwrites them.
时间戳分配者获取一个流并生成一个具有时间戳元素和水印的新流。如果原始流已经具有时间戳和/或水印,则时间戳分配程序将覆盖它们。
Timestamp assigners are usually specified immediately after the data source, but it is not strictly required to do so. A common pattern, for example, is to parse (_MapFunction_) and filter (_FilterFunction_) before the timestamp assigner. In any case, the timestamp assigner needs to be specified before the first operation on event time (such as the first window operation). As a special case, when using Kafka as the source of a streaming job, Flink allows the specification of a timestamp assigner / watermark emitter inside the source (or consumer) itself. More information on how to do so can be found in the [Kafka Connector documentation](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/kafka.html).
时间戳分配程序通常是在数据源之后立即指定的,但并不严格要求这样做。例如,一个常见的模式是在时间戳分配器之前解析(_MapFunction_)和过滤器(_FilterFunction_)。在任何情况下,时间戳分配程序都需要在事件时间上的第一个操作(例如第一个窗口操作)之前指定。作为特例,当使用Kafka作为流作业的源时,Flink允许指定源(或使用者)内部的时间戳分配器/水印发射器。有关如何这样做的更多信息,可以在[Kafka连接器documentation](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/kafka.html).]中找到。
**NOTE:** The remainder of this section presents the main interfaces a programmer has to implement in order to create her own timestamp extractors/watermark emitters. To see the pre-implemented extractors that ship with Flink, please refer to the [Pre-defined Timestamp Extractors / Watermark Emitters](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamp_extractors.html) page.
**注:** 本部分的其余部分介绍了程序员必须实施的主要接口,以便创建自己的时间戳提取器/水印发射器。要查看带有FLink的预实现提取器,请参阅[预定义的时间戳提取器/水印发射器](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamp_extractors.html)页面。
......@@ -134,13 +134,13 @@ withTimestampsAndWatermarks
#### **With Periodic Watermarks**
#### **With Periodic Watermarks 带有周期性水印**
`AssignerWithPeriodicWatermarks` assigns timestamps and generates watermarks periodically (possibly depending on the stream elements, or purely based on processing time).
`AssignerWithPeriodicWatermarks`分配时间戳并定期生成水印(可能取决于流元素,也可能完全取决于处理时间)。
The interval (every _n_ milliseconds) in which the watermark will be generated is defined via `ExecutionConfig.setAutoWatermarkInterval(...)`. The assigner’s `getCurrentWatermark()` method will be called each time, and a new watermark will be emitted if the returned watermark is non-null and larger than the previous watermark.
生成水印的间隔(每 _n_ 毫秒)是通过`ExecutionConfig.setAutoWatermarkInterval(...)`定义的。如果返回的水印是非null且大于先前的水印,则将每次调用分配器的`getCurrentWatermark()` 方法,并且将发出新的水印。
Here we show two simple examples of timestamp assigners that use periodic watermark generation. Note that Flink ships with a `BoundedOutOfOrdernessTimestampExtractor` similar to the `BoundedOutOfOrdernessGenerator` shown below, which you can read about [here](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamp_extractors.html#assigners-allowing-a-fixed-amount-of-lateness).
这里我们展示了两个使用周期性水印生成的时间戳分配程序的简单示例。请注意,Flink附带了`BoundedOutOfOrdernessTimestampExtractor`,类似于下面所示的`BoundedOutOfOrdernessGenerator` ,您可以阅读有关[here](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_timestamp_extractors.html#assigners-allowing-a-fixed-amount-of-lateness).的内容。
......@@ -149,6 +149,9 @@ Here we show two simple examples of timestamp assigners that use periodic waterm
* This generator generates watermarks assuming that elements arrive out of order,
* but only to a certain degree. The latest elements for a certain timestamp t will arrive
* at most n milliseconds after the earliest elements for timestamp t.
* 当元件无序到达时,该发生器产生水印,
* 但仅在一定程度上。某个时间戳T的最新元素将到达
* 在时间戳T最早的元素之后的至多N毫秒。
*/
public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWatermarks<MyEvent> {
......@@ -173,6 +176,8 @@ public class BoundedOutOfOrdernessGenerator implements AssignerWithPeriodicWater
/**
* This generator generates watermarks that are lagging behind processing time by a fixed amount.
* It assumes that elements arrive in Flink after a bounded delay.
* 该生成器生成落后于处理时间的水印。
* 假定元素在有界延迟之后到达Flink。
*/
public class TimeLagWatermarkGenerator implements AssignerWithPeriodicWatermarks<MyEvent> {
......@@ -200,6 +205,8 @@ public class TimeLagWatermarkGenerator implements AssignerWithPeriodicWatermarks
* This generator generates watermarks assuming that elements arrive out of order,
* but only to a certain degree. The latest elements for a certain timestamp t will arrive
* at most n milliseconds after the earliest elements for timestamp t.
* 这个生成器产生水印,假设元素到达不正常,
* 但只在一定程度上。某个时间戳t的最新元素将在时间戳t的最早元素之后最多n毫秒到达。
*/
class BoundedOutOfOrdernessGenerator extends AssignerWithPeriodicWatermarks[MyEvent] {
......@@ -221,6 +228,8 @@ class BoundedOutOfOrdernessGenerator extends AssignerWithPeriodicWatermarks[MyEv
/**
* This generator generates watermarks that are lagging behind processing time by a fixed amount.
* It assumes that elements arrive in Flink after a bounded delay.
* 该生成器生成落后于处理时间的水印。
* 假定元素在有界延迟之后到达Flink。
*/
class TimeLagWatermarkGenerator extends AssignerWithPeriodicWatermarks[MyEvent] {
......@@ -238,11 +247,11 @@ class TimeLagWatermarkGenerator extends AssignerWithPeriodicWatermarks[MyEvent]
#### **With Punctuated Watermarks**
#### **With Punctuated Watermarks 加上标点符号 **
To generate watermarks whenever a certain event indicates that a new watermark might be generated, use `AssignerWithPunctuatedWatermarks`. For this class Flink will first call the `extractTimestamp(...)` method to assign the element a timestamp, and then immediately call the `checkAndGetNextWatermark(...)` method on that element.
若要在某一事件表明可能生成新水印时生成水印,请使用 `AssignerWithPunctuatedWatermarks`。对于这个类,Flink将首先调用`extractTimestamp(...)`方法为元素分配一个时间戳,然后立即调用该元素上的`checkAndGetNextWatermark(...)` 方法。
The `checkAndGetNextWatermark(...)` method is passed the timestamp that was assigned in the `extractTimestamp(...)` method, and can decide whether it wants to generate a watermark. Whenever the `checkAndGetNextWatermark(...)` method returns a non-null watermark, and that watermark is larger than the latest previous watermark, that new watermark will be emitted.
`checkAndGetNextWatermark(...)` 方法通过了在`extractTimestamp(...)` 方法中分配的时间戳,并可以决定它是否希望生成水印。每当`checkAndGetNextWatermark(...)`方法返回非零水印,并且该水印大于最新的前一个水印时,将发射新的水印。
......@@ -280,17 +289,17 @@ class PunctuatedAssigner extends AssignerWithPunctuatedWatermarks[MyEvent] {
_Note:_ It is possible to generate a watermark on every single event. However, because each watermark causes some computation downstream, an excessive number of watermarks degrades performance.
_注意:_ 可以在每个单个事件上生成水印。然而,由于每个水印导致一些下游的计算,过多的水印会降低性能。
## Timestamps per Kafka Partition
## Timestamps per Kafka Partition 每个卡夫卡分区的##时间戳
When using [Apache Kafka](connectors/kafka.html) as a data source, each Kafka partition may have a simple event time pattern (ascending timestamps or bounded out-of-orderness). However, when consuming streams from Kafka, multiple partitions often get consumed in parallel, interleaving the events from the partitions and destroying the per-partition patterns (this is inherent in how Kafka’s consumer clients work).
当使用[ApacheKafka](连接器/kafka.html)作为数据源时,每个Kafka分区可能有一个简单的事件时间模式(升序时间戳或超出顺序的界限)。然而,当使用Kafka的流时,多个分区常常被并行地消耗,将事件与分区交织在一起,并破坏每个分区模式(这是Kafka的消费者客户端工作方式固有的)。
In that case, you can use Flink’s Kafka-partition-aware watermark generation. Using that feature, watermarks are generated inside the Kafka consumer, per Kafka partition, and the per-partition watermarks are merged in the same way as watermarks are merged on stream shuffles.
在这种情况下,您可以使用flink的卡夫卡分区感知水印生成.使用该特性,在Kafka使用者内部生成水印,每个Kafka分区,并且每个分区的水印以与流洗牌上合并水印相同的方式合并。
For example, if event timestamps are strictly ascending per Kafka partition, generating per-partition watermarks with the [ascending timestamps watermark generator](event_timestamp_extractors.html#assigners-with-ascending-timestamps) will result in perfect overall watermarks.
例如,如果事件时间戳是严格上升的每卡夫卡分区,生成每个分区的水印与[上升的时间戳水印generator](event_timestamp_extractors.html#assigners-with-ascending-timestamps)将导致完美的整体水印。
The illustrations below show how to use the per-Kafka-partition watermark generation, and how watermarks propagate through the streaming dataflow in that case.
下图展示了如何使用per-Kafka分割水印生成,以及在这种情况下水印如何通过流数据流传播。
......@@ -322,5 +331,5 @@ val stream: DataStream[MyType] = env.addSource(kafkaSource)
![Generating Watermarks with awareness for Kafka-partitions](../img/parallel_kafka_watermarks.svg)
![通过对Kafka-Partitions的感知生成水印](../IMG/parallel_kafka_watermarks.svg)
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册