未验证 提交 a6167a7c 编写于 作者: I Islotus 提交者: GitHub

Merge pull request #10 from Islotus/youjun/flink_1.7_doc_zh_61

Youjun/flink 1.7 doc zh 61
# Time Attributes
# 时间属性
Flink is able to process streaming data based on different notions of _time_.
Flink能够根据_time_的不同概念处理流式数据。
* _Processing time_ refers to the system time of the machine (also known as “wall-clock time”) that is executing the respective operation.
* _Event time_ refers to the processing of streaming data based on timestamps which are attached to each row. The timestamps can encode when an event happened.
* _Ingestion time_ is the time that events enter Flink; internally, it is treated similarly to event time.
* _Processing time_ 指的是执行相应操作的机器的系统时间(也称为“挂钟时间”)。
* _Event time_ 指的是基于附加到每一行的时间戳流式数据处理。时间戳可以在事件发生时进行编码。
* _Ingestion time_ 是事件进入Flink的时间; 在内部,它与事件时间类似地对待。
For more information about time handling in Flink, see the introduction about [Event Time and Watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html).
有关Flink中时间处理的更多信息,请参阅有关 [Event Time and Watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html)的介绍。
This pages explains how time attributes can be defined for time-based operations in Flink’s Table API & SQL.
本页介绍了如何在Flink的Table API和SQL中为基于时间的操作定义时间属性。
## Introduction to Time Attributes
## 时间属性简介
Time-based operations such as windows in both the [Table API](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/tableApi.html#group-windows) and [SQL](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/sql.html#group-windows) require information about the notion of time and its origin. Therefore, tables can offer _logical time attributes_ for indicating time and accessing corresponding timestamps in table programs.
基于时间的操作,例如 [Table API](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/tableApi.html#group-windows)[SQL](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/sql.html#group-windows) 中的窗口,需要有关时间概念及其来源的信息。因此,表可以提供 _logical time attributes_ 用于指示时间和访问表程序中的相应时间戳。
Time attributes can be part of every table schema. They are defined when creating a table from a `DataStream` or are pre-defined when using a `TableSource`. Once a time attribute has been defined at the beginning, it can be referenced as a field and can be used in time-based operations.
时间属性可以是每个表模式的一部分。它们是在从 `DataStream` 创建表时定义的,或者是在使用 `TableSource` 时预定义的。一旦在开头定义了时间属性,它就可以作为字段引用,并且可以在基于时间的操作中使用。
As long as a time attribute is not modified and is simply forwarded from one part of the query to another, it remains a valid time attribute. Time attributes behave like regular timestamps and can be accessed for calculations. If a time attribute is used in a calculation, it will be materialized and becomes a regular timestamp. Regular timestamps do not cooperate with Flink’s time and watermarking system and thus can not be used for time-based operations anymore.
只要时间属性未被修改并且只是从查询的一部分转发到另一部分,它仍然是有效的时间属性 时间属性的行为类似于常规时间戳,可以访问以进行计算。如果在计算中使用了时间属性,则它将具体化并成为常规时间戳。常规时间戳不与 Flink 的时间和水印系统配合,因此不能再用于基于时间的操作。
Table programs require that the corresponding time characteristic has been specified for the streaming environment:
表程序要求为流式环境指定相应的时间特性:
......@@ -41,22 +41,22 @@ env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime); // default
```
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // default
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime) // default
// alternatively:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime) // env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
```
## Processing time
## 处理时间
Processing time allows a table program to produce results based on the time of the local machine. It is the simplest notion of time but does not provide determinism. It neither requires timestamp extraction nor watermark generation.
处理时间允许表程序根据本地机器的时间产生结果。这是最简单的时间概念,但不提供决定论。它既不需要时间戳提取也不需要水印生成。
There are two ways to define a processing time attribute.
有两种方法可以定义处理时间属性。
### During DataStream-to-Table Conversion
### 在 DataStream 到 Table 转换期间
The processing time attribute is defined with the `.proctime` property during schema definition. The time attribute must only extend the physical schema by an additional logical field. Thus, it can only be defined at the end of the schema definition.
处理时间属性在模式定义期间使用 `.proctime` 属性定义。time 属性只能通过附加的逻辑字段扩展物理模式。因此,它只能在模式定义的末尾定义。
......@@ -83,9 +83,9 @@ val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'u
### Using a TableSource
### 使用 TableSource
The processing time attribute is defined by a `TableSource` that implements the `DefinedProctimeAttribute` interface. The logical time attribute is appended to the physical schema defined by the return type of the `TableSource`.
处理时间属性由实现 `DefinedProctimeAttribute` 接口的 `TableSource` 定义。逻辑时间属性附加到由 `TableSource` 的返回类型定义的物理模式。
......@@ -154,26 +154,26 @@ val windowedTable = tEnv
## Event time
## 事件时间
Event time allows a table program to produce results based on the time that is contained in every record. This allows for consistent results even in case of out-of-order events or late events. It also ensures replayable results of the table program when reading records from persistent storage.
事件时间允许表程序根据每个记录中包含的时间生成结果。即使在无序事件或延迟事件的情况下,这也允许一致的结果。当从持久存储中读取记录时,它还确保表程序的可重放结果。
Additionally, event time allows for unified syntax for table programs in both batch and streaming environments. A time attribute in a streaming environment can be a regular field of a record in a batch environment.
此外,事件时间允许批处理和流式处理环境中的表程序的统一语法。流式处理环境中的时间属性可以是批处理环境中的记录的常规字段。
In order to handle out-of-order events and distinguish between on-time and late events in streaming, Flink needs to extract timestamps from events and make some kind of progress in time (so-called [watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html)).
为了处理乱序事件并区分流式数据中的准时和迟到事件,Flink 需要从事件中提取时间戳并及时取得某种进展(所谓的[watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html))。
An event time attribute can be defined either during DataStream-to-Table conversion or by using a TableSource.
可以在 DataStream 到 Table 转换期间或使用 TableSource 定义事件时间属性。
### During DataStream-to-Table Conversion
### DataStream 到 Table 转换期间
The event time attribute is defined with the `.rowtime` property during schema definition. [Timestamps and watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html) must have been assigned in the `DataStream` that is converted.
在模式定义期间使用 `.rowtime` 属性定义事件时间属性。[Timestamps and watermarks](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/event_time.html) 必须在已转换的 `DataStream` 中分配。
There are two ways of defining the time attribute when converting a `DataStream` into a `Table`. Depending on whether the specified `.rowtime` field name exists in the schema of the `DataStream` or not, the timestamp field is either
`DataStream` 转换为 `Table` 时有两种定义时间属性的方法。根据指定的 `.rowtime` 字段名称是否存在于 `DataStream` 的模式中,时间戳字段要么是
* appended as a new field to the schema or
* replaces an existing field.
* 作为新字段附加到模式或
* 替换现有字段。
In either case the event time timestamp field will hold the value of the `DataStream` event time timestamp.
在任何一种情况下,事件时间时间戳字段都将保存 `DataStream` 事件时间戳的值。
......@@ -205,28 +205,28 @@ WindowedTable windowedTable = table.window(Tumble.over("10.minutes").on("UserAct
```
// Option 1:
// Option 1:
// extract timestamp and assign watermarks based on knowledge of the stream val stream: DataStream[(String, String)] = inputStream.assignTimestampsAndWatermarks(...)
// declare an additional logical field as an event time attribute val table = tEnv.fromDataStream(stream, 'Username, 'Data, 'UserActionTime.rowtime)
// Option 2:
// Option 2:
// extract timestamp from first field, and assign watermarks based on knowledge of the stream val stream: DataStream[(Long, String, String)] = inputStream.assignTimestampsAndWatermarks(...)
// the first field has been used for timestamp extraction, and is no longer necessary
// replace first field with a logical event time attribute val table = tEnv.fromDataStream(stream, 'UserActionTime.rowtime, 'Username, 'Data)
// Usage:
// Usage:
val windowedTable = table.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```
### Using a TableSource
### 使用 TableSource
The event time attribute is defined by a `TableSource` that implements the `DefinedRowtimeAttributes` interface. The `getRowtimeAttributeDescriptors()` method returns a list of `RowtimeAttributeDescriptor` for describing the final name of a time attribute, a timestamp extractor to derive the values of the attribute, and the watermark strategy associated with the attribute.
事件时间属性由实现 `DefinedRowtimeAttributes` 接口的 `TableSource` 定义。`getRowtimeAttributeDescriptors()` 方法返回一个用于描述时间属性的最终名称 `RowtimeAttributeDescriptor` 列表,一个用于派生属性值的时间戳提取器,以及与该属性相关的 watermark 策略。
Please make sure that the `DataStream` returned by the `getDataStream()` method is aligned with the defined time attribute. The timestamps of the `DataStream` (the ones which are assigned by a `TimestampAssigner`) are only considered if a `StreamRecordTimestamp` timestamp extractor is defined. Watermarks of a `DataStream` are only preserved if a `PreserveWatermarks` watermark strategy is defined. Otherwise, only the values of the `TableSource`’s rowtime attribute are relevant.
请确保 `getDataStream()` 方法返回的 `DataStream` 与定义的 time 属性对齐。只有在定义了 `StreamRecordTimestamp` 时间戳提取器时,才会考虑 `DataStream` 的时间戳(由 `TimestampAssigner` 分配的时间戳)。只有在定义了 `PreserveWatermarks` 水印策略时,才会保留 `DataStream` 的水印。 否则,只有 `TableSource` 的 rowtime 属性的值是相关的。
......@@ -306,6 +306,3 @@ val windowedTable = tEnv
.scan("UserActions")
.window(Tumble over 10.minutes on 'UserActionTime as 'userActionWindow)
```
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册