Unverified commit 24a56404, authored by 片刻小哥哥, committed by GitHub

Merge pull request #51 from apachecn/feature/flink_1.7_doc_zh_20

20 complete
# Working with State
This document explains how to use Flink’s state abstractions when developing an application.
## Keyed State and Operator State
There are two basic kinds of state in Flink: `Keyed State` and `Operator State`.
### Keyed State
_Keyed State_ is always relative to keys and can only be used in functions and operators on a `KeyedStream`.
You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of `<parallel-operator-instance, key>`, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as `<operator, key>`.
Keyed State is further organized into so-called _Key Groups_. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.
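The relationship between keys, Key Groups and parallel instances can be sketched in plain Java. This is a simplified model, not Flink's code: the class `KeyGroupSketch` and its method names are hypothetical, a plain `floorMod` of `hashCode()` stands in for whatever hash Flink applies internally, and the contiguous-range formula is an assumption chosen for illustration.

```java
/** Plain-Java model of key-group assignment; hypothetical, not a Flink class. */
final class KeyGroupSketch {

    /** Maps a key to one of maxParallelism key groups (assumption: simple floorMod hash). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the parallel instance owning it, using contiguous ranges. */
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 8; // number of key groups, fixed for the lifetime of the job
        for (int parallelism : new int[] {2, 4}) {
            for (int keyGroup = 0; keyGroup < maxParallelism; keyGroup++) {
                System.out.println("parallelism=" + parallelism
                        + " keyGroup=" + keyGroup
                        + " -> instance " + operatorIndexFor(keyGroup, maxParallelism, parallelism));
            }
        }
    }
}
```

Because a key always lands in the same Key Group, changing the parallelism only changes which instance owns each group; the keys inside a group move together.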
### Operator State
With _Operator State_ (or _non-keyed state_), each operator state is bound to one parallel operator instance. The [Kafka Connector](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/kafka.html) is a good motivating example for the use of Operator State in Flink. Each parallel instance of the Kafka consumer maintains a map of topic partitions and offsets as its Operator State.
The Operator State interfaces support redistributing state among parallel operator instances when the parallelism is changed. There can be different schemes for doing this redistribution.
## Raw and Managed State
_Keyed State_ and _Operator State_ exist in two forms: _managed_ and _raw_.
_Managed State_ is represented in data structures controlled by the Flink runtime, such as internal hash tables, or RocksDB. Examples are `ValueState`, `ListState`, etc. Flink’s runtime encodes the states and writes them into the checkpoints.
_Raw State_ is state that operators keep in their own data structures. When checkpointed, they only write a sequence of bytes into the checkpoint. Flink knows nothing about the state’s data structures and sees only the raw bytes.
All DataStream functions can use managed state, but the raw state interfaces can only be used when implementing operators. Using managed state (rather than raw state) is recommended, since with managed state Flink is able to automatically redistribute state when the parallelism is changed, and also do better memory management.
**Attention:** If your managed state needs custom serialization logic, please see the [corresponding guide](custom_serialization.html) in order to ensure future compatibility. Flink’s default serializers don’t need special treatment.
## Using Managed Keyed State
The managed keyed state interface provides access to different types of state that are all scoped to the key of the current input element. This means that this type of state can only be used on a `KeyedStream`, which can be created via `stream.keyBy(…)`.
We will first look at the different types of state available and then see how they can be used in a program. The available state primitives are:
* `ValueState<T>`: This keeps a value that can be updated and retrieved (scoped to the key of the input element as mentioned above, so there will possibly be one value for each key that the operation sees). The value can be set using `update(T)` and retrieved using `T value()`.
* `ListState<T>`: This keeps a list of elements. You can append elements and retrieve an `Iterable` over all currently stored elements. Elements are added using `add(T)` or `addAll(List<T>)`, and the `Iterable` can be retrieved using `Iterable<T> get()`. You can also override the existing list with `update(List<T>)`.
* `ReducingState<T>`: This keeps a single value that represents the aggregation of all values added to the state. The interface is similar to `ListState` but elements added using `add(T)` are reduced to an aggregate using a specified `ReduceFunction`.
* `AggregatingState<IN, OUT>`: This keeps a single value that represents the aggregation of all values added to the state. Contrary to `ReducingState`, the aggregate type may be different from the type of elements that are added to the state. The interface is the same as for `ListState` but elements added using `add(IN)` are aggregated using a specified `AggregateFunction`.
* `FoldingState<T, ACC>`: This keeps a single value that represents the aggregation of all values added to the state. Contrary to `ReducingState`, the aggregate type may be different from the type of elements that are added to the state. The interface is similar to `ListState` but elements added using `add(T)` are folded into an aggregate using a specified `FoldFunction`.
* `MapState<UK, UV>`: This keeps a list of mappings. You can put key-value pairs into the state and retrieve an `Iterable` over all currently stored mappings. Mappings are added using `put(UK, UV)` or `putAll(Map<UK, UV>)`. The value associated with a user key can be retrieved using `get(UK)`. The iterable views for mappings, keys and values can be retrieved using `entries()`, `keys()` and `values()` respectively.
All types of state also have a method `clear()` that clears the state for the currently active key, i.e. the key of the input element.
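To make the key scoping concrete, here is a plain-Java model of how a reducing state could behave. It is only a sketch under stated assumptions: `ReducingStateSketch` and `setCurrentKey` are hypothetical names, a `HashMap` stands in for the state backend, and in Flink the runtime sets the current key rather than the user.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java model of a key-scoped reducing state; hypothetical, not a Flink class. */
final class ReducingStateSketch<K, T> {
    private final Map<K, T> perKey = new HashMap<>(); // stand-in for the state backend
    private final BinaryOperator<T> reduce;           // plays the role of a ReduceFunction
    private K currentKey;                             // in Flink this is set by the runtime

    ReducingStateSketch(BinaryOperator<T> reduce) {
        this.reduce = reduce;
    }

    void setCurrentKey(K key) {
        currentKey = key;
    }

    /** Folds the new element into the aggregate stored for the current key. */
    void add(T value) {
        perKey.merge(currentKey, value, reduce);
    }

    /** Returns the aggregate for the current key, or null if nothing is stored. */
    T get() {
        return perKey.get(currentKey);
    }

    /** Clears only the entry of the current key; other keys are untouched. */
    void clear() {
        perKey.remove(currentKey);
    }

    public static void main(String[] args) {
        ReducingStateSketch<String, Integer> sum = new ReducingStateSketch<>(Integer::sum);
        sum.setCurrentKey("a");
        sum.add(1);
        sum.add(2);
        sum.setCurrentKey("b");
        sum.add(10);
        System.out.println("b=" + sum.get());
        sum.setCurrentKey("a");
        System.out.println("a=" + sum.get());
    }
}
```

The same state object yields different values depending on which key is active, and `clear()` affects only that key.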
**Attention:** `FoldingState` and `FoldingStateDescriptor` have been deprecated in Flink 1.4 and will be completely removed in the future. Please use `AggregatingState` and `AggregatingStateDescriptor` instead.
It is important to keep in mind that these state objects are only used for interfacing with state. The state is not necessarily stored inside but might reside on disk or somewhere else. The second thing to keep in mind is that the value you get from the state depends on the key of the input element. So the value you get in one invocation of your user function can differ from the value in another invocation if the keys involved are different.
To get a state handle, you have to create a `StateDescriptor`. This holds the name of the state (as we will see later, you can create several states, and they have to have unique names so that you can reference them), the type of the values that the state holds, and possibly a user-specified function, such as a `ReduceFunction`. Depending on what type of state you want to retrieve, you create either a `ValueStateDescriptor`, a `ListStateDescriptor`, a `ReducingStateDescriptor`, an `AggregatingStateDescriptor`, a `FoldingStateDescriptor` or a `MapStateDescriptor`.
State is accessed using the `RuntimeContext`, so it is only possible in _rich functions_. Please see [here](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/api_concepts.html#rich-functions) for information about that, but we will also see an example shortly. The `RuntimeContext` that is available in a `RichFunction` has these methods for accessing state:
* `ValueState<T> getState(ValueStateDescriptor<T>)`
* `ReducingState<T> getReducingState(ReducingStateDescriptor<T>)`
......@@ -69,7 +70,7 @@ State is accessed using the `RuntimeContext`, so it is only possible in _rich fu
* `FoldingState<T, ACC> getFoldingState(FoldingStateDescriptor<T, ACC>)`
* `MapState<UK, UV> getMapState(MapStateDescriptor<UK, UV>)`
This is an example `FlatMapFunction` that shows how all of the parts fit together:
......@@ -183,15 +184,15 @@ object ExampleCountWindowAverage extends App {
This example implements a poor man’s counting window. We key the tuples by the first field (in the example all have the same key `1`). The function stores the count and a running sum in a `ValueState`. Once the count reaches 2 it will emit the average and clear the state so that we start over from `0`. Note that this would keep a different state value for each different input key if we had tuples with different values in the first field.
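The logic described above can be modelled without any Flink dependency. The sketch below is not the example from the documentation: `CountWindowAverageSketch` is a hypothetical class in which a `HashMap` from key to `{count, sum}` stands in for the keyed `ValueState`, and the input tuples are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the counting-window logic; hypothetical, not a Flink class. */
final class CountWindowAverageSketch {
    // Stand-in for keyed ValueState: key -> {count, running sum}.
    private final Map<Long, long[]> state = new HashMap<>();

    /** Handles one (key, value) tuple; returns {key, average} once two values were seen, else null. */
    long[] flatMap(long key, long value) {
        long[] acc = state.getOrDefault(key, new long[] {0L, 0L});
        acc[0] += 1;     // update the count
        acc[1] += value; // update the running sum
        if (acc[0] >= 2) {
            state.remove(key); // clear the state so this key starts over from 0
            return new long[] {key, acc[1] / acc[0]};
        }
        state.put(key, acc);
        return null;
    }

    /** Feeds all tuples through one function instance and collects the emitted results. */
    static List<long[]> run(long[][] input) {
        CountWindowAverageSketch fn = new CountWindowAverageSketch();
        List<long[]> out = new ArrayList<>();
        for (long[] tuple : input) {
            long[] result = fn.flatMap(tuple[0], tuple[1]);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] input = {{1, 3}, {1, 5}, {1, 7}, {1, 4}, {1, 2}};
        for (long[] r : run(input)) {
            System.out.println("(" + r[0] + "," + r[1] + ")"); // prints (1,4) then (1,5)
        }
    }
}
```

The last tuple `(1, 2)` produces no output because its window is still waiting for a second element.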
### State Time-To-Live (TTL)
A _time-to-live_ (TTL) can be assigned to the keyed state of any type. If a TTL is configured and a state value has expired, the stored value will be cleaned up on a best-effort basis, which is discussed in more detail below.
All state collection types support per-entry TTLs. This means that list elements and map entries expire independently.
In order to use state TTL one must first build a `StateTtlConfig` configuration object. The TTL functionality can then be enabled in any state descriptor by passing the configuration:
......@@ -231,43 +232,43 @@ stateDescriptor.enableTimeToLive(ttlConfig)
The configuration has several options to consider:
The first parameter of the `newBuilder` method is mandatory; it is the time-to-live value.
The update type configures when the state TTL is refreshed (by default `OnCreateAndWrite`):
* `StateTtlConfig.UpdateType.OnCreateAndWrite` - only on creation and write access
* `StateTtlConfig.UpdateType.OnReadAndWrite` - also on read access
The state visibility configures whether the expired value is returned on read access if it is not cleaned up yet (by default `NeverReturnExpired`):
* `StateTtlConfig.StateVisibility.NeverReturnExpired` - expired value is never returned
* `StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp` - returned if still available
In case of `NeverReturnExpired`, the expired state behaves as if it does not exist anymore, even if it still has to be removed. The option can be useful for use cases where data has to become unavailable for read access strictly after TTL, e.g. applications working with privacy-sensitive data.
The other option, `ReturnExpiredIfNotCleanedUp`, allows the expired state to be returned before it is cleaned up.
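How the update type and the state visibility interact can be illustrated with a small plain-Java model. This is a sketch, not Flink's implementation: `TtlValueSketch` and its enums are hypothetical and only mirror the option names above, the clock is injected so the behaviour is deterministic, and removing an expired value when it is read is an assumption of the model.

```java
import java.util.function.LongSupplier;

/** Plain-Java model of one TTL-guarded value; hypothetical, not a Flink class. */
final class TtlValueSketch<T> {
    enum UpdateType { ON_CREATE_AND_WRITE, ON_READ_AND_WRITE }
    enum Visibility { NEVER_RETURN_EXPIRED, RETURN_EXPIRED_IF_NOT_CLEANED_UP }

    private final long ttlMillis;
    private final UpdateType updateType;
    private final Visibility visibility;
    private final LongSupplier clock; // injected so the example does not depend on wall-clock time
    private T value;
    private long lastRefresh;         // timestamp stored alongside the user value

    TtlValueSketch(long ttlMillis, UpdateType updateType, Visibility visibility, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.updateType = updateType;
        this.visibility = visibility;
        this.clock = clock;
    }

    /** Creation and write access always refresh the timestamp. */
    void update(T newValue) {
        value = newValue;
        lastRefresh = clock.getAsLong();
    }

    /** Read access: applies the visibility rule and, if configured, refreshes the timestamp. */
    T value() {
        if (value == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - lastRefresh >= ttlMillis) {
            T stale = value;
            value = null; // model assumption: an expired value is removed when it is read
            return visibility == Visibility.RETURN_EXPIRED_IF_NOT_CLEANED_UP ? stale : null;
        }
        if (updateType == UpdateType.ON_READ_AND_WRITE) {
            lastRefresh = now;
        }
        return value;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        TtlValueSketch<String> state = new TtlValueSketch<>(
                100, UpdateType.ON_CREATE_AND_WRITE, Visibility.NEVER_RETURN_EXPIRED, () -> now[0]);
        state.update("hello");
        now[0] = 50;
        System.out.println(state.value()); // still within the TTL
        now[0] = 150;
        System.out.println(state.value()); // expired, so the model returns null
    }
}
```

With `ON_READ_AND_WRITE` the read at time 50 would have refreshed the timestamp, so the same value would still be visible at time 120.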
**Notes:**
* The state backends store the timestamp of the last modification along with the user value, which means that enabling this feature increases consumption of state storage. The heap state backend stores an additional Java object with a reference to the user state object and a primitive long value in memory. The RocksDB state backend adds 8 bytes per stored value, list entry or map entry.
* Only TTLs in reference to _processing time_ are currently supported.
* Trying to restore state that was previously configured without TTL using a TTL-enabled descriptor, or vice versa, will lead to a compatibility failure and a `StateMigrationException`.
* The TTL configuration is not part of check- or savepoints but rather a way of how Flink treats it in the currently running job.
* The map state with TTL currently supports null user values only if the user value serializer can handle null values. If the serializer does not support null values, it can be wrapped with `NullableSerializer` at the cost of an extra byte in the serialized form.
#### Cleanup of Expired State
Currently, expired values are only removed when they are read out explicitly, e.g. by calling `ValueState.value()`.
**Attention:** This means that by default if expired state is not read, it won’t be removed, possibly leading to ever growing state. This might change in future releases.
Additionally, you can activate cleanup at the moment a full state snapshot is taken, which reduces the snapshot’s size. Under the current implementation the local state is not cleaned up, but the removed expired state will not be present after restoring from that snapshot. It can be configured in `StateTtlConfig`:
......@@ -297,13 +298,13 @@ val ttlConfig = StateTtlConfig
This option is not applicable for the incremental checkpointing in the RocksDB state backend.
More strategies will be added in the future for cleaning up expired state automatically in the background.
### State in the Scala DataStream API
In addition to the interface described above, the Scala API has shortcuts for stateful `map()` or `flatMap()` functions with a single `ValueState` on `KeyedStream`. The user function gets the current value of the `ValueState` in an `Option` and must return an updated value that will be used to update the state.
......@@ -321,13 +322,14 @@ val counts: DataStream[(String, Int)] = stream
## Using Managed Operator State
To use managed operator state, a stateful function can implement either the more general `CheckpointedFunction` interface, or the `ListCheckpointed<T extends Serializable>` interface.
#### CheckpointedFunction
The `CheckpointedFunction` interface provides access to non-keyed state with different redistribution schemes. It requires the implementation of two methods:
......@@ -339,15 +341,15 @@ void initializeState(FunctionInitializationContext context) throws Exception;
Whenever a checkpoint has to be performed, `snapshotState()` is called. The counterpart, `initializeState()`, is called every time the user-defined function is initialized, be that when the function is first initialized or be that when the function is actually recovering from an earlier checkpoint. Given this, `initializeState()` is not only the place where different types of state are initialized, but also where state recovery logic is included.
Currently, list-style managed operator state is supported. The state is expected to be a `List` of _serializable_ objects, independent from each other, thus eligible for redistribution upon rescaling. In other words, these objects are the finest granularity at which non-keyed state can be redistributed. Depending on the state accessing method, the following redistribution schemes are defined:
* **Even-split redistribution:** Each operator returns a List of state elements. The whole state is logically a concatenation of all lists. On restore/redistribution, the list is evenly divided into as many sublists as there are parallel operators. Each operator gets a sublist, which can be empty, or contain one or more elements. As an example, if with parallelism 1 the checkpointed state of an operator contains elements `element1` and `element2`, when increasing the parallelism to 2, `element1` may end up in operator instance 0, while `element2` will go to operator instance 1.
* **Union redistribution:** Each operator returns a List of state elements. The whole state is logically a concatenation of all lists. On restore/redistribution, each operator gets the complete list of state elements.
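The two schemes above can be compared side by side in a plain-Java model. This is a sketch only: `RedistributionSketch` is a hypothetical class, and the exact way elements are dealt out in the even-split case is an assumption (the text only promises an even division, not a specific assignment).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of the two redistribution schemes; hypothetical, not a Flink class. */
final class RedistributionSketch {

    /** Even-split: concatenate all lists, then cut the result into newParallelism sublists. */
    static <T> List<List<T>> evenSplit(List<List<T>> perInstance, int newParallelism) {
        List<T> all = concat(perInstance);
        List<List<T>> result = new ArrayList<>();
        int base = all.size() / newParallelism;
        int remainder = all.size() % newParallelism;
        int from = 0;
        for (int i = 0; i < newParallelism; i++) {
            int size = base + (i < remainder ? 1 : 0); // the first instances absorb the remainder
            result.add(new ArrayList<>(all.subList(from, from + size)));
            from += size;
        }
        return result;
    }

    /** Union: every instance receives the complete concatenated list. */
    static <T> List<List<T>> union(List<List<T>> perInstance, int newParallelism) {
        List<T> all = concat(perInstance);
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>(all));
        }
        return result;
    }

    private static <T> List<T> concat(List<List<T>> lists) {
        List<T> all = new ArrayList<>();
        for (List<T> list : lists) {
            all.addAll(list);
        }
        return all;
    }

    public static void main(String[] args) {
        // State checkpointed with parallelism 1, restored with parallelism 2.
        List<List<String>> checkpointed = Arrays.asList(Arrays.asList("element1", "element2"));
        System.out.println("even-split: " + evenSplit(checkpointed, 2));
        System.out.println("union:      " + union(checkpointed, 2));
    }
}
```

Under union redistribution each instance has to decide for itself which of the restored elements it is responsible for.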
Below is an example of a stateful `SinkFunction` that uses `CheckpointedFunction` to buffer elements before sending them to the outside world. It demonstrates the basic even-split redistribution list state:
......@@ -455,9 +457,9 @@ class BufferingSink(threshold: Int = 0)
The `initializeState` method takes as argument a `FunctionInitializationContext`. This is used to initialize the non-keyed state “containers”. Each is a container of type `ListState` in which the non-keyed state objects are stored upon checkpointing.
Note how the state is initialized, similar to keyed state, with a `StateDescriptor` that contains the state name and information about the type of the value that the state holds:
......@@ -485,17 +487,17 @@ checkpointedState = context.getOperatorStateStore.getListState(descriptor)
The naming convention of the state access methods contains the redistribution pattern followed by the state structure. For example, to use list state with the union redistribution scheme on restore, access the state by using `getUnionListState(descriptor)`. If the method name does not contain the redistribution pattern, _e.g._ `getListState(descriptor)`, it simply implies that the basic even-split redistribution scheme will be used.
After initializing the container, we use the `isRestored()` method of the context to check if we are recovering after a failure. If this is `true`, _i.e._ we are recovering, the restore logic is applied.
As shown in the code of the modified `BufferingSink`, this `ListState` recovered during state initialization is kept in a class variable for future use in `snapshotState()`. There the `ListState` is cleared of all objects included by the previous checkpoint, and is then filled with the new ones we want to checkpoint.
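The snapshot and restore cycle described above can be modelled in plain Java. This sketch is not the `BufferingSink` from the documentation: `BufferingSinkSketch` is a hypothetical class, simple lists stand in for the `ListState` container and for the outside world, and the `initializeState(List, boolean)` signature is invented for the model.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a buffering sink with list-style state; hypothetical, not a Flink class. */
final class BufferingSinkSketch {
    private final int threshold;
    private final List<String> buffered = new ArrayList<>();     // elements not yet sent
    private final List<String> checkpointed = new ArrayList<>(); // stand-in for the ListState container
    final List<String> sent = new ArrayList<>();                 // stand-in for the outside world

    BufferingSinkSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Buffers the element and flushes once the threshold is reached. */
    void invoke(String value) {
        buffered.add(value);
        if (buffered.size() >= threshold) {
            sent.addAll(buffered);
            buffered.clear();
        }
    }

    /** Drops what the previous checkpoint contained, then stores the current buffer. */
    void snapshotState() {
        checkpointed.clear();
        checkpointed.addAll(buffered);
    }

    /** Returns a copy of the checkpointed elements, as a restore would see them. */
    List<String> checkpoint() {
        return new ArrayList<>(checkpointed);
    }

    /** Applies the restore logic only when recovering, mirroring the isRestored() check. */
    void initializeState(List<String> restored, boolean isRestored) {
        if (isRestored) {
            buffered.addAll(restored);
        }
    }

    public static void main(String[] args) {
        BufferingSinkSketch sink = new BufferingSinkSketch(3);
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState();

        // Simulated failure: a fresh instance is initialized from the last checkpoint.
        BufferingSinkSketch recovered = new BufferingSinkSketch(3);
        recovered.initializeState(sink.checkpoint(), true);
        recovered.invoke("c");
        System.out.println(recovered.sent);
    }
}
```

Without the restore step the recovered instance would have lost the two buffered elements, since they had not been sent when the failure occurred.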
As a side note, the keyed state can also be initialized in the `initializeState()` method. This can be done using the provided `FunctionInitializationContext`.
#### ListCheckpointed
The `ListCheckpointed` interface is a more limited variant of `CheckpointedFunction`, which only supports list-style state with even-split redistribution scheme on restore. It also requires the implementation of two methods:
......@@ -507,11 +509,11 @@ void restoreState(List<T> state) throws Exception;
On `snapshotState()` the operator should return a list of objects to checkpoint and `restoreState` has to handle such a list upon recovery. If the state is not re-partitionable, you can always return a `Collections.singletonList(MY_STATE)` in the `snapshotState()`.
### Stateful Source Functions
Stateful sources require a bit more care than other operators. In order to make the updates to the state and output collection atomic (required for exactly-once semantics on failure/recovery), the user is required to get a lock from the source’s context.
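Why the lock matters can be shown with a plain-Java model. This is a sketch under stated assumptions, not Flink code: `CounterSourceSketch` is a hypothetical class, a private `Object` stands in for the lock obtained from the source’s context, and a list stands in for the output collector.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a source that emits and updates state under one lock; hypothetical. */
final class CounterSourceSketch {
    private final Object checkpointLock = new Object(); // stand-in for the lock from the source context
    private final List<Long> emitted = new ArrayList<>(); // stand-in for the output collector
    private long offset;                                  // the state that gets checkpointed

    /** Emits the next element and advances the offset as one atomic step. */
    void emitNext() {
        synchronized (checkpointLock) {
            emitted.add(offset);
            offset += 1;
        }
    }

    /** Takes a snapshot under the same lock; returns {offset, number of emitted elements}. */
    long[] snapshot() {
        synchronized (checkpointLock) {
            return new long[] {offset, emitted.size()};
        }
    }

    /** Runs a producer thread while snapshotting concurrently and checks every snapshot. */
    static boolean snapshotsStayConsistent(int elements) {
        CounterSourceSketch source = new CounterSourceSketch();
        Thread producer = new Thread(() -> {
            for (int i = 0; i < elements; i++) {
                source.emitNext();
            }
        });
        producer.start();
        boolean consistent = true;
        while (producer.isAlive()) {
            long[] snap = source.snapshot();
            consistent &= snap[0] == snap[1]; // offset must always match what was emitted
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        long[] last = source.snapshot();
        return consistent && last[0] == elements && last[1] == elements;
    }

    public static void main(String[] args) {
        System.out.println("consistent: " + snapshotsStayConsistent(100_000));
    }
}
```

If `emitNext()` did not hold the lock, a snapshot could observe an element that was already emitted while the offset had not yet advanced, and a restore from that snapshot would emit the element a second time.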
......@@ -599,5 +601,5 @@ class CounterSource
Some operators might need to know when a checkpoint has been fully acknowledged by Flink, so that they can communicate this to the outside world. In this case see the `org.apache.flink.runtime.state.CheckpointListener` interface.