Apache Flink streaming applications are typically designed to run indefinitely or for long periods of time. As with all long-running services, the applications need to be updated to adapt to changing requirements. The same goes for the data schemas that the applications work against; they evolve along with the application.
This page provides an overview of how you can evolve your state type’s data schema. The current restrictions vary across different types and state structures (`ValueState`, `ListState`, etc.).
Note that the information on this page is relevant only if you are using state serializers that are generated by Flink’s own [type serialization framework](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/types_serialization.html). That is, the state descriptor you provide when declaring your state is not configured to use a specific `TypeSerializer` or `TypeInformation`, in which case Flink infers information about the state type.
Under the hood, whether or not the schema of state can be evolved depends on the serializer used to read / write persisted state bytes. Simply put, a registered state’s schema can only be evolved if its serializer properly supports it. This is handled transparently by serializers generated by Flink’s type serialization framework (current scope of support is listed [below](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/schema_evolution.html#supported-data-types-for-schema-evolution)).
If you intend to implement a custom `TypeSerializer` for your state type and would like to learn how to implement the serializer to support state schema evolution, please refer to [Custom State Serialization](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/custom_serialization.html). The documentation there also covers necessary internal details about the interplay between state serializers and Flink’s state backends to support state schema evolution.
To evolve the schema of a given state type, you would take the following steps:
1. Take a savepoint of your Flink streaming job.
2. Update state types in your application (e.g., modifying your Avro type schema).
3. Restore the job from the savepoint. When accessing state for the first time, Flink will assess whether or not the schema has been changed for the state, and migrate the state schema if necessary.
The process of migrating state to adapt to changed schemas happens automatically, and independently for each state. Flink performs this process internally by first checking whether the new serializer for the state has a different serialization schema than the previous serializer; if so, the previous serializer is used to read the state into objects, which are then written back to bytes with the new serializer.
Further details about the migration process are out of the scope of this documentation; please refer to [Custom State Serialization](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/custom_serialization.html).
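The read-with-the-previous-serializer, write-with-the-new-serializer step described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `User`, `Serializer`, `V1Serializer`, `V2Serializer`, and `migrateIfNeeded` are hypothetical names invented for this example and are not Flink classes, and comparing a numeric schema version stands in for whatever compatibility check the real serializers perform.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy model of the migration step. None of these types are Flink classes.
class StateMigrationSketch {

    /** Example state value; the visits field only exists from schema version 2 on. */
    static final class User {
        final String name;
        final int visits;

        User(String name, int visits) {
            this.name = name;
            this.visits = visits;
        }
    }

    /** Hypothetical serializer abstraction that exposes a schema version. */
    interface Serializer {
        int schemaVersion();
        byte[] write(User user);
        User read(byte[] bytes);
    }

    /** Version 1 layout: [name length][name bytes]. */
    static class V1Serializer implements Serializer {
        public int schemaVersion() { return 1; }

        public byte[] write(User user) {
            byte[] name = user.name.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(4 + name.length).putInt(name.length).put(name).array();
        }

        public User read(byte[] bytes) {
            // Version 1 never stored visits, so fall back to 0.
            return new User(readName(ByteBuffer.wrap(bytes)), 0);
        }
    }

    /** Version 2 layout: [name length][name bytes][visits]. */
    static class V2Serializer implements Serializer {
        public int schemaVersion() { return 2; }

        public byte[] write(User user) {
            byte[] name = user.name.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(8 + name.length)
                    .putInt(name.length).put(name).putInt(user.visits).array();
        }

        public User read(byte[] bytes) {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            String name = readName(buffer);
            return new User(name, buffer.getInt());
        }
    }

    static String readName(ByteBuffer buffer) {
        byte[] name = new byte[buffer.getInt()];
        buffer.get(name);
        return new String(name, StandardCharsets.UTF_8);
    }

    /** Rewrites persisted bytes only when the serialization schema changed. */
    static byte[] migrateIfNeeded(byte[] persisted, Serializer previous, Serializer current) {
        if (previous.schemaVersion() == current.schemaVersion()) {
            return persisted; // same schema, nothing to migrate
        }
        User restored = previous.read(persisted); // old bytes -> object
        return current.write(restored);           // object -> new bytes
    }

    public static void main(String[] args) {
        byte[] savepointBytes = new V1Serializer().write(new User("alice", 0));
        byte[] migrated = migrateIfNeeded(savepointBytes, new V1Serializer(), new V2Serializer());
        User restored = new V2Serializer().read(migrated);
        System.out.println(restored.name + " visits=" + restored.visits); // prints "alice visits=0"
    }
}
```

In this model each state would call `migrateIfNeeded` on its own bytes, which mirrors the point that migration happens independently for each state.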
## Supported data types for schema evolution
Currently, schema evolution is supported only for Avro. Therefore, if you care about schema evolution for state, the recommendation for now is to always use Avro for state data types.
There are plans to extend support to more composite types, such as POJOs; for more details, please refer to [FLINK-10897](https://issues.apache.org/jira/browse/FLINK-10897).
Flink fully supports evolving the schema of Avro type state, as long as the schema change is considered compatible by [Avro’s rules for schema resolution](http://avro.apache.org/docs/current/spec.html#Schema+Resolution).
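To give a feel for what "compatible" means here, the sketch below models three of the record rules from the Avro specification: a field that exists only in the old (writer) schema is ignored, a field that exists only in the new (reader) schema is filled from its default, and a new field without a default is an error. This is a toy model, not the Avro API: `ReaderField` and `resolve` are hypothetical names, records are represented as plain maps, and the many other resolution rules (type promotion, unions, aliases) are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of three record-resolution rules. This is not the Avro API.
class SchemaResolutionSketch {

    /** Hypothetical reader-side field: a name plus an optional default value. */
    static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        private ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }

        static ReaderField withoutDefault(String name) {
            return new ReaderField(name, false, null);
        }

        static ReaderField withDefault(String name, Object value) {
            return new ReaderField(name, true, value);
        }
    }

    /** Projects a record written with the old schema onto the new reader schema. */
    static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerSchema) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerSchema) {
            if (written.containsKey(field.name)) {
                result.put(field.name, written.get(field.name)); // field in both schemas
            } else if (field.hasDefault) {
                result.put(field.name, field.defaultValue);      // new field with a default
            } else {
                throw new IllegalStateException(
                        "Incompatible change: new field '" + field.name + "' has no default");
            }
        }
        return result; // written fields missing from readerSchema are dropped
    }

    public static void main(String[] args) {
        Map<String, Object> written = new LinkedHashMap<>();
        written.put("name", "alice");
        written.put("legacyId", 7); // removed in the new schema

        List<ReaderField> readerSchema = new ArrayList<>();
        readerSchema.add(ReaderField.withoutDefault("name"));
        readerSchema.add(ReaderField.withDefault("visits", 0)); // added in the new schema

        System.out.println(resolve(written, readerSchema)); // prints {name=alice, visits=0}
    }
}
```

Under these rules, adding a field with a default or removing a field keeps old state readable, while adding a field without a default does not.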