From 4acb8380944545140b2c48ac8e448cb25815b30f Mon Sep 17 00:00:00 2001
From: Anonymitaet <50226895+Anonymitaet@users.noreply.github.com>
Date: Thu, 1 Aug 2019 22:53:26 +0800
Subject: [PATCH] [Doc] Add Schema Evolution and Compatibility section (#4841)
Structure of Schema Chapter: https://github.com/apache/pulsar/issues/4789
---
site2/docs/schema-evolution-compatibility.md | 779 +++++++++++++++++++
1 file changed, 779 insertions(+)
create mode 100644 site2/docs/schema-evolution-compatibility.md
diff --git a/site2/docs/schema-evolution-compatibility.md b/site2/docs/schema-evolution-compatibility.md
new file mode 100644
index 00000000000..dcbb3d3c8e9
--- /dev/null
+++ b/site2/docs/schema-evolution-compatibility.md
@@ -0,0 +1,779 @@
+---
+id: schema-evolution-compatibility
+title: Schema evolution and compatibility
+sidebar_label: Schema evolution and compatibility
+---
+
+## Schema evolution
+
+A Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+Each `SchemaInfo` stored with a topic has a version. The version is used to manage the schema changes happening within a topic.
+
+Each message produced with a `SchemaInfo` is tagged with a schema version. When a Pulsar client consumes a message, it can use the schema version to retrieve the corresponding `SchemaInfo` and use the correct schema information to deserialize the data.
+
+### What is schema evolution?
+
+Schemas store the details of attributes and types. To satisfy new business requirements, you inevitably need to update schemas over time. This process is called **schema evolution**.
+
+Any schema changes affect downstream consumers. Schema evolution ensures that the downstream consumers can seamlessly handle data encoded with both old schemas and new schemas.
+
+### How should Pulsar schema evolve?
+
+The answer is the Pulsar schema compatibility check strategy. It determines how the broker compares old schemas with new schemas in a topic.
+
+For more information, see [Schema compatibility check strategy](#schema-compatibility-check-strategy).
+
+### How does Pulsar support schema evolution?
+
+1. When a producer/consumer/reader connects to a broker, the broker deploys the schema compatibility checker configured by `schemaRegistryCompatibilityCheckers` to enforce the schema compatibility check.
+
+ There is one schema compatibility checker instance per schema type.
+
+ Currently, Avro and JSON have their own compatibility checkers, while all the other schema types share the default compatibility checker, which disables schema evolution.
+
+2. The producer/consumer/reader sends its client `SchemaInfo` to the broker.
+
+3. The broker knows the schema type and locates the schema compatibility checker for that type.
+
+4. The broker uses the checker to check whether the `SchemaInfo` is compatible with the latest schema of the topic by applying its compatibility check strategy.
+
+ Currently, the compatibility check strategy is configured at the namespace level and applied to all the topics within that namespace.
+
+## Schema compatibility check strategy
+
+Pulsar has 8 schema compatibility check strategies, which are summarized in the following table.
+
+Suppose that you have a topic containing three schemas (V1, V2, and V3), where V1 is the oldest and V3 is the latest:
+
+| Compatibility check strategy | Definition | Changes allowed | Check against which schema | Upgrade first |
+| --- | --- | --- | --- | --- |
+| `ALWAYS_COMPATIBLE` | Disable schema compatibility check. | All changes are allowed | All previous versions | Any order |
+| `ALWAYS_INCOMPATIBLE` | Disable schema evolution. | All changes are disabled | None | None |
+| `BACKWARD` | Consumers using the schema V3 can process data written by producers using the schema V3 or V2. | Add optional fields; delete fields | Latest version | Consumers |
+| `BACKWARD_TRANSITIVE` | Consumers using the schema V3 can process data written by producers using the schema V3, V2 or V1. | Add optional fields; delete fields | All previous versions | Consumers |
+| `FORWARD` | Consumers using the schema V3 or V2 can process data written by producers using the schema V3. | Add fields; delete optional fields | Latest version | Producers |
+| `FORWARD_TRANSITIVE` | Consumers using the schema V3, V2 or V1 can process data written by producers using the schema V3. | Add fields; delete optional fields | All previous versions | Producers |
+| `FULL` | Backward and forward compatible between the schema V3 and V2. | Modify optional fields | Latest version | Any order |
+| `FULL_TRANSITIVE` | Backward and forward compatible among the schema V3, V2, and V1. | Modify optional fields | All previous versions | Any order |
+
+### FULL and FULL_TRANSITIVE
+
+| Compatibility check strategy | Definition | Description | Note |
+| --- | --- | --- | --- |
+| `FULL` | Schemas are both backward and forward compatible, which means: consumers using the last schema can process data written by producers using the new schema, AND consumers using the new schema can process data written by producers using the last schema. | Consumers using the schema V3 can process data written by producers using the schema V3 or V2, AND consumers using the schema V3 or V2 can process data written by producers using the schema V3. | For Avro and JSON, the default schema compatibility check strategy is `FULL`. For all schema types except Avro and JSON, the default schema compatibility check strategy is `ALWAYS_INCOMPATIBLE`. |
+| `FULL_TRANSITIVE` | The new schema is backward and forward compatible with all previously registered schemas. | Consumers using the schema V3 can process data written by producers using the schema V3, V2 or V1, AND consumers using the schema V3, V2 or V1 can process data written by producers using the schema V3. | None |
+
+#### Example
+
+In some data formats, for example, Avro, you can define fields with default values. Consequently, adding or removing a field with a default value is a fully compatible change.
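+
+As a minimal sketch of such a change, consider the two Avro schemas below. The record name `User` and its fields are hypothetical and chosen only for illustration. The first version (V1) declares a single field:
+
+```json
+{
+  "type": "record",
+  "name": "User",
+  "fields": [
+    {"name": "name", "type": "string"}
+  ]
+}
+```
+
+The second version (V2) adds an `age` field with a default value:
+
+```json
+{
+  "type": "record",
+  "name": "User",
+  "fields": [
+    {"name": "name", "type": "string"},
+    {"name": "age", "type": "int", "default": 0}
+  ]
+}
+```
+
+Under Avro schema resolution, a consumer using V2 fills in the default value for `age` when it reads data written with V1, and a consumer using V1 ignores `age` when it reads data written with V2. The change is therefore both backward and forward compatible.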
+
+## Order of upgrading clients
+
+The order of upgrading client applications is determined by the compatibility check strategy.
+
+For example, producers use schemas to write data to Pulsar and consumers use schemas to read data from Pulsar.
+
+| Compatibility check strategy | Upgrade first | Description |
+| --- | --- | --- |
+| `ALWAYS_COMPATIBLE` | Any order | The compatibility check is disabled. Consequently, you can upgrade the producers and consumers in **any order**. |
+| `ALWAYS_INCOMPATIBLE` | None | Schema evolution is disabled. |
+| `BACKWARD`, `BACKWARD_TRANSITIVE` | Consumers | There is no guarantee that consumers using the old schema can read data produced using the new schema. Consequently, **upgrade all consumers first**, and then start producing new data. |
+| `FORWARD`, `FORWARD_TRANSITIVE` | Producers | There is no guarantee that consumers using the new schema can read data produced using the old schema. Consequently, **upgrade all producers first** to use the new schema and ensure that the data already produced using the old schemas is no longer available to consumers, and then upgrade the consumers. |
+| `FULL`, `FULL_TRANSITIVE` | Any order | It is guaranteed that consumers using the old schema can read data produced using the new schema and that consumers using the new schema can read data produced using the old schema. Consequently, you can upgrade the producers and consumers in **any order**. |
+
--
GitLab