未验证 提交 52ddd9b0 编写于 作者:Qingsheng Ren 提交者:GitHub

[FLINK-18382][docs-zh] Translate Kafka SQL connector documentation into Chinese

This closes #13876
上级 8da60525
<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<p>为了使用 {{ include.connector.name }} {{ include.connector.category }},无论是使用自动化构建工具(如 Maven
或 SBT)构建的工程,还是带有 SQL JAR 包的 SQL 客户端,都需要提供以下依赖。</p>
{% comment %}
The 'liquify' filter makes it possible to include liquid variables such as e.g. site.version.
{% endcomment %}
{% if include.connector.versions == nil %}
<table>
<thead>
<tr>
<th style="text-align: left">Maven 依赖</th>
<th style="text-align: left">SQL 客户端 JAR</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left"><code class="highlighter-rouge">{{ include.connector.maven | liquify }}</code></td>
{% if include.connector.built-in %}
<td style="text-align: left">Built-in</td>
{% elsif site.is_stable %}
{% if include.connector.sql-url != nil %}
<td style="text-align: left"><a href="{{ include.connector.sql-url | liquify }}">下载</a></td>
{% else %}
<td style="text-align: left">目前无 SQL JAR 可用</td>
{% endif %}
{% else %}
<td style="text-align: left">只在稳定版本可用</td>
{% endif %}
</tr>
</tbody>
</table>
{% else %}
<table>
<thead>
<tr>
<th style="text-align: left">{{ include.connector.name }} 版本</th>
<th style="text-align: left">Maven 依赖</th>
<th style="text-align: left">SQL 客户端 JAR</th>
</tr>
</thead>
<tbody>
{% for version in include.connector.versions %}
<tr>
<td style="text-align: left">{{ version.version | liquify }}</td>
<td style="text-align: left"><code class="highlighter-rouge">{{ version.maven | liquify }}</code></td>
{% if include.connector.built-in %}
<td style="text-align: left">内置</td>
{% elsif include.connector.no-sql-jar %}
{% elsif site.is_stable %}
{% if version.sql-url != nil %}
<td style="text-align: left"><a href="{{ version.sql-url | liquify }}">下载</a></td>
{% else %}
<td style="text-align: left">目前无 SQL JAR 可用</td>
{% endif %}
{% else %}
<td style="text-align: left">只在稳定版本可用</td>
{% endif %}
</tr>
{% endfor %}
</tbody>
</table>
{% endif %}
......@@ -39,7 +39,7 @@ Dependencies
------------
{% assign connector = site.data.sql-connectors['elastic'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -41,7 +41,7 @@ Avro Schema Registry 格式只能与[Apache Kafka SQL连接器]({% link dev/tabl
------------
{% assign connector = site.data.sql-connectors['avro-confluent'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -35,7 +35,7 @@ under the License.
------------
{% assign connector = site.data.sql-connectors['avro'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -47,7 +47,7 @@ Flink 还支持将 Flink SQL 中的 INSERT / UPDATE / DELETE 消息编码为 Can
------------
{% assign connector = site.data.sql-connectors['canal'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -35,7 +35,7 @@ under the License.
------------
{% assign connector = site.data.sql-connectors['csv'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -47,13 +47,13 @@ Flink 还支持将 Flink SQL 中的 INSERT / UPDATE / DELETE 消息编码为 Deb
<div class="codetabs" markdown="1">
<div data-lang="Debezium Avro" markdown="1">
{% assign connector = site.data.sql-connectors['debezium-avro-confluent'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
</div>
<div data-lang="Debezium Json" markdown="1">
{% assign connector = site.data.sql-connectors['debezium-json'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
</div>
......
......@@ -35,7 +35,7 @@ under the License.
------------
{% assign connector = site.data.sql-connectors['json'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -45,7 +45,7 @@ Dependencies
------------
{% assign connector = site.data.sql-connectors['maxwell'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -35,7 +35,7 @@ under the License.
------------
{% assign connector = site.data.sql-connectors['orc'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -35,7 +35,7 @@ under the License.
------------
{% assign connector = site.data.sql-connectors['parquet'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -37,7 +37,7 @@ Raw format 允许读写原始(基于字节)值作为单个列。
------------
{% assign connector = site.data.sql-connectors['raw'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -39,7 +39,7 @@ HBase 连接器在 upsert 模式下运行,可以使用 DDL 中定义的主键
------------
{% assign connector = site.data.sql-connectors['hbase'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -39,7 +39,7 @@ Dependencies
------------
{% assign connector = site.data.sql-connectors['jdbc'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -536,7 +536,7 @@ See more about how to use the CDC formats in [debezium-json]({% link dev/table/c
### Sink Partitioning
The config option `sink.partitioner` specifies output partitioning from Flink's partitions into Kafka's partitions.
By default, Flink uses the [Kafka default partitioner](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java) to parititon records. It uses the [sticky partition strategy](https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/) for records with null keys and uses a murmur2 hash to compute the partition for a record with the key defined.
By default, Flink uses the [Kafka default partitioner](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java) to partition records. It uses the [sticky partition strategy](https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/) for records with null keys and uses a murmur2 hash to compute the partition for a record with the key defined.
In order to control the routing of rows into partitions, a custom sink partitioner can be provided. The 'fixed' partitioner will write the records in the same Flink partition into the same Kafka partition, which could reduce the cost of the network connections.
......
---
title: "Apache Kafka SQL Connector"
title: "Apache Kafka SQL 连接器"
nav-title: Kafka
nav-parent_id: sql-connectors
nav-pos: 2
......@@ -29,23 +29,22 @@ under the License.
* This will be replaced by the TOC
{:toc}
The Kafka connector allows for reading data from and writing data into Kafka topics.
Kafka 连接器提供从 Kafka topic 中消费和写入数据的能力。
Dependencies
依赖
------------
{% assign connector = site.data.sql-connectors['kafka'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
The Kafka connectors are not currently part of the binary distribution.
See how to link with them for cluster execution [here]({% link dev/project-configuration.zh.md %}).
Kafka 连接器目前并不包含在 Flink 的二进制发行版中,请参阅 [该文档]({% link dev/project-configuration.zh.md %}) 了解如何将 Kafka 连接器链接到集群执行环境中。
How to create a Kafka table
如何创建 Kafka 表
----------------
The example below shows how to create a Kafka table:
以下示例展示了如何创建 Kafka 表:
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -67,20 +66,19 @@ CREATE TABLE KafkaTable (
</div>
</div>
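
建完表后,就可以像查询普通表一样查询 Kafka 表。下面是一个示意查询(假设上文创建的 `KafkaTable` 中包含 `user_id` 与 `behavior` 字段,过滤条件仅作演示):

{% highlight sql %}
-- 示意:像普通表一样从 Kafka 表中持续读取数据
SELECT `user_id`, `behavior`
FROM KafkaTable
WHERE `behavior` = 'buy'
{% endhighlight %}
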
Available Metadata
可用的元数据
------------------
以下的连接器元数据可以在表定义中通过元数据列的形式获取。
The following connector metadata can be accessed as metadata columns in a table definition.
The `R/W` column defines whether a metadata field is readable (`R`) and/or writable (`W`).
Read-only columns must be declared `VIRTUAL` to exclude them during an `INSERT INTO` operation.
`R/W` 列定义了元数据字段是否可读(`R`)和/或可写(`W`)。只读列必须声明为 `VIRTUAL`,以便在
`INSERT INTO` 操作中将其排除。
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 25%">Key</th>
<th class="text-center" style="width: 30%">Data Type</th>
<th class="text-center" style="width: 40%">Description</th>
<th class="text-left" style="width: 25%"></th>
<th class="text-center" style="width: 30%">数据类型</th>
<th class="text-center" style="width: 40%">描述</th>
<th class="text-center" style="width: 5%">R/W</th>
</tr>
</thead>
......@@ -88,50 +86,50 @@ Read-only columns must be declared `VIRTUAL` to exclude them during an `INSERT I
<tr>
<td><code>topic</code></td>
<td><code>STRING NOT NULL</code></td>
<td>Topic name of the Kafka record.</td>
<td>Kafka 记录的 Topic 名</td>
<td><code>R</code></td>
</tr>
<tr>
<td><code>partition</code></td>
<td><code>INT NOT NULL</code></td>
<td>Partition ID of the Kafka record.</td>
<td>Kafka 记录的 partition ID</td>
<td><code>R</code></td>
</tr>
<tr>
<td><code>headers</code></td>
<td><code>MAP<STRING, BYTES> NOT NULL</code></td>
<td>Headers of the Kafka record as a map of raw bytes.</td>
<td>二进制 Map 类型的 Kafka 记录头(Header)</td>
<td><code>R/W</code></td>
</tr>
<tr>
<td><code>leader-epoch</code></td>
<td><code>INT NULL</code></td>
<td>Leader epoch of the Kafka record if available.</td>
<td>Kafka 记录的 Leader epoch(如果可用)</td>
<td><code>R</code></td>
</tr>
<tr>
<td><code>offset</code></td>
<td><code>BIGINT NOT NULL</code></td>
<td>Offset of the Kafka record in the partition.</td>
<td>Kafka 记录在 partition 中的位点</td>
<td><code>R</code></td>
</tr>
<tr>
<td><code>timestamp</code></td>
<td><code>TIMESTAMP(3) WITH LOCAL TIME ZONE NOT NULL</code></td>
<td>Timestamp of the Kafka record.</td>
<td>Kafka 记录的时间戳</td>
<td><code>R/W</code></td>
</tr>
<tr>
<td><code>timestamp-type</code></td>
<td><code>STRING NOT NULL</code></td>
<td>Timestamp type of the Kafka record. Either "NoTimestampType",
"CreateTime" (also set when writing metadata), or "LogAppendTime".</td>
<td>Kafka 记录的时间戳类型。可能的类型有 "NoTimestampType",
"CreateTime"(会在写入元数据时设置)或 "LogAppendTime"。</td>
<td><code>R</code></td>
</tr>
</tbody>
</table>
The extended `CREATE TABLE` example demonstrates the syntax for exposing these metadata fields:
以下扩展的 `CREATE TABLE` 示例展示了使用这些元数据的语法:
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -155,12 +153,11 @@ CREATE TABLE KafkaTable (
</div>
</div>
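
下面再给出一个使用元数据列的示意草稿,表名、物理字段与连接参数均为假设值,仅用于演示 `METADATA` 与 `VIRTUAL` 的写法:

{% highlight sql %}
-- 示意草稿:将 Kafka 记录的时间戳、partition 与 offset 暴露为元数据列,只读元数据声明为 VIRTUAL
CREATE TABLE KafkaMetaExample (
  `event_time` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'timestamp',
  `partition` INT METADATA VIRTUAL,
  `offset` BIGINT METADATA VIRTUAL,
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'json'
)
{% endhighlight %}
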
**Format Metadata**
**格式元数据**
The connector is able to expose metadata of the value format for reading. Format metadata keys
are prefixed with `'value.'`.
连接器能够读取消息体(Value)格式的元数据。格式元数据的键以 `'value.'` 作为前缀。
The following example shows how to access both Kafka and Debezium metadata fields:
以下示例展示了如何获取 Kafka 和 Debezium 的元数据字段:
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -185,206 +182,196 @@ CREATE TABLE KafkaTable (
</div>
</div>
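
下面是一个同时读取 Kafka 元数据和格式元数据的示意草稿。假设消息体使用 debezium-json 格式,该格式提供 `source.timestamp`、`source.table` 等元数据键;表名、字段与连接参数均为假设值:

{% highlight sql %}
-- 示意草稿:Kafka 元数据('partition'、'offset')与带 'value.' 前缀的 Debezium 格式元数据混合使用
CREATE TABLE KafkaDebeziumExample (
  `event_time` TIMESTAMP(3) WITH LOCAL TIME ZONE METADATA FROM 'value.source.timestamp' VIRTUAL,
  `origin_table` STRING METADATA FROM 'value.source.table' VIRTUAL,
  `partition_id` INT METADATA FROM 'partition' VIRTUAL,
  `offset` BIGINT METADATA VIRTUAL,
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'value.format' = 'debezium-json'
)
{% endhighlight %}
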
Connector Options
连接器配置项
----------------
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 25%">Option</th>
<th class="text-center" style="width: 8%">Required</th>
<th class="text-center" style="width: 7%">Default</th>
<th class="text-center" style="width: 10%">Type</th>
<th class="text-center" style="width: 50%">Description</th>
<th class="text-left" style="width: 25%">选项</th>
<th class="text-center" style="width: 8%">是否必需</th>
<th class="text-center" style="width: 7%">默认值</th>
<th class="text-center" style="width: 10%">类型</th>
<th class="text-center" style="width: 50%">描述</th>
</tr>
</thead>
<tbody>
<tr>
<td><h5>connector</h5></td>
<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>必需</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>Specify what connector to use, for Kafka use: <code>'kafka'</code>.</td>
<td>使用的连接器,Kafka 连接器使用 <code>'kafka'</code></td>
</tr>
<tr>
<td><h5>topic</h5></td>
<td>required for sink, optional for source(use 'topic-pattern' instead if not set)</td>
<td style="word-wrap: break-word;">(none)</td>
<td>sink 必需,source 可选(若未指定,则需使用 'topic-pattern')</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>Topic name(s) to read data from when the table is used as source. It also supports topic list for source by separating topic by semicolon like <code>'topic-1;topic-2'</code>. Note, only one of "topic-pattern" and "topic" can be specified for sources. When the table is used as sink, the topic name is the topic to write data to. Note topic list is not supported for sinks.</td>
<td>当表用作 source 时,读取数据的 topic 名。亦支持用分号间隔的 topic 列表,如 <code>'topic-1;topic-2'</code>。注意,对 source 表而言,'topic' 和 'topic-pattern' 两个选项只能使用其中一个。
当表被用作 sink 时,该配置表示写入的 topic 名。注意 sink 表不支持 topic 列表。</td>
</tr>
<tr>
<td><h5>topic-pattern</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>The regular expression for a pattern of topic names to read from. All topics with names that match the specified regular expression will be subscribed by the consumer when the job starts running. Note, only one of "topic-pattern" and "topic" can be specified for sources.</td>
<td>匹配读取 topic 名称的正则表达式。在作业开始运行时,所有匹配该正则表达式的 topic 都将被 Kafka consumer 订阅。注意对 source 表而言,'topic' 和 'topic-pattern' 两个选项只能使用其中一个。</td>
</tr>
<tr>
<td><h5>properties.bootstrap.servers</h5></td>
<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>必需</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>Comma separated list of Kafka brokers.</td>
<td>逗号分隔的 Kafka broker 列表</td>
</tr>
<tr>
<td><h5>properties.group.id</h5></td>
<td>required by source</td>
<td style="word-wrap: break-word;">(none)</td>
<td>source 必需</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>The id of the consumer group for Kafka source, optional for Kafka sink.</td>
<td>Kafka source 的 consumer 组 ID,对 Kafka sink 可选填</td>
</tr>
<tr>
<td><h5>properties.*</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>
This can set and pass arbitrary Kafka configurations. Suffix names must match the configuration key defined in <a href="https://kafka.apache.org/documentation/#configuration">Kafka Configuration documentation</a>. Flink will remove the "properties." key prefix and pass the transformed key and values to the underlying KafkaClient. For example, you can disable automatic topic creation via <code>'properties.allow.auto.create.topics' = 'false'</code>. But there are some configurations that do not support to set, because Flink will override them, e.g. <code>'key.deserializer'</code> and <code>'value.deserializer'</code>.
可以设置任意 Kafka 的配置项。后缀名必须匹配在 <a href="https://kafka.apache.org/documentation/#configuration">Kafka 配置文档</a> 中定义的配置键。Flink 将移除 "properties." 配置键前缀并将变换后的配置键和值传入底层的 Kafka 客户端。
例如,您可以通过 <code>'properties.allow.auto.create.topics' = 'false'</code> 来禁用 topic 的自动创建。但是某些配置项不支持设置,因为 Flink 会覆盖这些配置,例如 <code>'key.deserializer'</code> 和 <code>'value.deserializer'</code>。
</td>
</tr>
<tr>
<td><h5>format</h5></td>
<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>必需</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>The format used to deserialize and serialize the value part of Kafka messages.
Please refer to the <a href="{% link dev/table/connectors/formats/index.zh.md %}">formats</a> page for
more details and more format options.
Note: Either this option or the <code>'value.format'</code> option are required.
<td>用来序列化或反序列化 Kafka 消息体(Value)的格式。
请参阅 <a href="{% link dev/table/connectors/formats/index.zh.md %}">格式</a> 页面以获取更多关于格式的细节和相关配置项。
注意:该配置项和 <code>'value.format'</code> 二者之一是必需的。
</td>
</tr>
<tr>
<td><h5>key.format</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>The format used to deserialize and serialize the key part of Kafka messages.
Please refer to the <a href="{% link dev/table/connectors/formats/index.zh.md %}">formats</a> page
for more details and more format options. Note: If a key format is defined, the <code>'key.fields'</code>
option is required as well. Otherwise the Kafka records will have an empty key.
<td>用来序列化和反序列化 Kafka 消息键(Key)的格式。
请参阅 <a href="{% link dev/table/connectors/formats/index.zh.md %}">格式</a> 页面以获取更多关于格式的细节和相关配置项。
注意:如果定义了键格式,则配置项 <code>'key.fields'</code> 也是必需的,否则 Kafka 记录将使用空值作为键。
</td>
</tr>
<tr>
<td><h5>key.fields</h5></td>
<td>optional</td>
<td>可选</td>
<td style="word-wrap: break-word;">[]</td>
<td>List&lt;String&gt;</td>
<td>Defines an explicit list of physical columns from the table schema that configure the data
type for the key format. By default, this list is empty and thus a key is undefined.
The list should look like <code>'field1;field2'</code>.
<td>表结构中用来配置消息键(Key)格式数据类型的字段列表。默认情况下该列表为空,因此消息键没有定义。列表格式为 <code>'field1;field2'</code>
</td>
</tr>
<tr>
<td><h5>key.fields-prefix</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>Defines a custom prefix for all fields of the key format to avoid name clashes with fields
of the value format. By default, the prefix is empty. If a custom prefix is defined, both the
table schema and <code>'key.fields'</code> will work with prefixed names. When constructing the
data type of the key format, the prefix will be removed and the non-prefixed names will be used
within the key format. Please note that this option requires that <code>'value.fields-include'</code>
must be set to <code>'EXCEPT_KEY'</code>.
<td>为所有消息键(Key)格式字段指定自定义前缀,以避免与消息体(Value)格式字段重名。默认情况下前缀为空。如果定义了前缀,表结构和配置项
<code>'key.fields'</code> 都需要使用带前缀的名称。当构建消息键格式字段时,前缀会被移除,消息键格式将会使用无前缀的名称。请注意该配置项要求必须将
<code>'value.fields-include'</code> 配置为 <code>'EXCEPT_KEY'</code>
</td>
</tr>
<tr>
<td><h5>value.format</h5></td>
<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>必需</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>The format used to deserialize and serialize the value part of Kafka messages.
Please refer to the <a href="{% link dev/table/connectors/formats/index.zh.md %}">formats</a> page
for more details and more format options.
Note: Either this option or the <code>'format'</code> option are required.
<td>序列化和反序列化 Kafka 消息体时使用的格式。
请参阅 <a href="{% link dev/table/connectors/formats/index.zh.md %}">格式</a> 页面以获取更多关于格式的细节和相关配置项。
注意:该配置项和 <code>'format'</code> 二者之一是必需的。
</td>
</tr>
<tr>
<td><h5>value.fields-include</h5></td>
<td>optional</td>
<td>可选</td>
<td style="word-wrap: break-word;">ALL</td>
<td><p>Enum</p>Possible values: [ALL, EXCEPT_KEY]</td>
<td>Defines a strategy how to deal with key columns in the data type of the value format. By
default, <code>'ALL'</code> physical columns of the table schema will be included in the value
format which means that key columns appear in the data type for both the key and value format.
<td><p>枚举类型</p>可选值: [ALL, EXCEPT_KEY]</td>
<td>定义了消息体(Value)格式如何处理消息键(Key)字段。默认值为 <code>'ALL'</code>,表示表结构中所有的物理字段都会包含在消息体格式中,
也就是说消息键字段会同时出现在消息键和消息体的格式中。
</td>
</tr>
<tr>
<td><h5>scan.startup.mode</h5></td>
<td>optional</td>
<td>可选</td>
<td style="word-wrap: break-word;">group-offsets</td>
<td>String</td>
<td>Startup mode for Kafka consumer, valid values are <code>'earliest-offset'</code>, <code>'latest-offset'</code>, <code>'group-offsets'</code>, <code>'timestamp'</code> and <code>'specific-offsets'</code>.
See the following <a href="#start-reading-position">Start Reading Position</a> for more details.</td>
<td>Kafka consumer 的启动模式。有效值有:<code>'earliest-offset'</code>、<code>'latest-offset'</code>、<code>'group-offsets'</code>、<code>'timestamp'</code> 和 <code>'specific-offsets'</code>。
请参阅下方 <a href="#起始消费位点">起始消费位点</a> 一节以获取更多细节。</td>
</tr>
<tr>
<td><h5>scan.startup.specific-offsets</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>String</td>
<td>Specify offsets for each partition in case of <code>'specific-offsets'</code> startup mode, e.g. <code>'partition:0,offset:42;partition:1,offset:300'</code>.
<td>在使用 <code>'specific-offsets'</code> 启动模式时为每个 partition 指定位点,例如 <code>'partition:0,offset:42;partition:1,offset:300'</code>
</td>
</tr>
<tr>
<td><h5>scan.startup.timestamp-millis</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>Long</td>
<td>Start from the specified epoch timestamp (milliseconds) used in case of <code>'timestamp'</code> startup mode.</td>
<td>在使用 <code>'timestamp'</code> 启动模式时指定启动的时间戳(毫秒单位)</td>
</tr>
<tr>
<td><h5>scan.topic-partition-discovery.interval</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>Duration</td>
<td>Interval for consumer to discover dynamically created Kafka topics and partitions periodically.</td>
<td>Consumer 定期探测动态创建的 Kafka topic 和 partition 的时间间隔</td>
</tr>
<tr>
<td><h5>sink.partitioner</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">'default'</td>
<td>可选</td>
<td style="word-wrap: break-word;">(无)</td>
<td>String</td>
<td>Output partitioning from Flink's partitions into Kafka's partitions. Valid values are
<td>Flink partition 到 Kafka partition 的分区映射关系,可选值有:
<ul>
<li><code>default</code>: use the kafka default partitioner to partition records.</li>
<li><code>fixed</code>: each Flink partition ends up in at most one Kafka partition.</li>
<li><code>round-robin</code>: a Flink partition is distributed to Kafka partitions sticky round-robin. It only works when record's keys are not specified.</li>
<li>Custom <code>FlinkKafkaPartitioner</code> subclass: e.g. <code>'org.mycompany.MyPartitioner'</code>.</li>
<li><code>default</code>: 使用 Kafka 默认的分区器对消息进行分区</li>
<li><code>fixed</code>: 每个 Flink partition 最终对应最多一个 Kafka partition。</li>
<li><code>round-robin</code>: Flink partition 按轮询(round-robin)的模式对应到 Kafka partition,仅在未指定消息键时生效。</li>
<li>自定义 <code>FlinkKafkaPartitioner</code> 的子类: 例如 <code>'org.mycompany.MyPartitioner'</code></li>
</ul>
See the following <a href="#sink-partitioning">Sink Partitioning</a> for more details.
请参阅下方 <a href="#sink-分区">Sink 分区</a> 一节以获取更多细节。
</td>
</tr>
<tr>
<td><h5>sink.semantic</h5></td>
<td>optional</td>
<td>可选</td>
<td style="word-wrap: break-word;">at-least-once</td>
<td>String</td>
<td>Defines the delivery semantic for the Kafka sink. Valid enumerationns are <code>'at-least-once'</code>, <code>'exactly-once'</code> and <code>'none'</code>.
See <a href='#consistency-guarantees'>Consistency guarantees</a> for more details. </td>
<td>定义 Kafka sink 的语义。有效值有 <code>'at-least-once'</code>、<code>'exactly-once'</code> 和 <code>'none'</code>。
请参阅 <a href='#一致性保证'>一致性保证</a> 一节以获取更多细节。</td>
</tr>
<tr>
<td><h5>sink.parallelism</h5></td>
<td>optional</td>
<td style="word-wrap: break-word;">(none)</td>
<td>可选</td>
<td style="word-wrap: break-word;">()</td>
<td>Integer</td>
<td>Defines the parallelism of the Kafka sink operator. By default, the parallelism is determined by the framework using the same parallelism of the upstream chained operator.</td>
<td>定义 Kafka sink 算子的并行度。默认情况下,并行度由框架决定,与上游链接(chained)算子的并行度保持一致。</td>
</tr>
</tbody>
</table>
Features
特性
----------------
### 消息键(Key)与消息体(Value)的格式
### Key and Value Formats
Both the key and value part of a Kafka record can be serialized to and deserialized from raw bytes using
one of the given [formats]({% link dev/table/connectors/formats/index.zh.md %}).
Kafka 消息的消息键和消息体部分都可以使用某种 [格式]({% link dev/table/connectors/formats/index.zh.md %}) 来序列化或反序列化成二进制数据。
**Value Format**
**消息体格式**
Since a key is optional in Kafka records, the following statement reads and writes records with a configured
value format but without a key format. The `'format'` option is a synonym for `'value.format'`. All format
options are prefixed with the format identifier.
由于 Kafka 消息中消息键是可选的,以下语句只配置了消息体格式来读取和写入消息,而没有配置消息键格式。`'format'` 选项与 `'value.format'` 意义相同。
所有的格式配置项都以相应的格式标识符作为前缀。
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -405,16 +392,15 @@ CREATE TABLE KafkaTable (,
</div>
</div>
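
作为参考,下面给出一个只配置消息体格式的示意草稿(连接参数为假设值,`'json.ignore-parse-errors'` 仅用于演示格式配置项以格式标识符为前缀的约定):

{% highlight sql %}
-- 示意草稿:仅配置消息体格式,'format' 等价于 'value.format'
CREATE TABLE KafkaValueOnlyExample (
  `user_id` BIGINT,
  `item_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'json.ignore-parse-errors' = 'true'
)
{% endhighlight %}
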
The value format will be configured with the following data type:
消息体格式将配置为以下的数据类型:
{% highlight text %}
ROW<`user_id` BIGINT, `item_id` BIGINT, `behavior` STRING>
{% endhighlight %}
**Key and Value Format**
**消息键和消息体格式**
The following example shows how to specify and configure key and value formats. The format options are
prefixed with either the `'key'` or `'value'` plus format identifier.
以下示例展示了如何指定和配置消息键与消息体格式。格式配置项分别以 `'key'` 或 `'value'` 加上格式标识符作为前缀。
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -440,28 +426,24 @@ CREATE TABLE KafkaTable (
</div>
</div>
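
下面是一个为消息键和消息体分别配置格式的示意草稿(连接参数为假设值):

{% highlight sql %}
-- 示意草稿:消息键由 user_id 与 item_id 组成,消息体使用 'ALL' 模式包含全部物理字段
CREATE TABLE KafkaKeyValueExample (
  `user_id` BIGINT,
  `item_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'key.fields' = 'user_id;item_id',
  'value.format' = 'json',
  'value.fields-include' = 'ALL'
)
{% endhighlight %}
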
The key format includes the fields listed in `'key.fields'` (using `';'` as the delimiter) in the same
order. Thus, it will be configured with the following data type:
消息键格式包含了 `'key.fields'` 中列出的字段(使用 `';'` 分隔),并保持相同的顺序,因此将被配置为以下的数据类型:
{% highlight text %}
ROW<`user_id` BIGINT, `item_id` BIGINT>
{% endhighlight %}
Since the value format is configured with `'value.fields-include' = 'ALL'`, key fields will also end up in
the value format's data type:
由于消息体格式配置为 `'value.fields-include' = 'ALL'`,消息键字段也会出现在消息体格式的数据类型中:
{% highlight text %}
ROW<`user_id` BIGINT, `item_id` BIGINT, `behavior` STRING>
{% endhighlight %}
**Overlapping Format Fields**
**重名的格式字段**
The connector cannot split the table's columns into key and value fields based on schema information
if both key and value formats contain fields of the same name. The `'key.fields-prefix'` option allows
to give key columns a unique name in the table schema while keeping the original names when configuring
the key format.
如果消息键字段和消息体字段重名,连接器无法根据表结构信息将这些列区分开。`'key.fields-prefix'` 配置项可以在表结构中为消息键字段指定
唯一的名称,并在配置消息键格式时保留原始名称。
The following example shows a key and value format that both contain a `version` field:
以下示例展示了在消息键和消息体中同时包含 `version` 字段的情况:
<div class="codetabs" markdown="1">
<div data-lang="SQL" markdown="1">
......@@ -487,93 +469,87 @@ CREATE TABLE KafkaTable (
</div>
</div>
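
下面的示意草稿展示了使用 `'key.fields-prefix'` 处理重名 `version` 字段的一种写法(前缀 `k_`、格式与连接参数均为假设值):

{% highlight sql %}
-- 示意草稿:消息键字段在表结构中以 'k_' 为前缀,构建消息键格式时前缀会被移除
CREATE TABLE KafkaPrefixExample (
  `k_version` INT,
  `k_user_id` BIGINT,
  `k_item_id` BIGINT,
  `version` INT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'key.fields-prefix' = 'k_',
  'key.fields' = 'k_version;k_user_id;k_item_id',
  'value.format' = 'json',
  'value.fields-include' = 'EXCEPT_KEY'
)
{% endhighlight %}
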
The value format must be configured in `'EXCEPT_KEY'` mode. The formats will be configured with
the following data types:
消息体格式必须配置为 `'EXCEPT_KEY'` 模式。格式将被配置为以下的数据类型:
{% highlight text %}
key format:
消息键格式:
ROW<`version` INT, `user_id` BIGINT, `item_id` BIGINT>
value format:
消息体格式:
ROW<`version` INT, `behavior` STRING>
{% endhighlight %}
### Topic and Partition Discovery
### Topic 和 Partition 的探测
The config option `topic` and `topic-pattern` specifies the topics or topic pattern to consume for source. The config option `topic` can accept topic list using semicolon separator like 'topic-1;topic-2'.
The config option `topic-pattern` will use regular expression to discover the matched topic. For example, if the `topic-pattern` is `test-topic-[0-9]`, then all topics with names that match the specified regular expression (starting with `test-topic-` and ending with a single digit)) will be subscribed by the consumer when the job starts running.
`topic` 和 `topic-pattern` 配置项决定了 source 消费的 topic 或匹配 topic 名称的规则。`topic` 配置项可接受使用分号间隔的 topic 列表,例如 `topic-1;topic-2`。
`topic-pattern` 配置项使用正则表达式来探测匹配的 topic。例如 `topic-pattern` 设置为 `test-topic-[0-9]`,则在作业启动时,所有匹配该正则表达式的 topic(以 `test-topic-` 开头,以一位数字结尾)都将被 consumer 订阅。
To allow the consumer to discover dynamically created topics after the job started running, set a non-negative value for `scan.topic-partition-discovery.interval`. This allows the consumer to discover partitions of new topics with names that also match the specified pattern.
为允许 consumer 在作业启动之后探测到动态创建的 topic,请将 `scan.topic-partition-discovery.interval` 配置为一个非负值,从而使 consumer 能够探测匹配名称规则的 topic 中新的 partition。
Please refer to [Kafka DataStream Connector documentation]({% link dev/connectors/kafka.zh.md %}#kafka-consumers-topic-and-partition-discovery) for more about topic and partition discovery.
请参阅 [Kafka DataStream 连接器文档]({% link dev/connectors/kafka.zh.md %}#kafka-consumers-topic-and-partition-discovery) 以获取更多关于 topic 和 partition 探测的信息。
Note that topic list and topic pattern only work in sources. In sinks, Flink currently only supports a single topic.
请注意 topic 列表和 topic 匹配规则只适用于 source,对于 sink 端 Flink 目前只支持单一 topic。
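
例如,下面的示意草稿订阅所有匹配 `test-topic-[0-9]` 的 topic,并每 10 秒探测一次新的 topic 和 partition(表结构、连接参数与探测间隔均为假设值):

{% highlight sql %}
-- 示意草稿:topic 正则匹配 + 动态 partition 探测
CREATE TABLE KafkaPatternExample (
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic-pattern' = 'test-topic-[0-9]',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'scan.topic-partition-discovery.interval' = '10s',
  'format' = 'json'
)
{% endhighlight %}
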
### Start Reading Position
### 起始消费位点
The config option `scan.startup.mode` specifies the startup mode for Kafka consumer. The valid enumerations are:
`scan.startup.mode` 配置项决定了 Kafka consumer 的启动模式。有效值有:
<ul>
<li><span markdown="span">`group-offsets`</span>: start from committed offsets in ZK / Kafka brokers of a specific consumer group.</li>
<li><span markdown="span">`earliest-offset`</span>: start from the earliest offset possible.</li>
<li><span markdown="span">`latest-offset`</span>: start from the latest offset.</li>
<li><span markdown="span">`timestamp`</span>: start from user-supplied timestamp for each partition.</li>
<li><span markdown="span">`specific-offsets`</span>: start from user-supplied specific offsets for each partition.</li>
<li><span markdown="span">`group-offsets`</span>: 从 Zookeeper/Kafka 中某个指定的消费组已提交的偏移量开始。 </li>
<li><span markdown="span">`earliest-offset`</span>: 从可能的最早偏移量开始。 </li>
<li><span markdown="span">`latest-offset`</span>: 从最末尾偏移量开始。 </li>
<li><span markdown="span">`timestamp`</span>: 从用户为每个 partition 指定的时间戳开始。 </li>
<li><span markdown="span">`specific-offsets`</span>: 从用户为每个 partition 指定的偏移量开始。</li>
</ul>
The default option value is `group-offsets` which indicates to consume from last committed offsets in ZK / Kafka brokers.
默认值 `group-offsets` 表示从 Zookeeper/Kafka 中最近一次已提交的偏移量开始消费。
If `timestamp` is specified, another config option `scan.startup.timestamp-millis` is required to specify a specific startup timestamp in milliseconds since January 1, 1970 00:00:00.000 GMT.
如果使用了 `timestamp`,必须使用另外一个配置项 `scan.startup.timestamp-millis` 来指定一个从格林尼治标准时间 1970 年 1 月 1 日 00:00:00.000 开始计算的毫秒单位时间戳作为起始时间位点。
If `specific-offsets` is specified, another config option `scan.startup.specific-offsets` is required to specify specific startup offsets for each partition,
e.g. an option value `partition:0,offset:42;partition:1,offset:300` indicates offset `42` for partition `0` and offset `300` for partition `1`.
如果使用了 `specific-offsets`,必须使用另一个配置项 `scan.startup.specific-offsets` 来为每个 partition 指定起始偏移量,例如 `partition:0,offset:42;partition:1,offset:300` 表示
partition `0` 从偏移量 `42` 开始,partition `1` 从偏移量 `300` 开始。
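
例如,下面的示意草稿按照上述配置从指定位点开始消费(表结构与连接参数为假设值):

{% highlight sql %}
-- 示意草稿:partition 0 从位点 42 开始,partition 1 从位点 300 开始
CREATE TABLE KafkaOffsetExample (
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'scan.startup.mode' = 'specific-offsets',
  'scan.startup.specific-offsets' = 'partition:0,offset:42;partition:1,offset:300'
)
{% endhighlight %}
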
### Changelog Source
### 变更日志(Changelog) Source
Flink natively supports Kafka as a changelog source. If messages in Kafka topic is change event captured from other databases using CDC tools, then you can use a CDC format to interpret messages as INSERT/UPDATE/DELETE messages into Flink SQL system.
Flink provides two CDC formats [debezium-json]({% link dev/table/connectors/formats/debezium.zh.md %}) and [canal-json]({% link dev/table/connectors/formats/canal.zh.md %}) to interpret change events captured by [Debezium](https://debezium.io/) and [Canal](https://github.com/alibaba/canal/wiki).
The changelog source is a very useful feature in many cases, such as synchronizing incremental data from databases to other systems, auditing logs, materialized views on databases, temporal join changing history of a database table and so on.
See more about how to use the CDC formats in [debezium-json]({% link dev/table/connectors/formats/debezium.zh.md %}) and [canal-json]({% link dev/table/connectors/formats/canal.zh.md %}).
Flink 原生支持使用 Kafka 作为变更日志(changelog)source。如果 Kafka topic 中的消息是通过变更数据捕获(CDC)工具从其他数据库捕获的变更事件,则可以使用 CDC 格式将消息解析为 Flink SQL 系统中的插入(INSERT)、更新(UPDATE)、删除(DELETE)消息。
Flink 提供 [debezium-json]({% link dev/table/connectors/formats/debezium.zh.md %}) 和 [canal-json]({% link dev/table/connectors/formats/canal.zh.md %}) 两种 CDC 格式,用于解析由 [Debezium](https://debezium.io/) 和 [Canal](https://github.com/alibaba/canal/wiki) 捕获的变更事件。
变更日志(changelog)source 在很多场景下都非常有用,例如将数据库的增量数据同步到其他系统、审计日志、基于数据库的物化视图、对数据库表的变更历史进行时态(temporal)join 等。
请参阅 [debezium-json]({% link dev/table/connectors/formats/debezium.zh.md %}) 和 [canal-json]({% link dev/table/connectors/formats/canal.zh.md %}) 以了解如何使用 CDC 格式。
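
例如,下面的示意草稿将一个包含 Debezium 变更事件的 topic 解析为变更日志表(topic 名、字段与连接参数均为假设值):

{% highlight sql %}
-- 示意草稿:消息体为 Debezium JSON 变更事件,Flink 会将其解析为 INSERT/UPDATE/DELETE 消息
CREATE TABLE products_changelog (
  `id` BIGINT,
  `name` STRING,
  `weight` DECIMAL(10, 3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'products_binlog',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'changelog-group',
  'format' = 'debezium-json'
)
{% endhighlight %}
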
### Sink Partitioning
### Sink 分区
The config option `sink.partitioner` specifies output partitioning from Flink's partitions into Kafka's partitions.
By default, Flink uses the [Kafka default partitioner](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java) to parititon records. It uses the [sticky partition strategy](https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/) for records with null keys and uses a murmur2 hash to compute the partition for a record with the key defined.
配置项 `sink.partitioner` 指定了从 Flink 分区到 Kafka 分区的映射关系。
默认情况下,Flink 使用 [Kafka 默认分区器](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java)
来对消息分区。默认分区器对没有消息键的消息使用 [粘性分区策略(sticky partition strategy)](https://www.confluent.io/blog/apache-kafka-producer-improvements-sticky-partitioner/) 进行分区,对含有消息键的消息使用 murmur2 哈希算法计算分区。
In order to control the routing of rows into partitions, a custom sink partitioner can be provided. The 'fixed' partitioner will write the records in the same Flink partition into the same partition, which could reduce the cost of the network connections.
为了控制数据行到分区的路由,亦可以指定一个自定义 sink 分区器。'fixed' 分区器会将同一个 Flink 分区中的消息写入同一个 Kafka 分区,从而减少网络连接的开销。
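
例如,下面的示意草稿为 sink 表配置 'fixed' 分区器(topic、字段与连接参数均为假设值):

{% highlight sql %}
-- 示意草稿:同一 Flink 分区的数据写入同一个 Kafka 分区,以减少网络连接开销
CREATE TABLE KafkaFixedSink (
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'behavior_sink',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'sink.partitioner' = 'fixed'
)
{% endhighlight %}
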
### Consistency guarantees
### 一致性保证
By default, a Kafka sink ingests data with at-least-once guarantees into a Kafka topic if the query is executed with [checkpointing enabled]({% link dev/stream/state/checkpointing.zh.md %}#enabling-and-configuring-checkpointing).
默认情况下,当语句在 [启用 checkpoint]({% link dev/stream/state/checkpointing.zh.md %}#enabling-and-configuring-checkpointing) 的模式下执行时,Kafka sink 按照至少一次(at-least-once)语义保证将数据写入 Kafka topic。
With Flink's checkpointing enabled, the `kafka` connector can provide exactly-once delivery guarantees.
当 Flink checkpoint 启用时,Kafka 连接器可以提供精确一次(exactly-once)的语义保证。
Besides enabling Flink's checkpointing, you can also choose three different modes of operating chosen by passing appropriate `sink.semantic` option:
除了启用 Flink checkpoint,还可以通过传入对应的 `sink.semantic` 选项来选择三种不同的运行模式:
* `NONE`: Flink will not guarantee anything. Produced records can be lost or they can be duplicated.
* `AT_LEAST_ONCE` (default setting): This guarantees that no records will be lost (although they can be duplicated).
* `EXACTLY_ONCE`: Kafka transactions will be used to provide exactly-once semantic. Whenever you write
to Kafka using transactions, do not forget about setting desired `isolation.level` (`read_committed`
or `read_uncommitted` - the latter one is the default value) for any application consuming records
from Kafka.
* `none`: Flink 不保证任何语义,已经写出的记录可能会丢失或重复。
* `at-least-once` (默认设置): 保证没有记录会丢失(但可能发生重复)。
* `exactly-once`: 使用 Kafka 事务提供精确一次(exactly-once)语义。当使用事务向 Kafka 写入数据时,请将所有消费这些记录的应用中的 `isolation.level` 配置项设置成实际所需的值(`read_committed``read_uncommitted`,后者为默认值)。
Please refer to [Kafka documentation]({% link dev/connectors/kafka.zh.md %}#kafka-producers-and-fault-tolerance) for more caveats about delivery guarantees.
请参阅 [Kafka 文档]({% link dev/connectors/kafka.zh.md %}#kafka-producers-and-fault-tolerance) 以获取更多关于语义保证的信息。
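
例如,下面的示意草稿为 sink 启用精确一次语义(topic、字段与连接参数均为假设值;同时还需要为作业启用 checkpoint,下游消费者按需设置 `isolation.level`):

{% highlight sql %}
-- 示意草稿:使用 Kafka 事务实现精确一次写入,要求作业已开启 checkpoint
CREATE TABLE KafkaExactlyOnceSink (
  `user_id` BIGINT,
  `behavior` STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'behavior_sink',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'sink.semantic' = 'exactly-once'
)
{% endhighlight %}
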
### Source Per-Partition Watermarks
### Source 按分区 Watermark
Flink supports to emit per-partition watermarks for Kafka. Watermarks are generated inside the Kafka
consumer. The per-partition watermarks are merged in the same way as watermarks are merged during streaming
shuffles. The output watermark of the source is determined by the minimum watermark among the partitions
it reads. If some partitions in the topics are idle, the watermark generator will not advance. You can
alleviate this problem by setting the [`'table.exec.source.idle-timeout'`]({% link dev/table/config.zh.md %}#table-exec-source-idle-timeout)
option in the table configuration.
Flink 支持针对 Kafka 发送按分区的 watermark。Watermark 在 Kafka consumer 内部生成。按分区 watermark 的合并方式和在流 shuffle 时合并
watermark 的方式一致。Source 输出的 watermark 由其读取的分区中最小的 watermark 决定。如果 topic 中的某些分区处于闲置状态,则 watermark 生成器将不会向前推进。
您可以通过在表配置中设置 [`'table.exec.source.idle-timeout'`]({% link dev/table/config.zh.md %}#table-exec-source-idle-timeout) 选项来缓解
该问题。
Please refer to [Kafka watermark strategies]({% link dev/event_timestamps_watermarks.zh.md %}#watermark-strategies-and-the-kafka-connector)
for more details.
请参阅 [Kafka watermark 策略]({% link dev/event_timestamps_watermarks.zh.md %}#watermark-strategies-and-the-kafka-connector) 一节以获取更多细节。
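
例如,可以在 SQL 客户端中为当前会话设置该空闲超时(示意写法,具体的 SET 语法请以所用 Flink 版本的 SQL 客户端文档为准):

{% highlight sql %}
-- 示意:某个 partition 闲置超过 10 秒后,不再阻碍整体 watermark 前进
SET table.exec.source.idle-timeout=10s;
{% endhighlight %}
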
Data Type Mapping
数据类型映射
----------------
Kafka stores message keys and values as bytes, so Kafka doesn't have schema or data types. The Kafka messages are deserialized and serialized by formats, e.g. csv, json, avro.
Thus, the data type mapping is determined by specific formats. Please refer to [Formats]({% link dev/table/connectors/formats/index.zh.md %}) pages for more details.
Kafka 以字节形式存储消息的键和值,因此 Kafka 并没有 schema 或数据类型。Kafka 消息由所配置的格式进行序列化和反序列化,例如 csv、json、avro 等。
因此,数据类型映射取决于所使用的格式。请参阅 [格式]({% link dev/table/connectors/formats/index.zh.md %}) 页面以获取更多细节。
{% top %}
......@@ -35,7 +35,7 @@ Dependencies
------------
{% assign connector = site.data.sql-connectors['kinesis'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......
......@@ -39,7 +39,7 @@ Upsert Kafka 连接器支持以 upsert 方式从 Kafka topic 中读取数据并
------------
{% assign connector = site.data.sql-connectors['upsert-kafka'] %}
{% include sql-connector-download-table.html
{% include sql-connector-download-table.zh.html
connector=connector
%}
......