Merge branch 'feature/async.scheduler' of https://github.com/taosdata/TDengine...

Merge branch 'feature/async.scheduler' of https://github.com/taosdata/TDengine into feature/async.scheduler

041df53f · dapan1121 · 6c6237ab · 49510f63
183 changed files
--- a/.gitignore
+++ b/.gitignore
@@ -46,6 +46,7 @@ psim/
 pysim/
 *.out
 *DS_Store
+tests/script/api/batchprepare

 # Doxygen Generated files
 html/
@@ -108,4 +109,4 @@ TAGS
 contrib/*
 !contrib/CMakeLists.txt
 !contrib/test
-sql
\ No newline at end of file
+sql
--- a/docs-cn/02-intro.md
+++ b/docs-cn/02-intro.md
@@ -119,7 +119,6 @@ The main features of TDengine are as follows:
 - [Comparing InfluxDB and TDengine with InfluxDB's open-source benchmarking tool](https://www.taosdata.com/blog/2020/01/13/1105.html)
 - [TDengine vs. OpenTSDB comparison test](https://www.taosdata.com/blog/2019/08/21/621.html)
 - [TDengine vs. Cassandra comparison test](https://www.taosdata.com/blog/2019/08/14/573.html)
- [TDengine vs. InfluxDB comparison test](https://www.taosdata.com/blog/2019/07/19/419.html)
 - [TDengine VS InfluxDB: a head-to-head on write performance!](https://www.taosdata.com/2021/11/05/3248.html)
 - [TDengine vs. InfluxDB query performance comparison report](https://www.taosdata.com/2022/02/22/5969.html)
 - [Comparison test report of TDengine vs. InfluxDB, OpenTSDB, Cassandra, MySQL, ClickHouse and other databases](https://www.taosdata.com/downloads/TDengine_Testing_Report_cn.pdf)
--- a/docs-cn/13-operation/11-optimize.md
+++ b/docs-cn/13-operation/11-optimize.md
---
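-title: Performance Optimization
---
-
-Because of row [updates](/train-faq/faq/#update), table deletion, data expiration and similar causes, TDengine's on-disk storage files may become fragmented, which degrades query performance. Starting from version 2.1.3.0, the SQL command COMPACT is available to start defragmentation:
-
-```sql
-COMPACT VNODES IN (vg_id1, vg_id2, ...)
-```
-
-The COMPACT command starts defragmentation on one or more specified VGroups, and the system schedules the actual work through a task queue as soon as possible. The VGroup ids required by COMPACT can be obtained from the output of `SHOW VGROUPS;`. That output also has a compacting column: 2 means the VGroup is queued waiting for compaction, 1 means compaction is in progress, and 0 means it is not compacting (either no compaction was requested or it has already finished).
-
-Note that defragmentation consumes a large amount of disk I/O. While it runs, write and query performance on the node may suffer, and in extreme cases writes may be blocked for a short time.
-
-## Storage Parameter Optimization
-
-Data in different application scenarios often has different characteristics: retention period, number of replicas, collection frequency, record size, number of collection points, compression and so on can all differ. To achieve the highest storage efficiency, TDengine provides the following storage-related system configuration parameters (they can be passed to the create database command, or written in the taos.cfg configuration file to set the defaults used when new databases are created):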
-
-| #   | Parameter | Unit | Meaning | **Value Range** | **Default** |
-| --- | --------- | ---- | ------- | --------------- | ----------- |
-| 1   | days      | days | Time span of the data stored in one data file | 1-3650 | 10 |
-| 2   | keep      | days | (Modifiable via alter database) Number of days data is retained in the database. | 1-36500 | 3650 |
-| 3   | cache     | MB   | Size of one memory block | 1-128 | 16 |
-| 4   | blocks    |      | (Modifiable via alter database) Number of cache-sized memory blocks in each VNODE (TSDB), so the memory used by one VNODE is roughly (cache \* blocks). | 3-10000 | 6 |
-| 5   | quorum    |      | (Modifiable via alter database) Number of confirmations required for command execution in a multi-replica environment | 1-2 | 1 |
-| 6   | minRows   |      | Minimum number of records in a file block | 10-1000 | 100 |
-| 7   | maxRows   |      | Maximum number of records in a file block | 200-10000 | 4096 |
-| 8   | comp      |      | (Modifiable via alter database) File compression flag | 0: off; 1: one-phase compression; 2: two-phase compression | 2 |
-| 9   | walLevel  |      | (Named wal when used as a database parameter; must be written as walLevel in taos.cfg) WAL level | 1: write WAL without fsync; 2: write WAL and fsync | 1 |
-| 10  | fsync     | ms   | Period of fsync when wal is set to 2. 0 means fsync immediately on every write. |  | 3000 |
-| 11  | replica   |      | (Modifiable via alter database) Number of replicas | 1-3 | 1 |
-| 12  | precision |      | Timestamp precision (not supported in taos.cfg before 2.1.2.0 or 2.0.20.7) (nanosecond precision supported from 2.1.5.0) | ms: milliseconds; us: microseconds; ns: nanoseconds | ms |
-| 13  | update    |      | Whether data updates are allowed (values 0-2 supported from 2.1.7.0, before that only [0, 1]; not supported in SQL commands before 2.0.8.0) | 0: not allowed; 1: update whole rows; 2: update partial columns | 0 |
-| 14  | cacheLast |      | (Modifiable via alter database) Whether to cache the latest data of subtables in memory (values 0-3 supported from 2.1.2.0, before that only [0, 1]; not supported in SQL commands before 2.0.11.0) (not supported in taos.cfg before 2.1.2.0 or 2.0.20.7) | 0: off; 1: cache the latest row of each subtable; 2: cache the latest non-NULL value of each column of each subtable; 3: enable both row and column caching | 0 |
-
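-A single application may have data with several different characteristics. The best design is to put tables with the same data characteristics in the same database, so that the application has multiple databases, each configured with its own storage parameters, which keeps the system performing optimally. TDengine lets applications specify the storage parameters above when creating a database; any parameter specified overrides the corresponding system configuration parameter. For example, consider the following SQL:
-
-```sql
- CREATE DATABASE demo DAYS 10 CACHE 32 BLOCKS 8 REPLICA 3 UPDATE 1;
-```
-
-This SQL creates a database demo in which each data file stores 10 days of data, memory blocks are 32 MB, each VNODE uses 8 memory blocks, there are 3 replicas and updates are allowed; all other parameters match the system configuration.
-
-After a database is created, only some parameters can be modified, taking effect immediately; the rest cannot be modified: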
-
-| **Parameter** | **Modifiable** | **Range** | **Example Syntax** |
-| ------------- | -------------- | --------- | ------------------ |
-| name        |              |           |                                        |
-| create time |              |           |                                        |
-| ntables     |              |           |                                        |
-| vgroups     |              |           |                                        |
-| replica     | **YES**      | Depends on the number of online dnodes:<br/>1: 1-1;<br/>2: 1-2;<br/>\>=3: 1-3 | ALTER DATABASE <dbname\> REPLICA _n_   |
-| quorum      | **YES**      | 1-2       | ALTER DATABASE <dbname\> QUORUM _n_    |
-| days        |              |           |                                        |
-| keep        | **YES**      | days-365000 | ALTER DATABASE <dbname\> KEEP _n_    |
-| cache       |              |           |                                        |
-| blocks      | **YES**      | 3-1000    | ALTER DATABASE <dbname\> BLOCKS _n_    |
-| minrows     |              |           |                                        |
-| maxrows     |              |           |                                        |
-| wal         |              |           |                                        |
-| fsync       |              |           |                                        |
-| comp        | **YES**      | 0-2       | ALTER DATABASE <dbname\> COMP _n_      |
-| precision   |              |           |                                        |
-| status      |              |           |                                        |
-| update      |              |           |                                        |
-| cachelast   | **YES**      | 0 \| 1 \| 2 \| 3 | ALTER DATABASE <dbname\> CACHELAST _n_ |
-
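-**Note:** Before version 2.1.3.0, the server had to be restarted for changes made with ALTER DATABASE to take effect.
-
-When a new dnode joins a TDengine cluster, some cluster-related parameters must match the configuration of the existing cluster, otherwise the dnode cannot join. The parameters that are checked are:
-
-- numOfMnodes: number of management nodes in the system. Default: 3. (The default changed to 1 from 2.0.20.11 in the 2.0 line and from 2.1.6.0 in 2.1 and later.)
-- mnodeEqualVnodeNum: number of vnodes that one mnode is counted as consuming. Default: 4.
-- offlineThreshold: dnode offline threshold; a dnode offline longer than this is removed from the cluster. In seconds, default: 86400\*10 (10 days).
-- statusInterval: interval at which a dnode reports its status to the mnode. In seconds, default: 1.
-- maxTablesPerVnode: maximum number of tables that can be created in each vnode. Default: 1000000.
-- maxVgroupsPerDb: maximum number of vgroups each database can use.
-- arbitrator: endpoint of the arbitrator in the system; empty by default.
-- timezone, locale, charset: see the client configuration. (From 2.0.20.0, a new node joining a cluster no longer needs matching locale and charset values.)
-- balance: whether to enable load balancing. 0: no, 1: yes. Default: 1.
-- flowctrl: whether to enable non-blocking flow control. 0: no, 1: yes. Default: 1.
-- slaveQuery: whether slave vnodes take part in queries. 0: no, 1: yes. Default: 1.
-- adjustMaster: whether to enable vnode master load balancing. 0: no, 1: yes. Default: 1.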
-
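-For debugging, the log configuration of each dnode can be adjusted temporarily with an SQL statement; the change is lost when the system restarts:
-
-```sql
-ALTER DNODE <dnode_id> <config>
-```
-
-- dnode_id: can be obtained with the SQL command "SHOW DNODES"
-- config: the log parameter to adjust, taken from the following list
-  > resetlog: truncate the old log file and create a new one
-  > debugFlag < 131 | 135 | 143 >: set debugFlag to 131, 135 or 143
-
-For example: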
-
-```
-alter dnode 1 debugFlag 135;
-```
--- a/docs-cn/21-tdinternal/02-replica.md
+++ b/docs-cn/21-tdinternal/02-replica.md
--- a/docs-cn/21-tdinternal/03-taosd.md
+++ b/docs-cn/21-tdinternal/03-taosd.md
---
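-sidebar_label: Design of taosd
-title: Design of taosd
---
-
-Logically, a TDengine system consists of dnodes, taosc and Apps. A dnode is a running instance of taosd, the server-side executable, so taosd is the core of TDengine. This document briefly introduces the design of taosd; for implementation details within each module, see the other documents.
-
-## System Module Diagram
-
-taosd consists of the rpc, dnode, vnode, tsdb, query, cq, sync, wal, mnode, http, monitor and other modules, as shown below:
-
-![modules.png](/img/architecture/modules.png)
-
-taosd starts from the dnode module, which then starts the other modules, including the optionally configured http and monitor modules. All messages exchanged between taosc and dnodes, or among dnodes, go through the rpc module. Depending on the message type, the dnode module dispatches each received message to a vnode or mnode message queue, or consumes it itself. The dnode worker threads consume the messages in the queues and hand them to the mnode or vnode for processing. Each module is briefly described below.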
-
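-## RPC Module
-
-This module handles communication between taosd and taosc, and with other data nodes. Instead of a standard third-party tool such as HTTP or gRPC, TDengine implements its own communication module, RPC.
-
-Because write packets in IoT scenarios are usually small, RPC supports UDP connections in addition to TCP. When a packet is smaller than 15K, RPC uses UDP; otherwise it uses TCP. Query messages always use TCP regardless of size. For UDP, RPC implements its own timeout, retransmission and ordering checks to ensure reliable delivery.
-
-The RPC module also compresses data: if a packet is larger than the system configuration parameter compressMsgSize (in bytes), RPC automatically compresses it in transit to save bandwidth.
-
-To ensure data security and integrity, the RPC module signs messages with MD5 to authenticate the authenticity and integrity of the data.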
-
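-## DNODE Module
-
-This module is the entry point of taosd and is responsible for the following tasks:
-
-- System initialization, including
-  - reading system configuration parameters from taos.cfg and data node configuration from dnodeCfg.json;
-  - starting the RPC module and setting up the server connections used to communicate with taosc and with other data nodes;
-  - starting and initializing the dnode's internal management, which scans the existing vnodes on this data node and opens them;
-  - initializing configurable modules such as mnode, http and monitor.
-- Data node management, including
-  - periodically sending status messages to the mnode to report its own state;
-  - creating, altering and deleting vnodes as instructed by the mnode;
-  - modifying its own configuration parameters as instructed by the mnode;
-- Message dispatch and consumption, including
-  - creating and maintaining one read queue and one write queue for each vnode and mnode;
-  - dispatching messages from taosc or other data nodes, by message type, directly to different message queues, or having its own management module consume them directly;
-  - maintaining a read thread pool that consumes the read queues and hands messages to the vnode or mnode. To support high concurrency, one read thread (worker) can consume messages from multiple queues, and one read queue can be consumed by multiple workers;
-  - maintaining a write thread pool that consumes the write queues and hands messages to the vnode or mnode. To serialize writes, each write queue is served by exactly one write thread, although one write thread can serve multiple write queues.
-
-Message consumption in taosd is controlled by the dnode through the read and write thread pools, making the dnode the hub of the system. The structure diagram of this module is as follows: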
-
-![dnode.png](/img/architecture/dnode.png)
-
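-## VNODE Module
-
-A vnode is an independent logical unit for data storage and querying. Because a vnode can hold only one DB, it has no internal concepts of account, DB or user. For better modularity, encapsulation and future extensibility, it has many submodules: TSDB for storage, query for querying, sync for data replication, WAL for database logging, cq for continuous queries, event for event-triggered stream computing, and others. These submodules interact only with the vnode module and have no calls to any other module. The module diagram is as follows:
-
-![vnode.png](/img/architecture/vnode.png)
-
-Downward, the vnode module interacts with dnodeVRead and dnodeVWrite; upward, it interacts with its submodules. Its main functions are:
-
-- Coordinating the submodules. Submodules never call each other directly; all interaction goes through the vnode module;
-- For write operations from taosc or the mnode, breaking them down into operations on the write-ahead log (WAL), forwarding (sync) and local storage (TSDB) submodules;
-- Dispatching query operations to the query module.
-
-A data node has multiple vnodes, so the vnode module has multiple running instances, each completely independent.
-
-The vnode calls its submodules directly through APIs rather than passing messages through queues. Moreover, the submodules interact only with the vnode module and have no direct association with the dnode, rpc or other modules.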
-
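-## MNODE Module
-
-The mnode is the brain of the whole system, responsible for system-wide resource scheduling and for managing and storing metadata.
-
-A running system has only one mnode, but it has multiple replicas (controlled by the system configuration parameter numOfMnodes). These replicas are distributed across different dnodes to keep the system highly reliable. Replicas replicate data synchronously rather than asynchronously, to keep data consistent and ensure none is lost. The replicas automatically elect a master; the others are slaves. All data-update operations can only be performed on the master, while queries can be served by slaves. In the implementation, the sync module is shared with vnodes, but the mnode is assigned the special vgroup ID 1 and its quorum is greater than 1. The cluster consists of multiple dnodes; the number of running mnode replicas cannot exceed the number of dnodes, nor the configured number of replicas. If an mnode replica goes down for some time, then as long as more than half of the mnode replicas are still running, the running mnode automatically starts another mnode on some other dnode, based on the resources of the whole system, to maintain the number of running replicas.
-
-Through information exchange, each dnode keeps the End Point list of all mnode replicas and periodically (at an interval controlled by the system configuration parameter statusInterval) sends status messages to the master. A status message contains the dnode's CPU, memory, free storage space and number of vnodes, plus the state of each vnode (storage space, raw data size, number of records, role, etc.). This gives the mnode a picture of the resources of the whole system: when a user creates a new table, it can decide on which dnode to create it; when a dnode is added or removed, or a dnode is found to be overloaded with data or offline for too long, it can decide which vnodes to move to balance the load.
-
-The mnode is also responsible for creating, deleting and updating accounts, users, DBs, stables, tables, vgroups and dnodes. It not only keeps the metadata of these entities in memory but also persists it. To save memory, however, table tag values are not stored in the mnode (they are stored in vnodes), and subtables do not maintain their own schema but share the schema of their stable. To reduce the query load on the mnode, taosc caches the schemas of tables and stables. Slave mnodes can also serve queries to relieve the master.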
-
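-## TSDB Module
-
-The TSDB module is the engine inside a vnode that provides fast, highly concurrent storage and retrieval of the metadata and the collected time-series data of the tables belonging to that vnode. TSDB also supports modifying table schemas and tag values. It exposes APIs called by the vnode, query and other modules. TSDB stores two kinds of data: 1. metadata; 2. time-series data.
-
-### Metadata
-
-The metadata stored in TSDB includes the type and schema definition of each table in its vnode. For super tables and their subtables, it also includes the tag schema definition and the subtables' tag values. For metadata, TSDB acts as a fully in-memory KV database: all table objects of the vnode are held in memory for fast lookup. TSDB also builds a fully in-memory index of subtables on the value of the first tag column, which greatly speeds up tag-filtering queries. When persisted, the latest state of the metadata is written to the meta file in append-only fashion. The meta file only ever receives appends; even a metadata deletion is written as a record at the end of the file. TSDB also supports metadata modifications such as changing a table schema, changing a tag schema and changing tag values.
-
-### Time-Series Data
-
-When a TSDB is created, it preallocates a memory buffer whose size is configurable and modifiable. Time-series data written to TSDB is first appended to this buffer, and an in-memory timestamp index is built for fast querying. When the buffered data reaches a threshold (1/3 of the total buffer size), a flush is triggered that persists the buffered data to files on disk. In the memory buffer, time-series data is stored as rows.
-
-When written to the TSDB data files, however, time-series data is stored as columns. The TSDB data files consist of multiple data file groups, each containing three files, .head, .data and .last; for example, v2f1801.head, v2f1801.data and v2f1801.last form one group. Data file groups are partitioned by time span, 10 days per group by default, configurable through the configuration file and database creation options. The groups are numbered in increasing order, making it fast to locate the group holding a given time range. Inside the data files, time-series data is stored column-wise in blocks; each block contains data of only one table, in ascending time order. Within a data file group, the .head file stores the index and statistics of the data blocks, such as each block's position, compression algorithm and timestamp range. A table's index entries in the .head file are ordered by the time of the data in the blocks, which enables binary search. The .data and .last files store the actual data blocks: once enough data has accumulated for a block, it is written to the .data file; otherwise it goes to the .last file and is merged into the .data file at the next flush. This greatly reduces the number of blocks in the files and avoids excessive fragmentation.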
-
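-## Query Module
-
-This module handles query processing for the whole system. On the client, it parses SQL, sends query or write requests to vnodes, and performs the second-stage aggregation for super table queries. On the vnode side, it calls the TSDB module to read stored data for query processing. The query module also defines all the query functions the system supports. Function implementations are decoupled from the query framework, so query functions can be added dynamically without changing the query flow. For the detailed design, see *TDengine 2.0 Query Module Design*.
-
-## SYNC Module
-
-This module implements multi-replica data replication for both vnodes and mnodes. It supports both asynchronous and synchronous replication, to meet the different replication requirements of metadata and time-series data. Because the module is shared by the mnode and vnodes, the system reserves the special vgroup ID 1 for mnode replicas, so vnode group IDs start at 2.
-
-Each vnode/mnode instance has its own sync module instance, in strict one-to-one correspondence. For the detailed design, see [TDengine 2.0 Data Replication Module Design](/tdinternal/replica/).
-
-## WAL Module
-
-This module writes newly inserted data to the write-ahead log (WAL) and is shared by vnodes and mnodes, so that data can be recovered from the WAL after a server crash or other failure.
-
-Each vnode/mnode instance has its own WAL module instance, in strict one-to-one correspondence. Persistence of the WAL is controlled by two parameters, walLevel and fsync. If a scenario requires a 100% guarantee that no data is lost, set walLevel to 2 and fsync to 0: every insert request is then persisted to disk before it is acknowledged to the application.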
-
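-## HTTP Module
-
-This module handles the system's external RESTful interface and can be started or stopped by the dnode through configuration. (Only present in version 2.2 and earlier.)
-
-The module validates each received RESTful request, converts it into a standard SQL statement, and sends it through the asynchronous taosc interface to any dnode in the system. When it receives the processed result, it translates it into the HTTP protocol and returns it to the application.
-
-Starting the HTTP module also starts a taosc instance. Any dnode can start this module, so RESTful requests can be processed in a distributed way.
-
-## Monitor Module
-
-This module monitors the running state of a dnode and can be started or stopped by the dnode through configuration. In principle, every dnode should start a monitor instance.
-
-Monitor records key operations in TDengine, such as creating, deleting and updating accounts, tables and databases, and periodically collects CPU, memory, network and other resource usage (the collection period is controlled by the system configuration parameter monitorInterval). The monitor module then writes the collected data into the system's log database (whose name is controlled by the system configuration parameter monitorDbName).
-
-The monitor module writes the collected data into the system through taosc, so every monitor instance has its own running taosc instance.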
--- a/docs-cn/21-tdinternal/12-tsz-compress.md
+++ b/docs-cn/21-tdinternal/12-tsz-compress.md
---
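-title: TSZ Compression Algorithm
---
-
-TSZ is a compression algorithm that gives TDengine richer compression for floating-point data types, covering everything from lossy to lossless compression of floating-point numbers. Compared with TDengine's original compression algorithm, TSZ offers more compression options and a higher compression ratio: even in lossless mode, its compression ratio on floating-point numbers is about twice that of the original algorithm.
-
-## Suitable Scenarios
-
-TSZ achieves a higher compression ratio than the original algorithm but takes longer to compress, so enabling it lowers write speed somewhat, typically by about 20%. Writes slow down because compression needs more CPU time, which lengthens the path from raw data to compressed data. On servers with powerful CPUs, the impact is smaller or even negligible.
-
-In addition, if devices produce large volumes of high-precision floating-point numbers that take up a lot of storage, but the application does not actually need that much precision, TSZ's lossy mode can reduce the precision to a specified length and save storage space.
-
-In summary: TSZ suits scenarios where large volumes of floating-point data are collected, the data takes up too much storage or storage space is running out, and a very high compression ratio is needed.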
-
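-## Usage Steps
-
-- Check version support: TDengine 2.4.0.10 and later support this feature.
-
-- Enable the feature: add the following line to TDengine's configuration file taos.cfg to turn TSZ on:
-
-```TSZ
-lossyColumns     float|double
-```
-
-- Configure other options as needed; any option left unconfigured uses its default value.
-
-- Restart the service for the configuration to take effect.
-- Confirm the feature is enabled: if the output printed while the service starts contains the configuration above, the feature is active:
-
-```TSZ Test
-02/22 10:49:27.607990 00002933 UTL  lossyColumns     float|double
-```
-
-## Notes
-
-- Make sure your version supports the feature.
-
-- Apart from the configuration message printed at server startup, nothing indicates which compression algorithm is in use; compare database file sizes before and after enabling it to judge the effect.
-
-- If there are few floating-point columns, the effect on overall data file size will be less noticeable.
-
-- The floating-point portion of data files produced with this compression cannot be parsed by versions earlier than 2.4.0.10, i.e. it is not backward compatible. Avoid switching back to an older version after enabling it, or the data may become unreadable.
-
-- TSZ compression can be turned on and off repeatedly; data produced by either compression algorithm remains readable.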
--- a/docs-cn/21-tdinternal/30-iot-big-data.md
+++ b/docs-cn/21-tdinternal/30-iot-big-data.md
---
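-title: IoT Big Data
-description: "Characteristics of IoT and industrial internet big data; functions and features an IoT big data platform should have; why general-purpose big data architectures are unsuitable for IoT data; why TDengine is recommended for IoT, connected-vehicle and industrial internet big data platforms"
---
-
-- [Characteristics of IoT and Industrial Internet Big Data](https://www.taosdata.com/blog/2019/07/09/105.html)
-- [Functions and Features an IoT Big Data Platform Should Have](https://www.taosdata.com/blog/2019/07/29/542.html)
-- [Why Are General-Purpose Big Data Architectures Unsuitable for IoT Data?](https://www.taosdata.com/blog/2019/07/09/107.html)
-- [Why Is TDengine Recommended for IoT, Connected-Vehicle and Industrial Internet Big Data Platforms?](https://www.taosdata.com/blog/2019/07/09/109.html)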
--- a/docs-cn/27-train-faq/02-video.mdx
+++ b/docs-cn/27-train-faq/02-video.mdx
---
-title: Video Tutorials
---
-
-## Public Tech Talks
-
-- [Public tech talk: inside the TDengine kernel, an open-source, high-efficiency IoT big data platform](https://www.taosdata.com/blog/2020/12/25/2126.html)
-
-## Video Tutorials
-
-- [TDengine Video Tutorial - Quick Start](https://www.taosdata.com/blog/2020/11/11/1941.html)
-- [TDengine Video Tutorial - Data Modeling](https://www.taosdata.com/blog/2020/11/11/1945.html)
-- [TDengine Video Tutorial - Cluster Setup](https://www.taosdata.com/blog/2020/11/11/1961.html)
-- [TDengine Video Tutorial - Go Connector](https://www.taosdata.com/blog/2020/11/11/1951.html)
-- [TDengine Video Tutorial - JDBC Connector](https://www.taosdata.com/blog/2020/11/11/1955.html)
-- [TDengine Video Tutorial - Node.js Connector](https://www.taosdata.com/blog/2020/11/11/1957.html)
-- [TDengine Video Tutorial - Python Connector](https://www.taosdata.com/blog/2020/11/11/1963.html)
-- [TDengine Video Tutorial - RESTful Connector](https://www.taosdata.com/blog/2020/11/11/1965.html)
-- [TDengine Video Tutorial - "Zero"-Code Operations Monitoring](https://www.taosdata.com/blog/2020/11/11/1959.html)
-
-## Micro Classes
-
-Follow the TDengine WeChat Channels account for carefully produced micro classes.
-
-<img src="/img/shi-pin-hao.png" width={350} />
--- a/docs-en/01-index.md
+++ b/docs-en/01-index.md
@@ -4,24 +4,24 @@ sidebar_label: Documentation Home
 slug: /
 ---

-TDengine is a [high-performance](https://tdengine.com/fast), [scalable](https://tdengine.com/scalable) time series database with [SQL support](https://tdengine.com/sql-support). This document is the TDengine user manual. It introduces the basic concepts, installation, features, SQL, APIs, operation, maintenance, kernel design, etc. It’s written mainly for architects, developers and system administrators.
+TDengine is a [high-performance](https://tdengine.com/fast), [scalable](https://tdengine.com/scalable) time series database with [SQL support](https://tdengine.com/sql-support). This document is the TDengine user manual. It introduces the basic as well as the novel concepts in TDengine, and also talks in detail about installation, features, SQL, APIs, operation, maintenance, kernel design and other topics. It’s written mainly for architects, developers and system administrators.

-To get a global view about TDengine, like feature list, benchmarks, and competitive advantages, please browse through section [Introduction](./intro).
+To get an overview of TDengine, such as a feature list, benchmarks, and competitive advantages, please browse through the [Introduction](./intro) section.

-TDengine makes full use of the characteristics of time series data, proposes the concepts of "one table for one data collection point" and "super table", and designs an innovative storage engine, which greatly improves the efficiency of data ingestion, querying and storage. To understand the new concepts and use TDengine in the right way, please read [“Concepts”](./concept) thoroughly.
+TDengine greatly improves the efficiency of data ingestion, querying and storage by exploiting the characteristics of time series data, introducing the novel concepts of "one table for one data collection point" and "super table", and designing an innovative storage engine. To understand these new concepts and make full use of TDengine's features and capabilities, please read [“Concepts”](./concept) thoroughly.

-If you are a developer, please read the [“Developer Guide”](./develop) carefully. This section introduces the database connection, data modeling, data ingestion, query, continuous query, cache, data subscription, user-defined function, etc. in detail. Sample code is provided for a variety of programming languages. In most cases, you can just copy and paste the sample code, make a few changes to accommodate your application, and it will work.
+If you are a developer, please read the [“Developer Guide”](./develop) carefully. This section introduces the database connection, data modeling, data ingestion, query, continuous query, cache, data subscription, user-defined functions, and other functionality in detail. Sample code is provided for a variety of programming languages. In most cases, you can just copy and paste the sample code, make a few changes to accommodate your application, and it will work.

-We live in the era of big data, and scale-up is unable to meet the growing business needs. Any modern data system must have the ability to scale out, and clustering has become an indispensable feature of big data systems. The TDengine team has not only developed the cluster feature, they also decided to open source this important feature. To learn how to deploy, manage and maintain a TDengine cluster please refer to ["Cluster"](./cluster).
+We live in the era of big data, and scale-up is unable to meet the growing needs of business. Any modern data system must have the ability to scale out, and clustering has become an indispensable feature of big data systems. Not only did the TDengine team develop the cluster feature, but they also decided to open source this important feature. To learn how to deploy, manage and maintain a TDengine cluster please refer to ["Cluster"](./cluster).

-TDengine uses SQL as its query language, which greatly reduces learning costs and migration costs. In addition to the standard SQL, TDengine has extensions to support time series data scenarios better, such as roll up, interpolation, time weighted average, etc. The ["SQL Reference"](./taos-sql) chapter describes the SQL syntax in detail, and lists the various supported commands and functions.
+TDengine uses ubiquitious SQL as its query language, which greatly reduces learning costs and migration costs. In addition to the standard SQL, TDengine has extensions to better support time series data analysis. These extensions include functions such as roll up, interpolation and time weighted average, among many others. The ["SQL Reference"](./taos-sql) chapter describes the SQL syntax in detail, and lists the various supported commands and functions.

-If you are a system administrator who cares about installation, upgrade, fault tolerance, disaster recovery, data import, data export, system configuration, how to monitor whether TDengine is running healthily, and how to improve system performance, please refer to the ["Administration"](./operation) thoroughly.
+If you are a system administrator who cares about installation, upgrade, fault tolerance, disaster recovery, data import, data export, system configuration, how to monitor whether TDengine is running healthily, and how to improve system performance, please read the ["Administration"](./operation) section thoroughly.

-If you want to know more about TDengine tools, REST API, and connectors for various programming languages, please see the ["Reference"](./reference) chapter.
+If you want to know more about TDengine tools, the REST API, and connectors for various programming languages, please see the ["Reference"](./reference) chapter.

 If you are very interested in the internal design of TDengine, please read the chapter ["Inside TDengine”](./tdinternal), which introduces the cluster design, data partitioning, sharding, writing, and reading processes in detail. If you want to study TDengine code or even contribute code, please read this chapter carefully.

-TDengine is an open source database, you are welcome to be a part of TDengine. If you find any errors in the documentation, or the description is not clear, please click "Edit this page" at the bottom of each page to edit it directly.
+TDengine is an open source database, and we would love for you to be a part of TDengine. If you find any errors in the documentation, or see parts where more clarity or elaboration is needed, please click "Edit this page" at the bottom of each page to edit it directly.

 Together, we make a difference.
--- a/docs-en/02-intro/index.md
+++ b/docs-en/02-intro/index.md
@@ -5,39 +5,39 @@ toc_max_heading_level: 2

 TDengine is a high-performance, scalable time-series database with SQL support. Its code, including its cluster feature is open source under GNU AGPL v3.0. Besides the database engine, it provides [caching](/develop/cache), [stream processing](/develop/continuous-query), [data subscription](/develop/subscribe)  and other functionalities to reduce the complexity and cost of development and operation.

-This section introduces the major features, competitive advantages, suited scenarios and benchmarks to help you get a high level picture for TDengine.
+This section introduces the major features, competitive advantages, typical use-cases and benchmarks to help you get a high-level overview of TDengine.

 ## Major Features

 The major features are listed below:

-1. Besides [using SQL to insert](/develop/insert-data/sql-writing)，it supports [Schemaless writing](/reference/schemaless/)，and it supports [InfluxDB LINE](/develop/insert-data/influxdb-line)，[OpenTSDB Telnet](/develop/insert-data/opentsdb-telnet), [OpenTSDB JSON ](/develop/insert-data/opentsdb-json) and other protocols.
-2. Support for seamless integration with third-party data collection agents like [Telegraf](/third-party/telegraf)，[Prometheus](/third-party/prometheus)，[StatsD](/third-party/statsd)，[collectd](/third-party/collectd)，[icinga2](/third-party/icinga2), [TCollector](/third-party/tcollector), [EMQX](/third-party/emq-broker), [HiveMQ](/third-party/hive-mq-broker). Without a line of code, those agents can write data points into TDengine just by configuration. 
-3. Support for [all kinds of queries](/develop/query-data), including aggregation, nested query, downsampling, interpolation, etc.
-4. Support for [user defined functions](/develop/udf)
+1. While TDengine supports [using SQL to insert](/develop/insert-data/sql-writing), it also supports [Schemaless writing](/reference/schemaless/) just like NoSQL databases. TDengine also supports standard protocols like [InfluxDB LINE](/develop/insert-data/influxdb-line), [OpenTSDB Telnet](/develop/insert-data/opentsdb-telnet), [OpenTSDB JSON](/develop/insert-data/opentsdb-json) among others.
+2. TDengine supports seamless integration with third-party data collection agents like [Telegraf](/third-party/telegraf), [Prometheus](/third-party/prometheus), [StatsD](/third-party/statsd), [collectd](/third-party/collectd), [icinga2](/third-party/icinga2), [TCollector](/third-party/tcollector), [EMQX](/third-party/emq-broker), and [HiveMQ](/third-party/hive-mq-broker). These agents can write data into TDengine with simple configuration and without a single line of code. 
+3. Support for [all kinds of queries](/develop/query-data), including aggregation, nested query, downsampling, interpolation and others.
+4. Support for [user defined functions](/develop/udf).
 5. Support for [caching](/develop/cache). TDengine always saves the last data point in cache, so Redis is not needed in some scenarios.
 6. Support for [continuous query](/develop/continuous-query).
 7. Support for [data subscription](/develop/subscribe) with the capability to specify filter conditions.
 8. Support for [cluster](/cluster/), with the capability of increasing processing power by adding more nodes. High availability is supported by replication. 
-9. Provides interactive [command-line interface](/reference/taos-shell) for management, maintenance and ad-hoc query.
+9. Provides an interactive [command-line interface](/reference/taos-shell) for management, maintenance and ad-hoc queries.
 10. Provides many ways to [import](/operation/import) and [export](/operation/export) data.
-11. Provides [monitoring](/operation/monitor) on TDengine running instances.
+11. Provides [monitoring](/operation/monitor) on running instances of TDengine.
 12. Provides [connectors](/reference/connector/) for [C/C++](/reference/connector/cpp), [Java](/reference/connector/java), [Python](/reference/connector/python), [Go](/reference/connector/go), [Rust](/reference/connector/rust), [Node.js](/reference/connector/node) and other programming languages.
 13. Provides a [REST API](/reference/rest-api/).
-14. Supports the seamless integration with [Grafana](/third-party/grafana) for visualization.
+14. Supports seamless integration with [Grafana](/third-party/grafana) for visualization.
 15. Supports seamless integration with Google Data Studio.

-For more detail on features, please read through the whole documentation. 
+For more details on features, please read through the entire documentation. 

 ## Competitive Advantages

-TDengine makes full use of [the characteristics of time series data](https://tdengine.com/2019/07/09/86.html), such as structured, no transaction, rarely delete or update, etc., and builds its own innovative storage engine and computing engine to differentiate itself from other time series databases with the following advantages.
+Time-series data is structured, not transactional, and is rarely deleted or updated. TDengine makes full use of [these characteristics of time series data](https://tdengine.com/2019/07/09/86.html) to build its own innovative storage and computing engines, which differentiate it from other time series databases through the following advantages:

-- **[High Performance](https://tdengine.com/fast)**: TDengine outperforms other time series databases in data ingestion and querying while significantly reducing storage cost and compute costs, with an innovatively designed and purpose-built storage engine.
+- **[High Performance](https://tdengine.com/fast)**: With an innovatively designed and purpose-built storage engine, TDengine outperforms other time series databases in data ingestion and querying while significantly reducing storage and compute costs.

 - **[Scalable](https://tdengine.com/scalable)**: TDengine provides out-of-box scalability and high-availability through its native distributed design. Nodes can be added through simple configuration to achieve greater data processing power. In addition, this feature is open source.

-- **[SQL Support](https://tdengine.com/sql-support)**: TDengine uses SQL as the query language, thereby reducing learning and migration costs, while adding SQL extensions to handle time-series data better, and supporting convenient and flexible schemaless data ingestion.
+- **[SQL Support](https://tdengine.com/sql-support)**: TDengine uses SQL as the query language, thereby reducing learning and migration costs, while adding SQL extensions to better handle time-series data. Keeping NoSQL developers in mind, TDengine also supports convenient and flexible schemaless data ingestion.

 - **All in One**: TDengine has built-in caching, stream processing and data subscription functions. It is no longer necessary to integrate Kafka/Redis/HBase/Spark or other software in some scenarios. It makes the system architecture much simpler, cost-effective and easier to maintain.

@@ -45,24 +45,24 @@ TDengine makes full use of [the characteristics of time series data](https://tde

 - **Zero Management**: Installation and cluster setup can be done in seconds. Data partitioning and sharding are executed automatically. TDengine’s running status can be monitored via Grafana or other DevOps tools.

-- **Zero Learning Costs**: With SQL as the query language and support for ubiquitous tools like Python, Java, C/C++, Go, Rust, and Node.js connectors, there are zero learning costs.
+- **Zero Learning Costs**: With SQL as the query language, connectors for ubiquitous languages like Python, Java, C/C++, Go, Rust, and Node.js, and a REST API, there are zero learning costs.

-- **Interactive Console**: TDengine provides convenient console access to the database to run ad hoc queries, maintain the database, or manage the cluster without any programming.
+- **Interactive Console**: TDengine provides convenient console access to the database through a CLI, for running ad hoc queries, maintaining the database, or managing the cluster without any programming.

-With TDengine, the total cost of ownership of time-series data platform can be greatly reduced. Because 1: with its superior performance, the computing and storage resources are reduced significantly; 2：with SQL support, it can be seamlessly integrated with many third party tools, and learning costs/migration costs are reduced significantly; 3: with its simple architecture and zero management, the operation and maintenance costs are reduced. 
+With TDengine, the total cost of ownership of your time-series data platform can be greatly reduced because: 1) its superior performance significantly reduces the computing and storage resources required; 2) its SQL support allows seamless integration with many third-party tools and significantly reduces learning and migration costs; and 3) its simple architecture and zero management reduce operation and maintenance costs.

 ## Technical Ecosystem
-In the time-series data processing platform, TDengine stands in a role like this diagram below:
+This is how TDengine would be situated in a typical time-series data processing platform:

 ![TDengine Technical Ecosystem ](eco_system.png)

 <center>Figure 1. TDengine Technical Ecosystem</center>

-On the left side, there are data collection agents like OPC-UA, MQTT, Telegraf and Kafka. On the right side, visualization/BI tools, HMI, Python/R, and IoT Apps can be connected. TDengine itself provides interactive command-line interface and web interface for management and maintenance.
+On the left-hand side, there are data collection agents like OPC-UA, MQTT, Telegraf and Kafka. On the right-hand side, visualization/BI tools, HMI, Python/R, and IoT Apps can be connected. TDengine itself provides an interactive command-line interface and a web interface for management and maintenance.

-## Suited Scenarios
+## Typical Use Cases

-As a high-performance, scalable and SQL supported time-series database, TDengine's typical application scenarios include but are not limited to IoT, Industrial Internet, Connected Vehicles, IT operation and maintenance, energy, financial markets and other fields. TDengine is a purpose-built database optimized for the characteristics of time series data, it cannot be used to process data from web crawlers, social media, e-commerce, ERP, CRM, etc. This section makes a more detailed analysis of the applicable scenarios.
+As a high-performance, scalable and SQL supported time-series database, TDengine's typical use cases include but are not limited to IoT, Industrial Internet, Connected Vehicles, IT operation and maintenance, energy, financial markets and other fields. TDengine is a purpose-built database optimized for the characteristics of time series data. As such, it cannot be used to process data from web crawlers, social media, e-commerce, ERP, CRM and so on. More generally, TDengine is not a suitable storage engine for non-time-series data. This section makes a more detailed analysis of the applicable scenarios.

 ### Characteristics and Requirements of Data Sources


--- a/docs-en/04-concept/index.md
+++ b/docs-en/04-concept/index.md
@@ -2,7 +2,7 @@
 title: Concepts
 ---

-In order to explain the basic concepts and provide some sample code, the TDengine documentation takes smart meters as a typical time series data scenario. Assuming that each smart meter collects three metrics of current, voltage, and phase, there are multiple smart meters, and each meter has static attributes like location and group ID, the collected data will be similar to the following table:
+In order to explain the basic concepts and provide some sample code, the TDengine documentation uses smart meters as a typical time series use case. We assume the following: 1. each smart meter collects three metrics, i.e., current, voltage, and phase; 2. there are multiple smart meters; and 3. each meter has static attributes like location and group ID. Based on this, the collected data will look similar to the following table:

 <div className="center-table">
 <table>
@@ -29,7 +29,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>10.3</td>
 <td>219</td>
 <td>0.31</td>
-<td>Beijing.Chaoyang</td>
+<td>San Jose</td>
 <td>2</td>
 </tr>
 <tr>
@@ -38,7 +38,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>10.2</td>
 <td>220</td>
 <td>0.23</td>
-<td>Beijing.Chaoyang</td>
+<td>San Jose</td>
 <td>3</td>
 </tr>
 <tr>
@@ -47,7 +47,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>11.5</td>
 <td>221</td>
 <td>0.35</td>
-<td>Beijing.Haidian</td>
+<td>Mountain View</td>
 <td>3</td>
 </tr>
 <tr>
@@ -56,7 +56,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>13.4</td>
 <td>223</td>
 <td>0.29</td>
-<td>Beijing.Haidian</td>
+<td>Mountain View</td>
 <td>2</td>
 </tr>
 <tr>
@@ -65,7 +65,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>12.6</td>
 <td>218</td>
 <td>0.33</td>
-<td>Beijing.Chaoyang</td>
+<td>San Jose</td>
 <td>2</td>
 </tr>
 <tr>
@@ -74,7 +74,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>11.8</td>
 <td>221</td>
 <td>0.28</td>
-<td>Beijing.Haidian</td>
+<td>Mountain View</td>
 <td>2</td>
 </tr>
 <tr>
@@ -83,7 +83,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>10.3</td>
 <td>218</td>
 <td>0.25</td>
-<td>Beijing.Chaoyang</td>
+<td>San Jose</td>
 <td>3</td>
 </tr>
 <tr>
@@ -92,7 +92,7 @@ In order to explain the basic concepts and provide some sample code, the TDengin
 <td>12.3</td>
 <td>221</td>
 <td>0.31</td>
-<td>Beijing.Chaoyang</td>
+<td>San Jose</td>
 <td>2</td>
 </tr>
 </tbody>
@@ -112,7 +112,7 @@ Label/Tag refers to the static properties of sensors, equipment or other types o

 ## Data Collection Point

-Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same time stamp. For some complex equipments, there are often multiple data collection points, and the sampling rate of each collection point may be different, and fully independent. For example, for a car, there could be a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car, so in this example the car would have three data collection points.
+Data Collection Point (DCP) refers to hardware or software that collects metrics based on preset time periods or triggered by events. A data collection point can collect one or multiple metrics, but these metrics are collected at the same time and have the same timestamp. For some complex equipment, there are often multiple data collection points, and the sampling rate of each collection point may be different and fully independent. For example, for a car, there could be a data collection point to collect GPS position metrics, a data collection point to collect engine status metrics, and a data collection point to collect the environment metrics inside the car. So in this example, the car would have three data collection points.

 ## Table

@@ -122,10 +122,10 @@ To make full use of time-series data characteristics, TDengine adopts a strategy

 1. Since the metric data from different DCP are fully independent, the data source of each DCP is unique, and a table has only one writer. In this way, data points can be written in a lock-free manner, and the writing speed can be greatly improved.
 2. For a DCP, the metric data generated by DCP is ordered by timestamp, so the write operation can be implemented by simple appending, which further greatly improves the data writing speed.
-3. The metric data from a DCP is continuously stored in block by block. If you read data for a period of time, it can greatly reduce random read operations and improve read and query performance by orders of magnitude.
-4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. Metrics generally don't vary as significantly between themselves over a time range as compared to other metrics, this allows for a higher compression rate.
+3. The metric data from a DCP is stored continuously, block by block. When data for a period of time is read, this greatly reduces random read operations and improves read and query performance by orders of magnitude.
+4. Inside a data block for a DCP, columnar storage is used, and different compression algorithms are used for different data types. The values of a single metric generally vary less over a time range than values across different metrics do, which allows for a higher compression rate.

-If the metric data of multiple DCPs are traditionally written into a single table, due to the uncontrollable network delay, the timing of the data from different DCPs arriving at the server cannot be guaranteed, the writing operation must be protected by locks, and the metric data from one DCP cannot be guaranteed to be continuously stored together. **One table for one data collection point can ensure the best performance of insert and query of a single data collection point to the greatest extent.**
+If, as is traditionally done, the metric data of multiple DCPs is written into a single table, then due to uncontrollable network delays the order in which data from different DCPs arrives at the server cannot be guaranteed, write operations must be protected by locks, and the metric data from one DCP cannot be guaranteed to be stored continuously together. **One table for one data collection point ensures the best insert and query performance for a single data collection point to the greatest possible extent.**

 TDengine suggests using DCP ID as the table name (like D1001 in the above table). Each DCP may collect one or multiple metrics (like the current, voltage, phase as above). Each metric has a corresponding column in the table. The data type for a column can be int, float, string and others. In addition, the first column in the table must be a timestamp. TDengine uses the time stamp as the index, and won’t build the index on any metrics stored. Column wise storage is used.

@@ -139,7 +139,7 @@ In the design of TDengine, **a table is used to represent a specific data collec

 ## Subtable

-When creating a table for a specific data collection point, the user can use a STable as a template and specifies the tag values of this specific DCP to create it. **The table created by using a STable as the template is called subtable** in TDengine. The difference between regular table and subtable is: 
+When creating a table for a specific data collection point, the user can use a STable as a template and specify the tag values of this specific DCP to create it. **A table created by using a STable as the template is called a subtable** in TDengine. The differences between a regular table and a subtable are:
 1. Subtable is a table, all SQL commands applied on a regular table can be applied on subtable.
 2. Subtable is a table with extensions, it has static tags (labels), and these tags can be added, deleted, and updated after it is created. But a regular table does not have tags.
 3. A subtable belongs to only one STable, but a STable may have many subtables. Regular tables do not belong to a STable.
@@ -151,7 +151,7 @@ The relationship between a STable and the subtables created based on this STable
 2. The schema of metrics or labels cannot be adjusted through subtables, and it can only be changed via STable. Changes to the schema of a STable takes effect immediately for all associated subtables.
 3. STable defines only one template and does not store any data or label information by itself. Therefore, data cannot be written to a STable, only to subtables.

-Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine will treat the data in all its subtables as a whole data set for processing. TDengine will first find the subtables that meet the tag filter conditions, then scan the time-series data of these subtables to perform aggregation operation, which can greatly reduce the data sets to be scanned, thus greatly improving the performance of data aggregation across multiple DCPs.
+Queries can be executed on both a table (subtable) and a STable. For a query on a STable, TDengine will treat the data in all its subtables as a whole data set for processing. TDengine will first find the subtables that meet the tag filter conditions, then scan the time-series data of these subtables to perform the aggregation. This reduces the number of data sets to be scanned, which in turn greatly improves the performance of data aggregation across multiple DCPs.

 In TDengine, it is recommended to use a subtable instead of a regular table for a DCP. 

@@ -167,4 +167,4 @@ FQDN (Fully Qualified Domain Name) is the full domain name of a specific compute

 Each node of a TDengine cluster is uniquely identified by an End Point, which consists of an FQDN and a Port, such as h1.tdengine.com:6030. In this way, when the IP changes, we can still use the FQDN to dynamically find the node without changing any configuration of the cluster. In addition, FQDN is used to facilitate unified access to the same cluster from the Intranet and the Internet.

-TDengine does not recommend using an IP address to access the cluster, FQDN is recommended for cluster management.
+TDengine does not recommend using an IP address to access the cluster. FQDN is recommended for cluster management.
--- a/docs-en/05-get-started/index.md
+++ b/docs-en/05-get-started/index.md
@@ -10,7 +10,7 @@ import AptGetInstall from "./\_apt_get_install.mdx";

 ## Quick Install

-The full package of TDengine includes the server(taosd), taosAdapter for connecting with third-party systems and providing a RESTful interface, client driver(taosc), command-line program(CLI, taos) and some tools. For the current version, the server taosd and taosAdapter can only be installed and run on Linux systems. In the future taosd and taosAdapter will also be supported on Windows, macOS and other systems. The client driver taosc and TDengine CLI can be installed and run on Windows or Linux. In addition to the connectors of multiple languages, [RESTful interface](/reference/rest-api) is also provided by [taosAdapter](/reference/taosadapter) in TDengine. Prior to version 2.4.0.0, however, there is no taosAdapter, the RESTful interface is provided by the built-in HTTP service of taosd.
+The full package of TDengine includes the server (taosd), taosAdapter for connecting with third-party systems and providing a RESTful interface, the client driver (taosc), the command-line program (CLI, taos) and some tools. For the current version, the server taosd and taosAdapter can only be installed and run on Linux systems. In the future, taosd and taosAdapter will also be supported on Windows, macOS and other systems. The client driver taosc and TDengine CLI can be installed and run on Windows or Linux. In addition to connectors for multiple languages, TDengine also provides a [RESTful interface](/reference/rest-api) through [taosAdapter](/reference/taosadapter). Prior to version 2.4.0.0, taosAdapter did not exist and the RESTful interface was provided by the built-in HTTP service of taosd.

 TDengine supports X64/ARM64/MIPS64/Alpha64 hardware platforms, and will support ARM32, RISC-V and other CPU architectures in the future.


--- a/docs-en/07-develop/01-connect/index.md
+++ b/docs-en/07-develop/01-connect/index.md
 ---
 sidebar_label: Connection
 title: Connect to TDengine
-description: "This document explains how to establish connection to TDengine, and briefly introduce how to install and use TDengine connectors."
+description: "This document explains how to establish connections to TDengine, and briefly introduces how to install and use TDengine connectors."
 ---

 import Tabs from "@theme/Tabs";
@@ -19,25 +19,24 @@ import InstallOnLinux from "../../14-reference/03-connector/\_windows_install.md
 import VerifyLinux from "../../14-reference/03-connector/\_verify_linux.mdx";
 import VerifyWindows from "../../14-reference/03-connector/\_verify_windows.mdx";

-Any application programs running on any kind of platforms can access TDengine through the REST API provided by TDengine. For the details, please refer to [REST API](/reference/rest-api/). Besides, application programs can use the connectors of multiple programming languages to access TDengine, including C/C++, Java, Python, Go, Node.js, C#, and Rust. This chapter describes how to establish connection to TDengine and briefly introduce how to install and use connectors. For details about the connectors, please refer to [Connectors](/reference/connector/)
+Applications running on any kind of platform can access TDengine through the REST API provided by TDengine. For details, please refer to [REST API](/reference/rest-api/). Additionally, applications can use connectors for multiple programming languages, including C/C++, Java, Python, Go, Node.js, C#, and Rust, to access TDengine. This chapter describes how to establish a connection to TDengine and briefly introduces how to install and use connectors. For details about the connectors, please refer to [Connectors](/reference/connector/).

 ## Establish Connection

 There are two ways for a connector to establish connections to TDengine:

-1. Connection through the REST API provided by taosAdapter component, this way is called "REST connection" hereinafter.
+1. Connection through the REST API provided by the taosAdapter component, this way is called "REST connection" hereinafter.
 2. Connection through the TDengine client driver (taosc), this way is called "Native connection" hereinafter.

-Either way, same or similar APIs are provided by connectors to access database or execute SQL statements, no obvious difference can be observed.
-
 Key differences：

-1. With REST connection, it's not necessary to install TDengine client driver (taosc), it's more friendly for cross-platform with the cost of 30% performance downgrade. When taosc has an upgrade, application does not need to make changes. 
-2. With native connection, full compatibility of TDengine can be utilized, like [Parameter Binding](/reference/connector/cpp#parameter-binding-api), [Subscription](/reference/connector/cpp#subscription-and-consumption-api), etc. But taosc has to be installed, some platforms may not be supported.
+1. The TDengine client driver (taosc) offers the highest performance and supports all the features of TDengine, like [Parameter Binding](/reference/connector/cpp#parameter-binding-api), [Subscription](/reference/connector/cpp#subscription-and-consumption-api), etc.
+2. The TDengine client driver (taosc) is not supported across all platforms, and applications built on taosc may need to be modified when updating taosc to newer versions.
+3. The REST connection is more accessible with cross-platform support; however, it results in a 30% performance downgrade.

 ## Install Client Driver taosc

-If choosing to use native connection and the application is not on the same host as TDengine server, TDengine client driver taosc needs to be installed on the host where the application is. If choosing to use REST connection or the application is on the same host as server side, this step can be skipped. It's better to use same version of taosc as the server.
+If you choose to use the native connection and the application is not on the same host as the TDengine server, the TDengine client driver taosc needs to be installed on the application host. If you choose to use the REST connection or the application is on the same host as the TDengine server, this step can be skipped. It's better to use the same version of taosc as the TDengine server.

 ### Install


--- a/docs-en/07-develop/02-model/index.mdx
+++ b/docs-en/07-develop/02-model/index.mdx
@@ -2,11 +2,11 @@
 title: Data Model
 ---

-The data model employed by TDengine is similar to relational database, you need to create databases and tables. For a specific application, the design of databases, STables (abbreviated for super table), and tables need to be considered. This chapter will explain the big picture without syntax details.
+The data model employed by TDengine is similar to that of a relational database: you need to create databases and tables. Design the data model based on your own application scenarios, and design the STable (abbreviation for super table) schema to fit your data. This chapter will explain the big picture without getting into syntax details.

 ## Create Database

-The characteristics of data from different data collection points may be different, such as collection frequency, days to keep, number of replicas, data block size, whether it's allowed to update data, etc. For TDengine to operate with the best performance, it's strongly suggested to put the data with different characteristics into different databases because different storage policy can be set for each database. When creating a database, there are a lot of parameters that can be configured, such as the days to keep data, the number of replicas, the number of memory blocks, time precision, the minimum and maximum number of rows in each data block, compress or not, the time range of the data in single data file, etc. Below is an example of the SQL statement for creating a database.
+The characteristics of data from different data collection points may be different, such as collection frequency, days to keep, number of replicas, data block size, whether it's allowed to update data, etc. For TDengine to operate with the best performance, it's strongly suggested to put the data with different characteristics into different databases because different storage policies can be set for each database. When creating a database, there are a lot of parameters that can be configured, such as the number of days to keep data, the number of replicas, the number of memory blocks, the time precision, the minimum and maximum number of rows in each data block, whether to compress data, and the time range of the data in a single data file. Below is an example of the SQL statement for creating a database.

 ```sql
 CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
@@ -14,7 +14,7 @@ CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;

 In the above SQL statement, a database named "power" will be created, the data in it will be kept for 365 days, which means the data older than 365 days will be deleted automatically, a new data file will be created every 10 days, the number of memory blocks is 6, data is allowed to be updated. For more details please refer to [Database](/taos-sql/database).

-After creating a database, the current database in use can be switched using SQL command `USE`, for example below SQL statement switches the current database to `power`. Without current database specified, table name must be preceded with the corresponding database name.
+After creating a database, the current database in use can be switched using the SQL command `USE`. For example, the SQL statement below switches the current database to `power`. Without the current database specified, table names must be preceded by the corresponding database name.

 ```sql
 USE power;
@@ -23,14 +23,14 @@ USE power;
 :::note

 - Any table or STable must belong to a database. To create a table or STable, the database it belongs to must be ready.
-- JOIN operation can't be performed tables from two different databases.
+- JOIN operations can't be performed on tables from two different databases.
 - Timestamp needs to be specified when inserting rows or querying historical rows.

 :::

 ## Create STable

-In a time-series application, there may be multiple kinds of data collection points. For example, in the electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation of multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](/tdinternal/arch#model_table1), below SQL statement can be used to create the super table.
+In a time-series application, there may be multiple kinds of data collection points. For example, in the electrical power system there are meters, transformers, bus bars, switches, etc. For easy and efficient aggregation of multiple tables, one STable needs to be created for each kind of data collection point. For example, for the meters in [table 1](/tdinternal/arch#model_table1), the SQL statement below can be used to create the super table.

 ```sql
 CREATE STable meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
@@ -41,11 +41,11 @@ If you are using versions prior to 2.0.15, the `STable` keyword needs to be repl

 :::

-Similar to creating a regular table, when creating a STable, name and schema need to be provided too. In the STable schema, the first column must be timestamp (like ts in the example), and other columns (like current, voltage and phase in the example) are the data collected. The type of a column can be integer, float, double, string ,etc. Besides, the schema for tags need to be provided, like location and groupId in the example. The type of a tag can be integer, float, string, etc. The static properties of a data collection point can be defined as tags, like the location, device type, device group ID, manager ID, etc. Tags in the schema can be added, removed or updated. Please refer to [STable](/taos-sql/stable) for more details.
+Similar to creating a regular table, when creating a STable, the name and schema need to be provided. In the STable schema, the first column must be timestamp (like ts in the example), and the other columns (like current, voltage and phase in the example) are the data collected. The column type can be integer, float, double, string, etc. In addition, the schema for tags needs to be provided, like location and groupId in the example. The tag type can be integer, float, string, etc. The static properties of a data collection point can be defined as tags, like the location, device type, device group ID, manager ID, etc. Tags in the schema can be added, removed or updated. Please refer to [STable](/taos-sql/stable) for more details.

-For each kind of data collection points, a corresponding STable must be created. There may be many STables in an application. For electrical power system, we need to create a STable respectively for meters, transformers, busbars, switches. There may be multiple kinds of data collection points on a single device, for example there may be one data collection point for electrical data like current and voltage and another point for environmental data like temperature, humidity and wind direction, multiple STables are required for such kind of device.
+For each kind of data collection point, a corresponding STable must be created. There may be many STables in an application. For an electrical power system, we need to create separate STables for meters, transformers, busbars, and switches. There may be multiple kinds of data collection points on a single device; for example, there may be one data collection point for electrical data like current and voltage and another point for environmental data like temperature, humidity and wind direction. Multiple STables are required for such devices.

-At most 4096 (or 1024 prior to version 2.1.7.0) columns are allowed in a STable. If there are more than 4096 of metrics to bo collected for a data collection point, multiple STables are required for such kind of data collection point. There can be multiple databases in system, while one or more STables can exist in a database.
+At most 4096 (or 1024 prior to version 2.1.7.0) columns are allowed in a STable. If more than 4096 metrics are collected for a data collection point, multiple STables are required. There can be multiple databases in a system, and each database can contain one or more STables.

 ## Create Table

@@ -57,7 +57,7 @@ CREATE TABLE d1001 USING meters TAGS ("Beijing.Chaoyang", 2);

 In the above SQL statement, "d1001" is the table name, "meters" is the STable name, followed by the value of tag "Location" and the value of tag "groupId", which are "Beijing.Chaoyang" and "2" respectively in the example. The tag values can be updated after the table is created. Please refer to [Tables](/taos-sql/table) for details.

-In TDengine system, it's recommended to create a table for a data collection point via STable. Table created via STable is called subtable in some parts of TDengine document. All SQL commands applied on regular table can be applied on subtable.
+In TDengine, it's recommended to create a table for a data collection point via a STable. A table created via a STable is called a subtable in some parts of the TDengine documentation. All SQL commands that apply to regular tables can also be applied to subtables.

 :::warning
 It's not recommended to create a table in a database while using a STable from another database as template.
@@ -67,7 +67,7 @@ It's suggested to use the global unique ID of a data collection point as the tab

 ## Create Table Automatically

-In some circumstances, it's not sure whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exist.
+In some circumstances, it's unknown whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exists.

 ```sql
 INSERT INTO d1001 USING meters TAGS ("Beijng.Chaoyang", 2) VALUES (now, 10.2, 219, 0.32);
@@ -79,6 +79,6 @@ For more details please refer to [Create Table Automatically](/taos-sql/insert#a

 ## Single Column vs Multiple Column

-Multiple columns data model is supported in TDengine. As long as multiple metrics are collected by same data collection point at same time, i.e. the timestamp are identical, these metrics can be put in single stable as columns. However, there is another kind of design, i.e. single column data model, a table is created for each metric, which means a STable is required for each kind of metric. For example, 3 STables are required for current, voltage and phase.
+A multiple column data model is supported in TDengine. As long as multiple metrics are collected by the same data collection point at the same time, i.e. their timestamps are identical, these metrics can be put in a single STable as columns. An alternative design is the single column data model, in which a table is created for each metric, so a STable is required for each kind of metric. For example, 3 STables are required for current, voltage and phase.

-It's recommended to use multiple column data model as much as possible because it's better in the performance of inserting or querying rows. In some cases, however, the metrics to be collected vary frequently and correspondingly the STable schema needs to be changed frequently too. In such case, it's more convenient to use single column data model.
+It's recommended to use a multiple column data model as much as possible because it provides better performance for inserting and querying rows. In some cases, however, the metrics to be collected change frequently and, correspondingly, the STable schema would need to be changed frequently too. In such cases, it's more convenient to use a single column data model.
--- a/docs-en/07-develop/03-insert-data/01-sql-writing.mdx
+++ b/docs-en/07-develop/03-insert-data/01-sql-writing.mdx
@@ -22,11 +22,11 @@ import CStmt from "./_c_stmt.mdx";

 ## Introduction

-Application program can execute `INSERT` statement through connectors to insert rows. TAOS CLI can be launched manually to insert data too.
+Application programs can execute `INSERT` statements through connectors to insert rows. The TAOS CLI can also be used to manually insert data.

 ### Insert Single Row

-Below SQL statement is used to insert one row into table "d1001".
+The SQL statement below is used to insert one row into table "d1001".

 ```sql
 INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);
@@ -34,7 +34,7 @@ INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31);

 ### Insert Multiple Rows

-Multiple rows can be inserted in single SQL statement. Below example inserts 2 rows into table "d1001".
+Multiple rows can be inserted in a single SQL statement. The example below inserts 2 rows into table "d1001".

 ```sql
 INSERT INTO d1001 VALUES (1538548684000, 10.2, 220, 0.23) (1538548696650, 10.3, 218, 0.25);
@@ -42,7 +42,7 @@ INSERT INTO d1001 VALUES (1538548684000, 10.2, 220, 0.23) (1538548696650, 10.3,

 ### Insert into Multiple Tables

-Data can be inserted into multiple tables in same SQL statement. Below example inserts 2 rows into table "d1001" and 1 row into table "d1002".
+Data can be inserted into multiple tables in the same SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".

 ```sql
 INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6, 218, 0.33) d1002 VALUES (1538548696800, 12.3, 221, 0.31);
@@ -52,14 +52,14 @@ For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).

 :::info

- Inserting in batch can gain better performance. Normally, the higher the batch size, the better the performance. Please be noted each single row can't exceed 16K bytes and each single SQL statement can't exceed 1M bytes.
- Inserting with multiple threads can gain better performance too. However, depending on the system resources on the application side and the server side, with the number of inserting threads grows to a specific point, the performance may drop instead of growing. The proper number of threads need to be tested in a specific environment to find the best number.
+- Inserting in batches can improve performance. Normally, the higher the batch size, the better the performance. Please note that a single row can't exceed 16K bytes and each SQL statement can't exceed 1MB.
+- Inserting with multiple threads can also improve performance. However, depending on the system resources on the application side and the server side, when the number of inserting threads grows beyond a specific point the performance may drop instead of improving. The optimal number of threads needs to be determined by testing in the specific environment.

 :::
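
The batching advice above can be illustrated with a small Python sketch. It packs rows into multi-row `INSERT` statements while keeping each statement under a size limit. The sketch only builds SQL strings and does not talk to TDengine. The helper names `format_row` and `build_batches` are hypothetical. Reading "1MB" as 1,048,576 bytes is an assumption, so check the limit that applies to your server.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, float, int, float]  # (ts in ms, current, voltage, phase)

# Assumption: the "1MB" statement limit is taken as 1,048,576 bytes.
MAX_SQL_BYTES = 1024 * 1024


def format_row(row: Row) -> str:
    """Render one row as a VALUES tuple, e.g. (1538548685000, 10.3, 219, 0.31)."""
    ts, current, voltage, phase = row
    return f"({ts}, {current}, {voltage}, {phase})"


def build_batches(table: str, rows: Iterable[Row], max_bytes: int = MAX_SQL_BYTES) -> List[str]:
    """Pack rows into as few INSERT statements as possible, each at most max_bytes long.

    A single row that alone exceeds the limit is still emitted on its own
    and would be rejected by the server.
    """
    prefix = f"INSERT INTO {table} VALUES "
    statements: List[str] = []
    parts: List[str] = []
    size = len(prefix.encode())
    for row in rows:
        values = format_row(row)
        # +1 accounts for the space that separates consecutive value tuples
        extra = len(values.encode()) + (1 if parts else 0)
        if parts and size + extra > max_bytes:
            # Current statement is full: flush it and start a new one
            statements.append(prefix + " ".join(parts))
            parts, size = [], len(prefix.encode())
            extra = len(values.encode())
        parts.append(values)
        size += extra
    if parts:
        statements.append(prefix + " ".join(parts))
    return statements
```

Each returned string could then be executed through a connector. Larger batches mean fewer round trips, which is why the size limit rather than a fixed row count drives the split here.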

 :::warning

- If the timestamp for the row to be inserted already exists in the table, the behavior depends on the value of parameter `UPDATE`. If it's set to 0 (also the default value), the row will be discarded. If it's set to 1, the new values will override the old values for the same row.
+- If the timestamp for the row to be inserted already exists in the table, the behavior depends on the value of parameter `UPDATE`. If it's set to 0 (the default value), the row will be discarded. If it's set to 1, the new values will override the old values for the same row.
 - The timestamp to be inserted must be newer than the timestamp of subtracting current time by the parameter `KEEP`. If `KEEP` is set to 3650 days, then the data older than 3650 days ago can't be inserted. The timestamp to be inserted can't be newer than the timestamp of current time plus parameter `DAYS`. If `DAYS` is set to 2, the data newer than 2 days later can't be inserted.

 :::
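
Below is a minimal Python sketch of the time-range rule above. It assumes that `KEEP` and `DAYS` are both expressed in days and evaluated against the current time. The function name `is_insertable` is hypothetical, and the server performs the real check.

```python
from datetime import datetime, timedelta


def is_insertable(ts: datetime, now: datetime, keep_days: int, days: int) -> bool:
    """Check the time-range rule for an incoming row (illustrative sketch only)."""
    # The row must be newer than now - KEEP ...
    oldest_exclusive = now - timedelta(days=keep_days)
    # ... and must not be newer than now + DAYS.
    newest_inclusive = now + timedelta(days=days)
    return oldest_exclusive < ts <= newest_inclusive
```

With `KEEP` set to 3650 and `DAYS` set to 2, a row from 3651 days ago and a row 3 days in the future are both rejected. A row from yesterday is accepted.
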
@@ -95,13 +95,13 @@ For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).
 :::note

 1. With either native connection or REST connection, the above samples can work well.
-2. Please be noted that `use db` can't be used with REST connection because REST connection is stateless, so in the samples `dbName.tbName` is used to specify the table name.
+2. Please note that `use db` can't be used with a REST connection because REST connections are stateless, so in the samples `dbName.tbName` is used to specify the table name.

 :::

 ### Insert with Parameter Binding

-TDengine also provides Prepare API that support parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. From version 2.1.1.0 and 2.1.2.0, parameter binding support for inserting data has been improved significantly to improve the insert performance by avoiding the cost of parsing SQL statements.
+TDengine also provides API support for parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. Since versions 2.1.1.0 and 2.1.2.0, parameter binding support for inserting data has been significantly improved, increasing insert performance by avoiding the cost of parsing SQL statements.

 Parameter binding is available only with native connection.


--- a/docs-en/07-develop/03-insert-data/03-opentsdb-telnet.mdx
+++ b/docs-en/07-develop/03-insert-data/03-opentsdb-telnet.mdx
@@ -15,16 +15,16 @@ import CTelnet from "./_c_opts_telnet.mdx";

 ## Introduction

-A single line of text is used in OpenTSDB line protocol to represent one row of data. OpenTSDB employs single column data model, so one line can only contains single data column. There can be multiple tags. Each line contains 4 parts as below:
+A single line of text is used in OpenTSDB line protocol to represent one row of data. OpenTSDB employs a single column data model, so one line can only contain a single data column. There can be multiple tags. Each line contains 4 parts as below:

 ```
 <metric> <timestamp> <value> <tagk_1>=<tagv_1>[ <tagk_n>=<tagv_n>]
 ```

- `metric` will be used as STable name.
- `timestamp` is the timestamp of current row of data. The time precision will be determined automatically based on the length of the timestamp. second and millisecond time precision are supported.\
+- `metric` will be used as the STable name.
+- `timestamp` is the timestamp of the current row of data. The time precision will be determined automatically based on the length of the timestamp. Second and millisecond time precision are supported.
 - `value` is a metric which must be a numeric value, the corresponding column name is "value".
- The last part is tag sets separated by space, all tags will be converted to nchar type automatically.
+- The last part is the tag set, with tags separated by spaces. All tags will be converted to nchar type automatically.
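
The following Python sketch parses one line in the format above into its STable name, timestamp, value and tags. It is a hypothetical helper for illustration, not TDengine's parser. In particular, it assumes that a 10-digit timestamp means seconds and a 13-digit timestamp means milliseconds; that is an assumption about how precision is inferred from length.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TelnetPoint:
    stable: str           # <metric>, used as the STable name
    ts_ms: int            # <timestamp>, normalized to milliseconds
    value: float          # <value>, stored in the column named "value"
    tags: Dict[str, str]  # tag values kept as strings (converted to nchar)


def parse_telnet_line(line: str) -> TelnetPoint:
    """Split '<metric> <timestamp> <value> <tagk>=<tagv> ...' into its parts."""
    metric, ts_text, value_text, *tag_parts = line.split()
    if not tag_parts:
        raise ValueError("at least one tag is required")
    # Assumption: 10 digits => seconds, 13 digits => milliseconds.
    if len(ts_text) == 10:
        ts_ms = int(ts_text) * 1000
    elif len(ts_text) == 13:
        ts_ms = int(ts_text)
    else:
        raise ValueError(f"unsupported timestamp precision: {ts_text}")
    # Each tag is split on its first '=' only
    tags = dict(part.split("=", 1) for part in tag_parts)
    return TelnetPoint(metric, ts_ms, float(value_text), tags)
```

For example, `parse_telnet_line("meters.current 1648432611249 10.3 location=Beijing.Chaoyang groupid=2")` yields the STable `meters.current` with the tags `location` and `groupid`.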

 For example:


--- a/docs-en/07-develop/03-insert-data/index.md
+++ b/docs-en/07-develop/03-insert-data/index.md
@@ -2,11 +2,11 @@
 title: Insert
 ---

-TDengine supports multiple protocols of inserting data, including SQL, InfluxDB Line protocol, OpenTSDB Telnet protocol, OpenTSDB JSON protocol. Data can be inserted row by row, or in batch. Data from one or more collecting points can be inserted simultaneously. In the meantime, data can be inserted with multiple threads, out of order data and historical data can be inserted too. InfluxDB Line protocol, OpenTSDB Telnet protocol and OpenTSDB JSON protocol are the 3 kinds of schemaless insert protocols supported by TDengine. It's not necessary to create stable and table in advance if using schemaless protocols, and the schemas can be adjusted automatically according to the data to be inserted.
+TDengine supports multiple protocols for inserting data, including SQL, InfluxDB Line protocol, OpenTSDB Telnet protocol, and OpenTSDB JSON protocol. Data can be inserted row by row, or in batches. Data from one or more collection points can be inserted simultaneously. Data can be inserted with multiple threads, and out-of-order data and historical data can be inserted as well. InfluxDB Line protocol, OpenTSDB Telnet protocol and OpenTSDB JSON protocol are the 3 kinds of schemaless insert protocols supported by TDengine. It's not necessary to create STables and tables in advance when using schemaless protocols, and the schemas can be adjusted automatically based on the data being inserted.

 ```mdx-code-block
 import DocCardList from '@theme/DocCardList';
 import {useCurrentSidebarCategory} from '@docusaurus/theme-common';

 <DocCardList items={useCurrentSidebarCategory().items}/>
-```
\ No newline at end of file
+```
--- a/docs-en/07-develop/04-query-data/index.mdx
+++ b/docs-en/07-develop/04-query-data/index.mdx
@@ -20,7 +20,7 @@ import CAsync from "./_c_async.mdx";

 ## Introduction

-SQL is used by TDengine as the query language. Application programs can send SQL statements to TDengine through REST API or connectors. TDengine CLI `taos` can also be used to execute SQL Ad-Hoc query. Here is the list of major query functionalities supported by TDengine：
+SQL is used by TDengine as the query language. Application programs can send SQL statements to TDengine through the REST API or connectors. The TDengine CLI `taos` can also be used to execute ad hoc SQL queries. Here is the list of major query functionalities supported by TDengine:

 - Query on single column or multiple columns
 - Filter on tags or data columns：>, <, =, <\>, like
@@ -31,7 +31,7 @@ SQL is used by TDengine as the query language. Application programs can send SQL
 - Join query with timestamp alignment
 - Aggregate functions: count, max, min, avg, sum, twa, stddev, leastsquares, top, bottom, first, last, percentile, apercentile, last_row, spread, diff

-For example, below SQL statement can be executed in TDengine CLI `taos` to select the rows whose voltage column is bigger than 215 and limit the output to only 2 rows.
+For example, the SQL statement below can be executed in the TDengine CLI `taos` to select the rows whose voltage column is greater than 215 and limit the output to only 2 rows.

 ```sql
 select * from d1001 where voltage > 215 order by ts desc limit 2;
@@ -46,15 +46,15 @@ taos> select * from d1001 where voltage > 215 order by ts desc limit 2;
 Query OK, 2 row(s) in set (0.001100s)
 ```

-To meet the requirements in many use cases, some special functions have been added in TDengine, for example `twa` (Time Weighted Average), `spared` (The difference between the maximum and the minimum), `last_row` (the last row), more and more functions will be added to better perform in many use cases. Furthermore, continuous query is also supported in TDengine.
+To meet the requirements of many use cases, some special functions have been added in TDengine, for example `twa` (Time Weighted Average), `spread` (the difference between the maximum and the minimum), and `last_row` (the last row). Furthermore, continuous query is also supported in TDengine.

 For detailed query syntax please refer to [Select](/taos-sql/select).

 ## Aggregation among Tables

-In many use cases, there are always multiple kinds of data collection points. A new concept, called STable (abbreviated for super table), is used in TDengine to represent a kind of data collection points, and a table is used to represent a specific data collection point. Tags are used by TDengine to represent the static properties of data collection points. A specific data collection point has its own values for static properties. By specifying filter conditions on tags, aggregation can be performed efficiently among all the subtables created via the same STable, i.e. same kind of data collection points, can be. Aggregate functions applicable for tables can be used directly on STables, syntax is exactly same.
+In many use cases, there are always multiple kinds of data collection points. A new concept, called STable (short for super table), is used in TDengine to represent a kind of data collection point, and a subtable is used to represent a specific data collection point. Tags are used by TDengine to represent the static properties of data collection points. A specific data collection point has its own values for static properties. By specifying filter conditions on tags, aggregation can be performed efficiently among all the subtables created via the same STable, i.e. the same kind of data collection point. Aggregate functions applicable to tables can be used directly on STables; the syntax is exactly the same.

-In summary, for a STable, its subtables can be aggregated by a simple query on STable, it's kind of join operation. But tables belong to different STables could not be aggregated. 
+In summary, the subtables of a STable can be aggregated by a simple query on the STable, which is a kind of join operation. However, tables belonging to different STables cannot be aggregated.

 ### Example 1

@@ -81,11 +81,11 @@ taos> SELECT count(*), max(current) FROM meters where groupId = 2 and ts > now -
 Query OK, 1 row(s) in set (0.002136s)
 ```

-Join query is allowed between only the tables of same STable. In [Select](/taos-sql/select), all query operations are marked as whether it supports STable or not.
+Join queries are only allowed between the subtables of the same STable. In [Select](/taos-sql/select), each query operation is marked according to whether it supports STables.

 ## Down Sampling and Interpolation

-In IoT use cases, down sampling is widely used to aggregate the data by time range. `INTERVAL` keyword in TDengine can be used to simplify the query by time window. For example, below SQL statement can be used to get the sum of current every 10 seconds from meters table d1001.
+In IoT use cases, down sampling is widely used to aggregate data by time range. The `INTERVAL` keyword in TDengine can be used to simplify queries by time window. For example, the SQL statement below can be used to get the sum of current every 10 seconds from table d1001.

 ```
 taos> SELECT sum(current) FROM d1001 INTERVAL(10s);
@@ -96,7 +96,7 @@ taos> SELECT sum(current) FROM d1001 INTERVAL(10s);
 Query OK, 2 row(s) in set (0.000883s)
 ```

-Down sampling can also be used for STable. For example, below SQL statement can be used to get the sum of current from all meters in BeiJing.
+Down sampling can also be used for STables. For example, the SQL statement below can be used to get the sum of current from all meters in Beijing.

 ```
 taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s);
@@ -110,7 +110,7 @@ taos> SELECT SUM(current) FROM meters where location like "Beijing%" INTERVAL(1s
 Query OK, 5 row(s) in set (0.001538s)
 ```

-Down sampling also supports time offset. For example, below SQL statement can be used to get the sum of current from all meters but each time window must start at the boundary of 500 milliseconds.
+Down sampling also supports time offset. For example, the SQL statement below can be used to get the sum of current from all meters, with each one-second time window starting at an offset of 500 milliseconds.

 ```
 taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a);
@@ -124,7 +124,7 @@ taos> SELECT SUM(current) FROM meters INTERVAL(1s, 500a);
 Query OK, 5 row(s) in set (0.001521s)
 ```
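
The following Python sketch makes the effect of `INTERVAL(1s, 500a)` concrete. It assigns rows to windows of a given length and offset, then sums each window. It assumes that timestamps are in milliseconds and that windows are aligned to the Unix epoch shifted by the offset. TDengine's own alignment rules (for example with time zones or calendar units) may differ, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def window_start(ts_ms: int, interval_ms: int, offset_ms: int = 0) -> int:
    """Start of the INTERVAL(interval, offset) window that contains ts_ms."""
    return (ts_ms - offset_ms) // interval_ms * interval_ms + offset_ms


def downsample_sum(rows: Iterable[Tuple[int, float]], interval_ms: int,
                   offset_ms: int = 0) -> Dict[int, float]:
    """Sum values per window, keyed by window start and ordered by time."""
    sums: Dict[int, float] = {}
    for ts, value in rows:
        start = window_start(ts, interval_ms, offset_ms)
        sums[start] = sums.get(start, 0.0) + value
    return dict(sorted(sums.items()))
```

With a 1000 ms interval and a 500 ms offset, a row at 1499 ms falls into the window starting at 500 ms. A row at 1500 ms starts a new window.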

-In many use cases, it's hard to align the timestamp of the data collected by each collection point. However, a lot of algorithms like FFT require the data to be aligned with same time interval and application programs have to handle by themselves in many systems. In TDengine, it's easy to achieve the alignment using down sampling.
+In many use cases, it's hard to align the timestamps of the data collected by each collection point. However, many algorithms like FFT require the data to be aligned to the same time interval, and application programs have to handle this by themselves. In TDengine, it's easy to achieve the alignment using down sampling.

 Interpolation can be performed in TDengine if there is no data in a time range.

@@ -162,16 +162,16 @@ In the section describing [Insert](/develop/insert-data/sql-writing), a database

 :::note

-1. With either REST connection or native connection, the above sample code work well.
-2. Please be noted that `use db` can't be used in case of REST connection because it's stateless.
+1. With either REST connection or native connection, the above sample code works well.
+2. Please note that `use db` can't be used with a REST connection because it's stateless.

 :::

 ### Asynchronous Query

-Besides synchronous query, asynchronous query API is also provided by TDengine to insert or query data more efficiently. With similar hardware and software environment, async API is 2~4 times faster than sync APIs. Async API works in non-blocking mode, which means an operation can be returned without finishing so that the calling thread can switch to other works to improve the performance of the whole application system. Async APIs perform especially better in case of poor network.
+Besides synchronous queries, an asynchronous query API is also provided by TDengine to insert or query data more efficiently. With a similar hardware and software environment, the async API is 2~4 times faster than sync APIs. The async API works in non-blocking mode, which means a call can return before the operation finishes. The calling thread can then switch to other work, improving the performance of the whole application system. Async APIs perform especially well on poor networks.

-Please be noted that async query can only be used with native connection.
+Please note that async query can only be used with a native connection.

 <Tabs defaultValue="python" groupId="lang">
  <TabItem label="Python" value="python">

--- a/docs-en/07-develop/05-continuous-query.mdx
+++ b/docs-en/07-develop/05-continuous-query.mdx
@@ -4,15 +4,15 @@ description: "Continuous query is a query that's executed automatically accordin
 title: "Continuous Query"
 ---

-Continuous query is a query that's executed automatically according to predefined frequency to provide aggregate query capability by time window, it's actually a simplified time driven stream computing. Continuous query can be performed on a table or STable in TDengine. The result of continuous query can be pushed to client or written back to TDengine. Each query is executed on a time window, which moves forward with time. The size of time window and the forward sliding time need to be specified with parameter `INTERVAL` and `SLIDING` respectively.
+Continuous query is a query that's executed automatically according to a predefined frequency to provide aggregate query capability by time window. It is essentially a simplified form of time-driven stream computing. Continuous query can be performed on a table or STable in TDengine. The result of a continuous query can be pushed to clients or written back to TDengine. Each query is executed on a time window, which moves forward with time. The size of the time window and the forward sliding time are specified with the parameters `INTERVAL` and `SLIDING` respectively.

-Continuous query in TDengine is time driven, and can be defined using TAOS SQL directly without any extra operations. With continuous query, the result can be generated according to time window to achieve down sampling of original data. Once a continuous query is defined using TAOS SQL, the query is automatically executed at the end of each time window and the result is pushed back to client or written to TDengine.
+Continuous query in TDengine is time driven, and can be defined using TAOS SQL directly without any extra operations. With continuous query, the result can be generated according to a time window to achieve down sampling of the original data. Once a continuous query is defined using TAOS SQL, the query is automatically executed at the end of each time window and the result is pushed back to clients or written to TDengine.

 There are some differences between continuous query in TDengine and time window computation in stream computing：

 - The computation is performed and the result is returned in real time in stream computing, but the computation in continuous query is only started when a time window closes. For example, if the time window is 1 day, then the result will only be generated at 23:59:59.
- If a historical data row is written in to a time widow for which the computation has been finished, the computation will not be performed again and the result will not be pushed to client again either. If the result has been written into TDengine, there will be no update for the result.
- In continuous query, if the result is pushed to client, the client status is not cached on the server side and Exactly-once is not guaranteed by the server either. If the client program crashes, a new time window will be generated from the time where the continuous query is restarted. If the result is written into TDengine, the data written into TDengine can be guaranteed as valid and continuous.
+- If a historical data row is written into a time window for which the computation has already finished, the computation will not be performed again and the result will not be pushed to client applications again. If the results have already been written into TDengine, they will not be updated.
+- In continuous query, if the result is pushed to a client, the client status is not cached on the server side and Exactly-once is not guaranteed by the server. If the client program crashes, a new time window will be generated from the time where the continuous query is restarted. If the result is written into TDengine, the data written into TDengine can be guaranteed as valid and continuous.

 ## Syntax

@@ -30,7 +30,7 @@ SLIDING: The time step for which the time window moves forward each time

 ## How to Use

-In this section the use case of meters will be used to introduce how to use continuous query. Assume the STable and sub tables have been created using below SQL statement.
+In this section the use case of meters will be used to introduce how to use continuous query. Assume the STable and subtables have been created using the SQL statements below.

 ```sql
 create table meters (ts timestamp, current float, voltage int, phase float) tags (location binary(64), groupId int);
@@ -38,7 +38,7 @@ create table D1001 using meters tags ("Beijing.Chaoyang", 2);
 create table D1002 using meters tags ("Beijing.Haidian", 2);
 ```

-The average voltage for each time window of one minute with 30 seconds as the length of moving forward can be retrieved using below SQL statement.
+The SQL statement below retrieves the average voltage for a one minute time window, with each time window moving forward by 30 seconds.

 ```sql
 select avg(voltage) from meters interval(1m) sliding(30s);
@@ -50,13 +50,13 @@ Whenever the above SQL statement is executed, all the existing data will be comp
 select avg(voltage) from meters where ts > {startTime} interval(1m) sliding(30s);
 ```

-Another easier way for same purpose is prepend `create table {tableName} as` before the `select`.
+An easier way to achieve this is to prepend `create table {tableName} as` before the `select`.

 ```sql
 create table avg_vol as select avg(voltage) from meters interval(1m) sliding(30s);
 ```

-A table named as `avg_vol` will be created automatically, then every 30 seconds the `select` statement will be executed automatically on the data in the past 1 minutes, i.e. the latest time window, and the result is written into table `avg_vol`. The client program just needs to query from table `avg_vol`. For example:
+A table named `avg_vol` will be created automatically. Then every 30 seconds the `select` statement will be executed automatically on the data in the past 1 minute, i.e. the latest time window, and the result is written into table `avg_vol`. The client program just needs to query from table `avg_vol`. For example:

 ```sql
 taos> select * from avg_vol;
@@ -68,16 +68,16 @@ taos> select * from avg_vol;
 2020-07-29 13:39:00.000 |            223.0800000 |
 ```
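
The overlapping windows produced by `interval(1m) sliding(30s)` can be sketched in Python as below. Each window is 60 seconds long and a new one starts every 30 seconds, so most rows fall into two windows. This is a batch illustration with hypothetical names. It assumes window starts are multiples of the sliding step, with timestamps in milliseconds. A continuous query instead computes each window once it closes.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (ts in ms, voltage)


def sliding_avg(rows: List[Row], interval_ms: int, sliding_ms: int) -> Dict[int, float]:
    """Average per [begin, begin + interval_ms) window, with a new window every sliding_ms."""
    if not rows:
        return {}
    ts_min = min(ts for ts, _ in rows)
    ts_max = max(ts for ts, _ in rows)
    # First window start (a multiple of sliding_ms) whose window still contains ts_min
    begin = (ts_min - interval_ms) // sliding_ms * sliding_ms + sliding_ms
    result: Dict[int, float] = {}
    while begin <= ts_max:
        values = [v for ts, v in rows if begin <= ts < begin + interval_ms]
        if values:  # windows without rows produce no output
            result[begin] = sum(values) / len(values)
        begin += sliding_ms
    return result
```

Because consecutive windows overlap by 30 seconds, a row at the 30-second mark contributes to both the window starting at 0 and the window starting at 30 seconds.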

-Please be noted that the minimum allowed time window is 10 milliseconds, and no upper limit.
+Please note that the minimum allowed time window is 10 milliseconds, and there is no upper limit.

-Besides, it's allowed to specify the start and end time of continuous query. If the start time is not specified, the timestamp of the first original row will be considered as the start time; if the end time is not specified, the continuous will be performed infinitely, otherwise it will be terminated once the end time is reached. For example, the continuous query in below SQL statement will be started from now and terminated one hour later.
+It's possible to specify the start and end time of a continuous query. If the start time is not specified, the timestamp of the first row will be considered as the start time; if the end time is not specified, the continuous query will be performed indefinitely, otherwise it will be terminated once the end time is reached. For example, the continuous query in the SQL statement below will be started from now and terminated one hour later.

 ```sql
 create table avg_vol as select avg(voltage) from meters where ts > now and ts <= now + 1h interval(1m) sliding(30s);
 ```

-`now` in above SQL statement stands for the time when the continuous query is created, not the time when the computation is actually performed. Besides, to avoid the trouble caused by the delay of original data as much as possible, the actual computation in continuous query is also started with a little delay. That means, once a time window closes, the computation is not started immediately. Normally, the result can only be available a little time later, normally within one minute, after the time window closes.
+`now` in the above SQL statement stands for the time when the continuous query is created, not the time when the computation is actually performed. To minimize the impact of delays in receiving data, the actual computation in a continuous query is started after a short delay. That means once a time window closes, the computation is not started immediately. The result is normally available within one minute after the time window closes.

 ## How to Manage

-`show streams` command can be used in TDengine CLI `taos` to show all the continuous queries in the system, and `kill stream` can be used to terminate a continuous query.
+The `show streams` command can be used in the TDengine CLI `taos` to show all the continuous queries in the system, and `kill stream` can be used to terminate a continuous query.
--- a/docs-en/07-develop/06-subscribe.mdx
+++ b/docs-en/07-develop/06-subscribe.mdx
@@ -16,9 +16,9 @@ import CDemo from "./_sub_c.mdx";

 ## Introduction

-According to the time series nature of the data, data inserting in TDengine is similar to data publishing in message queues, they both can be considered as a new data record with timestamp is inserted into the system. Data is stored in ascending order of timestamp inside TDengine, so essentially each table in TDengine can be considered as a message queue.
+Due to the nature of time series data, inserting data into TDengine is similar to publishing data to a message queue. Data is stored in ascending order of timestamp inside TDengine, so each table in TDengine can essentially be considered as a message queue.

-Lightweight service for data subscription and pushing is built in TDengine. With the API provided by TDengine, client programs can used `select` statement to subscribe the data from one or more tables. The subscription and and state maintenance is performed on the client side, the client programs polls the server to check whether there is new data, and if so the new data will be pushed back to the client side. If the client program is restarted, where to start for retrieving new data is up to the client side.
+A lightweight service for data subscription and pushing is built into TDengine. With the API provided by TDengine, client programs can use `select` statements to subscribe to data from one or more tables. Subscription and state maintenance are performed on the client side: the client program polls the server to check whether there is new data, and if so, the new data is pushed back to the client side. If the client program is restarted, it is up to the client side to decide where to resume retrieving new data.

 There are 3 major APIs related to subscription provided in the TDengine client driver.

@@ -28,9 +28,9 @@ taos_consume
 taos_unsubscribe
 ```

-For more details about these API please refer to [C/C++ Connector](/reference/connector/cpp). Their usage will be introduced below using the use case of meters, in which the schema of STable and sub tables please refer to the previous section "continuous query". Full sample code can be found [here](https://github.com/taosdata/TDengine/blob/master/examples/c/subscribe.c).
+For more details about these APIs, please refer to [C/C++ Connector](/reference/connector/cpp). Their usage is introduced below using the meters use case, with the STable and subtable schema from the previous section [Continuous Query](/develop/continuous-query). Full sample code can be found [here](https://github.com/taosdata/TDengine/blob/master/examples/c/subscribe.c).

-If we want to get notification and take some actions if the current exceeds a threshold, like 10A, from some meters, there are two ways:
+If we want to be notified and take some action when the current from some meters exceeds a threshold, such as 10A, there are two ways:

 The first way is to query each subtable and record the last timestamp matching the criteria, then after some time query the data later than the recorded timestamp, and repeat this process. The SQL statements for this way are as below.

@@ -40,7 +40,7 @@ select * from D1002 where ts > {last_timestamp2} and current > 10;
 ...
 ```

-The above way works, but the problem is that the number of `select` statements increases with the number of meters grows. Finally the performance of both client side and server side will be unacceptable once the number of meters grows to a big enough number.
+This works, but the number of `select` statements grows with the number of meters, and once the number of meters is large enough, the performance of both the client side and the server side becomes unacceptable.

 A better way is to query the STable: only one `select` is needed regardless of the number of meters, as below:

@@ -48,7 +48,7 @@ A better way is to query on the STable, only one `select` is enough regardless o
 select * from meters where ts > {last_timestamp} and current > 10;
 ```

-However, how to choose `last_timestamp` becomes a new problem if using this way. Firstly, the timestamp when the data is generated is different from the timestamp when the data is inserted into the database, sometimes the difference between them may be very big. Secondly, the time when the data from different meters may arrives at the database may be different too. If the timestamp of the "slowest" meter is used as `last_timestamp` in the query, the data from other meters may be selected repeatedly; but if the timestamp of the "fasted" meters is used as `last_timestamp`, some data from other meters may be missed.
+However, this presents a new problem: how to choose `last_timestamp`. First, the timestamp when the data is generated is different from the timestamp when the data is inserted into the database, and the difference between them can sometimes be very large. Second, data from different meters may arrive at the database at different times. If the timestamp of the "slowest" meter is used as `last_timestamp` in the query, data from other meters may be selected repeatedly; but if the timestamp of the "fastest" meter is used, some data from other meters may be missed.

 All the problems mentioned above can be resolved thoroughly using the subscription feature provided by TDengine.

@@ -75,19 +75,19 @@ The parameter `sql` is a `select` statement in which `where` clause can be used
 select * from meters where current > 10;
 ```

-Please be noted that, all the data will be processed because no start time is specified. If only the data from one day ago needs to be processed, a time related condition can be added:
+Please note that all the data will be processed because no start time is specified. If only the data from the last day needs to be processed, a time-related condition can be added:

 ```sql
 select * from meters where ts > now - 1d and current > 10;
 ```

-The parameter `topic` is the name of the subscription, it needs to be guaranteed unique in the client program, but it's not necessary to be globally unique because subscription is implemented in the APIs on client side.
+The parameter `topic` is the name of the subscription. It must be unique within the client program, but it does not need to be globally unique because subscription is implemented in the client-side APIs.

-If the subscription named as `topic` doesn't exist, parameter `restart` would be ignored. If the subscription named as `topic` has been created before by the client program which then exited, when the client program is restarted to use this `topic`, parameter `restart` is used to determine retrieving data from beginning or from the last point where the subscription was broken. If the value of `restart` is **true** (i.e. a non-zero value), the data will be retrieved from beginning, or if it is **false** (i.e. zero), the data already consumed before will not be processed again.
+If the subscription named `topic` doesn't exist, the parameter `restart` is ignored. If the subscription named `topic` was created earlier by the client program, then when the client program is restarted and uses this `topic` again, `restart` determines whether data is retrieved from the beginning or from the point where the subscription was interrupted. If `restart` is **true** (i.e. a non-zero value), data is retrieved from the beginning; if it is **false** (i.e. zero), data already consumed is not processed again.

-The last parameter of `taos_subscribe` is the polling interval in unit of millisecond. In sync mode, if the time difference between two continuous invocations to `taos_consume` is smaller than the interval specified by `taos_subscribe`, `taos_consume` would be blocked until the interval is reached. In async mode, this interval is the minimum interval between two invocations to the call back function.
+The last parameter of `taos_subscribe` is the polling interval in milliseconds. In sync mode, if the time between two consecutive calls to `taos_consume` is shorter than the interval specified in `taos_subscribe`, `taos_consume` blocks until the interval is reached. In async mode, this interval is the minimum interval between two invocations of the callback function.
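The sync-mode blocking rule can be illustrated with a small calculation. The sketch below is a hypothetical helper written for illustration only: `consume_wait_ms` is not part of the TDengine client driver, and it assumes all times are given in milliseconds. It only models how long a sync-mode `taos_consume` call would block under the rule described above.

```c
#include <stdint.h>

/* Hypothetical helper (not a TDengine API): models how long a sync-mode
 * taos_consume call blocks, given the time of the previous call, the
 * current time, and the polling interval passed to taos_subscribe.
 * All values are in milliseconds. */
int64_t consume_wait_ms(int64_t last_call_ms, int64_t now_ms, int64_t interval_ms) {
    int64_t elapsed = now_ms - last_call_ms;
    if (elapsed >= interval_ms) {
        return 0; /* interval already reached: no blocking */
    }
    return interval_ms - elapsed; /* block for the rest of the interval */
}
```

For example, with an interval of 1000 ms, a call made 400 ms after the previous one would wait 600 ms, while a call made 1500 ms later would not wait at all.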

-The last second parameter of `taos_subscribe` is used to pass arguments to the call back function. `taos_subscribe` doesn't process this parameter and simply passes it to the call back function. This parameter is simply ignored in sync mode.
+The second-to-last parameter of `taos_subscribe` is used to pass arguments to the callback function. `taos_subscribe` doesn't process this parameter and simply passes it to the callback function. This parameter is ignored in sync mode.

 After a subscription is created, its data can be consumed and processed. Below is the sample code showing how to consume data in sync mode, in the `else` branch of `if (async)`.

@@ -149,22 +149,22 @@ void subscribe_callback(TAOS_SUB* tsub, TAOS_RES *res, void* param, int code) {
 taos_unsubscribe(tsub, keep);
 ```

-The second parameter `keep` is used to specify whether to keep the subscription progress on the client sde. If it is **false**, i.e. **0**, then subscription will be restarted from beginning regardless of the `restart` parameter's value in when `taos_subscribe` is invoked again. The subscription progress information is stored in _{DataDir}/subscribe/_ , under which there is a file with same name as `topic` for each subscription, the subscription will be restarted from beginning if the corresponding progress file is removed.
+The second parameter `keep` specifies whether to keep the subscription progress on the client side. If it is **false**, i.e. **0**, the subscription will restart from the beginning the next time `taos_subscribe` is invoked, regardless of the value of the `restart` parameter. The subscription progress information is stored in _{DataDir}/subscribe/_, where each subscription has a file with the same name as its `topic`. If the corresponding progress file is removed, the subscription will restart from the beginning.
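The combined effect of `restart`, `keep` and the progress file can be summarized as a single decision. The following C sketch is a hypothetical model written for illustration, not code from the TDengine client; `starts_from_beginning` and its parameters are made-up names, and "from the beginning" means all data matching the subscription's `sql`.

```c
#include <stdbool.h>

/* Hypothetical model (not a TDengine API) of where a subscription starts
 * when taos_subscribe is called.
 *
 * progress_saved: a progress file named after the topic exists under
 *                 {DataDir}/subscribe/, e.g. because the subscription was
 *                 created earlier and closed with keep != 0.
 * restart:        the restart argument passed to taos_subscribe.
 *
 * Returns true if all matching data is retrieved from the beginning,
 * false if already consumed data is skipped. */
bool starts_from_beginning(bool progress_saved, bool restart) {
    if (!progress_saved) {
        /* New subscription, keep == 0 at unsubscribe, or progress file
         * removed: restart has no effect. */
        return true;
    }
    /* Saved progress exists: restart decides whether to reuse it. */
    return restart;
}
```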

 Now let's see the effect of the above sample code, assuming the prerequisites below have been met.

 - The sample code has been downloaded to the local system
 - TDengine has been installed and launched properly on the same system
-- The database, STable, sub tables required in the sample code have been ready
+- The database, STable, and subtables required in the sample code are ready

-It's ready to launch below command in the directory where the sample code resides to compile and start the program.
+Run the commands below in the directory where the sample code resides to compile and start the program.

 ```bash
 make
 ./subscribe -sql='select * from meters where current > 10;'
 ```

-After the program is started, open another terminal and launch TDengine CLI `taos`, then use below SQL commands to insert a row whose current is 12A into table **D1001**.
+After the program is started, open another terminal and launch the TDengine CLI `taos`, then use the SQL commands below to insert a row whose current is 12A into table **D1001**.

 ```sql
 use test;
@@ -232,7 +232,7 @@ Query OK, 5 row(s) in set (0.004896s)

 ### Run the Examples

-The example programs firstly consume all historical data matching the criteria.
+The example programs first consume all historical data matching the criteria.

 ```bash
 ts: 1597464000000	current: 12.0	voltage: 220	phase: 1	location: Beijing.Chaoyang	groupid : 2

--- a/docs-en/07-develop/07-cache.md
+++ b/docs-en/07-develop/07-cache.md
@@ -10,9 +10,9 @@ Caching the latest data provides the capability of retrieving data in millisecon

 The memory space used by the TDengine cache is fixed in size, according to the configuration based on application requirements and system resources. An independent memory pool is allocated for and managed by each vnode (virtual node) in TDengine, and memory pools are not shared between vnodes. All the tables belonging to a vnode share all the cache memory of the vnode.

-Memory pool is divided into blocks and data is stored in row format in memory and each block follows FIFO policy. The size of each block is determined by configuration parameter `cache`, the number of blocks for each vnode is determined by `blocks`. For each vnode, the total cache size is `cache * blocks`. It's better to set the size of each block to hold at least tends of rows.
+The memory pool is divided into blocks; data is stored in memory in row format, and the blocks follow a FIFO policy. The size of each block is determined by the configuration parameter `cache`, and the number of blocks for each vnode is determined by `blocks`. For each vnode, the total cache size is `cache * blocks`. To be efficient, a cache block should be large enough for each table to store at least dozens of records.
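The sizing guideline can be checked with simple arithmetic. The sketch below assumes that `cache` is expressed in MB (as in the TDengine 2.x configuration; treat this as an assumption), uses hypothetical function names, and ignores per-row and per-block overhead, so it only gives a rough estimate of how many rows each table can keep in one block when the block is shared by all tables of a vnode.

```c
#include <stdint.h>

/* Total cache memory of one vnode in MB: cache * blocks.
 * Assumes cache is given in MB. */
int64_t vnode_cache_mb(int64_t cache_mb, int64_t blocks) {
    return cache_mb * blocks;
}

/* Rough estimate (hypothetical helper): rows per table that fit in one
 * cache block if the block is shared evenly by num_tables tables whose
 * rows are row_bytes long. Overhead is ignored. */
int64_t rows_per_table_per_block(int64_t cache_mb, int64_t num_tables, int64_t row_bytes) {
    if (cache_mb <= 0 || num_tables <= 0 || row_bytes <= 0) {
        return 0;
    }
    return (cache_mb * 1024 * 1024) / (num_tables * row_bytes);
}
```

For example, `cache` = 16 and `blocks` = 6 give 96 MB per vnode. With 1,000 tables and 64-byte rows, one 16 MB block holds roughly 262 rows per table, which meets the "dozens of records" guideline; with 100,000 tables it drops to about 2 rows per table, which would call for a larger `cache` value.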

-`last_row` function can be used to retrieve the last row of a table or a STable to quickly show the current state of devices on monitoring screen. For example below SQL statement retrieves the latest voltage of all meters in Chaoyang district of Beijing.
+The `last_row` function can be used to retrieve the last row of a table or STable to quickly show the current state of devices on a monitoring screen. For example, the SQL statement below retrieves the latest voltage of all meters in the Chaoyang district of Beijing.

 ```sql
 select last_row(voltage) from meters where location='Beijing.Chaoyang';

--- a/docs-en/07-develop/index.md
+++ b/docs-en/07-develop/index.md
@@ -2,15 +2,15 @@
 title: Developer Guide
 ---

-To develop an application using TDengine to process time-series data, we recommend taking the following steps:
+To develop an application to process time-series data using TDengine, we recommend taking the following steps:

-1. Choose the way for connection to TDengine. No matter what programming language you use, you can always use the REST interface to access TDengine, but you can also use connectors unique to each programming language.
-2. Design the data model based on your own application scenarios. Learn the [concepts](/concept/) of TDengine including "one table for one data collection point" and the "super table" concept; learn about static labels, collected metrics, and subtables. According to the data characteristics, you may decide to create one or more databases, and you should design the STable schema to fit your data.
-3. Decide how to insert data. TDengine supports writing using standard SQL, but also supports schemaless writing, so that data can be written directly without creating tables manually.
-4. Based on business requirements, find out what SQL query statements need to be written.
+1. Choose the method to connect to TDengine. No matter what programming language you use, you can always use the REST interface to access TDengine, but you can also use connectors unique to each programming language.
+2. Design the data model based on your own use cases. Learn the [concepts](/concept/) of TDengine including "one table for one data collection point" and the "super table" (STable) concept; learn about static labels, collected metrics, and subtables. Depending on the characteristics of your data and your requirements, you may decide to create one or more databases, and you should design the STable schema to fit your data.
+3. Decide how you will insert data. TDengine supports writing using standard SQL, but also supports schemaless writing, so that data can be written directly without creating tables manually.
+4. Based on business requirements, find out what SQL query statements need to be written. You may be able to reuse existing SQL statements.
 5. If you want to run real-time analysis based on time series data, including various dashboards, it is recommended that you use the TDengine continuous query feature instead of deploying complex streaming processing systems such as Spark or Flink.
 6. If your application has modules that need to consume inserted data, and they need to be notified when new data is inserted, it is recommended that you use the data subscription function provided by TDengine without the need to deploy Kafka.
-7. In many scenarios (such as fleet management), the application needs to obtain the latest status of each data collection point. It is recommended that you use the cache function of TDengine instead of deploying Redis separately.
+7. In many use cases (such as fleet management), the application needs to obtain the latest status of each data collection point. It is recommended that you use the cache function of TDengine instead of deploying Redis separately.
 8. If you find that the SQL functions of TDengine cannot meet your requirements, then you can use user-defined functions to solve the problem.

 This section is organized in the order described above. For ease of understanding, TDengine provides sample code for each supported programming language for each function. If you want to learn more about the use of SQL, please read the [SQL manual](/taos-sql/). For a more in-depth understanding of the use of each connector, please read the [Connector Reference Guide](/reference/connector/). If you also want to integrate TDengine with third-party systems, such as Grafana, please refer to the [third-party tools](/third-party/).

--- a/docs-en/10-cluster/01-deploy.md
+++ b/docs-en/10-cluster/01-deploy.md
@@ -6,15 +6,15 @@ title: Deployment

 ### Step 1

-The FQDN of all hosts need to be setup properly, all the FQDNs need to be configured in the /etc/hosts of each host. It must be guaranteed that each FQDN can be accessed (by ping, for example) from any other hosts.
+The FQDN of every host needs to be set up properly, and all FQDNs need to be configured in the /etc/hosts file of each host. Make sure that each FQDN can be accessed (by ping, for example) from every other host.

-On each host command `hostname -f` can be executed to get the hostname. `ping` command can be executed on each host to check whether any other host is accessible from it. If any host is not accessible, the network configuration, like /etc/hosts or DNS configuration, need to be checked and revised to make any two hosts accessible to each other.
+On each host, run `hostname -f` to get its FQDN, and use the `ping` command to check whether every other host is accessible from it. If any host is not accessible, check and revise the network configuration, such as /etc/hosts or the DNS configuration, so that any two hosts can reach each other.

 :::note

-- The host where the client program runs also needs to configured properly for FQDN, to make sure all hosts for client or server can be accessed from any other. In other words, the hosts where the client is running are also considered as a part of the cluster.
+- The host where the client program runs also needs to have its FQDN configured properly, so that all client and server hosts can reach one another. In other words, the hosts where clients run are also considered part of the cluster.

-- It's suggested to disable the firewall for all hosts in the cluster. At least TCP/UDP for port 6030~6042 need to be open if firewall is enabled.
+- It's recommended to disable the firewall on all hosts in the cluster. If a firewall is enabled, at least TCP and UDP ports 6030 to 6042 need to be open.

 :::

@@ -28,7 +28,7 @@ Now it's time to install TDengine on all hosts without starting `taosd`, the ver

 ### Step 4

-Now each physical node (referred to as `dnode` hereinafter, it's abbreviation for "data node") of TDengine need to be configured properly. Please be noted that one dnode doesn't stand for one host, multiple TDengine nodes can be started on single host as long as they are configured properly without conflicting. More specifically each instance of the configuration file `taos.cfg` stands for a dnode. Assuming the first dnode of TDengine cluster is "h1.taosdata.com:6030", its `taos.cfg` is configured as following.
+Now each physical node of TDengine (referred to as `dnode` hereinafter, an abbreviation for "data node") needs to be configured properly. Please note that one dnode doesn't correspond to one host: multiple TDengine nodes can be started on a single host as long as their configurations don't conflict. More specifically, each instance of the configuration file `taos.cfg` stands for a dnode. Assuming the first dnode of the TDengine cluster is "h1.taosdata.com:6030", its `taos.cfg` is configured as follows.

 ```c
 // firstEp is the end point to connect to when any dnode starts
@@ -44,9 +44,9 @@ serverPort            6030
 #arbitrator            ha.taosdata.com:6042
 ```

-`firstEp` and `fqdn` must be configured properly. In `taos.cfg` of all dnodes in TDengine cluster, `firstEp` must be configured to point to same address, i.e. the first dnode of the cluster. `fqdn` and `serverPort` compose the address of each node itself. If you want to start multiple TDengine dnodes on a single host, please also make sure all other configurations like `dataDir`, `logDir`, and other resources related parameters are not conflicting.
+`firstEp` and `fqdn` must be configured properly. In the `taos.cfg` of every dnode in a TDengine cluster, `firstEp` must point to the same address, i.e. the first dnode of the cluster. `fqdn` and `serverPort` compose the address of each node itself. If you want to start multiple TDengine dnodes on a single host, please make sure that other configurations such as `dataDir`, `logDir`, and other resource-related parameters do not conflict.

-For all the dnodes in a TDengine cluster, below parameters must be configured as exactly same, any node whose configuration is different from dnodes already in the cluster can't join the cluster.
+For all the dnodes in a TDengine cluster, the parameters below must be configured exactly the same. Any node whose configuration differs from the dnodes already in the cluster can't join the cluster.

 | **#** | **Parameter**      | **Definition**                                                                    |
 | ----- | ------------------ | --------------------------------------------------------------------------------- |
@@ -61,7 +61,7 @@ For all the dnodes in a TDengine cluster, below parameters must be configured as
 | 9     | maxVgroupsPerDb    | Maximum number vgroups that can be used by each DB                                |

 :::note
-Prior to version 2.0.19.0, besides the above parameters, `locale` and `charset` must be configured as same too for each dnode.
+Prior to version 2.0.19.0, besides the above parameters, `locale` and `charset` must also be configured the same for each dnode.

 :::

@@ -92,7 +92,7 @@ From the above output, it is shown that the end point of the started dnode is "h

 There are a few steps necessary to add other dnodes in the cluster.

-Firstly, start `taosd` as instructed in [Get Started](/get-started/), assuming it's for the second dnode. Before starting `taosd`, please making sure the configuration is correct, especially `firstEp`, `FQDN` and `serverPort`, `firstEp` must be same as the dnode shown in the section "Start First DNODE", i.e. "h1.taosdata.com" in this example.
+First, start `taosd` on the second dnode as instructed in [Get Started](/get-started/). Before starting `taosd`, please make sure the configuration is correct, especially `firstEp`, `FQDN` and `serverPort`. `firstEp` must be the same as the dnode shown in the section "Start First DNODE", i.e. "h1.taosdata.com" in this example.

 Then, on the first dnode, use the TDengine CLI `taos` to execute the command below to add the end point of the new dnode to the cluster. In the command, "fqdn:port" should be enclosed in double quotes.

@@ -109,6 +109,6 @@ SHOW DNODES;
 If the status of the newly added dnode is offline, please check:

 - Whether the `taosd` process is running properly or not
-- In the log file `taosdlog.0` to see whether the fqdn and port are correct or not
+- Whether the fqdn and port are correct, by checking the log file `taosdlog.0`

 The above process can be repeated to add more dnodes in the cluster.
--- a/docs-en/10-cluster/02-cluster-mgmt.md
+++ b/docs-en/10-cluster/02-cluster-mgmt.md
@@ -3,7 +3,7 @@ sidebar_label: Operation
 title: Manage DNODEs
 ---

-It has been introduced that how to deploy and start a cluster from scratch. Once a cluster is ready, the dnode status in the cluster can be shown at any time, new dnode can be added to scale out the cluster, an existing dnode can be removed, even load balance can be performed manually.\
+The previous section [Deployment](/cluster/deploy) introduced how to deploy and start a cluster from scratch. Once a cluster is ready, the status of its dnodes can be shown at any time, new dnodes can be added to scale out the cluster, existing dnodes can be removed, and load balancing can even be performed manually.

 :::note
All the commands introduced in this chapter need to be run in the TDengine CLI, and root privilege is sometimes required.
@@ -12,7 +12,7 @@ All the commands to be introduced in this chapter need to be run through TDengin

 ## Show DNODEs

-below command can be executed in TDengine CLI `taos` to list all dnodes in the cluster, including ID, end point (fqdn:port), status (ready, offline), number of vnodes, number of free vnodes, etc. It's suggested to execute this command to check after adding or removing a dnode.
+The command below can be executed in the TDengine CLI `taos` to list all dnodes in the cluster, including ID, end point (fqdn:port), status (ready, offline), number of vnodes, number of free vnodes, etc. We recommend running this command after adding or removing a dnode to verify the change.

 ```sql
 SHOW DNODES;
@@ -39,7 +39,7 @@ USE SOME_DATABASE;
 SHOW VGROUPS;
 ```

-The example output is as below:
+The example output is below:

 ```
 taos> show dnodes;
@@ -87,7 +87,7 @@ taos> show dnodes;
 Query OK, 2 row(s) in set (0.001017s)
 ```

-It can be seen that the status of the new dnode is "offline", once the dnode is started and connects the firstEp of the cluster, execute the command again and get below example output, from which it can be seen that two dnodes are both in "ready" status.
+The status of the new dnode is "offline". Once the dnode is started and connects to the firstEp of the cluster, executing the command again produces the example output below, which shows that both dnodes are in "ready" status.

 ```
 taos> show dnodes;
@@ -100,7 +100,7 @@ Query OK, 2 row(s) in set (0.001316s)

 ## Drop DNODE

-Launch TDengine CLI `taos` and execute the command below to drop or remove a dnode from the cluster. In the command, `dnodeId` can be gotten from `show dnodes`.
+Launch TDengine CLI `taos` and execute the command below to drop or remove a dnode from the cluster. In the command, you can get `dnodeId` from `show dnodes`.

 ```sql
 DROP DNODE "fqdn:port";
@@ -112,7 +112,7 @@ or
 DROP DNODE dnodeId;
 ```

-The example output is as below：
+The example output is below:

 ```
 taos> show dnodes;
@@ -139,7 +139,7 @@ In the above example, when `show dnodes` is executed the first time, two dnodes
 - Once a dnode is dropped, it can't rejoin the cluster. To rejoin, the dnode needs to be deployed again after cleaning up the data directory. Normally, before dropping a dnode, the data belonging to it needs to be migrated elsewhere.
 - Please note that `drop dnode` is different from stopping the `taosd` process. `drop dnode` just removes the dnode from the TDengine cluster. Only after a dnode is dropped can the corresponding `taosd` process be stopped.
 - Once a dnode is dropped, other dnodes in the cluster will be notified of the drop and will not accept requests from the dropped dnode.
-- dnodeID is allocated automatically and can't be interfered manually. dnodeID is generated in ascending order without duplication.
+- dnodeID is allocated automatically and can't be manually modified. dnodeID is generated in ascending order without duplication.

 :::

@@ -155,7 +155,7 @@ ALTER DNODE <source-dnodeId> BALANCE "VNODE:<vgId>-DNODE:<dest-dnodeId>";

 In the above command, `source-dnodeId` is the original dnodeId where the vnode resides, and `dest-dnodeId` specifies the target dnode. vgId (vgroup ID) can be obtained with `SHOW VGROUPS`.

-Firstly `show vgroups` is executed to show the vgroup distribution.
+First, execute `show vgroups` to show the vgroup distribution.

 ```
 taos> show vgroups;
@@ -172,7 +172,7 @@ taos> show vgroups;
 Query OK, 8 row(s) in set (0.001314s)
 ```

-It can be seen that there are 5 vgroups in dnode 3 and 3 vgroups in node 1, now we want to move vgId 18 from dnode 3 to dnode 1. Execute below command in `taos`
+There are 5 vgroups in dnode 3 and 3 vgroups in dnode 1. Now we want to move vgId 18 from dnode 3 to dnode 1. Execute the command below in `taos`:

 ```
 taos> alter dnode 3 balance "vnode:18-dnode:1";
@@ -207,7 +207,7 @@ It can be seen from above output that vgId 18 has been moved from dnode 3 to dno
 :::note

 - Manual load balancing can only be performed when the automatic load balancing is disabled, i.e. `balance` is set to 0.
-- Only vnode in normal state, i.e. master or slave, can be moved. vnode can't moved when its in status offline, unsynced or syncing.
+- Only a vnode in a normal state, i.e. master or slave, can be moved. A vnode can't be moved when it's in the offline, unsynced or syncing state.
 - Before moving a vnode, it's necessary to make sure the target dnode has enough resources: CPU, memory and disk.

 :::
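The first two conditions in the note above can be expressed as a simple precondition check. This is a hypothetical sketch for illustration only; the enum and function names are made up rather than taken from TDengine, and the third condition (enough CPU, memory and disk on the target dnode) is not modeled.

```c
#include <stdbool.h>

/* Hypothetical vnode states, mirroring the states named in the note. */
typedef enum {
    VNODE_MASTER,
    VNODE_SLAVE,
    VNODE_OFFLINE,
    VNODE_UNSYNCED,
    VNODE_SYNCING
} vnode_state;

/* Returns true if a manual vnode move is allowed by the first two rules:
 * automatic load balancing is disabled (balance == 0) and the vnode is
 * in a normal state (master or slave). */
bool can_move_vnode(int balance, vnode_state state) {
    if (balance != 0) {
        return false; /* automatic load balancing is still enabled */
    }
    return state == VNODE_MASTER || state == VNODE_SLAVE;
}
```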
--- a/docs-en/10-cluster/03-ha-and-lb.md
+++ b/docs-en/10-cluster/03-ha-and-lb.md
@@ -7,19 +7,19 @@ title: High Availability and Load Balancing

 High availability of vnode and mnode can be achieved through replicas in TDengine.

-The number of vnodes is associated with each DB, there can be multiple DBs in a TDengine cluster. For the purpose of operation, different number of replicas can be configured properly for each DB. When creating a database, the parameter `replica` is used to specify the number of replicas, the default value is 1. With single replica, the high availability of the system can't be guaranteed. Whenever one node is down, data service would be unavailable. The number of dnodes in the cluster must NOT be lower than the number of replicas set for any DB, otherwise the `create table` operation would fail with error "more dnodes are needed". Below SQL statement is used to create a database named as "demo" with 3 replicas.
+The number of vnodes is associated with each DB, and there can be multiple DBs in a TDengine cluster. A different number of replicas can be configured for each DB. When creating a database, the parameter `replica` specifies the number of replicas; the default value is 1. With a single replica, the high availability of the system can't be guaranteed: whenever one node is down, the data service becomes unavailable. The number of dnodes in the cluster must NOT be lower than the number of replicas set for any DB, otherwise the `create table` operation fails with the error "more dnodes are needed". The SQL statement below creates a database named "demo" with 3 replicas.

 ```sql
 CREATE DATABASE demo replica 3;
 ```

-The data in a DB is divided into multiple shards and stored in multiple vgroups. The number of vnodes in each group is determined by the number of replicas set for the DB. The vnodes in each vgroups store exactly same data. For the purpose of high availability, the vnodes in a vgroup must be located in different dnodes on different hosts. As long as over half of the vnodes in a vgroup are in online state, the vgroup is able to serve data access. Otherwise the vgroup can't handle any data access for reading or inserting data.
+The data in a DB is divided into multiple shards and stored in multiple vgroups. The number of vnodes in each vgroup is determined by the number of replicas set for the DB. The vnodes in each vgroup store exactly the same data. For the purpose of high availability, the vnodes in a vgroup must be located in different dnodes on different hosts. As long as over half of the vnodes in a vgroup are in an online state, the vgroup is able to provide data access. Otherwise the vgroup can't provide data access for reading or inserting data.
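The "over half" rule can be written as a strict majority check. The sketch below is a hypothetical helper that simply restates the rule from the paragraph above; it is not TDengine source code.

```c
#include <stdbool.h>

/* Returns true if a vgroup with `replicas` vnodes can still serve reads
 * and writes when `online` of its vnodes are online, i.e. strictly more
 * than half of them are online. Invalid inputs return false. */
bool vgroup_available(int online, int replicas) {
    if (replicas <= 0 || online < 0 || online > replicas) {
        return false;
    }
    return 2 * online > replicas;
}
```

With `replica 3`, the vgroup stays available with 2 of 3 vnodes online (`vgroup_available(2, 3)` is true) but not with 1 of 3; with a single replica, losing that one vnode makes the vgroup unavailable, as described earlier.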

 There may be data for multiple DBs in a dnode, so once a dnode is down, multiple DBs may be affected. Because of vnodes and the potentially complex mapping between vnodes and dnodes, the cluster is not guaranteed to work properly just because over half of the dnodes are online.

 ## High Availability of Mnode

-Each TDengine cluster is managed by `mnode`, which is a module of `taosd`. For the high availability of mnode, multiple mnodes can be configured using system parameter `numOfMNodes`, the valid time range is [1,3]. To make sure the data consistency between mnodes, the data replication between mnodes is performed in synchronous way.
+Each TDengine cluster is managed by `mnode`, which is a module of `taosd`. For the high availability of mnode, multiple mnodes can be configured using the system parameter `numOfMNodes`; the valid range is [1,3]. To ensure data consistency between mnodes, data replication between mnodes is performed synchronously.

 There may be multiple dnodes in a cluster, but only one mnode can be started in each dnode. Which one or ones of the dnodes will be designated as mnodes is automatically determined by TDengine according to the cluster configuration and system resources. Command `show mnodes` can be executed in TDengine `taos` to show the mnodes in the cluster.

@@ -32,19 +32,19 @@ The end point and role/status (master, slave, unsynced, or offline) of all mnode
 For the high availability of mnode, `numOfMnodes` needs to be configured to 2 or a higher value. Because the data consistency between mnodes must be guaranteed, the replica confirmation parameter `quorum` is set to 2 automatically if `numOfMNodes` is set to 2 or higher.

 :::note
-If high availability is important for your system, both vnode and mnode must be configured to have multiple replicas. How to configure for them are different and have been described.
+If high availability is important for your system, both vnode and mnode must be configured to have multiple replicas.

 :::

 ## Load Balance

-Load balance will be triggered in 3 cades without manual intervention.
+Load balancing is triggered automatically, without manual intervention, in the following 3 cases:

 - When a new dnode is joined in the cluster, automatic load balancing may be triggered, some data from some dnodes may be transferred to the new dnode automatically.
 - When a dnode is removed from the cluster, the data from this dnode will be transferred to other dnodes automatically.
 - When a dnode is too hot, i.e. too much data has been stored in it, automatic load balancing may be triggered to migrate some vnodes from this dnode to other dnodes.
- :::tip
-  Automatic load balancing is controlled by parameter `balance`, 0 means disabled and 1 means enabled.
+:::tip
+Automatic load balancing is controlled by parameter `balance`, 0 means disabled and 1 means enabled.

 :::

@@ -54,7 +54,7 @@ When a dnode is offline, it can be detected by the TDengine cluster. There are t

 - The dnode becomes online again before the threshold configured in `offlineThreshold` is reached, it is still in the cluster and data replication is started automatically. The dnode can work properly after the data syncup is finished.

- If the dnode has been offline over the threshold configured in `offlineThreshold` in `taos.cfg`, the dnode will be removed from the cluster automatically. System alert will be generated and automatic load balancing will be triggered too if `balance` is set to 1. When the removed dnode is restarted and becomes online, it will not be joined in the cluster automatically, it can only be joined manually by the system operator.
+- If the dnode has been offline longer than the threshold configured in `offlineThreshold` in `taos.cfg`, the dnode will be removed from the cluster automatically. A system alert will be generated, and automatic load balancing will be triggered if `balance` is set to 1. When the removed dnode is restarted and comes back online, it will not rejoin the cluster automatically; it can only be added back manually by the system operator.

 :::note
 If all the vnodes in a vgroup (or mnodes in mnode group) are in offline or unsynced status, the master node can only be voted after all the vnodes or mnodes in the group become online and can exchange status, then the vgroup (or mnode group) is able to provide service.
@@ -63,15 +63,15 @@ If all the vnodes in a vgroup (or mnodes in mnode group) are in offline or unsyn

 ## Arbitrator

-If the number of replicas is set to an even number like 2, when half of the vnodes in a vgroup don't work master node can't be voted. Similar case is also applicable to mnode if the number of mnodes is set to an even number like 2.
+If the number of replicas is set to an even number like 2, a master node can't be elected when half of the vnodes in a vgroup are not working. The same problem applies to mnodes if the number of mnodes is set to an even number like 2.

-To resolve this problem, a new arbitrator component named `tarbitrator`, abbreviated for TDengine Arbitrator, was introduced. Arbitrator simulates a vnode or mnode but it's only responsible for network communication and doesn't handle any actual data access. With Arbitrator, any vgroup or mnode group can be considered as having number of member nodes and master node can be selected.
+To resolve this problem, a new arbitrator component named `tarbitrator`, short for TDengine Arbitrator, was introduced. The arbitrator simulates a vnode or mnode, but it's only responsible for network communication and doesn't handle any actual data access. As long as more than half of the vnodes or mnodes, including the arbitrator, are available, the vgroup or mnode group can provide data insertion and query services normally.

-Normally, it's suggested to configure replica number of each DB or system parameter `numOfMNodes` to an odd number. However, if a user is very sensitive to storage space, replica number of 2 plus arbitrator component can be used to achieve both lower cost of storage space and high availability.
+Normally, it's suggested to configure the replica number of each DB, or the system parameter `numOfMNodes`, to an odd number. However, if a user is very sensitive to storage space, a replica number of 2 plus the arbitrator component can be used to achieve both lower storage cost and high availability.

 Arbitrator component is installed with the server package. For details about how to install, please refer to [Install](/operation/pkg-install). The `-p` parameter of `tarbitrator` can be used to specify the port on which it provides service.

-In the configuration file `taos.cfg` of each dnode, parameter `arbitrator` needs to be configured to the end point of the `tarbitrator` process. arbitrator component will be used automatically if the replica is configured to an even number and will be ignored if the replica is configured to an odd number.
+In the configuration file `taos.cfg` of each dnode, the parameter `arbitrator` needs to be configured to the end point of the `tarbitrator` process. The arbitrator component will be used automatically if the replica number is configured to an even number and will be ignored if it is configured to an odd number.

 Arbitrator can be shown by executing command in TDengine CLI `taos` with its role shown as "arb".


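The majority rule in the hunks above, together with the arbitrator behaviour, can be sketched in C as follows. This is a minimal model under stated assumptions, not TDengine's election code. The functions `group_members` and `group_is_available` are hypothetical. The arbitrator is modelled as one extra voting member that takes part only when the replica number is even, as the documentation describes. The sketch does not model the recovery case in the note about all members being offline or unsynced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of voting members in a vgroup or mnode group.
 * Per the documentation, the arbitrator is used only when the replica number is even. */
static int group_members(int replicas, bool arbitrator_configured) {
  assert(replicas >= 1);
  bool uses_arbitrator = arbitrator_configured && (replicas % 2 == 0);
  return replicas + (uses_arbitrator ? 1 : 0);
}

/* Hypothetical helper: a group can serve reads and writes only while
 * strictly more than half of its voting members are online. */
static bool group_is_available(int online_replicas, int replicas,
                               bool arbitrator_configured, bool arbitrator_online) {
  assert(online_replicas >= 0 && online_replicas <= replicas);
  int members = group_members(replicas, arbitrator_configured);
  int online = online_replicas;
  if (members > replicas && arbitrator_online) {
    online += 1; /* the arbitrator votes but handles no data access */
  }
  return 2 * online > members;
}
```

With `replica 2` and no arbitrator, losing either vnode leaves 1 of 2 members online, which is not a strict majority. Adding the arbitrator turns the group into 3 voters, so one vnode plus the arbitrator still form a majority.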
--- a/docs-en/10-cluster/index.md
+++ b/docs-en/10-cluster/index.md
@@ -3,7 +3,7 @@ title: Cluster
 keywords: ["cluster", "high availability", "load balance", "scale out"]
 ---

-TDengine has a native distributed design and provides the ability to scale out. A few of nodes can form a TDengine cluster. If you need to get higher processing power, you just need to add more nodes into the cluster. TDengine uses virtual node technology to virtualize a node into multiple virtual nodes to achieve load balancing. At the same time, TDengine can group virtual nodes on different nodes into virtual node groups, and use the replication mechanism to ensure the high availability of the system. The cluster feature of TDengine is completely open source.
+TDengine has a native distributed design and provides the ability to scale out. A few nodes can form a TDengine cluster. If you need higher processing power, you just need to add more nodes into the cluster. TDengine uses virtual node technology to virtualize a node into multiple virtual nodes to achieve load balancing. At the same time, TDengine can group virtual nodes on different nodes into virtual node groups, and use the replication mechanism to ensure the high availability of the system. The cluster feature of TDengine is completely open source.

 This chapter mainly introduces cluster deployment, maintenance, and how to achieve high availability and load balancing.


--- a/docs-en/12-taos-sql/index.md
+++ b/docs-en/12-taos-sql/index.md
@@ -3,9 +3,9 @@ title: TDengine SQL
 description: "The syntax supported by TDengine SQL "
 ---

-This section explains the syntax about operating database, table, STable, inserting data, selecting data, functions and some tips that can be used in TDengine SQL. It would be easier to understand with some fundamental knowledge of SQL.
+This section explains the syntax for operating on databases, tables, and STables, inserting data, selecting data, and using functions, as well as some tips that can be used in TDengine SQL. It would be easier to understand with some fundamental knowledge of SQL.

-TDengine SQL is the major interface for users to write data into or query from TDengine. For users to easily use, syntax similar to standard SQL is provided. However, please be noted that TDengine SQL is not standard SQL. Besides, because TDengine doesn't provide the functionality of deleting time series data, corresponding statements are not provided in TDengine SQL.
+TDengine SQL is the major interface for users to write data into or query from TDengine. For ease of use, a syntax similar to standard SQL is provided. However, please note that TDengine SQL is not standard SQL. For instance, TDengine doesn't provide the functionality of deleting time series data, so the corresponding statements are not provided in TDengine SQL.

 TDengine SQL doesn't support abbreviation for keywords, for example `DESCRIBE` can't be abbreviated as `DESC`.

@@ -16,7 +16,7 @@ Syntax Specifications used in this chapter:
 - | means one of a few options, excluding | itself.
 - … means the item prior to it can be repeated multiple times.

-To better demonstrate the syntax, usage and rules of TAOS SQL, hereinafter it's assumed that there is a data set of meters. Assuming each meter collects 3 data: current, voltage, phase. The data model is as below:
+To better demonstrate the syntax, usage and rules of TAOS SQL, hereinafter it's assumed that there is a data set of meters. Each meter collects 3 measurements: current, voltage, and phase. The data model is shown below:

 ```sql
 taos> DESCRIBE meters;

--- a/docs-en/13-operation/11-optimize.md
+++ b/docs-en/13-operation/11-optimize.md
---
-title: Performance Optimization
---
-
-After a TDengine cluster has been running for long enough time, because of updating data, deleting tables and deleting expired data, there may be fragments in data files and query performance may be impacted. To resolve the problem of fragments, from version 2.1.3.0 a new SQL command `COMPACT` can be used to defragment the data files.
-
-```sql
-COMPACT VNODES IN (vg_id1, vg_id2, ...)
-```
-
-`COMPACT` can be used to defragment one or more vgroups. The defragmentation work will be put in task queue for scheduling execution by TDengine. `SHOW VGROUPS` command can be used to get the vgroup ids to be used in `COMPACT` command. There is a column `compacting` in the output of `SHOW GROUPS` to indicate the compacting status of the vgroup: 2 means the vgroup is waiting in task queue for compacting, 1 means compacting is in progress, and 0 means the vgroup has nothing to do with compacting.
-
-Please be noted that a lot of disk I/O is required for defragementation operation, during which the performance may be impacted significantly for data insertion and query, data insertion may be blocked shortly in extreme cases.
-
-## Optimize Storage Parameters
-
-The data in different use cases may have different characteristics, such as the days to keep, number of replicas, collection interval, record size, number of collection points, compression or not, etc. To achieve best efficiency in storage, the parameters in below table can be used, all of them can be either configured in `taos.cfg` as default configuration or in the command `create database`. For detailed definition of these parameters please refer to [Configuration Parameters](/reference/config/).
-
-| #   | Parameter | Unit | Definition                                                                     | **Value Range**                                                                                 | **Default Value** |
-| --- | --------- | ---- | ------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------- | ----------------- |
-| 1   | days      | Day  | The time range of the data stored in a single data file                        | 1-3650                                                                                          | 10                |
-| 2   | keep      | Day  | The number of days the data is kept in the database                            | 1-36500                                                                                         | 3650              |
-| 3   | cache     | MB   | The size of each memory block                                                  | 1-128                                                                                           | 16                |
-| 4   | blocks    | None | The number of memory blocks used by each vnode                                 | 3-10000                                                                                         | 6                 |
-| 5   | quorum    | None | The number of required confirmation in case of multiple replicas               | 1-2                                                                                             | 1                 |
-| 6   | minRows   | None | The minimum number of rows in a data file                                      | 10-1000                                                                                         | 100               |
-| 7   | maxRows   | None | The maximum number of rows in a daa file                                       | 200-10000                                                                                       | 4096              |
-| 8   | comp      | None | Whether to compress the data                                                   | 0：uncompressed; 1: One Phase compression; 2: Two Phase compression                             | 2                 |
-| 9   | walLevel  | None | wal sync level (named as "wal" in create database )                            | 1：wal enabled without fsync; 2：wal enabled with fsync                                         | 1                 |
-| 10  | fsync     | ms   | The time to wait for invoking fsync when walLevel is set to 2; 0 means no wait | 3000                                                                                            |
-| 11  | replica   | none | The number of replications                                                     | 1-3                                                                                             | 1                 |
-| 12  | precision | none | Time precision                                                                 | ms: millisecond; us: microsecond;ns: nanosecond                                                 | ms                |
-| 13  | update    | none | Whether to allow updating data                                                 | 0: not allowed; 1: a row must be updated as whole; 2: a part of columns in a row can be updated | 0                 |
-| 14  | cacheLast | none | Whether the latest data of a table is cached in memory                         | 0: not cached; 1: the last row is cached; 2: the latest non-NULL value of each column is cached | 0                 |
-
-For a specific use case, there may be multiple kinds of data with different characteristics, it's best to put data with same characteristics in same database. So there may be multiple databases in a system while each database can be configured with different storage parameters to achieve best performance. The above parameters can be used when creating a database to override the default setting in configuration file.
-
-```sql
- CREATE DATABASE demo DAYS 10 CACHE 32 BLOCKS 8 REPLICA 3 UPDATE 1;
-```
-
-The above SQL statement creates a database named as `demo`, in which each data file stores data across 10 days, the size of each memory block is 32 MB and each vnode is allocated with 8 blocks, the replica is set to 3, update operation is allowed, and all other parameters not specified in the command follow the default configuration in `taos.cfg`.
-
-Once a database is created, only some parameters can be changed and be effective immediately while others are can't.
-
-| **Parameter** | **Alterable** | **Value Range**  | **Syntax**                             |
-| ------------- | ------------- | ---------------- | -------------------------------------- |
-| name          |               |                  |                                        |
-| create time   |               |                  |                                        |
-| ntables       |               |                  |                                        |
-| vgroups       |               |                  |                                        |
-| replica       | **YES**       | 1-3              | ALTER DATABASE <dbname\> REPLICA _n_   |
-| quorum        | **YES**       | 1-2              | ALTER DATABASE <dbname\> QUORUM _n_    |
-| days          |               |                  |                                        |
-| keep          | **YES**       | days-365000      | ALTER DATABASE <dbname\> KEEP _n_      |
-| cache         |               |                  |                                        |
-| blocks        | **YES**       | 3-1000           | ALTER DATABASE <dbname\> BLOCKS _n_    |
-| minrows       |               |                  |                                        |
-| maxrows       |               |                  |                                        |
-| wal           |               |                  |                                        |
-| fsync         |               |                  |                                        |
-| comp          | **YES**       | 0-2              | ALTER DATABASE <dbname\> COMP _n_      |
-| precision     |               |                  |                                        |
-| status        |               |                  |                                        |
-| update        |               |                  |                                        |
-| cachelast     | **YES**       | 0 \| 1 \| 2 \| 3 | ALTER DATABASE <dbname\> CACHELAST _n_ |
-
-**Explanation：** Prior to version 2.1.3.0, `taosd` server process needs to be restarted for these parameters to take in effect if they are changed using `ALTER DATABASE`.
-
-When trying to join a new dnode into a running TDengine cluster, all the parameters related to cluster in the new dnode configuration must be consistent with the cluster, otherwise it can't join the cluster. The parameters that are checked when joining a dnode are as below. For detailed definition of these parameters please refer to [Configuration Parameters](/reference/config/).
-
- numOfMnodes
- mnodeEqualVnodeNum
- offlineThreshold
- statusInterval
- maxTablesPerVnode
- maxVgroupsPerDb
- arbitrator
- timezone
- balance
- flowctrl
- slaveQuery
- adjustMaster
-
-For the convenience of debugging, the log setting of a dnode can be changed temporarily. The temporary change will be lost once the server is restarted.
-
-```sql
-ALTER DNODE <dnode_id> <config>
-```
-
- dnode_id: from output of "SHOW DNODES"
- config: the parameter to be changed, as below
-  - resetlog: close the old log file and create the new on
-  - debugFlag: 131 (INFO/ERROR/WARNING), 135 (DEBUG), 143 (TRACE)
-
-For example
-
-```
-alter dnode 1 debugFlag 135;
-```
--- a/docs-en/21-tdinternal/30-iot-big-data.md
+++ b/docs-en/21-tdinternal/30-iot-big-data.md
---
-title: IoT Big Data
-description: "Characteristics of IoT Big Data, why general big data platform does not work well for IoT? The required features for an IoT Big Data Platform"
---
-
- [Characteristics of IoT Big Data](https://tdengine.com/2019/07/09/86.html)
- [Why don’t General Big Data Platforms Fit IoT Scenarios?](https://tdengine.com/2019/07/09/92.html)
- [Why TDengine is the Best Choice for IoT Big Data Processing?](https://tdengine.com/2019/07/09/94.html)
- [Why Redis, Kafka, Spark aren’t Needed if TDengine is Used in the IoT Platform?](https://tdengine.com/2019/07/09/96.html)
-
--- a/include/common/taosdef.h
+++ b/include/common/taosdef.h
@@ -96,6 +96,7 @@ extern char *qtypeStr[];
 #define TSDB_PORT_HTTP 11

 #undef TD_DEBUG_PRINT_ROW
+#undef TD_DEBUG_PRINT_TSDB_LOAD_DCOLS

 #ifdef __cplusplus
 }

--- a/include/common/tmsg.h
+++ b/include/common/tmsg.h
@@ -1210,9 +1210,10 @@ typedef struct {
 } SRetrieveMetaTableRsp;

 typedef struct SExplainExecInfo {
-  uint64_t startupCost;
-  uint64_t totalCost;
+  double   startupCost;
+  double   totalCost;
  uint64_t numOfRows;
+  uint32_t verboseLen;
  void*    verboseInfo;
 } SExplainExecInfo;

@@ -1221,6 +1222,18 @@ typedef struct {
  SExplainExecInfo* subplanInfo;
 } SExplainRsp;

+typedef struct STableScanAnalyzeInfo {
+  uint64_t totalRows;
+  uint64_t totalCheckedRows;
+  uint32_t totalBlocks;
+  uint32_t loadBlocks;
+  uint32_t loadBlockStatis;
+  uint32_t skipBlocks;
+  uint32_t filterOutBlocks;
+  double   elapsedTime;
+  uint64_t filterTime;
+} STableScanAnalyzeInfo;
+
 int32_t tSerializeSExplainRsp(void* buf, int32_t bufLen, SExplainRsp* pRsp);
 int32_t tDeserializeSExplainRsp(void* buf, int32_t bufLen, SExplainRsp* pRsp);


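The new `verboseLen` field pairs a byte length with the opaque `verboseInfo` pointer. That pairing is what makes the verbose payload serializable, since a raw `void*` carries no size. The sketch below illustrates a length-prefixed layout using plain `memcpy`. The struct `ExplainExecInfoSketch` and the functions `explain_info_encode` and `explain_info_decode` are hypothetical. They do not reproduce the actual wire format of `tSerializeSExplainRsp`, which this diff does not show. Host byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of SExplainExecInfo, for illustration only. */
typedef struct {
  double      startupCost;
  double      totalCost;
  uint64_t    numOfRows;
  uint32_t    verboseLen;  /* number of bytes pointed to by verboseInfo */
  const void *verboseInfo;
} ExplainExecInfoSketch;

/* Writes the fixed fields, then the length, then exactly verboseLen payload bytes.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int32_t explain_info_encode(char *buf, int32_t bufLen, const ExplainExecInfoSketch *info) {
  assert(buf != NULL && info != NULL);
  size_t need = 2 * sizeof(double) + sizeof(uint64_t) + sizeof(uint32_t) + info->verboseLen;
  if (bufLen < 0 || (size_t)bufLen < need) return -1;
  char *p = buf;
  memcpy(p, &info->startupCost, sizeof(double));  p += sizeof(double);
  memcpy(p, &info->totalCost, sizeof(double));    p += sizeof(double);
  memcpy(p, &info->numOfRows, sizeof(uint64_t));  p += sizeof(uint64_t);
  memcpy(p, &info->verboseLen, sizeof(uint32_t)); p += sizeof(uint32_t);
  if (info->verboseLen > 0) {
    memcpy(p, info->verboseInfo, info->verboseLen);
    p += info->verboseLen;
  }
  return (int32_t)(p - buf);
}

/* Reads the layout back; verboseInfo points into buf (no copy). Returns bytes read or -1. */
static int32_t explain_info_decode(const char *buf, int32_t bufLen, ExplainExecInfoSketch *info) {
  assert(buf != NULL && info != NULL);
  size_t fixed = 2 * sizeof(double) + sizeof(uint64_t) + sizeof(uint32_t);
  if (bufLen < 0 || (size_t)bufLen < fixed) return -1;
  const char *p = buf;
  memcpy(&info->startupCost, p, sizeof(double));  p += sizeof(double);
  memcpy(&info->totalCost, p, sizeof(double));    p += sizeof(double);
  memcpy(&info->numOfRows, p, sizeof(uint64_t));  p += sizeof(uint64_t);
  memcpy(&info->verboseLen, p, sizeof(uint32_t)); p += sizeof(uint32_t);
  if ((size_t)bufLen - fixed < info->verboseLen) return -1; /* truncated payload */
  info->verboseInfo = info->verboseLen > 0 ? p : NULL;
  return (int32_t)(fixed + info->verboseLen);
}
```

The decoder checks the remaining buffer against `verboseLen` before exposing the payload. A truncated message is therefore rejected rather than read past its end.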
--- a/include/common/tmsgdef.h
+++ b/include/common/tmsgdef.h
@@ -158,6 +158,7 @@ enum {
  TD_DEF_MSG_TYPE(TDMT_MND_DROP_INDEX, "mnode-drop-index", NULL, NULL)
  TD_DEF_MSG_TYPE(TDMT_MND_GET_DB_CFG, "mnode-get-db-cfg", NULL, NULL)
  TD_DEF_MSG_TYPE(TDMT_MND_GET_INDEX, "mnode-get-index", NULL, NULL)
+  TD_DEF_MSG_TYPE(TDMT_MND_APPLY_MSG, "mnode-apply-msg", NULL, NULL)

  // Requests handled by VNODE
  TD_NEW_MSG_SEG(TDMT_VND_MSG)

--- a/include/common/ttypes.h
+++ b/include/common/ttypes.h
@@ -30,7 +30,7 @@ typedef uint64_t TDRowVerT;
 typedef int16_t  col_id_t;
 typedef int8_t   col_type_t;
 typedef int32_t  col_bytes_t;
-typedef uint16_t schema_ver_t;
+typedef int32_t  schema_ver_t;
 typedef int32_t  func_id_t;

 #pragma pack(push, 1)

--- a/include/dnode/mnode/mnode.h
+++ b/include/dnode/mnode/mnode.h
@@ -29,6 +29,7 @@ extern "C" {
 typedef struct SMnode SMnode;

 typedef struct {
+  bool     standby;
  bool     deploy;
  int8_t   replica;
  int8_t   selfIndex;
@@ -89,6 +90,7 @@ int32_t mndGetLoad(SMnode *pMnode, SMnodeLoad *pLoad);
 * @return int32_t 0 for success, -1 for failure.
 */
 int32_t mndProcessMsg(SRpcMsg *pMsg);
+int32_t mndProcessSyncMsg(SRpcMsg *pMsg);

 /**
 * @brief Generate machine code

--- a/include/dnode/mnode/sdb/sdb.h
+++ b/include/dnode/mnode/sdb/sdb.h
@@ -44,12 +44,9 @@ extern "C" {
  }

 #define SDB_GET_INT64(pData, dataPos, val, pos) SDB_GET_VAL(pData, dataPos, val, pos, sdbGetRawInt64, int64_t)
-
 #define SDB_GET_INT32(pData, dataPos, val, pos) SDB_GET_VAL(pData, dataPos, val, pos, sdbGetRawInt32, int32_t)
-
 #define SDB_GET_INT16(pData, dataPos, val, pos) SDB_GET_VAL(pData, dataPos, val, pos, sdbGetRawInt16, int16_t)
-
-#define SDB_GET_INT8(pData, dataPos, val, pos) SDB_GET_VAL(pData, dataPos, val, pos, sdbGetRawInt8, int8_t)
+#define SDB_GET_INT8(pData, dataPos, val, pos)  SDB_GET_VAL(pData, dataPos, val, pos, sdbGetRawInt8, int8_t)

 #define SDB_GET_RESERVE(pRaw, dataPos, valLen, pos) \
  {                                                 \
@@ -66,11 +63,8 @@ extern "C" {
  }

 #define SDB_SET_INT64(pRaw, dataPos, val, pos) SDB_SET_VAL(pRaw, dataPos, val, pos, sdbSetRawInt64, int64_t)
-
 #define SDB_SET_INT32(pRaw, dataPos, val, pos) SDB_SET_VAL(pRaw, dataPos, val, pos, sdbSetRawInt32, int32_t)
-
 #define SDB_SET_INT16(pRaw, dataPos, val, pos) SDB_SET_VAL(pRaw, dataPos, val, pos, sdbSetRawInt16, int16_t)
-
 #define SDB_SET_INT8(pRaw, dataPos, val, pos) SDB_SET_VAL(pRaw, dataPos, val, pos, sdbSetRawInt8, int8_t)

 #define SDB_SET_BINARY(pRaw, dataPos, val, valLen, pos)     \
@@ -304,13 +298,16 @@ int32_t sdbGetMaxId(SSdb *pSdb, ESdbType type);
 int64_t sdbGetTableVer(SSdb *pSdb, ESdbType type);

 /**
- * @brief Update the version of sdb
+ * @brief Set or get the apply index and apply term of sdb
 *
 * @param pSdb The sdb object.
- * @param val The update value of the version.
- * @return int32_t The current version of sdb
+ * @param index The new value of the apply index; sdbSetApplyTerm takes the apply term likewise.
+ * @return The getters return the current apply index or term of sdb; the setters return nothing.
 */
-int64_t sdbUpdateVer(SSdb *pSdb, int32_t val);
+void    sdbSetApplyIndex(SSdb *pSdb, int64_t index);
+int64_t sdbGetApplyIndex(SSdb *pSdb);
+void    sdbSetApplyTerm(SSdb *pSdb, int64_t term);
+int64_t sdbGetApplyTerm(SSdb *pSdb);

 SSdbRaw *sdbAllocRaw(ESdbType type, int8_t sver, int32_t dataLen);
 void     sdbFreeRaw(SSdbRaw *pRaw);
@@ -339,6 +336,7 @@ typedef struct SSdb {
  char          *tmpDir;
  int64_t        lastCommitVer;
  int64_t        curVer;
+  int64_t        curTerm;
  int64_t        tableVer[SDB_MAX];
  int64_t        maxId[SDB_MAX];
  EKeyType       keyTypes[SDB_MAX];
@@ -352,6 +350,14 @@ typedef struct SSdb {
  SdbDecodeFp    decodeFps[SDB_MAX];
 } SSdb;

+typedef struct SSdbIter {
+  TdFilePtr file;
+  int64_t   readlen;
+} SSdbIter;
+
+SSdbIter *sdbIterInit(SSdb *pSdb);
+SSdbIter *sdbIterRead(SSdb *pSdb, SSdbIter *iter, char **ppBuf, int32_t *len);
+
 #ifdef __cplusplus
 }
 #endif

--- a/include/libs/command/command.h
+++ b/include/libs/command/command.h
@@ -24,7 +24,7 @@ int32_t qExecCommand(SNode* pStmt, SRetrieveTableRsp** pRsp);
 int32_t qExecStaticExplain(SQueryPlan *pDag, SRetrieveTableRsp **pRsp);
 int32_t qExecExplainBegin(SQueryPlan *pDag, SExplainCtx **pCtx, int64_t startTs);
 int32_t qExecExplainEnd(SExplainCtx *pCtx, SRetrieveTableRsp **pRsp);
-int32_t qExplainUpdateExecInfo(SExplainCtx        *pCtx, SExplainRsp *pRspMsg, int32_t groupId, SRetrieveTableRsp **pRsp);
+int32_t qExplainUpdateExecInfo(SExplainCtx *pCtx, SExplainRsp *pRspMsg, int32_t groupId, SRetrieveTableRsp **pRsp);
 void    qExplainFreeCtx(SExplainCtx *pCtx);


--- a/include/libs/function/function.h
+++ b/include/libs/function/function.h
@@ -39,6 +39,7 @@ typedef bool (*FExecInit)(struct SqlFunctionCtx *pCtx, struct SResultRowEntryInf
 typedef int32_t (*FExecProcess)(struct SqlFunctionCtx *pCtx);
 typedef int32_t (*FExecFinalize)(struct SqlFunctionCtx *pCtx, SSDataBlock* pBlock);
 typedef int32_t (*FScalarExecProcess)(SScalarParam *pInput, int32_t inputNum, SScalarParam *pOutput);
+typedef int32_t (*FExecCombine)(struct SqlFunctionCtx *pDestCtx, struct SqlFunctionCtx *pSourceCtx);

 typedef struct SScalarFuncExecFuncs {
  FExecGetEnv getEnv;
@@ -50,6 +51,7 @@ typedef struct SFuncExecFuncs {
  FExecInit init;
  FExecProcess process;
  FExecFinalize finalize;
+  FExecCombine combine;
 } SFuncExecFuncs;

 typedef struct SFileBlockInfo {

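The new `FExecCombine` hook takes a destination and a source function context. That signature suggests merging two partial aggregation states into one, for example results computed over different data blocks. The following sketch applies that idea to an average. It uses a hypothetical `AvgState` in place of `SqlFunctionCtx`. The real context layout, and the exact semantics TDengine assigns to `combine`, are not shown in this diff. Returning 0 for success is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical partial state for AVG; not TDengine's SqlFunctionCtx. */
typedef struct {
  double  sum;
  int64_t count;
} AvgState;

/* process step: fold one input value into a partial state */
static void avg_process(AvgState *st, double v) {
  st->sum += v;
  st->count += 1;
}

/* combine step: merge pSource into pDest, mirroring FExecCombine's (dest, source) order */
static int32_t avg_combine(AvgState *pDest, const AvgState *pSource) {
  assert(pDest != NULL && pSource != NULL);
  pDest->sum += pSource->sum;
  pDest->count += pSource->count;
  return 0;
}

/* finalize step: produce the result from the merged state */
static double avg_finalize(const AvgState *st) {
  return st->count > 0 ? st->sum / (double)st->count : 0.0;
}
```

The sketch merges sums and counts rather than averaging the two partial averages. This keeps the result correct when the partial states cover different numbers of rows.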
--- a/include/libs/function/functionMgt.h
+++ b/include/libs/function/functionMgt.h
@@ -146,7 +146,8 @@ bool fmIsBuiltinFunc(const char* pFunc);

 bool fmIsAggFunc(int32_t funcId);
 bool fmIsScalarFunc(int32_t funcId);
-bool fmIsNonstandardSQLFunc(int32_t funcId);
+bool fmIsVectorFunc(int32_t funcId);
+bool fmIsIndefiniteRowsFunc(int32_t funcId);
 bool fmIsStringFunc(int32_t funcId);
 bool fmIsDatetimeFunc(int32_t funcId);
 bool fmIsSelectFunc(int32_t funcId);

--- a/include/libs/nodes/nodes.h
+++ b/include/libs/nodes/nodes.h
@@ -59,10 +59,10 @@ extern "C" {
  for (SListCell* cell = (NULL != (list) ? (list)->pHead : NULL); \
       (NULL != cell ? (node = &(cell->pNode), true) : (node = NULL, false)); cell = cell->pNext)

-#define DESTORY_LIST(list)  \
-  do {                      \
-    nodesDestroyList(list); \
-    list = NULL;            \
+#define DESTORY_LIST(list)    \
+  do {                        \
+    nodesDestroyList((list)); \
+    (list) = NULL;            \
  } while (0)

 typedef enum ENodeType {
@@ -96,6 +96,7 @@ typedef enum ENodeType {
  QUERY_NODE_EXPLAIN_OPTIONS,
  QUERY_NODE_STREAM_OPTIONS,
  QUERY_NODE_TOPIC_OPTIONS,
+  QUERY_NODE_LEFT_VALUE,

  // Statement nodes are used in parser and planner module.
  QUERY_NODE_SET_OPERATOR,
@@ -211,6 +212,7 @@ typedef enum ENodeType {
  QUERY_NODE_PHYSICAL_PLAN_STREAM_FINAL_INTERVAL,
  QUERY_NODE_PHYSICAL_PLAN_FILL,
  QUERY_NODE_PHYSICAL_PLAN_SESSION_WINDOW,
+  QUERY_NODE_PHYSICAL_PLAN_STREAM_SESSION_WINDOW,
  QUERY_NODE_PHYSICAL_PLAN_STATE_WINDOW,
  QUERY_NODE_PHYSICAL_PLAN_PARTITION,
  QUERY_NODE_PHYSICAL_PLAN_DISPATCH,

--- a/include/libs/nodes/plannodes.h
+++ b/include/libs/nodes/plannodes.h
@@ -54,6 +54,7 @@ typedef struct SScanLogicNode {
  int64_t            sliding;
  int8_t             intervalUnit;
  int8_t             slidingUnit;
+  SNode*             pTagCond;
 } SScanLogicNode;

 typedef struct SJoinLogicNode {
@@ -295,6 +296,8 @@ typedef struct SSessionWinodwPhysiNode {
  int64_t          gap;
 } SSessionWinodwPhysiNode;

+typedef SSessionWinodwPhysiNode SStreamSessionWinodwPhysiNode;
+
 typedef struct SStateWinodwPhysiNode {
  SWinodwPhysiNode window;
  SNode*           pStateKey;
@@ -343,6 +346,7 @@ typedef struct SSubplan {
  SNodeList*     pParents;      // the data destination subplan, get data from current subplan
  SPhysiNode*    pNode;         // physical plan of current subplan
  SDataSinkNode* pDataSink;     // data of the subplan flow into the datasink
+  SNode*         pTagCond;
 } SSubplan;

 typedef enum EExplainMode { EXPLAIN_MODE_DISABLE = 1, EXPLAIN_MODE_STATIC, EXPLAIN_MODE_ANALYZE } EExplainMode;

--- a/include/libs/nodes/querynodes.h
+++ b/include/libs/nodes/querynodes.h
@@ -81,6 +81,7 @@ typedef struct SValueNode {
  char*     literal;
  bool      isDuration;
  bool      translate;
+  bool      notReserved;
  int16_t   placeholderNo;
  union {
    bool     b;
@@ -93,6 +94,10 @@ typedef struct SValueNode {
  char    unit;
 } SValueNode;

+typedef struct SLeftValueNode {
+  ENodeType type;
+} SLeftValueNode;
+
 typedef struct SOperatorNode {
  SExprNode     node;  // QUERY_NODE_OPERATOR
  EOperatorType opType;
@@ -236,7 +241,8 @@ typedef struct SSelectStmt {
  bool        isTimeOrderQuery;
  bool        hasAggFuncs;
  bool        hasRepeatScanFuncs;
-  bool        hasNonstdSQLFunc;
+  bool        hasIndefiniteRowsFunc;
+  bool        hasSelectValFunc;
 } SSelectStmt;

 typedef enum ESetOperatorType { SET_OP_TYPE_UNION_ALL = 1, SET_OP_TYPE_UNION } ESetOperatorType;

--- a/include/libs/sync/sync.h
+++ b/include/libs/sync/sync.h
@@ -78,15 +78,33 @@ typedef struct SFsmCbMeta {
  int32_t    code;
  ESyncState state;
  uint64_t   seqNum;
+  SyncTerm   term;
+  SyncTerm   currentTerm;
 } SFsmCbMeta;

+typedef struct SReConfigCbMeta {
+  int32_t   code;
+  SyncIndex index;
+  SyncTerm  term;
+  SyncTerm  currentTerm;
+} SReConfigCbMeta;
+
 typedef struct SSyncFSM {
  void* data;
+
  void (*FpCommitCb)(struct SSyncFSM* pFsm, const SRpcMsg* pMsg, SFsmCbMeta cbMeta);
  void (*FpPreCommitCb)(struct SSyncFSM* pFsm, const SRpcMsg* pMsg, SFsmCbMeta cbMeta);
  void (*FpRollBackCb)(struct SSyncFSM* pFsm, const SRpcMsg* pMsg, SFsmCbMeta cbMeta);
+
+  void (*FpRestoreFinishCb)(struct SSyncFSM* pFsm);
  int32_t (*FpGetSnapshot)(struct SSyncFSM* pFsm, SSnapshot* pSnapshot);
-  int32_t (*FpRestoreSnapshot)(struct SSyncFSM* pFsm, const SSnapshot* snapshot);
+  void* (*FpSnapshotRead)(struct SSyncFSM* pFsm, const SSnapshot* snapshot, void* iter, char** ppBuf, int32_t* len);
+  int32_t (*FpSnapshotApply)(struct SSyncFSM* pFsm, const SSnapshot* snapshot, char* pBuf, int32_t len);
+
+  void (*FpReConfigCb)(struct SSyncFSM* pFsm, SSyncCfg newCfg, SReConfigCbMeta cbMeta);
+
+  // int32_t (*FpRestoreSnapshot)(struct SSyncFSM* pFsm, const SSnapshot* snapshot);
+
 } SSyncFSM;

 // abstract definition of log store in raft
@@ -117,7 +135,6 @@ typedef struct SSyncLogStore {

 } SSyncLogStore;

-
 typedef struct SSyncInfo {
  SyncGroupId vgId;
  SSyncCfg    syncCfg;
@@ -144,6 +161,7 @@ int32_t     syncGetVgId(int64_t rid);
 int32_t     syncPropose(int64_t rid, const SRpcMsg* pMsg, bool isWeak);
 bool        syncEnvIsStart();
 const char* syncStr(ESyncState state);
+bool        syncIsRestoreFinish(int64_t rid);

 #ifdef __cplusplus
 }

--- a/include/libs/transport/trpc.h
+++ b/include/libs/transport/trpc.h
@@ -28,7 +28,7 @@ extern "C" {
 #define TAOS_CONN_CLIENT 1
 #define IsReq(pMsg)      (pMsg->msgType & 1U)

-extern int tsRpcHeadSize;
+extern int32_t tsRpcHeadSize;

 typedef struct {
  uint32_t clientIp;
@@ -69,10 +69,10 @@ typedef struct SRpcInit {
  char     localFqdn[TSDB_FQDN_LEN];
  uint16_t localPort;     // local port
  char *   label;         // for debug purpose
-  int      numOfThreads;  // number of threads to handle connections
-  int      sessions;      // number of sessions allowed
+  int32_t  numOfThreads;  // number of threads to handle connections
+  int32_t  sessions;      // number of sessions allowed
  int8_t   connType;      // TAOS_CONN_UDP, TAOS_CONN_TCPC, TAOS_CONN_TCPS
-  int      idleTime;      // milliseconds, 0 means idle timer is disabled
+  int32_t  idleTime;      // milliseconds, 0 means idle timer is disabled

  // the following is for client app ecurity only
  char *user;  // user name
@@ -108,9 +108,9 @@ int32_t rpcInit();
 void    rpcCleanup();
 void *  rpcOpen(const SRpcInit *pRpc);
 void    rpcClose(void *);
-void *  rpcMallocCont(int contLen);
+void *  rpcMallocCont(int32_t contLen);
 void    rpcFreeCont(void *pCont);
-void *  rpcReallocCont(void *ptr, int contLen);
+void *  rpcReallocCont(void *ptr, int32_t contLen);

 // Because taosd supports multi-process mode
 // These functions should not be used on the server side
@@ -121,10 +121,10 @@ void rpcRegisterBrokenLinkArg(SRpcMsg *msg);
 void rpcReleaseHandle(void *handle, int8_t type);  // just release client conn to rpc instance, no close sock

 // These functions will not be called in the child process
-void rpcSendRedirectRsp(void *pConn, const SEpSet *pEpSet);
-void rpcSendRequestWithCtx(void *thandle, const SEpSet *pEpSet, SRpcMsg *pMsg, int64_t *rid, SRpcCtx *ctx);
-int  rpcGetConnInfo(void *thandle, SRpcConnInfo *pInfo);
-void rpcSendRecv(void *shandle, SEpSet *pEpSet, SRpcMsg *pReq, SRpcMsg *pRsp);
+void    rpcSendRedirectRsp(void *pConn, const SEpSet *pEpSet);
+void    rpcSendRequestWithCtx(void *thandle, const SEpSet *pEpSet, SRpcMsg *pMsg, int64_t *rid, SRpcCtx *ctx);
+int32_t rpcGetConnInfo(void *thandle, SRpcConnInfo *pInfo);
+void    rpcSendRecv(void *shandle, SEpSet *pEpSet, SRpcMsg *pReq, SRpcMsg *pRsp);

 #ifdef __cplusplus
 }

--- a/include/util/taoserror.h
+++ b/include/util/taoserror.h
@@ -253,6 +253,7 @@ int32_t* taosGetErrno();
 #define TSDB_CODE_MND_TRANS_INVALID_STAGE       TAOS_DEF_ERROR_CODE(0, 0x03D2)
 #define TSDB_CODE_MND_TRANS_CONFLICT            TAOS_DEF_ERROR_CODE(0, 0x03D3)
 #define TSDB_CODE_MND_TRANS_UNKNOW_ERROR        TAOS_DEF_ERROR_CODE(0, 0x03D4)
+#define TSDB_CODE_MND_TRANS_CLOG_IS_NULL        TAOS_DEF_ERROR_CODE(0, 0x03D5)

 // mnode-mq
 #define TSDB_CODE_MND_TOPIC_ALREADY_EXIST       TAOS_DEF_ERROR_CODE(0, 0x03E0)
@@ -638,6 +639,7 @@ int32_t* taosGetErrno();
 #define TSDB_CODE_PAR_NOT_ALLOWED_FUNC          TAOS_DEF_ERROR_CODE(0, 0x264F)
 #define TSDB_CODE_PAR_NOT_ALLOWED_WIN_QUERY     TAOS_DEF_ERROR_CODE(0, 0x2650)
 #define TSDB_CODE_PAR_INVALID_DROP_COL          TAOS_DEF_ERROR_CODE(0, 0x2651)
+#define TSDB_CODE_PAR_INVALID_COL_JSON          TAOS_DEF_ERROR_CODE(0, 0x2652)

 //planner
 #define TSDB_CODE_PLAN_INTERNAL_ERROR           TAOS_DEF_ERROR_CODE(0, 0x2700)

--- a/include/util/tdef.h
+++ b/include/util/tdef.h
@@ -132,6 +132,7 @@ typedef enum EOperatorType {
  OP_TYPE_MOD,
  // unary arithmetic operator
  OP_TYPE_MINUS,
+  OP_TYPE_ASSIGN,

  // bit operator
  OP_TYPE_BIT_AND,
@@ -233,6 +234,7 @@ typedef enum ELogicConditionType {
 #define TSDB_MAX_TAG_CONDITIONS 1024

 #define TSDB_MAX_JSON_TAG_LEN 16384
+#define TSDB_MAX_JSON_KEY_LEN 256

 #define TSDB_AUTH_LEN          16
 #define TSDB_PASSWORD_LEN      32
@@ -426,11 +428,11 @@ enum {
 };

 #define DEFAULT_HANDLE 0
-#define MNODE_HANDLE   -1
-#define QNODE_HANDLE   -2
-#define SNODE_HANDLE   -3
-#define VNODE_HANDLE   -4
-#define BNODE_HANDLE   -5
+#define MNODE_HANDLE   1
+#define QNODE_HANDLE   -1
+#define SNODE_HANDLE   -2
+#define VNODE_HANDLE   -3
+#define BNODE_HANDLE   -4

 #define TSDB_CONFIG_OPTION_LEN 16
 #define TSDB_CONIIG_VALUE_LEN  48

--- a/include/util/tencode.h
+++ b/include/util/tencode.h
@@ -82,7 +82,7 @@ typedef struct {
  do {                               \
    SEncoder coder = {0};            \
    tEncoderInit(&coder, NULL, 0);   \
-    if ((E)(&coder, S) == 0) {       \
+    if ((E)(&coder, S) >= 0) {       \
      SIZE = coder.pos;              \
      RET = 0;                       \
    } else {                         \

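The tencode.h change above relaxes the success test in the size-measuring macro from `== 0` to `>= 0`. The macro runs the encoder once against a NULL buffer and takes `coder.pos` as the required size. Under the old test, an encoder that signals success with a positive value, such as a byte count, was treated as a failure. The sketch below illustrates the two-pass measure-then-encode pattern. It uses a hypothetical `Encoder` type and a hypothetical `encodeItem` function, not TDengine's `SEncoder` API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal encoder: when buf is NULL it only advances pos,
 * so a first pass can measure the encoded size without writing anything. */
typedef struct {
  uint8_t *buf;
  int32_t  cap;
  int32_t  pos;
} Encoder;

static int32_t encodeBytes(Encoder *e, const void *src, int32_t n) {
  if (e->buf != NULL) {
    if (e->pos + n > e->cap) return -1; /* not enough space */
    memcpy(e->buf + e->pos, src, (size_t)n);
  }
  e->pos += n;
  return 0;
}

typedef struct {
  int64_t     uid;
  uint32_t    nameLen;
  const char *name;
} Item;

/* Returns the number of bytes encoded (>= 0) on success and -1 on failure,
 * so callers must test ">= 0" rather than "== 0". */
static int32_t encodeItem(Encoder *e, const Item *it) {
  int32_t start = e->pos;
  if (encodeBytes(e, &it->uid, (int32_t)sizeof(it->uid)) < 0) return -1;
  if (encodeBytes(e, &it->nameLen, (int32_t)sizeof(it->nameLen)) < 0) return -1;
  if (encodeBytes(e, it->name, (int32_t)it->nameLen) < 0) return -1;
  return e->pos - start;
}

/* Two passes: measure with a NULL buffer, then allocate and encode for real. */
static uint8_t *encodeItemAlloc(const Item *it, int32_t *pSize) {
  Encoder probe = {NULL, 0, 0};
  if (encodeItem(&probe, it) < 0) return NULL;
  *pSize = probe.pos;

  uint8_t *buf = malloc((size_t)*pSize);
  if (buf == NULL) return NULL;

  Encoder real = {buf, *pSize, 0};
  if (encodeItem(&real, it) < 0) {
    free(buf);
    return NULL;
  }
  return buf;
}
```

With this convention, a successful encode returns 15 for a three-byte name, so an `== 0` check would wrongly report failure.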
--- a/packaging/tools/install_client.sh
+++ b/packaging/tools/install_client.sh
@@ -17,6 +17,7 @@ serverName="taosd"
 clientName="taos"
 uninstallScript="rmtaos"
 configFile="taos.cfg"
+tarName="taos.tar.gz"

 osType=Linux
 pagMode=full
@@ -242,6 +243,11 @@ function install_examples() {

 function update_TDengine() {
    # Start to update
+    if [ ! -e ${tarName} ]; then
+        echo "File ${tarName} does not exist"
+        exit 1
+    fi
+    tar -zxf ${tarName}
    echo -e "${GREEN}Start to update ${productName} client...${NC}"
    # Stop the client shell if running
    if pidof ${clientName} &> /dev/null; then
@@ -264,42 +270,49 @@ function update_TDengine() {

    echo
    echo -e "\033[44;32;1m${productName} client is updated successfully!${NC}"
+
+    rm -rf $(tar -tf ${tarName})
 }

 function install_TDengine() {
-  # Start to install
-  echo -e "${GREEN}Start to install ${productName} client...${NC}"
-
-  install_main_path
-  install_log
-  install_header
-  install_lib
-  install_jemalloc
-  if [ "$verMode" == "cluster" ]; then
-    install_connector
-  fi
-  install_examples
-  install_bin
-  install_config
+    # Start to install
+    if [ ! -e ${tarName} ]; then
+        echo "File ${tarName} does not exist"
+        exit 1
+    fi
+    tar -zxf ${tarName}
+    echo -e "${GREEN}Start to install ${productName} client...${NC}"

-  echo
-  echo -e "\033[44;32;1m${productName} client is installed successfully!${NC}"
+    install_main_path
+    install_log
+    install_header
+    install_lib
+    install_jemalloc
+    if [ "$verMode" == "cluster" ]; then
+        install_connector
+    fi
+    install_examples
+    install_bin
+    install_config
+
+    echo
+    echo -e "\033[44;32;1m${productName} client is installed successfully!${NC}"

-  rm -rf $(tar -tf ${tarName})
+    rm -rf $(tar -tf ${tarName})
 }


 ## ==============================Main program starts from here============================
 # Install or updata client and client
 # if server is already install, don't install client
-  if [ -e ${bin_dir}/${serverName} ]; then
-      echo -e "\033[44;32;1mThere are already installed ${productName} server, so don't need install client!${NC}"
-      exit 0
-  fi
+if [ -e ${bin_dir}/${serverName} ]; then
+    echo -e "\033[44;32;1m${productName} server is already installed, so there is no need to install the client!${NC}"
+    exit 0
+fi

-  if [ -x ${bin_dir}/${clientName} ]; then
-      update_flag=1
-      update_TDengine
-  else
-      install_TDengine
-  fi
+if [ -x ${bin_dir}/${clientName} ]; then
+    update_flag=1
+    update_TDengine
+else
+    install_TDengine
+fi
--- a/source/client/src/clientImpl.c
+++ b/source/client/src/clientImpl.c
@@ -866,8 +866,7 @@ static char* parseTagDatatoJson(void* p) {
    if (j == 0) {
      if (*val == TSDB_DATA_TYPE_NULL) {
        string = taosMemoryCalloc(1, 8);
-        sprintf(varDataVal(string), "%s", TSDB_DATA_NULL_STR_L);
-        varDataSetLen(string, strlen(varDataVal(string)));
+        sprintf(string, "%s", TSDB_DATA_NULL_STR_L);
        goto end;
      }
      continue;
@@ -1003,7 +1002,7 @@ static int32_t doConvertUCS4(SReqResultInfo* pResultInfo, int32_t numOfRows, int
              length = 0;
            }
            varDataSetLen(dst, length + CHAR_BYTES * 2);
-            *(char*)(varDataVal(dst), length + CHAR_BYTES) = '\"';
+            *(char*)POINTER_SHIFT(varDataVal(dst), length + CHAR_BYTES) = '\"';
          } else if (jsonInnerType == TSDB_DATA_TYPE_DOUBLE) {
            double jsonVd = *(double*)(jsonInnerData);
            sprintf(varDataVal(dst), "%.9lf", jsonVd);

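The second clientImpl.c hunk fixes a comma-operator bug. In `*(char*)(varDataVal(dst), length + CHAR_BYTES)`, the comma operator discards `varDataVal(dst)`, so the integer `length + CHAR_BYTES` is cast to a pointer and dereferenced. The sketch below shows the corrected approach. `SHIFT_PTR` is a hypothetical stand-in for `POINTER_SHIFT` and is assumed to add a byte offset to a pointer. `quoteInto` is a hypothetical helper that wraps a string in double quotes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer-offset helper: advance p by n bytes. */
#define SHIFT_PTR(p, n) ((void *)((char *)(p) + (n)))

/* Copy len bytes of src into dst wrapped in double quotes.
 * dst must hold at least len + 2 bytes. Returns the bytes written. */
static int32_t quoteInto(char *dst, const char *src, int32_t len) {
  dst[0] = '"';
  memcpy(dst + 1, src, (size_t)len);
  /* Buggy form: *(char *)(dst, len + 1) = '"';
   * The comma operator yields len + 1, which is then cast to a pointer. */
  *(char *)SHIFT_PTR(dst, len + 1) = '"';
  return len + 2;
}
```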
--- a/source/client/test/clientTests.cpp
+++ b/source/client/test/clientTests.cpp
@@ -567,7 +567,6 @@ TEST(testCase, insert_test) {
  taos_free_result(pRes);
  taos_close(pConn);
 }
-#endif

 TEST(testCase, projection_query_tables) {
  TAOS* pConn = taos_connect("localhost", "root", "taosdata", NULL, 0);
@@ -606,7 +605,7 @@ TEST(testCase, projection_query_tables) {
  }
  taos_free_result(pRes);

-  for(int32_t i = 0; i < 100000; i += 20) {
+  for(int32_t i = 0; i < 1000000; i += 20) {
    char sql[1024] = {0};
    sprintf(sql,
            "insert into tu values(now+%da, %d)(now+%da, %d)(now+%da, %d)(now+%da, %d)"
@@ -626,7 +625,7 @@ TEST(testCase, projection_query_tables) {

  printf("start to insert next table\n");

-  for(int32_t i = 0; i < 100000; i += 20) {
+  for(int32_t i = 0; i < 1000000; i += 20) {
    char sql[1024] = {0};
    sprintf(sql,
            "insert into tu2 values(now+%da, %d)(now+%da, %d)(now+%da, %d)(now+%da, %d)"
@@ -693,6 +692,8 @@ TEST(testCase, projection_query_stables) {
  taos_close(pConn);
 }

+#endif
+
 TEST(testCase, agg_query_tables) {
  TAOS* pConn = taos_connect("localhost", "root", "taosdata", NULL, 0);
  ASSERT_NE(pConn, nullptr);
@@ -705,7 +706,7 @@ TEST(testCase, agg_query_tables) {
  }
  taos_free_result(pRes);

-  pRes = taos_query(pConn, "select tbname from st1");
+  pRes = taos_query(pConn, "explain analyze select count(*) from tu interval(1s)");
  if (taos_errno(pRes) != 0) {
    printf("failed to select from table, reason:%s\n", taos_errstr(pRes));
    taos_free_result(pRes);

--- a/source/common/src/tdatablock.c
+++ b/source/common/src/tdatablock.c
@@ -1538,7 +1538,7 @@ int32_t buildSubmitReqFromDataBlock(SSubmitReq** pReq, const SArray* pDataBlocks
  int32_t     msgLen = sizeof(SSubmitReq);
  int32_t     numOfBlks = 0;
  SRowBuilder rb = {0};
-  tdSRowInit(&rb, 0);  // TODO: use the latest version
+  tdSRowInit(&rb, pTSchema->version);  // TODO: use the latest version

  for (int32_t i = 0; i < sz; ++i) {
    SSDataBlock* pDataBlock = taosArrayGet(pDataBlocks, i);

--- a/source/common/src/tmsg.c
+++ b/source/common/src/tmsg.c
@@ -3318,9 +3318,11 @@ int32_t tSerializeSExplainRsp(void *buf, int32_t bufLen, SExplainRsp *pRsp) {
  if (tEncodeI32(&encoder, pRsp->numOfPlans) < 0) return -1;
  for (int32_t i = 0; i < pRsp->numOfPlans; ++i) {
    SExplainExecInfo *info = &pRsp->subplanInfo[i];
-    if (tEncodeU64(&encoder, info->startupCost) < 0) return -1;
-    if (tEncodeU64(&encoder, info->totalCost) < 0) return -1;
+    if (tEncodeDouble(&encoder, info->startupCost) < 0) return -1;
+    if (tEncodeDouble(&encoder, info->totalCost) < 0) return -1;
    if (tEncodeU64(&encoder, info->numOfRows) < 0) return -1;
+    if (tEncodeU32(&encoder, info->verboseLen) < 0) return -1;
+    if (tEncodeBinary(&encoder, info->verboseInfo, info->verboseLen) < 0) return -1;
  }

  tEndEncode(&encoder);
@@ -3341,9 +3343,11 @@ int32_t tDeserializeSExplainRsp(void *buf, int32_t bufLen, SExplainRsp *pRsp) {
    if (pRsp->subplanInfo == NULL) return -1;
  }
  for (int32_t i = 0; i < pRsp->numOfPlans; ++i) {
-    if (tDecodeU64(&decoder, &pRsp->subplanInfo[i].startupCost) < 0) return -1;
-    if (tDecodeU64(&decoder, &pRsp->subplanInfo[i].totalCost) < 0) return -1;
+    if (tDecodeDouble(&decoder, &pRsp->subplanInfo[i].startupCost) < 0) return -1;
+    if (tDecodeDouble(&decoder, &pRsp->subplanInfo[i].totalCost) < 0) return -1;
    if (tDecodeU64(&decoder, &pRsp->subplanInfo[i].numOfRows) < 0) return -1;
+    if (tDecodeU32(&decoder, &pRsp->subplanInfo[i].verboseLen) < 0) return -1;
+    if (tDecodeBinary(&decoder, (uint8_t**) &pRsp->subplanInfo[i].verboseInfo, &pRsp->subplanInfo[i].verboseLen) < 0) return -1;
  }

  tEndDecode(&decoder);
@@ -3817,7 +3821,7 @@ int tDecodeSVCreateStbReq(SDecoder *pCoder, SVCreateStbReq *pReq) {

 STSchema *tdGetSTSChemaFromSSChema(SSchema **pSchema, int32_t nCols) {
  STSchemaBuilder schemaBuilder = {0};
-  if (tdInitTSchemaBuilder(&schemaBuilder, 0) < 0) {
+  if (tdInitTSchemaBuilder(&schemaBuilder, 1) < 0) {
    return NULL;
  }


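In tmsg.c, `startupCost` and `totalCost` switch from `U64` to `Double`, and `verboseLen`/`verboseInfo` are appended. The change is made in both `tSerializeSExplainRsp` and `tDeserializeSExplainRsp`. The two functions have to change together because the decoder reads fields purely by position, so a peer built before this change would presumably misparse the response. The sketch below shows a symmetric encode/decode pair with the same field order. It uses a hypothetical `ExecInfo` struct and raw `memcpy` in host byte order rather than TDengine's `tEncode*`/`tDecode*` helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the per-subplan fields in the explain response. */
typedef struct {
  double   startupCost;
  double   totalCost;
  uint64_t numOfRows;
  uint32_t verboseLen;
  char    *verboseInfo;
} ExecInfo;

typedef struct {
  uint8_t *buf;
  size_t   cap;
  size_t   pos;
} Cursor;

static int put(Cursor *c, const void *src, size_t n) {
  if (c->pos + n > c->cap) return -1;
  memcpy(c->buf + c->pos, src, n);
  c->pos += n;
  return 0;
}

static int get(Cursor *c, void *dst, size_t n) {
  if (c->pos + n > c->cap) return -1;
  memcpy(dst, c->buf + c->pos, n);
  c->pos += n;
  return 0;
}

/* Field order here must match decodeExecInfo exactly. */
static int encodeExecInfo(Cursor *c, const ExecInfo *info) {
  if (put(c, &info->startupCost, sizeof(double)) < 0) return -1;
  if (put(c, &info->totalCost, sizeof(double)) < 0) return -1;
  if (put(c, &info->numOfRows, sizeof(uint64_t)) < 0) return -1;
  if (put(c, &info->verboseLen, sizeof(uint32_t)) < 0) return -1;
  return put(c, info->verboseInfo, info->verboseLen);
}

static int decodeExecInfo(Cursor *c, ExecInfo *info) {
  if (get(c, &info->startupCost, sizeof(double)) < 0) return -1;
  if (get(c, &info->totalCost, sizeof(double)) < 0) return -1;
  if (get(c, &info->numOfRows, sizeof(uint64_t)) < 0) return -1;
  if (get(c, &info->verboseLen, sizeof(uint32_t)) < 0) return -1;
  if (c->pos + info->verboseLen > c->cap) return -1; /* blob would overrun */

  info->verboseInfo = malloc((size_t)info->verboseLen + 1); /* +1 for '\0' */
  if (info->verboseInfo == NULL) return -1;
  memcpy(info->verboseInfo, c->buf + c->pos, info->verboseLen);
  c->pos += info->verboseLen;
  info->verboseInfo[info->verboseLen] = '\0';
  return 0;
}
```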
--- a/source/common/src/trow.c
+++ b/source/common/src/trow.c
@@ -924,7 +924,7 @@ void tdSRowPrint(STSRow *row, STSchema *pSchema, const char *tag) {
  STSRowIter iter = {0};
  tdSTSRowIterInit(&iter, pSchema);
  tdSTSRowIterReset(&iter, row);
-  printf("%s >>>", tag);
+  printf("%s >>>type:%d,sver:%d ", tag, (int32_t)TD_ROW_TYPE(row), (int32_t)TD_ROW_SVER(row));
  for (int i = 0; i < pSchema->numOfCols; ++i) {
    STColumn *stCol = pSchema->columns + i;
    SCellVal  sVal = {255, NULL};

--- a/source/dnode/mgmt/mgmt_mnode/inc/mmInt.h
+++ b/source/dnode/mgmt/mgmt_mnode/inc/mmInt.h
@@ -24,19 +24,22 @@ extern "C" {
 #endif

 typedef struct SMnodeMgmt {
-  SDnodeData   *pData;
-  SMnode       *pMnode;
-  SMsgCb        msgCb;
-  const char   *path;
-  const char   *name;
-  SSingleWorker queryWorker;
-  SSingleWorker readWorker;
-  SSingleWorker writeWorker;
-  SSingleWorker syncWorker;
-  SSingleWorker monitorWorker;
-  SReplica      replicas[TSDB_MAX_REPLICA];
-  int8_t        replica;
-  int8_t        selfIndex;
+  SDnodeData    *pData;
+  SMnode        *pMnode;
+  SMsgCb         msgCb;
+  const char    *path;
+  const char    *name;
+  SSingleWorker  queryWorker;
+  SSingleWorker  readWorker;
+  SSingleWorker  writeWorker;
+  SSingleWorker  syncWorker;
+  SSingleWorker  monitorWorker;
+  SReplica       replicas[TSDB_MAX_REPLICA];
+  int8_t         replica;
+  int8_t         selfIndex;
+  bool           stopped;
+  int32_t        refCount;
+  TdThreadRwlock lock;
 } SMnodeMgmt;

 // mmFile.c
@@ -45,6 +48,8 @@ int32_t mmWriteFile(SMnodeMgmt *pMgmt, SDCreateMnodeReq *pMsg, bool deployed);

 // mmInt.c
 int32_t mmAlter(SMnodeMgmt *pMgmt, SDAlterMnodeReq *pMsg);
+int32_t mmAcquire(SMnodeMgmt *pMgmt);
+void    mmRelease(SMnodeMgmt *pMgmt);

 // mmHandle.c
 SArray *mmGetMsgHandles();

--- a/source/dnode/mgmt/mgmt_mnode/src/mmFile.c
+++ b/source/dnode/mgmt/mgmt_mnode/src/mmFile.c
@@ -53,43 +53,45 @@ int32_t mmReadFile(SMnodeMgmt *pMgmt, bool *pDeployed) {
  *pDeployed = deployed->valueint;

  cJSON *mnodes = cJSON_GetObjectItem(root, "mnodes");
-  if (!mnodes || mnodes->type != cJSON_Array) {
-    dError("failed to read %s since nodes not found", file);
-    goto _OVER;
-  }
-
-  pMgmt->replica = cJSON_GetArraySize(mnodes);
-  if (pMgmt->replica <= 0 || pMgmt->replica > TSDB_MAX_REPLICA) {
-    dError("failed to read %s since mnodes size %d invalid", file, pMgmt->replica);
-    goto _OVER;
-  }
-
-  for (int32_t i = 0; i < pMgmt->replica; ++i) {
-    cJSON *node = cJSON_GetArrayItem(mnodes, i);
-    if (node == NULL) break;
-
-    SReplica *pReplica = &pMgmt->replicas[i];
-
-    cJSON *id = cJSON_GetObjectItem(node, "id");
-    if (!id || id->type != cJSON_Number) {
-      dError("failed to read %s since id not found", file);
+  if (mnodes != NULL) {
+    if (mnodes->type != cJSON_Array) {
+      dError("failed to read %s since mnodes is not an array", file);
      goto _OVER;
    }
-    pReplica->id = id->valueint;

-    cJSON *fqdn = cJSON_GetObjectItem(node, "fqdn");
-    if (!fqdn || fqdn->type != cJSON_String || fqdn->valuestring == NULL) {
-      dError("failed to read %s since fqdn not found", file);
+    pMgmt->replica = cJSON_GetArraySize(mnodes);
+    if (pMgmt->replica <= 0 || pMgmt->replica > TSDB_MAX_REPLICA) {
+      dError("failed to read %s since mnodes size %d invalid", file, pMgmt->replica);
      goto _OVER;
    }
-    tstrncpy(pReplica->fqdn, fqdn->valuestring, TSDB_FQDN_LEN);

-    cJSON *port = cJSON_GetObjectItem(node, "port");
-    if (!port || port->type != cJSON_Number) {
-      dError("failed to read %s since port not found", file);
-      goto _OVER;
+    for (int32_t i = 0; i < pMgmt->replica; ++i) {
+      cJSON *node = cJSON_GetArrayItem(mnodes, i);
+      if (node == NULL) break;
+
+      SReplica *pReplica = &pMgmt->replicas[i];
+
+      cJSON *id = cJSON_GetObjectItem(node, "id");
+      if (!id || id->type != cJSON_Number) {
+        dError("failed to read %s since id not found", file);
+        goto _OVER;
+      }
+      pReplica->id = id->valueint;
+
+      cJSON *fqdn = cJSON_GetObjectItem(node, "fqdn");
+      if (!fqdn || fqdn->type != cJSON_String || fqdn->valuestring == NULL) {
+        dError("failed to read %s since fqdn not found", file);
+        goto _OVER;
+      }
+      tstrncpy(pReplica->fqdn, fqdn->valuestring, TSDB_FQDN_LEN);
+
+      cJSON *port = cJSON_GetObjectItem(node, "port");
+      if (!port || port->type != cJSON_Number) {
+        dError("failed to read %s since port not found", file);
+        goto _OVER;
+      }
+      pReplica->port = port->valueint;
    }
-    pReplica->port = port->valueint;
  }

  code = 0;
@@ -122,21 +124,23 @@ int32_t mmWriteFile(SMnodeMgmt *pMgmt, SDCreateMnodeReq *pMsg, bool deployed) {
  char   *content = taosMemoryCalloc(1, maxLen + 1);

  len += snprintf(content + len, maxLen - len, "{\n");
-  len += snprintf(content + len, maxLen - len, "  \"mnodes\": [{\n");

  int8_t replica = (pMsg != NULL ? pMsg->replica : pMgmt->replica);
-  for (int32_t i = 0; i < replica; ++i) {
-    SReplica *pReplica = &pMgmt->replicas[i];
-    if (pMsg != NULL) {
-      pReplica = &pMsg->replicas[i];
-    }
-    len += snprintf(content + len, maxLen - len, "    \"id\": %d,\n", pReplica->id);
-    len += snprintf(content + len, maxLen - len, "    \"fqdn\": \"%s\",\n", pReplica->fqdn);
-    len += snprintf(content + len, maxLen - len, "    \"port\": %u\n", pReplica->port);
-    if (i < replica - 1) {
-      len += snprintf(content + len, maxLen - len, "  },{\n");
-    } else {
-      len += snprintf(content + len, maxLen - len, "  }],\n");
+  if (replica > 0) {
+    len += snprintf(content + len, maxLen - len, "  \"mnodes\": [{\n");
+    for (int32_t i = 0; i < replica; ++i) {
+      SReplica *pReplica = &pMgmt->replicas[i];
+      if (pMsg != NULL) {
+        pReplica = &pMsg->replicas[i];
+      }
+      len += snprintf(content + len, maxLen - len, "    \"id\": %d,\n", pReplica->id);
+      len += snprintf(content + len, maxLen - len, "    \"fqdn\": \"%s\",\n", pReplica->fqdn);
+      len += snprintf(content + len, maxLen - len, "    \"port\": %u\n", pReplica->port);
+      if (i < replica - 1) {
+        len += snprintf(content + len, maxLen - len, "  },{\n");
+      } else {
+        len += snprintf(content + len, maxLen - len, "  }],\n");
+      }
    }
  }


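The new `mmWriteFile` emits the `"mnodes"` array only when `replica > 0`. It builds the file with the `len += snprintf(content + len, maxLen - len, ...)` idiom, which is safe only while the output fits. If `snprintf` truncates, it returns the untruncated length, so `len` can exceed `maxLen`. The next `maxLen - len` is then a negative value that converts to a huge `size_t`. The sketch below keeps the conditional array but clamps the offset. `Replica`, `appendf`, and `writeMnodeJson` are hypothetical. The trailing `"deployed"` field is inferred from what `mmReadFile` reads, not copied from the hunk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
  int32_t  id;
  char     fqdn[64];
  uint16_t port;
} Replica;

/* Append formatted text at content + *len. On truncation, *len is clamped to
 * maxLen - 1 so it always equals strlen(content) and never passes the buffer. */
static void appendf(char *content, int32_t maxLen, int32_t *len, const char *fmt, ...) {
  if (maxLen <= 0 || *len >= maxLen - 1) return; /* no room left */
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(content + *len, (size_t)(maxLen - *len), fmt, ap);
  va_end(ap);
  if (n < 0) return;
  *len += n;
  if (*len > maxLen - 1) *len = maxLen - 1;
}

/* Write the mnode file body, omitting "mnodes" entirely when replica == 0. */
static int32_t writeMnodeJson(char *content, int32_t maxLen, const Replica *replicas,
                              int8_t replica, int deployed) {
  int32_t len = 0;
  appendf(content, maxLen, &len, "{\n");
  if (replica > 0) {
    appendf(content, maxLen, &len, "  \"mnodes\": [{\n");
    for (int32_t i = 0; i < replica; ++i) {
      appendf(content, maxLen, &len, "    \"id\": %d,\n", replicas[i].id);
      appendf(content, maxLen, &len, "    \"fqdn\": \"%s\",\n", replicas[i].fqdn);
      appendf(content, maxLen, &len, "    \"port\": %u\n", (unsigned)replicas[i].port);
      appendf(content, maxLen, &len, i < replica - 1 ? "  },{\n" : "  }],\n");
    }
  }
  appendf(content, maxLen, &len, "  \"deployed\": %d\n}\n", deployed);
  return len;
}
```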
--- a/source/dnode/mgmt/mgmt_mnode/src/mmHandle.c
+++ b/source/dnode/mgmt/mgmt_mnode/src/mmHandle.c
@@ -237,6 +237,16 @@ SArray *mmGetMsgHandles() {
  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_VNODE_RSP, mmPutNodeMsgToWriteQueue, 0) == NULL) goto _OVER;
  if (dmSetMgmtHandle(pArray, TDMT_VND_COMPACT_VNODE_RSP, mmPutNodeMsgToWriteQueue, 0) == NULL) goto _OVER;

+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_TIMEOUT, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_PING, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_PING_REPLY, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_CLIENT_REQUEST, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_CLIENT_REQUEST_REPLY, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_REQUEST_VOTE, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_REQUEST_VOTE_REPLY, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_APPEND_ENTRIES, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+  if (dmSetMgmtHandle(pArray, TDMT_VND_SYNC_APPEND_ENTRIES_REPLY, mmPutNodeMsgToSyncQueue, 1) == NULL) goto _OVER;
+
  code = 0;

 _OVER:

--- a/source/dnode/mgmt/mgmt_mnode/src/mmInt.c
+++ b/source/dnode/mgmt/mgmt_mnode/src/mmInt.c
@@ -39,32 +39,44 @@ static int32_t mmRequire(const SMgmtInputOpt *pInput, bool *required) {
 }

 static void mmBuildOptionForDeploy(SMnodeMgmt *pMgmt, const SMgmtInputOpt *pInput, SMnodeOpt *pOption) {
+  pOption->standby = false;
+  pOption->deploy = true;
  pOption->msgCb = pMgmt->msgCb;
  pOption->replica = 1;
  pOption->selfIndex = 0;
+
  SReplica *pReplica = &pOption->replicas[0];
  pReplica->id = 1;
  pReplica->port = tsServerPort;
  tstrncpy(pReplica->fqdn, tsLocalFqdn, TSDB_FQDN_LEN);
-  pOption->deploy = true;
-
-  pMgmt->selfIndex = pOption->selfIndex;
-  pMgmt->replica = pOption->replica;
-  memcpy(&pMgmt->replicas, pOption->replicas, sizeof(SReplica) * TSDB_MAX_REPLICA);
 }

 static void mmBuildOptionForOpen(SMnodeMgmt *pMgmt, SMnodeOpt *pOption) {
  pOption->msgCb = pMgmt->msgCb;
-  pOption->selfIndex = pMgmt->selfIndex;
-  pOption->replica = pMgmt->replica;
-  memcpy(&pOption->replicas, pMgmt->replicas, sizeof(SReplica) * TSDB_MAX_REPLICA);
  pOption->deploy = false;
+  pOption->standby = false;
+
+  if (pMgmt->replica > 0) {
+    pOption->standby = true;
+    pOption->replica = 1;
+    pOption->selfIndex = 0;
+    SReplica *pReplica = &pOption->replicas[0];
+    for (int32_t i = 0; i < pMgmt->replica; ++i) {
+      if (pMgmt->replicas[i].id != pMgmt->pData->dnodeId) continue;
+      pReplica->id = pMgmt->replicas[i].id;
+      pReplica->port = pMgmt->replicas[i].port;
+      memcpy(pReplica->fqdn, pMgmt->replicas[i].fqdn, TSDB_FQDN_LEN);
+    }
+  }
 }

-static int32_t mmBuildOptionFromReq(SMnodeMgmt *pMgmt, SMnodeOpt *pOption, SDCreateMnodeReq *pCreate) {
+static int32_t mmBuildOptionForAlter(SMnodeMgmt *pMgmt, SMnodeOpt *pOption, SDCreateMnodeReq *pCreate) {
  pOption->msgCb = pMgmt->msgCb;
+  pOption->standby = false;
+  pOption->deploy = false;
  pOption->replica = pCreate->replica;
  pOption->selfIndex = -1;
+
  for (int32_t i = 0; i < pCreate->replica; ++i) {
    SReplica *pReplica = &pOption->replicas[i];
    pReplica->id = pCreate->replicas[i].id;
@@ -79,17 +91,13 @@ static int32_t mmBuildOptionFromReq(SMnodeMgmt *pMgmt, SMnodeOpt *pOption, SDCre
    dError("failed to build mnode options since %s", terrstr());
    return -1;
  }
-  pOption->deploy = true;

-  pMgmt->selfIndex = pOption->selfIndex;
-  pMgmt->replica = pOption->replica;
-  memcpy(&pMgmt->replicas, pOption->replicas, sizeof(SReplica) * TSDB_MAX_REPLICA);
  return 0;
 }

 int32_t mmAlter(SMnodeMgmt *pMgmt, SDAlterMnodeReq *pMsg) {
  SMnodeOpt option = {0};
-  if (mmBuildOptionFromReq(pMgmt, &option, pMsg) != 0) {
+  if (mmBuildOptionForAlter(pMgmt, &option, pMsg) != 0) {
    return -1;
  }

@@ -97,12 +105,6 @@ int32_t mmAlter(SMnodeMgmt *pMgmt, SDAlterMnodeReq *pMsg) {
    return -1;
  }

-  bool deployed = true;
-  if (mmWriteFile(pMgmt, pMsg, deployed) != 0) {
-    dError("failed to write mnode file since %s", terrstr());
-    return -1;
-  }
-
  return 0;
 }

@@ -110,6 +112,7 @@ static void mmClose(SMnodeMgmt *pMgmt) {
  if (pMgmt->pMnode != NULL) {
    mmStopWorker(pMgmt);
    mndClose(pMgmt->pMnode);
+    taosThreadRwlockDestroy(&pMgmt->lock);
    pMgmt->pMnode = NULL;
  }

@@ -122,6 +125,11 @@ static int32_t mmOpen(SMgmtInputOpt *pInput, SMgmtOutputOpt *pOutput) {
    return -1;
  }

+  if (syncInit() != 0) {
+    dError("failed to init sync since %s", terrstr());
+    return -1;
+  }
+
  SMnodeMgmt *pMgmt = taosMemoryCalloc(1, sizeof(SMnodeMgmt));
  if (pMgmt == NULL) {
    terrno = TSDB_CODE_OUT_OF_MEMORY;
@@ -137,6 +145,7 @@ static int32_t mmOpen(SMgmtInputOpt *pInput, SMgmtOutputOpt *pOutput) {
  pMgmt->msgCb.queueFps[WRITE_QUEUE] = (PutToQueueFp)mmPutRpcMsgToWriteQueue;
  pMgmt->msgCb.queueFps[SYNC_QUEUE] = (PutToQueueFp)mmPutRpcMsgToSyncQueue;
  pMgmt->msgCb.mgmt = pMgmt;
+  taosThreadRwlockInit(&pMgmt->lock, NULL);

  bool deployed = false;
  if (mmReadFile(pMgmt, &deployed) != 0) {
@@ -170,7 +179,8 @@ static int32_t mmOpen(SMgmtInputOpt *pInput, SMgmtOutputOpt *pOutput) {
  }
  tmsgReportStartup("mnode-worker", "initialized");

-  if (!deployed) {
+  if (!deployed || pMgmt->replica > 0) {
+    pMgmt->replica = 0;
    deployed = true;
    if (mmWriteFile(pMgmt, NULL, deployed) != 0) {
      dError("failed to write mnode file since %s", terrstr());
@@ -206,3 +216,22 @@ SMgmtFunc mmGetMgmtFunc() {

  return mgmtFunc;
 }
+
+int32_t mmAcquire(SMnodeMgmt *pMgmt) {
+  int32_t code = 0;
+
+  taosThreadRwlockRdlock(&pMgmt->lock);
+  if (pMgmt->stopped) {
+    code = -1;
+  } else {
+    atomic_add_fetch_32(&pMgmt->refCount, 1);
+  }
+  taosThreadRwlockUnlock(&pMgmt->lock);
+  return code;
+}
+
+void mmRelease(SMnodeMgmt *pMgmt) {
+  taosThreadRwlockRdlock(&pMgmt->lock);
+  atomic_sub_fetch_32(&pMgmt->refCount, 1);
+  taosThreadRwlockUnlock(&pMgmt->lock);
+}
\ No newline at end of file
--- a/source/dnode/mgmt/mgmt_mnode/src/mmWorker.c
+++ b/source/dnode/mgmt/mgmt_mnode/src/mmWorker.c
@@ -56,6 +56,23 @@ static void mmProcessQueue(SQueueInfo *pInfo, SRpcMsg *pMsg) {
  taosFreeQitem(pMsg);
 }

+static void mmProcessSyncQueue(SQueueInfo *pInfo, SRpcMsg *pMsg) {
+  SMnodeMgmt *pMgmt = pInfo->ahandle;
+  dTrace("msg:%p, get from mnode-sync queue", pMsg);
+
+  pMsg->info.node = pMgmt->pMnode;
+
+  SMsgHead *pHead = pMsg->pCont;
+  pHead->contLen = ntohl(pHead->contLen);
+  pHead->vgId = ntohl(pHead->vgId);
+
+  int32_t code = mndProcessSyncMsg(pMsg);
+
+  dTrace("msg:%p, is freed, code:0x%x", pMsg, code);
+  rpcFreeCont(pMsg->pCont);
+  taosFreeQitem(pMsg);
+}
+
 static int32_t mmPutNodeMsgToWorker(SSingleWorker *pWorker, SRpcMsg *pMsg) {
  dTrace("msg:%p, put into worker %s, type:%s", pMsg, pWorker->name, TMSG_INFO(pMsg->msgType));
  taosWriteQitem(pWorker->queue, pMsg);
@@ -105,7 +122,17 @@ int32_t mmPutRpcMsgToReadQueue(SMnodeMgmt *pMgmt, SRpcMsg *pMsg) {
 }

 int32_t mmPutRpcMsgToSyncQueue(SMnodeMgmt *pMgmt, SRpcMsg *pMsg) {
-  return mmPutRpcMsgToWorker(&pMgmt->syncWorker, pMsg);
+  int32_t code = -1;
+  if (mmAcquire(pMgmt) == 0) {
+    code = mmPutRpcMsgToWorker(&pMgmt->syncWorker, pMsg);
+    mmRelease(pMgmt);
+  }
+
+  if (code != 0) {
+    rpcFreeCont(pMsg->pCont);
+    pMsg->pCont = NULL;
+  }
+  return code;
 }

 int32_t mmStartWorker(SMnodeMgmt *pMgmt) {
@@ -149,7 +176,7 @@ int32_t mmStartWorker(SMnodeMgmt *pMgmt) {
      .min = 1,
      .max = 1,
      .name = "mnode-sync",
-      .fp = (FItem)mmProcessQueue,
+      .fp = (FItem)mmProcessSyncQueue,
      .param = pMgmt,
  };
  if (tSingleWorkerInit(&pMgmt->syncWorker, &sCfg) != 0) {
@@ -174,6 +201,11 @@ int32_t mmStartWorker(SMnodeMgmt *pMgmt) {
 }

 void mmStopWorker(SMnodeMgmt *pMgmt) {
+  taosThreadRwlockWrlock(&pMgmt->lock);
+  pMgmt->stopped = 1;
+  taosThreadRwlockUnlock(&pMgmt->lock);
+  while (atomic_load_32(&pMgmt->refCount) > 0) taosMsleep(10);
+
  tSingleWorkerCleanup(&pMgmt->monitorWorker);
  tSingleWorkerCleanup(&pMgmt->queryWorker);
  tSingleWorkerCleanup(&pMgmt->readWorker);

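`mmAcquire`, `mmRelease`, and `mmStopWorker` together form a shutdown guard. A sender takes a reference under the read lock, but only while `stopped` is still false. The stopper sets `stopped` under the write lock, then waits for the count to drain before it cleans up the workers. The flag check and the increment happen under the same read lock. That prevents a sender from slipping in after the stopper has already observed a zero count. The sketch below restates the pattern with POSIX rwlocks and C11 atomics. The `Guard` type and its functions are hypothetical, and the wait loop reads the counter with `atomic_load`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the stopped/refCount/lock fields of SMnodeMgmt. */
typedef struct {
  pthread_rwlock_t lock;
  bool             stopped;  /* only written under the write lock */
  atomic_int       refCount; /* senders currently using the workers */
} Guard;

static void guardInit(Guard *g) {
  pthread_rwlock_init(&g->lock, NULL);
  g->stopped = false;
  atomic_init(&g->refCount, 0);
}

/* Returns 0 and takes a reference, or -1 once shutdown has begun. */
static int guardAcquire(Guard *g) {
  int code = 0;
  pthread_rwlock_rdlock(&g->lock);
  if (g->stopped) {
    code = -1;
  } else {
    atomic_fetch_add(&g->refCount, 1);
  }
  pthread_rwlock_unlock(&g->lock);
  return code;
}

static void guardRelease(Guard *g) { atomic_fetch_sub(&g->refCount, 1); }

/* Refuse new acquirers, then poll until in-flight ones have released. */
static void guardStop(Guard *g) {
  pthread_rwlock_wrlock(&g->lock);
  g->stopped = true;
  pthread_rwlock_unlock(&g->lock);

  struct timespec ts = {0, 10 * 1000 * 1000}; /* 10 ms, like taosMsleep(10) */
  while (atomic_load(&g->refCount) > 0) nanosleep(&ts, NULL);
}
```

Releasing does not need the lock, because an atomic decrement cannot re-admit a sender. The source's `mmRelease` takes the read lock anyway, which is harmless.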
--- a/source/dnode/mgmt/mgmt_vnode/src/vmHandle.c
+++ b/source/dnode/mgmt/mgmt_vnode/src/vmHandle.c
@@ -138,7 +138,7 @@ static void vmGenerateVnodeCfg(SCreateVnodeReq *pCreate, SVnodeCfg *pCfg) {
  pCfg->dbId = pCreate->dbUid;
  pCfg->szPage = pCreate->pageSize * 1024;
  pCfg->szCache = pCreate->pages;
-  pCfg->szBuf = pCreate->buffer * 1024 * 1024;
+  pCfg->szBuf = (uint64_t)pCreate->buffer * 1024 * 1024;
  pCfg->isWeak = true;
  pCfg->tsdbCfg.compression = pCreate->compression;
  pCfg->tsdbCfg.precision = pCreate->precision;

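The vmHandle.c change casts `pCreate->buffer` to `uint64_t` before multiplying. Assume `buffer` is a 32-bit `int`; its declaration is not shown here. Without the cast, `buffer * 1024 * 1024` is evaluated in 32-bit arithmetic and overflows for any value of 2048 (MB) or more, before the result is ever widened for `szBuf`. A minimal sketch with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a size in MB to bytes. The cast comes first, so both
 * multiplications happen in 64-bit unsigned arithmetic. Assumes bufferMb
 * is non-negative; a negative value would wrap to a huge uint64_t. */
static uint64_t bufferMbToBytes(int32_t bufferMb) {
  return (uint64_t)bufferMb * 1024 * 1024;
}
```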
--- a/source/dnode/mgmt/test/mnode/CMakeLists.txt
+++ b/source/dnode/mgmt/test/mnode/CMakeLists.txt
@@ -4,7 +4,7 @@ target_link_libraries(
    dmnodeTest sut
 )

-add_test(
-    NAME dmnodeTest
-    COMMAND dmnodeTest
-)
+#add_test(
+#    NAME dmnodeTest
+#    COMMAND dmnodeTest
+#)
--- a/source/dnode/mnode/impl/inc/mndInt.h
+++ b/source/dnode/mnode/impl/inc/mndInt.h
@@ -19,6 +19,7 @@
 #include "mndDef.h"

 #include "sdb.h"
+#include "syncTools.h"
 #include "tcache.h"
 #include "tdatablock.h"
 #include "tglobal.h"
@@ -31,12 +32,14 @@
 extern "C" {
 #endif

+// clang-format off
 #define mFatal(...) { if (mDebugFlag & DEBUG_FATAL) { taosPrintLog("MND FATAL ", DEBUG_FATAL, 255, __VA_ARGS__); }}
 #define mError(...) { if (mDebugFlag & DEBUG_ERROR) { taosPrintLog("MND ERROR ", DEBUG_ERROR, 255, __VA_ARGS__); }}
 #define mWarn(...)  { if (mDebugFlag & DEBUG_WARN)  { taosPrintLog("MND WARN ", DEBUG_WARN, 255, __VA_ARGS__); }}
 #define mInfo(...)  { if (mDebugFlag & DEBUG_INFO)  { taosPrintLog("MND ", DEBUG_INFO, 255, __VA_ARGS__); }}
 #define mDebug(...) { if (mDebugFlag & DEBUG_DEBUG) { taosPrintLog("MND ", DEBUG_DEBUG, mDebugFlag, __VA_ARGS__); }}
 #define mTrace(...) { if (mDebugFlag & DEBUG_TRACE) { taosPrintLog("MND ", DEBUG_TRACE, mDebugFlag, __VA_ARGS__); }}
+// clang-format on

 #define SYSTABLE_SCH_TABLE_NAME_LEN ((TSDB_TABLE_NAME_LEN - 1) + VARSTR_HEADER_SIZE)
 #define SYSTABLE_SCH_DB_NAME_LEN    ((TSDB_DB_NAME_LEN - 1) + VARSTR_HEADER_SIZE)
@@ -72,11 +75,13 @@ typedef struct {
 } STelemMgmt;

 typedef struct {
-  int32_t    errCode;
-  sem_t      syncSem;
  SWal      *pWal;
-  SSyncNode *pSyncNode;
+  sem_t      syncSem;
+  int64_t    sync;
  ESyncState state;
+  bool       standby;
+  bool       restored;
+  int32_t    errCode;
 } SSyncMgmt;

 typedef struct {

--- a/source/dnode/mnode/impl/inc/mndSync.h
+++ b/source/dnode/mnode/impl/inc/mndSync.h
@@ -26,6 +26,8 @@ int32_t mndInitSync(SMnode *pMnode);
 void    mndCleanupSync(SMnode *pMnode);
 bool    mndIsMaster(SMnode *pMnode);
 int32_t mndSyncPropose(SMnode *pMnode, SSdbRaw *pRaw);
+void    mndSyncStart(SMnode *pMnode);
+void    mndSyncStop(SMnode *pMnode);

 #ifdef __cplusplus
 }

--- a/source/dnode/mnode/impl/inc/mndTopic.h
+++ b/source/dnode/mnode/impl/inc/mndTopic.h
@@ -35,7 +35,7 @@ int32_t mndDropTopicByDB(SMnode *pMnode, STrans *pTrans, SDbObj *pDb);

 const char *mndTopicGetShowName(const char topic[TSDB_TOPIC_FNAME_LEN]);

-int32_t mndSetTopicRedoLogs(SMnode *pMnode, STrans *pTrans, SMqTopicObj *pTopic);
+int32_t mndSetTopicCommitLogs(SMnode *pMnode, STrans *pTrans, SMqTopicObj *pTopic);

 #ifdef __cplusplus
 }

--- a/source/dnode/mnode/impl/src/mndConsumer.c
+++ b/source/dnode/mnode/impl/src/mndConsumer.c
@@ -419,7 +419,9 @@ static int32_t mndProcessSubscribeReq(SRpcMsg *pMsg) {
    SMqTopicObj topicObj = {0};
    memcpy(&topicObj, pTopic, sizeof(SMqTopicObj));
    topicObj.refConsumerCnt = pTopic->refConsumerCnt + 1;
-    if (mndSetTopicRedoLogs(pMnode, pTrans, &topicObj) != 0) goto SUBSCRIBE_OVER;
+    mInfo("subscribe topic %s by consumer %" PRId64 " cgroup %s, refcnt %d", pTopic->name, consumerId, cgroup,
+          topicObj.refConsumerCnt);
+    if (mndSetTopicCommitLogs(pMnode, pTrans, &topicObj) != 0) goto SUBSCRIBE_OVER;

    mndReleaseTopic(pMnode, pTopic);
  }

--- a/source/dnode/mnode/impl/src/mndDnode.c
+++ b/source/dnode/mnode/impl/src/mndDnode.c
@@ -448,13 +448,13 @@ static int32_t mndCreateDnode(SMnode *pMnode, SRpcMsg *pReq, SCreateDnodeReq *pC
  }
  mDebug("trans:%d, used to create dnode:%s", pTrans->id, dnodeObj.ep);

-  SSdbRaw *pRedoRaw = mndDnodeActionEncode(&dnodeObj);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndDnodeActionEncode(&dnodeObj);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -524,13 +524,13 @@ static int32_t mndDropDnode(SMnode *pMnode, SRpcMsg *pReq, SDnodeObj *pDnode) {
  }
  mDebug("trans:%d, used to drop dnode:%d", pTrans->id, pDnode->id);

-  SSdbRaw *pRedoRaw = mndDnodeActionEncode(pDnode);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndDnodeActionEncode(pDnode);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_DROPPED);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_DROPPED);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());

--- a/source/dnode/mnode/impl/src/mndMnode.c
+++ b/source/dnode/mnode/impl/src/mndMnode.c
@@ -39,14 +39,16 @@ static int32_t  mndRetrieveMnodes(SRpcMsg *pReq, SShowObj *pShow, SSDataBlock *p
 static void     mndCancelGetNextMnode(SMnode *pMnode, void *pIter);

 int32_t mndInitMnode(SMnode *pMnode) {
-  SSdbTable table = {.sdbType = SDB_MNODE,
-                     .keyType = SDB_KEY_INT32,
-                     .deployFp = (SdbDeployFp)mndCreateDefaultMnode,
-                     .encodeFp = (SdbEncodeFp)mndMnodeActionEncode,
-                     .decodeFp = (SdbDecodeFp)mndMnodeActionDecode,
-                     .insertFp = (SdbInsertFp)mndMnodeActionInsert,
-                     .updateFp = (SdbUpdateFp)mndMnodeActionUpdate,
-                     .deleteFp = (SdbDeleteFp)mndMnodeActionDelete};
+  SSdbTable table = {
+      .sdbType = SDB_MNODE,
+      .keyType = SDB_KEY_INT32,
+      .deployFp = (SdbDeployFp)mndCreateDefaultMnode,
+      .encodeFp = (SdbEncodeFp)mndMnodeActionEncode,
+      .decodeFp = (SdbDecodeFp)mndMnodeActionDecode,
+      .insertFp = (SdbInsertFp)mndMnodeActionInsert,
+      .updateFp = (SdbUpdateFp)mndMnodeActionUpdate,
+      .deleteFp = (SdbDeleteFp)mndMnodeActionDelete,
+  };

  mndSetMsgHandle(pMnode, TDMT_MND_CREATE_MNODE, mndProcessCreateMnodeReq);
  mndSetMsgHandle(pMnode, TDMT_MND_DROP_MNODE, mndProcessDropMnodeReq);

--- a/source/dnode/mnode/impl/src/mndOffset.c
+++ b/source/dnode/mnode/impl/src/mndOffset.c
@@ -153,6 +153,7 @@ int32_t mndCreateOffsets(STrans *pTrans, const char *cgroup, const char *topicNa
      return -1;
    }
    sdbSetRawStatus(pOffsetRaw, SDB_STATUS_READY);
+    // commit log or redo log?
    if (mndTransAppendRedolog(pTrans, pOffsetRaw) < 0) {
      return -1;
    }
@@ -188,7 +189,7 @@ static int32_t mndProcessCommitOffsetReq(SRpcMsg *pMsg) {
    pOffsetObj->offset = pOffset->offset;
    SSdbRaw *pOffsetRaw = mndOffsetActionEncode(pOffsetObj);
    sdbSetRawStatus(pOffsetRaw, SDB_STATUS_READY);
-    mndTransAppendRedolog(pTrans, pOffsetRaw);
+    mndTransAppendCommitlog(pTrans, pOffsetRaw);
    if (create) {
      taosMemoryFree(pOffsetObj);
    } else {

--- a/source/dnode/mnode/impl/src/mndStb.c
+++ b/source/dnode/mnode/impl/src/mndStb.c
@@ -743,9 +743,7 @@ static int32_t mndCreateStb(SMnode *pMnode, SRpcMsg *pReq, SMCreateStbReq *pCrea

  mDebug("trans:%d, used to create stb:%s", pTrans->id, pCreate->name);

-  if (mndBuildStbFromReq(pMnode, &stbObj, pCreate, pDb) != 0) {
-    goto _OVER;
-  }
+  if (mndBuildStbFromReq(pMnode, &stbObj, pCreate, pDb) != 0) goto _OVER;

  if (mndAddStbToTrans(pMnode, pTrans, pDb, &stbObj) < 0) goto _OVER;


--- a/source/dnode/mnode/impl/src/mndStream.c
+++ b/source/dnode/mnode/impl/src/mndStream.c
@@ -279,13 +279,13 @@ int32_t mndAddStreamToTrans(SMnode *pMnode, SStreamObj *pStream, const char *ast
  }
  mDebug("trans:%d, used to create stream:%s", pTrans->id, pStream->name);

-  SSdbRaw *pRedoRaw = mndStreamActionEncode(pStream);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndStreamActionEncode(pStream);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY);

  return 0;
 }

--- a/source/dnode/mnode/impl/src/mndSubscribe.c
+++ b/source/dnode/mnode/impl/src/mndSubscribe.c
@@ -417,7 +417,7 @@ static int32_t mndPersistRebResult(SMnode *pMnode, SRpcMsg *pMsg, const SMqRebOu

  // 2. redo log: subscribe and vg assignment
  // subscribe
-  if (mndSetSubRedoLogs(pMnode, pTrans, pOutput->pSub) != 0) {
+  if (mndSetSubCommitLogs(pMnode, pTrans, pOutput->pSub) != 0) {
    goto REB_FAIL;
  }

@@ -479,7 +479,11 @@ static int32_t mndPersistRebResult(SMnode *pMnode, SRpcMsg *pMsg, const SMqRebOu
      SMqTopicObj topicObj = {0};
      memcpy(&topicObj, pTopic, sizeof(SMqTopicObj));
      topicObj.refConsumerCnt = pTopic->refConsumerCnt - consumerNum;
-      if (mndSetTopicRedoLogs(pMnode, pTrans, &topicObj) != 0) goto REB_FAIL;
+      // TODO is that correct?
+      pTopic->refConsumerCnt = topicObj.refConsumerCnt;
+      mInfo("subscribe topic %s unref %d consumer cgroup %s, refcnt %d", pTopic->name, consumerNum, cgroup,
+            topicObj.refConsumerCnt);
+      if (mndSetTopicCommitLogs(pMnode, pTrans, &topicObj) != 0) goto REB_FAIL;
    }
  }


--- a/source/dnode/mnode/impl/src/mndSync.c
+++ b/source/dnode/mnode/impl/src/mndSync.c
@@ -17,178 +17,199 @@
 #include "mndSync.h"
 #include "mndTrans.h"

-static int32_t mndInitWal(SMnode *pMnode) {
-  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
+int32_t mndSyncEqMsg(const SMsgCb *msgcb, SRpcMsg *pMsg) {
+  SMsgHead *pHead = pMsg->pCont;
+  pHead->contLen = htonl(pHead->contLen);
+  pHead->vgId = htonl(pHead->vgId);

-  char path[PATH_MAX] = {0};
-  snprintf(path, sizeof(path), "%s%swal", pMnode->path, TD_DIRSEP);
-  SWalCfg cfg = {
-      .vgId = 1,
-      .fsyncPeriod = 0,
-      .rollPeriod = -1,
-      .segSize = -1,
-      .retentionPeriod = -1,
-      .retentionSize = -1,
-      .level = TAOS_WAL_FSYNC,
-  };
-  pMgmt->pWal = walOpen(path, &cfg);
-  if (pMgmt->pWal == NULL) return -1;
+  return tmsgPutToQueue(msgcb, SYNC_QUEUE, pMsg);
+}
+
+int32_t mndSyncSendMsg(const SEpSet *pEpSet, SRpcMsg *pMsg) { return tmsgSendReq(pEpSet, pMsg); }
+
+void mndSyncCommitMsg(struct SSyncFSM *pFsm, const SRpcMsg *pMsg, SFsmCbMeta cbMeta) {
+  SMnode  *pMnode = pFsm->data;
+  SSdbRaw *pRaw = pMsg->pCont;

+  mTrace("raw:%p, apply to sdb, ver:%" PRId64 " role:%s", pRaw, cbMeta.index, syncStr(cbMeta.state));
+  sdbWriteWithoutFree(pMnode->pSdb, pRaw);
+  sdbSetApplyIndex(pMnode->pSdb, cbMeta.index);
+  sdbSetApplyTerm(pMnode->pSdb, cbMeta.term);
+  if (cbMeta.state == TAOS_SYNC_STATE_LEADER) {
+    tsem_post(&pMnode->syncMgmt.syncSem);
+  }
+}
+
+int32_t mndSyncGetSnapshot(struct SSyncFSM *pFsm, SSnapshot *pSnapshot) {
+  SMnode *pMnode = pFsm->data;
+  pSnapshot->lastApplyIndex = sdbGetApplyIndex(pMnode->pSdb);
+  pSnapshot->lastApplyTerm = sdbGetApplyTerm(pMnode->pSdb);
  return 0;
 }

-static void mndCloseWal(SMnode *pMnode) {
-  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
-  if (pMgmt->pWal != NULL) {
-    walClose(pMgmt->pWal);
-    pMgmt->pWal = NULL;
-  }
+void mndRestoreFinish(struct SSyncFSM *pFsm) {
+  SMnode *pMnode = pFsm->data;
+  mndTransPullup(pMnode);
+  pMnode->syncMgmt.restored = true;
 }

-static int32_t mndRestoreWal(SMnode *pMnode) {
-  SWal   *pWal = pMnode->syncMgmt.pWal;
-  SSdb   *pSdb = pMnode->pSdb;
-  int64_t lastSdbVer = sdbUpdateVer(pSdb, 0);
-  int32_t code = -1;
-
-  SWalReadHandle *pHandle = walOpenReadHandle(pWal);
-  if (pHandle == NULL) return -1;
-
-  int64_t first = walGetFirstVer(pWal);
-  int64_t last = walGetLastVer(pWal);
-  mDebug("start to restore wal, sdbver:%" PRId64 ", first:%" PRId64 " last:%" PRId64, lastSdbVer, first, last);
-
-  first = TMAX(lastSdbVer + 1, first);
-  for (int64_t ver = first; ver >= 0 && ver <= last; ++ver) {
-    if (walReadWithHandle(pHandle, ver) < 0) {
-      mError("ver:%" PRId64 ", failed to read from wal since %s", ver, terrstr());
-      goto _OVER;
-    }
-
-    SWalHead *pHead = pHandle->pHead;
-    int64_t   sdbVer = sdbUpdateVer(pSdb, 0);
-    if (sdbVer + 1 != ver) {
-      terrno = TSDB_CODE_SDB_INVALID_WAl_VER;
-      mError("ver:%" PRId64 ", failed to write to sdb, since inconsistent with sdbver:%" PRId64, ver, sdbVer);
-      goto _OVER;
-    }
-
-    mTrace("ver:%" PRId64 ", will be restored, content:%p", ver, pHead->head.body);
-    if (sdbWriteWithoutFree(pSdb, (void *)pHead->head.body) < 0) {
-      mError("ver:%" PRId64 ", failed to write to sdb since %s", ver, terrstr());
-      goto _OVER;
-    }
-
-    sdbUpdateVer(pSdb, 1);
-    mDebug("ver:%" PRId64 ", is restored", ver);
+void *mndSnapshotRead(struct SSyncFSM *pFsm, const SSnapshot *snapshot, void *iter, char **ppBuf, int32_t *len) {
+  SMnode   *pMnode = pFsm->data;
+  SSdbIter *pIter = iter;
+
+  if (iter == NULL) {
+    pIter = sdbIterInit(pMnode->pSdb);
  }

-  int64_t sdbVer = sdbUpdateVer(pSdb, 0);
-  mDebug("restore wal finished, sdbver:%" PRId64, sdbVer);
+  return sdbIterRead(pMnode->pSdb, pIter, ppBuf, len);
+}

-  mndTransPullup(pMnode);
-  sdbVer = sdbUpdateVer(pSdb, 0);
-  mDebug("pullup trans finished, sdbver:%" PRId64, sdbVer);
-
-  if (sdbVer != lastSdbVer) {
-    mInfo("sdb restored from %" PRId64 " to %" PRId64 ", write file", lastSdbVer, sdbVer);
-    if (sdbWriteFile(pSdb) != 0) {
-      goto _OVER;
-    }
-
-    if (walCommit(pWal, sdbVer) != 0) {
-      goto _OVER;
-    }
-
-    if (walBeginSnapshot(pWal, sdbVer) < 0) {
-      goto _OVER;
-    }
-
-    if (walEndSnapshot(pWal) < 0) {
-      goto _OVER;
-    }
-  }
+int32_t mndSnapshotApply(struct SSyncFSM *pFsm, const SSnapshot *snapshot, char *pBuf, int32_t len) {
+  SMnode *pMnode = pFsm->data;
+  sdbWrite(pMnode->pSdb, (SSdbRaw *)pBuf);
+  return 0;
+}
+
+void mndReConfig(struct SSyncFSM *pFsm, SSyncCfg newCfg, SReConfigCbMeta cbMeta) {

-  code = 0;
+}

-_OVER:
-  walCloseReadHandle(pHandle);
-  return code;
+SSyncFSM *mndSyncMakeFsm(SMnode *pMnode) {
+  SSyncFSM *pFsm = taosMemoryCalloc(1, sizeof(SSyncFSM));
+  pFsm->data = pMnode;
+
+  pFsm->FpCommitCb = mndSyncCommitMsg;
+  pFsm->FpPreCommitCb = NULL;
+  pFsm->FpRollBackCb = NULL;
+
+  pFsm->FpGetSnapshot = mndSyncGetSnapshot;
+  pFsm->FpRestoreFinishCb = mndRestoreFinish;
+  pFsm->FpSnapshotRead = mndSnapshotRead;
+  pFsm->FpSnapshotApply = mndSnapshotApply;
+  pFsm->FpReConfigCb = mndReConfig;
+
+  return pFsm;
 }

 int32_t mndInitSync(SMnode *pMnode) {
  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
-  tsem_init(&pMgmt->syncSem, 0, 0);

-  if (mndInitWal(pMnode) < 0) {
+  char path[PATH_MAX + 20] = {0};
+  snprintf(path, sizeof(path), "%s%swal", pMnode->path, TD_DIRSEP);
+  SWalCfg cfg = {
+      .vgId = 1,
+      .fsyncPeriod = 0,
+      .rollPeriod = -1,
+      .segSize = -1,
+      .retentionPeriod = -1,
+      .retentionSize = -1,
+      .level = TAOS_WAL_FSYNC,
+  };
+
+  pMgmt->pWal = walOpen(path, &cfg);
+  if (pMgmt->pWal == NULL) {
    mError("failed to open wal since %s", terrstr());
    return -1;
  }

-  if (mndRestoreWal(pMnode) < 0) {
-    mError("failed to restore wal since %s", terrstr());
-    return -1;
+  SSyncInfo syncInfo = {.vgId = 1, .FpSendMsg = mndSyncSendMsg, .FpEqMsg = mndSyncEqMsg};
+  snprintf(syncInfo.path, sizeof(syncInfo.path), "%s%ssync", pMnode->path, TD_DIRSEP);
+  syncInfo.pWal = pMgmt->pWal;
+  syncInfo.pFsm = mndSyncMakeFsm(pMnode);
+
+  SSyncCfg *pCfg = &syncInfo.syncCfg;
+  pCfg->replicaNum = pMnode->replica;
+  pCfg->myIndex = pMnode->selfIndex;
+  mInfo("start to open mnode sync, replica:%d myindex:%d standby:%d", pCfg->replicaNum, pCfg->myIndex,
+        pMgmt->standby);
+  for (int32_t i = 0; i < pMnode->replica; ++i) {
+    SNodeInfo *pNode = &pCfg->nodeInfo[i];
+    tstrncpy(pNode->nodeFqdn, pMnode->replicas[i].fqdn, sizeof(pNode->nodeFqdn));
+    pNode->nodePort = pMnode->replicas[i].port;
+    mInfo("index:%d, fqdn:%s port:%d", i, pNode->nodeFqdn, pNode->nodePort);
  }

-  if (pMnode->selfId == 1) {
-    pMgmt->state = TAOS_SYNC_STATE_LEADER;
+  tsem_init(&pMgmt->syncSem, 0, 0);
+  pMgmt->sync = syncOpen(&syncInfo);
+  if (pMgmt->sync <= 0) {
+    mError("failed to open sync since %s", terrstr());
+    return -1;
  }
-  pMgmt->pSyncNode = NULL;
+
+  mDebug("mnode sync is opened, id:%" PRId64, pMgmt->sync);
  return 0;
 }

 void mndCleanupSync(SMnode *pMnode) {
  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
-  tsem_destroy(&pMgmt->syncSem);
-  mndCloseWal(pMnode);
-}
+  syncStop(pMgmt->sync);
+  mDebug("sync:%" PRId64 " is stopped", pMgmt->sync);

-static int32_t mndSyncApplyCb(struct SSyncFSM *fsm, SyncIndex index, const SSyncBuffer *buf, void *pData) {
-  SMnode    *pMnode = pData;
-  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
-
-  pMgmt->errCode = 0;
-  tsem_post(&pMgmt->syncSem);
+  tsem_destroy(&pMgmt->syncSem);
+  if (pMgmt->pWal != NULL) {
+    walClose(pMgmt->pWal);
+  }

-  return 0;
+  memset(pMgmt, 0, sizeof(SSyncMgmt));
 }

 int32_t mndSyncPropose(SMnode *pMnode, SSdbRaw *pRaw) {
-  SWal *pWal = pMnode->syncMgmt.pWal;
-  SSdb *pSdb = pMnode->pSdb;
-
-  int64_t ver = sdbUpdateVer(pSdb, 1);
-  if (walWrite(pWal, ver, 1, pRaw, sdbGetRawTotalSize(pRaw)) < 0) {
-    sdbUpdateVer(pSdb, -1);
-    mError("ver:%" PRId64 ", failed to write raw:%p to wal since %s", ver, pRaw, terrstr());
-    return -1;
-  }
-
-  mTrace("ver:%" PRId64 ", write to wal, raw:%p", ver, pRaw);
-  walCommit(pWal, ver);
-  walFsync(pWal, true);
-
-#if 1
-  return 0;
-#else
-  if (pMnode->replica == 1) return 0;
-
  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
  pMgmt->errCode = 0;

-  SSyncBuffer buf = {.data = pRaw, .len = sdbGetRawTotalSize(pRaw)};
-
-  bool    isWeak = false;
-  int32_t code = syncPropose(pMgmt->pSyncNode, &buf, pMnode, isWeak);
+  SRpcMsg rsp = {.msgType = TDMT_MND_APPLY_MSG, .contLen = sdbGetRawTotalSize(pRaw)};
+  rsp.pCont = rpcMallocCont(rsp.contLen);
+  if (rsp.pCont == NULL) return -1;
+  memcpy(rsp.pCont, pRaw, rsp.contLen);
+
+  const bool isWeak = false;
+  int32_t    code = syncPropose(pMgmt->sync, &rsp, isWeak);
+  if (code == 0) {
+    tsem_wait(&pMgmt->syncSem);
+  } else if (code == TAOS_SYNC_PROPOSE_NOT_LEADER) {
+    terrno = TSDB_CODE_APP_NOT_READY;
+  } else if (code == TAOS_SYNC_PROPOSE_OTHER_ERROR) {
+    terrno = TSDB_CODE_SYN_INTERNAL_ERROR;
+  } else {
+    terrno = TSDB_CODE_APP_ERROR;
+  }

+  rpcFreeCont(rsp.pCont);
  if (code != 0) return code;
-
-  tsem_wait(&pMgmt->syncSem);
  return pMgmt->errCode;
-#endif
 }

+void mndSyncStart(SMnode *pMnode) {
+  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
+  syncSetMsgCb(pMgmt->sync, &pMnode->msgCb);
+  if (pMgmt->standby) {
+    syncStartStandBy(pMgmt->sync);
+  } else {
+    syncStart(pMgmt->sync);
+  }
+  mDebug("sync:%" PRId64 " is started", pMgmt->sync);
+}
+
+void mndSyncStop(SMnode *pMnode) {}
+
 bool mndIsMaster(SMnode *pMnode) {
  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
-  return pMgmt->state == TAOS_SYNC_STATE_LEADER;
+  pMgmt->state = syncGetMyRole(pMgmt->sync);
+
+  return (pMgmt->state == TAOS_SYNC_STATE_LEADER) && (pMnode->syncMgmt.restored);
 }
+
+int32_t mndAlter(SMnode *pMnode, const SMnodeOpt *pOption) {
+  SSyncCfg cfg = {.replicaNum = pOption->replica, .myIndex = pOption->selfIndex};
+  mInfo("start to alter mnode sync, replica:%d myindex:%d standby:%d", cfg.replicaNum, cfg.myIndex, pOption->standby);
+  for (int32_t i = 0; i < pOption->replica; ++i) {
+    SNodeInfo *pNode = &cfg.nodeInfo[i];
+    tstrncpy(pNode->nodeFqdn, pOption->replicas[i].fqdn, sizeof(pNode->nodeFqdn));
+    pNode->nodePort = pOption->replicas[i].port;
+    mInfo("index:%d, fqdn:%s port:%d", i, pNode->nodeFqdn, pNode->nodePort);
+  }
+
+  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
+  pMgmt->standby = pOption->standby;
+  return syncReconfig(pMgmt->sync, &cfg);
+}
\ No newline at end of file
--- a/source/dnode/mnode/impl/src/mndTopic.c
+++ b/source/dnode/mnode/impl/src/mndTopic.c
@@ -386,14 +386,14 @@ static int32_t mndCreateTopic(SMnode *pMnode, SRpcMsg *pReq, SCMCreateTopicReq *
  }
  mDebug("trans:%d, used to create topic:%s", pTrans->id, pCreate->name);

-  SSdbRaw *pRedoRaw = mndTopicActionEncode(&topicObj);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndTopicActionEncode(&topicObj);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    taosMemoryFreeClear(topicObj.physicalPlan);
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -473,13 +473,13 @@ CREATE_TOPIC_OVER:
 }

 static int32_t mndDropTopic(SMnode *pMnode, STrans *pTrans, SRpcMsg *pReq, SMqTopicObj *pTopic) {
-  SSdbRaw *pRedoRaw = mndTopicActionEncode(pTopic);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndTopicActionEncode(pTopic);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_DROPPED);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_DROPPED);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -627,11 +627,11 @@ static int32_t mndRetrieveTopic(SRpcMsg *pReq, SShowObj *pShow, SSDataBlock *pBl
  return numOfRows;
 }

-int32_t mndSetTopicRedoLogs(SMnode *pMnode, STrans *pTrans, SMqTopicObj *pTopic) {
-  SSdbRaw *pRedoRaw = mndTopicActionEncode(pTopic);
-  if (pRedoRaw == NULL) return -1;
-  if (mndTransAppendCommitlog(pTrans, pRedoRaw) != 0) return -1;
-  if (sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY) != 0) return -1;
+int32_t mndSetTopicCommitLogs(SMnode *pMnode, STrans *pTrans, SMqTopicObj *pTopic) {
+  SSdbRaw *pCommitRaw = mndTopicActionEncode(pTopic);
+  if (pCommitRaw == NULL) return -1;
+  if (mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) return -1;
+  if (sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY) != 0) return -1;

  return 0;
 }

--- a/source/dnode/mnode/impl/src/mndTrans.c
+++ b/source/dnode/mnode/impl/src/mndTrans.c
@@ -681,14 +681,8 @@ static int32_t mndTransSync(SMnode *pMnode, STrans *pTrans) {
    return -1;
  }

+  sdbFreeRaw(pRaw);
  mDebug("trans:%d, sync finished", pTrans->id);
-
-  code = sdbWrite(pMnode->pSdb, pRaw);
-  if (code != 0) {
-    mError("trans:%d, failed to write sdb since %s", pTrans->id, terrstr());
-    return -1;
-  }
-
  return 0;
 }

@@ -768,6 +762,12 @@ int32_t mndTransPrepare(SMnode *pMnode, STrans *pTrans) {
    return -1;
  }

+  if (taosArrayGetSize(pTrans->commitLogs) <= 0) {
+    terrno = TSDB_CODE_MND_TRANS_CLOG_IS_NULL;
+    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
+    return -1;
+  }
+
  mDebug("trans:%d, prepare transaction", pTrans->id);
  if (mndTransSync(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -1080,6 +1080,8 @@ static bool mndTransPerformRedoLogStage(SMnode *pMnode, STrans *pTrans) {
 }

 static bool mndTransPerformRedoActionStage(SMnode *pMnode, STrans *pTrans) {
+  if (!mndIsMaster(pMnode)) return false;
+
  bool    continueExec = true;
  int32_t code = mndTransExecuteRedoActions(pMnode, pTrans);

@@ -1169,6 +1171,8 @@ static bool mndTransPerformUndoLogStage(SMnode *pMnode, STrans *pTrans) {
 }

 static bool mndTransPerformUndoActionStage(SMnode *pMnode, STrans *pTrans) {
+  if (!mndIsMaster(pMnode)) return false;
+
  bool    continueExec = true;
  int32_t code = mndTransExecuteUndoActions(pMnode, pTrans);

@@ -1350,19 +1354,35 @@ _OVER:
  return code;
 }

+static int32_t mndCompareTransId(int32_t *pTransId1, int32_t *pTransId2) { return (*pTransId1 > *pTransId2) - (*pTransId1 < *pTransId2); }
+
 void mndTransPullup(SMnode *pMnode) {
-  STrans *pTrans = NULL;
-  void   *pIter = NULL;
+  SSdb   *pSdb = pMnode->pSdb;
+  SArray *pArray = taosArrayInit(sdbGetSize(pSdb, SDB_TRANS), sizeof(int32_t));
+  if (pArray == NULL) return;

+  void *pIter = NULL;
  while (1) {
+    STrans *pTrans = NULL;
    pIter = sdbFetch(pMnode->pSdb, SDB_TRANS, pIter, (void **)&pTrans);
    if (pIter == NULL) break;
+    taosArrayPush(pArray, &pTrans->id);
+    sdbRelease(pSdb, pTrans);
+  }

-    mndTransExecute(pMnode, pTrans);
-    sdbRelease(pMnode->pSdb, pTrans);
+  taosArraySort(pArray, (__compar_fn_t)mndCompareTransId);
+
+  for (int32_t i = 0; i < taosArrayGetSize(pArray); ++i) {
+    int32_t *pTransId = taosArrayGet(pArray, i);
+    STrans  *pTrans = mndAcquireTrans(pMnode, *pTransId);
+    if (pTrans != NULL) {
+      mndTransExecute(pMnode, pTrans);
+    }
+    mndReleaseTrans(pMnode, pTrans);
  }

  sdbWriteFile(pMnode->pSdb);
+  taosArrayDestroy(pArray);
 }

 static int32_t mndRetrieveTrans(SRpcMsg *pReq, SShowObj *pShow, SSDataBlock *pBlock, int32_t rows) {

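The `mndCompareTransId` comparator passed to `taosArraySort` in `mndTransPullup` should follow the usual qsort(3) contract: negative when the first id is smaller, zero when equal, positive when larger. A comparator that only returns 1 or 0 reports "equal" for every smaller id, which leaves the sort order unspecified. The standalone sketch below illustrates the contract with plain `qsort`; it assumes `taosArraySort` forwards the comparator with qsort semantics, and `compare_trans_id` / `sort_trans_ids` are hypothetical names, not TDengine APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical three-way comparator for int32_t transaction ids.
 * Returns <0, 0 or >0 as required by qsort(3). The (x > y) - (x < y)
 * idiom avoids the signed overflow that returning x - y could cause. */
static int compare_trans_id(const void *a, const void *b) {
  int32_t x = *(const int32_t *)a;
  int32_t y = *(const int32_t *)b;
  return (x > y) - (x < y);
}

/* Hypothetical helper: sorts transaction ids in ascending order,
 * so older (smaller) ids would be executed first. */
static void sort_trans_ids(int32_t *ids, size_t n) {
  assert(ids != NULL || n == 0);
  qsort(ids, n, sizeof(int32_t), compare_trans_id);
}
```

Executing pulled-up transactions in ascending id order is the apparent intent of sorting before calling `mndTransExecute`; the comparator is what makes that order deterministic.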
--- a/source/dnode/mnode/impl/src/mndUser.c
+++ b/source/dnode/mnode/impl/src/mndUser.c
@@ -272,13 +272,13 @@ static int32_t mndCreateUser(SMnode *pMnode, char *acct, SCreateUserReq *pCreate
  }
  mDebug("trans:%d, used to create user:%s", pTrans->id, pCreate->user);

-  SSdbRaw *pRedoRaw = mndUserActionEncode(&userObj);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndUserActionEncode(&userObj);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -352,13 +352,13 @@ static int32_t mndAlterUser(SMnode *pMnode, SUserObj *pOld, SUserObj *pNew, SRpc
  }
  mDebug("trans:%d, used to alter user:%s", pTrans->id, pOld->user);

-  SSdbRaw *pRedoRaw = mndUserActionEncode(pNew);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndUserActionEncode(pNew);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_READY);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_READY);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());
@@ -559,13 +559,13 @@ static int32_t mndDropUser(SMnode *pMnode, SRpcMsg *pReq, SUserObj *pUser) {
  }
  mDebug("trans:%d, used to drop user:%s", pTrans->id, pUser->user);

-  SSdbRaw *pRedoRaw = mndUserActionEncode(pUser);
-  if (pRedoRaw == NULL || mndTransAppendRedolog(pTrans, pRedoRaw) != 0) {
-    mError("trans:%d, failed to append redo log since %s", pTrans->id, terrstr());
+  SSdbRaw *pCommitRaw = mndUserActionEncode(pUser);
+  if (pCommitRaw == NULL || mndTransAppendCommitlog(pTrans, pCommitRaw) != 0) {
+    mError("trans:%d, failed to append commit log since %s", pTrans->id, terrstr());
    mndTransDrop(pTrans);
    return -1;
  }
-  sdbSetRawStatus(pRedoRaw, SDB_STATUS_DROPPED);
+  sdbSetRawStatus(pCommitRaw, SDB_STATUS_DROPPED);

  if (mndTransPrepare(pMnode, pTrans) != 0) {
    mError("trans:%d, failed to prepare since %s", pTrans->id, terrstr());

--- a/source/dnode/mnode/impl/src/mnode.c
+++ b/source/dnode/mnode/impl/src/mnode.c
@@ -86,7 +86,6 @@ static void *mndThreadFp(void *param) {
    lastTime++;
    taosMsleep(100);
    if (pMnode->stopped) break;
-    if (!mndIsMaster(pMnode)) continue;

    if (lastTime % (tsTransPullupInterval * 10) == 0) {
      mndPullupTrans(pMnode);
@@ -264,6 +263,7 @@ static void mndSetOptions(SMnode *pMnode, const SMnodeOpt *pOption) {
  memcpy(&pMnode->replicas, pOption->replicas, sizeof(SReplica) * TSDB_MAX_REPLICA);
  pMnode->msgCb = pOption->msgCb;
  pMnode->selfId = pOption->replicas[pOption->selfIndex].id;
+  pMnode->syncMgmt.standby = pOption->standby;
 }

 SMnode *mndOpen(const char *path, const SMnodeOpt *pOption) {
@@ -330,15 +330,77 @@ void mndClose(SMnode *pMnode) {
  }
 }

-int32_t mndAlter(SMnode *pMnode, const SMnodeOpt *pOption) {
-  mDebug("start to alter mnode");
-  mDebug("mnode is altered");
-  return 0;
+int32_t mndStart(SMnode *pMnode) {
+  mndSyncStart(pMnode);
+  return mndInitTimer(pMnode);
+}
+
+void mndStop(SMnode *pMnode) {
+  mndSyncStop(pMnode);
+  mndCleanupTimer(pMnode);
 }

-int32_t mndStart(SMnode *pMnode) { return mndInitTimer(pMnode); }
+int32_t mndProcessSyncMsg(SRpcMsg *pMsg) {
+  SMnode    *pMnode = pMsg->info.node;
+  SSyncMgmt *pMgmt = &pMnode->syncMgmt;
+  int32_t    code = TAOS_SYNC_PROPOSE_OTHER_ERROR;
+
+  if (!syncEnvIsStart()) {
+    mError("failed to process sync msg:%p type:%s since syncEnv stop", pMsg, TMSG_INFO(pMsg->msgType));
+    return TAOS_SYNC_PROPOSE_OTHER_ERROR;
+  }
+
+  SSyncNode *pSyncNode = syncNodeAcquire(pMgmt->sync);
+  if (pSyncNode == NULL) {
+    mError("failed to process sync msg:%p type:%s since syncNode is null", pMsg, TMSG_INFO(pMsg->msgType));
+    return TAOS_SYNC_PROPOSE_OTHER_ERROR;
+  }
+
+  char  logBuf[512];
+  char *syncNodeStr = sync2SimpleStr(pMgmt->sync);
+  snprintf(logBuf, sizeof(logBuf), "==mndProcessSyncMsg== msgType:%d, syncNode: %s", pMsg->msgType, syncNodeStr);
+  syncRpcMsgLog2(logBuf, pMsg);
+  taosMemoryFree(syncNodeStr);
+
+  if (pMsg->msgType == TDMT_VND_SYNC_TIMEOUT) {
+    SyncTimeout *pSyncMsg = syncTimeoutFromRpcMsg2(pMsg);
+    code = syncNodeOnTimeoutCb(pSyncNode, pSyncMsg);
+    syncTimeoutDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_PING) {
+    SyncPing *pSyncMsg = syncPingFromRpcMsg2(pMsg);
+    code = syncNodeOnPingCb(pSyncNode, pSyncMsg);
+    syncPingDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_PING_REPLY) {
+    SyncPingReply *pSyncMsg = syncPingReplyFromRpcMsg2(pMsg);
+    code = syncNodeOnPingReplyCb(pSyncNode, pSyncMsg);
+    syncPingReplyDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_CLIENT_REQUEST) {
+    SyncClientRequest *pSyncMsg = syncClientRequestFromRpcMsg2(pMsg);
+    code = syncNodeOnClientRequestCb(pSyncNode, pSyncMsg);
+    syncClientRequestDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_REQUEST_VOTE) {
+    SyncRequestVote *pSyncMsg = syncRequestVoteFromRpcMsg2(pMsg);
+    code = syncNodeOnRequestVoteCb(pSyncNode, pSyncMsg);
+    syncRequestVoteDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_REQUEST_VOTE_REPLY) {
+    SyncRequestVoteReply *pSyncMsg = syncRequestVoteReplyFromRpcMsg2(pMsg);
+    code = syncNodeOnRequestVoteReplyCb(pSyncNode, pSyncMsg);
+    syncRequestVoteReplyDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_APPEND_ENTRIES) {
+    SyncAppendEntries *pSyncMsg = syncAppendEntriesFromRpcMsg2(pMsg);
+    code = syncNodeOnAppendEntriesCb(pSyncNode, pSyncMsg);
+    syncAppendEntriesDestroy(pSyncMsg);
+  } else if (pMsg->msgType == TDMT_VND_SYNC_APPEND_ENTRIES_REPLY) {
+    SyncAppendEntriesReply *pSyncMsg = syncAppendEntriesReplyFromRpcMsg2(pMsg);
+    code = syncNodeOnAppendEntriesReplyCb(pSyncNode, pSyncMsg);
+    syncAppendEntriesReplyDestroy(pSyncMsg);
+  } else {
+    mError("failed to process msg:%p since invalid type:%s", pMsg, TMSG_INFO(pMsg->msgType));
+    code = TAOS_SYNC_PROPOSE_OTHER_ERROR;
+  }

-void mndStop(SMnode *pMnode) { return mndCleanupTimer(pMnode); }
+  return code;
+}

 int32_t mndProcessMsg(SRpcMsg *pMsg) {
  SMnode *pMnode = pMsg->info.node;
@@ -346,7 +408,8 @@ int32_t mndProcessMsg(SRpcMsg *pMsg) {
  mTrace("msg:%p, will be processed, type:%s app:%p", pMsg, TMSG_INFO(pMsg->msgType), ahandle);

  if (IsReq(pMsg)) {
-    if (!mndIsMaster(pMnode)) {
+    if (!mndIsMaster(pMnode) && pMsg->msgType != TDMT_MND_TRANS_TIMER && pMsg->msgType != TDMT_MND_MQ_TIMER &&
+        pMsg->msgType != TDMT_MND_TELEM_TIMER) {
      terrno = TSDB_CODE_APP_NOT_READY;
      mDebug("msg:%p, failed to process since %s, app:%p", pMsg, terrstr(), ahandle);
      return -1;

--- a/source/dnode/mnode/impl/test/sdb/sdbTest.cpp
+++ b/source/dnode/mnode/impl/test/sdb/sdbTest.cpp
@@ -493,9 +493,8 @@ TEST_F(MndTestSdb, 01_Write_Str) {
  ASSERT_EQ(sdbGetSize(pSdb, SDB_USER), 2);
  ASSERT_EQ(sdbGetMaxId(pSdb, SDB_USER), -1);
  ASSERT_EQ(sdbGetTableVer(pSdb, SDB_USER), 2 );
-  ASSERT_EQ(sdbUpdateVer(pSdb, 0), -1);
-  ASSERT_EQ(sdbUpdateVer(pSdb, 1), 0);
-  ASSERT_EQ(sdbUpdateVer(pSdb, -1), -1);
+  sdbSetApplyIndex(pSdb, -1);
+  ASSERT_EQ(sdbGetApplyIndex(pSdb), -1);
  ASSERT_EQ(mnode.insertTimes, 2);
  ASSERT_EQ(mnode.deleteTimes, 0);

@@ -537,9 +536,6 @@ TEST_F(MndTestSdb, 01_Write_Str) {

    ASSERT_EQ(sdbGetSize(pSdb, SDB_USER), 3);
    ASSERT_EQ(sdbGetTableVer(pSdb, SDB_USER), 4);
-    ASSERT_EQ(sdbUpdateVer(pSdb, 0), -1);
-    ASSERT_EQ(sdbUpdateVer(pSdb, 1), 0);
-    ASSERT_EQ(sdbUpdateVer(pSdb, -1), -1);
    ASSERT_EQ(mnode.insertTimes, 3);
    ASSERT_EQ(mnode.deleteTimes, 0);

@@ -704,8 +700,9 @@ TEST_F(MndTestSdb, 01_Write_Str) {
  }

  // write version
-  ASSERT_EQ(sdbUpdateVer(pSdb, 1), 0);
-  ASSERT_EQ(sdbUpdateVer(pSdb, 1), 1);
+  sdbSetApplyIndex(pSdb, 0);
+  sdbSetApplyIndex(pSdb, 1);
+  ASSERT_EQ(sdbGetApplyIndex(pSdb), 1);
  ASSERT_EQ(sdbWriteFile(pSdb), 0);
  ASSERT_EQ(sdbWriteFile(pSdb), 0);

@@ -775,7 +772,7 @@ TEST_F(MndTestSdb, 01_Read_Str) {
  ASSERT_EQ(sdbGetSize(pSdb, SDB_USER), 2);
  ASSERT_EQ(sdbGetMaxId(pSdb, SDB_USER), -1);
  ASSERT_EQ(sdbGetTableVer(pSdb, SDB_USER), 5);
-  ASSERT_EQ(sdbUpdateVer(pSdb, 0), 1);
+  ASSERT_EQ(sdbGetApplyIndex(pSdb), 1);
  ASSERT_EQ(mnode.insertTimes, 4);
  ASSERT_EQ(mnode.deleteTimes, 0);


--- a/source/dnode/mnode/impl/test/trans/CMakeLists.txt
+++ b/source/dnode/mnode/impl/test/trans/CMakeLists.txt
@@ -31,7 +31,7 @@ target_include_directories(
    PUBLIC "${TD_SOURCE_DIR}/include/dnode/mnode"
    PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}/../../inc"
 )
-add_test(
-    NAME transTest2
-    COMMAND transTest2
-)
+#add_test(
+#    NAME transTest2
+#    COMMAND transTest2
+#)
--- a/source/dnode/mnode/impl/test/trans/trans2.cpp
+++ b/source/dnode/mnode/impl/test/trans/trans2.cpp
@@ -23,6 +23,11 @@ int32_t sendReq(const SEpSet *pEpSet, SRpcMsg *pMsg) {
  return -1;
 }

+int32_t putToQueue(void *pMgmt, SRpcMsg *pMsg) {
+  terrno = TSDB_CODE_INVALID_PTR;
+  return -1;
+}
+
 class MndTestTrans2 : public ::testing::Test {
 protected:
  static void InitLog() {
@@ -55,6 +60,9 @@ class MndTestTrans2 : public ::testing::Test {
    msgCb.reportStartupFp = reportStartup;
    msgCb.sendReqFp = sendReq;
    msgCb.sendRspFp = sendRsp;
+    msgCb.queueFps[SYNC_QUEUE] = putToQueue;
+    msgCb.queueFps[WRITE_QUEUE] = putToQueue;
+    msgCb.queueFps[READ_QUEUE] = putToQueue;
    msgCb.mgmt = (SMgmtWrapper *)(&msgCb);  // hack
    tmsgSetDefault(&msgCb);

@@ -77,6 +85,7 @@ class MndTestTrans2 : public ::testing::Test {
  static void SetUpTestSuite() {
    InitLog();
    walInit();
+    syncInit();
    InitMnode();
  }


--- a/source/dnode/mnode/sdb/src/sdb.c
+++ b/source/dnode/mnode/sdb/src/sdb.c
@@ -31,11 +31,9 @@ SSdb *sdbInit(SSdbOpt *pOption) {
  char path[PATH_MAX + 100] = {0};
  snprintf(path, sizeof(path), "%s%sdata", pOption->path, TD_DIRSEP);
  pSdb->currDir = strdup(path);
-  snprintf(path, sizeof(path), "%s%ssync", pOption->path, TD_DIRSEP);
-  pSdb->syncDir = strdup(path);
  snprintf(path, sizeof(path), "%s%stmp", pOption->path, TD_DIRSEP);
  pSdb->tmpDir = strdup(path);
-  if (pSdb->currDir == NULL || pSdb->currDir == NULL || pSdb->currDir == NULL) {
+  if (pSdb->currDir == NULL || pSdb->tmpDir == NULL) {
    sdbCleanup(pSdb);
    terrno = TSDB_CODE_OUT_OF_MEMORY;
    mError("failed to init sdb since %s", terrstr());
@@ -55,6 +53,7 @@ SSdb *sdbInit(SSdbOpt *pOption) {
  }

  pSdb->curVer = -1;
+  pSdb->curTerm = -1;
  pSdb->lastCommitVer = -1;
  pSdb->pMnode = pOption->pMnode;
  mDebug("sdb init successfully");
@@ -149,12 +148,6 @@ static int32_t sdbCreateDir(SSdb *pSdb) {
    return -1;
  }

-  if (taosMkDir(pSdb->syncDir) != 0) {
-    terrno = TAOS_SYSTEM_ERROR(errno);
-    mError("failed to create dir:%s since %s", pSdb->syncDir, terrstr());
-    return -1;
-  }
-
  if (taosMkDir(pSdb->tmpDir) != 0) {
    terrno = TAOS_SYSTEM_ERROR(errno);
    mError("failed to create dir:%s since %s", pSdb->tmpDir, terrstr());
@@ -164,4 +157,10 @@ static int32_t sdbCreateDir(SSdb *pSdb) {
  return 0;
 }

-int64_t sdbUpdateVer(SSdb *pSdb, int32_t val) { return atomic_add_fetch_64(&pSdb->curVer, val); }
\ No newline at end of file
+void sdbSetApplyIndex(SSdb *pSdb, int64_t index) { pSdb->curVer = index; }
+
+int64_t sdbGetApplyIndex(SSdb *pSdb) { return pSdb->curVer; }
+
+void sdbSetApplyTerm(SSdb *pSdb, int64_t term) { pSdb->curTerm = term; }
+
+int64_t sdbGetApplyTerm(SSdb *pSdb) { return pSdb->curTerm; }
--- a/source/dnode/mnode/sdb/src/sdbFile.c
+++ b/source/dnode/mnode/sdb/src/sdbFile.c
@@ -65,6 +65,16 @@ static int32_t sdbReadFileHead(SSdb *pSdb, TdFilePtr pFile) {
    return -1;
  }

+  ret = taosReadFile(pFile, &pSdb->curTerm, sizeof(int64_t));
+  if (ret < 0) {
+    terrno = TAOS_SYSTEM_ERROR(errno);
+    return -1;
+  }
+  if (ret != sizeof(int64_t)) {
+    terrno = TSDB_CODE_FILE_CORRUPTED;
+    return -1;
+  }
+
  for (int32_t i = 0; i < SDB_TABLE_SIZE; ++i) {
    int64_t maxId = 0;
    ret = taosReadFile(pFile, &maxId, sizeof(int64_t));
@@ -123,6 +133,11 @@ static int32_t sdbWriteFileHead(SSdb *pSdb, TdFilePtr pFile) {
    return -1;
  }

+  if (taosWriteFile(pFile, &pSdb->curTerm, sizeof(int64_t)) != sizeof(int64_t)) {
+    terrno = TAOS_SYSTEM_ERROR(errno);
+    return -1;
+  }
+
  for (int32_t i = 0; i < SDB_TABLE_SIZE; ++i) {
    int64_t maxId = 0;
    if (i < SDB_MAX) {
@@ -182,6 +197,7 @@ int32_t sdbReadFile(SSdb *pSdb) {
  if (sdbReadFileHead(pSdb, pFile) != 0) {
    mError("failed to read file:%s head since %s", file, terrstr());
    pSdb->curVer = -1;
+    pSdb->curTerm = -1;
    taosMemoryFree(pRaw);
    taosCloseFile(&pFile);
    return -1;
@@ -256,8 +272,8 @@ static int32_t sdbWriteFileImp(SSdb *pSdb) {
  char curfile[PATH_MAX] = {0};
  snprintf(curfile, sizeof(curfile), "%s%ssdb.data", pSdb->currDir, TD_DIRSEP);

-  mDebug("start to write file:%s, current ver:%" PRId64 ", commit ver:%" PRId64, curfile, pSdb->curVer,
-         pSdb->lastCommitVer);
+  mDebug("start to write file:%s, current ver:%" PRId64 " term:%" PRId64 ", commit ver:%" PRId64, curfile, pSdb->curVer,
+         pSdb->curTerm, pSdb->lastCommitVer);

  TdFilePtr pFile = taosOpenFile(tmpfile, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_TRUNC);
  if (pFile == NULL) {
@@ -350,7 +366,7 @@ static int32_t sdbWriteFileImp(SSdb *pSdb) {
    mError("failed to write file:%s since %s", curfile, tstrerror(code));
  } else {
    pSdb->lastCommitVer = pSdb->curVer;
-    mDebug("write file:%s successfully, ver:%" PRId64, curfile, pSdb->lastCommitVer);
+    mDebug("write file:%s successfully, ver:%" PRId64 " term:%" PRId64, curfile, pSdb->lastCommitVer, pSdb->curTerm);
  }

  terrno = code;
@@ -376,3 +392,66 @@ int32_t sdbDeploy(SSdb *pSdb) {

  return 0;
 }
+
+SSdbIter *sdbIterInit(SSdb *pSdb) {
+  char datafile[PATH_MAX] = {0};
+  char tmpfile[PATH_MAX] = {0};
+  snprintf(datafile, sizeof(datafile), "%s%ssdb.data", pSdb->currDir, TD_DIRSEP);
+  snprintf(tmpfile, sizeof(tmpfile), "%s%ssdb.data", pSdb->tmpDir, TD_DIRSEP);
+
+  if (taosCopyFile(datafile, tmpfile) != 0) {
+    terrno = TAOS_SYSTEM_ERROR(errno);
+    mError("failed to copy file %s to %s since %s", datafile, tmpfile, terrstr());
+    return NULL;
+  }
+
+  SSdbIter *pIter = taosMemoryCalloc(1, sizeof(SSdbIter));
+  if (pIter == NULL) {
+    terrno = TSDB_CODE_OUT_OF_MEMORY;
+    return NULL;
+  }
+
+  pIter->file = taosOpenFile(tmpfile, TD_FILE_READ);
+  if (pIter->file == NULL) {
+    terrno = TAOS_SYSTEM_ERROR(errno);
+    mError("failed to read snapshot file:%s since %s", tmpfile, terrstr());
+    taosMemoryFree(pIter);
+    return NULL;
+  }
+
+  mDebug("start to read snapshot file:%s, iter:%p", tmpfile, pIter);
+  return pIter;
+}
+
+SSdbIter *sdbIterRead(SSdb *pSdb, SSdbIter *pIter, char **ppBuf, int32_t *buflen) {
+  const int32_t maxlen = 100;
+
+  char *pBuf = taosMemoryCalloc(1, maxlen);
+  if (pBuf == NULL) {
+    terrno = TSDB_CODE_OUT_OF_MEMORY;
+    return NULL;
+  }
+
+  int32_t readlen = taosReadFile(pIter->file, pBuf, maxlen);
+  if (readlen == 0) {
+    mTrace("read snapshot to the end, readlen:%" PRId64, pIter->readlen);
+    taosMemoryFree(pBuf);
+    taosCloseFile(&pIter->file);
+    taosMemoryFree(pIter);
+    pIter = NULL;
+  } else if (readlen < 0) {
+    terrno = TAOS_SYSTEM_ERROR(errno);
+    mError("failed to read snapshot since %s, readlen:%" PRId64, terrstr(), pIter->readlen);
+    taosMemoryFree(pBuf);
+    taosCloseFile(&pIter->file);
+    taosMemoryFree(pIter);
+    pIter = NULL;
+  } else {
+    pIter->readlen += readlen;
+    mTrace("read snapshot, readlen:%" PRId64, pIter->readlen);
+    *ppBuf = pBuf;
+    *buflen = readlen;
+  }
+
+  return pIter;
+}
--- a/source/dnode/vnode/CMakeLists.txt
+++ b/source/dnode/vnode/CMakeLists.txt
@@ -48,7 +48,6 @@ target_sources(
    # tq
    "src/tq/tq.c"
    "src/tq/tqCommit.c"
-    "src/tq/tqMetaStore.c"
    "src/tq/tqOffset.c"
    "src/tq/tqPush.c"
    "src/tq/tqRead.c"

--- a/source/dnode/vnode/src/inc/tq.h
+++ b/source/dnode/vnode/src/inc/tq.h
@@ -20,9 +20,9 @@

 #include "executor.h"
 #include "os.h"
-#include "tcache.h"
 #include "thash.h"
 #include "tmsg.h"
+#include "tqueue.h"
 #include "trpc.h"
 #include "ttimer.h"
 #include "wal.h"
@@ -41,45 +41,6 @@ extern "C" {
 #define tqTrace(...) do { if (tqDebugFlag & DEBUG_TRACE) { taosPrintLog("TQ  ", DEBUG_TRACE, tqDebugFlag, __VA_ARGS__); }} while(0)
 // clang-format on

-#define TQ_BUFFER_SIZE 4
-
-#define TQ_BUCKET_MASK 0xFF
-#define TQ_BUCKET_SIZE 256
-
-#define TQ_PAGE_SIZE 4096
-// key + offset + size
-#define TQ_IDX_SIZE 24
-// 4096 / 24
-#define TQ_MAX_IDX_ONE_PAGE 170
-// 24 * 170
-#define TQ_IDX_PAGE_BODY_SIZE 4080
-// 4096 - 4080
-#define TQ_IDX_PAGE_HEAD_SIZE 16
-
-#define TQ_ACTION_CONST      0
-#define TQ_ACTION_INUSE      1
-#define TQ_ACTION_INUSE_CONT 2
-#define TQ_ACTION_INTXN      3
-
-#define TQ_SVER 0
-
-// TODO: inplace mode is not implemented
-#define TQ_UPDATE_INPLACE 0
-#define TQ_UPDATE_APPEND  1
-
-#define TQ_DUP_INTXN_REWRITE 0
-#define TQ_DUP_INTXN_REJECT  2
-
-static inline bool tqUpdateAppend(int32_t tqConfigFlag) { return tqConfigFlag & TQ_UPDATE_APPEND; }
-
-static inline bool tqDupIntxnReject(int32_t tqConfigFlag) { return tqConfigFlag & TQ_DUP_INTXN_REJECT; }
-
-static const int8_t TQ_CONST_DELETE = TQ_ACTION_CONST;
-
-#define TQ_DELETE_TOKEN (void*)&TQ_CONST_DELETE
-
-typedef enum { TQ_ITEM_READY, TQ_ITEM_PROCESS, TQ_ITEM_EMPTY } STqItemStatus;
-
 typedef struct STqOffsetCfg   STqOffsetCfg;
 typedef struct STqOffsetStore STqOffsetStore;

@@ -98,53 +59,6 @@ struct STqReadHandle {
  STSchema*         pSchema;
 };

-typedef struct {
-  int16_t ver;
-  int16_t action;
-  int32_t checksum;
-  int64_t ssize;
-  char    content[];
-} STqSerializedHead;
-
-typedef int32_t (*FTqSerialize)(const void* pObj, STqSerializedHead** ppHead);
-typedef int32_t (*FTqDeserialize)(void* self, const STqSerializedHead* pHead, void** ppObj);
-typedef void (*FTqDelete)(void*);
-
-typedef struct {
-  int64_t key;
-  int64_t offset;
-  int64_t serializedSize;
-  void*   valueInUse;
-  void*   valueInTxn;
-} STqMetaHandle;
-
-typedef struct STqMetaList {
-  STqMetaHandle       handle;
-  struct STqMetaList* next;
-  // struct STqMetaList* inTxnPrev;
-  // struct STqMetaList* inTxnNext;
-  struct STqMetaList* unpersistPrev;
-  struct STqMetaList* unpersistNext;
-} STqMetaList;
-
-typedef struct {
-  STQ*         pTq;
-  STqMetaList* bucket[TQ_BUCKET_SIZE];
-  // a table head
-  STqMetaList* unpersistHead;
-  // topics that are not connectted
-  STqMetaList* unconnectTopic;
-
-  TdFilePtr pFile;
-  TdFilePtr pIdxFile;
-
-  char*          dirPath;
-  int32_t        tqConfigFlag;
-  FTqSerialize   pSerializer;
-  FTqDeserialize pDeserializer;
-  FTqDelete      pDeleter;
-} STqMetaStore;
-
 typedef struct {
  int64_t  consumerId;
  int32_t  epoch;
@@ -172,6 +86,9 @@ typedef struct {
  qTaskInfo_t    task[5];
 } STqExec;

+int32_t tEncodeSTqExec(SEncoder* pEncoder, const STqExec* pExec);
+int32_t tDecodeSTqExec(SDecoder* pDecoder, STqExec* pExec);
+
 struct STQ {
  char*     path;
  SHashObj* pushMgr;  // consumerId -> STqExec*
@@ -179,7 +96,8 @@ struct STQ {
  SHashObj* pStreamTasks;
  SVnode*   pVnode;
  SWal*     pWal;
-  // TDB*      pTdb;
+  TDB*      pMetaStore;
+  TTB*      pExecStore;
 };

 typedef struct {
@@ -187,89 +105,12 @@ typedef struct {
  tmr_h  timer;
 } STqMgmt;

-static STqMgmt tqMgmt;
-
-typedef struct {
-  int8_t         status;
-  int64_t        offset;
-  qTaskInfo_t    task;
-  STqReadHandle* pReadHandle;
-} STqTaskItem;
-
-// new version
-typedef struct {
-  int64_t     firstOffset;
-  int64_t     lastOffset;
-  STqTaskItem output[TQ_BUFFER_SIZE];
-} STqBuffer;
-
-typedef struct {
-  char            topicName[TSDB_TOPIC_FNAME_LEN];
-  char*           sql;
-  char*           logicalPlan;
-  char*           physicalPlan;
-  char*           qmsg;
-  STqBuffer       buffer;
-  SWalReadHandle* pReadhandle;
-} STqTopic;
-
-typedef struct {
-  int64_t consumerId;
-  int32_t epoch;
-  char    cgroup[TSDB_TOPIC_FNAME_LEN];
-  SArray* topics;  // SArray<STqTopic>
-} STqConsumer;
-
-typedef struct {
-  int8_t      type;
-  int8_t      nodeType;
-  int8_t      reserved[6];
-  int64_t     streamId;
-  qTaskInfo_t task;
-  // TODO sync function
-} STqStreamPusher;
-
-typedef struct {
-  int8_t inited;
-  tmr_h  timer;
-} STqPushMgmt;
-
-static STqPushMgmt tqPushMgmt;
+static STqMgmt tqMgmt = {0};

 // init once
 int  tqInit();
 void tqCleanUp();

-// open in each vnode
-// required by vnode
-
-int32_t tqSerializeConsumer(const STqConsumer*, STqSerializedHead**);
-int32_t tqDeserializeConsumer(STQ*, const STqSerializedHead*, STqConsumer**);
-
-static int FORCE_INLINE tqQueryExecuting(int32_t status) { return status; }
-
-// tqMetaStore.h
-STqMetaStore* tqStoreOpen(STQ* pTq, const char* path, FTqSerialize pSerializer, FTqDeserialize pDeserializer,
-                          FTqDelete pDeleter, int32_t tqConfigFlag);
-int32_t       tqStoreClose(STqMetaStore*);
-// int32_t       tqStoreDelete(TqMetaStore*);
-// int32_t       tqStoreCommitAll(TqMetaStore*);
-int32_t tqStorePersist(STqMetaStore*);
-// clean deleted idx and data from persistent file
-int32_t tqStoreCompact(STqMetaStore*);
-
-void* tqHandleGet(STqMetaStore*, int64_t key);
-// make it unpersist
-void*   tqHandleTouchGet(STqMetaStore*, int64_t key);
-int32_t tqHandleMovePut(STqMetaStore*, int64_t key, void* value);
-int32_t tqHandleCopyPut(STqMetaStore*, int64_t key, void* value, size_t vsize);
-// delete committed kv pair
-// notice that a delete action still needs to be committed
-int32_t tqHandleDel(STqMetaStore*, int64_t key);
-int32_t tqHandlePurge(STqMetaStore*, int64_t key);
-int32_t tqHandleCommit(STqMetaStore*, int64_t key);
-int32_t tqHandleAbort(STqMetaStore*, int64_t key);
-
 // tqOffset
 STqOffsetStore* STqOffsetOpen(STqOffsetCfg*);
 void            STqOffsetClose(STqOffsetStore*);

--- a/source/dnode/vnode/src/meta/metaQuery.c
+++ b/source/dnode/vnode/src/meta/metaQuery.c
@@ -278,12 +278,13 @@ STSchema *metaGetTbTSchema(SMeta *pMeta, tb_uid_t uid, int32_t sver) {
  pSW = metaGetTableSchema(pMeta, quid, sver, 0);
  if (!pSW) return NULL;

-  tdInitTSchemaBuilder(&sb, 0);
+  tdInitTSchemaBuilder(&sb, sver);
  for (int i = 0; i < pSW->nCols; i++) {
    pSchema = pSW->pSchema + i;
    tdAddColToSchema(&sb, pSchema->type, pSchema->flags, pSchema->colId, pSchema->bytes);
  }
  pTSchema = tdGetSchemaFromBuilder(&sb);
+
  tdDestroyTSchemaBuilder(&sb);

  taosMemoryFree(pSW->pSchema);

--- a/source/dnode/vnode/src/meta/metaTable.c
+++ b/source/dnode/vnode/src/meta/metaTable.c
@@ -607,31 +607,39 @@ static int metaUpdateTableTagVal(SMeta *pMeta, int64_t version, SVAlterTbReq *pA
  if (iCol == 0) {
    // TODO : need to update tag index
  }
-
  ctbEntry.version = version;
-  SKVRowBuilder kvrb = {0};
-  const SKVRow  pOldTag = (const SKVRow)ctbEntry.ctbEntry.pTags;
-  SKVRow        pNewTag = NULL;
-
-  tdInitKVRowBuilder(&kvrb);
-  for (int32_t i = 0; i < pTagSchema->nCols; i++) {
-    SSchema *pCol = &pTagSchema->pSchema[i];
-    if (iCol == i) {
-      tdAddColToKVRow(&kvrb, pCol->colId, pAlterTbReq->pTagVal, pAlterTbReq->nTagVal);
-    } else {
-      void *p = tdGetKVRowValOfCol(pOldTag, pCol->colId);
-      if (p) {
-        if (IS_VAR_DATA_TYPE(pCol->type)) {
-          tdAddColToKVRow(&kvrb, pCol->colId, p, varDataTLen(p));
-        } else {
-          tdAddColToKVRow(&kvrb, pCol->colId, p, pCol->bytes);
+  if (pTagSchema->nCols == 1 && pTagSchema->pSchema[0].type == TSDB_DATA_TYPE_JSON) {
+    ctbEntry.ctbEntry.pTags = taosMemoryMalloc(pAlterTbReq->nTagVal);
+    if (ctbEntry.ctbEntry.pTags == NULL) {
+      terrno = TSDB_CODE_OUT_OF_MEMORY;
+      goto _err;
+    }
+    memcpy((void*)ctbEntry.ctbEntry.pTags, pAlterTbReq->pTagVal, pAlterTbReq->nTagVal);
+  } else {
+    SKVRowBuilder kvrb = {0};
+    const SKVRow  pOldTag = (const SKVRow)ctbEntry.ctbEntry.pTags;
+    SKVRow        pNewTag = NULL;
+
+    tdInitKVRowBuilder(&kvrb);
+    for (int32_t i = 0; i < pTagSchema->nCols; i++) {
+      SSchema *pCol = &pTagSchema->pSchema[i];
+      if (iCol == i) {
+        tdAddColToKVRow(&kvrb, pCol->colId, pAlterTbReq->pTagVal, pAlterTbReq->nTagVal);
+      } else {
+        void *p = tdGetKVRowValOfCol(pOldTag, pCol->colId);
+        if (p) {
+          if (IS_VAR_DATA_TYPE(pCol->type)) {
+            tdAddColToKVRow(&kvrb, pCol->colId, p, varDataTLen(p));
+          } else {
+            tdAddColToKVRow(&kvrb, pCol->colId, p, pCol->bytes);
+          }
        }
      }
    }
-  }

-  ctbEntry.ctbEntry.pTags = tdGetKVRowFromBuilder(&kvrb);
-  tdDestroyKVRowBuilder(&kvrb);
+    ctbEntry.ctbEntry.pTags = tdGetKVRowFromBuilder(&kvrb);
+    tdDestroyKVRowBuilder(&kvrb);
+  }

  // save to table.db
  metaSaveToTbDb(pMeta, &ctbEntry);
@@ -641,6 +649,7 @@ static int metaUpdateTableTagVal(SMeta *pMeta, int64_t version, SVAlterTbReq *pA

  tDecoderClear(&dc1);
  tDecoderClear(&dc2);
+  if (ctbEntry.ctbEntry.pTags) taosMemoryFree((void*)ctbEntry.ctbEntry.pTags);
  if (ctbEntry.pBuf) taosMemoryFree(ctbEntry.pBuf);
  if (stbEntry.pBuf) tdbFree(stbEntry.pBuf);
  tdbTbcClose(pTbDbc);

--- a/source/dnode/vnode/src/tq/tq.c
+++ b/source/dnode/vnode/src/tq/tq.c
@@ -14,14 +14,42 @@
 */

 #include "tq.h"
-#include "tqueue.h"
+#include "tdbInt.h"

 int32_t tqInit() {
-  //
+  int8_t old;
+  while (1) {
+    old = atomic_val_compare_exchange_8(&tqMgmt.inited, 0, 2);
+    if (old != 2) break;
+  }
+
+  if (old == 0) {
+    tqMgmt.timer = taosTmrInit(10000, 100, 10000, "TQ");
+    if (tqMgmt.timer == NULL) {
+      atomic_store_8(&tqMgmt.inited, 0);
+      return -1;
+    }
+    atomic_store_8(&tqMgmt.inited, 1);
+  }
  return 0;
 }

-void tqCleanUp() {}
+void tqCleanUp() {
+  int8_t old;
+  while (1) {
+    old = atomic_val_compare_exchange_8(&tqMgmt.inited, 1, 2);
+    if (old != 2) break;
+  }
+
+  if (old == 1) {
+    taosTmrCleanUp(tqMgmt.timer);
+    atomic_store_8(&tqMgmt.inited, 0);
+  }
+}
+
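+int tqExecKeyCompare(const void* pKey1, int32_t kLen1, const void* pKey2, int32_t kLen2) {
+  // keys are stored with length strlen(subKey) and no trailing '\0', so compare by length instead of strcmp
+  int32_t minLen = kLen1 < kLen2 ? kLen1 : kLen2;
+  int32_t ret = memcmp(pKey1, pKey2, minLen);
+  if (ret != 0) return ret;
+  return kLen1 - kLen2;
+}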

 STQ* tqOpen(const char* path, SVnode* pVnode, SWal* pWal) {
  STQ* pTq = taosMemoryMalloc(sizeof(STQ));
@@ -32,9 +60,6 @@ STQ* tqOpen(const char* path, SVnode* pVnode, SWal* pWal) {
  pTq->path = strdup(path);
  pTq->pVnode = pVnode;
  pTq->pWal = pWal;
-  /*if (tdbOpen(path, 4096, 1, &pTq->pTdb) < 0) {*/
-  /*ASSERT(0);*/
-  /*}*/

  pTq->execs = taosHashInit(64, MurmurHash3_32, true, HASH_ENTRY_LOCK);

@@ -42,6 +67,43 @@ STQ* tqOpen(const char* path, SVnode* pVnode, SWal* pWal) {

  pTq->pushMgr = taosHashInit(64, taosGetDefaultHashFunction(TSDB_DATA_TYPE_BIGINT), true, HASH_ENTRY_LOCK);

+  if (tdbOpen(path, 16 * 1024, 1, &pTq->pMetaStore) < 0) {
+    ASSERT(0);
+  }
+
+  if (tdbTbOpen("exec", -1, -1, tqExecKeyCompare, pTq->pMetaStore, &pTq->pExecStore) < 0) {
+    ASSERT(0);
+  }
+
+  TXN txn;
+
+  if (tdbTxnOpen(&txn, 0, tdbDefaultMalloc, tdbDefaultFree, NULL, 0) < 0) {
+    ASSERT(0);
+  }
+
+  /*if (tdbBegin(pTq->pMetaStore, &txn) < 0) {*/
+  /*ASSERT(0);*/
+  /*}*/
+
+  TBC* pCur;
+  if (tdbTbcOpen(pTq->pExecStore, &pCur, &txn) < 0) {
+    ASSERT(0);
+  }
+
+  void* pKey;
+  int   kLen;
+  void* pVal;
+  int   vLen;
+
+  tdbTbcMoveToFirst(pCur);
+  while (tdbTbcNext(pCur, &pKey, &kLen, &pVal, &vLen) == 0) {
+    // create, put into execs
+  }
+
+  tdbTbcClose(pCur);
+
+  if (tdbTxnClose(&txn) < 0) {
+    ASSERT(0);
+  }
+
  return pTq;
 }

@@ -51,11 +113,43 @@ void tqClose(STQ* pTq) {
    taosHashCleanup(pTq->execs);
    taosHashCleanup(pTq->pStreamTasks);
    taosHashCleanup(pTq->pushMgr);
+    tdbTbClose(pTq->pExecStore);
+    tdbClose(pTq->pMetaStore);
    taosMemoryFree(pTq);
  }
  // TODO
 }

+int32_t tEncodeSTqExec(SEncoder* pEncoder, const STqExec* pExec) {
+  if (tStartEncode(pEncoder) < 0) return -1;
+  if (tEncodeCStr(pEncoder, pExec->subKey) < 0) return -1;
+  if (tEncodeI64(pEncoder, pExec->consumerId) < 0) return -1;
+  if (tEncodeI32(pEncoder, pExec->epoch) < 0) return -1;
+  if (tEncodeI8(pEncoder, pExec->subType) < 0) return -1;
+  if (tEncodeI8(pEncoder, pExec->withTbName) < 0) return -1;
+  if (tEncodeI8(pEncoder, pExec->withSchema) < 0) return -1;
+  if (tEncodeI8(pEncoder, pExec->withTag) < 0) return -1;
+  if (pExec->subType == TOPIC_SUB_TYPE__TABLE) {
+    if (tEncodeCStr(pEncoder, pExec->qmsg) < 0) return -1;
+  }
+  tEndEncode(pEncoder);
+  return pEncoder->pos;
+}
+
+int32_t tDecodeSTqExec(SDecoder* pDecoder, STqExec* pExec) {
+  if (tStartDecode(pDecoder) < 0) return -1;
+  if (tDecodeCStrTo(pDecoder, pExec->subKey) < 0) return -1;
+  if (tDecodeI64(pDecoder, &pExec->consumerId) < 0) return -1;
+  if (tDecodeI32(pDecoder, &pExec->epoch) < 0) return -1;
+  if (tDecodeI8(pDecoder, &pExec->subType) < 0) return -1;
+  if (tDecodeI8(pDecoder, &pExec->withTbName) < 0) return -1;
+  if (tDecodeI8(pDecoder, &pExec->withSchema) < 0) return -1;
+  if (tDecodeI8(pDecoder, &pExec->withTag) < 0) return -1;
+  if (pExec->subType == TOPIC_SUB_TYPE__TABLE) {
+    if (tDecodeCStrAlloc(pDecoder, &pExec->qmsg) < 0) return -1;
+  }
+  tEndDecode(pDecoder);
+  return 0;
+}
+
 int32_t tqUpdateTbUidList(STQ* pTq, const SArray* tbUidList, bool isAdd) {
  void* pIter = NULL;
  while (1) {
@@ -214,7 +308,7 @@ int tqPushMsg(STQ* pTq, void* msg, int32_t msgLen, tmsg_t msgType, int64_t ver)
    if (taosHashGetSize(pTq->pStreamTasks) == 0) return 0;

    if (tdUpdateExpireWindow(pTq->pVnode->pSma, msg, ver) != 0) {
-      // TODO error handle
+      // TODO handle sma error
    }
    void* data = taosMemoryMalloc(msgLen);
    if (data == NULL) {
@@ -230,134 +324,6 @@ int tqPushMsg(STQ* pTq, void* msg, int32_t msgLen, tmsg_t msgType, int64_t ver)

 int tqCommit(STQ* pTq) {
  // do nothing
-  /*return tqStorePersist(pTq->tqMeta);*/
-  return 0;
-}
-
-int32_t tqGetTopicHandleSize(const STqTopic* pTopic) {
-  return strlen(pTopic->topicName) + strlen(pTopic->sql) + strlen(pTopic->physicalPlan) + strlen(pTopic->qmsg) +
-         sizeof(int64_t) * 3;
-}
-
-int32_t tqGetConsumerHandleSize(const STqConsumer* pConsumer) {
-  int     num = taosArrayGetSize(pConsumer->topics);
-  int32_t sz = 0;
-  for (int i = 0; i < num; i++) {
-    STqTopic* pTopic = taosArrayGet(pConsumer->topics, i);
-    sz += tqGetTopicHandleSize(pTopic);
-  }
-  return sz;
-}
-
-static FORCE_INLINE int32_t tEncodeSTqTopic(void** buf, const STqTopic* pTopic) {
-  int32_t tlen = 0;
-  tlen += taosEncodeString(buf, pTopic->topicName);
-  /*tlen += taosEncodeString(buf, pTopic->sql);*/
-  /*tlen += taosEncodeString(buf, pTopic->physicalPlan);*/
-  tlen += taosEncodeString(buf, pTopic->qmsg);
-  /*tlen += taosEncodeFixedI64(buf, pTopic->persistedOffset);*/
-  /*tlen += taosEncodeFixedI64(buf, pTopic->committedOffset);*/
-  /*tlen += taosEncodeFixedI64(buf, pTopic->currentOffset);*/
-  return tlen;
-}
-
-static FORCE_INLINE const void* tDecodeSTqTopic(const void* buf, STqTopic* pTopic) {
-  buf = taosDecodeStringTo(buf, pTopic->topicName);
-  /*buf = taosDecodeString(buf, &pTopic->sql);*/
-  /*buf = taosDecodeString(buf, &pTopic->physicalPlan);*/
-  buf = taosDecodeString(buf, &pTopic->qmsg);
-  /*buf = taosDecodeFixedI64(buf, &pTopic->persistedOffset);*/
-  /*buf = taosDecodeFixedI64(buf, &pTopic->committedOffset);*/
-  /*buf = taosDecodeFixedI64(buf, &pTopic->currentOffset);*/
-  return buf;
-}
-
-static FORCE_INLINE int32_t tEncodeSTqConsumer(void** buf, const STqConsumer* pConsumer) {
-  int32_t sz;
-
-  int32_t tlen = 0;
-  tlen += taosEncodeFixedI64(buf, pConsumer->consumerId);
-  tlen += taosEncodeFixedI32(buf, pConsumer->epoch);
-  tlen += taosEncodeString(buf, pConsumer->cgroup);
-  sz = taosArrayGetSize(pConsumer->topics);
-  tlen += taosEncodeFixedI32(buf, sz);
-  for (int32_t i = 0; i < sz; i++) {
-    STqTopic* pTopic = taosArrayGet(pConsumer->topics, i);
-    tlen += tEncodeSTqTopic(buf, pTopic);
-  }
-  return tlen;
-}
-
-static FORCE_INLINE const void* tDecodeSTqConsumer(const void* buf, STqConsumer* pConsumer) {
-  int32_t sz;
-
-  buf = taosDecodeFixedI64(buf, &pConsumer->consumerId);
-  buf = taosDecodeFixedI32(buf, &pConsumer->epoch);
-  buf = taosDecodeStringTo(buf, pConsumer->cgroup);
-  buf = taosDecodeFixedI32(buf, &sz);
-  pConsumer->topics = taosArrayInit(sz, sizeof(STqTopic));
-  if (pConsumer->topics == NULL) return NULL;
-  for (int32_t i = 0; i < sz; i++) {
-    STqTopic pTopic;
-    buf = tDecodeSTqTopic(buf, &pTopic);
-    taosArrayPush(pConsumer->topics, &pTopic);
-  }
-  return buf;
-}
-
-int tqSerializeConsumer(const STqConsumer* pConsumer, STqSerializedHead** ppHead) {
-  int32_t sz = tEncodeSTqConsumer(NULL, pConsumer);
-
-  if (sz > (*ppHead)->ssize) {
-    void* tmpPtr = taosMemoryRealloc(*ppHead, sizeof(STqSerializedHead) + sz);
-    if (tmpPtr == NULL) {
-      taosMemoryFree(*ppHead);
-      terrno = TSDB_CODE_TQ_OUT_OF_MEMORY;
-      return -1;
-    }
-    *ppHead = tmpPtr;
-    (*ppHead)->ssize = sz;
-  }
-
-  void* ptr = (*ppHead)->content;
-  void* abuf = ptr;
-  tEncodeSTqConsumer(&abuf, pConsumer);
-
-  return 0;
-}
-
-int32_t tqDeserializeConsumer(STQ* pTq, const STqSerializedHead* pHead, STqConsumer** ppConsumer) {
-  const void* str = pHead->content;
-  *ppConsumer = taosMemoryCalloc(1, sizeof(STqConsumer));
-  if (*ppConsumer == NULL) {
-    terrno = TSDB_CODE_TQ_OUT_OF_MEMORY;
-    return -1;
-  }
-  if (tDecodeSTqConsumer(str, *ppConsumer) == NULL) {
-    terrno = TSDB_CODE_TQ_OUT_OF_MEMORY;
-    return -1;
-  }
-  STqConsumer* pConsumer = *ppConsumer;
-  int32_t      sz = taosArrayGetSize(pConsumer->topics);
-  for (int32_t i = 0; i < sz; i++) {
-    STqTopic* pTopic = taosArrayGet(pConsumer->topics, i);
-    pTopic->pReadhandle = walOpenReadHandle(pTq->pWal);
-    if (pTopic->pReadhandle == NULL) {
-      ASSERT(false);
-    }
-    for (int j = 0; j < TQ_BUFFER_SIZE; j++) {
-      pTopic->buffer.output[j].status = 0;
-      STqReadHandle* pReadHandle = tqInitSubmitMsgScanner(pTq->pVnode->pMeta);
-      SReadHandle    handle = {
-             .reader = pReadHandle,
-             .meta = pTq->pVnode->pMeta,
-             .pMsgCb = &pTq->pVnode->msgCb,
-      };
-      pTopic->buffer.output[j].pReadHandle = pReadHandle;
-      pTopic->buffer.output[j].task = qCreateStreamExecTaskInfo(pTopic->qmsg, &handle);
-    }
-  }
-
  return 0;
 }

@@ -627,6 +593,23 @@ int32_t tqProcessVgDeleteReq(STQ* pTq, char* msg, int32_t msgLen) {

  int32_t code = taosHashRemove(pTq->execs, pReq->subKey, strlen(pReq->subKey));
  ASSERT(code == 0);
+
+  TXN txn;
+
+  if (tdbTxnOpen(&txn, 0, tdbDefaultMalloc, tdbDefaultFree, NULL, TDB_TXN_WRITE | TDB_TXN_READ_UNCOMMITTED) < 0) {
+    ASSERT(0);
+  }
+
+  if (tdbBegin(pTq->pMetaStore, &txn) < 0) {
+    ASSERT(0);
+  }
+
+  tdbTbDelete(pTq->pExecStore, pReq->subKey, (int)strlen(pReq->subKey), &txn);
+
+  if (tdbCommit(pTq->pMetaStore, &txn) < 0) {
+    ASSERT(0);
+  }
+
  return 0;
 }

@@ -675,6 +658,45 @@ int32_t tqProcessVgChangeReq(STQ* pTq, char* msg, int32_t msgLen) {
      pExec->pDropTbUid = taosHashInit(64, taosGetDefaultHashFunction(TSDB_DATA_TYPE_BIGINT), false, HASH_NO_LOCK);
    }
    taosHashPut(pTq->execs, req.subKey, strlen(req.subKey), pExec, sizeof(STqExec));
+
+    int32_t code;
+    int32_t vlen;
+    tEncodeSize(tEncodeSTqExec, pExec, vlen, code);
+    ASSERT(code == 0);
+
+    void* buf = taosMemoryCalloc(1, vlen);
+    if (buf == NULL) {
+      ASSERT(0);
+    }
+
+    SEncoder encoder;
+    tEncoderInit(&encoder, buf, vlen);
+
+    if (tEncodeSTqExec(&encoder, pExec) < 0) {
+      ASSERT(0);
+    }
+
+    TXN txn;
+
+    if (tdbTxnOpen(&txn, 0, tdbDefaultMalloc, tdbDefaultFree, NULL, TDB_TXN_WRITE | TDB_TXN_READ_UNCOMMITTED) < 0) {
+      ASSERT(0);
+    }
+
+    if (tdbBegin(pTq->pMetaStore, &txn) < 0) {
+      ASSERT(0);
+    }
+
+    if (tdbTbUpsert(pTq->pExecStore, req.subKey, (int)strlen(req.subKey), buf, vlen, &txn) < 0) {
+      ASSERT(0);
+    }
+
+    if (tdbCommit(pTq->pMetaStore, &txn) < 0) {
+      ASSERT(0);
+    }
+
+    tEncoderClear(&encoder);
+    taosMemoryFree(buf);
+
    return 0;
  } else {
    /*if (req.newConsumerId != -1) {*/

--- a/source/dnode/vnode/src/tq/tqMetaStore.c
+++ b/source/dnode/vnode/src/tq/tqMetaStore.c
--- a/source/dnode/vnode/src/tq/tqRead.c
+++ b/source/dnode/vnode/src/tq/tqRead.c
@@ -83,11 +83,11 @@ bool tqNextDataBlockFilterOut(STqReadHandle* pHandle, SHashObj* filterOutUids) {

 int32_t tqRetrieveDataBlock(SArray** ppCols, STqReadHandle* pHandle, uint64_t* pGroupId, uint64_t* pUid,
                            int32_t* pNumOfRows, int16_t* pNumOfCols) {
-  /*int32_t         sversion = pHandle->pBlock->sversion;*/
-  // TODO set to real sversion
  *pUid = 0;

-  int32_t sversion = 1;
+  // the submit block carries sversion in network byte order
+  int32_t sversion = ntohl(pHandle->pBlock->sversion);
  if (pHandle->sver != sversion || pHandle->cachedSchemaUid != pHandle->msgIter.suid) {
    pHandle->pSchema = metaGetTbTSchema(pHandle->pVnodeMeta, pHandle->msgIter.uid, sversion);
    if (pHandle->pSchema == NULL) {

--- a/source/dnode/vnode/src/tsdb/tsdbRead.c
+++ b/source/dnode/vnode/src/tsdb/tsdbRead.c
@@ -1638,9 +1638,7 @@ static int32_t mergeTwoRowFromMem(STsdbReadHandle* pTsdbReadHandle, int32_t capa
  int32_t numOfColsOfRow1 = 0;

  if (pSchema1 == NULL) {
-    // pSchema1 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, TD_ROW_SVER(row1));
-    // TODO: use the real schemaVersion
-    pSchema1 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, 1);
+    pSchema1 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, TD_ROW_SVER(row1));
  }

 #ifdef TD_DEBUG_PRINT_ROW
@@ -1657,9 +1655,7 @@ static int32_t mergeTwoRowFromMem(STsdbReadHandle* pTsdbReadHandle, int32_t capa
  if (row2) {
    isRow2DataRow = TD_IS_TP_ROW(row2);
    if (pSchema2 == NULL) {
-      // pSchema2 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, TD_ROW_SVER(row2));
-      // TODO: use the real schemaVersion
-      pSchema2 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, 1);
+      pSchema2 = metaGetTbTSchema(REPO_META(pTsdbReadHandle->pTsdb), uid, TD_ROW_SVER(row2));
    }
    if (isRow2DataRow) {
      numOfColsOfRow2 = schemaNCols(pSchema2);

--- a/source/dnode/vnode/src/tsdb/tsdbReadImpl.c
+++ b/source/dnode/vnode/src/tsdb/tsdbReadImpl.c
--- a/source/dnode/vnode/src/vnd/vnodeSync.c
+++ b/source/dnode/vnode/src/vnd/vnodeSync.c
--- a/source/dnode/vnode/test/tqMetaTest.cpp
+++ b/source/dnode/vnode/test/tqMetaTest.cpp
--- a/source/libs/command/inc/commandInt.h
+++ b/source/libs/command/inc/commandInt.h
@@ -36,6 +36,8 @@ extern "C" {
 #define EXPLAIN_SORT_FORMAT "Sort"
 #define EXPLAIN_INTERVAL_FORMAT "Interval on Column %s"
 #define EXPLAIN_SESSION_FORMAT "Session"
+#define EXPLAIN_STATE_WINDOW_FORMAT "StateWindow on Column %s"
+#define EXPLAIN_PARITION_FORMAT "Partition on Column %s"
 #define EXPLAIN_ORDER_FORMAT "Order: %s"
 #define EXPLAIN_FILTER_FORMAT "Filter: "
 #define EXPLAIN_FILL_FORMAT "Fill: %s"
@@ -60,7 +62,7 @@ extern "C" {
 #define EXPLAIN_GROUPS_FORMAT "groups=%d"
 #define EXPLAIN_WIDTH_FORMAT "width=%d"
 #define EXPLAIN_FUNCTIONS_FORMAT "functions=%d"
-#define EXPLAIN_EXECINFO_FORMAT "cost=%" PRIu64 "..%" PRIu64 " rows=%" PRIu64
+#define EXPLAIN_EXECINFO_FORMAT "cost=%.3f..%.3f rows=%" PRIu64

 typedef struct SExplainGroup {
  int32_t   nodeNum;

--- a/source/libs/command/src/command.c
+++ b/source/libs/command/src/command.c
--- a/source/libs/command/src/explain.c
+++ b/source/libs/command/src/explain.c
--- a/source/libs/executor/inc/executorimpl.h
+++ b/source/libs/executor/inc/executorimpl.h
--- a/source/libs/executor/src/executorMain.c
+++ b/source/libs/executor/src/executorMain.c
--- a/source/libs/executor/src/executorimpl.c
+++ b/source/libs/executor/src/executorimpl.c
--- a/source/libs/executor/src/groupoperator.c
+++ b/source/libs/executor/src/groupoperator.c
--- a/source/libs/executor/src/joinoperator.c
+++ b/source/libs/executor/src/joinoperator.c
--- a/source/libs/executor/src/scanoperator.c
+++ b/source/libs/executor/src/scanoperator.c
--- a/source/libs/executor/src/timewindowoperator.c
+++ b/source/libs/executor/src/timewindowoperator.c
--- a/source/libs/function/inc/builtins.h
+++ b/source/libs/function/inc/builtins.h
--- a/source/libs/function/inc/builtinsimpl.h
+++ b/source/libs/function/inc/builtinsimpl.h
--- a/source/libs/function/inc/functionMgtInt.h
+++ b/source/libs/function/inc/functionMgtInt.h
--- a/source/libs/function/src/builtins.c
+++ b/source/libs/function/src/builtins.c
--- a/source/libs/function/src/builtinsimpl.c
+++ b/source/libs/function/src/builtinsimpl.c
--- a/source/libs/function/src/functionMgt.c
+++ b/source/libs/function/src/functionMgt.c
--- a/source/libs/function/src/udfd.c
+++ b/source/libs/function/src/udfd.c
--- a/source/libs/index/CMakeLists.txt
+++ b/source/libs/index/CMakeLists.txt
--- a/source/libs/index/inc/indexTfile.h
+++ b/source/libs/index/inc/indexTfile.h
--- a/source/libs/index/src/index.c
+++ b/source/libs/index/src/index.c
--- a/source/libs/index/src/indexCache.c
+++ b/source/libs/index/src/indexCache.c
--- a/source/libs/index/src/indexFstCountingWriter.c
+++ b/source/libs/index/src/indexFstCountingWriter.c
--- a/source/libs/index/src/indexTfile.c
+++ b/source/libs/index/src/indexTfile.c
--- a/source/libs/index/test/indexTests.cc
+++ b/source/libs/index/test/indexTests.cc
--- a/source/libs/nodes/src/nodesCloneFuncs.c
+++ b/source/libs/nodes/src/nodesCloneFuncs.c
--- a/source/libs/nodes/src/nodesCodeFuncs.c
+++ b/source/libs/nodes/src/nodesCodeFuncs.c
--- a/source/libs/nodes/src/nodesTraverseFuncs.c
+++ b/source/libs/nodes/src/nodesTraverseFuncs.c
--- a/source/libs/nodes/src/nodesUtilFuncs.c
+++ b/source/libs/nodes/src/nodesUtilFuncs.c
--- a/source/libs/parser/inc/parAst.h
+++ b/source/libs/parser/inc/parAst.h
--- a/source/libs/parser/inc/parInsertData.h
+++ b/source/libs/parser/inc/parInsertData.h
--- a/source/libs/parser/src/parAstCreater.c
+++ b/source/libs/parser/src/parAstCreater.c
--- a/source/libs/parser/src/parCalcConst.c
+++ b/source/libs/parser/src/parCalcConst.c
--- a/source/libs/parser/src/parInsert.c
+++ b/source/libs/parser/src/parInsert.c
--- a/source/libs/parser/src/parInsertData.c
+++ b/source/libs/parser/src/parInsertData.c
--- a/source/libs/parser/src/parTranslater.c
+++ b/source/libs/parser/src/parTranslater.c
--- a/source/libs/parser/src/parUtil.c
+++ b/source/libs/parser/src/parUtil.c
--- a/source/libs/parser/test/mockCatalog.cpp
+++ b/source/libs/parser/test/mockCatalog.cpp
--- a/source/libs/planner/CMakeLists.txt
+++ b/source/libs/planner/CMakeLists.txt
--- a/source/libs/planner/src/planLogicCreater.c
+++ b/source/libs/planner/src/planLogicCreater.c
--- a/source/libs/planner/src/planOptimizer.c
+++ b/source/libs/planner/src/planOptimizer.c
--- a/source/libs/planner/src/planPhysiCreater.c
+++ b/source/libs/planner/src/planPhysiCreater.c
--- a/source/libs/planner/src/planner.c
+++ b/source/libs/planner/src/planner.c
--- a/source/libs/planner/test/planBasicTest.cpp
+++ b/source/libs/planner/test/planBasicTest.cpp
--- a/source/libs/planner/test/planGroupByTest.cpp
+++ b/source/libs/planner/test/planGroupByTest.cpp
--- a/source/libs/planner/test/planJoinTest.cpp
+++ b/source/libs/planner/test/planJoinTest.cpp
--- a/source/libs/planner/test/planOptimizeTest.cpp
+++ b/source/libs/planner/test/planOptimizeTest.cpp
--- a/source/libs/planner/test/planTestUtil.cpp
+++ b/source/libs/planner/test/planTestUtil.cpp
--- a/source/libs/scalar/src/scalar.c
+++ b/source/libs/scalar/src/scalar.c
--- a/source/libs/scalar/src/sclfunc.c
+++ b/source/libs/scalar/src/sclfunc.c
--- a/source/libs/scalar/src/sclvector.c
+++ b/source/libs/scalar/src/sclvector.c
--- a/source/libs/scalar/test/scalar/scalarTests.cpp
+++ b/source/libs/scalar/test/scalar/scalarTests.cpp
--- a/source/libs/stream/src/tstreamUpdate.c
+++ b/source/libs/stream/src/tstreamUpdate.c
--- a/source/libs/sync/inc/syncIO.h
+++ b/source/libs/sync/inc/syncIO.h
--- a/source/libs/sync/inc/syncInt.h
+++ b/source/libs/sync/inc/syncInt.h
--- a/source/libs/sync/inc/syncVoteMgr.h
+++ b/source/libs/sync/inc/syncVoteMgr.h
--- a/source/libs/sync/src/syncAppendEntries.c
+++ b/source/libs/sync/src/syncAppendEntries.c
--- a/source/libs/sync/src/syncCommit.c
+++ b/source/libs/sync/src/syncCommit.c
--- a/source/libs/sync/src/syncMain.c
+++ b/source/libs/sync/src/syncMain.c
--- a/source/libs/sync/src/syncMessage.c
+++ b/source/libs/sync/src/syncMessage.c
--- a/source/libs/sync/src/syncVoteMgr.c
+++ b/source/libs/sync/src/syncVoteMgr.c
--- a/source/libs/sync/test/syncConfigChangeTest.cpp
+++ b/source/libs/sync/test/syncConfigChangeTest.cpp
--- a/source/libs/sync/test/syncSnapshotTest.cpp
+++ b/source/libs/sync/test/syncSnapshotTest.cpp
--- a/source/libs/transport/inc/transComm.h
+++ b/source/libs/transport/inc/transComm.h
--- a/source/libs/transport/src/trans.c
+++ b/source/libs/transport/src/trans.c
--- a/source/libs/wal/src/walMgmt.c
+++ b/source/libs/wal/src/walMgmt.c
--- a/source/util/src/terror.c
+++ b/source/util/src/terror.c
--- a/source/util/src/tutil.c
+++ b/source/util/src/tutil.c
--- a/tests/script/jenkins/basic.txt
+++ b/tests/script/jenkins/basic.txt
--- a/tests/script/sh/deploy.sh
+++ b/tests/script/sh/deploy.sh
--- a/tests/script/tsim/mnode/basic2.sim
+++ b/tests/script/tsim/mnode/basic2.sim
--- a/tests/script/tsim/stable/add_column.sim
+++ b/tests/script/tsim/stable/add_column.sim
--- a/tests/script/tsim/stable/column_drop.sim
+++ b/tests/script/tsim/stable/column_drop.sim
--- a/tests/script/tsim/stable/column_modify.sim
+++ b/tests/script/tsim/stable/column_modify.sim
--- a/tests/script/tsim/stream/session0.sim
+++ b/tests/script/tsim/stream/session0.sim
--- a/tests/script/tsim/stream/session1.sim
+++ b/tests/script/tsim/stream/session1.sim
--- a/tests/script/tsim/sync/insertDataByRunBack.sim
+++ b/tests/script/tsim/sync/insertDataByRunBack.sim
--- a/tests/script/tsim/sync/threeReplica1VgElectWihtInsert.sim
+++ b/tests/script/tsim/sync/threeReplica1VgElectWihtInsert.sim
--- a/tests/script/tsim/trans/lossdata1.sim
+++ b/tests/script/tsim/trans/lossdata1.sim
--- a/tests/system-test/0-others/udfTest.py
+++ b/tests/system-test/0-others/udfTest.py
--- a/tests/system-test/0-others/udf_create.py
+++ b/tests/system-test/0-others/udf_create.py
--- a/tests/system-test/0-others/udf_restart_taosd.py
+++ b/tests/system-test/0-others/udf_restart_taosd.py
--- a/tests/system-test/1-insert/insertWithMoreVgroup.py
+++ b/tests/system-test/1-insert/insertWithMoreVgroup.py
--- a/tests/system-test/1-insert/manyVgroups.json
+++ b/tests/system-test/1-insert/manyVgroups.json
--- a/tests/system-test/2-query/check_tsdb.py
+++ b/tests/system-test/2-query/check_tsdb.py
--- a/tests/system-test/2-query/json_tag.py
+++ b/tests/system-test/2-query/json_tag.py
--- a/tests/system-test/7-tmq/subscribeDb.py
+++ b/tests/system-test/7-tmq/subscribeDb.py
--- a/tests/system-test/7-tmq/subscribeStb.py
+++ b/tests/system-test/7-tmq/subscribeStb.py
--- a/tests/system-test/7-tmq/subscribeStb0.py
+++ b/tests/system-test/7-tmq/subscribeStb0.py
--- a/tests/system-test/7-tmq/subscribeStb2.py
+++ b/tests/system-test/7-tmq/subscribeStb2.py
--- a/tests/system-test/fulltest.sh
+++ b/tests/system-test/fulltest.sh
--- a/tests/test/c/sdbDump.c
+++ b/tests/test/c/sdbDump.c
--- a/tests/test/c/tmqSim.c
+++ b/tests/test/c/tmqSim.c
--- a/taos-tools @ a8bb88c9
+++ b/taos-tools @ a8bb88c9