Merge branch '3.0' into 3.0test/jcy

637eb12c · jiacy-jcy · ef9302a5 · bd76ae52 · 637eb12c · 637eb12c
47 changed files
--- a/docs/en/07-develop/02-model/index.mdx
+++ b/docs/en/07-develop/02-model/index.mdx
@@ -9,15 +9,15 @@ The data model employed by TDengine is similar to that of a relational database.
 The [characteristics of time-series data](https://www.taosdata.com/blog/2019/07/09/86.html) from different data collection points may be different. Characteristics include collection frequency, retention policy and others which determine how you create and configure the database. For e.g. days to keep, number of replicas, data block size, whether data updates are allowed and other configurable parameters would be determined by the characteristics of your data and your business requirements. For TDengine to operate with the best performance, we strongly recommend that you create and configure different databases for data with different characteristics. This allows you, for example, to set up different storage and retention policies. When creating a database, there are a lot of parameters that can be configured such as, the days to keep data, the number of replicas, the number of memory blocks, time precision, the minimum and maximum number of rows in each data block, whether compression is enabled, the time range of the data in single data file and so on. Below is an example of the SQL statement to create a database.
 ```sql
-CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
+CREATE DATABASE power KEEP 365 DURATION 10 BUFFER 16 VGROUPS 100 WAL 1;
 ```
 In the above SQL statement:
 - a database named "power" will be created
 - the data in it will be kept for 365 days, which means that data older than 365 days will be deleted automatically
 - a new data file will be created every 10 days
- the number of memory blocks is 6
+- the size of memory cache for writing is 16 MB
- data is allowed to be updated
+- data will first be written to the WAL without FSYNC
 For more details please refer to [Database](/taos-sql/database).
@@ -30,7 +30,6 @@ USE power;
 :::note
 - Any table or STable must belong to a database. To create a table or STable, the database it belongs to must be ready.
- JOIN operations can't be performed on tables from two different databases.
 - Timestamp needs to be specified when inserting rows or querying historical rows.
 :::
@@ -52,7 +51,7 @@ Similar to creating a regular table, when creating a STable, the name and schema
 For each kind of data collection point, a corresponding STable must be created. There may be many STables in an application. For electrical power system, we need to create a STable respectively for meters, transformers, busbars, switches. There may be multiple kinds of data collection points on a single device, for example there may be one data collection point for electrical data like current and voltage and another data collection point for environmental data like temperature, humidity and wind direction. Multiple STables are required for these kinds of devices.
-At most 4096 (or 1024 prior to version 2.1.7.0) columns are allowed in a STable. If there are more than 4096 of metrics to be collected for a data collection point, multiple STables are required. There can be multiple databases in a system, while one or more STables can exist in a database.
+At most 4096 columns are allowed in a STable. If more than 4096 metrics need to be collected for a data collection point, multiple STables are required. There can be multiple databases in a system, while one or more STables can exist in a database.
 ## Create Table
@@ -66,12 +65,11 @@ In the above SQL statement, "d1001" is the table name, "meters" is the STable na
 In the TDengine system, it's recommended to create a table for a data collection point via STable. A table created via STable is called subtable in some parts of the TDengine documentation. All SQL commands applied on regular tables can be applied on subtables.
-:::warning
-It's not recommended to create a table in a database while using a STable from another database as template.
 :::tip
 It's suggested to use the globally unique ID of a data collection point as the table name. For example the device serial number could be used as a unique ID. If a unique ID doesn't exist, multiple IDs that are not globally unique can be combined to form a globally unique ID. It's not recommended to use a globally unique ID as tag value.
+:::
 ## Create Table Automatically
 In some circumstances, it's unknown whether the table already exists when inserting rows. The table can be created automatically using the SQL statement below, and nothing will happen if the table already exists.

--- a/docs/en/07-develop/03-insert-data/01-sql-writing.mdx
+++ b/docs/en/07-develop/03-insert-data/01-sql-writing.mdx
@@ -42,7 +42,7 @@ INSERT INTO d1001 VALUES (1538548684000, 10.2, 220, 0.23) (1538548696650, 10.3,
 ### Insert into Multiple Tables
-Data can be inserted into multiple tables in the same SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".
+Data can be inserted into multiple tables in a single SQL statement. The example below inserts 2 rows into table "d1001" and 1 row into table "d1002".
 ```sql
 INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6, 218, 0.33) d1002 VALUES (1538548696800, 12.3, 221, 0.31);
@@ -52,15 +52,15 @@ For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).
 :::info
- Inserting in batches can improve performance. Normally, the higher the batch size, the better the performance. Please note that a single row can't exceed 48K bytes and each SQL statement can't exceed 1MB.
+- Inserting in batches can improve performance. Normally, the larger the batch size, the better the performance. Please note that a single row can't exceed 48 KB and each SQL statement can't exceed 1 MB.
- Inserting with multiple threads can also improve performance. However, depending on the system resources on the application side and the server side, when the number of inserting threads grows beyond a specific point the performance may drop instead of improving. The proper number of threads needs to be tested in a specific environment to find the best number.
+- Inserting with multiple threads can also improve performance. However, once the number of inserting threads grows beyond a certain point, performance may drop instead of improving. The best number of threads depends on the system resources on the server side, the system resources on the client side, the table schemas, and other factors, so it needs to be determined by testing in your specific environment.
 :::
 :::warning
- If the timestamp for the row to be inserted already exists in the table, the behavior depends on the value of parameter `UPDATE`. If it's set to 0 (the default value), the row will be discarded. If it's set to 1, the new values will override the old values for the same row.
+- If the timestamp of the row to be inserted already exists in the table, the old data is overwritten by the new values for the columns for which new values are provided; columns for which no new values are provided are not affected.
- The timestamp to be inserted must be newer than the timestamp of subtracting current time by the parameter `KEEP`. If `KEEP` is set to 3650 days, then the data older than 3650 days ago can't be inserted. The timestamp to be inserted can't be newer than the timestamp of current time plus parameter `DAYS`. If `DAYS` is set to 2, the data newer than 2 days later can't be inserted.
+- The timestamp to be inserted must be newer than the current time minus the parameter `KEEP`. If `KEEP` is set to 3650 days, data older than 3650 days can't be inserted. The timestamp to be inserted also can't be newer than the current time plus the parameter `DURATION`. If `DURATION` is set to 2, data more than 2 days in the future can't be inserted.
 :::
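The insertable time window described in the warning above can be sketched as a small C check. This is an illustration only: `ts_in_insert_window`, `MS_PER_DAY` and the parameter names are hypothetical, and the sketch assumes millisecond timestamps with `KEEP` and `DURATION` both expressed in days. It is not TDengine's server-side validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MS_PER_DAY (24LL * 60 * 60 * 1000)

/* Hypothetical helper: true when ts_ms is newer than (now - KEEP) and
 * not newer than (now + DURATION). All times are in milliseconds. */
static bool ts_in_insert_window(int64_t ts_ms, int64_t now_ms, int32_t keep_days, int32_t duration_days) {
  assert(keep_days > 0 && duration_days > 0);
  int64_t oldest = now_ms - (int64_t)keep_days * MS_PER_DAY;
  int64_t newest = now_ms + (int64_t)duration_days * MS_PER_DAY;
  return ts_ms > oldest && ts_ms <= newest;
}
```

A client could run such a check before building a batch so that rows outside the window are reported locally instead of being rejected by the server.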
@@ -101,7 +101,7 @@ For more details about `INSERT` please refer to [INSERT](/taos-sql/insert).
 ### Insert with Parameter Binding
-TDengine also provides API support for parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. From version 2.1.1.0 and 2.1.2.0, parameter binding support for inserting data has improved significantly to improve the insert performance by avoiding the cost of parsing SQL statements.
+TDengine also provides API support for parameter binding. Similar to MySQL, only `?` can be used in these APIs to represent the parameters to bind. Inserting data through parameter binding avoids the cost of parsing SQL statements, which significantly improves insert performance in most cases.
 Parameter binding is available only with native connection.

--- a/docs/zh/07-develop/02-model/index.mdx
+++ b/docs/zh/07-develop/02-model/index.mdx
@@ -8,13 +8,13 @@ TDengine 采用类关系型数据模型，需要建库、建表。因此对于
 ## 创建库
-不同类型的数据采集点往往具有不同的数据特征，包括数据采集频率的高低，数据保留时间的长短，副本的数目，数据块的大小，是否允许更新数据等等。为了在各种场景下 TDengine 都能最大效率的工作，TDengine 建议将不同数据特征的表创建在不同的库里，因为每个库可以配置不同的存储策略。创建一个库时，除 SQL 标准的选项外，还可以指定保留时长、副本数、内存块个数、时间精度、文件块里最大最小记录条数、是否压缩、一个数据文件覆盖的天数等多种参数。比如：
+不同类型的数据采集点往往具有不同的数据特征，包括数据采集频率的高低，数据保留时间的长短，副本的数目，数据块的大小，是否允许更新数据等等。为了在各种场景下 TDengine 都能最大效率的工作，TDengine 建议将不同数据特征的表创建在不同的库里，因为每个库可以配置不同的存储策略。创建一个库时，除 SQL 标准的选项外，还可以指定保留时长、副本数、缓存大小、时间精度、文件块里最大最小记录条数、是否压缩、一个数据文件覆盖的天数等多种参数。比如：
 ```sql
-CREATE DATABASE power KEEP 365 DAYS 10 BLOCKS 6 UPDATE 1;
+CREATE DATABASE power KEEP 365 DURATION 10 BUFFER 16 VGROUPS 100 WAL 1;
 ```
-上述语句将创建一个名为 power 的库，这个库的数据将保留 365 天（超过 365 天将被自动删除），每 10 天一个数据文件，内存块数为 6，允许更新数据。详细的语法及参数请见 [数据库管理](/taos-sql/database) 章节。
+上述语句将创建一个名为 power 的库，这个库的数据将保留 365 天（超过 365 天将被自动删除），每 10 天一个数据文件，每个 VNODE 的写入内存池的大小为 16 MB，数据库的 VGROUPS 数量为 100，对该数据库的写入会写 WAL 但不执行 FSYNC。详细的语法及参数请见 [数据库管理](/taos-sql/database) 章节。
 创建库之后，需要使用 SQL 命令 `USE` 将当前库切换过来，例如：
@@ -27,7 +27,6 @@ USE power;
 :::note
 - 任何一张表或超级表必须属于某个库，在创建表之前，必须先创建库。
- 处于两个不同库的表是不能进行 JOIN 操作的。
 - 创建并插入记录、查询历史记录的时候，均需要指定时间戳。
 :::
@@ -40,15 +39,11 @@ USE power;
 CREATE STABLE meters (ts timestamp, current float, voltage int, phase float) TAGS (location binary(64), groupId int);
 ```
-:::note
-这一指令中的 STABLE 关键字，在 2.0.15 之前的版本中需写作 TABLE 。
-:::
 与创建普通表一样，创建超级表时，需要提供表名（示例中为 meters），表结构 Schema，即数据列的定义。第一列必须为时间戳（示例中为 ts)，其他列为采集的物理量（示例中为 current, voltage, phase)，数据类型可以为整型、浮点型、字符串等。除此之外，还需要提供标签的 schema (示例中为 location, groupId)，标签的数据类型可以为整型、浮点型、字符串等。采集点的静态属性往往可以作为标签，比如采集点的地理位置、设备型号、设备组 ID、管理员 ID 等等。标签的 schema 可以事后增加、删除、修改。具体定义以及细节请见 [TAOS SQL 的超级表管理](/taos-sql/stable) 章节。
 每一种类型的数据采集点需要建立一个超级表，因此一个物联网系统，往往会有多个超级表。对于电网，我们就需要对智能电表、变压器、母线、开关等都建立一个超级表。在物联网中，一个设备就可能有多个数据采集点（比如一台风力发电的风机，有的采集点采集电流、电压等电参数，有的采集点采集温度、湿度、风向等环境参数），这个时候，对这一类型的设备，需要建立多张超级表。
-一张超级表最多容许 4096 列 （在 2.1.7.0 版本之前，列数限制为 1024 列），如果一个采集点采集的物理量个数超过 4096，需要建多张超级表来处理。一个系统可以有多个 DB，一个 DB 里可以有一到多个超级表。
+一张超级表最多容许 4096 列，如果一个采集点采集的物理量个数超过 4096，需要建多张超级表来处理。一个系统可以有多个 DB，一个 DB 里可以有一到多个超级表。
 ## 创建表
@@ -60,11 +55,6 @@ CREATE TABLE d1001 USING meters TAGS ("California.SanFrancisco", 2);
 其中 d1001 是表名，meters 是超级表的表名，后面紧跟标签 Location 的具体标签值 "California.SanFrancisco"，标签 groupId 的具体标签值 2。虽然在创建表时，需要指定标签值，但可以事后修改。详细细则请见 [TAOS SQL 的表管理](/taos-sql/table) 章节。
-:::warning
-目前 TDengine 没有从技术层面限制使用一个 database (db1) 的超级表作为模板建立另一个 database (db2) 的子表，后续会禁止这种用法，不建议使用这种方法建表。
-:::
 TDengine 建议将数据采集点的全局唯一 ID 作为表名(比如设备序列号）。但对于有的场景，并没有唯一的 ID，可以将多个 ID 组合成一个唯一的 ID。不建议将具有唯一性的 ID 作为标签值。
 ### 自动建表

--- a/docs/zh/07-develop/03-insert-data/01-sql-writing.mdx
+++ b/docs/zh/07-develop/03-insert-data/01-sql-writing.mdx
@@ -52,15 +52,15 @@ INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6,
 :::info
- 要提高写入效率，需要批量写入。一批写入的记录条数越多，插入效率就越高。但一条记录不能超过 48K，一条 SQL 语句总长度不能超过 1M 。
+- 要提高写入效率，需要批量写入。一般来说一批写入的记录条数越多，插入效率就越高。但一条记录不能超过 48K，一条 SQL 语句总长度不能超过 1M 。
- TDengine 支持多线程同时写入，要进一步提高写入速度，一个客户端需要打开 20 个以上的线程同时写。但线程数达到一定数量后，无法再提高，甚至还会下降，因为线程频繁切换，带来额外开销。
+- TDengine 支持多线程同时写入，要进一步提高写入速度，一个客户端需要打开多个线程同时写。但线程数达到一定数量后，无法再提高，甚至还会下降，因为线程频繁切换，会带来额外开销，合适的线程数量与服务端的处理能力，服务端的具体配置，数据库的参数，数据定义的 Schema，写入数据的 Batch Size 等很多因素相关。一般来说，服务端和客户端处理能力越强，所能支持的并发写入的线程可以越多；数据库配置时的 vgroups 越多（但仍然要在服务端的处理能力以内）则所能支持的并发写入越多；数据定义的 Schema 越简单，所能支持的并发写入越多。
 :::
 :::warning
- 对同一张表，如果新插入记录的时间戳已经存在，默认情形下（UPDATE=0）新记录将被直接抛弃，也就是说，在一张表里，时间戳必须是唯一的。如果应用自动生成记录，很有可能生成的时间戳是一样的，这样，成功插入的记录条数会小于应用插入的记录条数。如果在创建数据库时使用了 UPDATE 1 选项，插入相同时间戳的新记录将覆盖原有记录。
+- 对同一张表，如果新插入记录的时间戳已经存在，则指定了新值的列会用新值覆盖旧值，而没有指定新值的列则不受影响。
- 写入的数据的时间戳必须大于当前时间减去配置参数 keep 的时间。如果 keep 配置为 3650 天，那么无法写入比 3650 天还早的数据。写入数据的时间戳也不能大于当前时间加配置参数 days。如果 days 为 2，那么无法写入比当前时间还晚 2 天的数据。
+- 写入的数据的时间戳必须大于当前时间减去配置参数 keep 的时间。如果 keep 配置为 3650 天，那么无法写入比 3650 天还早的数据。写入数据的时间戳也不能大于当前时间加配置参数 duration。如果 duration 为 2，那么无法写入比当前时间还晚 2 天的数据。
 :::
@@ -104,7 +104,7 @@ INSERT INTO d1001 VALUES (1538548685000, 10.3, 219, 0.31) (1538548695000, 12.6,
 ### 参数绑定写入
-TDengine 也提供了支持参数绑定的 Prepare API，与 MySQL 类似，这些 API 目前也仅支持用问号 `?` 来代表待绑定的参数。从 2.1.1.0 和 2.1.2.0 版本开始，TDengine 大幅改进了参数绑定接口对数据写入（INSERT）场景的支持。这样在通过参数绑定接口写入数据时，就避免了 SQL 语法解析的资源消耗，从而在绝大多数情况下显著提升写入性能。
+TDengine 也提供了支持参数绑定的 Prepare API，与 MySQL 类似，这些 API 目前也仅支持用问号 `?` 来代表待绑定的参数。在通过参数绑定接口写入数据时，就避免了 SQL 语法解析的资源消耗，从而在绝大多数情况下显著提升写入性能。
 需要注意的是，只有使用原生连接的连接器，才能使用参数绑定功能。

--- a/include/util/thash.h
+++ b/include/util/thash.h
@@ -188,7 +188,7 @@ void *taosHashGetKey(void *data, size_t* keyLen);
 void *taosHashAcquire(SHashObj *pHashObj, const void *key, size_t keyLen);
 /**
- * release the prevous acquired obj
+ * release the previous acquired obj
 *
 * @param pHashObj
 * @param data

--- a/source/client/src/clientEnv.c
+++ b/source/client/src/clientEnv.c
@@ -189,12 +189,15 @@ void destroyTscObj(void *pObj) {
  SClientHbKey connKey = {.tscRid = pTscObj->id, .connType = pTscObj->connType};
  hbDeregisterConn(pTscObj->pAppInfo->pAppHbMgr, connKey);
-  int64_t connNum = atomic_sub_fetch_64(&pTscObj->pAppInfo->numOfConns, 1);
  destroyAllRequests(pTscObj->pRequests);
+  taosHashCleanup(pTscObj->pRequests);
  schedulerStopQueryHb(pTscObj->pAppInfo->pTransporter);
  tscDebug("connObj 0x%" PRIx64 " p:%p destroyed, remain inst totalConn:%" PRId64, pTscObj->id, pTscObj,
           pTscObj->pAppInfo->numOfConns);
+  int64_t connNum = atomic_sub_fetch_64(&pTscObj->pAppInfo->numOfConns, 1);
  if (0 == connNum) {
    destroyAppInst(pTscObj->pAppInfo);
  }

--- a/source/client/src/clientHb.c
+++ b/source/client/src/clientHb.c
@@ -671,8 +671,7 @@ static void *hbThreadFunc(void *param) {
  }
 #endif
  while (1) {
-    int8_t threadStop = atomic_val_compare_exchange_8(&clientHbMgr.threadStop, 1, 2);
+    if (1 == clientHbMgr.threadStop) {
-    if (1 == threadStop) {
      break;
    }
@@ -760,9 +759,7 @@ static void hbStopThread() {
    return;
  }
-  while (2 != atomic_load_8(&clientHbMgr.threadStop)) {
+  taosThreadJoin(clientHbMgr.thread, NULL);
-    taosUsleep(10);
-  }
  tscDebug("hb thread stopped");
 }

--- a/source/common/src/systable.c
+++ b/source/common/src/systable.c
@@ -123,6 +123,9 @@ static const SSysDbTableSchema userStbsSchema[] = {
    {.name = "tags", .bytes = 4, .type = TSDB_DATA_TYPE_INT},
    {.name = "last_update", .bytes = 8, .type = TSDB_DATA_TYPE_TIMESTAMP},
    {.name = "table_comment", .bytes = TSDB_TB_COMMENT_LEN + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
+    {.name = "watermark", .bytes = 64 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
+    {.name = "max_delay", .bytes = 64 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
+    {.name = "rollup", .bytes = 128 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
 };
 static const SSysDbTableSchema streamSchema[] = {
@@ -146,8 +149,8 @@ static const SSysDbTableSchema userTblsSchema[] = {
    {.name = "uid", .bytes = 8, .type = TSDB_DATA_TYPE_BIGINT},
    {.name = "vgroup_id", .bytes = 4, .type = TSDB_DATA_TYPE_INT},
    {.name = "ttl", .bytes = 4, .type = TSDB_DATA_TYPE_INT},
-    {.name = "table_comment", .bytes = TSDB_TB_COMMENT_LEN + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
+    {.name = "table_comment", .bytes = TSDB_TB_COMMENT_LEN - 1 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
-    {.name = "type", .bytes = 20 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
+    {.name = "type", .bytes = 21 + VARSTR_HEADER_SIZE, .type = TSDB_DATA_TYPE_VARCHAR},
 };
 static const SSysDbTableSchema userTblDistSchema[] = {

--- a/source/dnode/mnode/impl/src/mndStb.c
+++ b/source/dnode/mnode/impl/src/mndStb.c
@@ -2109,7 +2109,7 @@ static int32_t mndRetrieveStb(SRpcMsg *pReq, SShowObj *pShow, SSDataBlock *pBloc
    pColInfo = taosArrayGet(pBlock->pDataBlock, cols++);
    colDataAppend(pColInfo, numOfRows, (const char *)&pStb->updateTime, false);  // number of tables
-    pColInfo = taosArrayGet(pBlock->pDataBlock, cols);
+    pColInfo = taosArrayGet(pBlock->pDataBlock, cols++);
    if (pStb->commentLen > 0) {
      char comment[TSDB_TB_COMMENT_LEN + VARSTR_HEADER_SIZE] = {0};
      STR_TO_VARSTR(comment, pStb->comment);
@@ -2122,6 +2122,34 @@ static int32_t mndRetrieveStb(SRpcMsg *pReq, SShowObj *pShow, SSDataBlock *pBloc
      colDataAppendNULL(pColInfo, numOfRows);
    }
+    char watermark[64 + VARSTR_HEADER_SIZE] = {0};
+    sprintf(varDataVal(watermark), "%" PRId64 "a,%" PRId64 "a", pStb->watermark[0], pStb->watermark[1]);
+    varDataSetLen(watermark, strlen(varDataVal(watermark)));
+    pColInfo = taosArrayGet(pBlock->pDataBlock, cols++);
+    colDataAppend(pColInfo, numOfRows, (const char *)watermark, false);
+    char maxDelay[64 + VARSTR_HEADER_SIZE] = {0};
+    sprintf(varDataVal(maxDelay), "%" PRId64 "a,%" PRId64 "a", pStb->maxdelay[0], pStb->maxdelay[1]);
+    varDataSetLen(maxDelay, strlen(varDataVal(maxDelay)));
+    pColInfo = taosArrayGet(pBlock->pDataBlock, cols++);
+    colDataAppend(pColInfo, numOfRows, (const char *)maxDelay, false);
+    char rollup[128 + VARSTR_HEADER_SIZE] = {0};
+    int32_t rollupNum = (int32_t)taosArrayGetSize(pStb->pFuncs);
+    for (int32_t i = 0; i < rollupNum; ++i) {
+      char *funcName = taosArrayGet(pStb->pFuncs, i);
+      if (i) {
+        strcat(varDataVal(rollup), ", ");
+      }
+      strcat(varDataVal(rollup), funcName);
+    }
+    varDataSetLen(rollup, strlen(varDataVal(rollup)));
+    pColInfo = taosArrayGet(pBlock->pDataBlock, cols++);
+    colDataAppend(pColInfo, numOfRows, (const char *)rollup, false);
    numOfRows++;
    sdbRelease(pSdb, pStb);
  }
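The `rollup` column above is built by appending function names to a fixed 128-byte buffer with `strcat`, which relies on the joined names always fitting. A bounds-checked variant of the same join can be sketched as follows. `join_names` is a hypothetical helper written for illustration; it works on a plain C string and leaves out the VARSTR header handling used in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Joins names with ", " into dst, whose capacity cap includes the NUL.
 * Returns the resulting length, or -1 if the next name would not fit.
 * dst is always NUL-terminated. */
static int join_names(char *dst, size_t cap, const char *const *names, size_t n) {
  assert(dst != NULL && cap > 0);
  size_t len = 0;
  dst[0] = '\0';
  for (size_t i = 0; i < n; ++i) {
    const char *sep = (i > 0) ? ", " : "";
    size_t      sepLen = strlen(sep);
    size_t      nameLen = strlen(names[i]);
    if (sepLen + nameLen >= cap - len) {
      return -1; /* would overflow: stop instead of writing past the end */
    }
    memcpy(dst + len, sep, sepLen);
    len += sepLen;
    memcpy(dst + len, names[i], nameLen);
    len += nameLen;
    dst[len] = '\0';
  }
  return (int)len;
}
```

On failure the buffer holds the names that did fit, so a caller can choose between reporting an error and displaying the partial list.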

--- a/source/dnode/vnode/src/inc/sma.h
+++ b/source/dnode/vnode/src/inc/sma.h
@@ -67,7 +67,6 @@ struct SRSmaStat {
  int64_t   submitVer;
  int64_t   refId;         // shared by fetch tasks
  int8_t    triggerStat;   // shared by fetch tasks
-  int8_t    runningStat;   // for persistence task 
  SHashObj *rsmaInfoHash;  // key: stbUid, value: SRSmaInfo;
 };
@@ -83,7 +82,6 @@ struct SSmaStat {
 #define SMA_RSMA_STAT(s)     (&(s)->rsmaStat)
 #define RSMA_INFO_HASH(r)    ((r)->rsmaInfoHash)
 #define RSMA_TRIGGER_STAT(r) (&(r)->triggerStat)
-#define RSMA_RUNNING_STAT(r) (&(r)->runningStat)
 #define RSMA_REF_ID(r)       ((r)->refId)
 #define RSMA_SUBMIT_VER(r)   ((r)->submitVer)
@@ -93,7 +91,7 @@ enum {
  TASK_TRIGGER_STAT_INACTIVE = 2,
  TASK_TRIGGER_STAT_PAUSED = 3,
  TASK_TRIGGER_STAT_CANCELLED = 4,
-  TASK_TRIGGER_STAT_FINISHED = 5,
+  TASK_TRIGGER_STAT_DROPPED = 5,
 };
 void  tdDestroySmaEnv(SSmaEnv *pSmaEnv);

--- a/source/dnode/vnode/src/inc/vnodeInt.h
+++ b/source/dnode/vnode/src/inc/vnodeInt.h
@@ -172,8 +172,9 @@ int32_t tdProcessTSmaCreate(SSma* pSma, int64_t version, const char* msg);
 int32_t tdProcessTSmaInsert(SSma* pSma, int64_t indexUid, const char* msg);
 int64_t tdRSmaGetMaxSubmitVer(SSma* pSma, int8_t level);
-int32_t tdProcessRSmaCreate(SVnode* pVnode, SVCreateStbReq* pReq);
+int32_t tdProcessRSmaCreate(SSma* pSma, SVCreateStbReq* pReq);
 int32_t tdProcessRSmaSubmit(SSma* pSma, void* pMsg, int32_t inputType);
+int32_t tdProcessRSmaDrop(SSma* pSma, SVDropStbReq* pReq);
 int32_t tdFetchTbUidList(SSma* pSma, STbUidStore** ppStore, tb_uid_t suid, tb_uid_t uid);
 int32_t tdUpdateTbUidList(SSma* pSma, STbUidStore* pUidStore);
 void    tdUidStoreDestory(STbUidStore* pStore);

--- a/source/dnode/vnode/src/sma/smaEnv.c
+++ b/source/dnode/vnode/src/sma/smaEnv.c
@@ -254,26 +254,7 @@ static void tdDestroyRSmaStat(void *pRSmaStat) {
    // step 1: set rsma trigger stat cancelled
    atomic_store_8(RSMA_TRIGGER_STAT(pStat), TASK_TRIGGER_STAT_CANCELLED);
-    // step 2: wait the persistence thread to finish
+    // step 2: destroy the rsma info and associated fetch tasks
-    int32_t nLoops = 0;
-    if (atomic_load_8(RSMA_RUNNING_STAT(pStat)) == 1) {
-      while (1) {
-        if (atomic_load_8(RSMA_TRIGGER_STAT(pStat)) == TASK_TRIGGER_STAT_FINISHED) {
-          smaDebug("vgId:%d, rsma persist task finished already", SMA_VID(pSma));
-          break;
-        } else {
-          smaDebug("vgId:%d, rsma persist task not finished yet since rsma stat in %" PRIi8, SMA_VID(pSma),
-                   atomic_load_8(RSMA_TRIGGER_STAT(pStat)));
-        }
-        ++nLoops;
-        if (nLoops > 1000) {
-          sched_yield();
-          nLoops = 0;
-        }
-      }
-    }
-    // step 3: destroy the rsma info and associated fetch tasks
    // TODO: use taosHashSetFreeFp when taosHashSetFreeFp is ready.
    if (taosHashGetSize(RSMA_INFO_HASH(pStat)) > 0) {
      void *infoHash = taosHashIterate(RSMA_INFO_HASH(pStat), NULL);
@@ -285,8 +266,8 @@ static void tdDestroyRSmaStat(void *pRSmaStat) {
    }
    taosHashCleanup(RSMA_INFO_HASH(pStat));
-    // step 5: wait all triggered fetch tasks finished
+    // step 3: wait all triggered fetch tasks finished
-    nLoops = 0;
+    int32_t nLoops = 0;
    while (1) {
      if (T_REF_VAL_GET((SSmaStat *)pStat) == 0) {
        smaDebug("vgId:%d, rsma fetch tasks all finished", SMA_VID(pSma));

--- a/source/dnode/vnode/src/sma/smaRollup.c
+++ b/source/dnode/vnode/src/sma/smaRollup.c
@@ -37,8 +37,6 @@ static SRSmaInfo *tdGetRSmaInfoBySuid(SSma *pSma, int64_t suid);
 static int32_t    tdRSmaFetchAndSubmitResult(SRSmaInfoItem *pItem, STSchema *pTSchema, int64_t suid, SRSmaStat *pStat,
                                             int8_t blkType);
 static void       tdRSmaFetchTrigger(void *param, void *tmrId);
-static void       tdRSmaPersistTrigger(void *param, void *tmrId);
-static void      *tdRSmaPersistExec(void *param);
 static void       tdRSmaQTaskInfoGetFName(int32_t vid, int64_t version, char *outputName);
 static int32_t tdRSmaQTaskInfoIterInit(SRSmaQTaskInfoIter *pIter, STFile *pTFile);
@@ -67,9 +65,12 @@ struct SRSmaInfo {
 static SRSmaInfo *tdGetRSmaInfoByItem(SRSmaInfoItem *pItem) {
  // adapt accordingly if definition of SRSmaInfo update
-  int32_t rsmaInfoHeadLen = sizeof(int64_t) + sizeof(STSchema *);
+  SRSmaInfo *pResult = NULL;
-  ASSERT(pItem->level == 1 || pItem->level == 2);
+  int32_t    rsmaInfoHeadLen = sizeof(int64_t) + sizeof(STSchema *);
-  return (SRSmaInfo *)POINTER_SHIFT(pItem, -sizeof(SRSmaInfoItem) * (pItem->level - 1) - rsmaInfoHeadLen);
+  ASSERT(pItem->level == TSDB_RETENTION_L1 || pItem->level == TSDB_RETENTION_L2);
+  pResult = (SRSmaInfo *)POINTER_SHIFT(pItem, -(sizeof(SRSmaInfoItem) * (pItem->level - 1) + rsmaInfoHeadLen));
+  ASSERT(pResult->pTSchema->numOfCols > 1);
+  return pResult;
 }
 struct SRSmaQTaskInfoItem {
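`tdGetRSmaInfoByItem` recovers the enclosing `SRSmaInfo` from a pointer to one of its items by subtracting a hand-computed header length, which is why its comment asks for the code to be adapted whenever the struct changes. The same idea can be sketched with `offsetof`, which lets the compiler compute the offset. `Info`, `Item` and `info_from_item` are hypothetical and only mirror the general shape (a header followed by two items); they are not the real TDengine definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  int8_t  level; /* 1 or 2; selects the slot in Info.items */
  int32_t max_delay;
} Item;

typedef struct {
  int64_t suid;
  void   *schema;
  Item    items[2];
} Info;

/* Recovers the enclosing Info from a pointer to one of its items. */
static Info *info_from_item(Item *item) {
  assert(item->level == 1 || item->level == 2);
  size_t offset = offsetof(Info, items) + sizeof(Item) * (size_t)(item->level - 1);
  return (Info *)((char *)item - offset);
}
```

Because the offset is derived from the struct definition, adding or reordering header fields does not require touching the recovery function.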
@@ -278,7 +279,7 @@ static int32_t tdSetRSmaInfoItemParams(SSma *pSma, SRSmaParam *param, SRSmaStat
    if (pItem->maxDelay > TSDB_MAX_ROLLUP_MAX_DELAY) {
      pItem->maxDelay = TSDB_MAX_ROLLUP_MAX_DELAY;
    }
-    pItem->level = (idx == 0 ? TSDB_RETENTION_L1 : TSDB_RETENTION_L2);
+    pItem->level = idx == 0 ? TSDB_RETENTION_L1 : TSDB_RETENTION_L2;
    smaInfo("vgId:%d table:%" PRIi64 " level:%" PRIi8 " maxdelay:%" PRIi64 " watermark:%" PRIi64
            ", finally maxdelay:%" PRIi32,
            SMA_VID(pSma), pRSmaInfo->suid, idx + 1, param->maxdelay[idx], param->watermark[idx], pItem->maxDelay);
@@ -375,20 +376,48 @@ _err:
 /**
 * @brief Check and init qTaskInfo_t, only applicable to stable with SRSmaParam currently
 *
- * @param pVnode
+ * @param pSma
 * @param pReq
 * @return int32_t
 */
-int32_t tdProcessRSmaCreate(SVnode *pVnode, SVCreateStbReq *pReq) {
+int32_t tdProcessRSmaCreate(SSma *pSma, SVCreateStbReq *pReq) {
-  SSma *pSma = pVnode->pSma;
+  SVnode *pVnode = pSma->pVnode;
  if (!pReq->rollup) {
-    smaTrace("vgId:%d, return directly since no rollup for stable %s %" PRIi64, SMA_VID(pSma), pReq->name, pReq->suid);
+    smaTrace("vgId:%d, not create rsma for stable %s %" PRIi64 " since no rollup in req", TD_VID(pVnode), pReq->name,
+             pReq->suid);
+    return TSDB_CODE_SUCCESS;
+  }
+  if (!VND_IS_RSMA(pVnode)) {
+    smaTrace("vgId:%d, not create rsma for stable %s %" PRIi64 " since vnd is not rsma", TD_VID(pVnode), pReq->name,
+             pReq->suid);
    return TSDB_CODE_SUCCESS;
  }
  return tdProcessRSmaCreateImpl(pSma, &pReq->rsmaParam, pReq->suid, pReq->name);
 }
+/**
+ * @brief drop cache for stb
+ *
+ * @param pSma
+ * @param pReq
+ * @return int32_t
+ */
+int32_t tdProcessRSmaDrop(SSma *pSma, SVDropStbReq *pReq) {
+  SVnode *pVnode = pSma->pVnode;
+  if (!VND_IS_RSMA(pVnode)) {
+    smaTrace("vgId:%d, not drop rsma for stable %s %" PRIi64 " since vnd is not rsma", TD_VID(pVnode), pReq->name,
+             pReq->suid);
+    return TSDB_CODE_SUCCESS;
+  }
+  smaDebug("vgId:%d, drop rsma for table %" PRIi64 " succeed", TD_VID(pVnode), pReq->suid);
+  return TSDB_CODE_SUCCESS;
+}
 /**
 * @brief store suid/[uids], prefer to use array and then hash
 *
@@ -1174,123 +1203,6 @@ _err:
  return TSDB_CODE_FAILED;
 }
-static void *tdRSmaPersistExec(void *param) {
-  setThreadName("rsma-task-persist");
-  SRSmaStat *pRSmaStat = param;
-  SSma      *pSma = pRSmaStat->pSma;
-  int8_t triggerStat = atomic_load_8(RSMA_TRIGGER_STAT(pRSmaStat));
-  if (TASK_TRIGGER_STAT_CANCELLED == triggerStat || TASK_TRIGGER_STAT_PAUSED == triggerStat) {
-    goto _end;
-  }
-  // execution
-  tdRSmaPersistExecImpl(pRSmaStat);
-_end:
-  if (TASK_TRIGGER_STAT_INACTIVE == atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat),
-                                                                  TASK_TRIGGER_STAT_INACTIVE,
-                                                                  TASK_TRIGGER_STAT_ACTIVE)) {
-    smaDebug("vgId:%d, rsma persist task is active again", SMA_VID(pSma));
-  } else if (TASK_TRIGGER_STAT_CANCELLED == atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat),
-                                                                          TASK_TRIGGER_STAT_CANCELLED,
-                                                                          TASK_TRIGGER_STAT_FINISHED)) {
-    smaDebug("vgId:%d, rsma persist task is cancelled", SMA_VID(pSma));
-  } else {
-    smaWarn("vgId:%d, rsma persist task in stat %" PRIi8, SMA_VID(pSma), atomic_load_8(RSMA_TRIGGER_STAT(pRSmaStat)));
-  }
-  atomic_store_8(RSMA_RUNNING_STAT(pRSmaStat), 0);
-  smaDebug("vgId:%d, release rsetId rsetId:%" PRIi64 " refId:%d", SMA_VID(pSma), smaMgmt.rsetId, pRSmaStat->refId);
-  tdReleaseSmaRef(smaMgmt.rsetId, pRSmaStat->refId, __func__, __LINE__);
-  taosThreadExit(NULL);
-  return NULL;
-}
-static void tdRSmaPersistTask(SRSmaStat *pRSmaStat) {
-  TdThreadAttr thAttr;
-  taosThreadAttrInit(&thAttr);
-  taosThreadAttrSetDetachState(&thAttr, PTHREAD_CREATE_DETACHED);
-  TdThread tid;
-  if (taosThreadCreate(&tid, &thAttr, tdRSmaPersistExec, pRSmaStat) != 0) {
-    if (TASK_TRIGGER_STAT_INACTIVE == atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat),
-                                                                    TASK_TRIGGER_STAT_INACTIVE,
-                                                                    TASK_TRIGGER_STAT_ACTIVE)) {
-      smaDebug("vgId:%d, persist task is active again", SMA_VID(pRSmaStat->pSma));
-    } else if (TASK_TRIGGER_STAT_CANCELLED == atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat),
-                                                                            TASK_TRIGGER_STAT_CANCELLED,
-                                                                            TASK_TRIGGER_STAT_FINISHED)) {
-      smaDebug("vgId:%d, persist task is cancelled and set finished", SMA_VID(pRSmaStat->pSma));
-    } else {
-      smaWarn("vgId:%d, persist task in abnormal stat %" PRIi8, SMA_VID(pRSmaStat->pSma),
-              atomic_load_8(RSMA_TRIGGER_STAT(pRSmaStat)));
-    }
-    atomic_store_8(RSMA_RUNNING_STAT(pRSmaStat), 0);
-    smaDebug("vgId:%d, release rsetId rsetId:%" PRIi64 " refId:%d)", SMA_VID(pRSmaStat->pSma), smaMgmt.rsetId,
-             pRSmaStat->refId);
-    tdReleaseSmaRef(smaMgmt.rsetId, pRSmaStat->refId, __func__, __LINE__);
-  }
-  taosThreadAttrDestroy(&thAttr);
-}
-/**
- * @brief trigger to persist rsma qTaskInfo
- *
- * @param param
- * @param tmrId
- */
-static void tdRSmaPersistTrigger(void *param, void *tmrId) {
-  SRSmaStat *rsmaStat = param;
-  SRSmaStat *pRSmaStat = (SRSmaStat *)taosAcquireRef(smaMgmt.rsetId, rsmaStat->refId);
-  ASSERT(0);
-  if (!pRSmaStat) {
-    smaDebug("rsma persistence task not start since already destroyed");
-    return;
-  }
-  int8_t tmrStat =
-      atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat), TASK_TRIGGER_STAT_ACTIVE, TASK_TRIGGER_STAT_INACTIVE);
-  switch (tmrStat) {
-    case TASK_TRIGGER_STAT_ACTIVE: {
-      atomic_store_8(RSMA_RUNNING_STAT(pRSmaStat), 1);
-      if (TASK_TRIGGER_STAT_CANCELLED != atomic_val_compare_exchange_8(RSMA_TRIGGER_STAT(pRSmaStat),
-                                                                       TASK_TRIGGER_STAT_CANCELLED,
-                                                                       TASK_TRIGGER_STAT_FINISHED)) {
-        smaDebug("vgId:%d, rsma persistence start since active", SMA_VID(pRSmaStat->pSma));
-        // start persist task
-        tdRSmaPersistTask(pRSmaStat);
-        // taosTmrReset(tdRSmaPersistTrigger, 5000, pRSmaStat, pRSmaStat->tmrHandle,
-        //              RSMA_TMR_ID(pRSmaStat));
-      } else {
-        atomic_store_8(RSMA_RUNNING_STAT(pRSmaStat), 0);
-      }
-      return;
-    } break;
-    case TASK_TRIGGER_STAT_CANCELLED: {
-      atomic_store_8(RSMA_TRIGGER_STAT(pRSmaStat), TASK_TRIGGER_STAT_FINISHED);
-      smaDebug("rsma persistence not start since cancelled and finished");
-    } break;
-    case TASK_TRIGGER_STAT_PAUSED: {
-      smaDebug("rsma persistence not start since paused");
-    } break;
-    case TASK_TRIGGER_STAT_INACTIVE: {
-      smaDebug("rsma persistence not start since inactive");
-    } break;
-    case TASK_TRIGGER_STAT_INIT: {
-      smaDebug("rsma persistence not start since init");
-    } break;
-    default: {
-      smaWarn("rsma persistence not start since unknown stat %" PRIi8, tmrStat);
-    } break;
-  }
-  taosReleaseRef(smaMgmt.rsetId, rsmaStat->refId);
-}
 /**
 * @brief trigger to get rsma result
 *
@@ -1314,8 +1226,7 @@ static void tdRSmaFetchTrigger(void *param, void *tmrId) {
  int8_t rsmaTriggerStat = atomic_load_8(RSMA_TRIGGER_STAT(pStat));
  switch (rsmaTriggerStat) {
    case TASK_TRIGGER_STAT_PAUSED:
-    case TASK_TRIGGER_STAT_CANCELLED:
+    case TASK_TRIGGER_STAT_CANCELLED: {
-    case TASK_TRIGGER_STAT_FINISHED: {
      tdReleaseSmaRef(smaMgmt.rsetId, pItem->refId, __func__, __LINE__);
      smaDebug("vgId:%d, not fetch rsma level %" PRIi8 " data since stat is %" PRIi8 ", rsetId rsetId:%" PRIi64
               " refId:%d",
@@ -1328,8 +1239,6 @@ static void tdRSmaFetchTrigger(void *param, void *tmrId) {
  SRSmaInfo *pRSmaInfo = tdGetRSmaInfoByItem(pItem);
-  ASSERT(pRSmaInfo->suid > 0);
  int8_t fetchTriggerStat =
      atomic_val_compare_exchange_8(&pItem->triggerStat, TASK_TRIGGER_STAT_ACTIVE, TASK_TRIGGER_STAT_INACTIVE);
  switch (fetchTriggerStat) {

--- a/source/dnode/vnode/src/tsdb/tsdbCache.c
+++ b/source/dnode/vnode/src/tsdb/tsdbCache.c
@@ -472,6 +472,7 @@ static int32_t getNextRowFromFS(void *iter, TSDBROW **ppRow) {
    case SFSNEXTROW_FILESET: {
      SDFileSet *pFileSet = NULL;
+    _next_fileset:
      if (--state->iFileSet >= 0) {
        pFileSet = (SDFileSet *)taosArrayGet(state->aDFileSet, state->iFileSet);
      } else {
@@ -508,6 +509,10 @@ static int32_t getNextRowFromFS(void *iter, TSDBROW **ppRow) {
      state->pBlockIdx = taosArraySearch(state->aBlockIdx, state->pBlockIdxExp, tCmprBlockIdx, TD_EQ);
      if (code) goto _err;
+      if (!state->pBlockIdx) {
+        goto _next_fileset;
+      }
      tMapDataReset(&state->blockMap);
      code = tsdbReadBlock(state->pDataFReader, state->pBlockIdx, &state->blockMap, NULL);
      /* code = tsdbReadBlock(state->pDataFReader, &state->blockIdx, &state->blockMap, NULL); */

--- a/source/dnode/vnode/src/tsdb/tsdbRead.c
+++ b/source/dnode/vnode/src/tsdb/tsdbRead.c
@@ -400,7 +400,7 @@ static int32_t tsdbReaderCreate(SVnode* pVnode, SQueryTableDataCond* pCond, STsd
  pReader->idStr = (idstr != NULL) ? strdup(idstr) : NULL;
  pReader->verRange = getQueryVerRange(pVnode, pCond, level);
  pReader->type = pCond->type;
-  pReader->window = updateQueryTimeWindow(pVnode->pTsdb, &pCond->twindows);
+  pReader->window = updateQueryTimeWindow(pReader->pTsdb, &pCond->twindows);
  ASSERT(pCond->numOfCols > 0);
@@ -2203,15 +2203,15 @@ static STsdb* getTsdbByRetentions(SVnode* pVnode, TSKEY winSKey, SRetention* ret
    if (level == TSDB_RETENTION_L0) {
      *pLevel = TSDB_RETENTION_L0;
-      tsdbDebug("vgId:%d, read handle %p rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L0, str);
+      tsdbDebug("vgId:%d, rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L0, str);
      return VND_RSMA0(pVnode);
    } else if (level == TSDB_RETENTION_L1) {
      *pLevel = TSDB_RETENTION_L1;
-      tsdbDebug("vgId:%d, read handle %p rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L1, str);
+      tsdbDebug("vgId:%d, rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L1, str);
      return VND_RSMA1(pVnode);
    } else {
      *pLevel = TSDB_RETENTION_L2;
-      tsdbDebug("vgId:%d, read handle %p rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L2, str);
+      tsdbDebug("vgId:%d, rsma level %d is selected to query %s", vgId, TSDB_RETENTION_L2, str);
      return VND_RSMA2(pVnode);
    }
  }

--- a/source/dnode/vnode/src/vnd/vnodeSvr.c
+++ b/source/dnode/vnode/src/vnd/vnodeSvr.c
@@ -417,7 +417,7 @@ static int32_t vnodeProcessCreateStbReq(SVnode *pVnode, int64_t version, void *p
    goto _err;
  }
-  if (tdProcessRSmaCreate(pVnode, &req) < 0) {
+  if (tdProcessRSmaCreate(pVnode->pSma, &req) < 0) {
    pRsp->code = terrno;
    goto _err;
  }
@@ -573,6 +573,11 @@ static int32_t vnodeProcessDropStbReq(SVnode *pVnode, int64_t version, void *pRe
    goto _exit;
  }
+  if (tdProcessRSmaDrop(pVnode->pSma, &req) < 0) {
+    rcode = terrno;
+    goto _exit;
+  }
  // return rsp
 _exit:
  pRsp->code = rcode;

--- a/source/libs/catalog/src/catalog.c
+++ b/source/libs/catalog/src/catalog.c
@@ -1293,6 +1293,7 @@ void catalogDestroy(void) {
  if (!taosCheckCurrentInDll()) {
    ctgClearCacheEnqueue(NULL, true, true, true);
+    taosThreadJoin(gCtgMgmt.updateThread, NULL);
  }
  taosHashCleanup(gCtgMgmt.pCluster);

--- a/source/libs/executor/inc/tfill.h
+++ b/source/libs/executor/inc/tfill.h
@@ -42,6 +42,7 @@ typedef struct SFillInfo {
  TSKEY     start;                // start timestamp
  TSKEY     end;                  // endKey for fill
  TSKEY     currentKey;           // current active timestamp, the value may be changed during the fill procedure.
+  int32_t   tsSlotId;             // primary timestamp slot id
  int32_t   order;                // order [TSDB_ORDER_ASC|TSDB_ORDER_DESC]
  int32_t   type;                 // fill type
  int32_t   numOfRows;            // number of rows in the input data block
@@ -74,8 +75,8 @@ struct SFillColInfo* createFillColInfo(SExprInfo* pExpr, int32_t numOfOutput, co
 bool taosFillHasMoreResults(struct SFillInfo* pFillInfo);
 SFillInfo* taosCreateFillInfo(int32_t order, TSKEY skey, int32_t numOfTags, int32_t capacity, int32_t numOfCols,
-                                     SInterval* pInterval, int32_t fillType,
+                              SInterval* pInterval, int32_t fillType, struct SFillColInfo* pCol, int32_t slotId,
-                                     struct SFillColInfo* pCol, const char* id);
+                              const char* id);
 void* taosDestroyFillInfo(struct SFillInfo *pFillInfo);
 int64_t taosFillResultDataBlock(struct SFillInfo* pFillInfo, SSDataBlock* p, int32_t capacity);

--- a/source/libs/executor/src/cachescanoperator.c
+++ b/source/libs/executor/src/cachescanoperator.c
+/*
+ * Copyright (c) 2019 TAOS Data, Inc. <jhtao@taosdata.com>
+ *
+ * This program is free software: you can use, redistribute, and/or modify
+ * it under the terms of the GNU Affero General Public License, version 3
+ * or later ("AGPL"), as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.
+ *
+ * You should have received a copy of the GNU Affero General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#include "os.h"
+#include "function.h"
+#include "tname.h"
+#include "tdatablock.h"
+#include "tmsg.h"
+#include "executorimpl.h"
+#include "tcompare.h"
+#include "thash.h"
+#include "ttypes.h"
+#include "executorInt.h"
+static SSDataBlock* doScanLastrow(SOperatorInfo* pOperator);
+static void destroyLastrowScanOperator(void* param, int32_t numOfOutput);
+static int32_t extractTargetSlotId(const SArray* pColMatchInfo, SExecTaskInfo* pTaskInfo, int32_t** pSlotIds);
+SOperatorInfo* createLastrowScanOperator(SLastRowScanPhysiNode* pScanNode, SReadHandle* readHandle, SArray* pTableList,
+                                         SExecTaskInfo* pTaskInfo) {
+  SLastrowScanInfo* pInfo = taosMemoryCalloc(1, sizeof(SLastrowScanInfo));
+  SOperatorInfo*    pOperator = taosMemoryCalloc(1, sizeof(SOperatorInfo));
+  if (pInfo == NULL || pOperator == NULL) {
+    goto _error;
+  }
+  pInfo->pTableList = pTableList;
+  pInfo->readHandle = *readHandle;
+  pInfo->pRes = createResDataBlock(pScanNode->node.pOutputDataBlockDesc);
+  int32_t numOfCols = 0;
+  pInfo->pColMatchInfo = extractColMatchInfo(pScanNode->pScanCols, pScanNode->node.pOutputDataBlockDesc, &numOfCols,
+                                             COL_MATCH_FROM_COL_ID);
+  int32_t* pCols = taosMemoryMalloc(numOfCols * sizeof(int32_t));
+  for (int32_t i = 0; i < numOfCols; ++i) {
+    SColMatchInfo* pColMatch = taosArrayGet(pInfo->pColMatchInfo, i);
+    pCols[i] = pColMatch->colId;
+  }
+  int32_t code = extractTargetSlotId(pInfo->pColMatchInfo, pTaskInfo, &pInfo->pSlotIds);
+  if (code != TSDB_CODE_SUCCESS) {
+    taosMemoryFree(pCols);  // release the column-id array before bailing out
+    goto _error;
+  }
+  tsdbLastRowReaderOpen(readHandle->vnode, LASTROW_RETRIEVE_TYPE_ALL, pTableList, pCols, numOfCols,
+                        &pInfo->pLastrowReader);
+  taosMemoryFree(pCols);
+  pOperator->name = "LastrowScanOperator";
+  pOperator->operatorType = QUERY_NODE_PHYSICAL_PLAN_LAST_ROW_SCAN;
+  pOperator->blocking = false;
+  pOperator->status = OP_NOT_OPENED;
+  pOperator->info = pInfo;
+  pOperator->pTaskInfo = pTaskInfo;
+  pOperator->exprSupp.numOfExprs = taosArrayGetSize(pInfo->pRes->pDataBlock);
+  initResultSizeInfo(pOperator, 1024);
+  blockDataEnsureCapacity(pInfo->pRes, pOperator->resultInfo.capacity);
+  pOperator->fpSet =
+      createOperatorFpSet(operatorDummyOpenFn, doScanLastrow, NULL, NULL, destroyLastrowScanOperator, NULL, NULL, NULL);
+  pOperator->cost.openCost = 0;
+  return pOperator;
+  _error:
+  pTaskInfo->code = TSDB_CODE_OUT_OF_MEMORY;
+  taosMemoryFree(pInfo);
+  taosMemoryFree(pOperator);
+  return NULL;
+}
+SSDataBlock* doScanLastrow(SOperatorInfo* pOperator) {
+  if (pOperator->status == OP_EXEC_DONE) {
+    return NULL;
+  }
+  SLastrowScanInfo* pInfo = pOperator->info;
+  SExecTaskInfo*    pTaskInfo = pOperator->pTaskInfo;
+  int32_t size = taosArrayGetSize(pInfo->pTableList);
+  if (size == 0) {
+    setTaskStatus(pTaskInfo, TASK_COMPLETED);
+    return NULL;
+  }
+  // check if it is a group by tbname
+  if (size == taosArrayGetSize(pInfo->pTableList)) {
+    blockDataCleanup(pInfo->pRes);
+    tsdbRetrieveLastRow(pInfo->pLastrowReader, pInfo->pRes, pInfo->pSlotIds);
+    return (pInfo->pRes->info.rows == 0) ? NULL : pInfo->pRes;
+  } else {
+    // todo fetch the result for each group
+  }
+  return pInfo->pRes->info.rows == 0 ? NULL : pInfo->pRes;
+}
+void destroyLastrowScanOperator(void* param, int32_t numOfOutput) {
+  SLastrowScanInfo* pInfo = (SLastrowScanInfo*)param;
+  blockDataDestroy(pInfo->pRes);
+  tsdbLastrowReaderClose(pInfo->pLastrowReader);
+  taosMemoryFree(pInfo->pSlotIds);  // allocated by extractTargetSlotId()
+  taosMemoryFreeClear(param);
+}
+int32_t extractTargetSlotId(const SArray* pColMatchInfo, SExecTaskInfo* pTaskInfo, int32_t** pSlotIds) {
+  size_t   numOfCols = taosArrayGetSize(pColMatchInfo);
+  *pSlotIds = taosMemoryMalloc(numOfCols * sizeof(int32_t));
+  if (*pSlotIds == NULL)  {
+    return TSDB_CODE_OUT_OF_MEMORY;
+  }
+  for (int32_t i = 0; i < numOfCols; ++i) {
+    SColMatchInfo* pColMatch = taosArrayGet(pColMatchInfo, i);
+    for (int32_t j = 0; j < pTaskInfo->schemaVer.sw->nCols; ++j) {
+      if (pColMatch->colId == pTaskInfo->schemaVer.sw->pSchema[j].colId &&
+          pColMatch->colId == PRIMARYKEY_TIMESTAMP_COL_ID) {
+        (*pSlotIds)[pColMatch->targetSlotId] = -1;
+        break;
+      }
+      if (pColMatch->colId == pTaskInfo->schemaVer.sw->pSchema[j].colId) {
+        (*pSlotIds)[pColMatch->targetSlotId] = j;
+        break;
+      }
+    }
+  }
+  return TSDB_CODE_SUCCESS;
+}
\ No newline at end of file
--- a/source/libs/executor/src/executorimpl.c
+++ b/source/libs/executor/src/executorimpl.c
@@ -4013,10 +4013,12 @@ static int32_t initFillInfo(SFillOperatorInfo* pInfo, SExprInfo* pExpr, int32_t
  w = getFirstQualifiedTimeWindow(win.skey, &w, pInterval, TSDB_ORDER_ASC);
  int32_t order = TSDB_ORDER_ASC;
-  pInfo->pFillInfo = taosCreateFillInfo(order, w.skey, 0, capacity, numOfCols, pInterval, fillType, pColInfo, id);
+  pInfo->pFillInfo = taosCreateFillInfo(order, w.skey, 0, capacity, numOfCols, pInterval,
+      fillType, pColInfo, pInfo->primaryTsCol, id);
  pInfo->win = win;
  pInfo->p = taosMemoryCalloc(numOfCols, POINTER_BYTES);
  if (pInfo->pFillInfo == NULL || pInfo->p == NULL) {
    taosMemoryFree(pInfo->pFillInfo);
    taosMemoryFree(pInfo->p);

--- a/source/libs/executor/src/tfill.c
+++ b/source/libs/executor/src/tfill.c
@@ -14,6 +14,7 @@
 */
 #include "os.h"
+#include "query.h"
 #include "taosdef.h"
 #include "tmsg.h"
 #include "ttypes.h"
@@ -48,14 +49,15 @@ static void setTagsValue(SFillInfo* pFillInfo, void** data, int32_t genRows) {
  }
 }
-static void setNullRow(SSDataBlock* pBlock, int32_t numOfCol, int32_t rowIndex) {
+static void setNullRow(SSDataBlock* pBlock, int64_t ts, int32_t rowIndex) {
  // the first are always the timestamp column, so start from the second column.
  for (int32_t i = 0; i < taosArrayGetSize(pBlock->pDataBlock); ++i) {
    SColumnInfoData* p = taosArrayGet(pBlock->pDataBlock, i);
-    if (p->info.type == TSDB_DATA_TYPE_TIMESTAMP && i == 0) {
+    if (p->info.type == TSDB_DATA_TYPE_TIMESTAMP) {
-      continue;
+      colDataAppend(p, rowIndex, (const char*)&ts, false);
+    } else {
+      colDataAppendNULL(p, rowIndex);
    }
-    colDataAppendNULL(p, rowIndex);
  }
 }
@@ -64,16 +66,17 @@ static void setNullRow(SSDataBlock* pBlock, int32_t numOfCol, int32_t rowIndex)
 static void doSetVal(SColumnInfoData* pDstColInfoData, int32_t rowIndex, const SGroupKeys* pKey);
-static void doFillOneRowResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSDataBlock* pSrcBlock, int64_t ts,
+static void doFillOneRow(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSDataBlock* pSrcBlock, int64_t ts,
-                               bool outOfBound) {
+                         bool outOfBound) {
  SPoint  point1, point2, point;
  int32_t step = GET_FORWARD_DIRECTION_FACTOR(pFillInfo->order);
  // set the primary timestamp column value
  int32_t          index = pFillInfo->numOfCurrent;
-  SColumnInfoData* pCol0 = taosArrayGet(pBlock->pDataBlock, 0);
+  SColumnInfoData* pCol0 = taosArrayGet(pBlock->pDataBlock, pFillInfo->tsSlotId);
  char*            val = colDataGetData(pCol0, index);
+  // set the primary timestamp value
  *(TSKEY*)val = pFillInfo->currentKey;
  // set the other values
@@ -92,7 +95,7 @@ static void doFillOneRowResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSData
    }
  } else if (pFillInfo->type == TSDB_FILL_NEXT) {
    SArray* p = FILL_IS_ASC_FILL(pFillInfo) ? pFillInfo->next : pFillInfo->prev;
+    // todo refactor: start from 0 not 1
    for (int32_t i = 1; i < pFillInfo->numOfCols; ++i) {
      SFillColInfo* pCol = &pFillInfo->pFillCol[i];
      if (TSDB_COL_IS_TAG(pCol->flag)) {
@@ -106,7 +109,7 @@ static void doFillOneRowResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSData
  } else if (pFillInfo->type == TSDB_FILL_LINEAR) {
    // TODO : linear interpolation supports NULL value
    if (outOfBound) {
-      setNullRow(pBlock, pFillInfo->numOfCols, index);
+      setNullRow(pBlock, pFillInfo->currentKey, index);
    } else {
      for (int32_t i = 1; i < pFillInfo->numOfCols; ++i) {
        SFillColInfo* pCol = &pFillInfo->pFillCol[i];
@@ -143,7 +146,7 @@ static void doFillOneRowResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSData
      }
    }
  } else if (pFillInfo->type == TSDB_FILL_NULL) {  // fill with NULL
-    setNullRow(pBlock, pFillInfo->numOfCols, index);
+    setNullRow(pBlock, pFillInfo->currentKey, index);
  } else {  // fill with user specified value for each column
    for (int32_t i = 1; i < pFillInfo->numOfCols; ++i) {
      SFillColInfo* pCol = &pFillInfo->pFillCol[i];
@@ -166,6 +169,8 @@ static void doFillOneRowResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, SSData
        int64_t v = 0;
        GET_TYPED_DATA(v, int64_t, pVar->nType, &pVar->i);
        colDataAppend(pDst, index, (char*)&v, false);
+      } else if (pDst->info.type == TSDB_DATA_TYPE_TIMESTAMP) {
+        colDataAppend(pDst, index, (const char*)&pFillInfo->currentKey, false);
      }
    }
  }
@@ -247,7 +252,7 @@ static int32_t fillResultImpl(SFillInfo* pFillInfo, SSDataBlock* pBlock, int32_t
      // fill the gap between two input rows
      while (((pFillInfo->currentKey < ts && ascFill) || (pFillInfo->currentKey > ts && !ascFill)) &&
             pFillInfo->numOfCurrent < outputRows) {
-        doFillOneRowResult(pFillInfo, pBlock, pFillInfo->pSrcBlock, ts, false);
+        doFillOneRow(pFillInfo, pBlock, pFillInfo->pSrcBlock, ts, false);
      }
      // output buffer is full, abort
@@ -343,7 +348,7 @@ static int64_t appendFilledResult(SFillInfo* pFillInfo, SSDataBlock* pBlock, int
   */
  pFillInfo->numOfCurrent = 0;
  while (pFillInfo->numOfCurrent < resultCapacity) {
-    doFillOneRowResult(pFillInfo, pBlock, pFillInfo->pSrcBlock, pFillInfo->start, true);
+    doFillOneRow(pFillInfo, pBlock, pFillInfo->pSrcBlock, pFillInfo->start, true);
  }
  pFillInfo->numOfTotal += pFillInfo->numOfCurrent;
@@ -408,7 +413,7 @@ static int32_t taosNumOfRemainRows(SFillInfo* pFillInfo) {
 }
 struct SFillInfo* taosCreateFillInfo(int32_t order, TSKEY skey, int32_t numOfTags, int32_t capacity, int32_t numOfCols,
-                                     SInterval* pInterval, int32_t fillType, struct SFillColInfo* pCol,
+                                     SInterval* pInterval, int32_t fillType, struct SFillColInfo* pCol, int32_t primaryTsSlotId,
                                     const char* id) {
  if (fillType == TSDB_FILL_NONE) {
    return NULL;
@@ -420,6 +425,8 @@ struct SFillInfo* taosCreateFillInfo(int32_t order, TSKEY skey, int32_t numOfTag
    return NULL;
  }
+  pFillInfo->tsSlotId = primaryTsSlotId;
  taosResetFillInfo(pFillInfo, skey);
  pFillInfo->order = order;
@@ -589,11 +596,10 @@ int64_t taosFillResultDataBlock(SFillInfo* pFillInfo, SSDataBlock* p, int32_t ca
    assert(numOfRes == pFillInfo->numOfCurrent);
  }
-  //  qDebug("fill:%p, generated fill result, src block:%d, index:%d, brange:%"PRId64"-%"PRId64", currentKey:%"PRId64",
+  qDebug("fill:%p, generated fill result, src block:%d, index:%d, brange:%" PRId64 "-%" PRId64 ", currentKey:%" PRId64
-  //  current:%d, total:%d, %p",
+         ", current:%d, total:%d, %s", pFillInfo,
-  //      pFillInfo, pFillInfo->numOfRows, pFillInfo->index, pFillInfo->start, pFillInfo->end, pFillInfo->currentKey,
+           pFillInfo->numOfRows, pFillInfo->index, pFillInfo->start, pFillInfo->end, pFillInfo->currentKey,
-  //      pFillInfo->numOfCurrent,
+           pFillInfo->numOfCurrent, pFillInfo->numOfTotal, pFillInfo->id);
-  //         pFillInfo->numOfTotal, pFillInfo->handle);
  return numOfRes;
 }

--- a/source/libs/function/inc/builtinsimpl.h
+++ b/source/libs/function/inc/builtinsimpl.h
@@ -157,6 +157,7 @@ int32_t elapsedCombine(SqlFunctionCtx* pDestCtx, SqlFunctionCtx* pSourceCtx);
 bool getHistogramFuncEnv(struct SFunctionNode* pFunc, SFuncExecEnv* pEnv);
 bool histogramFunctionSetup(SqlFunctionCtx *pCtx, SResultRowEntryInfo* pResultInfo);
 int32_t histogramFunction(SqlFunctionCtx* pCtx);
+int32_t histogramFunctionPartial(SqlFunctionCtx* pCtx);
 int32_t histogramFunctionMerge(SqlFunctionCtx* pCtx);
 int32_t histogramFinalize(SqlFunctionCtx* pCtx, SSDataBlock* pBlock);
 int32_t histogramPartialFinalize(SqlFunctionCtx* pCtx, SSDataBlock* pBlock);

--- a/source/libs/function/src/builtins.c
+++ b/source/libs/function/src/builtins.c
@@ -1427,9 +1427,12 @@ static int32_t translateIrate(SFunctionNode* pFunc, char* pErrBuf, int32_t len)
 }
 static int32_t translateFirstLast(SFunctionNode* pFunc, char* pErrBuf, int32_t len) {
-  // first(col_list) will be rewritten as first(col)
+  int32_t numOfParams = LIST_LENGTH(pFunc->pParameterList);
-  if (1 != LIST_LENGTH(pFunc->pParameterList)) {
+  for (int32_t i = 0; i < numOfParams; ++i) {
-    return TSDB_CODE_SUCCESS;
+    SNode* pParamNode = nodesListGetNode(pFunc->pParameterList, i);
+    if (QUERY_NODE_VALUE == nodeType(pParamNode)) {
+      return invaildFuncParaValueErrMsg(pErrBuf, len, pFunc->functionName);
+    }
  }
  pFunc->node.resType = ((SExprNode*)nodesListGetNode(pFunc->pParameterList, 0))->resType;
@@ -2323,7 +2326,7 @@ const SBuiltinFuncDefinition funcMgtBuiltins[] = {
    .translateFunc = translateHistogramPartial,
    .getEnvFunc   = getHistogramFuncEnv,
    .initFunc     = histogramFunctionSetup,
-    .processFunc  = histogramFunction,
+    .processFunc  = histogramFunctionPartial,
    .finalizeFunc = histogramPartialFinalize,
    .invertFunc   = NULL,
    .combineFunc  = histogramCombine,

--- a/source/libs/function/src/builtinsimpl.c
+++ b/source/libs/function/src/builtinsimpl.c
@@ -907,11 +907,13 @@ int32_t avgFunctionMerge(SqlFunctionCtx* pCtx) {
  SAvgRes* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t  start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*    data = colDataGetData(pCol, start);
-  SAvgRes* pInputInfo = (SAvgRes*)varDataVal(data);
-  avgTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SAvgRes* pInputInfo = (SAvgRes*)varDataVal(data);
+    avgTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(GET_RES_INFO(pCtx), 1, 1);
@@ -2512,11 +2514,13 @@ int32_t apercentileFunctionMerge(SqlFunctionCtx* pCtx) {
  SAPercentileInfo* pInfo = GET_ROWCELL_INTERBUF(pResInfo);
-  int32_t           start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*             data = colDataGetData(pCol, start);
-  SAPercentileInfo* pInputInfo = (SAPercentileInfo*)varDataVal(data);
-  apercentileTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SAPercentileInfo* pInputInfo = (SAPercentileInfo*)varDataVal(data);
+    apercentileTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(pResInfo, 1, 1);
  return TSDB_CODE_SUCCESS;
@@ -2877,13 +2881,17 @@ static int32_t firstLastFunctionMergeImpl(SqlFunctionCtx* pCtx, bool isFirstQuer
  SFirstLastRes* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t        start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*          data = colDataGetData(pCol, start);
+  int32_t numOfElems = 0;
-  SFirstLastRes* pInputInfo = (SFirstLastRes*)varDataVal(data);
-  firstLastTransferInfo(pCtx, pInputInfo, pInfo, isFirstQuery);
-  int32_t numOfElems = pInputInfo->hasResult ? 1 : 0;
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SFirstLastRes* pInputInfo = (SFirstLastRes*)varDataVal(data);
+    firstLastTransferInfo(pCtx, pInputInfo, pInfo, isFirstQuery);
+    if (!numOfElems) {
+      numOfElems = pInputInfo->hasResult ? 1 : 0;
+    }
+  }
  SET_VAL(GET_RES_INFO(pCtx), numOfElems, 1);
@@ -3703,11 +3711,13 @@ int32_t spreadFunctionMerge(SqlFunctionCtx* pCtx) {
  SSpreadInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t      start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*        data = colDataGetData(pCol, start);
-  SSpreadInfo* pInputInfo = (SSpreadInfo*)varDataVal(data);
-  spreadTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SSpreadInfo* pInputInfo = (SSpreadInfo*)varDataVal(data);
+    spreadTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(GET_RES_INFO(pCtx), 1, 1);
@@ -3873,11 +3883,13 @@ int32_t elapsedFunctionMerge(SqlFunctionCtx* pCtx) {
  SElapsedInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t       start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*         data = colDataGetData(pCol, start);
-  SElapsedInfo* pInputInfo = (SElapsedInfo*)varDataVal(data);
-  elapsedTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SElapsedInfo* pInputInfo = (SElapsedInfo*)varDataVal(data);
+    elapsedTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(GET_RES_INFO(pCtx), 1, 1);
  return TSDB_CODE_SUCCESS;
@@ -4098,7 +4110,7 @@ bool histogramFunctionSetup(SqlFunctionCtx* pCtx, SResultRowEntryInfo* pResultIn
  return true;
 }
-int32_t histogramFunction(SqlFunctionCtx* pCtx) {
+static int32_t histogramFunctionImpl(SqlFunctionCtx* pCtx, bool isPartial) {
  SHistoFuncInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
  SInputColumnInfoData* pInput = &pCtx->input;
@@ -4130,10 +4142,22 @@ int32_t histogramFunction(SqlFunctionCtx* pCtx) {
    }
  }
-  SET_VAL(GET_RES_INFO(pCtx), numOfElems, pInfo->numOfBins);
+  if (!isPartial) {
+    SET_VAL(GET_RES_INFO(pCtx), numOfElems, pInfo->numOfBins);
+  } else {
+    SET_VAL(GET_RES_INFO(pCtx), numOfElems, 1);
+  }
  return TSDB_CODE_SUCCESS;
 }
+int32_t histogramFunction(SqlFunctionCtx* pCtx) {
+  return histogramFunctionImpl(pCtx, false);
+}
+int32_t histogramFunctionPartial(SqlFunctionCtx* pCtx) {
+  return histogramFunctionImpl(pCtx, true);
+}
 static void histogramTransferInfo(SHistoFuncInfo* pInput, SHistoFuncInfo* pOutput) {
  pOutput->normalized = pInput->normalized;
  pOutput->numOfBins = pInput->numOfBins;
@@ -4152,11 +4176,13 @@ int32_t histogramFunctionMerge(SqlFunctionCtx* pCtx) {
  SHistoFuncInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t         start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*           data = colDataGetData(pCol, start);
-  SHistoFuncInfo* pInputInfo = (SHistoFuncInfo*)varDataVal(data);
-  histogramTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SHistoFuncInfo* pInputInfo = (SHistoFuncInfo*)varDataVal(data);
+    histogramTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(GET_RES_INFO(pCtx), pInfo->numOfBins, pInfo->numOfBins);
  return TSDB_CODE_SUCCESS;
@@ -4199,6 +4225,7 @@ int32_t histogramFinalize(SqlFunctionCtx* pCtx, SSDataBlock* pBlock) {
 }
 int32_t histogramPartialFinalize(SqlFunctionCtx* pCtx, SSDataBlock* pBlock) {
+  SResultRowEntryInfo* pResInfo = GET_RES_INFO(pCtx);
  SHistoFuncInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
  int32_t         resultBytes = getHistogramInfoSize();
  char*           res = taosMemoryCalloc(resultBytes + VARSTR_HEADER_SIZE, sizeof(char));
@@ -4212,7 +4239,7 @@ int32_t histogramPartialFinalize(SqlFunctionCtx* pCtx, SSDataBlock* pBlock) {
  colDataAppend(pCol, pBlock->info.rows, res, false);
  taosMemoryFree(res);
-  return 1;
+  return pResInfo->numOfRes;
 }
 int32_t histogramCombine(SqlFunctionCtx* pDestCtx, SqlFunctionCtx* pSourceCtx) {
@@ -4370,11 +4397,13 @@ int32_t hllFunctionMerge(SqlFunctionCtx* pCtx) {
  SHLLInfo* pInfo = GET_ROWCELL_INTERBUF(GET_RES_INFO(pCtx));
-  int32_t   start = pInput->startRowIndex;
+  int32_t start = pInput->startRowIndex;
-  char*     data = colDataGetData(pCol, start);
-  SHLLInfo* pInputInfo = (SHLLInfo*)varDataVal(data);
-  hllTransferInfo(pInputInfo, pInfo);
+  for (int32_t i = start; i < start + pInput->numOfRows; ++i) {
+    char* data = colDataGetData(pCol, i);
+    SHLLInfo* pInputInfo = (SHLLInfo*)varDataVal(data);
+    hllTransferInfo(pInputInfo, pInfo);
+  }
  SET_VAL(GET_RES_INFO(pCtx), 1, 1);
  return TSDB_CODE_SUCCESS;

--- a/source/libs/planner/src/planLogicCreater.c
+++ b/source/libs/planner/src/planLogicCreater.c
@@ -162,12 +162,11 @@ static EScanType getScanType(SLogicPlanContext* pCxt, SNodeList* pScanPseudoCols
  }
  if (NULL == pScanCols) {
-    // select count(*) from t
    return NULL == pScanPseudoCols
               ? SCAN_TYPE_TABLE
               : ((FUNCTION_TYPE_BLOCK_DIST_INFO == ((SFunctionNode*)nodesListGetNode(pScanPseudoCols, 0))->funcType)
                      ? SCAN_TYPE_BLOCK_INFO
-                      : SCAN_TYPE_TAG);
+                      : SCAN_TYPE_TABLE);
  }
  if (TSDB_SYSTEM_TABLE == tableType) {
@@ -181,7 +180,7 @@ static EScanType getScanType(SLogicPlanContext* pCxt, SNodeList* pScanPseudoCols
    }
  }
-  return SCAN_TYPE_TAG;
+  return SCAN_TYPE_TABLE;
 }
 static SNode* createPrimaryKeyCol(uint64_t tableId) {

--- a/source/libs/planner/src/planOptimizer.c
+++ b/source/libs/planner/src/planOptimizer.c
@@ -1364,9 +1364,9 @@ static EDealRes partTagsOptHasColImpl(SNode* pNode, void* pContext) {
  return DEAL_RES_CONTINUE;
 }
-static bool partTagsOptHasCol(SNodeList* pPartKeys) {
+static bool planOptNodeListHasCol(SNodeList* pKeys) {
  bool hasCol = false;
-  nodesWalkExprs(pPartKeys, partTagsOptHasColImpl, &hasCol);
+  nodesWalkExprs(pKeys, partTagsOptHasColImpl, &hasCol);
  return hasCol;
 }
@@ -1409,7 +1409,7 @@ static bool partTagsOptMayBeOptimized(SLogicNode* pNode) {
    return false;
  }
-  return !partTagsOptHasCol(partTagsGetPartKeys(pNode)) && partTagsOptAreSupportedFuncs(partTagsGetFuncs(pNode));
+  return !planOptNodeListHasCol(partTagsGetPartKeys(pNode)) && partTagsOptAreSupportedFuncs(partTagsGetFuncs(pNode));
 }
 static int32_t partTagsOptRebuildTbanme(SNodeList* pPartKeys) {
@@ -2096,6 +2096,37 @@ static int32_t mergeProjectsOptimize(SOptimizeContext* pCxt, SLogicSubplan* pLog
  return mergeProjectsOptimizeImpl(pCxt, pLogicSubplan, pProjectNode);
 }
+static bool tagScanMayBeOptimized(SLogicNode* pNode) {
+  if (QUERY_NODE_LOGIC_PLAN_SCAN != nodeType(pNode) || (SCAN_TYPE_TAG == ((SScanLogicNode*)pNode)->scanType)) {
+    return false;
+  }
+  SScanLogicNode* pScan = (SScanLogicNode*)pNode;
+  if (NULL != pScan->pScanCols) {
+    return false;
+  }
+  if (NULL == pNode->pParent || QUERY_NODE_LOGIC_PLAN_AGG != nodeType(pNode->pParent) || 1 != LIST_LENGTH(pNode->pParent->pChildren)) {
+    return false;
+  }
+  SAggLogicNode* pAgg = (SAggLogicNode*)(pNode->pParent);
+  if (NULL == pAgg->pGroupKeys || NULL != pAgg->pAggFuncs || planOptNodeListHasCol(pAgg->pGroupKeys)) {
+    return false;
+  }
+  return true;
+}
+static int32_t tagScanOptimize(SOptimizeContext* pCxt, SLogicSubplan* pLogicSubplan) {
+  SScanLogicNode* pScanNode = (SScanLogicNode*)optFindPossibleNode(pLogicSubplan->pNode, tagScanMayBeOptimized);
+  if (NULL == pScanNode) {
+    return TSDB_CODE_SUCCESS;
+  }
+  pScanNode->scanType = SCAN_TYPE_TAG;
+  pCxt->optimized = true;
+  return TSDB_CODE_SUCCESS;
+}
 // clang-format off
 static const SOptimizeRule optimizeRuleSet[] = {
  {.pName = "ScanPath",                   .optimizeFunc = scanPathOptimize},
@@ -2108,7 +2139,8 @@ static const SOptimizeRule optimizeRuleSet[] = {
  {.pName = "EliminateSetOperator",       .optimizeFunc = eliminateSetOpOptimize},
  {.pName = "RewriteTail",                .optimizeFunc = rewriteTailOptimize},
  {.pName = "RewriteUnique",              .optimizeFunc = rewriteUniqueOptimize},
-  {.pName = "LastRowScan",                .optimizeFunc = lastRowScanOptimize}
+  {.pName = "LastRowScan",                .optimizeFunc = lastRowScanOptimize},
+  {.pName = "TagScan",                    .optimizeFunc = tagScanOptimize}
 };
 // clang-format on

--- a/source/libs/planner/test/planOptimizeTest.cpp
+++ b/source/libs/planner/test/planOptimizeTest.cpp
@@ -87,4 +87,11 @@ TEST_F(PlanOptimizeTest, eliminateProjection) {
 TEST_F(PlanOptimizeTest, pushDownProjectCond) {
  useDb("root", "test");
  run("select 1-abs(c1) from (select unique(c1) c1 from st1s3) where 1-c1>5 order by 1 nulls first");
+}
+TEST_F(PlanOptimizeTest, tagScan) {
+  useDb("root", "test");
+  run("select tag1 from st1 group by tag1");
+  run("select distinct tag1 from st1");
+  run("select tag1*tag1 from st1 group by tag1*tag1");
 }
\ No newline at end of file
--- a/source/libs/scheduler/src/schRemote.c
+++ b/source/libs/scheduler/src/schRemote.c
@@ -35,7 +35,7 @@ int32_t schValidateRspMsgType(SSchJob *pJob, SSchTask *pTask, int32_t msgType) {
        SCH_TASK_ELOG("rsp msg type mis-match, last sent msgType:%s, rspType:%s", TMSG_INFO(lastMsgType),
                      TMSG_INFO(msgType));
        SCH_ERR_RET(TSDB_CODE_SCH_STATUS_ERROR);
-      }    
+      }
      if (taskStatus != JOB_TASK_STATUS_PART_SUCC) {
        SCH_TASK_ELOG("rsp msg conflicted with task status, status:%s, rspType:%s", jobTaskStatusStr(taskStatus),
                      TMSG_INFO(msgType));
@@ -75,7 +75,7 @@ int32_t schValidateRspMsgType(SSchJob *pJob, SSchTask *pTask, int32_t msgType) {
 // Note: no more task error processing, handled in function internal
 int32_t schHandleResponseMsg(SSchJob *pJob, SSchTask *pTask, int32_t execId, SDataBuf *pMsg, int32_t rspCode) {
  int32_t code = 0;
-  char *msg = pMsg->pData;
+  char   *msg = pMsg->pData;
  int32_t msgSize = pMsg->len;
  int32_t msgType = pMsg->msgType;
@@ -253,15 +253,15 @@ int32_t schHandleResponseMsg(SSchJob *pJob, SSchTask *pTask, int32_t execId, SDa
      rsp->sversion = ntohl(rsp->sversion);
      rsp->tversion = ntohl(rsp->tversion);
      rsp->affectedRows = be64toh(rsp->affectedRows);
      SCH_ERR_JRET(rsp->code);
      SCH_ERR_JRET(schSaveJobQueryRes(pJob, rsp));
      atomic_add_fetch_32(&pJob->resNumOfRows, rsp->affectedRows);
-      taosMemoryFreeClear(msg);              
+      taosMemoryFreeClear(msg);
      SCH_ERR_RET(schProcessOnTaskSuccess(pJob, pTask));
      break;
@@ -375,7 +375,8 @@ int32_t schHandleCallback(void *param, SDataBuf *pMsg, int32_t rspCode) {
  SSchTask              *pTask = NULL;
  SSchJob               *pJob = NULL;
-  qDebug("begin to handle rsp msg, type:%s, handle:%p, code:%s", TMSG_INFO(pMsg->msgType), pMsg->handle, tstrerror(rspCode));
+  qDebug("begin to handle rsp msg, type:%s, handle:%p, code:%s", TMSG_INFO(pMsg->msgType), pMsg->handle,
+         tstrerror(rspCode));
  SCH_ERR_RET(schProcessOnCbBegin(&pJob, &pTask, pParam->queryId, pParam->refId, pParam->taskId));
@@ -387,7 +388,8 @@ int32_t schHandleCallback(void *param, SDataBuf *pMsg, int32_t rspCode) {
  taosMemoryFreeClear(pMsg->pData);
  taosMemoryFreeClear(param);
-  qDebug("end to handle rsp msg, type:%s, handle:%p, code:%s", TMSG_INFO(pMsg->msgType), pMsg->handle, tstrerror(rspCode));
+  qDebug("end to handle rsp msg, type:%s, handle:%p, code:%s", TMSG_INFO(pMsg->msgType), pMsg->handle,
+         tstrerror(rspCode));
  SCH_RET(code);
 }
@@ -424,7 +426,7 @@ int32_t schHandleCommitCallback(void *param, SDataBuf *pMsg, int32_t code) {
 }
 int32_t schHandleHbCallback(void *param, SDataBuf *pMsg, int32_t code) {
-  SSchedulerHbRsp rsp = {0};
+  SSchedulerHbRsp        rsp = {0};
  SSchTaskCallbackParam *pParam = (SSchTaskCallbackParam *)param;
  if (code) {
@@ -453,8 +455,8 @@ _return:
  SCH_RET(code);
 }
-int32_t schMakeCallbackParam(SSchJob *pJob, SSchTask *pTask, int32_t msgType, bool isHb, SSchTrans *trans, void **pParam) {
+int32_t schMakeCallbackParam(SSchJob *pJob, SSchTask *pTask, int32_t msgType, bool isHb, SSchTrans *trans,
+                             void **pParam) {
  if (!isHb) {
    SSchTaskCallbackParam *param = taosMemoryCalloc(1, sizeof(SSchTaskCallbackParam));
    if (NULL == param) {
@@ -940,7 +942,8 @@ int32_t schBuildAndSendMsg(SSchJob *pJob, SSchTask *pTask, SQueryNodeAddr *addr,
  if (NULL == addr) {
    addr = taosArrayGet(pTask->candidateAddrs, pTask->candidateIdx);
    isCandidateAddr = true;
-    SCH_TASK_DLOG("target candidateIdx %d", pTask->candidateIdx);
+    SCH_TASK_DLOG("target candidateIdx %d, epInUse %d/%d", pTask->candidateIdx, addr->epSet.inUse,
+                  addr->epSet.numOfEps);
  }
  switch (msgType) {

--- a/source/libs/scheduler/src/schTask.c
+++ b/source/libs/scheduler/src/schTask.c
@@ -21,11 +21,9 @@
 #include "tref.h"
 #include "trpc.h"
 void schFreeTask(SSchJob *pJob, SSchTask *pTask) {
  schDeregisterTaskHb(pJob, pTask);
  if (pTask->candidateAddrs) {
    taosArrayDestroy(pTask->candidateAddrs);
  }
@@ -45,17 +43,17 @@ void schFreeTask(SSchJob *pJob, SSchTask *pTask) {
  }
 }
 int32_t schInitTask(SSchJob *pJob, SSchTask *pTask, SSubplan *pPlan, SSchLevel *pLevel, int32_t levelNum) {
  int32_t code = 0;
  pTask->plan = pPlan;
  pTask->level = pLevel;
  pTask->execId = -1;
  pTask->maxExecTimes = SCH_TASK_MAX_EXEC_TIMES(pLevel->level, levelNum);
  pTask->timeoutUsec = SCH_DEFAULT_TASK_TIMEOUT_USEC;
  pTask->taskId = schGenTaskId();
-  pTask->execNodes = taosHashInit(SCH_MAX_CANDIDATE_EP_NUM, taosGetDefaultHashFunction(TSDB_DATA_TYPE_INT), true, HASH_NO_LOCK);
+  pTask->execNodes =
+      taosHashInit(SCH_MAX_CANDIDATE_EP_NUM, taosGetDefaultHashFunction(TSDB_DATA_TYPE_INT), true, HASH_NO_LOCK);
  pTask->profile.execTime = taosMemoryCalloc(pTask->maxExecTimes, sizeof(int64_t));
  if (NULL == pTask->execNodes || NULL == pTask->profile.execTime) {
    SCH_ERR_JRET(TSDB_CODE_QRY_OUT_OF_MEMORY);
@@ -110,8 +108,8 @@ int32_t schDropTaskExecNode(SSchJob *pJob, SSchTask *pTask, void *handle, int32_
  } else {
    SCH_TASK_DLOG("execId %d removed from execNodeList", execId);
  }
-  if (execId != pTask->execId) {     // ignore it
+  if (execId != pTask->execId) {  // ignore it
    SCH_TASK_DLOG("execId %d is not current execId %d", execId, pTask->execId);
    SCH_ERR_RET(TSDB_CODE_SCH_IGNORE_ERROR);
  }
@@ -149,13 +147,13 @@ int32_t schProcessOnTaskFailure(SSchJob *pJob, SSchTask *pTask, int32_t errCode)
  if (TSDB_CODE_SCH_IGNORE_ERROR == errCode) {
    return TSDB_CODE_SCH_IGNORE_ERROR;
  }
  int8_t status = 0;
  if (schJobNeedToStop(pJob, &status)) {
    SCH_TASK_DLOG("no more task failure processing cause of job status %s", jobTaskStatusStr(status));
    SCH_ERR_RET(TSDB_CODE_SCH_IGNORE_ERROR);
  }
  if (SCH_GET_TASK_STATUS(pTask) != JOB_TASK_STATUS_EXEC) {
    SCH_TASK_ELOG("task already not in EXEC status, status:%s", SCH_GET_TASK_STATUS_STR(pTask));
    SCH_ERR_RET(TSDB_CODE_SCH_STATUS_ERROR);
@@ -204,8 +202,6 @@ int32_t schProcessOnTaskFailure(SSchJob *pJob, SSchTask *pTask, int32_t errCode)
  SCH_RET(errCode);
 }
 // Note: no more task error processing, handled in function internal
 int32_t schProcessOnTaskSuccess(SSchJob *pJob, SSchTask *pTask) {
  bool    moved = false;
@@ -265,13 +261,14 @@ int32_t schProcessOnTaskSuccess(SSchJob *pJob, SSchTask *pTask) {
    int32_t   readyNum = atomic_add_fetch_32(&parent->childReady, 1);
    SCH_LOCK_TASK(parent);
-    SDownstreamSourceNode source = {.type = QUERY_NODE_DOWNSTREAM_SOURCE,
-                                    .taskId = pTask->taskId,
-                                    .schedId = schMgmt.sId,
-                                    .execId = pTask->execId,
-                                    .addr = pTask->succeedAddr,
-                                    .fetchMsgType = SCH_FETCH_TYPE(pTask),
-                                    };
+    SDownstreamSourceNode source = {
+        .type = QUERY_NODE_DOWNSTREAM_SOURCE,
+        .taskId = pTask->taskId,
+        .schedId = schMgmt.sId,
+        .execId = pTask->execId,
+        .addr = pTask->succeedAddr,
+        .fetchMsgType = SCH_FETCH_TYPE(pTask),
+    };
    qSetSubplanExecutionNode(parent->plan, pTask->plan->id.groupId, &source);
    SCH_UNLOCK_TASK(parent);
@@ -291,29 +288,29 @@ int32_t schRescheduleTask(SSchJob *pJob, SSchTask *pTask) {
    return TSDB_CODE_SUCCESS;
  }
-  if (SCH_TASK_TIMEOUT(pTask) && JOB_TASK_STATUS_EXEC == pTask->status && 
-      pJob->fetchTask != pTask && taosArrayGetSize(pTask->candidateAddrs) > 1) {
+  if (SCH_TASK_TIMEOUT(pTask) && JOB_TASK_STATUS_EXEC == pTask->status && pJob->fetchTask != pTask &&
+      taosArrayGetSize(pTask->candidateAddrs) > 1) {
    SCH_TASK_DLOG("task execId %d will be rescheduled now", pTask->execId);
    schDropTaskOnExecNode(pJob, pTask);
    taosHashClear(pTask->execNodes);
    SCH_ERR_RET(schProcessOnTaskFailure(pJob, pTask, TSDB_CODE_SCH_TIMEOUT_ERROR));
  }
  return TSDB_CODE_SUCCESS;
 }
-int32_t schDoTaskRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf* pData, int32_t rspCode) {
+int32_t schDoTaskRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf *pData, int32_t rspCode) {
  int32_t code = 0;
  if ((pTask->execId + 1) >= pTask->maxExecTimes) {
    SCH_TASK_DLOG("task no more retry since reach max try times, execId:%d", pTask->execId);
-    schSwitchJobStatus(pJob, JOB_TASK_STATUS_FAIL, (void*)&rspCode);
+    schSwitchJobStatus(pJob, JOB_TASK_STATUS_FAIL, (void *)&rspCode);
    return TSDB_CODE_SUCCESS;
  }
  SCH_TASK_DLOG("task will be redirected now, status:%s", SCH_GET_TASK_STATUS_STR(pTask));
  schDropTaskOnExecNode(pJob, pTask);
  taosHashClear(pTask->execNodes);
  SCH_ERR_JRET(schRemoveTaskFromExecList(pJob, pTask));
@@ -328,25 +325,24 @@ int32_t schDoTaskRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf* pData, int32
    if (pData) {
      SCH_ERR_JRET(schUpdateTaskCandidateAddr(pJob, pTask, pData->pEpSet));
    }
    if (SCH_TASK_NEED_FLOW_CTRL(pJob, pTask)) {
      if (JOB_TASK_STATUS_EXEC == SCH_GET_TASK_STATUS(pTask)) {
        SCH_ERR_JRET(schLaunchTasksInFlowCtrlList(pJob, pTask));
      }
-    }    
+    }
    SCH_SET_TASK_STATUS(pTask, JOB_TASK_STATUS_INIT);
    SCH_ERR_JRET(schLaunchTask(pJob, pTask));
    return TSDB_CODE_SUCCESS;
  }
  // merge plan
  pTask->childReady = 0;
  qClearSubplanExecutionNode(pTask->plan);
  // Note: current error task and upper level merge task
@@ -355,10 +351,10 @@ int32_t schDoTaskRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf* pData, int32
  }
  SCH_SET_TASK_STATUS(pTask, JOB_TASK_STATUS_INIT);
  int32_t childrenNum = taosArrayGetSize(pTask->children);
  for (int32_t i = 0; i < childrenNum; ++i) {
-    SSchTask* pChild = taosArrayGetP(pTask->children, i);
+    SSchTask *pChild = taosArrayGetP(pTask->children, i);
    SCH_LOCK_TASK(pChild);
    schDoTaskRedirect(pJob, pChild, NULL, rspCode);
    SCH_UNLOCK_TASK(pChild);
@@ -371,7 +367,7 @@ _return:
  SCH_RET(schProcessOnTaskFailure(pJob, pTask, code));
 }
-int32_t schHandleRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf* pData, int32_t rspCode) {
+int32_t schHandleRedirect(SSchJob *pJob, SSchTask *pTask, SDataBuf *pData, int32_t rspCode) {
  int32_t code = 0;
  if (SCH_IS_DATA_BIND_TASK(pTask)) {
@@ -537,7 +533,7 @@ int32_t schHandleTaskRetry(SSchJob *pJob, SSchTask *pTask) {
  SCH_ERR_RET(schRemoveTaskFromExecList(pJob, pTask));
  SCH_SET_TASK_STATUS(pTask, JOB_TASK_STATUS_INIT);
  if (SCH_TASK_NEED_FLOW_CTRL(pJob, pTask)) {
    SCH_ERR_RET(schLaunchTasksInFlowCtrlList(pJob, pTask));
  }
@@ -545,7 +541,8 @@ int32_t schHandleTaskRetry(SSchJob *pJob, SSchTask *pTask) {
  schDeregisterTaskHb(pJob, pTask);
  if (SCH_IS_DATA_BIND_TASK(pTask)) {
-    SCH_SWITCH_EPSET(&pTask->plan->execNode);
+    SQueryNodeAddr *addr = taosArrayGet(pTask->candidateAddrs, pTask->candidateIdx);
+    SCH_SWITCH_EPSET(addr);
  } else {
    SCH_ERR_RET(schSwitchTaskCandidateAddr(pJob, pTask));
  }
@@ -558,20 +555,21 @@ int32_t schHandleTaskRetry(SSchJob *pJob, SSchTask *pTask) {
 int32_t schSetAddrsFromNodeList(SSchJob *pJob, SSchTask *pTask) {
  int32_t addNum = 0;
  int32_t nodeNum = 0;
  if (pJob->nodeList) {
    nodeNum = taosArrayGetSize(pJob->nodeList);
    for (int32_t i = 0; i < nodeNum && addNum < SCH_MAX_CANDIDATE_EP_NUM; ++i) {
      SQueryNodeLoad *nload = taosArrayGet(pJob->nodeList, i);
      SQueryNodeAddr *naddr = &nload->addr;
      if (NULL == taosArrayPush(pTask->candidateAddrs, naddr)) {
        SCH_TASK_ELOG("taosArrayPush execNode to candidate addrs failed, addNum:%d, errno:%d", addNum, errno);
        SCH_ERR_RET(TSDB_CODE_QRY_OUT_OF_MEMORY);
      }
-      SCH_TASK_DLOG("set %dth candidate addr, id %d, fqdn:%s, port:%d", i, naddr->nodeId, SCH_GET_CUR_EP(naddr)->fqdn, SCH_GET_CUR_EP(naddr)->port);
+      SCH_TASK_DLOG("set %dth candidate addr, id %d, fqdn:%s, port:%d", i, naddr->nodeId, SCH_GET_CUR_EP(naddr)->fqdn,
+                    SCH_GET_CUR_EP(naddr)->port);
      ++addNum;
    }
@@ -585,7 +583,6 @@ int32_t schSetAddrsFromNodeList(SSchJob *pJob, SSchTask *pTask) {
  return TSDB_CODE_SUCCESS;
 }
 int32_t schSetTaskCandidateAddrs(SSchJob *pJob, SSchTask *pTask) {
  if (NULL != pTask->candidateAddrs) {
    return TSDB_CODE_SUCCESS;
@@ -628,16 +625,17 @@ int32_t schSetTaskCandidateAddrs(SSchJob *pJob, SSchTask *pTask) {
  return TSDB_CODE_SUCCESS;
 }
-int32_t schUpdateTaskCandidateAddr(SSchJob *pJob, SSchTask *pTask, SEpSet* pEpSet) {
+int32_t schUpdateTaskCandidateAddr(SSchJob *pJob, SSchTask *pTask, SEpSet *pEpSet) {
  if (NULL == pTask->candidateAddrs || 1 != taosArrayGetSize(pTask->candidateAddrs)) {
-    SCH_TASK_ELOG("not able to update cndidate addr, addr num %d", (int32_t)(pTask->candidateAddrs ? taosArrayGetSize(pTask->candidateAddrs): 0));
+    SCH_TASK_ELOG("not able to update candidate addr, addr num %d",
+                  (int32_t)(pTask->candidateAddrs ? taosArrayGetSize(pTask->candidateAddrs) : 0));
    SCH_ERR_RET(TSDB_CODE_APP_ERROR);
  }
-  SQueryNodeAddr* pAddr = taosArrayGet(pTask->candidateAddrs, 0);
+  SQueryNodeAddr *pAddr = taosArrayGet(pTask->candidateAddrs, 0);
-  SEp* pOld = &pAddr->epSet.eps[pAddr->epSet.inUse];
+  SEp *pOld = &pAddr->epSet.eps[pAddr->epSet.inUse];
-  SEp* pNew = &pEpSet->eps[pEpSet->inUse];
+  SEp *pNew = &pEpSet->eps[pEpSet->inUse];
  SCH_TASK_DLOG("update task ep from %s:%d to %s:%d", pOld->fqdn, pOld->port, pNew->fqdn, pNew->port);
@@ -647,7 +645,7 @@ int32_t schUpdateTaskCandidateAddr(SSchJob *pJob, SSchTask *pTask, SEpSet* pEpSe
 }
 int32_t schSwitchTaskCandidateAddr(SSchJob *pJob, SSchTask *pTask) {
-  int32_t candidateNum = taosArrayGetSize(pTask->candidateAddrs);  
+  int32_t candidateNum = taosArrayGetSize(pTask->candidateAddrs);
  if (++pTask->candidateIdx >= candidateNum) {
    pTask->candidateIdx = 0;
  }
@@ -655,8 +653,6 @@ int32_t schSwitchTaskCandidateAddr(SSchJob *pJob, SSchTask *pTask) {
  return TSDB_CODE_SUCCESS;
 }
 int32_t schRemoveTaskFromExecList(SSchJob *pJob, SSchTask *pTask) {
  int32_t code = taosHashRemove(pJob->execTasks, &pTask->taskId, sizeof(pTask->taskId));
  if (code) {
@@ -692,33 +688,32 @@ void schDropTaskOnExecNode(SSchJob *pJob, SSchTask *pTask) {
  SCH_TASK_DLOG("task has been dropped on %d exec nodes", size);
 }
-int32_t schProcessOnTaskStatusRsp(SQueryNodeEpId* pEpId, SArray* pStatusList) {
-  int32_t taskNum = (int32_t)taosArrayGetSize(pStatusList);
+int32_t schProcessOnTaskStatusRsp(SQueryNodeEpId *pEpId, SArray *pStatusList) {
+  int32_t   taskNum = (int32_t)taosArrayGetSize(pStatusList);
  SSchTask *pTask = NULL;
-  SSchJob *pJob = NULL;
+  SSchJob  *pJob = NULL;
-  qDebug("%d task status in hb rsp from nodeId:%d, fqdn:%s, port:%d", taskNum, pEpId->nodeId, pEpId->ep.fqdn, pEpId->ep.port);
+  qDebug("%d task status in hb rsp from nodeId:%d, fqdn:%s, port:%d", taskNum, pEpId->nodeId, pEpId->ep.fqdn,
+         pEpId->ep.port);
  for (int32_t i = 0; i < taskNum; ++i) {
    STaskStatus *pStatus = taosArrayGet(pStatusList, i);
-    int32_t code = 0;
+    int32_t      code = 0;
-    qDebug("QID:0x%" PRIx64 ",TID:0x%" PRIx64 ",EID:%d task status in server: %s", 
-      pStatus->queryId, pStatus->taskId, pStatus->execId, jobTaskStatusStr(pStatus->status));
+    qDebug("QID:0x%" PRIx64 ",TID:0x%" PRIx64 ",EID:%d task status in server: %s", pStatus->queryId, pStatus->taskId,
+           pStatus->execId, jobTaskStatusStr(pStatus->status));
    if (schProcessOnCbBegin(&pJob, &pTask, pStatus->queryId, pStatus->refId, pStatus->taskId)) {
      continue;
    }
    if (pStatus->execId != pTask->execId) {
-      //TODO
+      // TODO
      SCH_TASK_DLOG("execId %d mis-match current execId %d", pStatus->execId, pTask->execId);
      schProcessOnCbEnd(pJob, pTask, 0);
      continue;
    }
    if (pStatus->status == JOB_TASK_STATUS_FAIL) {
      // RECORD AND HANDLE ERROR!!!!
      schProcessOnCbEnd(pJob, pTask, 0);
@@ -832,7 +827,6 @@ void schDropTaskInHashList(SSchJob *pJob, SHashObj *list) {
  }
 }
 // Note: no more error processing, handled in function internal
 int32_t schLaunchFetchTask(SSchJob *pJob) {
  int32_t code = 0;
@@ -851,5 +845,3 @@ _return:
  SCH_RET(schProcessOnTaskFailure(pJob, pJob->fetchTask, code));
 }
--- a/source/libs/sync/inc/syncInt.h
+++ b/source/libs/sync/inc/syncInt.h
@@ -251,6 +251,9 @@ void syncStartStandBy(int64_t rid);
 bool syncNodeCanChange(SSyncNode* pSyncNode);
 bool syncNodeCheckNewConfig(SSyncNode* pSyncNode, const SSyncCfg* pNewCfg);
+int32_t syncNodeLeaderTransfer(SSyncNode* pSyncNode);
+int32_t syncNodeLeaderTransferTo(SSyncNode* pSyncNode, SNodeInfo newLeader);
 // for debug --------------
 void syncNodePrint(SSyncNode* pObj);
 void syncNodePrint2(char* s, SSyncNode* pObj);

--- a/source/libs/sync/src/syncMain.c
+++ b/source/libs/sync/src/syncMain.c
@@ -316,6 +316,40 @@ int32_t syncLeaderTransferTo(int64_t rid, SNodeInfo newLeader) {
  return ret;
 }
+int32_t syncNodeLeaderTransfer(SSyncNode* pSyncNode) {
+  if (pSyncNode->peersNum == 0) {
+    sError("only one replica, cannot transfer leader");
+    terrno = TSDB_CODE_SYN_ONE_REPLICA;
+    return -1;
+  }
+  SNodeInfo newLeader = (pSyncNode->peersNodeInfo)[0];
+  int32_t   ret = syncNodeLeaderTransferTo(pSyncNode, newLeader);
+  return ret;
+}
+int32_t syncNodeLeaderTransferTo(SSyncNode* pSyncNode, SNodeInfo newLeader) {
+  int32_t ret = 0;
+  if (pSyncNode->replicaNum == 1) {
+    sError("only one replica, cannot transfer leader");
+    terrno = TSDB_CODE_SYN_ONE_REPLICA;
+    return -1;
+  }
+  SyncLeaderTransfer* pMsg = syncLeaderTransferBuild(pSyncNode->vgId);
+  ASSERT(pMsg != NULL);
+  pMsg->newLeaderId.addr = syncUtilAddr2U64(newLeader.nodeFqdn, newLeader.nodePort);
+  pMsg->newLeaderId.vgId = pSyncNode->vgId;
+  pMsg->newNodeInfo = newLeader;
+  SRpcMsg rpcMsg = {0};
+  syncLeaderTransfer2RpcMsg(pMsg, &rpcMsg);
+  syncLeaderTransferDestroy(pMsg);
+  ret = syncNodePropose(pSyncNode, &rpcMsg, false);
+  return ret;
+}
 bool syncCanLeaderTransfer(int64_t rid) {
  SSyncNode* pSyncNode = (SSyncNode*)taosAcquireRef(tsNodeRefId, rid);
  if (pSyncNode == NULL) {
@@ -1113,6 +1147,8 @@ void syncNodeStartStandBy(SSyncNode* pSyncNode) {
 void syncNodeClose(SSyncNode* pSyncNode) {
  syncNodeEventLog(pSyncNode, "sync close");
+  // leader transfer
  int32_t ret;
  ASSERT(pSyncNode != NULL);
@@ -1527,8 +1563,8 @@ void syncNodeEventLog(const SSyncNode* pSyncNode, char* str) {
    char logBuf[256 + 256];
    if (pSyncNode != NULL && pSyncNode->pRaftCfg != NULL && pSyncNode->pRaftStore != NULL) {
      snprintf(logBuf, sizeof(logBuf),
-               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", beginlog:%" PRId64 ", lastlog:%" PRId64
+               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", first:%" PRId64 ", last:%" PRId64
-               ", lastsnapshot:%" PRId64
+               ", snapshot:%" PRId64
               ", standby:%d, "
               "strategy:%d, batch:%d, "
               "replica-num:%d, "
@@ -1548,8 +1584,8 @@ void syncNodeEventLog(const SSyncNode* pSyncNode, char* str) {
    char* s = (char*)taosMemoryMalloc(len);
    if (pSyncNode != NULL && pSyncNode->pRaftCfg != NULL && pSyncNode->pRaftStore != NULL) {
      snprintf(s, len,
-               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", beginlog:%" PRId64 ", lastlog:%" PRId64
+               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", first:%" PRId64 ", last:%" PRId64
-               ", lastsnapshot:%" PRId64
+               ", snapshot:%" PRId64
               ", standby:%d, "
               "strategy:%d, batch:%d, "
               "replica-num:%d, "
@@ -1594,8 +1630,8 @@ void syncNodeErrorLog(const SSyncNode* pSyncNode, char* str) {
    char logBuf[256 + 256];
    if (pSyncNode != NULL && pSyncNode->pRaftCfg != NULL && pSyncNode->pRaftStore != NULL) {
      snprintf(logBuf, sizeof(logBuf),
-               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", beginlog:%" PRId64 ", lastlog:%" PRId64
+               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", first:%" PRId64 ", last:%" PRId64
-               ", lastsnapshot:%" PRId64
+               ", snapshot:%" PRId64
               ", standby:%d, "
               "replica-num:%d, "
               "lconfig:%" PRId64 ", changing:%d, restore:%d, %s",
@@ -1613,8 +1649,8 @@ void syncNodeErrorLog(const SSyncNode* pSyncNode, char* str) {
    char* s = (char*)taosMemoryMalloc(len);
    if (pSyncNode != NULL && pSyncNode->pRaftCfg != NULL && pSyncNode->pRaftStore != NULL) {
      snprintf(s, len,
-               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", beginlog:%" PRId64 ", lastlog:%" PRId64
+               "vgId:%d, sync %s %s, term:%" PRIu64 ", commit:%" PRId64 ", first:%" PRId64 ", last:%" PRId64
-               ", lastsnapshot:%" PRId64
+               ", snapshot:%" PRId64
               ", standby:%d, "
               "replica-num:%d, "
               "lconfig:%" PRId64 ", changing:%d, restore:%d, %s",
@@ -1644,8 +1680,8 @@ char* syncNode2SimpleStr(const SSyncNode* pSyncNode) {
  SyncIndex logBeginIndex = pSyncNode->pLogStore->syncLogBeginIndex(pSyncNode->pLogStore);
  snprintf(s, len,
-           "vgId:%d, sync %s, term:%" PRIu64 ", commit:%" PRId64 ", beginlog:%" PRId64 ", lastlog:%" PRId64
+           "vgId:%d, sync %s, term:%" PRIu64 ", commit:%" PRId64 ", first:%" PRId64 ", last:%" PRId64
-           ", lastsnapshot:%" PRId64
+           ", snapshot:%" PRId64
           ", standby:%d, "
           "replica-num:%d, "
           "lconfig:%" PRId64 ", changing:%d, restore:%d",

--- a/source/libs/transport/inc/transComm.h
+++ b/source/libs/transport/inc/transComm.h
@@ -317,6 +317,11 @@ typedef struct STransReq {
  void* data;
 } STransReq;
+void  transReqQueueInit(queue* q);
+void* transReqQueuePushReq(queue* q);
+void* transReqQueueRemove(void* arg);
+void  transReqQueueClear(queue* q);
 // queue sending msgs
 typedef struct {
  SArray* q;

--- a/source/libs/transport/src/transCli.c
+++ b/source/libs/transport/src/transCli.c
@@ -19,7 +19,7 @@ typedef struct SCliConn {
  T_REF_DECLARE()
  uv_connect_t connReq;
  uv_stream_t* stream;
-  uv_write_t   writeReq;
+  queue        wreqQueue;
  void* hostThrd;
@@ -586,9 +586,10 @@ static SCliConn* cliCreateConn(SCliThrd* pThrd) {
  uv_tcp_init(pThrd->loop, (uv_tcp_t*)(conn->stream));
  conn->stream->data = conn;
-  conn->writeReq.data = conn;
  conn->connReq.data = conn;
+  transReqQueueInit(&conn->wreqQueue);
  transQueueInit(&conn->cliMsgs, NULL);
  QUEUE_INIT(&conn->conn);
  conn->hostThrd = pThrd;
@@ -627,6 +628,8 @@ static void cliDestroy(uv_handle_t* handle) {
  transCtxCleanup(&conn->ctx);
  transQueueDestroy(&conn->cliMsgs);
  tTrace("%s conn %p destroy successfully", CONN_GET_INST_LABEL(conn), conn);
+  transReqQueueClear(&conn->wreqQueue);
  transDestroyBuffer(&conn->readBuf);
  taosMemoryFree(conn);
 }
@@ -649,11 +652,8 @@ static bool cliHandleNoResp(SCliConn* conn) {
  return res;
 }
 static void cliSendCb(uv_write_t* req, int status) {
-  SCliConn* pConn = req && req->handle ? req->handle->data : NULL;
-  taosMemoryFree(req);
-  if (pConn == NULL) {
-    return;
-  }
+  SCliConn* pConn = transReqQueueRemove(req);
+  if (pConn == NULL) return;
  if (status == 0) {
    tTrace("%s conn %p data already was written out", CONN_GET_INST_LABEL(pConn), pConn);
@@ -711,7 +711,7 @@ void cliSend(SCliConn* pConn) {
    CONN_SET_PERSIST_BY_APP(pConn);
  }
-  uv_write_t* req = taosMemoryCalloc(1, sizeof(uv_write_t));
+  uv_write_t* req = transReqQueuePushReq(&pConn->wreqQueue);
  uv_write(req, (uv_stream_t*)pConn->stream, &wb, 1, cliSendCb);
  return;
 _RETURN:

--- a/source/libs/transport/src/transComm.c
+++ b/source/libs/transport/src/transComm.c
@@ -293,6 +293,48 @@ void* transCtxDumpBrokenlinkVal(STransCtx* ctx, int32_t* msgType) {
  return ret;
 }
+void transReqQueueInit(queue* q) {
+  // init req queue
+  QUEUE_INIT(q);
+}
+void* transReqQueuePushReq(queue* q) {
+  uv_write_t* req = taosMemoryCalloc(1, sizeof(uv_write_t));
+  STransReq*  wreq = taosMemoryCalloc(1, sizeof(STransReq));
+  wreq->data = req;
+  req->data = wreq;
+  QUEUE_PUSH(q, &wreq->q);
+  return req;
+}
+void* transReqQueueRemove(void* arg) {
+  void*       ret = NULL;
+  uv_write_t* req = arg;
+  STransReq*  wreq = req && req->data ? req->data : NULL;
+  if (wreq == NULL || wreq->data == NULL) {
+    // wreq may be NULL here, so it must not be dereferenced
+    taosMemoryFree(wreq);
+    return NULL;
+  }
+  assert(wreq->data == req);
+  QUEUE_REMOVE(&wreq->q);
+  ret = req && req->handle ? req->handle->data : NULL;
+  taosMemoryFree(wreq->data);
+  taosMemoryFree(wreq);
+  return ret;
+}
+void transReqQueueClear(queue* q) {
+  while (!QUEUE_IS_EMPTY(q)) {
+    queue* h = QUEUE_HEAD(q);
+    QUEUE_REMOVE(h);
+    STransReq* req = QUEUE_DATA(h, STransReq, q);
+    taosMemoryFree(req->data);
+    taosMemoryFree(req);
+  }
+}
 void transQueueInit(STransQueue* queue, void (*freeFunc)(const void* arg)) {
  queue->q = taosArrayInit(2, sizeof(void*));
  queue->freeFunc = (void (*)(const void*))freeFunc;

--- a/source/libs/transport/src/transSvr.c
+++ b/source/libs/transport/src/transSvr.c
@@ -316,7 +316,7 @@ void uvOnRecvCb(uv_stream_t* cli, ssize_t nread, const uv_buf_t* buf) {
        memset(&conn->regArg, 0, sizeof(conn->regArg));
      }
    }
-    transUnrefSrvHandle(conn);
+    destroyConn(conn, true);
  }
 }
 void uvAllocConnBufferCb(uv_handle_t* handle, size_t suggested_size, uv_buf_t* buf) {
@@ -331,14 +331,7 @@ void uvOnTimeoutCb(uv_timer_t* handle) {
 }
 void uvOnSendCb(uv_write_t* req, int status) {
-  STransReq* wreq = req && req->data ? req->data : NULL;
-  SSvrConn*  conn = req && req->handle ? req->handle->data : NULL;
-  if (wreq != NULL && conn != NULL) {
-    QUEUE_REMOVE(&wreq->q);
-    taosMemoryFree(wreq->data);
-    taosMemoryFree(wreq);
-  }
+  SSvrConn* conn = transReqQueueRemove(req);
  if (conn == NULL) return;
  if (status == 0) {
@@ -441,13 +434,7 @@ static void uvStartSendRespInternal(SSvrMsg* smsg) {
  uvPrepareSendData(smsg, &wb);
  transRefSrvHandle(pConn);
-  uv_write_t* req = taosMemoryCalloc(1, sizeof(uv_write_t));
-  STransReq*  wreq = taosMemoryCalloc(1, sizeof(STransReq));
-  wreq->data = req;
-  req->data = wreq;
-  QUEUE_PUSH(&pConn->wreqQueue, &wreq->q);
+  uv_write_t* req = transReqQueuePushReq(&pConn->wreqQueue);
  uv_write(req, (uv_stream_t*)pConn->pTcp, &wb, 1, uvOnSendCb);
 }
 static void uvStartSendResp(SSvrMsg* smsg) {
@@ -757,7 +744,7 @@ static SSvrConn* createConn(void* hThrd) {
  SSvrConn* pConn = (SSvrConn*)taosMemoryCalloc(1, sizeof(SSvrConn));
-  QUEUE_INIT(&pConn->wreqQueue);
+  transReqQueueInit(&pConn->wreqQueue);
  QUEUE_INIT(&pConn->queue);
  QUEUE_PUSH(&pThrd->conn, &pConn->queue);
@@ -792,9 +779,6 @@ static void destroyConn(SSvrConn* conn, bool clear) {
      tTrace("conn %p to be destroyed", conn);
      uv_close((uv_handle_t*)conn->pTcp, uvDestroyConn);
    }
-    //} else {
-    //  uvDestroyConn((uv_handle_t*)conn->pTcp);
-    //}
  }
 }
 static void destroyConnRegArg(SSvrConn* conn) {
@@ -834,13 +818,7 @@ static void uvDestroyConn(uv_handle_t* handle) {
    destroySmsg(msg);
  }
-  while (!QUEUE_IS_EMPTY(&conn->wreqQueue)) {
-    queue* h = QUEUE_HEAD(&conn->wreqQueue);
-    QUEUE_REMOVE(h);
-    STransReq* req = QUEUE_DATA(h, STransReq, q);
-    taosMemoryFree(req->data);
-    taosMemoryFree(req);
-  }
+  transReqQueueClear(&conn->wreqQueue);
  transQueueDestroy(&conn->srvMsgs);
  QUEUE_REMOVE(&conn->queue);

--- a/tests/pytest/crash_gen/shared/misc.py
+++ b/tests/pytest/crash_gen/shared/misc.py
@@ -4,7 +4,8 @@ import logging
 import os
 import sys
 from typing import Optional
+import time
+from datetime import datetime
 import taos
@@ -43,6 +44,10 @@ class MyLoggingAdapter(logging.LoggerAdapter):
 class Logging:
    logger = None # type: Optional[MyLoggingAdapter]
+    @classmethod
+    def _get_datetime(cls):
+        return datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-1]
    @classmethod
    def getLogger(cls):
        return cls.logger
@@ -64,22 +69,22 @@ class Logging:
        # global logger
        cls.logger = MyLoggingAdapter(_logger, {})
        cls.logger.setLevel(logging.DEBUG if debugMode else logging.INFO)  # default seems to be INFO
    @classmethod
    def info(cls, msg):
-        cls.logger.info(msg)
+        cls.logger.info("[time]: " + cls._get_datetime() + " [msg]: " + str(msg))
    @classmethod
    def debug(cls, msg):
-        cls.logger.debug(msg)
+        cls.logger.debug("[time]: " + cls._get_datetime() + " [msg]: " + str(msg))
    @classmethod
    def warning(cls, msg):
-        cls.logger.warning(msg)
+        cls.logger.warning("[time]: " + cls._get_datetime() + " [msg]: " + str(msg))
    @classmethod
    def error(cls, msg):
-        cls.logger.error(msg)
+        cls.logger.error("[time]: " + cls._get_datetime() + " [msg]: " + str(msg))
 class Status:
    STATUS_EMPTY    = 99
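
Review note on the `Logging` change in `misc.py`: the following is a minimal sketch, not the project's code, of the same timestamp prefix with the formatting pulled into standalone functions so it can be checked without a logger. `format_timestamp` and `prefix_message` are hypothetical names introduced here; the format string and the `[:-1]` slice are taken from `_get_datetime` in the hunk. The slice drops the last microsecond digit, so the output carries five fractional digits. The `str(msg)` call is an assumption about what callers may pass, since plain `+` concatenation raises `TypeError` for a non-string argument.

```python
from datetime import datetime, timezone


def format_timestamp(moment: datetime) -> str:
    """Format like _get_datetime in the diff: '%f' yields six digits,
    and the [:-1] slice trims the last one, leaving five."""
    return moment.strftime('%Y-%m-%d %H:%M:%S.%f')[:-1]


def prefix_message(msg: object, moment: datetime) -> str:
    """Build the '[time]: ... [msg]: ...' line (hypothetical helper).

    str(msg) is an assumption: it lets callers pass ints, exceptions, etc.
    without a TypeError from string concatenation.
    """
    return "[time]: " + format_timestamp(moment) + " [msg]: " + str(msg)


if __name__ == "__main__":
    # datetime.now(timezone.utc) is the timezone-aware replacement for
    # datetime.utcnow(), which is deprecated since Python 3.12.
    print(prefix_message("dnode1 started", datetime.now(timezone.utc)))
```

Passing the timestamp in as an argument, rather than reading the clock inside the helper, keeps the formatting deterministic under test; the four `Logging` methods could then share one helper instead of repeating the concatenation.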

--- a/tests/script/tsim/sync/vnodesnapshot-test.sim
+++ b/tests/script/tsim/sync/vnodesnapshot-test.sim
@@ -168,11 +168,103 @@ system sh/exec.sh -n dnode3 -s stop -x SIGINT
+########################################################
 print ===> start dnode1 dnode2 dnode3 dnode4
 system sh/exec.sh -n dnode1 -s start
 system sh/exec.sh -n dnode2 -s start
 system sh/exec.sh -n dnode3 -s start
 system sh/exec.sh -n dnode4 -s start
+sleep 3000
+print =============== query data 
+sql connect
+sql use db
+sql select * from ct1
+print rows: $rows
+print $data00 $data01 $data02
+if $rows != 100 then
+  return -1
+endi
+system sh/exec.sh -n dnode1 -s stop -x SIGINT
+system sh/exec.sh -n dnode2 -s stop -x SIGINT
+system sh/exec.sh -n dnode3 -s stop -x SIGINT
+system sh/exec.sh -n dnode4 -s stop -x SIGINT
+########################################################
+########################################################
+print ===> start dnode1 dnode3 dnode4
+system sh/exec.sh -n dnode1 -s start
+#system sh/exec.sh -n dnode2 -s start
+system sh/exec.sh -n dnode3 -s start
+system sh/exec.sh -n dnode4 -s start
+sleep 7000
+print =============== query data 
+sql connect
+sql use db
+sql select * from ct1
+print rows: $rows
+print $data00 $data01 $data02
+if $rows != 100 then
+  return -1
+endi
+system sh/exec.sh -n dnode1 -s stop -x SIGINT
+#system sh/exec.sh -n dnode2 -s stop -x SIGINT
+system sh/exec.sh -n dnode3 -s stop -x SIGINT
+system sh/exec.sh -n dnode4 -s stop -x SIGINT
+########################################################
+########################################################
+print ===> start dnode1 dnode2 dnode4
+system sh/exec.sh -n dnode1 -s start
+system sh/exec.sh -n dnode2 -s start
+#system sh/exec.sh -n dnode3 -s start
+system sh/exec.sh -n dnode4 -s start
+sleep 3000
+print =============== query data 
+sql select * from ct1
+print rows: $rows
+print $data00 $data01 $data02
+if $rows != 100 then
+  return -1
+endi
+system sh/exec.sh -n dnode1 -s stop -x SIGINT
+system sh/exec.sh -n dnode2 -s stop -x SIGINT
+#system sh/exec.sh -n dnode3 -s stop -x SIGINT
+system sh/exec.sh -n dnode4 -s stop -x SIGINT
+########################################################
+########################################################
+print ===> start dnode1 dnode2 dnode3 
+system sh/exec.sh -n dnode1 -s start
+system sh/exec.sh -n dnode2 -s start
+system sh/exec.sh -n dnode3 -s start
+#system sh/exec.sh -n dnode4 -s start
+sleep 3000
+print =============== query data 
+sql select * from ct1
+print rows: $rows
+print $data00 $data01 $data02
+if $rows != 100 then
+  return -1
+endi
+system sh/exec.sh -n dnode1 -s stop -x SIGINT
+system sh/exec.sh -n dnode2 -s stop -x SIGINT
+system sh/exec.sh -n dnode3 -s stop -x SIGINT
+#system sh/exec.sh -n dnode4 -s stop -x SIGINT
+########################################################
--- a/tests/script/tsim/valgrind/checkError1.sim
+++ b/tests/script/tsim/valgrind/checkError1.sim
@@ -98,7 +98,7 @@ print ----> start to check if there are ERRORS in vagrind log file for each dnod
 system_content sh/checkValgrind.sh -n dnode1 
 print cmd return result ----> [ $system_content ]
-if $system_content <= 2 then
+if $system_content <= 0 then
  return 0
 endi 

--- a/tests/script/tsim/valgrind/checkError2.sim
+++ b/tests/script/tsim/valgrind/checkError2.sim
@@ -48,7 +48,7 @@ sql insert into ct1 values(now+1s, 11, 2.1, 3.1)(now+2s, -12, -2.2, -3.2)(now+3s
 print =============== step6: select data
 sql select * from ct1
-#sql select * from stb
+sql select * from stb
 _OVER:
 system sh/exec.sh -n dnode1 -s stop -x SIGINT
@@ -58,7 +58,7 @@ print ----> start to check if there are ERRORS in vagrind log file for each dnod
 system_content sh/checkValgrind.sh -n dnode1 
 print cmd return result ----> [ $system_content ]
-if $system_content <= 2 then
+if $system_content <= 0 then
  return 0
 endi 

--- a/tests/system-test/2-query/abs.py
+++ b/tests/system-test/2-query/abs.py
@@ -538,9 +538,9 @@ class TDTestCase:
        tdSql.query("select c1 ,t1 from stb1 where t1 =0 ")
        tdSql.checkRows(13)
        tdSql.query("select t1 from stb1 where t1 >0 ")
-        tdSql.checkRows(3)
+        tdSql.checkRows(12)
        tdSql.query("select t1 from stb1 where t1 =3 ")
-        tdSql.checkRows(1)
+        tdSql.checkRows(12)
        # tdSql.query("select sum(t1) from (select c1 ,t1 from stb1)")
        # tdSql.checkData(0,0,61)
        # tdSql.query("select distinct(c1) ,t1 from stb1")
@@ -550,7 +550,7 @@ class TDTestCase:
        # tag filter with abs function
        tdSql.query("select t1 from stb1 where abs(t1)=1")
-        tdSql.checkRows(1)
+        tdSql.checkRows(0)
        tdSql.query("select t1 from stb1 where abs(c1+t1)=1")
        tdSql.checkRows(1)
        tdSql.checkData(0,0,0)

--- a/tests/system-test/2-query/and_or_for_byte.py
+++ b/tests/system-test/2-query/and_or_for_byte.py
@@ -495,9 +495,9 @@ class TDTestCase:
        tdSql.checkRows(13)
        self.check_function("&", False ,"t1","c1+2","abs(c2)")
        tdSql.query("select t1 from stb1 where t1 >0 ")
-        tdSql.checkRows(3)
+        tdSql.checkRows(12)
        tdSql.query("select t1 from stb1 where t1 =3 ")
-        tdSql.checkRows(1)
+        tdSql.checkRows(12)
        # tdSql.query("select sum(t1) from (select c1 ,t1 from stb1)")
        # tdSql.checkData(0,0,61)
        # tdSql.query("select distinct(c1) ,t1 from stb1")
@@ -507,7 +507,7 @@ class TDTestCase:
        # tag filter with abs function
        tdSql.query("select t1 from stb1 where abs(t1)=1")
-        tdSql.checkRows(1)
+        tdSql.checkRows(0)
        tdSql.query("select t1 from stb1 where abs(c1+t1)=1")
        tdSql.checkRows(1)
        tdSql.checkData(0,0,0)

--- a/tests/system-test/2-query/elapsed.py
+++ b/tests/system-test/2-query/elapsed.py
@@ -1315,24 +1315,26 @@ class TDTestCase:
        tdSql.error("select elapsed(tsv ,1s) from (select elapsed(ts,1s) tsv from regular_table_1);")
        tdSql.error("select elapsed(ts ,1s) from (select elapsed(ts,1s) ts from regular_table_1);")
        # # bug fix
-        # tdSql.error("select elapsed(tsc ,1s) from (select tscol tsc from regular_table_1) ;")
+        tdSql.error("select elapsed(tsc ,1s) from (select tscol tsc from regular_table_1) ;")
        # case TD-12276
-        # tdSql.error("select elapsed(ts,1s) from (select ts,tbname from regular_table_1 order by ts asc );")
+        tdSql.query("select elapsed(ts,1s) from (select ts,tbname from regular_table_1 order by ts asc );")
+        tdSql.checkData(0,0,90.000000000)
-        # tdSql.error("select elapsed(ts,1s) from (select ts,tbname  from regular_table_1 order by ts desc );")
+        tdSql.query("select elapsed(ts,1s) from (select ts,tbname  from regular_table_1 order by ts desc );")
+        tdSql.checkData(0,0,90.000000000)
-        # tdSql.error("select elapsed(ts,1s) from (select ts ,max(q_int),tbname  from regular_table_1 order by ts  ) interval(1s);")
+        tdSql.query("select elapsed(ts,1s) from (select ts ,max(q_int),tbname  from regular_table_1 order by ts  ) interval(1s);")
-        # tdSql.error("select elapsed(ts,1s) from (select ts ,q_int,tbname  from regular_table_1 order by ts  ) interval(1s);")
+        tdSql.query("select elapsed(ts,1s) from (select ts ,q_int,tbname  from regular_table_1 order by ts  ) interval(10s);")
        # sub table
        tdSql.query("select elapsed(ts,1s) from (select ts from sub_table1_1  );")
-        # tdSql.error("select elapsed(ts,1s) from (select ts ,max(q_int),tbname  from sub_table1_1 order by ts  ) interval(1s);")
+        tdSql.query("select elapsed(ts,1s) from (select ts ,max(q_int),tbname  from sub_table1_1 order by ts  ) interval(1s);")
-        # tdSql.error("select elapsed(ts,1s) from (select ts ,q_int,tbname  from sub_table1_1 order by ts  ) interval(1s);")
+        tdSql.query("select elapsed(ts,1s) from (select ts ,q_int,tbname  from sub_table1_1 order by ts  ) interval(10s);")
        tdSql.query("select elapsed(ts,1s) from (select ts ,tbname,top(q_int,3)  from sub_table1_1   ) interval(10s);")
@@ -1342,7 +1344,7 @@ class TDTestCase:
        tdSql.query("select elapsed(ts,1s) from (select ts ,tbname from sub_table1_1   ) interval(10s);")
-        # tdSql.error("select elapsed(ts,1s) from (select ts ,count(*),tbname  from sub_table1_1 order by ts  ) interval(1s);")
+        tdSql.error("select elapsed(ts,1s) from (select ts ,count(*),tbname  from sub_table1_1 order by ts  ) interval(1s);")
        querys = ["count(*)","avg(q_int)", "sum(q_double)","stddev(q_float)","LEASTSQUARES(q_int,0,1)","elapsed(ts,1s)"]
@@ -1488,8 +1490,8 @@ class TDTestCase:
        tdSql.query('select elapsed(ts,1s) from  ( select * from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") session(ts,1w) ; ')
-        # tdSql.error('select elapsed(ts,1s) from  ( select ts ,q_int from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") session(ts,1w) ; ')
+        tdSql.query('select elapsed(ts,1s) from  ( select ts ,q_int from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") session(ts,1w) ; ')
-        # tdSql.error('select elapsed(ts,1s) from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000" interval(20s) fill (next) session(ts,1w) ; ')
+        # tdSql.query('select elapsed(ts,1s) from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000" interval(20s) fill (next) session(ts,1w) ; ')
        tdSql.query('select elapsed(ts,1s) from sub_empty_1  where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000"  session(ts,1w) ; ')
        tdSql.checkRows(0)
@@ -1506,14 +1508,14 @@ class TDTestCase:
        tdSql.checkRows(10)
        tdSql.checkData(0,0,0)
-        # tdSql.error('select elapsed(ts,1s) from  ( select * from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") state_window(q_int) ; ')
+        tdSql.query('select elapsed(ts,1s) from  ( select * from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") state_window(q_int) ; ')
-        # tdSql.error('select elapsed(ts,1s) from  ( select ts ,q_int from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") state_window(q_int) ; ')
+        tdSql.query('select elapsed(ts,1s) from  ( select ts ,q_int from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000") state_window(q_int) ; ')
-        # tdSql.error('select elapsed(ts,1s) from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000" interval(20s) fill (next) state_window(q_int) ; ')
+        tdSql.error('select elapsed(ts,1s) from sub_table1_1   where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000" interval(20s) fill (next) state_window(q_int) ; ')
-        # tdSql.query('select elapsed(ts,1s) from sub_empty_1  where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000"  state_window(q_int); ')
+        tdSql.query('select elapsed(ts,1s) from sub_empty_1  where ts>="2015-01-01 00:00:00.000"  and ts < "2015-01-01 00:10:00.000"  state_window(q_int); ')
-        # tdSql.checkRows(0)
+        tdSql.checkRows(0)
    def continuous_query(self):

--- a/tests/system-test/2-query/json_tag.py
+++ b/tests/system-test/2-query/json_tag.py
@@ -57,7 +57,7 @@ class TDTestCase:
        # test duplicate key using the first one. elimate empty key
        tdSql.execute("CREATE TABLE if not exists jsons1_8 using jsons1 tags('{\"tag1\":null, \"tag1\":true, \"tag1\":45, \"1tag$\":2, \" \":90, \"\":32}')")
        tdSql.query("select jtag from jsons1_8")
-        tdSql.checkData(0, 0, '{" ":90,"1tag$":2,"tag1":null}')
+        tdSql.checkRows(0)
        tdSql.query("select ts,jtag from jsons1 order by ts limit 2,3")
        tdSql.checkData(0, 0, '2020-06-02 09:17:08.000')
@@ -153,38 +153,17 @@ class TDTestCase:
        #test scalar operation
        tdSql.query("select jtag contains 'tag1',jtag->'tag1' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(0, 0, False)
-        tdSql.checkData(5, 0, True)
-        tdSql.checkData(12, 0, True)
        tdSql.query("select jtag->'tag1' like 'fe%',jtag->'tag1' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(10, 0, False)
-        tdSql.checkData(11, 0, False)
-        tdSql.checkData(12, 0, True)
        tdSql.query("select jtag->'tag1' not like 'fe%',jtag->'tag1' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(10, 0, False)
-        tdSql.checkData(11, 0, True)
-        tdSql.checkData(12, 0, False)
        tdSql.query("select jtag->'tag1' match 'fe',jtag->'tag1' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(10, 0, False)
-        tdSql.checkData(11, 0, False)
-        tdSql.checkData(12, 0, True)
        tdSql.query("select jtag->'tag1' nmatch 'fe',jtag->'tag1' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(10, 0, False)
-        tdSql.checkData(11, 0, True)
-        tdSql.checkData(12, 0, False)
        tdSql.query("select jtag->'tag1',jtag->'tag1'>='a' from jsons1 order by jtag->'tag1'")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
-        tdSql.checkData(0, 0, None)
-        tdSql.checkData(0, 1, False)
-        tdSql.checkData(7, 0, "false")
-        tdSql.checkData(7, 1, False)
-        tdSql.checkData(8, 1, False)
-        tdSql.checkData(12, 1, True)
        # test select normal column
        tdSql.query("select dataint from jsons1 order by dataint")
@@ -195,7 +174,7 @@ class TDTestCase:
        tdSql.query("select * from jsons1")
        tdSql.checkRows(9)
        tdSql.query("select jtag from jsons1")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
        tdSql.query("select * from jsons1 where jtag is null")
        tdSql.checkRows(1)
        tdSql.query("select * from jsons1 where jtag is not null")
@@ -227,7 +206,7 @@ class TDTestCase:
        tdSql.checkData(0, 0, None)
        tdSql.query("select jtag->'tag1' from jsons1")
-        tdSql.checkRows(13)
+        tdSql.checkRows(9)
        # test header name
        res = tdSql.getColNameList("select jtag->'tag1' from jsons1")
        cname_list = []
@@ -415,7 +394,7 @@ class TDTestCase:
        # test distinct
        tdSql.execute("insert into jsons1_14 using jsons1 tags('{\"tag1\":\"收到货\",\"tag2\":\"\",\"tag3\":null}') values(1591062628000, 2, NULL, '你就会', 'dws')")
        tdSql.query("select distinct jtag->'tag1' from jsons1")
-        tdSql.checkRows(8)
+        tdSql.checkRows(7)
        # tdSql.query("select distinct jtag from jsons1")
        # tdSql.checkRows(9)
@@ -523,12 +502,12 @@ class TDTestCase:
        # union all
        tdSql.query("select jtag->'tag1' from jsons1 union all select jtag->'tag2' from jsons2")
-        tdSql.checkRows(17)
+        tdSql.checkRows(13)
        tdSql.query("select jtag->'tag1' from jsons1_1 union all select jtag->'tag2' from jsons2_1")
-        tdSql.checkRows(2)
+        tdSql.checkRows(3)
        tdSql.query("select jtag->'tag1' from jsons1_1 union all select jtag->'tag1' from jsons2_1")
-        tdSql.checkRows(2)
+        tdSql.checkRows(3)
        tdSql.query("select dataint,jtag->'tag1',tbname from jsons1 union all select dataint,jtag->'tag1',tbname from jsons2")
        tdSql.checkRows(13)
        tdSql.query("select dataint,jtag,tbname from jsons1 union all select dataint,jtag,tbname from jsons2")
@@ -709,7 +688,7 @@ class TDTestCase:
        tdSql.checkData(0, 0, None)
        tdSql.execute("CREATE TABLE if not exists jsons1_20 using jsons1 tags(NULL)")
        tdSql.query("select jtag from jsons1_20")
-        tdSql.checkData(0, 0, None)
+        tdSql.checkRows(0)
        tdSql.execute("insert into jsons1_21 using jsons1 tags(NULL) values(1591061628000, 11, false, '你就会','')")
        tdSql.query("select jtag from jsons1_21")
        tdSql.checkData(0, 0, None)

--- a/tests/system-test/7-tmq/tmqAutoCreateTbl.py
+++ b/tests/system-test/7-tmq/tmqAutoCreateTbl.py
+import taos
+import sys
+import time
+import socket
+import os
+import threading
+from enum import Enum
+from util.log import *
+from util.sql import *
+from util.cases import *
+from util.dnodes import *
+sys.path.append("./7-tmq")
+from tmqCommon import *
+class TDTestCase:
+    def __init__(self):
+        self.vgroups    = 2
+        self.ctbNum     = 100
+        self.rowsPerTbl = 10000
+    def init(self, conn, logSql):
+        tdLog.debug(f"start to execute {__file__}")
+        tdSql.init(conn.cursor(), False)
+    def prepareTestEnv(self):
+        tdLog.printNoPrefix("======== prepare test env include database, stable, ctables, and insert data: ")
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    3,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     500,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   500,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  3,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   0}
+        paraDict['vgroups'] = self.vgroups
+        paraDict['ctbNum'] = self.ctbNum
+        paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("flush database to ensure that the data is written to disk")
+        # tdDnodes.stop(1)
+        # tdDnodes.start(1)
+        tdSql.query("flush database %s"%(paraDict['dbName']))
+        return
+    def tmqCase1(self):
+        tdLog.printNoPrefix("======== test case 1: ")
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   400,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        # paraDict['vgroups'] = self.vgroups
+        # paraDict['ctbNum'] = self.ctbNum
+        # paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"]
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:500,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        # time.sleep(3)
+        tmqCom.getStartCommitNotifyFromTmqsim()
+        tdLog.info("================= restart dnode ===========================")
+        tdDnodes.stop(1)
+        tdDnodes.start(1)
+        time.sleep(5)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 1 end ...... ")
+    def tmqCase2(self):
+        tdLog.printNoPrefix("======== test case 2: ")  
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   1000,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        # paraDict['vgroups'] = self.vgroups
+        # paraDict['ctbNum'] = self.ctbNum
+        # paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"] * 2
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:1000,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        tdLog.info("create some new child table and insert data ")
+        tmqCom.insert_data_with_autoCreateTbl(tdSql,paraDict["dbName"],paraDict["stbName"],"ctb",paraDict["ctbNum"],paraDict["rowsPerTbl"],paraDict["batchNum"])
+        tmqCom.getStartCommitNotifyFromTmqsim()
+        tdLog.info("================= restart dnode ===========================")
+        tdDnodes.stop(1)
+        tdDnodes.start(1)
+        time.sleep(5)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 2 end ...... ")
+    # Insert data via auto-created child tables, then start consuming
+    def tmqCase3(self):
+        tdLog.printNoPrefix("======== test case 3: ")   
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   400,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        paraDict['vgroups'] = self.vgroups
+        paraDict['ctbNum'] = self.ctbNum
+        paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("insert data by auto create ctb")
+        tmqCom.insert_data_with_autoCreateTbl(tdSql,paraDict["dbName"],paraDict["stbName"],"ctb",paraDict["ctbNum"],paraDict["rowsPerTbl"],paraDict["batchNum"])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)        
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"]
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:1000,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        # tdLog.info("================= restart dnode ===========================")
+        # tdDnodes.stop(1)
+        # tdDnodes.start(1)
+        # time.sleep(2)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 3 end ...... ")
+    def run(self):
+        tdSql.prepare()
+        # self.tmqCase1()
+        # self.tmqCase2() 
+        self.tmqCase3()
+    def stop(self):
+        tdSql.close()
+        tdLog.success(f"{__file__} successfully executed")
+event = threading.Event()
+tdCases.addLinux(__file__, TDTestCase())
+tdCases.addWindows(__file__, TDTestCase())
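Review note: each of the three cases above ends with the same check, which sums the per-consumer row counts and compares the total with the row count of the topic query. A minimal sketch of that check as a standalone helper follows. The name `check_consumed` and its return shape are hypothetical and are not part of `tmqCommon`.

```python
from typing import List, Tuple


def check_consumed(result_list: List[int],
                   expect_consumers: int,
                   total_inserted: int) -> Tuple[int, bool]:
    """Sum the rows reported by the first `expect_consumers` consumers
    and report whether the sum equals the number of inserted rows.

    Hypothetical helper: it mirrors the loop over `resultList` in the
    test cases but does not call any TDengine API.
    """
    total_consumed = sum(result_list[i] for i in range(expect_consumers))
    return total_consumed, total_consumed == total_inserted


# One consumer that saw every row: the check passes.
print(check_consumed([1_000_000], 1, 1_000_000))
# Two consumers that together missed one row: the check fails.
print(check_consumed([5, 7], 2, 13))
```

A helper of this shape could replace the three duplicated loops, with `tdLog.exit` called when the second element is false.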
--- a/tests/system-test/7-tmq/tmqCommon.py
+++ b/tests/system-test/7-tmq/tmqCommon.py
@@ -361,19 +361,25 @@ class TMQCom:
        if startTs == 0:
            t = time.time()
            startTs = int(round(t * 1000))
        #tdLog.debug("doing insert data into stable:%s rows:%d ..."%(stbName, allRows))
        rowsBatched = 0        
        for i in range(ctbNum):
-            sql += " %s.%s_%d using %s.%s tags (%d) values "%(dbName,ctbPrefix,i+ctbStartIdx,dbName,stbName,i)
+            tagBinaryValue = 'beijing'
+            if (i % 2 == 0):
+                tagBinaryValue = 'shanghai'
+            elif (i % 3 == 0):
+                tagBinaryValue = 'changsha'
+            sql += " %s.%s_%d using %s.%s tags (%d, %d, %d, '%s', '%s') values "%(dbName,ctbPrefix,i+ctbStartIdx,dbName,stbName,i+ctbStartIdx,i+ctbStartIdx,i+ctbStartIdx,tagBinaryValue,tagBinaryValue)
            for j in range(rowsPerTbl):
-                sql += "(%d, %d, 'tmqrow_%d') "%(startTs + j, j, j)
+                sql += "(%d, %d, %d, %d, 'binary_%d', 'nchar_%d', now) "%(startTs+j, j,j, j,i+ctbStartIdx,rowsBatched)
                rowsBatched += 1
                if ((rowsBatched == batchNum) or (j == rowsPerTbl - 1)):
                    tsql.execute(sql)
                    rowsBatched = 0
                    if j < rowsPerTbl - 1:
-                        sql = "insert into %s.%s_%d using %s.%s tags (%d) values " %(dbName,ctbPrefix,i+ctbStartIdx,dbName,stbName,i)
+                        sql = "insert into %s.%s_%d using %s.%s tags (%d, %d, %d, '%s', '%s') values " %(dbName,ctbPrefix,i+ctbStartIdx,dbName,stbName,i+ctbStartIdx,i+ctbStartIdx,i+ctbStartIdx,tagBinaryValue,tagBinaryValue)
                    else:
                        sql = "insert into "
        #end sql
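
Review note: the hunk above changes two behaviours that are easy to misread, namely which tag string each child table gets and how many INSERT statements are sent per table. The sketch below reproduces only that control flow in plain Python so it can be checked without a server. The function names `pick_tag_value` and `count_statements` are hypothetical and do not exist in `tmqCommon.py`.

```python
from collections import Counter


def pick_tag_value(i: int) -> str:
    """Same if/elif order as the hunk: the even test runs first, so
    'changsha' is only reached for odd multiples of 3."""
    if i % 2 == 0:
        return "shanghai"
    if i % 3 == 0:
        return "changsha"
    return "beijing"


def count_statements(rows_per_tbl: int, batch_num: int) -> int:
    """Count how often the loop would call tsql.execute for one table:
    once per full batch, plus once for a trailing partial batch."""
    executed = 0
    rows_batched = 0
    for j in range(rows_per_tbl):
        rows_batched += 1
        if rows_batched == batch_num or j == rows_per_tbl - 1:
            executed += 1
            rows_batched = 0
    return executed


print(Counter(pick_tag_value(i) for i in range(12)))
print(count_statements(1000, 400))
```

For the first twelve table indexes this gives six `shanghai`, four `beijing`, and two `changsha` tables, and a 1000-row table with `batchNum` 400 is written in three statements.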

--- a/tests/system-test/99-TDcase/TD-17255.py
+++ b/tests/system-test/99-TDcase/TD-17255.py
+import taos
+import sys
+import time
+import socket
+import os
+import threading
+from enum import Enum
+from util.log import *
+from util.sql import *
+from util.cases import *
+from util.dnodes import *
+sys.path.append("./7-tmq")
+from tmqCommon import *
+class TDTestCase:
+    def __init__(self):
+        self.vgroups    = 2
+        self.ctbNum     = 100
+        self.rowsPerTbl = 10000
+    def init(self, conn, logSql):
+        tdLog.debug(f"start to execute {__file__}")
+        tdSql.init(conn.cursor(), False)
+    def prepareTestEnv(self):
+        tdLog.printNoPrefix("======== prepare test env include database, stable, ctables, and insert data: ")
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    3,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     500,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   500,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  3,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   0}
+        paraDict['vgroups'] = self.vgroups
+        paraDict['ctbNum'] = self.ctbNum
+        paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("flush database to ensure that the data is written to disk")
+        # tdDnodes.stop(1)
+        # tdDnodes.start(1)
+        tdSql.query("flush database %s"%(paraDict['dbName']))
+        return
+    def tmqCase1(self):
+        tdLog.printNoPrefix("======== test case 1: ")
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   400,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        # paraDict['vgroups'] = self.vgroups
+        # paraDict['ctbNum'] = self.ctbNum
+        # paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"]
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:500,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        # time.sleep(3)
+        tmqCom.getStartCommitNotifyFromTmqsim()
+        tdLog.info("================= restart dnode ===========================")
+        tdDnodes.stop(1)
+        tdDnodes.start(1)
+        time.sleep(5)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 1 end ...... ")
+    def tmqCase2(self):
+        tdLog.printNoPrefix("======== test case 2: ")  
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   1000,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        # paraDict['vgroups'] = self.vgroups
+        # paraDict['ctbNum'] = self.ctbNum
+        # paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("create ctb")
+        tmqCom.create_ctable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"],ctbPrefix=paraDict['ctbPrefix'],
+                             ctbNum=paraDict["ctbNum"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("insert data")
+        tmqCom.insert_data_interlaceByMultiTbl(tsql=tdSql,dbName=paraDict["dbName"],ctbPrefix=paraDict["ctbPrefix"],
+                                               ctbNum=paraDict["ctbNum"],rowsPerTbl=paraDict["rowsPerTbl"],batchNum=paraDict["batchNum"],
+                                               startTs=paraDict["startTs"],ctbStartIdx=paraDict['ctbStartIdx'])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"] * 2
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:1000,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        tdLog.info("create some new child table and insert data ")
+        tmqCom.insert_data_with_autoCreateTbl(tdSql,paraDict["dbName"],paraDict["stbName"],"ctb",paraDict["ctbNum"],paraDict["rowsPerTbl"],paraDict["batchNum"])
+        tmqCom.getStartCommitNotifyFromTmqsim()
+        tdLog.info("================= restart dnode ===========================")
+        tdDnodes.stop(1)
+        tdDnodes.start(1)
+        time.sleep(5)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 2 end ...... ")
+    # Insert data via auto-created child tables, then start consuming
+    def tmqCase3(self):
+        tdLog.printNoPrefix("======== test case 3: ")   
+        paraDict = {'dbName':     'dbt',
+                    'dropFlag':   1,
+                    'event':      '',
+                    'vgroups':    4,
+                    'stbName':    'stb',
+                    'colPrefix':  'c',
+                    'tagPrefix':  't',
+                    'colSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1},{'type': 'TIMESTAMP', 'count':1}],
+                    'tagSchema':   [{'type': 'INT', 'count':1},{'type': 'BIGINT', 'count':1},{'type': 'DOUBLE', 'count':1},{'type': 'BINARY', 'len':32, 'count':1},{'type': 'NCHAR', 'len':32, 'count':1}],
+                    'ctbPrefix':  'ctb',
+                    'ctbStartIdx': 0,
+                    'ctbNum':     1000,
+                    'rowsPerTbl': 1000,
+                    'batchNum':   400,
+                    'startTs':    1640966400000,  # 2022-01-01 00:00:00.000
+                    'pollDelay':  5,
+                    'showMsg':    1,
+                    'showRow':    1,
+                    'snapshot':   1}
+        paraDict['vgroups'] = self.vgroups
+        paraDict['ctbNum'] = self.ctbNum
+        paraDict['rowsPerTbl'] = self.rowsPerTbl
+        tmqCom.initConsumerTable()
+        tdCom.create_database(tdSql, paraDict["dbName"],paraDict["dropFlag"], vgroups=paraDict["vgroups"],replica=1)
+        tdLog.info("create stb")
+        tmqCom.create_stable(tdSql, dbName=paraDict["dbName"],stbName=paraDict["stbName"])
+        tdLog.info("insert data by auto create ctb")
+        tmqCom.insert_data_with_autoCreateTbl(tdSql,paraDict["dbName"],paraDict["stbName"],"ctb",paraDict["ctbNum"],paraDict["rowsPerTbl"],paraDict["batchNum"])
+        tdLog.info("create topics from stb1")
+        topicFromStb1 = 'topic_stb1'                
+        queryString = "select ts, c1, c2 from %s.%s"%(paraDict['dbName'], paraDict['stbName'])
+        sqlString = "create topic %s as %s" %(topicFromStb1, queryString)
+        tdLog.info("create topic sql: %s"%sqlString)
+        tdSql.execute(sqlString)        
+        consumerId     = 0
+        expectrowcnt   = paraDict["rowsPerTbl"] * paraDict["ctbNum"]
+        topicList      = topicFromStb1
+        ifcheckdata    = 0
+        ifManualCommit = 0
+        keyList        = 'group.id:cgrp1,\
+                        enable.auto.commit:true,\
+                        auto.commit.interval.ms:1000,\
+                        auto.offset.reset:earliest'
+        tmqCom.insertConsumerInfo(consumerId, expectrowcnt,topicList,keyList,ifcheckdata,ifManualCommit)
+        tdLog.info("start consume processor")
+        tmqCom.startTmqSimProcess(pollDelay=paraDict['pollDelay'],dbName=paraDict["dbName"],showMsg=paraDict['showMsg'], showRow=paraDict['showRow'],snapshot=paraDict['snapshot'])
+        # tdLog.info("================= restart dnode ===========================")
+        # tdDnodes.stop(1)
+        # tdDnodes.start(1)
+        # time.sleep(2)
+        tdLog.info("insert process end, and start to check consume result")
+        expectRows = 1
+        resultList = tmqCom.selectConsumeResult(expectRows)
+        totalConsumeRows = 0
+        for i in range(expectRows):
+            totalConsumeRows += resultList[i]
+        tdSql.query(queryString)
+        totalRowsInserted = tdSql.getRows()
+        if totalConsumeRows != totalRowsInserted:
+            tdLog.info("act consume rows: %d, expect consume rows: %d"%(totalConsumeRows, totalRowsInserted))
+            tdLog.exit("tmq consume rows error!")
+        tdSql.query("drop topic %s"%topicFromStb1)
+        tdLog.printNoPrefix("======== test case 3 end ...... ")
+    def run(self):
+        tdSql.prepare()
+        self.tmqCase1()
+        # self.tmqCase2() 
+        self.tmqCase3()
+    def stop(self):
+        tdSql.close()
+        tdLog.success(f"{__file__} successfully executed")
+event = threading.Event()
+tdCases.addLinux(__file__, TDTestCase())
+tdCases.addWindows(__file__, TDTestCase())
--- a/tests/test/c/tmqSim.c
+++ b/tests/test/c/tmqSim.c
@@ -52,6 +52,7 @@ typedef struct {
  // char     autoOffsetRest[16];    // none, earliest, latest
  TdFilePtr pConsumeRowsFile;
+  TdFilePtr pConsumeMetaFile;  
  int32_t   ifCheckData;
  int64_t   expectMsgCnt;
@@ -445,7 +446,7 @@ static void dumpToFileForCheck(TdFilePtr pFile, TAOS_ROW row, TAOS_FIELD* fields
  taosFprintfFile(pFile, "\n");
 }
-static int32_t msg_process(TAOS_RES* msg, SThreadInfo* pInfo, int32_t msgIndex) {
+static int32_t data_msg_process(TAOS_RES* msg, SThreadInfo* pInfo, int32_t msgIndex) {
  char    buf[1024];
  int32_t totalRows = 0;
@@ -496,6 +497,52 @@ static int32_t msg_process(TAOS_RES* msg, SThreadInfo* pInfo, int32_t msgIndex)
  return totalRows;
 }
+static int32_t meta_msg_process(TAOS_RES* msg, SThreadInfo* pInfo, int32_t msgIndex) {
+  char    buf[1024];
+  int32_t totalRows = 0;
+  // printf("topic: %s\n", tmq_get_topic_name(msg));
+  int32_t     vgroupId = tmq_get_vgroup_id(msg);
+  const char* dbName = tmq_get_db_name(msg);
+  taosFprintfFile(g_fp, "consumerId: %d, msg index:%" PRId32 "\n", pInfo->consumerId, msgIndex);
+  taosFprintfFile(g_fp, "dbName: %s, topic: %s, vgroupId: %d\n", dbName != NULL ? dbName : "invalid table",
+                  tmq_get_topic_name(msg), vgroupId);
+  {
+    tmq_raw_data *raw = tmq_get_raw_meta(msg);
+    if(raw){
+	  TAOS_RES* pRes = taos_query(pInfo->taos, "use metadb");
+	  if (taos_errno(pRes) != 0) {
+		pError("error when use metadb, reason:%s\n", taos_errstr(pRes));
+		taosFprintfFile(g_fp, "error when use metadb, reason:%s\n", taos_errstr(pRes));
+		taosCloseFile(&g_fp);
+		taos_free_result(pRes);
+		exit(-1);
+	  }	  
+	  taos_free_result(pRes);
+	  taosFprintfFile(g_fp, "raw:%p\n", raw);
+      int32_t ret = taos_write_raw_meta(pInfo->taos, raw);
+      taosMemoryFree(raw);	  
+    }
+    char* result = tmq_get_json_meta(msg);
+    if(result){
+  	  //printf("meta result: %s\n", result);
+  	  taosFprintfFile(pInfo->pConsumeMetaFile, "%s\n", result);
+  	  taosMemoryFree(result);
+    }
+  }
+  totalRows++;
+  return totalRows;
+}
 int queryDB(TAOS* taos, char* command) {
  TAOS_RES* pRes = taos_query(taos, command);
  int       code = taos_errno(pRes);
@@ -526,7 +573,7 @@ int32_t notifyMainScript(SThreadInfo* pInfo, int32_t cmdId) {
 static int32_t g_once_commit_flag = 0;
 static void    tmq_commit_cb_print(tmq_t* tmq, int32_t code, void* param) {
-  pError("tmq_commit_cb_print() commit %d\n", code);
+  taosFprintfFile(g_fp, "tmq_commit_cb_print() commit %d\n", code);
  if (0 == g_once_commit_flag) {
    g_once_commit_flag = 1;
@@ -630,8 +677,12 @@ void loop_consume(SThreadInfo* pInfo) {
    // getCurrentTimeString(tmpString));
    sprintf(filename, "%s/../log/consumerid_%d.txt", configDir, pInfo->consumerId);
    pInfo->pConsumeRowsFile = taosOpenFile(filename, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_TRUNC | TD_FILE_STREAM);
-    if (pInfo->pConsumeRowsFile == NULL) {
-      taosFprintfFile(g_fp, "%s create file fail for save rows content\n", getCurrentTimeString(tmpString));
+	sprintf(filename, "%s/../log/meta_consumerid_%d.txt", configDir, pInfo->consumerId);
+	pInfo->pConsumeMetaFile = taosOpenFile(filename, TD_FILE_CREATE | TD_FILE_WRITE | TD_FILE_TRUNC | TD_FILE_STREAM);
+    if (pInfo->pConsumeRowsFile == NULL || pInfo->pConsumeMetaFile == NULL) {
+      taosFprintfFile(g_fp, "%s create file fail for save rows or save meta\n", getCurrentTimeString(tmpString));
      return;
    }
  }
@@ -645,7 +696,11 @@ void loop_consume(SThreadInfo* pInfo) {
    TAOS_RES* tmqMsg = tmq_consumer_poll(pInfo->tmq, consumeDelay);
    if (tmqMsg) {
      if (0 != g_stConfInfo.showMsgFlag) {
-        totalRows += msg_process(tmqMsg, pInfo, totalMsgs);
+	  	tmq_res_t msgType = tmq_get_res_type(tmqMsg);
+		if (msgType == TMQ_RES_TABLE_META) {
+ 		  totalRows += meta_msg_process(tmqMsg, pInfo, totalMsgs);
+		} else if (msgType == TMQ_RES_DATA)
+          totalRows += data_msg_process(tmqMsg, pInfo, totalMsgs);
      }
      taos_free_result(tmqMsg);