@@ -23,7 +23,25 @@ By subscribing to a topic, a consumer can obtain the latest data in that topic i
...
@@ -23,7 +23,25 @@ By subscribing to a topic, a consumer can obtain the latest data in that topic i
To implement these features, TDengine indexes its write-ahead log (WAL) file for fast random access and provides configurable methods for replacing and retaining this file. You can define a retention period and size for this file. For information, see the CREATE DATABASE statement. In this way, the WAL file is transformed into a persistent storage engine that remembers the order in which events occur. However, note that configuring an overly long retention period for your WAL files makes database compression inefficient. TDengine then uses the WAL file instead of the time-series database as its storage engine for queries in the form of topics. TDengine reads the data from the WAL file; uses a unified query engine instance to perform filtering, transformations, and other operations; and finally pushes the data to consumers.
To implement these features, TDengine indexes its write-ahead log (WAL) file for fast random access and provides configurable methods for replacing and retaining this file. You can define a retention period and size for this file. For information, see the CREATE DATABASE statement. In this way, the WAL file is transformed into a persistent storage engine that remembers the order in which events occur. However, note that configuring an overly long retention period for your WAL files makes database compression inefficient. TDengine then uses the WAL file instead of the time-series database as its storage engine for queries in the form of topics. TDengine reads the data from the WAL file; uses a unified query engine instance to perform filtering, transformations, and other operations; and finally pushes the data to consumers.
Tips: Data subscription is to consume data from the wal. If some wal files are deleted according to WAL retention policy, the deleted data can't be consumed any more. So you need to set a reasonable value for parameter `WAL_RETENTION_PERIOD` or `WAL_RETENTION_SIZE` when creating the database and make sure your application consume the data in a timely way to make sure there is no data loss. This behavior is similar to Kafka and other widely used message queue products.
Tips:(c interface for example)
- A consumption group consumes all data under the same topic, and different consumption groups are independent of each other;
- A consumption group consumes all vgroups of the same topic, which can be composed of multiple consumers, but a vgroup is only consumed by one consumer. If the number of consumers exceeds the number of vgroups, the excess consumers do not consume data;
- On the server side, only one offset is saved for each vgroup, and the offsets for each vgroup are monotonically increasing, but not necessarily continuous. There is no correlation between the offsets of various vgroups;
- Each poll server will return a result block, which belongs to a vgroup and may contain data from multiple versions of wal. This block can be accessed through tmq_get_vgroup_offset. The offset interface obtains the offset of the first record in the block;
- If a consumer group has never committed an offset, when its member consumers restart and pull data again, they start consuming from the set value of the parameter auto.offset.reset; In a consumer lifecycle, the client locally records the offset of the most recent pull data and will not pull duplicate data;
- If a consumer terminates abnormally (without calling tmq_close), they need to wait for about 12 seconds to trigger their consumer group rebalance. The consumer's status on the server will change to LOST, and after about 1 day, the consumer will be automatically deleted; Exit normally, and after exiting, the consumer will be deleted; Add a new consumer, wait for about 2 seconds to trigger Rebalance, and the consumer's status on the server will change to ready;
- The consumer group Rebalance will reassign Vgroups to all consumer members in the ready state of the group, and consumers can only assign/see/commit/poll operations to the Vgroups they are responsible for;
- Consumers can tmq_position to obtain the offset of the current consumption, seek to the specified offset, and consume again;
- Seek points the position to the specified offset without executing the commit operation. Once the seek is successful, it can poll the specified offset and subsequent data;
- Position is to obtain the current consumption position, which is the position to be taken next time, not the current consumption position
- Commit is the submission of the consumption location. Without parameters, it is the submission of the current consumption location (the location to be taken next time, not the current consumption location). With parameters, it is the location in the submission parameters (i.e. the location to be taken after the next exit and restart)
- Seek is to set the consumer's consumption position. Wherever the seek goes, the position will be returned, all of which are the positions to be taken next time
- Seek does not affect commit, commit does not affect seek, independent of each other, the two are different concepts
- The begin interface is the offset of the first data in wal, and the end interface is the offset+1 of the last data in wal10.
- Before the seek operation, tmq must be call tmq_get_topic_assignment, The assignment interface obtains the vgroup ID and offset range of the consumer. The seek operation will detect whether the vgroup ID and offset are legal, and if they are illegal, an error will be reported;
- Due to the existence of a WAL expiration deletion mechanism, even if the seek operation is successful, it is possible that the offset has expired when polling data. If the offset of poll is less than the WAL minimum version number, it will be consumed from the WAL minimum version number;
- The tmq_get_vgroup_offset interface obtains the offset of the first data in the result block where the record is located. When seeking to this offset, it will consume all the data in this block. Refer to point four;
- Data subscription is to consume data from the wal. If some wal files are deleted according to WAL retention policy, the deleted data can't be consumed any more. So you need to set a reasonable value for parameter `WAL_RETENTION_PERIOD` or `WAL_RETENTION_SIZE` when creating the database and make sure your application consume the data in a timely way to make sure there is no data loss. This behavior is similar to Kafka and other widely used message queue products.
## Data Schema and API
## Data Schema and API
...
@@ -33,36 +51,57 @@ The related schemas and APIs in various languages are described as follows:
...
@@ -33,36 +51,57 @@ The related schemas and APIs in various languages are described as follows:
DLL_EXPORT int64_t tmq_position(tmq_t *tmq, const char *pTopicName, int32_t vgId); // The current offset is the offset of the last consumed message + 1
@@ -378,7 +378,7 @@ In addition to writing data using the SQL method or the parameter binding API, w
...
@@ -378,7 +378,7 @@ In addition to writing data using the SQL method or the parameter binding API, w
- `TAOS_RES* taos_schemaless_insert(TAOS* taos, const char* lines[], int numLines, int protocol, int precision)`
- `TAOS_RES* taos_schemaless_insert(TAOS* taos, const char* lines[], int numLines, int protocol, int precision)`
**Function description**
**Function description**
This interface writes the text data of the line protocol to TDengine.
- This interface writes the text data of the line protocol to TDengine.
**Parameter description**
**Parameter description**
- taos: database connection, established by the `taos_connect()` function.
- taos: database connection, established by the `taos_connect()` function.
...
@@ -387,12 +387,13 @@ In addition to writing data using the SQL method or the parameter binding API, w
...
@@ -387,12 +387,13 @@ In addition to writing data using the SQL method or the parameter binding API, w
- protocol: the protocol type of the lines, used to identify the text data format.
- protocol: the protocol type of the lines, used to identify the text data format.
- precision: precision string for the timestamp in the text data.
- precision: precision string for the timestamp in the text data.
**return value**
**Return value**
TAOS_RES structure, application can get error message by using `taos_errstr()` and also error code by using `taos_errno()`.
- TAOS_RES structure, application can get error message by using `taos_errstr()` and also error code by using `taos_errno()`.
In some cases, the returned TAOS_RES is `NULL`, and it is still possible to call `taos_errno()` to safely get the error code information.
In some cases, the returned TAOS_RES is `NULL`, and it is still possible to call `taos_errno()` to safely get the error code information.
The returned TAOS_RES needs to be freed by the caller in order to avoid memory leaks.
The returned TAOS_RES needs to be freed by the caller in order to avoid memory leaks.
**Description**
**Description**
The protocol type is enumerated and contains the following three formats.
The protocol type is enumerated and contains the following three formats.
- TSDB_SML_LINE_PROTOCOL: InfluxDB line protocol (Line Protocol)
- TSDB_SML_LINE_PROTOCOL: InfluxDB line protocol (Line Protocol)
...
@@ -427,3 +428,89 @@ In addition to writing data using the SQL method or the parameter binding API, w
...
@@ -427,3 +428,89 @@ In addition to writing data using the SQL method or the parameter binding API, w
- Within _raw interfaces represent data through the passed parameters lines and len. In order to solve the problem that the original interface data contains '\0' and is truncated. The totalRows pointer returns the number of parsed data rows.
- Within _raw interfaces represent data through the passed parameters lines and len. In order to solve the problem that the original interface data contains '\0' and is truncated. The totalRows pointer returns the number of parsed data rows.
- Within _ttl interfaces can pass the ttl parameter to control the ttl expiration time of the table.
- Within _ttl interfaces can pass the ttl parameter to control the ttl expiration time of the table.
- Within _reqid interfaces can track the entire call chain by passing the reqid parameter.
- Within _reqid interfaces can track the entire call chain by passing the reqid parameter.
The commit interface is divided into two types, each with synchronous and asynchronous interfaces:
- The first type: based on message submission, submit the progress in the message. If the message passes NULL, submit the current progress of all vgroups consumed by the current consumer: tmq_commit_sync/tmq_commit_async
- The second type: submit based on the offset of a Vgroup in a topic: tmq_commit_offset_sync/tmq_commit_offset_async
**Parameter description**
- msg:Message consumed, If the message passes NULL, submit the current progress of all vgroups consumed by the current consumer
**Return value**
- zero success, none zero failed, wrong message can be obtained through `char *tmq_err2str(int32_t code)`
DLL_EXPORT int64_t tmq_position(tmq_t *tmq, const char *pTopicName, int32_t vgId); // The current offset is the offset of the last consumed message + 1