From 6e6644298c25e518fdc767361dde7e7620b90ac1 Mon Sep 17 00:00:00 2001 From: Cai Yudong Date: Mon, 20 Sep 2021 22:51:52 +0800 Subject: [PATCH] [skip ci] Update flush doc (#8261) Signed-off-by: yudong.cai --- .../milvus_flush_collections_en.md | 92 +++++++++---------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/docs/design_docs/milvus_flush_collections_en.md b/docs/design_docs/milvus_flush_collections_en.md index 22cd5f119..74fcdc2f0 100644 --- a/docs/design_docs/milvus_flush_collections_en.md +++ b/docs/design_docs/milvus_flush_collections_en.md @@ -1,5 +1,5 @@ # Flush Collection -`Flush` operation is used to make sure that the data has been writen into the persistent storage, this document introduce how `Flush` operation works in `Milvus 2.0`. The following figure shows the execution flow of `Flush` +`Flush` operation is used to make sure that inserted data will be written into persistent storage. This document will introduce how `Flush` operation works in `Milvus 2.0`. The following figure shows the execution flow of `Flush`. ![flush_collections](./graphs/flush_data_coord.png) @@ -7,9 +7,7 @@ ```proto service MilvusService { ... - rpc Flush(FlushRequest) returns (FlushResponse) {} - ... } @@ -27,7 +25,7 @@ message FlushResponse{ ``` -2. When received the `Flush` request, the `Proxy` would wraps this request into `FlushTask`, and pushs this task into `DdTaskQueue` queue. After that, `Proxy` would call method of `WatiToFinish` to wait until the task finished. +2. When `Proxy` receives `Flush` request, it would wrap this request into `FlushTask`, and push this task into `DdTaskQueue` queue. After that, `Proxy` would call method `WatiToFinish` to wait until the task finished. ```go type task interface { TraceCtx() context.Context @@ -55,46 +53,50 @@ type FlushTask struct { } ``` -3. There is a backgroud service in `Proxy`, this service would get the `FlushTask` from `DdTaskQueue`, and executes it in three phases. - - `PreExecute`,`FlushTask` does nothing at this phase, and return directly - - `Execute`, at this phase, `Proxy` would send `Flush` request to `DataCoord` via `Grpc`,and wait for the reponse, the `proto` is defined as follow: - ```proto - service DataCoord { - ... - - rpc Flush(FlushRequest) returns (FlushResponse) {} +3. There is a backgroud service in `Proxy`. This service gets `FlushTask` from `DdTaskQueue`, and executes in three phases: + - `PreExecute` + + `FlushTask` does nothing at this phase, and returns directly + + - `Execute` - ... - } - - message FlushRequest { - common.MsgBase base = 1; - int64 dbID = 2; - int64 collectionID = 4; - } - - message FlushResponse { - common.Status status = 1; - int64 dbID = 2; - int64 collectionID = 3; - repeated int64 segmentIDs = 4; + `Proxy` sends `Flush` request to `DataCoord` via `Grpc`, and waits for the response, the `proto` is defined as follow: + ```proto + service DataCoord { + ... + rpc Flush(FlushRequest) returns (FlushResponse) {} + ... + } + + message FlushRequest { + common.MsgBase base = 1; + int64 dbID = 2; + int64 collectionID = 4; + } + + message FlushResponse { + common.Status status = 1; + int64 dbID = 2; + int64 collectionID = 3; + repeated int64 segmentIDs = 4; + } ``` - - `PostExecute`, `FlushTask` does nothing at this phase, and return directly - -4. After receiving `Flush` request from `Proxy`, `DataCoord` would call `SealAllSegments` to seal all the growing segments that belong to this `Collection`, and no longer allocate new `ID`s for these segments. After that, `DataCoord` would send response to `Proxy`, and the response should contain all the sealed segment ID. - -5. In `Milvus 2.0`, the `Flush` is an asynchronous operation. So when `SDK` receives the response of `Flush`, it only means that the `DataCoord` has sealed these segments, and there are 2 problem that we have to soluved. - - The sealed segments might still in the memory, and not have been writen into persistent storage yet. + - `PostExecute` + + `FlushTask` does nothing at this phase, and returns directly + +4. After receiving `Flush` request from `Proxy`, `DataCoord` would call `SealAllSegments` to seal all the growing segments belonging to this `Collection`, and do not allocate new `ID`s for these segments any more. After that, `DataCoord` would send response to `Proxy`, which contain all the sealed segment `ID`s. + +5. In `Milvus 2.0`, `Flush` is an asynchronous operation. So when `SDK` receives the response of `Flush`, it only means that the `DataCoord` has sealed these segments. There are 2 problems that we have to solve. + - The sealed segments might still in memory, and have not been written into persistent storage yet. - `DataCoord` would no longer allocate new `ID`s for these sealed segments, but how to make sure all the allocated `ID`s have been consumed by `DataNode`. -6. For the first problem, `SDK` should send `GetSegmentInfo` request to `DataCoord` periodically, until all the sealed segment are in state of `Flushed`. the `proto` is defined as following. +6. For the first problem, `SDK` should send `GetSegmentInfo` request to `DataCoord` periodically, until all sealed segments are in state of `Flushed`. The `proto` is defined as following. ```proto service DataCoord { ... - rpc GetSegmentInfo(GetSegmentInfoRequest) returns (GetSegmentInfoResponse) {} - ... } @@ -132,25 +134,23 @@ enum SegmentState { ``` -7. For second problem, `DataNode` would report a timestamp to `DataCoord` every time it consumes a package from `MsgStream`,the Proto is define as follow. +7. For the second problem, `DataNode` would report a timestamp to `DataCoord` every time it consumes a package from `MsgStream`, the `proto` is define as follow. ```proto - message DataNodeTtMsg { - common.MsgBase base =1; - string channel_name = 2; - uint64 timestamp = 3; +message DataNodeTtMsg { + common.MsgBase base = 1; + string channel_name = 2; + uint64 timestamp = 3; } -``` + ``` 8. There is a backgroud service, `startDataNodeTsLoop`, in `DataCoord` to process the message of `DataNodeTtMsg`. - - Firstly, `DataCoord` would extract `channel_name` from `DataNodeTtMsg`, and filter out all the sealed segments that attached on this `channel_name` - - Compare the timestamp when the segment enters into state of `Sealed` with the `DataNodeTtMsg.timestamp`, if `DataNodeTtMsg.timestamp` is greater, it means that all the `ID`s belong to that segment have been consumed by `DataNode`,so it's safe to notify `DataNode` to write that segment into persistent storage. The `proto` is defined as follow. + - Firstly, `DataCoord` would extract `channel_name` from `DataNodeTtMsg`, and filter out all sealed segments that attached on this `channel_name` + - Compare the timestamp when the segment enters into state of `Sealed` with the `DataNodeTtMsg.timestamp`, if `DataNodeTtMsg.timestamp` is greater, which means that all `ID`s belonging to that segment have been consumed by `DataNode`, it's safe to notify `DataNode` to write that segment into persistent storage. The `proto` is defined as follow: ```proto service DataNode { ... - rpc FlushSegments(FlushSegmentsRequest) returns(common.Status) {} - ... } @@ -160,5 +160,5 @@ message FlushSegmentsRequest { int64 collectionID = 3; repeated int64 segmentIDs = 4; } +``` -``` \ No newline at end of file -- GitLab