The `Flush` operation ensures that data has been written to persistent storage. This document describes how `Flush` works in `Milvus 2.0`. The following figure shows the execution flow of `Flush`.
1. First, the `SDK` sends a `Flush` request to `Proxy` via `gRPC`. The `proto` is defined as follows:
```proto
service MilvusService {
  ...
  rpc Flush(FlushRequest) returns (FlushResponse) {}
  ...
}

message FlushRequest {
  common.MsgBase base = 1;
  string db_name = 2;
  repeated string collection_names = 3;
}

message FlushResponse {
  common.Status status = 1;
  string db_name = 2;
  map<string, schema.LongArray> coll_segIDs = 3;
}
```
2. On receiving the `Flush` request, `Proxy` wraps it in a `FlushTask` and pushes the task into the `DdTaskQueue`. `Proxy` then calls `WaitToFinish` to block until the task has finished.
```go
type task interface {
	TraceCtx() context.Context
	ID() UniqueID       // return ReqID
	SetID(uid UniqueID) // set ReqID
	Name() string
	Type() commonpb.MsgType
	BeginTs() Timestamp
	EndTs() Timestamp
	SetTs(ts Timestamp)
	OnEnqueue() error
	PreExecute(ctx context.Context) error
	Execute(ctx context.Context) error
	PostExecute(ctx context.Context) error
	WaitToFinish() error
	Notify(err error)
}

type FlushTask struct {
	Condition
	*milvuspb.FlushRequest
	ctx       context.Context
	dataCoord types.DataCoord
	result    *milvuspb.FlushResponse
}
```
3. A background service in `Proxy` takes the `FlushTask` from the `DdTaskQueue` and executes it in three phases.
- `PreExecute`: `FlushTask` does nothing in this phase and returns directly.
- `Execute`: in this phase, `Proxy` sends a `Flush` request to `DataCoord` via `gRPC` and waits for the response. The `proto` is defined as follows:
```proto
service DataCoord {
  ...
  rpc Flush(FlushRequest) returns (FlushResponse) {}
  ...
}

message FlushRequest {
  common.MsgBase base = 1;
  int64 dbID = 2;
  int64 collectionID = 4;
}

message FlushResponse {
  common.Status status = 1;
  int64 dbID = 2;
  int64 collectionID = 3;
  repeated int64 segmentIDs = 4;
}
```
- `PostExecute`: `FlushTask` does nothing in this phase and returns directly.
4. After receiving the `Flush` request from `Proxy`, `DataCoord` calls `SealAllSegments` to seal all growing segments that belong to this `Collection` and stops allocating new `ID`s for these segments. `DataCoord` then sends a response to `Proxy` that contains the IDs of all sealed segments.
5. In `Milvus 2.0`, `Flush` is an asynchronous operation. When the `SDK` receives the `Flush` response, it only means that `DataCoord` has sealed these segments. Two problems remain to be solved:
- The sealed segments might still be in memory and not yet written to persistent storage.
- `DataCoord` no longer allocates new `ID`s for these sealed segments, but it must still make sure that all the `ID`s already allocated have been consumed by `DataNode`.
6. For the first problem, the `SDK` should send a `GetSegmentInfo` request to `DataCoord` periodically until all the sealed segments are in the `Flushed` state. The `proto` is defined as follows.
7. For the second problem, `DataNode` reports a timestamp to `DataCoord` every time it consumes a package from `MsgStream`. The `proto` is defined as follows:
```proto
message DataNodeTtMsg {
  common.MsgBase base = 1;
  string channel_name = 2;
  uint64 timestamp = 3;
}
```
8. A background service in `DataCoord`, `startDataNodeTsLoop`, processes each `DataNodeTtMsg`.
- First, `DataCoord` extracts `channel_name` from the `DataNodeTtMsg` and selects all the sealed segments attached to this `channel_name`.
- It then compares the timestamp at which each segment entered the `Sealed` state with `DataNodeTtMsg.timestamp`. If `DataNodeTtMsg.timestamp` is greater, all the `ID`s that belong to that segment have been consumed by `DataNode`, so it is safe to notify `DataNode` to write that segment into persistent storage. The `proto` is defined as follows.