diff --git a/docs/design_docs/datanode_ddl_flush_design_0519_2021.md b/docs/design_docs/datanode_ddl_flush_design_0519_2021.md
new file mode 100644
index 0000000000000000000000000000000000000000..149f571d5406922ae4b1b46ea0be649f263bc22a
--- /dev/null
+++ b/docs/design_docs/datanode_ddl_flush_design_0519_2021.md
@@ -0,0 +1,68 @@
+# DataNode DDL Flush Design
+
+update: 5.19.2021, by [Goose](https://github.com/XuanYang-cn)
+
+## Background
+
+Data Definition Language (DDL) is a language used to define data structures and modify data[1](#techterms1).
+In Milvus terminology, operations such as `CreateCollection` and `DropPartition` are DDL. To recover
+or redo DDL operations, DataNode flushes DDLs into persistent storage.
+
+Before this design, DataNode buffered DDL chunks by collection and flushed all buffered data on a manual or auto flush.
+
+Now, per the [DataNode Recovery Design](datanode_recovery_design_0513_2021.md), flowgraph : vchannel = 1 : 1, and the insert
+data of one segment always stays in one vchannel. So each flowgraph is concerned with only ONE specific collection. For
+DDL channels, one flowgraph cares only about the DDL operations of one collection.
+
+## Goals
+
+- Flowgraph knows which segment/collection it is concerned with.
+- DDNode updates masPositions once it buffers DDL about the collection.
+- DDNode buffers binlog paths generated by auto-flush.
+- In manual-flush, a background flush-complete goroutine waits for both DDNode and InsertBufferNode to finish flushing,
+  that is, it waits for both sets of binlog paths.
+
+## Detailed design
+
+1. Redesign of DDL binlog paths and etcd paths for these binlog paths
+
+
+DDL flushes are based on a manual flush of a segment.
+
+**Former design**
+```
+# minIO/S3 ddl binlog paths
+${tenant}/data_definition_log/${collection_id}/ts/${log_idx}
+${tenant}/data_definition_log/${collection_id}/ddl/${log_idx}
+
+# etcd paths for ddl binlog paths
+${prefix}/${collectionID}/${idx}
+```
+
+The minIO/S3 DDL binlog paths seem OK, but the etcd paths aren't clear, especially when we want to relate a DDL flush
+to a certain segment flush.
+
+**Redesign**
+```
+# etcd paths for ddl binlog paths
+${prefix}/${collectionID}/${segmentID}/${idx}
+```
+
+```
+message SaveBinlogPathsRequest {
+    common.MsgBase base = 1;
+    int64 segmentID = 2;
+    int64 collectionID = 3;
+    ID2PathList field2BinlogPaths = 4;
+    repeated DDLBinlogMeta = 5;
+    repeated internal.MsgPosition start_positions = 7;
+    repeated internal.MsgPosition end_positions = 8;
+ }
+```
+
+## TODOs
+
+1. Refactor auto-flush of ddNode
+2. Refactor etcd paths
+
+[1]: *[techterms.com](https://techterms.com/definition/ddl#:~:text=Stands%20for%20%22Data%20Definition%20Language,SQL%2C%20the%20Structured%20Query%20Language)*
diff --git a/docs/design_docs/datanode_recovery_design_0513_2021.md b/docs/design_docs/datanode_recovery_design_0513_2021.md
index 8a0a33cdebb80d5725631ef1c324c3e54656747c..221fb0cc2305680ee45415123a0b0cfd2dc9585a 100644
--- a/docs/design_docs/datanode_recovery_design_0513_2021.md
+++ b/docs/design_docs/datanode_recovery_design_0513_2021.md
@@ -63,12 +63,10 @@ manul-flush and upload to DataServce together.
 
 ```proto
 rpc SaveBinlogPaths(SaveBinlogPathsRequest) returns (common.Status){}
-
-
-message ID2PathList {
-    int64 ID = 1;
-    repeated string Paths = 2;
-}
+message ID2PathList {
+    int64 ID = 1;
+    repeated string Paths = 2;
+}
 
 message SaveBinlogPathsRequest {
     common.MsgBase base = 1;
@@ -87,20 +85,16 @@ message SaveBinlogPathsRequest {
 The same as DataNode
 
 ```proto
-message FieldFlushMeta {
-    int64 fieldID = 1;
-    repeated string binlog_paths = 2;
-}
-
-message SegmentFlushMeta{
-    int64 segmentID = 1;
-    bool is_flushed = 2;
-    repeated FieldFlushMeta fields = 5;
+// key: ${prefix}/${segmentID}/${fieldID}/${idx}
+message SegmentFieldBinlogMeta {
+    int64 fieldID = 1;
+    string binlog_path = 2;
 }
 
-message DDLFlushMeta {
-    int64 collectionID = 1;
-    repeated string binlog_paths = 2;
+// key: ${prefix}/${collectionID}/${idx}
+message DDLBinlogMeta {
+    string ddl_binlog_path = 1;
+    string ts_binlog_path = 2;
 }
 ```
 
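
Note on the redesigned etcd layout `${prefix}/${collectionID}/${segmentID}/${idx}` from the DDL flush design: the sketch below shows one way such keys could be assembled in Go, and how a per-segment prefix could be formed for range reads. It is only an illustration under assumptions. The function names `ddlBinlogMetaKey` and `ddlBinlogMetaSegmentPrefix`, the `int64` ID types, and the sample prefix `datanode/ddl` are hypothetical and do not come from the patch.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// ddlBinlogMetaKey builds a key following the redesigned layout
// ${prefix}/${collectionID}/${segmentID}/${idx}.
// Hypothetical helper: the name and the int64 parameter types are assumptions.
func ddlBinlogMetaKey(prefix string, collectionID, segmentID, idx int64) string {
	return path.Join(
		prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10),
		strconv.FormatInt(idx, 10),
	)
}

// ddlBinlogMetaSegmentPrefix returns the key prefix shared by every DDL binlog
// meta entry of one segment. The trailing "/" keeps a prefix scan for segment
// 100 from also matching segment 1000.
// Hypothetical helper, same assumptions as above.
func ddlBinlogMetaSegmentPrefix(prefix string, collectionID, segmentID int64) string {
	return path.Join(
		prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10),
	) + "/"
}

func main() {
	// "datanode/ddl" is a placeholder prefix chosen for this example only.
	fmt.Println(ddlBinlogMetaKey("datanode/ddl", 1, 100, 0))
	fmt.Println(ddlBinlogMetaSegmentPrefix("datanode/ddl", 1, 100))
}
```

Placing `segmentID` ahead of `idx` is what lets all DDL binlog entries for one segment flush be read with a single prefix query, which the former `${prefix}/${collectionID}/${idx}` layout could not express.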