Add ddl flush design (#5289)

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
XuanYang-cn 2021-05-19 12:06:16 +08:00 committed by GitHub
parent 9b37cab922
commit 4b712284f2
2 changed files with 80 additions and 18 deletions


@ -0,0 +1,68 @@
# DataNode DDL Flush Design
update: 5.19.2021, by [Goose](https://github.com/XuanYang-cn)
## Background
Data Definition Language (DDL) is a language used to define data structures and modify data<sup>[1](#techterms1)</sup>.
In Milvus terminology, operations such as `CreateCollection` and `DropPartition` are DDL. In order to recover
or redo DD operations, DataNode flushes DDLs into persistent storage.
Before this design, DataNode buffered DDL chunks by collection and flushed all of the buffered data on manual/auto flush.
Now, per the [DataNode Recovery Design](datanode_recover_design_0513_2021.md), flowgraph : vchannel = 1 : 1, and the insert
data of one segment is always written to one vchannel, so each flowgraph is concerned with only ONE specific collection.
For the DDL channel, this means one flowgraph only cares about the DDL operations of its own collection.
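A minimal sketch of this per-collection concern, using simplified stand-in types (`ddMsg`, `ddNode`) rather than the actual Milvus flowgraph interfaces: the node simply skips DDL messages that belong to other collections.

```go
package main

import "fmt"

// ddMsg is a simplified stand-in for a DDL message read from the DDL channel.
type ddMsg struct {
    collectionID int64
    operation    string // e.g. "CreateCollection", "DropPartition"
}

// ddNode is a simplified stand-in for the flowgraph node handling DDL.
// Each flowgraph serves exactly one collection, so the node keeps that ID.
type ddNode struct {
    collectionID int64
    buffer       []ddMsg
}

// Operate buffers only the DDL messages of this node's collection and
// ignores everything else flowing through the shared DDL channel.
func (n *ddNode) Operate(msgs []ddMsg) {
    for _, m := range msgs {
        if m.collectionID != n.collectionID {
            continue // DDL of another collection: not this flowgraph's concern
        }
        n.buffer = append(n.buffer, m)
    }
}

func main() {
    node := &ddNode{collectionID: 100}
    node.Operate([]ddMsg{
        {collectionID: 100, operation: "DropPartition"},
        {collectionID: 200, operation: "CreateCollection"},
    })
    fmt.Println(len(node.buffer)) // 1: only the collection-100 DDL is buffered
}
```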
## Goals
- A flowgraph knows which segments/collection it is concerned with.
- DDNode updates its msg positions once it buffers a DDL of its collection.
- DDNode buffers the binlog paths generated by auto-flush.
- In a manual flush, a background flush-complete goroutine waits until both DDNode and InsertBufferNode have finished
  flushing, collecting the binlog paths from both (see the sketch after this list).
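A minimal sketch of this flush-complete synchronization, assuming the two nodes report their binlog paths over channels; the names and types are illustrative, not the actual implementation, and the `save` callback stands in for the SaveBinlogPaths call to DataService.

```go
package main

import "fmt"

// binlogPaths is a simplified stand-in for the paths produced by one node's flush.
type binlogPaths struct {
    source string
    paths  []string
}

// waitFlushComplete launches the background flush-complete goroutine: it blocks
// until BOTH the DDNode and the InsertBufferNode have reported their binlog
// paths, then hands the combined result to the save callback.
func waitFlushComplete(ddlCh, insertCh <-chan binlogPaths, save func(ddl, insert binlogPaths)) {
    go func() {
        ddl := <-ddlCh       // binlog paths flushed by DDNode
        insert := <-insertCh // binlog paths flushed by InsertBufferNode
        save(ddl, insert)
    }()
}

func main() {
    ddlCh := make(chan binlogPaths, 1)
    insertCh := make(chan binlogPaths, 1)
    done := make(chan struct{})

    waitFlushComplete(ddlCh, insertCh, func(ddl, insert binlogPaths) {
        fmt.Println("flush complete:", ddl.paths, insert.paths)
        close(done)
    })

    // The two flowgraph nodes finish flushing independently.
    ddlCh <- binlogPaths{source: "ddNode", paths: []string{"ddl-binlog-0", "ts-binlog-0"}}
    insertCh <- binlogPaths{source: "insertBufferNode", paths: []string{"field-1-binlog-0"}}
    <-done
}
```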
## Detailed design
1. Redesign of DDL binlog paths and the etcd paths for these binlog paths
A DDL flush is triggered by the manual flush of a segment.
**Former design**
```
# minIO/S3 ddl binlog paths
${tenant}/data_definition_log/${collection_id}/ts/${log_idx}
${tenant}/data_definition_log/${collection_id}/ddl/${log_idx}
# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${idx}
```
The minIO/S3 ddl binlog paths seem OK, but the etcd paths aren't clear, especially when we want to relate a ddl flush
to a certain segment flush.
**Redesign**
```
# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${segmentID}/${idx}
```
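As a minimal illustration of the redesigned key layout (the prefix value and the helper are assumptions made for this example, not the actual Milvus code):

```go
package main

import "fmt"

// ddlBinlogEtcdKey builds the redesigned etcd key
// ${prefix}/${collectionID}/${segmentID}/${idx}, so a DDL flush can be related
// to the segment flush that triggered it.
func ddlBinlogEtcdKey(prefix string, collectionID, segmentID, idx int64) string {
    return fmt.Sprintf("%s/%d/%d/%d", prefix, collectionID, segmentID, idx)
}

func main() {
    // e.g. "datanode/ddl-binlog-meta/100/2001/0" under an assumed prefix
    fmt.Println(ddlBinlogEtcdKey("datanode/ddl-binlog-meta", 100, 2001, 0))
}
```

With the segmentID in the key, each DDL flush can be traced back to the segment flush that produced it. The redesigned `SaveBinlogPathsRequest` below reports the segment and collection IDs together with both the field binlog paths and the DDL binlog meta: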
```proto
message SaveBinlogPathsRequest {
    common.MsgBase base = 1;
    int64 segmentID = 2;
    int64 collectionID = 3;
    ID2PathList field2BinlogPaths = 4;
    // field name is assumed for illustration
    repeated DDLBinlogMeta ddl_binlog_paths = 5;
    repeated internal.MsgPosition start_positions = 7;
    repeated internal.MsgPosition end_positions = 8;
}
```
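For illustration only, a sketch of how a DataNode might fill such a request after a flush; the structs are simplified stand-ins for the generated proto types, and every value is made up:

```go
package main

import "fmt"

// Simplified stand-ins for the proto messages above (illustrative only).
type DDLBinlogMeta struct {
    DdlBinlogPath string
    TsBinlogPath  string
}

type ID2PathList struct {
    ID    int64
    Paths []string
}

type SaveBinlogPathsRequest struct {
    SegmentID         int64
    CollectionID      int64
    Field2BinlogPaths ID2PathList
    DdlBinlogMetas    []DDLBinlogMeta
}

func main() {
    // One request reports both the insert binlogs of the flushed segment and
    // the DDL binlogs buffered for its collection.
    req := SaveBinlogPathsRequest{
        SegmentID:    2001,
        CollectionID: 100,
        Field2BinlogPaths: ID2PathList{
            ID:    1,
            Paths: []string{"field-1-binlog-0"},
        },
        DdlBinlogMetas: []DDLBinlogMeta{
            {DdlBinlogPath: "ddl-binlog-0", TsBinlogPath: "ts-binlog-0"},
        },
    }
    fmt.Printf("%+v\n", req)
}
```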
## TODOs
1. Refactor auto-flush of ddNode
2. Refactor etcd paths
<a name="techterms1">[1]</a>: *[techterms.com](https://techterms.com/definition/ddl#:~:text=Stands%20for%20%22Data%20Definition%20Language,SQL%2C%20the%20Structured%20Query%20Language)*


@ -63,12 +63,10 @@ manual-flush and upload to DataService together.
```proto
rpc SaveBinlogPaths(SaveBinlogPathsRequest) returns (common.Status){}
message ID2PathList {
    int64 ID = 1;
    repeated string Paths = 2;
}
message SaveBinlogPathsRequest {
    common.MsgBase base = 1;
@ -87,20 +85,16 @@ message SaveBinlogPathsRequest {
The same as DataNode
```proto
-message FieldFlushMeta {
-    int64 fieldID = 1;
-    repeated string binlog_paths = 2;
+// key: ${prefix}/${segmentID}/${fieldID}/${idx}
+message SegmentFieldBinlogMeta {
+    int64 fieldID = 1;
+    string binlog_path = 2;
 }
-message SegmentFlushMeta{
-    int64 segmentID = 1;
-    bool is_flushed = 2;
-    repeated FieldFlushMeta fields = 5;
-}
-message DDLFlushMeta {
-    int64 collectionID = 1;
-    repeated string binlog_paths = 2;
+// key: ${prefix}/${collectionID}/${idx}
+message DDLBinlogMeta {
+    string ddl_binlog_path = 1;
+    string ts_binlog_path = 2;
 }
```