DataNode DDL Flush Design

update: 5.19.2021, by Goose; update: 5.21.2021, by Goose; update: 6.04.2021, by Goose

THIS IS OUTDATED

Background

Data Definition Language (DDL) is a language used to define data structures and modify data [1]. In Milvus terminology, operations such as CreateCollection and DropPartition are DDL. To recover or redo DD operations, DataNode flushes DDLs into persistent storage.

Before this design, DataNode buffered DDL chunks by collection and flushed all buffered data on a manual or auto flush.

Now, in the DataNode Recovery Design, flowgraph : vchannel = 1 : 1, and the insert data of one segment is always in one vchannel, so each flowgraph is concerned with only ONE specific collection. For DDL channels, one flowgraph cares only about the DDL operations of one collection. In this case, I don't think it's necessary to auto-flush DDL anymore.
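As a rough illustration of "one flowgraph cares only about the DDL operations of one collection", the sketch below buffers a DDL message only when its collection ID matches the one the flowgraph serves. The `ddlMsg` and `ddBuffer` types and the `Accept` method are hypothetical names invented for this example; they are not the actual DataNode types.

```go
package main

// ddlMsg is a hypothetical, simplified DDL message.
// Real Milvus messages carry more fields than shown here.
type ddlMsg struct {
	CollectionID int64
	Op           string // e.g. "CreateCollection", "DropPartition"
}

// ddBuffer holds buffered DDL operations for exactly one collection,
// mirroring the flowgraph : vchannel = 1 : 1 assumption.
type ddBuffer struct {
	collectionID int64
	ops          []ddlMsg
}

// Accept buffers msg only if it belongs to this buffer's collection.
// It reports whether the message was buffered.
func (b *ddBuffer) Accept(msg ddlMsg) bool {
	if msg.CollectionID != b.collectionID {
		return false // DDL of another collection is ignored
	}
	b.ops = append(b.ops, msg)
	return true
}
```

With this shape, nothing is flushed automatically: the buffered `ops` simply wait until a manual flush drains them.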

Goals

  • Flowgraph knows which segment/collection it is concerned with.
  • DDNode updates masPositions once it buffers DDL about the collection.
  • DDNode won't auto-flush.
  • In a manual flush, a background flush-complete goroutine waits until both DDNode and InsertBufferNode have finished flushing, i.e. until both sets of binlog paths are available.
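The last goal can be sketched as a goroutine that receives one result from each node before reporting completion. This is only a minimal sketch under assumptions: `flushResult` and `waitFlushComplete` are hypothetical names, and the real nodes may signal completion differently (for example through callbacks rather than channels).

```go
package main

// flushResult is a hypothetical container for the binlog paths
// that one node reports after it finishes flushing.
type flushResult struct {
	Paths []string
}

// waitFlushComplete starts a background goroutine that blocks until both
// the DDL side (ddCh) and the insert side (insertCh) have reported.
// Index 0 of the delivered array is the DDL result, index 1 the insert result.
func waitFlushComplete(ddCh, insertCh <-chan flushResult) <-chan [2]flushResult {
	done := make(chan [2]flushResult, 1)
	go func() {
		var out [2]flushResult
		// Receive from both channels in whichever order they become ready.
		for received := 0; received < 2; received++ {
			select {
			case r := <-ddCh:
				out[0] = r
				ddCh = nil // receiving from a nil channel blocks, so this case is now disabled
			case r := <-insertCh:
				out[1] = r
				insertCh = nil
			}
		}
		done <- out
	}()
	return done
}
```

A caller would trigger the manual flush on both nodes, then read from the returned channel once to obtain both sets of binlog paths before saving them.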

Detailed design

  1. Redesign of DDL binlog paths and the etcd paths for these binlog paths

DDL flushes are triggered by a manual flush of a segment.

Former design

# minIO/S3 ddl binlog paths
${tenant}/data_definition_log/${collection_id}/ts/${log_idx}
${tenant}/data_definition_log/${collection_id}/ddl/${log_idx}

# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${idx}

The minIO/S3 DDL binlog paths seem fine, but the etcd paths aren't clear, especially when we want to relate a DDL flush to a certain segment flush.

Redesign

# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${segmentID}/${idx}

message PositionPair {
  internal.MsgPosition start_position = 1;
  internal.MsgPosition end_position = 2;
}

message SaveBinlogPathsRequest {
  common.MsgBase base = 1;
  int64 segmentID = 2;
  int64 collectionID = 3;
  ID2PathList field2BinlogPaths = 4;
  repeated DDLBinlogMeta = 5;
  PositionPair dml_position = 6;
  PositionPair ddl_position = 7;
}
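To make the redesigned etcd layout concrete, here is a small sketch of how keys of the form `${prefix}/${collectionID}/${segmentID}/${idx}` could be built. The function names `ddlBinlogMetaKey` and `ddlBinlogSegmentPrefix` are hypothetical and not taken from the Milvus code base; the point is only that adding the segment ID lets all DDL binlog metas of one segment flush share a common key prefix.

```go
package main

import (
	"path"
	"strconv"
)

// ddlBinlogMetaKey builds ${prefix}/${collectionID}/${segmentID}/${idx}.
func ddlBinlogMetaKey(prefix string, collectionID, segmentID, idx int64) string {
	return path.Join(
		prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10),
		strconv.FormatInt(idx, 10),
	)
}

// ddlBinlogSegmentPrefix builds ${prefix}/${collectionID}/${segmentID}/,
// the common prefix of every DDL binlog meta key of one segment.
// The trailing slash keeps segment 2 from also matching segment 20.
func ddlBinlogSegmentPrefix(prefix string, collectionID, segmentID int64) string {
	return path.Join(
		prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10),
	) + "/"
}
```

For example, `ddlBinlogMetaKey("ddl", 1, 2, 0)` yields `ddl/1/2/0`, and a prefix query on `ddlBinlogSegmentPrefix("ddl", 1, 2)` would cover every index stored for that segment.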

TODOs

  1. Refactor auto-flush of ddNode
  2. Refactor etcd paths

[1]: techterms.com