mirror of
https://gitee.com/milvus-io/milvus.git
synced 2024-12-01 11:29:48 +08:00
Add ddl flush design (#5289)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
This commit is contained in:
parent
9b37cab922
commit
4b712284f2
68
docs/design_docs/datanode_ddl_flush_design_0519_2021.md
Normal file
@ -0,0 +1,68 @@
# DataNode DDL Flush Design

update: 5.19.2021, by [Goose](https://github.com/XuanYang-cn)

## Background

Data Definition Language (DDL) is a language used to define data structures and modify data<sup>[1](#techterms1)</sup>.
In Milvus, operations such as `CreateCollection` and `DropPartition` are DDL. To make it possible to recover
or redo DDL operations, DataNode flushes DDL to persistent storage.

Before this design, DataNode buffered DDL chunks per collection and flushed all buffered data on every manual or auto flush.

In the [DataNode Recovery Design](datanode_recover_design_0513_2021.md), flowgraphs and vchannels map 1 : 1, and the insert
data of one segment always lives in one vchannel. Each flowgraph is therefore concerned with exactly ONE collection. The
same holds for DDL channels: one flowgraph only cares about the DDL operations of one collection.

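The one-flowgraph-one-collection rule can be pictured as a filter in front of the DDL buffer. The following is only a
minimal sketch under stated assumptions: `ddlMsg`, `ddlBuffer` and `offer` are hypothetical names invented for
illustration and are not Milvus types.

```go
package main

// ddlMsg is a hypothetical, simplified DDL message; real Milvus messages
// carry more fields.
type ddlMsg struct {
	CollectionID int64
	Op           string // e.g. "CreateCollection", "DropPartition"
}

// ddlBuffer holds the DDL operations buffered by one flowgraph.
type ddlBuffer struct {
	collectionID int64 // the single collection this flowgraph serves
	ops          []ddlMsg
}

// offer buffers msg only when it targets this flowgraph's collection and
// reports whether the message was accepted.
func (b *ddlBuffer) offer(msg ddlMsg) bool {
	if msg.CollectionID != b.collectionID {
		return false
	}
	b.ops = append(b.ops, msg)
	return true
}
```

A buffer created with `collectionID: 1` would accept a `CreateCollection` message for collection 1 and ignore a
`DropPartition` message for collection 2.
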
## Goals

- Each flowgraph knows which segments and which collection it is responsible for.
- DDNode updates masPositions once it buffers DDL for that collection.
- DDNode buffers the binlog paths generated by auto-flush.
- In a manual flush, a background flush-complete goroutine waits until both DDNode and InsertBufferNode have finished
  flushing, that is, until the binlog paths from both nodes are available.

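The last goal describes a rendezvous between two nodes. One possible shape for it is sketched below; the channel-based
signalling, `flushResult` and `waitFlushComplete` are assumptions made for illustration, not the actual DataNode code.

```go
package main

// flushResult is a hypothetical container for the binlog paths produced by
// one manual flush.
type flushResult struct {
	ddlPaths    []string
	insertPaths []string
}

// waitFlushComplete starts a background goroutine that blocks until both the
// DDNode and the InsertBufferNode have reported their binlog paths, then
// delivers the combined result on the returned channel.
func waitFlushComplete(ddlDone, insertDone <-chan []string) <-chan flushResult {
	out := make(chan flushResult, 1)
	go func() {
		var res flushResult
		// Receive from whichever node finishes first. A nil channel blocks
		// forever in select, so setting a channel to nil disables its case
		// once it has fired.
		for ddlDone != nil || insertDone != nil {
			select {
			case p := <-ddlDone:
				res.ddlPaths, ddlDone = p, nil
			case p := <-insertDone:
				res.insertPaths, insertDone = p, nil
			}
		}
		out <- res
	}()
	return out
}
```

The caller hands each node the sending end of one channel and reads the returned channel once; the order in which the
two nodes finish does not matter.
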
## Detailed design

1. Redesign of the DDL binlog paths and of the etcd paths for these binlog paths

A DDL flush is triggered by the manual flush of a segment.

**Former design**
```
# minIO/S3 ddl binlog paths
${tenant}/data_definition_log/${collection_id}/ts/${log_idx}
${tenant}/data_definition_log/${collection_id}/ddl/${log_idx}

# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${idx}
```

The minIO/S3 DDL binlog paths look fine, but the etcd paths are ambiguous: the key carries no segment ID, so it is hard
to relate a DDL flush to a particular segment flush.

**Redesign**
```
# etcd paths for ddl binlog paths
${prefix}/${collectionID}/${segmentID}/${idx}
```

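To make the redesigned key layout concrete, here is a small sketch of how such keys could be built. The helper names
`ddlBinlogKey` and `ddlSegmentPrefix` and the example prefix value are hypothetical; only the
`${prefix}/${collectionID}/${segmentID}/${idx}` layout comes from the design above.

```go
package main

import (
	"path"
	"strconv"
)

// ddlBinlogKey builds ${prefix}/${collectionID}/${segmentID}/${idx}.
func ddlBinlogKey(prefix string, collectionID, segmentID, idx int64) string {
	return path.Join(prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10),
		strconv.FormatInt(idx, 10))
}

// ddlSegmentPrefix builds the key prefix shared by every DDL binlog entry
// that belongs to one segment flush. The trailing slash keeps segment 2
// from also matching segment 20.
func ddlSegmentPrefix(prefix string, collectionID, segmentID int64) string {
	return path.Join(prefix,
		strconv.FormatInt(collectionID, 10),
		strconv.FormatInt(segmentID, 10)) + "/"
}
```

With the segment ID in the key, all DDL binlog entries written by one segment flush share a common prefix, so a prefix
query on the key-value store should be enough to find them.
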
```
message SaveBinlogPathsRequest {
    common.MsgBase base = 1;
    int64 segmentID = 2;
    int64 collectionID = 3;
    ID2PathList field2BinlogPaths = 4;
    repeated DDLBinlogMeta = 5;
    repeated internal.MsgPosition start_positions = 7;
    repeated internal.MsgPosition end_positions = 8;
}
```

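The repeated `DDLBinlogMeta` entries in the request pair one ddl binlog path with one ts binlog path, as the
`DDLBinlogMeta` message in this commit defines. A hedged sketch of how such pairs might be assembled before the request
is sent follows; the Go struct and `pairDDLBinlogs` are hypothetical stand-ins, not the code generated from the proto
file.

```go
package main

import "errors"

// DDLBinlogMeta is a hand-written stand-in for the proto message of the same
// name; the field names here are assumptions.
type DDLBinlogMeta struct {
	DDLBinlogPath string
	TsBinlogPath  string
}

// pairDDLBinlogs zips the ddl and ts binlog paths of one flush into
// DDLBinlogMeta entries. It assumes the i-th ddl path and the i-th ts path
// were written together and rejects inputs of different length.
func pairDDLBinlogs(ddlPaths, tsPaths []string) ([]DDLBinlogMeta, error) {
	if len(ddlPaths) != len(tsPaths) {
		return nil, errors.New("ddl and ts binlog path counts differ")
	}
	metas := make([]DDLBinlogMeta, 0, len(ddlPaths))
	for i := range ddlPaths {
		metas = append(metas, DDLBinlogMeta{
			DDLBinlogPath: ddlPaths[i],
			TsBinlogPath:  tsPaths[i],
		})
	}
	return metas, nil
}
```

Rejecting mismatched lengths early is a design choice of this sketch: a ddl binlog without its ts binlog could not be
replayed in order, so failing before anything is saved seems the safer default.
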
## TODOs

1. Refactor auto-flush of DDNode
2. Refactor etcd paths

<a name="techterms1">[1]</a>: *[techterms.com](https://techterms.com/definition/ddl#:~:text=Stands%20for%20%22Data%20Definition%20Language,SQL%2C%20the%20Structured%20Query%20Language)*
@ -63,12 +63,10 @@ manual-flush and upload to DataService together.

```proto
rpc SaveBinlogPaths(SaveBinlogPathsRequest) returns (common.Status){}


message ID2PathList {
    int64 ID = 1;
    repeated string Paths = 2;
}

message SaveBinlogPathsRequest {
    common.MsgBase base = 1;
@ -87,20 +85,16 @@ message SaveBinlogPathsRequest {
 The same as DataNode

 ```proto
-message FieldFlushMeta {
-    int64 fieldID = 1;
-    repeated string binlog_paths = 2;
+// key: ${prefix}/${segmentID}/${fieldID}/${idx}
+message SegmentFieldBinlogMeta {
+    int64 fieldID = 1;
+    string binlog_path = 2;
 }

-message SegmentFlushMeta{
-    int64 segmentID = 1;
-    bool is_flushed = 2;
-    repeated FieldFlushMeta fields = 5;
-}
-
-message DDLFlushMeta {
-    int64 collectionID = 1;
-    repeated string binlog_paths = 2;
+// key: ${prefix}/${collectionID}/${idx}
+message DDLBinlogMeta {
+    string ddl_binlog_path = 1;
+    string ts_binlog_path = 2;
 }
 ```
