milvus/docs/design_docs/milvus_create_collection_en.md
neza2017 a0b39cc08a
update design detail of "create collection" (#6795)
Signed-off-by: yefu.chen <yefu.chen@zilliz.com>
2021-08-11 11:49:15 +08:00

5.0 KiB

Create Collections

Milvus 2.0 use Collection to represent a set of data, like Table in traditional database. Users can create or drop Collection. Altering the Schema of Collection is not supported yet. This article introduces the execution path of CreateCollection, at the end of this article, you should know which components are involved in CreateCollection.

The execution flow of CreateCollection is shown in the following figure:

create_collection

  1. Firstly, SDK starts a CreateCollection request to Proxy via Grpc, the proto is defined as follows:
service MilvusService {
    ...

    rpc CreateCollection(CreateCollectionRequest) returns (common.Status) {}

    ...
}

message CreateCollectionRequest {
  common.MsgBase base = 1; // must
  string db_name = 2;
  string collection_name = 3; // must
  // `schema` is the serialized `schema.CollectionSchema`
  bytes schema = 4; // must
  int32 shards_num = 5; // must. Once set, no modification is allowed
}

message CollectionSchema {
  string name = 1;
  string description = 2;
  bool autoID = 3; // deprecated later, keep compatible with c++ part now
  repeated FieldSchema fields = 4;
}

  1. When received the CreateCollection request, the Proxy would wraps this request into CreateCollectionTask, and pushs this task into DdTaskQueue queue. After that, Proxy would call method of WatiToFinish to wait until the task finished.
type task interface {
	TraceCtx() context.Context
	ID() UniqueID // return ReqID
	SetID(uid UniqueID) // set ReqID
	Name() string
	Type() commonpb.MsgType
	BeginTs() Timestamp
	EndTs() Timestamp
	SetTs(ts Timestamp)
	OnEnqueue() error
	PreExecute(ctx context.Context) error
	Execute(ctx context.Context) error
	PostExecute(ctx context.Context) error
	WaitToFinish() error
	Notify(err error)
Notify(err error)

type CreateCollectionTask struct {
	Condition
	*milvuspb.CreateCollectionRequest
	ctx context.Context
	rootCoord types.RootCoord
	dataCoordClient types.DataCoord
	result *commonpb.Status
	schema *schemapb.CollectionSchema
}
  1. There is a backgroud service in Proxy, this service would get the CreateCollectionTask from DdTaskQueue, and executes it in three phases.

    • PreExecute, do some static checking at this phase, such as check if Collection Name and Field Name is legal, if there are duplicate columns, etc.
    • Execute, at thie phase, Proxy would send CreateCollection request to RootCoord via Grpc,and wait the reponse, the proto is defined as follow:
    service RootCoord {
        ...
    
        rpc CreateCollection(milvus.CreateCollectionRequest) returns (common.Status){}
    
        ...
    }
    
    • PostExecute, CreateCollectonTask does nothing at this phase, and return directly.
  2. RootCoord would wraps the CreateCollection request into CreateCollectionReqTask, and then call function executeTask. executeTask would return until the context is done or CreateCollectionReqTask.Execute returned.

type reqTask interface {
	Ctx() context.Context
	Type() commonpb.MsgType
	Execute(ctx context.Context) error
	Core() *Core
Core() *Core }

type CreateCollectionReqTask struct {
	baseReqTask
	Req *milvuspb.CreateCollectionRequest
}
  1. CreateCollectionReqTask.Execute would alloc CollecitonID and default PartitionID, and set Virtual Channel and Physical Channel, which are used by MsgStream, then write the Collection's meta into metaTable

  2. After Collection's meta writing into metaTable, Milvus would consider this collection has been created successfully.

  3. RootCoord would alloc a timestamp from TSO before writing Collection's meta into metaTable, and this timestamp is considered as the point when the collection was created

  4. At last RoooCoord will send a message of CreateCollectionRequest into MsgStream, and other components, who has subscribe to the MsgStream, would be notified. The Proto of CreateCollectionRequest is defined as follow:

message CreateCollectionRequest {
  common.MsgBase base = 1;
  string db_name = 2;
  string collectionName = 3;
  string partitionName = 4;
  int64 dbID = 5;
  int64 collectionID = 6;
  int64 partitionID = 7;
  // `schema` is the serialized `schema.CollectionSchema`
  bytes schema = 8;
  repeated string virtualChannelNames = 9;
  repeated string physicalChannelNames = 10;
}

  1. After all these operations, RootCoord would update internal timestamp and return, so the Proxy would get the reponse.

Notes:

  1. In the Proxy, all DDL requests will be wraped into task, and push the task into DdTaskQueue, the backgroud service will read a new task from DdTaskQueue only when the previous one is finished. So all the DDL requests are executed serially on the Proxy

  2. In the RootCoord, all DDL requests will be wrapped into reqTask, but there is no task queue, so the DDL requests will be executed in paralledl on RootCoord.