Signed-off-by: ruiyi.jiang <ruiyi.jiang@zilliz.com>
5.1 KiB
Create Collections
Milvus 2.0
uses Collection
to represent a set of data, like Table
in a traditional database. Users can create or drop Collection
. Altering the Schema
of Collection
is not supported yet. This article introduces the execution path of CreateCollection
, at the end of this article, you should know which components are involved in CreateCollection
.
The execution flow of CreateCollection
is shown in the following figure:
- Firstly,
SDK
starts aCreateCollection
request toProxy
viaGrpc
, theproto
is defined as follows:
service MilvusService {
...
rpc CreateCollection(CreateCollectionRequest) returns (common.Status) {}
...
}
message CreateCollectionRequest {
// Not useful for now
common.MsgBase base = 1;
// Not useful for now
string db_name = 2;
// The unique collection name in milvus.(Required)
string collection_name = 3;
// The serialized `schema.CollectionSchema`(Required)
bytes schema = 4;
// Once set, no modification is allowed (Optional)
// https://github.com/milvus-io/milvus/issues/6690
int32 shards_num = 5;
}
message CollectionSchema {
string name = 1;
string description = 2;
bool autoID = 3; // deprecated later, keep compatible with c++ part now
repeated FieldSchema fields = 4;
}
- When received the
CreateCollection
request, theProxy
would wrap this request intoCreateCollectionTask
, and pushes this task intoDdTaskQueue
queue. After that,Proxy
would callWaitToFinish
method to wait until the task is finished.
type task interface {
TraceCtx() context.Context
ID() UniqueID // return ReqID
SetID(uid UniqueID) // set ReqID
Name() string
Type() commonpb.MsgType
BeginTs() Timestamp
EndTs() Timestamp
SetTs(ts Timestamp)
OnEnqueue() error
PreExecute(ctx context.Context) error
Execute(ctx context.Context) error
PostExecute(ctx context.Context) error
WaitToFinish() error
Notify(err error)
Notify(err error)
type createCollectionTask struct {
Condition
*milvuspb.CreateCollectionRequest
ctx context.Context
rootCoord types.RootCoord
result *commonpb.Status
schema *schemapb.CollectionSchema
}
-
There is a background service in
Proxy
, this service would get theCreateCollectionTask
fromDdTaskQueue
, and execute it in three phases.PreExecute
, do some static checking at this phase, such as check ifCollection Name
andField Name
are legal, if there are duplicate columns, etc.Execute
, at this phase,Proxy
would sendCreateCollection
request toRootCoord
viaGrpc
, and wait for response, theproto
is defined as follows:
service RootCoord { ... rpc CreateCollection(milvus.CreateCollectionRequest) returns (common.Status){} ... }
PostExecute
,CreateCollectonTask
does nothing at this phase, and return directly.
-
RootCoord
would wrap theCreateCollection
request intoCreateCollectionReqTask
, and then call functionexecuteTask
.executeTask
would return until thecontext
is done orCreateCollectionReqTask.Execute
is returned.
type reqTask interface {
Ctx() context.Context
Type() commonpb.MsgType
Execute(ctx context.Context) error
Core() *Core
Core() *Core }
type CreateCollectionReqTask struct {
baseReqTask
Req *milvuspb.CreateCollectionRequest
}
-
CreateCollectionReqTask.Execute
would allocCollecitonID
and defaultPartitionID
, and setVirtual Channel
andPhysical Channel
, which are used byMsgStream
, then write theCollection
's meta intometaTable
-
After
Collection
's meta writing intometaTable
,Milvus
would consider this collection has been created successfully. -
RootCoord
would alloc a timestamp fromTSO
before writingCollection
's meta intometaTable
, and this timestamp is considered as the point when the collection was created -
At last
RootCoord
will send a message ofCreateCollectionRequest
intoMsgStream
, and other components, who have subscribed to theMsgStream
, would be notified. TheProto
ofCreateCollectionRequest
is defined as follow:
message CreateCollectionRequest {
common.MsgBase base = 1;
string db_name = 2;
string collectionName = 3;
string partitionName = 4;
int64 dbID = 5;
int64 collectionID = 6;
int64 partitionID = 7;
// `schema` is the serialized `schema.CollectionSchema`
bytes schema = 8;
repeated string virtualChannelNames = 9;
repeated string physicalChannelNames = 10;
}
- After all these operations,
RootCoord
would update the internal timestamp and return, so theProxy
would get the response.
Notes:
-
In the
Proxy
, allDDL
requests will be wrapped intotask
, and push thetask
intoDdTaskQueue
, the background service will read a newtask
fromDdTaskQueue
only when the previous one is finished. So all theDDL
requests are executed serially on theProxy
-
In the
RootCoord
, allDDL
requests will be wrapped intoreqTask
, but there is no task queue, so theDDL
requests will be executed in parallel onRootCoord
.