mirror of
https://gitee.com/milvus-io/milvus.git
synced 2024-12-04 04:49:08 +08:00
b1e95ccf28
design doc about master service recover on power failure Signed-off-by: yefu.chen <yefu.chen@zilliz.com>
8.2 KiB
8.2 KiB
Master service recovery on power failure
1. Basic idea
master service
reads meta from etcd when it startsmaster service
needs to store theposition
of the msgstream into etcd every time it consumes the msgstream.master service
reads theposition
of msgstream from etcd when it starts up, then seek to the specifiedposition
and re-consume the msgstream- Ensure that all messages from the msgstream are processed in idempotent fashion, so that repeated consumption of the same message does not cause system inconsistencies
master service
registers itself in etcd and finds out if the dependentdata service
andindex service
are online via etcd
2. Specific tasks
2.1 Read meta from etcd
master service
needs to load meta from etcd when it starts, this part is already done
2.2 dd requests
from grpc
- The
dd requests
, such as create_collection, create_partition, etc., from grpc are marked as done only if the related meata have been writen into etcd. - The
dd requests
should be send todd msgstream
when the operation is done. - There may be a fault here, that is, the
dd request
has been written to etcd, but it has not been sent todd msgstream
yet, then themaster service
has crashed. - For the scenarios mentioned in item 3,
master service
needs to check if alldd requests
are sent todd msgstream
when it starts up. master service
's built-in scheduler ensures that all grpc requests are executed serially, so it only needs to check whether the most recentdd requests
are sent to thedd msgstream
, and resend them if not.- Take
create_collection
as an example to illustrate the process- When
create collection
is written to etcd, 2 additional keys are updated,dd_msg
anddd_type
dd_msg
is the serialization of thedd_msg
dd_type
is the message type ofdd_msg
, such ascreate_collection
,create_partition
,drop_collection,
etc. It's used to deserializesdd_msg
.- Update the meta of
create_collection
,dd_msg
anddd_type
at the same time in a transactional manner. - When
dd_msg
has been sent todd msgstream
, deletedd_msg
anddd_type
from etcd. - When the
master service
starts, first check whether there aredd_msg
anddd_type
in etcd, if so, then deserializedd_msg
according todd_type
, and then send it to thedd msgstream
, otherwise no processing will be done - There may be a failure here, that is,
dd_msg
has been sent to thedd msgstream
, but has not been deleted from etcd yet, then themaster service
crashed, at this case, thedd_msg
would be sent todd msgstream
repeatedly, so the receiver needs to count this case.
- When
2.3 create index
requests from grpc
- In the processing of
create index
,master service
callsmetaTable
'sGetNotIndexedSegments
to get all segment ids that are not indexed - After getting the segment ids,
master service
callindex service
create the index on these segment ids. - In the current implementation, the
create index
requests will return after the segment ids are put into a go channel. - The
master service
starts a background task that keeps reading the segment ids from the go channel, and then calls theindex service
to create the index. - There is a fault here, the segment ids have been put into the go channel in the processing function of the grpc request, and then the grpc returns, but the
master service
's background task has not yet read them from the go channel, thebmaster service
crashes. At this time, the client thinks that the index is created, but themaster service
does not callindex service
to create the index. - The solution for the fault mentioned in item 5
- Remove the go channel and
master service
's background task - In the request processing function of
create index
, the call will return only when all segment ids have been sendindex service
- Some segment ids may be send to
index service
repeatedly, andindex service
needs to handle such requests
- Remove the go channel and
2.4 New segment from data service
- Each time a new segment is created, the
data service
sends the segment id to themaster service
via msgstream master service
needs to update the segment id to the collection meta and record the position of the msgstream in etcd- Step 2 is transactional and the operation will be successful only if the collection meta in etcd is updated
- So the
master service
only needs to restore the msgstream to the position when recovering from a power failure
2.5 Flushed segment from data node
- Each time the
data node
finishes flushing a segment, it sends the segment id to themaster service
via msgstream. master service
needs to fetch binlog fromdata service
by id and send request toindex service
to create index on this segment- When the
index service
is called successfully, it will return a build id, and thenmaster service
will update the build id to thecollection meta
and record the position of the msgstream in etcd. - Step 3 is transactional and the operation will be successful only if the
collection meta
in etcd is updated - So the
master service
only needs to restore the msgstream to the position when recovering from a power failure
2.6 Failed to call external grpc service
master service
needs grpc service fromdata service
andindex service
, if the grpc call failed, it needs to reconnect.master service
does not listen to the status of thedata service
andindex service
in real time
2.7 Add virtual channel assignment when creating collection
- Add a new field, "number of shards" in the
create collection
request, the "num of shards" tell themaster service
to create the number of virtual channel for this collection. - In the current implementation, virtual channels and physical channels have a one-to-one relationship, and the total number of physical channels increases as the number of virtual channels increases; later, the total number of physical channels needs to be fixed, and multiple virtual channels share one physical channel
- The name of the virtual channel is globally unique, and the
collection meta
records the correspondence between the virtual channel and the physical channel
Add processing of time synchronization signals from proxy node
- A virtual channel can be inserted by multiple proxies, so the timestamp in the virtual channel is not increase monotonically
- All proxies report the timestamp of all the virtual channels to the
master service
periodically - The
master service
collects the timestamps from the proxies on each virtual channel and gets the minimum one as the timestamp of that virtual channel, and then inserts the timestamp into the virtual channel - The proxy reports the timestamp to the
master service
via grpc - The proxy needs to register itself in etcd when it starts,
master service
will listen to the corresponding key to determine how many active proxies there are, and thus determine if all of them have sent timestamps to master - If a proxy is not registered in etcd but sends a timestamp or any other grpc request to master, master will ignore the grpc request
2.9 Register service in etcd
master service
needs to register itself with etcd when it starts- The registration should include ip address, port, its own id, global incremental timestamp
2.10 Remove the code related to proxy service
- The
proxy service
related code will be removed - The the job of time synchronization which done by the
proxy service
is partially simplified and handed over to the master (subsection 2.8)
2.11 Query collection meta based on timeline
- Add a new field of
timestamp
to the grpc request ofdescribe collection
master service
should provide snapshot on thecollection mate
- Return the
collection meta
at the point of timestamp mentioned in the request
2.12 Timestamp of dd operations
master service
response to set the timestamp ofdd operations
, create collection, create partition, drop collection, drop partitionmaster service
response to send timestamp todd msgstream
, if there is a dd message, then use the current latest timestamp from that message, if not, get a timestamp from tso