# Timesync -- All the Things You Should Know

`Time Synchronization` is a core part of Milvus 2.0; it affects every component of the system. This document describes its design in detail.

There are 2 kinds of events in Milvus 2.0:

- DDL events
  - create collection
  - drop collection
  - create partition
  - drop partition
- DML events
  - insert
  - search
  - etc

Every event carries a `Timestamp` that indicates when the event occurred.

Suppose there are two users, `u1` and `u2`. They connect to Milvus and do the following operations at the respective timestamps.

| ts  | u1                   | u2           |
| --- | -------------------- | ------------ |
| t0  | create Collection C0 | -            |
| t2  | -                    | search on C0 |
| t5  | insert A1 into C0    | -            |
| t7  | -                    | search on C0 |
| t10 | insert A2 into C0    | -            |
| t12 | -                    | search on C0 |
| t15 | delete A1 from C0    | -            |
| t17 | -                    | search on C0 |

Ideally, `u2` should find `C0` empty at `t2`, see only `A1` at `t7`, see both `A1` and `A2` at `t12`, and see only `A2` at `t17`.

This is easy to achieve in a `single-node` database. In a `Distributed System` such as `Milvus` it is harder, and the following problems need to be solved:

1. Unsynchronized clocks. If `u1` and `u2` are on different nodes, their local clocks may not be synchronized. To give an extreme example, suppose the clock of `u2` is 24 hours behind the clock of `u1`; then none of the operations of `u1` can be seen by `u2` until the next day.
2. Network latency. If `u2` starts `search on C0` at `t17`, how can it be guaranteed that all the `events` before `t17` have been processed? If the `delete A1 from C0` event is delayed by network latency, `u2` observes an incorrect state: it sees both `A1` and `A2` at `t17`.

The `Time Synchronization` system is designed to solve these two problems.

## Timestamp Oracle (TSO)

Like [TiKV](https://github.com/tikv/tikv), Milvus 2.0 provides a `TSO` service. Every event must allocate its timestamp from `TSO` rather than from a local clock, which solves the first problem.

`TSO` is provided by the `RootCoord` component. A client can allocate one or more timestamps in a single request; the `proto` is defined as follows.

```proto
service RootCoord {
    ...
    rpc AllocTimestamp(AllocTimestampRequest) returns (AllocTimestampResponse) {}
    ...
}

message AllocTimestampRequest {
    common.MsgBase base = 1;
    uint32 count = 3;
}

message AllocTimestampResponse {
    common.Status status = 1;
    uint64 timestamp = 2;
    uint32 count = 3;
}
```

`Timestamp` is of type `uint64` and consists of a physical part and a logical part. Its format is shown below:

![Timestamp struct](./graphs/time_stamp_struct.jpg)

In an `AllocTimestamp` request, if `AllocTimestampRequest.count` is greater than `1`, `AllocTimestampResponse.timestamp` indicates the first available timestamp in the response.
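
The layout and the batch allocation can be illustrated with a minimal Go sketch. It rests on two assumptions that this document does not state: the logical part occupies the low 18 bits (the figure above is the authoritative layout), and a response with `count` greater than `1` covers the contiguous range starting at `timestamp`. The names `ComposeTS`, `ParseTS`, and `ExpandRange` are hypothetical helpers chosen for illustration.

```go
package main

import "fmt"

// Assumption of this sketch: the low 18 bits hold the logical counter and
// the remaining high bits hold the physical time in milliseconds.
const logicalBits = 18
const logicalMask = (1 << logicalBits) - 1

// ComposeTS packs a physical time and a logical counter into one uint64.
func ComposeTS(physicalMs, logical uint64) uint64 {
	return physicalMs<<logicalBits | (logical & logicalMask)
}

// ParseTS splits a timestamp back into its physical and logical parts.
func ParseTS(ts uint64) (physicalMs, logical uint64) {
	return ts >> logicalBits, ts & logicalMask
}

// ExpandRange lists the timestamps covered by a response whose timestamp
// field is first and whose count field is count (assumed contiguous).
func ExpandRange(first uint64, count uint32) []uint64 {
	out := make([]uint64, 0, count)
	for i := uint32(0); i < count; i++ {
		out = append(out, first+uint64(i))
	}
	return out
}

func main() {
	first := ComposeTS(1628653728000, 5)
	for _, ts := range ExpandRange(first, 3) {
		p, l := ParseTS(ts)
		fmt.Printf("ts=%d physical=%d logical=%d\n", ts, p, l)
	}
}
```

Because the logical counter sits in the low bits, comparing two timestamps as plain integers orders them first by physical time and then by logical counter.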

## Time Synchronization

To understand `Time Synchronization` better, let's briefly look at how Milvus 2.0 handles data operations, taking the `Insert` operation as an example.

- In `Milvus 2.0`, users can deploy multiple `Proxy` nodes to achieve load balancing
- Users can use the `SDK` to connect to any `Proxy`
- When a `Proxy` receives an `Insert` request from the `SDK`, it splits the `InsertMsg` across different `MsgStream`s according to the hash value of the `Primary Key`
- Each `InsertMsg` is assigned a `Timestamp` before it is sent to the `MsgStream`

>*Note: `MsgStream` is a wrapper around the message queue; the default message queue in `Milvus 2.0` is `Pulsar`*

![proxy insert](./graphs/timesync_proxy_insert_msg.png)
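
The routing step described above can be sketched in Go as follows. This is only an illustration: the `Row` type, the function names, and the choice of FNV-1a as the hash are assumptions made for the example, not the actual `Proxy` implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Row is a hypothetical stand-in for one entity carried by an InsertMsg.
type Row struct {
	PrimaryKey string
	Timestamp  uint64
}

// channelFor maps a primary key to a MsgStream index.
// FNV-1a is an assumption of this sketch, not necessarily the hash Milvus uses.
func channelFor(pk string, numChannels int) int {
	h := fnv.New32a()
	h.Write([]byte(pk)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numChannels))
}

// splitInsert stamps every row with ts and groups the rows by channel.
func splitInsert(pks []string, ts uint64, numChannels int) map[int][]Row {
	out := make(map[int][]Row)
	for _, pk := range pks {
		ch := channelFor(pk, numChannels)
		out[ch] = append(out[ch], Row{PrimaryKey: pk, Timestamp: ts})
	}
	return out
}

func main() {
	batches := splitInsert([]string{"a1", "a2", "a3", "a4"}, 42, 2)
	for ch, rows := range batches {
		fmt.Printf("channel %d gets %d row(s)\n", ch, len(rows))
	}
}
```

Any stable hash works here: the property that matters is that the same primary key always lands on the same `MsgStream`.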

Based on the above information, we know that the `MsgStream` has the following characteristics:

- In a `MsgStream`, `InsertMsg` from the same `Proxy` always have increasing timestamps
- In a `MsgStream`, there is no ordering guarantee between the timestamps of `InsertMsg` from different `Proxy` nodes

The following figure shows an example of `InsertMsg` in a `MsgStream`. The snippet contains five `InsertMsg`: three from `Proxy1` and two from `Proxy2`.

The three `InsertMsg` from `Proxy1` have increasing timestamps, and so do the two from `Proxy2`, but there is no ordering relationship between the timestamps of `Proxy1` and those of `Proxy2`.

![msgstream](./graphs/timesync_msgstream.png)

So the second problem becomes this: after reading a message from a `MsgStream`, how can we be sure that all messages with smaller timestamps have already been consumed?

For example, suppose a consumer reads a message with timestamp `110` produced by `Proxy2` while a message with timestamp `80` produced by `Proxy1` is still in the `MsgStream`. How can this situation be handled?

The figure below shows the core logic of the `Time Synchronization` system in `Milvus 2.0`, which solves the second problem.

- Each `Proxy` periodically reports its latest timestamp on every `MsgStream` to `RootCoord`; the default interval is `200ms`
- For each `MsgStream`, `RootCoord` finds the minimum timestamp among all `Proxy` nodes on this `MsgStream` and inserts this minimum timestamp into the `MsgStream`
- When a consumer reads a timestamp inserted by `RootCoord` from the `MsgStream`, it knows that all messages with smaller timestamps have been consumed, so every action that depends on this timestamp can be executed safely
- The message inserted by `RootCoord` into the `MsgStream` is of type `TimeTick`

![upload time tick](./graphs/timesync_proxy_upload_time_tick.png)
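
The minimum-timestamp step can be sketched in Go as below. `TimeTickTracker` and its methods are hypothetical names for this example, not the real `RootCoord` types. The sketch also assumes that every `Proxy` writing to a channel has reported at least once before the minimum is taken.

```go
package main

import "fmt"

// TimeTickTracker is a hypothetical helper, not RootCoord's real type.
// It remembers the latest timestamp each Proxy reported for each channel.
type TimeTickTracker struct {
	latest map[string]map[int64]uint64 // channel -> proxy ID -> timestamp
}

func NewTimeTickTracker() *TimeTickTracker {
	return &TimeTickTracker{latest: make(map[string]map[int64]uint64)}
}

// Report stores one periodic report from a Proxy. The two slices are
// parallel: timestamps[i] belongs to channelNames[i].
func (t *TimeTickTracker) Report(proxyID int64, channelNames []string, timestamps []uint64) {
	for i, ch := range channelNames {
		if i >= len(timestamps) {
			break // ignore malformed reports in this sketch
		}
		if t.latest[ch] == nil {
			t.latest[ch] = make(map[int64]uint64)
		}
		t.latest[ch][proxyID] = timestamps[i]
	}
}

// MinTimestamp returns the smallest reported timestamp on a channel.
// The boolean is false when no Proxy has reported on that channel yet.
func (t *TimeTickTracker) MinTimestamp(channel string) (uint64, bool) {
	found := false
	var lowest uint64
	for _, ts := range t.latest[channel] {
		if !found || ts < lowest {
			lowest, found = ts, true
		}
	}
	return lowest, found
}

func main() {
	tracker := NewTimeTickTracker()
	tracker.Report(1, []string{"ch-0", "ch-1"}, []uint64{100, 95})
	tracker.Report(2, []string{"ch-0"}, []uint64{120})
	if ts, ok := tracker.MinTimestamp("ch-0"); ok {
		fmt.Println("TimeTick for ch-0:", ts)
	}
}
```

Taking the minimum rather than the maximum is what makes the tick safe: no `Proxy` can still publish a message with a smaller timestamp on that channel.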

This is the `proto` that a `Proxy` uses to report timestamps to `RootCoord`:

```proto
service RootCoord {
    ...
    rpc UpdateChannelTimeTick(internal.ChannelTimeTickMsg) returns (common.Status) {}
    ...
}

message ChannelTimeTickMsg {
    common.MsgBase base = 1;
    repeated string channelNames = 2;
    repeated uint64 timestamps = 3;
    uint64 default_timestamp = 4;
}
```

After the `TimeTick` messages are inserted, the `MsgStream` should look like this:

![msgstream time tick](./graphs/timesync_msgtream_timetick.png)

`MsgStream` processes messages in batches delimited by `TimeTick` and ensures that the messages it outputs satisfy the timestamp ordering requirement. For more details, refer to the `MsgStream` design document.
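
A consumer-side view of this batching can be sketched in Go. The `Msg` and `TickBuffer` types are hypothetical, and sorting each released batch by timestamp is an assumption of the example; the actual behavior is specified in the `MsgStream` design document.

```go
package main

import (
	"fmt"
	"sort"
)

// Msg is a hypothetical message: either an insert or a TimeTick marker.
type Msg struct {
	Timestamp  uint64
	IsTimeTick bool
	Payload    string
}

// TickBuffer holds messages back until a TimeTick proves they are safe.
type TickBuffer struct {
	pending []Msg
}

// Consume feeds one message from the stream. Ordinary messages are buffered
// and nil is returned. A TimeTick releases every buffered message whose
// timestamp is not greater than the tick, sorted by timestamp.
func (b *TickBuffer) Consume(m Msg) []Msg {
	if !m.IsTimeTick {
		b.pending = append(b.pending, m)
		return nil
	}
	var ready, rest []Msg
	for _, p := range b.pending {
		if p.Timestamp <= m.Timestamp {
			ready = append(ready, p)
		} else {
			rest = append(rest, p)
		}
	}
	b.pending = rest
	sort.SliceStable(ready, func(i, j int) bool {
		return ready[i].Timestamp < ready[j].Timestamp
	})
	return ready
}

func main() {
	buf := &TickBuffer{}
	stream := []Msg{
		{Timestamp: 110, Payload: "from Proxy2"},
		{Timestamp: 80, Payload: "from Proxy1"},
		{Timestamp: 100, IsTimeTick: true},
		{Timestamp: 120, Payload: "from Proxy1"},
		{Timestamp: 130, IsTimeTick: true},
	}
	for _, m := range stream {
		for _, out := range buf.Consume(m) {
			fmt.Println("deliver", out.Timestamp, out.Payload)
		}
	}
}
```

In this run the message with timestamp `110` is read first but is held back until the tick at `130`, while the message with timestamp `80` is delivered at the tick at `100`.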