issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.
milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.
Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #35846
querycoord will notify proxy to update shard leader cache after
delegator location changes, but during querynode's failure recovery,
some delegator may become unserviceable due to lacking of segments, and
back to serviceable after segment loaded, so we also need to notify
proxy to invalidate shard leader cache when delegator serviceable state
changes.
This PR will maintain querynode's serviceable state during heartbeat,
and notify proxy to invalidate shard leader cache if serviceable state
changes.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
See also #34746
This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33103
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.
after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32466
this PR enhance that when shard location changed, update proxy's shard
leader cache. in case of query node failover case, proxy can find
replica recover
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30647
- Add declarative resource group api
- Add config for resource group management
- Resource group recovery enhancement
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #30647
- ReplicaManager manage read only node now, and always do persistent of
node distribution of replica.
- All segment/channel checker using ReplicaManager to get read-only node
or read-write node, but not ResourceManager.
- ReplicaManager promise that only apply unique querynode to one replica
in same collection now (replicas in same collection never hold same
querynode at same time).
- ReplicaManager promise that fairly node count assignment policy if
multi replicas of collection is assigned to one resource group.
- Move some parameters check into ReplicaManager to avoid data race.
- Allow transfer replica to resource group that already load replica of
same collection
- Allow transfer node between resource groups that load replica of same
collection
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #29261
This PR Add restful api for devops to execute rolling upgrade, including
suspend/resume balance and manual transfer segments/channels.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also: #29650
Either segment dml position & channel checkpoint could be newer in some
cases. This PR make PackLoadSegments use the newer one improving load
performance during cases where there are lots of upsert.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29453
sync distribution by rpc will also call loadSegment/releaseSegment,
which may cause all kinds of concurrent case on same segment, such as
concurrent load and release on one segment.
This PR add leader_checker which generate load/release task to correct
the leader view, instead of calling sync distribution by rpc
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
we consume the delta data from the lastest channel checkpoint while
loading segment,
this works well without level 0 segments, but now it may lead to miss
some delta data,
so we have to consume from the current target's channel checkpoint
related: #27349
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>