issue: #34095
When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.
This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
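
The idea can be sketched as follows. This is a minimal illustration, not the actual querycoord code: `nodeLoad`, `effectiveLoad`, and `pickNode` are hypothetical names, and the workload is assumed to be a simple segment count.

```go
package main

import "fmt"

// nodeLoad is a hypothetical view of one query node as seen by the coordinator.
type nodeLoad struct {
	loadedSegments    int // segments already reported in the node's distribution
	executingSegments int // segments targeted by tasks that are still executing
}

// effectiveLoad adds in-flight tasks to the reported distribution, so a node
// that just came online does not look empty while load tasks are running.
func effectiveLoad(n nodeLoad) int {
	return n.loadedSegments + n.executingSegments
}

// pickNode returns the ID of the node with the smallest effective load,
// or -1 if there are no nodes. Ties go to the lower node ID.
func pickNode(nodes map[int64]nodeLoad) int64 {
	best := int64(-1)
	bestLoad := 0
	for id, n := range nodes {
		load := effectiveLoad(n)
		if best == -1 || load < bestLoad || (load == bestLoad && id < best) {
			best, bestLoad = id, load
		}
	}
	return best
}

func main() {
	nodes := map[int64]nodeLoad{
		1: {loadedSegments: 10},
		// New node: distribution not updated yet, but 12 loads are in flight.
		2: {loadedSegments: 0, executingSegments: 12},
	}
	fmt.Println(pickNode(nodes))
}
```

Without the `executingSegments` term, node 2 would look empty and keep receiving assignments.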
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
When querycoord processes a segment task, it iterates over the whole
segment list to check whether the segment is loaded, which costs too much
CPU when there are thousands of segments.
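
One common way to avoid such a scan is to index loaded segments by key. The sketch below assumes a (node ID, segment ID) key; `segmentKey` and `segmentIndex` are hypothetical names and may not match what the change actually does.

```go
package main

import "fmt"

// segmentKey identifies a segment replica on a specific node (hypothetical).
type segmentKey struct {
	nodeID    int64
	segmentID int64
}

// segmentIndex answers "is this segment loaded on this node?" with a single
// map lookup instead of a scan over the full segment list.
type segmentIndex struct {
	loaded map[segmentKey]struct{}
}

func newSegmentIndex() *segmentIndex {
	return &segmentIndex{loaded: make(map[segmentKey]struct{})}
}

// add records that the segment is loaded on the node.
func (s *segmentIndex) add(nodeID, segmentID int64) {
	s.loaded[segmentKey{nodeID: nodeID, segmentID: segmentID}] = struct{}{}
}

// isLoaded reports whether the segment is recorded as loaded on the node.
func (s *segmentIndex) isLoaded(nodeID, segmentID int64) bool {
	_, ok := s.loaded[segmentKey{nodeID: nodeID, segmentID: segmentID}]
	return ok
}

func main() {
	idx := newSegmentIndex()
	idx.add(1, 100)
	fmt.Println(idx.isLoaded(1, 100), idx.isLoaded(2, 100))
}
```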
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31091
This PR adds a GetByFilter interface to the leader view manager,
replacing the assorted get functions.
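
A filter-based getter might look like the sketch below. Only the name GetByFilter comes from this change; the `leaderView` fields, the filter constructors, and the function signature are assumptions made for illustration.

```go
package main

import "fmt"

// leaderView is a simplified, hypothetical stand-in for a delegator's view.
type leaderView struct {
	nodeID       int64
	collectionID int64
	channel      string
}

// leaderViewFilter reports whether a view should be included in the result.
type leaderViewFilter func(v *leaderView) bool

func withNodeID(id int64) leaderViewFilter {
	return func(v *leaderView) bool { return v.nodeID == id }
}

func withCollectionID(id int64) leaderViewFilter {
	return func(v *leaderView) bool { return v.collectionID == id }
}

func withChannel(name string) leaderViewFilter {
	return func(v *leaderView) bool { return v.channel == name }
}

// getByFilter returns the views that satisfy every filter. With no filters
// it returns all views, so one entry point covers the old getter variants.
func getByFilter(views []*leaderView, filters ...leaderViewFilter) []*leaderView {
	var out []*leaderView
	for _, v := range views {
		match := true
		for _, f := range filters {
			if !f(v) {
				match = false
				break
			}
		}
		if match {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	views := []*leaderView{
		{nodeID: 1, collectionID: 10, channel: "ch-a"},
		{nodeID: 1, collectionID: 20, channel: "ch-b"},
		{nodeID: 2, collectionID: 10, channel: "ch-c"},
	}
	for _, v := range getByFilter(views, withNodeID(1), withCollectionID(10)) {
		fmt.Println(v.channel)
	}
}
```

Composing filters keeps the manager's surface small: a new query shape needs a new filter, not a new getter.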
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30816
Stale-check rules for leader tasks:
1. A reduce leader task should keep executing until the leader's node
goes offline.
2. A grow leader task should keep executing until the leader's node
becomes stopping.
This PR checks the leader node's stopping state for grow leader tasks.
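
The two rules can be sketched as a single predicate. The type and constant names below are hypothetical, and treating an offline leader as stale for grow tasks is an assumption that follows from rule 2 (a node that is gone cannot grow its view).

```go
package main

import "fmt"

// nodeState is a hypothetical enumeration of a query node's lifecycle.
type nodeState int

const (
	nodeOnline nodeState = iota
	nodeStopping
	nodeOffline
)

// leaderAction is a hypothetical enumeration of leader task actions.
type leaderAction int

const (
	actionGrow leaderAction = iota
	actionReduce
)

// isLeaderTaskStale applies the stale rules:
//   - reduce tasks stay valid until the leader's node is offline;
//   - grow tasks stay valid until the leader's node is stopping
//     (offline is assumed to be stale as well).
func isLeaderTaskStale(action leaderAction, leader nodeState) bool {
	switch action {
	case actionReduce:
		return leader == nodeOffline
	case actionGrow:
		return leader == nodeStopping || leader == nodeOffline
	default:
		return true // unknown actions are treated as stale
	}
}

func main() {
	fmt.Println(isLeaderTaskStale(actionGrow, nodeStopping))
	fmt.Println(isLeaderTaskStale(actionReduce, nodeStopping))
}
```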
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #31480 #31481
A duplicate release task for an l0 segment may cause the segment to go
missing if it executes on the old delegator, and may break the new
delegator's leader view if it executes on the new delegator.
This PR makes segment_checker skip releasing duplicate l0 segments,
because l0 segments are released when the channel is unsubscribed.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30816
PR #31319 introduced logic requiring the segment checker to load level
zero segments that exist only in the current target.
This PR fixes the failure to promote a load segment task when the
segment belongs only to the current target.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This PR adds metrics for task latency in the querycoord scheduler, so
that a stuck task of any kind is easy to identify from the metrics.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30186
During channel balance, after the new delegator is loaded, instead of
syncing the l0 segment's location to the new delegator, we should load
the l0 segment on the new delegator, release the old l0 segment, and
then start releasing the old delegator.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30890
When the leader checker finds that the leader view has an older load
version of a segment, it tries to correct the leader view. But the sync
action doesn't specify the latest load version, so the update operation
fails.
This PR fixes the leader checker being unable to update a segment's load
version and repeatedly submitting the same task to the scheduler.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #31103
Since querycoord needs only index meta information from datacoord, the
broker shall use `ListIndexes` to skip the segment index building check
logic in datacoord.
This PR is also related to #30538, in which DescribeIndex caused heavy
memory usage and eventually led to OOM.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #30150
`checkLeaderTaskStale` checks whether the segment exists in the next
target for a leaderTask's grow action, which causes promoting the leader
task to fail when the segment exists only in the current target.
This PR checks the segment against both the current and next targets.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30150
see also: #30258
Because `SyncDataDistribution` tries to load delta for the segment, the
sync action fails if the request is missing index info.
This PR sets the index info when setting a segment in the leader view.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30723
This PR skips generating balance tasks when the collection's target
isn't ready. It also refines the stale-check logic in query coord's
scheduler: if the channel exists in the current or next target, the task
won't be canceled.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #30150
For leader view distribution with offline nodes, a release task can
never be sent to the querynode due to the targetNode online check logic.
Even if the request is dispatched, a normal release task does not have
the "force" flag when calling `delegator.ReleaseSegment`.
This PR adds a new type of querycoord task: LeaderTask, whose
responsibility is to rectify leader view distribution.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #30150
This PR fixes the following problems:
1. The leader checker uses the wrong node ID when generating a release
task, which causes the release task to finish immediately.
2. The release request generated by leader_checker doesn't set the
`force` flag, so the operation to clean the leader view on the delegator
fails.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #29841
If a segment is already loaded, submitting a load segment task for it
isn't permitted, to avoid loading the segment twice. But this logic
blocks the leader checker from correcting the leader view via
`LoadSegment`.
This PR removes the segment loaded check so that the leader checker can
submit load tasks.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Recent changes moved the level 0 segment list to a new proto field, so
QueryCoord can no longer see the level 0 segments. This PR handles the
new field.
fix #29907
Signed-off-by: yah01 <yang.cen@zilliz.com>
See also #29803
This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29453
Syncing distribution by RPC also calls loadSegment/releaseSegment,
which may cause all kinds of concurrency issues on the same segment, such
as a concurrent load and release of one segment.
This PR adds leader_checker, which generates load/release tasks to
correct the leader view instead of calling sync distribution by RPC.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #27675
For `GetRecoveryInfo` & `GetRecoveryInfoV2`, Level zero segment ids
shall be specified in vchan info so that querycoord could re-fetch
current segment info during watch procedure without having all segment
info
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #29575
Add `getCollectionTarget` method which is atomic when scope is
`CurrentTargetFirst` or `NextTargetFirst`
Also return error when executor finds no channel in target manager
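
One way to make such a lookup atomic is to read both targets under a single lock acquisition. The sketch below assumes this approach. Only `getCollectionTarget`, `CurrentTargetFirst`, and `NextTargetFirst` are named in this change; the types, the lowercase scope constants, and the return shape are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// targetScope selects which target(s) to read; the names mirror the scopes
// mentioned in this change but are illustrative only.
type targetScope int

const (
	currentTarget targetScope = iota
	nextTarget
	currentTargetFirst
	nextTargetFirst
)

// target is a simplified stand-in for a collection target.
type target struct{ name string }

type targetManager struct {
	mu      sync.RWMutex
	current map[int64]*target
	next    map[int64]*target
}

// getCollectionTarget reads both targets while holding one read lock, so a
// concurrent target update cannot slip in between the two lookups.
func (m *targetManager) getCollectionTarget(scope targetScope, collectionID int64) []*target {
	m.mu.RLock()
	defer m.mu.RUnlock()

	cur, next := m.current[collectionID], m.next[collectionID]
	var ordered []*target
	switch scope {
	case currentTarget:
		ordered = []*target{cur}
	case nextTarget:
		ordered = []*target{next}
	case currentTargetFirst:
		ordered = []*target{cur, next}
	case nextTargetFirst:
		ordered = []*target{next, cur}
	}

	// Drop missing targets while preserving the requested order.
	out := make([]*target, 0, len(ordered))
	for _, t := range ordered {
		if t != nil {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	m := &targetManager{
		current: map[int64]*target{1: {name: "current"}},
		next:    map[int64]*target{1: {name: "next"}},
	}
	for _, t := range m.getCollectionTarget(nextTargetFirst, 1) {
		fmt.Println(t.name)
	}
}
```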
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
`WatchDmChannel` only needs growing segment info, so this PR removes
fetching segmentInfos when filling the watch DML channel request.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
The executor always fetches the latest segment info, so we can consume
from the latest checkpoint, which saves much time when many entities
have been deleted.
Signed-off-by: yah01 <yang.cen@zilliz.com>
Support enabling/disabling mmap for an index; the user can alter the
index's mode via the `AlterIndex` method.
related: https://github.com/milvus-io/milvus/issues/21866
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: yah01 <yang.cen@zilliz.com>
We consume the delta data from the latest channel checkpoint while
loading a segment.
This works well without level 0 segments, but now it may miss some
delta data,
so we have to consume from the current target's channel checkpoint.
related: #27349
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
issue: #28831
Releasing the old delegator before the new delegator updates its
distribution may cause a `channel not available` error.
This PR blocks releasing the old delegator until the new delegator
finishes `syncDistribution`.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>