Cherry-pick from master
pr: #34763
See also #34746
This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`
---------
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #34798
pr: #34810
after we remove the task priority on query coord, to avoid load/release
segment blocked by too much balance task, we limit the balance task size
in each round. at same time, we reduce the balance interval to trigger
balance more frequently.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34595
pr: #34830
pr#34596 to we add an overloaded factor to segment in delegator, which
cause same segment got different score in delegator and worker. which
may cause segment bounce between delegator and worker.
This PR use average score to compute the delegator overloaded factor, to
avoid segment bounce between delegator and worker.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34781
pr: #34782
when balance segment hasn't finished yet, query coord may found 2 loaded
copy of segment, then it will generate task to deduplicate, which may
cancel the balance task. then the old copy has been released, and the
new copy hasn't be ready yet but canceled, then search failed by segment
lack.
this PR set deduplicate segment task's proirity to low, to avoid balance
segment task canceled by deduplicate task.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34595
pr: #34596
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This PR cherry-picks the following commits:
fix: speed up segment lookup via channel name in datacoord (#33530)
needed by the next commit
feat: Major compaction (#33620)
issue: #30633
pr: #33620
---------
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
Cherry-pick from master
pr: #34282
See also #34234
`LoadPartitions` does not guarantee the current target has loading
partitions if there are some partitions already loaded before.
This PR check current target contains the partition to load when
advancing loading percentage to 100.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
pr: #34280
try to update index for l0 segment, will failed by `index not found`
This PR skip update index for l0 segment
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #34095
pr: #34096
When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.
This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Cherry-pick from master
pr: #34106
See also #34047
When `unassignNode` sync resource group with node removed failed
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
pr: #33052
issue: #30040
This PR introduce two database level props:
1. database.replica.number
2. database.resource_groups
User can set those two database props by AlterDatabase API, then can
load collection without specified replica_num and resource groups. then
it will use database level load param when try to load collections.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
pr: #33943
when querycoord process segment task, it will try to iterate whole
segment list to checke whether segment is loaded, which cost too much
cpu if there has thousands of segments.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33036
pr: #33037
This PR enable to dynamic update balancer policy without restart
querycoord.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33200#33207
pr: #33232
pr#33104 causes the offline node will be kept in resource group after qc
recover, and offline node will be assign to new replica as rwNode, then
request send to those node will fail by NodeNotFound.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33103
pr: #33104
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.
after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
issue: #32910
see also: #32911
when channel exclusive mode is enabled, replica will record channel node
info in meta store, and if the balance policy changes, which means
channel exclusive mode is disabled, we should clean up the channel node
info in meta store, and stop to balance node between channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32206, #32801
- search failure with some assertion, segment not loaded and resource
insufficient.
- segment leak when query segments
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32901
Cause release segment request need be send to delegator, but it need
replica to info find segment's delegator. but the stopping query node
will be marked as read only in replica, then `replica.Contains()` just
return true for rwNode in replica. then it can't get replica info by
stopping query node and release segment will be blocked.
This PR make `replica.Contains()` return true for both roNode and
rwNode.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32165
In old `GetResourceGroupByCollection` implementation, it iterates all
replicas to match collection id, which is slow and CPU time consuming.
This PR make it utilize the coll2Replicas mapping by calling
`GetByCollection` and mapping replicas into resource group.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
Cache channel name to channel info to avoid iteration over channel
results when updating leader view version.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32466
this PR enhance that when shard location changed, update proxy's shard
leader cache. in case of query node failover case, proxy can find
replica recover
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32449
to avoid GetShardLeaders return empty node list, this PR add node list
check in both client side and server side.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related to #32165
1. for all the manager, support collection level index
2. remove collection level filter to avoid extra cpu usage when
collection number increases
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #30647
- Remove error report if there's no query node serve. It's hard for
programer to use it to do resource management.
- Change resource group `transferNode` logic to keep compatible with old
version sdk.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
See also #32165
Add segment dist and leader view filter criterion struct to store
frequent filter conditions.
Add collection/channel filter results for these two meta
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>