527 Commits

Author SHA1 Message Date
wei liu
0201e00a2f
enhance: Enable setting load config at cluster level (#35293)
issue: #35170
pr: #35169
This PR enables setting load configs, such as replicas and resource
groups, at the cluster level. These load configs are then used when
loading collections.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-07 12:38:21 +08:00
wei liu
2ac1bf7532
enhance: Enable setting the replica number and resource group during collection creation (#34403) (#34561)
issue: #30040
pr: #34403

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-06 15:06:17 +08:00
wei liu
d48c690cb3
enhance: Avoid unnecessary syncTargetVersion func call after querycoord recover (#34954) (#35234)
pr: #34954
Before querycoord stops gracefully, it saves the current target to the
meta store and recovers it after querycoord starts up, to speed up
querycoord's recovery. However, the target version was not recovered
as expected: the latest timestamp was used as the current target's
version, which has no effect on querycoord other than causing an
unnecessary syncTargetVersion func call.

This PR recovers the correct target version to avoid the unnecessary
syncTargetVersion func call.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-05 10:18:16 +08:00
Chun Han
58f7c35b75
enhance: add log for partition stats (#30376) (#35220)
related: #30376
pr: https://github.com/milvus-io/milvus/pull/35219

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-08-02 19:34:21 +08:00
wei liu
11578772ef
fix: Set legacy level to l0 segment after qc restart (#35197) (#35211)
issue: #35087
pr: #35197
After qc restarts, if the target is not ready yet and dist_handler tries
to update the segment dist, it sets the legacy level on the L0 segment.
This may cause the L0 segment to be moved to another node, making
search/query fail.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-02 18:22:15 +08:00
cai.zhang
756922ebec
fix: [cherry-pick] Maintain load idempotency even when building new indexes (#35179)
issue: #34404 

master pr: #35178

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-08-02 17:28:15 +08:00
wei liu
5f601fcc50
enhance: Reduce delegator memory overloaded factor to 0.1 (#35092) (#35164)
pr: #35092

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-08-01 14:20:13 +08:00
congqixia
8991dc211e
enhance: [2.4] Fix go&cpp lint issues (#35107)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 20:25:55 +08:00
Jiquan Long
86edca8c1b
fix: support auto index for array (#35095)
/kind branch-feature
pr: #34450

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Co-authored-by: Zhagnlu <lu.zhang@zilliz.com>
2024-07-30 17:57:50 +08:00
congqixia
d16320705e
enhance: [2.4] Add Segment Level in milvus segment info APIs (#34763) (#35023)
Cherry-pick from master
pr: #34763
See also #34746

This PR adds a segment level field to the response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-29 10:11:52 +08:00
wei liu
b3bc7f3985
enhance: Limit collection's normal balance speed (#34810) (#34987)
issue: #34798
pr: #34810

After the task priority was removed from query coord, to avoid
load/release segment tasks being blocked by too many balance tasks, we
limit the number of balance tasks generated in each round. At the same
time, we reduce the balance interval to trigger balance more frequently.
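A minimal sketch of the per-round cap, assuming each balance round produces a list of segment move plans. `SegmentAssignPlan`, `limitPlans`, and the cap value are hypothetical names for illustration, not QueryCoord's API.

```go
package main

import "fmt"

// SegmentAssignPlan is a hypothetical stand-in for a plan that moves one
// segment from one query node to another.
type SegmentAssignPlan struct {
	SegmentID int64
	From, To  int64
}

// limitPlans keeps at most maxPerRound plans so that a single balance round
// cannot flood the scheduler and starve load/release tasks.
// A non-positive maxPerRound yields no plans for this round.
func limitPlans(plans []SegmentAssignPlan, maxPerRound int) []SegmentAssignPlan {
	if maxPerRound <= 0 {
		return nil
	}
	if len(plans) > maxPerRound {
		return plans[:maxPerRound]
	}
	return plans
}

func main() {
	plans := make([]SegmentAssignPlan, 0, 10)
	for i := int64(0); i < 10; i++ {
		plans = append(plans, SegmentAssignPlan{SegmentID: i, From: 1, To: 2})
	}
	// Only 3 plans become tasks this round; the rest wait for a later round.
	fmt.Println(len(limitPlans(plans, 3)))
}
```

Plans dropped by the cap are not lost in this model: the next, more frequent round recomputes them from the updated distribution.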

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-26 10:13:46 +08:00
jaime
77ae127a62
fix: check collection health (queryable) fails for releasing collection (#34948)
issue: #34946
pr: #34947

---------

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-25 10:25:57 +08:00
wei liu
8c96026722
fix: Segment may bounce between delegator and worker (#34904)
issue: #34595
pr: #34830

pr #34596 added an overloaded factor to segments on the delegator, which
causes the same segment to get different scores on the delegator and on
a worker. This may cause the segment to bounce between the delegator and
the worker.

This PR uses the average score to compute the delegator overloaded
factor, to avoid segments bouncing between delegator and worker.
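To illustrate why an average-based penalty avoids bouncing, the hypothetical sketch below scores each node by its row count plus, for a delegator node, `overloadFactor` times the average row count across nodes. Because the penalty does not depend on any individual segment, a segment adds the same amount to a node's score wherever it is placed. The formula and names are assumptions for illustration; the real score-based balancer may weigh things differently.

```go
package main

import "fmt"

// nodeScores computes a hypothetical score per node: the node's total row
// count, plus a fixed penalty if the node hosts a shard delegator.
// The penalty is derived from the average row count across all nodes, so it
// does not change with which segment is being moved.
func nodeScores(rowCounts map[int64]int64, delegators map[int64]bool, overloadFactor float64) map[int64]int64 {
	var total int64
	for _, rows := range rowCounts {
		total += rows
	}
	avg := 0.0
	if len(rowCounts) > 0 {
		avg = float64(total) / float64(len(rowCounts))
	}
	scores := make(map[int64]int64, len(rowCounts))
	for node, rows := range rowCounts {
		score := rows
		if delegators[node] {
			score += int64(avg * overloadFactor)
		}
		scores[node] = score
	}
	return scores
}

func main() {
	rows := map[int64]int64{1: 1000, 2: 1000}
	// Node 1 hosts a delegator and is penalized by a fraction of the average.
	scores := nodeScores(rows, map[int64]bool{1: true}, 0.1)
	fmt.Println(scores[1], scores[2])
}
```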

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-23 15:57:49 +08:00
wei liu
ebbccb870c
fix: Avoid segment lack caused by deduplicate segment task (#34782) (#34903)
issue: #34781
pr: #34782

When a balance segment task hasn't finished yet, query coord may find 2
loaded copies of the segment and then generate a task to deduplicate
them, which may cancel the balance task. If the old copy has already
been released while the new copy is not ready yet and gets canceled,
search fails due to segment lack.

This PR sets the deduplicate segment task's priority to low, to avoid
the balance segment task being canceled by the deduplicate task.
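One way to picture why lowering the priority helps: if a scheduler lets a new task for the same segment replace an existing one only when the new task has strictly higher priority, a low-priority deduplicate task can no longer preempt an in-flight balance task. The types and the replacement rule below are assumptions for illustration, not QueryCoord's scheduler.

```go
package main

import "fmt"

// Priority is a hypothetical task priority; higher values win.
type Priority int

const (
	PriorityLow Priority = iota
	PriorityNormal
	PriorityHigh
)

// Task is a hypothetical segment task keyed by segment ID.
type Task struct {
	SegmentID int64
	Kind      string
	Priority  Priority
}

// Scheduler keeps at most one in-flight task per segment.
type Scheduler struct {
	tasks map[int64]Task
}

func NewScheduler() *Scheduler { return &Scheduler{tasks: make(map[int64]Task)} }

// Add submits t. If a task for the same segment already exists, t replaces
// (cancels) it only when t has a strictly higher priority; otherwise t is
// rejected and the existing task keeps running.
func (s *Scheduler) Add(t Task) bool {
	if old, ok := s.tasks[t.SegmentID]; ok && t.Priority <= old.Priority {
		return false
	}
	s.tasks[t.SegmentID] = t
	return true
}

func main() {
	s := NewScheduler()
	s.Add(Task{SegmentID: 42, Kind: "balance", Priority: PriorityNormal})
	// The low-priority deduplicate task is rejected instead of canceling balance.
	accepted := s.Add(Task{SegmentID: 42, Kind: "deduplicate", Priority: PriorityLow})
	fmt.Println(accepted, s.tasks[42].Kind)
}
```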

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-23 11:06:15 +08:00
wei liu
cf701a9bf0
enhance: Preserve fixed-size memory in delegator node for growing segment (#34600)
issue: #34595
pr: #34596
When consuming insert data on the delegator node, QueryCoord will move
out some sealed segments to manage its memory usage. After the growing
segment gets flushed, some sealed segments from other workers will be
moved back to the delegator node. To avoid the frequent movement of
segments, we estimate the maximum growing row count and preserve a
fixed-size memory in the delegator node.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-15 20:51:46 +08:00
wayblink
c62bf8a0b0
fix: [Cherry-pick] Pick major compaction fixes and optimizations (#34360)
This PR cherry-picks the following commits:

- fix: sync partition stats blocking balance task #33742
- fix: Fix meta prefix overlap bug #33830
- fix: Small fixes of major compaction #33929
- fix: Fix memory buffer error & some renaming #33850
- fix: sync part stats task cannot be finished #34027
- Add an option to enable/disable vector field clustering key #34097
- fix: fix error ignore in compactor #34169
- fix: load major compaction partial result #34052
- Use new stream segment reader in clustering compaction #34232

issue: #30633
pr: #33742 #33830 #33929 #33850 #34027 #34097 #34169 #34052 #34232

---------

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
2024-07-03 09:53:37 +08:00
wayblink
99586066f5
feat: [cherry-pick] Major compaction (#34326)
This PR cherry-picks the following commits:

- fix: speed up segment lookup via channel name in datacoord (#33530),
  needed by the next commit
- feat: Major compaction (#33620)

issue: #30633
pr: #33620

---------

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
Signed-off-by: wayblink <anyang.wang@zilliz.com>
Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com>
Co-authored-by: MrPresent-Han <chun.han@zilliz.com>
2024-07-02 18:29:01 +08:00
congqixia
4aa8a12ce8
fix: [2.4] Check partition in current target when observing partition load status (#34282) (#34305)
Cherry-pick from master
pr: #34282
See also #34234

`LoadPartitions` does not guarantee that the current target contains
the loading partitions if some partitions were already loaded before.

This PR checks that the current target contains the partitions to load
before advancing the loading percentage to 100.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-02 15:48:10 +08:00
wei liu
92b7eebb53
enhance: Skip update index for L0 segment (#34099) (#34280)
pr: #34280
Trying to update the index for an L0 segment fails with `index not found`.

This PR skips updating the index for L0 segments.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-01 16:32:07 +08:00
wei liu
b18de95817
enhance: Avoid assigning too many segments/channels to new querynode (#34096) (#34245)
issue: #34095
pr: #34096

When a new query node comes online, the segment_checker,
channel_checker, and balance_checker simultaneously attempt to allocate
segments to it. If this occurs during the execution of a load task and
the distribution of the new query node hasn't been updated, the query
coordinator may mistakenly view the new query node as empty. As a
result, it assigns segments or channels to it, potentially overloading
the new query node with more segments or channels than expected.

This PR measures the workload of the executing tasks on the target query
node to prevent assigning an excessive number of segments to it.
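A minimal sketch of the idea, assuming the balancer picks the least-loaded node by segment count: a node's effective workload is its reported distribution plus the segments that executing tasks are still moving onto it. The type and function names are hypothetical, and the real score-based balancer weighs more than segment counts.

```go
package main

import "fmt"

// nodeLoad is a hypothetical view of a query node.
type nodeLoad struct {
	ID                int64
	DistSegments      int // segments already reported in the node's distribution
	ExecutingSegments int // segments assigned by tasks that have not finished yet
}

// pickNode returns the ID of the node with the smallest effective workload,
// or -1 if there are no nodes. Counting executing tasks keeps a freshly
// started node, whose distribution is still empty, from looking idle forever.
func pickNode(nodes []nodeLoad) int64 {
	best := int64(-1)
	bestLoad := 0
	for _, n := range nodes {
		load := n.DistSegments + n.ExecutingSegments
		if best == -1 || load < bestLoad {
			best, bestLoad = n.ID, load
		}
	}
	return best
}

func main() {
	nodes := []nodeLoad{
		{ID: 1, DistSegments: 3},
		{ID: 2, DistSegments: 3},
		{ID: 3}, // new node: distribution not updated yet
	}
	assigned := map[int64]int{}
	for i := 0; i < 6; i++ {
		id := pickNode(nodes)
		assigned[id]++
		// Record the pending assignment as an executing task on that node.
		for j := range nodes {
			if nodes[j].ID == id {
				nodes[j].ExecutingSegments++
			}
		}
	}
	fmt.Println(assigned)
}
```

Without the `ExecutingSegments` term, every one of the six assignments in this example would land on node 3, since its reported distribution stays empty until the tasks finish.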

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-07-01 10:32:06 +08:00
jaime
0992f10694
enhance: improve check health (#34265)
issue: https://github.com/milvus-io/milvus/issues/34264
pr: #33800

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-01 10:18:07 +08:00
jaime
6423b6c718
enhance: move rocksmq from internal to pkg (#34165)
pr:  https://github.com/milvus-io/milvus/pull/33881
issue:  https://github.com/milvus-io/milvus/issues/33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-26 13:36:05 +08:00
congqixia
26b2e1d43c
fix: [2.4] Make querycoord panic when rg metastore sync fails (#34106) (#34127)
Cherry-pick from master
pr: #34106
See also #34047

This PR makes querycoord panic when `unassignNode` fails to sync the
resource group, with the node removed, to the metastore.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-06-26 10:04:03 +08:00
wei liu
061a00c58f
enhance: Enable database level replica num and resource groups for loading collection (#33052) (#33981)
pr: #33052

issue: #30040

This PR introduces two database-level props:
1. database.replica.number
2. database.resource_groups

Users can set these two database props via the AlterDatabase API and
then load collections without specifying replica_num and resource
groups; the database-level load params are used when loading those
collections.
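A sketch of how a load request might fall back to the two database props, assuming that request values take precedence, that `database.resource_groups` holds a comma-separated list, and that an unset prop falls back to one replica in `__default_resource_group`. Only the two key names come from this commit; the function and the fallback rules are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Property keys as named in this commit message.
const (
	dbReplicaNumberKey  = "database.replica.number"
	dbResourceGroupsKey = "database.resource_groups"
)

// resolveLoadParams returns the replica number and resource groups for a
// load request (hypothetical precedence: request > database props > fallback).
func resolveLoadParams(reqReplicas int, reqRGs []string, dbProps map[string]string) (int, []string, error) {
	replicas, rgs := reqReplicas, reqRGs
	if replicas <= 0 {
		replicas = 1 // assumed fallback
		if v, ok := dbProps[dbReplicaNumberKey]; ok {
			n, err := strconv.Atoi(v)
			if err != nil || n <= 0 {
				return 0, nil, fmt.Errorf("invalid %s: %q", dbReplicaNumberKey, v)
			}
			replicas = n
		}
	}
	if len(rgs) == 0 {
		rgs = []string{"__default_resource_group"} // assumed fallback
		if v, ok := dbProps[dbResourceGroupsKey]; ok && v != "" {
			rgs = strings.Split(v, ",") // assumed comma-separated format
		}
	}
	return replicas, rgs, nil
}

func main() {
	props := map[string]string{
		dbReplicaNumberKey:  "2",
		dbResourceGroupsKey: "rg1,rg2",
	}
	// The request leaves both values unset, so the database props apply.
	replicas, rgs, err := resolveLoadParams(0, nil, props)
	fmt.Println(replicas, rgs, err)
}
```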

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 16:56:02 +08:00
wei liu
7d1d5a838a
fix: Fix GetReplicas API returning nil status (#33715) (#34019)
issue: #33702
pr: #33715

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 10:26:02 +08:00
wei liu
fbc8fb3cb2
enhance: Skip returning data distribution if no change happens (#32814) (#33985)
issue: #32813
pr: #32814

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-21 10:24:12 +08:00
wei liu
87508c3390
enhance: Avoid iterating the whole segment list for each task's process (#33943) (#33976)
pr: #33943

When querycoord processes a segment task, it iterates over the whole
segment list to check whether the segment is loaded, which costs too
much CPU when there are thousands of segments.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-06-20 10:00:05 +08:00
SimFG
f664b51ebe
enhance: [2.4] try to speed up the loading of small collections (#33746)
- issue: #33569
- pr: #33570

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-06-11 15:07:55 +08:00
yihao.dai
ed1dee9e38
enhance: Support L0 import (#33514) (#33712)
issue: https://github.com/milvus-io/milvus/issues/33157

pr: https://github.com/milvus-io/milvus/pull/33514

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-06-08 11:17:52 +08:00
wayblink
deebae70a7
fix: [cherry-pick] Panic if ProcessActiveStandBy returns error (#33372)
pr: #33369
issue: #33368

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-05-27 10:13:59 +08:00
jaime
8990b8b051
fix: correct error of metrics stats (#33305)
issue: #32980
cherry pick from master
pr:  #33075 #33255

---------

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-05-24 09:15:41 +08:00
wei liu
32bfd9befa
enhance: Enable dynamically updating balancer policy in querycoord (#33037) (#33272)
issue: #33036
pr: #33037
This PR enables dynamically updating the balancer policy without
restarting querycoord.
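One common way to make a policy switchable at runtime, sketched here as an assumption rather than QueryCoord's actual code, is to resolve the balancer from the current config value on each use instead of binding it once at startup. The names `RowCountBasedBalancer` and `ScoreBasedBalancer` are used as example policy names; the interface, `balancerHolder`, and the fallback are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer is a hypothetical balancer interface.
type Balancer interface{ Name() string }

type rowCountBalancer struct{}

func (rowCountBalancer) Name() string { return "RowCountBasedBalancer" }

type scoreBalancer struct{}

func (scoreBalancer) Name() string { return "ScoreBasedBalancer" }

// balancerHolder looks up the balancer by the current policy name on every
// call, so a config change takes effect without a restart. The instances map
// is built once and only read afterwards, so concurrent reads are safe.
type balancerHolder struct {
	policy    atomic.Value // current policy name (string), updated on config change
	instances map[string]Balancer
}

func (h *balancerHolder) Get() Balancer {
	name, _ := h.policy.Load().(string)
	if b, ok := h.instances[name]; ok {
		return b
	}
	return h.instances["ScoreBasedBalancer"] // assumed fallback for unset/unknown names
}

func main() {
	h := &balancerHolder{instances: map[string]Balancer{
		"RowCountBasedBalancer": rowCountBalancer{},
		"ScoreBasedBalancer":    scoreBalancer{},
	}}
	h.policy.Store("RowCountBasedBalancer")
	fmt.Println(h.Get().Name())
	h.policy.Store("ScoreBasedBalancer") // e.g. triggered by a config watch event
	fmt.Println(h.Get().Name())
}
```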

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-23 15:43:41 +08:00
wei liu
4b8680894f
fix: Clean offline node from resource group after qc restart (#33233)
issue: #33200 #33207
pr: #33232
pr #33104 caused offline nodes to be kept in the resource group after qc
recovery. The offline nodes are then assigned to new replicas as
rwNodes, and requests sent to those nodes fail with NodeNotFound.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-22 14:07:39 +08:00
wei liu
9ae4945df2
fix: query node may get stuck in stopping progress (#33104) (#33154)
issue: #33103 
pr: #33104
When doing stopping balance for a stopping query node, the balancer gets
the node list from replica.GetNodes and checks whether each node is
stopping; if so, stopping balance is triggered for that replica.

After the replica refactor, replica.GetNodes only returns rwNodes, while
the stopping node is kept in roNodes. So the balancer can't find the
replica that contains the stopping node, stopping balance for the
replica is never triggered, and the query node gets stuck forever
because its segments/channels are not moved out.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-20 15:01:43 +08:00
cai.zhang
6ea7633bd5
enhance: Add memory size for binlog (#33025)
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
2024-05-15 12:59:34 +08:00
SimFG
1d48d0aeb2
enhance: use different value to get related data size according to segment type (#33017)
issue: #30436

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-05-14 14:59:33 +08:00
congqixia
861977ab60
fix: Start LeaderCacheObserver before SyncAll (#33035)
Related to #33033

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-14 13:25:32 +08:00
wei liu
cba2c7a3be
enhance: clean channel node info in meta store (#32988)
issue: #32910
see also: #32911
When channel exclusive mode is enabled, the replica records channel node
info in the meta store. If the balance policy changes so that channel
exclusive mode is disabled, we should clean up the channel node info in
the meta store and stop balancing nodes between channels.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-14 10:05:40 +08:00
chyezh
293f14a8b9
fix: remove redundant replica recover (#32985)
issue: #22288 

- replica recover should be only triggered by replica recover

Signed-off-by: chyezh <chyezh@outlook.com>
2024-05-13 15:25:32 +08:00
Xiaofan
b044e5503e
enhance: Improve load speed (#32898)
fix #32897
add memory check when loading collection

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-05-11 10:29:31 +08:00
chyezh
1c84a1c9b6
fix: lru related issue fixup patch (#32916)
issue: #32206, #32801

- search failure with some assertion, segment not loaded and resource
insufficient.

- segment leak when query segments

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-05-10 19:17:30 +08:00
wei liu
e2332bdc17
enhance: Enable channel exclusive balance policy (#32911)
issue: #32910  
* split the replica's node list among channels when creating replicas
* balance nodes among channels when node changes happen
* implement channel-level balance, so that balancing happens at the channel level

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-10 17:27:31 +08:00
wei liu
04a8ec69f6
fix: Segment on stopping query node can't be released successfully (#32929)
issue: #32901
The release segment request needs to be sent to the delegator, which
requires the replica info to find the segment's delegator. However, the
stopping query node is marked as read-only in the replica, and
`replica.Contains()` only returns true for rwNodes in the replica. So the
replica info can't be obtained for the stopping query node, and
releasing the segment is blocked.

This PR makes `replica.Contains()` return true for both roNodes and
rwNodes.
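A hypothetical model of the change: a replica tracks rwNodes and roNodes separately, and `Contains` now matches either set, while a narrower helper (named `ContainsRWNode` here purely for illustration) keeps the rw-only behavior for callers that need it.

```go
package main

import "fmt"

// Replica is a hypothetical model: nodes serving reads and writes (rwNodes)
// are kept apart from draining nodes such as stopping query nodes (roNodes).
type Replica struct {
	rwNodes map[int64]struct{}
	roNodes map[int64]struct{}
}

func NewReplica(rw, ro []int64) *Replica {
	r := &Replica{rwNodes: map[int64]struct{}{}, roNodes: map[int64]struct{}{}}
	for _, n := range rw {
		r.rwNodes[n] = struct{}{}
	}
	for _, n := range ro {
		r.roNodes[n] = struct{}{}
	}
	return r
}

// Contains reports whether the node belongs to the replica in either role,
// so a stopping (read-only) node can still be mapped to its replica and its
// segments released through the delegator.
func (r *Replica) Contains(node int64) bool {
	_, rw := r.rwNodes[node]
	_, ro := r.roNodes[node]
	return rw || ro
}

// ContainsRWNode is the narrower, rw-only check.
func (r *Replica) ContainsRWNode(node int64) bool {
	_, ok := r.rwNodes[node]
	return ok
}

func main() {
	r := NewReplica([]int64{1, 2}, []int64{3}) // node 3 is stopping
	fmt.Println(r.Contains(3), r.ContainsRWNode(3)) // true false
}
```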

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-10 14:33:30 +08:00
Bingyi Sun
b7ef8da360
fix: set channel checkpoint to delta position (#32878)
issue: https://github.com/milvus-io/milvus/issues/32853

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-05-10 11:51:30 +08:00
congqixia
efa58ae423
enhance: Utilize coll2replica mapping when getting rg by collection (#32892)
See also #32165

The old `GetResourceGroupByCollection` implementation iterates over all
replicas to match the collection id, which is slow and CPU-consuming.
This PR makes it utilize the coll2Replicas mapping by calling
`GetByCollection` and mapping the replicas to resource groups.
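A minimal sketch of the secondary-index approach, assuming a replica manager that maintains a collection-to-replicas map next to its primary replica map. The method names echo the ones in this commit message, but the types, signatures, and the sorted-slice return value are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type Replica struct {
	ID            int64
	CollectionID  int64
	ResourceGroup string
}

// ReplicaManager keeps a secondary index from collection ID to replicas, so a
// per-collection lookup touches only that collection's replicas.
type ReplicaManager struct {
	replicas      map[int64]*Replica
	coll2Replicas map[int64][]*Replica
}

func NewReplicaManager() *ReplicaManager {
	return &ReplicaManager{
		replicas:      make(map[int64]*Replica),
		coll2Replicas: make(map[int64][]*Replica),
	}
}

// Put registers a new replica in both maps. Re-putting an existing ID is
// ignored in this sketch to keep the two maps consistent.
func (m *ReplicaManager) Put(r *Replica) {
	if _, ok := m.replicas[r.ID]; ok {
		return
	}
	m.replicas[r.ID] = r
	m.coll2Replicas[r.CollectionID] = append(m.coll2Replicas[r.CollectionID], r)
}

func (m *ReplicaManager) GetByCollection(collectionID int64) []*Replica {
	return m.coll2Replicas[collectionID]
}

// GetResourceGroupByCollection maps the collection's replicas to their
// distinct resource groups, returned sorted.
func (m *ReplicaManager) GetResourceGroupByCollection(collectionID int64) []string {
	seen := make(map[string]struct{})
	var rgs []string
	for _, r := range m.GetByCollection(collectionID) {
		if _, ok := seen[r.ResourceGroup]; !ok {
			seen[r.ResourceGroup] = struct{}{}
			rgs = append(rgs, r.ResourceGroup)
		}
	}
	sort.Strings(rgs)
	return rgs
}

func main() {
	m := NewReplicaManager()
	m.Put(&Replica{ID: 1, CollectionID: 100, ResourceGroup: "rg1"})
	m.Put(&Replica{ID: 2, CollectionID: 100, ResourceGroup: "rg2"})
	m.Put(&Replica{ID: 3, CollectionID: 200, ResourceGroup: "rg1"})
	fmt.Println(m.GetResourceGroupByCollection(100)) // [rg1 rg2]
}
```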

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-09 19:37:30 +08:00
congqixia
acb0417a9f
enhance: Avoid iteration over channel results when update leaderview (#32887)
See also #32165

Cache channel name to channel info to avoid iteration over channel
results when updating leader view version.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-09 15:41:30 +08:00
wei liu
fad8f0afa5
enhance: enable stopping balance after balance has been suspended (#32812)
issue: #32811

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-08 10:15:29 +08:00
wei liu
ba02d54a30
enhance: update shard leader cache when leader location changed (#32470)
issue: #32466

This PR updates the proxy's shard leader cache when the shard leader
location changes, so that in the query node failover case the proxy can
find the recovered replica.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-05-08 10:05:29 +08:00
yihao.dai
9db3aa18bc
enhance: Remove deprecated EnableIndex (#32704)
/kind improvement

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-05-07 17:11:30 +08:00
chyezh
b904c8d377
enhance: resource group unittest refactor (#32739)
issue: #30647

Signed-off-by: chyezh <chyezh@outlook.com>
2024-05-06 10:17:34 +08:00