milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-03 04:19:18 +08:00

Author	SHA1	Message	Date
congqixia	d16320705e	enhance: [2.4] Add Segment Level in milvus segment info APIs (#34763) (#35023) Cherry-pick from master pr: #34763 See also #34746 This PR adds a segment level field to the responses of `GetPersistentSegmentInfo` and `GetQuerySegmentInfo`. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-29 10:11:52 +08:00
wei liu	b3bc7f3985	enhance: Limit collection's normal balance speed (#34810) (#34987) issue: #34798 pr: #34810 After removing the task priority on query coord, we limit the number of balance tasks generated in each round, so that load/release segment tasks are not blocked by too many balance tasks. At the same time, we reduce the balance interval to trigger balance more frequently. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-26 10:13:46 +08:00
jaime	77ae127a62	fix: check collection health(queryable) fail for releasing collection (#34948 ) issue: #34946 pr: #34947 --------- Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-07-25 10:25:57 +08:00
wei liu	8c96026722	fix: Segment may bounce between delegator and worker (#34904) issue: #34595 pr: #34830 pr #34596 added an overloaded factor to segments in the delegator, which causes the same segment to get different scores in the delegator and the worker, so the segment may bounce between them. This PR uses the average score to compute the delegator overloaded factor, to avoid segments bouncing between delegator and worker. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-23 15:57:49 +08:00
wei liu	ebbccb870c	fix: Avoid segment lack caused by deduplicate segment task (#34782) (#34903) issue: #34781 pr: #34782 When a segment balance hasn't finished yet, query coord may find 2 loaded copies of the segment and generate a task to deduplicate them, which may cancel the balance task. The old copy has then been released while the new copy was canceled before it was ready, so search fails with segment lack. This PR sets the deduplicate segment task's priority to low, so that it cannot cancel the balance segment task. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-23 11:06:15 +08:00
wei liu	cf701a9bf0	enhance: Preserve fixed-size memory in delegator node for growing segment (#34600 ) issue: #34595 pr: #34596 When consuming insert data on the delegator node, QueryCoord will move out some sealed segments to manage its memory usage. After the growing segment gets flushed, some sealed segments from other workers will be moved back to the delegator node. To avoid the frequent movement of segments, we estimate the maximum growing row count and preserve a fixed-size memory in the delegator node. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-15 20:51:46 +08:00
wayblink	c62bf8a0b0	fix: [Cherry-pick] Pick major compaction fixes and optimizations (#34360) This PR cherry-picks the following commits: - fix: sync partition stats blocking balance task #33742 - fix: Fix meta prefix overlap bug #33830 - fix: Small fixes of major compaction #33929 - fix: Fix memory buffer error & some renaming #33850 - fix: sync part stats task cannot be finished #34027 - Add an option to enable/disable vector field clustering key #34097 - fix: fix error ignore in compactor #34169 - fix: load major compaction partial result #34052 - Use new stream segment reader in clustering compaction #34232 issue: #30633 pr: #33742 #33830 #33929 #33850 #34027 #34097 #34169 #34052 #34232 Signed-off-by: MrPresent-Han <chun.han@zilliz.com> Signed-off-by: wayblink <anyang.wang@zilliz.com> Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-07-03 09:53:37 +08:00
wayblink	99586066f5	feat: [cherry-pick] Major compaction (#34326 ) This PR cherry-picks the following commits: fix: speed up segment lookup via channel name in datacoord (#33530) needed by the next commit feat: Major compaction (#33620) issue: #30633 pr: #33620 --------- Signed-off-by: yiwangdr <yiwangdr@gmail.com> Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-07-02 18:29:01 +08:00
congqixia	4aa8a12ce8	fix: [2.4] Check partition in current target when observing partition load status (#34282) (#34305) Cherry-pick from master pr: #34282 See also #34234 `LoadPartitions` does not guarantee that the current target contains the loading partitions if some partitions were already loaded before. This PR checks that the current target contains the partition to load before advancing the loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-02 15:48:10 +08:00
wei liu	92b7eebb53	enhance: Skip update index for L0 segment (#34099) (#34280) pr: #34280 Trying to update the index for an L0 segment fails with `index not found`. This PR skips the index update for L0 segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 16:32:07 +08:00
wei liu	b18de95817	enhance: Avoid assign too much segment/channels to new querynode (#34096 ) (#34245 ) issue: #34095 pr: #34096 When a new query node comes online, the segment_checker, channel_checker, and balance_checker simultaneously attempt to allocate segments to it. If this occurs during the execution of a load task and the distribution of the new query node hasn't been updated, the query coordinator may mistakenly view the new query node as empty. As a result, it assigns segments or channels to it, potentially overloading the new query node with more segments or channels than expected. This PR measures the workload of the executing tasks on the target query node to prevent assigning an excessive number of segments to it. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-01 10:32:06 +08:00
jaime	0992f10694	enhance: improve check health (#34265 ) issue: https://github.com/milvus-io/milvus/issues/34264 pr: #33800 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-07-01 10:18:07 +08:00
jaime	6423b6c718	enhance: move rocksmq from internal to pkg (#34165 ) pr: https://github.com/milvus-io/milvus/pull/33881 issue: https://github.com/milvus-io/milvus/issues/33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-26 13:36:05 +08:00
congqixia	26b2e1d43c	fix: [2.4] Make querycoord panic when rg metastore sync fail (#34106) (#34127) Cherry-pick from master pr: #34106 See also #34047 When `unassignNode` sync resource group with node removed failed Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-26 10:04:03 +08:00
wei liu	061a00c58f	enhance: Enable database level replica num and resource groups for loading collection (#33052) (#33981) pr: #33052 issue: #30040 This PR introduces two database-level props: 1. database.replica.number 2. database.resource_groups. Users can set these two props through the AlterDatabase API and then load a collection without specifying replica_num or resource groups; the load then uses the database-level load params. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-21 16:56:02 +08:00
wei liu	7d1d5a838a	fix: Fix GetReplicas API return nil status (#33715 ) (#34019 ) issue: #33702 pr: #33715 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-21 10:26:02 +08:00
wei liu	fbc8fb3cb2	enhance: Skip return data distribution if no change happen (#32814 ) (#33985 ) issue: #32813 pr: #32814 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-21 10:24:12 +08:00
wei liu	87508c3390	enhance: Avoid to iterate whole segment list for each task's process (#33943) (#33976) pr: #33943 When querycoord processes a segment task, it iterates the whole segment list to check whether the segment is loaded, which costs too much CPU when there are thousands of segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-20 10:00:05 +08:00
SimFG	f664b51ebe	enhance: [2.4] try to speed up the loading of small collections (#33746 ) - issue: #33569 - pr: #33570 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-06-11 15:07:55 +08:00
yihao.dai	ed1dee9e38	enhance: Support L0 import (#33514 ) (#33712 ) issue: https://github.com/milvus-io/milvus/issues/33157 pr: https://github.com/milvus-io/milvus/pull/33514 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-06-08 11:17:52 +08:00
wayblink	deebae70a7	fix:[cherry-pick]Panic if ProcessActiveStandBy returns error (#33372 ) pr:#33369 issue:#33368 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-05-27 10:13:59 +08:00
jaime	8990b8b051	fix: correct error of metrics stats (#33305 ) issue: #32980 cherry pick from master pr: #33075 #33255 --------- Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-05-24 09:15:41 +08:00
wei liu	32bfd9befa	enhance: Enable to dynamic update balancer policy in querycoord (#33037) (#33272) issue: #33036 pr: #33037 This PR enables dynamically updating the balancer policy without restarting querycoord. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-23 15:43:41 +08:00
wei liu	4b8680894f	fix: Clean offline node from resource group after qc restart (#33233) issue: #33200 #33207 pr: #33232 pr #33104 causes offline nodes to be kept in the resource group after qc recovers; an offline node is then assigned to a new replica as an rwNode, and requests sent to that node fail with NodeNotFound. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-22 14:07:39 +08:00
wei liu	9ae4945df2	fix: query node may stuck at stopping progress (#33104) (#33154) issue: #33103 pr: #33104 When doing stopping balance for a stopping query node, the balancer gets the node list from replica.GetNodes and checks whether any node is stopping; if so, stopping balance is triggered for this replica. After the replica refactor, replica.GetNodes only returns rwNodes while the stopping node is kept in roNodes, so the balancer can't find the replica that contains the stopping node and stopping balance is never triggered for it. The query node then gets stuck forever because its segments/channels are never moved out. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 15:01:43 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
SimFG	1d48d0aeb2	enhance: use different value to get related data size according to segment type (#33017 ) issue: #30436 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-05-14 14:59:33 +08:00
congqixia	861977ab60	fix: Start `LeaderCacheObserver` before `SyncAll` (#33035 ) Related to #33033 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-14 13:25:32 +08:00
wei liu	cba2c7a3be	enhance: clean channel node info in meta store (#32988) issue: #32910 see also: #32911 When channel exclusive mode is enabled, the replica records channel node info in the meta store. If the balance policy changes so that channel exclusive mode is disabled, we should clean up the channel node info in the meta store and stop balancing nodes between channels. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-14 10:05:40 +08:00
chyezh	293f14a8b9	fix: remove redundant replica recover (#32985 ) issue: #22288 - replica recover should be only triggered by replica recover Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-13 15:25:32 +08:00
Xiaofan	b044e5503e	enhance:Improve load speed (#32898 ) fix #32897 add memory check when load collection Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-05-11 10:29:31 +08:00
chyezh	1c84a1c9b6	fix: lru related issue fixup patch (#32916 ) issue: #32206, #32801 - search failure with some assertion, segment not loaded and resource insufficient. - segment leak when query segments --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-10 19:17:30 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911) issue: #32910 * split replica's node list to channels when creating replicas * balance nodes among channels when a node change happens * implement channel-level balance, so that balancing happens at channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	04a8ec69f6	fix: Segment on stopping query node can't be release successfully (#32929) issue: #32901 The release segment request needs to be sent to the delegator, which requires replica info to find the segment's delegator. However, a stopping query node is marked as read-only in the replica, and `replica.Contains()` only returns true for rwNodes in the replica, so the replica info can't be found by the stopping query node and the release segment is blocked. This PR makes `replica.Contains()` return true for both roNode and rwNode. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 14:33:30 +08:00
Bingyi Sun	b7ef8da360	fix: set channel checkpoint to delta position (#32878 ) issue: https://github.com/milvus-io/milvus/issues/32853 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-05-10 11:51:30 +08:00
congqixia	efa58ae423	enhance: Utilize coll2replica mapping when getting rg by collection (#32892) See also #32165 The old `GetResourceGroupByCollection` implementation iterates all replicas to match the collection id, which is slow and consumes CPU time. This PR makes it use the coll2Replicas mapping by calling `GetByCollection` and mapping the replicas into resource groups. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-09 19:37:30 +08:00
congqixia	acb0417a9f	enhance: Avoid iteration over channel results when update leaderview (#32887 ) See also #32165 Cache channel name to channel info to avoid iteration over channel results when updating leader view version. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-09 15:41:30 +08:00
wei liu	fad8f0afa5	enhance: enable stopping balance after balance has been suspended (#32812 ) issue: #32811 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:15:29 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470) issue: #32466 This PR updates the proxy's shard leader cache when the shard location changes, so that in the query node failover case the proxy can detect the replica recovery. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
yihao.dai	9db3aa18bc	enhance: Remove deprecated EnableIndex (#32704 ) /kind improvement Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-05-07 17:11:30 +08:00
chyezh	b904c8d377	enhance: resource group unittest refactory (#32739 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-06 10:17:34 +08:00
wei liu	d900e68440	fix: fix GetShardLeaders return empty node list (#32685) issue: #32449 To avoid GetShardLeaders returning an empty node list, this PR adds a node list check on both the client side and the server side. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-29 14:19:26 +08:00
chyezh	ef4c875d4c	fix: resource group ut may failure (#32688 ) issue: https://github.com/milvus-io/milvus/issues/30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-29 14:17:26 +08:00
wei liu	c0555d4b45	fix: Remove read only node from replica immediately after node down (#32666) issue: #32665 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-28 20:25:25 +08:00
congqixia	4cdf6c3c41	fix: Check partition nil before observe load progress (#32659 ) See also #32441 #32615 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:29:25 +08:00
congqixia	a239e9110e	enhance: Apply node-indexing and cache optimization for channel dist (#32595 ) See also #32165 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:19:24 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
chyezh	f06509bf97	fix: get replica should not report error when no querynode serve (#32536) issue: #30647 - Remove the error report when there is no query node serving. It's hard for programmers to use it to do resource management. - Change resource group `transferNode` logic to stay compatible with old SDK versions. Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 19:25:24 +08:00
chyezh	b287fbaa2e	fix: return collection on recovering but not collection not loaded when target is not recovered (#32447 ) issue: #32398 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 11:21:26 +08:00
congqixia	f30c22626e	enhance: Pre-cache result for frequent filters (#32580 ) See also #32165 Add segment dist and leader view filter criterion struct to store frequent filter conditions. Add collection/channel filter results for these two meta --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-25 11:13:25 +08:00
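
The database-level load props described in commit 061a00c58f can be sketched as a simple fallback: explicit load request values win, then the database props, then a built-in default. This is a minimal illustration only, not the Milvus implementation. The function `resolve_load_params`, the comma-separated format of `database.resource_groups`, and both default values are assumptions made for the example; only the two property keys come from the commit message.

```python
from typing import List, Mapping, Optional, Sequence, Tuple

# Property keys named in the commit message.
REPLICA_NUM_KEY = "database.replica.number"
RESOURCE_GROUPS_KEY = "database.resource_groups"

# Assumed fallbacks for this sketch; not taken from the commit message.
DEFAULT_REPLICA_NUM = 1
DEFAULT_RESOURCE_GROUPS = ("__default_resource_group",)


def resolve_load_params(
    db_props: Mapping[str, str],
    replica_num: Optional[int] = None,
    resource_groups: Optional[Sequence[str]] = None,
) -> Tuple[int, List[str]]:
    """Hypothetical helper: pick load params for a collection.

    Values given in the load request take priority; otherwise the
    database-level props are used; otherwise the assumed defaults.
    """
    if replica_num is None or replica_num <= 0:
        raw = db_props.get(REPLICA_NUM_KEY)
        replica_num = int(raw) if raw else DEFAULT_REPLICA_NUM

    if not resource_groups:
        # Assumption: the prop stores a comma-separated list of names.
        raw_groups = db_props.get(RESOURCE_GROUPS_KEY, "")
        parsed = [name.strip() for name in raw_groups.split(",") if name.strip()]
        resource_groups = parsed or list(DEFAULT_RESOURCE_GROUPS)

    return replica_num, list(resource_groups)


if __name__ == "__main__":
    props = {REPLICA_NUM_KEY: "2", RESOURCE_GROUPS_KEY: "rg1, rg2"}
    print(resolve_load_params(props))                 # database-level values
    print(resolve_load_params(props, replica_num=3))  # request overrides replica num
```

Keeping the resolution in one place means a load request that omits both values behaves the same way for every collection in the database.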


518 Commits