milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-04 12:59:23 +08:00

Author	SHA1	Message	Date
jaime	77ae127a62	fix: check collection health(queryable) fail for releasing collection (#34948 ) issue: #34946 pr: #34947 --------- Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-07-25 10:25:57 +08:00
wayblink	99586066f5	feat: [cherry-pick] Major compaction (#34326 ) This PR cherry-picks the following commits: fix: speed up segment lookup via channel name in datacoord (#33530) needed by the next commit feat: Major compaction (#33620) issue: #30633 pr: #33620 --------- Signed-off-by: yiwangdr <yiwangdr@gmail.com> Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: yiwangdr <80064917+yiwangdr@users.noreply.github.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-07-02 18:29:01 +08:00
congqixia	4aa8a12ce8	fix: [2.4] Check partition in current target when observing partition load status (#34282 ) (#34305 ) Cherry-pick from master pr: #34282 See also #34234 `LoadPartitions` does not guarantee the current target has loading partitions if there are some partitions already loaded before. This PR check current target contains the partition to load when advancing loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-02 15:48:10 +08:00
jaime	6423b6c718	enhance: move rocksmq from internal to pkg (#34165 ) pr: https://github.com/milvus-io/milvus/pull/33881 issue: https://github.com/milvus-io/milvus/issues/33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-26 13:36:05 +08:00
congqixia	26b2e1d43c	fix: [2.4] Make querycoord panic when rg metastore sync fail (#34106 ) (#34127 ) Cherry-pick from master pr: #34106 See also #34047 When `unassignNode` sync resource group with node removed failed Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-26 10:04:03 +08:00
wei liu	061a00c58f	enhance: Enable database level replica num and resource groups for loading collection (#33052 ) (#33981 ) pr: #33052 issue: #30040 This PR introduce two database level props: 1. database.replica.number 2. database.resource_groups User can set those two database props by AlterDatabase API, then can load collection without specified replica_num and resource groups. then it will use database level load param when try to load collections. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-21 16:56:02 +08:00
wei liu	fbc8fb3cb2	enhance: Skip return data distribution if no change happen (#32814 ) (#33985 ) issue: #32813 pr: #32814 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-21 10:24:12 +08:00
wei liu	87508c3390	enhance: Avoid to iterate whole segment list for each task's process(#33943 ) (#33976 ) pr: #33943 when querycoord process segment task, it will try to iterate whole segment list to check whether segment is loaded, which cost too much cpu if there has thousands of segments. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-20 10:00:05 +08:00
jaime	8990b8b051	fix: correct error of metrics stats (#33305 ) issue: #32980 cherry pick from master pr: #33075 #33255 --------- Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-05-24 09:15:41 +08:00
wei liu	9ae4945df2	fix: query node may stuck at stopping progress (#33104 ) (#33154 ) issue: #33103 pr: #33104 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 15:01:43 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
wei liu	cba2c7a3be	enhance: clean channel node info in meta store (#32988 ) issue: #32910 see also: #32911 when channel exclusive mode is enabled, replica will record channel node info in meta store, and if the balance policy changes, which means channel exclusive mode is disabled, we should clean up the channel node info in meta store, and stop to balance node between channels. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-14 10:05:40 +08:00
chyezh	293f14a8b9	fix: remove redundant replica recover (#32985 ) issue: #22288 - replica recover should be only triggered by replica recover Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-13 15:25:32 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	04a8ec69f6	fix: Segment on stopping query node can't be release successfully (#32929 ) issue: #32901 Because a release segment request needs to be sent to the delegator, it needs replica info to find the segment's delegator. But the stopping query node will be marked as read only in replica, and `replica.Contains()` just return true for rwNode in replica, so it can't get replica info by stopping query node and release segment will be blocked. This PR make `replica.Contains()` return true for both roNode and rwNode. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 14:33:30 +08:00
congqixia	efa58ae423	enhance: Utilize coll2replica mapping when getting rg by collection (#32892 ) See also #32165 In old `GetResourceGroupByCollection` implementation, it iterates all replicas to match collection id, which is slow and CPU time consuming. This PR make it utilize the coll2Replicas mapping by calling `GetByCollection` and mapping replicas into resource group. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-09 19:37:30 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470 ) issue: #32466 this PR enhance that when shard location changed, update proxy's shard leader cache. in case of query node failover case, proxy can find replica recover --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
yihao.dai	9db3aa18bc	enhance: Remove deprecated EnableIndex (#32704 ) /kind improvement Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-05-07 17:11:30 +08:00
wei liu	c0555d4b45	fix: Remove read only node from replica immediately after node down (#32666 ) issue: #32665 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-28 20:25:25 +08:00
congqixia	a239e9110e	enhance: Apply node-indexing and cache optimization for channel dist (#32595 ) See also #32165 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:19:24 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
chyezh	f06509bf97	fix: get replica should not report error when no querynode serve (#32536 ) issue: #30647 - Remove error report if there's no query node serve. It's hard for programmer to use it to do resource management. - Change resource group `transferNode` logic to keep compatible with old version sdk. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 19:25:24 +08:00
congqixia	f30c22626e	enhance: Pre-cache result for frequent filters (#32580 ) See also #32165 Add segment dist and leader view filter criterion struct to store frequent filter conditions. Add collection/channel filter results for these two meta --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-25 11:13:25 +08:00
congqixia	37ca32dbba	enhance: Make SegmentDistManager filter use node index (#32533 ) See also #32165 Change `SegmentDistFilter` to interface in order to provide node index when filter segment dist. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-24 16:53:24 +08:00
congqixia	bfebdecf3e	enhance: Make LeaderView Manager filter use map index (#32505 ) See also #32165 Change `LeaderViewFilter` to interface to provided map key to avoid iterating all key-values in LeaderViewManager Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-23 11:07:24 +08:00
congqixia	01c16fe6e3	enhance: Manual release pool after save targets (#32358 ) See also #31632 Release conc.Pool after usage to clean worker and stop background purge and ticktock. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-19 13:51:21 +08:00
chyezh	a8c8a6bb0f	fix: parameter check of TransferReplica and TransferNode (#32297 ) issue: #30647 - Same dst and src resource group should not be allowed in `TransferReplica` and `TransferNode`. - Remove redundant parameter check. Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-17 15:27:19 +08:00
yiwangdr	7deda4d5e9	enhance: speed up GetByCollectionAndNode (#32232 ) Related to https://github.com/milvus-io/milvus/issues/32165 Avoid iterating through all replicas/collections if possible. Iteration is expensive when there are large number of replicas/collections. Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-04-17 10:23:25 +08:00
congqixia	dc11cbd123	enhance: Maintain collection-partitions mapping in qc meta (#32227 ) Related to #32165 Add collection to partitionIDs mapping to avoid iteration on all partitions loaded when trying to get all partitions with collection id --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-15 10:05:19 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
congqixia	b9a487608a	fix: Make `ResourceGroup.nodes` concurrent safe (#32159 ) See also #32158 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-11 17:53:18 +08:00
chyezh	a3d6110957	fix: ut failure (#32120 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:30:48 +08:00
chyezh	0be67e7f99	fix: ut failure (#32119 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:23:27 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
congqixia	56e371c478	fix: Check replica exists before get latest leader (#31848 ) See also #31847 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:05:22 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR add GetByFilter interface in channel dist manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998 ) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
congqixia	8e5865f630	enhance: Save collection targets by batches (#31616 ) See also #28491 #31240 When collection number is large, querycoord saves collection target one by one, which is slow and may block querycoord exits. In local run, 500 collections scenario may lead to about 40 seconds saving collection targets. This PR changes the `SaveCollectionTarget` interface into batch one and organizes the collection in 16 per bundle batches to accelerate this procedure. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-27 00:09:08 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) "-1" as `InvalidPartitionID` previously used as All partition place holder in delete cases. It's confusing and hard to maintain when a const var has more than one meaning. This PR add `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
congqixia	c3d53eb1bf	enhance: Remove metrics when target removed (#31399 ) See also #31390 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 10:09:08 +08:00
congqixia	194a611814	enhance: Add metrics for querycoord current target cp lag (#31391 ) See also #31390 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-19 14:07:05 +08:00
wei liu	3e7e9f15cd	fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager (#31379 ) issue: #31162 when give scope CurrentTargetFirst/NextTargetFirst, it's expected to scan both current and next target. This PR fixed wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager, which may cause unexpected task generated, and load collection may stuck forever due to dirty leader view. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 11:49:05 +08:00
wei liu	d79aa58b37	enhance: Speed up target recovery after query coord restart (#31240 ) issue: #28491 after querycoord restart, it will pull a new target, which include channel and segment list. when segments loaded on querynode has reached the target, the collection could provide search/query. but if segment list changes by time, after querycoord pull a new target, it will takes a few minutes to catch up the target's segment distribution. and before that, query/search will fail due to lack of segments. This PR save the current loaded target to meta store in querycoord's stop progress, and recover it when query coord starts, to speed up the target recovery time. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-15 14:19:03 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
congqixia	c886aa29ff	enhance: Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31122 ) See also #31103 Since querycoord need index meta information from datacoord only, broker shall use `ListIndexes` to skip segment index building check logic in datacoord This PR is also related to #30538, in which DescribeIndex caused lots of memory usage and lead to OOM eventually --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-07 21:43:03 +08:00
wei liu	99297ab81b	fix: Add retry on unimplemented error for datacoord (#30554 ) issue: #30553 when datacoord with version 2.2 and querycoord with version 2.3 coexist during rolling upgrade, `DescribeIndex/GetIndexInfo` will return `unimplemented` error This PR add retry on `DescribeIndex/GetIndexInfo`, to prevent load collection failed during rolling upgrade from milvus 2.2 to 2.3. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-18 17:26:52 +08:00
SimFG	ddccccbcab	enhance: add the bytes data type for merge data and format some code (#30105 ) /kind improvement Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-01-18 22:18:55 +08:00
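
Commit 8e5865f630 above describes saving collection targets in batches of 16 instead of one by one. The chunking idea can be sketched roughly as follows. This is only an illustration: `chunk` and the sample IDs are hypothetical and are not taken from the Milvus code base or its `SaveCollectionTarget` implementation.

```go
package main

import "fmt"

// chunk splits ids into consecutive batches of at most size elements.
// Hypothetical helper for illustration; not part of Milvus.
func chunk(ids []int64, size int) [][]int64 {
	if size <= 0 {
		return nil
	}
	var batches [][]int64
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		// Each batch is a sub-slice sharing the backing array of ids.
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	// Assume 40 loaded collections identified by made-up IDs 1..40.
	ids := make([]int64, 40)
	for i := range ids {
		ids[i] = int64(i + 1)
	}
	// A batch size of 16 matches the number quoted in the commit message.
	for i, batch := range chunk(ids, 16) {
		// A real implementation would persist the whole batch in one
		// metastore write here instead of printing it.
		fmt.Printf("batch %d: %d collections\n", i, len(batch))
	}
}
```

With 40 IDs and a batch size of 16, the loop yields three batches of 16, 16 and 8 elements, so the number of write round trips drops from 40 to 3.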


174 Commits