milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-05 05:18:52 +08:00

Author	SHA1	Message	Date
chyezh	21a9de5c8e	fix: resource group ut fixup (#32509 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-23 10:01:23 +08:00
congqixia	d7ff1bbe5c	enhance: Make querycoordv2 collection observer task driven (#32441 ) See also #32440 - Add loadTask in collection observer - For load collection/partitions, load task shall timeout as a whole - Change related constructor to load jobs --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-22 10:39:22 +08:00
congqixia	01c16fe6e3	enhance: Manual release pool after save targets (#32358 ) See also #31632 Release conc.Pool after usage to clean worker and stop background purge and ticktock. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-19 13:51:21 +08:00
chyezh	a8c8a6bb0f	fix: parameter check of TransferReplica and TransferNode (#32297 ) issue: #30647 - Same dst and src resource group should not be allowed in `TransferReplica` and `TransferNode`. - Remove redundant parameter check. Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-17 15:27:19 +08:00
yiwangdr	7deda4d5e9	enhance: speed up GetByCollectionAndNode (#32232 ) Related to https://github.com/milvus-io/milvus/issues/32165 Avoid iterating through all replicas/collections if possible. Iteration is expensive when there are large number of replicas/collections. Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-04-17 10:23:25 +08:00
congqixia	72c172a7d7	enhance: Remove duplicated collectionID label for task latency (#32308 ) `CollectionID` already exists in channel name, so remove it to save metrics traffic. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-16 18:55:19 +08:00
chyezh	70e3d5b495	fix: wrong node id in TestCheckNodesInReplica (#32268 ) issue: #31930 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 17:38:17 +08:00
wei liu	4822b109bd	fix: Skip to load l0 segment on old version query node (#32124 ) issue: #32107 during rolling upgrade progress, skip to load l0 segment on old version query node --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-15 11:23:23 +08:00
congqixia	dc11cbd123	enhance: Maintain collection-partitions mapping in qc meta (#32227 ) Related to #32165 Add collection to partitionIDs mapping to avoid iteration on all partitions loaded when trying to get all partitions with collection id --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-15 10:05:19 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	68dec7dcd4	fix: Use correct ts to avoid exclude segment list leak (#31991 ) issue: #31990 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-12 10:39:19 +08:00
congqixia	b9a487608a	fix: Make `ResourceGroup.nodes` concurrent safe (#32159 ) See also #32158 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-11 17:53:18 +08:00
congqixia	25a1c9ecf0	fix: Make coordinator `Register` not blocked on ProcessActiveStandby (#32069 ) See also #32066 This PR make coordinator register successful and let `ProcessActiveStandBy` run async. And roles may receive stop signal and notify servers. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-10 18:49:18 +08:00
chyezh	a3d6110957	fix: ut failure (#32120 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:30:48 +08:00
chyezh	0be67e7f99	fix: ut failure (#32119 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-10 17:23:27 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
wei liu	177ddda47f	fix: Check stale should check leader task's leader id (#31962 ) issue: #30816 check stale rules for leader task: 1. for reduce leader task, it should keep executing until leader's node become offline. 2. for grow leader task,it should keep executing until leader's node become stopping. This PR check leader node's stopping state for grow leader task Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-09 15:33:25 +08:00
zhenshan.cao	089c805e0a	enhance: Refactor hybrid search (#32020 ) issue: https://github.com/milvus-io/milvus/issues/25639 https://github.com/milvus-io/milvus/issues/31368 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-04-09 14:21:18 +08:00
yiwangdr	1cd15d9322	test: support segment release in integration test (#31190 ) issue: #29507 Notice that api_testonly.go files should be guarded by compiler tag `test`, so that production build rules don't compile them and these APIs don't get misused. Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-04-09 11:39:17 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
congqixia	c2aad513c0	fix: Check collection nil before check load status (#31850 ) See also #31849 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:07:13 +08:00
congqixia	56e371c478	fix: Check replica exists before get latest leader (#31848 ) See also #31847 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:05:22 +08:00
wei liu	7471a8005f	fix: querycoord panic after node down (#31831 ) issue: #30519 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-03 10:03:22 +08:00
congqixia	0feee53631	enhance: Add back unit test for compactor and fix some TODOs (#31829 ) This PR adds back compactor "Unhandled" data type unit test and fixes some TODOs behavior Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-02 20:35:14 +08:00
Bingyi Sun	91cb529ba6	fix: get latest collection info when checking index (#31744 ) issue: https://github.com/milvus-io/milvus/issues/31727 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-04-02 14:43:13 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR add GetByFilter interface in channel dist manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
congqixia	16d869c57e	enhance: Add EmbedEtcd testutil and remove etcd dep of task pkg (#31802 ) See also #20478 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-02 09:59:14 +08:00
wei liu	bb500d66c7	fix: Remove segment from leader view can't be executed (#31663 ) issue: #31664 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:39:12 +08:00
wei liu	c311932d5f	fix: Update segment's version in leader task (#31643 ) issue: #31468 1. when segment's version in leader view doesn't match segment's version in dist, should update leader view 2. after call loadDeltalog, should update segment's load version with latest ts 3. change leader task's priority from high to low, to avoid leader task replace segment task and balance task --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-01 10:37:21 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998 ) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
wei liu	5d752498e7	fix: Skip release duplicate l0 segment (#31540 ) issue: #31480 #31481 release duplicate l0 segment task, which execute on old delegator may cause segment lack, and execute on new delegator may break new delegator's leader view. This PR skip release duplicate l0 segment by segment_checker, cause l0 segment will be released with unsub channel --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 12:53:10 +08:00
congqixia	8e5865f630	enhance: Save collection targets by batches (#31616 ) See also #28491 #31240 When collection number is large, querycoord saves collection target one by one, which is slow and may block querycoord exits. In local run, 500 collections scenario may lead to about 40 seconds saving collection targets. This PR changes the `SaveCollectionTarget` interface into batch one and organizes the collection in 16 per bundle batches to accelerate this procedure. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-27 00:09:08 +08:00
congqixia	73858b23bc	fix: Make target observer auto/manual task mutual exclusive (#31584 ) See also #30867 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-26 09:57:08 +08:00
wei liu	6438d65459	fix: Grow task stuck at stopping node (#31487 ) issue: #30816 this PR fix that grow task stuck at stopping node Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-25 18:57:07 +08:00
congqixia	4d2142d041	fix: Check latest leader exists before using it (#31500 ) See also #31495 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-22 18:25:07 +08:00
wei liu	03eaa5d478	fix: Load segment task promote failed (#31430 ) issue: #30816 pr #31319 introduce the logic that segment checker need to load level zero segment which only exist in current target. This PR fix load segment task promote failed when segment only belongs to current target --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-21 18:09:07 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
wei liu	7c7375031d	enhance: Add metrics for task latency in querycoord scheduler (#31405 ) This PR add metrics for task latency in querycoord scheduler, so if any kind of task stuck, it's easy to figure out by metrics --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-20 19:29:06 +08:00
congqixia	a647b84f3e	enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438 ) "-1" as `InvalidPartitionID` previously used as All partition place holder in delete cases. It's confusing and hard to maintain when a const var has more than one meaning. This PR add `AllPartitionsID` to replace these usages in delete scenarios. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 19:01:05 +08:00
congqixia	c3d53eb1bf	enhance: Remove metrics when target removed (#31399 ) See also #31390 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-20 10:09:08 +08:00
congqixia	194a611814	enhance: Add metrics for querycoord current target cp lag (#31391 ) See also #31390 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-19 14:07:05 +08:00
wei liu	3e7e9f15cd	fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager (#31379 ) issue: #31162 when give scope CurrentTargetFirst/NextTargetFirst, it's expected to scan both current and next target. This PR fixed wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager, which may cause unexpected task generated, and load collection may stuck forever due to dirty leader view. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 11:49:05 +08:00
wei liu	c26c1b33c2	fix: Transfer l0 segment to new delegator after balance (#31319 ) issue: #30186 during channel balance, after new delegator loaded, instead of syncing l0 segment's location to new delegator, we should load l0 segment on new delegator, and release the old l0 segment, then start to release old delegator. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-19 09:59:05 +08:00
wei liu	4dfdb1a443	fix: save current target after target observer stop (#31315 ) issue: #28491 should save target to meta store after target observer stop, incase of target changed Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-18 12:27:04 +08:00
wei liu	d79aa58b37	enhance: Speed up target recovery after query coord restart (#31240 ) issue: #28491 After querycoord restarts, it pulls a new target, which includes the channel and segment lists. When the segments loaded on querynode have reached the target, the collection can provide search/query. But if the segment list changes over time, after querycoord pulls a new target it takes a few minutes to catch up with the target's segment distribution, and before that, query/search will fail due to lack of segments. This PR saves the currently loaded target to the meta store in querycoord's stop process, and recovers it when querycoord starts, to speed up target recovery. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-15 14:19:03 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
jaime	db79be3ae0	fix: ctx cancel should be the last step while stopping server (#31220 ) issue: #31219 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-03-15 10:33:05 +08:00
congqixia	773c64ecbb	fix: Set nodeID when remove distribution (#31259 ) See also #30930 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-14 15:09:03 +08:00
wei liu	06b191b164	fix: Balance channel stuck forever due to logic dead lock (#31202 ) issue: #30816 cause balance channel will stuck until leader view catch up the current target, then start to unsub the old delegator. which make sure that the new delegator can provide search before release old delegator. but another logic in segment_checker skip loading segment during balance channel. so during balance channel, if query node crash, new delegator can't catch up target forever, then stuck forever. This PR remove the rule that skip loading segment during balance channel to avoid the logic dead lock here. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-13 15:05:04 +08:00
congqixia	5b51c20293	fix: Use `Remove` sync type for distribution removal (#31215 ) See also #31214 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-13 06:11:04 +08:00
wei liu	06df9b8462	fix: Balance segment/channel won't be trigger on multi replicas (#31107 ) issue: #30983 #30982 cause balancer call wrong interface to get segment/channel list in replica, then got a wrong average segment/channel number, which make each node have less segment/channel than average, and the balance won't be trigger in multi replica case. This PR fix that balance segment/channel won't be trigger on multi replicas Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-11 20:35:04 +08:00
wei liu	ddd918ba04	enhance: change frequency log to rated level (#31084 ) This PR change frequency log of check shard leader to rated level --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:39:02 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
wei liu	22df5061c1	fix: Leader checker can't update segment's load version (#31040 ) issue: #30890 when leader checker find that leader view has an older load version of segment, it will try to correct leader view. but the sync action doesn't specify the latest load version. so the update operation will failed. This PR fix leader checker can't update segment's load version and keeping generate same task to scheduler. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 11:57:01 +08:00
congqixia	c886aa29ff	enhance: Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31122 ) See also #31103 Since querycoord need index meta information from datacoord only, broker shall use `ListIndexes` to skip segment index building check logic in datacoord This PR is also related to #30538, in which DescribeIndex caused lots of memory usage and lead to OOM eventually --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-07 21:43:03 +08:00
wei liu	2a047103d6	fix: Dirty sealed segment won't release after channel balance (#31095 ) issue: #31074 This PR fix dirty sealed segment doesn't release after channel balance, dirty sealed segment means segment doesn't exist in targets. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-07 16:23:01 +08:00
Bingyi Sun	e3cce11dd9	fix: data race in querynode task test (#31019 ) issue: https://github.com/milvus-io/milvus/issues/31022 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-03-05 16:26:59 +08:00
Bingyi Sun	7783098ddd	feat: support lazy load on querycoord (#30372 ) https://github.com/milvus-io/milvus/issues/30361 Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-03-01 18:15:29 +08:00
SimFG	ee8d6f236c	enhance: make the watch dm channel request better compatibility (#30952 ) issue: #30938 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-03-01 16:07:37 +08:00
chyezh	0c7474d7e8	enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30317 ) 1. add coordinator graceful stop timeout to 5s 2. change the order of datacoord component while stop 3. change querynode grace stop timeout to 900s, and we should potentially change this to 600s when graceful stop is smooth issue: #30310 also see pr: #30306 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-02-29 17:01:50 +08:00
wei liu	545e8de401	fix: promote leader task failed when segment only exist on current target (#30794 ) issue: #30150 `checkLeaderTaskStale` will check segment whether exist on next current for leaderTask's growing action, which will cause promote leader task failed when segment only exist on current target This PR will check segment for both current or next target. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-28 13:14:59 +08:00
Bingyi Sun	ece9d273a7	enhance: some patches for #30636 (#30664 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-26 11:42:55 +08:00
wei liu	befe0e21fd	fix: Set indexInfo when try to set segment to leader view (#30758 ) issue: #30150 see also: #30258 cause `SyncDataDistribution` will try to load delta for segment. if miss indexInfo in request, sync action will failed due to lack of index info. This PR set indexinfo when try to set segment to leader view Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-26 11:02:55 +08:00
wei liu	6dd7297178	fix: Skip generate balance task when target not ready (#30724 ) issue: #30723 This PR skip generate balance task when collection's target isn't ready. also refine the check stale logic in query coord's scheduler, if channel exist in current or next target, task won't be canceled. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-23 10:32:53 +08:00
congqixia	7b91fa3db8	fix: Make leader checker generate leader task instead of segment task (#30258 ) See also #30150 For leader view distribution with offline nodes, a release task can never be sent to querynode due to targetNode online check logic. Even the request is dispatched, normal release task does not have "force" flag when calling `delegator.ReleaseSegment`. This PR adds a new type of querycoord task: LeaderTask, the responsibility of which is to rectify leader view distribution. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-21 11:08:51 +08:00
Bingyi Sun	564b12c661	enhance: make balance cost threshold configurable (#30636 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-19 15:24:50 +08:00
wei liu	99297ab81b	fix: Add retry on unimplemented error for datacoord (#30554 ) issue: #30553 when datacoord with version 2.2 and querycoord with version 2.3 coexist during rolling upgrade, `DescribeIndex/GetIndexInfo` will return `unimplemented` error This PR add retry on `DescribeIndex/GetIndexInfo`, to prevent load collection failed during rolling upgrade from milvus 2.2 to 2.3. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-02-18 17:26:52 +08:00
congqixia	a6d9eb7f20	fix: Remove balance plan of which From, To nodes are same when merging (#30634 ) See also #30627 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-18 17:24:50 +08:00
zhenshan.cao	bb93b22c84	fix: should return collectionName in response of ListAliases (#30532 ) issue : https://github.com/milvus-io/milvus/issues/30369 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-02-12 08:30:55 +08:00
Bingyi Sun	715f042965	feat: add a balancer based on both of row count and segment count (#30188 ) issue: https://github.com/milvus-io/milvus/issues/30039 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-06 17:15:50 +08:00
yiwangdr	32cff25f97	enhance: decrease coordinator init time (#29822 ) This PR mainly improve two items: 1. Target observer should refresh loading status during init time. An uninitialized loading status blocks search/query. Currently, the target observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for no reason. That's also the reason why we constantly see false log "collection unloaded" upon mixcoord restarts. 2. Delete session when service is stopped. So that the new service doesn't need to wait for the previous session to expire (~10s). Item 1 is the major improvement of this PR, which should speed up init time by 10s. Item 2 is not a big concern in most cases as coordinators usually shut down after stop(). In those cases, coordinator restart triggers serverID change which further triggers an existing logic that deletes expired session. This PR only fixes rare cases where serverID doesn't change. integration test: `go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m` Performance after the change: Average init time of coordinators: 10s Hardware: M2 Pro Test setup: 1000 collections with 1000 rows (dim=128) per collection. issue: #29409 Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-02-05 14:00:12 +08:00
xige-16	060c8603a3	fix: Support mvcc with hybrid search (#30114 ) issue: https://github.com/milvus-io/milvus/issues/29656 /kind bug Signed-off-by: xige-16 <xi.ge@zilliz.com> --------- Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-02-01 16:03:03 +08:00
Bingyi Sun	406bf14e84	enhance: Add growing row count weight (#30271 ) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-01-29 14:05:02 +08:00
aoiasd	f84d9a589a	fix: channel checker reduce balancing channels. (#30087 ) Ignore leader unavailable when channel checker judge repeat channel to avoid channel checker remove channels balancing. relate: https://github.com/milvus-io/milvus/issues/29841 https://github.com/milvus-io/milvus/issues/29838 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-01-26 10:59:00 +08:00
wei liu	f69f65ff68	fix: Leader checker can't remove segment from leader view (#30151 ) issue: #30150 This PR fix three problems: 1. leader checker use wrong node id when generate release task, which cause the release task finished immediately 2. the release request generated by leader_checker doesn't set the `force` flag, the operation to clean leader view on delegator will fail. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-20 18:58:58 +08:00
SimFG	ddccccbcab	enhance: add the bytes data type for merge data and format some code (#30105 ) /kind improvement Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-01-18 22:18:55 +08:00
smellthemoon	e52ce370b6	enhance:don't store logPath in meta to reduce memory (#28873 ) don't store logPath in meta to reduce memory, when service get segmentinfo, generate logpath from logid. #28885 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-01-18 22:06:31 +08:00
wei liu	f8695aef9d	fix: Trigger leader checker too frequency (#29991 ) issue: #29841 This PR fix leader checker use wrong check interval, which causes leader checker trigger too frequency Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-17 19:40:53 +08:00
congqixia	4c93912135	enhance: Shuffle candidates before channel assignment (#30066 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-17 19:34:53 +08:00
wei liu	57bd3e2181	fix: Leader checker cannot submit load task (#30067 ) issue: #29841 if segment loaded, submit load segment task for it isn't permitted, to avoid load segment twice. but this logic blocks the leader checker to correct leader view by `LoadSegment` This PR remove the segment loaded check, to fix that leader checker can't submit load task Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-17 19:12:54 +08:00
wei liu	9abc868d15	fix: Remove heartbeat lag logic during get shard leaders (#29999 ) issue: #29677 #29838 During get shard leaders, if a querynode doesn't ack the heartbeat for more than 10s, querycoord treats it as unavailable and won't return a shard leader on it. But when a querynode has full CPU usage, it can easily be stuck for more than 10s without acking the heartbeat, which leaves no shard leader for search/query. This PR removes the heartbeat lag logic during get shard leaders Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-17 11:22:52 +08:00
congqixia	7cb6bebd96	enhance: replace magic number with ParamItem for dist handler (#30020 ) See also #28817 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-16 17:33:03 +08:00
yah01	c68c128e47	fix: level 0 segments not loaded (#29908 ) the recent changes move the level 0 segments list to a new proto field, which leads to the QueryCoord can't see the level 0 segments, handle the new changes fix #29907 Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-01-16 14:40:53 +08:00
smellthemoon	595ec2559c	enhance: change some frequent log level (#29953 ) Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-01-14 10:19:16 +08:00
congqixia	082ee1a709	enhance: Use newer checkpoint when packing LoadSegmentRequest (#29922 ) See also: #29650 Either segment dml position & channel checkpoint could be newer in some cases. This PR make PackLoadSegments use the newer one improving load performance during cases where there are lots of upsert. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-13 10:46:53 +08:00
wei liu	565fc3a019	enhance: Skip generate load segment task (#29724 ) issue: #29814 if channel is not subscribed yet, the generated load segment task will be remove from task scheduler due to the load segment task need to be transfer to worker node by shard leader. This PR skip generate load segment task when channel is not subscribed yet. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-12 18:56:58 +08:00
Bingyi Sun	e1258b8cad	feat: integrate storagev2 into loading segment (#29336 ) issue: #29335 --------- Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-01-12 18:10:51 +08:00
wei liu	797847904c	enhance: Change some frequency log to rated level (#29720 ) This PR change some frequency log to rated level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-11 16:30:50 +08:00
congqixia	c4ddfff2a7	enhance: make Load process traceable in querycoord (#29806 ) See also #29803 This PR: - Add trace span for collection/partition load - Use TraceSpan to generate Segment/ChannelTasks when loading - Refine BaseTask trace tag usage --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-10 09:58:49 +08:00
xige-16	9702cef2b5	feat: Support multiple vector search (#29433 ) issue #25639 Signed-off-by: xige-16 <xi.ge@zilliz.com> Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-01-08 15:34:48 +08:00
congqixia	b5f039a221	fix: Assertion all async invocations in test case (#29737 ) Resolves: #29736 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-07 15:54:47 +08:00
wei liu	e98c62abbb	enhance: refactor leader_observer to leader_checker (#29454 ) issue: #29453 sync distribution by rpc will also call loadSegment/releaseSegment, which may cause all kinds of concurrent case on same segment, such as concurrent load and release on one segment. This PR add leader_checker which generate load/release task to correct the leader view, instead of calling sync distribution by rpc --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-05 15:54:55 +08:00
congqixia	3626f49025	fix: make sure balance candidate is always pushed back (#29702 ) See also #29699 Querycoord panicked when tried to pop from an empty heap. We assume the heap shall not be empty, but in some branch, the candidate is never pushed back. This PR put pop & push in a closure and adds a defer call to push item back. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-05 10:08:47 +08:00
congqixia	da7c3cbd88	enhance: make delegator delete buffer holding all delete from cp (#29626 ) See also #29625 This PR: - Add a new implementation of `DeleteBuffer`: listDeleteBuffer - holds cacheBlock slice - `Put` method append new delete data into last block - when a block is full, append a new block into the list - Add `TryDiscard` method for `DeleteBuffer` interface - For doubleCacheBuffer, do nothing - For listDeleteBuffer, try to evict "old" blocks, which are blocks before the first block whose start ts is behind provided ts - Add checkpoint field for `UpdateVersion` sync action, which shall be used to discard old cache delete block --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 17:02:46 +08:00
congqixia	aa967de0a8	enhance: Explicitly pass LevelZero segment ids in vchan info (#29612 ) See also #27675 For `GetRecoveryInfo` & `GetRecoveryInfoV2`, Level zero segment ids shall be specified in vchan info so that querycoord could re-fetch current segment info during watch procedure without having all segment info Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 16:46:45 +08:00
wei liu	336fce0582	enhance: Rewrite gen segment plan based on assign segment (#29574 ) issue: #29582 This PR rewrite gen segment plan logic based on assign segment in `score_based_balancer` Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-04 11:10:44 +08:00
congqixia	a3cb8e2625	fix: Add atomic method to get collection target (#29577 ) Related to #29575 Add `getCollectionTarget` method which is atomic when scope is `CurrentTargetFirst` or `NextTargetFirst` Also return error when executor finds no channel in target manager --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-12-29 09:04:46 +08:00
wei liu	514e279f3a	enhance: Remove useless log in collection observer (#29554 ) This PR removed the useless log in collection observer Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-28 17:16:47 +08:00
wei liu	5474bce9d2	fix: Choose wrong shard leader during balance channel (#29529 ) issue: #29523 readable shard leader should still be the old one during channel balance, if the new shard leader is not ready. This PR fixed that query coord choose wrong shard leader during balance channel Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-28 15:22:51 +08:00
congqixia	aa279db44c	enhance: remove flushed segmentInfo in WatchChannelRequest (#29526 ) `WatchDmChannel` only need growing segment info, this PR removes fetch segmentInfos when fill watch dml channel request. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-12-28 00:40:47 +08:00


515 Commits