milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-03 04:19:18 +08:00

Author	SHA1	Message	Date
congqixia	4aa8a12ce8	fix: [2.4] Check partition in current target when observing partition load status (#34282 ) (#34305 ) Cherry-pick from master pr: #34282 See also #34234 `LoadPartitions` does not guarantee the current target has loading partitions if there are some partitions already loaded before. This PR check current target contains the partition to load when advancing loading percentage to 100. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-02 15:48:10 +08:00
jaime	6423b6c718	enhance: move rocksmq from internal to pkg (#34165 ) pr: https://github.com/milvus-io/milvus/pull/33881 issue: https://github.com/milvus-io/milvus/issues/33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-26 13:36:05 +08:00
SimFG	f664b51ebe	enhance: [2.4] try to speed up the loading of small collections (#33746 ) - issue: #33569 - pr: #33570 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-06-11 15:07:55 +08:00
wei liu	9ae4945df2	fix: query node may stuck at stopping progress (#33104 ) (#33154 ) issue: #33103 pr: #33104 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 15:01:43 +08:00
chyezh	293f14a8b9	fix: remove redundant replica recover (#32985 ) issue: #22288 - replica recover should be only triggered by replica recover Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-13 15:25:32 +08:00
Xiaofan	b044e5503e	enhance:Improve load speed (#32898 ) fix #32897 add memory check when load collection Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-05-11 10:29:31 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911 ) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470 ) issue: #32466 this PR enhance that when shard location changed, update proxy's shard leader cache. in case of query node failover case, proxy can find replica recover --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
chyezh	b904c8d377	enhance: resource group unittest refactory (#32739 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-06 10:17:34 +08:00
chyezh	ef4c875d4c	fix: resource group ut may failure (#32688 ) issue: https://github.com/milvus-io/milvus/issues/30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-29 14:17:26 +08:00
congqixia	4cdf6c3c41	fix: Check partition nil before observe load progress (#32659 ) See also #32441 #32615 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-28 16:29:25 +08:00
Xiaofan	02ace25c68	enhance: reduce the cpu usage when collection number is high (#32245 ) related to #32165 1. for all the manager, support collection level index 2. remove collection level filter to avoid extra cpu usage when collection number increases Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-04-26 11:49:25 +08:00
chyezh	21a9de5c8e	fix: resource group ut fixup (#32509 ) issue: #30647 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-23 10:01:23 +08:00
congqixia	d7ff1bbe5c	enhance: Make querycoordv2 collection observer task driven (#32441 ) See also #32440 - Add loadTask in collection observer - For load collection/partitions, load task shall timeout as a whole - Change related constructor to load jobs --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-22 10:39:22 +08:00
chyezh	70e3d5b495	fix: wrong node id in TestCheckNodesInReplica (#32268 ) issue: #31930 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 17:38:17 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930 ) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133 ) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496 ) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
wei liu	0944a1f790	enhance: Refactor channel dist manager interface (#31119 ) issue: #31091 This PR add GetByFilter interface in channel dist manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-02 10:23:14 +08:00
congqixia	73858b23bc	fix: Make target observer auto/manual task mutual exclusive (#31584 ) See also #30867 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-26 09:57:08 +08:00
chyezh	9f9ef8ac32	enhance: transfer resource group and dbname to querynode when load (#30936 ) issue: #30931 Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-21 11:59:12 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673 ) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073 ) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
congqixia	c886aa29ff	enhance: Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31122 ) See also #31103 Since querycoord need index meta information from datacoord only, broker shall use `ListIndexes` to skip segment index building check logic in datacoord This PR is also related to #30538, in which DescribeIndex caused lots of memory usage and lead to OOM eventually --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-03-07 21:43:03 +08:00
yiwangdr	32cff25f97	enhance: decrease coordinator init time (#29822 ) This PR mainly improve two items: 1. Target observer should refresh loading status during init time. An uninitialized loading status blocks search/query. Currently, the target observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for no reason. That's also the reason why we constantly see false log "collection unloaded" upon mixcoord restarts. 2. Delete session when service is stopped. So that the new service doesn't need to wait for the previous session to expire (~10s). Item 1 is the major improvement of this PR, which should speed up init time by 10s. Item 2 is not a big concern in most cases as coordinators usually shut down after stop(). In those cases, coordinator restart triggers serverID change which further triggers an existing logic that deletes expired session. This PR only fixes rare cases where serverID doesn't change. integration test: `go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m` Performance after the change: Average init time of coordinators: 10s Hardware: M2 Pro Test setup: 1000 collections with 1000 rows (dim=128) per collection. issue: #29409 Signed-off-by: yiwangdr <yiwangdr@gmail.com>	2024-02-05 14:00:12 +08:00
wei liu	797847904c	enhance: Change some frequency log to rated level (#29720 ) This PR change some frequency log to rated level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-11 16:30:50 +08:00
wei liu	e98c62abbb	enhance: refactor leader_observer to leader_checker (#29454 ) issue: #29453 sync distribution by rpc will also call loadSegment/releaseSegment, which may cause all kinds of concurrent case on same segment, such as concurrent load and release on one segment. This PR add leader_checker which generate load/release task to correct the leader view, instead of calling sync distribution by rpc --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-05 15:54:55 +08:00
congqixia	da7c3cbd88	enhance: make delegator delete buffer holding all delete from cp (#29626 ) See also #29625 This PR: - Add a new implemention of `DeleteBuffer`: listDeleteBuffer - holds cacheBlock slice - `Put` method append new delete data into last block - when a block is full, append a new block into the list - Add `TryDiscard` method for `DeleteBuffer` interface - For doubleCacheBuffer, do nothing - For listDeleteBuffer, try to evict "old" blocks, which are blocks before the first block whose start ts is behind provided ts - Add checkpoint field for `UpdateVersion` sync action, which shall be used to discard old cache delete block --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-04 17:02:46 +08:00
congqixia	a3cb8e2625	fix: Add atomic method to get collection target (#29577 ) Related to #29575 Add `getCollectionTarget` method which is atomic when scope is `CurrentTargetFirst` or `NextTargetFirst` Also return error when executor finds no channel in target manager --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-12-29 09:04:46 +08:00
wei liu	514e279f3a	enhance: Remove useless log in collection observer (#29554 ) This PR removed the useless log in collection observer Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-28 17:16:47 +08:00
yah01	13beb5ccc0	fix: load gets stuck probably (#29191 ) we found the load got stuck probably, and reviewed the logs. the target observer seems not working, the reason is the taskDispatcher removes the task in a goroutine, and modifies the task status after committing the task into the goroutine pool, but this may happen after the task removed, which leads to the task will never be removed related #29086 Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-12-14 18:28:38 +08:00
yah01	9819090247	enhance: add more logs for target updating (#29090 ) - add more logs about the condition satisfying --------- Signed-off-by: yah01 <yah2er0ne@outlook.com> Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-12-12 14:06:43 +08:00
yah01	fab52d167b	fix: may miss stream delta while loading (#28871 ) we consume the delta data from the lastest channel checkpoint while loading segment, this works well without level 0 segments, but now it may lead to miss some delta data, so we have to consume from the current target's channel checkpoint related: #27349 --------- Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-12-05 17:34:45 +08:00
aoiasd	b4af6f8c40	fix: sync action load segment with lack collection index info list (#28788 ) relate: https://github.com/milvus-io/milvus/issues/28779 https://github.com/milvus-io/milvus/issues/28637 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2023-12-04 18:14:34 +08:00
wei liu	d081fd5481	enhance: Change some frequency log to rated level (#28897 ) This pr change some frequency log's level to rated. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-04 10:38:35 +08:00
congqixia	f9bb8e9648	enhance: Change const magic number in querycoord to param (#28819 ) See also #28817 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-11-30 09:06:28 +08:00
congqixia	81caf02554	fix: make qcv2 observer dispatcher execute exactly once (#28472 ) See also #28466 In `taskDispatcher.schedule`, same task may be resubmitted if the previous round did not finish In this case, TaskObserver.check may set current target by mistake, which may cause the random search/query failure Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-11-16 10:24:19 +08:00
yah01	ecb3f585c3	Fix passing the wrong dropped list from current target (#28265 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-08 17:02:18 +08:00
yah01	1b90630633	Fix the target updated before version updated to cause data missing (#28250 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-08 11:36:22 +08:00
wei liu	7485eeb689	fix sync distribution with wrong version (#28130 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-11-03 19:02:18 +08:00
yah01	dc89730a50	Support collection-level mmap control (#26901 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-02 23:52:16 +08:00
wei liu	e0222b2ce3	refine target manager code style (#27883 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-10-25 00:44:12 +08:00
congqixia	323fc107a7	Fix taskDispatcher add multiple tasks will ignore following ones (#27885 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-10-24 17:18:13 +08:00
congqixia	93a877f55e	Make qcv2 target&leader observer execute in parallel (#27844 ) - Add `taskDispatcher` to submit and run task async safely - Change `LeaderObeserver` and `TargetObserver` schedule and manual check action to submitting task into dispatcher - Fix logic problem in collection observer when manual check return false See also #27494 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-10-24 10:14:11 +08:00
wei liu	55e5f80e24	update collection target after observer start (#27774 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-10-19 21:52:10 +08:00
yah01	be980fbc38	Refine state check (#27541 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-10-11 21:01:35 +08:00
wei liu	42c475a0e0	remove useless log in querycoord (#27362 ) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-10-11 10:13:34 +08:00
congqixia	eca79d149c	Add ctx control for observer manual check methods (#27531 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-10-09 11:07:33 +08:00
yah01	a715165306	Set timeout for leader observer syncing (#27504 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-10-08 16:55:31 +08:00
yah01	a8ce1b6686	Refine QueryCoord stopping (#27371 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-27 16:27:27 +08:00

1 2 3

125 Commits