Found lots of `failed to updateTimeTick` with error `skip
ChannelTimeTickMsg from un-recognized session 1` The reason was etcd
client became singleton and used last root path in multiple cases are
run in one suite.
This PR add close singleton client invocation to fix this problem.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #30647
- Add declarative resource group api
- Add config for resource group management
- Resource group recovery enhancement
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #28491
after querycoord restart, it will pull a new target, which include
channel and segment list. when segments loaded on querynode has reached
the target, the collection could provide search/query. but if segment
list changes by time, ater querycoord pull a new target, it will takes a
few minutes to catch up the target's segment distribution. and before
that, query/search will fail due to lack of segments.
This PR save the current loaded target to meta storein querycoord's stop
progress, and recover it when query coord starts, to speed up the target
recovery time.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30983#30982
cause balancer call wrong interface to get segment/channel list in
replica, then got a wrong average segment/channel number, which make
each node have less segment/channel than average, and the balance won't
be trigger in multi replica case.
This PR fix that balance segment/channel won't be trigger on multi
replicas
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #30715
- Bug: Set nil struct pointer to describe nil interface.
Panic with segment violation when calling method on this nil struct
pointer.
Signed-off-by: chyezh <chyezh@outlook.com>
This PR mainly improve two items:
1. Target observer should refresh loading status during init time. An
uninitialized loading status blocks search/query. Currently, the target
observer refreshes every 10 seconds, i.e. we'd need to wait for 10s for
no reason. That's also the reason why we constantly see false log
"collection unloaded" upon mixcoord restarts.
2. Delete session when service is stopped. So that the new service
doesn't need to wait for the previous session to expire (~10s).
Item 1 is the major improvement of this PR, which should speed up init
time by 10s.
Item 2 is not a big concern in most cases as coordinators usually shut
down after stop(). In those cases, coordinator restart triggers serverID
change which further triggers an existing logic that deletes expired
session. This PR only fixes rare cases where serverID doesn't change.
integration test:
`go test -tags dynamic -v -coverprofile=profile.out -covermode=atomic
tests/integration/coordrecovery/coord_recovery_test.go -timeout=20m`
Performance after the change:
Average init time of coordinators: 10s
Hardware: M2 Pro
Test setup: 1000 collections with 1000 rows (dim=128) per collection.
issue: #29409
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
don't store logPath in meta to reduce memory, when service get
segmentinfo, generate logpath from logid.
#28885
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
This pull request simplifies the integration test for cross-cluster
routing by reusing `integration.MiniClusterSuite`, instead of defining
custom Milvus clients, servers, and etcd client.
issue: https://github.com/milvus-io/milvus/issues/29874
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>