109 Commits

Author SHA1 Message Date
wei liu
97a44b62fd
fix: Data race in datacoord channel manager (#37866)
issue: #37865

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-21 19:00:32 +08:00
yihao.dai
0fc0d1a888
fix: Limit the concurrency of channel tasks (#37740)
Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.

issue: https://github.com/milvus-io/milvus/issues/37665

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-18 16:26:30 +08:00
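The concurrency limit described in #37740 can be illustrated with a counting semaphore. This is only a minimal sketch of the general technique, not the code from that PR: `runLimited`, its parameters, and the empty stand-in tasks are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runLimited is a hypothetical helper: it runs every task, but never more
// than maxConcurrent at once (maxConcurrent must be at least 1).
// It returns the highest number of tasks observed running simultaneously.
func runLimited(tasks []func(), maxConcurrent int) int {
	sem := make(chan struct{}, maxConcurrent) // buffered channel as a counting semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent tasks are in flight
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the task ends

			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			task()

			mu.Lock()
			running--
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for a channel task
	}
	peak := runLimited(tasks, 4)
	fmt.Println("peak concurrency within limit:", peak <= 4)
}
```

Acquiring the slot before starting the goroutine keeps the number of live goroutines bounded as well, which matters when the goal is to cap memory use.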
jaime
f348bd9441
feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344)
issue: #36621

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-11-07 11:52:25 +08:00
Zhen Ye
cae9e1c732
fix: drop collection failed if enable streaming service (#37444)
issue: #36858

- Start the channel manager on datacoord, but with an empty assign policy
in streaming service.
- Allow a collection in the dropping state to be recovered by the flusher,
to make sure that Milvus consumes the dropCollection message.
- Add backoff for the flusher lifetime.
- Remove the proxy watcher from timetick at rootcoord in streaming
service.

Also see the better fixup: #37176

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:26:26 +08:00
wei liu
d51a808851
fix: Rootcoord stuck at graceful stop progress (#36880)
issue: #34553
When rootcoord triggers a graceful stop, it blocks until all in-flight
RPCs finish. For a create collection request, rootcoord must block until
datacoord finishes watching all channels, but datacoord needs to call
`rootcoord.Alloc` while watching a channel, and rootcoord no longer
responds to new requests. As a result, create collection gets stuck, and
so does the graceful stop.

This PR removes the call to `rootcoord.Alloc` to resolve the logical
deadlock during graceful stop.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-17 12:15:25 +08:00
Zhen Ye
99dff06391
enhance: using streaming service in insert/upsert/flush/delete/querynode (#35406)
issue: #33285

- using streaming service in insert/upsert/flush/delete/querynode
- fixup flusher bugs and refactor the flush operation
- enable streaming service for dml and ddl
- pass the e2e when enabling streaming service
- pass the integration tests when enabling streaming service

---------

Signed-off-by: chyezh <chyezh@outlook.com>
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-08-29 10:03:08 +08:00
XuanYang-cn
f12e368a76
fix: Fill nil schema so that Milvus can watch channel for those upgraded from 2.2 to 2.4 #35695 (#35694)
See also: [#35701](https://github.com/milvus-io/milvus/issues/35701)

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-08-27 10:36:59 +08:00
congqixia
c992a61a23
enhance: Separate allocator pkg in datacoord (#35622)
Related to #28861

Move allocator interface and implementation into separate package. Also
update some unittest logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-08-22 10:06:56 +08:00
XuanYang-cn
314f4d995b
enhance: Tidy dc channel manager (#34515)
See also: #34518

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-07-09 18:26:12 +08:00
jaime
21fc5f5d46
enhance: Remove datanode reporting TT based on MQ implementation (#34421)
issue: #34420

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-05 15:48:09 +08:00
congqixia
d51d0954bd
enhance: Continue loop when reassign channel fails (#34331)
The log is confusing when a `Reassign` channel operation fails, because
both the success and failure logs are printed in a row. This PR continues
the loop to avoid this output.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-04 14:20:10 +08:00
jaime
d1f57aa4ba
enhance: remove deprecated code within channel manager (#34340)
issue: https://github.com/milvus-io/milvus/issues/33994

only remove deprecated code, no additional changes.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-03 19:46:09 +08:00
jaime
d6afb31b94
enhance: make subfunctions of datanode component modular (#33992)
issue: #33994

also remove deprecated channel manager based on the etcd implementation

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-07-01 14:46:07 +08:00
jaime
9630974fbb
enhance: move rocksmq from internal to pkg module (#33881)
issue: #33956

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-06-25 21:18:15 +08:00
yiwangdr
e895cfed84
fix: reduce redundant map operations in datacoord (#33343)
More refactoring will follow.
issue: #33342

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-24 12:47:40 +08:00
congqixia
8cf2cf5c94
enhance: Add go-deadlock as unittest only dependency (#33063)
See also #33062

This PR:

- Adds `lock.RWMutex` & `lock.Mutex` aliases to switch implementation
  based on build flags
- When the build flags include `test`, uses `go-deadlock` to detect
  possible deadlocks
- Replaces all `sync.RWMutex` & `sync.Mutex` in the datacoord pkg

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-05-15 16:33:34 +08:00
yiwangdr
b1eacb2ae8
feat: datacoord/node watch based on rpc (#32036)
issue: https://github.com/milvus-io/milvus/issues/25309

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-05-07 15:49:30 +08:00
yiwangdr
037de8e4d3
enhance: speed up minor functions calls in datacoord (#32389)
Related to https://github.com/milvus-io/milvus/issues/32165

1. Channel store access keyed by node ID should use a map lookup instead
of iteration.

2. The join-like function calls become slow as the number of
collections/segments grows (e.g. 10k). For example,
getNumRowsOfCollectionUnsafe is O(num_segments), and
GetAllCollectionNumRows is O(num_collections*num_segments).

Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2024-04-20 07:55:21 +08:00
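The complexity point in #32389 can be made concrete with a small comparison. The sketch below is not the datacoord code: `segment`, `rowsPerCollectionNaive`, and `rowsPerCollection` are hypothetical, simplified stand-ins that only show why a per-collection rescan costs O(num_collections*num_segments) while a single grouping pass costs O(num_collections + num_segments).

```go
package main

import "fmt"

// segment is a hypothetical, simplified stand-in for segment metadata.
type segment struct {
	CollectionID int64
	NumRows      int64
}

// rowsPerCollectionNaive rescans all segments for every collection:
// O(num_collections * num_segments).
func rowsPerCollectionNaive(collections []int64, segments []segment) map[int64]int64 {
	out := make(map[int64]int64, len(collections))
	for _, c := range collections {
		var total int64
		for _, s := range segments {
			if s.CollectionID == c {
				total += s.NumRows
			}
		}
		out[c] = total
	}
	return out
}

// rowsPerCollection groups row counts in a single pass over the segments:
// O(num_collections + num_segments).
func rowsPerCollection(collections []int64, segments []segment) map[int64]int64 {
	out := make(map[int64]int64, len(collections))
	for _, c := range collections {
		out[c] = 0 // register known collections so empty ones still appear
	}
	for _, s := range segments {
		if _, known := out[s.CollectionID]; known {
			out[s.CollectionID] += s.NumRows
		}
	}
	return out
}

func main() {
	collections := []int64{1, 2, 3}
	segments := []segment{
		{CollectionID: 1, NumRows: 10},
		{CollectionID: 2, NumRows: 5},
		{CollectionID: 1, NumRows: 7},
	}
	fmt.Println(rowsPerCollection(collections, segments))
}
```

Both functions return the same totals; only the number of segment visits differs.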
congqixia
83da08c388
enhance: Use map instead of slice to maintain channel info (#32273)
See also #32165

`ChannelManager.Match` is a frequent operation for datacoord. When the
number of collections is large, iterating over all channels costs a lot
of CPU time.

This PR changes the data structure storing datanode-channel info to a
map, avoiding this iteration when checking channel existence.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-04-16 15:57:19 +08:00
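The slice-to-map change in #32273 can be sketched as a nested map keyed by node ID and channel name. This is an illustrative assumption about the shape of such a store, not the actual Milvus types: `channelStore`, `newChannelStore`, `Add`, and this `Match` signature are hypothetical.

```go
package main

import "fmt"

// channelStore is a hypothetical sketch: node ID -> set of channel names.
type channelStore struct {
	byNode map[int64]map[string]struct{}
}

func newChannelStore() *channelStore {
	return &channelStore{byNode: make(map[int64]map[string]struct{})}
}

// Add records that the given node holds the given channel.
func (s *channelStore) Add(nodeID int64, channel string) {
	chs, ok := s.byNode[nodeID]
	if !ok {
		chs = make(map[string]struct{})
		s.byNode[nodeID] = chs
	}
	chs[channel] = struct{}{}
}

// Match reports whether the node holds the channel using two map lookups,
// with no iteration over the node's channels.
func (s *channelStore) Match(nodeID int64, channel string) bool {
	// Reading from a nil inner map is safe in Go and yields ok == false.
	_, ok := s.byNode[nodeID][channel]
	return ok
}

func main() {
	store := newChannelStore()
	store.Add(1, "by-dev-dml_0")
	fmt.Println(store.Match(1, "by-dev-dml_0"), store.Match(2, "by-dev-dml_0"))
}
```

The lookup cost stays constant on average as channels are added, whereas a slice scan grows with the number of channels on the node.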
wei liu
0d849a6c0a
fix: fix collectionInfo leak in datacoord (#32175)
issue: #32029

Datacoord's meta lacked logic to clean up collection info. This PR
cleans collection info after dropping a channel, to avoid a collection
info leak in datacoord.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-04-15 16:33:19 +08:00
smellthemoon
1c1f2a1371
enhance:change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
XuanYang-cn
623939c9f5
enhance: Remove not in use policies (#29448)
The results don't meet our requirements, and the code hasn't been
maintained for a long time.

See also: #29447

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-28 10:38:46 +08:00
XuanYang-cn
ae180d1628
enhance: Change ChannelManager to interface (#29300)
Rewrite cluster test
issue: #28854

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-12-25 19:24:46 +08:00
wei liu
fdbca10e23
fix: Fix channel manager bg checker exit when disable auto balance (#28459)
issue: #28454

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-20 18:20:22 +08:00
XuanYang-cn
a153950b10
Change channel to Interface (#27839)
This PR changes `*channel` into RWChannel interface

See also: #25309

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-11-13 11:16:18 +08:00
wei liu
14c8a90517
Fix auto balance block channel reassign after datanode restart (#28275)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-09 19:00:25 +08:00
wei liu
5b45a138b1
disable auto balance when old node exists (#28191)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-07 14:02:20 +08:00
jaime
6749957e71
Refine RPC call in unwatch drop channel (#27864)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-24 17:46:15 +08:00
Xiaofan
2ea7579dbb
Reduce rpc size for GetRecoveryInfoV2 (#27483)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-10-23 21:44:09 +08:00
jaime
d2dbbbc11b
Reduce write lock scope in channel manager (#27823)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-10-21 07:58:16 +08:00
congqixia
49516d44b4
Add ctx parameter and log tracer for watch and selectNodes (#27809)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-20 04:22:11 +08:00
MrPresent-Han
cb71a3e235
rm dependency to rc when getting recovery info(#25363) (#27405)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-10-09 18:51:32 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia
8d13717cac
Fill Collection start position timestamp in WatchInfo (#26370)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-08-16 09:05:32 +08:00
Enwei Jiao
66fdc71479
Refactor logs in DataCoord & DataNode (#25574)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-07-14 15:56:31 +08:00
yiwangdr
c7b851f870
add interface for non-watch metakv (#25092)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-06-26 09:20:44 +08:00
Xiaofan
72c5e2a41a
Fix channel reassigned to other datanodes (#25015)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-06-21 21:26:42 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
congqixia
4a22af6e1a
Unwatch channel in watch buffer (#23548)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-20 10:34:31 +08:00
congqixia
d83654c33f
Add Close method for ChannelManager in datacoord (#23493)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-18 17:54:31 +08:00
zhenshan.cao
4a32b842e8
Improve the check logic of channel remove (#23473)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-04-18 02:58:30 +08:00
congqixia
ba84f52119
Fix watcher loop quit and channel shouldDrop logic (#23402)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-14 09:54:28 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
MrPresent-Han
77c9e33e70
support dml channel balancer on datacoord (#22324) (#22377) (#22692)
Signed-off-by: MrPresent-Han <jamesharden11122@gmail.com>
2023-03-20 10:01:56 +08:00
zhenshan.cao
e768437681
Correct usage of Timer and Ticker (#22228)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2023-02-23 18:59:45 +08:00
aoiasd
148a024e05
Add tickle for datacoord watch event (#21193)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-02-15 16:20:34 +08:00
Enwei Jiao
89b810a4db
Refactor all params into ParamItem (#20987)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-12-07 18:01:19 +08:00
Ten Thousand Leaves
0700e56008
Increase MaxWatchDuration and make it configurable (#20884)
/kind improvement

Signed-off-by: Yuchen Gao <yuchen.gao@zilliz.com>
2022-11-29 17:19:14 +08:00
bigsheeper
fc15789da9
Ensure compatibility of channel seek position and move syncPeriod to config (#20504)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-11-12 21:09:04 +08:00
bigsheeper
cd19d99ad7
Add channel level checkpoint (#20350)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2022-11-10 22:13:04 +08:00