Commit Graph

187 Commits

Author SHA1 Message Date
wei liu
92971707de
enhance: Add restful api for devops to execute rolling upgrade (#29998)
issue: #29261
This PR Add restful api for devops to execute rolling upgrade, including
suspend/resume balance and manual transfer segments/channels.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-27 16:15:19 +08:00
congqixia
8e5865f630
enhance: Save collection targets by batches (#31616)
See also #28491 #31240

When colleciton number is large, querycoord saves collection target one
by one, which is slow and may block querycoord exits.

In local run, 500 collections scenario may lead to about 40 seconds
saving collection targets.

This PR changes the `SaveCollectionTarget` interface into batch one and
organizes the collection in 16 per bundle batches to accelerate this
procedure.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-27 00:09:08 +08:00
chyezh
9f9ef8ac32
enhance: transfer resource group and dbname to querynode when load (#30936)
issue: #30931

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-21 11:59:12 +08:00
congqixia
a647b84f3e
enhance: Add AllPartitionsID const to replace InvalidPartitionID (#31438)
"-1" as `InvalidPartitionID` previously used as All partition place
holder in delete cases. It's confusing and hard to maintain when a const
var has more than one meaning.

This PR add `AllPartitionsID` to replace these usages in delete
scenarios.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-20 19:01:05 +08:00
congqixia
c3d53eb1bf
enhance: Remove metrics when target removed (#31399)
See also #31390

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-20 10:09:08 +08:00
congqixia
194a611814
enhance: Add metrics for querycoord current target cp lag (#31391)
See also #31390

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 14:07:05 +08:00
wei liu
3e7e9f15cd
fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target maanger (#31379)
issue: #31162

when give scope CurrentTargetFirst/NextTargetFirst, it's expected to
scan both current and next target.

This PR fixed wrong behavior of CurrentTargetFirst/NextTargetFirst in
target manager, which may cause unexpected task generated, and load
collection may stuck forever due to dirty leader view.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-19 11:49:05 +08:00
wei liu
d79aa58b37
enhance: Speed up target recovery after query coord restart (#31240)
issue: #28491

after querycoord restart, it will pull a new target, which include
channel and segment list. when segments loaded on querynode has reached
the target, the collection could provide search/query. but if segment
list changes by time, ater querycoord pull a new target, it will takes a
few minutes to catch up the target's segment distribution. and before
that, query/search will fail due to lack of segments.

This PR save the current loaded target to meta storein querycoord's stop
progress, and recover it when query coord starts, to speed up the target
recovery time.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-15 14:19:03 +08:00
chyezh
ff4237bb90
enhance: add hostname into node info (#30673)
issue: https://github.com/milvus-io/milvus/issues/30647

- Address may be reused in k8s environment. Using hostname can be
better.

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-15 10:45:06 +08:00
wei liu
efe8cecc88
enhance: refactor segment dist manager interface (#31073)
issue: #31091
This PR add `GetByFilter` interface in segment dist manager, instead of
all kind of get func

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-08 16:29:01 +08:00
congqixia
c886aa29ff
enhance: Use ListIndexes instead of DescribeIndex for qc broker (#31122)
See also #31103

Since querycoord need index meta information from datacoord only, broker
shall use `ListIndexes` to skip segment index building check logic in
datacoord

This PR is also related to #30538, in which DescribeIndex caused lots of
memory usage and lead to OOM eventually

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 21:43:03 +08:00
wei liu
99297ab81b
fix: Add retry on unimplemented error for datacoord (#30554)
issue: #30553

when datacoord with version 2.2 and querycoord with version 2.3 coexist
during rolling upgrade, `DescribeIndex/GetIndexInfo` will return
`unimplemented` error
This PR add retry on `DescribeIndex/GetIndexInfo`, to prevent load
collection failed during rolling upgrade from milvus 2.2 to 2.3.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-02-18 17:26:52 +08:00
SimFG
ddccccbcab
enhance: add the bytes data type for merge data and format some code (#30105)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-18 22:18:55 +08:00
smellthemoon
e52ce370b6
enhance:don't store logPath in meta to reduce memory (#28873)
don't store logPath in meta to reduce memory, when service get
segmentinfo, generate logpath from logid.
#28885

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-18 22:06:31 +08:00
yah01
c68c128e47
fix: level 0 segments not loaded (#29908)
the recent changes move the level 0 segments list to a new proto field,
which leads to the QueryCoord can't see the level 0 segments, handle the
new changes
fix #29907

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-16 14:40:53 +08:00
congqixia
c4ddfff2a7
enhance: make Load process traceable in querycoord (#29806)
See also #29803

This PR:
- Add trace span for collection/partition load
- Use TraceSpan to generate Segment/ChannelTasks when loading
- Refine BaseTask trace tag usage

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-10 09:58:49 +08:00
congqixia
a3cb8e2625
fix: Add atomic method to get collection target (#29577)
Related to #29575

Add `getCollectionTarget` method which is atomic when scope is
`CurrentTargetFirst` or `NextTargetFirst`
Also return error when executor finds no channel in target manager

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-29 09:04:46 +08:00
wei liu
2ffde52f8a
fix: Upgrade from 2.2 should update CollectionLoadInfo (#29443)
milvus branch 2.3 add `loadType` in CollectionLoadInfo, so for
collection meta upgrade from 2.2, we should add `loadType` to
CollectionLoadInfo. This PR update CollectionLoadInfo with `loadType`
when meet a old version CollectionLoadInfo

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-26 14:18:47 +08:00
wei liu
008bae675d
enhance: Skip balance segment when channel need be balanced (#29116)
issue: #28622
After we support balance segment with growing segment count #28623, if
we balance segment and channel at same time, some segments need to be
rebalanced after balance channel finish.

This PR skip balance segment when channel need be balanced.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-14 16:44:43 +08:00
yah01
c0f6eccb7a
fix: No LevelZero segment in target (#28803)
the incorrect filter causes all LevelZero segment filtered, so the
deleted entities may be still visible
related: #27349

Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-29 11:48:27 +08:00
wei liu
911a915798
feat: enable balance based on growing segment row count (#28623)
issue: #28622 

query node with delegator will has more rows than other query node due
to delgator loads all growing rows.
This PR enable the balance segment which based on the num of growing
rows in leader view.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-27 14:58:26 +08:00
aoiasd
13a5b9f64a
fix: query l0 segment bugs (#28558)
relate: https://github.com/milvus-io/milvus/issues/27675

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-11-20 17:26:23 +08:00
wei liu
7895ac96b5
enhance: Remove rpc during querycoord start (#28396)
issue: #28332

during querycoord's recover, it try to call `DescribeCollection` and
`ShowPartitions` to root coord, to checker whether collection or
partition has been released in rootcoord. but if rootcoord isn't not
ready yet, the rpc will fail, the querycoord panic.

to fix this, we remove rpc call during querycoord's start

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-11-17 11:48:19 +08:00
yah01
dc89730a50
Support collection-level mmap control (#26901)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-02 23:52:16 +08:00
wei liu
e0222b2ce3
refine target manager code style (#27883)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-10-25 00:44:12 +08:00
Xiaofan
2ea7579dbb
Reduce rpc size for GetRecoveryInfoV2 (#27483)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-10-23 21:44:09 +08:00
yah01
635efdf170
Schedule loading L0 segments first (#27593)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-19 11:14:06 +08:00
yihao.dai
49b3a12804
Return newly defined merr instead of grpc unimplemented err (#27751)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-10-18 15:32:11 +08:00
congqixia
b91a5ef42c
Refine log and err handling in querycoord broker (#27546)
- Add log.Ctx(ctx) for all log occurences
- Use `merr.CheckRPCErr` for all grpc response error handling

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-10 11:49:32 +08:00
MrPresent-Han
cb71a3e235
rm dependency to rc when getting recovery info(#25363) (#27405)
Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-10-09 18:51:32 +08:00
Xiaofan
41124f281a
Remove parser dependency (#27514)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-10-08 15:05:31 +08:00
congqixia
5d558623fe
Add revive sub-lints and fix existing problems (#27495)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-10-07 20:53:38 +08:00
yah01
8394b3a1ec
Block creating new error from status reason (#27426)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-10-07 11:29:32 +08:00
Jiquan Long
370fdaf50d
Record engine version for segment index (#27384)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-09-28 18:03:28 +08:00
yah01
a8ce1b6686
Refine QueryCoord stopping (#27371)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-27 16:27:27 +08:00
wei liu
4071132f6a
reload loading collection when qc recover (#27300)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-27 11:43:28 +08:00
yah01
6539a5ae2c
Refine DataCoord status (#27262)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-26 17:15:27 +08:00
jaime
7f7c71ea7d
Decoupling client and server API in types interface (#27186)
Co-authored-by:: aoiasd <zhicheng.yue@zilliz.com>

Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-09-26 09:57:25 +08:00
foxspy
5db4a0489e
dynamic index version control (#27335)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-25 21:39:27 +08:00
foxspy
370b6fde58
milvus support multi index engine (#27178)
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-09-22 09:59:26 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
congqixia
cc9974979f
Add staticcheck linter and fix existing problems (#27174)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-19 10:05:22 +08:00
yah01
168e82ee10
Fix panic while handling with the nil status (#27040)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-15 10:09:21 +08:00
SimFG
28681276e2
Improve the retry of the rpc client (#26795)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-06 17:43:14 +08:00
Enwei Jiao
fb0705df1b
Decouple basetable and componentparam (#26725)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-09-05 10:31:48 +08:00
congqixia
1a8cf5c415
Organize all mockery generation commands in Makefile (#26826)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-09-04 21:19:48 +08:00
yah01
3349db4aa7
Refine errors to remove changes breaking design (#26521)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-04 09:57:09 +08:00
yah01
941a383019
Fix failed to load collection with more than 128 partitions (#26763)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-09-02 00:09:01 +08:00
wei liu
5602b22531
refine checker code style (#26759)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-09-01 11:57:01 +08:00
Enwei Jiao
7d61355ab0
Refactor log for Query (#26310)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-08-14 18:57:32 +08:00
wei liu
6f89620a43
remove pull target rpc from lock (#26054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-08-04 10:31:06 +08:00
Bingyi Sun
a3e22786ed
Move meta store to kv catalog (#25915)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-31 13:57:04 +08:00
congqixia
0bc03ede0d
Add eventlog pkg and support grpc streaming event observation (#25812)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-25 17:23:01 +08:00
wei liu
1748c54fd7
skip load/release segment when more than one delegator exist (#25718)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-24 19:01:01 +08:00
wei liu
fc19b85a40
fix count(*)retrieve redundant growing segment (#25825)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-24 14:09:00 +08:00
Bingyi Sun
9f31fc9a31
Add broker timeout config item. (#25855)
Signed-off-by: sunby <sunbingyi1992@gmail.com>
2023-07-24 14:07:00 +08:00
yah01
224515eaa3
Add segment dist containing condition for loading segment (#25736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-19 15:02:58 +08:00
yah01
948d1f1f4a
Handle errors by merr for QueryCoord (#24926)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-07-17 14:59:34 +08:00
yiwangdr
b9189b9f41
Organize mocks from types.go (#25466)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-07-14 10:12:31 +08:00
congqixia
5aec6036dc
Add index info in GetDataDistribution response (#25444)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-11 11:22:29 +08:00
congqixia
2bed5bc461
Fix false load success for partition load timeout but collection remains (#25379)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-07-06 19:10:25 +08:00
wei liu
2ae6def394
fix sync target version with wrong segment list (#25337)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-06 10:30:26 +08:00
wei liu
cba0feb119
add coordinator broker, to unify rootcoord api access (#25187)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-07-03 18:28:25 +08:00
wei liu
68ae199a9f
load segment with target version, avoid read redundant segment (#24929)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-06-27 11:48:45 +08:00
jaime
18df2ba6fd
[Cherry-Pick] Support Database (#24769)
Support Database(#23742)
Fix db nonexists error for FlushAll (#24222)
Fix check collection limits fails (#24235)
backward compatibility with empty DB name (#24317)
Fix GetFlushAllState with DB (#24347)
Remove db from global meta cache after drop database (#24474)
Fix db name is empty for describe collection response (#24603)
Add RBAC for Database API (#24653)
Fix miss load the same name collection during recover stage (#24941)

RBAC supports Database validation (#23609)
Fix to list grant with db return empty (#23922)
Optimize PrivilegeAll permission check (#23972)
Add the default db value for the rbac request (#24307)

Signed-off-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: SimFG <bang.fu@zilliz.com>
Co-authored-by: longjiquan <jiquan.long@zilliz.com>
2023-06-25 17:20:43 +08:00
congqixia
a3630e17a0
Refine grpc Status handling: retry legacy only for Unimplemented (#25041)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-25 10:30:43 +08:00
yah01
7eeb78726e
Fix refresh unstable (#25027)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-20 21:50:43 +08:00
yiwangdr
4387f36897
make etcdKV private (#24778)
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
2023-06-13 10:52:38 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
yihao.dai
89db828f71
Fix load collection failed after drop partition (#24680)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-06-07 19:04:36 +08:00
yah01
848ef7418b
Fix the refresh may be notified finished early (#24438)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-05-26 18:43:26 +08:00
Jiquan Long
30415e1b83
Fix metric QueryCoordNumCollections (#24053) (#24107)
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-05-15 16:33:22 +08:00
wei liu
7407669a02
fix qc coordinator retry policy (#24057)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-12 16:59:21 +08:00
wei liu
fb6e4ceee7
correct the behavior of create duplicate rg (#23963)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-05-09 14:06:40 +08:00
wei liu
f36cdc182a
add retry on get recovery info (#23764)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-27 17:46:34 +08:00
foxspy
6f4ed517de
add growing segment index (#23615)
Signed-off-by: xianliang <xianliang.li@zilliz.com>
2023-04-26 10:14:41 +08:00
wei liu
cbfe7a45ef
fix pull target (#23491)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-04-18 18:30:32 +08:00
congqixia
b5a73e6d1a
Add OWNERS files for querycoordv2 sub pkgs (#23489)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-04-18 15:52:30 +08:00
yah01
bed8d6892e
Protect the QueryCoord meta from stale data migrated from old version (#23412)
Fix #23411

Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-14 14:42:28 +08:00
yah01
296380d6e6
Support async refresh (#23107)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-04-12 15:06:28 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01
75737c65ac
Refine error handle of QueryCoord (#23068)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-31 10:54:29 +08:00
wei liu
74da53c027
fix update load percentage (#23054)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-30 10:48:23 +08:00
yah01
dc6d4b913a
Fix partitions may be not recovered with double load partitions (#23061)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-28 21:38:02 +08:00
yah01
081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
XuanYang-cn
93bc805933
Enhance ID allocator in DataNode (#22905)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-03-23 19:43:57 +08:00
yah01
68b9cabb87
Fix GetShardLeader returns old leader (#22887)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-21 16:57:57 +08:00
yihao.dai
1f718118e9
Dynamic load/release partitions (#22655)
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2023-03-20 14:55:57 +08:00
wei liu
bb5088e605
fix unassign from rg (#22747)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-03-16 14:27:55 +08:00
yah01
1a4732bb19
Use new errors to handle load failures cache (#22672)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-10 17:15:54 +08:00
yah01
90a5aa6265
Refine errors, re-define error codes (#22501)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-03-09 15:47:52 +08:00
wei liu
d433e95ce0
fix transfer node meta err (#22420)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-27 19:29:47 +08:00
Enwei Jiao
697dedac7e
Use cockroachdb/errors to replace other error pkg (#22390)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2023-02-26 11:31:49 +08:00
wei liu
a9a263d5a8
fix assign node to replica in nodeUp (#22323)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-23 14:15:45 +08:00
wei liu
87a4ddc7e2
fix rg e2e (#22187)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-16 10:48:34 +08:00
wei liu
7b4511b8f4
fix transfer node (#22120)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-14 16:16:34 +08:00
wei liu
e9684ddb5a
refine rg capacity behavior (#22089)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-09 18:46:31 +08:00
Bingyi Sun
4005508880
Fix load collection blocks when memory is limited. (#21990)
Signed-off-by: sunby <bingyi.sun@zilliz.com>
Co-authored-by: sunby <bingyi.sun@zilliz.com>
2023-02-06 16:07:53 +08:00
wei liu
5da6aade02
fix default capacity of default rg (#21965)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-02-06 15:55:56 +08:00
wei liu
73c44d4b29
resource group impl (#21609)
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-01-30 10:19:48 +08:00