Commit Graph

21071 Commits

Author SHA1 Message Date
Hao Tan
67c4340565
feat: Geospatial Data Type and GIS Function Support for milvus server (#35990)
issue:https://github.com/milvus-io/milvus/issues/27576

# Main Goals
1. Create and describe collections with geospatial fields, enabling both
client and server to recognize and process geo fields.
2. Insert geospatial data as payload values in the insert binlog, and
print the values for verification.
3. Load segments containing geospatial data into memory.
4. Ensure query outputs can display geospatial data.
5. Support filtering on GIS functions for geospatial columns.

# Solution
1. **Add Type**: Modify the Milvus core by adding a Geospatial type in
both the C++ and Go code layers, defining the Geospatial data structure
and the corresponding interfaces.
2. **Dependency Libraries**: Introduce necessary geospatial data
processing libraries. In the C++ source code, use Conan package
management to include the GDAL library. In the Go source code, add the
go-geom library to the go.mod file.
3. **Protocol Interface**: Revise the Milvus protocol to provide
mechanisms for Geospatial message serialization and deserialization.
4. **Data Pipeline**: Facilitate interaction between the client and
proxy using the WKT format for geospatial data. The proxy will convert
all data into WKB format for downstream processing, providing column
data interfaces, segment encapsulation, segment loading, payload
writing, and cache block management.
5. **Query Operators**: Implement simple display and support for filter
queries. Initially, focus on filtering based on spatial relationships
for a single column of geospatial literal values, providing parsing and
execution for query expressions.
6. **Client Modification**: Enable the client to handle user input for
geospatial data and facilitate end-to-end testing.Check the modification
in pymilvus.

---------

Signed-off-by: tasty-gumi <1021989072@qq.com>
2024-10-31 20:58:20 +08:00
liliu-z
4bac2eb13e
enhance: Update Knowhere version (#37315)
Signed-off-by: Li Liu <li.liu@zilliz.com>
2024-10-31 17:24:20 +08:00
Zhen Ye
448cc08960
fix: flowgraph crash when channel releasing (#37285)
issue: #37284

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-31 16:30:21 +08:00
Xiaofan
f13faa37aa
fix: make sure alias is cached (#36807)
fix #36806

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-10-31 01:05:03 -07:00
SimFG
284fd68093
enhance: update the expr version to fix the method call error (#37259)
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 14:58:21 +08:00
cai.zhang
2ef6cbbf59
feat: The expression supports filling elements through templates (#37033)
issue: #36672

The expression supports filling elements through templates, which helps
to reduce the overhead of parsing the elements.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-31 14:20:22 +08:00
cai.zhang
4d98833bc3
fix: Set current partition stats version to 0 by default when not present (#37299)
issue: #37156 

1. Still need to record the current stats version. 
2. Set it to 0 when the current stats version is not found.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-31 12:48:21 +08:00
smellthemoon
b8492498ac
fix: mask with valid data when preCheckOverflow (#37221)
#37175

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-31 10:44:26 +08:00
Gao
2092dc0ba1
enhance: reserve vector space to reduce reallocate cost in Views() and StringViews() (#37182)
issue: #37152

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-10-31 10:02:21 +08:00
zhuwenxing
6e37372619
test: update checker (#37275)
Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-10-31 09:50:20 +08:00
congqixia
7961568223
fix: Rectify OffsetOrderedArray contain logic (#37305)
Related to #36887

Remove non-hit pk delete record logic does not work since
`insert_record_.contain` does not work due to logic problem.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 21:26:19 +08:00
yanliang567
3a3404658e
test: Add a test for iterator enhancement (#37296)
related issue: https://github.com/milvus-io/milvus/issues/37084

Signed-off-by: yanliang567 <yanliang.qiao@zilliz.com>
2024-10-30 16:54:20 +08:00
Patrick Weizhi Xu
43ad9af529
fix: use max MvccTs for iterator (#37247)
issue: #37158

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-10-30 13:58:20 +08:00
cai.zhang
955219057c
enhance: Refine code for get_deleted_bitmap (#36819) (#37204)
issue: #33744 

master pr: #36819 

Check whether the PK is truly sorted in the debug model.

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-30 12:02:38 +08:00
Bingyi Sun
90948e9444
fix: add SearchOnSealed unit test and fix a bug (#37241)
issue: https://github.com/milvus-io/milvus/issues/37244

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-30 10:26:19 +08:00
yellow-shine
22745a4980
Bump milvus version to v2.4.14 (#37253)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-29 20:58:23 +08:00
sre-ci-robot
a8aecf12a0 Update all contributors
Signed-off-by: sre-ci-robot <sre-ci-robot@zilliz.com>
2024-10-29 12:24:19 +00:00
Ted Xu
262a994d6d
enhance: generally improve the performance of mix compactions (#37163)
See #37234

---------

Signed-off-by: Ted Xu <ted.xu@zilliz.com>
2024-10-29 18:12:20 +08:00
congqixia
9539739781
enhance: Release compacted growing segment if in dropped list (#37245)
See also #37205

Previously releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

Which has a worst case that the corresponding sealed segment is
compacted and the checkpoint is pinned by a growing l0 segment.

This PR introduces a new rule that: a growing segment could be released
if the segment id appeared in current target dropped segment id list.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 18:04:21 +08:00
congqixia
7dd6651124
fix: [2.5] Ref collection meta when load l0 segment meta only (#37181)
Cherry-pick from master
pr: #37178
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading l0 segment in `RemoteLoad`
policy, which cause collection meta release when lots of l0 segment
released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 16:38:22 +08:00
smellthemoon
86b9c3ef4a
fix: to just check null in group by field only (#37191)
#37187

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-29 15:38:30 +08:00
Zhen Ye
889434691c
enhance: enable asan for e2e (#37149)
issue: #35854

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-29 14:14:24 +08:00
congqixia
3106384fc4
enhance: Return deltadata for DeleteCodec.Deserialize (#37214)
Related to #35303 #30404

This PR change return type of `DeleteCodec.Deserialize` from
`storage.DeleteData` to `DeltaData`, which
reduces the memory usage of interface header.

Also refine `storage.DeltaData` methods to make it easier to usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 12:04:24 +08:00
congqixia
0f59bfdf30
enhance: Use middleware to observe restful v2 in/out rpc stats (#37223)
Related to #36102

Previous PR #36107 add grpc inteceptor to observe rpc stats. Using same
strategy, this pr add gin middleware to observer restful v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 11:22:24 +08:00
congqixia
5a0135727d
fix: Check resource when loading deltalogs (#37195)
Related to #36887

`LoadDeltaLogs` API did not check memory usage. When system is under
high delete load pressure, this could result into OOM quit.

This PR add resource check for `LoadDeltaLogs` actions and separate
internal deltalog loading function with public one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:04:25 +08:00
congqixia
224d797f94
fix: Use singleton delete pool and avoid goroutine leakage (#37220)
Related to #36887

Previously using newly create pool per request shall cause goroutine
leakage. This PR change this behavior by using singleton delete pool.
This change could also provide better concurrency control over delete
memory usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:02:24 +08:00
XuanYang-cn
26028f4137
fix: Exlude L0 compaction when clustering is executing (#37141)
Also remove conflit check when executing L0. The exclusive is already
guarenteed in scheduler

See also: #37140

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-29 06:28:24 +08:00
sre-ci-robot
1e75a42053 Update all contributors
Signed-off-by: sre-ci-robot <sre-ci-robot@zilliz.com>
2024-10-28 12:00:48 +00:00
zhuwenxing
4c108b1564
test: update jieba tokenizer in test (#37199)
/kind improvement

---------

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-10-28 19:22:22 +08:00
yellow-shine
f75660456d
enhance: [skip e2e]update mergify (#37210)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-28 19:18:23 +08:00
congqixia
d8c1bd24f2
enhance: Utilize proxy metacache for HasCollection (#37185)
Related to #37183

Utilize proxy metacache for `HasCollection` request, if collection
exists in metacache, it could be deducted that collection must exist in
system.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 18:54:23 +08:00
Patrick Weizhi Xu
fc69df44a1
fix: set guarantee ts for seach/query iterator (#37180)
issue: #37158

Return the GuaranteeTS so that the subsequent requests following the
correct TS.

BeginTS is the current timestamp when the task is created.
The GuaranteeTS is the one parsed based on both consistency level and
beginTS, in PreExecute of the task on Proxy.
The delegator will wait until GuaranteeTS is met.
In PostExecute of the task on Proxy, the TS of the first iterator
request will be returned to the SDK and add it to the subsequent
requests.
Hence, if the default consistency level is Eventually or Bounded, the
order of TS will be
> Guarantee TS < BeginTS

If it returns the BeginTS, the second request will need to catch up and
result in extra 200ms max of latency, which results in something like

| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 210ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |

where we expect

| Call | Latency |
| --- | --- |
| first call on `Next()` | 30ms |
| second call on `Next()` | 10ms |
| third call on `Next()` | 10ms |
| fourth call on `Next()` | 11 ms |
| ... | ... |

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-10-28 15:57:35 +08:00
congqixia
f87acdf2a2
fix: Ref collection meta when load l0 segment meta only (#37178)
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading l0 segment in `RemoteLoad`
policy, which cause collection meta release when lots of l0 segment
released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 15:49:38 +08:00
jaime
33b0b8df80
fix: may exceed max tnx in etcd operations (#36775)
issue: #36772

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 15:37:30 +08:00
cai.zhang
86687bd8ed
enhance: Refine code for get_deleted_bitmap (#36819)
issue: #33744 

Check whether the PK is truly sorted in the debug model.

---------

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:19:30 +08:00
XuanYang-cn
4926021c02
fix: Skip mark compaction timeout for mix and l0 compaction (#37118)
Timeout is a bad design for long running tasks, especially using a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This pr is a small patch to stop DC from marking compaction tasks
timeout, while still waiting for DN to finish. The design is
self-conflicted. After this pr, mix and L0 compaction are no longer
controlled by DC timeout, but clustering is still under timeout control.

The compaction queue capacity grows larger for priority calc, hence
timeout compactions appears more often, and when timeout, the queuing
tasks will be timeout too, no compaction will success after.

See also: #37108, #37015

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-28 14:33:29 +08:00
Bingyi Sun
b81f162f6a
fix: fix several bugs and refactor some codes related with chunked segment (#37168)
issue: https://github.com/milvus-io/milvus/issues/37147

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-28 14:17:30 +08:00
zhuwenxing
cdee149191
test: fix testcases for verification after chaos (#37153)
/kind improvement

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-10-28 10:33:29 +08:00
zhuwenxing
c8dd665bf6
test: supplementing case for text match (#36693)
/kind improvement

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-10-28 10:31:40 +08:00
congqixia
7774b7275e
enhance: Replace PrimaryKey slice with PrimaryKeys saving memory (#37127)
Related to #35303

Slice of `storage.PrimaryKey` will have extra interface cost for each
element, which may cause notable memory usage when delta row count
number is large.

This PR replaces PrimaryKey slice with PrimaryKeys interface saving the
extra interface cost.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 10:29:30 +08:00
jaime
9d16b972ea
feat: add tasks page into management WebUI (#37002)
issue: #36621

1. Add API to access task runtime metrics, including:
  - build index task
  - compaction task
  - import task
- balance (including load/release of segments/channels and some leader
tasks on querycoord)
  - sync task
2. Add a debug model to the webpage by using debug=true or debug=false
in the URL query parameters to enable or disable debug mode.

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-10-28 10:13:29 +08:00
foxspy
d7b2ffe5aa
enhance: add an unify vector index config checker (#36844)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-28 10:11:37 +08:00
zhagnlu
eeb67a3845
fix:reset default auto index type for scalar (#37086)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-10-27 16:19:29 +08:00
Bingyi Sun
a2f0092e39
fix: check sparse float before calling get_dim (#37145)
https://github.com/milvus-io/milvus/issues/37146

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-26 16:25:29 +08:00
aoiasd
fd72151037
fix: merge datanode bm25 error after reload growing segment with no data (#37154)
Segment with numrow 0 don't init bm25 stats, cause flush with bm25 stats
failed.
relate: https://github.com/milvus-io/milvus/issues/37150

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-26 07:43:28 +08:00
congqixia
05f880708d
enhance: Make skip load work for all branches (#37160)
Related to #37112

Skip load logic used to work only when there is multiple segment load
info entires in load request. In continous delete case, delegator still
loads l0 segment, which occupies lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 23:37:29 +08:00
yihao.dai
ed37c27bda
fix: Fix collection leak in querynode (#37061)
Unref the removed L0 segment count.

issue: https://github.com/milvus-io/milvus/issues/36918

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 19:59:29 +08:00
yellow-shine
139f4e5ab5
enhance: [codecov]split code coverage into components (#37137)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-25 17:42:14 +08:00
smellthemoon
44ddcb5a63
fix: not check has_value before get value in JSON (#37128)
https://github.com/milvus-io/milvus/issues/36236
also: https://github.com/milvus-io/milvus/issues/37113

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 17:19:28 +08:00
yihao.dai
d7b2906318
enhance: Make dataNode.import.maxConcurrentTaskNum dynamic (#37102)
Resize import execution pool when config
`dataNode.import.maxConcurrentTaskNum` update.

issue: https://github.com/milvus-io/milvus/issues/37095

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 16:51:29 +08:00