1658 Commits

Author SHA1 Message Date
smellthemoon
44ddcb5a63
fix: check has_value before getting value in JSON (#37128)
https://github.com/milvus-io/milvus/issues/36236
also: https://github.com/milvus-io/milvus/issues/37113

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 17:19:28 +08:00
cqy123456
ff0b7ea0ef
enhance: build interim index for mmapped vector in ChunkedSealedSegment (#36993)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36391

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-25 15:55:28 +08:00
Yinzuo Jiang
3628593d20
feat: Implement custom function module in milvus expr (#36560)
OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org

Solutions:

- parser (planparserv2)
    - add CallExpr in planparserv2/Plan.g4
    - update parser_visitor and show_visitor
- grpc protobuf
    - add CallExpr in plan.proto
- execution (`core/src/exec`)
    - add `CallExpr`, `ValueExpr` and `ColumnExpr` (both logical and
      physical) for function calls and function parameters
- function factory (`core/src/exec/expression/function`)
    - create a global hashmap when starting milvus (see server.go)
    - the global hashmap stores function signatures and their function
      pointers; the CallExpr in the execution engine looks up the function
      pointer by function signature.
- custom functions
    - empty(string)
    - starts_with(string, string)
- add cpp/go unittests and E2E tests
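
The function factory described above can be pictured as a lookup table keyed by signature. The following is a minimal Python sketch of that idea only, not the C++ code from this PR: `FUNCTION_REGISTRY`, `register` and `call_expr` are hypothetical names, and the argument order of `starts_with` (string first, prefix second) is an assumption.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature key: (function name, tuple of parameter type names).
Signature = Tuple[str, Tuple[str, ...]]

# Global table filled once at startup, mirroring the "global hashmap" idea.
FUNCTION_REGISTRY: Dict[Signature, Callable[..., bool]] = {}


def register(name: str, param_types: Tuple[str, ...], fn: Callable[..., bool]) -> None:
    """Store a callable under its full signature."""
    FUNCTION_REGISTRY[(name, param_types)] = fn


def call_expr(name: str, param_types: Tuple[str, ...], *args) -> bool:
    """Resolve a call by signature and invoke it, as a CallExpr would."""
    fn = FUNCTION_REGISTRY.get((name, param_types))
    if fn is None:
        raise KeyError(f"no function registered for {name}{param_types}")
    return fn(*args)


# The two custom functions listed in the commit message.
register("empty", ("string",), lambda s: len(s) == 0)
# Assumed argument order: (value, prefix).
register("starts_with", ("string", "string"), lambda s, prefix: s.startswith(prefix))
```

Keying on the full signature rather than the name alone lets the same name be registered for different parameter types without ambiguity.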

closes: #36559

Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-10-25 15:25:30 +08:00
smellthemoon
6ef014d931
fix: get correct size when sealed segment chunked (#37062)
#37019

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-25 12:01:31 +08:00
Gao
ad2df904c6
fix: correctly set ExecTermArrayVariableInField bitset result (#37111)
issue: https://github.com/milvus-io/milvus/issues/37110

Signed-off-by: chasingegg <chao.gao@zilliz.com>
2024-10-24 18:52:02 -07:00
Bingyi Sun
bf956a3ec2
fix: fix string field has invalid utf-8 (#37104)
issue: https://github.com/milvus-io/milvus/issues/37083
We used a vector of string_view to hold the data temporarily, but the
underlying string data is released once the record batch is destroyed.
Change it to a vector of string to avoid memory corruption.

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-24 18:33:47 -07:00
smellthemoon
2b3f5bec07
fix: panic when create index on all none data (#37046)
#37045

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-24 17:09:28 +08:00
yellow-shine
8902e2220e
enhance: enable asan for cpp unittest (#37041)
https://github.com/milvus-io/milvus/issues/35854

Signed-off-by: chyezh <chyezh@outlook.com>
Co-authored-by: chyezh <chyezh@outlook.com>
2024-10-23 17:21:27 +08:00
Bingyi Sun
90b3907a92
fix: fix missing return value in chunked column (#37064)
issue: https://github.com/milvus-io/milvus/issues/36834

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-22 10:29:19 -07:00
Alexander Guzhva
5a1f752272
enhance: [bitset] multiple 'and' and 'or' in a single op (#33345)
issue #34117
* Refactoring
* Added the ability to perform multiple bitwise `and` and `or`
operations in a single op
* AVX2, AVX512, ARM NEON, ARM SVE backed bitwise `and`, `or`, `xor` and
`sub` ops
* more unit tests for bitset
* fixed a bug in `or_with_count` for certain bitset sizes
* fixed a bug for certain offset values for inplace operations that take
two bitsets
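
As a rough illustration of "multiple `and` and `or` operations in a single op", the sketch below combines several bitsets into a target in one pass over the words, instead of one full pass per operand. It is a plain Python model under stated assumptions: bitsets are lists of 64-bit words, and `inplace_and_many` / `inplace_or_many` are hypothetical names. The SIMD backends mentioned above are not modeled.

```python
from typing import List, Sequence

WORD_MASK = (1 << 64) - 1  # assumption: bitsets are stored as 64-bit words


def _check_sizes(target: Sequence[int], others: Sequence[Sequence[int]]) -> None:
    for other in others:
        if len(other) != len(target):
            raise ValueError("all bitsets must have the same number of words")


def inplace_and_many(target: List[int], others: Sequence[Sequence[int]]) -> None:
    """AND every bitset in `others` into `target`, visiting each word once."""
    _check_sizes(target, others)
    for i in range(len(target)):
        word = target[i]
        for other in others:
            word &= other[i]
        target[i] = word & WORD_MASK


def inplace_or_many(target: List[int], others: Sequence[Sequence[int]]) -> None:
    """OR every bitset in `others` into `target`, visiting each word once."""
    _check_sizes(target, others)
    for i in range(len(target)):
        word = target[i]
        for other in others:
            word |= other[i]
        target[i] = word & WORD_MASK
```

Each target word is read and written once regardless of how many operands are combined, which is the point of fusing the operations.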

Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>
2024-10-22 16:25:33 +08:00
smellthemoon
6bedc7e8c8
fix: not set valid_data in bitmap index when mmap (#37023)
#37013

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-22 12:03:26 +08:00
foxspy
346510ed23
enhance: Update Knowhere version (#37000)
Signed-off-by: foxspy <xian_hust@foxmail.com>
2024-10-21 11:39:26 +08:00
cqy123456
304098cd40
fix: chunk ID out of range in vector BF search after the growing index removes the vector chunks (#36939)
issue: https://github.com/milvus-io/milvus/issues/36871
related pr: https://github.com/milvus-io/milvus/pull/36938

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-18 12:21:25 +08:00
SimFG
903c18ba26
enhance: consider the mmap chunk cache config when estimating resource usage (#36814)
- issue: #36530

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-18 10:17:23 +08:00
foxspy
3de57ec4fa
enhance: add vector index mgr to remove vector index type dependency (#36843)
issue: #34298

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-17 22:15:25 +08:00
smellthemoon
eb3e4583ec
enhance: all op(Null) is false in expr (#35527)
#31728
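
One way to read the title is that any comparison whose field value is null evaluates to false, whatever the operator. The sketch below models that reading in Python; `eval_compare` and the operator table are hypothetical, and treating only the field side as nullable is an assumption.

```python
import operator
from typing import Any, Callable, Dict, Optional

# Hypothetical operator table for the sketch.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def eval_compare(op: str, field_value: Optional[Any], operand: Any) -> bool:
    """Evaluate `field_value op operand`; a null field value is always false."""
    if field_value is None:
        return False
    return _OPS[op](field_value, operand)
```

Under this reading, `x != 5` does not match rows where `x` is null, which differs from a naive "not equal" check.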

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-10-17 21:14:30 +08:00
cqy123456
b474374ea5
enhance: use growingMmapEnabled to control the behavior of interim index, not vectorField (#36500)
issue: https://github.com/milvus-io/milvus/issues/36392
related pr: https://github.com/milvus-io/milvus/pull/36391

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-17 20:25:24 +08:00
Bingyi Sun
b2037c95a8
fix: use chunk_row_nums to iterate (#36882)
Fix a segmentation fault and remove useless code.
https://github.com/milvus-io/milvus/issues/36834

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-16 11:15:25 +08:00
Buqian Zheng
9997c5de34
fix: remove excessive logging (#36859)
issue: https://github.com/milvus-io/milvus/issues/35853

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-16 10:47:22 +08:00
cqy123456
aa904be6ec
enhance: support sparse vector mmap in growing segment type (#36566)
issue: https://github.com/milvus-io/milvus/issues/32984
related pr: https://github.com/milvus-io/milvus/pull/36565

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-10-15 10:59:23 +08:00
Zhen Ye
f46c3acea9
fix: heap buffer overflow when unittest at index wrapper (#36838)
issue: #35852

---------

Signed-off-by: chyezh <chyezh@outlook.com>
2024-10-14 18:13:22 +08:00
Bingyi Sun
3a09b438c2
fix: fix macos code checker (#36817)
https://github.com/milvus-io/milvus/issues/36829

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-14 11:11:51 +08:00
sre-ci-robot
e170991a10
[automated] Update Knowhere Commit (#36823)
Update Knowhere Commit
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-10-13 01:21:20 +08:00
Min Tian
ef0c649bda
enhance: update knowhere version to support diskann iterator (#36813)
issue: #36812

Signed-off-by: min.tian <min.tian.cn@gmail.com>
2024-10-12 18:05:22 +08:00
Bingyi Sun
a75bb85f3a
feat: support chunked column for sealed segment (#35764)
This PR splits sealed segment data into chunks to avoid unnecessary
memory copies and reduce memory usage when loading segments, so that
loading is faster.

To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.
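
To show what a chunked column implies for row access, the sketch below keeps the chunks as they arrive and maps a global row number to a (chunk, offset) pair. It is an illustrative Python model, not the C++ implementation: the `ChunkedColumn` class and its `get` method are hypothetical, and `chunk_row_nums` is borrowed from a later commit title as a plausible per-chunk row count.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Sequence


class ChunkedColumn:
    """Column stored as a list of chunks rather than one contiguous buffer."""

    def __init__(self, chunks: Sequence[Sequence[Any]]) -> None:
        # Keep references to the chunks; the row data itself is not copied.
        self.chunks: List[Sequence[Any]] = list(chunks)
        self.chunk_row_nums = [len(c) for c in self.chunks]
        # _starts[i] is the global row number of the first row in chunk i.
        self._starts = [0] + list(accumulate(self.chunk_row_nums))[:-1]
        self.num_rows = sum(self.chunk_row_nums)

    def get(self, row: int) -> Any:
        """Return the value at a global row number."""
        if not 0 <= row < self.num_rows:
            raise IndexError(f"row {row} out of range")
        # Rightmost chunk whose start is <= row; skips empty chunks correctly.
        chunk_id = bisect_right(self._starts, row) - 1
        return self.chunks[chunk_id][row - self._starts[chunk_id]]
```

The lookup costs a binary search over chunk starts, traded against not having to merge chunks into one buffer at load time.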

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-10-12 15:04:52 +08:00
aoiasd
db34572c56
feat: support load and query with bm25 metric (#36071)
related: https://github.com/milvus-io/milvus/issues/35853

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-11 10:23:20 +08:00
zhagnlu
b1e678dcba
fix: fix json in [] expr bug (#36721)
#36718

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-10-11 01:11:20 +08:00
Buqian Zheng
f7b811450d
feat: add enable_tokenizer params to VarChar field (#36480)
issue: #35922

Add an enable_tokenizer param to the VarChar field: it must be set to true
before the field can set enable_match or be used as input to a BM25 function.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-10-10 20:33:21 +08:00
SimFG
130a923dec
enhance: the estimate method when loading the collection (#36307)
- issue: #36530

---------

Signed-off-by: SimFG <bang.fu@zilliz.com>
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Co-authored-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-09 17:35:19 +08:00
congqixia
c3d910756b
enhance: Update knowhere commit to fix mac compilation (#36706)
Related to zilliztech/knowhere#879

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-09 16:05:20 +08:00
sre-ci-robot
3936d12661
[automated] Update Knowhere Commit (#36634)
Update Knowhere Commit
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-10-01 01:05:15 +08:00
Rijin-N
a05a37a583
enhance: GCS native support (GCS implemented using Google Cloud Storage libraries) (#36214)
Native support for Google Cloud Storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.

Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:

1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.

This enhancement addresses these limitations.

Related Issue: #36212
2024-09-30 13:23:32 +08:00
Buqian Zheng
94005b7198
fix: Sparse float vector incorrectly ExpandData at mmap mode (#36603)
issue: #36561

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-30 10:39:16 +08:00
yihao.dai
8ed34dce84
enhance: Reopen chunk cache cpp ut (#33622)
issue: https://github.com/milvus-io/milvus/issues/33210

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-28 18:19:15 +08:00
zhagnlu
9e3efa06be
fix: fix empty search result bug (#36582)
#36450

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-28 17:45:16 +08:00
zhagnlu
0799d927c6
fix: fix term expr overflow bug (#36525)
#36520

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-26 15:01:14 +08:00
sre-ci-robot
447e326629
[automated] Update Knowhere Commit (#36527)
Update Knowhere Commit
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-09-26 01:15:13 +08:00
Buqian Zheng
8495bc6bbc
fix: fix broken Sparse Float Vector raw data mmap (#36183)
issue: https://github.com/milvus-io/milvus/issues/36182

* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to parameterize the tests so they run with
mmap both enabled and disabled. Added a sparse field to the test to
increase coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`.
* Two other disabled tests, `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache`, remain disabled because there seem to
be errors. @bigsheeper PTAL.

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-09-25 18:59:13 +08:00
yihao.dai
8cda48a96a
enhance: Use mmap.scalarIndex config for text index (#36400)
issue: https://github.com/milvus-io/milvus/issues/35273

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-09-24 12:21:13 +08:00
sre-ci-robot
167e4fb10d
[automated] Update Knowhere Commit (#36352)
Update Knowhere Commit
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-09-19 01:01:10 +08:00
Bingyi Sun
23b95aeba3
fix: remove element type check (#35828)
https://github.com/milvus-io/milvus/issues/36275
The array's element type is not the same as the schema's: it is INT32 for
INT16 and INT8.

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-09-18 11:37:10 +08:00
jaime
2ff3765058
enhance: catch std::stoi exception and improve error msg (#36267)
issue: #36255

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-09-14 16:17:08 +08:00
zhagnlu
489087d18b
enhance: refactor executor framework V2 (#35251)
#32636

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-13 20:57:09 +08:00
congqixia
58d3200986
enhance: Filter out non-hit delete records during load delta (#36207)
Related to #35303

This PR uses the pk index in the segment to exclude non-hit delete records
while loading delete records. This ability is crucial when the l0/delete
forward policy relies only on the segment itself (without BF filtering).
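
The filtering step can be sketched as follows: keep a delete record only if its primary key exists in the segment. This is a simplified Python model in which a plain `set` stands in for the segment's pk index; `filter_delete_records` and its parameters are hypothetical names.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def filter_delete_records(
    segment_pks: Set[Hashable],
    delete_pks: Sequence[Hashable],
    delete_timestamps: Sequence[int],
) -> Tuple[List[Hashable], List[int]]:
    """Drop delete records whose primary key is not present in the segment."""
    if len(delete_pks) != len(delete_timestamps):
        raise ValueError("pks and timestamps must have the same length")
    kept_pks: List[Hashable] = []
    kept_ts: List[int] = []
    for pk, ts in zip(delete_pks, delete_timestamps):
        # The set stands in for a pk index lookup on the segment.
        if pk in segment_pks:
            kept_pks.append(pk)
            kept_ts.append(ts)
    return kept_pks, kept_ts
```

Records that cannot hit any row in the segment are discarded before they are stored, so the segment holds only deletes that can affect it.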

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-13 19:05:08 +08:00
Jiquan Long
f0f2fb4cf0
enhance: span tracing of c++ part (#36205)
fix: https://github.com/milvus-io/milvus/issues/36204

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-13 11:19:09 +08:00
zhagnlu
5e5e87cc2f
enhance: rename some params and reduce default bitmapCardinalityLimit… (#36138)
#32900

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-09-12 12:09:08 +08:00
Jiquan Long
89bf226f0b
feat: support keyword text match (#35923)
fix: #35922

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-09-10 15:11:08 +08:00
Bingyi Sun
53a8a24554
fix: fix empty indices of sparse float (#35403)
https://github.com/milvus-io/milvus/issues/35401

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-09-10 14:23:07 +08:00
congqixia
851f3b9883
fix: Make legacy non-lexicographic branch break switch (#36125)
Related to #35941
Previous PR: #36034

This patch corrects the switch branching logic and makes the unit
test work for cases that do not select the whole dataset.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-10 10:15:07 +08:00
congqixia
3123093dd7
enhance: Use MARISA_LABEL_ORDER when building trie index (#36034)
Related to #35941
Previous PR: #35943

This PR makes the `Trie` index use `MARISA_LABEL_ORDER`, which makes
predictive search iterate in lexicographic order.

When the trie index is built in label order, the lexicographic ordering can
be used to accelerate `Range` operations.

However, according to the official documentation, using `MARISA_LABEL_ORDER`
will make "exact match lookup, common prefix search, and predictive
search" slower.
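
Why lexicographic iteration helps `Range` can be shown without the trie itself: once keys arrive in sorted order, a range scan can stop at the first key past the upper bound. The Python sketch below uses a pre-sorted list as a stand-in for label-order iteration; `range_scan` is a hypothetical helper and does not use the marisa API.

```python
from typing import Iterable, List


def range_scan(ordered_keys: Iterable[str], lower: str, upper: str) -> List[str]:
    """Collect keys in [lower, upper] from an iterator that yields sorted keys."""
    result: List[str] = []
    for key in ordered_keys:
        if key < lower:
            continue
        if key > upper:
            # Safe only because iteration is lexicographic: no later key can match.
            break
        result.append(key)
    return result
```

With an unordered iteration the early `break` would be incorrect, and every key would have to be compared against both bounds.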

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-09-09 14:29:05 +08:00