Commit Graph

1351 Commits

Author SHA1 Message Date
cqy123456
74cfba0249
enhance:limit binlog index rows num (#30173)
issue: https://github.com/milvus-io/milvus/issues/27678
also relate issue: https://github.com/milvus-io/milvus/issues/30065

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-29 19:49:02 +08:00
sre-ci-robot
0542a0e7dc
[automated] Update Knowhere Commit (#30332)
Update Knowhere Commit
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-01-29 01:05:01 +08:00
zhagnlu
aeb1e36f00
enhance: change plan desc log from info to debug (#30304)
#30172

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-28 16:04:38 +08:00
xige-16
e9fdd2475d
fix: fix searchPlan metricType modified concurrently (#30227)
issue: #30225
/kind bug
Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-26 14:03:09 +08:00
MrPresent-Han
116d0f20b8
fix: groupby bug for ut (#30272)
related: #29965

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-25 20:57:00 +08:00
yihao.dai
c02fb64ad6
enhance: Allows proactive warming up of chunk cache (#30182)
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.

issue: https://github.com/milvus-io/milvus/issues/30181

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-25 19:55:39 +08:00
yah01
a27c0e86fd
enhance: reduce many I/O operations while loading disk index (#30189)
before this, every time writting the index chunk data into the disk,
there are 4 I/O operations:
- open the file
- seek to the offset
- write the data
- close the file

this optimized this to open only once and continiously write all data.

This also makes it concurrent to load the files from object storage

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-25 15:23:02 +08:00
zhagnlu
8c58d9af67
enhance: optimize marisa trie range search for performance (#30079)
#30078
#29986

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-25 10:07:00 +08:00
Patrick Weizhi Xu
0907d76253
enhance: pass partition key scalar info if enabled when build vector index (#29931)
issue: #29892 

Pass optional scalar IVF offsets to Cardinal

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
2024-01-24 00:04:55 +08:00
cqy123456
42bb4e37e5
fix:diskann search crash when search list = 9999999999 (#30185)
issue: https://github.com/milvus-io/milvus/issues/29020
Json can't not pass a max_int32 value to int32_t, so let knowhere check
value range by itself.
After fix this, pymilvus will report:
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535,
message=fail to search on QueryNode 6: worker(6) query failed: => failed
to search: arithmetic overflow: param search_list_size should be at most
2147483647)>

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-23 14:46:55 +08:00
cai.zhang
6cf2f09b60
feat: Support tencent cloud object storage for milvus (#30163)
issue: #30162

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-23 11:28:56 +08:00
yah01
a77693aa19
enhance: convert the GetObject util to async (#30166)
This makes it much easier to use

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-22 19:20:57 +08:00
sre-ci-robot
e967949cc5
[automated] Update Knowhere Commit (#30120)
Update Knowhere Commit
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-01-22 18:40:54 +08:00
MrPresent-Han
4436effdc3
enhance: support groupby based on scalar-index(#29965) (#30091)
related: #29965

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-22 10:50:54 +08:00
xige-16
aee19dcd6b
enhance: Opt vector dimension mismatch error message (#29928)
issue: https://github.com/milvus-io/milvus/issues/29791
/kind improvement

Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-19 17:52:54 +08:00
yah01
f542bdbf3c
enhance: calc the accurate mem size of segment (#30093)
this stats the real memory size of segment, also reduces the memory
usage in mmap mode
resolve #30095

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-19 12:32:53 +08:00
xige-16
fa7cf587b0
enhance: Opt metric type does not match error message (#29927)
issue: #29791 
/kind improvement
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-17 20:25:03 +08:00
yah01
1185e4dcd5
fix: written file size is over the int32 range and raises error (#30057)
we sum the total data size in int32, which could lead to an overflow
error
related #30056

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-17 16:42:54 +08:00
Bingyi Sun
8030b90891
fix: correct file name when loading index (#29985)
issue: #29973

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-16 10:24:52 +08:00
MrPresent-Han
c31e68446e
enhance: refine groupby-performance (#29933)
related: #29844

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-15 14:12:52 +08:00
chyezh
def717af55
fix: SealedIndexingEntry in SealedIndexingRecord may leak without smart pointer protect. (#29932)
may related issue: #29828

Signed-off-by: chyezh <ye.zhen@zilliz.com>
2024-01-14 10:28:51 +08:00
Bingyi Sun
e1258b8cad
feat: integrate storagev2 into loading segment (#29336)
issue: #29335

---------

Signed-off-by: sunby <sunbingyi1992@gmail.com>
2024-01-12 18:10:51 +08:00
yah01
f2e36db488
enhance: optimize the loading index performance (#29894)
this utilizes concurrent loading

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 17:44:51 +08:00
yah01
6c477ce3a7
enhance: optimize the loading strategy (#29910)
as we have the pool size limit so we don't need to limit the concurrency
manually

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 14:26:50 +08:00
yah01
aba2656e68
fix: missing field data after appending scalar index to loaded segment (#29912)
related #29843

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-12 14:04:54 +08:00
sre-ci-robot
4d11525f55
[automated] Update Knowhere Commit (#29904)
Update Knowhere Commit
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-01-12 14:00:50 +08:00
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
Jiquan Long
67ab5be15a
enhance: optimize search performance of inverted index (#29794)
issue: #29793 
Use `DocSetCollector` instead of `TopDocsCollector`, which will avoid
scoring and sorting.

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-11 11:12:49 +08:00
zhagnlu
5164d30287
fix: increase expr recursion depth to avoid parse failed (#29860)
#29759

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-11 10:26:50 +08:00
yah01
031243fee7
feat: support mmap for marisa trie (#29613)
this supports mmap for marisa trie index
related https://github.com/milvus-io/milvus/issues/21866

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-11 10:22:50 +08:00
congqixia
d6429933a7
enhance: make Load process traceable in querynode & segcore (#29858)
See also #29803

This PR:
- Add trace span for `LoadIndex` & `LoadFieldData` in segment loader
- Add `TraceCtx` parameter for `Index.Load` in segcore
- Add span for ReadFiles & Engine Load for Memory/Disk Vector index

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-10 21:58:51 +08:00
Cai Yudong
cb9d9ec0f0
enhance: Correct sampleFraction's type to float (#29810)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-01-10 13:18:50 +08:00
Cai Yudong
600f6eff06
enhance: Upgrade gtest to 1.13.0 (#29805)
Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>
2024-01-10 13:16:57 +08:00
zhagnlu
601a8b801b
fix: add move cursor function to physical expr (#29603)
#29570

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-09 17:08:48 +08:00
zhenshan.cao
60e88fb833
fix: Restore the MVCC functionality. (#29749)
When the TimeTravel functionality was previously removed, it
inadvertently affected the MVCC functionality within the system. This PR
aims to reintroduce the internal MVCC functionality as follows:

1. Add MvccTimestamp to the requests of Search/Query and the results of
Search internally.
2. When the delegator receives a Query/Search request and there is no
MVCC timestamp set in the request, set the delegator's current tsafe as
the MVCC timestamp of the request. If the request already has an MVCC
timestamp, do not modify it.
3. When the Proxy handles Search and triggers the second phase ReQuery,
divide the ReQuery into different shards and pass the MVCC timestamp to
the corresponding Query requests.

issue: #29656

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-09 11:38:48 +08:00
xige-16
9702cef2b5
feat: Support multiple vector search (#29433)
issue #25639 

Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2024-01-08 15:34:48 +08:00
Jiquan Long
e9f3df3626
fix: inverted index file not found (#29695)
issue: https://github.com/milvus-io/milvus/issues/29654

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-07 20:26:49 +08:00
zhagnlu
d07197ab1a
enhance: add compare simd function (#29432)
#26137

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-07 20:20:57 +08:00
foxspy
271edc6669
fix: throw exception when upload file failed for DiskIndex (#29627)
related to : #29417 

cardinal indexes upload index files in `Serialize` interface, and throw
exception when the `Serialize` failed.

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-07 20:03:13 +08:00
cai.zhang
5dc300c4a9
fix: Fix bug for pk index doesn't have raw data (#29711)
issue: #29697

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-07 19:36:48 +08:00
MrPresent-Han
9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:

1. Temporarliy, the whole groupby is implemented separated from
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This pr includes some unrelated mocked interface regarding alterIndex
due to some unworth-to-mention reasons. All these un-associated content
will be removed before the final pr is merged. This version of pr is
only for review
4. All other related details were commented in the files comparison

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
cqy123456
22bb84fa9d
feat:add new gpu index:GPU_BRUTE_FORCE and limit gpu index metric type (#29590)
issue: https://github.com/milvus-io/milvus/issues/29230
this pr do these things:
1. add gpu brute force;
2. limit gpu index only support l2 / ip;

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-05 15:24:48 +08:00
PowderLi
c8db36a63a
enhance: get a blob to check object storage config (#29703)
issue: #29672
the storage account need privileges of actions
`Microsoft.Storage/storageAccounts/blobServices/containers/blobs/*` at
least

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-05 14:50:46 +08:00
yah01
0ae90443ba
enhance: fill missed info for segcore error (#29610)
- fill missed error info
- format the error message directly

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-04 17:54:46 +08:00
yah01
99e0f1e65a
enhance: unable to compile C++ tests (#29616)
The tests need to call a private method, Milvus uses `#define` to
replace private with public, the hack trick works but would be broken if
the including order changed.

This uses friend to make all things work well

Signed-off-by: yah01 <yang.cen@zilliz.com>
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-04 13:20:46 +08:00
PowderLi
5f00bad4b8
fix: link with install path's libblob-chunk-manager (#29496)
issue: #29494

1. link with install path's libblob-chunk-manager
2. performance of `ShouldBindWith` is better than `ShouldBindBodyWith`
3. the middleware shouldn't read the unrefreshed parameter repeatly

Signed-off-by: PowderLi <min.li@zilliz.com>
2023-12-31 20:02:48 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
zhagnlu
79c417b14e
fix: pass active count to query context instead of timestamp (#29541)
#29319

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-31 16:08:48 +08:00
sre-ci-robot
c2345daf3a
[automated] Update Knowhere Commit (#29578)
Update Knowhere Commit
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-12-29 18:56:46 +08:00
Jiquan Long
6f4791da0b
fix: panic in concurrent insert/query scenario (#29408)
issue: https://github.com/milvus-io/milvus/issues/29405

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-26 15:10:48 +08:00