Commit Graph

8032 Commits

Author SHA1 Message Date
zhenshan.cao
7e6f73a12d
feat: Authorize users to query grant info of their roles (#29747)
Once a role is granted to a user, the user should automatically possess
the privilege information associated with that role.

issue: #29710

Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2024-01-08 15:10:49 +08:00
congqixia
fe47deebf3
fix: Set & Return correct SegmentLevel in querynode segment manager (#29740)
See also #27349

The segment level label in querynode used `Legacy` before segment level
was correctly passed in Load request. Now this attribute is still using
legacy so the metrics does not look right.

This PR add paramter for `NewSegment` and passes corrent values for each
invocation.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-08 14:16:48 +08:00
Jiquan Long
e9f3df3626
fix: inverted index file not found (#29695)
issue: https://github.com/milvus-io/milvus/issues/29654

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-07 20:26:49 +08:00
Jiquan Long
20fb847521
enhance: load delta logs concurrently (#29623)
This pr will make milvus load delta logs concurrently, which should
decrease the latency of loading a segment.
/kind improvement

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-07 20:22:48 +08:00
zhagnlu
d07197ab1a
enhance: add compare simd function (#29432)
#26137

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-01-07 20:20:57 +08:00
foxspy
271edc6669
fix: throw exception when upload file failed for DiskIndex (#29627)
related to : #29417 

cardinal indexes upload index files in `Serialize` interface, and throw
exception when the `Serialize` failed.

Signed-off-by: xianliang <xianliang.li@zilliz.com>
2024-01-07 20:03:13 +08:00
wayblink
635a7f777c
feat: add clustering key in create/describe collection (#29506)
#28410
/kind feature

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-01-07 19:56:48 +08:00
yihao.dai
156a0dd450
feat: Add import reader for Parquet (#29618)
This PR implements a Parquet reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-07 19:38:49 +08:00
cai.zhang
5dc300c4a9
fix: Fix bug for pk index doesn't have raw data (#29711)
issue: #29697

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-07 19:36:48 +08:00
congqixia
b5f039a221
fix: Assertion all async invocations in test case (#29737)
Resolves: #29736

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-07 15:54:47 +08:00
yah01
a0cec4047a
fix: make the entity num metric accurate (#29643)
fix #29642

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-05 18:24:47 +08:00
yihao.dai
23183ffb0f
feat: Add import reader for json (#29252)
This PR implements a new json reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 18:12:48 +08:00
aoiasd
70ec00cd5d
enhance: support access log print cluster prefix (#29646)
relate: https://github.com/milvus-io/milvus/issues/29645

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-01-05 16:34:47 +08:00
smellthemoon
1c1f2a1371
enhance:change some logs (#29579)
related #29588

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-05 16:12:48 +08:00
wei liu
e98c62abbb
enhance: refactor leader_observer to leader_checker (#29454)
issue: #29453 

sync distribution by rpc will also call loadSegment/releaseSegment,
which may cause all kinds of concurrent case on same segment, such as
concurrent load and release on one segment.
This PR add leader_checker which generate load/release task to correct
the leader view, instead of calling sync distribution by rpc

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-05 15:54:55 +08:00
MrPresent-Han
9e2e7157e9
feat: support search_group_by for milvus(#25324) (#28983)
related: #25324

Search GroupBy function, used to aggregate result entities based on a
specific scalar column.
several points to mention:

1. Temporarliy, the whole groupby is implemented separated from
iterative expr framework **for the first period**
2. In the long term, the groupBy operation will be incorporated into the
iterative expr framework:https://github.com/milvus-io/milvus/pull/28166
3. This pr includes some unrelated mocked interface regarding alterIndex
due to some unworth-to-mention reasons. All these un-associated content
will be removed before the final pr is merged. This version of pr is
only for review
4. All other related details were commented in the files comparison

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2024-01-05 15:50:47 +08:00
cqy123456
22bb84fa9d
feat:add new gpu index:GPU_BRUTE_FORCE and limit gpu index metric type (#29590)
issue: https://github.com/milvus-io/milvus/issues/29230
this pr do these things:
1. add gpu brute force;
2. limit gpu index only support l2 / ip;

Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
2024-01-05 15:24:48 +08:00
PowderLi
c8db36a63a
enhance: get a blob to check object storage config (#29703)
issue: #29672
the storage account need privileges of actions
`Microsoft.Storage/storageAccounts/blobServices/containers/blobs/*` at
least

Signed-off-by: PowderLi <min.li@zilliz.com>
2024-01-05 14:50:46 +08:00
wei liu
b45d08b47b
enhance: Add ctx for load index logs (#29686)
This PR add ctx for load index logs

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-05 14:24:49 +08:00
yihao.dai
3561586edf
feat: Add import reader for binlog (#28910)
This PR defines the new import reader interfaces and implement a binlog
reader for import.

issue: https://github.com/milvus-io/milvus/issues/28521

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-01-05 11:48:47 +08:00
congqixia
3626f49025
fix: make sure balance candidate is alway pushed back (#29702)
See also #29699

Querycoord panicked when tried to pop from an empty heap. We assume the
heap shall not be empty, but in some branch, the candidate is never
pushed back.

This PR put pop & push in a closure and adds a defer call to push item
back.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-05 10:08:47 +08:00
congqixia
dc6a6a50fa
enhance: reduce SyncTask AllocID call and refine code (#29701)
See also #27675

`Allocator.Alloc` and `Allocator.AllocOne` might be invoked multiple
times if there were multiple blobs set in one sync task.

This PR add pre-fetch logic for all blobs and cache logIDs in sync task
so that at most only one call of the allocator is needed.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-05 10:04:46 +08:00
cai.zhang
dc8b5c1130
enhance: Read azure file without ReadAll (#29602)
issue: #29292

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-01-04 20:50:46 +08:00
wayblink
05d735c322
enhance: Rename SearchV2 to HybridSearch (#29592)
related: https://github.com/milvus-io/milvus-proto/pull/233
issue: #29593
/kind enhancement

Signed-off-by: wayblink <anyang.wang@zilliz.com>
2024-01-04 19:22:46 +08:00
yah01
0ae90443ba
enhance: fill missed info for segcore error (#29610)
- fill missed error info
- format the error message directly

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-04 17:54:46 +08:00
yah01
9e0163e12f
enhance: use GPU pool for gpu tasks (#29678)
- this much improve the performance for GPU index

Signed-off-by: yah01 <yang.cen@zilliz.com>
2024-01-04 17:50:46 +08:00
congqixia
4f8c540c77
enhance: cache collection schema attributes to reduce proxy cpu (#29668)
See also #29113

The collection schema is crucial when performing search/query but some
of the information is calculated for every request.

This PR change schema field of cached collection info into a utility
`schemaInfo` type to store some stable result, say pk field,
partitionKeyEnabled, etc. And provided field name to id map for
search/query services.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 17:28:46 +08:00
smellthemoon
a988daf143
fix: unstable ut and fix goroutine leak (#29624)
fix unstable ut and fix goroutine leak
related: #27801

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-04 17:24:47 +08:00
XuanYang-cn
a3aff37f73
fix: Correct flush buffer size metrics (#29571)
See also: #29204

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-04 17:22:46 +08:00
smellthemoon
e09fc040aa
fix: the config value of DataCoordTimeTick become longer and longer (#29659)
#29658

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-04 17:06:47 +08:00
congqixia
da7c3cbd88
enhance: make delegator delete buffer holding all delete from cp (#29626)
See also #29625

This PR:
- Add a new implemention of `DeleteBuffer`: listDeleteBuffer
  - holds cacheBlock slice
  - `Put` method append new delete data into last block
  - when a block is full, append a new block into the list
- Add `TryDiscard` method for `DeleteBuffer` interface
  - For doubleCacheBuffer, do nothing
- For listDeleteBuffer, try to evict "old" blocks, which are blocks
before the first block whose start ts is behind provided ts
- Add checkpoint field for `UpdateVersion` sync action, which shall be
used to discard old cache delete block

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 17:02:46 +08:00
congqixia
79c06c5e73
fix: serializer shall bypass L0 segment merge stats step (#29636)
See also #27675
Fix logic problem introduced by #29413, which is serializer tries to
merge statslog list while level segments do not have statslog. This
shall result returning error. `writeBufferBase` ignores this error but
it shall only ignore `ErrSegmentNotFound`.

This PR add logic checking segment level before execution of merging
statslog list. And add error type check for getSyncTask failure.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 16:52:45 +08:00
congqixia
aa967de0a8
enhance: Explicitly pass LevelZero segment ids in vchan info (#29612)
See also #27675

For `GetRecoveryInfo` & `GetRecoveryInfoV2`, Level zero segment ids
shall be specified in vchan info so that querycoord could re-fetch
current segment info during watch procedure without having all segment
info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-01-04 16:46:45 +08:00
yah01
99e0f1e65a
enhance: unable to compile C++ tests (#29616)
The tests need to call a private method, Milvus uses `#define` to
replace private with public, the hack trick works but would be broken if
the including order changed.

This uses friend to make all things work well

Signed-off-by: yah01 <yang.cen@zilliz.com>
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2024-01-04 13:20:46 +08:00
wei liu
336fce0582
enhance: Rewrite gen segment plan based on assign segment (#29574)
issue: #29582
This PR rewrite gen segment plan logic based on assign segment in
`score_based_balancer`

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-01-04 11:10:44 +08:00
SimFG
d23f87a393
enhance: Add concurrency for datacoord segment GC (#29561)
issue: https://github.com/milvus-io/milvus/issues/29553
/kind improvement

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-01-03 13:16:57 +08:00
smellthemoon
b12c90af72
enhance:Add upsert vector metrics (#29226)
add some metrics.

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-01-03 10:06:48 +08:00
XuanYang-cn
f1b6ccf305
enhance: compaction use ChannelManager interface (#29530)
Rewrite compaction_test.go

See also: #29447

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-01-02 18:08:49 +08:00
PowderLi
5f00bad4b8
fix: link with install path's libblob-chunk-manager (#29496)
issue: #29494

1. link with install path's libblob-chunk-manager
2. performance of `ShouldBindWith` is better than `ShouldBindBodyWith`
3. the middleware shouldn't read the unrefreshed parameter repeatly

Signed-off-by: PowderLi <min.li@zilliz.com>
2023-12-31 20:02:48 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.

The inverted index is very easy to be used.

Taking below collection as an example:

```python
fields = [
		FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
		FieldSchema(name="int8", dtype=DataType.INT8),
		FieldSchema(name="int16", dtype=DataType.INT16),
		FieldSchema(name="int32", dtype=DataType.INT32),
		FieldSchema(name="int64", dtype=DataType.INT64),
		FieldSchema(name="float", dtype=DataType.FLOAT),
		FieldSchema(name="double", dtype=DataType.DOUBLE),
		FieldSchema(name="bool", dtype=DataType.BOOL),
		FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
		FieldSchema(name="random", dtype=DataType.DOUBLE),
		FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can simply create inverted index for field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term query and range query on the field can be speed up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
zhagnlu
79c417b14e
fix: pass active count to query context instead of timestamp (#29541)
#29319

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2023-12-31 16:08:48 +08:00
smellthemoon
ae640e7c80
fix: pass in undefined params (#29591)
fix pass in undefined params
issue: #29594

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2023-12-30 00:32:47 +08:00
sre-ci-robot
c2345daf3a
[automated] Update Knowhere Commit (#29578)
Update Knowhere Commit
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-12-29 18:56:46 +08:00
xige-16
02673914a0
feat: Support multiple vector indexes in a collection (#27700)
issue: #25639 

/kind improvement
Signed-off-by: xige-16 <xi.ge@zilliz.com>

---------

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-12-29 11:44:45 +08:00
congqixia
55af8f611f
fix: always sync level zero segments as flushed (#29569)
See also #27675

For now, Level zero segments shall always be synced as `Flushed` ones.
This PR fixes when level zero segments selected by policies other than
flush ts policy will be synced as growing state.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-29 10:34:47 +08:00
congqixia
a3cb8e2625
fix: Add atomic method to get collection target (#29577)
Related to #29575

Add `getCollectionTarget` method which is atomic when scope is
`CurrentTargetFirst` or `NextTargetFirst`
Also return error when executor finds no channel in target manager

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-29 09:04:46 +08:00
congqixia
a8b7629315
fix: exclude insertData before growing checkpoint (#29558)
Resolves: #29556
Refine exclude segment function signature
Add exclude growing before checkpoint logic

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-12-28 18:18:54 +08:00
MrPresent-Han
ed644983e2
enhance: add param for bloomfilter(#29388) (#29490)
related: #29388

Signed-off-by: MrPresent-Han <chun.han@zilliz.com>
2023-12-28 18:10:46 +08:00
wei liu
514e279f3a
enhance: Remove useless log in collection observer (#29554)
This PR removed the useless log in collection observer

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2023-12-28 17:16:47 +08:00
xige-16
0a70e8b601
enhance: Remove multiple vector field limit (#27827)
issue: https://github.com/milvus-io/milvus/issues/25639

/kind improvement
Signed-off-by: xige-16 <xi.ge@zilliz.com>

Signed-off-by: xige-16 <xi.ge@zilliz.com>
2023-12-28 16:40:46 +08:00