issue: #29494
1. link with install path's libblob-chunk-manager
2. performance of `ShouldBindWith` is better than `ShouldBindBodyWith`
3. the middleware shouldn't read the unrefreshed parameter repeatly
Signed-off-by: PowderLi <min.li@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/27704
Add inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM and speed up
the term query and range query.
Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.
Not supported: `ARRAY` and `JSON`.
Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search now. We will treat every row as a whole keyword instead of
tokenizing it into multiple terms.
- The inverted index don't support retrieval well, so if you create
inverted index for field, those operations which depend on the raw data
will fallback to use chunk storage, which will bring some performance
loss. For example, comparisons between two columns and retrieval of
output fields.
The inverted index is very easy to be used.
Taking below collection as an example:
```python
fields = [
FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
FieldSchema(name="int8", dtype=DataType.INT8),
FieldSchema(name="int16", dtype=DataType.INT16),
FieldSchema(name="int32", dtype=DataType.INT32),
FieldSchema(name="int64", dtype=DataType.INT64),
FieldSchema(name="float", dtype=DataType.FLOAT),
FieldSchema(name="double", dtype=DataType.DOUBLE),
FieldSchema(name="bool", dtype=DataType.BOOL),
FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
FieldSchema(name="random", dtype=DataType.DOUBLE),
FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```
Then we can simply create inverted index for field via:
```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```
Then, term query and range query on the field can be speed up
automatically by the inverted index:
```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Memory management is broken due caused by chunkmanager not getting
cleaned up.
Fixing the following error:
```
SIGSEGV: segmentation violation
PC=0x10fc4aa0c m=6 sigcode=2
```
relate: https://github.com/milvus-io/milvus/issues/29507
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
See also #27675
For now, Level zero segments shall always be synced as `Flushed` ones.
This PR fixes when level zero segments selected by policies other than
flush ts policy will be synced as growing state.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #29575
Add `getCollectionTarget` method which is atomic when scope is
`CurrentTargetFirst` or `NextTargetFirst`
Also return error when executor finds no channel in target manager
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
since milvus's build procedure is in the responding docker enviornment,
this procedure does not bind with particular host os, in turn it allows
milvus build on any os , including latest ubuntu os
background:
our self hosted runner is only using ubuntu-latest os
benefit:
make this github workflow obtain the resource from the self-hosted
runner
Signed-off-by: Sammy Huang <sammy.huang@zilliz.com>
the array type can't be compacted, the system could continue with the
inserted segments, but these segments can be never compacted
fix#29503
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
issue: #29523
readable shard leader should still be the old one during channel
balance, if the new shard leader is not ready.
This PR fixed that query coord choose wrong shard leader during balance
channel
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #29526
Previous PR removed flushed segment info from request, which causes
pipeline failing to exclude flushed segment info
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
THe results don't meet our requirements, and the code hasn't been
maintained for a long time.
See also: #29447
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
`WatchDmChannel` only need growing segment info, this PR removes fetch
segmentInfos when fill watch dml channel request.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
the purpose of this PR is to add node name to environment variable for
more easily knowing which node the pod is on
Signed-off-by: Sammy Huang <sammy.huang@zilliz.com>
See also #27515
When Delegator processes delete data, it forwards delete data with only
segment id specified. When two segments has same segment id but one is
growing and the other is sealed, the delete will be applied to both
segments which causes delete data out of order when concurrent load
segment occurs.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
`AssignSegment` method defines how to assign segment to nodes, but
score_based_balance implement another assign logic in
`genStoppingSegmentPlan`
This PR rewrite gen stopping segment plan based on assign segment.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
milvus branch 2.3 add `loadType` in CollectionLoadInfo, so for
collection meta upgrade from 2.2, we should add `loadType` to
CollectionLoadInfo. This PR update CollectionLoadInfo with `loadType`
when meet a old version CollectionLoadInfo
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue:https://github.com/milvus-io/milvus/issues/29230
this pr do two things about cagra index:
a.milvus yaml config support gpu memory settings
b.add cagra-params check
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>
Co-authored-by: yusheng.ma <yusheng.ma@zilliz.com>
See also #27675
Since serialization segment buffer does not related to sync manager can
shall be done before submit into sync manager. So that the pk statistic
file could be more accurate and reduce complex logic inside sync
manager.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
the executor always fetches the latest segment info, so we could consume
from the latest checkpoint, which could save much time while deleted
many entities
Signed-off-by: yah01 <yang.cen@zilliz.com>