16 Commits

Author SHA1 Message Date
Yinzuo Jiang
7d74edd6dd
fix: update clang-tidy and clang-format from 10 to 12 (#33141)
Default llvm toolchain version in Ubuntu 20.04 is 10, while Ubuntu 22.04
does not have `clang-tidy-10` or `clang-format-10` by default.

issue: #33142

Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
2024-06-13 15:27:58 +08:00
Jiquan Long
e9f3df3626
fix: inverted index file not found (#29695)
issue: https://github.com/milvus-io/milvus/issues/29654

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-01-07 20:26:49 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add an inverted index for some data types in Milvus. Compared with
loading all data into RAM, this index type can save a lot of memory, and
it speeds up term queries and range queries.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not currently designed to serve
full-text search. Every row is treated as a single keyword instead of
being tokenized into multiple terms.
- The inverted index doesn't support retrieval well, so if you create an
inverted index for a field, operations that depend on the raw data will
fall back to chunk storage, which brings some performance loss. Examples
are comparisons between two columns and retrieval of output fields.
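
To make the notes above concrete, here is a minimal sketch of the general
inverted-index idea in plain Python. It is only a toy model under stated
assumptions, not the implementation added by this commit: the class
`ToyInvertedIndex` and its methods `term_query` and `range_query` are
hypothetical names invented for illustration. It shows why term and range
lookups avoid scanning every row, and why a `VARCHAR` row that is stored as
one whole keyword cannot be matched by one of its words.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Toy model (hypothetical): maps each distinct value to its row IDs."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            postings[value].append(row_id)
        self.postings = dict(postings)
        # Sorted distinct values allow binary search for range queries.
        self.sorted_keys = sorted(self.postings)

    def term_query(self, candidates):
        """Rows whose value equals any candidate, like `field in [...]`."""
        rows = set()
        for value in candidates:
            rows.update(self.postings.get(value, []))
        return sorted(rows)

    def range_query(self, low=None, high=None):
        """Rows with low < value < high; a bound of None is open-ended."""
        keys = self.sorted_keys
        start = 0 if low is None else bisect_right(keys, low)
        stop = len(keys) if high is None else bisect_left(keys, high)
        rows = []
        for value in keys[start:stop]:
            rows.extend(self.postings[value])
        return sorted(rows)


int_index = ToyInvertedIndex([3, 1, 4, 1, 5, 9, 2, 6])
print(int_index.term_query([1, 2, 3]))        # rows holding 1, 2, or 3
print(int_index.range_query(low=1, high=5))   # rows with 1 < value < 5

# Whole-row keywords: "red apple" is one key, so "apple" matches only row 1.
str_index = ToyInvertedIndex(["red apple", "apple", "red apple"])
print(str_index.term_query(["apple"]))
```

Note that the toy index keeps only values and row IDs. Returning the
original field data for a row would need a separate store, which mirrors
the fallback to chunk storage described in the second note.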

The inverted index is easy to use.

Take the collection below as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

dim = 8  # assumed vector dimension for this example; any positive integer works

# Assumes a connection to a running Milvus instance is already established.
fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then create an inverted index on each field:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Term queries and range queries on these fields are then sped up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
xige-16
111c608513
Add script to auto clang format (#18559)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-08-11 11:04:37 +08:00
Enwei Jiao
283f5731d2
config from etcd (#18421)
Signed-off-by: Enwei Jiao <enwei.jiao@zilliz.com>
2022-08-01 10:04:33 +08:00
zhenshan.cao
58ea38142f
Use boost dynamic_bitset in segcore (#16476)
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2022-04-14 22:37:34 +08:00
FluorineDog
6059558698 Add license files
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2021-04-19 11:16:16 +08:00
xige-16
4c491471ee Add release collection and release partition interface for query node
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2021-02-24 15:58:55 +08:00
xige-16
7a7a73e89c Fix high memory usage in pulsarTtStream
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2021-02-23 11:40:12 +08:00
FluorineDog
15dd17488e Support benchmark
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2021-02-23 10:47:21 +08:00
zhenshan.cao
fda8d62b38 Fix make error
Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>
2020-11-30 22:14:19 +08:00
quicksilver
86573d0053 Create Jenkins pipeline
Signed-off-by: quicksilver <zhifeng.zhang@zilliz.com>
2020-11-30 10:07:04 +08:00
FluorineDog
77fa75b1ec Add binary insert and warper of binary search, rename vector
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-11-30 05:18:44 +08:00
FluorineDog
e45df02874 Add Generator for visitor pattern (#89)
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-11-05 14:30:52 +08:00
GuoRentong
15b6963be0 Replace pdf figs with png figs
Signed-off-by: GuoRentong <rentong.guo@zilliz.com>
2020-10-31 15:11:47 +08:00
FluorineDog
a48ca80286 Format Code and duplicate class Segment
Signed-off-by: FluorineDog <guilin.gou@zilliz.com>
2020-10-24 18:04:57 +08:00