milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-11-30 10:59:32 +08:00

Jiquan Long 3f46c6d459 feat: support inverted index (#28783)

issue: https://github.com/milvus-io/milvus/issues/27704

Add inverted index for some data types in Milvus. Compared with loading all data into RAM, this index type can save a lot of memory, and it speeds up term queries and range queries.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:

- The inverted index for `VARCHAR` is not designed to serve full-text search yet. Every row is treated as a single keyword instead of being tokenized into multiple terms.
- The inverted index does not support retrieval well, so if you create an inverted index on a field, operations that depend on the raw data fall back to chunk storage, which costs some performance. Examples are comparisons between two columns and retrieval of output fields.

The inverted index is easy to use. Take the collection below as an example:

```python
# Assumes a connection to Milvus has already been opened.
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

dim = 128  # example vector dimension (assumption; not given in the original message)

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then create an inverted index on each field:

```python
index_type = "INVERTED"
# One scalar index per supported field type.
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

After that, term queries and range queries on those fields are sped up automatically by the inverted index:

```python
# Term query: match any of the listed values.
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
# Range queries: one-sided and two-sided bounds.
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
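To make the commit message's claims about term queries, range queries, and whole-keyword `VARCHAR` matching concrete, here is a purely conceptual sketch of an inverted index in plain Python. It is not how Milvus implements the feature; the class `ToyInvertedIndex` and its methods are hypothetical names introduced only for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Hypothetical toy index: maps each distinct value to the row offsets holding it."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row, value in enumerate(values):
            postings[value].append(row)
        self.postings = dict(postings)
        # Sorted distinct values let range queries use binary search.
        self.terms = sorted(self.postings)

    def term_query(self, wanted):
        """Rows whose value is one of `wanted`, like `field in [1, 2, 3]`."""
        rows = []
        for value in set(wanted):
            rows.extend(self.postings.get(value, []))
        return sorted(rows)

    def range_query(self, low=None, high=None):
        """Rows with low < value < high; None leaves that side unbounded."""
        start = 0 if low is None else bisect_right(self.terms, low)
        stop = len(self.terms) if high is None else bisect_left(self.terms, high)
        rows = []
        for value in self.terms[start:stop]:
            rows.extend(self.postings[value])
        return sorted(rows)


# Row offsets 0..5 hold these int64 values.
int64_index = ToyInvertedIndex([5, 1, 3, 1, 4, 2])
term_rows = int64_index.term_query([1, 2, 3])        # like 'int64 in [1, 2, 3]'
range_rows = int64_index.range_query(low=1, high=5)  # like '1 < int64 < 5'

# Each VARCHAR row is one keyword, so "apple" does not match "red apple".
varchar_index = ToyInvertedIndex(["red apple", "apple", "red apple"])
keyword_rows = varchar_index.term_query(["apple"])
```

Both query kinds return row offsets only. Fetching the values stored at those offsets would still require the raw column, which is in line with the note above that operations depending on raw data fall back to chunk storage.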
..
chunk_mgr_factory.go	Add chunk manager request timeout (#27692)	2023-10-23 20:08:08 +08:00
chunkmgr_mock.go	Format the code (#27275)	2023-09-21 09:45:27 +08:00
etcd_mock.go	Remove deprecated io/ioutil usage (#27747)	2023-10-17 20:32:09 +08:00
index_test.go	Unify interface of vector index & scalar index. (#15959)	2022-03-21 14:23:24 +08:00
indexnode_component_mock.go	Decouple basetable and componentparam (#26725)	2023-09-05 10:31:48 +08:00
indexnode_mock.go	Refine state check (#27541)	2023-10-11 21:01:35 +08:00
indexnode_service_test.go	Refine state check (#27541)	2023-10-11 21:01:35 +08:00
indexnode_service.go	feat: integrate storagev2 into index build process (#28995)	2023-12-13 17:24:38 +08:00
indexnode_test.go	Decoupling client and server API in types interface (#27186)	2023-09-26 09:57:25 +08:00
indexnode.go	enhance: update cagra index params in config and add params check (#29045)	2023-12-26 11:04:47 +08:00
metrics_info_test.go	Move some modules from internal to public package (#22572)	2023-04-06 19:14:32 +08:00
metrics_info.go	Refine state check (#27541)	2023-10-11 21:01:35 +08:00
OWNERS	[skip ci]Update OWNERS files (#11898)	2021-11-16 15:41:11 +08:00
task_scheduler_test.go	Format the code (#27275)	2023-09-21 09:45:27 +08:00
task_scheduler.go	feat: integrate storagev2 into index build process (#28995)	2023-12-13 17:24:38 +08:00
task_state_test.go	Fixbug: IndexNode should panic when save meta failed to MetaKV (#15347)	2022-01-24 17:18:46 +08:00
task_state.go	IndexCoord handle events correctly (#17878)	2022-07-07 14:44:21 +08:00
task_test.go	feat: integrate storagev2 into index build process (#28995)	2023-12-13 17:24:38 +08:00
task.go	feat: support inverted index (#28783)	2023-12-31 19:50:47 +08:00
taskinfo_ops.go	feat: integrate storagev2 into index build process (#28995)	2023-12-13 17:24:38 +08:00
util.go	Add float16 vector (#25852)	2023-09-08 10:03:16 +08:00