issue: https://github.com/milvus-io/milvus/issues/27704
Add an inverted index for some scalar data types in Milvus. Compared to loading all raw data into RAM, this index type can save a lot of memory and speeds up term queries and range queries.
Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.
Not supported: `ARRAY` and `JSON`.
Note:
- The inverted index for `VARCHAR` is not designed to serve full-text search yet: every row is treated as a single whole keyword instead of being tokenized into multiple terms.
- The inverted index does not support retrieval well, so if you create an inverted index on a field, operations that depend on the raw data (for example, comparisons between two columns and retrieval of output fields) will fall back to chunk storage, which brings some performance loss.
The inverted index is easy to use. Take the collection below as an example:
```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# Assumes a locally running Milvus instance; adjust host/port as needed.
connections.connect("default", host="localhost", port="19530")

dim = 128  # example vector dimension

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```
Then we can simply create an inverted index on each field via:
```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```
Term queries and range queries on these fields are then sped up automatically by the inverted index:
```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
We have implemented the chunkcache (in C++) to retrieve vectors, rendering the vectorchunkcache (in Go) obsolete.
issue: https://github.com/milvus-io/milvus/issues/28568
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
see also: https://github.com/milvus-io/milvus/issues/28509
Currently, MinIO latency monitoring for the get operation only measures the duration of the get-object call (which just returns an io.Reader and does not actually read from MinIO); this PR corrects that behavior so the reported latency covers the actual read.
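For illustration, here is a minimal Go sketch (not the actual Milvus metrics code) of the difference between timing only the get-object call, which returns a lazy io.Reader, and timing until the body has actually been read. The `getObject` helper below is a hypothetical stand-in for the MinIO client call.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// getObject stands in for a MinIO GetObject call: it returns quickly with a
// lazy reader, while the real transfer only happens when the reader is read.
func getObject(name string) (io.Reader, error) {
	return strings.NewReader("object payload for " + name), nil
}

func main() {
	// Before: only the handle acquisition is timed, so the reported
	// latency misses the actual download.
	start := time.Now()
	r, err := getObject("segment/binlog/1")
	if err != nil {
		panic(err)
	}
	handleLatency := time.Since(start)

	// After: keep timing until the body has been fully read, so the
	// metric reflects the real get latency.
	data, err := io.ReadAll(r)
	if err != nil {
		panic(err)
	}
	totalLatency := time.Since(start)

	fmt.Printf("handle-only latency: %v, full-read latency: %v, bytes: %d\n",
		handleLatency, totalLatency, len(data))
}
```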
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #27675
When an L0 segment contains only delta data, the merged statslog shall be skipped when performing the sync task.
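As a rough sketch of the intended behavior (the `syncPack` type and its fields below are hypothetical stand-ins, not the actual sync task structures), the merged statslog step is gated on whether the segment carries any insert data:

```go
package main

import "fmt"

// syncPack is a hypothetical stand-in for the data carried by a sync task.
type syncPack struct {
	insertRows int64 // rows of insert data buffered for this segment
	deltaRows  int64 // rows of delete (delta) data buffered for this segment
}

// shouldWriteMergedStatslog reports whether a merged statslog is meaningful.
// An L0 segment holds only delta data, so there are no primary-key stats to merge.
func shouldWriteMergedStatslog(p syncPack) bool {
	return p.insertRows > 0
}

func main() {
	l0 := syncPack{insertRows: 0, deltaRows: 128}
	normal := syncPack{insertRows: 1024, deltaRows: 16}
	fmt.Println(shouldWriteMergedStatslog(l0))     // false: skip the merged statslog
	fmt.Println(shouldWriteMergedStatslog(normal)) // true: write the merged statslog
}
```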
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #28575
Add a zero-length check to `storage.NewPrimaryKeyStats`. This function shall return an error when a non-positive rowNum is passed.
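A minimal Go sketch of the added guard, assuming a simplified signature for illustration (the real `storage.NewPrimaryKeyStats` lives in the Milvus storage package and may differ):

```go
package main

import (
	"errors"
	"fmt"
)

// PrimaryKeyStats is a hypothetical stand-in for the stats structure.
type PrimaryKeyStats struct {
	FieldID int64
	RowNum  int64
}

// NewPrimaryKeyStats rejects a non-positive row count instead of building
// an empty, invalid stats object.
func NewPrimaryKeyStats(fieldID, pkType, rowNum int64) (*PrimaryKeyStats, error) {
	if rowNum <= 0 {
		return nil, errors.New("non-positive row num")
	}
	return &PrimaryKeyStats{FieldID: fieldID, RowNum: rowNum}, nil
}

func main() {
	if _, err := NewPrimaryKeyStats(100, 5, 0); err != nil {
		fmt.Println("rejected:", err) // rejected: non-positive row num
	}
}
```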
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Benchmark Milvus with https://github.com/qdrant/vector-db-benchmark, using the 'deep-image-96-angular' dataset. Perf profiling during the 'upload + index' stage of vector-db-benchmark shows the following hot spots:
```
39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
        |
        |--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
        |          |
        |          |--17.22%--runtime.memmove
        |          |
        |          |--1.53%--asm_exc_page_fault
        |          ......
        |
        |--18.16%--runtime.memmove
        |
        |--1.66%--asm_exc_page_fault
        ......
```
The hot code path is in storage.MergeInsertData(), which updates buffer.buffer by creating a new 'InsertData' instance and merging both the old buffer.buffer and addedBuffer into it. The hot spots appear when the Go runtime calls runtime.memmove to move buffer.buffer, which is large (>1M).
To avoid this overhead, update storage.MergeInsertData() to append addedBuffer to buffer.buffer instead of moving both buffer.buffer and addedBuffer into a new 'InsertData'. This change removes the 'runtime.memmove' hot spots from the perf profiling output. Additionally, the 'upload + index' time, one of the performance metrics of vector-db-benchmark, is reduced by around 60%.
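The following simplified Go sketch (hypothetical types, not the actual 'InsertData' definition) contrasts the two merge strategies: copying both buffers into a fresh slice versus appending the incoming batch into the existing one.

```go
package main

import "fmt"

// insertData is a hypothetical stand-in for one field's column in InsertData.
type insertData struct {
	values []float64
}

// mergeByCopy mimics the old behavior: allocate a new buffer and copy both
// the large accumulated buffer and the new batch into it on every merge.
func mergeByCopy(buffer, added *insertData) *insertData {
	merged := &insertData{values: make([]float64, 0, len(buffer.values)+len(added.values))}
	merged.values = append(merged.values, buffer.values...) // moves the large existing buffer
	merged.values = append(merged.values, added.values...)
	return merged
}

// mergeByAppend mimics the new behavior: grow the existing buffer in place,
// so only the (small) added batch is copied.
func mergeByAppend(buffer, added *insertData) *insertData {
	buffer.values = append(buffer.values, added.values...)
	return buffer
}

func main() {
	big := &insertData{values: make([]float64, 1<<20)}   // large accumulated buffer
	batch := &insertData{values: make([]float64, 1<<10)} // small incoming batch

	old := mergeByCopy(big, batch) // old path: copies all existing rows plus the batch
	fmt.Println(len(old.values))

	merged := mergeByAppend(big, batch) // new path: copies only the batch (amortized)
	fmt.Println(len(merged.values))
}
```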
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>