milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-02 11:59:00 +08:00

Author	SHA1	Message	Date
congqixia	3106384fc4	enhance: Return deltadata for `DeleteCodec.Deserialize` (#37214 ) Related to #35303 #30404 This PR changes the return type of `DeleteCodec.Deserialize` from `storage.DeleteData` to `DeltaData`, which reduces the memory usage of interface headers. Also refine `storage.DeltaData` methods to make it easier to use. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-10-29 12:04:24 +08:00
smellthemoon	80a7c78f28	enhance: import supports null in parquet and json formats (#35558 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-08-20 16:50:55 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084 ) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
shaoting-huang	88b373b024	enhance: binlog primary key turn off dict encoding (#34358 ) issue: #34357 Go Parquet uses dictionary encoding by default, and it will fall back to plain encoding if the dictionary size exceeds the dictionary size page limit. Users can specify custom fallback encoding by using `parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However, Go Parquet [fallbacks to plain encoding](`e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238)`) rather than custom encoding method users provide. Therefore, this patch only turns off dictionary encoding for the primary key. With a 5 million auto ID primary key benchmark, the parquet file size improves from 13.93 MB to 8.36 MB when dictionary encoding is turned off, reducing primary key storage space by 40%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-17 17:47:44 +08:00
smellthemoon	2a1356985d	enhance: support null in go payload (#32296 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-06-19 17:08:00 +08:00
shaoting-huang	8cdc0e6233	fix: fix data codec writer close (#33818 ) issue:#33813 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-06-18 13:59:57 +08:00
XuanYang-cn	f67b6dc2b0	fix: DeleteData merge wrong data causing data loss (#33820 ) See also: #33819 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-06-14 17:57:56 +08:00
Buqian Zheng	8a1017a152	enhance: add helpers to parse sparse float vector in JSON (#32543 ) issue: #29419 added helper functions to parse JSON representation of sparse float vectors, will be used by both the restful server and the import utils. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-04-25 14:47:24 +08:00
Buqian Zheng	3c80083f51	feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630 ) add sparse float vector support to different milvus components, including proxy, data node to receive and write sparse float vectors to binlog, query node to handle search requests, index node to build index for sparse float column, etc. https://github.com/milvus-io/milvus/issues/29419 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-13 14:32:54 -07:00
aoiasd	a0537156c0	enhance: delete codec deserialize data by stream batch (#30407 ) relate: https://github.com/milvus-io/milvus/issues/30404 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-02-06 17:04:25 +08:00
XuanYang-cn	d744962aa1	fix: Correct Size calculation of DeleteData (#30397 ) This PR would correct the actual deltalog size See also: #30191 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-02-02 10:47:04 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue：https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM and speed up the term query and range query. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`. Note: - The inverted index for `VARCHAR` is not designed to serve full-text search now. We will treat every row as a whole keyword instead of tokenizing it into multiple terms. - The inverted index don't support retrieval well, so if you create inverted index for field, those operations which depend on the raw data will fallback to use chunk storage, which will bring some performance loss. For example, comparisons between two columns and retrieval of output fields. The inverted index is very easy to be used. Taking below collection as an example: ```python fields = [ FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100), FieldSchema(name="int8", dtype=DataType.INT8), FieldSchema(name="int16", dtype=DataType.INT16), FieldSchema(name="int32", dtype=DataType.INT32), FieldSchema(name="int64", dtype=DataType.INT64), FieldSchema(name="float", dtype=DataType.FLOAT), FieldSchema(name="double", dtype=DataType.DOUBLE), FieldSchema(name="bool", dtype=DataType.BOOL), FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000), FieldSchema(name="random", dtype=DataType.DOUBLE), FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim), ] schema = CollectionSchema(fields) collection = Collection("demo", schema) ``` Then we can simply create inverted index for field via: ```python index_type = "INVERTED" collection.create_index("int8", {"index_type": index_type}) collection.create_index("int16", {"index_type": index_type}) collection.create_index("int32", {"index_type": index_type}) collection.create_index("int64", {"index_type": index_type}) collection.create_index("float", {"index_type": index_type}) collection.create_index("double", {"index_type": index_type}) collection.create_index("bool", {"index_type": index_type}) collection.create_index("varchar", {"index_type": index_type}) ``` Then, term query and range query on the field can be speed up automatically by the inverted index: ```python result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"]) result = collection.query(expr='int64 < 5', output_fields=["pk"]) result = collection.query(expr='int64 > 2997', output_fields=["pk"]) result = collection.query(expr='1 < int64 < 5', output_fields=["pk"]) ``` --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00
congqixia	8a9ab69369	fix: Skip statslog generation flushing empty L0 segment (#28733 ) See also #27675 When L0 segment contains only delta data, merged statslog shall be skipped when performing sync task --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-11-25 15:10:25 +08:00
yah01	ece592a42f	Deliver L0 segments delete records (#27722 ) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-07 01:44:18 +08:00
XuanYang-cn	7358c3527b	Add iterators (#27643 ) See also: #27606 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2023-10-18 19:34:08 +08:00
XuanYang-cn	2f16339aac	Enhance InsertData and FieldData (#27436 ) 1. Add NewInsertData 2. Add GetRowNum(), GetMemorySize(), and, Append() for InsertData 3. Add AppendRow() for FieldData for compaction Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2023-10-17 17:36:11 +08:00
SimFG	26f06dd732	Format the code (#27275 ) Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-09-21 09:45:27 +08:00
Xu Tong	9166011c4a	Add float16 vector (#25852 ) Signed-off-by: Writer-X <1256866856@qq.com>	2023-09-08 10:03:16 +08:00
congqixia	41af0a98fa	Use go-api/v2 for milvus-proto (#24770 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-06-09 01:28:37 +08:00
yah01	ebd0279d3f	Check error by Error() and NoError() for better report message (#24736 ) Signed-off-by: yah01 <yang.cen@zilliz.com>	2023-06-08 15:36:36 +08:00
aoiasd	c84bdcea49	merge stats log when segment flushing or compacting (#23570 ) Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2023-05-29 10:21:28 +08:00
Enwei Jiao	967a97b9bd	Support json & array types (#23408 ) Signed-off-by: yah01 <yang.cen@zilliz.com> Co-authored-by: yah01 <yang.cen@zilliz.com>	2023-04-20 11:32:31 +08:00
jaime	c9d0c157ec	Move some modules from internal to public package (#22572 ) Signed-off-by: jaime <yun.zhang@zilliz.com>	2023-04-06 19:14:32 +08:00
yah01	081572d31c	Refactor QueryNode (#21625 ) Signed-off-by: yah01 <yang.cen@zilliz.com> Co-authored-by: Congqi Xia <congqi.xia@zilliz.com> Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>	2023-03-27 00:42:00 +08:00
Xiaofan	949d5d078f	Fix memory calculation in dataCodec (#21800 ) Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>	2023-01-28 11:09:52 +08:00
Xiaofan	633a749880	Reduce IndexCodec Load Memory (#20621 ) Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com> Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>	2022-11-18 10:47:08 +08:00
SimFG	a55f739608	Separate public proto files (#19782 ) Signed-off-by: SimFG <bang.fu@zilliz.com> Signed-off-by: SimFG <bang.fu@zilliz.com>	2022-10-16 20:49:27 +08:00
SimFG	d7f38a803d	Separate some proto files (#19218 ) Signed-off-by: SimFG <bang.fu@zilliz.com> Signed-off-by: SimFG <bang.fu@zilliz.com>	2022-09-16 16:56:49 +08:00
xige-16	4de1bfe5bc	Add cpp data codec (#18538 ) Signed-off-by: xige-16 <xi.ge@zilliz.com> Co-authored-by: zhagnlu lu.zhang@zilliz.com Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-09-09 22:12:34 +08:00
xige-16	99984b88e1	Support delete varChar value (#16229 ) Signed-off-by: xige-16 <xi.ge@zilliz.com>	2022-04-02 17:43:29 +08:00
XuanYang-cn	bccf65ec67	[skip e2e]Update license for storage datacodec (#14039 ) Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2021-12-23 12:01:36 +08:00
godchen	7e56f08747	Add payload bytes interface. (#13467 ) Signed-off-by: godchen0212 <qingxiang.chen@zilliz.com>	2021-12-16 16:35:42 +08:00
XuanYang-cn	48b45d82e5	Add ut for binlog_io to 100 coverage (#12283 ) Make DN ut coverage upto 90% Resolves: #8058 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2021-11-26 17:43:17 +08:00
godchen	9d5bcd3e3a	Close event and binlog reader (#12173 ) Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-11-22 17:27:14 +08:00
godchen	863f1bb34e	Fix multi delete data not effect (#11422 ) Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-11-09 15:01:17 +08:00
XuanYang-cn	cd06f50645	Remove schema in delete codec (#10517 ) Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2021-10-24 09:59:10 +08:00
godchen	ffc0c07610	Change delete data primary key to int64 (#10438 ) Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-10-22 15:37:12 +08:00
XuanYang-cn	2255fe0b45	Change deserialize deltalog from 1 blob to blobs (#10085 ) See also: #9530 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2021-10-19 10:28:35 +08:00
godchen	59ab0e441c	Add bloom filter for stats (#9630 ) * Add bloom filter for stats Signed-off-by: godchen <qingxiang.chen@zilliz.com> * trigger GitHub actions Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-10-13 10:22:33 +08:00
dragondriver	dedf745b76	Rename IndexParamsFile to IndexParamsKey (#9563 ) Signed-off-by: dragondriver <jiquan.long@zilliz.com>	2021-10-09 19:27:02 +08:00
dragondriver	818cf3ffa0	Split blob into several string rows when index file is large (#8919 ) Signed-off-by: dragondriver <jiquan.long@zilliz.com>	2021-09-30 17:57:01 +08:00
dragondriver	cf8600077f	Refactor the index file format (#8514 ) Signed-off-by: dragondriver <jiquan.long@zilliz.com>	2021-09-29 09:52:12 +08:00
godchen	af173dd2a0	Add delete codec (#8736 ) Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-09-28 14:30:02 +08:00
godchen	db94d7771f	Read vector from disk (#6707 ) * Read vector from disk Signed-off-by: godchen <qingxiang.chen@zilliz.com> * go fmt Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix git action error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix test error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix action error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix caculate error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * change var name Signed-off-by: godchen <qingxiang.chen@zilliz.com> * remove unused method Signed-off-by: godchen <qingxiang.chen@zilliz.com> * remove unused method Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix len error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * remove unused code Signed-off-by: godchen <qingxiang.chen@zilliz.com> * change bytes to float method Signed-off-by: godchen <qingxiang.chen@zilliz.com> * change float to bytes method Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix action error Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-07-24 09:25:22 +08:00
Cai Yudong	a992dcf6a8	Support query return vector output field (#6570 ) * improve code readibility Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add offset in RetrieveResults Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add VectorFieldInfo into Segment struct Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add new interface for query vector Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * update load vector field logic Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * update load vector field logic Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * fill in field name in query result Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add FieldId into FieldData Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add fillVectorOutputFieldsIfNeeded Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * update data_codec_test.go Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * add DeserializeFieldData Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * realize query return vector output field Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * fix static-check Signed-off-by: yudong.cai <yudong.cai@zilliz.com> * disable query vector case Signed-off-by: yudong.cai <yudong.cai@zilliz.com>	2021-07-16 17:19:55 +08:00
godchen	1c6786f85c	Add blob info (#5792 ) * Add blob info Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix error Signed-off-by: godchen <qingxiang.chen@zilliz.com> * fix error Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-06-16 12:03:57 +08:00
Xiangyu Wang	23c4de0eb8	Flush statistics for all int64 fields (#5318 ) Resolves: #5262 Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>	2021-05-20 10:38:45 +00:00
Xiangyu Wang	82ccd4cec0	Rename module (#4988 ) * Rename module Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>	2021-04-22 14:45:57 +08:00
godchen	0dfcb90881	Add storage copyright Signed-off-by: godchen <qingxiang.chen@zilliz.com>	2021-04-19 11:32:24 +08:00


67 Commits