milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-04 04:49:08 +08:00

Author	SHA1	Message	Date
XuanYang-cn	a446e754b4	fix: [2.4]DeleteData merge wrong data causing data loss (#33821 ) See also: #33819 pr: #33820 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-06-13 16:07:56 +08:00
congqixia	86f3433053	enhance: [2.4]Use fastjson lib for unmarshal delete log (#33787 ) (#33802 ) Cherry-pick from master pr: #33878 ``` goos: linux goarch: amd64 GOMAXPROC=1 cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz BenchmarkJsonSerdeStd 343872 3568 ns/op 1335 B/op 25 allocs/op BenchmarkJsonSerdeFastjson 5124177 234.9 ns/op 16 B/op 1 allocs/op ``` --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-13 10:27:57 +08:00
wei liu	54feef30e7	enhance: Use BatchPkExist to reduce bloom filter func call cost (#33752 ) issue: #33610 pr: #33611 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-12 17:45:58 +08:00
wei liu	f2917f5bdf	enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486 ) (#33649 ) issue: #33497 pr: #33486 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-06 10:40:01 +08:00
Cai Yudong	68e2d532d8	enhance: Cherry-pick following SparseFloatVector bulk insert PRs to Milvus2.4 (#33391 ) Cherry pick from master pr: #33064 #33101 #33187 #33259 #33224 #33064 Support readable JSON file import for Float16/BFloat16/SparseFloat #33101 Store SparseFloatVector into parquet as JSON string #33187 Fix SparseFloatVector data parse error for parquet #33259 Fix SparseFloatVector data parse error for json #33224 Optimize bulk insert unittest Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-30 10:31:45 +08:00
congqixia	e2626c7b9e	fix: [2.4]Allocate new slice for each batch in streaming reader (#33360 ) Cherry-pick from master pr: #33359 Related to #33268 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-24 18:59:42 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
Cai Yudong	4fc7915c70	enhance: unify data generation test APIs (#32955 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-14 14:33:33 +08:00
congqixia	0e5765b116	enhance: Utilize `TestLocations` ability to accelerate write & compaction (#32948 ) See also #32642 This PR reuses hash locations for bloom filter prediction utilizing `storage.Location`, like enhancement #32642. Also adds a utility struct in storage: `LocationCache` to store locations for variable K (number of hash functions) --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-13 10:15:32 +08:00
wei liu	5038036ece	enhance: Reuse hash locations during access bloom filter (#32642 ) issue: #32530 When trying to match segment bloom filters with a pk, we can reuse the hash locations. This PR maintains the max hash Func and computes the hash locations once for all segments; reusing hash locations can speed up bf access --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-07 06:13:47 -07:00
Cai Yudong	bcdbd1966e	feat: Support sparse float vector bulk insert for binlog/json/parquet (#32649 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-07 18:43:30 +08:00
aoiasd	31dca3249e	enhance: add type info for payload writer error message and add log when querynode find new collection (#32522 ) relate: https://github.com/milvus-io/milvus/issues/32668 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-05-07 14:45:29 +08:00
Aldrin	cb8dbc3c83	fix: Removed minio bucket after use in test (#32624 ) issue: https://github.com/milvus-io/milvus/issues/32616 - Forcefully deleted the non empty minio bucket with dummy data. Signed-off-by: Aldrin <imagesai32@gmail.com>	2024-04-28 13:51:26 +08:00
chyezh	2586c2f1b3	enhance: use WalkWithPrefix api for oss, enable pipelined file gc (#31740 ) issue: #19095,#29655,#31718 - Change `ListWithPrefix` to `WalkWithPrefix` of OSS into a pipeline mode. - File garbage collection is performed in another goroutine. - Segment index recycling cleans index files too. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 20:41:27 +08:00
Buqian Zheng	8a1017a152	enhance: add helpers to parse sparse float vector in JSON (#32543 ) issue: #29419 added helper functions to parse JSON representation of sparse float vectors, will be used by both the restful server and the import utils. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-04-25 14:47:24 +08:00
Cai Yudong	5fc439c600	feat: Bulk insert support fp16/bf16 (#32157 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-22 10:05:22 +08:00
Ted Xu	dc5ea6f17c	feat: adding binlog streaming writer (#31537 ) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-04-11 10:33:20 +08:00
aoiasd	5b693c466d	fix: delegator filter out all partition's delete msg when loading segment (#31585 ) May cause deleted data to remain queryable for a period of time. relate: https://github.com/milvus-io/milvus/issues/31484 https://github.com/milvus-io/milvus/issues/31548 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-04-09 15:21:24 +08:00
Cai Yudong	00438f408f	enhance: Unify data type check APIs for go (#31887 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-07 14:27:22 +08:00
cqy123456	976928ecd1	fix: fix fp16/bf16 some code missing and add more fp16/bf16 test (#31612 ) issue: #31534 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-03-28 14:11:10 +08:00
SimFG	b1a1cca10b	feat: add more operation detail info for better allocation (#30438 ) issue: #30436 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-03-28 06:33:11 +08:00
groot	5be395354c	fix: minio ssl compatible issue (#31607 ) issue: https://github.com/milvus-io/milvus/issues/30709 Signed-off-by: yhmo <yihua.mo@zilliz.com>	2024-03-27 14:41:20 +08:00
yihao.dai	31cf849f68	enhance: Support retrieving file size from importutilv2.Reader (#31533 ) To reduce the overhead caused by listing the S3 objects, add an interface to importutil.Reader to retrieve file sizes. issue: https://github.com/milvus-io/milvus/issues/31532, https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-25 20:29:07 +08:00
Chun Han	c3264ca3e3	feat: support segment pruner (#31003 ) related: #30376	2024-03-22 13:57:06 +08:00
groot	c81909bfab	enhance: Support MinIO TLS connection (#31311 ) issue: https://github.com/milvus-io/milvus/issues/30709 pr: #31292 Signed-off-by: yhmo <yihua.mo@zilliz.com> Co-authored-by: Chen Rao <chenrao317328@163.com>	2024-03-21 11:15:20 +08:00
Buqian Zheng	d7dbc3c9d8	fix: [sparse float vector] support the new streaming deserialize reader (#31325 ) issue: https://github.com/milvus-io/milvus/issues/31324 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-17 13:59:04 +08:00
Buqian Zheng	3c80083f51	feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630 ) add sparse float vector support to different milvus components, including proxy, data node to receive and write sparse float vectors to binlog, query node to handle search requests, index node to build index for sparse float column, etc. https://github.com/milvus-io/milvus/issues/29419 --------- Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-13 14:32:54 -07:00
Ted Xu	987d9023a5	enhance: Enable binlog deserialize reader in datanode compaction (#31036 ) See #30863 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-03-08 18:25:02 +08:00
wayblink	875036b81b	feat: Define FieldValue, FieldStats and PartitionStats (#30286 ) Define FieldValue, FieldStats, PartitionStats FieldValue is largely copied from PrimaryKey FieldStats is largely copied from PrimaryKeyStats PartitionStats is map[segmentid][]FieldStats Each partition can have a PartitionStats file /kind feature related: #30287 related: #30633 --------- Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-03-06 20:42:37 -08:00
Ted Xu	71adafa933	enhance: adding a streaming deserialize reader for binlogs (#30860 ) See #30863 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-03-04 19:31:09 +08:00
yihao.dai	a434d33e75	feat: Add import scheduler and manager (#29367 ) This PR introduces novel managerial roles for importv2: 1. ImportMeta: To manage all the import tasks; 2. ImportScheduler: To process tasks and modify their states; 3. ImportChecker: To ascertain the completion of all tasks and instigate relevant operations. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-03-01 18:31:02 +08:00
SimFG	229fc4f755	enhance: retry the read when S3 returns an unexpected EOF error (#30861 ) /kind improvement issue: #30877 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-02-28 16:28:53 +08:00
Ted Xu	12acaf3e4f	enhance: Adding a generic stream payload reader (#30682 ) See: #30404 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-02-21 17:10:52 +08:00
wayblink	f976385421	enhance: replace binlogIO with io.BinlogIO in datanode (#29725 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-02-20 14:38:51 +08:00
cai.zhang	77ba3ce3f3	enhance: Use virtual host for tencent cloud (#30650 ) Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-02-20 14:08:51 +08:00
aoiasd	a0537156c0	enhance: delete codec deserialize data by stream batch (#30407 ) relate: https://github.com/milvus-io/milvus/issues/30404 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-02-06 17:04:25 +08:00
XuanYang-cn	d744962aa1	fix: Correct Size calculation of DeleteData (#30397 ) This PR would correct the actual deltalog size See also: #30191 --------- Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-02-02 10:47:04 +08:00
congqixia	e677af19b0	enhance: Add PrimaryKeys interface to reduce memory usage (#30405 ) See also #30404 `PrimaryKey` is used to hold pk values for both int64 & varchar data types. Since it is an interface, it may occupy more memory than pure slices when holding a group of pks. This PR adds a `PrimaryKeys` interface for when other modules need to hold lots of PrimaryKeys. By using this interface, it could reduce the memory of the pk slice to half when using the Int64 pk data type and reduce the interface cost for each row of varchar as well. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-01 09:57:11 +08:00
yihao.dai	c5918290e6	feat: Add import executor and manager for datanode (#29438 ) This PR introduces novel importv2 roles for datanode: 1. Executor: To execute tasks, an import task will be divided into the following steps: read data -> hash data -> sync data; 2. Manager: To manage all the tasks; issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-31 20:45:04 +08:00
cai.zhang	6cf2f09b60	feat: Support tencent cloud object storage for milvus (#30163 ) issue: #30162 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-01-23 11:28:56 +08:00
cai.zhang	6bfa826320	fix: Fix bug for read data from azure (#30007 ) issue: #30005 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-01-22 15:44:54 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue: https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
congqixia	f18a7191f2	enhance: make `ColumnBasedInsertMsgToInsertData` check field missing (#29758 ) fix: #29757 In previous code, `ColumnBasedInsertMsgToInsertData` adds empty field if the insertMsg parameter does not have the column schema defined. This may lead to unexpected behavior of caller functions. This PR: - Add column missing check - Add column length check - Generate BlobInfo for ColumnBasedInsertMsgToInsertData result --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-09 11:50:48 +08:00
yihao.dai	3d07b6682c	feat: Add import reader for numpy (#29253 ) This PR implements a new numpy reader for import. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-08 19:42:49 +08:00
yah01	97e4ec5a69	enhance: use random root path for minio unit tests (#29753 ) this avoids the conflicts while running multiple unit tests Signed-off-by: yah01 <yah2er0ne@outlook.com>	2024-01-08 15:58:48 +08:00
yihao.dai	23183ffb0f	feat: Add import reader for json (#29252 ) This PR implements a new json reader for import. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-05 18:12:48 +08:00
smellthemoon	1c1f2a1371	enhance: change some logs (#29579 ) related #29588 Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-01-05 16:12:48 +08:00
yihao.dai	3561586edf	feat: Add import reader for binlog (#28910 ) This PR defines the new import reader interfaces and implements a binlog reader for import. issue: https://github.com/milvus-io/milvus/issues/28521 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-05 11:48:47 +08:00
cai.zhang	dc8b5c1130	enhance: Read azure file without ReadAll (#29602 ) issue: #29292 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-01-04 20:50:46 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM and speed up the term query and range query. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`. Note: - The inverted index for `VARCHAR` is not designed to serve full-text search now. We will treat every row as a whole keyword instead of tokenizing it into multiple terms. - The inverted index doesn't support retrieval well, so if you create an inverted index for a field, operations that depend on the raw data will fall back to chunk storage, which brings some performance loss. For example, comparisons between two columns and retrieval of output fields. The inverted index is easy to use. Take the collection below as an example: ```python fields = [ FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100), FieldSchema(name="int8", dtype=DataType.INT8), FieldSchema(name="int16", dtype=DataType.INT16), FieldSchema(name="int32", dtype=DataType.INT32), FieldSchema(name="int64", dtype=DataType.INT64), FieldSchema(name="float", dtype=DataType.FLOAT), FieldSchema(name="double", dtype=DataType.DOUBLE), FieldSchema(name="bool", dtype=DataType.BOOL), FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000), FieldSchema(name="random", dtype=DataType.DOUBLE), FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim), ] schema = CollectionSchema(fields) collection = Collection("demo", schema) ``` Then we can create an inverted index for each field via: ```python index_type = "INVERTED" collection.create_index("int8", {"index_type": index_type}) collection.create_index("int16", {"index_type": index_type}) collection.create_index("int32", {"index_type": index_type}) collection.create_index("int64", {"index_type": index_type}) collection.create_index("float", {"index_type": index_type}) collection.create_index("double", {"index_type": index_type}) collection.create_index("bool", {"index_type": index_type}) collection.create_index("varchar", {"index_type": index_type}) ``` Then term queries and range queries on the field are sped up automatically by the inverted index: ```python result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"]) result = collection.query(expr='int64 < 5', output_fields=["pk"]) result = collection.query(expr='int64 > 2997', output_fields=["pk"]) result = collection.query(expr='1 < int64 < 5', output_fields=["pk"]) ``` --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00
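
The term and range queries described in the inverted index commit (#28783) can be pictured with a toy index that maps each distinct value to the rows holding it. This is a minimal pure-Python sketch of the general technique, not the Milvus implementation; `ToyInvertedIndex`, `term_query`, and `range_query` are hypothetical names introduced here for illustration only.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyInvertedIndex:
    """Illustrative only: maps each distinct field value to its row ids."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            # Each row is stored as one whole keyword (no tokenizing),
            # mirroring the VARCHAR note in the commit message.
            postings[value].append(row_id)
        self.postings = dict(postings)
        # Sorted distinct values let range queries use binary search.
        self.sorted_keys = sorted(self.postings)

    def term_query(self, terms):
        """Row ids whose value equals any of the given terms."""
        rows = []
        for term in set(terms):
            rows.extend(self.postings.get(term, []))
        return sorted(rows)

    def range_query(self, low=None, high=None):
        """Row ids with low < value < high; None means unbounded."""
        start = 0 if low is None else bisect_right(self.sorted_keys, low)
        end = (len(self.sorted_keys) if high is None
               else bisect_left(self.sorted_keys, high))
        rows = []
        for key in self.sorted_keys[start:end]:
            rows.extend(self.postings[key])
        return sorted(rows)


index = ToyInvertedIndex([5, 1, 3, 1, 2997, 3000])
in_rows = index.term_query([1, 2, 3])        # like 'int64 in [1, 2, 3]'
below_rows = index.range_query(high=5)       # like 'int64 < 5'
above_rows = index.range_query(low=2997)     # like 'int64 > 2997'
between_rows = index.range_query(1, 5)       # like '1 < int64 < 5'
```

Neither query scans the raw column: a term query is a dictionary lookup per term, and a range query is two binary searches over the sorted distinct values followed by a merge of the matching posting lists. The sketch returns only row ids, which is consistent with the commit's note that operations needing the raw data fall back to chunk storage.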


435 Commits