milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-11 09:46:26 +08:00

Author	SHA1	Message	Date
congqixia	c0ee25afd8	fix: Use k locations only for basic BF test location (#35380 ) Related to #35379 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-09 07:52:22 +08:00
congqixia	de8a266d8a	enhance: Enable linux code checker (#35084 ) See also #34483 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-30 15:53:51 +08:00
wei liu	c45f38aa61	enhance: Update protobuf-go to protobuf-go v2 (#34394 ) issue: #34252 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-29 11:31:51 +08:00
congqixia	4ee6c69217	enhance: Add Segment Level in milvus segment info APIs (#34763 ) See also #34746 This PR add segment level field in response of `GetPersistentSegmentInfo` and `GetQuerySegmentInfo` --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-26 10:01:46 +08:00
Chun Han	c46c401112	fix: refine handling type for segment pruner(#34923 ) (#34925 ) related: #34923 Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-07-25 13:57:45 +08:00
smellthemoon	5616b7e8d2	enhance: support null in c data_datacodec and load null value (#32183 ) 1. Support reading and writing null: segcore stores valid_data (as uint8_t, to save memory) in fieldData. 2. Support loading null: the binlog reader reads and writes data into the column (sealed segment) or insertRecord (growing segment). In a sealed segment, valid_data is stored directly. In a growing segment, for consistency with the prior implementation and for readability, uint8_t is converted to fbvector<bool>, which may be optimized in the future. 3. Retrieve valid_data: parse valid_data in search/query. #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-23 16:07:51 +08:00
shaoting-huang	88b373b024	enhance: binlog primary key turn off dict encoding (#34358 ) issue: #34357 Go Parquet uses dictionary encoding by default, and it will fall back to plain encoding if the dictionary size exceeds the dictionary size page limit. Users can specify custom fallback encoding by using `parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However, Go Parquet [fallbacks to plain encoding](`e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238)`) rather than custom encoding method users provide. Therefore, this patch only turns off dictionary encoding for the primary key. With a 5 million auto ID primary key benchmark, the parquet file size improves from 13.93 MB to 8.36 MB when dictionary encoding is turned off, reducing primary key storage space by 40%. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-17 17:47:44 +08:00
congqixia	eb4bfa3281	fix: Revert reuse deserialize result to fix data overwritten (#34683 ) See also #34637 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-15 22:31:38 +08:00
congqixia	531092c031	enhance: Add lint rule to forbid gogo protobuf (#34594 ) github.com/gogo/protobuf is deprecated and could be error-prone after upgrading protobuf messages to v2. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-12 10:19:35 +08:00
SimFG	5016038781	enhance: release the record in delete codec and add some log for compaction (#34454 ) /kind improvement Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-07-09 15:40:17 +08:00
Ted Xu	eae4dfca7b	fix: reuse deserialize result to help improve memory management (#34507 ) Fixed #33268 The original reuse is broken by #33359 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-07-09 14:12:10 +08:00
congqixia	3333160b8d	enhance: Fix lint issues from recent PRs (#34482 ) See also #34483 Some lint issues are introduced due to lack of static check run. This PR fixes these problems. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-09 10:06:24 +08:00
shaoting-huang	f4dd7c7efb	enhance: add delta log stream new format reader and writer (#34116 ) issue: #34123 Benchmark case: The benchmark runs the Go benchmark function `BenchmarkDeltalogFormat`, which is included in the Files changed. It tests the performance of serializing and deserializing two different data formats on a dataset of 10 million delete logs. Metrics: The benchmarks measure the average time taken per operation (ns/op), memory allocated per operation (MB/op), and the number of memory allocations per operation (allocs/op).

| Test Name | Avg Time (ns/op) | Time Comparison | Memory Allocation (MB/op) | Memory Comparison | Allocation Count (allocs/op) | Allocation Comparison |
|---|---|---|---|---|---|---|
| one_string_format_reader | 2,781,990,000 | Baseline | 2,422 | Baseline | 20,336,539 | Baseline |
| pk_ts_separate_format_reader | 480,682,639 | -82.72% | 1,765 | -27.14% | 20,396,958 | +0.30% |
| one_string_format_writer | 5,483,436,041 | Baseline | 13,900 | Baseline | 70,057,473 | Baseline |
| pk_and_ts_separate_format_writer | 798,591,584 | -85.43% | 2,178 | -84.34% | 30,270,488 | -56.78% |

Both read and write operations show significant improvements in both speed and memory allocation. Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-07-06 09:08:09 +08:00
Chun Han	fcafdb6d5f	enhance: reconstruct scalar part's code for segment-pruner(#30376 ) (#34346 ) related: #30376 1. support more complex expr 2. add more ut test for unrelated fields Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-07-04 16:36:09 +08:00
congqixia	0fd0fcfe1d	enhance: Fix lint issues & sdk testcase (#34399 ) Some lint issues were not detected due to a recent static check pipeline issue. This PR fixes these problems and the Go milvusclient testcases. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-07-03 19:42:10 +08:00
smellthemoon	ef3ced8138	fix: descriptor event in previous version not has nullable to parse error (#34235 ) #34176 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-07-01 16:38:06 +08:00
congqixia	e04f1f9748	enhance: Add unittest for `storage.DeleteLog` (#34190 ) See also #33787 Backport unit test part in #34188 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-26 17:14:04 +08:00
congqixia	fd922d921a	enhance: Add nilness linter and fix some small issues (#34049 ) Add `nilness` for govet linter and fixed some detected issues Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-24 14:52:03 +08:00
Chun Han	ca7ef26e4b	fix: sync part stats task cannot be finished(#30376 ) (#34027 ) related: #30376 also: refine log output for query_coord task by rephrasing action string Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com>	2024-06-24 10:16:02 +08:00
Ted Xu	78885a44c4	fix: turn on compression on stream writers (#34067 ) See #31679 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-06-24 10:08:02 +08:00
wayblink	380d3f4469	fix: Fix memory buffer error & some renaming (#33850 ) #30633 --------- Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-06-21 17:30:01 +08:00
congqixia	2f691f1e67	enhance: Unify DeleteLog parsing code (#34009 ) See also #33787 The delete log parsing code is scattered across many places, which is not recommended and is hard to maintain. This PR abstracts the common parsing logic into the `DeleteLog.Parse` method to unify the implementation and make it easier to replace the JSON parsing lib. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-21 16:54:01 +08:00
shaoting-huang	5f02e52561	enhance: Refactor data codec deserialize (#33923 ) #33922 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-06-20 11:17:59 +08:00
smellthemoon	2a1356985d	enhance: support null in go payload (#32296 ) #31728 --------- Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-06-19 17:08:00 +08:00
Ted Xu	6d5747cb3e	feat: adding deltalog stream reader and writer (#33844 ) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-06-19 14:42:01 +08:00
shaoting-huang	8cdc0e6233	fix: fix data codec writer close (#33818 ) issue:#33813 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-06-18 13:59:57 +08:00
congqixia	f993b2913b	enhance: Reserve space of payload writer when serialize data (#33817 ) See also #33561 #33562 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-17 12:06:04 +08:00
XuanYang-cn	f67b6dc2b0	fix: DeleteData merge wrong data causing data loss (#33820 ) See also: #33819 Signed-off-by: yangxuan <xuan.yang@zilliz.com>	2024-06-14 17:57:56 +08:00
shaoting-huang	0ecd694305	enhance: legacy code clean up (#33838 ) issue: #33839 Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>	2024-06-14 14:25:56 +08:00
wei liu	ab93d9c23d	enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611 ) issue:#33610 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-13 17:57:56 +08:00
congqixia	512ea6be5f	enhance: Avoid merging insert data when buffering insert msgs (#33562 ) See also #33561 This PR: - Use zero copy when buffering insert messages - Make `storage.InsertCodec` support serialize multiple insert data chunk into same batch binlog files Signed-off-by: Congqi Xia <congqi.xia@zilliz.com> --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-13 11:15:56 +08:00
congqixia	b39dfc25dc	enhance: Use fastjson lib for unmarshal delete log (#33787 )
```
goos: linux
goarch: amd64
GOMAXPROC=1
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkJsonSerdeStd        343872    3568 ns/op    1335 B/op    25 allocs/op
BenchmarkJsonSerdeFastjson  5124177   234.9 ns/op      16 B/op     1 allocs/op
```
--------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-06-12 20:41:57 +08:00
wayblink	a1232fafda	feat: Major compaction (#33620 ) #30633 Signed-off-by: wayblink <anyang.wang@zilliz.com> Co-authored-by: MrPresent-Han <chun.han@zilliz.com>	2024-06-10 21:34:08 +08:00
wei liu	c6a1c49e02	enhance: Use Blocked Bloom Filter instead of basic bloom filter impl. (#33405 ) issue: #32995 To speed up the construction and querying of Bloom filters, we chose a blocked Bloom filter instead of a basic Bloom filter implementation. WARN: This PR is compatible with the old Bloom filter implementation, but after falling back to an older Milvus version, Bloom filter deserialization may fail. In single Bloom filter test cases with a capacity of 1,000,000 and a false positive rate (FPR) of 0.001, the blocked Bloom filter is 5 times faster than the basic Bloom filter in both querying and construction, at the cost of a 30% increase in memory usage. - Block BF construct time {"time": "54.128131ms"} - Block BF size {"size": 3021578} - Block BF Test cost {"time": "55.407352ms"} - Basic BF construct time {"time": "210.262183ms"} - Basic BF size {"size": 2396308} - Basic BF Test cost {"time": "192.596229ms"} In multi Bloom filter test cases with a capacity of 100,000, an FPR of 0.001, and 100 Bloom filters, we reuse the primary key locations for all Bloom filters to avoid repeated hash computations. As a result, the blocked Bloom filter is also 5 times faster than the basic Bloom filter in querying. - Block BF TestLocation cost {"time": "529.97183ms"} - Basic BF TestLocation cost {"time": "3.197430181s"} --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-31 17:49:45 +08:00
wei liu	322a4c5b8c	enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486 ) issue: #33497 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-31 15:41:45 +08:00
congqixia	73c9b80a7d	enhance: Store locations for largest K in `LocationCache` (#33429 ) See also #32642 `LocationCache` used a map to store the locations for each K, which could cost a lot of CPU time when locations are fetched many times. This PR changes `LocationCache` to store only the locations for the largest K used, removing the map access entirely. See pprof from test of @XuanYang-cn ![image](https://github.com/milvus-io/milvus/assets/84113973/ad17cff8-62ad-4d78-9bb0-f6df0512f4ea) --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-29 10:05:42 +08:00
Ted Xu	066c8ea175	feat: stream reader/writer to support nulls (#33080 ) See: #31728 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-27 16:27:42 +08:00
congqixia	970bf18a49	fix: Allocate new slice for each batch in streaming reader (#33359 ) Related to #33268 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-24 18:07:41 +08:00
Ted Xu	a8bd9bea39	fix: adding blob memory size in binlog serde (#33324 ) See: #33280 Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-24 10:33:40 +08:00
Cai Yudong	4004e4c545	enhance: Optimize bulk insert unittest (#33224 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-24 10:23:41 +08:00
Ted Xu	a9c7ce72b8	enhance: enable stream writer in compactions (#32612 ) See #31679 --------- Signed-off-by: Ted Xu <ted.xu@zilliz.com>	2024-05-17 15:05:37 +08:00
cai.zhang	6ea7633bd5	enhance: Add memory size for binlog (#33025 ) issue: #33005 1. add `MemorySize` field for insert binlog. 2. `LogSize` means the file size in the storage object. 3. `MemorySize` means the size of the data in the memory. --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com> Signed-off-by: cai.zhang <cai.zhang@zilliz.com>	2024-05-15 12:59:34 +08:00
Cai Yudong	4fc7915c70	enhance: unify data generation test APIs (#32955 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-14 14:33:33 +08:00
congqixia	0e5765b116	enhance: Utilize `TestLocations` ability to accelerate write & compaction (#32948 ) See also #32642 This PR reuses hash locations for bloom filter prediction utilizing `storage.Location`, like enhancement #32642. Also adds a utility struct in storage, `LocationCache`, to store locations for variable K (number of hash functions) --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-13 10:15:32 +08:00
wei liu	5038036ece	enhance: Reuse hash locations during access bloom filter (#32642 ) issue: #32530 When matching a pk against segment bloom filters, the hash locations can be reused. This PR tracks the maximum number of hash functions and computes the hash locations once for all segments; reusing them speeds up bloom filter access --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-07 06:13:47 -07:00
Cai Yudong	bcdbd1966e	feat: Support sparse float vector bulk insert for binlog/json/parquet (#32649 ) Issue: #22837 Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-05-07 18:43:30 +08:00
aoiasd	31dca3249e	enhance: add type info for payload writer error message and add log when querynode find new collection (#32522 ) relate: https://github.com/milvus-io/milvus/issues/32668 --------- Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-05-07 14:45:29 +08:00
Aldrin	cb8dbc3c83	fix: Removed minio bucket after use in test (#32624 ) issue: https://github.com/milvus-io/milvus/issues/32616 - Forcefully deleted the non empty minio bucket with dummy data. Signed-off-by: Aldrin <imagesai32@gmail.com>	2024-04-28 13:51:26 +08:00
chyezh	2586c2f1b3	enhance: use WalkWithPrefix api for oss, enable pipelined file gc (#31740 ) issue: #19095,#29655,#31718 - Change `ListWithPrefix` to `WalkWithPrefix` for OSS, in a pipelined mode. - File garbage collection is performed in another goroutine. - Segment index recycling cleans index files too. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 20:41:27 +08:00
Buqian Zheng	8a1017a152	enhance: add helpers to parse sparse float vector in JSON (#32543 ) issue: #29419 added helper functions to parse JSON representation of sparse float vectors, will be used by both the restful server and the import utils. Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-04-25 14:47:24 +08:00
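
Several commits above (#32642, #32948, #33429) describe computing a primary key's bloom filter hash locations once, for the largest K in use, and reusing them across filters. The following is a minimal sketch of that idea only, assuming double hashing over FNV-1a; `Filter`, `Locations`, `AddLocations`, and the `TestLocations` method shown here are hypothetical illustrations and are not the Milvus `storage.LocationCache` API or its hash scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Filter is a toy bloom filter (hypothetical, not the Milvus implementation).
type Filter struct {
	bits []bool
	k    int // number of hash functions this filter uses
}

func NewFilter(m, k int) *Filter {
	return &Filter{bits: make([]bool, m), k: k}
}

// Locations hashes key once and derives maxK raw locations by double
// hashing (h1 + i*h2). The values are not reduced modulo any filter size,
// so one slice can serve filters with different sizes and different K.
func Locations(key []byte, maxK int) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	h1 := h.Sum64()
	h.Write([]byte{0xff}) // extend the input to derive a second hash value
	h2 := h.Sum64() | 1   // force odd so successive locations differ
	locs := make([]uint64, maxK)
	for i := range locs {
		locs[i] = h1 + uint64(i)*h2
	}
	return locs
}

// AddLocations sets the first k precomputed locations.
// It assumes len(locs) >= f.k.
func (f *Filter) AddLocations(locs []uint64) {
	for _, l := range locs[:f.k] {
		f.bits[l%uint64(len(f.bits))] = true
	}
}

// TestLocations reports whether all of the first k locations are set.
// It assumes len(locs) >= f.k.
func (f *Filter) TestLocations(locs []uint64) bool {
	for _, l := range locs[:f.k] {
		if !f.bits[l%uint64(len(f.bits))] {
			return false
		}
	}
	return true
}

func main() {
	filters := []*Filter{NewFilter(1024, 3), NewFilter(4096, 5), NewFilter(512, 2)}

	// Track the largest K so a single hash pass covers every filter.
	maxK := 0
	for _, f := range filters {
		if f.k > maxK {
			maxK = f.k
		}
	}

	locs := Locations([]byte("pk-42"), maxK) // hashed once
	filters[1].AddLocations(locs)

	for i, f := range filters {
		fmt.Printf("filter %d may contain pk: %v\n", i, f.TestLocations(locs))
	}
}
```

The point of the sketch is that the cost of hashing the key is paid once per key rather than once per filter; each filter then only slices the first K locations and reduces them to its own size.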

470 Commits