When Milvus processes a delete record, it needs to find the record's
corresponding segment via a bloom filter, and a higher bloom filter fp
(false-positive) rate causes delete records to be forwarded to the wrong segments.
This PR decreases the bloom filter's default fp rate to 0.001.
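As an illustration (not the Milvus implementation), a minimal sketch with the bits-and-blooms library shows how the fp rate drives wrong-segment forwarding; the primary keys and capacity below are made up:

```go
package main

import (
	"fmt"

	"github.com/bits-and-blooms/bloom/v3"
)

func main() {
	// One bloom filter per segment, sized for 1M primary keys.
	// With fp = 0.01, roughly 1% of foreign PKs would falsely "hit" a segment
	// and the delete record would be forwarded to it; with fp = 0.001 the
	// expected number of wrongly targeted segments drops by ~10x.
	segmentFilter := bloom.NewWithEstimates(1_000_000, 0.001)
	segmentFilter.Add([]byte("pk-42"))

	// A delete for pk-42 matches the owning segment...
	fmt.Println(segmentFilter.Test([]byte("pk-42"))) // true
	// ...and a delete for an unrelated PK should almost never match.
	fmt.Println(segmentFilter.Test([]byte("pk-777"))) // false with probability 1-fp
}
```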
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Signed-off-by: shaoting-huang <shaoting-huang@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/32982
# Background
Go 1.21 introduces several improvements and changes over Go 1.20, which
is now quite stable. According to the
[Go 1.21 Release Notes](https://tip.golang.org/doc/go1.21), the biggest
difference in Go 1.21 is that Profile-Guided Optimization (PGO) is enabled by
default, which can improve performance by around 2-14%. Here are the
summary steps of PGO (a minimal sketch of steps 3-5 follows the diagram below):
1. Build the initial binary (without PGO)
2. Deploy it to the production environment
3. Run the program and collect performance profiling data (CPU pprof)
4. Analyze the collected data and select a performance profile for PGO
5. Place the profile in the main package directory and name it default.pgo
6. `go build` detects the default.pgo file and enables PGO
7. Build and release the updated binary (with PGO)
8. Iterate and repeat the above steps
<img width="657" alt="Screenshot 2024-05-14 at 15 57 01"
src="https://github.com/milvus-io/milvus/assets/167743503/b08d4300-0be1-44dc-801f-ce681dabc581">
# What does this PR do
There are three experiments: a search benchmark on the Zilliz test platform,
a search benchmark with the open-source
[VectorDBBench](https://github.com/zilliztech/VectorDBBench?tab=readme-ov-file),
and a search benchmark with PGO. We run the search benchmarks on both the Zilliz
test platform and VectorDBBench to reduce reliance on a single
experimental result. In addition, we validate the performance enhancement
brought by PGO.
## Search Benchmark Report by Zilliz Test Platform
An upgrade to Go 1.21 was conducted on a Milvus Standalone server,
equipped with 16 CPUs and 64GB of memory. The search performance was
evaluated using a 1 million entry local dataset with an L2 metric type
in a 768-dimensional space. The system was tested for concurrent
searches with 50 concurrent tasks for 1 hour, each with a 20-second
interval. The reason for using one server rather than two servers to
compare is to guarantee the same data source and same segment state
after compaction.
Test Sequence:
1. Go 1.20 Initial Run: Insert data, build index, load index, and
search.
2. Go 1.20 Rebuild: Rebuild the index with the same dataset, load index,
and search.
3. Go 1.21 Load: Upgrade the server to Go 1.21, then load the
index from the second run, and search.
4. Go 1.21 Rebuild: Rebuild the index with the same dataset, load index,
and search.
Search Metrics:
| Metric | Go 1.20 | Go 1.20 Rebuild Index | Go 1.21 | Go 1.21 Rebuild Index |
|----------------------|------------|-----------------------|------------|-----------------------|
| `search requests` | 10,942,683 | 16,131,726 | 16,200,887 | 16,331,052 |
| `search fails` | 0 | 0 | 0 | 0 |
| `search RT_avg` (ms) | 16.44 | 11.15 | 11.11 | 11.02 |
| `search RT_min` (ms) | 1.30 | 1.28 | 1.31 | 1.26 |
| `search RT_max` (ms) | 446.61 | 233.22 | 235.90 | 147.93 |
| `search TP50` (ms) | 11.74 | 10.46 | 10.43 | 10.35 |
| `search TP99` (ms) | 92.30 | 25.76 | 25.36 | 25.23 |
| `search RPS` | 3,039 | 4,481 | 4,500 | 4,536 |
### Key Findings
The benchmark tests reveal that index build time with Go 1.20 (340.39 ms)
and Go 1.21 (337.60 ms) shows negligible variance in index construction.
However, Go 1.21 offers slightly better search performance than Go 1.20,
with improvements in handling concurrent tasks and reduced response times.
## Search Benchmark Report by VectorDBBench
Following
[VectorDBBench](https://github.com/zilliztech/VectorDBBench?tab=readme-ov-file),
we created a VectorDBBench test for Go 1.20 and Go 1.21. We test the
search performance of Go 1.20 and Go 1.21 (without PGO) on a Milvus
Standalone deployment. The tests were conducted using the Cohere dataset
with 1 million entries in a 768-dimensional space, using the COSINE
metric type.
Search Metrics:
| Metric | Go 1.20 | Go 1.21 without PGO |
|-------------------------------------------|---------|---------------------|
| Load Duration (seconds) | 1195.95 | 976.37 |
| Queries Per Second (QPS) | 841.62 | 875.89 |
| 99th Percentile Serial Latency (seconds) | 0.0047 | 0.0076 |
| Recall | 0.9487 | 0.9489 |
### Key Findings
Go 1.21 shows a faster index load time and a higher search QPS.
## PGO Performance Test
Milvus already exposes
[net/http/pprof](https://pkg.go.dev/net/http/pprof) on the metrics
endpoint, so we can fetch the CPU profile directly by running
`curl -o default.pgo
"http://${MILVUS_SERVER_IP}:${MILVUS_SERVER_PORT}/debug/pprof/profile?seconds=${TIME_SECOND}"`
to collect the profile as default.pgo during the first search. Then
we build Milvus with PGO and use the same index to run the search again.
The results are below:
Search Metrics
| Metric | Go 1.21 Without PGO | Go 1.21 With PGO | Change (%) |
|---------------------------------------------|------------------|-----------------|------------|
| `search Requests` | 2,644,583 | 2,837,726 | +7.30% |
| `search Fails` | 0 | 0 | N/A |
| `search RT_avg` (ms) | 11.34 | 10.57 | -6.78% |
| `search RT_min` (ms) | 1.39 | 1.32 | -5.18% |
| `search RT_max` (ms) | 349.72 | 143.72 | -58.91% |
| `search TP50` (ms) | 10.57 | 9.93 | -6.05% |
| `search TP99` (ms) | 26.14 | 24.16 | -7.56% |
| `search RPS` | 4,407 | 4,729 | +7.30% |
### Key Findings
PGO led to a notable enhancement in search performance, particularly
reducing the maximum response time by 58.91% and increasing the search QPS
by 7.3%.
### Further Analysis
Generate a diff flame graphs between two CPU profiles by running `go
tool pprof -http=:8000 -diff_base nopgo.pgo pgo.pgo -normalize`
<img width="1894" alt="goprofiling"
src="https://github.com/milvus-io/milvus/assets/167743503/ab9e91eb-95c7-4963-acd9-d1c3c73ee010">
Further insight into HnswIndexNode and the Milvus search handler:
<img width="1906" alt="hnsw"
src="https://github.com/milvus-io/milvus/assets/167743503/a04cf4a0-7c97-4451-b3cf-98afc20a0b05">
<img width="1873" alt="search_handler"
src="https://github.com/milvus-io/milvus/assets/167743503/5f4d3982-18dd-4115-8e76-460f7f534c7f">
After applying PGO to the Milvus server, the CPU utilization of the
faiss::fvec_L2 function decreased. This optimization significantly
enhances the performance of the
[HnswIndexNode::Search::searchKnn](e0c9c41aa2/src/index/hnsw/hnsw.cc (L203))
method, which is frequently invoked by Knowhere during high-concurrency
searches. As explained in the Go release notes, the function may be
more aggressively inlined by the Go compiler during the second build,
guided by the CPU profile collected from the first run. As a result, the search
handler efficiency within the Milvus DataNode has improved, allowing the
server to process a higher number of search queries per second (QPS).
# Conclusion
The combination of Go 1.21 and PGO has led to substantial enhancements
in search performance for Milvus server, particularly in terms of search
QPS and response times, making it more efficient for handling
high-concurrency search operations.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
Query the compaction slots of the datanode, and transfer the control logic for
limiting concurrent compaction tasks from the datacoord to the datanode.
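For illustration only, a minimal sketch of slot-based limiting on the datanode side, using a buffered channel as the slot pool (the slot count and task shape are hypothetical, not this PR's implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// compactionSlots caps how many compaction tasks run concurrently on this
// datanode; the coordinator only needs to query the free slot count.
type compactionSlots struct {
	slots chan struct{}
}

func newCompactionSlots(n int) *compactionSlots {
	return &compactionSlots{slots: make(chan struct{}, n)}
}

func (c *compactionSlots) free() int { return cap(c.slots) - len(c.slots) }

func (c *compactionSlots) run(task func()) {
	c.slots <- struct{}{}        // acquire a slot; blocks when all are busy
	defer func() { <-c.slots }() // release the slot
	task()
}

func main() {
	s := newCompactionSlots(2)
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			s.run(func() { fmt.Println("compacting task", id, "free slots:", s.free()) })
		}(i)
	}
	wg.Wait()
}
```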
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #32910
see also: #32911
When channel exclusive mode is enabled, the replica records channel-node
info in the meta store. If the balance policy changes, meaning channel
exclusive mode is disabled, we should clean up the channel-node
info in the meta store and stop balancing nodes between channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32663
- Use a new param to control the resource request timeout for lazy load.
- Remove the timeout parameter of `Do` and remove `DoWait`; use `context`
to control the timeout (see the sketch below).
- Use `VersionedNotifier` to avoid lost notify events and to support broadcast,
and remove the redundant goroutine in the cache.
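A minimal sketch of the `context`-based timeout style described above; the `Do` signature here is illustrative, not the actual cache API:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Do requests a resource and respects ctx for cancellation/timeout,
// instead of taking an explicit timeout parameter or a separate DoWait.
func Do(ctx context.Context, load func() error) error {
	done := make(chan error, 1)
	go func() { done <- load() }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err() // caller-controlled timeout or cancellation
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	err := Do(ctx, func() error {
		time.Sleep(200 * time.Millisecond) // simulated slow resource request
		return nil
	})
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("lazy load resource request timed out")
	}
}
```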
related dev pr: #32684
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #19095,#29655,#31718
- Change `ListWithPrefix` of the object storage layer to `WalkWithPrefix`,
working in a pipeline mode (see the sketch below).
- File garbage collection is performed in a separate goroutine.
- Segment index recycling cleans index files too.
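A rough sketch of the pipeline idea; the names below are illustrative, not the real storage interface. Instead of materializing the whole listing, keys are streamed through a callback and garbage collection consumes them in another goroutine:

```go
package main

import (
	"fmt"
	"strings"
)

// walkWithPrefix streams matching keys one by one instead of returning the
// whole listing at once, so memory stays flat for large buckets.
func walkWithPrefix(keys []string, prefix string, visit func(key string) bool) {
	for _, k := range keys {
		if strings.HasPrefix(k, prefix) {
			if !visit(k) {
				return // caller asked to stop early
			}
		}
	}
}

func main() {
	keys := []string{"index/1/a", "index/1/b", "binlog/2/c"}

	// Pipeline: the walker produces keys, a separate goroutine recycles them.
	toGC := make(chan string)
	done := make(chan struct{})
	go func() {
		defer close(done)
		for k := range toGC {
			fmt.Println("gc remove", k) // garbage collection in another goroutine
		}
	}()

	walkWithPrefix(keys, "index/", func(k string) bool {
		toGC <- k
		return true
	})
	close(toGC)
	<-done
}
```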
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Use an individual buffer size parameter for imports and set the buffer size
to 64MB.
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Feature introduced:
1. Ensure ImportV2 waits for the index to be built.
Enhancements introduced:
1. Utilization of local time for the timeout ts instead of allocating ts
from rootcoord.
2. Enhanced input file length check for binlog import.
3. Removal of the duplicated manager in the datanode.
4. Renaming of the executor to scheduler in the datanode.
5. Utilization of a thread pool in the datanode scheduler (sketched below).
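A minimal sketch of the thread-pool idea in the datanode scheduler; the pool size and task shape are hypothetical, not the shipped defaults:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const workers = 4 // assumed pool size, not the actual configuration
	tasks := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for taskID := range tasks {
				// Each worker pulls import tasks from the shared queue,
				// bounding concurrency on the datanode.
				fmt.Printf("worker %d runs import task %d\n", worker, taskID)
			}
		}(w)
	}

	for id := 0; id < 10; id++ {
		tasks <- id
	}
	close(tasks)
	wg.Wait()
}
```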
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
The maximum number of import files per request should not exceed 1024 by
default (configurable).
The file size allowed for importing should not exceed 16GB by
default (configurable).
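A minimal validation sketch under the defaults stated above; both limits are configurable, the function and field names are illustrative, and the 16GB bound is treated here as a per-file limit for demonstration:

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxImportFiles    = 1024                    // default max files per import request
	maxImportFileSize = 16 * 1024 * 1024 * 1024 // default max size allowed: 16GB
)

type importFile struct {
	path string
	size int64
}

func validateImportRequest(files []importFile) error {
	if len(files) > maxImportFiles {
		return fmt.Errorf("too many import files: %d > %d", len(files), maxImportFiles)
	}
	for _, f := range files {
		if f.size > maxImportFileSize {
			return errors.New("import file exceeds 16GB: " + f.path)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateImportRequest([]importFile{{path: "a.parquet", size: 1 << 30}}))
}
```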
issue: https://github.com/milvus-io/milvus/issues/28521
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
/kind improvement
fix: #31272
This PR adds more metrics (a declaration sketch follows the list), which are:
- Slow query count, where the duration considered slow is configurable;
- Number of deleted entities;
- Number of entities imported;
- Number of entities per collection;
- Number of loaded entities per collection;
- Number of indexed entities;
- Number of indexed entities, per collection, per index, and whether it is
a vector index;
- Quota states (LongTimeTickDelay, MemoryExhausted, DiskQuotaExhausted)
per database.
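As an illustration of how such metrics are typically declared with the Prometheus Go client; the metric and label names below are made up, not the ones added by this PR:

```go
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	// Counter for queries slower than the configurable threshold.
	slowQueryTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{Name: "example_slow_query_total", Help: "Slow query count."},
		[]string{"db", "collection"},
	)
	// Gauge for the number of loaded entities per collection.
	loadedEntities = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{Name: "example_loaded_entity_num", Help: "Loaded entities per collection."},
		[]string{"collection"},
	)
)

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(slowQueryTotal, loadedEntities)

	slowQueryTotal.WithLabelValues("default", "docs").Inc()
	loadedEntities.WithLabelValues("docs").Set(1_000_000)
	fmt.Println("metrics registered")
}
```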
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
See also #31362
This PR makes the datacoord garbage collection scan operation use a different
interval from other operations.
The interval is a newly added param item whose default value is 7*24
hours.
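A minimal sketch of running the scan on its own, longer interval; the durations are shortened so the example terminates, and 7*24h is only the production default mentioned above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// In production the scan interval defaults to 7*24h while other GC
	// operations keep their own shorter interval; shortened here for a demo.
	scanTick := time.NewTicker(300 * time.Millisecond)
	otherTick := time.NewTicker(100 * time.Millisecond)
	defer scanTick.Stop()
	defer otherTick.Stop()

	deadline := time.After(time.Second)
	for {
		select {
		case <-scanTick.C:
			fmt.Println("run full storage scan") // expensive, runs rarely
		case <-otherTick.C:
			fmt.Println("run other gc operation") // cheap, runs often
		case <-deadline:
			return
		}
	}
}
```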
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR includes the following adjustments:
1. To prevent a backlog of channelCP update tasks, only one task per
vchannel is retained in the updater. Additionally, lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callback function.
2. Checkpoints of multiple vchannels are updated in batches in the
UpdateChannelCheckpoint RPC (default batch size is 128), as sketched after
this list. Additionally, the lock for channelCPs in DataCoord meta has been
switched from a key lock to a global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
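A rough sketch of points 1 and 2; the struct and field names are illustrative. Keep at most one pending task per vchannel, then flush them in batches of up to 128 checkpoints per RPC:

```go
package main

import "fmt"

type checkpoint struct {
	vchannel string
	ts       uint64
}

type channelCPUpdater struct {
	pending map[string]checkpoint // at most one pending task per vchannel
}

func (u *channelCPUpdater) submit(cp checkpoint) {
	// A newer submission for the same vchannel replaces the older one,
	// so the updater cannot back up with duplicate tasks.
	u.pending[cp.vchannel] = cp
}

func (u *channelCPUpdater) flush(batchSize int, send func([]checkpoint)) {
	batch := make([]checkpoint, 0, batchSize)
	for _, cp := range u.pending {
		batch = append(batch, cp)
		if len(batch) == batchSize {
			send(batch) // one UpdateChannelCheckpoint RPC per batch
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		send(batch)
	}
	u.pending = map[string]checkpoint{}
}

func main() {
	u := &channelCPUpdater{pending: map[string]checkpoint{}}
	u.submit(checkpoint{"dml-ch-0", 100})
	u.submit(checkpoint{"dml-ch-0", 101}) // replaces the earlier task
	u.submit(checkpoint{"dml-ch-1", 99})
	u.flush(128, func(cps []checkpoint) { fmt.Println("update", cps) })
}
```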
issue: https://github.com/milvus-io/milvus/issues/30004
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: jaime <yun.zhang@zilliz.com>
Co-authored-by: congqixia <congqi.xia@zilliz.com>
This PR introduces new managerial roles for importv2 (sketched as interfaces below):
1. ImportMeta: manages all the import tasks;
2. ImportScheduler: processes tasks and modifies their states;
3. ImportChecker: checks the completion of all tasks and triggers the
relevant follow-up operations.
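A sketch of how these roles might be expressed as Go interfaces; the method sets here are illustrative, not the actual definitions in the PR:

```go
package importv2

import "context"

// ImportTask is a placeholder for the real task model.
type ImportTask interface {
	ID() int64
	State() string
}

// ImportMeta manages all the import tasks.
type ImportMeta interface {
	AddTask(task ImportTask) error
	GetTask(id int64) ImportTask
	ListTasks() []ImportTask
}

// ImportScheduler processes tasks and moves them through their states.
type ImportScheduler interface {
	Start(ctx context.Context)
	Close()
}

// ImportChecker checks whether all tasks of a job are done and triggers
// the follow-up operations (e.g. updating the job state).
type ImportChecker interface {
	Check(ctx context.Context)
}
```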
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Flush rate control at the collection level to avoid generating too many
segments.
0.1 qps by default.
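For reference, a minimal sketch of a 0.1 qps per-collection limiter with golang.org/x/time/rate; the per-collection map wiring is illustrative, not this PR's implementation:

```go
package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// 0.1 requests per second => at most one flush every 10 seconds per
	// collection (burst of 1), which keeps manual flushes from producing
	// too many small segments.
	limiters := map[string]*rate.Limiter{
		"my_collection": rate.NewLimiter(rate.Limit(0.1), 1),
	}

	for i := 0; i < 3; i++ {
		if limiters["my_collection"].Allow() {
			fmt.Println("flush accepted")
		} else {
			fmt.Println("flush rejected: rate limit exceeded")
		}
	}
}
```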
issue: #29477
Signed-off-by: chyezh <ye.zhen@zilliz.com>
This PR introduces new importv2 roles for the datanode:
1. Executor: executes tasks; an import task is divided into the
following steps: read data -> hash data -> sync data (sketched below);
2. Manager: manages all the tasks.
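A minimal sketch of the read -> hash -> sync flow; the hashing key, bucket count, and sync target are placeholders:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type row struct {
	pk   string
	data []byte
}

// readData stands in for reading rows from the import file.
func readData() []row {
	return []row{{pk: "1"}, {pk: "2"}, {pk: "3"}}
}

// hashData buckets rows by primary key so each bucket maps to one target.
func hashData(rows []row, buckets int) map[uint32][]row {
	out := make(map[uint32][]row)
	for _, r := range rows {
		h := fnv.New32a()
		h.Write([]byte(r.pk))
		b := h.Sum32() % uint32(buckets)
		out[b] = append(out[b], r)
	}
	return out
}

// syncData stands in for writing each bucket out as binlogs.
func syncData(buckets map[uint32][]row) {
	for b, rows := range buckets {
		fmt.Printf("sync bucket %d with %d rows\n", b, len(rows))
	}
}

func main() {
	syncData(hashData(readData(), 2))
}
```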
issue: https://github.com/milvus-io/milvus/issues/28521
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #25639
/kind improvement
When the number of vector columns increases, the number of rows per
segment decreases. To reduce the impact on vector indexing
performance, it is necessary to increase the segment max size limit.
If a collection has multiple vector fields, with memory and disk indexes
on different vector fields, the size limit after segment compaction is
the minimum of segment.maxSize and segment.diskSegmentMaxSize.
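The rule above boils down to taking the minimum of the two limits; a tiny sketch, where the values are examples rather than the shipped defaults:

```go
package main

import "fmt"

// compactedSegmentSizeLimit implements the rule above: when a collection has
// both memory and disk indexes on different vector fields, the post-compaction
// size limit is the minimum of segment.maxSize and segment.diskSegmentMaxSize.
func compactedSegmentSizeLimit(maxSizeMB, diskSegmentMaxSizeMB int64) int64 {
	if diskSegmentMaxSizeMB < maxSizeMB {
		return diskSegmentMaxSizeMB
	}
	return maxSizeMB
}

func main() {
	// Example values only: segment.maxSize=2048MB, segment.diskSegmentMaxSize=1024MB.
	fmt.Println(compactedSegmentSizeLimit(2048, 1024)) // 1024 MB
}
```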
Signed-off-by: xige-16 <xi.ge@zilliz.com>
---------
Signed-off-by: xige-16 <xi.ge@zilliz.com>
See also #27675
After adding global control of sync parallelism in the datanode, we should
change the default value to one more suitable for most use cases.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.
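A rough sketch of the asynchronous warm-up idea; the cache type and loader function are placeholders, not the actual chunk cache implementation:

```go
package main

import (
	"fmt"
	"sync"
)

type chunkCache struct {
	mu   sync.Mutex
	data map[int64][]float32 // segmentID -> original vectors
}

// warmUp asynchronously pre-reads original vector data into the cache during
// load, trading extra disk/cache usage for lower first-query latency.
func (c *chunkCache) warmUp(segmentIDs []int64, read func(int64) []float32, done chan<- struct{}) {
	go func() {
		defer close(done)
		for _, id := range segmentIDs {
			vecs := read(id)
			c.mu.Lock()
			c.data[id] = vecs
			c.mu.Unlock()
		}
	}()
}

func main() {
	c := &chunkCache{data: map[int64][]float32{}}
	done := make(chan struct{})
	c.warmUp([]int64{1, 2}, func(id int64) []float32 { return []float32{float32(id)} }, done)
	<-done
	fmt.Println("warmed segments:", len(c.data))
}
```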
issue: https://github.com/milvus-io/milvus/issues/30181
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Because `clientMaxSendSize` and `serverMaxRecvSize` both limit the RPC
request size, they should use the same config value; likewise,
`serverMaxSendSize` and `clientMaxRecvSize` both limit the RPC response
size, so they should use the same config value too.
This PR fixes an unexpected RPC message size limit caused by wrong usage
stemming from a misunderstanding of the RPC config items.
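As an illustration with grpc-go (the sizes are example values): the client's send limit must pair with the server's receive limit for requests, and the server's send limit with the client's receive limit for responses:

```go
package main

import (
	"google.golang.org/grpc"
)

const (
	requestLimit  = 256 * 1024 * 1024 // clientMaxSendSize must equal serverMaxRecvSize
	responseLimit = 256 * 1024 * 1024 // serverMaxSendSize must equal clientMaxRecvSize
)

func main() {
	// Server side: how large a request it accepts and a response it may send.
	_ = grpc.NewServer(
		grpc.MaxRecvMsgSize(requestLimit),
		grpc.MaxSendMsgSize(responseLimit),
	)

	// Client side: how large a request it may send and a response it accepts.
	_ = []grpc.DialOption{
		grpc.WithDefaultCallOptions(
			grpc.MaxCallSendMsgSize(requestLimit),
			grpc.MaxCallRecvMsgSize(responseLimit),
		),
	}
}
```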
Signed-off-by: Wei Liu <wei.liu@zilliz.com>