milvus

mirror of https://gitee.com/milvus-io/milvus.git synced 2024-12-11 09:46:26 +08:00

Author	SHA1	Message	Date
chyezh	e19d17076f	fix: delete may lost when enable lru cache, some field should be reset when ReleaseData (#32012 ) issue: #30361 - Delete may be lost when segment is not data-loaded status in lru cache. skip filtering to fix it. - `stats_` and `variable_fields_avg_size_` should be reset when `ReleaseData` - Remove repeat load delta log operation in lru. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-16 11:17:20 +08:00
Chun Han	337cc0756d	fix: lack good results for insufficient ef(#29883 ) (#32080 ) related: #29883 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-04-13 22:13:23 +08:00
Patrick Weizhi Xu	52ae47c850	enhance: gather materialized view search info once per request (#31996 ) issue: #29892 This PR: 1. Move the process of gathering materialized search info to when the search plan is created, before it goes to each segment, to avoid repeated work and access the plan node under multi-threaded circumstances. 2. Enforce the supported MV type to `VARCHAR` 3. Add integration test Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-04-11 15:21:19 +08:00
cai.zhang	1b767669a4	enhance: Throw error instead of crash when index cannot be built (#31844 ) issue: #27589 --------- Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-04-09 11:51:18 +08:00
cqy123456	aba4993c6c	fix: fix some fp16/bf16 code miss in segcore. (#31771 ) issue：https://github.com/milvus-io/milvus/issues/22837 Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>	2024-04-07 14:13:16 +08:00
Alexander Guzhva	cae5722229	enhance: performance improvements for the bitset (#31753 ) Issue: #31752 This PR improves the performance for bitset utilities (introduced in PR #30454), including varchar filtering Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>	2024-04-06 05:19:22 +08:00
zhagnlu	b2669e26dc	fix:reduce thread pool test time (#31893 ) #31877 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-04-05 10:05:12 +08:00
zhagnlu	d6d3b01a04	fix:remove thread pool timeout test because of high load cpu (#31879 ) #31877 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-04-03 15:55:38 +08:00
Cai Yudong	246586be27	enhance: Unify data type check APIs under internal/core (#31800 ) Issue: #22837 Move and rename following C++ APIs: datatype_sizeof() ==> GetDataTypeSize() datatype_name() ==> GetDataTypeName() datatype_is_vector() / IsVectorType() ==> IsVectorDataType() datatype_is_variable() ==> IsVariableDataType() datatype_is_sparse_vector() ==> IsSparseFloatVectorDataType() datatype_is_string() / IsString() ==> IsDataTypeString() datatype_is_floating() / IsFloat() ==> IsDataTypeFloat() datatype_is_binary() ==> IsDataTypeBinary() datatype_is_json() ==> IsDataTypeJson() datatype_is_array() ==> IsDataTypeArray() datatype_is_variable() == IsDataTypeVariable() datatype_is_integer() / IsIntegral() ==> IsDataTypeInteger() Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>	2024-04-02 19:15:14 +08:00
chyezh	5655ec4fc0	enhance: add mmap usage metrics (#31708 ) issue: #31707 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-01 11:35:12 +08:00
Cai Yudong	675a5dc822	fix: Save traceID and spanID as std::vector into search config (#31278 ) Issue: #30961 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-03-29 14:29:11 +08:00
Jiquan Long	4eb4df1e81	fix: predict inverted index resource usage more reasonably (#31615 ) /kind improvement issue: #31617 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-03-27 19:33:09 +08:00
zhagnlu	659ad81ab7	fix: remove deprecated ut test (#31499 ) #31498 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-03-26 14:01:07 +08:00
Alexander Guzhva	c4b37fb285	enhance: Custom bitset and bitsetview prototypes (#30454 ) Issue: #31285 Basically, I've replaced `FixedVector<bool>` and `boost::dynamic_bitset` with custom bitset and bitsetview in order to reduce the memory bandwidth & increase performance for the filtering. This PR is for internal use only. Current progress (numbers are for GCC 9.5.0 on Ubuntu 22.04 LTS; clang-17 produces better performance numbers): Baseline: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 17903us [ OK ] CApiTest.AssembeChunkPerfTest (183 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 1391us cost: 5us cost: 4us cost: 4us cost: 6us cost: 4us cost: 4us cost: 4us cost: 4us cost: 4us 143 cost: 10us cost: 8us cost: 10us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 8us cost: 9us 8 /home/ubuntu/zilliz/milvus4/milvus/internal/core/unittest/test_expr.cpp:1561: Failure Expected: (cost_op) < (cost_no_op), actual: 143 vs 8 [ FAILED ] Expr.TestMultiLogicalExprsOptimization (7 ms) [ RUN ] Expr.TestExprs start test 3cost: 889us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 3us start test 100cost: 7us start test 200cost: 16us [ OK ] Expr.TestExprs (9 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 124.8us start test type:3 cost: 163.1us start test type:4 cost: 275.9us start test type:5 cost: 590.9us start test type:10 cost: 62.7us start test type:11 cost: 65.9us [ OK ] Expr.TestUnaryBenchTest (1153 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 151.4us start test type:3 cost: 198.4us start test type:4 cost: 361.9us start test type:5 cost: 753.9us start test type:10 cost: 64.6us start test type:11 cost: 62.2us [ OK ] Expr.TestBinaryRangeBenchTest (1151 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 121.14us start test type:3 cost: 156.84us start test type:4 cost: 249.76us start test type:5 cost: 534.44us start test type:10 cost: 82.2us start test type:11 cost: 83.52us [ OK ] Expr.TestLogicalUnaryBenchTest (1202 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 80.64us start test type:3 cost: 78.22us start test type:4 cost: 255.76us start test type:5 cost: 532.04us start test type:10 cost: 89.26us start test type:11 cost: 90us [ OK ] Expr.TestBinaryLogicalBenchTest (1198 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 401.7us start test type:3 cost: 420.96us start test type:4 cost: 418.04us start test type:5 cost: 470.54us start test type:10 cost: 250.32us start test type:11 cost: 850.08us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1273 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 162us start test type:3 cost: 142us start test type:4 cost: 374us start test type:5 cost: 674us start test type:10 cost: 366us start test type:11 cost: 645us [ OK ] Expr.TestCompareExprBenchTest (1214 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 1253us start test 10cost: 1060us start test 20cost: 681us start test 30cost: 522us start test 50cost: 511us start test 100cost: 506us start test 200cost: 497us [ OK ] Expr.TestRefactorExprs (1142 ms) ``` Candidate: ``` [ RUN ] CApiTest.AssembeChunkPerfTest start test cost: 6099us [ OK ] CApiTest.AssembeChunkPerfTest (153 ms) [ RUN ] Expr.TestMultiLogicalExprsOptimization cost: 42us cost: 15us cost: 15us cost: 14us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us cost: 15us 17 cost: 41us cost: 39us cost: 33us cost: 33us cost: 33us cost: 33us cost: 34us cost: 41us cost: 34us cost: 34us 35 [ OK ] Expr.TestMultiLogicalExprsOptimization (6 ms) [ RUN ] Expr.TestExprs start test 3cost: 20us start test 10cost: 2us start test 20cost: 2us start test 30cost: 2us start test 50cost: 4us start test 100cost: 8us start test 200cost: 15us [ OK ] Expr.TestExprs (8 ms) [ RUN ] Expr.TestUnaryBenchTest start test type:2 cost: 55.7us start test type:3 cost: 79.8us start test type:4 cost: 177.6us start test type:5 cost: 337.2us start test type:10 cost: 16.9us start test type:11 cost: 15.7us [ OK ] Expr.TestUnaryBenchTest (1140 ms) [ RUN ] Expr.TestBinaryRangeBenchTest start test type:2 cost: 57.1us start test type:3 cost: 87us start test type:4 cost: 177.5us start test type:5 cost: 342.7us start test type:10 cost: 17.9us start test type:11 cost: 16.7us [ OK ] Expr.TestBinaryRangeBenchTest (1152 ms) [ RUN ] Expr.TestLogicalUnaryBenchTest start test type:2 cost: 34.58us start test type:3 cost: 68.86us start test type:4 cost: 151.38us start test type:5 cost: 286.8us start test type:10 cost: 16.54us start test type:11 cost: 16.7us [ OK ] Expr.TestLogicalUnaryBenchTest (1165 ms) [ RUN ] Expr.TestBinaryLogicalBenchTest start test type:2 cost: 20us start test type:3 cost: 17.1us start test type:4 cost: 154.12us start test type:5 cost: 286.1us start test type:10 cost: 19.6us start test type:11 cost: 19.24us [ OK ] Expr.TestBinaryLogicalBenchTest (1188 ms) [ RUN ] Expr.TestBinaryArithOpEvalRangeBenchExpr start test type:2 cost: 125.7us start test type:3 cost: 111.34us start test type:4 cost: 148.02us start test type:5 cost: 306.7us start test type:10 cost: 149.3us start test type:11 cost: 282.94us [ OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1221 ms) [ RUN ] Expr.TestCompareExprBenchTest start test type:2 cost: 89us start test type:3 cost: 79us start test type:4 cost: 323us start test type:5 cost: 629us start test type:10 cost: 313us start test type:11 cost: 591us [ OK ] Expr.TestCompareExprBenchTest (1228 ms) [ RUN ] Expr.TestRefactorExprs start test 3cost: 874us start test 10cost: 611us start test 20cost: 290us start test 30cost: 294us start test 50cost: 272us start test 100cost: 278us start test 200cost: 279us [ OK ] Expr.TestRefactorExprs (1149 ms) ``` Signed-off-by: Alexandr Guzhva <alexanderguzhva@gmail.com>	2024-03-24 21:49:07 +08:00
Patrick Weizhi Xu	982dd2834b	enhance: add materialized view search info (#30888 ) issue: #29892 This PR 1. Pass Materialized View (MV) search information obtained from the expression parsing planning procedure to Knowhere. It only performs when MV is enabled and the partition key is involved in the expression. The search information includes: 1. Touched field_id and the count of related categories in the expression. E.g., `color == red && color == blue` yields `field_id -> 2`. 2. Whether the expression only includes AND (&&) logical operator, default `true`. 3. Whether the expression has NOT (!) operator, default `false`. 4. Store if turning on MV on the proxy to eliminate reading from paramtable for every search request. 5. Renames to MV. ## Rebuttals 1. Did not write in `ExtractInfoPlanNodeVisitor` since the new scalar framework was introduced and this part might be removed in the future. 2. Currently only interested in `==` and `in` expression, `string` data type, anything else is a bonus. 3. Leave handling expressions like `F == A \|\| F == A` for future works of the optimizer. ## Detailed MV Info ![image](https://github.com/milvus-io/milvus/assets/6563846/b27c08a0-9fd3-4474-8897-30a3d6d6b36f) Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-03-21 11:19:07 +08:00
groot	c81909bfab	enhance: Support MinIO TLS connection (#31311 ) issue: https://github.com/milvus-io/milvus/issues/30709 pr: #31292 Signed-off-by: yhmo <yihua.mo@zilliz.com> Co-authored-by: Chen Rao <chenrao317328@163.com>	2024-03-21 11:15:20 +08:00
Chun Han	6939ad15f2	fix:possible out-of-bound due to groupby when reduing(#30711 ) (#31200 ) related: #30711 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-03-14 13:07:03 +08:00
Buqian Zheng	96cfae55a5	feat: [Sparse Float Vector] segcore to support sparse vector search and get raw vector by id (#30629 ) This PR adds the ability to search/get sparse float vectors in segcore, and added unit tests by modifying lots of existing tests into parameterized ones. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-12 09:16:30 -07:00
Buqian Zheng	070dfc77bf	feat: [Sparse Float Vector] segcore basics and index building (#30357 ) This commit adds sparse float vector support to segcore with the following: 1. data type enum declarations 2. Adds corresponding data structures for handling sparse float vectors in various scenarios, including: * FieldData as a bridge between the binlog and the in memory data structures * mmap::Column as the in memory representation of a sparse float vector column of a sealed segment; * ConcurrentVector as the in memory representation of a sparse float vector of a growing segment which supports inserts. 3. Adds logic in payload reader/writer to serialize/deserialize from/to binlog 4. Adds the ability to allow the index node to build sparse float vector index 5. Adds the ability to allow the query node to build growing index for growing segment and temp index for sealed segment without index built This commit also includes some code cleanness, comment improvement, and some unit tests for sparse vector. https://github.com/milvus-io/milvus/issues/29419 Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>	2024-03-11 14:45:02 +08:00
Cai Yudong	a99143dd52	fix: Save traceID and spanID as hex string into search config (#31071 ) Issue: #30961 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-03-11 14:21:01 +08:00
cai.zhang	1aa97a5c21	enhance: Support more relational operators for binary expressions (#30902 ) issue: #30677 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-03-01 16:57:00 +08:00
Jiquan Long	e2f35954d4	enhance: support pattern matching on json field (#30779 ) issue: https://github.com/milvus-io/milvus/issues/30714 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-28 18:31:00 +08:00
Jiquan Long	16b785e149	enhance: optimize the memory usage and speed up loading variable length data (#30787 ) /kind improvement this removes the 1x copying while loading variable length data, also avoids constructing std::string, which could lead to memory fragmentation --------- Signed-off-by: yah01 <yah2er0ne@outlook.com> Signed-off-by: longjiquan <jiquan.long@zilliz.com> Co-authored-by: yah01 <yah2er0ne@outlook.com>	2024-02-28 16:45:00 +08:00
congqixia	a115b731ed	enhance: fix old pr cpp format issue (#30894 ) Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-28 16:28:20 +08:00
Cai Yudong	8a219e0102	feat: Support knowhere trace using OpenTelemetry (#30750 ) Issue: #21508 Signed-off-by: Yudong Cai <yudong.cai@zilliz.com>	2024-02-28 12:29:00 +08:00
yah01	57397b1307	enhance: add new LRU cache impl (#30360 ) - remove the unused LRU cache - add new LRU cache impl which wraps github.com/karlseguin/ccache related #30361 --------- Signed-off-by: yah01 <yang.cen@zilliz.com>	2024-02-27 20:58:40 +08:00
Jiquan Long	e2330f02f8	fix: pattern match use incorrect raw data (#30764 ) issue: https://github.com/milvus-io/milvus/issues/30687 We store all the varchar datas in an continuous address and use string_view to quickly find them. In this case, using string_view.data() directly will point to all rest varchar datas. --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-22 19:56:52 +08:00
MrPresent-Han	77eb6defb1	feat: support groupby on growing and non-indexed sealed egment(#30307 ) (#30644 ) related: #30308 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-02-21 14:02:53 +08:00
zhagnlu	18aac076de	fix: move test from NEON to X86 (#30324 ) #26137 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-21 11:58:53 +08:00
zhagnlu	976b6fc0e4	enhance: change opendal as compile configurable (#30384 ) #30373 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-20 19:16:52 +08:00
congqixia	18c351efa6	fix: Prevent ChunkCache use absolute path in All-in-one mode (#30666 ) See also #30651 Append operator of `std::filesystem::path` will replace whole path when the param of "/" operation is an absolute path. In "All-in-one" mode, this shall cause ChunkCache removing the original vector data file when building chunk cache during/after load procedure. This PR changes the ChunkCache path generation logic to a separate function in which will check whether the file path is absolute or not. If the file path is absolute, it removes the root path prefix and return concatenated file path. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-02-19 20:58:51 +08:00
zhagnlu	e8a6f1ea2b	fix: erase pk empty check when pk index replace raw data (#30432 ) #30350 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-02-07 14:56:47 +08:00
Jiquan Long	a587450e56	enhance: [skip-e2e] disable asan (#30498 ) fix: #30511 /kind improvement --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-04 21:25:05 +08:00
Jiquan Long	e549148a19	enhance: full-support for wildcard pattern matching (#30288 ) issue: #29988 This pr adds full-support for wildcard pattern matching from end to end. Before this pr, the users can only use prefix match in their expression, for example, "like 'prefix%'". With this pr, more flexible syntax can be combined. To do so, this pr makes these changes: - 1. support regex query both on index and raw data; - 2. translate the pattern matching to regex query, so that it can be handled by the regex query logic; - 3. loose the limit of the expression parsing, which allows general pattern matching syntax; With the support of regex query in segcore backend, we can also add mysql-like `REGEXP` syntax later easily. --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-02-01 12:37:04 +08:00
xige-16	e9fdd2475d	fix: fix searchPlan metricType modified concurrently (#30227 ) issue: #30225 /kind bug Signed-off-by: xige-16 <xi.ge@zilliz.com> --------- Signed-off-by: xige-16 <xi.ge@zilliz.com>	2024-01-26 14:03:09 +08:00
yihao.dai	c02fb64ad6	enhance: Allows proactive warming up of chunk cache (#30182 ) Allows proactive warming up of chunk cache. Original vector data will be asynchronously loaded into the chunk cache during the load process. It has the potential to significantly reduce query/search latency for a certain duration after the load, albeit with a concurrent increase in disk usage. issue: https://github.com/milvus-io/milvus/issues/30181 --------- Signed-off-by: bigsheeper <yihao.dai@zilliz.com>	2024-01-25 19:55:39 +08:00
zhagnlu	8c58d9af67	enhance: optimize marisa trie range search for performance (#30079 ) #30078 #29986 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-01-25 10:07:00 +08:00
Patrick Weizhi Xu	0907d76253	enhance: pass partition key scalar info if enabled when build vector index (#29931 ) issue: #29892 Pass optional scalar IVF offsets to Cardinal Signed-off-by: Patrick Weizhi Xu <weizhi.xu@zilliz.com>	2024-01-24 00:04:55 +08:00
cai.zhang	6cf2f09b60	feat: Support tencent cloud object storage for milvus (#30163 ) issue: #30162 Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>	2024-01-23 11:28:56 +08:00
MrPresent-Han	4436effdc3	enhance: support groupby based on scalar-index(#29965 ) (#30091 ) related: #29965 Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-22 10:50:54 +08:00
Xu Tong	e429965f32	Add float16 approve for multi-type part (#28427 ) issue：https://github.com/milvus-io/milvus/issues/22837 Add bfloat16 vector, add the index part of float16 vector. Signed-off-by: Writer-X <1256866856@qq.com>	2024-01-11 15:48:51 +08:00
congqixia	d6429933a7	enhance: make Load process traceable in querynode & segcore (#29858 ) See also #29803 This PR: - Add trace span for `LoadIndex` & `LoadFieldData` in segment loader - Add `TraceCtx` parameter for `Index.Load` in segcore - Add span for ReadFiles & Engine Load for Memory/Disk Vector index --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-01-10 21:58:51 +08:00
zhagnlu	601a8b801b	fix: add move cursor function to physical expr (#29603 ) #29570 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-01-09 17:08:48 +08:00
zhenshan.cao	60e88fb833	fix: Restore the MVCC functionality. (#29749 ) When the TimeTravel functionality was previously removed, it inadvertently affected the MVCC functionality within the system. This PR aims to reintroduce the internal MVCC functionality as follows: 1. Add MvccTimestamp to the requests of Search/Query and the results of Search internally. 2. When the delegator receives a Query/Search request and there is no MVCC timestamp set in the request, set the delegator's current tsafe as the MVCC timestamp of the request. If the request already has an MVCC timestamp, do not modify it. 3. When the Proxy handles Search and triggers the second phase ReQuery, divide the ReQuery into different shards and pass the MVCC timestamp to the corresponding Query requests. issue: #29656 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-01-09 11:38:48 +08:00
Jiquan Long	e9f3df3626	fix: inverted index file not found (#29695 ) issue: https://github.com/milvus-io/milvus/issues/29654 --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2024-01-07 20:26:49 +08:00
zhagnlu	d07197ab1a	enhance: add compare simd function (#29432 ) #26137 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>	2024-01-07 20:20:57 +08:00
MrPresent-Han	9e2e7157e9	feat: support search_group_by for milvus(#25324 ) (#28983 ) related: #25324 Search GroupBy function, used to aggregate result entities based on a specific scalar column. several points to mention: 1. Temporarliy, the whole groupby is implemented separated from iterative expr framework for the first period 2. In the long term, the groupBy operation will be incorporated into the iterative expr framework:https://github.com/milvus-io/milvus/pull/28166 3. This pr includes some unrelated mocked interface regarding alterIndex due to some unworth-to-mention reasons. All these un-associated content will be removed before the final pr is merged. This version of pr is only for review 4. All other related details were commented in the files comparison Signed-off-by: MrPresent-Han <chun.han@zilliz.com>	2024-01-05 15:50:47 +08:00
PowderLi	c8db36a63a	enhance: get a blob to check object storage config (#29703 ) issue: #29672 the storage account need privileges of actions `Microsoft.Storage/storageAccounts/blobServices/containers/blobs/*` at least Signed-off-by: PowderLi <min.li@zilliz.com>	2024-01-05 14:50:46 +08:00
yah01	99e0f1e65a	enhance: unable to compile C++ tests (#29616 ) The tests need to call a private method, Milvus uses `#define` to replace private with public, the hack trick works but would be broken if the including order changed. This uses friend to make all things work well Signed-off-by: yah01 <yang.cen@zilliz.com> Signed-off-by: yah01 <yah2er0ne@outlook.com>	2024-01-04 13:20:46 +08:00
Jiquan Long	3f46c6d459	feat: support inverted index (#28783 ) issue: https://github.com/milvus-io/milvus/issues/27704 Add inverted index for some data types in Milvus. This index type can save a lot of memory compared to loading all data into RAM and speed up the term query and range query. Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL` and `VARCHAR`. Not supported: `ARRAY` and `JSON`. Note: - The inverted index for `VARCHAR` is not designed to serve full-text search now. We will treat every row as a whole keyword instead of tokenizing it into multiple terms. - The inverted index don't support retrieval well, so if you create inverted index for field, those operations which depend on the raw data will fallback to use chunk storage, which will bring some performance loss. For example, comparisons between two columns and retrieval of output fields. The inverted index is very easy to be used. Taking below collection as an example: ```python fields = [ FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100), FieldSchema(name="int8", dtype=DataType.INT8), FieldSchema(name="int16", dtype=DataType.INT16), FieldSchema(name="int32", dtype=DataType.INT32), FieldSchema(name="int64", dtype=DataType.INT64), FieldSchema(name="float", dtype=DataType.FLOAT), FieldSchema(name="double", dtype=DataType.DOUBLE), FieldSchema(name="bool", dtype=DataType.BOOL), FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000), FieldSchema(name="random", dtype=DataType.DOUBLE), FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim), ] schema = CollectionSchema(fields) collection = Collection("demo", schema) ``` Then we can simply create inverted index for field via: ```python index_type = "INVERTED" collection.create_index("int8", {"index_type": index_type}) collection.create_index("int16", {"index_type": index_type}) collection.create_index("int32", {"index_type": index_type}) collection.create_index("int64", {"index_type": index_type}) collection.create_index("float", {"index_type": index_type}) collection.create_index("double", {"index_type": index_type}) collection.create_index("bool", {"index_type": index_type}) collection.create_index("varchar", {"index_type": index_type}) ``` Then, term query and range query on the field can be speed up automatically by the inverted index: ```python result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"]) result = collection.query(expr='int64 < 5', output_fields=["pk"]) result = collection.query(expr='int64 > 2997', output_fields=["pk"]) result = collection.query(expr='1 < int64 < 5', output_fields=["pk"]) ``` --------- Signed-off-by: longjiquan <jiquan.long@zilliz.com>	2023-12-31 19:50:47 +08:00

1 2 3 4 5 ...

486 Commits