/kind improvement
This removes one extra copy when loading variable-length data and also
avoids constructing std::string objects, which could lead to memory
fragmentation.
---------
Signed-off-by: yah01 <yah2er0ne@outlook.com>
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Co-authored-by: yah01 <yah2er0ne@outlook.com>
issue: https://github.com/milvus-io/milvus/issues/30687
We store all the varchar data in one contiguous buffer and use
string_view to quickly locate each value. In this case, using
string_view.data() directly (without the view's length) yields a pointer
to the value followed by all the remaining varchar data.
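The pitfall can be reproduced with a minimal sketch. `PackedStrings` and its members are hypothetical names used only for illustration, not the actual segcore types; the sketch assumes values are packed back to back with no terminator between them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical container: all values live back to back in one buffer,
// and each value is located by (offset, size).
class PackedStrings {
 public:
    explicit PackedStrings(const std::vector<std::string>& values) {
        for (const auto& v : values) {
            offsets_.push_back(buffer_.size());
            sizes_.push_back(v.size());
            buffer_ += v;
        }
    }

    // Correct: the view carries both the pointer and the length.
    std::string_view View(std::size_t i) const {
        assert(i < sizes_.size());
        return std::string_view(buffer_.data() + offsets_[i], sizes_[i]);
    }

    // Wrong: treating data() as a C string ignores the length, so the copy
    // runs on until the terminator at the end of the whole buffer.
    std::string WrongCopy(std::size_t i) const {
        return std::string(View(i).data());
    }

    // Right: always pass the size together with the pointer.
    std::string RightCopy(std::size_t i) const {
        std::string_view v = View(i);
        return std::string(v.data(), v.size());
    }

 private:
    std::string buffer_;
    std::vector<std::size_t> offsets_;
    std::vector<std::size_t> sizes_;
};
```

With the values "ab", "cd", "ef" packed this way, `RightCopy(0)` gives "ab" while `WrongCopy(0)` gives "abcdef".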
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
Older versions of Knowhere copy the index data while loading, so we
need to account for this to avoid OOM.
Knowhere provides a utility function that indicates whether it will load
the index with disk. If it will not, we need to double the predicted
memory usage for the index data.
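As a rough sketch of the adjustment described above: `EstimateIndexLoadMemory` is a hypothetical helper, and `loads_with_disk` stands in for the answer returned by the Knowhere utility function, whose real name and signature are not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: adjusts a memory prediction for index data.
// `predicted_bytes` is the prediction before accounting for the load-time copy.
// `loads_with_disk` is assumed to come from the Knowhere utility function.
inline std::uint64_t EstimateIndexLoadMemory(std::uint64_t predicted_bytes,
                                             bool loads_with_disk) {
    if (loads_with_disk) {
        return predicted_bytes;
    }
    // In-memory loading copies the index data once, so budget for two copies.
    assert(predicted_bytes <= std::numeric_limits<std::uint64_t>::max() / 2);
    return predicted_bytes * 2;
}
```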
Signed-off-by: yah01 <yang.cen@zilliz.com>
See also #30651
The append operator (`/`) of `std::filesystem::path` replaces the whole
path when its right-hand operand is an absolute path.
In "All-in-one" mode, this causes ChunkCache to remove the original
vector data file when building the chunk cache during or after the load
procedure.
This PR moves the ChunkCache path generation logic into a separate
function that checks whether the file path is absolute.
If the file path is absolute, the function removes the root path prefix
and returns the concatenated file path.
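The behaviour and one possible workaround can be sketched as follows. `JoinUnderRoot` is a hypothetical name, and using `relative_path()` to drop the root prefix is an assumption about what "removes the root path prefix" means, not necessarily what the PR does.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: always produces a path located under `root`.
// A plain `root / file` would discard `root` when `file` is absolute.
inline fs::path JoinUnderRoot(const fs::path& root, const fs::path& file) {
    assert(!root.empty());
    if (file.is_absolute()) {
        // relative_path() strips the root name and root directory,
        // e.g. "/data/vec.bin" becomes "data/vec.bin" on POSIX.
        return root / file.relative_path();
    }
    return root / file;
}
```

On a POSIX system, `fs::path("/cache") / "/data/vec.bin"` evaluates to `/data/vec.bin`, whereas `JoinUnderRoot("/cache", "/data/vec.bin")` evaluates to `/cache/data/vec.bin`.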
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #29988
This PR adds end-to-end support for wildcard pattern matching.
Before this PR, users could only use prefix matching in their
expressions, for example, "like 'prefix%'". With this PR, more flexible
patterns can be combined.
To do so, this PR makes these changes:
- 1. support regex query on both index and raw data;
- 2. translate the pattern matching to a regex query, so that it can be
handled by the regex query logic;
- 3. loosen the limit of the expression parsing to allow general
pattern matching syntax;
With regex query supported in the segcore backend, we can also easily
add MySQL-like `REGEXP` syntax later.
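The translation step (change 2) can be illustrated with a minimal sketch. `LikeToRegex` and `LikeMatch` are hypothetical helpers, not the translator used in the PR; the sketch assumes `%` matches any sequence of characters, `_` matches exactly one character, and a backslash escapes the next character. It uses `std::regex` only for demonstration.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper: converts a LIKE-style pattern into an ECMAScript regex.
inline std::string LikeToRegex(std::string_view pattern) {
    // Characters that must be escaped to be matched literally in a regex.
    static constexpr std::string_view kRegexMeta = R"re(\.^$|()[]{}*+?)re";
    std::string out;
    bool escaped = false;
    for (char c : pattern) {
        if (!escaped && c == '\\') {
            escaped = true;  // next character is taken literally
            continue;
        }
        if (!escaped && c == '%') {
            out += "[\\s\\S]*";  // any sequence, including newlines
        } else if (!escaped && c == '_') {
            out += "[\\s\\S]";   // exactly one character
        } else {
            if (kRegexMeta.find(c) != std::string_view::npos) {
                out += '\\';
            }
            out += c;
        }
        escaped = false;
    }
    if (escaped) {
        out += "\\\\";  // a trailing backslash matches a literal backslash
    }
    return out;
}

// Matches the whole text against the translated pattern.
inline bool LikeMatch(std::string_view text, std::string_view pattern) {
    const std::regex re(LikeToRegex(pattern));
    return std::regex_match(text.begin(), text.end(), re);
}
```

Escaping regex metacharacters matters here: without it, a pattern such as `a.c` would wrongly match `abc`.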
---------
Signed-off-by: longjiquan <jiquan.long@zilliz.com>
issue: #28521 #29732
This includes:
1. list a collection's import jobs
2. create a new import job
3. get the progress of an import job
Fixes:
1. the mixed-up order of dbName & collectionName #29728
2. keep the trace log the same as v1
3. support traceID
4. azure precheck, blob name cannot end with / #29703
---------
Signed-off-by: PowderLi <min.li@zilliz.com>
According to our benchmark, a concurrency level of 16 is enough to fully
utilize the object storage network bandwidth.
Signed-off-by: yah01 <yang.cen@zilliz.com>
Allows proactive warming up of chunk cache. Original vector data will be
asynchronously loaded into the chunk cache during the load process. It
has the potential to significantly reduce query/search latency for a
certain duration after the load, albeit with a concurrent increase in
disk usage.
issue: https://github.com/milvus-io/milvus/issues/30181
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Before this, every time the index chunk data was written to disk,
there were 4 I/O operations:
- open the file
- seek to the offset
- write the data
- close the file
This change opens the file only once and writes all the data
continuously. It also loads the files from object storage concurrently.
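The open-once pattern can be sketched as below. `WriteChunksSequentially` and `ReadAll` are hypothetical helpers for illustration, not the functions changed in this commit, and the sketch does not cover the concurrent download from object storage.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: opens the file once and appends every chunk in order,
// instead of doing open/seek/write/close for each chunk.
inline void WriteChunksSequentially(const std::string& path,
                                    const std::vector<std::string>& chunks) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        throw std::runtime_error("failed to open " + path);
    }
    for (const auto& chunk : chunks) {
        // Sequential writes need no explicit seek: the stream position
        // already sits at the end of the previous chunk.
        out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
    }
    out.flush();
    if (!out) {
        throw std::runtime_error("failed to write " + path);
    }
}

// Reads the whole file back, used here only to verify the result.
inline std::string ReadAll(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("failed to open " + path);
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```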
Signed-off-by: yah01 <yang.cen@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/29020
JSON cannot pass a max_int32 value to an int32_t, so let Knowhere check
the value range by itself.
After this fix, pymilvus will report:
pymilvus.exceptions.MilvusException: <MilvusException: (code=65535,
message=fail to search on QueryNode 6: worker(6) query failed: => failed
to search: arithmetic overflow: param search_list_size should be at most
2147483647)>
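A range check of this kind might look like the following sketch. `CheckedInt32Param` is a hypothetical helper, not Knowhere's actual API; it assumes the parameter is first read as a 64-bit integer and only narrowed after validation, and it borrows the message wording from the error above.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper: validates that a 64-bit value fits into int32_t
// before narrowing, instead of letting the conversion overflow silently.
inline std::int32_t CheckedInt32Param(const std::string& name,
                                      std::int64_t value) {
    constexpr std::int64_t kMax = std::numeric_limits<std::int32_t>::max();
    constexpr std::int64_t kMin = std::numeric_limits<std::int32_t>::min();
    if (value > kMax) {
        throw std::out_of_range("param " + name + " should be at most " +
                                std::to_string(kMax));
    }
    if (value < kMin) {
        throw std::out_of_range("param " + name + " should be at least " +
                                std::to_string(kMin));
    }
    return static_cast<std::int32_t>(value);
}
```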
Signed-off-by: cqy123456 <qianya.cheng@zilliz.com>