This PR splits sealed segment to chunked data to avoid unnecessary
memory copy and save memory usage when loading segments so that loading
can be accelerated.
To support rollback to previous version, we add an option
`multipleChunkedEnable` which is false by default.
Signed-off-by: sunby <sunbingyi1992@gmail.com>
issue: #35922
add an enable_tokenizer param to varchar field: must be set to true so
that a varchar field can enable_match or used as input of BM25 function
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.
Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:
1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.
To address these limitations, This enhancement is needed.
Related Issue: #36212
issue: https://github.com/milvus-io/milvus/issues/36182
* improved `Column.h` to make the code much more readable and
maintainable, and added detailed comments.
* fixed an issue where `ArrayColumn::NumRows()` always returns 0 when
the mmap backing storage is a file.
* removed unused `ColumnBase` constructors and unnecessary members so we
don't get confused.
* Updated `test_chunk_cache.cpp` to make the tests parameterized: to
test both mmap enabled and disabled. Added sparse field in the test to
add coverage.
* re-enabled test `Sealed::GetSparseVectorFromChunkCache`.
* But 2 other disabled tests `Sealed::WarmupChunkCache` and
`Sealed::GetVectorFromChunkCache` remain disabled, there seems to be
errors. @bigsheeper PTAL.
---------
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #35303
This PR utilizes pk index in segment to exclude non-hit delete record
during load delete records. This ability is crucial when l0/delete
forward policy only replies on segment itself(without BF filtering).
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #35941
Previous PR: #36034
This patch makes the switch branching logic correct and make the unit
test work for cases which does not select the whole dataset.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #35941
Previous PR: #35943
This PR make `Trie` index using `MARISA_LABEL_ORDER`, which make
predictive search iterating in lexicographic order.
When trie index is build in label order, lexicographc could be utilized
accelerating `Range` operations.
However according to the official document, using `MARISA_LABEL_ORDER`
will make "exact match lookup, common prefix search, and predictive
search" slower.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
fix not append valid data when transfer to insert record and add a tiny
check when in groupBy field.
#35924
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
Related to #35941
For marisa trie `predictive_search` default behavior, it value iterated
is not in lexicographic order.
This PR is a brute force fix to make range operator returns correct
values.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #35927
There are serveral issue this PR addresses:
- Use `ResetTraceConfig` method instead init one in update event handler
- Implement dynamic stats.Handler to receive tracing config update event
- Update `enable_trace` flag when `ResetTraceConfig` is invoked
- Change `enable_trace` to `std::atomic<bool>` in case of data race
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>