1. support read and write null in segcore
will store valid_data(use uint8_t type to save memory) in fieldData.
2. support load null
binlog reader read and write data into column(sealed segment),
insertRecord(growing segment). In sealed segment, store valid_data
directly. In growing segment, considering prior implementation and easy
code reading, it covert uint8_t to fbvector<bool>, which may optimize in
future.
3. retrieve valid_data.
parse valid_data in search/query.
#31728
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
issue: #34595
pr#34596 to we add an overloaded factor to segment in delegator, which
cause same segment got different score in delegator and worker. which
may cause segment bounce between delegator and worker.
This PR use average score to compute the delegator overloaded factor, to
avoid segment bounce between delegator and worker.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related with #34917
This PR updates the func utils get IP address function to support
getting a IPv6 address if it is available and return it. I've tested
that this works locally when running inside a docker container that has
an IPv6 network address.
This PR should address
https://github.com/milvus-io/milvus/discussions/20217
Thank you for review!
Signed-off-by: David Pichler <david.pichler@grainger.com>
Co-authored-by: David Pichler <david.pichler@grainger.com>
issue: #34685
knowhere needs a new json param `range_search_k` for RangeSearch to
early terminate the iterator.
Signed-off-by: min.tian <min.tian.cn@gmail.com>
issue: #33285
- make message builder and message conversion type safe
- add adaptor type and function to adapt old msgstream msgpack and
message interface
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #34781
when balance segment hasn't finished yet, query coord may found 2 loaded
copy of segment, then it will generate task to deduplicate, which may
cancel the balance task. then the old copy has been released, and the
new copy hasn't be ready yet but canceled, then search failed by segment
lack.
this PR set deduplicate segment task's proirity to low, to avoid balance
segment task canceled by deduplicate task.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
1. Move the common modules of streamingNode and dataNode to flushcommon
2. Add new GetVChannels interface for rootcoord
issue: https://github.com/milvus-io/milvus/issues/33285
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Related to #33235
THe querynode pipeline will make map & call ProcessInsert when there is
no write messages. So querynodes will have high CPU usage even when
there is no workload.
This PR check msg length before composing data struct and calling method
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
add scalar filtering and vector search latency metrics to distinguish
the cost of scalar filtering.
To add metrics in query chain, add a monitor module and move the metric
files from original storage module.
issue: #34780
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
Seals the largest growing segment if the total size of growing segments
of each shard exceeds the size threshold(default 4GB). Introducing this
policy can help keep the size of growing segments within a suitable
level, alleviating the pressure on the delegator.
issue: https://github.com/milvus-io/milvus/issues/34554
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #34357
Go Parquet uses dictionary encoding by default, and it will fall back to
plain encoding if the dictionary size exceeds the dictionary size page
limit. Users can specify custom fallback encoding by using
`parquet.WithEncoding(ENCODING_METHOD)` in writer properties. However,
Go Parquet [fallbacks to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than custom encoding method users provide. Therefore, this patch
only turns off dictionary encoding for the primary key.
With a 5 million auto ID primary key benchmark, the parquet file size
improves from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
See also #34670
This PR add quota configuration for l0 segment entry number per
collection. If l0 compaction cannot keep up the insertion/upsertion
rate, this feature could back press the related rate.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
This PR removes the dependency of compaction on the ID allocator by
pre-allocating the logID and segmentID.
issue: https://github.com/milvus-io/milvus/issues/33957
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>