SegmentGrowing
Growing segment has the following additional interfaces:
- PreInsert(size) -> reservedOffset: a serial interface that reserves space for a future insertion and returns the reservedOffset.
- Insert(reservedOffset, size, ...Data...): writes ...Data... into the range [reservedOffset, reservedOffset + size). This interface is allowed to be called concurrently.
  - ...Data... contains two system attributes, row_ids and timestamps, plus the other columns.
  - Data columns can be stored either row-based or column-based.
- PreDelete & Delete(reservedOffset, row_ids, timestamps): a delete interface similar to the insert interface.
Growing segment stores data in the form of chunks. The number of rows in each chunk is limited by the size_per_chunk config parameter.
When inserting, first allocate enough space to ensure total_size <= num_chunk * size_per_chunk, and then convert the data from row format to column format.
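The space calculation above can be illustrated with a small sketch. The helper names chunks_needed and locate are hypothetical and do not come from segcore; the sketch assumes total_size is non-negative and size_per_chunk is positive.

```cpp
#include <cstdint>

// Hypothetical helper: the smallest num_chunk that satisfies
// total_size <= num_chunk * size_per_chunk (ceiling division).
inline int64_t chunks_needed(int64_t total_size, int64_t size_per_chunk) {
    return (total_size + size_per_chunk - 1) / size_per_chunk;
}

// Position of one row inside chunked storage.
struct ChunkPosition {
    int64_t chunk_id;
    int64_t offset_in_chunk;
};

// Hypothetical helper: map a segment-wide row offset to its chunk
// and to its index inside that chunk.
inline ChunkPosition locate(int64_t row_offset, int64_t size_per_chunk) {
    return {row_offset / size_per_chunk, row_offset % size_per_chunk};
}
```

With size_per_chunk set to 4, five rows need two chunks, and the row at offset 5 is the second element of the second chunk.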
During a search, each chunk is searched and its results are saved as a 'subquery result'; the subquery results are then reduced into the TopK.
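The reduce step can be sketched as follows. This is only an illustration, not the segcore implementation: Hit and reduce_topk are hypothetical names, and the sketch assumes a metric where a smaller distance is better (for example L2).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One candidate: a row offset in the segment and its distance to the query.
struct Hit {
    int64_t offset;
    float distance;
};

// Hypothetical reduce: merge the per-chunk subquery results and keep the
// k closest hits, ordered from best to worst.
inline std::vector<Hit> reduce_topk(
    const std::vector<std::vector<Hit>>& subquery_results, std::size_t k) {
    std::vector<Hit> merged;
    for (const auto& chunk_hits : subquery_results) {
        merged.insert(merged.end(), chunk_hits.begin(), chunk_hits.end());
    }
    const std::size_t keep = std::min(k, merged.size());
    // Only the first `keep` positions need to be sorted.
    std::partial_sort(
        merged.begin(), merged.begin() + static_cast<std::ptrdiff_t>(keep),
        merged.end(),
        [](const Hit& a, const Hit& b) { return a.distance < b.distance; });
    merged.resize(keep);
    return merged;
}
```

If k is larger than the total number of hits, every hit is returned in sorted order.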
Growing Segment also implements a small batch index for vectors. The parameters of the small batch index are preset in the segcore config.
When a metric type is specified in the schema, an index is built for each chunk with the default parameters to accelerate queries.
SegmentGrowingImpl internal
- SegcoreConfig: contains parameters for Segcore; it has to be specified before a segment is created
- InsertRecord: stores inserted data
- DeleteRecord: pending the delete implementation
- IndexingRecord: contains data that has a small index
- SealedIndexing: record no longer used
SegcoreConfig
- Manages chunk_size and small index parameters
- parse_from can parse from yaml files (this function is not enabled by default)
  - refer to ${milvus}/internal/core/unittest/test_utils/test_segcore.yaml
- default_config offers default parameters
InsertRecord
Used to manage concurrently inserted data, including:
- atomic<int64_t> reserved: reserved space calculation
- AckResponder: calculates which segment to insert and returns the current segment offset
- ConcurrentVector: stores data columns; each column has one concurrent vector
The following steps are executed on insert:
- Serially execute PreInsert(size) -> reserved_offset to allocate memory space; the address range [reserved_offset, reserved_offset + size) is reserved.
- Execute the Insert(reserved_offset, size, ...Data...) interface in parallel to copy data into the reserved range:
  - First, for the ConcurrentVector of each column, call grow_to_at_least to reserve space.
  - For each column's data, call the set_data_raw interface to put the data into the corresponding locations.
  - After execution has finished, call AddSegment of AckResponder to mark the range [reserved_offset, reserved_offset + size) as already inserted.
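The reserve, write, acknowledge flow can be sketched as below. This is a simplified illustration under stated assumptions, not the segcore code: AckSketch and InsertRecordSketch are hypothetical stand-ins, only the two system columns are stored, and storage is preallocated to a fixed capacity instead of being grown with grow_to_at_least.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical stand-in for AckResponder: records finished ranges and
// exposes the offset below which every row has been written.
class AckSketch {
 public:
    void AddSegment(int64_t begin, int64_t end) {
        std::lock_guard<std::mutex> lock(mutex_);
        done_[begin] = end;
        // Advance the watermark across ranges that are now contiguous.
        auto it = done_.find(ack_);
        while (it != done_.end()) {
            ack_ = it->second;
            done_.erase(it);
            it = done_.find(ack_);
        }
    }
    int64_t GetAck() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return ack_;
    }

 private:
    mutable std::mutex mutex_;
    std::map<int64_t, int64_t> done_;  // begin -> end of finished ranges
    int64_t ack_ = 0;
};

// Hypothetical stand-in for InsertRecord with two system columns.
class InsertRecordSketch {
 public:
    explicit InsertRecordSketch(int64_t capacity)
        : row_ids_(capacity), timestamps_(capacity) {}

    // Reserve [offset, offset + size) and return offset.
    int64_t PreInsert(int64_t size) { return reserved_.fetch_add(size); }

    // Reserved ranges are disjoint, so calls for different ranges
    // never write to the same elements.
    void Insert(int64_t reserved_offset, int64_t size,
                const int64_t* row_ids, const int64_t* timestamps) {
        std::copy_n(row_ids, size, row_ids_.begin() + reserved_offset);
        std::copy_n(timestamps, size, timestamps_.begin() + reserved_offset);
        ack_.AddSegment(reserved_offset, reserved_offset + size);
    }

    int64_t AckedRows() const { return ack_.GetAck(); }
    int64_t RowIdAt(int64_t i) const { return row_ids_[i]; }

 private:
    std::atomic<int64_t> reserved_{0};
    AckSketch ack_;
    std::vector<int64_t> row_ids_;
    std::vector<int64_t> timestamps_;
};
```

In this sketch, a range that finishes out of order does not advance the acknowledged offset until every earlier range has also been written, so readers that consult AckedRows only see rows that have been written.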
ConcurrentVector
This is column data storage that supports concurrent insertion. It is composed of multiple data chunks.
- After grow_to_at_least(size) is called, space for no less than size is reserved
- set_data_raw(element_offset, source, element_count): source points to a contiguous piece of data
- get_span(chunk_id): gets the span of the corresponding chunk
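The three operations can be illustrated with a minimal chunked column. ChunkedColumnSketch is a hypothetical, single-threaded simplification: it fixes the element type to int64_t, returns a (pointer, length) pair in place of a span type, and does not attempt the concurrency support of the real ConcurrentVector.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical single-threaded stand-in for a chunked column.
class ChunkedColumnSketch {
 public:
    explicit ChunkedColumnSketch(int64_t size_per_chunk)
        : size_per_chunk_(size_per_chunk) {}

    // Ensure capacity for at least `size` elements by appending whole chunks.
    void grow_to_at_least(int64_t size) {
        const int64_t chunks = (size + size_per_chunk_ - 1) / size_per_chunk_;
        while (static_cast<int64_t>(chunks_.size()) < chunks) {
            chunks_.emplace_back(static_cast<std::size_t>(size_per_chunk_));
        }
    }

    // Copy `element_count` contiguous elements from `source`, starting at
    // `element_offset`; the copy is split wherever it crosses a chunk
    // boundary. The caller must have reserved the space beforehand.
    void set_data_raw(int64_t element_offset, const int64_t* source,
                      int64_t element_count) {
        while (element_count > 0) {
            const int64_t chunk_id = element_offset / size_per_chunk_;
            const int64_t in_chunk = element_offset % size_per_chunk_;
            const int64_t n =
                std::min(element_count, size_per_chunk_ - in_chunk);
            std::copy_n(source, n, chunks_[chunk_id].begin() + in_chunk);
            source += n;
            element_offset += n;
            element_count -= n;
        }
    }

    // Return (pointer, length) for one chunk, standing in for a span.
    std::pair<const int64_t*, int64_t> get_span(int64_t chunk_id) const {
        return {chunks_[chunk_id].data(), size_per_chunk_};
    }

 private:
    int64_t size_per_chunk_;
    std::vector<std::vector<int64_t>> chunks_;
};
```

Because each chunk is a separate allocation, growing the column in this sketch never moves the element data of existing chunks, which is one plausible reason for a chunked layout.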