milvus/docs/design_docs/segcore/segment_growing.md

# SegmentGrowing

Growing segment has the following additional interfaces:

1. `PreInsert(size) -> reservedOffset`: a serial interface that reserves space for a future insertion and returns the `reservedOffset`.

2. `Insert(reservedOffset, size, ...Data...)`: writes `...Data...` into the range `[reservedOffset, reservedOffset + size)`. This interface may be called concurrently.

   1. `...Data...` contains two system attributes, row_ids and timestamps, plus the other columns.
   2. Data columns can be stored either row-based or column-based.

3. `PreDelete` and `Delete(reservedOffset, row_ids, timestamps)`: the delete interfaces, which work like the insert interfaces.
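The reserve-then-write pattern behind `PreInsert` and `Insert` can be sketched as follows. This is a minimal illustration, not the segcore implementation: `ToyGrowingSegment`, its fixed capacity, its single `int64_t` column, and the `Get` helper are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy segment with one int64 column and a fixed capacity.
class ToyGrowingSegment {
 public:
    explicit ToyGrowingSegment(int64_t capacity) : values_(capacity, 0) {}

    // Reserve `size` rows and return the first offset of the reserved range.
    int64_t PreInsert(int64_t size) {
        assert(size >= 0);
        int64_t reserved_offset = reserved_.fetch_add(size);
        assert(reserved_offset + size <= static_cast<int64_t>(values_.size()));
        return reserved_offset;
    }

    // Write `size` rows into [reserved_offset, reserved_offset + size).
    // Callers hold disjoint ranges, so writers never touch the same element.
    void Insert(int64_t reserved_offset, int64_t size, const int64_t* data) {
        for (int64_t i = 0; i < size; ++i) {
            values_[reserved_offset + i] = data[i];
        }
    }

    // Hypothetical read helper used only for illustration.
    int64_t Get(int64_t offset) const { return values_[offset]; }

 private:
    std::atomic<int64_t> reserved_{0};
    std::vector<int64_t> values_;
};
```

Because each caller receives its own range from `PreInsert`, the later `Insert` calls can run in any order, or on different threads, without overlapping.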

A growing segment stores data in chunks. The number of rows in each chunk is limited by configuration.

The number of rows per chunk is controlled by the `size_per_chunk` config parameter.

When inserting, first allocate enough space to ensure `total_size <= num_chunk * size_per_chunk`, and then convert data from row format to column format.
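As a rough sketch of these two steps, the helper below computes the smallest `num_chunk` that satisfies the inequality, and a second helper turns row-format data into column-format data. The names `required_chunks`, `rows_to_columns`, `ToyRow`, and `ToyColumns` are assumptions made for illustration and do not come from segcore.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical row layout: one system attribute and one data field.
struct ToyRow {
    int64_t row_id;
    float value;
};

// Hypothetical column layout for the same data.
struct ToyColumns {
    std::vector<int64_t> row_ids;
    std::vector<float> values;
};

// Smallest num_chunk such that total_size <= num_chunk * size_per_chunk.
inline int64_t required_chunks(int64_t total_size, int64_t size_per_chunk) {
    assert(total_size >= 0 && size_per_chunk > 0);
    return (total_size + size_per_chunk - 1) / size_per_chunk;
}

// Convert row-format data into column-format data.
inline ToyColumns rows_to_columns(const std::vector<ToyRow>& rows) {
    ToyColumns cols;
    cols.row_ids.reserve(rows.size());
    cols.values.reserve(rows.size());
    for (const ToyRow& row : rows) {
        cols.row_ids.push_back(row.row_id);
        cols.values.push_back(row.value);
    }
    return cols;
}
```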

During a search, each chunk is searched and its results are saved as a subquery result; the subquery results are then reduced into the TopK.
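The per-chunk search and the final reduce can be sketched as below. This is only an illustration under simplifying assumptions: vectors are one-dimensional floats, the metric is squared L2 distance (smaller is better), and each chunk is scanned by brute force. `Hit`, `search_chunk`, and `search_segment` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One candidate: segment-wide row offset and its distance to the query.
struct Hit {
    int64_t offset;
    float distance;
};

inline bool closer(const Hit& a, const Hit& b) { return a.distance < b.distance; }

// Keep only the `topk` closest hits, sorted by ascending distance.
inline void keep_topk(std::vector<Hit>& hits, std::size_t topk) {
    std::size_t keep = std::min(topk, hits.size());
    std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(keep),
                      hits.end(), closer);
    hits.resize(keep);
}

// Brute-force scan of one chunk; this produces the "subquery result".
inline std::vector<Hit> search_chunk(const std::vector<float>& chunk, int64_t chunk_begin,
                                     float query, std::size_t topk) {
    std::vector<Hit> hits;
    hits.reserve(chunk.size());
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        float diff = chunk[i] - query;
        hits.push_back({chunk_begin + static_cast<int64_t>(i), diff * diff});
    }
    keep_topk(hits, topk);
    return hits;
}

// Search every chunk, then reduce all subquery results into one TopK.
inline std::vector<Hit> search_segment(const std::vector<std::vector<float>>& chunks,
                                       float query, std::size_t topk) {
    std::vector<Hit> merged;
    int64_t chunk_begin = 0;
    for (const auto& chunk : chunks) {
        std::vector<Hit> sub = search_chunk(chunk, chunk_begin, query, topk);
        merged.insert(merged.end(), sub.begin(), sub.end());
        chunk_begin += static_cast<int64_t>(chunk.size());
    }
    keep_topk(merged, topk);
    return merged;
}
```

Keeping only `topk` hits per chunk before merging bounds the size of each subquery result, so the reduce step handles at most `topk` candidates per chunk.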

A growing segment also implements a small-batch index for vectors. The parameters of the small-batch index are preset in `segcore config`.

When `metric type` is specified in the schema, an index is built for each chunk with the default parameters to accelerate queries.

## SegmentGrowingImpl internal

1. SegcoreConfig: contains the parameters for Segcore; it must be specified before a segment is created
2. InsertRecord: stores the inserted data
3. DeleteRecord: awaiting the delete implementation
4. IndexingRecord: contains data with a small index
5. SealedIndexing: record that is no longer used

### SegcoreConfig

1. Manages `chunk_size` and the small index parameters
2. `parse_from` can parse the config from YAML files (this function is not enabled by default)
   - refer to `${milvus}/internal/core/unittest/test_utils/test_segcore.yaml`
3. `default_config` offers default parameters

### InsertRecord

Manages concurrently inserted data, including:

1. `atomic<int64_t> reserved`: tracks the reserved space
2. `AckResponder`: calculates which segment to insert into and returns the current segment offset
3. `ConcurrentVector`: stores the data columns; each column has one concurrent vector

The following steps are executed on insert:

1. Serially execute `PreInsert(size) -> reserved_offset` to allocate memory space; the range `[reserved_offset, reserved_offset + size)` is reserved.
2. Execute `Insert(reserved_offset, size, ...Data...)` in parallel to copy the data into the reserved range:

   - First, for the `ConcurrentVector` of each column, call `grow_to_at_least` to reserve space.
   - For each column's data, call `set_data_raw` to put the data into the corresponding locations.
   - After execution finishes, call `AddSegment` of `AckResponder` to mark the range `[reserved_offset, reserved_offset + size)` as inserted.
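One way to picture the final acknowledgement step is a tracker that records finished ranges and exposes the longest fully written prefix. The sketch below is an assumption about how such bookkeeping could work, not the actual `AckResponder` code: `ToyAckTracker` and `GetAck` are hypothetical names, and only the `AddSegment` name is taken from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical tracker of ranges whose Insert call has finished.
class ToyAckTracker {
 public:
    // Mark [begin, end) as fully written.
    void AddSegment(int64_t begin, int64_t end) {
        assert(begin < end);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_[begin] = end;
        // Advance the acknowledged prefix while the next range is contiguous.
        auto it = pending_.find(ack_);
        while (it != pending_.end()) {
            ack_ = it->second;
            pending_.erase(it);
            it = pending_.find(ack_);
        }
    }

    // Every row in [0, GetAck()) has been inserted.
    int64_t GetAck() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ack_;
    }

 private:
    std::mutex mutex_;
    std::map<int64_t, int64_t> pending_;  // finished ranges not yet contiguous
    int64_t ack_ = 0;
};
```

Since parallel `Insert` calls can finish out of order, a range that completes early stays pending until every range before it has also completed.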

### ConcurrentVector

This is column data storage that supports concurrent insertion. It is composed of multiple data chunks.

1. `grow_to_at_least(size)`: after the call, the reserved space is no less than `size`
2. `set_data_raw(element_offset, source, element_count)`: `source` points to a contiguous piece of data
3. `get_span(chunk_id)`: gets the span of the corresponding chunk
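A simplified chunked column with the same three operations might look like the sketch below. It is an illustration under stated assumptions rather than the real `ConcurrentVector`: `ToyChunkedColumn` is a hypothetical class, elements are `int64_t`, the span is returned as a pointer and length pair, and a mutex guards only the list of chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical chunked column of int64 elements.
class ToyChunkedColumn {
 public:
    explicit ToyChunkedColumn(int64_t size_per_chunk) : size_per_chunk_(size_per_chunk) {
        assert(size_per_chunk > 0);
    }

    // Allocate chunks until at least `size` elements fit.
    void grow_to_at_least(int64_t size) {
        int64_t needed = (size + size_per_chunk_ - 1) / size_per_chunk_;
        std::lock_guard<std::mutex> lock(mutex_);
        while (static_cast<int64_t>(chunks_.size()) < needed) {
            chunks_.push_back(
                std::make_unique<int64_t[]>(static_cast<std::size_t>(size_per_chunk_)));
        }
    }

    // Copy `element_count` contiguous elements from `source`, starting at
    // `element_offset`; the copy may cross chunk boundaries.
    void set_data_raw(int64_t element_offset, const int64_t* source, int64_t element_count) {
        int64_t copied = 0;
        while (copied < element_count) {
            int64_t pos = element_offset + copied;
            int64_t chunk_id = pos / size_per_chunk_;
            int64_t in_chunk = pos % size_per_chunk_;
            int64_t n = std::min(element_count - copied, size_per_chunk_ - in_chunk);
            std::copy(source + copied, source + copied + n, chunk_data(chunk_id) + in_chunk);
            copied += n;
        }
    }

    // Return (pointer, length) for one chunk.
    std::pair<const int64_t*, int64_t> get_span(int64_t chunk_id) {
        return {chunk_data(chunk_id), size_per_chunk_};
    }

 private:
    // Chunk buffers never move once allocated, so the returned pointer stays
    // valid even if another thread appends more chunks later.
    int64_t* chunk_data(int64_t chunk_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(chunk_id >= 0 && chunk_id < static_cast<int64_t>(chunks_.size()));
        return chunks_[static_cast<std::size_t>(chunk_id)].get();
    }

    int64_t size_per_chunk_;
    std::mutex mutex_;
    std::vector<std::unique_ptr<int64_t[]>> chunks_;
};
```

Fixed-size chunks mean growth only appends new buffers and never relocates existing elements, which is what allows writers with disjoint offset ranges to copy data at the same time.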