# SegmentSealed
In addition to the functions of `SegmentInterface`, `SegmentSealed` provides the following extra interfaces:
1. `LoadIndex(loadIndexInfo)`: load the index. `loadIndexInfo` contains:
   1. `FieldId`
   2. `IndexParams`: index parameters stored as key-value (KV) pairs
   3. `VecIndex`: vector index
2. `LoadFieldData(loadFieldDataInfo)`: load the data of a column, which may be either a scalar column or a vector column
   1. Note: the index and the vector data of the same column may coexist. When both are loaded, the search uses the index in preference to the vector data
3. `DropIndex(fieldId)`: drop and release an existing index of a specified field
4. `DropFieldData(fieldId)`: drop and release existing data for a specified field

Search is executable as long as all the columns involved in the search are loaded.
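
The four extra interfaces can be pictured with a minimal C++ sketch. It is an illustration under stated assumptions, not the actual segcore code: `VecIndex`, `LoadIndexInfo`, `LoadFieldDataInfo` and the empty `SegmentInterface` are simplified stand-ins, and `ToySealedSegment`, `CanSearch`, `SearchUsesIndex` and the `blob` member are hypothetical names introduced only for this example. The toy treats every column like a vector column, so either the data or the index makes it searchable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only; not the real segcore types.
struct VecIndex {};

struct LoadIndexInfo {
    int64_t field_id;                                 // FieldId
    std::map<std::string, std::string> index_params;  // IndexParams as KV pairs
    std::shared_ptr<VecIndex> index;                  // VecIndex
};

struct LoadFieldDataInfo {
    int64_t field_id;
    std::vector<char> blob;  // hypothetical: raw column bytes
};

class SegmentInterface {
 public:
    virtual ~SegmentInterface() = default;
};

// The extra interfaces that a sealed segment adds on top of SegmentInterface.
class SegmentSealed : public SegmentInterface {
 public:
    virtual void LoadIndex(const LoadIndexInfo& info) = 0;
    virtual void LoadFieldData(const LoadFieldDataInfo& info) = 0;
    virtual void DropIndex(int64_t field_id) = 0;
    virtual void DropFieldData(int64_t field_id) = 0;
};

// Hypothetical toy implementation that only tracks what is loaded.
class ToySealedSegment : public SegmentSealed {
 public:
    void LoadIndex(const LoadIndexInfo& info) override { indexes_[info.field_id] = info.index; }
    void LoadFieldData(const LoadFieldDataInfo& info) override { data_[info.field_id] = info.blob; }
    void DropIndex(int64_t field_id) override { indexes_.erase(field_id); }
    void DropFieldData(int64_t field_id) override { data_.erase(field_id); }

    // Search is possible once every involved column has data or an index.
    bool CanSearch(const std::vector<int64_t>& field_ids) const {
        for (int64_t id : field_ids) {
            if (data_.count(id) == 0 && indexes_.count(id) == 0) return false;
        }
        return true;
    }

    // When an index and raw data coexist, the index is preferred.
    bool SearchUsesIndex(int64_t field_id) const { return indexes_.count(field_id) > 0; }

 private:
    std::map<int64_t, std::shared_ptr<VecIndex>> indexes_;
    std::map<int64_t, std::vector<char>> data_;
};
```

In this sketch, loading the data of a field makes `CanSearch` return true for that field, loading an index on top of it makes `SearchUsesIndex` return true, and dropping both makes the field unsearchable again.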
# SegmentSealedImpl internal data definition
1. `row_count_opt_`:
|
2021-12-13 19:34:05 +08:00
|
|
|
|
   1. Filled with the row count when the first entity is loaded
   2. Every column loaded afterwards must have the same row count
3. `xxx_ready_bitset_` & `system_ready_count_`
   1. Record whether the corresponding column is loaded. Each bit in a bitset corresponds to a `FieldOffset`
   2. A query is executable if and only if all the following conditions are met:
      1. `system_ready_count_ == 2`, which means both system columns, RowId and Timestamp, are loaded
      2. The scalar columns involved in the query are loaded
      3. For each vector column involved in the query, either the original data or the index is loaded
4. `scalar_indexings_`: store scalar index
   1. Use `StructuredSortedIndex` in Knowhere
5. `primary_key_index_`: store index for pk column
   1. Use the new `ScalarIndexBase` format
   2. **Note: its functionality may overlap with that of the scalar indexes. It is recommended to replace the scalar indexes with `ScalarIndexBase`.**
6. `field_datas_`: store original data
   1. The `aligned_vector<char>` format guarantees that `int`/`float` data are aligned
7. `SealedIndexingRecord vecindexs_`: store vector index
8. `row_ids_`/`timestamps_`: RowId/Timestamp data
9. `TimestampIndex`: Index for Timestamp column
10. `schema`: schema
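
The row-count rule and the query-readiness conditions listed above can be sketched as follows. This is a minimal illustration assuming C++17, not the actual `SegmentSealedImpl` code: the class `ReadyState` and its `on_*` and `is_query_ready` methods are hypothetical, `field_data_ready_bitset_` and `vecindex_ready_bitset_` are assumed expansions of `xxx_ready_bitset_`, and `std::vector<bool>` stands in for whatever bitset type the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper that models only the readiness bookkeeping.
class ReadyState {
 public:
    explicit ReadyState(std::size_t field_count)
        : field_data_ready_bitset_(field_count, false),
          vecindex_ready_bitset_(field_count, false) {}

    // The first load fixes the row count; every later load must match it.
    void update_row_count(int64_t row_count) {
        if (!row_count_opt_.has_value()) {
            row_count_opt_ = row_count;
        } else if (*row_count_opt_ != row_count) {
            throw std::runtime_error("row count mismatch");
        }
    }

    // Called once for RowId and once for Timestamp.
    void on_system_column_loaded(int64_t row_count) {
        update_row_count(row_count);
        ++system_ready_count_;
    }

    // Bits are indexed by field offset.
    void on_field_data_loaded(std::size_t field_offset, int64_t row_count) {
        update_row_count(row_count);
        field_data_ready_bitset_.at(field_offset) = true;
    }

    void on_vector_index_loaded(std::size_t field_offset) {
        vecindex_ready_bitset_.at(field_offset) = true;
    }

    // All three conditions must hold for the query to be executable.
    bool is_query_ready(const std::vector<std::size_t>& scalar_offsets,
                        const std::vector<std::size_t>& vector_offsets) const {
        if (system_ready_count_ != 2) return false;  // RowId and Timestamp
        for (std::size_t off : scalar_offsets) {
            if (!field_data_ready_bitset_.at(off)) return false;
        }
        for (std::size_t off : vector_offsets) {
            // Either the original data or the index is enough.
            if (!field_data_ready_bitset_.at(off) && !vecindex_ready_bitset_.at(off)) return false;
        }
        return true;
    }

 private:
    std::optional<int64_t> row_count_opt_;
    int system_ready_count_ = 0;
    std::vector<bool> field_data_ready_bitset_;
    std::vector<bool> vecindex_ready_bitset_;
};
```

With this sketch, a segment whose scalar column data and vector index are loaded still reports not ready until both system columns have been loaded, and loading a column with a different row count is rejected.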
# SegmentSealedImpl internal function definition
1. Most functions implement the corresponding functions of the segment interface and are not described again here.
2. `update_row_count`: Used to update the row_count field.
3. `mask_with_timestamps`: uses the Timestamp column to update the search bitmask, which supports the Time Travel function.
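
The idea behind `mask_with_timestamps` can be sketched as below. This is a hedged illustration rather than the real implementation: the signature, the `Timestamp` alias, the use of `std::vector<bool>` as the bitmask, and the convention that a set bit means "exclude this row" are all assumptions made for the example. It also scans the Timestamp column linearly instead of using the Timestamp index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Timestamp = uint64_t;  // assumed representation

// Marks every row written after query_ts as excluded (bit set to true).
// Bits that are already set, e.g. by other filters, are left untouched.
void mask_with_timestamps(const std::vector<Timestamp>& timestamps,
                          Timestamp query_ts,
                          std::vector<bool>& bitmask) {
    if (bitmask.size() != timestamps.size()) {
        throw std::invalid_argument("bitmask and timestamps differ in size");
    }
    for (std::size_t i = 0; i < timestamps.size(); ++i) {
        if (timestamps[i] > query_ts) {
            bitmask[i] = true;  // row is not visible at query_ts
        }
    }
}
```

A search issued with an older `query_ts` then only sees rows whose timestamp is less than or equal to that value, which is the behaviour Time Travel relies on.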