fix: #29757
In the previous code, `ColumnBasedInsertMsgToInsertData` added an empty
field whenever the insertMsg parameter lacked a column defined in the
schema. This could lead to unexpected behavior in calling functions.
This PR:
- Add a missing-column check
- Add a column-length check
- Generate BlobInfo for the ColumnBasedInsertMsgToInsertData result
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Benchmarking Milvus with https://github.com/qdrant/vector-db-benchmark on
the 'deep-image-96-angular' dataset, while profiling with perf during the
'upload + index' stage of vector-db-benchmark, shows the following hot
spots:
    39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
            |
            |--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
            |          |
            |          |--17.22%--runtime.memmove
            |          |
            |          |--1.53%--asm_exc_page_fault
            |          ......
            |
            |--18.16%--runtime.memmove
            |
            |--1.66%--asm_exc_page_fault
            ......
The hot path is storage.MergeInsertData(), which updates buffer.buffer
by creating a new 'InsertData' instance and merging both the old
buffer.buffer and addedBuffer into it. Because buffer.buffer is large
(>1M), the Go runtime.memmove calls that copy it become the hot spots.
To avoid this overhead, update storage.MergeInsertData() to append
addedBuffer to buffer.buffer instead of moving both buffer.buffer and
addedBuffer into a new 'InsertData'. This change removes the
runtime.memmove hot spots from the perf profile. In addition, the
'upload + index' time, one of the performance metrics reported by
vector-db-benchmark, is reduced by around 60%.
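The difference between the two merge strategies can be sketched in Go. This is not the actual Milvus code: `InsertData` here is a hypothetical simplification (field ID mapped to an int64 slice), and `mergeIntoNew` / `mergeInPlace` are hypothetical names for the old and new patterns described above.

```go
package main

import "fmt"

// InsertData is a hypothetical, simplified stand-in for storage.InsertData:
// field ID -> column values (a single int64 type keeps the sketch short).
type InsertData struct {
	Data map[int64][]int64
}

// mergeIntoNew follows the old pattern: allocate a new InsertData and copy both
// the existing buffer and the added batch into it. Every call copies the whole
// buffer, so total copying grows quadratically with the number of batches.
func mergeIntoNew(buf, added *InsertData) *InsertData {
	merged := &InsertData{Data: make(map[int64][]int64)}
	for _, src := range []*InsertData{buf, added} {
		for id, vals := range src.Data {
			merged.Data[id] = append(merged.Data[id], vals...)
		}
	}
	return merged
}

// mergeInPlace follows the new pattern: append the added batch to the existing
// buffer. append reallocates only when capacity runs out, and growth is
// amortized, so the buffer is not copied on every call.
func mergeInPlace(buf, added *InsertData) {
	for id, vals := range added.Data {
		buf.Data[id] = append(buf.Data[id], vals...)
	}
}

func main() {
	batch := &InsertData{Data: map[int64][]int64{100: {1, 2, 3}}}

	oldBuf := &InsertData{Data: map[int64][]int64{}}
	newBuf := &InsertData{Data: map[int64][]int64{}}
	for i := 0; i < 4; i++ {
		oldBuf = mergeIntoNew(oldBuf, batch) // copies all of oldBuf each time
		mergeInPlace(newBuf, batch)          // appends only the new batch
	}
	fmt.Println(len(oldBuf.Data[100]), len(newBuf.Data[100])) // prints 12 12
}
```

Both strategies produce the same contents; the in-place version trades immutability of the old buffer for far less copying, so it is only safe when no other code still relies on the previous buffer's slices staying unchanged.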
Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>