Commit Graph

67 Commits

Author SHA1 Message Date
congqixia
3106384fc4
enhance: Return deltadata for DeleteCodec.Deserialize (#37214)
Related to #35303 #30404

This PR changes the return type of `DeleteCodec.Deserialize` from
`storage.DeleteData` to `DeltaData`, which
reduces the memory used by interface headers.

It also refines the `storage.DeltaData` methods to make them easier to use.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 12:04:24 +08:00
smellthemoon
80a7c78f28
enhance: import supports null in parquet and json formats (#35558)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-08-20 16:50:55 +08:00
congqixia
de8a266d8a
enhance: Enable linux code checker (#35084)
See also #34483

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-07-30 15:53:51 +08:00
shaoting-huang
88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to
plain encoding if the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding with
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than the custom encoding method the user provides. Therefore, this patch
only turns off dictionary encoding for the primary key.

In a benchmark with 5 million auto-ID primary keys, the Parquet file size
shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.
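
As a rough illustration of why dictionary encoding does not pay off for unique keys, the sketch below compares two size estimates. It is a simplified model under stated assumptions (fixed 8-byte values, bit-packed dictionary indices, no compression, no page fallback) and does not reproduce Parquet's actual on-disk layout; `plain_size` and `dict_size` are hypothetical helper names. The last line only re-checks the 40% figure from the benchmark numbers quoted above.

```python
def plain_size(values, width=8):
    """Bytes needed to store every value verbatim at a fixed width."""
    return len(values) * width


def dict_size(values, width=8):
    """Bytes for a dictionary of distinct values plus one bit-packed index per row."""
    distinct = len(set(values))
    # Bits needed to address any dictionary entry (at least 1 bit).
    index_bits = max(1, (distinct - 1).bit_length())
    index_bytes = (len(values) * index_bits + 7) // 8
    return distinct * width + index_bytes


# Auto-ID style primary keys: every value is distinct.
unique_pks = list(range(1_000_000))
# A low-cardinality column, for contrast.
repeated = [i % 16 for i in range(1_000_000)]

print("unique  :", plain_size(unique_pks), "plain vs", dict_size(unique_pks), "dictionary")
print("repeated:", plain_size(repeated), "plain vs", dict_size(repeated), "dictionary")

# Relative saving reported in the benchmark: 13.93 MB -> 8.36 MB.
print(f"{1 - 8.36 / 13.93:.1%}")
```

In this model the dictionary only adds overhead when every value is distinct, while it helps considerably for the low-cardinality column.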

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
smellthemoon
2a1356985d
enhance: support null in go payload (#32296)
#31728

---------

Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
2024-06-19 17:08:00 +08:00
shaoting-huang
8cdc0e6233
fix: fix data codec writer close (#33818)
issue:#33813

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-06-18 13:59:57 +08:00
XuanYang-cn
f67b6dc2b0
fix: DeleteData merge wrong data causing data loss (#33820)
See also: #33819

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-06-14 17:57:56 +08:00
Buqian Zheng
8a1017a152
enhance: add helpers to parse sparse float vector in JSON (#32543)
issue: #29419

Added helper functions to parse the JSON representation of sparse float
vectors; they will be used by both the RESTful server and the import utils.
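
A minimal sketch of what such a parsing helper might look like is shown below. The commit message does not spell out the JSON layout, so the two shapes handled here (an index-to-value map and a pair of parallel `indices`/`values` arrays) are assumptions, and `parse_sparse_vector` is a hypothetical name rather than the helper added by this commit.

```python
import json


def parse_sparse_vector(text):
    """Parse a JSON sparse float vector into sorted (index, value) pairs."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("sparse vector must be a JSON object")

    if set(obj) == {"indices", "values"}:
        # Assumed layout 1: {"indices": [1, 3], "values": [0.5, 1.5]}
        indices, values = obj["indices"], obj["values"]
        if len(indices) != len(values):
            raise ValueError("indices and values must have the same length")
        pairs = zip(indices, values)
    else:
        # Assumed layout 2: {"1": 0.5, "3": 1.5}
        pairs = ((int(k), v) for k, v in obj.items())

    result = []
    for index, value in pairs:
        if isinstance(index, bool) or not isinstance(index, int) or index < 0:
            raise ValueError(f"invalid index: {index!r}")
        result.append((index, float(value)))

    result.sort()
    # After sorting, duplicate indices would be adjacent.
    if any(a[0] == b[0] for a, b in zip(result, result[1:])):
        raise ValueError("duplicate index")
    return result


print(parse_sparse_vector('{"3": 0.5, "1": 1.5}'))
print(parse_sparse_vector('{"indices": [3, 1], "values": [0.5, 1.5]}'))
```

Both calls yield the same sorted list of pairs, which is the kind of normalization a shared helper would give the server and the import path.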

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-04-25 14:47:24 +08:00
Buqian Zheng
3c80083f51
feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630)
Add sparse float vector support to different Milvus components,
including proxy, data node to receive and write sparse float vectors to
binlog, query node to handle search requests, index node to build an index
for sparse float columns, etc.

https://github.com/milvus-io/milvus/issues/29419

---------

Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
2024-03-13 14:32:54 -07:00
aoiasd
a0537156c0
enhance: delete codec deserialize data by stream batch (#30407)
relate: https://github.com/milvus-io/milvus/issues/30404

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-02-06 17:04:25 +08:00
XuanYang-cn
d744962aa1
fix: Correct Size calculation of DeleteData (#30397)
This PR corrects the calculation of the actual deltalog size.

See also: #30191

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-02-02 10:47:04 +08:00
Xu Tong
e429965f32
Add float16 approve for multi-type part (#28427)
issue:https://github.com/milvus-io/milvus/issues/22837

Add bfloat16 vector, add the index part of float16 vector.

Signed-off-by: Writer-X <1256866856@qq.com>
2024-01-11 15:48:51 +08:00
Jiquan Long
3f46c6d459
feat: support inverted index (#28783)
issue: https://github.com/milvus-io/milvus/issues/27704

Add an inverted index for some data types in Milvus. This index type can
save a lot of memory compared to loading all data into RAM, and it speeds up
term queries and range queries.

Supported: `INT8`, `INT16`, `INT32`, `INT64`, `FLOAT`, `DOUBLE`, `BOOL`
and `VARCHAR`.

Not supported: `ARRAY` and `JSON`.

Note:
- The inverted index for `VARCHAR` is not designed to serve full-text
search for now. Every row is treated as a single keyword rather than
being tokenized into multiple terms.
- The inverted index doesn't support retrieval well, so if you create an
inverted index for a field, operations that depend on the raw data
fall back to chunk storage, which incurs some performance
loss. Examples include comparisons between two columns and retrieval of
output fields.

The inverted index is easy to use.

Take the collection below as an example:

```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# Assumes a connection to Milvus is already open (e.g. via connections.connect()).
dim = 128  # assumed vector dimension; the original example does not define it

fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="int8", dtype=DataType.INT8),
    FieldSchema(name="int16", dtype=DataType.INT16),
    FieldSchema(name="int32", dtype=DataType.INT32),
    FieldSchema(name="int64", dtype=DataType.INT64),
    FieldSchema(name="float", dtype=DataType.FLOAT),
    FieldSchema(name="double", dtype=DataType.DOUBLE),
    FieldSchema(name="bool", dtype=DataType.BOOL),
    FieldSchema(name="varchar", dtype=DataType.VARCHAR, max_length=1000),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
schema = CollectionSchema(fields)
collection = Collection("demo", schema)
```

Then we can create an inverted index for each field via:

```python
index_type = "INVERTED"
collection.create_index("int8", {"index_type": index_type})
collection.create_index("int16", {"index_type": index_type})
collection.create_index("int32", {"index_type": index_type})
collection.create_index("int64", {"index_type": index_type})
collection.create_index("float", {"index_type": index_type})
collection.create_index("double", {"index_type": index_type})
collection.create_index("bool", {"index_type": index_type})
collection.create_index("varchar", {"index_type": index_type})
```

Then, term queries and range queries on these fields are sped up
automatically by the inverted index:

```python
result = collection.query(expr='int64 in [1, 2, 3]', output_fields=["pk"])
result = collection.query(expr='int64 < 5', output_fields=["pk"])
result = collection.query(expr='int64 > 2997', output_fields=["pk"])
result = collection.query(expr='1 < int64 < 5', output_fields=["pk"])
```
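
To illustrate why term and range queries benefit from an inverted index, here is a toy model in plain Python. It is only a sketch of the general technique (value to row-offset postings plus a sorted value list), not how Milvus implements the index; `ToyInvertedIndex` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict


class ToyInvertedIndex:
    """Maps each distinct value to the row offsets that hold it."""

    def __init__(self, column):
        postings = defaultdict(list)
        for row, value in enumerate(column):
            postings[value].append(row)
        self.postings = dict(postings)
        # Sorted distinct values allow range lookups by binary search.
        self.sorted_values = sorted(self.postings)

    def term(self, values):
        """Rows whose value is in `values`, like `int64 in [1, 2, 3]`."""
        rows = []
        for v in set(values):
            rows.extend(self.postings.get(v, []))
        return sorted(rows)

    def range(self, low=None, high=None):
        """Rows with low < value < high; None leaves that side open."""
        lo = 0 if low is None else bisect.bisect_right(self.sorted_values, low)
        hi = (len(self.sorted_values) if high is None
              else bisect.bisect_left(self.sorted_values, high))
        rows = []
        for v in self.sorted_values[lo:hi]:
            rows.extend(self.postings[v])
        return sorted(rows)


column = [4, 1, 3, 1, 7, 2]
index = ToyInvertedIndex(column)
print(index.term([1, 2, 3]))       # like: int64 in [1, 2, 3]
print(index.range(high=5))         # like: int64 < 5
print(index.range(low=1, high=5))  # like: 1 < int64 < 5
```

Neither query scans the raw column: a term query is a dictionary lookup per value, and a range query is two binary searches followed by a walk over the matching postings. The model also shows why retrieval of raw values is a poor fit, since the index is organized by value rather than by row.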

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2023-12-31 19:50:47 +08:00
congqixia
8a9ab69369
fix: Skip statslog generation flushing empty L0 segment (#28733)
See also #27675

When an L0 segment contains only delta data, merged statslog generation
shall be skipped when performing the sync task.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-11-25 15:10:25 +08:00
yah01
ece592a42f
Deliver L0 segments delete records (#27722)
Signed-off-by: yah01 <yah2er0ne@outlook.com>
2023-11-07 01:44:18 +08:00
XuanYang-cn
7358c3527b
Add iterators (#27643)
See also: #27606

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-18 19:34:08 +08:00
XuanYang-cn
2f16339aac
Enhance InsertData and FieldData (#27436)
1. Add NewInsertData
2. Add GetRowNum(), GetMemorySize(), and Append() for InsertData
3. Add AppendRow() for FieldData for compaction

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2023-10-17 17:36:11 +08:00
SimFG
26f06dd732
Format the code (#27275)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2023-09-21 09:45:27 +08:00
Xu Tong
9166011c4a
Add float16 vector (#25852)
Signed-off-by: Writer-X <1256866856@qq.com>
2023-09-08 10:03:16 +08:00
congqixia
41af0a98fa
Use go-api/v2 for milvus-proto (#24770)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2023-06-09 01:28:37 +08:00
yah01
ebd0279d3f
Check error by Error() and NoError() for better report message (#24736)
Signed-off-by: yah01 <yang.cen@zilliz.com>
2023-06-08 15:36:36 +08:00
aoiasd
c84bdcea49
merge stats log when segment flushing or compacting (#23570)
Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2023-05-29 10:21:28 +08:00
Enwei Jiao
967a97b9bd
Support json & array types (#23408)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: yah01 <yang.cen@zilliz.com>
2023-04-20 11:32:31 +08:00
jaime
c9d0c157ec
Move some modules from internal to public package (#22572)
Signed-off-by: jaime <yun.zhang@zilliz.com>
2023-04-06 19:14:32 +08:00
yah01
081572d31c
Refactor QueryNode (#21625)
Signed-off-by: yah01 <yang.cen@zilliz.com>
Co-authored-by: Congqi Xia <congqi.xia@zilliz.com>
Co-authored-by: aoiasd <zhicheng.yue@zilliz.com>
2023-03-27 00:42:00 +08:00
Xiaofan
949d5d078f
Fix memory calculation in dataCodec (#21800)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2023-01-28 11:09:52 +08:00
Xiaofan
633a749880
Reduce IndexCodec Load Memory (#20621)
Signed-off-by: xiaofan-luan <xiaofan.luan@zilliz.com>
2022-11-18 10:47:08 +08:00
SimFG
a55f739608
Separate public proto files (#19782)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-10-16 20:49:27 +08:00
SimFG
d7f38a803d
Separate some proto files (#19218)
Signed-off-by: SimFG <bang.fu@zilliz.com>
2022-09-16 16:56:49 +08:00
xige-16
4de1bfe5bc
Add cpp data codec (#18538)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
Co-authored-by: zhagnlu lu.zhang@zilliz.com
2022-09-09 22:12:34 +08:00
xige-16
99984b88e1
Support delete varChar value (#16229)
Signed-off-by: xige-16 <xi.ge@zilliz.com>
2022-04-02 17:43:29 +08:00
XuanYang-cn
bccf65ec67
[skip e2e]Update license for storage datacodec (#14039)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-12-23 12:01:36 +08:00
godchen
7e56f08747
Add payload bytes interface. (#13467)
Signed-off-by: godchen0212 <qingxiang.chen@zilliz.com>
2021-12-16 16:35:42 +08:00
XuanYang-cn
48b45d82e5
Add UT for binlog_io to reach 100% coverage (#12283)
Bring DN UT coverage up to 90%
Resolves: #8058

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-11-26 17:43:17 +08:00
godchen
9d5bcd3e3a
Close event and binlog reader (#12173)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-11-22 17:27:14 +08:00
godchen
863f1bb34e
Fix multiple delete data not taking effect (#11422)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-11-09 15:01:17 +08:00
XuanYang-cn
cd06f50645
Remove schema in delete codec (#10517)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-10-24 09:59:10 +08:00
godchen
ffc0c07610
Change delete data primary key to int64 (#10438)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-10-22 15:37:12 +08:00
XuanYang-cn
2255fe0b45
Change deserialize deltalog from 1 blob to blobs (#10085)
See also: #9530

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-10-19 10:28:35 +08:00
godchen
59ab0e441c
Add bloom filter for stats (#9630)
* Add bloom filter for stats

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* trigger GitHub actions

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-10-13 10:22:33 +08:00
dragondriver
dedf745b76
Rename IndexParamsFile to IndexParamsKey (#9563)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-10-09 19:27:02 +08:00
dragondriver
818cf3ffa0
Split blob into several string rows when index file is large (#8919)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-09-30 17:57:01 +08:00
dragondriver
cf8600077f
Refactor the index file format (#8514)
Signed-off-by: dragondriver <jiquan.long@zilliz.com>
2021-09-29 09:52:12 +08:00
godchen
af173dd2a0
Add delete codec (#8736)
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-09-28 14:30:02 +08:00
godchen
db94d7771f
Read vector from disk (#6707)
* Read vector from disk

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* go fmt

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix git action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix test error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix calculation error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change var name

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix len error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* remove unused code

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change bytes to float method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* change float to bytes method

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix action error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-07-24 09:25:22 +08:00
Cai Yudong
a992dcf6a8
Support query return vector output field (#6570)
* improve code readability

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add offset in RetrieveResults

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add VectorFieldInfo into Segment struct

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add new interface for query vector

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update load vector field logic

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update load vector field logic

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fill in field name in query result

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add FieldId into FieldData

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add fillVectorOutputFieldsIfNeeded

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update data_codec_test.go

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add DeserializeFieldData

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* realize query return vector output field

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix static-check

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* disable query vector case

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>
2021-07-16 17:19:55 +08:00
godchen
1c6786f85c
Add blob info (#5792)
* Add blob info

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>

* fix error

Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-06-16 12:03:57 +08:00
Xiangyu Wang
23c4de0eb8
Flush statistics for all int64 fields (#5318)
Resolves: #5262

Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>
2021-05-20 10:38:45 +00:00
Xiangyu Wang
82ccd4cec0
Rename module (#4988)
* Rename module

Signed-off-by: Xiangyu Wang <xiangyu.wang@zilliz.com>
2021-04-22 14:45:57 +08:00
godchen
0dfcb90881
Add storage copyright
Signed-off-by: godchen <qingxiang.chen@zilliz.com>
2021-04-19 11:32:24 +08:00