milvus/internal/storage
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to plain
encoding once the dictionary size exceeds the dictionary page size
limit. Users can specify a custom fallback encoding with
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than to the encoding method the user provides. Therefore, this patch
only turns off dictionary encoding for the primary key.

In a benchmark with 5 million auto-ID primary keys, the Parquet file size
shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by 40%.
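
A rough size model may help show why dictionary encoding does not pay off
for a column of unique values such as an auto ID primary key. The sketch
below is an illustration under simplifying assumptions, not the Go Parquet
implementation: it ignores page headers, RLE runs, compression, and the
fallback to plain encoding, and the helper names `plainBytes`,
`indexBitWidth`, and `dictBytes` are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// plainBytes estimates the plain-encoded size of n int64 values (8 bytes each).
func plainBytes(n int) int {
	return n * 8
}

// indexBitWidth returns the number of bits needed to address
// `distinct` dictionary entries.
func indexBitWidth(distinct int) int {
	if distinct <= 1 {
		return 0
	}
	return bits.Len(uint(distinct - 1))
}

// dictBytes estimates the dictionary-encoded size of n int64 values:
// a dictionary holding each distinct value once, plus one bit-packed
// index per value. Assumption: no RLE and no page overhead.
func dictBytes(n, distinct int) int {
	dictionary := distinct * 8
	indices := (n*indexBitWidth(distinct) + 7) / 8
	return dictionary + indices
}

func main() {
	const n = 1_000_000
	fmt.Println("plain:              ", plainBytes(n))
	fmt.Println("dict, 100 distinct: ", dictBytes(n, 100))
	fmt.Println("dict, all unique:   ", dictBytes(n, n))
}
```

In this model a low-cardinality column shrinks considerably under
dictionary encoding, while an all-unique column ends up larger than plain
encoding, because the dictionary stores every value once and the indices
are pure overhead.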

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
..
aliyun Identify service providers based on addresses (#27907) 2023-10-25 17:28:10 +08:00
gcp Format the code (#27275) 2023-09-21 09:45:27 +08:00
tencent feat: Support tencent cloud object storage for milvus (#30163) 2024-01-23 11:28:56 +08:00
azure_object_storage_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
azure_object_storage.go enhance: Add nilness linter and fix some small issues (#34049) 2024-06-24 14:52:03 +08:00
binlog_iterator_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_iterator.go enhance: legacy code clean up (#33838) 2024-06-14 14:25:56 +08:00
binlog_reader.go fix: descriptor event in previous version not has nullable to parse error (#34235) 2024-07-01 16:38:06 +08:00
binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_util_test.go Format the code (#27275) 2023-09-21 09:45:27 +08:00
binlog_util.go Move some modules from internal to public package (#22572) 2023-04-06 19:14:32 +08:00
binlog_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_sorter_test.go enhance: add helpers to parse sparse float vector in JSON (#32543) 2024-04-25 14:47:24 +08:00
data_sorter.go feat: [Sparse Float Vector] add sparse vector support to milvus components (#30630) 2024-03-13 14:32:54 -07:00
delta_data_test.go enhance: Add unittest for storage.DeleteLog (#34190) 2024-06-26 17:14:04 +08:00
delta_data.go enhance: Unify DeleteLog parsing code (#34009) 2024-06-21 16:54:01 +08:00
event_data.go enhance: Fix lint issues from recent PRs (#34482) 2024-07-09 10:06:24 +08:00
event_header.go Move some modules from internal to public package (#22572) 2023-04-06 19:14:32 +08:00
event_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
event_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
factory.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
field_stats_test.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_stats.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_value_test.go feat: Define FieldValue, FieldStats and PartitionStats (#30286) 2024-03-06 20:42:37 -08:00
field_value.go enhance: reconstruct scalar part's code for segment-pruner(#30376) (#34346) 2024-07-04 16:36:09 +08:00
index_data_codec_test.go enhance: Add memory size for binlog (#33025) 2024-05-15 12:59:34 +08:00
index_data_codec.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
insert_data_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
insert_data.go enhance: Add lint rule to forbid gogo protobuf (#34594) 2024-07-12 10:19:35 +08:00
local_chunk_manager_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
local_chunk_manager.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
minio_object_storage_test.go fix: Removed minio bucket after use in test (#32624) 2024-04-28 13:51:26 +08:00
minio_object_storage.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
options.go enhance: Support MinIO TLS connection (#31311) 2024-03-21 11:15:20 +08:00
OWNERS [skip ci]Update OWNERS files (#11898) 2021-11-16 15:41:11 +08:00
partition_stats_test.go feat: Define FieldValue, FieldStats and PartitionStats (#30286) 2024-03-06 20:42:37 -08:00
partition_stats.go fix: sync part stats task cannot be finished(#30376) (#34027) 2024-06-24 10:16:02 +08:00
payload_reader_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
pk_statistics.go enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611) 2024-06-13 17:57:56 +08:00
primary_key_test.go Use go-api/v2 for milvus-proto (#24770) 2023-06-09 01:28:37 +08:00
primary_key.go enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486) 2024-05-31 15:41:45 +08:00
primary_keys_test.go enhance: Add PrimaryKeys interface to reduce memory usage (#30405) 2024-02-01 09:57:11 +08:00
primary_keys.go enhance: Add PrimaryKeys interface to reduce memory usage (#30405) 2024-02-01 09:57:11 +08:00
print_binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
print_binlog.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
remote_chunk_manager_test.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
remote_chunk_manager.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
serde_events_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_events.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_test.go enhance: add delta log stream new format reader and writer (#34116) 2024-07-06 09:08:09 +08:00
serde.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
stats_test.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
stats.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
storage_test.go enhance: Remove vector chunk manager (#28569) 2023-11-30 18:00:33 +08:00
types.go enhance: use WalkWithPrefix api for oss, enable piplined file gc (#31740) 2024-04-25 20:41:27 +08:00
unsafe_test.go [skip e2e]Update license for storage unsafe (#14452) 2021-12-28 20:03:56 +08:00
unsafe.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
utils_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
utils.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00