milvus/internal/storage
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to plain
encoding when the dictionary grows past the dictionary page size
limit. Users can request a different encoding by passing
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than to the encoding the user specified. Therefore, this patch
only turns off dictionary encoding for the primary key.
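
As a rough illustration of why dictionary encoding does little for a column of unique keys, here is a minimal sketch of a size model for int64 values. The helper names `plainSize` and `dictSize` are hypothetical and are not part of Go Parquet or Milvus. The model ignores page headers, compression, and the fallback to plain encoding, so it does not reproduce any measured file size.

```go
package main

import (
	"fmt"
	"math/bits"
)

// plainSize returns the bytes needed to store n int64 values with
// plain encoding (8 bytes per value). Hypothetical helper.
func plainSize(n int) int {
	return n * 8
}

// dictSize estimates the bytes needed with dictionary encoding:
// one 8-byte dictionary entry per distinct value, plus one
// bit-packed index per row. Hypothetical helper; headers,
// compression, and run-length encoding of indices are ignored.
func dictSize(n, distinct int) int {
	if n == 0 || distinct == 0 {
		return 0
	}
	// Bits required to address every dictionary entry.
	width := bits.Len(uint(distinct - 1))
	indexBytes := (n*width + 7) / 8
	return distinct*8 + indexBytes
}

func main() {
	const rows = 5_000_000

	// Unique keys (such as auto IDs): the dictionary holds every
	// value once, so the indices are pure overhead.
	fmt.Println("plain, unique keys:     ", plainSize(rows))
	fmt.Println("dictionary, unique keys:", dictSize(rows, rows))

	// Low-cardinality column: the dictionary is tiny and the
	// indices are narrow, so dictionary encoding pays off.
	fmt.Println("dictionary, 100 values: ", dictSize(rows, 100))
}
```

Under this simplified model, a column where every value is distinct always costs more with a dictionary than without, which is consistent with the choice to disable it for the primary key only.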

In a benchmark with 5 million auto-ID primary keys, the Parquet file
size drops from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by about 40%.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
aliyun
gcp
tencent
azure_object_storage_test.go
azure_object_storage.go enhance: Add nilness linter and fix some small issues (#34049) 2024-06-24 14:52:03 +08:00
binlog_iterator_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_iterator.go enhance: legacy code clean up (#33838) 2024-06-14 14:25:56 +08:00
binlog_reader.go fix: descriptor event in previous version not has nullable to parse error (#34235) 2024-07-01 16:38:06 +08:00
binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_util_test.go
binlog_util.go
binlog_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
binlog_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_codec.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
data_sorter_test.go
data_sorter.go
delta_data_test.go enhance: Add unittest for storage.DeleteLog (#34190) 2024-06-26 17:14:04 +08:00
delta_data.go enhance: Unify DeleteLog parsing code (#34009) 2024-06-21 16:54:01 +08:00
event_data.go enhance: Fix lint issues from recent PRs (#34482) 2024-07-09 10:06:24 +08:00
event_header.go
event_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
event_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
event_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
factory.go
field_stats_test.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_stats.go feat: Major compaction (#33620) 2024-06-10 21:34:08 +08:00
field_value_test.go
field_value.go enhance: reconstruct scalar part's code for segment-pruner(#30376) (#34346) 2024-07-04 16:36:09 +08:00
index_data_codec_test.go
index_data_codec.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
insert_data_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
insert_data.go enhance: Add lint rule to forbid gogo protobuf (#34594) 2024-07-12 10:19:35 +08:00
local_chunk_manager_test.go
local_chunk_manager.go
minio_object_storage_test.go
minio_object_storage.go
options.go
OWNERS
partition_stats_test.go
partition_stats.go fix: sync part stats task cannot be finished(#30376) (#34027) 2024-06-24 10:16:02 +08:00
payload_reader_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_reader.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
payload_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload_writer.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
payload.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
pk_statistics.go enhance: Use BatchPkExist to reduce bloom filter func call cost (#33611) 2024-06-13 17:57:56 +08:00
primary_key_test.go
primary_key.go enhance: Remove StringPrimaryKey to reduce unnecessary copy and function call cost (#33486) 2024-05-31 15:41:45 +08:00
primary_keys_test.go
primary_keys.go
print_binlog_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
print_binlog.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
remote_chunk_manager_test.go
remote_chunk_manager.go
serde_events_test.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_events.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
serde_test.go enhance: add delta log stream new format reader and writer (#34116) 2024-07-06 09:08:09 +08:00
serde.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
stats_test.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
stats.go enhance: Use Blocked Bloom Filter instead of basic bloom fitler impl. (#33405) 2024-05-31 17:49:45 +08:00
storage_test.go
types.go
unsafe_test.go
unsafe.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
utils_test.go enhance: support null in go payload (#32296) 2024-06-19 17:08:00 +08:00
utils.go enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00