milvus/internal
shaoting-huang 88b373b024
enhance: binlog primary key turn off dict encoding (#34358)
issue: #34357 

Go Parquet uses dictionary encoding by default and falls back to
plain encoding once the dictionary size exceeds the dictionary page
size limit. Users can specify a custom fallback encoding with
`parquet.WithEncoding(ENCODING_METHOD)` in the writer properties. However,
Go Parquet [falls back to plain
encoding](e65c1e295d/go/parquet/file/column_writer_types.gen.go.tmpl (L238))
rather than to the encoding the user specified. Therefore, this patch
only turns off dictionary encoding for the primary key.
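
The fallback behaviour described above can be sketched as a small toy model. This is not the Go Parquet implementation: `columnConfig`, `effectiveEncoding`, the encoding constants, and the 1 MiB limit are hypothetical names and values chosen for illustration, and the model assumes that a column with dictionary encoding disabled is written with the requested encoding directly.

```go
package main

import "fmt"

// Encoding names used by this toy model only.
type Encoding string

const (
	Plain             Encoding = "PLAIN"
	Dictionary        Encoding = "RLE_DICTIONARY"
	DeltaBinaryPacked Encoding = "DELTA_BINARY_PACKED"
)

// columnConfig is a hypothetical stand-in for per-column writer properties.
type columnConfig struct {
	dictEnabled   bool     // whether dictionary encoding is attempted first
	dictPageLimit int      // dictionary page size limit in bytes
	requested     Encoding // encoding requested by the user
}

// effectiveEncoding models the behaviour described in the commit message:
// with dictionary encoding enabled, an oversized dictionary falls back to
// plain encoding and the requested encoding is ignored.
func effectiveEncoding(cfg columnConfig, dictBytes int) Encoding {
	if !cfg.dictEnabled {
		// Assumption: without a dictionary, the requested encoding applies.
		return cfg.requested
	}
	if dictBytes <= cfg.dictPageLimit {
		return Dictionary
	}
	return Plain // the fallback does not consult cfg.requested
}

func main() {
	const limit = 1 << 20 // assumed 1 MiB limit, for illustration only

	// 5 million unique int64 keys need 8 bytes each in a dictionary.
	dictBytes := 5_000_000 * 8

	withDict := columnConfig{dictEnabled: true, dictPageLimit: limit, requested: DeltaBinaryPacked}
	noDict := columnConfig{dictEnabled: false, dictPageLimit: limit, requested: Plain}

	fmt.Println("dictionary on: ", effectiveEncoding(withDict, dictBytes))
	fmt.Println("dictionary off:", effectiveEncoding(noDict, dictBytes))
}
```

In this model an auto ID column, where every value is unique, always outgrows the dictionary limit, so the dictionary attempt cannot pay off and the requested encoding is never reached while dictionary encoding stays enabled.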

In a benchmark with 5 million auto ID primary keys, the Parquet file size
shrinks from 13.93 MB to 8.36 MB when dictionary encoding is turned
off, reducing primary key storage space by about 40%.

Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
2024-07-17 17:47:44 +08:00
..
allocator enhance: Pre-allocate ids for import (#33958) 2024-07-07 21:26:14 +08:00
core fix: correctly set search params when using knowhere iterator (#34731) 2024-07-17 15:13:41 +08:00
datacoord enhance: Add l0 segment entry num quota (#34733) 2024-07-17 17:35:41 +08:00
datanode enhance: Pre-allocate ids for compaction (#34187) 2024-07-17 13:23:42 +08:00
distributed fix: Restful API use deprecate error code cause access log panic. (#34576) 2024-07-12 10:13:35 +08:00
http enhance: add restful api to trigger component stop (#32076) 2024-06-07 10:35:54 +08:00
indexnode feat: support partition key isolation (#34336) 2024-07-11 19:01:35 +08:00
kv enhance: Fix lint issues from recent PRs (#34482) 2024-07-09 10:06:24 +08:00
metastore fix: streaming service related fix patch (#34696) 2024-07-16 15:49:38 +08:00
mocks fix: ut failure for grpc upgrade (#34726) 2024-07-16 21:49:40 +08:00
parser/planparserv2 enhance: Fix lint issues from recent PRs (#34482) 2024-07-09 10:06:24 +08:00
proto enhance: Pre-allocate ids for compaction (#34187) 2024-07-17 13:23:42 +08:00
proxy fix: fix metaCache cleanup issue when listPolicy failed (#34449) 2024-07-16 10:03:38 +08:00
querycoordv2 enhance: Preserve fixed-size memory in delegator node for growing segment. (#34596) 2024-07-15 20:51:46 +08:00
querynodev2 enhance: mark duplicated pk as deleted (#34586) 2024-07-16 14:25:39 +08:00
registry Add querynode client wrapper and avoid grpc in standalone mode (#27781) 2023-10-19 11:10:07 +08:00
rootcoord enhance: Add l0 segment entry num quota (#34733) 2024-07-17 17:35:41 +08:00
storage enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
streamingcoord/server enhance: streaming service grpc utilities (#34436) 2024-07-15 20:49:38 +08:00
streamingnode fix: ut failure for grpc upgrade (#34726) 2024-07-16 21:49:40 +08:00
tso enhance: move rocksmq from internal to pkg module (#33881) 2024-06-25 21:18:15 +08:00
types enhance: Check by proxy rate limiter when delete get data by query. (#30891) 2024-05-23 20:03:40 +08:00
util enhance: binlog primary key turn off dict encoding (#34358) 2024-07-17 17:47:44 +08:00
.mockery.yaml enhance: streaming service grpc utilities (#34436) 2024-07-15 20:49:38 +08:00