gkbrk 4 hours ago
My Hacker News items table in ClickHouse has 47,428,860 items, and it's 5.82 GB compressed and 18.18 GB uncompressed. What makes Parquet compression worse here, when both formats are columnar?
0cf8612b2e1e 4 hours ago
Sorting, compression algorithm and level, and data types can all have an impact. I noted elsewhere that a Boolean is being represented as an integer: that's one bit vs 1-4 bytes. There is also flexibility in what you define as the dataset. Skinnier, more focused tables could save space compared with a wide table that covers everything, since the wide layout will probably break up compressible runs of data.
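To make the sorting point concrete, here is a minimal sketch, not a reproduction of either setup. It assumes a hypothetical low-cardinality column (the value names are made up for illustration) and uses Python's `zlib` purely as a stand-in for whatever codec ClickHouse or the Parquet writer is actually configured with.

```python
import random
import zlib


def compressed_size(values, level=6):
    """Return the size in bytes of the zlib-compressed, newline-joined column."""
    raw = "\n".join(values).encode("utf-8")
    return len(zlib.compress(raw, level))


random.seed(0)  # fixed seed so repeated runs see the same data

# Hypothetical low-cardinality column; these values are illustrative only.
column = [random.choice(["story", "comment", "job", "poll"]) for _ in range(100_000)]

# Same values, different row order: sorting produces long runs of repeats.
unsorted_size = compressed_size(column)
sorted_size = compressed_size(sorted(column))

print("unsorted:", unsorted_size, "bytes")
print("sorted:  ", sorted_size, "bytes")

# The compression level is a second knob, independent of row order.
for level in (1, 9):
    print("level", level, "unsorted:", compressed_size(column, level), "bytes")
```

The sorted copy holds exactly the same values, so any size difference comes from row order alone. That is one plausible reason two columnar stores can report quite different sizes for the same dataset.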
xnx 4 hours ago
Parquet has a few compression options. Not sure which one they are using.
boznz 22 minutes ago
...and remove all the political shit-slop since COVID/AI and it's probably under a gig.