gopalv 4 hours ago

> What does compression do to query performance?

That section is the most relevant whenever compression in a DB is discussed.

The purpose of a database is to find, aggregate or update data - storage is where the trade-off gets expressed. There are no silver bullets here.

Any method of compression which speeds up either filter rejection or scan rate is better than something that only trades off IO for CPU usage.

For example, dictionary encoding can be slower to read (because you decompress the whole dictionary instead of just doing skip reads after the filter), but not if you can squeeze out an IN clause by turning string comparisons into an O(1) dictionary lookup followed by a simple integer filter. Remember, this filter can be arbitrarily complex (Druid is a great example of this), and then bitmaps can be used because the dictionary index will be a dense 0-N.
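
A rough sketch of that IN-clause trick in Python, just to make it concrete. The function names and the list-based layout are made up for illustration; this is not how Druid or any particular engine stores things.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): distinct values in first-seen order, one integer code per row."""
    index = {}
    codes = []
    for v in values:
        # New values get the next dense code; repeated values reuse their existing code.
        codes.append(index.setdefault(v, len(index)))
    return list(index), codes


def filter_in(dictionary, codes, wanted):
    """Row positions where value IN wanted."""
    # String comparisons happen once per dictionary entry, not once per row.
    hit = [entry in wanted for entry in dictionary]  # dense bitmap indexed by code
    # The per-row work is a plain integer index into that bitmap.
    return [row for row, code in enumerate(codes) if hit[code]]


cities = ["oslo", "paris", "oslo", "rome", "paris", "oslo"]
dictionary, codes = dictionary_encode(cities)
matches = filter_in(dictionary, codes, {"oslo", "rome"})
```

Because the codes are dense, `hit` can be a flat array (or a real bitmap) instead of a hash set.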

Even better if that can feed a deterministic operation like UPPER(), so that you run it over the dictionary hits once instead of once per row. You can even reuse the same hash slot, instead of paying for another dictionary collision check or hash computation.
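
To illustrate the "once per dictionary entry" point, here is a small Python sketch. The `dictionary`/`codes` layout and the helper names are assumptions for the example, and it leaves out the hash-slot reuse part entirely.

```python
def map_dictionary(dictionary, fn):
    """Apply a deterministic function once per distinct value; row codes stay untouched."""
    return [fn(entry) for entry in dictionary]


seen = []  # records every call, only to show how often the function runs


def counting_upper(s):
    seen.append(s)
    return s.upper()


dictionary = ["oslo", "paris", "rome"]  # 3 distinct values
codes = [0, 1, 0, 2, 1, 0]              # 6 rows pointing into the dictionary

upper_dictionary = map_dictionary(dictionary, counting_upper)
# Materializing rows is just an integer lookup; UPPER() is never called here.
rows = [upper_dictionary[code] for code in codes]
```

Here the function runs 3 times for 6 rows; with a low-cardinality column over millions of rows the gap gets much bigger.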

If anyone is looking at JSONB compression, go take a long look at the Variant encoding proposals from Databricks/Snowflake for Iceberg[1].

Turning a single-column "payload" JSONB field into chunks which are columnarized and strictly typed allows you to do all the tricks mentioned here on loosely typed data, chunk by chunk.
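
A very loose Python sketch of the per-chunk idea. This is not the Variant encoding from the spec, just an illustration; `shred_chunk` and its rule (a top-level key becomes a typed column only if every non-null value in the chunk has the same scalar type) are assumptions made up for the example.

```python
import json

SCALARS = {int, float, str, bool}


def shred_chunk(rows):
    """Split one chunk of JSON objects into typed columns plus a per-row residual."""
    docs = [json.loads(r) for r in rows]
    keys = {k for d in docs for k in d}
    columns = {}
    for key in sorted(keys):
        # Simplification: a missing key and a JSON null both become None here.
        values = [d.get(key) for d in docs]
        types = {type(v) for v in values if v is not None}
        if len(types) == 1 and types <= SCALARS:
            columns[key] = (types.pop().__name__, values)
    # Anything that did not fit a single scalar type stays as loosely typed JSON.
    residual = [{k: v for k, v in d.items() if k not in columns} for d in docs]
    return columns, residual


chunk = [
    '{"temp": 21.5, "unit": "C", "tags": ["a"]}',
    '{"temp": 22.0, "unit": "C"}',
    '{"temp": 20.1, "unit": "C", "err": 7}',
]
columns, residual = shred_chunk(chunk)
```

Each chunk decides its own columns, so a key can be a typed column in one chunk and fall back to the residual in another. The typed columns are then candidates for the dictionary and bitmap tricks above.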

[1] - https://github.com/apache/parquet-format/blob/master/Variant...

PaulWaldman 3 hours ago | parent [-]

There’s an issue tracking TimescaleDB JSONB compression: https://github.com/timescale/timescaledb/issues/2978

kevinob11 2 hours ago | parent [-]

Ha, we (my partner at our company) filed this issue 5 years ago. We have a large-ish (but not giant) JSON blob in one of our Timescale tables that I'd love to get better compression on. It changes just frequently enough that we didn't split it into columns, but infrequently enough that it could (I think) be compressed quite nicely. Generally Timescale has been great for us.