atemerev a day ago

I mostly need analytics, all data is immutable and append-only.

joshstrange a day ago | parent [-]

And that’s exactly the limitation I’m talking about. If that works for you, ClickHouse is amazing. For things like logs I can 100% see the value.

Other data that is ETL’d and might need to update? That sucks.

edmundsauto a day ago | parent | next [-]

There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (Specifically talking about things like datelist and cumulative tables)
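
Roughly what the cumulative-table idea looks like in practice (a hedged sketch with made-up table and column names; DuckDB is used only because it runs in-process, but the same SQL shape works in ClickHouse): each day's snapshot is derived from the previous snapshot plus that day's append-only events, and written as brand-new rows rather than via UPDATE.

    import duckdb

    con = duckdb.connect()

    # Append-only facts and the cumulative snapshot table they feed.
    con.execute("CREATE TABLE events (user_id INT, event_date DATE)")
    con.execute("""CREATE TABLE user_cumulative
                   (user_id INT, first_seen DATE, last_seen DATE, snapshot_date DATE)""")

    # Yesterday's snapshot plus today's new events.
    con.execute("""INSERT INTO user_cumulative VALUES
                   (1, DATE '2024-01-01', DATE '2024-01-01', DATE '2024-01-01')""")
    con.execute("INSERT INTO events VALUES (1, DATE '2024-01-02'), (2, DATE '2024-01-02')")

    # Today's snapshot = previous snapshot merged with today's events, appended
    # as new rows (snapshot_date = today); nothing is updated in place.
    con.execute("""
        INSERT INTO user_cumulative
        SELECT COALESCE(prev.user_id, ev.user_id),
               COALESCE(prev.first_seen, ev.event_date),
               COALESCE(ev.event_date, prev.last_seen),
               DATE '2024-01-02'
        FROM (SELECT * FROM user_cumulative
              WHERE snapshot_date = DATE '2024-01-01') AS prev
        FULL OUTER JOIN (SELECT user_id, MAX(event_date) AS event_date
                         FROM events WHERE event_date = DATE '2024-01-02'
                         GROUP BY user_id) AS ev
          ON prev.user_id = ev.user_id
    """)

    print(con.sql("""SELECT * FROM user_cumulative
                     WHERE snapshot_date = DATE '2024-01-02'
                     ORDER BY user_id""").fetchall())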

slt2021 21 hours ago | parent | prev | next [-]

You are doing data warehousing wrong; you need to learn the basics of data warehousing best practices.

A Data Warehouse consists of Slowly Changing Dimensions and Facts. None of these require updates.
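
For the "no updates" part, here is a minimal append-only sketch of a slowly changing dimension (illustrative names; DuckDB is used only so it runs in-process, and in ClickHouse you would typically get the same effect with ReplacingMergeTree or argMax()): a change to a customer's phone number is a new version row, and "current" is resolved at query time.

    import duckdb

    con = duckdb.connect()
    con.execute("""CREATE TABLE dim_customer
                   (customer_id INT, phone VARCHAR, valid_from TIMESTAMP)""")

    # The phone number changes: append a new version row, never UPDATE the old one.
    con.execute("""INSERT INTO dim_customer VALUES
                   (42, '+1-555-0100', TIMESTAMP '2023-01-01 00:00:00'),
                   (42, '+1-555-0199', TIMESTAMP '2024-06-01 00:00:00')""")

    # Latest version per customer, picked at read time.
    print(con.sql("""
        SELECT customer_id, phone
        FROM dim_customer
        QUALIFY row_number() OVER (PARTITION BY customer_id
                                   ORDER BY valid_from DESC) = 1
    """).fetchall())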

atemerev a day ago | parent | prev [-]

If you can afford rare, batched updates, it sucks much less.

Anyway, yes: if your data is highly mutable, or you cannot do batch writes, then ClickHouse is the wrong choice. Otherwise... it is _really_ hard to ignore a 50x (or more) speedup.

Logs, events, metrics, rarely updated things like phone numbers or geocoding, archives, embeddings... Whoooop, it slurps the whole of Reddit in 48 seconds. Straight from S3. Magic.
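
For reference, the S3 path is roughly a one-liner on the server side. This is a hedged sketch: the table, columns, and bucket URL are placeholders, and clickhouse_connect is just one client library among several; the actual reading and parsing is done by ClickHouse's s3() table function, not by the client.

    import clickhouse_connect

    # Assumes a ClickHouse server reachable on localhost.
    client = clickhouse_connect.get_client(host="localhost")

    client.command("""
        CREATE TABLE IF NOT EXISTS reddit_comments
        (
            id        String,
            subreddit String,
            body      String,
            created   DateTime
        )
        ENGINE = MergeTree
        ORDER BY (subreddit, created)
    """)

    # One INSERT ... SELECT; ClickHouse downloads and parses the objects itself,
    # in parallel, without the data passing through the client.
    client.command("""
        INSERT INTO reddit_comments
        SELECT id, subreddit, body, created
        FROM s3('https://example-bucket.s3.amazonaws.com/reddit/comments/*.parquet',
                'Parquet')
    """)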

If you still want really fast analytics, but have more complex scenarios and/or data loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually the wrong tool to try to make work here.