atemerev a day ago

I mostly need analytics; all my data is immutable and append-only.

joshstrange a day ago

And that's exactly the limitation I'm talking about. If that works for you, ClickHouse is amazing. For things like logs, I can 100% see the value.

Other data that is ETL'd and might need updates later? That sucks.

edmundsauto a day ago

There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (I'm specifically talking about things like datelist and cumulative tables.)
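A minimal sketch of the cumulative-table idea in generic SQL: instead of updating per-user totals in place, you append a fresh per-day snapshot built from yesterday's snapshot plus today's new events. Table and column names are made up, and the outer join assumes NULL-producing semantics (in ClickHouse, SETTINGS join_use_nulls = 1):

    INSERT INTO user_activity_cumulative
    SELECT
        COALESCE(prev.user_id, today.user_id)      AS user_id,
        DATE '2024-06-02'                          AS snapshot_date,
        COALESCE(prev.total_events, 0)
            + COALESCE(today.event_count, 0)       AS total_events
    FROM
        (SELECT user_id, total_events
           FROM user_activity_cumulative
          WHERE snapshot_date = DATE '2024-06-01') AS prev
    FULL OUTER JOIN
        (SELECT user_id, COUNT(*) AS event_count
           FROM events
          WHERE event_date = DATE '2024-06-02'
          GROUP BY user_id) AS today
        ON prev.user_id = today.user_id;

No row is ever rewritten; each day's snapshot is a pure append, which is exactly what append-only engines are good at.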

slt2021 19 hours ago

You're doing data warehousing wrong; you need to learn the basics of data warehousing best practices.

A data warehouse consists of Slowly Changing Dimensions and Facts. Neither of these requires updates.
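One way to see that: in an append-only warehouse, a Type 2 slowly changing dimension records a change by inserting a new version row rather than updating the old one, and the current version is resolved at read time. A sketch in ClickHouse SQL, with made-up table and column names:

    -- A customer moved: append a new version row, touch nothing else.
    INSERT INTO dim_customer (customer_id, address, valid_from)
    VALUES (42, '12 New Street', now());

    -- Current address per customer, resolved at query time:
    SELECT customer_id, argMax(address, valid_from) AS current_address
    FROM dim_customer
    GROUP BY customer_id;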

atemerev 20 hours ago

If you can afford rare, batched updates, it sucks much less.
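For reference, ClickHouse does support updates, but only as asynchronous "mutations" that rewrite the affected data parts, which is why they are only reasonable when rare and batched. Table and column names here are illustrative:

    ALTER TABLE events UPDATE country = 'DE' WHERE user_id = 42;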

Anyway, yes: if your data is highly mutable, or you cannot do batch writes, ClickHouse is the wrong choice. Otherwise... it is _really_ hard to ignore a 50x (or more) speedup.
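A common insert-only idiom when you do need occasional corrections is ReplacingMergeTree: model an update as an insert of a newer version and let background merges collapse the duplicates. The schema below is illustrative:

    CREATE TABLE user_phone
    (
        user_id    UInt64,
        phone      String,
        updated_at DateTime
    )
    ENGINE = ReplacingMergeTree(updated_at)
    ORDER BY user_id;

    -- "Updating" a phone number is just another insert:
    INSERT INTO user_phone VALUES (42, '+1-555-0100', now());

    -- FINAL forces deduplication at read time (slower, but exact):
    SELECT * FROM user_phone FINAL WHERE user_id = 42;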

Logs, events, metrics, rarely updated things like phone numbers or geocoding results, archives, embeddings... Whoooop — it slurps the whole of Reddit in 48 seconds. Straight from S3. Magic.
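That kind of bulk load uses ClickHouse's s3() table function. The bucket URL is made up here, and the target table is assumed to already exist with a matching schema:

    INSERT INTO reddit_comments
    SELECT *
    FROM s3('https://example-bucket.s3.amazonaws.com/reddit/*.parquet',
            'Parquet');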

If you still want really fast analytics but have more complex scenarios and/or data-loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually the wrong thing to try to make work.
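At the DuckDB end of that spectrum, there's no server at all; you query Parquet files in place (the file path here is illustrative):

    SELECT COUNT(*) FROM read_parquet('events/*.parquet');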