atemerev a day ago

When I come back from ClickHouse to Postgres, I am always shocked. Like, what is it doing for several minutes importing this 20G dump? Shouldn't it take seconds?

joshstrange a day ago | parent [-]

Every time I use ClickHouse I want to blow my brains out, especially knowing that Postgres exists. I’m not saying ClickHouse doesn’t have its place, or that Postgres can do everything ClickHouse can.

What I am saying is that I really dislike working in ClickHouse, with all of its weird footguns. Unless you are using it in a very specific and, in my opinion, limited way, it feels worse than Postgres in every way.

mdaniel a day ago | parent | next [-]

Anything in my life that uses ZooKeeper or its dumbass etcd friend means I'm going to have a real bad time. I am thankful they're at least shipping their own ZK-ish replacement, but it seems to have fallen into the same trap as etcd, where membership has to be managed like the precious little pets that they are: https://clickhouse.com/docs/guides/sre/keeper/clickhouse-kee...
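
For context, a minimal clickhouse-keeper config fragment in the shape that page documents (hostnames and ports here are invented): every member of the Raft quorum is enumerated by hand, which is the pet-style membership being complained about.

    <!-- hypothetical static member list: each node must be listed explicitly -->
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id>
        <raft_configuration>
            <server><id>1</id><hostname>keeper-1</hostname><port>9234</port></server>
            <server><id>2</id><hostname>keeper-2</hostname><port>9234</port></server>
            <server><id>3</id><hostname>keeper-3</hostname><port>9234</port></server>
        </raft_configuration>
    </keeper_server>

Adding or replacing a node typically means editing this list on every member, rather than nodes discovering each other.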

jiggawatts 19 hours ago | parent [-]

ZooKeeper is the only clustering product I’ve ever used that actively refused to start a cluster after an all-nodes stop/start.

It blows my mind that a high-availability system would purposefully prevent availability as a “feature”.

sciurus 3 hours ago | parent [-]

Although this is oversimplifying things [0], in the face of partitions ZooKeeper emphasizes consistency over availability.

[0] https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

valyala 6 hours ago | parent | prev | next [-]

Just don't use ClickHouse for OLTP tasks. ClickHouse is an analytical database that isn't optimized for transactional workloads. Keep calm and use PostgreSQL for OLTP and ClickHouse for OLAP.

atemerev a day ago | parent | prev [-]

I mostly need analytics; all my data is immutable and append-only.

joshstrange a day ago | parent [-]

And that’s exactly the limited use case I’m talking about. If that works for you, ClickHouse is amazing. For things like logs I can 100% see the value.

Other data that is ETL’d and might need updates? That sucks.

edmundsauto a day ago | parent | next [-]

There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (Specifically, things like datelist and cumulative tables; see the sketch below.)
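
To make that concrete, here is a hypothetical cumulative-table/datelist sketch in ClickHouse SQL (table and column names invented; `events` is assumed to be a raw append-only table). Each day's snapshot is built from the previous day's snapshot plus new events and appended as a fresh partition, so "updates" never touch existing rows.

    CREATE TABLE user_activity_cumulative
    (
        snapshot_date Date,
        user_id       UInt64,
        first_seen    Date,
        last_seen     Date,
        active_dates  Array(Date)  -- the "datelist": full history in one row
    )
    ENGINE = MergeTree
    PARTITION BY snapshot_date
    ORDER BY user_id;

    -- Daily build: yesterday's snapshot FULL JOIN today's events,
    -- appended as a brand-new partition. No row is ever updated.
    INSERT INTO user_activity_cumulative
    SELECT
        today()                              AS snapshot_date,
        coalesce(y.user_id, t.user_id)       AS user_id,
        coalesce(y.first_seen, t.event_date) AS first_seen,
        coalesce(t.event_date, y.last_seen)  AS last_seen,
        arrayConcat(y.active_dates,
                    if(t.event_date IS NULL, [],
                       [assumeNotNull(t.event_date)])) AS active_dates
    FROM
    (
        SELECT user_id, first_seen, last_seen, active_dates
        FROM user_activity_cumulative
        WHERE snapshot_date = today() - 1
    ) AS y
    FULL OUTER JOIN
    (
        SELECT user_id, any(event_date) AS event_date
        FROM events
        WHERE event_date = today()
        GROUP BY user_id
    ) AS t ON y.user_id = t.user_id
    SETTINGS join_use_nulls = 1;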

slt2021 21 hours ago | parent | prev | next [-]

You are doing data warehousing wrong; you need to learn the basics of data warehousing best practices.

A data warehouse consists of Slowly Changing Dimensions and Facts. None of these require updates (see the sketch below).
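
As a minimal illustration of an insert-only slowly changing dimension in ClickHouse SQL (names invented): every change becomes a new versioned row, and "current" is resolved at read time, so no UPDATE is ever issued.

    CREATE TABLE dim_customer
    (
        customer_id UInt64,
        name        String,
        city        String,
        valid_from  DateTime
    )
    ENGINE = MergeTree
    ORDER BY (customer_id, valid_from);

    -- A customer moving city is just another INSERT:
    INSERT INTO dim_customer VALUES
        (42, 'Ada', 'London', '2024-01-01 00:00:00'),
        (42, 'Ada', 'Zurich', '2024-06-01 00:00:00');

    -- Current attributes per customer, resolved at query time:
    SELECT
        customer_id,
        argMax(name, valid_from) AS name,
        argMax(city, valid_from) AS city
    FROM dim_customer
    GROUP BY customer_id;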

atemerev a day ago | parent | prev [-]

If you can afford rare, batched updates, it sucks much less.

Anyway, yes: if your data is highly mutable, or you cannot do batch writes, then ClickHouse is the wrong choice. Otherwise... it is _really_ hard to ignore a 50x (or more) speedup.

Logs, events, metrics, rarely updated things like phone numbers or geocoding results, archives, embeddings... Whoooop: it slurps all of Reddit in 48 seconds. Straight from S3. Magic.
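
The S3 bit refers to ClickHouse's s3() table function; a hypothetical example (bucket path and format invented):

    -- Query Parquet files in place, without loading anything first:
    SELECT count()
    FROM s3('https://my-bucket.s3.amazonaws.com/reddit/comments/*.parquet',
            'Parquet');

    -- Or ingest straight into a local table:
    INSERT INTO reddit_comments
    SELECT *
    FROM s3('https://my-bucket.s3.amazonaws.com/reddit/comments/*.parquet',
            'Parquet');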

If you still want really fast analytics but have more complex scenarios and/or data-loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually the wrong thing to make work here.