atemerev a day ago

When I come back from Clickhouse to Postgres, I am always shocked. Like, what is it doing for several minutes importing this 20G dump? Shouldn't it take seconds?

joshstrange a day ago | parent [-]

Every time I use Clickhouse I want to blow my brains out, especially knowing that Postgres exists. I’m not saying Clickhouse doesn’t have its place or that Postgres can do everything that Clickhouse can.

What I am saying is that I really dislike working in Clickhouse with all of its weird foot guns. Unless you are using it in a very specific and, in my opinion, limited way, it feels worse than Postgres in every way.

mdaniel 20 hours ago | parent | next [-]

Anything in my life that uses Zookeeper or its dumbass etcd friend means I'm going to have a real bad time. I am thankful they're at least shipping their own ZK-ish replacement, but it seems to have fallen into the same trap as etcd, where membership has to be managed like the precious little pets that they are: https://clickhouse.com/docs/guides/sre/keeper/clickhouse-kee...

jiggawatts 17 hours ago | parent [-]

Zookeeper is the only clustering product I’ve ever used that actively refused to start a cluster after an all-nodes stop/start.

It blows my mind that a high availability system would purposefully prevent availability as a “feature”.

sciurus 40 minutes ago | parent [-]

Although this is oversimplifying things [0], in the face of partitions zookeeper emphasizes consistency over availability.

[0] https://martin.kleppmann.com/2015/05/11/please-stop-calling-...

valyala 4 hours ago | parent | prev | next [-]

Just don't use ClickHouse for OLTP tasks. ClickHouse is an analytical database, which isn't optimized for transactional workloads. Keep calm and use PostgreSQL for OLTP, and ClickHouse for OLAP.

atemerev a day ago | parent | prev [-]

I mostly need analytics, all data is immutable and append-only.

joshstrange a day ago | parent [-]

And that’s exactly the limitedness I’m talking about. If that works for you, Clickhouse is amazing. For things like logs I can 100% see the value.

Other data that is ETL’d and might need to update? That sucks.

edmundsauto a day ago | parent | next [-]

There are design patterns / architectures that data engineers often employ to make this less "sucky". Data modeling is magical! (Specifically talking about things like datelist and cumulative tables)
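
To make that concrete, here is a minimal sketch of a cumulative table in ClickHouse SQL (all table and column names are made up): instead of updating a running total in place, you append a fresh daily snapshot built from yesterday's snapshot plus today's facts, and readers filter on the latest as_of_date.

    CREATE TABLE user_totals
    (
        user_id     UInt64,
        as_of_date  Date,
        total_spend Decimal(18, 2)
    )
    ENGINE = MergeTree
    ORDER BY (user_id, as_of_date);

    -- daily load: yesterday's snapshot plus today's new facts,
    -- append-only; no row is ever updated in place
    INSERT INTO user_totals
    SELECT user_id, today() AS as_of_date, sum(amount) AS total_spend
    FROM
    (
        SELECT user_id, total_spend AS amount
        FROM user_totals
        WHERE as_of_date = yesterday()
        UNION ALL
        SELECT user_id, amount
        FROM purchases           -- hypothetical fact table
        WHERE event_date = today()
    )
    GROUP BY user_id;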

slt2021 19 hours ago | parent | prev | next [-]

You are doing data warehousing wrong; you need to learn the basics of data warehousing best practices.

A data warehouse consists of Slowly Changing Dimensions and Facts. None of these require updates.
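
For example, a Type 2 slowly changing dimension records a changed attribute by appending a new row with a new valid_from, never by updating the old one; a point-in-time lookup then picks the right version. A rough ClickHouse sketch, with invented names:

    CREATE TABLE dim_customer
    (
        customer_id UInt64,
        phone       String,
        valid_from  Date
    )
    ENGINE = MergeTree
    ORDER BY (customer_id, valid_from);

    -- a phone-number change is just another appended row; ASOF JOIN
    -- picks, per fact row, the dimension row with the latest
    -- valid_from <= order_date
    SELECT f.order_id, d.phone
    FROM fact_orders AS f        -- hypothetical fact table
    ASOF JOIN dim_customer AS d
        ON f.customer_id = d.customer_id
       AND f.order_date >= d.valid_from;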

atemerev 20 hours ago | parent | prev [-]

If you can afford rare, batched updates, it sucks much less.

Anyway, yes, if your data is highly mutable, or you cannot do batch writes, then yes, Clickhouse is the wrong choice. Otherwise... it is _really_ hard to ignore a 50x (or more) speedup.
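
For what it's worth, the usual trick for those rare batched updates is a ReplacingMergeTree: an "update" is just a batched re-insert with a higher version, and the newest row per key wins when parts merge (or at read time with FINAL). A minimal sketch, names invented:

    CREATE TABLE geocodes
    (
        address_id UInt64,
        lat        Float64,
        lon        Float64,
        version    UInt64
    )
    ENGINE = ReplacingMergeTree(version)
    ORDER BY address_id;

    -- "update": re-insert corrected rows in a batch with a higher version
    INSERT INTO geocodes VALUES (42, 47.37, 8.54, 2);

    -- FINAL deduplicates at read time, keeping the max version per key
    SELECT * FROM geocodes FINAL WHERE address_id = 42;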

Logs, events, metrics, rarely updated things like phone numbers or geocoding results, archives, embeddings... Whoooop, it slurps all of Reddit in 48 seconds. Straight from S3. Magic.
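
That kind of bulk load is a one-liner with the s3 table function (the target table, bucket path, and format here are made up):

    INSERT INTO reddit_comments   -- hypothetical target table
    SELECT *
    FROM s3('https://my-bucket.s3.amazonaws.com/reddit/*.parquet', 'Parquet');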

If you still want really fast analytics but have more complex scenarios and/or data loading practices, there's also Kinetica... if you can afford the price. For tiny datasets (a few terabytes), DuckDB might be a great choice too. But Postgres is usually the wrong thing to try to make work here.