jascha_eng 3 days ago

Depending on your workload you might also be able to use Timescale to have very fast analytical queries inside postgres directly. That avoids having to replicate the data altogether.

Note that I work for the company that built timescale (Tiger Data). Clickhouse is cool though, just throwing another option into the ring.

Tbf, in terms of speed ClickHouse pulls ahead on most benchmarks. But if you want to join a lot with your Postgres data directly, you might benefit from having everything in one place. And of course you avoid the sync overhead.

oulipo2 3 days ago | parent [-]

I'm indeed already using TimescaleDB, and I was wondering if I would really gain something from adding ClickHouse.

wkrp 3 days ago | parent | next [-]

I was using Timescale for a small project of mine and eventually switched to ClickHouse. While there was a 2-4x disk space reduction, the major benefits have been operational (updates & backups). The documentation is also much better, since Timescale's mixes in their cloud product documentation, really muddying the water.

Despite that, man it is really nice to be able to join your non-timeseries data in your queries (perhaps an FDW will allow this for ClickHouse? I need to look into that). If you don't have to deal with the operations side too much and performance isn't a problem, Timescale is really nice.

oulipo2 3 days ago | parent [-]

Can you tell me more about why Timescale doesn't perform, in your opinion? My use case for Timescale would be to gather my IoT telemetry data (perhaps 20 to 100 points per second) and store e.g. 1 year's worth of it to do some analysis and query past data, then offload older data to Parquet files on S3.

I'd like to be able to use that for alert detection, etc., and some dashboard metrics, so I was thinking it was the kind of perfect use case for Timescale. But because I haven't been using it "at scale" yet (not deployed yet), I don't know how it will behave.
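(For reference, the alert-detection part of this is often just a threshold check over recent readings. A minimal sketch in Python, with the row shape, device names, and threshold all invented for illustration; in practice the rows would come from a TimescaleDB query:)

```python
# Hypothetical threshold-based alert check over telemetry points.
# Rows are (timestamp, device_id, value); all values here are made up.

THRESHOLD = 80.0  # hypothetical alert threshold

def detect_alerts(points):
    """Return device ids whose latest reading exceeds THRESHOLD."""
    latest = {}
    # Sorting by timestamp means the last write per device wins.
    for ts, device_id, value in sorted(points):
        latest[device_id] = value
    return [d for d, v in latest.items() if v > THRESHOLD]

points = [
    (1, "sensor-a", 55.0),
    (2, "sensor-a", 91.5),  # latest sensor-a reading exceeds threshold
    (1, "sensor-b", 60.0),
]
print(detect_alerts(points))  # → ['sensor-a']
```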

How do you do JOINs with business data in ClickHouse then? Do you have to do some kind of weird process where you query CH, then query Postgres, then join "manually" in your backend?
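(For what it's worth, that "manual" backend join is usually just a dict lookup once both result sets are in memory. A sketch with all table/column names invented and the two query results replaced by literals; real code would fetch them with a ClickHouse client and a Postgres driver:)

```python
# Hypothetical application-side join: enrich ClickHouse aggregate rows
# with business metadata fetched from Postgres.

def join_in_app(ch_rows, pg_items):
    """Merge ClickHouse rows with Postgres item metadata by item_id."""
    items_by_id = {item["id"]: item for item in pg_items}
    enriched = []
    for row in ch_rows:
        item = items_by_id.get(row["item_id"], {})
        enriched.append({**row, "name": item.get("name", "unknown")})
    return enriched

# Stand-ins for the two query results (names are made up):
ch_rows = [{"item_id": 1, "drops": 42}, {"item_id": 2, "drops": 7}]
pg_items = [{"id": 1, "name": "Rusty Sword"}, {"id": 2, "name": "Old Shield"}]

print(join_in_app(ch_rows, pg_items))
```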

wkrp 2 days ago | parent [-]

I was a little unclear, I think Timescale performs quite well. Just that in my (very limited) experience, Clickhouse performs better on the same data.

I actually have a blogpost on my experience with it here: https://www.wkrp.xyz/a-small-time-review-of-timescaledb/ that goes into a bit more detail as to my use case and issues I experienced. I'm actually half-way through writing the follow up using Clickhouse.

As detailed in the blog post, my data is all MMO video game stats, such as item drops. With Timescale, I was able to join an "items" table (with information such as the item name and image URL) against the "item_drops" table in the same query, so the result includes everything needed for presentation. To accomplish the same in ClickHouse, I create an "items" table and an "items_dict" dictionary (https://clickhouse.com/docs/sql-reference/dictionaries) that contains the same data. The ClickHouse query then JOINs items_dict against item_drops to achieve the same thing.

If you know the shape of your data, you can probably whip up some quick scripts for generating fake versions and inserting into Timescale to get a feel for storage and query performance.
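(A quick generator along those lines might look like this; the column names and value distribution are invented, so adapt them to your schema before bulk-inserting:)

```python
# Hypothetical fake-telemetry generator: emit rows shaped like an
# IoT "readings" table to get a feel for storage and query behaviour.
import random
from datetime import datetime, timedelta

def fake_readings(n_devices=10, points_per_device=1000, start=None):
    """Generate n_devices * points_per_device fake telemetry rows."""
    start = start or datetime(2024, 1, 1)
    rows = []
    for d in range(n_devices):
        for i in range(points_per_device):
            rows.append({
                "time": start + timedelta(seconds=i),
                "device_id": f"device-{d}",
                "value": random.gauss(50.0, 10.0),  # made-up distribution
            })
    return rows

rows = fake_readings(n_devices=2, points_per_device=3)
# e.g. pass these rows to your driver's executemany()/bulk insert
print(len(rows))  # → 6
```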

saisrirampur 3 days ago | parent | prev [-]

More on use-cases involving TimescaleDB replication/migration to ClickHouse https://clickhouse.com/blog/timescale-to-clickhouse-clickpip...