oulipo2 3 days ago

Can you tell me more about why Timescale doesn't perform, in your opinion? My use case for Timescale would be to gather my IoT telemetry data (perhaps 20 to 100 points per second) and store e.g. 1 year's worth of it to do some analysis and query past data, then offload older data to Parquet files on S3.

I'd like to be able to use that for alert detection and some dashboard metrics, so I was thinking it was the perfect kind of use case for Timescale. But because I haven't used it "at scale" yet (not deployed yet), I don't know how it will behave.

How do you do JOINs with business data in Clickhouse then? Do you have to do some kind of weird process where you query CH, then query Postgres, then join "manually" in your backend?

wkrp 2 days ago | parent [-]

I was a little unclear: I think Timescale performs quite well. It's just that, in my (very limited) experience, Clickhouse performs better on the same data.

I actually have a blog post on my experience with it here: https://www.wkrp.xyz/a-small-time-review-of-timescaledb/ that goes into a bit more detail about my use case and the issues I ran into. I'm half-way through writing the follow-up using Clickhouse.

As detailed in the blog post, my data is all MMO video game stats such as item drops. With Timescale, I was able to join an "items" table, with information such as the item name and image URL, in the same query as the "item_drops" table. This way the result includes everything needed for presentation. To accomplish the same in Clickhouse, I create an "items" table and an "items_dict" dictionary (https://clickhouse.com/docs/sql-reference/dictionaries) that contains the same data. The Clickhouse query then looks up items_dict while scanning item_drops to achieve the same thing.
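Roughly, the setup looks like this (a sketch with made-up column names; the `SOURCE`/`LAYOUT`/`LIFETIME` settings are illustrative defaults, not a recommendation):

```sql
-- Source table holding the business data.
CREATE TABLE items
(
    item_id   UInt64,
    name      String,
    image_url String
)
ENGINE = MergeTree
ORDER BY item_id;

-- Dictionary backed by the items table, refreshed every 5-10 minutes.
CREATE DICTIONARY items_dict
(
    item_id   UInt64,
    name      String,
    image_url String
)
PRIMARY KEY item_id
SOURCE(CLICKHOUSE(TABLE 'items'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- Enrich item_drops with presentation data via dictGet lookups.
SELECT
    item_id,
    dictGet('items_dict', 'name', item_id)      AS name,
    dictGet('items_dict', 'image_url', item_id) AS image_url,
    count()                                     AS drops
FROM item_drops
GROUP BY item_id;
```

Dictionary lookups are in-memory key fetches rather than a general hash join, which is why this pattern tends to be fast for small, slowly changing reference tables.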

If you know the shape of your data, you can probably whip up some quick scripts for generating fake versions and inserting into Timescale to get a feel for storage and query performance.
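For telemetry-shaped data you don't even need a script: Postgres's generate_series can fabricate it in one INSERT. A sketch, assuming a hypothetical `telemetry` table with 10 devices sampling at 10 Hz:

```sql
-- Hypothetical telemetry schema turned into a hypertable.
CREATE TABLE telemetry (
    time      TIMESTAMPTZ      NOT NULL,
    device_id INT              NOT NULL,
    value     DOUBLE PRECISION
);
SELECT create_hypertable('telemetry', 'time');

-- One hour of fake readings: 10 devices x 10 Hz (~360k rows).
INSERT INTO telemetry (time, device_id, value)
SELECT ts, device_id, random() * 100
FROM generate_series(now() - interval '1 hour', now(),
                     interval '100 milliseconds') AS ts,
     generate_series(1, 10) AS device_id;
```

Scale the series bounds up to a few days of data and you'll get a realistic feel for chunk sizes, compression, and query latency before committing to a deployment.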