buremba 3 hours ago

All you need is Postgres until you scale into TBs of data. We use PostgreSQL as a durable workflow engine, a vector and BM25 search engine, a time-series store, an OLTP/OLAP engine, and a queue. It's basically the only dependency we have for https://lobu.ai

The main benefit is centralizing all the data in one place, so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can migrate it to a purpose-specific tool to scale out.

To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.
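
For the queue use case, one common way to avoid depending on LISTEN/NOTIFY for delivery is to have workers poll and claim rows with `FOR UPDATE SKIP LOCKED`. The comment does not say how lobu.ai implements its queue, so treat this as a rough sketch under assumptions: the `jobs` table, its columns, and the `claim_job_sql` helper are all hypothetical. It only builds the SQL text, which you would run inside a transaction with whatever Postgres driver you already use.

```python
# Hypothetical queue table; the name and columns are assumptions for illustration.
CREATE_JOBS_TABLE = """
CREATE TABLE IF NOT EXISTS jobs (
    id         bigserial PRIMARY KEY,
    payload    jsonb NOT NULL,
    status     text NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
);
""".strip()


def claim_job_sql(batch_size: int = 1) -> str:
    """Build a statement that claims up to `batch_size` pending jobs.

    Rows already locked by another worker are skipped instead of waited on,
    so concurrent workers end up claiming different rows.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return f"""
WITH next_jobs AS (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT {int(batch_size)}
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'running'
FROM next_jobs
WHERE jobs.id = next_jobs.id
RETURNING jobs.id, jobs.payload;
""".strip()
```

With this shape, NOTIFY can still be used as a wake-up hint, while polling remains the fallback if a notification is missed.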

tibbon 2 hours ago | parent | next [-]

But when you hit that wall, it is hard to stop and convince people to use different patterns and systems. I've seen so many tables go from "it will only be a few thousand rows" to suddenly several TB and then people are looking confused when performance and db admin tasks get really difficult.

I'm working at a scale where almost every day I have to ask people "are you use you need to treat that as relational data? It doesn't seem relational"

dieselgate an hour ago | parent [-]

> are you use you need to treat that as relational data?

Is this intended to be "you sure you need..."?

turkeyboi 5 minutes ago | parent [-]

Obviously, yes

sroussey an hour ago | parent | prev | next [-]

Use different “databases” besides public at the very start. No joins between them. You will be in a good position to just split the postgres instance by those at a later date. They will have different usage patterns than the merged version you have now, and will be easier to optimize and will buy you some time. And time is all you need.
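
Reading “databases” here as Postgres schemas (an assumption; `public` is the default schema name), the idea might look roughly like the sketch below. The domain names `billing` and `events` and both helpers are made up for illustration, and the join check is a naive text scan, not a SQL parser.

```python
import re


def schema_ddl(domains):
    """One CREATE SCHEMA statement per domain, instead of putting everything in public."""
    return [f"CREATE SCHEMA IF NOT EXISTS {name};" for name in domains]


def schemas_referenced(sql, domains):
    """Return the set of known schemas that appear as `schema.table` in the query text."""
    alternatives = "|".join(re.escape(name) for name in domains)
    pattern = re.compile(r"\b(" + alternatives + r")\.\w+", re.IGNORECASE)
    return {match.group(1).lower() for match in pattern.finditer(sql)}


def crosses_schemas(sql, domains):
    """True if the query touches more than one schema, i.e. it would block a later split."""
    return len(schemas_referenced(sql, domains)) > 1
```

A check like `crosses_schemas(query, ["billing", "events"])` could run in CI or code review to keep the "no joins between them" rule honest until the split actually happens.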

ceres an hour ago | parent | prev | next [-]

Just an fyi, when I try to sign in with google for your app I get the message: "The app is requesting access to sensitive info in your Google Account. Until the developer (*reka*kc*@gmail.com) verifies this app with Google, you shouldn't use it."

buremba an hour ago | parent [-]

Ahh, sorry about that. It should be fixed in an hour; it looks like we mixed up the permissions. I just tried and confirmed that the other login methods work, if you would like to try it out.

hmaxdml 3 hours ago | parent | prev | next [-]

Listen/notify is poised to become much better in PG 18 and 19

stuartaxelowen 2 hours ago | parent [-]

Why’s that?

TkTech 2 hours ago | parent [-]

In pg19 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... will land, which significantly improves NOTIFY performance. Right now LISTEN/NOTIFY doesn't scale to very busy instances because a `NOTIFY` within a transaction takes a global lock.

ivanr 2 hours ago | parent | next [-]

More context: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

doctorpangloss 36 minutes ago | parent | prev [-]

Well another POV is, AWS sells RDS instances capable of global lock NOTIFY. Clearly people have been using it despite it being really slow.

It's a terrible architecture but does it matter? This article should really say "AWS is a useful but expensive way to run your apps," which isn't saying much of anything at all.

cultofmetatron an hour ago | parent | prev | next [-]

Conversely, startups that start out scaling for TBs of data never make it to needing TBs of data. They burn too much energy scaling when they don't yet have a product people want.

throwaway7783 3 hours ago | parent | prev | next [-]

I'm in the same camp. Do you use any specific extensions? Especially for OLAP and time series (partitioned tables + related extensions work fine, but curious if you use anything else)
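
For anyone unfamiliar with the plain partitioned-table approach mentioned here, a minimal sketch using native declarative range partitioning by month. The `metrics` table, its columns, and the helper names are hypothetical, and the code only generates DDL text to run through your usual migration tooling.

```python
from datetime import date

# Hypothetical parent table, range-partitioned on the timestamp column.
PARENT_DDL = """
CREATE TABLE IF NOT EXISTS metrics (
    ts        timestamptz NOT NULL,
    device_id bigint NOT NULL,
    value     double precision
) PARTITION BY RANGE (ts);
""".strip()


def month_starts(start: date, months: int) -> list[date]:
    """First day of `months` + 1 consecutive months, beginning with start's month."""
    year, month = start.year, start.month
    bounds = []
    for _ in range(months + 1):
        bounds.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return bounds


def partition_ddl(parent: str, start: date, months: int) -> list[str]:
    """One CREATE TABLE ... PARTITION OF statement per month (upper bound exclusive)."""
    bounds = month_starts(start, months)
    statements = []
    for lower, upper in zip(bounds, bounds[1:]):
        name = f"{parent}_{lower:%Y_%m}"
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
    return statements
```

Creating partitions a few months ahead from a scheduled job, and dropping or detaching old ones, covers the "most recent raw data" pattern without any third-party extension.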

osigurdson 15 minutes ago | parent | next [-]

From experience, I'd suggest using ClickHouse beyond a few billion rows of timeseries data in Postgres.

throwaway7783 6 minutes ago | parent [-]

The nice thing about our use case is that it's not strictly analytics, but looking at the most recent raw data. ClickHouse is definitely the powerhouse for analytics.

buremba 2 hours ago | parent | prev [-]

The native extensions are fine, but I haven't had a good experience with any third-party extensions. So far I've tried Timescale, pg_lake, Citus, and pgvectorscale. They look very appealing, but it's usually a trap, as you can't get the value without using the vendor's cloud offerings.

I think if you grow enough to start looking for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use the DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.
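
The comment doesn't say how the DuckDB-to-PG connection is made. One way, assuming DuckDB's `postgres` extension, is to attach the Postgres database inside a DuckDB session. The helper below is a hypothetical sketch that only builds the statements; the connection string and the `pg` alias are placeholders.

```python
def duckdb_attach_sql(conninfo: str, alias: str = "pg", read_only: bool = True) -> list[str]:
    """Statements to run in a DuckDB session to attach a Postgres database.

    `conninfo` is a libpq-style connection string; `alias` is the name the
    attached database gets inside DuckDB.
    """
    if not alias.isidentifier():
        raise ValueError("alias must be a plain identifier")
    escaped = conninfo.replace("'", "''")  # escape single quotes for the SQL literal
    options = "TYPE postgres" + (", READ_ONLY" if read_only else "")
    return [
        "INSTALL postgres;",
        "LOAD postgres;",
        f"ATTACH '{escaped}' AS {alias} ({options});",
    ]
```

After attaching, Postgres tables can be queried from DuckDB under the alias (for example `pg.public.some_table`, a placeholder name) alongside the columnar data.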

throwaway7783 5 minutes ago | parent [-]

Fair enough. How do you do BM25?

pphysch 3 hours ago | parent | prev [-]

I don't see logs mentioned. I agree with most of those applications but would keep my OLAP stuff (metrics, logs, traces) in a separate store like VictoriaMetrics, both for capacity and read activity.

TkTech 2 hours ago | parent | next [-]

Timescale can take you pretty far for metrics and would be Good Enough for almost all users. Totally agree on raw, high-volume logs though.

buremba 2 hours ago | parent | prev [-]

Yeah, I have logs in Sentry, which also uses PostgreSQL.