everfrustrated 17 hours ago

This is why I love Postgres. It can get you to being one of the largest websites before you need to reconsider your architecture just by throwing CPU and disk at it. At that point you can well afford to hire people who are deep experts at sharding etc.

zozbot234 15 hours ago | parent | next [-]

PostgreSQL actually supports sharding out of the box, it's just a matter of setting up the right table partitioning and using Foreign Data Wrapper (FDW) to forward queries to remote databases. I'm not sure what the post is referencing when they say that sharding requires leaving Postgres altogether.
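As a concrete illustration of what this comment describes, here is a minimal sketch of hash partitioning combined with postgres_fdw (supported since PostgreSQL 11). All hostnames, table names, and credentials are hypothetical:

```sql
-- Hypothetical setup: hash-partition "users" and back one partition
-- with a foreign table living on a remote shard.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');

CREATE TABLE users (
    user_id bigint NOT NULL,
    email   text
) PARTITION BY HASH (user_id);

-- Local partition, stored on this node:
CREATE TABLE users_p0 PARTITION OF users
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

-- Remote partition: queries touching it are forwarded over FDW.
CREATE FOREIGN TABLE users_p1 PARTITION OF users
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    SERVER shard1 OPTIONS (table_name 'users_p1');
```

The application then queries `users` as a single table and the planner routes each row to the right partition, local or remote.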

dmix 14 hours ago | parent [-]

This is specifically what they said about sharding

> The primary rationale is that sharding existing application workloads would be highly complex and time-consuming, requiring changes to hundreds of application endpoints and potentially taking months or even years

manquer 11 hours ago | parent | next [-]

> potentially taking months or even years

On one hand OAI sells coding agents and constantly hypes how easily they will replace developers, claiming most of their code is already written by agents; on the other hand they claim it would take years to refactor their own application.

Both cannot be true at the same time.

simonw 13 hours ago | parent | prev | next [-]

Genuinely sounds like the kind of challenge that could be solved with a swarm of Codex coding agents. I'm surprised they aren't treating this as an ideal use-case to show off their stack!

gloflo 9 hours ago | parent | next [-]

Oh snap! Maybe it's all a great deception for making money?

csto12 13 hours ago | parent | prev | next [-]

I read your message, guessed the author, and I’m happy to announce I guessed correctly.

Ozzie_osman 11 hours ago | parent | prev | next [-]

Getting the sharding in place, yes, but maintaining it operationally would still be a headache. Things like schema migrations across shards, resharding, and even observability.

aisuxmorethanhn 12 hours ago | parent | prev [-]

It wouldn’t work.

zozbot234 14 hours ago | parent | prev [-]

I know they said that, but in fact sharding is entirely a database-level concern. The application need not be aware of it at all.

EB66 14 hours ago | parent [-]

Sharding can be made mostly transparent, but it's not purely a DB-level concern in practice. Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, hot keys that absorb a disproportionate share of traffic, etc. all matter a lot. Even if partitioning handles routing, the application's query patterns and its consistency/latency requirements can still force application-level changes.

zozbot234 8 hours ago | parent | next [-]

> mostly transparent, but it's not purely a DB-level concern in practice ...

But how would any of that change by going outside Postgres itself to begin with? That's the part that doesn't make much sense to me.

londons_explore 6 hours ago | parent [-]

When sharded, anything crossing a shard boundary becomes non-transactional.

I.e. if you shard by userId, then a "share" feature that lets one user share data with another via a "SharedDocuments" table cannot be kept consistent.

That in turn means you're probably going to have to rewrite the application to handle cases like a shared document having one or the other user attached to it disappear or reappear. There are loads of bugs that can happen with weak consistency like this, and at scale every very rare bug is going to happen and need dealing with.

zozbot234 6 hours ago | parent [-]

> When sharded, anything crossing a shard boundary becomes non-transactional.

Not necessarily? You can have two-phase commit for cross-shard writes, which ought to be rare anyway.
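For reference, Postgres exposes two-phase commit directly via `PREPARE TRANSACTION` (it requires `max_prepared_transactions > 0` on each node). A minimal sketch of a cross-shard transfer, with hypothetical table names, driven by some external coordinator:

```sql
-- On shard A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
PREPARE TRANSACTION 'xfer_42';

-- On shard B:
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;
PREPARE TRANSACTION 'xfer_42';

-- Only once both prepares succeed does the coordinator commit:
COMMIT PREPARED 'xfer_42';   -- run on shard A
COMMIT PREPARED 'xfer_42';   -- run on shard B
-- If either prepare fails, ROLLBACK PREPARED 'xfer_42' on the other.
```

The prepared transaction survives a crash, which is what makes the eventual commit safe, but as the reply below-thread notes, readers can still observe one shard committed before the other.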

londons_explore 5 hours ago | parent [-]

Two-phase commit provides an eventual consistency guarantee only....

Other clients (readers) have to be able to deal with inconsistencies in the meantime.

Also, 2PC in Postgres is incompatible with temporary tables, which rules out use with long-running batch analysis jobs that use temporary tables for intermediate work and then save results. E.g. "We want to send this marketing campaign to the top 10% of users" doesn't work with the naive approach.
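To make the limitation concrete, here is the failure mode being described, with hypothetical table names:

```sql
BEGIN;
-- Intermediate work in a temp table:
CREATE TEMPORARY TABLE top_users AS
    SELECT user_id FROM events
    GROUP BY user_id ORDER BY count(*) DESC LIMIT 1000;

INSERT INTO campaign_targets SELECT user_id FROM top_users;

-- This fails, because the transaction touched a temporary object:
PREPARE TRANSACTION 'campaign_tx';
-- ERROR:  cannot PREPARE a transaction that has operated on temporary objects
```

The workaround is usually to materialize the intermediate results into a regular (unlogged) table, which changes how the batch job has to be written.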

ants_a 3 hours ago | parent [-]

These are limitations in the current PostgreSQL implementation. It's quite possible to have consistent commits and snapshots across sharded databases. Hopefully some day in PostgreSQL too.

awesome_dude 10 hours ago | parent | prev [-]

> Once data is split across nodes, join patterns, cross-shard transactions, global uniqueness, certain keys hit with a lot of traffic

If you're having trouble there then a proxy "layer" between your application and the sharded database makes sense, meaning your application still keeps its naive understanding of the data (as it should) and the proxy/database access layer handles that messiness... shirley
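The core of such a proxy layer is just deterministic routing. A minimal sketch in Python (all names are hypothetical; a real proxy would also handle cross-shard fan-out, retries, and connection pooling):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(user_id: int) -> int:
    """Deterministically map a user id to a shard index."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

class ShardRouter:
    """Hypothetical access layer: the app calls query() with a user_id
    and stays naive about where the data physically lives."""

    def __init__(self, connections):
        # connections: dict of shard index -> DB connection-like object
        self.connections = connections

    def query(self, user_id, sql, params=()):
        conn = self.connections[shard_for(user_id)]
        return conn.execute(sql, params)
```

Single-user queries route cleanly this way; the messiness the thread discusses starts exactly where a query cannot be keyed by one `user_id`.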

9rx 8 hours ago | parent | prev [-]

> At that point you can well afford to hire people who are deep experts at sharding etc.

Can you, though? OpenAI is haemorrhaging money like it is going out of style and, according to the news cycle over the last couple of days, will likely be bankrupt by 2027.

londons_explore 6 hours ago | parent [-]

And typically the bigger the company gets, the harder it is to migrate to a new data model.

You suddenly have literally thousands of internal users of a datastore, and "We want to shard by userId, so please don't do joins on userId anymore" becomes an impossible ask.