bhouston 13 hours ago

Argh. Shard the damn database already.

Why are they not sharding by user/org yet? It is so simple and would fix the primary issue they are running into.

All these workarounds they go through to avoid a straightforward fix.

samwillis 13 hours ago | parent | next [-]

The message of the talk was very much that you can scale to massive throughput without having to shard and having only a single master.

Of course they considered it, but the tradeoffs didn't match what they wanted to do - plus they found you could scale to this level without sharding.

vanviegen 12 hours ago | parent [-]

The talk seems to be mostly about all the limitations and workarounds they've had to deal with because they chose not to shard. Apparently, they have a policy of adding no new functionality to the database, which presumably means additional separate database services being set up for each new feature. That sounds a lot like accumulating tech debt very rapidly, just because sharding is not on the table, for whatever reason.

bhouston 9 hours ago | parent [-]

Yeah, when they mentioned that they couldn't put any more services on their main DB because of this issue I did a facepalm. They are building out explicit tech debt now because they are not sharding.

bohanoai 9 hours ago | parent | prev | next [-]

Speaker here — Bohan from OpenAI.

Our application has hundreds of endpoints, which makes sharding non-trivial. We've already offloaded shardable workloads—particularly write-heavy ones—from PostgreSQL. What remains is primarily read-only and would require substantial effort to shard. Currently, the workload scales well on Azure Database for PostgreSQL, and we have sufficient headroom to support future growth.

That said, we're not ruling out sharding in the future—it’s just not a near-term priority.

mike_hearn 9 hours ago | parent | prev | next [-]

Sharding is often not simple. The whole reason you're using a powerful database in the first place is that you want its ability to analyze data and answer complex questions. If you didn't, you might as well just use a bunch of NFS mounts: that's sharding too, and even simpler than a database.

iampims 13 hours ago | parent | prev | next [-]

Not sure I would qualify sharding a DB that gets 1M qps as straightforward. I agree with you that an org seems like a natural sharding key, but we know that at this scale nothing is ever really straightforward, especially when it's your first rodeo.

bhouston 9 hours ago | parent | next [-]

> Not sure I would qualify sharding a DB that gets 1M qps as straightforward.

Sharding at the application layer (basically figuring out the shard from org/user in your application code prior to interacting with the DB) will scale to any QPS rate. This is what I was referring to.
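
For readers unfamiliar with the idea, here is a minimal sketch of application-layer shard routing, assuming a fixed number of shards and a string org ID. The DSNs and the function names `shard_for_org` and `dsn_for_org` are hypothetical, made up for illustration; nothing here reflects how OpenAI or anyone in the thread actually does it.

```python
import hashlib

# Hypothetical shard connection strings; a real deployment would load
# these from configuration rather than hard-coding them.
SHARD_DSNS = [
    "postgresql://db-shard-0.internal/app",
    "postgresql://db-shard-1.internal/app",
    "postgresql://db-shard-2.internal/app",
    "postgresql://db-shard-3.internal/app",
]


def shard_for_org(org_id: str, num_shards: int = len(SHARD_DSNS)) -> int:
    """Map an org ID to a shard index using a stable hash.

    SHA-256 is used instead of the built-in hash() because hash() is
    randomized per process for strings, so it would route the same org
    to different shards on different app servers.
    """
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def dsn_for_org(org_id: str) -> str:
    """Pick the connection string for the shard that owns this org."""
    return SHARD_DSNS[shard_for_org(org_id)]


if __name__ == "__main__":
    # Every query for a given org goes to the same shard.
    for org in ("org_alpha", "org_beta", "org_gamma"):
        print(org, "->", dsn_for_org(org))
```

Note that plain modulo routing remaps most orgs when the shard count changes, and queries that span orgs still need a fan-out or a separate store. A lookup table from org to shard, or consistent hashing, are the usual ways around the first problem.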

evanelias 12 hours ago | parent | prev [-]

That's true, but that's also why you really should shard long before hitting that point...

If your company is growing at this insane rate, it should be obvious that eventually you must shard. And the longer you delay this, the more painful it will be to accomplish.

levkk 11 hours ago | parent | prev [-]

That's exactly what I'm saying!