Ozzie_osman 4 hours ago

  We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.
singron 2 hours ago

If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.

rbranson 3 hours ago

You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.

GiorgioG 4 hours ago

For a vast majority of use cases 20TB is positively enormous.

mplanchard 3 hours ago

RDS caps out at 64 TB unless you use Aurora, so 20 TB is totally manageable without sharding.

returningfory2 4 hours ago

This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.

jeltz 3 hours ago

Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.

happyopossum 4 hours ago

Sure, but 20TB in “the only database you need” is mere hours or minutes worth of data for many workflows.

tingletech 4 hours ago

That article seems to suggest 20TB total across the dozen deployments in prod.