jiggawatts a day ago

You will never get to the moon by making a faster and faster bus.

I see a lot of software with that initial small scale "baked into it" at every level of its design, from the database engine choice, schema, concurrency handling, internal architecture, and even the form design and layout.

The best-engineered software I've seen (and written) always started at the maximum scale, with at least a plan for handling future feature extensions.

As a random example, the CommVault backup software was developed at AT&T to deal with their enormous distributed scale, and it was the only decently scalable backup software I had ever used. With its competitors, even running a report of last night's backup job status was a serious challenge!

I also see a lot of "started small, grew too big" software make hundreds of silly little mistakes throughout, such as using drop-down controls for selecting users or groups. Works great for that mom & pop corner store customer with half a dozen accounts, fails miserably at orgs with half a million. Ripping that out and fixing it can be a decidedly non-trivial piece of work.
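The fix for the drop-down problem is a server-side lookup with an index-backed prefix search and a result limit, so the UI never tries to enumerate every account. A minimal sketch in Python with SQLite (table and column names are invented for illustration, and 50,000 rows stand in for half a million):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_users_name ON users(name)")
# 50,000 accounts as a stand-in for half a million.
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i:05d}",) for i in range(50_000)])

def suggest(prefix, limit=20):
    # Range scan on the name index: returns one page of matches,
    # never the whole directory, no matter how many users exist.
    return [row[0] for row in conn.execute(
        "SELECT name FROM users WHERE name >= ? AND name < ? "
        "ORDER BY name LIMIT ?",
        (prefix, prefix + "\uffff", limit))]

matches = suggest("user001")
```

The control then becomes a type-ahead search box backed by this query, which works identically for six accounts or half a million.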

Similarly, cardinality in the database schema has really irritating exceptions that only turn up at the million- or billion-row scale and can be obscenely difficult to fix later. An example I'm familiar with is that the ISBN codes used to "uniquely" identify books are almost, but not quite, unique. There are a handful of duplicates, and yes, they turn up in real libraries. This means that if you used these as a primary key somewhere... bzzt... start over from the beginning with something else!

There is no way to prepare for this if you start by indexing the books on your own bookshelf. Whatever you cook up will fail at scale and will need a rethink.
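To make the failure mode concrete, here's a minimal sketch in Python with SQLite (the duplicate ISBN is fabricated for illustration): with the ISBN as the primary key, the second book is rejected outright, while a database-owned surrogate key lets both real-world books coexist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: ISBN as primary key. A duplicate insert fails outright.
conn.execute("CREATE TABLE books_natural (isbn TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO books_natural VALUES ('0-00-000000-0', 'First Book')")
duplicate_rejected = False
try:
    conn.execute("INSERT INTO books_natural VALUES ('0-00-000000-0', 'Different Book')")
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the "bzzt" moment

# Surrogate key: ISBN is just an indexed attribute; duplicates coexist.
conn.execute("""CREATE TABLE books (
    id INTEGER PRIMARY KEY,  -- surrogate key owned by the database
    isbn TEXT,               -- indexed, but deliberately NOT unique
    title TEXT)""")
conn.execute("CREATE INDEX idx_books_isbn ON books(isbn)")
conn.executemany("INSERT INTO books (isbn, title) VALUES (?, ?)",
                 [("0-00-000000-0", "First Book"),
                  ("0-00-000000-0", "Different Book")])
rows = conn.execute("SELECT title FROM books WHERE isbn = ?",
                    ("0-00-000000-0",)).fetchall()
```

Swapping the second design in after the ISBN column is already a foreign-key target in a dozen other tables is where the "start over from the beginning" pain comes from.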

rmunn a day ago | parent | next [-]

Counterpoint: the idea that your project will be the one to scale up to millions of users/requests/etc. is hubris. Odds are, your project won't scale past 10,000 to 100,000. Designing every project to scale to the millions from the beginning often leads to overengineering, adding needless complexity when a simpler solution would have worked better.

Naturally, that advice doesn't hold if you know ahead of time that the project is going to be deployed at massive scale. In which case, go ahead and implement your database replication, load balancing, and failover from the start. But if you're designing an app for internal use at your company of 500, well, feel free to just use SQLite as your database. You won't ever run into the problems of scale in this app, and single-file databases have unique advantages when your scale is small.

Basically: know when huge scale is likely, and when it's immensely UNlikely. Design accordingly.

jiggawatts 20 hours ago | parent [-]

> Odds are, your project won't scale past 10,000 to 100,000.

That may be a self-fulfilling prophecy.

I agree in general that most apps don't need fancy scaling features, but apps that can't scale... won't... and hence "don't need scaling features".

> You won't ever run into the problems of scale in this app, and single-file databases have unique advantages when your scale is small.

I saw a customer start off with essentially a single small warehouse selling, I dunno, widgets or something, and then the corporation grew and grew into a multi-national shipping and logistics corporation. They were saddled with an obscure proprietary database that worked like SQLite and posed technical challenges that were incredibly difficult to overcome. They couldn't just migrate off it, because that would have required a massive, years-long total rewrite of their app.

For one performance issue, we seriously tried to convince them to use phase-change cooling on frequency-optimized server CPUs, like a gamer overclocking their rig, because that was the only way to eke out just enough performance to ensure their overnight backups didn't run into the morning busy period.

That's just not an issue with SQL Server or any similar standard client-server database engine.

jkrejcha 18 hours ago | parent [-]

I think part of that thinking, though, is that if you do the basics (use a standard database engine, don't stray too far off the beaten path), you tend to get the scale you ultimately need basically for free.

This is often what I see "don't build for huge scale" advice to mean. It's not necessarily "be proud of O(n^2) algorithms". Rather, it's "use Postgres instead of some hyperscale sharded database when you only have 10 million users", because the alternative tends to miss the forest (and oftentimes the scale, ironically) for the trees.

jiggawatts 9 hours ago | parent [-]

Yes, but I've also found that using a decently scalable engine is insufficient for a good outcome without realistically scaled data.

The best software I've written always had > 10 GB of existing data to work with from day one. So for example the "customers" table didn't have one sample entry, it had one million real entries. The "products" table had a real history of product recalls, renames, category changes over time, special one-off products for big customers, etc...

That's how you find out the reality of the business instead of some idealised textbook scenario.

Things like: oh, actually, 99% of our product SKUs are one-off specials, but 99% of the user interactivity and sales volume is with the generic off-the-shelf 1%, so the UI has to cater for this, and the database table needs a "tag" on these so that they can be filtered. Then it turns out that filtering 10 million products down to 100K has non-trivial performance issues when paging through the list. Or even worse: 50% of the products are secret because their mere existence or their name is "insider info" that we don't want our own staff to see. Did I say "staff"? I meant subcontractors, partners, and resellers, all with their own access rules and column-level data masking that needs to be consistent across dozens of tables. Okay, let's start going down the rabbit hole of column naming conventions and metadata...
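The tag-plus-filtering part of that is cheap to build up front and miserable to retrofit. A minimal sketch in Python with SQLite (names and proportions invented, 10,000 rows standing in for 10 million): a partial index covering only the generic subset, plus keyset pagination so deep pages don't get slower as you scroll.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT,
    is_stock INTEGER NOT NULL DEFAULT 0)  -- the "tag": 1 = generic off-the-shelf
""")
# Partial index covers only the small generic subset, so filtering the
# huge table down to it never touches the one-off specials at all.
conn.execute(
    "CREATE INDEX idx_stock ON products(is_stock, id) WHERE is_stock = 1")

# 10,000 products, of which only 1% are generic.
conn.executemany("INSERT INTO products (name, is_stock) VALUES (?, ?)",
                 [(f"SKU-{i}", 1 if i % 100 == 0 else 0)
                  for i in range(10_000)])

def page_of_stock(after_id, limit=50):
    # Keyset pagination: seek past the last seen id instead of using
    # OFFSET, so page 1000 costs the same as page 1.
    return conn.execute(
        "SELECT id, name FROM products WHERE is_stock = 1 AND id > ? "
        "ORDER BY id LIMIT ?", (after_id, limit)).fetchall()

first = page_of_stock(after_id=0)
second = page_of_stock(after_id=first[-1][0])
```

The access-rule and data-masking requirements are a far bigger design question than this, but even this much is painful to bolt on once every screen already assumes an unfiltered product list.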

You can't predict that stuff in a vacuum; no human has the foresight needed to "ask the right questions" to figure this all out through workshops or whatever. The ground-truth reality is the best thing to have, especially up front, during the early phases of development.

A lot of the above is actually easy to implement as a software developer, but hard to change halfway through a project.

t43562 10 hours ago | parent | prev [-]

You can by making a bigger and bigger rocket though.