jiggawatts 9 hours ago

Yes, but also I've found that a decently scalable engine is not enough for a good outcome without realistically scaled data.

The best software I've written always had > 10 GB of existing data to work with from day one. So for example the "customers" table didn't have one sample entry, it had one million real entries. The "products" table had a real history of product recalls, renames, category changes over time, special one-off products for big customers, etc...

That's how you find out the reality of the business instead of some idealised textbook scenario.

Things like: Oh, actually, 99% of our product SKUs are one-off specials, but 99% of the user interaction and sales volume is with the generic off-the-shelf 1% of them, so the UI has to cater for this and the database table needs a "tag" on these so that they can be filtered. Then it turns out that filtering 10 million products down to 100K has non-trivial performance issues when paging through the list. Or even worse, 50% of the products are secret because their mere existence or their name is "insider info" that we don't want our own staff to see. Did I say "staff"? I meant subcontractors, partners, and resellers, all with their own access rules and column-level data masking that needs to be consistent across dozens of tables. Okay, let's start going down the rabbit hole of column naming conventions and metadata...
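To make the paging point concrete, here is a rough sketch of one common way to page through a tagged subset: keyset ("seek") pagination, which avoids large OFFSET scans. It is not what any particular system used. The table layout, the `is_generic` tag column, the partial index, and the function names are all hypothetical, and SQLite stands in for whatever engine you actually run.

```python
import sqlite3


def make_db(n_products=10_000, generic_every=100):
    """Build a toy in-memory catalogue (hypothetical schema).

    Most rows are one-off specials; every `generic_every`-th row is
    tagged as a generic off-the-shelf product.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " is_generic INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO products (id, name, is_generic) VALUES (?, ?, ?)",
        (
            (i, f"product-{i}", 1 if i % generic_every == 0 else 0)
            for i in range(1, n_products + 1)
        ),
    )
    # Assumption: a partial index over only the tagged rows keeps the
    # index small when the tagged subset is a small share of the table.
    conn.execute(
        "CREATE INDEX idx_products_generic ON products (id) WHERE is_generic = 1"
    )
    return conn


def page_generic(conn, after_id=0, page_size=20):
    """Return the next page of generic products with id > after_id.

    Keyset pagination: the caller passes the last id it saw, so the
    query seeks to that point instead of skipping OFFSET rows.
    """
    return conn.execute(
        "SELECT id, name FROM products"
        " WHERE is_generic = 1 AND id > ?"
        " ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    last_id = 0
    while True:
        page = page_generic(conn, after_id=last_id, page_size=25)
        if not page:
            break
        last_id = page[-1][0]  # remember where this page ended
        print(len(page), "rows, up to id", last_id)
```

The catch, and the reason real data matters, is that this only works cleanly when the sort key is unique and stable. Whether it is tends to be something you only learn from the actual product history.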

You can't predict that stuff in a vacuum. No human has the foresight needed to "ask the right questions" to figure this all out through workshops or whatever. The ground-truth reality is the best thing, especially up-front during the early phases of development.

A lot of the above is actually easy to implement as a software developer, but hard to change halfway through a project.