mdavid626 5 days ago

I disagree. In modern, highly scalable architectures I’d prefer doing joins in the layer in front of the database (the backend).

The “backend” scales much more easily than the database. Loading data by simple indexes, e.g. user_id, and joining it on the backend keeps the db fast. Spinning up another backend instance is easy, unlike another db instance.

If you think your joins must happen in the db because the data is too big to be loaded into memory on the backend, restructure it so that it becomes possible.

Bonus points for moving joins to the frontend. This makes data highly cacheable and fast to load, since you need to load less data, and it frees up resources on the server side.
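
For illustration, here is a minimal Python sketch of the kind of backend-side join described above. The row data, the table names in the comments, and the function name `join_users_orders` are all hypothetical; the sketch assumes both result sets were loaded by simple indexed lookups on user_id and fit in backend memory.

```python
from collections import defaultdict

# Hypothetical rows, as if fetched by two simple indexed queries, e.g.
#   SELECT ... FROM users  WHERE user_id IN (...)
#   SELECT ... FROM orders WHERE user_id IN (...)
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Linus"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
    {"order_id": 12, "user_id": 2, "total": 15.5},
]


def join_users_orders(users, orders):
    """Hash join in application memory: index orders by user_id, then probe."""
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)
    # Attach each user's orders; users without orders get an empty list.
    return [
        {**user, "orders": orders_by_user.get(user["user_id"], [])}
        for user in users
    ]


joined = join_users_orders(users, orders)
```

This costs two round trips instead of one and only works while both row sets fit in memory, which is the constraint the replies below push back on.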

riv991 5 days ago | parent | next [-]

"High scale" is so subjective here. I'd hazard a guess that 99% of businesses are not at the scale where they need to worry about outgrowing what a single Postgres or MySQL instance can handle.

Tade0 5 days ago | parent | next [-]

On one project I've been on, the issue was the ORM creating queries that Postgres deemed too large to run in memory, so it fell back to performing them on disk.

Interestingly it didn't even use JOIN everywhere it could because, according to the documentation, not all databases had the necessary features.

A hard lesson in the caveats of outsourcing work to ORMs.

richardlblair 5 days ago | parent [-]

I've worked both with ORMs and without. As a general rule, if the ORM is telling you there is something wrong with your query / tables it is probably right.

The only time I've seen this in my career was on a project that was an absolute pile of waste. The "CTO" was self-taught, and all the tables were far too wide with a ton of null values. The company did very well financially, but the tech was so damn terrible. It was such a liability.

mdavid626 5 days ago | parent | prev | next [-]

Scalability is not the keyword here.

The same principle applies to small applications too.

If you apply it correctly, the application is never going to be slow due to slow db queries, and you won’t have to optimize complex queries at all.

Plus if you want to split out part of an app to its own service, it’ll be easily possible.

nicoburns 5 days ago | parent [-]

One of the last companies I worked at had very fast queries and response times doing all the joins in-memory in the database. And that was on a database running on a small machine with only 8GB of RAM. That left a vast amount of room for vertical scaling before we would have started hitting limits.

dondraper36 5 days ago | parent | prev [-]

Vertical scaling is criminally underrated, unfortunately. Maybe it's because horizontal scaling looks so much better on LinkedIn.

mdavid626 5 days ago | parent [-]

Sooner or later even small apps reach hardware limits.

My proposed design doesn’t bring many hard disadvantages.

But it allows you to avoid vertical hardware scaling.

Saves money and development time.

dondraper36 5 days ago | parent [-]

Not really disagreeing with you here, but that "later" never comes for most companies.

AdrianB1 5 days ago | parent | prev | next [-]

My manufacturing data is hundreds of GB to a few TB in size per instance, and I am talking about hot data that is actively queried. It is not possible to restructure it, and it is a terrible idea to do joins in the front end. Not every app is tiny.

mdavid626 5 days ago | parent | next [-]

In some cases, it’s true.

But your thinking is rather limited. Even such data can be organized in such a way that joins do not necessarily have to happen in the db.

This kind of design always “starts” on the frontend, by choosing how and what data will be visible, e.g. in a table view.

Many people think that showing all data, all the time, is the only way.

AdrianB1 5 days ago | parent [-]

The SQL database serves more than a dozen semi-independent applications that handle different aspects of the manufacturing process, for example from recipes and batches to maintenance, scrap management and raw material inventory. The data is interlocked; the apps are independent, as different people in very different roles are using them. No, it never starts in the front end: it started as a system and evolved by adding more data and more apps. Think of SAP as another such example.

mdavid626 5 days ago | parent [-]

This is an “old-school” design. Nowadays I wouldn’t let apps meet in the database.

Simple service oriented architecture is much preferred. Each app with its own data.

Then such problems can be easily avoided.

dakiol 5 days ago | parent [-]

It’s not old school, it’s actually solid design. I too have worked with people who think the frontend or even the services should guide the design/architecture of the whole thing. It seems tempting and gives the initial impression that it works, but long term it’s just bad design. Keeping data structures (and mainly this means database structures) stable is key to long-term maintenance.

johnmaguire 5 days ago | parent [-]

> Seems tempting and it has the initial impression that it works, but long terms it’s just bad design.

This appears as an opinion rather than an argument. Could you explain what you find bad about the design?

In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.

dakiol a day ago | parent | next [-]

It's an opinion based on countless references and books out there. I cannot cite them, but it's like "code should be designed to depend on abstract interfaces instead of a concrete implementation", "everything is a byte stream", "adding more people to a late project makes it later", "Bad programmers worry about the code. Good programmers worry about data structures and their relationships", "Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious.", etc... they are usually true.

RaftPeople 5 days ago | parent | prev [-]

> In any case, I believe a DB per backend service isn't a decision driven by the frontend - rather, it's driven by data migration and data access requirements.

I think the idea of breaking up a shared enterprise DB into many distinct but communicating and dependent DBs was driven by a desire to reduce team+system dependencies in order to increase the ability to change.

While the pro is valid and we make use of the idea sometimes when we design things, the cons are significant. Splitting up a DB that has data that is naturally shared by many departments in the business and by many modules/functional areas of the system increases complexity substantially.

In the shared model, when some critical attribute of an item (sku) is updated, all of the different modules+functional areas of the enterprise are immediately using that current and correct master value.

In the distributed model, there is significant complexity and effort to share this state across all areas. I've worked on systems designed this way and this issue frequently causes problems related to timing.

As with everything, no single solution is best for all situations. We only split this kind of shared state when the pros outweigh the cons, which is sometimes but not that often.

mdavid626 5 days ago | parent [-]

I disagree. I generally understand the problem a "split-up" database brings to the table. This is how people have designed things for the past several decades.

What I propose is to leave this design behind.

The split-up design fits modern use cases much better. People want all kinds of data. They want to change what data they want rather often.

"One" database for all of this doesn't really work -- you can't change the schema since it's used by many applications. So you're stuck with a design coming from a time when requirements were probably quite different. Of course, you can make some modifications, but not many and not fundamental ones.

In the split-up design, since you're not sharing the database, you can do whatever you want. Change the schema as you see fit. Store data in multiple different forms (duplicates), so it can be queried quickly. The only thing you have to keep stable is the interface to the outside world (department etc.). Here you can use e.g. versioning of your API. Handy.

The 90's are over. We don't have to stick to the limitations people had back then.

Yes, of course, data not being up-to-date in every system can be a problem. BUT business people nowadays tend to accept that more readily than the inability to change data structures ("we can't add a new field", "we can't change this field" etc.).

RaftPeople 4 days ago | parent | next [-]

> In the split-up design, since you're not sharing the database, you can do whatever you want.

> we can't add a new field, we can't change this field

Ok, let's do an example.

Assumption:

A-ERP system with approximately 30 modules in use (e.g. sales order mgmt, inventory, purchasing, etc)

B-For split DB, the DB is split by module and data flows exist for all shared data. So there are X different copies of the item master (many and possibly most of those modules use the item master), each with the subset of data required by the specific module.

Sample change, add a new field to the item master:

Shared DB:

1-Update DB schema for item master

2-Update code in different modules that need to use the new data element (per feature requirements)

Split DB:

1-Update DB schema in all modules that require the new data element (per feature requirements)

2-Update code in different modules that need to use the new data element

3-Update the data flows for item data in each module that needs to use the new data element

I think you're understating the level of effort when you say "now we can do whatever we want". The actual effort in this change (which is a very common example) is actually greater than in a shared DB and requires more coordination.

Again, there are times when it's the right thing to do, but definitely not a silver bullet without trade-offs.
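
The steps above can be sketched with Python's built-in sqlite3 module. This is only a toy illustration under stated assumptions: the module names, the `weight_kg` field, the `sync_item` data flow, and the use of one in-memory database per module are all hypothetical, and a real ERP data flow would be asynchronous rather than a direct function call.

```python
import sqlite3

ITEM_DDL = "CREATE TABLE item (sku TEXT PRIMARY KEY, descr TEXT)"

# Shared DB: one item master that every module reads directly.
shared = sqlite3.connect(":memory:")
shared.execute(ITEM_DDL)
shared.execute("INSERT INTO item VALUES ('A1', 'widget')")

# Split DB: each module keeps its own copy (module names are hypothetical).
split = {name: sqlite3.connect(":memory:")
         for name in ("item_master", "sales", "inventory")}
for db in split.values():
    db.execute(ITEM_DDL)
    db.execute("INSERT INTO item VALUES ('A1', 'widget')")

# Shared DB, step 1: a single schema change, visible to all modules at once.
shared.execute("ALTER TABLE item ADD COLUMN weight_kg REAL")
shared.execute("UPDATE item SET weight_kg = 2.5 WHERE sku = 'A1'")

# Split DB, step 1: the same schema change, once per module that needs it.
for db in split.values():
    db.execute("ALTER TABLE item ADD COLUMN weight_kg REAL")
owner = split["item_master"]
owner.execute("UPDATE item SET weight_kg = 2.5 WHERE sku = 'A1'")


# Split DB, step 3: the data flow must also learn about the new field.
def sync_item(source, target, sku):
    row = source.execute(
        "SELECT sku, descr, weight_kg FROM item WHERE sku = ?", (sku,)
    ).fetchone()
    target.execute(
        "INSERT OR REPLACE INTO item (sku, descr, weight_kg) VALUES (?, ?, ?)",
        row,
    )


# Until the flow runs, the sales copy does not have the new value.
stale = split["sales"].execute("SELECT weight_kg FROM item").fetchone()[0]
for name in ("sales", "inventory"):
    sync_item(owner, split[name], "A1")
fresh = split["sales"].execute("SELECT weight_kg FROM item").fetchone()[0]
```

The window between `stale` and `fresh` is the timing issue mentioned earlier in the thread: in the shared model it does not exist, while in the split model something has to close it.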

mdavid626 2 days ago | parent [-]

It's interesting to see what people consider difficult to do.

In my opinion the "Split DB" case you outlined is still much easier to do.

It's never the lines of code or the number of steps that make it complicated or difficult.

It's always the strange, weird, unexpected things. I change "this" and "that" breaks, but nobody knows why.

The biggest benefit of my approach is that the work can be split up between people. One team handles this part, another team some other part. You can only break your part of the database, not everything for everyone else.

sgarland 5 days ago | parent | prev [-]

If you have to change your schema frequently, you didn’t adequately (or at all, more likely) model your data.

DB schema is supposed to be inflexible and strict; that’s how you can guarantee that the data it’s storing is correct.

> The 90s are over

And now we have a generation of devs who think that 1 msec latency for disk reads is normal, that applications need to ship their own OS to run, and that SQL is an antiquated language that they don’t need to bother to learn.

mdavid626 5 days ago | parent | prev [-]

A good, simple solution could be data duplication, e.g. storing some props from the joined tables directly in the main table.

I know, for many, this is one of the deadly sins, but I think it can work out very well.
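
A small sqlite3 sketch of what that duplication might look like, and what it costs. The schema, the `customer_name` column, and the `rename_customer` helper are hypothetical names chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- customer_name is a duplicated copy of customers.name
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customers(customer_id),
        customer_name TEXT,
        total         REAL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'Ada', 25.0);
""")

# Read path: the order row already carries the name, so no join is needed.
row = db.execute(
    "SELECT order_id, customer_name, total FROM orders WHERE order_id = 10"
).fetchone()


def rename_customer(db, customer_id, new_name):
    """Write path: every copy of the name has to be updated together."""
    with db:  # commits both updates, or rolls both back on error
        db.execute(
            "UPDATE customers SET name = ? WHERE customer_id = ?",
            (new_name, customer_id),
        )
        db.execute(
            "UPDATE orders SET customer_name = ? WHERE customer_id = ?",
            (new_name, customer_id),
        )


rename_customer(db, 1, "Ada L.")
after = db.execute(
    "SELECT customer_name FROM orders WHERE order_id = 10"
).fetchone()[0]
```

Reads get cheaper, but any write path that forgets the second UPDATE leaves the two copies disagreeing, which is why many consider this a sin.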

sgarland 5 days ago | parent | prev [-]

Unless all your tables have the same width - or you’re doing weird things with constants in your SELECTs - you can’t UNION the various queries, so they’re sequential. You could parallelize those I suppose, but now you’re adding more complexity.

If you want a KV store, use a KV store. If you want an RDBMS, then use its features. They haven’t changed much in the last 50 years for a reason.
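
To make the UNION point concrete, here is a sqlite3 sketch with two hypothetical tables of different widths. It assumes SQLite's behavior; other databases reject the mismatched query too, but with different error types and messages.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (user_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0);
""")

# Two columns on the left, three on the right: the statement is rejected.
try:
    db.execute(
        "SELECT user_id, name FROM users "
        "UNION ALL SELECT order_id, user_id, total FROM orders"
    )
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# The "weird things with constants": pad each SELECT to a common shape.
rows = db.execute("""
    SELECT 'user' AS kind, user_id AS id, name, NULL AS total FROM users
    UNION ALL
    SELECT 'order', order_id, NULL, total FROM orders
""").fetchall()
```

The padded version does return everything in one round trip, but the caller then has to untangle rows by the `kind` column, which is the extra complexity in question.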