quietbritishjim 6 days ago

I think it's ok to have this rule as a first approximation, but like all design rules you should understand it well enough to know when to break it.

I worked on an application that joined across lots of tables, which made a few dozen records balloon to many thousands of result rows, with huge redundancy in the results. Think of a single conceptual result having details A, B, C from one table, X and Y from another table, and 1, 2, 3 from a third. Instead of 8 result rows (or 9 if you include the top-level one from the main table) you get 18 (AX1, AX2, AX3, AY1, ...) — the Cartesian product 3 × 2 × 3. It gets exponentially worse with more tables.
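The arithmetic of that blow-up can be sketched in a few lines (the child values here are the hypothetical A/B/C, X/Y, 1/2/3 from the example, not a real schema):

```python
from itertools import product

# Hypothetical child rows for one parent record, as in the example above:
details = ["A", "B", "C"]   # first child table
extras = ["X", "Y"]         # second child table
items = ["1", "2", "3"]     # third child table

# One big JOIN returns the Cartesian product of the child tables...
joined_rows = list(product(details, extras, items))
print(len(joined_rows))   # 3 * 2 * 3 = 18 rows, with heavy redundancy

# ...whereas separate per-table queries return each child row once.
separate_rows = len(details) + len(extras) + len(items)
print(separate_rows)      # 3 + 2 + 3 = 8 rows
```

Each additional child table multiplies the row count, which is why the redundancy grows exponentially with the number of joined tables.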

We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.

The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.
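A minimal sketch of the separate-queries approach, using Python's stdlib `sqlite3` in place of Postgres and a made-up two-child schema — the key point is that every child query reuses the same filter condition, so there is no per-parent N+1 querying:

```python
import sqlite3

# Hypothetical schema: one parent table and two child tables, each child
# carrying the parent id so all queries can share one filter condition.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE detail (parent_id INTEGER, value TEXT);
    CREATE TABLE extra  (parent_id INTEGER, value TEXT);
    INSERT INTO parent VALUES (1, 'p1'), (2, 'p2');
    INSERT INTO detail VALUES (1, 'A'), (1, 'B'), (2, 'C');
    INSERT INTO extra  VALUES (1, 'X'), (2, 'Y'), (2, 'Z');
""")

# One condition, applied to each child query, instead of one big JOIN:
cond = "parent_id IN (SELECT id FROM parent WHERE name = ?)"
details = conn.execute(f"SELECT value FROM detail WHERE {cond}", ("p1",)).fetchall()
extras = conn.execute(f"SELECT value FROM extra WHERE {cond}", ("p1",)).fetchall()

# Each child row comes back exactly once; no Cartesian duplication,
# and no per-child de-duplication needed in application code.
print([v for (v,) in details], [v for (v,) in extras])
```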

(Later, we just stuffed all the data into a single JSONB in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)
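The single-JSONB-column variant looks something like this sketch (again `sqlite3` standing in for Postgres, with a hypothetical `record` table; in Postgres the column type would be `jsonb`):

```python
import json
import sqlite3

# Hypothetical denormalised schema: the whole conceptual record lives
# in one JSON document column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, doc TEXT)")
doc = {"details": ["A", "B", "C"], "extras": ["X", "Y"], "items": ["1", "2", "3"]}
conn.execute("INSERT INTO record VALUES (1, ?)", (json.dumps(doc),))

# One row, one read: no joins, no row explosion, no reassembly.
loaded = json.loads(conn.execute("SELECT doc FROM record WHERE id = 1").fetchone()[0])
print(loaded["details"])
```

The trade-off is the usual one for denormalisation: reads of the whole record get simpler and faster, at the cost of querying or updating individual child values inside the document.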

9rx 6 days ago | parent | next [-]

> which made a few dozen records balloon to many thousands of result rows

That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try to work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.

kccqzy 5 days ago | parent [-]

It's a join. A join without any ON or USING clause or any filtering is a Cartesian product, which is what's happening here.

nicoburns 5 days ago | parent | prev | next [-]

If you use CTEs and json_agg then you can combine your separate queries into one query without redundant data.
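A sketch of that pattern, using SQLite's `json_group_array` as a rough stand-in for Postgres's `json_agg` (assuming a hypothetical invoice/item schema; this needs a SQLite build with the JSON functions, which Python's bundled SQLite normally has):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE item (invoice_id INTEGER, sku TEXT);
    INSERT INTO invoice VALUES (1, 'acme');
    INSERT INTO item VALUES (1, 'A'), (1, 'B'), (1, 'C');
""")

# The CTE aggregates each invoice's items into a single JSON array, so
# the outer query returns one row per invoice rather than one per item.
row = conn.execute("""
    WITH agg AS (
        SELECT invoice_id, json_group_array(sku) AS skus
        FROM item
        GROUP BY invoice_id
    )
    SELECT i.customer, a.skus
    FROM invoice i
    JOIN agg a ON a.invoice_id = i.id
""").fetchone()
print(row[0], json.loads(row[1]))
```

One round trip, no redundant parent columns repeated per child row — the child data comes back pre-grouped.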

wongarsu 5 days ago | parent | prev | next [-]

That reminds me of the many cases of adhering to database normalisation rules even in views and queries, even where they should be broken. Aggregation functions like postgres's array_agg and jsonb_agg are incredibly powerful at preventing the number of rows from ballooning in situations like those.

magicalhippo 5 days ago | parent | prev [-]

I think it's more like: avoid doing a "limiting" join in the application, i.e. where the join is used to limit the output to a subset or similar.

As a somewhat contrived example, since I just got out of bed: if your software has a function that needs all the invoice items from this year's invoices whose invoice address country is a given value, use a join rather than loading all invoices, invoice addresses and invoice items and performing the filtering on the client side.
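That "limiting" join might look like this sketch (a made-up invoice schema, with `sqlite3` standing in for the real database; the country lives directly on `invoice` here for brevity rather than in a separate address table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, year INTEGER, country TEXT);
    CREATE TABLE invoice_item (invoice_id INTEGER, sku TEXT);
    INSERT INTO invoice VALUES (1, 2025, 'NO'), (2, 2025, 'SE'), (3, 2024, 'NO');
    INSERT INTO invoice_item VALUES (1, 'a'), (1, 'b'), (2, 'c'), (3, 'd');
""")

# The join limits the items to this year's invoices for one country, so
# only the matching rows ever leave the database.
rows = conn.execute("""
    SELECT it.sku
    FROM invoice_item it
    JOIN invoice inv ON inv.id = it.invoice_id
    WHERE inv.year = 2025 AND inv.country = 'NO'
""").fetchall()
print([sku for (sku,) in rows])
```

Doing the same filter client-side would mean shipping every invoice and every item over the network first, which is exactly the anti-pattern the rule is warning about.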

Though as you point out, if you just need to load a given record along with details, prefer fetching detail rows independently instead of making a Cartesian behemoth.