dapperdrake 5 hours ago

Other way around. Aggregation is usually faster than a join.

sgarland 3 hours ago | parent [-]

Disagree, though in practice it depends on the query, the cardinality of the various columns across tables, indices, and RDBMS implementation (so, everything).

A simple equijoin on high-cardinality, indexed columns will usually be extremely fast. The same join in a 1:M relationship might be fast, or it might result in a massive fanout. In the latter case, if your RDBMS uses a clustering index, and if you’ve designed your schemata to exploit this fact (e.g. a table called UserPurchase that has a PK of (user_id, purchase_id)), it can still be quite fast.
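
To make the fanout point concrete, here is a minimal sketch using Python's sqlite3 module. It assumes SQLite, where a WITHOUT ROWID table is stored in primary-key order, which is roughly what a clustering index gives you. The User table, the amount column, and the sample rows are made up for illustration; only UserPurchase and its (user_id, purchase_id) PK come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT);
-- WITHOUT ROWID stores rows in primary-key order, so all purchases
-- for one user sit next to each other (assumption: SQLite semantics).
CREATE TABLE UserPurchase (
    user_id     INTEGER NOT NULL,
    purchase_id INTEGER NOT NULL,
    amount      INTEGER NOT NULL,  -- hypothetical column
    PRIMARY KEY (user_id, purchase_id)
) WITHOUT ROWID;
""")
conn.executemany("INSERT INTO User VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO UserPurchase VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 20), (1, 3, 30), (2, 1, 5)],
)

# 1:M join: one output row per purchase, so the result fans out
# with the number of purchases per user.
joined = conn.execute(
    "SELECT u.user_id, p.purchase_id FROM User u "
    "JOIN UserPurchase p ON p.user_id = u.user_id "
    "ORDER BY u.user_id, p.purchase_id"
).fetchall()

# Aggregation over the same data: one output row per user.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM UserPurchase "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()

print(len(joined), totals)
```

Whether the join or the aggregation is cheaper on real data still depends on everything listed above; this only shows the shape of the two results.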

Aggregations often imply large amounts of data being retrieved, though this is not necessarily true.

dapperdrake 2 hours ago | parent [-]

That level of database optimization is rare in practice. As soon as a non-database person gets decision-making authority, there goes your data model and disk layout.

And many important datasets never make it into any kind of database like that. Very few people provide "index columns" in their CSV files. Or they use long variable-length strings as their primary key.

OP pertains to that kind of data. Some stuff in text files.
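
As a rough sketch of what aggregating that kind of data looks like: a single pass over a CSV with no index, grouping on a long string key through a hash table. The column names and rows here are hypothetical, and io.StringIO just stands in for a text file on disk.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV contents; in practice this would be open("some_file.csv").
raw = io.StringIO(
    "customer_name,amount\n"
    "Alice Example Holdings Ltd,10\n"
    "Bob and Sons Trading Company,5\n"
    "Alice Example Holdings Ltd,7\n"
)

# One sequential pass; the dict hashes the long string key,
# so no index or sort order is needed.
totals = defaultdict(int)
for row in csv.DictReader(raw):
    totals[row["customer_name"]] += int(row["amount"])

print(dict(totals))
```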