sigwinch28 16 hours ago

Or it’s simply an indicator of a schema that has not been excessively normalised (why create an addresses_cities table just to ensure no duplicate cities are ever written to the addresses table?)
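To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, street names, and city names are made up for illustration; the thread does not specify any. It shows a single denormalised addresses table where the city is stored inline, so listing the cities requires DISTINCT.

```python
import sqlite3

# Hypothetical denormalised schema: the city name is stored on every address row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, street TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO addresses (street, city) VALUES (?, ?)",
    [
        ("1 Main St", "Springfield"),
        ("2 Oak Ave", "Springfield"),  # same city repeated on purpose
        ("3 Elm Rd", "Shelbyville"),
    ],
)

# Without a separate cities table, DISTINCT is the natural way to list each city once.
cities = [
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM addresses ORDER BY city")
]
print(cities)
```

Here DISTINCT is not papering over a bad join; it simply reflects that the city column is allowed to repeat.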

valiant55 14 hours ago

It depends on where you see it, but I agree that DISTINCT shouldn't be used in production. If I'm writing a one-off query and DISTINCT gets me over the finish line, sparing me a few minutes, then that's fine.

echelon 12 hours ago

DISTINCT, along with the aggregate functions, is fantastic for offline analytics queries. I find a lot of use for them in reporting and other non-production code.

sgarland 10 hours ago

Because a city/region/state can be uniquely identified with a postal code (hell, in Ireland, the entire address is encapsulated in the postal code), but the reverse is not true.

At scale, repeated low-cardinality columns matter a great deal.
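As a rough sketch of what normalising a low-cardinality column looks like (again with sqlite3 and made-up table, column, and city names, and a hypothetical helper called city_id), each address stores a small integer key instead of repeating the city string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalised schema: each city name is stored exactly once.
conn.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, street TEXT NOT NULL, "
    "city_id INTEGER NOT NULL REFERENCES cities(id))"
)


def city_id(name):
    """Return the id for a city name, inserting the city first if it is new."""
    conn.execute("INSERT OR IGNORE INTO cities (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM cities WHERE name = ?", (name,)).fetchone()[0]


for street, city in [
    ("1 Main St", "Springfield"),
    ("2 Oak Ave", "Springfield"),
    ("3 Elm Rd", "Shelbyville"),
]:
    conn.execute(
        "INSERT INTO addresses (street, city_id) VALUES (?, ?)",
        (street, city_id(city)),
    )

# The list of cities no longer needs DISTINCT; it is just the cities table.
city_count = conn.execute("SELECT COUNT(*) FROM cities").fetchone()[0]

# Reading a full address back costs a join.
joined = conn.execute(
    "SELECT a.street, c.name FROM addresses a "
    "JOIN cities c ON c.id = a.city_id ORDER BY a.id"
).fetchall()
```

Whether the saved bytes per row justify the extra join depends on table size and query patterns, which this toy example does not measure.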

virissimo 8 hours ago

There are ZIP codes that cover both a city and an unincorporated area. Furthermore, there are ZIP codes that span different states. A data model that renders these unrepresentable may come back to bite you.
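A small sketch of the difference, using sqlite3 with entirely fictitious ZIP codes, place names, and state codes (nothing below is real postal data). Model A keys the lookup on the ZIP code alone and so cannot store a second place for it; Model B treats ZIP-to-place as many-to-many, with a NULL city standing in for an unincorporated area.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Model A (hypothetical): one row per ZIP code, so one city/state per ZIP.
conn.execute(
    "CREATE TABLE zip_lookup (zip TEXT PRIMARY KEY, city TEXT, state TEXT NOT NULL)"
)
conn.execute("INSERT INTO zip_lookup VALUES ('00001', 'Exampleville', 'AA')")
try:
    # A second place for the same ZIP violates the primary key.
    conn.execute("INSERT INTO zip_lookup VALUES ('00001', 'Sampletown', 'BB')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Model B (hypothetical): a ZIP code may map to several places.
conn.execute(
    "CREATE TABLE zip_places (zip TEXT NOT NULL, city TEXT, state TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO zip_places VALUES (?, ?, ?)",
    [
        ("00001", "Exampleville", "AA"),
        ("00001", None, "AA"),          # unincorporated area, no city
        ("00001", "Sampletown", "BB"),  # same ZIP, different state
    ],
)
places = conn.execute(
    "SELECT city, state FROM zip_places WHERE zip = ?", ("00001",)
).fetchall()
```

With Model A the overlap is unrepresentable and the insert fails; with Model B the lookup returns all three places, and the caller has to decide which one applies.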

pbnjay 8 hours ago

FYI, this is not true in the US. ZIP codes identify postal routes, not locations.

bdangubic 6 hours ago

Saying ZIP codes uniquely identify a city/state/region is like saying "John" uniquely identifies a human :)

lucyjojo 4 hours ago

These kinds of things are almost never true in the real world.