djoldman 5 days ago

A classic.

See also:

https://en.wikipedia.org/wiki/Datasaurus_dozen

sunrunner 5 days ago | parent | next [-]

Content warning: This is a baker’s dozen, not a regular dozen, in case anyone clicks through expecting to find twelve and is mildly and briefly perturbed.

djoldman 5 days ago | parent | prev [-]

The scary thing is that, yeah, we can see these in 2D and maybe 3D. But ...

usually there are more than 2 or 3 columns in our data :(
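
A minimal sketch of why extra columns matter, using made-up toy data (three binary columns where the third is the XOR of the first two; the helper names are hypothetical). Every pair of columns looks perfectly uniform and uncorrelated, so no single 2D view shows anything, yet the three columns together follow an exact rule:

```python
from collections import Counter
from itertools import combinations, product

# Hypothetical toy data: z is fully determined by x and y (z = x XOR y).
rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
names = ("x", "y", "z")


def pair_counts(rows, i, j):
    """Count how often each (value_i, value_j) combination occurs."""
    return Counter((r[i], r[j]) for r in rows)


def covariance(rows, i, j):
    """Population covariance between columns i and j."""
    n = len(rows)
    mean_i = sum(r[i] for r in rows) / n
    mean_j = sum(r[j] for r in rows) / n
    return sum((r[i] - mean_i) * (r[j] - mean_j) for r in rows) / n


# Each 2D "view" of the data: every value combination appears equally often.
for i, j in combinations(range(3), 2):
    print(names[i], names[j], dict(pair_counts(rows, i, j)), covariance(rows, i, j))

# Only looking at all three columns at once reveals the structure.
print(all(z == x ^ y for x, y, z in rows))
```

Real data is rarely this clean, but the same effect can hide in any set of columns you never plot together.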

imurray 5 days ago | parent [-]

It's clearly hard, but there are tools for exploratory visualization of high-dimensional data: GGobi http://ggobi.org/ and the methods that arrange points in a low-dimensional layout while trying to get local neighborhoods correct (t-SNE, UMAP, et al.).

lamename 5 days ago | parent [-]

Yeah, but it's still "scary", because even with those algorithms you have to pay attention and be really careful not to fool yourself. For example, here's a good demonstration with t-SNE: https://distill.pub/2016/misread-tsne/?hl=cs