Remix.run Logo
macleginn 2 days ago

People also apply the notion of data manifold to language data (which is fundamentally discrete), and even for images the smoothness is hard to come buy (e.g., "images of cars" is not smooth because of shape and colour discontinuities). I guess the best we can do is to hope that there is an underlying virtual "data manifold" from which our datapoints have been "sampled", and knowing its structure may be useful.

hansvm 2 days ago | parent [-]

Those are less problematic than you might imagine.

- For language, individual words might be discrete, but concepts being communicated have more nuance and fill in the gaps.

- For language, even to the extent that discreteness applies, you can treat the data as being sampled from a coarser manifold and still extract a lot of meaningful structure.

- Images of cars are more continuous than you might imagine because of hue differences induced by time of day, camera lens, shadows, etc.

- Images of cars are potentially smooth even when considering shape and color discontinuities. Manifolds don't have to be globally connected. Local differentiability is usually the thing people are looking for in practical applications.