HarHarVeryFunny 2 days ago

Does the way "manifold" is used when describing subsets of the representational space of neural networks (e.g. "data lies on a low-dimensional manifold within the high-dimensional representation space") actually correspond to this formal definition, or is it just co-opting the name to mean something simpler (just an embedded subspace)?

If it is the formal definition being used, then why? Do people actually reason about data manifolds using "atlases" and "charts" of locally Euclidean parts of the manifold?

antognini 2 days ago | parent | next [-]

It's hard to prove rigorously, which is why people usually refer to it as the "manifold hypothesis." But it is reasonable to suppose that (most) data does live on a manifold in the strict sense of the term. If you imagine the pixels associated with a handwritten "6", you can smoothly deform the 6 into a variety of appearances where all the intermediate stages are recognizable as a 6.

However, the embedding space of a typical neural network that is representing the data is not a manifold. If you use ReLU activations, the kinks that the ReLU function creates break the smoothness. (Though if you exclusively used a smooth activation function like the swish function, you could maintain a manifold structure.)
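To make the kink concrete, here's a tiny Python sketch (a toy example of my own, not anything specific to a particular network): the numerical derivative of ReLU jumps from 0 to 1 across the origin, while the swish derivative changes continuously there.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def deriv(f, x, h=1e-6):
    # Central-difference numerical derivative.
    return (f(x + h) - f(x - h)) / (2 * h)

eps = 1e-4
relu_jump = deriv(relu, eps) - deriv(relu, -eps)     # ~1.0: a kink at 0
swish_jump = deriv(swish, eps) - deriv(swish, -eps)  # ~0.0: smooth at 0
print(relu_jump, swish_jump)
```

Since every layer of a ReLU network is a composition of such kinked functions, the image of a smooth data manifold under the network is only piecewise smooth.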

macleginn 2 days ago | parent [-]

People also apply the notion of a data manifold to language data (which is fundamentally discrete), and even for images the smoothness is hard to come by (e.g., "images of cars" is not smooth because of shape and colour discontinuities). I guess the best we can do is to hope that there is an underlying virtual "data manifold" from which our datapoints have been "sampled", and knowing its structure may be useful.

hansvm 2 days ago | parent [-]

Those are less problematic than you might imagine.

- For language, individual words might be discrete, but concepts being communicated have more nuance and fill in the gaps.

- For language, even to the extent that discreteness applies, you can treat the data as being sampled from a coarser manifold and still extract a lot of meaningful structure.

- Images of cars are more continuous than you might imagine because of hue differences induced by time of day, camera lens, shadows, etc.

- Images of cars are potentially smooth even when considering shape and color discontinuities. Manifolds don't have to be globally connected. Local differentiability is usually the thing people are looking for in practical applications.

griffzhowl 2 days ago | parent | prev | next [-]

There's a field known as information geometry. I don't know much about it myself, as I'm more into physics, but here's a recent example of applying geometrical analysis to neural networks. It looks interesting, as they find a phenomenon analogous to phase transitions during training.

Information Geometry of Evolution of Neural Network Parameters While Training

https://arxiv.org/abs/2406.05295

youoy 2 days ago | parent | prev [-]

The closest thing that you may get is a manifold + noise. Maybe some people think about it that way. Consider, for example, the graph of y = sin(x) + noise: you can say that this is a 1-dimensional data manifold. And you can say that locally a data manifold is something that looks like a graph or embedding (in more dimensions) plus noise.
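A quick Python sketch of that picture (a toy example of my own): sample y = sin(x) + noise and fit a straight line in a small window. The residual comes out at roughly the noise level, i.e. locally the data looks like a graph plus noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 500))
noise = 0.05
y = np.sin(x) + rng.normal(0, noise, x.size)

# Fit a line in a small window around x = 1; the RMS residual is
# roughly the noise level (0.05 here), so locally the samples sit
# near a 1-D graph.
mask = np.abs(x - 1.0) < 0.2
a, b = np.polyfit(x[mask], y[mask], 1)
residual = np.sqrt(np.mean((y[mask] - (a * x[mask] + b)) ** 2))
print(residual)
```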

But I am skeptical about whether this definition can be useful in the real world of algorithms. For example, there are techniques like topological data analysis, but their applications are limited, mainly due to the curse of dimensionality.

qbit42 2 days ago | parent [-]

Sometimes statistical rates for empirical risk minimization can be related to the intrinsic dimension of the data manifold (and the noise level, if present). In such cases you run the same algorithm but get a performance guarantee that depends on the structure of the data, and the guarantee is stronger when the data is low-dimensional.
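As a toy illustration of that effect (my own sketch, not a result from any particular paper): k-nearest-neighbour regression on data that is intrinsically 1-dimensional inside R^10 does far better than on data that genuinely fills R^10, even though the algorithm is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, D, k = 2000, 200, 10, 5

def knn_mse(X_tr, y_tr, X_te, y_te):
    # Brute-force k-nearest-neighbour regression and its test MSE.
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    return ((y_tr[idx].mean(axis=1) - y_te) ** 2).mean()

def curve(t):
    # A smooth closed 1-D curve embedded in R^10.
    cols = []
    for f in range(1, D // 2 + 1):
        cols += [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
    return np.stack(cols, axis=1)

# Case 1: inputs lie on the curve, so the intrinsic dimension is 1.
t_tr, t_te = rng.uniform(0, 1, n_train), rng.uniform(0, 1, n_test)
mse_manifold = knn_mse(curve(t_tr), np.sin(3 * t_tr),
                       curve(t_te), np.sin(3 * t_te))

# Case 2: same target, but inputs fill the whole 10-D cube.
X_tr = rng.uniform(0, 1, (n_train, D))
X_te = rng.uniform(0, 1, (n_test, D))
mse_ambient = knn_mse(X_tr, np.sin(3 * X_tr[:, 0]),
                      X_te, np.sin(3 * X_te[:, 0]))

print(mse_manifold, mse_ambient)  # the intrinsically 1-D case is far more accurate
```

This is the nonparametric-rates story in miniature: the neighbour spacing, and hence the error, is governed by the intrinsic dimension rather than the ambient one.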