niemandhier 2 days ago

Wow, I think I might just have grasped one of the sources of the problems we keep seeing with LLMs.

Johnson–Lindenstrauss guarantees a distance-preserving embedding of a finite set of points into a space whose dimension depends on the number of points.

It does not say anything about preserving the underlying topology of the continuous high-dimensional manifold; that would be Takens/Whitney-style embedding results (and Sauer–Yorke for attractors).

The embedding dimensions needed to fulfil Takens are related to the original manifold's dimension and not the number of points.
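A rough numerical sketch of that contrast, in case it helps. It assumes the Dasgupta–Gupta form of the JL bound, k ≥ 4 ln(n) / (ε²/2 − ε³/3), and the 2d+1 count that Whitney/Takens-style theorems give for a d-dimensional manifold. The function names are made up for illustration, and neither number says anything about a specific model.

```python
import math


def jl_min_dim(n_points: int, eps: float) -> int:
    """Sufficient target dimension for JL (Dasgupta-Gupta form of the bound).

    Grows with log(n_points); it does not depend on any manifold dimension.
    """
    if n_points < 2 or not 0.0 < eps < 1.0:
        raise ValueError("need n_points >= 2 and 0 < eps < 1")
    denom = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return math.ceil(4.0 * math.log(n_points) / denom)


def takens_dim(manifold_dim: int) -> int:
    """Generic embedding dimension 2d + 1; independent of the sample size."""
    return 2 * manifold_dim + 1


if __name__ == "__main__":
    # Assumed toy numbers: a 3-dimensional manifold, 10% distortion.
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        print(n, jl_min_dim(n, 0.1), takens_dim(3))
```

The JL column keeps growing as the sample gets larger, while the Takens column stays put: the two results answer different questions.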

It’s quite probable that we observe violations of topological features of the original manifold when we use our too-low-dimensional embedded version to interpolate.

I used AI to sort the hodgepodge of math in my head into something another human could understand; the edited result is below:

=== AI in use ===

If you want to resolve an attractor down to a spatial scale rho, you need about n ≈ C * rho^(-d_B) sample points (here d_B is the box-counting/fractal dimension).

The Johnson–Lindenstrauss (JL) lemma says that to preserve all pairwise distances among n points within a factor 1±ε, you need a target dimension k ≳ log(n) / ε^2. Substituting n from above (and absorbing constants into C) gives

k ≳ (d_B / ε^2) * log(C / rho).

So as you ask for finer resolution (rho → 0), the required k must grow. If you keep k fixed (i.e., you embed into a dimension that’s too low), there is a smallest resolvable scale

rho* (roughly rho* ≳ C * exp(-(ε^2/d_B) * k), up to constants),

below which you can’t keep all distances separated: points that are far apart on the true attractor will show up close together after projection. That’s called “folding” and might be the source of some of the problems we observe.

=== AI end ===

Bottom line: JL protects distance geometry for a finite sample at a chosen resolution; if you push the resolution finer without increasing k, collisions are inevitable. This is perfectly consistent with the embedding theorems for dynamical systems, which require higher dimensions to get a globally one-to-one (no-folds) representation of the entire attractor.
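For anyone who wants to play with the scaling, here is a minimal sketch of the three relations above. The constant C and every input value are assumptions (the formulas only hold up to unknown constants), and the function names are hypothetical, so only the trend is meaningful, not the absolute numbers.

```python
import math


def samples_needed(rho: float, d_box: float, c: float = 1.0) -> float:
    """n ~ C * rho^(-d_B): points needed to resolve the attractor at scale rho."""
    return c * rho ** (-d_box)


def dim_needed(rho: float, eps: float, d_box: float, c: float = 1.0) -> float:
    """k ~ (d_B / eps^2) * log(C / rho): JL target dimension for that sample."""
    return (d_box / eps ** 2) * math.log(c / rho)


def smallest_scale(k: float, eps: float, d_box: float, c: float = 1.0) -> float:
    """rho* ~ C * exp(-(eps^2 / d_B) * k): finest scale a fixed k can resolve."""
    return c * math.exp(-(eps ** 2 / d_box) * k)


if __name__ == "__main__":
    eps, d_box = 0.1, 2.0  # assumed distortion and box-counting dimension
    for rho in (1e-1, 1e-2, 1e-3):
        k = dim_needed(rho, eps, d_box)
        print(rho, samples_needed(rho, d_box), k, smallest_scale(k, eps, d_box))
```

`smallest_scale` is just `dim_needed` solved for rho, so feeding one into the other returns the scale you started with; shrinking rho by a factor of 10 adds a fixed amount to k.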

If someone is bored and would like to discuss this, feel free to email me.

sdl 2 days ago | parent [-]

So basically the map projection problem [1] in higher dimensions?

[1] https://en.m.wikipedia.org/wiki/Map_projection

niemandhier 2 days ago | parent [-]

Worse. Map projection means that you cannot have a mapping that preserves elements of the internal geometry: angles and such.

Violation of topology means that a surface is wrongly mapped to one that intersects itself: think Klein bottle.

https://en.wikipedia.org/wiki/Klein_bottle

bravura a day ago | parent [-]

Can you share an actual example demonstrating this potential pathology?

Like many things in ML, this might be a problem in theory but empirically it isn’t important, or is very low on the stack rank of issues with our models.

Nevermark 16 hours ago | parent [-]

A fold means two different regions of topology get projected across each other.

It's a problem for the simplest of reasons: information is lost. You cannot reconstruct the original topology.

In terms of the model, it now can't distinguish between what were completely different regions.

From the Klein bottle perspective, a surface embedded in 4D gets projected into 3D. On most of the bottle, there is still a one-to-one mapping between the 3D and 4D versions.

But where two surfaces now intersect, there is now no way to distinguish between previously unrelated information. The model won’t be able to do anything sensible with that.

TL;DR: We don't like folding.
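A toy illustration of a fold, one dimension down from the Klein bottle case. The curve (cos t, sin 2t, sin t) is my own choice for the example, not something from the thread: it is a closed loop in 3D that never touches itself, but dropping the last coordinate turns it into a figure-eight whose crossing point merges two points that were far apart. It says nothing about how often this happens in real model embeddings.

```python
import math


def curve3d(t: float) -> tuple:
    """Closed curve in 3D; (cos t, sin t) alone already identifies t, so no self-intersections."""
    return (math.cos(t), math.sin(2.0 * t), math.sin(t))


def project2d(p: tuple) -> tuple:
    """Drop the last coordinate: a projection into a dimension that is too low."""
    return p[:2]


def fold_demo() -> tuple:
    """Return (distance in 3D, distance after projection) for two curve points."""
    p = curve3d(math.pi / 2.0)        # approximately (0, 0, 1)
    q = curve3d(3.0 * math.pi / 2.0)  # approximately (0, 0, -1)
    return math.dist(p, q), math.dist(project2d(p), project2d(q))


if __name__ == "__main__":
    d3, d2 = fold_demo()
    print("distance in 3D:", d3)
    print("distance after projection to 2D:", d2)
```

In 3D the two points sit on opposite sides of the loop; after projection they land on the same spot (up to floating-point error), and nothing in the 2D picture tells you which branch you are on.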