lumost 2 hours ago
They are saying that models should be invariant to the data's orientation, and sensitive only to the distances between vectors. This significantly shrinks the set of possible models and may stabilize the optimization. In simple terms, large ML models like LLMs often learn trivial rules such as "if the 21st decimal place of the 5th dimension of the embedding vector is 5, then the image is of a cat." Learning such a memorization function is usually not what we are trying to do, and there are a variety of techniques for avoiding these trivial solutions and "smoothing" the optimization geometry.
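A minimal sketch of the property being described (plain NumPy, my own illustration, not from the paper): rotating every data vector by the same orthogonal matrix leaves all pairwise distances unchanged, so a model that only reads distances can't key on any particular coordinate, while a coordinate-reading rule like the "cat" example above breaks under rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))  # 10 points in 5 dimensions

# Random rotation: orthogonalize a Gaussian matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))

def pairwise_dists(A):
    """Matrix of Euclidean distances between all pairs of rows."""
    diff = A[:, None, :] - A[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# All pairwise distances are identical before and after rotation,
# so any model that depends only on distances gives the same output.
assert np.allclose(pairwise_dists(X), pairwise_dists(X @ Q.T))

# A coordinate-reading "trivial rule" is NOT preserved: the sign of an
# individual feature flips arbitrarily under rotation.
print(X[4, 2] > 0, (X @ Q.T)[4, 2] > 0)  # may disagree
```

The point is that restricting the model to distance-based features rules out the whole family of memorization rules that depend on the arbitrary axes of the embedding space.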