A polynomial autoencoder beats PCA on transformer embeddings (ivanpleshkov.dev)
43 points by timvisee 3 days ago | 14 comments
mentalgear 2 minutes ago
Geometric Algebra (GA) also has high potential to transform neural architectures. Models like the Geometric Algebra Transformer (GATr) and Versor (2026) demonstrate that it can enhance or restructure the attention mechanism. Because data is represented as multivectors, translational and rotational symmetries are encoded natively, which lets these models handle geometric hierarchies with massive efficiency gains (reports of up to 78x speedups and 200x parameter reductions) compared to standard Transformers.
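The rotor picture can be made concrete in a few lines of plain numpy (a minimal sketch, not GATr's multivector machinery: quaternions are the even subalgebra of 3D GA, and a unit quaternion acts as a rotor via the sandwich product):

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(v, rotor):
    """Sandwich product R v R~: rotate 3D vector v by a unit rotor."""
    p = np.array([0.0, *v])                         # embed v as a pure quaternion
    r_conj = rotor * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(rotor, p), r_conj)[1:]

# Rotor for a 90-degree rotation about the z-axis: cos(t/2) + sin(t/2) e_z
theta = np.pi / 2
rotor = np.array([np.cos(theta/2), 0.0, 0.0, np.sin(theta/2)])
print(rotate(np.array([1.0, 0.0, 0.0]), rotor))     # the x-axis maps to ~[0, 1, 0]
```

A GA-based layer gets its rotation equivariance from exactly this structure: applying the same rotor to inputs and outputs commutes with the layer's operations.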
folderquestion 28 minutes ago
This sounds like projecting the data into the linear space spanned by {x_i, x_i*x_j}, where the x_i are the feature variables, and then applying standard regularization to remove noise and low-value coefficients. Anisotropy and the cone picture may explain why PCA underperforms, but they do not uniquely justify this particular quadratic decoder. The geometric story does no explanatory work beyond "the data is nonlinear"; the real substance is simply that second-order reconstruction empirically helps.
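That reading can be sketched directly: lift each point into the span of {x_i, x_i*x_j} and fit a ridge-regularized linear map on top. All names, sizes, and the synthetic target below are illustrative, not from the article:

```python
import numpy as np

def quadratic_features(X):
    """Append all pairwise products x_i * x_j (i <= j) to each row of X."""
    n, d = X.shape
    iu = np.triu_indices(d)                          # index pairs with i <= j
    quad = (X[:, :, None] * X[:, None, :])[:, iu[0], iu[1]]
    return np.hstack([X, quad])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * X[:, 1] + 0.01 * rng.normal(size=200)  # purely quadratic signal

Phi = quadratic_features(X)
lam = 1e-2
# Ridge regression in closed form: w = (Phi^T Phi + lam I)^{-1} Phi^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
resid = y - Phi @ w
print(np.sqrt(np.mean(resid**2)))                    # small: the x0*x1 term is captured
```

A plain linear model on X alone cannot represent x0*x1 at all; in the lifted space it is a single coefficient, which is the whole point of the objection: the win comes from second-order terms, not from any deeper geometric claim.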
yobbo 3 hours ago
My understanding after scanning the code examples is that the technique expands the dimensionality of each data point with the set of pairwise products of its existing dimensions. It sounded like kernel PCA to me.
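The kernel-PCA connection is exact for the homogeneous quadratic kernel: (x . y)^2 equals the inner product of the explicitly expanded features phi(x) = vec(x x^T), so explicit quadratic expansion and degree-2 kernel PCA see the same geometry. A small numpy check (illustrative, not from the article's code):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: all products x_i * x_j."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(1)
x, y = rng.normal(size=4), rng.normal(size=4)

explicit = phi(x) @ phi(y)   # inner product in the expanded space
kernel = (x @ y) ** 2        # homogeneous quadratic kernel
print(explicit, kernel)      # identical up to floating point
```

The difference in practice is cost: the explicit map is O(d^2) features per point, while kernel PCA works with the n x n kernel matrix, so which is cheaper depends on whether d^2 or n dominates.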
pleshkov 3 days ago
Author here — questions and pushback both welcome. | ||||||||||||||||||||||||||
teiferer 2 hours ago
I came here from a discussion about CS students who can't be bothered to set up email filters. How can they ever expect to digest even the first paragraph of that article?
magicalhippo 4 hours ago
I'm just a casual LLM user, but your description of the anisotropy made me think of recent work on KV-cache quantization techniques such as TurboQuant, where a random rotation is applied to each vector before quantizing, as I understood it precisely to make the distribution more isotropic. But for RAG that might be too much work per vector?
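The rotation trick can be sketched in a few lines (this is the generic rotation-based quantization idea; TurboQuant's exact construction may differ). One shared random orthogonal matrix is applied to every vector: orthogonal maps preserve inner products, so dot-product retrieval scores are untouched, while per-coordinate variances even out, which is what a scalar quantizer wants:

```python
import numpy as np

d, n = 64, 2000
rng = np.random.default_rng(0)

# Anisotropic data: coordinate variances decay geometrically.
scales = np.geomspace(1.0, 0.01, d)
X = rng.normal(size=(n, d)) * scales

# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Xr = X @ Q.T                                  # per-vector cost: one d-dim matvec

def anisotropy(A):
    """Ratio of largest to mean per-coordinate variance (1.0 = isotropic)."""
    v = A.var(axis=0)
    return v.max() / v.mean()

print(anisotropy(X))                          # large: energy concentrated in few coords
print(anisotropy(Xr))                         # much closer to 1 after rotation

# Inner products, hence dot/cosine retrieval scores, are unchanged.
print(np.allclose(X[:5] @ X[:5].T, Xr[:5] @ Xr[:5].T))
```

The per-vector cost is one d-dimensional matrix-vector product at index and query time, which speaks to the comment's question: whether that is "too much" for RAG depends on d and on whether a structured rotation (e.g. a fast Hadamard-style transform) is used instead of a dense matrix.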