liteclient 6 hours ago
it makes sense architecturally: they replace dot-product attention with topology-based scalar distances derived from a laplacian embedding, which effectively reduces attention scoring to a 1D energy comparison and can save memory and compute. that said, i'd treat the results with a grain of salt given there is no peer review yet, and the benchmarks are only on a 30M-parameter model so far.
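
for anyone wondering what "scalar distances from a laplacian embedding" could mean in practice, here's a rough sketch (my own illustration under assumptions, not the paper's actual method, and all names are made up): give each token one spectral coordinate from a graph Laplacian and use negative squared distance between coordinates as the attention score, so you store a scalar per token instead of d-dim keys/queries.

    import torch

    def laplacian_1d_scores(adjacency):
        # hypothetical sketch, not the paper's code: one spectral coordinate
        # per token, attention score = negative squared distance in 1D
        deg = adjacency.sum(-1)
        lap = torch.diag(deg) - adjacency          # graph Laplacian L = D - A
        _, evecs = torch.linalg.eigh(lap)          # spectral decomposition
        coord = evecs[:, 1]                        # Fiedler vector: a single scalar per token
        scores = -(coord[:, None] - coord[None, :]) ** 2   # the "1D energy comparison"
        return torch.softmax(scores, dim=-1)       # (n, n) attention weights

    # toy usage: chain graph over 6 tokens
    n = 6
    idx = torch.arange(n)
    adj = ((idx[:, None] - idx[None, :]).abs() == 1).float()
    attn = laplacian_1d_scores(adj)
    print(attn.shape)  # torch.Size([6, 6])

how the token graph itself is built (and whether they use the Fiedler vector specifically) is exactly the part you'd want peer review on.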
reactordev 5 hours ago | parent
Yup, the keyword here is "under the right conditions". This may work well for their use case but fail horribly in others without further peer review and testing.