liteclient 6 hours ago

it makes sense architecturally

they replace dot-product attention with topology-based scalar distances derived from a Laplacian embedding; that effectively reduces attention scoring to a 1D energy comparison, which can save memory and compute
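to make the idea concrete, here's a minimal toy sketch (my own reconstruction, not the paper's code; the adjacency matrix, Fiedler-vector embedding, and distance-based scoring are all assumptions about what "topology-based scalar distances" means): each token gets one scalar coordinate from the graph Laplacian, and pairwise scores are just negative absolute differences of those scalars instead of q·k dot products

```python
import numpy as np

def laplacian_1d_embedding(adj):
    """Fiedler vector of the graph Laplacian: one scalar coordinate per node."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    _, vecs = np.linalg.eigh(lap)
    return vecs[:, 1]  # second-smallest eigenvector (the first is constant)

def distance_attention(x, adj):
    """Score token pairs by -|coord_i - coord_j| instead of q.k dot products."""
    coord = laplacian_1d_embedding(adj)                 # shape (n,)
    scores = -np.abs(coord[:, None] - coord[None, :])   # closer => higher score
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ x                                  # attention output

# toy example: 4 tokens connected as a path graph
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).normal(size=(4, 8))
out = distance_attention(x, adj)
print(out.shape)
```

the savings claim follows from this shape: scoring needs only n scalars plus pairwise absolute differences, rather than full d-dimensional query/key projections and an n×n matrix of dot products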

that said, i’d treat the results with a grain of salt given there’s no peer review, and benchmarks are only on a 30M-parameter model so far

reactordev 5 hours ago | parent [-]

Yup, keyword here is “under the right conditions”.

This may work well for their use case but fail horribly in others; without peer review and broader testing there’s no way to tell.