rob_c 3 days ago
Yes, and working out how to disentangle the information-storage mechanisms from, say, language processing is a massive area of interest. The only problem with attention transformers imo is that they're a bit too good :p Imagine a slightly lossy compression algorithm that could store 10x or 100x what the current best lossless methods can, while maintaining 99.999% fidelity on recall. Probably, very probably, a pipe dream. But why do large on-device models seem to be able to remember just about everything from Wikipedia, and store it in a smaller footprint than a direct archive of the source material? (Look at the current best from diffusion models as well.)
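
Rough numbers for that last point (these are my own ballpark assumptions, not exact figures, and of course the model's "recall" is lossy while the dump is exact):

  # Back-of-envelope: on-device model footprint vs. a compressed Wikipedia text dump.
  # All figures below are rough assumptions for illustration only.

  model_params = 8e9       # assume an ~8B-parameter on-device model
  bytes_per_param = 0.5    # assume 4-bit quantization (0.5 bytes per parameter)
  model_size_gb = model_params * bytes_per_param / 1e9

  wiki_dump_gb = 22        # assume ~22 GB for a compressed English Wikipedia
                           # text-only dump (no images or media)

  print(f"model: ~{model_size_gb:.0f} GB")
  print(f"dump:  ~{wiki_dump_gb} GB")
  print(f"ratio: dump is ~{wiki_dump_gb / model_size_gb:.1f}x larger than the model")

Under those assumptions the weights come out around 4 GB against a ~22 GB dump, and the model "covers" far more than Wikipedia, which is what makes the lossy-compression framing so tempting.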