pmarreck | 4 days ago
So after your edit it would be (just to clarify):
And does it generate 2 at a time and keep going that way, or is there some overlap?
porridgeraisin | 4 days ago | parent
You generate blocks of 2 at a time, yes. In general, blocks of k. As you can imagine, larger k performs worse. LLM("I like cats") is very likely to continue with "because they", but beyond that there are too many possibilities: LLM("I like cats because they are") = "small and cute and they meow", while LLM("I like cats because they eat") = "all the rats in my garden". If you try to predict the whole thing at once, you might end up with "I like cats because they are all the rats and they garden".

> Overlap

Check out an inference method called self-speculative decoding, which solves (somewhat) the above problem of k-token prediction, and which does overlap the same ___ across multiple computations.
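A minimal toy sketch of the failure mode described above, with made-up probabilities standing in for a real model's outputs (nothing here calls an actual LLM). When k future tokens are predicted in one shot, each position only sees a marginal distribution and can't condition on which branch the earlier positions actually picked, so greedy argmax can stitch together pieces of incompatible continuations:

```python
# Toy illustration: independent k-token prediction vs. sequential decoding.
# All distributions and the "compatibility" rule are invented for this example.

# Marginal distributions a model might assign to the 4 tokens after
# "I like cats because they", predicted in a single block.
per_position_probs = [
    {"are": 0.5, "eat": 0.5},                      # position 1
    {"small": 0.4, "all": 0.45, "cute": 0.15},     # position 2
    {"and": 0.45, "the": 0.5},                     # position 3
    {"cute": 0.4, "rats": 0.45, "meow": 0.15},     # position 4
]

def decode_block_independently(position_probs):
    """Greedy argmax at each position, ignoring what earlier positions chose."""
    return [max(p, key=p.get) for p in position_probs]

def decode_sequentially(position_probs, compatible):
    """Greedy decode one token at a time, keeping only tokens consistent with
    the branch chosen so far (a stand-in for real autoregressive context)."""
    chosen = []
    for probs in position_probs:
        allowed = {t: p for t, p in probs.items() if compatible(chosen, t)}
        chosen.append(max(allowed, key=allowed.get))
    return chosen

# Toy rule: committing to "are" keeps us on the "small and cute" branch,
# committing to "eat" keeps us on the "all the rats" branch.
BRANCHES = {
    "are": {"are", "small", "cute", "and", "meow"},
    "eat": {"eat", "all", "the", "rats"},
}

def compatible(chosen, token):
    if not chosen:
        return True
    return token in BRANCHES[chosen[0]]

prefix = "I like cats because they"
print(prefix, *decode_block_independently(per_position_probs))
# -> "I like cats because they are all the rats": the two branches get mixed.
print(prefix, *decode_sequentially(per_position_probs, compatible))
# -> "I like cats because they are small and cute": stays on one branch.
```

Speculative-decoding-style methods keep the cheap multi-token draft but then verify it against the full sequential model, accepting only the prefix the two agree on, which is how they avoid committing to incoherent mixes like the first output above.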