omneity 4 hours ago
Isn’t this, in a sense, an RNN built out of a slice of an LLM? If true, it might have the same drawbacks, namely slowness to train, but also benefits such as an endless context window (in theory).
ctoa 2 hours ago | parent
It's sort of an RNN, but it's also basically a transformer with shared layer weights. Each step is equivalent to one transformer layer, so the computation for n steps is the same as for an n-layer transformer. The notion of a context window applies to the sequence, and the looping doesn't really affect that: each iteration sees and attends over the whole sequence.
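A minimal sketch of the weight-sharing idea described above (class name, dimensions, and step count are illustrative, not from the thread): a single transformer block reused for n iterations, where every iteration's self-attention still spans the full sequence.

    import torch
    import torch.nn as nn

    class LoopedTransformer(nn.Module):
        """One transformer block whose weights are reused at every step.

        Running the block n times costs the same compute as an n-layer
        transformer, but stores only one layer's parameters.
        """
        def __init__(self, d_model=256, n_heads=4, n_steps=8):
            super().__init__()
            # A single encoder layer; its self-attention covers the whole sequence.
            self.block = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads, batch_first=True
            )
            self.n_steps = n_steps

        def forward(self, x):
            # Each iteration attends over the full sequence, so the usual
            # context-window notion (sequence length) is unchanged by looping.
            for _ in range(self.n_steps):
                x = self.block(x)
            return x

    # Usage: batch of 2 sequences, 16 tokens each, 256-dim embeddings.
    model = LoopedTransformer()
    tokens = torch.randn(2, 16, 256)
    out = model(tokens)  # same shape as the input: (2, 16, 256)

The recurrence here is over depth (repeated applications of the same layer), not over sequence position as in a classic RNN, which is why the sequence length each step can attend to is unaffected.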