Remix clone Hacker News


	chillee 5 days ago
		The choice of 32 parallel sequences is also arbitrary, and it significantly changes your conclusions. For example, if they run with 256 parallel sequences, both prefill and decode would come out 8x cheaper in your calculations. The part about attention needing long context lengths to become compute-bound is also quite misleading.
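A minimal sketch of where the 8x comes from. It assumes a simplified model in which one forward step has a fixed cost that is shared by every sequence in the batch, which is roughly the weight-loading-bound decode case. `STEP_COST` and `cost_per_sequence` are hypothetical names, not taken from the original calculation. The sketch only reproduces the amortization arithmetic. It does not model whether a given phase is actually memory-bound or compute-bound.

```python
def cost_per_sequence(step_cost: float, parallel_sequences: int) -> float:
    """Per-sequence share of one forward step.

    Assumes (hypothetically) that the step cost stays the same no matter
    how many sequences share the step, so it is split evenly among them.
    """
    if parallel_sequences <= 0:
        raise ValueError("parallel_sequences must be positive")
    return step_cost / parallel_sequences


STEP_COST = 1.0  # hypothetical cost of one forward step, arbitrary units

at_32 = cost_per_sequence(STEP_COST, 32)
at_256 = cost_per_sequence(STEP_COST, 256)
ratio = at_32 / at_256  # how much cheaper each sequence gets at 256 vs 32

print(f"32 sequences: {at_32}, 256 sequences: {at_256}, ratio: {ratio:.0f}x")
```

Under this fixed-step-cost assumption the ratio is simply 256 / 32 = 8. In any regime where the step cost grows with batch size, the saving would be smaller.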
	Barbing 5 days ago
		Anyone up for publishing their own estimate range?