ow5 21 hours ago
Hi! I'm one of the contributors to the paper — we have kernels, not yet released, that can shave decoding latency by >20%. Also, when we ran streaming experiments with the current kernels, we were a median ~1.3x slower at inference.
ein0p 20 hours ago | parent
Thanks for chiming in! How do you explain the top-most graph in Figure 5? Am I misreading it?