	woadwarrior01 8 hours ago
		> Are they doubling down on local LLMs then?

		Neural Accelerators (aka NAX) accelerate matmuls with tile sizes >= 32. From a very high-level perspective, LLM inference has two phases: (chunked) prefill and decode. The former is matrix-matrix multiplication (GEMM) and the latter is matrix-vector multiplication (GEMV). Neural Accelerators make the former (prefill) faster and have no impact on the latter.
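To make the GEMM vs GEMV distinction concrete, here is a rough sketch of the shapes involved in a single weight projection. Everything in it is an assumption for illustration: the helper names, the 4096-wide model, the 512-token prefill chunk, batch size 1 for decode, and the simplification that a matmul is "tile-friendly" only when every dimension is at least 32. It is not how the hardware actually dispatches work.

```python
# Hypothetical illustration: shapes of one projection (activations @ weights)
# in prefill vs. decode. Not Apple's API; names and sizes are made up.

TILE_MIN = 32  # tile-size threshold quoted in the comment, taken as given


def projection_shape(num_tokens, d_model, d_out):
    """Return (M, K, N) for activations [M x K] times weights [K x N]."""
    return (num_tokens, d_model, d_out)


def kind(shape):
    """A single activation row makes the product a matrix-vector multiply."""
    m, _, _ = shape
    return "GEMV" if m == 1 else "GEMM"


def can_fill_tile(shape, tile=TILE_MIN):
    """Simplifying assumption: every dimension must reach the tile size."""
    return min(shape) >= tile


# Prefill processes a chunk of prompt tokens at once (assumed: 512).
prefill = projection_shape(num_tokens=512, d_model=4096, d_out=4096)
# Decode generates one token per step for a single sequence.
decode = projection_shape(num_tokens=1, d_model=4096, d_out=4096)

for name, shape in (("prefill", prefill), ("decode", decode)):
    print(name, shape, kind(shape), "tile-friendly:", can_fill_tile(shape))
```

Under these assumptions the prefill projection has 512 rows to spread across tiles, while the decode projection has a single row, which is the reason given above for decode seeing no benefit.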