Remix clone Hacker News

new | show | ask | jobs Github

	▲	emsign an hour ago
		I hope this horrible time will soon be over when cheaper NPUs come available from more hardware companies, and also when model size get optimized down further. I wonder what hyperscaled compute farms and models will be good for at that running cost when most AI needs can be fulfilled by on-prem and on-device hardware and models. Probably only customer left are big governments. So in the end the tax payer has to pay for those billions of investments by the AI cartel.
	▲	zozbot234 an hour ago \| parent [-]
		The typical NPU is only marginally helpful for on-prem inference. A GPU can read quantized data from main memory and dequantize/pad it locally (making effective use of memory throughput); a NPU often needs to read padded data directly from memory, which is wasteful. So it only helps a little bit wrt. prefill. Also, smaller models can obviously be used but a smaller model will be a lot weaker in real-world knowledge and this tends to limit their smarts in a way that can't be compensated by more thinking.