bigyabai 4 days ago
NPUs are almost universally too weak for serious LLM inference. Most of the time you get better performance-per-watt out of GPU compute shaders, so the majority of NPUs sit idle as dark silicon. Keep in mind: Nvidia ships no NPU hardware because that functionality is baked into its GPU architecture. AMD, Apple, and Intel are all in this awkward NPU boat because they wanted to avoid competing with Nvidia and keep shipping simple raster designs.
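
As a rough illustration of the "run it on whatever accelerator backend is exposed" point, here is a minimal sketch (not from the original comment) that times a plain fp32 matmul through ONNX Runtime on whichever execution provider the local build offers. It assumes the numpy, onnx, and onnxruntime Python packages; provider strings such as CUDAExecutionProvider (GPU) or QNNExecutionProvider (Qualcomm NPU) only appear if the matching runtime is installed, otherwise it falls back to CPU. It only measures throughput; a perf-per-watt comparison would additionally need a power meter.

    # Sketch: time a 1024x1024 fp32 matmul on whatever execution provider
    # the local ONNX Runtime build exposes (GPU, NPU, or CPU fallback).
    # Package availability and provider names are assumptions about the setup.
    import time
    import numpy as np
    import onnxruntime as ort
    from onnx import TensorProto, helper

    N = 1024

    # Build a one-node MatMul graph: C = A @ B.
    A = helper.make_tensor_value_info("A", TensorProto.FLOAT, [N, N])
    B = helper.make_tensor_value_info("B", TensorProto.FLOAT, [N, N])
    C = helper.make_tensor_value_info("C", TensorProto.FLOAT, [N, N])
    graph = helper.make_graph([helper.make_node("MatMul", ["A", "B"], ["C"])],
                              "matmul_bench", [A, B], [C])
    model = helper.make_model(graph)

    # Providers this onnxruntime build was compiled with, in priority order.
    providers = ort.get_available_providers()
    print("available providers:", providers)

    sess = ort.InferenceSession(model.SerializeToString(), providers=providers)
    a = np.random.rand(N, N).astype(np.float32)
    b = np.random.rand(N, N).astype(np.float32)

    sess.run(["C"], {"A": a, "B": b})  # warm-up / compilation pass
    t0 = time.perf_counter()
    for _ in range(50):
        sess.run(["C"], {"A": a, "B": b})
    dt = time.perf_counter() - t0
    print(f"~{50 * 2 * N**3 / dt / 1e9:.1f} GFLOP/s on {sess.get_providers()[0]}")
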
aurareturn 4 days ago | parent
Apple is in this NPU boat because it optimizes for mobile first. Nvidia does not optimize for mobile first. AMD and Intel were forced by Microsoft to add NPUs in order to sell "AI PCs". Turns out the kind of AI that people want to run locally can't run on an NPU. It's too weak, like you said. AMD and Intel both have matmul acceleration directly in their GPUs. Only Apple does not.