bigyabai 4 days ago
NPUs are almost universally too weak for serious LLM inference. Most of the time you get better performance-per-watt out of GPU compute shaders, so the majority of NPUs sit idle as dark silicon. Keep in mind: Nvidia ships no NPU hardware because that functionality is baked into its GPU architecture. AMD, Apple, and Intel are all in this awkward NPU boat because they wanted to avoid competing with Nvidia and keep shipping simple raster designs.
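
As a rough illustration of the "run it on whatever accelerator backend is exposed" point, here is a minimal sketch (not from the original comment) that times a plain fp32 matmul through ONNX Runtime on whichever execution provider the local build offers. It assumes the numpy, onnx, and onnxruntime Python packages; provider strings such as CUDAExecutionProvider (GPU) or QNNExecutionProvider (Qualcomm NPU) only appear if the matching runtime is installed, otherwise it falls back to CPU. It only measures throughput; a perf-per-watt comparison would additionally need a power meter.

    # Sketch: time a 1024x1024 fp32 matmul on whatever execution provider
    # the local ONNX Runtime build exposes (GPU, NPU, or CPU fallback).
    # Package availability and provider names are assumptions about the setup.
    import time
    import numpy as np
    import onnxruntime as ort
    from onnx import TensorProto, helper

    N = 1024

    # Build a one-node MatMul graph: C = A @ B.
    A = helper.make_tensor_value_info("A", TensorProto.FLOAT, [N, N])
    B = helper.make_tensor_value_info("B", TensorProto.FLOAT, [N, N])
    C = helper.make_tensor_value_info("C", TensorProto.FLOAT, [N, N])
    graph = helper.make_graph([helper.make_node("MatMul", ["A", "B"], ["C"])],
                              "matmul_bench", [A, B], [C])
    model = helper.make_model(graph)

    # Providers this onnxruntime build was compiled with, in priority order.
    providers = ort.get_available_providers()
    print("available providers:", providers)

    sess = ort.InferenceSession(model.SerializeToString(), providers=providers)
    a = np.random.rand(N, N).astype(np.float32)
    b = np.random.rand(N, N).astype(np.float32)

    sess.run(["C"], {"A": a, "B": b})  # warm-up / compilation pass
    t0 = time.perf_counter()
    for _ in range(50):
        sess.run(["C"], {"A": a, "B": b})
    dt = time.perf_counter() - t0
    print(f"~{50 * 2 * N**3 / dt / 1e9:.1f} GFLOP/s on {sess.get_providers()[0]}")
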
aurareturn 4 days ago | parent
Apple is in this NPU boat because it optimizes for mobile first. Nvidia does not optimize for mobile first. AMD and Intel were forced by Microsoft to add NPUs in order to sell "AI PCs". Turns out the kind of AI that people want to run locally can't run on an NPU. It's too weak, like you said. AMD and Intel both have matmul acceleration directly in their GPUs. Only Apple does not.