cpburns2009 10 hours ago

Just in case anyone isn't aware: NPUs are low power, slow, and meant for small models.

jcgrillo 7 hours ago | parent [-]

I wonder what the imagined use case was. TBH I was seriously thinking about buying a Framework Desktop, but the NPU put me off. I don't get why I should have to pay money for a bunch of silicon that doesn't do anything. And now that there's some software support... it still doesn't do anything? Why does it even exist at all then?

ThatPlayer an hour ago | parent | next [-]

At least part of it is probably Microsoft's 40 TOPS NPU requirement for their Copilot+ badge. Intel also has NPUs in its modern CPUs. Phone CPU manufacturers have been doing it even longer, though Google calls theirs a TPU.

I run an older Google Coral TPU in my home lab, where Frigate NVR uses it for object detection on my security cameras. It's more efficient than running detection on the GPU, but less flexible.

Don't know if I need an NPU for my daily driver computer, but I would want one for my next home server.

cpburns2009 5 hours ago | parent | prev | next [-]

The NPU is entirely useless for the Framework Desktop, and really for all Strix Halo devices. Where it could be useful is cell phones, with the examples mentioned by @naasking (audio-to-text and text-to-audio processing), and maybe IoT.

naasking 5 hours ago | parent | prev [-]

Small models aren't entirely useless, and the NPU can run LLMs up to around 8B parameters from what I've seen. One way they could be useful: Qwen3 text-to-speech models are all under 2B parameters, and OpenAI's whisper-small speech-to-text model is under 1B parameters. So you could have an AI agent that you talk to and that talks back, where, in theory, all audio-to-text and text-to-audio processing is offloaded to the low-power NPU, leaving the GPU to do all of the LLM processing.
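
To make the split described above concrete, here is a rough sketch of the placement logic. It is only an illustration: `Stage`, `assign_device`, the 8B cutoff, and the 30B size for the main model are all hypothetical names and numbers, and no real NPU runtime is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    params_billions: float  # approximate model size, in billions of parameters


# Hypothetical cutoff based on the "up to around 8B" remark; not a measured limit.
NPU_MAX_PARAMS_BILLIONS = 8.0


def assign_device(stage: Stage, gpu_reserved_for: str = "llm") -> str:
    """Keep the main LLM on the GPU; send anything small enough to the NPU."""
    if stage.name == gpu_reserved_for:
        return "gpu"
    if stage.params_billions <= NPU_MAX_PARAMS_BILLIONS:
        return "npu"
    return "gpu"


# Sizes are loose upper bounds taken from the comment, not exact parameter counts.
pipeline = [
    Stage("speech_to_text", 1.0),  # whisper-small: under 1B
    Stage("llm", 30.0),            # assumed size for the main model
    Stage("text_to_speech", 2.0),  # Qwen3 TTS: under 2B
]

placement = {stage.name: assign_device(stage) for stage in pipeline}
print(placement)
```

Whether this pays off in practice depends on the NPU software stack actually being able to run those models, which the sketch does not address.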

zozbot234 5 hours ago | parent | next [-]

You could always offload some layers to the NPU for lower power use and leave the rest to the GPU. If the GPU is power throttled (common for prefill, not for decode), that will be a performance improvement.
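
A minimal sketch of what such a layer split could look like, assuming the first few layers go to the NPU and the rest stay on the GPU. `split_layers` is a hypothetical helper, not an API from any real inference runtime, and the layer counts are made up.

```python
def split_layers(num_layers: int, npu_layers: int) -> dict[str, range]:
    """Assign the first `npu_layers` layers to the NPU and the rest to the GPU."""
    if not 0 <= npu_layers <= num_layers:
        raise ValueError("npu_layers must be between 0 and num_layers")
    return {
        "npu": range(0, npu_layers),
        "gpu": range(npu_layers, num_layers),
    }


# Illustrative numbers only: a 32-layer model with 8 layers moved to the NPU.
plan = split_layers(num_layers=32, npu_layers=8)
print(len(plan["npu"]), len(plan["gpu"]))
```

How many layers to move is a tuning question; the right number would depend on how throttled the GPU is and how fast the NPU runs a layer.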

jcgrillo 2 hours ago | parent | prev [-]

That seems like a really niche use case, and probably not worth the surface area? The power savings would have to be truly astonishing to justify it, given what a small fraction of compute time your average device spends processing voice input. I'd wager the 90th percentile Siri/OK Google/whatever user issues fewer than 10 voice queries per day. How much power can they use running on normal hardware, and how much could it possibly matter?
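
A back-of-envelope version of that question, as a sketch. Every number below is a placeholder assumption (10 queries a day, 5 seconds of processing each, 50 W on the GPU versus 5 W on the NPU), not a measurement of any real device.

```python
def daily_energy_wh(queries_per_day: float, seconds_per_query: float, watts: float) -> float:
    """Energy per day in watt-hours: queries * seconds * watts / 3600."""
    return queries_per_day * seconds_per_query * watts / 3600.0


# Placeholder assumptions, not measured values.
QUERIES_PER_DAY = 10
SECONDS_PER_QUERY = 5
GPU_WATTS = 50.0
NPU_WATTS = 5.0

gpu_wh = daily_energy_wh(QUERIES_PER_DAY, SECONDS_PER_QUERY, GPU_WATTS)
npu_wh = daily_energy_wh(QUERIES_PER_DAY, SECONDS_PER_QUERY, NPU_WATTS)
saved_wh = gpu_wh - npu_wh

print(f"GPU: {gpu_wh:.3f} Wh/day, NPU: {npu_wh:.3f} Wh/day, saved: {saved_wh:.3f} Wh/day")
```

Under those made-up inputs the saving is well under 1 Wh per day; plug in your own numbers to see how far the conclusion moves.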