| ▲ | disfictional a day ago |
| As someone who spent a year writing an SDK specifically for AI PCs, it always felt like a solution in search of a problem. Like watching dancers in bunny suits sell CPUs: if the consumer doesn't know the pain point you're fixing, they won't buy your product. |
|
| ▲ | martinald a day ago | parent | next [-] |
| Tbh it's been the same in Windows PCs since forever. Like MMX in the Pentium 1 days - it was marketed as basically essential for anything "multimedia" but provided somewhere between no and minimal speedup (very little software was compiled for it). It's quite similar with Apple's neural engine, which afaik is used very little for LLMs, even via Core ML. I don't think I ever saw it being used in asitop. And I'm sure whatever was using it (facial recognition?) could have easily run on the GPU with no real efficiency loss. |
| |
| ▲ | Maxatar a day ago | parent | next [-] | | I have to disagree with you about MMX. It's possible a lot of software didn't target it explicitly, but on Windows MMX was very widely used: it was integrated into DirectX, ffmpeg, GDI, the initial MP3 libraries (l3codeca, which was used by Winamp and other popular MP3 players), and the popular DIVX video codec. | | |
| ▲ | conductr a day ago | parent | next [-] | | Similar to AI PCs right now, very few consumers cared in the late 90s. The majority weren't power users creating/editing videos/audio/graphics. Most consumers were just consuming, they never had a need to seek out MMX for that, and their main consumption bottleneck was likely bandwidth. If they used MMX indirectly in Winamp or DirectX, they probably had no clue. Today, typical consumers aren't using a ton of AI, or enough to make them think to buy specialized hardware for it. Maybe that changes, but it's the current state. | |
| ▲ | bombcar a day ago | parent | prev | next [-] | | MMX had a chicken/egg problem; it took a while to "take off", so early adopters really didn't see much from it, but by the time it was commonplace it was doing some work. | |
| ▲ | martinald a day ago | parent | prev [-] | | ffmpeg didn't come out until 4 years after the MMX brand was introduced! Of course MMX was widely used later, but at the time it was complete marketing. |
| |
| ▲ | giantrobot a day ago | parent | prev | next [-] | | Apple's neural engine is used a lot by non-LLM ML tasks all over the system, like facial recognition in photos and the like. The point of it isn't to be some beefy AI co-processor but to be a low-power accelerator for background ML workloads. The same workloads could use the GPU, but the GPU is more general purpose and thus uses more power for the same task. It's the same reason macOS uses hardware acceleration for video codecs and even JPEG: the work could be done on the CPU but would cost more in terms of power. Using hardware acceleration helps with the 10+ hour battery life. | | |
| ▲ | martinald a day ago | parent [-] | | Yes, of course, but it's basically a waste of silicon (which is very valuable) imo - you save a handful of watts to do very few tasks. I would be surprised if, over the lifetime of my MacBook, the NPU has been utilised for more than 1% of the time the system is being used. You still need a GPU regardless of whether you can do JPEG and h264 decode on it - for games, animations, etc. | | |
| ▲ | adastra22 a day ago | parent [-] | | Do you use Apple's Photos app? Ever see those generated "memories," or search for photos by facial recognition? Where do you think that processing is being done? Your MacBook's NPU is probably active every moment that your computer is on, and you just didn't know about it. | | |
| ▲ | martinald a day ago | parent [-] | | How often is the device actually generating memories, or how often am I searching for photos? I don't use Apple Photos fwiw, but even if I did I doubt I'd be in that app for 1% of my total computer time, and only a fraction of that time would be spent doing stuff on the ANE. I don't think searching for photos requires it either, btw: if they are already indexed it's just a vector search. You can use asitop to see how often it's actually being used. I'm not saying it's never used, I'm saying it's used so infrequently that any (tiny) efficiency gains don't trade off well vs running it on the GPU. | | |
| ▲ | adastra22 a day ago | parent | next [-] | | Continuously in the background. There's basically nonstop demand for ML tasks being queued up to run on this energy-efficient processor, and you see the results as they come in. That indexing operation is slow, and it runs continuously! | |
| ▲ | kalleboo a day ago | parent | prev [-] | | You also have Safari running OCR on every image and video on every webpage you load, to let you select and copy text. |
|
|
|
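The claim above that photo search is "just a vector search" once everything is indexed is easy to make concrete. Below is a minimal sketch with numpy, using random stand-in embeddings (nothing to do with Apple's actual pipeline): the query-time work is a single matrix-vector product that any CPU handles in microseconds, while the accelerator-friendly work is generating the embeddings at index time.

```python
# Minimal sketch: photo search over an existing index is a nearest-neighbour lookup.
# Embeddings here are random stand-ins; a real pipeline would produce them with an
# image/text encoder, which is the part that would actually exercise an NPU.
import numpy as np

rng = np.random.default_rng(0)

# Pretend index: 10,000 photos, 512-dim embeddings, L2-normalised once at index time.
photo_embeddings = rng.standard_normal((10_000, 512)).astype(np.float32)
photo_embeddings /= np.linalg.norm(photo_embeddings, axis=1, keepdims=True)

def search(query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k most similar photos by cosine similarity."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = photo_embeddings @ q  # one matrix-vector product, CPU is plenty
    return np.argsort(scores)[::-1][:k]

# The query embedding would normally come from encoding the search text or a face crop.
query = rng.standard_normal(512).astype(np.float32)
print(search(query))
```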
| |
| ▲ | buildbot a day ago | parent | prev | next [-] | | Using VisionOCR stuff on macOS spins my M4 ANE up from 0 to 1W, according to poweranalyzer. | |
| ▲ | heavyset_go a day ago | parent | prev [-] | | The silicon is sitting idle in the case of most laptop NPUs. In my experience, embedded NPUs are very efficient, so there are theoretically real gains to be made if the cores were actually used. | | |
| ▲ | martinald a day ago | parent [-] | | Yes but you could use the space on die for GPU cores. | | |
| ▲ | heavyset_go 14 hours ago | parent [-] | | At least with the embedded platforms I'm familiar with, silicon dedicated to an NPU is both faster and more power efficient than offloading to GPU cores. If you're going to be doing ML at the edge, NPUs still seem like the most efficient use of die space to me. |
|
|
|
|
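The NPU-versus-GPU trade-off argued over in the subthread above can at least be measured rather than guessed at: Core ML lets you pin a model to a particular compute unit. The sketch below uses coremltools for that; "Model.mlpackage" and the input name "image" are placeholders for whatever model you actually have, it assumes a macOS machine with coremltools installed, and it only compares latency (power would need powermetrics or an external meter).

```python
# Hedged sketch: run the same Core ML model pinned to different compute units.
# "Model.mlpackage" and the "image" input name are placeholders, not a real model.
import time
import numpy as np
import coremltools as ct

sample = {"image": np.random.rand(1, 3, 224, 224).astype(np.float32)}  # placeholder input

for units in (ct.ComputeUnit.CPU_AND_NE, ct.ComputeUnit.CPU_AND_GPU, ct.ComputeUnit.CPU_ONLY):
    model = ct.models.MLModel("Model.mlpackage", compute_units=units)
    model.predict(sample)  # warm-up / lazy compilation for the chosen backend
    start = time.perf_counter()
    for _ in range(100):
        model.predict(sample)
    print(units, f"{(time.perf_counter() - start) / 100 * 1000:.2f} ms/inference")
```

Whether the watts saved justify the die area is exactly the disagreement above; this only makes the comparison runnable.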
| ▲ | ezst a day ago | parent | prev | next [-] |
| It's even worse and sadder. Consumers have already paid a premium for that, because the monopolists in place made it unavoidable. And now, years later, engineers (who are usually your best advocates and evangelists when it comes to bringing new technologies into the material world) are desperate to find any reason at all for those things to exist and not be a complete waste of money and resources. |
|
| ▲ | convivialdingo a day ago | parent | prev [-] |
| I spent a few months working on different edge-compute NPUs (mostly ARM) with CNN models, and it was really painful. A lot of impressive hardware, but I was always running into software fallbacks for models, custom half-baked NN formats, random caveats, and bad quantization. In the end it was faster, cheaper, and more reliable to buy a fat server to run our models and pay the bandwidth tax. |
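For anyone who hasn't hit that pain first-hand, a typical edge-NPU workflow looks roughly like the sketch below: post-training int8 quantization, then inference through a vendor delegate that partitions any unsupported ops back onto the CPU. This is a generic TensorFlow Lite illustration rather than any specific vendor's SDK; "saved_model_dir" and the delegate library name are placeholders, and the op coverage (and therefore the silent fallbacks) varies per NPU toolchain.

```python
# Generic sketch of an edge-NPU deployment path: quantize, then run via a delegate.
import numpy as np
import tensorflow as tf

def representative_dataset():
    # A few hundred real samples belong here; random data is only a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Post-training full-integer quantization - the step where accuracy often degrades.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

# Inference goes through a vendor-specific delegate; ops the NPU can't handle are
# partitioned back onto the CPU, which is where the surprises tend to show up.
interpreter = tf.lite.Interpreter(
    model_content=tflite_model,
    experimental_delegates=[tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")],  # placeholder
)
interpreter.allocate_tensors()
```

Each vendor's delegate supports a different op subset, which is where the "software fallbacks" and "random caveats" mentioned above tend to come from.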