anonzzzies | 2 days ago
We need custom inference chips at scale for this imho. Every computer (whatever form factor/board) should have an inference unit on it so at least inference is efficient and fast and can be offloaded while the CPU is doing something else.
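The "offloaded while the CPU is doing something else" part is just the usual asynchronous-accelerator pattern. A toy sketch of the idea, where run_on_npu is a hypothetical stand-in for whatever driver call such an inference unit would actually expose:

    from concurrent.futures import ThreadPoolExecutor
    import time

    def run_on_npu(prompt: str) -> str:
        # Hypothetical stand-in for a call into a dedicated inference unit;
        # here it just sleeps to simulate work happening off the main cores.
        time.sleep(1.0)
        return f"completion for: {prompt!r}"

    def do_other_cpu_work() -> int:
        # Whatever the machine was going to do anyway.
        return sum(i * i for i in range(5_000_000))

    with ThreadPoolExecutor(max_workers=1) as npu:
        future = npu.submit(run_on_npu, "summarize this log")
        do_other_cpu_work()       # CPU stays busy with its own work
        print(future.result())    # collect the completion when it's ready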
Aurornis | 2 days ago
The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation to a different chip wouldn't help if memory access is the bottleneck. There have been plenty of boards and chips with dedicated compute hardware for years, but they're only so useful for LLMs, which require huge memory bandwidth.
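A rough back-of-the-envelope sketch of that bandwidth ceiling (the model size and bandwidth figures below are assumptions for illustration, not measurements): a bandwidth-bound decoder streams all of its weights from RAM for every generated token, so tokens/sec tops out at bandwidth divided by model size, no matter how fast the compute unit is.

    # Roofline-style estimate: tokens/sec for a memory-bandwidth-bound decoder
    # is capped at (memory bandwidth) / (bytes of weights read per token).
    # All figures below are illustrative assumptions, not benchmarks.
    def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
        return bandwidth_gb_s / model_size_gb

    # A ~7B-parameter model quantized to ~4 bits is roughly 4 GB of weights.
    print(max_tokens_per_sec(4.0, 50.0))   # dual-channel desktop DDR5: ~12 tok/s
    print(max_tokens_per_sec(4.0, 800.0))  # HBM-class accelerator: ~200 tok/s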
chvid | 2 days ago
Look at the specs of this Orange Pi 6+ board: a dedicated 30 TOPS NPU.
baq | 2 days ago
At this point in the timeline, compute is cheap; it's RAM that's basically unavailable.
sofixa | 2 days ago
Almost all of them have it already. Microsoft's "Copilot+" branding requires an NPU with a minimum number of TOPS. It's just that practically nothing uses those NPUs.
fouc | 2 days ago
I can't believe this was downvoted. It makes a lot of sense that mass-produced custom inference chips would be highly useful.