anonzzzies | 2 days ago
We need custom inference chips at scale for this imho. Every computer (whatever form factor/board) should have an inference unit on it so at least inference is efficient and fast and can be offloaded while the CPU is doing something else.
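The "offloaded while the CPU is doing something else" part is just the usual asynchronous-accelerator pattern. A toy sketch of the idea, where run_on_npu is a hypothetical stand-in for whatever driver call such an inference unit would actually expose:

    from concurrent.futures import ThreadPoolExecutor
    import time

    def run_on_npu(prompt: str) -> str:
        # Hypothetical stand-in for a call into a dedicated inference unit;
        # here it just sleeps to simulate work happening off the main cores.
        time.sleep(1.0)
        return f"completion for: {prompt!r}"

    def do_other_cpu_work() -> int:
        # Whatever the machine was going to do anyway.
        return sum(i * i for i in range(5_000_000))

    with ThreadPoolExecutor(max_workers=1) as npu:
        future = npu.submit(run_on_npu, "summarize this log")
        do_other_cpu_work()       # CPU stays busy with its own work
        print(future.result())    # collect the completion when it's ready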
Aurornis | 2 days ago
The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation to a different chip wouldn't help if memory access is the bottleneck. There have been plenty of boards and chips with dedicated compute hardware for years, but they're only so useful for LLMs, which require huge memory bandwidth.
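A rough back-of-the-envelope sketch of that bandwidth ceiling (the model size and bandwidth figures below are assumptions for illustration, not measurements): a bandwidth-bound decoder streams all of its weights from RAM for every generated token, so tokens/sec tops out at bandwidth divided by model size, no matter how fast the compute unit is.

    # Roofline-style estimate: tokens/sec for a memory-bandwidth-bound decoder
    # is capped at (memory bandwidth) / (bytes of weights read per token).
    # All figures below are illustrative assumptions, not benchmarks.
    def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
        return bandwidth_gb_s / model_size_gb

    # A ~7B-parameter model quantized to ~4 bits is roughly 4 GB of weights.
    print(max_tokens_per_sec(4.0, 50.0))   # dual-channel desktop DDR5: ~12 tok/s
    print(max_tokens_per_sec(4.0, 800.0))  # HBM-class accelerator: ~200 tok/s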
chvid | 2 days ago
Look at the specs of this Orange Pi 6+ board: a dedicated 30 TOPS NPU.
baq | 2 days ago
At this point in the timeline, compute is cheap; it's RAM that's basically unavailable.
sofixa | 2 days ago
Almost all of them have it already. Microsoft's "Copilot+" branding requires an NPU with a minimum number of TOPS. It's just that practically nothing uses those NPUs.
fouc | 2 days ago
I can't believe this was downvoted. It makes a lot of sense that mass-produced custom inference chips would be highly useful.