snovv_crash 3 hours ago

How much dedicated cache do these NPUs have? Because it's easy enough to saturate the memory bandwidth using the CPU for compute, never mind the GPU. Adding dark silicon for some special operations isn't going to make our memory bandwidth any faster.
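A rough roofline-style sketch of the point above: a low-arithmetic-intensity operation like a dot product streams roughly one float per FLOP, so even a modest core count demands far more bandwidth than DRAM supplies. All figures here are illustrative assumptions, not measurements of any particular chip.

```python
# Back-of-envelope roofline arithmetic (all numbers are assumed, illustrative values).
cpu_gflops = 500.0     # assumed multi-core peak throughput, GFLOP/s
dram_gbps = 50.0       # assumed DRAM bandwidth, GB/s
bytes_per_flop = 4.0   # dot product: one fresh float32 streamed per FLOP

# Bandwidth required to keep the cores fed at peak compute:
needed_gbps = cpu_gflops * bytes_per_flop   # 2000 GB/s

# Fraction of peak compute actually reachable once memory-bound:
utilisation = dram_gbps / needed_gbps       # 0.025, i.e. 2.5% of peak

print(f"cores want {needed_gbps:.0f} GB/s, DRAM delivers {dram_gbps:.0f} GB/s")
print(f"compute utilisation capped at {utilisation:.1%}")
```

Under these assumptions the CPU alone out-demands DRAM by 40x, which is why extra compute units (NPU or otherwise) don't help a bandwidth-bound workload.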
bjackman an hour ago (parent)

Does a cache help with inference workloads anyway? I don't know much about it, but my mental model is that for transformers you need access to billions of parameters.
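To put numbers on that intuition: during autoregressive decoding, each generated token touches essentially every weight once, so a cache far smaller than the model can't capture the working set and throughput is bounded by weight bytes over memory bandwidth. A minimal sketch, with the model size and bandwidth figures being illustrative assumptions:

```python
# Back-of-envelope decode throughput bound (illustrative, assumed figures).
params = 7e9              # assumed 7B-parameter model
bytes_per_param = 2       # fp16 weights
bandwidth_bps = 100e9     # assumed 100 GB/s memory bandwidth

weight_bytes = params * bytes_per_param        # 14 GB streamed per token

# Every decode step reads all weights once, so bandwidth sets the ceiling:
tokens_per_sec = bandwidth_bps / weight_bytes  # ~7.1 tokens/s

print(f"weights: {weight_bytes / 1e9:.0f} GB per token")
print(f"bandwidth-bound ceiling: {tokens_per_sec:.1f} tokens/s")
```

Under these assumptions the ceiling is about 7 tokens/s regardless of compute, which suggests a cache only helps if it can hold a meaningful fraction of the weights.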