bjackman 3 hours ago

Does a cache help with inference workloads anyway? I don't know much about it, but my mental model is that for transformers you need random access to billions of parameters.
fc417fc802 6 minutes ago

It's streaming access, and no, not as far as I'm aware. APUs have always been hilariously bottlenecked on memory bandwidth as soon as the task actually needs to pull in data. The only exception I know of is the PS5, because it uses GDDR instead of desktop memory.
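To see why bandwidth dominates: during autoregressive decode, every generated token streams essentially all the weights through the memory bus once, so throughput is roughly bandwidth divided by model size. A rough sketch with illustrative numbers (a hypothetical 7B-parameter fp16 model; the bandwidth figures are ballpark values, not measurements):

```python
# Back-of-envelope: decode speed when inference is memory-bandwidth bound.
# Each generated token streams every weight through the memory bus once,
# so tokens/sec is at most bandwidth / model size in bytes.

def tokens_per_second(bandwidth_gb_s: float, params_billions: float,
                      bytes_per_param: float = 2.0) -> float:
    """Upper bound on decode throughput for a bandwidth-bound model."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb

# Hypothetical 7B-parameter model in fp16 (~14 GB of weights):
print(tokens_per_second(80, 7))    # ~80 GB/s: dual-channel DDR5 desktop class
print(tokens_per_second(448, 7))   # ~448 GB/s: GDDR6 class, like a console
```

The gap between the two results is why the GDDR-backed PS5 stands out: same streaming workload, several times the achievable token rate, and no amount of cache helps when the working set is the entire model.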