| ▲ | zozbot234 3 hours ago | |
OTOH, LLM inference tends to have very predictable memory access patterns. So well-placed prefetch instructions that can execute predictable memory fetches in parallel with expensive compute might help CPU performance quite a bit. I assume that this is done already as part of optimized numerical primitives such as GEMM, since that's where most of the gain would be. | ||