pbsd 4 days ago
Vector ALU instruction latencies are understandably listed as 2 cycles and higher, but this is not strictly the case. Per AMD's Zen 5 optimization manual [1], short vector code sequences that don't fill up the scheduler will have better latency.

[1] https://www.amd.com/content/dam/amd/en/documents/processor-t...
Dylan16807 4 days ago
So if you fill up the scheduler with a long chain of dependent instructions, you experience a significant slowdown? I wonder why they chose that design instead of limiting the scheduler's size or fill level a bit, and what the tradeoffs were.