dragontamer 4 days ago
Well, when you consider that AVX-512 instructions have 2 or 3 reads per write, there's a degree of sense here. Consider the standard matrix multiplication primitive, the FMAC (multiply and accumulate): Output = A * B + C, which is three reads and one write, if I'm counting correctly.
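To make the counting concrete, here's a rough sketch in plain Python (not real AVX-512 code). The names `fmac` and `matmul_fmac` are made up for illustration, and the read/write tally is only a simple model of operand traffic, assuming every multiply-accumulate reads three source operands and writes one result. Note that `a * b + c` in Python rounds twice, unlike a true fused operation.

```python
def fmac(a, b, c):
    """One multiply-accumulate: reads a, b, c (3 sources), produces 1 result."""
    return a * b + c


def matmul_fmac(A, B):
    """Naive matrix multiply built only from fmac, tallying operand traffic.

    Returns (C, reads, writes), where reads/writes count the source and
    destination operands of each fmac (a model, not a hardware measurement).
    """
    n, k, m = len(A), len(B), len(B[0])
    reads = 0
    writes = 0
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = fmac(A[i][p], B[p][j], acc)  # acc = A * B + acc
                reads += 3   # A[i][p], B[p][j], and the old accumulator
                writes += 1  # the new accumulator
            C[i][j] = acc
    return C, reads, writes


C, reads, writes = matmul_fmac([[1.0, 2.0], [3.0, 4.0]],
                               [[5.0, 6.0], [7.0, 8.0]])
print(C, reads, writes)
```

For a 2x2 by 2x2 multiply that's 8 multiply-accumulates, so the tally comes out at 3 reads for every write.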