Const-me 2 hours ago
> AVX2 level includes FMA (fast multiply-add)

FMA does not stand for fast multiply-add; it stands for fused multiply-add. Fused means the instruction computes the entire a * b + c expression at higher intermediate precision (the product keeps twice as many mantissa bits) and only then rounds the result, once, to the precision of the arguments. It may be that the Prism emulator failed to translate each FMA instruction into a pair of FMLA instructions (the equally fused ARM64 equivalent, one per 128-bit half of a 256-bit vector) and instead emulated the fused behaviour in some slower way, which in turn degraded the performance of the AVX2 emulation.
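To make the single-rounding point concrete, here is a minimal sketch in Python. It does not call a hardware FMA instruction and says nothing about what Prism actually does; it emulates the fused result with exact rational arithmetic and one final rounding. The function names and the input values are my own choices for illustration.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the result is rounded twice."""
    product = a * b        # first rounding, to double precision
    return product + c     # second rounding


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add for finite inputs: rounded once.

    Fraction holds a * b + c exactly, and float() then rounds that exact
    value to the nearest double. This is an emulation for illustration,
    not a hardware FMA, and it does not handle inf or NaN.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hypothetical inputs chosen so that the exact product, 1 - 2**-54,
# is not representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print("unfused:", unfused_multiply_add(a, b, c))  # the 2**-54 term is lost
print("fused:  ", fused_multiply_add(a, b, c))    # the 2**-54 term survives
```

With these inputs the unfused version returns exactly zero, while the fused version returns -2**-54. An emulator that has to reproduce the fused result bit for bit without a native fused instruction needs some extra-precision path like the one above, which is far more expensive than a single instruction.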
vintagedave an hour ago
Author here - thanks - my bad. Fixed 'fast' -> 'fused' :) I don't have insight into how Prism works, but I have wondered whether the right debugger would show the translated ARM code and let us see for sure exactly what was going on.