Remix.run Logo
ack_complete 11 hours ago

People who have worked on the Windows x64 emulator claim that TSO isn't as much of a deal as claimed, other factors like enhanced hardware flag conversion support and function call optimizations play a significant role too:

http://www.emulators.com/docs/abc_exit_xta.htm

neobrain 7 hours ago | parent | next [-]

> People who have worked on the Windows x64 emulator claim that TSO isn't as much of a deal as claimed

This is a misinterpretation of what the author wrote! There is a real and significant performance impact in emulating x86 TSO semantics on non-TSO hardware. What the author argues is that enabling TSO process-wide (like macOS does with Rosetta) resolves this impact but it carries counteracting overhead in non-emulated code (such as the emulator itself or in ARM64EC).

The claimed conclusion is that it's better to optimize TSO emulation itself rather than bruteforce it on the hardware level. The way Microsoft achieved this is by having their compiler generate metadata about code that requires TSO and by using ARM64EC, which forwards any API calls to x86 system libraries to native ARM64 builds of the same libraries. Note how the latter in particular will shift the balance in favor of software-based TSO emulation since a hardware-based feature would slow down the native system libraries.

Without ecosystem control, this isn't feasible to implement in other x86 emulators. We have a library forwarding feature in FEX, but adding libraries is much more involved (and hence currently limited to OpenGL and Vulkan). We're also working on detecting code that needs TSO using heuristics, but even that will only ever get us so far. FEX is mainly used for gaming though, where we have a ton of x86 code that may require TSO (e.g. mono/Unity) but wouldn't be handled by ARM64EC, so the balance may be in favor of hardware TSO either way here.

For reference, this is the paragraph (I think) you were referring to:

> Another common misconception about Rosetta is that it is fast because the hardware enforces Intel memory ordering, something called Total Store Ordering. I will make the argument that TSO is the last thing you want, since I know from experience the emulator has to access its own private memory and none of those memory accesses needs to be ordered. In my opinion, TSO is ar red herring that isn't really improving performance, but it sounds nice on paper.

bri3d 9 hours ago | parent | prev [-]

This is more like what I’d expect! This is a great article too, thank you, this is the kind of thing I come to HN for :)