drewg123 14 hours ago

Does anybody know if the X2 supports the x86 total store ordering (TSO) memory model? That's how Apple silicon does such efficient emulation of x86. I'd think that would be even MORE important for a Windows ARM64 laptop, where there is so much more legacy x86 software going back decades.

bri3d 13 hours ago

Does anyone have benchmarks for Rosetta with TSO vs. the Linux version without TSO? I guess it might be a bit challenging to get an apples-to-apples comparison, although I think you could run the same benchmark under macOS and then under Asahi Linux on the same hardware.

I've always been curious about just how much Rosetta magic is the implementation and how much is TSO; Prism in Windows 24H2 is also no slouch. If the recompiler is decent at tracing data dependencies it might not have to fence that much on a lot of workloads even without hardware TSO.
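To make the "might not have to fence that much" idea concrete, here's a toy sketch. It is not how Rosetta, Prism, or FEX actually work; the names (`GuestAccess`, `translate`, `PRIVATE_BASES`) are made up for illustration, and the "anything addressed off `rsp` is thread-private" rule is an assumed heuristic that a real recompiler could not rely on blindly, since stack memory can be shared between threads. The point is only that with hardware TSO every guest access can become a plain `ldr`/`str`, while without it the translator needs an acquire/release form (`ldapr`/`stlr` here) for every access it can't prove private.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuestAccess:
    kind: str  # "load" or "store"
    base: str  # x86 base register of the address, e.g. "rsp" or "rdi"


# Assumption for this sketch: accesses addressed off the stack pointer
# are treated as thread-private. This is a heuristic, not a guarantee.
PRIVATE_BASES = {"rsp"}


def translate(access: GuestAccess, hardware_tso: bool) -> str:
    """Pick an ARM64 mnemonic for one guest memory access."""
    if hardware_tso or access.base in PRIVATE_BASES:
        # Either the CPU enforces ordering, or nobody else can observe it.
        return "ldr" if access.kind == "load" else "str"
    # Otherwise fall back to acquire loads / release stores.
    return "ldapr" if access.kind == "load" else "stlr"


def ordered_count(trace, hardware_tso: bool) -> int:
    """Count how many accesses in the trace need an ordered instruction."""
    return sum(translate(a, hardware_tso) in {"ldapr", "stlr"} for a in trace)


trace = [
    GuestAccess("store", "rsp"),  # spill to the stack
    GuestAccess("load", "rsp"),   # reload from the stack
    GuestAccess("store", "rdi"),  # write through a pointer
    GuestAccess("load", "rsi"),   # read through a pointer
]

print(ordered_count(trace, hardware_tso=False))  # 2
print(ordered_count(trace, hardware_tso=True))   # 0
```

In this toy trace half the accesses stay cheap even without hardware TSO; how close real workloads get to that is exactly what a benchmark would have to show.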

ack_complete 11 hours ago

People who have worked on the Windows x64 emulator claim that TSO isn't as much of a deal as claimed; other factors, like enhanced hardware flag conversion support and function call optimizations, play a significant role too:

http://www.emulators.com/docs/abc_exit_xta.htm

neobrain 7 hours ago

> People who have worked on the Windows x64 emulator claim that TSO isn't as much of a deal as claimed

This is a misinterpretation of what the author wrote! There is a real and significant performance impact in emulating x86 TSO semantics on non-TSO hardware. What the author argues is that enabling TSO process-wide (like macOS does with Rosetta) resolves this impact but it carries counteracting overhead in non-emulated code (such as the emulator itself or in ARM64EC).

The claimed conclusion is that it's better to optimize TSO emulation itself than to brute-force it at the hardware level. Microsoft achieved this by having their compiler generate metadata about code that requires TSO, and by using ARM64EC, which forwards any API calls to x86 system libraries to native ARM64 builds of the same libraries. Note how the latter in particular shifts the balance in favor of software-based TSO emulation, since a hardware-based feature would slow down the native system libraries.

Without ecosystem control, this isn't feasible to implement in other x86 emulators. We have a library forwarding feature in FEX, but adding libraries is much more involved (and hence currently limited to OpenGL and Vulkan). We're also working on detecting code that needs TSO using heuristics, but even that will only ever get us so far. FEX is mainly used for gaming though, where we have a ton of x86 code that may require TSO (e.g. Mono/Unity) but wouldn't be handled by ARM64EC, so the balance may be in favor of hardware TSO either way here.

For reference, this is the paragraph (I think) you were referring to:

> Another common misconception about Rosetta is that it is fast because the hardware enforces Intel memory ordering, something called Total Store Ordering. I will make the argument that TSO is the last thing you want, since I know from experience the emulator has to access its own private memory and none of those memory accesses needs to be ordered. In my opinion, TSO is a red herring that isn't really improving performance, but it sounds nice on paper.

bri3d 9 hours ago

This is more like what I’d expect! This is a great article too, thank you, this is the kind of thing I come to HN for :)

justincormack 4 hours ago

There was a paper with benchmarks posted here recently, but I can't find it right now. From memory, I think the difference was 6-10%.

londons_explore 13 hours ago

Really old software tends not to make good use of multiple cores anyway, so you can simply emulate a single core to get total store ordering.
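A rough illustration of the single-core idea, written against Linux's affinity API only because that fits in a few lines of Python (Windows has its own process-affinity mechanism). `pin_to_one_cpu` is a made-up helper name. The assumption is that if every guest thread is scheduled on the same core, no thread can observe another's stores out of program order, so the translated code needs no extra ordering instructions.

```python
import os


def pin_to_one_cpu():
    """Restrict the caller to a single CPU.

    Returns the chosen CPU index, or None where the affinity API is
    unavailable (os.sched_setaffinity only exists on some platforms).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # Pick a CPU we are already allowed to run on.
    cpu = min(os.sched_getaffinity(0))
    # Call this before spawning threads so that they inherit the mask.
    os.sched_setaffinity(0, {cpu})
    return cpu


cpu = pin_to_one_cpu()
print("pinned to CPU", cpu)
```

The obvious cost is that the emulated program loses all parallelism, which is why this only makes sense for software that barely uses a second core in the first place.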

Anything modern and popular can probably be recompiled for ARM64.

0x000xca0xfe 12 hours ago

Unfortunately, games are the most common demanding multithreaded applications. Studios throw a binary over the fence and then get dissolved. That seems to be the way the entire industry operates.

Maybe more ISA diversity will incentivize publishers to improve long-term software support, but I have little hope.