dmitrygr a day ago

> As mentioned in the article, TSO is not exclusive to Apple's ARM implementation.

I thought I had been quite clear. I guess I'll try again even more clearly.

"TSO" is three letters. It is not a spec. "We all do TSO" is as meaningful as "we all want world peace". Everyone has their own meaning for those words, and the meanings may differ significantly. Each is a memory model, and each can be called "TSO". But just as not every "John Smith" is the same person, not everything called "TSO" is the same. Does NVIDIA's TSO order ALL reads with respect to ALL writes? Does Apple's? What does x86 do in that case? What does a Fujitsu CPU do? "TSO" does not mean the same thing to everyone, just like "world peace" does not. If, for example, NVIDIA came out and said "our TSO mode complies 100% with the x86 memory model and will always continue to", then Fujitsu did the same, and then (LOL) Apple also publicly promised that, then and only then would your comment make sense. As it stands, four entities use the same acronym to each mean their own thing, and you are assuming absolute equality because the three letters match.

Fun story: I know FOR A FACT the answer to my above question about ordering of all reads vs all writes is not the same for x86, Apple's TSO, NVIDIA's TSO, and Fujitsu's TSO. Do you? Do you know how? Do you know how the answers might change with time and hardware revisions, given that at least Apple made no promises as to how their undocumented TSO mode works today or will work tomorrow? Exactly...

One cannot build a stable f{ea,u}ture on undocumented un[der]specified hardware features.

Dylan16807 a day ago

> I know FOR A FACT the answer to my above question about ordering of all reads vs all writes is not the same for x86, Apple's TSO, NVIDIA's TSO, and Fujitsu's TSO.

Well of course they differ. TSO says that some reorderings are banned and some are optional, and there's a million factors that go into deciding when those options are taken.

> "TSO" is three letters. It is not a spec.

It's a few rules that you can depend on. Are those rules not enough to build a program on top of? The simpler you make your rules, the less spec you need. On the other end of the spectrum, a dozen specialized memory barriers need a ton of explanation.

dmitrygr a day ago

>> "TSO" is three letters. It is not a spec.

>It's a few rules that you can depend on.

Until properly specified, they are not "rules" but "hopes". Apple made no promises and provided no specs for their TSO mode. What makes you sure that the TSO bit on AppleM4pro acts the same as on AppleM1? That same "TSO" bit might mean yet a third thing on AppleM7megaMaxProEliteG2 in 2031. How do you know that an OS update that also updated iBoot on your Mac did not change some internal chip config MSR, so that even on your AppleM4pro CPU, whose TSO you understood, it now acts differently due to this config bit change?

Dylan16807 a day ago

I wasn't talking about Apple's promises, I was talking about the meaning of "TSO". If you know you have TSO, you have some rules you can depend on. What's an example of something you need beyond those rules, to write correctly concurrent code?

dmitrygr a day ago

> If you know you have TSO

"If you know you have world peace"

Sure, now define "total". Which accesses does that affect and which ones does it not? Is device memory included? PCIe memory? Are there ordering guarantees between mappings with different permissions?

Then, define "store ordering". Does it affect loads in any way? Or simply just stores?

Dylan16807 a day ago

> Sure, now define "total". Which accesses does that affect and which ones does it not? Is device memory included? PCIe memory? Are there ordering guarantees between mappings with different permissions?

At a basic level TSO is a model for how cores interact and devices are weird, so I'd say those get to be unspecified.

And ideally you want a line saying whether the instruction cache needs to be flushed for self-modifying code, since that's kind of a violation if not specified, but it's a forgivable one.

> Then, define "store ordering".

Sure, though I'm not promising my wording is perfect: In TSO, when stores complete they become visible to all other cores and all cores agree on the exact same list of completed stores.

> Does it affect loads in any way? Or simply just stores?

Depends on what you mean by "affects". Loads in one core might not see stores from another core that have not yet reached the global/total list.
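
To make the store-buffer picture concrete, here is a minimal Python sketch of the textbook abstract machine (one FIFO store buffer per core in front of a single shared memory). It is not a description of any vendor's hardware. The function name `outcomes`, the tuple-based instruction encoding, and the `tso` flag are all made up for illustration. It enumerates every interleaving of the classic store-buffering litmus test and compares the result with a machine that has no store buffers.

```python
def outcomes(programs, tso=True):
    """Enumerate all final register states of `programs` on a toy machine.

    Assumed model (illustrative only): each core has a FIFO store buffer;
    a load checks its own buffer first, then shared memory. With tso=False
    stores go straight to memory, i.e. sequential consistency.
    """
    n = len(programs)
    results, seen = set(), set()
    # State: (program counters, store buffers, memory items, register items)
    stack = [((0,) * n, ((),) * n, (), ())]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memd = dict(mem)
        done = all(pcs[i] == len(programs[i]) for i in range(n))
        if done and not any(bufs):
            results.add(regs)
            continue
        for i in range(n):
            if bufs[i]:
                # Drain this core's oldest buffered store into shared memory.
                (addr, val), rest = bufs[i][0], bufs[i][1:]
                m2 = dict(memd)
                m2[addr] = val
                b2 = bufs[:i] + (rest,) + bufs[i + 1:]
                stack.append((pcs, b2, tuple(sorted(m2.items())), regs))
            if pcs[i] < len(programs[i]):
                op = programs[i][pcs[i]]
                pcs2 = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if op[0] == "st":
                    _, addr, val = op
                    if tso:
                        # Store is only locally visible until it drains.
                        b2 = bufs[:i] + (bufs[i] + ((addr, val),),) + bufs[i + 1:]
                        stack.append((pcs2, b2, mem, regs))
                    else:
                        m2 = dict(memd)
                        m2[addr] = val
                        stack.append((pcs2, bufs, tuple(sorted(m2.items())), regs))
                else:
                    _, addr, reg = op
                    val = memd.get(addr, 0)
                    for a, v in bufs[i]:  # newest own buffered store wins
                        if a == addr:
                            val = v
                    r2 = dict(regs)
                    r2[reg] = val
                    stack.append((pcs2, bufs, mem, tuple(sorted(r2.items()))))
    return results


# Store-buffering litmus test: each core stores, then loads the other's location.
SB = [
    [("st", "x", 1), ("ld", "y", "r0")],
    [("st", "y", 1), ("ld", "x", "r1")],
]


def pairs(results):
    """Project final register states onto (r0, r1) tuples."""
    return {(dict(r)["r0"], dict(r)["r1"]) for r in results}


print("buffered:", sorted(pairs(outcomes(SB, tso=True))))
print("no buffers:", sorted(pairs(outcomes(SB, tso=False))))
```

In this toy model the buffered machine can end with both registers at 0, because each load runs before the other core's store has reached the shared list, while the unbuffered machine cannot. Everything about devices, I$ coherence, or mixed mappings is outside the sketch.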

slabtickler a day ago

Just speaking honestly, I would not consider I$ snooping as part of the definition of TSO. It is part of the x86 memory model, yes, but "TSO" does not define the full story here.

dmitrygr a day ago

> when stores complete they become visible to all other cores and all cores agree on the exact same list of completed stores.

Not that they agree on what completed but on the order they completed in. That is the "o" in TSO. You inadvertently proved my point.

.

> so I'd say those get to be unspecified.

* CRASH *

You left something unspecified that mattered. Ordering of accesses to mappings with differing permissions matters, and whether they are seen in-program-order or not by other cores will break x86 emulators (main use cases for TSO).

.

That's the point here :) This is the usual "I am sure we can all agree what X means" argument. It does not work when it comes to precise things like memory models.

Dylan16807 a day ago

> Not that they agree on what completed but on the order they completed in. That is the "o" in TSO. You inadvertently proved my point.

A list is ordered. You're trying too hard to nitpick. (Also I gave a disclaimer that my wording wasn't perfect, and it only took a couple words for you to "fix" it. If it can be fixed that easily then that doesn't actually counteract my point.)

> You left something unspecified that mattered. Ordering of accesses to mappings with differing permissions matters, and whether they are seen in-program-order or not by other cores will break x86 emulators (main use cases for TSO).

How many x86 emulators have the emulated code talking directly to hardware, to the same piece of hardware, from multiple cores at the same time?

I don't think this is a "main use case".

Plus there's going to be a baseline for how talking to the hardware works. Only TSO-mode-specific details of the hardware access are left unsaid in this basic model, and many access patterns fitting the above description still won't notice anything one way or the other.

gpderetta 18 hours ago

> define "store ordering". Does it affect loads in any way? Or simply just stores

It affects the visible ordering of remote stores to normal memory, so loads are necessarily affected (it wouldn't make sense to guarantee a store order if it were unobservable).

Really, TSO is defined independently of x86 and in fact it took a while to actually prove that x86 was TSO. Concretely, how do architectures that claim (optional) TSO differ from each other at least for access to normal memory?
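
As a rough illustration of why a store order only matters through loads, here is a small Python sketch of the message-passing pattern on normal memory. It assumes a writer that stores `data` and then `flag`, and a reader that loads `flag` and then `data` in program order; the function name `reader_views` and the `fifo` switch are hypothetical, and the non-FIFO case stands in for a generic weaker model, not for any specific architecture.

```python
from itertools import permutations


def reader_views(fifo):
    """Return every (flag, data) pair the reader could observe.

    Writer (program order): data = 1, then flag = 1.
    Reader (program order): load flag, then load data.
    fifo=True  -> stores become visible in program order (the TSO rule).
    fifo=False -> stores may become visible in either order (assumed
                  weaker model, for contrast only).
    """
    stores = [("data", 1), ("flag", 1)]
    drain_orders = [stores] if fifo else list(permutations(stores))
    views = set()
    for order in drain_orders:
        # Memory snapshots after 0, 1, and 2 stores have become visible.
        snapshots = [{"data": 0, "flag": 0}]
        for addr, val in order:
            snap = dict(snapshots[-1])
            snap[addr] = val
            snapshots.append(snap)
        # The flag load sees snapshot i, the later data load sees j >= i.
        for i in range(len(snapshots)):
            for j in range(i, len(snapshots)):
                views.add((snapshots[i]["flag"], snapshots[j]["data"]))
    return views


print("in-order visibility:", sorted(reader_views(fifo=True)))
print("any-order visibility:", sorted(reader_views(fifo=False)))
```

With in-order visibility the reader never gets `flag == 1` together with `data == 0`; with any-order visibility it can. That observable difference is the only place the store order shows up, and the sketch says nothing about device memory or differing mappings.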

GeekyBear a day ago

Were you aware that all the BIOS implementations used in PC compatible computers (Compaq, AMI, Phoenix, etc.) were not identical and were compatible to a greater or lesser extent with the original IBM BIOS, yet Linux somehow supported PC compatible computers?

> Someone saying "it is TSO" is not documentation.

Trying to re-implement what IBM's BIOS did was not documentation either.

The original sets the standard, whether a given implementation is perfectly equivalent or not.

dmitrygr a day ago

I see no further point in this discussion. Either you truly do not understand or are pretending not to understand the difference between memory models (which affect literally every memory access as long as the system is powered up) and BIOS (not used once the OS is up, and thus one-time at-boot quirk-handling code can work around most issues). Either way, g'day.

Oh, and to answer your question, yes, quite aware, actually. I've done quite a bit of low-level work over the decades, including, curiously, working in the Apple platform kernel team at the time when this TSO bit appeared.