Remix.run Logo
amluto 3 hours ago

If you do this, please be aware that there is absolutely no guarantee that you will not observe time going backwards. You probably will not have one thread ask for the time twice in a row and get results that are out of order, but you can have thread 1 ask for the time and do a store-release and then have thread 2 do a load-acquire, observe thread 1’s write, and ask for the time, and thread 2’s time may be earlier than thread 1’s. This is because RDTSC by itself does not respect x86’s memory order — it does not act like a load.

source: I wrote a bunch of this code and I’ve tested it fairly extensively.

vlovich123 2 hours ago | parent [-]

Do you know if the rust quanta / fastant crates have this problem? I feel like they don’t but I haven’t actually dug into the implementation. The reason I think not is that at least in the case of quanta the clock value can be made to be broadcast from a single clock maintainer thread. But even when its using plain rdtsc it says it upholds monotonicity barring kernel/virtualization bugs:

https://docs.rs/quanta/latest/quanta/struct.Instant.html

So I think it’s possible to do this correctly?

amluto an hour ago | parent [-]

If it’s calling clock_gettime, it should be fine. If it uses RDTSCP, it should be fine (assuming your system actually has synchronized TSCs, and there is a long history of this failing). If it uses the sadly vendor-dependent magic incantation involving LFENCE or MFENCE, it should be fine. If it does plain RDTSC, it may not be fine.

(I have no special insight into what Intel and AMD CPUs do under the hood, but my best guess has always been that they are implemented by ucode that has no dependencies on anything in the register file except whatever might be internal to the ucode for the instruction itself. And the dispatch logic will cheerfully schedule it as such, including moving it before loads that precede in the instruction stream. Since RDTSC itself isn’t a load, the magic that makes all loads be acquires does not apply. RDTSCP is probably an excessively heavily pessimized version that waits for earlier loads to actually happen. The really nice hypothetical version where RDTSC “loads” a virtual loadable register in the coherency domain and can be speculated just like a real load is probably too complex to be worth implementing.)