aforwardslash a day ago

> In some cases we have even seen crashes in non-memory instructions (e.g. MOV ZR, R1), which implicates misexecution: a fault in the CPU (or a bug in the telemetry bookkeeping, I suppose).

That's the thing. Bit flips affect everything memory-resident, and that includes program code. You have no way of telling which instruction was actually fetched when executing the address your instrumentation says corresponds to the MOV; or it may have been a legitimate memory operation, with the instrumentation reporting the wrong offset. There are some ways around it, but, generically, if a system runs a program bigger than the processor cache and may have bit flips, the output is useless, including whatever telemetry you use (because it is code executed from RAM and will touch RAM).
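To make the point concrete, here is a minimal Python sketch, not anything from the crash-reporting system being discussed. It treats a byte buffer as a stand-in for resident program code, flips one bit, and compares a CRC32 against a reference taken earlier. The buffer contents and the names `flip_bit` and `hamming_distance` are assumptions made up for illustration, and a checksum is only one possible mitigation.

```python
import zlib


def flip_bit(buf: bytes, bit_index: int) -> bytes:
    """Return a copy of buf with exactly one bit inverted."""
    byte_index, bit_in_byte = divmod(bit_index, 8)
    corrupted = bytearray(buf)
    corrupted[byte_index] ^= 1 << bit_in_byte
    return bytes(corrupted)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length buffers differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical stand-in for a code region: arbitrary bytes, not real instructions.
code = bytes(range(64))
reference_crc = zlib.crc32(code)  # checksum recorded while the region was known good

# Simulate a single-bit upset somewhere in the region.
corrupted = flip_bit(code, 133)

bits_changed = hamming_distance(code, corrupted)
detected = zlib.crc32(corrupted) != reference_crc
print(f"bits changed: {bits_changed}, detected by CRC32: {detected}")
```

CRC32 catches any single-bit change, so the mismatch is guaranteed here. The catch is the one described above: the checker is itself code running from RAM, so on a machine with flipping bits its verdict is only as trustworthy as the memory it runs from.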

adonovan 20 hours ago | parent [-]

Good point: I-cache is memory too. (Indeed it is SRAM, so its bits might be even more fragile than DRAM!)

c-c-c-c-c 16 hours ago | parent [-]

Why would a 6T cell (SRAM) be more fragile than a 1T1C (DRAM) cell?

zinekeller 12 hours ago | parent [-]

Because it's SRAM, and it can still lose its electrons, since we're working with cells only a few atoms thick? The loss is not necessarily in L1 (where data is replaced frequently) but in L3, which now has a capacity comparable to the RAM of PCs in the early 2000s (and can keep its data "stuck" in the same physical area for minutes).