dboreham a day ago
That tells you one bit was changed. It doesn't prove that single bit changed due to a hardware failure. It could have been changed by broken software.
sfink 8 hours ago
[I work at Mozilla] Yes, that's a confounding factor, and in fact the starting assumption when looking at a crash. Sometimes you can be pretty sure it's hardware. For example, if it's a crash on an illegal instruction in non-JITted code, the crash reporter can compare that page of data with the on-disk image that it's supposed to be a read-only copy of. Any mismatches there, especially if they're single bit flips, are much more likely to be hardware.

But I've also seen it several times when the person experiencing the crashes engages on the bug tracker. Often, they'll get weird sporadic but fairly frequent crashes when doing a particular activity, and so they'll initially be absolutely convinced that we have a bug there. But other people aren't reporting the same thing. They'll post a bunch of their crash reports, and when we look at them, they're kind of all over the place (though as they say, almost always while doing some particular thing). Often it'll be something like a crash in the garbage collector while watching a YouTube video, and the crashes are mostly the same but scattered in their exact location in the code. That's a good signal to start suspecting bad memory: the GC scans lots of memory and does stuff that is conditional on possibly faulty data.

We'll start asking them to run a memory test, at least to rule out hardware problems. When people do it in this situation, it almost always finds a problem. (Many people won't do it, because it's a pain and they're understandably skeptical that we might be sandbagging them and ducking responsibility for a bug. So we don't start proposing it until things start feeling fishy.)

But anyway, that's just anecdata from individual investigations. gsvelto's post is about what he can see at scale.
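To make the page-versus-disk comparison concrete, here is a minimal sketch in Python. It is not the actual crash reporter code; the function names `diff_bits` and `classify_mismatch` and the sample bytes are made up for illustration. It XORs the in-memory copy against the on-disk bytes and reports which bits differ, so a lone flipped bit stands out from a larger mismatch.

```python
from typing import List, Tuple


def diff_bits(in_memory: bytes, on_disk: bytes) -> List[Tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    Hypothetical helper: both buffers are assumed to cover the same
    read-only page, so they must have the same length.
    """
    if len(in_memory) != len(on_disk):
        raise ValueError("buffers must be the same length")
    flips = []
    for offset, (a, b) in enumerate(zip(in_memory, on_disk)):
        delta = a ^ b  # set bits mark positions where the copies disagree
        bit = 0
        while delta:
            if delta & 1:
                flips.append((offset, bit))
            delta >>= 1
            bit += 1
    return flips


def classify_mismatch(in_memory: bytes, on_disk: bytes) -> str:
    """Label the mismatch; a single flipped bit hints at hardware."""
    flips = diff_bits(in_memory, on_disk)
    if not flips:
        return "identical"
    if len(flips) == 1:
        return "single-bit flip"
    return "multi-bit mismatch"


# Made-up example bytes: the last byte differs only in bit 2 (0xC3 vs 0xC7).
page_on_disk = bytes([0x48, 0x89, 0xE5, 0xC3])
page_in_memory = bytes([0x48, 0x89, 0xE5, 0xC7])
verdict = classify_mismatch(page_in_memory, page_on_disk)  # "single-bit flip"
```

A single-bit result is only a hint, not proof: it makes hardware more likely, which matches the "much more likely" wording above.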
LeifCarrotson a day ago
Broken software causes null pointer dereferences and similar logic errors. It would be extremely unusual to have an inadvertent single-bit flip operation that got inserted in the code by accident. That's just not the way that we write software.