| ▲ | thebruce87m 19 hours ago | ||||||||||||||||
That shouldn’t make sense. It’s not like the ECC info is stored in additional bits separate from the data, it’s built in with the data so you can’t “ignore” it. Hmm, off to read the paper. | |||||||||||||||||
| ▲ | smalley 6 hours ago | parent | next [-] | ||||||||||||||||
The ECC information is stored in separate DRAM devices on the DIMM. This accounts for some of the extra cost of ECC DIMMs at a given size. The extra memory for ECC is typically not included in the marketed size, so a 32GB DIMM with ECC and one without will have different total numbers of DRAM devices. There's a pretty good set of diagrams and descriptions of the faults in this paper: https://dl.acm.org/doi/10.1145/3725843.3756089 Also, to the parent: there's an updated public paper on DDR4-era fault observations: https://ieeexplore.ieee.org/document/10071066 | |||||||||||||||||
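To make the device-count point concrete, here is a rough sketch of the arithmetic. It assumes the common DDR4-style layout of a 64-bit data bus plus 8 extra ECC bits per rank; the function names and the x4/x8 device widths are illustrative assumptions, not taken from either paper, and other memory generations organize this differently.

```python
# Sketch only: assumes a 64-bit data bus with 8 additional ECC bits per rank
# (the usual 72-bit DDR4 ECC layout). All names here are hypothetical.

def devices_per_rank(device_width_bits, data_bits=64, ecc_bits=0):
    """Number of DRAM devices needed to fill one rank's bus width."""
    total_bits = data_bits + ecc_bits
    if total_bits % device_width_bits != 0:
        raise ValueError("bus width is not a multiple of the device width")
    return total_bits // device_width_bits


def raw_capacity_gb(marketed_gb, data_bits=64, ecc_bits=8):
    """Physical capacity on the DIMM, including the ECC devices."""
    return marketed_gb * (data_bits + ecc_bits) // data_bits


if __name__ == "__main__":
    for width in (8, 4):
        plain = devices_per_rank(width)
        ecc = devices_per_rank(width, ecc_bits=8)
        print(f"x{width} devices per rank: {plain} without ECC, {ecc} with ECC")
    print("raw GB on a '32GB' ECC DIMM:", raw_capacity_gb(32))
```

Under these assumptions the ECC DIMM carries one extra device per rank with x8 parts (two with x4 parts), which is a 12.5% capacity overhead that never shows up in the advertised size.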
| ▲ | Agingcoder 15 hours ago | parent | prev [-] | ||||||||||||||||
I fully agree with you! Neither soft nor hard memory errors, nothing… but bit flips, and reproducible at that. After this we scanned all our machines (a few thousand servers) and found that RAM issues were actually quite common, as the paper says. | |||||||||||||||||