▲ | c0l0 5 days ago
I see a particular ECC error at least weekly on my home desktop system, because one of my DIMMs doesn't like the (out of spec) clock rate that I make it operate at. Looks like this:
(this is `sudo ras-mc-ctl --errors` output)
It's always the same address, and always a Corrected Error (obviously, otherwise my kernel would panic). However, operating my system's memory at this clock and latency boosts x265 encoding performance (just one of the benchmarks I picked when trying to figure out how to handle this particular tradeoff) by about 12%. For that improvement I am willing to stomach the extra risk of effectively overclocking the memory module beyond its comfort zone, given that I can fully mitigate it by virtue of properly working ECC.
▲ | Hendrikto 5 days ago
If you run your RAM so far out of spec that it breaks down regularly, where do you get the confidence that ECC will still work correctly? Also: could you not have just bought slightly faster RAM, given the premium for ECC?
▲ | kderbe 4 days ago
I would loosen the memory timings a bit and see if that resolves the ECC errors. x265 performance shouldn't fall, since it generally benefits more from memory clock rate than from latency. Also, could you share some relevant info about your processor, mainboard, and UEFI? I see many internet commenters question whether their ECC is working (or ask if a particular setup would work), and far fewer who report a successful ECC consumer desktop build. So it would be nice to know some specific product combinations that really work.
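For anyone wondering whether their ECC reporting is active at all, here is a minimal sketch, assuming a Linux system where an EDAC driver exposes per-controller counters under `/sys/devices/system/edac/mc`. The function name `read_edac_counts` is made up for illustration, and an empty result only means no EDAC controller was found there, not proof that the hardware lacks ECC.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} for each EDAC memory controller.

    `root` is a parameter so the function can be pointed at a test directory;
    the default is the usual Linux EDAC sysfs location (an assumption about
    the running system).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(base.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())  # corrected errors
            ue = int((mc / "ue_count").read_text().strip())  # uncorrected errors
        except (OSError, ValueError):
            continue  # skip controllers with missing or unreadable counters
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


if __name__ == "__main__":
    result = read_edac_counts()
    if not result:
        print("No EDAC memory controllers found.")
    for name, c in result.items():
        print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

A nonzero corrected count alongside a zero uncorrected count would match what the parent describes. The counters normally start from zero again after a reboot, so they only cover the current uptime.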
▲ | ainiriand 4 days ago
I think you've found a particularly weak memory cell, and I would start thinking about replacing that module. The consistent memory_channel=1, csrow=0 pattern confirms it's the same physical location failing predictably.
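One way to cross-check that the errors cluster in one place is to tally corrected errors per csrow and channel. This is a rough sketch, assuming the kernel exposes the legacy per-csrow EDAC files (`mc*/csrow*/ch*_ce_count`), which some kernels and drivers do not provide. The helper name `corrected_errors_by_location` is hypothetical.

```python
import re
from pathlib import Path


def corrected_errors_by_location(root="/sys/devices/system/edac/mc"):
    """Map (controller, csrow, channel) -> corrected-error count, nonzero entries only.

    Assumes the legacy EDAC layout mcN/csrowM/chK_ce_count under `root`.
    """
    hits = {}
    base = Path(root)
    if not base.is_dir():
        return hits
    for f in base.glob("mc[0-9]*/csrow[0-9]*/ch*_ce_count"):
        m = re.fullmatch(r"ch(\d+)_ce_count", f.name)
        if m is None:
            continue
        try:
            n = int(f.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed counter
        if n > 0:
            # f.parent is the csrow directory, f.parent.parent the controller
            hits[(f.parent.parent.name, f.parent.name, int(m.group(1)))] = n
    return hits


if __name__ == "__main__":
    for (mc, csrow, channel), n in sorted(corrected_errors_by_location().items()):
        print(f"{mc} {csrow} channel {channel}: {n} corrected")
```

If only a single entry ever shows a nonzero count, that fits the one-location pattern. These counters are coarser than the per-error address, so they narrow things down to a csrow and channel rather than an individual cell.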