Modified3019 a day ago

Thanks to ASRock motherboards for AMD’s Threadripper 1950X working with ECC memory, that’s what I learned to overclock on.

I eventually discovered with some timings I could pass all the usual tests for days, but would still end up seeing a few corrected errors a month, meaning I had to back off if I wanted true stability. Without ECC, I might never have known, attributing rare crashes to software.
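As a rough illustration of why a test that runs for days can still miss this, here is a minimal sketch that treats errors as independent random events (a Poisson process). That model, the 30-day month, and the figure of 3 errors per month standing in for "a few" are all assumptions for the example, not measurements from this thread.

```python
import math


def prob_clean_run(errors_per_month: float, test_days: float,
                   days_per_month: float = 30.0) -> float:
    """Chance that a test of `test_days` sees zero errors.

    Assumes errors arrive independently at a constant average rate
    (Poisson model), which is a simplification of real memory faults.
    """
    expected_errors = errors_per_month * test_days / days_per_month
    return math.exp(-expected_errors)


# Assumed rate: 3 corrected errors per month (hypothetical stand-in for "a few").
for days in (1, 3, 7, 30):
    chance = prob_clean_run(3, days)
    print(f"{days:>2}-day test: {chance:.0%} chance of zero errors")
```

Under these assumptions a one-day test comes back clean about 90% of the time and a week-long test about half the time, so a pass over a few days says little about errors that occur a few times a month.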

From then on I considered people who think you shouldn’t overclock ECC memory to be a bit confused. It’s the only memory you should be overclocking, because it’s the only memory you can prove you don’t have errors.

I found that DDR3 and DDR4 memory (on AMD systems at least) had quite a bit of extra “performance” available over the standard JEDEC timings. (Performance being a relative thing, in practice the performance gained is more a curiosity than a significant real life benefit for most things. It should also be noted that higher stated timings can result in worse performance when things are on the edge of stability.)

What I’ve noticed with DDR5 is that it’s much harder to achieve true stability. Often even CPU mounting pressure being too high or too low can result in intermittent issues and errors. I would never overclock non-ECC DDR5. I could never trust it, and the headroom available is way less than in previous generations. It’s also much more sensitive to heat: it can start having trouble between 50 and 60 degrees C, and it basically needs dedicated airflow when overclocking. Note, I am not talking about the on-chip ECC. That’s important, but different in practice from full-fat classic ECC with an extra chip.

I hate to think of how much effort will be spent debugging software in vain because of memory errors.

monster_truck a day ago | parent | next [-]

DDR4 and 5 both have similar heat sensitivity curves which call for increased refresh timings past 45C.

Some of the (legitimately) extreme overclockers have been testing what amounts to massive hunks of metal in place of the original mounting plates because of the boards bending from mounting pressure, with good enough results.

On top of all of this, it really does not help that we are also at the mercy of IMC and motherboard quality too. To hit the world records they do and also build 'bulletproof', highest performance, cost is no object rigs, they are ordering 20, 50 motherboards, processors, GPUs, etc and sitting there trying them all, then returning the shit ones. We shouldn't have to do this.

I had a lot of fun doing all of this myself and hold a couple of very specific #1/top 10/100 results, but it's IMHO no longer worth the time or effort, and I have resigned myself to simply buying as much RAM as the platform will hold and leaving it at JEDEC.

golem14 a day ago | parent | prev | next [-]

Hmm, I wonder if, now that we are in a RAM availability crisis, we will see more borderline-to-bad RAM creep into the supply chain.

If we had a time series graph of this data, it might be revealing.

monster_truck a day ago | parent [-]

If you look around you'll see people already putting the new, Chinese-made DDR4 through its paces. It's holding up far better than anyone expected.

Every single time I've had someone pay me to figure out why their build isn't stable, it's always some combination of cheap power supply with no noise filtering, cheap motherboard, and poor cooling. Can't cut corners like that if you want to go fast. That is to say, I've never encountered "almost ok" memory. They're quite good at validation.

iamflimflam1 20 hours ago | parent | next [-]

The danger is we’ll start to see more QA rejects coming into the market. The temptation to mix factory rejects into your inventory is going to get very high for a lot of resellers.

kombine 20 hours ago | parent | prev [-]

Where does one find these? I'm looking for DDR4 ECC for my homelab.

bpye 20 hours ago | parent | prev | next [-]

Similar experience. I played with overclocking the DDR5 ECC memory I have on my system, it would appear to be stable and for quite a while it would be. But after a few days I'd notice a handful of correctable errors.

I now just run at the standard 5600 MT/s speed. I really don't find the potential stability trade-off worth it. We already have enough bugs.
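For anyone wanting to watch for the same thing, here is a minimal sketch of reading corrected and uncorrected error counters. It assumes a Linux system with an EDAC driver loaded, which exposes per-memory-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc`. The function name `read_edac_counts` is made up for this example, and nothing in the thread says which OS or tooling was actually used.

```python
from pathlib import Path


def read_edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict[str, dict[str, int]]:
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs.

    Assumes the Linux EDAC layout; returns an empty dict if the
    directory is missing (no EDAC driver, or not Linux).
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    counts: dict[str, dict[str, int]] = {}
    for mc in sorted(base.glob("mc[0-9]*")):
        entry: dict[str, int] = {}
        # ce_count = corrected errors, ue_count = uncorrected errors
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        if entry:
            counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

Logging these counters periodically (from cron, say) and comparing them over weeks is one way to catch the slow trickle of correctable errors described above, since the counters reset on reboot.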

kmeisthax a day ago | parent | prev [-]

> From then on I considered people who think you shouldn’t overclock ECC memory to be a bit confused. It’s the only memory you should be overclocking, because it’s the only memory you can prove you don’t have errors.

This attitude is entirely corporate-serving cope from Intel to serve market segmentation. They wanted to trifurcate the market between consumers, business, and enthusiast segments. Critically, lots of business tasks demand ECC for reliability, and business has huge pockets, so that became a business feature. And while Intel was willing to sell product to overclockers[0], they absolutely needed to keep that feature quarantined from consumer and business product lines lest it destroy all their other segmentation.

I suspect they figured a "pro overclocker" SKU with ECC and unlocked multipliers would be about as marketable as Windows Vista Ultimate, i.e. not at all, so like all good marketing drones they played the "Nobody Wants What We Aren't Selling" card and decided to make people think that ECC and overclocking were diametrically opposed.

[0] In practice, if they didn't, they'd all just flock to AMD.

gruez a day ago | parent | next [-]

>[0] In practice, if they didn't, they'd all just flock to AMD.

Only when AMD had better price/performance, not because of ECC. At best you have a handful of homelabbers who went with AMD for their NAS, but approximately nobody who cares about performance switched to AMD for ECC RAM, because ECC RAM also tends to be clocked lower. Back in the Zen 2/3 days the choice was basically DDR4-3600 without ECC, or DDR4-2400 with ECC.

pushedx a day ago | parent | prev [-]

At the beginning of your comment I was wondering if the "attitude" that was corporate serving was the anti-ECC stance or the pro-ECC stance (based on the full chunk that you quoted). I'm glad that by the end of the comment you were clearly pro ECC.

Any workstation where you are getting serious work done should use ECC.