Is DWPD Still a Useful SSD Spec? (klarasystems.com)
26 points by zdw 5 days ago | 9 comments
mgerdts an hour ago | parent | next [-]

This article misses several important points.

- Consumer drives like the Samsung 980 Pro and WD SN 850 Black use TLC as SLC when roughly 30% or more of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s; after that, it slows down dramatically. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.

- Write amplification factor (WAF) is not discussed. Random small writes and partial block deletions trigger garbage collection, which ends up rewriting still-valid data to reclaim the freed space in a NAND block.

- A drive with a lot of erased blocks can endure more TBW than one where every user block holds data, because garbage collection can be more efficient. Again, enable TRIM on your fs.

- Overprovisioning can be used to increase a drive’s TBW. If, before writing to your 0.3 DWPD 1024 GB drive, you partition it so that only 960 GB is ever used, you now have roughly a 1 DWPD drive (see the sketch after this list for the endurance arithmetic).

- Per the NVMe spec, there are indicators of drive health in the SMART log page.

- Almost all current datacenter or enterprise drives support an OCP SMART log page. This allows you to observe things like the write amplification factor (WAF), rereads due to ECC errors, etc.
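
The overprovisioning point above is really a statement about WAF: host-visible endurance is roughly the NAND’s raw program/erase budget divided by the write amplification factor, and extra spare area pushes WAF down. A minimal back-of-the-envelope sketch in Python; the P/E budget, WAF values and warranty length are made-up illustrative assumptions, not vendor figures:

    # Back-of-the-envelope endurance model. Every number here (P/E budget,
    # WAF values, warranty length) is an illustrative assumption, not a
    # measured or vendor-published figure.
    CAPACITY_TB = 1.024          # advertised capacity of the hypothetical drive
    PE_CYCLES = 3000             # assumed NAND program/erase budget
    WARRANTY_DAYS = 5 * 365

    raw_nand_writes_tb = CAPACITY_TB * PE_CYCLES  # total writes the flash can absorb

    def host_endurance(waf):
        """Host-visible TBW and DWPD for a given write amplification factor."""
        tbw = raw_nand_writes_tb / waf
        dwpd = tbw / (CAPACITY_TB * WARRANTY_DAYS)
        return tbw, dwpd

    for waf, label in [(5.4, "nearly full, little spare area"),
                       (1.6, "extra overprovisioning, trimmed free space")]:
        tbw, dwpd = host_endurance(waf)
        print(f"{label:40s} WAF={waf}: ~{tbw:.0f} TBW, ~{dwpd:.2f} DWPD")

With those assumed WAFs the same flash comes out to roughly 0.3 DWPD in the worst case and about 1 DWPD in the best, which is the shape of improvement the overprovisioning bullet is describing.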

Aurornis 35 minutes ago | parent [-]

You’re also missing an important factor: Many drives now reserve some space that cannot be used by the consumer so they have extra space to work with. This is called factory overprovisioning.

> - Consumer drives like the Samsung 980 Pro and WD SN 850 Black use TLC as SLC when roughly 30% or more of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s; after that, it slows down dramatically. If the filesystem doesn’t automatically trim free space, the drive will eventually be stuck in slow mode all the time.

This is true, but despite all of the controversy about this feature it’s hard to encounter this in practical consumer use patterns.

With the 980 Pro 1TB you can write 113GB before it slows down. (Source https://www.techpowerup.com/review/samsung-980-pro-1-tb-ssd/... ) So you need to be able to source that much data from another high speed SSD and then fill nearly 1/8th of the drive to encounter the slowdown. Even when it slows down you’re still writing at 1.5GB/sec. Also remember that the drive is factory overprovisioned so there is always some amount of space left to handle some of this burst writing.

For as much as this fact gets brought up, I doubt most consumers ever encounter this condition. Someone who is copying very large video files from one drive to another might encounter it on certain operations, but even in slow mode you’re filling the entire drive capacity in under 10 minutes.
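
For the curious, the back-of-the-envelope arithmetic behind that fill time, using the 113GB burst at 5GB/s and the 1.5GB/s steady-state figures quoted above (a sketch, not a benchmark):

    # Rough fill-time estimate for a nominal 1 TB drive using the figures
    # quoted above: ~113 GB at burst speed, the rest at the slower steady rate.
    drive_gb = 1000
    burst_gb, burst_rate = 113, 5.0      # GB, GB/s (SLC-cache burst)
    steady_rate = 1.5                    # GB/s once the cache is exhausted

    seconds = burst_gb / burst_rate + (drive_gb - burst_gb) / steady_rate
    print(f"~{seconds / 60:.1f} minutes to fill the whole drive")

It lands right around the ten-minute mark, which is why the slowdown is hard to notice outside of sustained bulk copies.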

markhahn 5 minutes ago | parent | prev | next [-]

The text is wrong about CRCs: everyone uses pretty heavy ECC, so it's not just a re-read. ECC also provides a somewhat graduated measure of the block's actual health, so the housekeeping firmware can decide whether to stop using the block (i.e., move the content elsewhere).

I'm also not a fan of the "buy bigger storage" advice, or of the conspiracy theory about 480 vs 512.

It sure would be nice if, when considering a product, you could just look at some claimed stats from the vendor about time-related degradation, firmware sparing policy, etc. We shouldn't have to guess!

mdtancsa 2 hours ago | parent | prev | next [-]

Dropping off the bus is really the best-case failure. It's more annoying when a disk's writes become slower than the other disks', often causing confusing performance profiles for the overall array. Having good metrics for each disk (we use telegraf) helps flag it early. On my zfs pools, monitoring per-disk I/O along with smartmon metrics helps tease that out. For SSDs, probably the worst case is a firmware bug that triggers on all disks at the same time, e.g. the infamous HP SSD failure at 32,768 hours of use. Yikes!
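
A minimal sketch of the kind of per-disk check that feeds such monitoring, assuming smartmontools >= 7 for JSON output; the device path and the exact JSON field names are assumptions that can vary by smartctl version and drive type:

    # Pull a few wear/error indicators per disk via smartctl's JSON output
    # (smartmontools >= 7). The field names below are what recent versions
    # emit for NVMe drives and may differ on other devices.
    import json
    import subprocess

    def nvme_health(dev: str) -> dict:
        out = subprocess.run(["smartctl", "-a", "-j", dev],
                             capture_output=True, text=True, check=False)
        if not out.stdout:
            return {}
        data = json.loads(out.stdout)
        log = data.get("nvme_smart_health_information_log", {})
        return {
            "percentage_used": log.get("percentage_used"),
            "media_errors": log.get("media_errors"),
            "unsafe_shutdowns": log.get("unsafe_shutdowns"),
        }

    if __name__ == "__main__":
        # /dev/nvme0 is a placeholder; enumerate your pool's members instead.
        print(nvme_health("/dev/nvme0"))

Export those per device into whatever metrics pipeline you already have, and a single disk drifting (or a whole batch hitting the same counter at once) shows up quickly.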

Havoc 2 hours ago | parent | prev | next [-]

After getting burned by consumer drives I decided it's time for a ZFS array built from used enterprise SSDs. There are tons of writes on them already, but it's a fully mirrored config and ZFS is easier to back up, so it should be OK. The really noisy stuff like logging I'm just sticking on Optanes - those are 6+ DWPD depending on the model, which may as well be unlimited for personal-use scenarios.

igtztorrero 2 hours ago | parent | prev [-]

The most common catastrophic failure you’ll see in SSDs: the entire drive simply drops off the bus as though it were no longer there.

Happened to me last week.

I just put it in a plastic bag in the freezer for 15 minutes, and it worked again.

I made a copy to my laptop and then installed a new server.

But it doesn't always work like a charm.

Please always have a backup of your documents, and a recent snapshot of critical systems.

serf 2 hours ago | parent | next [-]

To be perfectly fair though, this isn't a new failure mode that arrived with SSDs.

Drive controllers on HDDs just suddenly go to shit and drop off the bus, too.

I guess the difference is that people expect an HDD to fail suddenly, whereas with a solid-state device most people seem convinced that the failure will be graceful.

lvl155 2 hours ago | parent | prev [-]

Always make backups to HDD and cloud (and possibly tape if you are a data nut).

zamadatix 2 hours ago | parent [-]

I don't think one should worry as much about which media they're backing up to as about whether they're answering the question "does my data resiliency match my retention needs".

And regularly test that restores actually work; nothing is worse than thinking you had backups and then finding they don't restore right.