Is DWPD Still a Useful SSD Spec? (klarasystems.com)
26 points by zdw 5 days ago | 9 comments
mgerdts an hour ago
This article misses several important points.

- Consumer drives like the Samsung 980 Pro and WD SN850 Black use TLC as SLC when roughly 30% or more of the drive is erased. In that state you can burst-write a bit less than 10% of the drive capacity at 5 GB/s; after that, it slows remarkably. If the filesystem doesn't automatically trim free space, the drive will eventually be stuck in slow mode all the time.
- Write amplification factor (WAF) is not discussed. Random small writes and partial block deletions trigger garbage collection, which ends up rewriting data to reclaim freed space in a NAND block.
- A drive with a lot of erased blocks can endure more TBW than one whose user blocks are all full of data, because garbage collection can be more efficient. Again, enable TRIM on your fs.
- Overprovisioning can be used to increase a drive's TBW. If, before you write to your 0.3 DWPD 1024 GB drive, you partition it so you use only 960 GB, you now have a 1 DWPD drive.
- Per the NVMe spec, there are indicators of drive health in the SMART log page (see the sketch after this list).
- Almost all current datacenter and enterprise drives also support an OCP SMART log page, which lets you observe things like the WAF, rereads due to ECC errors, etc.
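To make the SMART-log point concrete, here is a minimal sketch that reads the standard NVMe SMART log with nvme-cli and turns it into rough endurance numbers. The device path, capacity, rated DWPD, and warranty period are illustrative assumptions, and the JSON field names can vary between nvme-cli versions, so treat those as assumptions too.

```python
#!/usr/bin/env python3
"""Rough endurance check from the NVMe SMART log (illustrative values below).

Assumes nvme-cli is installed and that the JSON field names match recent
versions (data_units_written, power_on_hours, percent_used); adjust if yours differ.
"""
import json
import subprocess

DEVICE = "/dev/nvme0"      # assumption: adjust to your drive
CAPACITY_TB = 1.024        # assumption: the 1024 GB drive from the example above
RATED_DWPD = 0.3           # assumption: vendor-rated endurance
WARRANTY_YEARS = 5         # assumption: typical warranty period

def smart_log(dev: str) -> dict:
    out = subprocess.run(["nvme", "smart-log", dev, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)

log = smart_log(DEVICE)

# Per the NVMe spec, one "data unit" is 1000 * 512 bytes = 512,000 bytes.
tb_written = log["data_units_written"] * 512_000 / 1e12
days = max(log["power_on_hours"] / 24, 1)

rated_tbw = RATED_DWPD * CAPACITY_TB * 365 * WARRANTY_YEARS
print(f"Host writes so far: {tb_written:.1f} TB over {days:.0f} power-on days")
print(f"Average drive writes per day: {tb_written / days / CAPACITY_TB:.3f} DWPD")
print(f"Rated TBW ({RATED_DWPD} DWPD x {WARRANTY_YEARS} y): {rated_tbw:.0f} TB")
# percent_used is the controller's own wear estimate and can exceed 100.
print(f"Controller wear estimate: {log.get('percent_used', 'n/a')} %")
```

Note this only sees host writes; the OCP log's WAF is what tells you how much the NAND actually absorbed for those writes.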
markhahn 5 minutes ago
The text is wrong about CRCs: everyone uses pretty heavy ECC, so a failed read isn't just a re-read. ECC also provides a somewhat graduated measure of the block's actual health, so the housekeeping firmware can decide whether to stop using the block (i.e., move the content elsewhere). I'm also not a fan of the buy-bigger-storage concept, or of the conspiracy theory about 480 vs. 512. It sure would be nice if, when considering a product, you could just look at some claimed stats from the vendor about time-related degradation, firmware sparing policy, etc. We shouldn't have to guess!
mdtancsa 2 hours ago
Dropping off the bus is really the best-case failure. It's more annoying when a drive's writes become slower than the other disks', often causing confusing performance profiles for the overall array. Having good metrics for each disk (we use telegraf) will help flag it early. On my ZFS pools, monitoring per-disk I/O and smartmon metrics helps tease that out. For SSDs, probably the worst case is a firmware bug that triggers on all disks at the same time, e.g. the infamous HP SSD failure at 32,768 hours of use. Yikes!
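In the same spirit as the per-disk monitoring described above (though not the telegraf/smartmon setup itself), here is a minimal sketch that samples Linux /proc/diskstats and flags an array member whose average write latency drifts well above its peers. The disk names, 10-second interval, and 2x-median threshold are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Flag a pool member whose writes are slow relative to its peers.

A minimal sketch assuming Linux /proc/diskstats; DISKS and the thresholds
are placeholders, not recommendations.
"""
import time

DISKS = ["sda", "sdb", "sdc", "sdd"]   # assumption: members of the same pool

def snapshot():
    """Return {disk: (writes_completed, ms_spent_writing)} from /proc/diskstats."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in DISKS:
                stats[fields[2]] = (int(fields[7]), int(fields[10]))
    return stats

before = snapshot()
time.sleep(10)                          # sample interval
after = snapshot()

latency = {}
for disk in DISKS:
    d_writes = after[disk][0] - before[disk][0]
    d_ms = after[disk][1] - before[disk][1]
    if d_writes:
        latency[disk] = d_ms / d_writes  # average ms per completed write

if latency:
    median = sorted(latency.values())[len(latency) // 2]
    for disk, ms in latency.items():
        flag = "  <-- slow?" if median and ms > 2 * median else ""
        print(f"{disk}: {ms:.2f} ms/write{flag}")
```

Run something like this periodically (or feed the same ratio into your existing alerting) and a lagging SSD tends to show up well before it drops off the bus.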
Havoc 2 hours ago
After getting burned by consumer drives, I decided it's time for a ZFS array of used enterprise SSDs. Tons of writes on them, but it's a fully mirrored config and ZFS is easier to back up, so it should be OK. And the really noisy stuff like logging I'm just sticking on Optanes; those are 6+ DWPD depending on the model, which may as well be unlimited for personal-use scenarios.
igtztorrero 2 hours ago
The most common catastrophic failure you'll see in SSDs: the entire drive simply drops off the bus as though it were no longer there. Happened to me last week. I just put it in a plastic bag in the freezer for 15 minutes, and it worked: I copied the data to my laptop and then installed a new server. But it doesn't always work like a charm. Please always have a backup of your documents, and a recent snapshot of critical systems.