__turbobrew__ 3 days ago

Something I learned the hard way is that SSD performance can nosedive if DISCARD/TRIM commands are not sent to the device. Up to 50% lower throughput on our Samsung DC drives.

Through metrics I noticed that some SSDs in a cluster were much slower than others despite being uniform hardware. After a bit of investigation it was found that the slow devices had been in service longer, and we were not sending DISCARDs to the SSDs due to a default in dm-crypt: https://wiki.archlinux.org/title/Dm-crypt/Specialties#Discar...

The performance penalty for our drives (Samsung DC drives) was around 50% if TRIM was never run. We now run blkdiscard when provisioning new drives and enable discards on the crypt devices and things seem to be much better now.
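For anyone hitting the same dm-crypt default, a minimal sketch of the fix (device names and the mapping name are placeholders, and blkdiscard is destructive, so double-check the target):

```shell
# One-time whole-device discard when provisioning a new, empty drive.
# WARNING: this destroys all data on the device.
blkdiscard /dev/sdX

# Allow discards to pass through dm-crypt when opening the device...
cryptsetup open --allow-discards /dev/sdX cryptdata

# ...or persistently via /etc/crypttab using the "discard" option:
# cryptdata  /dev/sdX  none  discard
```

Note that passing discards through an encrypted volume leaks which blocks are unused, which is the privacy trade-off behind the conservative default.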

Reflecting a bit more, this makes me more bullish on system integrators like Oxide as I have seen so many times software which was misconfigured to not use the full potential of the hardware. There is a size of company between a one person shop and somewhere like facebook/google where they are running their own racks but they don’t have the in house expertise to triage and fix these performance issues. If for example you are getting 50% less performance out of your DB nodes, what is the cost of that inefficiency?

p_ing 3 days ago | parent | next [-]

While not the same issue, I took four 500GB Samsung 850 EVO drives and created a Storage Space out of them for Hyper-V VMs. Under any sort of load the volume would reach ~1 second latency. This was on a SAS controller in JBOD mode.

Switched to some Intel 480GB DC drives and performance was in the low milliseconds as I would have thought any drive should be.

Not sure if I was hitting the DRAM limit of the Samsungs or what; I spent a bit of time troubleshooting, but this was a home lab and used Intel DCs were cheap on eBay. Granted, the Samsung EVOs weren't targeted at that type of work.

__turbobrew__ 3 days ago | parent | next [-]

850 EVO is basically the lowest tier consumer device, from what I have read those devices can only handle short bursts of IOs and do not perform well under sustained load.

pkaye 2 days ago | parent | prev | next [-]

The Samsung 850 EVO drives probably used an SLC write cache. A small portion of the NAND is configured as an SLC write buffer so they can handle a burst of writes faster and later move them to the MLC/TLC region. This is sufficient for typical consumer workloads.

Another thing you will notice is the 850 EVO is 500GB capacity while the Intel one is 480GB. The difference in capacity is put towards overprovisioning, which reduces write amplification. The idea is that if you have sufficient free space available, whole NAND blocks will naturally get invalidated before you run out of free blocks.
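As a rough illustration of that capacity difference: if you assume both drives are built from about 512 GiB of raw NAND (an assumption for illustration, not a datasheet figure), the spare-area percentages work out as follows:

```python
# Rough overprovisioning comparison. Assumes ~512 GiB of raw NAND on
# both drives, which is a guess for illustration, not a spec value.
RAW_NAND_GB = 512 * 2**30 / 1e9  # ~549.8 GB of raw flash


def overprovision_pct(user_capacity_gb: float) -> float:
    """Spare area as a percentage of user-visible capacity."""
    return (RAW_NAND_GB - user_capacity_gb) / user_capacity_gb * 100


print(f"500 GB Samsung 850 EVO: {overprovision_pct(500):.1f}% spare")
print(f"480 GB Intel DC:        {overprovision_pct(480):.1f}% spare")
```

Under that assumption the Intel drive carries roughly half again as much spare area, which gives the controller more room to garbage-collect before it has to erase blocks in the write path.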

sitkack 3 days ago | parent | prev [-]

Could be garbage collection pauses. You could try wiping them again with zeros, or doing a drive-specific reset, and see if performance returns to normal.

lathiat 3 days ago | parent | prev [-]

The fun part is that for a bunch of SSD drives (especially older ones), sending discard/trim may also tank the performance. Due to firmware bugs.

loeg 2 days ago | parent [-]

You still might need to pace how fast you send discard/trim to modern drives, FWIW.
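One common way to pace discards is to skip the continuous `discard` mount option entirely and batch them instead, e.g. with the util-linux fstrim timer (weekly by default):

```shell
# Batch discards on a schedule rather than issuing them inline on
# every delete; fstrim.timer ships with util-linux on most distros.
systemctl enable --now fstrim.timer

# Or trim a mounted filesystem by hand, printing bytes discarded:
fstrim -v /
```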