inkyoto 3 days ago

Not an issue for the commenter: since they mentioned S3, they are using either AWS EBS or instance-attached scratch NVMe drives, and the vendor (AWS) takes care of both.

The AWS control plane will detect an ailing SSD backing an EBS volume and will proactively evacuate the data before the physical storage goes pear-shaped.

If it is an EC2 instance with an instance-attached NVMe drive, the control plane will issue an alert that can be acted upon automatically, and the instance can be bounced: a new EC2 instance is allocated from a pool of the same instance type and comes with a new NVMe drive. Provided, of course, that the design and implementation of the running system are stateless and can rebuild the working set upon a restart.
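For illustration, here is a minimal sketch of the "automatically acted upon" part. It assumes the input has the shape of the `InstanceStatuses` list returned by boto3's `describe_instance_status`; the function name `instances_to_replace` and the choice of which event codes to act on are hypothetical, not anything AWS prescribes.

```python
from typing import Any

# Assumption: these are the scheduled-event codes we treat as "the host is
# going away, so the instance-store contents will be lost". Tune per fleet.
REPLACE_ON = {"instance-retirement", "instance-stop"}


def instances_to_replace(instance_statuses: list[dict[str, Any]]) -> list[str]:
    """Return IDs of instances that have a pending event listed in REPLACE_ON."""
    doomed = []
    for status in instance_statuses:
        for event in status.get("Events", []):
            # EC2 prefixes the description of finished events with "[Completed]".
            if event.get("Description", "").startswith("[Completed]"):
                continue
            if event.get("Code") in REPLACE_ON:
                doomed.append(status["InstanceId"])
                break  # one pending event is enough to flag the instance
    return doomed
```

The returned IDs could then be handed to whatever replaces nodes in your setup (for example, terminating them so an auto scaling group brings up fresh ones), which only works if the node can rebuild its working set from source data.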

jitl 3 days ago

EBS is slow. No way we would use it for swap. Gotta be an instance storage device. And yes, we can rebuild a node from source data, we do so regularly to release changes anyways.

inkyoto 3 days ago

I figured that you were using instance-attached NVMe drives since you mentioned the scale of your load. An EBS volume, even an io2 Block Express one, can't keep up with a physical NVMe drive on high-intensity I/O tasks.

Regardless, AWS takes care of the hardware cycling / migration in either case.