I like the peer comment's answer about a processing time threshold (e.g., a day). Another obvious threshold is data that doesn't conveniently fit on local disks. Large-scale processing solutions can often read from and write to object stores like S3 directly. And if the job is running inside the same provider (e.g., AWS in the case of S3), data can often be streamed much faster than with local SSDs. 10GB/s has been available for a decade or more, and I think 100GB/s is available these days.

betaby 4 hours ago | parent [-]

> data can often be streamed much faster than with local SSDs. 10GB/s has been available for a decade or more, and I think 100GB/s is available these days.

In practice, most AWS instances are capped at 10Gbps. I have consistently seen ~5Gbps reads from GCS and S3. Nitro-based instances are in theory 100Gbps capable, but in practice I've never seen that.
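
To put those figures on one scale (the quoted claim is in GB/s, the observed numbers are in Gbps), here is a rough back-of-the-envelope sketch. The 10 TB dataset size and the `transfer_hours` helper are assumptions chosen for illustration, and the calculation ignores protocol overhead, request latency, and throttling.

```python
def transfer_hours(dataset_tb: float, link_gbps: float) -> float:
    """Hours to stream `dataset_tb` decimal terabytes at `link_gbps` gigabits/s.

    Idealized: assumes the link is fully saturated for the whole transfer.
    """
    dataset_bits = dataset_tb * 1e12 * 8          # TB -> bytes -> bits
    seconds = dataset_bits / (link_gbps * 1e9)    # Gbps -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Rates mentioned in the thread: ~5 Gbps observed, 10 Gbps cap,
    # 100 Gbps theoretical. The 10 TB dataset is hypothetical.
    for gbps in (5, 10, 100):
        print(
            f"{gbps:>3} Gbps = {gbps / 8:.3f} GB/s -> "
            f"{transfer_hours(10, gbps):.2f} h for 10 TB"
        )
```

Note that 10 Gbps is only 1.25 GB/s, so a figure quoted in GB/s and one quoted in Gbps differ by a factor of eight.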

sgarland 3 hours ago | parent [-]

Also, anything under 16 vCPUs generally has baseline / burst bandwidth, with the burst being best-effort and lasting 5-60 minutes. At multiple companies, this has caused surprise incidents for me: people were unaware of it and were caught off guard when bandwidth suddenly plummeted by 50% or more under sustained load.
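
A minimal sketch of why the drop matters for long transfers, under a deliberately simplified model: the instance runs at a burst rate for a fixed window, then falls to a baseline rate. All numbers below (10 Gbps burst, 5 Gbps baseline, 30-minute window, 10 TB dataset) and the function name are hypothetical. Real burst behavior is best-effort, so treat this as an estimate rather than a description of how any provider implements it.

```python
def sustained_transfer_seconds(
    total_gbit: float,
    burst_gbps: float,
    baseline_gbps: float,
    burst_seconds: float,
) -> float:
    """Seconds to move `total_gbit` gigabits when the link runs at
    `burst_gbps` for the first `burst_seconds`, then at `baseline_gbps`.
    """
    burst_capacity = burst_gbps * burst_seconds  # gigabits moved while bursting
    if total_gbit <= burst_capacity:
        # The whole transfer finishes inside the burst window.
        return total_gbit / burst_gbps
    # Otherwise the remainder is moved at the lower baseline rate.
    return burst_seconds + (total_gbit - burst_capacity) / baseline_gbps


if __name__ == "__main__":
    total = 10 * 8 * 1000  # hypothetical 10 TB dataset expressed in gigabits
    naive = total / 10     # planning as if 10 Gbps held for the whole run
    actual = sustained_transfer_seconds(total, 10, 5, 30 * 60)
    print(f"naive estimate: {naive / 3600:.2f} h")
    print(f"with burst drop: {actual / 3600:.2f} h")
```

With these assumed numbers the transfer takes well over an hour longer than a plan based on the burst rate alone, which is the kind of gap that shows up as a surprise incident.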