kgeist 2 hours ago

So, does this snapshotting optimization support arbitrary containers?

I'm currently planning to deploy using Amazon SageMaker, but a cold start takes a whopping ~9 minutes: 6 minutes for instance provisioning + 3 minutes for PyTorch initialization. My Docker image is ~14 GB, and the weights are a few GB. How long would it take to cold start this configuration on Modal?

SageMaker's performance makes it pretty much useless without keeping many warm instances around (= tens of thousands of dollars per month), because users won't be happy if they randomly have to wait 9 minutes.

charles_irl an hour ago

Yep! That should start in ten seconds or so -- about a second per gigabyte of weights, plus a second to start the container and a few seconds to load the memory snapshot.
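As a rough back-of-the-envelope sketch of that arithmetic: the function name, the 5 GB figure standing in for "a few GB" of weights, and the 3 s figure standing in for "a few seconds" of snapshot load are all assumptions for illustration, not measured numbers.

```python
def estimate_cold_start_seconds(
    weights_gb: float,
    seconds_per_gb: float = 1.0,     # "about a second per gigabyte of weights"
    container_start_s: float = 1.0,  # "a second to start the container"
    snapshot_load_s: float = 3.0,    # assumed value for "a few seconds"
) -> float:
    """Add up the three components of the cold-start estimate."""
    return weights_gb * seconds_per_gb + container_start_s + snapshot_load_s


# Figures quoted for SageMaker in the parent comment: 6 min provisioning + 3 min PyTorch init.
sagemaker_cold_start_s = (6 + 3) * 60

# Assumed 5 GB of weights (the thread only says "a few GB").
estimate_s = estimate_cold_start_seconds(weights_gb=5.0)

print(f"estimated cold start: {estimate_s:.0f} s")
print(f"SageMaker cold start: {sagemaker_cold_start_s} s")
```

With those assumed inputs the estimate lands in the "ten seconds or so" range; actual times would depend on the real weight size and snapshot contents.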

There are a few limitations with snapshotting (e.g. it generally fails when using multiple GPUs), which we document here: https://modal.com/docs/guide/memory-snapshots