kgeist 2 hours ago
So, does this snapshotting optimization support arbitrary containers? I'm currently planning to deploy on Amazon SageMaker, but a cold start takes a whopping ~9 minutes: 6 minutes for instance provisioning plus 3 minutes for PyTorch initialization. My Docker image is ~14 GB, and the weights are a few GB. How long would this configuration take to cold start on Modal? SageMaker's cold-start time makes it pretty much useless without keeping many warm instances around (tens of thousands of dollars per month), because users won't be happy if they randomly have to wait 9 minutes.
charles_irl an hour ago
Yep! That should start in ten seconds or so -- about a second per gigabyte of weights, plus a second to start the container and a few seconds to load the memory snapshot. Snapshotting has a few limitations, e.g. it generally fails when using multiple GPUs; we document them here: https://modal.com/docs/guide/memory-snapshots
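
As a rough back-of-the-envelope sketch of that estimate: the function name and the default values below are illustrative assumptions, not measured numbers. In particular, "a few GB" of weights and "a few seconds" of snapshot loading are both taken as 3 purely as placeholders.

```python
def estimate_cold_start_seconds(
    weights_gb: float,
    seconds_per_gb: float = 1.0,     # "about a second per gigabyte of weights"
    container_start_s: float = 1.0,  # "a second to start the container"
    snapshot_load_s: float = 3.0,    # "a few seconds" -- assumed to be 3 here
) -> float:
    """Sum the three cold-start components described in the comment above."""
    return weights_gb * seconds_per_gb + container_start_s + snapshot_load_s


# SageMaker figures from the question: 6 min provisioning + 3 min PyTorch init.
sagemaker_s = (6 + 3) * 60

# Assumed placeholder: 3 GB of weights standing in for "a few GB".
modal_s = estimate_cold_start_seconds(weights_gb=3.0)

print(f"SageMaker cold start: {sagemaker_s} s")
print(f"Estimated snapshot cold start: {modal_s} s")
```

With larger weights the first term dominates, so the estimate scales roughly linearly with weight size under these assumptions.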