|
| ▲ | epistasis 4 hours ago | parent | next [-] |
| Training on petabytes of data is only one application of PyTorch, and that's the one that uses tens of thousands of containers. But inference, development cycles, any application domain of PyTorch that doesn't involve training frontier models... all of those are complicated by excessive container layers. And mostly, dev really sucks when a small code change means writing out an extra 10GB. |
|
| ▲ | StableAlkyne 4 hours ago | parent | prev | next [-] |
| You don't even need MB of training data for some ML applications. AI is the sexy thing nowadays, but neural networks (Torch is a NN library) are generally useful for even small regression and clarification problems. For some problems you might even be able to get away with single digit numbers of training points (classic example of this regime being Physics-Informed Neural Networks) |
|
| ▲ | Normal_gaussian 4 hours ago | parent | prev | next [-] |
| The training data is on a separate drive; or the training data isn't that large for this use case; or they aren't training. |
|
| ▲ | 0cf8612b2e1e 2 hours ago | parent | prev [-] |
| You don’t train petabytes on your laptop. |