andai 2 hours ago
>The communication speeds are untenable.

Can it be parallelized or not? If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers in each copy?

I think merging works if the steps between merges are small enough. And the transfer should become tenable if those steps are big enough, since you'd sync less often. Where's the cutoff?
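A toy sketch of the "two copies, fine-tune separately, merge" loop, assuming the merge is plain weight averaging (the usual naive choice; this setup is commonly called local SGD or federated averaging). The model is a hypothetical 1-D linear regression and every name here (`make_shard`, `train_and_merge`, `sync_every`, etc.) is made up for illustration. One end of the trade-off is exact: merging after every step is the same as ordinary data-parallel gradient descent on the combined data (for equal-size shards). Raising `sync_every` cuts the number of merges (communication rounds), but the copies can drift apart in between, and how much that hurts depends on how different the shards are.

```python
import random


def make_shard(n, true_w, true_b, seed):
    """Hypothetical toy data: noisy samples of y = true_w * x + true_b."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x + true_b + rng.gauss(0.0, 0.1)) for x in xs]


def grad(params, data):
    """Mean-squared-error gradient of the 1-D linear model y = w*x + b."""
    w, b = params
    n = len(data)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def step(params, data, lr):
    """One full-batch gradient step on `data`."""
    gw, gb = grad(params, data)
    return (params[0] - lr * gw, params[1] - lr * gb)


def average(copies):
    """Naive merge: element-wise mean of every copy's weights."""
    k = len(copies)
    return (sum(w for w, _ in copies) / k, sum(b for _, b in copies) / k)


def train_and_merge(shards, total_steps, sync_every, lr=0.1):
    """One copy per shard; merge by averaging every `sync_every` steps.

    Always merges at the last step so a single model comes out.
    Returns (merged_params, number_of_merges).
    """
    merged = (0.0, 0.0)
    copies = [merged] * len(shards)
    merges = 0
    for t in range(1, total_steps + 1):
        # Each copy trains only on its own shard between merges.
        copies = [step(p, s, lr) for p, s in zip(copies, shards)]
        if t % sync_every == 0 or t == total_steps:
            merged = average(copies)
            copies = [merged] * len(shards)
            merges += 1
    return merged, merges


def full_batch_gd(data, total_steps, lr=0.1):
    """Reference: a single model trained on all the data at once."""
    params = (0.0, 0.0)
    for _ in range(total_steps):
        params = step(params, data, lr)
    return params


if __name__ == "__main__":
    # Two copies fine-tuned on *different* data (different underlying slopes).
    shards = [make_shard(100, 2.0, 1.0, seed=0), make_shard(100, -1.0, 1.0, seed=1)]
    combined = shards[0] + shards[1]
    for k in (1, 10, 50, 200):
        (w, b), merges = train_and_merge(shards, total_steps=200, sync_every=k)
        print(f"merge every {k:>3} steps: {merges:>3} merges, w={w:.3f}, b={b:.3f}")
    print("single model on combined data:", full_batch_gd(combined, 200))
```

Sweeping `sync_every` against a held-out loss is one way to look for the cutoff empirically; this toy won't say where it lies for a real network, and it doesn't model freezing layers.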