genxy 2 hours ago

> Claude 3 Opus

Unless they are changing the architecture in huge ways, the pre-training done for 3 goes into later models. I am sure the frontier labs are figuring out how to pretrain generic feedstocks that can be fed into downstream training pipelines. DeepSeek's incremental training run cost was what, $5M? Alibaba and DeepSeek have the best, most efficient training pipelines; look at the rate at which custom Qwen models are being pumped out.