dist-epoch 2 hours ago

Musk said Grok 5 is currently being trained and that it has 7 trillion params (Grok 4 had 3 trillion).

svara an hour ago

My understanding is that all recent gains are from post-training, and no one (publicly) knows how much scaling pretraining will still help at this point.

Happy to learn more about this if anyone has more information.

dist-epoch 34 minutes ago

You get more benefit from spending compute on post-training than on pre-training.

But scaling pre-training is still worth it if you can afford it.
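
For what it's worth, the standard way to quantify "still worth it, but with diminishing returns" is a scaling-law fit. Below is a minimal sketch using the approximate Chinchilla loss fit from Hoffmann et al. (2022); the coefficients are their published fit, and the parameter/token counts are purely illustrative assumptions, nothing to do with Grok:

    # Minimal sketch: diminishing returns from scaling pre-training compute,
    # using the approximate Chinchilla loss fit from Hoffmann et al. (2022).
    # Coefficients are their published fit; the parameter/token counts below
    # are illustrative assumptions only, not any lab's actual numbers.

    E, A, B = 1.69, 406.4, 410.7   # irreducible loss + fit constants
    alpha, beta = 0.34, 0.28       # exponents for params (N) and tokens (D)

    def pretraining_loss(n_params: float, n_tokens: float) -> float:
        """Predicted pre-training loss L(N, D) = E + A/N^alpha + B/D^beta."""
        return E + A / n_params**alpha + B / n_tokens**beta

    # Roughly compute-optimal token counts (~20 tokens per parameter),
    # doubling model size each step. Training compute C ~ 6*N*D FLOPs.
    for n_params in (1e12, 2e12, 4e12, 8e12):
        n_tokens = 20 * n_params
        flops = 6 * n_params * n_tokens
        loss = pretraining_loss(n_params, n_tokens)
        print(f"N={n_params:.0e} params, D={n_tokens:.0e} tokens, "
              f"C={flops:.1e} FLOPs -> predicted loss ~{loss:.3f}")

Each doubling of parameters at roughly compute-optimal token counts (so about 4x the FLOPs) shaves only a few hundredths off the predicted loss, which is the usual argument that marginal compute goes further in post-training once the pre-training run is already large.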