otabdeveloper4 3 hours ago

> higher param count models will remain smarter for a looong time

They're not smarter, they just know more stuff.

You probably don't need knowledge about Pokemon or the Diamond Sutra in your enterprise coding LLM.

The "smarts" come from post-training, especially around tool use.

anon7725 2 hours ago

If the smarts came from post-training, we could show significant gains by redoing that post-training on previous generations of models. But we know that isn’t happening: effective post-training is necessary but not sufficient for model performance.