> It is almost guaranteed that a 60-90B model can outperform current SOTA in coding tasks within 2-3 years.

I am ready to bet against this. Scores on knowledge benchmarks like SimpleQA aren't increasing for small models.

> It is far less clear that a 1.2T model will be meaningfully better enough to justify training it.

Well, for one, we know for certain there is Mythos, which is meaningfully better. And I think there is a lot of juice left to squeeze from Mythos-class models.

onlyrealcuzzo 5 hours ago

> Well, for one, we know for certain there is Mythos, which is meaningfully better.

Do we?

Have you used it?

What is "meaningfully" better? It's not 3-4 orders of magnitude better. That kind of improvement is definitely happening for smaller models.

	YetAnotherNick 3 hours ago
	What do you mean by 3-4 orders of magnitude better? Was Einstein 3-4 orders of magnitude better than us? Meaningful in the sense that it could find security vulnerabilities in browsers and kernels that >99% of engineers couldn't find.

ertgbnm 5 hours ago

Knowledge benchmarks can't really be improved upon via distillation or RL. Improving them requires that those facts be added to the training corpus and that the model memorize them better. Neither distillation nor RL really does that, so we shouldn't expect improvements on SimpleQA unless some other intervention is made.

Model intelligence and knowledge aren't necessarily directly related. If we can pack in greater intelligence and agency at the cost of the model forgetting factoids, that would actually be a good thing. We don't need LLMs to memorize facts; we need them to learn how to interact with the world so that they can find the facts that are necessary and surface them to the user.

If we could distill all of the knowledge out of an LLM and just be left with a very agentic model that only knows the facts in its context, I think some very interesting stuff would happen.

	slashdave 5 hours ago
	RL is more than facts. Synthetic feedback is an obvious approach. Does the model suggest code that compiles and performs well?

	YetAnotherNick 3 hours ago
	A lot of things aren't facts that can be stated. No one can just read a dictionary or a list of word translations and start speaking that language. There isn't a clear definition of what is knowledge and what is intelligence. Is being able to write C knowledge? Is knowing about undefined behaviour in C knowledge?
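
For illustration, the "does it compile and perform well" signal slashdave mentions could look roughly like the sketch below. This is only a toy under stated assumptions, not how any lab actually computes RL rewards: the function name `synthetic_reward`, the test-case format, the time budget, and the 0.1 speed bonus are all made up here.

```python
import time


def synthetic_reward(source, func_name, cases, time_budget_s=0.1):
    """Toy reward for model-written Python code (hypothetical scoring scheme).

    Returns 0.0 if the code fails to compile or crashes, otherwise the
    fraction of test cases passed, plus a 0.1 bonus when every case passes
    within the time budget.
    """
    if not cases:
        return 0.0
    try:
        code = compile(source, "<candidate>", "exec")  # "does it compile?"
    except SyntaxError:
        return 0.0

    namespace = {}
    try:
        # NOTE: real systems must sandbox untrusted code; plain exec is unsafe.
        exec(code, namespace)
        func = namespace[func_name]
        start = time.perf_counter()
        passed = sum(1 for args, expected in cases if func(*args) == expected)
        elapsed = time.perf_counter() - start
    except Exception:
        return 0.0

    correctness = passed / len(cases)
    # "does it perform well?": only reward speed when the code is fully correct
    if passed == len(cases) and elapsed <= time_budget_s:
        return correctness + 0.1
    return correctness


cases = [((1, 2), 3), ((0, 0), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b\n"

for name, src in [("good", good), ("buggy", buggy), ("broken", broken)]:
    print(name, synthetic_reward(src, "add", cases, time_budget_s=10.0))
```

None of this scoring depends on the model having memorized facts, which is the point being made: the feedback comes from running the code, not from a knowledge corpus.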