I think this is predicted? Part of the story is how they were able to preserve core reasoning ability while cutting knowledge like "pelicans have wings."

> these findings motivate the Parametric Compression-Coverage Hypothesis, which views verifiable reasoning as compressible into compact reasoning cores, while open-domain knowledge and general-purpose competence require broad parameter coverage over facts, concepts, and long-tail scenarios.

pylotlight 8 hours ago | parent [-]

The only truly essential item here is tool-calling capability, is it not? So I assume they tested for strong read/write/edit tool consistency?

	nsingh2 8 hours ago | parent | next [-]
		This model doesn't support tool calling; it was not part of its training. It's focused on Python (and I think C++) competitive programming and mathematics tasks, i.e. tasks with verifiable rewards. So if you have a task that fits that description, the size-to-capability ratio is good. These kinds of models might be more useful as tools used by larger orchestrator models than as the orchestrators themselves.
	btown 7 hours ago | parent | prev [-]
		I'm not seeing any mention of tools in the paper, much less a bias towards "curiosity" to use those tools when it encounters gaps in its knowledge. So perhaps this is a good proof-of-concept that single-pass code generation is feasible with a model this small, but we're still a long way from a viable solution.