Remix clone Hacker News

	▲	simonw 6 hours ago
		It doesn't self-improve, that's a misleading headline. As far as I can tell they trained it by running their own reinforcement learning on top of Qwen and Gemma 4 (not sure how they combined weights from both, or if they used Qwen as the basis and Gemma 4 to help train?) - so the "self-improving" is about their training process, not how you use the weights.
	▲	kamranjon 6 hours ago
		I think the 9b and 31b dense are Gemma models and the 35B-MoE and 397B-MoE are Qwen models, since those are the model sizes each family offers.
	▲	sisve 3 hours ago
		Do you think we will get a self-improving model in '26 or '27? Maybe not a native one, but some kind of hack so a model will learn something without losing part of the context window?
	▲	kennywinker 6 hours ago
		Gotcha. That makes more sense. We ran the model to train the model -> “self-improving”.