benob 10 hours ago
Reminds me of "Universal pre-training by iterated random computation" (https://arxiv.org/pdf/2506.20057), though with a somewhat less formal approach. I wonder whether there is a closed-form solution for these kinds of initialization methods (call them pre-training if you wish): one that would let attention heads detect a variety of diverse patterns while staying more structured than a random init.
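One way to read "more structured than a random init" is to seed each attention head with a distinct closed-form attention pattern (e.g. a fixed relative offset) instead of Gaussian noise. A minimal NumPy sketch of that idea; the offset patterns and sharpness parameter are my own illustration, not something from the paper:

```python
import numpy as np

def random_init_scores(seq_len, rng):
    # baseline: unstructured Gaussian attention logits
    return rng.normal(size=(seq_len, seq_len))

def structured_init_scores(seq_len, offset, sharpness=4.0):
    # closed-form pattern: logits peak where position j is exactly
    # `offset` tokens behind position i, so each head starts out
    # attending to a distinct relative position
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return -sharpness * np.abs((i - j) - offset).astype(float)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# one pattern per head: offsets 0..3 give "attend k tokens back"
heads = [softmax(structured_init_scores(8, k)) for k in range(4)]
```

Here head k's attention at position i concentrates on position i - k, so the four heads cover diverse, non-redundant patterns from the start, unlike i.i.d. Gaussian logits.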
ACCount37 6 hours ago | parent
I'm partial to "pre-pre-training" myself.