NitpickLawyer 2 days ago
> By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.

The message I replied to said "if I have some toy poorly optimized python example". I think it's safe to say that matmul and kernel optimisation are a bit beyond a small toy Python example.
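For context, the "dividing into subproblems" the quote describes is in the family of tiling/blocking. A toy sketch of the general idea, assuming a plain NumPy setting (this is illustrative only, not the actual kernel-level optimisation AlphaEvolve found):

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Blocked (tiled) matmul: split a large product into smaller
    subproblems so each tile fits in fast memory. Hypothetical toy
    example, not the optimisation described in the article."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    # Iterate over output tiles, accumulating partial products
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                C[i:i + block, j:j + block] += (
                    A[i:i + block, p:p + block] @ B[p:p + block, j:j + block]
                )
    return C
```

Even this naive version shows why the real problem is hard: the payoff depends on tile sizes, memory hierarchy, and hardware specifics, which is exactly the search space a toy Python example doesn't touch.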