Remix clone Hacker News


	robbomacrae 2 hours ago
		I don't think that is entirely fair. I don't see them stating anywhere that they are measuring coding capabilities; the stated aim is "Using complex games to probe real intelligence." This seems very much in line with the methodology of ARC-AGI-3. The results here, in the OP article, and on https://www.designarena.ai all tell a similar story: Kimi K2.6 is up there in the SOTA mix.
	tgv an hour ago
		The task was writing a "bot" to play the game. The title is "Kimi K2.6 just beat Claude, GPT-5.5, and Gemini in a coding challenge." How does that not imply measuring coding capabilities?