	▲	yorwba 7 hours ago
		SWE-Bench Pro consists of 1865 tasks (https://arxiv.org/abs/2509.16941). Qwen3-Coder-Next solved 44.3% (826 or 827) of these tasks. Solving a single task took between ≈50 and ≈280 agent turns, ≈150 on average. In other words, a single pass through the dataset took ≈280000 agent turns (1865 tasks × ≈150 turns). Kimi-K2.5 solved ≈84 fewer tasks, but took only about a third as many agent turns.
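For anyone who wants to check the arithmetic, here is a rough sketch in Python. It uses only the figures quoted in the comment; the constant and function names are made up for illustration, the per-task average is approximate, and the Kimi-K2.5 turn count is just "a third" applied literally, so treat the results as ballpark numbers rather than measurements.

```python
# Back-of-the-envelope check of the quoted figures.
# All inputs come from the comment; names are illustrative only.

TOTAL_TASKS = 1865          # size of SWE-Bench Pro as quoted
AVG_TURNS_PER_TASK = 150    # "≈150 on average" (approximate)


def solved_counts_for_rate(total: int, rate_tenths_of_percent: int) -> list[int]:
    """Integer solved counts that round to the reported rate (one decimal place).

    A rate reported as 44.3% could be anything in [44.25%, 44.35%), so work
    in units of 0.01% to stay in integer arithmetic.
    """
    low_bound = rate_tenths_of_percent * 10 - 5    # 44.25% -> 4425
    high_bound = rate_tenths_of_percent * 10 + 5   # 44.35% -> 4435
    return [
        n for n in range(total + 1)
        if low_bound * total <= n * 10_000 < high_bound * total
    ]


candidate_solved = solved_counts_for_rate(TOTAL_TASKS, 443)

# One pass over the dataset at the average turn count.
total_turns = TOTAL_TASKS * AVG_TURNS_PER_TASK

# Assumption: "about a third as many agent turns" taken literally.
kimi_turns_estimate = total_turns // 3

print(candidate_solved, total_turns, kimi_turns_estimate)
```

The two candidate counts are why the comment says "826 or 827": a percentage rounded to one decimal place does not pin down a single integer at this dataset size.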
	▲	zamadatix 5 hours ago
		Ah, a spread of the individual tests makes plenty of sense! Many thanks (same goes to the other comments).
	▲	regularfry 6 hours ago
		If this is genuinely better than K2.5 even at a third the speed, then my OpenRouter credits are going to go unused.