Remix clone Hacker News

	▲	martinald 11 hours ago
		I thought that, but it does do a lot better on other benchmarks. Perhaps SWE-bench just doesn't capture a lot of the improvement? If the web design improvements people have been posting on Twitter are anything to go by, I suspect this will be a huge boon for developers. SWE-bench is really testing bugfixing/feature dev more. Anyway, let's see. I'm still hyped!
	▲	camdenreslink 9 hours ago
		It seems the benchmarks that had a big jump had to do with visual capabilities. I wonder how that will translate to improvements to the workloads LLMs are currently used for (or maybe it will introduce new workloads).
	▲	rfoo 10 hours ago
		SWE-bench doesn't even test bugfixing/feature dev properly once you reach roughly 70%, if you don't benchmaxx it.
	▲	catigula 11 hours ago
		That would be great! But AI is a bubble if these models can’t do serious engineering work.