Remix clone Hacker News

	▲	lattalayta 8 months ago
		I haven't been following them that closely, but are people finding these benchmarks relevant? It seems like these companies could just tune their models to do well on particular benchmarks.
	▲	mickael-kerjean 8 months ago
		The benchmark is something you can optimize for, but that doesn't mean the result generalizes well. Yesterday I spent 2 hours trying to get Claude to create a program that would extract data from a weird Adobe file. $10 later, the best I had was a program doing something like:

		    switch (testFile) {
		      case "test1.ase":
		        // run this because it's a particular case
		      case "test2.ase":
		        // run this because it's a particular case
		      default:
		        // run something that's not working but that's ok because the previous case should
		        // give the right output for all the test files
		        ...
		    }
	▲	emp17344 8 months ago
		That’s exactly what’s happening. I’m not convinced there’s any real progress occurring here.