Remix clone Hacker News


	▲	quinnjh 3 hours ago
		The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh. Would you mind sharing which benchmarks you think are useful measures of multimodal reasoning?
	▲	techpression 2 hours ago
		A benchmark only tests what the benchmark itself does; the goal is to make that task correlate with things that are actually valuable. Graphics benchmarks are a good example: it's extremely hard to know what you will get in a game by looking at 3DMark scores, since results vary a lot. Making an SVG of a single thing doesn’t tell you much unless the result carries over to all SVG tasks.