quinnjh 3 hours ago

The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh.

Would you mind sharing which benchmarks you think are useful measures for multimodal reasoning?

techpression 2 hours ago

A benchmark only tests what the benchmark is doing; the goal is to make that task correlate with actually valuable things. Graphics benchmarks are a good example: it's extremely hard to know what you will get in a game by looking at 3DMark scores, because actual performance varies a lot. Making an SVG of a single thing doesn’t help much unless the result carries over to all SVG tasks.