| ▲ | quinnjh 3 hours ago | |
The field is advancing so fast that it's hard to do real science, as there will be a new SOTA by the time you're ready to publish results. I think this is a combination of that and people having a laugh. Would you mind sharing which benchmarks you think are useful measures for multimodal reasoning? | ||
| ▲ | techpression 2 hours ago | parent [-] | |
A benchmark only tests what the benchmark itself does; the goal is to make that task correlate with actually valuable things. Graphics benchmarks are a good example: it is extremely hard to know what you will get in a game by looking at 3DMark scores, because real-world performance varies a lot. Making an SVG of a single thing doesn’t help much unless the result generalizes to all SVG tasks. | ||