XCSme 6 hours ago
But why only a +0.5% increase for MMMU-Pro?
kingstnap 5 hours ago
It's possibly label noise, but you can't tell from a single number. You would need to check whether every model is getting the same 20% wrong or a different 20%. If it's the same 20%, then either those questions are really hard, they are keyed incorrectly, or they aren't stated with enough context to actually be solvable. It happens. The old, non-Pro MMLU had a lot of wrong answers. Even something as simple as MNIST has digits that are mislabeled or drawn so badly they aren't even digits anymore.
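A minimal sketch of that check, assuming you have, for each model, the set of question IDs it answered incorrectly. The model names, question IDs, and helper functions below are hypothetical. A large "missed by all" set and high pairwise overlap point toward a shared subset that is hard, mis-keyed, or underspecified; low overlap suggests the models are failing on different questions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Fraction of the combined wrong answers that two models share."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def error_overlap(errors: dict[str, set[int]]) -> dict[tuple[str, str], float]:
    """Pairwise Jaccard overlap of the wrong-answer sets for every pair of models."""
    return {
        (m1, m2): jaccard(errors[m1], errors[m2])
        for m1, m2 in combinations(sorted(errors), 2)
    }


def missed_by_all(errors: dict[str, set[int]]) -> set[int]:
    """Questions every model got wrong: candidates for manual review of the answer key."""
    sets = list(errors.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical data: question IDs each model answered incorrectly.
errors = {
    "model_a": {1, 2, 3, 4},
    "model_b": {1, 2, 3, 5},
    "model_c": {1, 2, 6, 7},
}
print(missed_by_all(errors))  # → {1, 2}
print(error_overlap(errors))
```

The questions in the "missed by all" set are the ones worth reading by hand to see whether the key itself is wrong.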
kenjackson 6 hours ago
Everyone is already at 80% for that one. Crazy that we were just at 50% with GPT-4o not that long ago.
| ||||||||