Can someone tell me why Grok 4 with Vision, a very powerful model, is at the bottom?
The test actually depends on vision, yet all the models perform poorly at it?