6thbit 3 hours ago
It's not clear to me how this differs from v2?
ACCount37 3 hours ago
They stacked the deck. If v2 was still rule inference plus spatial reasoning, a bit like juiced-up Raven's Progressive Matrices, then v3 adds a whole new multi-turn explore/exploit agentic dimension on top. Given how hard even pure v2 was for modern LLMs, I'm not surprised to see v3 crush them. But that won't last.
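A toy sketch of the static vs. interactive distinction described above. It is not the benchmark's real task format or API; every name here (`flip_horizontal`, `solve_static`, `ToyEnv`, `run_episode`) is hypothetical. The static case is a pure function from an input grid to an output grid. The interactive case is a loop where each action changes world state, and an agent that has explored once can exploit what it learned on the next episode.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]


def flip_horizontal(grid: Grid) -> Grid:
    # Hypothetical example of a rule a static-task solver would have to infer
    return [list(reversed(row)) for row in grid]


def solve_static(rule: Callable[[Grid], Grid], test_input: Grid) -> Grid:
    # One shot: the answer depends only on the input grid; nothing to interact with
    return rule(test_input)


@dataclass
class ToyEnv:
    # Hypothetical 1-D world standing in for an interactive game; goal must lie in [0, size)
    size: int
    goal: int
    start: int = 0
    pos: int = 0

    def reset(self) -> Tuple[int, bool]:
        self.pos = self.start
        return self.pos, self.pos == self.goal

    def step(self, action: int) -> Tuple[int, bool]:
        # action: -1 moves left, +1 moves right; position is clamped to the world bounds
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.pos, self.pos == self.goal


def run_episode(env: ToyEnv, known_goal: Optional[int] = None,
                max_steps: int = 1000) -> Tuple[int, int]:
    """Return (steps taken, goal position). Explores if the goal is unknown."""
    pos, done = env.reset()
    if done:
        return 0, pos
    steps = 0
    if known_goal is not None:
        # Exploit: walk straight to the remembered goal (assumes known_goal is valid)
        while env.pos != known_goal:
            env.step(1 if known_goal > env.pos else -1)
            steps += 1
        return steps, known_goal
    # Explore: sweep right first, reversing at each wall, until the goal is observed
    direction = 1
    for _ in range(max_steps):
        pos, done = env.step(direction)
        steps += 1
        if done:
            return steps, pos
        if pos in (0, env.size - 1):
            direction = -direction
    raise RuntimeError("goal not found within max_steps")


if __name__ == "__main__":
    print(solve_static(flip_horizontal, [[1, 2, 3], [4, 5, 6]]))
    env = ToyEnv(size=10, goal=2, start=5)
    steps, goal = run_episode(env)           # first episode: explore
    print(steps, goal)
    print(run_episode(env, known_goal=goal))  # second episode: exploit
```

In this toy setup the exploring episode takes 11 steps (right to the wall, then back left to cell 2), while the exploiting episode takes 3. Managing that gap is the explore/exploit dimension a purely static grid task never asks for.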
jasonjmcghee 3 hours ago
v2 was a static fill-in-the-blank task, whereas v3 is interactive. There's world state that you can change, not just pixels to place. Here's v2: