neonbjb 3 days ago
I work for OpenAI. o4-mini gets much closer (but I'm pretty sure it fumbles at the last moment): https://chatgpt.com/share/680031fb-2bd0-8013-87ac-941fa91cea...

We're pretty bad at model naming and communicating capabilities (in our defense, it's hard!), but o4-mini is actually a _considerably_ better vision model than o3, despite the benchmarks. Similar to how o3-mini-high was a much better coding model than o1. I would recommend using o4-mini-high over o3 for any task involving vision.
jonahx 3 days ago
Thanks for the reply. I'm not sure vision is the failing point here, but rather logic. I routinely try to get these models to solve difficult puzzles or coding challenges (the kind that a good undergrad math major could probably solve, but that most would struggle with). They fail almost always, even with help. For example, Jane Street's monthly puzzles. Surprisingly, the new o3 was able to solve this month's (previous models could not), though it was an easier one.

Believe me, I am not trying to minimize the overall achievement -- what it can do is incredible -- but I don't believe the phrase AGI should even be mentioned until we see solutions to problems that most professional mathematicians would struggle with, including solutions to unsolved problems. Even that might not be enough, but it should be the minimum bar for having the conversation.