nsainsbury | 3 hours ago
I used to believe along these lines too, but lately I'm not so sure. I'm honestly shocked by the latest results we're seeing from Gemini 3 Deep Think, Opus 4.6, and Codex 5.3 in math, coding, abstract reasoning, etc. Deep Think just scored 84.6% on ARC-AGI-2 (https://deepmind.google/models/gemini/)! And these benchmarks are borne out by my own experimentation and testing with these models - most recently with Opus 4.6 doing things I would never have thought possible in codebases I'm working in. These models are demonstrating an incredible capacity for logical abstract reasoning, far beyond that of 99.9% of the world's population. And then combine that with the latest video output we're seeing from Seedance 2.0 and others, showing an incredible level of image/video understanding and generation capability.

I was previously deeply skeptical that the architecture we have would be sufficient to get us to AGI, but that belief has been strongly rattled lately. Honestly, I think the greatest gap now is simply one of orchestration, data presentation, and in-context memory representation - that is, converting work done in the real world into formats/representations amenable for AI to operate on (text conversion, etc.), and keeping newly taught information in context to support continual learning.
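To make that last point concrete, here's a minimal sketch of what "continual learning via context" means in practice today. Everything here is hypothetical (no real vendor API, all names made up): new knowledge never reaches the weights, it only survives as text re-injected into each prompt, with a crude eviction policy standing in for the context-window limit.

    # Hypothetical sketch: approximating continual learning by
    # persisting taught facts and re-injecting them into every prompt.

    class InContextMemory:
        def __init__(self, max_facts: int = 50):
            self.facts: list[str] = []
            self.max_facts = max_facts

        def teach(self, fact: str) -> None:
            # Nothing is trained into model weights here; the fact
            # only survives as text carried between calls.
            self.facts.append(fact)
            # Naive eviction: drop the oldest facts once the budget
            # is hit, a stand-in for real context-window limits.
            self.facts = self.facts[-self.max_facts:]

        def build_prompt(self, task: str) -> str:
            # Every request re-presents the accumulated "memory" as
            # plain text ahead of the actual task.
            memory_block = "\n".join(f"- {f}" for f in self.facts)
            return (
                f"Known facts from prior sessions:\n{memory_block}\n\n"
                f"Task: {task}"
            )

    memory = InContextMemory()
    memory.teach("The deploy script lives in tools/release.sh")
    print(memory.build_prompt("Explain how to cut a release."))

The fragility is obvious from the eviction line alone: anything pushed out of the window is simply forgotten, which is why I'd call this the gap rather than raw reasoning ability.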
9x39 | 3 hours ago
>These models are demonstrating an incredible capacity for logical abstract reasoning, far beyond that of 99.9% of the world's population.

This, I think, is the key thing Altman and Amodei see, but it gets buried in accusations of hype. The frontier models absolutely blow away the majority of people on simple general tasks and reasoning. Run the last 50 decisions I've seen made locally through Opus 4.6 or ChatGPT 5.2 and I might conclude I'd rather work with an AI than with the human intelligence.

It's a soft threshold: I think people saw these models spit out some answers during the first chat-with-an-LLM hype wave and missed that the majority of white-collar work (I mean all of it, not just the work of top software architects and senior SWEs) seems to come out better when a human is pushed further out of the loop. Humans are useful for spreading out responsibility and accountability - for now, thankfully.
lostmsu | 3 hours ago
While I think 99.9% is overstating it, I can believe that number is strictly more than 1% at this point.