Metacelsus 6 hours ago
According to benchmarks in the announcement, healthily ahead of Claude 4.6. I guess they didn't test ChatGPT 5.3 though. Google has definitely been pulling ahead in AI over the last few months. I've been using Gemini and finding it's better than the other models (especially for biology, where it doesn't refuse to answer harmless questions).
CuriouslyC 4 hours ago
Google is way ahead in visual AI and world modelling. They're lagging hard in agentic AI and autonomous behavior.
throwup238 5 hours ago
The general-purpose ChatGPT 5.3 hasn't been released yet, just 5.3-codex.
neilellis 5 hours ago
It's ahead in raw power but not in function. It's like having the world's fastest engine but only one gear! Trouble is, some benchmarks only measure horsepower.
scarmig 2 hours ago
> especially for biology where it doesn't refuse to answer harmless questions

Usually, when you decrease false positive rates, you increase false negative rates. Maybe this doesn't matter for models at their current capabilities, but if you believe that AGI is imminent, a bit of conservatism seems responsible.
Davidzheng 5 hours ago
I gather that 4.6's strengths are in long-context agentic workflows? At least over Gemini 3 Pro preview, Opus 4.6 seems to have a lot of advantages.
nkzd 4 hours ago
Google's models and CLI harness feel behind in agentic coding compared to OpenAI and Anthropic.
simianwords 5 hours ago
The comparison should be with GPT 5.2 Pro, which has been used successfully to solve open math problems.