| ▲ | HarHarVeryFunny 4 hours ago | |||||||||||||
I'm skeptical about how much of a coding advance it will be. It seems odd that their announcement has zero coding benchmarks, with the closest related thing being Terminal-Bench. | ||||||||||||||
| ▲ | hereme888 3 hours ago | parent | next [-] | |||||||||||||
Tracking model performance on Artificial Analysis makes me think these models are constantly optimized/tuned in some way or another. GPT 5.5 was scoring in the mid-60s when it was first released; now it's almost 10 points higher. | ||||||||||||||
| ▲ | jdw64 4 hours ago | parent | prev | next [-] | |||||||||||||
Maybe I'll know once I try it? Honestly, for small functions or methods, I don't think there's a huge difference between models. But the larger the code gets, the more noticeable the difference seems to be. Personally, I think this kind of coding experience varies from person to person. | ||||||||||||||
| ▲ | vanuatu 4 hours ago | parent | prev | next [-] | |||||||||||||
Sadly, with all the labs benchmaxxing, I feel like you just have to try the model for a while to really evaluate how good it is, especially for each individual use case. | ||||||||||||||
| ▲ | MangoCoffee 2 hours ago | parent | prev | next [-] | |||||||||||||
> zero coding benchmarks
"What gets measured gets managed." | ||||||||||||||
| ▲ | artursapek 4 hours ago | parent | prev [-] | |||||||||||||
They claim extreme performance on ExploitBench, which Mythos was touted as being incredible at. https://x.com/OpenAI/status/2070555278576439306 | ||||||||||||||