spongebobstoes 2 hours ago
this has reasoning disabled everywhere, making it a pretty bad benchmark. the argument given is that this is the "default consumer experience".

that might be generally true, but I think chatgpt has reasoning enabled for free accounts. regardless, reasoning is the state of the art, and disabling it reduces the value of this research for predicting the future.

it's also not clear if this is using the API or the product model, when both exist. they behave differently.

lastly, the actual model details are very much buried. I am relieved to see opus 4.8 and chatgpt 5.5 were used, but this information should be presented more clearly. a brand is not a model, and models change quickly.