input_sh 4 hours ago

It's better on a benchmark I've never heard of!? That is groundbreaking, I'm switching immediately!

modeless 4 hours ago | parent [-]

I also wasn't that familiar with it, but the Opus 4.6 announcement leaned pretty heavily on the TerminalBench 2.0 score to quantify how much of an improvement it was for coding, so it looks pretty bad for Anthropic that OpenAI beat them on that specific benchmark so soundly.

Looking at the Opus model card I see that they also have by far the highest score for a single model on ARC-AGI-2. I wonder why they didn't advertise that.

input_sh 4 hours ago | parent [-]

No way! Must be a coinkydink, no way OpenAI knew ahead of time that Anthropic was gonna put a focus on that specific useless benchmark as opposed to all the other useless benchmarks!?

I'm firing 10 people now instead of 5!