JohnRoseDev 3 hours ago

I can’t help but think that these benchmarks are completely fake. Sam even posted a benchmark on X a couple of days ago showing that the ‘complete version’ of 5.5 cyber was apparently already ahead of Mythos. This just feels like absolute nonsense. The impact of Mythos on the industry was clear and in front of everyone’s eyes: the number of vulnerabilities Mozilla fixed, the vulnerabilities and exploits Anthropic showcased in that blog post about the Chrome sandbox escape, etc. And now we’re supposed to believe this 5.5 cyber is already ahead of Mythos, ok. And yeah, GPT 5.6 is even further ahead, alright.

brookst 2 hours ago

Well if they are posting fraudulent benchmarks, that's a good sign to invest in their IPO. It's pure downside protection: IPO does well, profit. IPO does poorly, concrete evidence of pre-IPO fraud.

I personally don't think it's likely that OpenAI would post completely fake numbers in this pre-IPO period, but if you do, this is an opportunity.

JohnRoseDev an hour ago

I’m not an OpenAI hater by default; I’m just waiting for someone to explain how an incremental 5.5 cyber version is supposed to be already ahead of the flagship Mythos model that’s been shaking up the software industry for a few months. If OpenAI has had these supposedly better-than-Mythos 5.5 capabilities in their hands internally for some time now, why didn’t they make anything of them in an era where everybody is desperate for any good press they can get?