NitpickLawyer 3 days ago
> Across all three evals, GPT‑5.5 improves on GPT‑5.4’s scores while using fewer tokens.
Yeah, this was the next step. First have RLVR make the model good. Next iteration, start penalising long + correct and rewarding short + correct.
> CyberGym 81.8%
Mythos was self-reported at 83.1% ... So not far. Also, it seems they're going the same route with verification. We're entering the era where SotA will only be available after KYC, it seems.
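A minimal sketch of the "penalise long + correct, reward short + correct" idea, purely illustrative and not a description of how any lab actually trains its models. The function name `length_aware_reward`, the `token_budget` and `penalty_weight` parameters, and their default values are all hypothetical:

```python
def length_aware_reward(
    correct: bool,
    num_tokens: int,
    token_budget: int = 4096,      # hypothetical cap on response length
    penalty_weight: float = 0.5,   # hypothetical; keep below 1.0
) -> float:
    """Toy verifiable reward that prefers shorter correct answers.

    Wrong answers get 0.0. Correct answers get 1.0 minus a penalty
    proportional to the fraction of the token budget they used.
    """
    if not correct:
        return 0.0
    # Fraction of the budget consumed, clipped to [0, 1].
    used = min(max(num_tokens, 0) / token_budget, 1.0)
    return 1.0 - penalty_weight * used
```

With `penalty_weight` below 1.0, any correct answer still scores higher than any wrong one, so the length term only breaks ties among correct responses instead of trading accuracy for brevity.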
toraway 3 days ago
Isn't Mythos limited to a select group of companies/organizations that Anthropic chose itself? If the OpenAI announcement for GPT-5.5 is accurate, the "trusted cyber access" just requires an open, seemingly straightforward identity verification step. https://openai.com/index/scaling-trusted-access-for-cyber-de...
"GPT‑5.4‑Cyber" is something else and apparently needs some kind of special access, but that CyberGym benchmark result seems to apply to the more or less open GPT-5.5 model that was just released.
cbg0 3 days ago
Isn't CyberGym an open benchmark, and so trivial to benchmaxx anyway?
mattas 3 days ago
Not good for employees who are being measured by their token usage.