samuelknight 8 hours ago

I don't know about Mythos, but the chart understates the capability of current frontier models. The GPT and Claude models available today can handle web app exploits, C2, and persistence in well under 10M tokens if you build a good harness.

The benchmark might be a good apples-to-apples comparison but it is not showing capability in an absolute sense.