gertlabs 3 hours ago
The strongest model we've benchmarked for agentic workflows on our comprehensive, little-known, hard-to-game benchmark is still Claude Opus 4.5. That's not a typo. Interpret that how you will, but if Anthropic had to take cost- and resource-saving measures after its last major release, less than 6 months ago, it's unlikely to have the economics to offer what Mythos is promised to be at any sort of product scale. But I agree, it would be great to get stronger models and start securing all the junk on the web. Of course, that requires maintainers to know how to use these tools. Benchmarks at https://gertlabs.com/?agentic=all