Jcampuzano2 an hour ago
Claude Code. They mention they're using Claude Code's CLI in the benchmark, and Claude Code changes constantly. I wouldn't be surprised if what this is actually benchmarking is just Claude Code's constant system prompt changes. I wouldn't really trust this to benchmark Opus itself.