mustaphah 5 days ago
I suspect something similar (or even worse) is going on with Terminal-Bench [1]. Seriously, how come all these agents are beating Claude Code? In practice, they are shitty and not even close. Yes, I tried them.
cma 5 days ago
Claude Code was severely degraded over the last few weeks; very simple terminal prompts that it never had problems with were failing for me.
Bolwin 5 days ago
They're all using Claude, so idk. Claude Code is just a program; the magic is mainly in the model.