mustaphah 5 days ago

I speculate something similar (or even worse) is going on with Terminal-Bench [1].

Like, seriously, how come all these agents are beating Claude Code? In practice, they are shitty and not even close. Yes. I tried them.

[1] https://www.tbench.ai/leaderboard

cma 5 days ago | parent | next [-]

Claude Code was severely degraded over the last few weeks; very simple terminal prompts that it never had problems with were failing for me.

giveita 5 days ago | parent [-]

Follow the money. Or how much comes from your pocket vs. VC and big tech speculators.

cma 5 days ago | parent [-]

They did a big fundraising round right after, so it's easy to suspect they were manipulating profitability growth for it.

Bolwin 5 days ago | parent | prev [-]

They're all using Claude, so idk. Claude Code is just a program; the magic is mainly in the model.