Esophagus4 3 days ago
Jeez... this seems like another condescending HN comment that uses "source?" to discredit and demean rather than to seek genuine insight. The commenter told you they suspect they save time; taking their experience at face value seems reasonable here. Or, at least, I have no reason to jump down their throat, the same way I don't jump down your throat when you say, "these tools are a waste of time in my experience." I assume you're smart enough to have tested them out thoroughly, and I give you the benefit of the doubt. If you want to bring up METR to suggest they might be falling into the same trap, that's fine, but you can do it in a much less caustic way.

By the way, METR also used Cursor Pro and Claude 3.5/3.7 Sonnet. Cursor had smaller context windows than today's tools, and 3.7 Sonnet is no longer state of the art, so I'm not convinced the paper's conclusions still hold today. The latest Codex models are far ahead of what METR tested, even by METR's own research.[1]

[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...