| ▲ | bluegatty an hour ago | |
You will immediately notice the difference if you use it at the threshold. It's like most people watching a 'starting NBA player' (not a superstar, just a starter) vs. one who sits on the bench. If you just watched them play, work out, and shoot, you'd never notice the difference. Put them head to head and it's 98-54, and you start to see the patterns. It's pretty interesting actually; someone tell me what the 'science' for this is, I'm sure there is some kind of information theory at work here. Software has innumerable kinds of problems at varying levels of complexity, so it provides the perfect testbed for seeing how far models can go in practice. Should add: you're very right to hint that the harness, tooling, and models tuned to both the harness and the kinds of things people do on the harness, as well as some other things, do make an enormous difference. By and large, SOTA Codex/Claude Code are substantially better, at least for now. That may change. | ||