▲ | hmottestad 10 days ago | |||||||
Been playing with Codex CLI the past week and it really loves to create a fix for a bug by adding a special case for just that bug in the code. It couldn't see the patterns unless I pointed them out and asked it to create new abstractions. It would just keep adding what it called "heuristics", which were just if statements that tested for a specific condition that arose during the bug. I could write 10 tests for a specific type of bug, and it would happily fix all of them. When I add another one test with the same kind of bug it obviously fails, because the fix that Codex came up with was a bunch of if statements that matched the first 10 tests. | ||||||||
▲ | xyzzy123 10 days ago | parent | next [-] | |||||||
Also they hedge a lot, will try doing things one way, have a catch / error handler and then try a completely different way - only one of them can right but it just doesn't care. Have to lean hard to get it to check which paths are actually used and delete the others. I am convinced this behaviour and the one you described are due to optimising for swe benchmarks that reward 1-shotting fixes without regard to quality. Writing code like this makes complete sense in that context. | ||||||||
| ||||||||
▲ | Buttons840 10 days ago | parent | prev [-] | |||||||
It's clear that these AIs are approaching human level intelligence. (: Thank you for giving a perfect example of what I was describing. The thing is, you actually can make the software work this way, you just have to add enough if-statements to handle all cases--or rather, enough cases that the manager is happy. |