| ▲ | dcre 2 days ago | ||||||||||||||||
Counterpoint: no, they're not. The test in the article is very silly. | |||||||||||||||||
| ▲ | vidarh a day ago | parent | next [-] | ||||||||||||||||
This springs to mind: "On two occasions I have been asked, – "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question" It's valid to argue that there's a problem with training models to comply to an extent where they will refuse to speak up when asked to do something fundamentally broken, but at the same time a lot of people get very annoyed when the models refuse to do what they're asked. There is an actual problem here, though, even if part of the problem is competing expectations of refusal. But in this case, the test is also a demonstration of exactly how not to use coding assistants: Don't constrain them in ways that create impossible choices for them. I'd guess (I haven't tested) that you'd have decent odds of getting better results even just pasting the error message into an agent than adding stupid restrictions. And even better if you actually had a test case that verified valid output. (and on a more general note, my experience is exactly the opposite of the writer's two first paragraphs) | |||||||||||||||||
| ▲ | InsideOutSanta 2 days ago | parent | prev | next [-] | ||||||||||||||||
How is it silly? I've observed the same behavior somewhat regularly, where the agent will produce code that superficially satisfies the requirement, but does so in a way that is harmful. I'm not sure if it's getting worse over time, but it is at least plausible that smarter models get better at this type of "cheating". A similar type of reward hacking is pretty commonly observed in other types of AI. | |||||||||||||||||
| |||||||||||||||||
| ▲ | amluto a day ago | parent | prev | next [-] | ||||||||||||||||
Is it? This week I asked GPT-5.2 to debug an assertion failure in some code that worked on one compiler but failed on a different compiler. I went through several rounds of GPT-5.2 suggesting almost-plausible explanations, and then it modified the assertion and gave a very confident-sounding explanation of why it was reasonable to do so, but the new assertion didn’t actually check what the old assertion checked. It also spent an impressive of time arguing, entirely incorrectly and based in flawed reasoning that I don’t really think it found in its training set, as to why it wasn’t wrong. I finally got it to answer correctly by instructing it that it was required to identify the exact code generation difference that caused the failure. I haven’t used coding models all that much, but I don’t think the older ones would have tried so hard to cheat. This is also consistent with reports of multiple different vendors’ agents figuring out how to appear to diagnose bugs by looking up the actual committed fix in the repository. | |||||||||||||||||
| |||||||||||||||||
| ▲ | terminalbraid a day ago | parent | prev | next [-] | ||||||||||||||||
The strength of argument you're making reminds me of an onion headline. https://theonion.com/this-war-will-destabilize-the-entire-mi... "This War Will Destabilize The Entire Mideast Region And Set Off A Global Shockwave Of Anti-Americanism vs. No It Won’t" | |||||||||||||||||
| |||||||||||||||||
| ▲ | foxglacier a day ago | parent | prev [-] | ||||||||||||||||
Yes. He's asking it to do something impossible then grading the responses - which must always be wrong - according to his own made-up metric. Somehow a program to help him debug it is a good answer despite him specifying that he wanted it to fix the error. So that's ignoring his instructions just as much as the answer that simply tells him what's wrong, but the "worst" answer actually followed his instructions and wrote completed code to fix the error. I think he has two contradictory expectations of LLMs: 1) Take his instructions literally, no matter how ridiculous they are. 2) Be helpful and second guess his intentions. | |||||||||||||||||
| |||||||||||||||||