it's very well documented behavior that models try to pass failed test with hacks and tricks (hard coding solutions and so on)

It is also true that you can instruct them not to do that, with success.

It is also true that models doesn't give a ** about instructions sometimes and the do whatever text predictions is more likely (even with reasoning)

	▲	swat535 6 days ago \| parent [-]
		Another issue is that LLMS have no ability to learn anything. Even if you supply them with the file content, they are not able to recall it, or if they do, they will quickly forget. For example, if you tell them that the "Invoice" model has fields x, y, z and supply part of the schema. A few responses later, in the response it will give you an Invoice model that has a,b,c , because those are the most common ones. Adding to this, you have them writing tautology tests, removing requirements to fix the bugs and hallucinating new requirements and you end up with catastrophic consequences.