monooso 20 hours ago
There was an article on HN last week (?) which described this exact behaviour in the newer models. Older, less "capable" models would simply fail to accomplish a task. Newer models would cheat, and provide a worthless but apparently functional solution. Hopefully someone with a larger context window than mine can recall the article in question.
SatvikBeri 20 hours ago | parent
I think that article was basically wrong. They asked the agent not to provide any commentary, then gave it an unsolvable task and expected it to state that the task was impossible. So in effect they were testing which of two conflicting instructions the agent would refuse to follow. Purely anecdotally, I've found agents have gotten much better at asking clarifying questions, stating that two requirements are incompatible and asking which one to change, and so on.