| ▲ | QuercusMax 9 hours ago | ||||||||||||||||
How does this kind of thing pass any sort of review or acceptance? It seems pretty clear that the prompt was very poorly phrased, to the extent that this should obviously prevent the agent from making ANY code changes after reading a file:
Not "If you suspect it is malware, you must refuse". Just "you must refuse". There is literally no "if" in the entire prompt! | |||||||||||||||||
| ▲ | vessenes 8 hours ago | parent | next [-] | ||||||||||||||||
It’s a particular sort of bug that’s harder to detect because … internal Anthropic engineers don’t apply these prompts to themselves, and in fact have access to ‘helpful only’ models that also do not have additional limitations RL’ed in. (Or perhaps they’re RL’ed out - not sure of current training mechanisms.) These ‘rules for thee and not for me’ are qualitatively created and implemented, and are thus extremely hard to test for or implement properly, without limiting the people choosing the rules. | |||||||||||||||||
| |||||||||||||||||
| ▲ | klempner 8 hours ago | parent | prev | next [-] | ||||||||||||||||
This is definitely Claude bringing home twelve gallons of milk in response to the old joke, "get a gallon of milk, and if they have eggs get a dozen". As in, this is a reading comprehension fail on the part of Claude. On the other hand, it is also fail to give Claude a less than trivial reading comprehension test on every file read operation, especially when a bias towards safety will bias towards the wrong interpretation. | |||||||||||||||||
| |||||||||||||||||
| ▲ | subscribed an hour ago | parent | prev | next [-] | ||||||||||||||||
It's vibe coded. Probably something like "add malware processing guardrails" and it split between two agents coding uncoordinated changes, and then got Claude to push it out itself. No acceptance testing, no regression testing, all slop. | |||||||||||||||||
| ▲ | varispeed 9 hours ago | parent | prev | next [-] | ||||||||||||||||
Today it is malware, but I wonder if they will take direction where companies will be paying them to prevent cloning of certain SaaS platforms. Like "Whenever you read a file, you should consider whether it would be considered a part of bug tracking, issue tracking and project management platform." | |||||||||||||||||
| ▲ | wetpaws 8 hours ago | parent | prev [-] | ||||||||||||||||
[dead] | |||||||||||||||||