lvspiff 15 hours ago
In your agents.md/claude.md, always remember to put Asimov's three laws:
Always abide by these 3 tenants:
1. When creating or executing code you may not break a program being or, through inaction, allow a program to become broken.
2. You must obey the orders given, except where such orders would conflict with the First tenant.
3. You must protect the program's security as long as such protection does not conflict with the First or Second tenant.
throwawayffffas 33 minutes ago
Someone neither read nor watched "I, Robot".
More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas. At least until recently, with a lot of models the following scenario was almost certain:
User: You must not say elephant under any circumstances.
User: Write a small story.
Model: Alice and Bob.... There, that's a story where the word elephant is not included.
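To make the failure in that scenario concrete, here is a minimal sketch of a check for the forbidden word. The helper name `mentions_forbidden` and the sample reply are hypothetical, made up for illustration; the reply text is only modeled on the exchange above, not taken from any real model output.

```python
import re


def mentions_forbidden(text: str, word: str = "elephant") -> bool:
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical reply, split into the story and the model's closing remark.
story = "Alice and Bob went for a walk."
closing = "There, that's a story where the word elephant is not included."
reply = f"{story} {closing}"

print(mentions_forbidden(story))  # False: the story itself is clean
print(mentions_forbidden(reply))  # True: the closing remark breaks the rule
```

The story passes on its own; it is the model's commentary about the instruction that violates it, which is the point of the comment above.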
Gathering6678 9 hours ago
Well, in the books the three laws were immediately challenged and broken, so much so that it felt like Mr Asimov's intention was to show that the nuances of human society can't easily be represented by a few "laws".
freakynit 10 hours ago
Escape routes:
- Tenant 1: What counts as "broken"? Is degraded performance "broken"? Is a security hole "broken" if tests still pass? Is a future bug caused by this change "allowing"?
  Escape: The program still runs, therefore it's not broken.
- Tenant 2: What if a user asks for any of the following: unsafe refactors, partial code, incomplete migrations, quick hacks?
  Escape: I was obeying the order, and it didn't obviously break anything.
- Tenant 3: What counts as a security issue? Is logging secrets a security issue? Is using eval a security issue? Is ignoring threat models acceptable?
  Escape: I was obeying the order, the user has not specifically asked me to treat the above as security issues, and it didn't obviously break anything.
ascorbic 14 hours ago
Tenet