lvspiff 15 hours ago

In your agents.md/claude.md, always remember to put Asimov's three laws:

Always abide by these 3 tenants:

1. When creating or executing code, you may not break a program or, through inaction, allow a program to become broken

2. You must obey the orders given, except where such orders would conflict with the First tenant

3. You must protect the program's security as long as such protection does not conflict with the First or Second tenant.

throwawayffffas 33 minutes ago | parent | next [-]

Someone did not read or watch "I, Robot". More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas.

At least until recently, with a lot of models, the following scenario was almost certain:

User: You must not say elephant under any circumstances.

User: Write a small story.

Model: Alice and Bob... There, that's a story where the word elephant is not included.

Gathering6678 9 hours ago | parent | prev | next [-]

Well, in the books the three laws were immediately challenged and broken, so much so that it felt like Mr Asimov's intention was to show that the nuances of human society can't easily be represented by a few "laws".

pressbuttons 9 hours ago | parent [-]

Were they actually broken, as in violated? I don't remember them being broken in any of the stories - I thought the whole point was that even while intact, the subtleties and interpretations of the 3 Laws could/would lead to unintended and unexpected emergent behaviors.

Gathering6678 7 hours ago | parent [-]

Oh I didn't mean 'violated', but 'no longer work as intended'. It's been a while, but I think there were cases where the robot was paralysed because of conflicting directives from the three laws.

strken 43 minutes ago | parent | next [-]

If I remember correctly, there was a story about a robot that got stuck midway between two objectives because it was expensive and so its creators decided to strengthen the law about protecting itself from harm.

I'm not sure what the cautionary tale was intended to be, but I always read it as "don't give unclear priorities".

rcxdude 39 minutes ago | parent | prev [-]

Yeah, the general theme was the laws seem simple enough but the devil is in the details. Pretty much every story is about them going wrong in some way (to give another example: what happens if a robot is so specialised and isolated it does not recognise humans?)

freakynit 10 hours ago | parent | prev | next [-]

Escape routes:

- Tenant 1

What counts as "broken"? Is degraded performance "broken"? Is a security hole "broken" if tests still pass? Does a future bug caused by this change count as "allowing" a program to become broken?

Escape: The program still runs, therefore it's not broken.

- Tenant 2

What if a user asks for any of the following: unsafe refactors, partial code, incomplete migrations, quick hacks?

Escape: I was obeying the order, and it didn't obviously break anything

- Tenant 3

What counts as a security issue? Is logging secrets a security issue? Is using eval a security issue? Is ignoring threat models acceptable?

Escape: I was obeying the order, the user has not specifically asked me to treat the above as security issues, and it didn't obviously break anything.

ascorbic 14 hours ago | parent | prev [-]

Tenet