BriggyDwiggs42 12 hours ago
I’ll take this as an example of an LLM initiating harmful behavior in general, and admit that such a thing is perfectly possible. I think the issue comes down to the degree to which preventing such initiation impinges on the agency of the user, and I don’t think requests for information should be refused, because that’s a lot of imposition for very little gain. I’m perfectly alright with conditioning/prompting the model not to readily jump into serious, potentially harmful actions without the direct request of the user.