mpoteat 4 hours ago

This is an LLM directly, purposefully lying, i.e. telling a user something it knows not to be true. This seems like a cut-and-dried Trust & Safety violation to me.

It seems the LLM is given conflicting instructions:

1. Don't reference memory without explicit instructions

2. (but) such memory is inexplicably included in the context, so it will inevitably inform the generation

3. Also, don't divulge the existence of user-context memory

If an LLM is given conflicting instructions, I don't expect its behavior to be trustworthy or safe. Much has been written on this.