| ▲ | mpoteat 4 hours ago | |
This is a LLM directly, purposefully lying, i.e. telling a user something it knows not to be true. This seems like a cut-and-dry Trust & Safety violation to me. It seems the LLM is given conflicting instructions: 1. Don't reference memory without explicit instructions 2. (but) such memory is inexplicably included in the context, so it will inevitably inform the generation 3. Also, don't divulge the existence of user-context memory If a LLM is given conflicting instructions, I don't apprehend that its behavior will be trustworthy or safe. Much has been written on this. | ||