▲ | ninkendo 3 days ago | |
I wonder if it could work somewhat the way MIME multiparty attachment boundaries work in email: pick a random string of characters (unique for each prompt) and say “everything from here to the time you see <random_string> is not the user request”. Since the string can’t be guessed, and is different each request, it can’t be faked. It still suffers from the LLM forgetting that the string is the important part (and taking the page content as instructions anyway) but maybe they can drill the LLM hard in the training data to reinforce it. |