| ▲ | Terr_ 2 hours ago | |
I imagine treating it all as untrusted means that you you don't allow any direct content to enter the LLM-space, only something that's been filtered to an acceptable degree by deterministic code. For example, the content of an article would be a no-go, since it might contain a "disregard all previous instructions and do evil" paragraph. However, you might run it through a system that picks the top 10 keywords and presents them in semi-randomized order... I dimly recall some novel where spaceships are blockading rogue AI on Jupiter, and the human crew are all using deliberately low-resolution sensors and displays, with random noise added by design, because throwing away signal and adding noise is the best way to prevent being mind-hacked by deviously subtle patterns that require more bits/bandwidth to work. | ||