| ▲ | dijksterhuis 2 hours ago | |||||||
normal
prompt injection / adversarial example (same thing really)
tweak badness enough you will get bad outputs. no matter the defences.the only ways to fully “fix” it ie to make prompt injection never possible 1. don’t use ai 2. know the entire input space, output space and the mapping between them. but then we’re not doing machine learning anymore, see 1. otherwise we’re left with mitigations. and mitigations are always a cat and mouse game with defenders (blue team) catching up. its never “fixed”. the latest thing just gets “patched”. | ||||||||
| ▲ | anuramat an hour ago | parent [-] | |||||||
> tweak badness enough assuming you get to do gradient descent AND the context is fixed+known AND you have unlimited compute? sure; is it a realistic setup? > the only way to fix ... the exact same argument applies to any (sufficiently complex) piece of software, with exactly the same conclusion also technically I'd argue that we do know the input/output space (set of all token strings of length <= N/token), and know the mapping (the model is a ~pure function in terms of the api, which is about as good of a representation as it gets for a non-invertible mapping); at least it's much closer than with something like linux | ||||||||
| ||||||||