| ▲ | cadamsdotcom a day ago | |
API serving already sanitised the role boundary tokens so you can’t submit them. But what if the techniques applied to get Golden Gate Claude were applied instead of a role-boundary marker? Then the model would “know” where input is coming from - because the vector that’s being applied for the current role is putting it in a different area of latent space.. and the vector could have sufficient amplitude to prevent any coercive instructions pulling it back to some other place. Or am I misunderstanding what Golden Gate Claude was doing? | ||