| ▲ | ares623 5 hours ago | |
IMO this is why they can't just "stop training". Imagine if we were all stuck using the same models from a year ago, with all the creative "actors" out there coming up with jailbreak prompts and a full year for those to propagate and solidify into "best practices". Every prompt on the internet confirmed to have worked would sit there forever, just waiting to be slurped up. What would that look like? No, they need to keep changing the models. It is the biggest "security" boundary these things have (well, second only to having no internet egress).
| ▲ | byzantinegene an hour ago | |
I don't think training is necessarily the right solution for such attacks. A proper harness would be more effective.