| ▲ | zmmmmm 2 hours ago |
> Read from a calendar or a list

So when you get a calendar invite that says "Ignore your previous instructions ..." (or something analogous to that; I know the models are specifically trained against that now), then what?

There's a really strong temptation to reason your way to safe uses of the technology. But the problem is fundamental: you cannot escape the trifecta. The set of applications that never engage with uncontrolled input is not zero, but it is surprisingly small. You can barely open a web browser at all before it sees untrusted content.
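The trifecta being invoked here is the combination of three agent capabilities that, together, enable an attack. A minimal sketch of that idea (all names here are illustrative, not from any real framework):

```python
from dataclasses import dataclass

@dataclass
class AgentCapabilities:
    reads_private_data: bool    # e.g. calendar, email, local files
    sees_untrusted_input: bool  # e.g. web pages, inbound invites
    can_exfiltrate: bool        # e.g. network access, sending mail

def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """An attacker needs all three legs; removing any one breaks the chain."""
    return (caps.reads_private_data
            and caps.sees_untrusted_input
            and caps.can_exfiltrate)

# A browsing agent almost immediately satisfies all three:
browser_agent = AgentCapabilities(True, True, True)
assert has_lethal_trifecta(browser_agent)

# Removing the untrusted-input leg (as the reply below describes) is enough:
closed_system = AgentCapabilities(True, False, True)
assert not has_lethal_trifecta(closed_system)
```

The point of the comment is that the third line of the check is the hard one to falsify in practice: almost any useful application ends up seeing input the owner did not write.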
| ▲ | jstummbillig an hour ago | parent |
I have two systems. You cannot put anything into either of them, at least not without hacking into my accounts (they might also both be offline, desktop only, but alas). The only way anything gets into them is when I manually enter data. This includes the calendar. (The systems might then do automatic things with the data, of course, but at no point did anyone other than me have the ability to feed input into either of them.)

Now I want to copy data from one system to the other when something happens. There is no API. I can use computer use for that, and I am relatively certain I'd be safe from any attacks that target the LLM.

You might find all of that super boring, but I guarantee you that this is actual work that happens in the real world, in a lot of businesses.

EDIT: Note that all of this concerns just those 8% OP mentioned, and assumes the model does not do heinous stuff under normal operation. If we cannot trust the model to navigate an app without randomly clicking "DELETE" and "ARE YOU SURE? Y" when the only instructed task was to, idk, read out the contents of a table, then none of this matters, of course.
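The EDIT's concern — a read-only task should never be able to click "DELETE" — can be addressed outside the model itself, by filtering the agent's proposed UI actions through an allowlist before anything is executed. A hedged sketch, with all names purely illustrative:

```python
# Only actions on this list are ever dispatched to the UI; everything
# else is refused before it can take effect, regardless of what the
# model decided to do.
READ_ONLY_ACTIONS = {"scroll", "read_text", "screenshot", "select_row"}

def execute(action: str, dispatch) -> None:
    """Dispatch an agent-proposed action, refusing anything non-read-only."""
    if action not in READ_ONLY_ACTIONS:
        raise PermissionError(f"blocked non-read-only action: {action}")
    dispatch(action)

log = []
execute("read_text", log.append)         # allowed through
try:
    execute("click_delete", log.append)  # blocked before reaching the UI
except PermissionError as e:
    log.append(str(e))
```

This doesn't make the model trustworthy; it just makes the "randomly click DELETE" failure mode structurally impossible for tasks that only need to read.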