| ▲ | jFriedensreich 3 days ago | |
Hmm, what do you mean by current approach? This is new territory and agent safety is an unsolved problem. There is no current approach, unless you mean not building agent systems and using humans instead. The trifecta is just a simplifying tool, on the level of physics saying "ignore friction". Most of the time we also assume the model itself is trustworthy and not poisoned, but of course when designing a real-world system you need to factor that in too. | ||
| ▲ | ArcHound 3 days ago | parent [-] | |
Yes, by current approach I mean security best practices for non-LLM apps. Plenty of those are directly applicable. And yes, LLMs pose some new challenges. But discarding all of the lessons and principles we've discovered over the years is not the way. And if we need to discard some of them, we should understand exactly why they are no longer applicable. EDIT: I know that models need to omit stuff to be useful. But this model omits too much - claiming that something is "safe" should be a red flag to anyone working in security. | ||