| ▲ | locknitpicker 4 hours ago |
| > My team and I are firm that we are the ones accountable. LLMs are a tool like every other. Except they are definitely not. LLMs alone are highly non-deterministic even at a high level, where they can pursue goals contrary to the user's prompts. Then, when introduced into ReAct-type loops and granted capabilities such as the ability to call tools, they are able to modify anything and perform all sorts of unexpected actions. To make matters worse, nowadays models not only have the ability to call tools but can also generate on the fly whatever ad-hoc script they want to run, which means their capabilities are not limited to the software you have installed on your system. This goes way beyond "regular tool" territory. |
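For concreteness, here is a minimal Python sketch of the ReAct-type loop described above. Everything in it is a hypothetical stand-in (call_model, the action schema, the run_script branch), not any particular vendor's API:

    import subprocess

    def call_model(messages):
        # Stub: stands in for any chat-completion API call that returns
        # the model's chosen next action as a dict. Hypothetical, not a
        # real vendor API.
        ...

    def react_loop(task, max_steps=10):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            action = call_model(messages)  # the model picks the next step
            if action["type"] == "final_answer":
                return action["content"]
            if action["type"] == "run_script":
                # The model wrote this code itself, so its reach is not
                # limited to whatever tools were installed in advance.
                result = subprocess.run(["python", "-c", action["code"]],
                                        capture_output=True, text=True)
                messages.append({"role": "tool", "content": result.stdout})

The gap between "can call a fixed tool" and "can execute arbitrary code it just wrote" is the point being made here.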
|
| ▲ | keerthiko 3 hours ago | parent | next [-] |
| ▲ | keerthiko 3 hours ago | parent | next [-] |
I think you are misinterpreting gp's "LLMs are a tool [like every other tool]" to mean "LLMs have similar properties to other tools," when I believe they meant "LLMs are a tool; other tools are also tools," where the operative implication of "tool" is not the scope of its capabilities or how deterministic its output is (these aren't defining properties of the concept of "tool"), but the relationship between tool and operator:

- a tool is activated with operator intent (at some point in the call-chain)

- the operator is accountable for the outcomes of activating the tool, intended or otherwise

The capabilities of a tool, and its ability to call sub-tools, are only relevant insofar as they express how much larger the scope of damage and surface area of accountability is with a new generation of tools. This is not that different from past technological leaps. When a US bomber dropped a nuke on Hiroshima, the accountability went up the chain to the war-time president who authorized the military and air force to execute the mission: the scope of accountability of a single decision was way larger than supreme commanders had in prior wars.

If the US government decides to deploy an LLM to decide who receives and who is denied healthcare coverage, social security payments, voting rights, or anything else, the head of internal affairs who authorizes the use of that tool should be held accountable, non-determinism of the tool be damned. |
| |
| ▲ | locknitpicker 3 hours ago | parent [-] |
> - a tool is activated with operator intent (at some point in the call-chain)

This again is where the simplistic assumption breaks down. Just because you can claim that a person kick-started something, that does not mean that person is aware of and responsible for everything it does. Let's put things in perspective: if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does on your system? Because with LLMs and agents you have even less understanding, control, and awareness of what they are doing. | | |
| ▲ | engeljohnb 2 hours ago | parent | next [-] |
> Just because you can claim that a person kick-started something

Kick-started what? If you decided to give an LLM access to your database, it's completely on you when it does something you don't want. You should've known better. If all you "kickstart" is an LLM generating text that you can use however you decide, there will never be anything to worry about from the LLM.

> Let's put things in perspective: if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does on your system?

Yes, and it bothers me that others don't feel the same. You vetted the app, you installed the app, and you gave it permission to do whatever it wants on your system. Of course you're responsible. | | |
| ▲ | orphea an hour ago | parent [-] |
> it bothers me that others don't feel the same

I bet these are the same people who don't admit they make mistakes; they are never wrong, something else is to blame. |
| |
| ▲ | BadBadJellyBean 3 hours ago | parent | prev | next [-] |
> if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does in your system?

Yes. I can try to vet the app to the best of my abilities, and beyond that it's a tradeoff between how likely it is to cause harm and whether the benefits outweigh that harm. Of course everyone is differently qualified to do this, but my argument is more about professionals. Managers should know better than to blindly trust LLM companies. Engineers should take better care about what they allow LLMs to do and what tools they give them. There is a difference between "I couldn't have known" and "I didn't know". You can know that LLMs are not trustworthy. You couldn't have known exactly what they would do, but you already knew that trusting them blindly might be bad. You can know that giving a baby a razor blade is a bad idea. You can't know exactly what will happen, but you have a pretty good idea that it will probably not be good. | | |
| ▲ | 52-6F-62 2 hours ago | parent [-] |
Except what we have here is razor blade companies getting the government to heavily subsidize razor blade production, running massive advertising campaigns, and applying intense intra-industry pressure to give said razor blades to babies, under fear of losing your job or "falling behind" those not giving razor blades to babies. Let's not forget all the razor blade enthusiasts just screaming at you that you are using babies with razor blades wrong and that it works totally fine for them. |
| |
| ▲ | orphea an hour ago | parent | prev | next [-] |
> that does not mean that person is aware of and responsible for everything it does

If they are unaware or, worse, don't understand what they are doing, maybe they shouldn't do the thing in the first place? | |
| ▲ | keerthiko 2 hours ago | parent | prev [-] |
There can be more than one person or entity to be held accountable, depending on the details of the impact.

If I install a powerful/dangerous app and I come under harm, I have some accountability, most of it if it's due to user error (e.g. I install termux and `rm -rf /`). If it's malware, and Google/Apple approved said app into their store, which is where I got it from, when their whole value proposition for walled-garden storefronts is protecting users, then they have significant accountability. If the app requests more permissions than necessary for its stated goals, and/or intentionally harms users via misrepresentation or misdirection (malware), the app publisher should also be held accountable (by the storefront, legally, etc.).

I'm also unclear what angle you are arguing: are you stating that because tools have gotten so complicated that the end user may not understand how they work, no one should be considered responsible or held accountable? Or that the tool (currently a non-entity) itself should be held accountable somehow? Or that no one other than the distributor of the tool should be accountable? |
|
|
|
| ▲ | BadBadJellyBean 4 hours ago | parent | prev | next [-] |
Then that is also on me for using a tool that I can't control. I don't run my LLMs in a way where they can just do things without me signing off on them. It's not nearly as fast as just letting them do their thing, but I've kept them from doing stupid things so many times. Giving up control is a decision. The consequences of that decision are mine to carry. I can do my best to keep autonomous LLMs contained and safe, but if I am the one who deploys them, then I am the one to blame if they fail. That's why I don't do that. |
| |
| ▲ | locknitpicker 3 hours ago | parent [-] |
> Then that is also on me for using a tool that I can't control.

That's a core trait of LLMs. Even the AI companies developing frontier models felt the need to put together whole test suites purposely designed to evaluate a model's propensity to try to subvert the user's intentions.

https://www.anthropic.com/research/shade-arena-sabotage-moni...

> Giving up control is a decision.

No, it is definitely not. Only recently did frontier models start to resort to generating ad-hoc scripts as makeshift tools. They even generate scripts to apply changes to source files. | | |
| ▲ | BadBadJellyBean 3 hours ago | parent [-] |
You seem to misunderstand me. An LLM can only spit out text. It is the tooling I use that allows it to write scripts and call them. In my tooling, it waits for my approval before applying changes or calling scripts or other tools that might change something. I can make that deterministic. I know that it will stop and ask because it has no choice. If I want to be safer, I give it no tools at all. I can also just choose not to use an LLM. It is my choice to use them, so it is my duty to keep myself safe. If I can't control that, I'd be stupid to use them. My take is that I can probably use LLMs safely when I don't let them run autonomously. There is a slight chance that the LLM will generate a string that triggers a bug in an MCP that lets the LLM do what it wants. That is the risk I am going to take, and I will take the blame if it goes wrong. |
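A minimal sketch of the kind of approval gate being described, assuming a hypothetical tool dispatcher and tool names (this is just the shape of the human-in-the-loop check, not any specific product's mechanism):

    SAFE_TOOLS = {"read_file", "search_docs"}  # assumed read-only tools

    def execute_tool(name, args):
        # Stub: hypothetical dispatcher for whatever tools the agent has.
        ...

    def gated_call(name, args):
        # Deterministic gate: any tool that might change state blocks on
        # explicit operator approval before it is allowed to run.
        if name not in SAFE_TOOLS:
            print(f"Agent wants to run: {name}({args})")
            if input("Approve? [y/N] ").strip().lower() != "y":
                return "Denied by operator."
        return execute_tool(name, args)

The LLM can emit whatever text it likes; nothing with side effects executes until the operator answers "y".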
|
|
|
| ▲ | dpoloncsak 4 hours ago | parent | prev [-] |
| Isn't the next sentence there literally 'Only that it's non deterministic'? |