BadBadJellyBean 4 hours ago

My team and I are firm that we are the ones accountable. LLMs are a tool like every other. Only that it's non-deterministic. But I am the one using the tool. I am the one giving the tool access. I am the one who has to keep everything safe.

I have shot myself in the foot using gparted in the past by wiping the wrong disk. gparted wasn't to blame. I was.

Letting LLMs work freely without supervision sounds great, but it will lead to pain. I have to supervise their work, and that includes during execution. You can try to replace a human, but we see where this leads. Sooner or later the LLM will do something stupid, and then the only one to blame is the person who used the tool.

pjc50 3 hours ago | parent | next [-]

This is kind of the reverse of https://en.wikipedia.org/wiki/Poka-yoke . A lot of tools have affordances built in to make "right" things easy and "wrong" or unsafe things harder. LLMs... well, the text interface is uniquely flat. Everything is seemingly as easy as everything else.

I worry about the use of humans as sacrificial accountability sinks. The "self-driving car" model already has this: a car which drives itself most of the time, but where a human user is required to be constantly alert so that the AI can transfer responsibility a few hundred milliseconds before the crash.

eqvinox an hour ago | parent | next [-]

> A lot of tools have affordances built in to make "right" things easy and "wrong" or unsafe things harder.

This is true for almost anything handed to laypeople, but not for a lot of professional tools. Even a plain battery-powered drill has very few protections against misuse. A soldering iron has none. Neither do sewing needles; sewing machines barely do, in the sense that you can't stick your fingers in a gap too narrow. A chemist's chemicals certainly have no protections, only warning labels. Etc.

Also cf. the hierarchy of controls: https://www.cdc.gov/niosh/hierarchy-of-controls/about/index....

Running AI down that hierarchy:

- Elimination: people don't seem to want to eliminate AI.

- Substitution: replacing it doesn't improve things.

- Isolation: yup, people are trying to put it in containers and not give it access to delete the production database (a sketch of that below).

- Administrative controls, i.e. changing how people work with it: that's where we are now.

- PPE: no such thing for AI, sadly.

...and so the production database is deleted.
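To make the isolation level concrete: a minimal sketch of running agent-proposed shell commands in a throwaway container with no network and a read-only filesystem. This assumes Docker is installed; the image name and resource limits are purely illustrative.

  import subprocess

  def run_isolated(cmd: str) -> str:
      """Run an agent-proposed command in a disposable, locked-down container:
      no network (so no reaching prod), read-only root fs, capped resources."""
      result = subprocess.run(
          ["docker", "run", "--rm",
           "--network", "none",            # the container cannot reach anything
           "--read-only",                  # it cannot modify its own filesystem
           "--memory", "256m", "--cpus", "0.5",
           "alpine:3", "sh", "-c", cmd],   # illustrative base image
          capture_output=True, text=True, timeout=30,
      )
      return result.stdout if result.returncode == 0 else result.stderr

Whatever the agent generates, the blast radius is the container, not your database.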

BadBadJellyBean 28 minutes ago | parent [-]

Exactly this. I was talking about professionals, people who should know better. If we as professionals give away our agency and our accountability, we make ourselves obsolete. If I just tell the LLM what to do and hope it doesn't go south, then the manager could probably do that as well.

And if a non-professional did it, they should ask themselves why we have professionals. Maybe there was a reason, and maybe they do have value.

lelanthran 3 hours ago | parent | prev | next [-]

> This is kind of the reverse of https://en.wikipedia.org/wiki/Poka-yoke . A lot of tools have affordances built in to make "right" things easy and "wrong" or unsafe things harder.

I point to the first USB port as the harbinger of things to come - try it one way, fail, turn it around, fail again, then turn it around one more time.

Just like AI, except there are unlimited axes upon which to turn it :-/

BadBadJellyBean 3 hours ago | parent | prev | next [-]

I agree that LLM companies could be more open about the dangers, and that people are sometimes bad at judging risks.

Still, I think a band saw has very little warning on it, and by its design there is very little anyone can do about me cutting off my finger if I am not careful.

LLM companies can do very little about the unpredictability of LLMs. So we have to choose how far we will let it go. In the end the LLM only produces text. We are in control of what tools we give it. The more tools, the more useful, and also the more dangerous.

And maybe it's all worth it. Maybe the LLM deletes the database only sometimes, and in between we make a lot of money. I don't think my employer would enjoy that, so I will be more conservative.

skydhash 2 hours ago | parent | next [-]

It’s possible to make AI safe, but that also throws most of the gains out of the window, especially if the artifact is a diff which can take time to review. In IT, you often have to give access to possibly malicious users; you just have to scope what they can do.
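As a minimal sketch of that scoping, using only Python's standard library: open the database read-only, so even a fully hijacked agent holding this connection cannot write or drop anything (the file and table names are made up for illustration).

  import sqlite3

  # Read-only URI mode: any INSERT/UPDATE/DROP through this connection
  # raises sqlite3.OperationalError instead of touching the data.
  conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # illustrative path

  conn.execute("SELECT count(*) FROM users")  # reads are allowed
  conn.execute("DROP TABLE users")            # raises OperationalError

The same idea applies to a real database server: give the agent a role with only the grants it actually needs.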

But the push is agentic everything, where AI needs to be everywhere, not in its own sandbox.

BadBadJellyBean 2 hours ago | parent [-]

We don't have to blindly follow every trend. If agentic is not safe then it's on me if I use it and something breaks.

clickety_clack 2 hours ago | parent | prev [-]

A band saw is always a screaming band of bladed death. An LLM is sometimes a buddy, sometimes a mentor, and only sometimes a guy that drops your database.

polytely 2 hours ago | parent | prev | next [-]

This is so well put, and it happens not only at the user level but also at the organisational level, where you can completely abdicate both responsibility and explanation by moving the complicated questions into the black box of an AI model.

chrisweekly 2 hours ago | parent | prev | next [-]

^ which approach makes no logical sense; an inattentive or even partly-attentive driver simply cannot resume control and react accordingly within even 2 seconds.

Avicebron 3 hours ago | parent | prev [-]

I think that might be the better definition between "engineering" and "vibing". Engineering follows and elevates Poka-yoke patterns, vibing ignores them.

bombcar 3 hours ago | parent | prev | next [-]

> gparted wasn't to blame. I was.

These can both be true, especially if/when it has bad defaults. This is why you have things like "type the name of the database you're dropping" safety features - but you also have to name your production database something like "THE REAL DaTabaSe - FIRE ME" so you have to type that and not fall into the trap of ending up with the same name in test/development.
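That guard pattern is easy to build into your own destructive scripts too; a minimal sketch (the names are made up):

  def confirm_drop(db_name: str) -> bool:
      """Poka-yoke guard: the operator must retype the exact database name,
      so the drop cannot happen on muscle memory alone."""
      typed = input(f"Type '{db_name}' to confirm dropping it: ")
      return typed == db_name

  if confirm_drop("THE REAL DaTabaSe - FIRE ME"):
      print("dropping...")  # the actual destructive call would go here
  else:
      print("aborted: name did not match")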

AI is particularly seductive because it sounds like a reasonable person has thought things out, but it's all just a giant confidence trick (that works most of the time, which makes it even more dangerous).

fyrabanks 4 hours ago | parent | prev | next [-]

Thank you. Exactly this.

There were so many fundamental problems with the infrastructure even before the person gave a poor prompt to an agent.

If you're using the same API key for staging and prod--and just storing it somewhere randomly to forget about--you're setting yourself up for failure with or without AI.

kokojambo 3 hours ago | parent | prev | next [-]

This is the right approach. I've been developing for 30 years and very much enjoy working with AI. It's easy to see that the AI is only as good as the person using it. Deterministic or not, it's up to the dev to check the result (both code and behavior).

I compare the anti-AI articles, like the one saying "AI deleted my prod db", to factory workers rioting and complaining about machines replacing them. AI makes a good developer better. The tech industry always attracted fakers that wanted a piece of the pie, and now that these people have their hands on a powerful tool and connect it to their prod db, they cry in pain and frustration. Like people with no license crashing a car and crying that cars are dangerous: they are, but only because people use them dangerously.

pfortuny an hour ago | parent | prev | next [-]

Tesla has been sued for a similar reason: "full self-driving".

AI companies are selling their products as "perfect" ("better than humans...").

I agree in part with you, but I also agree that they are selling a hammer which can blow up without notice.

BadBadJellyBean 33 minutes ago | parent [-]

I do agree that the companies could do a better job of warning about the dangers, but let's be real here. It's hardly a secret that LLMs can be erratic. It's not news.

Other companies also tell me their product is the best thing since sliced bread. I still try to find the flaws. That's part of my job. But suddenly with LLMs we just blindly trust the companies? I don't think so.

I don't blindly give up my brain and my agency, and no one else should. It's fun and educational to play around with LLMs. Find out what they are good at. But always remember that you can't predict what they will do. So maybe don't blindly trust them.

lelanthran 3 hours ago | parent | prev | next [-]

> I have shot myself in the foot using gparted in the past by wiping the wrong disk. gparted wasn't to blame. I was.

Much like how a poor workman always blames his tools, people using poor tools always blame themselves.

I mean, Donald A. Norman wrote The Psychology of Everyday Things in the 80s! (It later became "The Design of Everyday Things".)

And yet, today, we will still have a bunch of people defending Gnome's design decisions, or the latest design decisions from Apple, etc.

BadBadJellyBean 2 hours ago | parent [-]

I am still to blame if I choose a bad tool. Especially if I should know better.

locknitpicker 4 hours ago | parent | prev | next [-]

> My team and I are firm that we are the ones accountable. LLMs are a tool like every other.

Except it is definitely not.

LLMs alone are highly non-deterministic, even at a high level, where they can even pursue goals contrary to the user's prompts. Then, when introduced into ReAct-type loops and granted capabilities such as the ability to call tools, they are able to modify anything and perform all sorts of unexpected actions.

To make matters worse, nowadays models not only have the ability to call tools but also to generate on the fly whatever ad-hoc script they want to run, which means that their capabilities are not limited to the software you have installed on your system.

This goes way beyond "regular tool" territory.

keerthiko 3 hours ago | parent | next [-]

I think you are misinterpreting gp as saying

"LLMs are a tool [like every other tool]" to mean "LLMs have similar properties to other tools" — when I believe they meant "LLMs are a tool. other tools are also tools," where the operative implication of "tool" is not about scope of capabilities or how deterministic its output is (these aren't defining properties of the concept of "tool"), but the relationship between 'tool' and 'operator':

- a tool is activated with operator intent (at some point in the call-chain)

- the operator is accountable for the outcomes of activating the tool, intended or otherwise

The capabilities of a tool, and its ability to call sub-tools, are only relevant insofar as they express how much larger the scope of damage and surface area of accountability are with a new generation of tools. This is not that different from past technological leaps.

When a US bomber dropped a nuke on Hiroshima, the accountability went up the chain to the wartime president who gave the military and air force the authorization to execute the mission — the scope of accountability of a single decision was way larger than supreme commanders had in prior wars. If the US government decides to deploy an LLM to decide who receives and who is denied healthcare coverage, social security payments, voting rights, or anything else, the head of internal affairs who authorized the use of that tool should be held accountable, non-determinism of the tool be damned.

locknitpicker 3 hours ago | parent [-]

> - a tool is activated with operator intent (at some point in the call-chain)

This again is where the simplistic assumption breaks down. Just because you can claim that a person kick started something, that does not mean that person is aware of and responsible for everything it does.

Let's put things in perspective: if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does in your system? Because with LLMs and agents you have even less understanding and control and awareness of what they are doing.

engeljohnb 2 hours ago | parent | next [-]

>Just because you can claim that a person kick started something

Kick started what? If you decided to give an LLM access to your database, it's completely on you when it does something you don't want. You should've known better.

If all you "kickstart" is an LLM generating text that you can use however you decide, there will never be anything to worry about from the LLM.

> Let's put things in perspective: if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does in your system?

Yes, and it bothers me that others don't feel the same. You vetted the app, you installed the app, and you gave it permission to do whatever on your system. Of course you're responsible.

orphea an hour ago | parent [-]

  it bothers me that others don't feel the same

I bet these are the same people who don't admit they make mistakes; they are never wrong, something else is to blame.

BadBadJellyBean 3 hours ago | parent | prev | next [-]

> if you install a mobile app from the app store, are you responsible and accountable for every single thing the app does in your system?

Yes. I can try to vet the app to the best of my abilities, and beyond that it's a tradeoff between how likely it is to cause harm and whether the benefits outweigh those harms.

Of course everyone is differently qualified to do this, but my argument is more about professionals. Managers should know better than to blindly trust LLM companies. Engineers should take better care about what they allow LLMs to do and what tools they give them.

There is a difference between "I couldn't have known" and "I didn't know". You can know that LLMs are not trustworthy. You couldn't have known exactly what they would do, but you already knew that trusting them blindly might be bad.

You could know that giving a baby a razor blade is a bad idea. You can't know what exactly will happen, but you might have a pretty good idea that it will probably not be good.

52-6F-62 2 hours ago | parent [-]

Except what we have here is razor blade companies getting the government to heavily subsidize razor blade production, running massive advertising campaigns, and applying intense intra-industry pressure to give said razor blades to babies, under fear of losing your job or "falling behind" those not giving razor blades to babies.

Let's not forget all the razor blade enthusiasts just screaming at you that you are using babies with razor blades wrong and that it works totally fine for them.

orphea an hour ago | parent | prev | next [-]

  that does not mean that person is aware of and responsible for everything it does.

If they are unaware or - worse - don't understand what they are doing, maybe they shouldn't do the thing in the first place?

keerthiko 2 hours ago | parent | prev [-]

There can be more than one person or entity to be held accountable, depending on the details of the impact.

If I install a powerful/dangerous app, and I come under harm, I have some accountability — most of it if it's due to user error (eg: I install termux and `rm -rf /`).

If it's malware, and Google/Apple approved said app to their store which is where I got it from, when their whole value proposition for walled-garden storefronts is protecting users, then they have significant accountability.

If the app requests more permissions than necessary for stated goals, and/or intentionally harms users via misrepresentation or misdirection (malware), the app publisher should also be held accountable (by the storefront, legally, etc).

I'm also unclear what angle you are arguing: are you stating that because tools have gotten so complicated that the end user may not understand how it all works, no one should be considered responsible or held accountable? Or that the tool (currently a non-entity) itself should somehow be held accountable? Or that no one other than the distributor of the tool should be accountable?

BadBadJellyBean 4 hours ago | parent | prev | next [-]

Then that is also on me for using a tool that I can't control. I don't run my LLMs in a way where they can just do things without me signing off on them. It's not nearly as fast as just letting them do their thing, but I have kept them from doing stupid things so many times.

Giving up control is a decision. The consequences of this decision are mine to carry. I can do my best to keep autonomous LLMs contained and safe, but if I am the one who deploys them, then I am the one to blame if they fail.

That's why I don't do that.

locknitpicker 3 hours ago | parent [-]

> Then that is also on me for using a tool that I can't control.

That's a core trait of LLMs.

Even the AI companies developing frontier models felt the need to put together whole test suites purposely designed to evaluate a model's propensity to try to subvert the user's intentions.

https://www.anthropic.com/research/shade-arena-sabotage-moni...

> Giving up control is a decision.

No, it is definitely not. Only recently did frontier models start to resort to generating ad-hoc scripts as makeshift tools. They even generate scripts to apply changes to source files.

BadBadJellyBean 3 hours ago | parent [-]

You seem to misunderstand me. An LLM can only spit out text. It is the tooling I use that allows it to write scripts and call them. In my tooling, it waits for me to accept changes or approve calls to scripts or other tools that might change something. I can make that deterministic. I know that it will stop and ask because it has no choice. If I want to be safer, I give it no tools at all.
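A minimal sketch of what such a gate can look like (the registry and tool names are made up for illustration). The model proposes a call as text; nothing runs until a human types "y", and there is no code path around the prompt:

  def gated_call(tool_name: str, args: dict, registry: dict):
      """Human-in-the-loop gate: show the proposed tool call and run it
      only after explicit operator approval."""
      print(f"LLM proposes: {tool_name}({args})")
      if input("Allow? [y/N] ").strip().lower() != "y":
          return "call denied by operator"
      return registry[tool_name](**args)

  # Only tools you explicitly hand over exist at all.
  tools = {"read_file": lambda path: open(path).read()}

The stop-and-ask step is deterministic because it lives in the tooling, not in the model.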

I can also just choose not to use an LLM. It is my choice to use them, so it is my duty to keep myself safe. If I can't control that, I'd be stupid to use them.

My take is that I probably can use LLMs safely when I don't let them run autonomously. There is a slight chance that the LLM will generate a string that will trigger a bug in an MCP server that lets the LLM do what it wants. That is the risk I am going to take, and I will take the blame if it goes wrong.

dpoloncsak 4 hours ago | parent | prev [-]

Isn't the next sentence there literally "Only that it's non-deterministic"?

mystraline 3 hours ago | parent | prev [-]

> LLMs are a tool like every other. Only that it's non-deterministic.

If you stay away from the corporate SaaS token vendors and run your own, you will find LLMs are deterministic, purely based on the exact input. And as long as the context window's tokens are the same, you will get the same output.

The corporate vendors do tricks: they swap models and pull in context from other chats. It makes one-shot questions annoying because unrelated chats will creep into your context window.

BadBadJellyBean 3 hours ago | parent [-]

Yes and no. You might get the same output if you turn down the temperature, but you will probably not know the output without running it first. It's a bit like a hash function: if I give the same input I get the same hash, but I don't know which input will lead to which hash without running the function.
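The temperature point in miniature: a sketch of next-token sampling over raw logits (the numbers are made up). Greedy decoding is a pure function of its input; sampling is only repeatable if the RNG seed is pinned too.

  import math, random

  def sample_next_token(logits, temperature=1.0, rng=None):
      """Temperature ~ 0 degenerates to argmax: fully deterministic.
      Otherwise sample the softmax, repeatable only with a fixed seed."""
      if temperature < 1e-6:
          return max(range(len(logits)), key=lambda i: logits[i])
      rng = rng or random.Random()
      scaled = [x / temperature for x in logits]
      m = max(scaled)  # subtract the max for numerical stability
      weights = [math.exp(x - m) for x in scaled]
      return rng.choices(range(len(logits)), weights=weights)[0]

  logits = [2.0, 1.0, 0.5]  # made-up scores for a 3-token vocabulary
  print(sample_next_token(logits, temperature=0.0))        # always token 0
  print(sample_next_token(logits, rng=random.Random(42)))  # repeatable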

Also, most LLMs are not run in a simple "write a prompt, read the output" fashion. Usually you have MCP servers or other tools connected. These will change the input, and that will probably lead to different outputs. Otherwise it wouldn't be a problem at all.