suzzer99 2 hours ago

> How is it different from a person believing whatever they read on the Internet?

The problem is LLMs have no capacity for shame.

My Dad got taken in by a Target gift card scam. He felt so terrible, he almost didn't even tell me about it. He may get scammed again, but not by anything remotely like that.

To LLMs, all mistakes just get washed together into the same bucket. They don't spend days feeling depressed and stupid over getting scammed. There's no giant blinking red light that says, "Never let this happen again!"

basilikum 33 minutes ago | parent | next [-]

I don't think shame is a helpful human emotion here in general. It prevents people from reaching out for help and makes many crimes much harder to tackle because the victims do not report it.

Also many victims fall for the exact same scam over and over again; to the point that lists of scam victims are sold and used as leads.

suzzer99 14 minutes ago | parent | next [-]

If a junior developer makes a dumb mistake that causes a mini-disaster, their brain makes it a priority to never make that same mistake again. They physically feel anxiety the next time they get into a similar situation, which serves as a very effective reminder not to do the same dumb thing.

LLMs make the same mistakes over and over. And even if/when they have the capacity to learn on the fly, they have no capacity to prioritize. It's all just a big haze of tokens.

That's my overall point. Humans have mistakes and then they have MISTAKES. And a whole continuum in between. LLMs just have a mish-mash of training data. I think before LLMs are more than just fancy parrots, we need to find an analogue to pain, shame, joy, fear, and the myriad other emotions that factor into human decision-making.

idiotsecant 9 minutes ago | parent | prev [-]

Shame is a wildly useful human emotion. Shame of letting down the tribal unit formed basically all of civilization. Shame is good.

keeda 2 hours ago | parent | prev | next [-]

> The problem is LLMs have no capacity for shame.

I know what you mean but I can't help but be cheeky: https://www.fastcompany.com/91383271/googles-chatbot-apologi...

Jokes aside, shame does not change the underlying point though. Despite feeling ashamed for being tricked, as you point out people can still get scammed again by different tricks. I think your point is more about learning from mistakes than shame.

Which still does not change the underlying point, I suppose. Offhand I cannot think of anything that would fix this problem for LLMs that wouldn't also fix it for humans, like relying on trusted sources.

amarant an hour ago | parent | prev | next [-]

>The problem is LLMs have no capacity for shame

You seem to be implying that people do, and I'd like to contest that point *gestures wildly at everything*

ChuckMcM 2 hours ago | parent | prev [-]

This is a great point. I've added it to my list of things to bring up when talking about the limitations of LLMs.

Terr_ 2 hours ago | parent | next [-]

IMO we must take it a step further: In this context "the LLM" we're all automatically thinking-of doesn't exist, it is a fictional character we humans "see" inside a story being acted-out or read to us. (In contrast, the real-world LLM is an algorithm in a basement constantly taking documents and making them slightly longer based on trends detected in all documents.)

Therefore "the LLM can't feel shame" is true in the same way that "CyberDracula thirsts for the fluids of the innocent." Good news: Vampirism doesn't exist! Bad news: Curing Dracula is impossible, because the patient doesn't exist either. Go looking for the target mind we wanted to make more-intelligent or kinder, and it turns out to be a trick of the light.

The best we can do is change the generator process, so that the next story instead contains a different new character also named after Dracula (or a brand of LLM) that sounds smarter or is narrated with kinder actions.

Apocryphon 2 hours ago | parent | prev [-]

Perhaps the end state is going to be from the last Hitchhiker's Guide to the Galaxy book, Mostly Harmless:

> Anything that thinks logically can be fooled by something else that thinks at least as logically as it does. The easiest way to fool a completely logical robot is to feed it with the same stimulus sequence over and over again so it gets locked in a loop. This was best demonstrated by the famous Herring Sandwich experiments conducted millennia ago at MISPWOSO (the MaxiMegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious).

> A robot was programmed to believe that it liked herring sandwiches. This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Whereupon the robot thought to itself, Ah! A herring sandwich! I like herring sandwiches.

> It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot. Whereupon the robot thought to itself, Ah! A herring sandwich...etc., and repeated the same action over and over again. The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.

> The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticised as being extremely stupid. They checked their figures and realised that what they had actually discovered was “boredom”, or rather, the practical function of boredom. In a fever of excitement they then went on to discover other emotions, like “irritability”, “depression”, “reluctance”, “ickiness” and so on. The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as “relief”, “joy”, “friskiness”, “appetite”, “satisfaction”, and most important of all, the desire for “happiness”. This was the biggest breakthrough of all.

> Vast wodges of complex computer code governing robot behaviour in all possible contingencies could be replaced very simply. All that robots needed was the capacity to be either bored or happy, and a few conditions that needed to be satisfied in order to bring those states about. They would then work the rest out for themselves.

ChuckMcM 2 hours ago | parent | next [-]

I love that book. That said, the point is more subtle than that. Current LLM attention models are limited in their feedback. Adding a form of 'shame' feedback (result is technically correct but morally bad or some such) would help here, but I doubt the folks building these things would choose to do so.

jerf 41 minutes ago | parent [-]

From a certain and quite valid point of view, they have no mechanism for feedback at all. Every time you start a conversation you're starting in the same state, modulo the random numbers. At most you have this very, very vague loop in that the conversations for LLM 1.0 will be fed into the training set for LLM 2.0.

Even "shame" would only apply to the current session and disappear in the next one, or eventually be compacted away.

(Although honorable mention to Gemini's meltdown: https://x.com/AISafetyMemes/status/1953397827662414022 )

suzzer99 2 minutes ago | parent [-]

According to ChatGPT, researchers are working on models that remember personal directives across sessions, i.e., an actual personal assistant that gets to know you and your proclivities. So it's definitely on their radar. No idea how far away they are.

amarant an hour ago | parent | prev [-]

Damn I had forgotten about this section of the book to the point that even reading it, I only recognised the style as typical Adams.

Guess that means I'm overdue for a re-read! Jaay!