ehnto 5 days ago

That is true of everything an LLM outputs, which is why the human in the loop matters. The zeitgeist seems to have moved on from this idea though.

meowface 5 days ago | parent | next [-]

It is true of everything it outputs, but for certain questions we know ahead of time it will always confabulate (unless it's smart enough, or instructed, to say "I don't know"). Like "how many parameters do you have?" or "how much data were you trained on?" This is one of those cases.

wongarsu 5 days ago | parent | next [-]

Yeah, but I wouldn't count "Which prompt makes you more truthful and logical" amongst those.

The questions it will always confabulate are those that are unknowable from the training data. For example even if I give the model a sense of "identity" by telling it in the system prompt "You are GPT6, a model by OpenAI" the training data will predate any public knowledge of GPT6 and thus not include any information about the number of parameters of this model.

On the other hand "How do I make you more truthful" can reasonably be assumed to be equivalent to "How do I make similar LLMs truthful", and there is lots of discussion and experience on that available in forum discussions, blog posts and scientific articles, all available in the training data. That doesn't guarantee good responses and the responses won't be specific to this exact model, but the LLM has a fair chance to one-shot something that's better than my one-shot.
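
As a rough sketch of what I mean (this assumes the OpenAI Python SDK and a placeholder model name; the exact API isn't the point), you could ask the model to one-shot a truthfulness prompt and then feed that back in as the system prompt:

    # Sketch only: assumes the OpenAI Python SDK (openai>=1.0) and a
    # placeholder model name. The point is the two-step pattern, not the API.
    from openai import OpenAI

    client = OpenAI()

    # Step 1: let the model one-shot a better truthfulness prompt.
    draft_prompt = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Write a system prompt that makes an LLM like you "
                       "more truthful and more willing to say 'I don't know'.",
        }],
    ).choices[0].message.content

    # Step 2: use whatever it produced as the system prompt for real questions.
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": draft_prompt},
            {"role": "user", "content": "How many moons does Neptune have?"},
        ],
    ).choices[0].message.content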

ElFitz 5 days ago | parent | prev [-]

Even when instructed to say "I don’t know" it is just as likely to make up an answer instead, or say it "doesn’t know" when the data is actually present somewhere in its weights.

codeflo 5 days ago | parent [-]

That's because the architecture isn't built for it to know what it knows. As someone put it, LLMs always hallucinate, but for in-distribution data they mostly hallucinate correctly.

bluefirebrand 5 days ago | parent | next [-]

My vibe is that it mostly hallucinates incorrectly.

I really do wonder what the difference is. Am I using it wrong? Am I just unlucky? Do other people just have lower standards?

I really don't know. I'm getting very frustrated though because I feel like I'm missing something.

Wojtkie 5 days ago | parent [-]

It's highly task specific.

I've been refactoring a ton of my Pandas code into Polars and using ChatGPT on the side as a documentation search and debugging tool.

It keeps hallucinating things about the docs, methods, and method arguments, even after changing my prompt to be explicit that it should only use Polars.

I've noticed similar behavior with other libraries that aren't the major ones. I can't imagine how much it gets wrong with a less popular language.
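
To make it concrete, here's a toy version of the kind of refactor I mean (the data is made up; note that recent Polars releases renamed groupby to group_by, which is exactly the kind of detail it keeps getting wrong):

    # Toy example of the refactor; the DataFrame contents are made up.
    import pandas as pd
    import polars as pl

    data = {"city": ["NYC", "NYC", "LA"], "temp": [70, 74, 80]}

    # Pandas version
    pdf = pd.DataFrame(data)
    pandas_result = pdf.groupby("city", as_index=False)["temp"].mean()

    # Polars version (recent releases use group_by; older ones had groupby,
    # and ChatGPT happily mixes the two up along with the argument names)
    plf = pl.DataFrame(data)
    polars_result = plf.group_by("city").agg(pl.col("temp").mean())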

lotyrin 5 days ago | parent | prev [-]

The amount of projection and optimism people are willing to engage in is incredible.

The fallout on reddit in the wake of the push for people to adopt 5, for instance, is incredible: the vibe isn't as nice and it makes it harder to use it as a therapist or girlfriend or whatever. And from what I've heard of internal sentiment at OpenAI about their concerns over usage patterns, that was a VERY intentional effect.

Many people trust the quality of the output way too much, and it seems addictive to people (some kind of dopamine hit from deferring the need to think for yourself or something). If I suggest things in my professional context, like not wholesale putting it in charge of communications with customers without evaluations, audits, or humans in the loop, it's as if I told them they can't go for their smoke break and their baby is ugly.

And that's not to go into things like "awakened" AI or the AI "enlightenment" cults that are forming.

leodiceaa 5 days ago | parent [-]

> use it as a therapist or girlfriend or whatever

> it seems addictive to people (some kind of dopamine hit from deferring the need to think for yourself or something)

I think this whole thing has more to do with validation. Rigorous reasoning is hard. People found a validation machine and it released them from the need to be rigorous.

These people are not "having therapy" or "developing relationships"; they are fascinated by a validation engine. Hence the repositories full of woo woo physics as well, and why so many people want to believe there's something more there.

The use of LLMs at work, in government, policing, coding, etc. is so concerning because of that. They will validate whatever poor reasoning people throw at them.

pjc50 5 days ago | parent | next [-]

We've automated a yes-man. That's why it's going to make a trillion dollars selling to corporate boards.

kibwen 5 days ago | parent [-]

How long until shareholders elect to replace those useless corporate boards and C-level executives with an LLM? I can think of multiple megacorporations that would be improved by this process, to say nothing of the hundreds of millions in cost savings.

aspenmayer 5 days ago | parent | prev [-]

> These people are not "having therapy" or "developing relationships"; they are fascinated by a validation engine. Hence the repositories full of woo woo physics as well, and why so many people want to believe there's something more there.

> The use of LLMs at work, in government, policing, coding, etc. is so concerning because of that. They will validate whatever poor reasoning people throw at them.

These machines are too useful not to exist, so we had to invent them.

https://en.wikipedia.org/wiki/The_Unaccountability_Machine

> The Unaccountability Machine (2024) is a business book by Dan Davies, an investment bank analyst and author, who also writes for The New Yorker. It argues that responsibility for decision making has become diffused after World War II and represents a flaw in society.

> The book explores industrial scale decision making in markets, institutions and governments, a situation where the system serves itself by following process instead of logic. He argues that unexpected consequences, unwanted outcomes or failures emerge from "responsibility voids" that are built into underlying systems. These voids are especially visible in big complex organizations.

> Davies introduces the term “accountability sinks”, which remove the ownership or responsibility for decisions made. The sink obscures or deflects responsibility, and contributes towards a set of outcomes that appear to have been generated by a black box. Whether a rule book, best practices, or computer system, these accountability sinks "scramble feedback" and make it difficult to identify the source of mistakes and rectify them. An accountability sink breaks the links between decision makers and individuals, thus preventing feedback from being shared when the system malfunctions. The end result, he argues, is protocol politics, where there is no head, or accountability. Decision makers can avoid the blame for their institutional actions, while the ordinary customer, citizen or employee faces the consequences of these managers' poor decision making.

Wojtkie 5 days ago | parent [-]

I've been thinking about "accountability sinks" a lot lately and how LLMs further the issue. I'd never heard of this book or the author before this comment. I'll definitely have to read it!