meowface 5 days ago

It is true of everything it outputs, but for certain questions we know ahead of time it will always confabulate (unless it's smart enough, or instructed, to say "I don't know"). Like "how many parameters do you have?" or "how much data were you trained on?" This is one of those cases.

wongarsu 5 days ago | parent | next [-]

Yeah, but I wouldn't count "Which prompt makes you more truthful and logical" amongst those.

The questions it will always confabulate are those that are unknowable from the training data. For example, even if I give the model a sense of "identity" by telling it in the system prompt "You are GPT6, a model by OpenAI", the training data will predate any public knowledge of GPT6 and thus won't include any information about this model's number of parameters.

On the other hand, "How do I make you more truthful" can reasonably be treated as equivalent to "How do I make similar LLMs truthful", and there is plenty of discussion and experience on that in forum threads, blog posts, and scientific articles, all of which end up in the training data. That doesn't guarantee good responses, and the responses won't be specific to this exact model, but the LLM has a fair chance of one-shotting something better than my one-shot.
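Roughly this two-step workflow, as a sketch (assuming the OpenAI Python SDK and a placeholder model name, neither of which comes from this thread): let the model draft the "truthfulness" system prompt, then reuse that draft instead of my own attempt.

    from openai import OpenAI

    client = OpenAI()

    # Step 1: let the model one-shot a "truthfulness" system prompt for me.
    draft = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name, not from the thread
        messages=[{
            "role": "user",
            "content": "Write a system prompt that makes an LLM like you more "
                       "truthful and more willing to say 'I don't know'.",
        }],
    )
    suggested_prompt = draft.choices[0].message.content

    # Step 2: use the model-suggested prompt instead of my own one-shot attempt.
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": suggested_prompt},
            {"role": "user", "content": "How many parameters do you have?"},
        ],
    )
    print(answer.choices[0].message.content)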

ElFitz 5 days ago | parent | prev [-]

Even when instructed to say "I don’t know", it is just as likely to make up an answer anyway, or to say it "doesn’t know" when the information actually is present somewhere in its weights.

codeflo 5 days ago | parent [-]

That's because the architecture isn't built for it to know what it knows. As someone put it, LLMs always hallucinate, but for in-distribution data they mostly hallucinate correctly.

bluefirebrand 5 days ago | parent | next [-]

My vibe is that it mostly hallucinates incorrectly.

I really do wonder what the difference is. Am I using it wrong? Am I just unlucky? Do other people just have lower standards?

I really don't know. I'm getting very frustrated though because I feel like I'm missing something.

Wojtkie 5 days ago | parent [-]

It's highly task specific.

I've been refactoring a ton of my Pandas code into Polars and using ChatGPT on the side as a documentation search and debugging tool.

It keeps hallucinating things about the docs, method names, and method arguments, even after I changed my prompt to be explicit that I'm only using Polars.

I've noticed similar behavior with other libraries that aren't the major ones. I can't imagine how much it gets wrong with a less popular language.
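For what it's worth, the failures are often in exactly that kind of surface detail. A rough sketch of the sort of translation involved (made-up frame and column names; Polars >= 0.19 assumed): newer Polars renamed groupby to group_by and pushes you toward the pl.col expression API, and those are precisely the spellings and arguments a model trained largely on older Pandas-style code tends to get wrong.

    import pandas as pd
    import polars as pl

    pdf = pd.DataFrame({"city": ["a", "a", "b"], "temp": [1.0, 2.0, 3.0]})

    # Pandas: classic groupby + column selection.
    pd_result = pdf.groupby("city", as_index=False)["temp"].mean()

    # Polars: note group_by (not groupby) and the pl.col expression API --
    # exactly the method/argument details that tend to get hallucinated.
    pldf = pl.from_pandas(pdf)
    pl_result = pldf.group_by("city").agg(pl.col("temp").mean())

    print(pd_result)
    print(pl_result)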
