Remix.run Logo
GodelNumbering an hour ago

More interesting part probably worth highlighting: The SAME model won't always return the same output when prompted with the same fact check.

You ask a human 1000 times a fact check question, they say the same answer 1000 times. You ask an LLM the same question a 1000 times, your results could vary significantly.

Humans work based on the Metamemory (knowing what they know), while LLMs are picking from statistical probability.

logged4upvoting 23 minutes ago | parent [-]

That is not true, over an extended task that you cannot keep complete in memory humans do not behave with 100% consistency.

I have labeled datasets with a human team and shown the same task to the same user on a different day, and they answered differently. Of course, they are usually consistent with themselves most of the time but not always.