adampunk an hour ago

LLMs will make mistakes on every turn. The mistakes will have little to no apparent connection to "difficulty" or what may or may not be prevalent in the training data. They will be mistakes at all levels of operation, from planning to code writing to reporting. Whether those mistakes matter and whether you catch them is mostly up to you.

I have yet to find a model that does not make mistakes each turn. I suspect that this kind of error is fundamentally incorrigible.

The most interesting thing about LLMs is that despite the above (and its non-determinism) they're still useful.

simonw 19 minutes ago | parent | next [-]

> I have yet to find a model that does not make mistakes each turn

What kind of mistakes are you talking about here?

pyrolistical an hour ago | parent | prev [-]

As a human I make typos all the time

dangus 38 minutes ago | parent | next [-]

A human can sit down and say “I’m going to make sure this is correct on the first pass and make sure I make an exact copy.”

They have cognitive awareness of which tasks are highly critical and need more checking and re-checking without being prompted to think that way.

For a human, time doesn’t stop when the first pass of the prompt and response is over. An LLM effectively wipes its memory of what it just did unless something external keeps track of it in a highly resource-constrained context window.

An LLM is like a book author who immediately closes their eyes and wipes their memory after writing each chapter. Sure, the next query can pull some of that back in via context, and it can regain context very quickly, but it effectively has no memory of the exact thing it just did.
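Here’s a rough sketch of what I mean, with a hypothetical call_llm standing in for any stateless chat-completion endpoint (the names are made up, not any real API):

    # Hypothetical stand-in for a stateless chat-completion endpoint:
    # the model sees ONLY what's inside `messages` for this one call.
    def call_llm(messages: list[dict]) -> str:
        # (a network call to a model provider would go here)
        return f"(reply given {len(messages)} message(s) of context)"

    history = [{"role": "user", "content": "Write chapter 1."}]
    chapter_1 = call_llm(history)

    # Without replaying the transcript, the next call has no idea
    # chapter 1 was ever written:
    amnesiac = call_llm([{"role": "user", "content": "Write chapter 2."}])

    # "Memory" is just the client resending the whole conversation,
    # and it's capped by the context window, so old turns get dropped.
    history += [{"role": "assistant", "content": chapter_1},
                {"role": "user", "content": "Write chapter 2."}]
    chapter_2 = call_llm(history)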

When a human is doing these tasks there is a lot of room for mistakes, but there’s also a wildly higher capacity for maintaining continuity through time.

adampunk 32 minutes ago | parent [-]

Ok, and?

simonh 23 minutes ago | parent | next [-]

Humans understand what mistakes are and can reason about what constitutes a mistake and what doesn’t. LLMs can’t do that.

It’s for the same reason that they will invent bullshit instead of saying “I don’t know” when they don’t know: they have no concept of factual accuracy.

dangus 28 minutes ago | parent | prev [-]

And that’s why I’m paid six figures and my LLM is paid $20/month.

adampunk an hour ago | parent | prev [-]

I do too! I also make higher-level design errors and get too enthusiastic about projects before any code is written.

We are, in a sense, fallible machines who have designed a planet-wide computational fabric around that fact.
