Is your premise here that LLMs have a unique or enhanced insight into how LLMs work best?

▲ crustycoder 3 days ago | parent | next [-]

I wouldn't go that far, but the only way I've found so far of getting reasonable insight into why an LLM chose to do something is to ask it.

▲ alexwebb2 3 days ago | parent | prev [-]

Not OP but I’d back that assertion.

When the model that’s interpreting it is the same model that’s going to be executing it, they share the same latent space state at the outset.

So this is essentially asking whether models are able to answer questions about context they’re given, and of course the answer is yes.

▲ didgeoridoo 3 days ago | parent [-]

There is no evidence of this. Evals are quite different from "self-evals". The only robust way of determining if LLM instructions are "good" is to run them through the intended model lots of times and see if you consistently get the result you want. Asking the model if the instructions are good shows a very deep misunderstanding of how LLMs work.
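As a rough illustration of "run them through the intended model lots of times", here is a minimal sketch of a repeated-run check. The names `pass_rate` and `make_stub_model` are hypothetical, and the stub only stands in for a real sampled model call; nothing here is a real LLM client API.

```python
import random
from typing import Callable


def pass_rate(model: Callable[[str], str], instructions: str,
              check: Callable[[str], bool], runs: int = 50) -> float:
    """Run the same instructions `runs` times; return the fraction of outputs that pass `check`."""
    passed = sum(1 for _ in range(runs) if check(model(instructions)))
    return passed / runs


def make_stub_model(seed: int = 0, failure_rate: float = 0.2) -> Callable[[str], str]:
    """Hypothetical stand-in for a sampled LLM call: returns a bad output some of the time."""
    rng = random.Random(seed)

    def model(instructions: str) -> str:
        # A real implementation would send `instructions` to the intended model here.
        return "???" if rng.random() < failure_rate else '{"status": "ok"}'

    return model


if __name__ == "__main__":
    rate = pass_rate(make_stub_model(),
                     "Reply with JSON containing a status field.",
                     check=lambda out: out.startswith("{"))
    print(f"pass rate: {rate:.0%}")
```

The point of the design is that the verdict comes from observed outputs over many samples, not from the model's own opinion of the instructions.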

▲ alexwebb2 2 days ago | parent | next [-]

You're misunderstanding my assertion.

When you give prompt P to model M, when your goal is for the model to actually execute those instructions, the model will be in state S.

When you give the same prompt to the same model, when your goal is for the model to introspect on those instructions, the model is still in state S. It's the exact same input, and therefore the exact same model state as the starting point.

Introspection-mode state only diverges from execution-mode state at the point at which you subsequently give it an introspection command.
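A toy sketch of the narrow claim here, assuming only that the state is a deterministic function of the token prefix. The hash chain below is a made-up stand-in, not a transformer, and it says nothing about whether a model can accurately report on its state; it only shows "same prefix, same state, divergence after the follow-up".

```python
import hashlib


def advance(state: bytes, token: str) -> bytes:
    """Toy state update: fold one token into the running state (deterministic)."""
    return hashlib.sha256(state + token.encode("utf-8")).digest()


def state_after(tokens: list[str]) -> bytes:
    """Return the toy state reached after consuming `tokens` in order."""
    state = b""
    for token in tokens:
        state = advance(state, token)
    return state


prompt_p = "Summarize the attached report in three bullet points".split()

# Same prompt P, regardless of what we intend to do next: same state S.
s_exec = state_after(prompt_p)
s_introspect = state_after(prompt_p)
assert s_exec == s_introspect

# The states only diverge once a follow-up (the introspection command) is appended.
follow_up = "List any ambiguities in the instructions above".split()
assert state_after(prompt_p + follow_up) != s_exec
```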

At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input, and there is overwhelming evidence that frontier models do this very well, and have for some time.

Asking the model, while it's in state S, to introspect and surface any points of confusion or ambiguities it's experiencing about what it's being asked to do, is an extremely valuable part of the prompt engineering toolkit.

I didn't, and don't, assert that "asking the model if the instructions are good" is a replacement for evals – that's a strawman argument you seem to be constructing on your own and misattributing to me.

▲ mpalmer 2 days ago | parent | next [-]

    At that point, asking the model to e.g. note any ambiguities about the task at hand is exactly equivalent to asking it to evaluate any input

This point is load-bearing for your position, and it is completely wrong.

Prompt P at state S leads to a new state SP'. The "common jumping off point" you describe is effectively useless, because we instantly diverge from it by using different prompts.

And even if it weren't useless for that reason, LLMs don't "query" their "state" in the way that humans reflect on their state of mind.

The idea that hallucinations are somehow less likely because you're asking meta-questions about LLM output is completely without basis.

▲ alexwebb2 2 days ago | parent [-]

> The idea that hallucinations are somehow less likely because you're asking meta-questions about LLM output is completely without basis

Not sure who you're replying to here – this is not a claim I made.

▲ mpalmer 2 days ago | parent [-]

That's fair, but I'm not sure why you chose to address the one part of my comment that isn't responsive to your points.

▲ crustycoder 2 days ago | parent | prev [-]

Nicely put. I haven't seen anyone say that the introspection abilities of LLMs are up to much, but claiming that it's completely impossible to get a glimpse behind the curtain is untrue.

▲ crustycoder 3 days ago | parent | prev [-]

Is that based on your "deep understanding" of how LLMs work or have you actually tried it? If you watch the execution trace of a Skill in action, you can see that it's doing exactly this inspection when the skill runs - how could it possibly work any other way?

Skills are just textual instructions, and LLMs are perfectly capable of spotting inconsistencies, gaps and contradictions in them. Is that sufficient to create a good skill? No, of course not; you need to actually test them. To use an analogy, asking an LLM to critique a skill is like running lint on C code first to pick up egregious problems: running test cases is still vital.
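The lint-then-test analogy could be sketched roughly as below. Everything here is hypothetical: `review_skill`, `stub_critique` and `stub_run_case` are illustrative names, and the stubs stand in for an LLM critique pass and for real skill executions.

```python
from typing import Callable, List


def review_skill(skill_text: str,
                 critique: Callable[[str], List[str]],
                 run_case: Callable[[str, str], bool],
                 cases: List[str]) -> dict:
    """Stage 1: cheap critique pass (the 'lint'). Stage 2: run every test case regardless."""
    issues = critique(skill_text)
    results = [run_case(skill_text, case) for case in cases]
    return {"issues": issues, "passed": sum(results), "total": len(results)}


def stub_critique(skill_text: str) -> List[str]:
    """Hypothetical stand-in for asking a model to critique the skill: flags leftover TODOs."""
    return [f"line {i}: unresolved TODO"
            for i, line in enumerate(skill_text.splitlines(), 1)
            if "TODO" in line]


def stub_run_case(skill_text: str, case: str) -> bool:
    """Hypothetical stand-in for executing the skill on a case and checking the result."""
    return case.lower() in skill_text.lower()


if __name__ == "__main__":
    report = review_skill("Step 1: fetch data\nStep 2: TODO",
                          stub_critique, stub_run_case, ["fetch", "upload"])
    print(report)
```

A clean critique pass does not short-circuit the test cases: the two stages catch different classes of problem.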

▲ hansmayer 3 days ago | parent [-]

> you can see that it's doing exactly this inspection when the skill runs

I mean how do you know what does it exactly do? Because of the text it outputs?

▲ crustycoder 2 days ago | parent [-]

"exactly this inspection" != "what does it exactly do"

▲ hansmayer 2 days ago | parent [-]

Please read your own sentence again, because you literally said the opposite.

▲ crustycoder 2 days ago | parent [-]

I'd tell you to read it again, but you seem to be struggling.

▲ hansmayer 2 days ago | parent [-]

Did I write this: "you can see that it's doing exactly this inspection when the skill runs"? So, yeah, read what you wrote again.