▲ famouswaffles 5 hours ago
Where do you get the idea that you have a good sense of the introspective capabilities of frontier models? Certainly not from interpretability research. Ironically, the people who make these sorts of comments understand LLMs the least.
▲ embedding-shape 3 hours ago | parent
> Certainly not from interpretability research

What research shows that you can ask ChatGPT to explain its reasoning and that the explanation is guaranteed to reflect its actual motivation? I've seen plenty of experiments probing various things inside the black box while inference is happening, but never any research showing that tokens can explain why other tokens are there. I'd be very happy to be educated here if you have any resources at hand; I won't claim to know everything.