nialv7 a day ago

> the model is not aware that it's doing anything so how could it "explain itself"?

I remember a paper showing that LLMs are, to an extent, aware of their own capabilities, i.e. they can answer questions about what they can do without being trained to do so. And after learning new capabilities, their answers change to reflect that.

I will try to find that paper.

nialv7 a day ago

Found it, here: https://martins1612.github.io/selfaware_paper_betley.pdf