▲ famouswaffles 5 hours ago
Where do you get the idea that you have a good sense of the introspective capabilities of frontier models? Certainly not from interpretability research. Ironically, the people who make these sorts of comments understand LLMs the least.
▲ embedding-shape 3 hours ago | parent
> Certainly not from interpretability research

What research shows that you can ask ChatGPT to explain its reasoning and that the explanation is guaranteed to reflect its actual motivation? I've seen plenty of experiments probing various things inside the black box while inference is happening, but never any research showing that tokens can explain why other tokens are there. I'd be very happy to be educated here if you have any resources at hand; I won't claim to know everything.