Remix.run Logo
thepasch 9 hours ago

Yup, those are among the papers I was referring to in the opening parts of the piece! The difference between them and my small tests is that they all explicitly prompt the model to introspect, while I specifically didn't and kept the context perfectly "normal conversation"-shaped (minus the complete corruption of the model's outputs, of course).

famouswaffles 8 hours ago | parent [-]

There's another one that intrigued me greatly when i read about it years back. This was back when GPT-3 was state of the art. I had a lot of trouble finding it again but i did!

It's not an exact fit because the output is that of a tool rather than the model itself (though i don't think much would change if we had the model perform the arithmetic itself but altered answers similarly), but it was the first time I began to realize that just like the brain, these models have an expectation of reality that they work around. They don't necessarily 'trust' an output if it diverges significantly from this 'reality'. And that this disregard may be silent indeed (no reasoning or chain of thought here).

GPT-3 will ignore tools when it disagrees with them - https://vgel.me/posts/tools-not-needed/