▲ | zamalek a day ago | |||||||||||||||||||||||||||||||||||||
LLMs lie about their reasoning: https://www.anthropic.com/research/tracing-thoughts-language... | ||||||||||||||||||||||||||||||||||||||
▲ | Lerc a day ago | parent | next [-] | |||||||||||||||||||||||||||||||||||||
It's worth mentioning that this is a different scenario to the reasoning models though. Reasoning models use the generated text to arrive at an answer, in a sense, it cannot lie until it gives the answer. That answer may express a reasoning that was not the reasoning used. That bit is the lie. You can actually take this further when you consider deepseek style reinforcement. While the reasoning text may appear to show the thought process used in readable language, the model is trained to say whatever it needs to generate the right answer, that may or may not be what that text means to an outside observer. In theory it could encode extra information in word lengths or even evolve it's own Turing complete gobbledegook. There are many degrees of likelihood in the options available. Perhaps one more likely is some rarely used word has some poorly trained side-effect that gives the context a kick in the right direction right before it was going to take a fork going the wrong way. Kind of a SolidGoldMagikarp spanking. | ||||||||||||||||||||||||||||||||||||||
▲ | unoti a day ago | parent | prev | next [-] | |||||||||||||||||||||||||||||||||||||
> LLMs lie about their reasoning People do this all the time too! Cat scans show that people make up their minds quickly, showing activations in one part of the brain that makes snap judgements, and then a fraction of a second later the part that shows rational reasoning begins to activate. People in sales have long known this, wanting to give people emotional reasons to make the right decision, while also giving them the rational data needed to support it. [1] I remember seeing this illustrated ourselves when our team of 8 or so people was making a big ERP purchasing decision between Oracle ERP and Peoplesoft long ago. We had divided what our application needed to do into over 400 feature areas, and in each feature area had developed a very structured set of evaluation criteria for each area. Then we put weights on each of those to express how important it was to us. We had a big spreadsheet to rank the things. But along the way of the 9 month sales process, we really enjoyed working with the Oracle sales team a lot better. We felt like we'd be able to work with them better. In the end, we ran all the numbers, and Peoplesoft came out on top. And we sat there and soberly looked each other in the eyes, and said "We're going with Oracle." (Actually I remember one lady on the team when asked for her vote said, "It's gotta be the big O.") Salespeople know that ultimately it's a gut decision, even if the people buying things don't realize that themselves. | ||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||
▲ | a day ago | parent | prev [-] | |||||||||||||||||||||||||||||||||||||
[deleted] |