sejje 4 days ago:
Humans are used to ignoring things, while LLMs are explicitly trained to pay attention to the entire text. Humans who haven't been exposed to trick problems or careful wording probably have a hard time too; they'll be less confident about ignoring things. But the LLM should have seen plenty of trick problems as well. For a human, the irrelevant detail just doesn't parse as part of the problem. Humans also have more options and room to think; the LLM had to respond.

I'd also like to see how responses were grouped: does it ever refuse, and how do refusals get classed? Were they only counting math failures as wrong answers? There's room for subjectivity there.
Y_Y 4 days ago:
> LLMs are explicitly trained to pay attention to the entire text

I'd respectfully disagree on this point. The magic of attention in transformers is that it is selective: ideally, significant weight goes only to the tokens relevant to the query.
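To make that concrete, here's a minimal sketch of scaled dot-product attention in Python/NumPy. The toy query/key setup is hypothetical (not from any particular model); it just shows how the softmax can put nearly all of its weight on the one key that matches the query, effectively ignoring the rest.

```python
# Minimal sketch of scaled dot-product attention weights (toy example,
# not any specific model's implementation).
import numpy as np

def attention_weights(query, keys):
    """Softmax attention weights of a single query over a set of keys."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)   # similarity of the query to each key
    scores -= scores.max()               # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

rng = np.random.default_rng(0)
d = 8
query = rng.normal(size=d)
keys = rng.normal(size=(5, d))
keys[2] = 3.0 * query                    # hypothetical "relevant" token: its key aligns with the query

w = attention_weights(query, keys)
print(np.round(w, 3))                    # almost all mass lands on index 2; the other tokens get near-zero weight
```

In that sense the model isn't forced to "pay attention to the entire text"; the irrelevant tokens simply receive vanishing weight.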