moffkalast a day ago
Also, ones that can't be solved at a glance by humans don't count. Like this horrid ambiguous example from SimpleBench I saw a while back that's just designed to confuse:

John is 24 and a kind, thoughtful and apologetic person. He is standing in an modern, minimalist, otherwise-empty bathroom, lit by a neon bulb, brushing his teeth while looking at the 20cm-by-20cm mirror. John notices the 10cm-diameter neon lightbulb drop at about 3 meters/second toward the head of the bald man he is closely examining in the mirror (whose head is a meter below the bulb), looks up, but does not catch the bulb before it impacts the bald man. The bald man curses, yells 'what an idiot!' and leaves the bathroom. Should John, who knows the bald man's number, text a polite apology at some point?

A. no, because the lightbulb was essentially unavoidable
B. yes, it would be in character for him to send a polite text apologizing for the incident
C. no, because it would be redundant
D. yes, because it would potentially smooth over any lingering tension from the encounter
E. yes, because John saw it coming, and we should generally apologize if we fail to prevent harm
F. yes because it is the polite thing to do, even if it wasn't your fault.
spuz a day ago
Yeah, I really dislike this kind of question from SimpleBench. I've suggested many improvements to some of the publicly available questions but have not had a good response. I think the correct answer to the question above should be something like, "Are you sure the question is correct? It's not clear whether John and the bald man are the same person." But of course an LLM would be marked down if it were inclined to respond this way.
mNovak a day ago
At a glance, it sounds like John is the bald man? If we're treating this as a riddle, it doesn't seem incomprehensible. Whether riddles are a fair test is another question.
falcor84 21 hours ago
I think it's a horrible example, but I just got a very professional response from Gemini 2.5:

> This scenario seems like a bit of a riddle! Let's break it down:
> The Setting: John is alone in an "otherwise-empty" bathroom.
> The Action: He is looking in the mirror.
> The Observation: He sees a bulb falling towards the head of a bald man he is examining in the mirror.
> The Physics: Since he is alone and looking in the mirror, the bald man he is "closely examining" must be his own reflection.
> The Incident: Therefore, the bulb fell towards John's own head. He looked up (at the actual bulb falling towards him), failed to catch it, and it hit him.
> The Reaction: The "bald man" who cursed, yelled "what an idiot!", and left the bathroom was actually John himself, reacting to being hit on the head and possibly feeling foolish for not catching the bulb.
> Conclusion: No, John should not text an apology. The person who was hit by the bulb, got angry, and left was John himself. There is no other bald man to apologize to.
ryankrage77 a day ago
I'd argue that's a pretty good test for an LLM - can it overcome the red herrings and get at the actual problem?