| ▲ | camgunz 2 hours ago | |||||||
I found over 500 examples that fit your criteria. Embarrassing that you were arguing in bad faith this whole time. | ||||||||
| ▲ | simianwords 2 hours ago | |||||||
They all use the search tool, no? Please correct me if I'm wrong. My criterion was using ChatGPT, which explicitly allows it. See https://arxiv.org/html/2511.13029v1 if you don't believe me.

BTW, this was your original point:

> Anyway, it's trivial to get pretty much any model to make things up. Don't we all know this? That's why I was surprised by your position; if we know anything about these things it's that they make things up.

And look at how much effort it has taken you:

1. You used the wrong model for the horns example.
2. The game one also didn't work.
3. Now you are searching for examples in literal benchmarks, and you still can't find any.

How is this trivial in any interpretation of the word? I think it would be perfectly reasonable to agree that it is not at all trivial to find counterexamples for my challenge. | ||||||||