| ▲ | simianwords 2 hours ago | |
They all use the search tool, no? Please correct me if I'm wrong. My criterion was using ChatGPT, which explicitly allows it. See https://arxiv.org/html/2511.13029v1 if you don't believe me.

BTW this was your original point:

>Anyway, it's trivial to get pretty much any model to make things up. Don't we all know this? That's why I was surprised by your position; if we know anything about these things it's that they make things up.

And look at how much effort you have had to put in:

1. you used the wrong model for the horns example
2. the game one also didn't work
3. now you are searching for examples in literal benchmarks and you are still not able to find any

How is this trivial in any interpretation of the word? I think it would be perfectly reasonable to agree that it is not at all trivial to find counterexamples for my challenge. | ||
| ▲ | camgunz an hour ago | |
I've got about 20 minutes in this; mostly I've been reading wallstreetbets at the Shake Shack bar in the Boston airport. I'm happy to post this over and over again until you engage w/ it:

> I found over 500 examples that fit your criteria. | ||