shevy-java 5 hours ago
So the best model found about 50%. I think that is not bad, probably better than most humans. But what about the remaining 50%? Why were some found and others not?

> Claude Opus 4.6 found it… and persuaded itself there is nothing to worry about

> Even the best model in our benchmark got fooled by this task.

That is quite strange, because it seems almost as if a human is required to make the AI tools understand this.