Remix clone Hacker News

	▲	shevy-java 5 hours ago
		So the best one found about 50%. I think that is not bad, probably better than most humans. But what about the remaining 50%? Why were some found and others not?

		> Claude Opus 4.6 found it… and persuaded itself there is nothing to worry about

		> Even the best model in our benchmark got fooled by this task.

		That is quite strange, because it seems almost as if a human is required to make the AI tools understand this.