Remix.run Logo
rcxdude 3 hours ago

The news articles on it are going to affect this. I wonder if the original paper is in the base models at all, almost certainly these results were from the article showing up in an Internet search.

Similarly, I wonder what a frontier model would say if just given the paper in isolation and asked to summarise/opine on it. I suspect they would successfully recognize such obivous signs, the failure is when less sophisticated LLMs are just skimming search results and summarising them.