Remix.run Logo
Cynddl 6 hours ago

> “These are not isolated incidents. They are symptoms of a systemic problem: the benchmarks we rely on to measure AI capability are themselves vulnerable to the very capabilities they claim to measure.”

As a researcher in the same field, hard to trust other researchers who put out webpages that appear to be entirely AI-generated. I appreciate it takes time to write a blog post after doing a paper, but sometimes I'd prefer just a link to the paper.