fulafel 2 days ago

So GDPval is OpenAI's own benchmark. PDF link: https://arxiv.org/pdf/2510.04374