OpenAI released this a couple of months ago:
https://openai.com/index/healthbench/
Give it a year and that benchmark will probably be maxed out too.