paper: Performance of a large language model on the reasoning tasks of a physician
https://www.science.org/doi/10.1126/science.adz4433