themafia 10 hours ago

This study is based almost entirely on pre-existing "vignettes." In other words, the model did well on tests that are already known and have existed for years, which is precisely what you should expect.

It provides no information on real-world outcomes or expectations of performance in such a setting. A simple question might be: "how accurate are patient electronic health records typically?"

Finally, if the Internet somehow goes down at my hospital, the Doctor can still think, while LLM services cannot. If the power goes out at the hospital, the Doctor can still operate, while even local LLMs cannot.

You're going to need to improve the power efficiency of these models by at least two orders of magnitude before they're generally useful replacements for anything. As it is now they're a very expensive, inefficient and fragile toy.

krisoft 7 hours ago | parent [-]

> This study is based almost entirely on pre-existing "vignettes."

This is basically the only ethical way to approach the topic. First you verify performance on “vignettes,” as you say. Then, if the performance appears satisfactory, you can continue towards larger tests and more raw sensor modalities, checking both that the AI statistically agrees with the doctors and that, when they disagree, the AI's actions fail benignly. These phases take a lot of time and careful analysis. Only after that can we carefully design experiments where the AI works together with doctors, for example an experiment where the AI offers suggestions for next steps to a doctor. These tests need to be constructed with great care by teams who are very familiar with medical ethics, statistics and the pitfalls of human decision making. And only if the results are still positive can we move towards experiments where humans supervise the AI less and the AI is more in the driving seat.
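The staged gating described above can be sketched as a toy program (the phase names and pass/fail criteria here are purely illustrative, not from any real clinical-trial protocol): each phase runs only if every earlier phase passed, and a single failure halts the whole program.

```python
# Toy sketch of phased validation gating. Phase names are hypothetical.
PHASES = [
    "vignette_benchmarks",          # known written cases
    "larger_tests_raw_modalities",  # bigger datasets, raw sensor inputs
    "disagreement_analysis",        # do disagreements with doctors fail benignly?
    "ai_suggests_doctor_decides",   # AI offers next-step suggestions
    "reduced_human_supervision",    # AI more in the driving seat
]

def run_validation(results):
    """Return the phases actually reached, stopping at the first failure.

    `results` maps phase name -> bool (did that phase pass?).
    """
    reached = []
    for phase in PHASES:
        reached.append(phase)
        if not results.get(phase, False):
            break  # ethics requires stopping before the next phase
    return reached
```

For example, a failure at the disagreement-analysis stage means the suggestion-to-doctors experiments are never run: `run_validation({"vignette_benchmarks": True, "larger_tests_raw_modalities": True, "disagreement_analysis": False})` stops after three phases.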

Basically, validating this ethically will take decades. So we can't really fault the researchers for having taken only the first tentative step along that long journey.

> if the Internet somehow goes down at my hospital, the Doctor can still think, while LLM services cannot

Privacy, resiliency and scalability are all best served with local LLMs here.

> If the power goes out at the hospital, the Doctor can still operate, while even local LLMs cannot.

Generators would be the obvious answer there. If we can make machines that outperform human doctors in real-world conditions, providing generator-backed UPS power for those machines will be a no-brainer.

> You're going to need to improve the power efficiency of these models by at least two orders of magnitude before they're generally useful replacements of anything.

Why? Do you have numbers here, or just feels?