globular-toast 2 days ago

This seems like the kind of thing you'd want from a distro. Would you be happy if your doctor just started giving you new drugs because they're "new technology"? Or would you prefer it to go through rigorous rounds of testing and evaluation to figure out the potential problems?

adastra22 2 days ago | parent [-]

I certainly hope my medical team is using AI tools, as they have been repeatedly demonstrated to be more accurate than doctors.

Only downside is my last psychiatrist dropped me as a patient when he left his practice to start an AI company providing regulatory compliance for, essentially, Dr. ChatGPT.

wobfan a day ago | parent | next [-]

> I certainly hope my medical team is using AI tools, as they have been repeatedly demonstrated to be more accurate than doctors.

AI is not a new tool; transformer-based LLMs are, and that is what this post is about.

The latter are well known to be a LOT less accurate and are still very prone to hallucination. This is just a fact. For your health's sake, I hope no one on your medical team is using the current generation for anything beyond casual questions.

I'm not an opponent, and I don't think banning LLM-generated code commits outright is the right call, but I can understand their stance.

globular-toast a day ago | parent | prev [-]

Honestly, it just sounds like you've been sold on "AI" being a single thing and don't have any idea how any of it works. I don't even know what you're referring to with "more accurate than doctors". Classifying scans or something? Do you realise how different that is from a generative LLM writing code? Scan classification may well have been shown to be more accurate, but generative LLMs have never been shown to be "better" than humans; in fact, it's easy to demonstrate that they are much, much worse in many ways.

adastra22 a day ago | parent [-]

LLMs perform better than doctors in a randomized trial:

https://jamanetwork.com/journals/jamanetworkopen/fullarticle...

And here: https://arxiv.org/html/2503.10486v1

globular-toast a day ago | parent [-]

> the use of an LLM did not significantly enhance diagnostic reasoning performance compared with the availability of only conventional resources.

The other one isn't peer reviewed. Your précis doesn't appear to be warranted.

adastra22 a day ago | parent [-]

You only read the first line of the summary. This is the juicy bit:

> The LLM alone scored 16 percentage points (95% CI, 2-30 percentage points; P = .03) higher than the conventional resources group.

Basically, they set up the experiment with a control group and an LLM-assisted group. There was no significant difference between those two groups, and that is what was reported in the top-level finding you quote.

Then they went back and asked: "wait, what if we just blindly trusted the LLM? What if we had a third group with no doctor involved — just let the LLM do the diagnosis?" This retrospectively synthesized group did significantly better than either of the actual experimental arms:

> The LLM alone scored 16 percentage points (95% CI, 2-30 percentage points; P = .03) higher than the conventional resources group … The LLM alone demonstrated higher performance than both physician groups, indicating the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice.
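As a quick sanity check on the quoted numbers (not from the paper itself): a 95% CI of 2–30 percentage points excludes zero, and under a plain normal approximation the implied p-value lands near the reported P = .03. A minimal sketch, assuming the CI is symmetric and normal-based (the study's actual test may have differed):

```python
import math

# Reported in the quoted excerpt: difference = 16 points, 95% CI = (2, 30)
diff = 16.0
ci_low, ci_high = 2.0, 30.0

# Under a normal approximation, the CI half-width is 1.96 * SE
se = (ci_high - ci_low) / (2 * 1.96)  # ~7.14 percentage points

# Two-sided p-value for the observed difference
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(round(z, 2), round(p, 3))  # z = 2.24; p ~ 0.025, same ballpark as the reported P = .03
```

The point is only that the quoted CI and P value are internally consistent; nothing here bears on whether the retrospective LLM-alone comparison was a fair one.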