▲ oofbey | 8 hours ago
You’re holding on to the intuition (hope) that we are smarter than the LLMs in some hard-to-define way. Maybe. But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win. I agree experienced humans are still better on “judgement” tasks in their field, but judgement tasks are, almost by definition, ones where there isn’t a correct answer. And even then, I think the machines’ judgement is better than a lot of humans’. Is medical diagnosis one of these high-judgement tasks? Personally I don’t think so.
▲ Calavar | 7 hours ago | parent
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.

Quite to the contrary, I think it's trivial to find a task where humans beat LLMs. For all the money that's been thrown at agentic coding, LLMs still produce substantially worse code than a senior dev. See my own prior comments on this for a concrete example [1]. These trivial failure cases show that there are dimensions to task proficiency - significant ones - that benchmarks fail to capture.

> Is medical diagnosis one of these high judgement tasks?

Situational. I would break diagnosis into three types:

1. The diagnosis comes from objective criteria: laboratory values, vital signs, visual findings, family history. I think LLMs are likely already superior to humans in this case.

2. The diagnosis comes from "chart lore": reading notes from prior physicians and realizing that new context now points to a different diagnosis. (That new context can be the benefit of hindsight into what was already tried and failed, and/or new objective data.) LLMs do pretty well at this when you point them at datasets where all the prior notes were written by humans, which means that those humans did a nontrivial part of the diagnostic work. What if the prior notes were written by LLMs as well? Will they propagate their own mistakes forward? That has yet to be studied in depth.

3. The diagnosis comes from human interaction: knowing the difference between a patient who's high as a bat on crack and one who's delirious from infection; noticing that a patient hesitates slightly before assuring you that they've been taking all their meds as prescribed; etc. I doubt that LLMs will ever beat humans at this, but if LLMs can be proven to be good at point 2, then point 3 alone will not save human physicians.

[1] https://news.ycombinator.com/threads?id=Calavar#47891432
▲ MapleMoth | 8 hours ago | parent
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.

I, and likely the person you replied to, don't find that existing studies actually bear this out.
▲ eueheu | 8 hours ago | parent
LLMs operate on a mechanical form of intelligence, one that at present is not adaptive to changes in the environment. If the latter part of your post were true, why has the demand for radiologists grown? The problem with this place is that it's full of people who don't understand nuance. Your post demonstrates this emphatically.
▲ idiotsecant | 7 hours ago | parent
There are almost no real-world tasks that LLMs outperform humans on when operating by themselves. Pair them with a human for adaptability, judgement, and real-world context, and let the human drive? Sure. Just let one loose on its own? You get an ocean of slop that doesn't come close to doing what it's supposed to.