▲ oofbey | 8 hours ago
You’re holding on to the intuition (hope) that we are smarter than the LLMs in some hard-to-define way. Maybe. But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win. I agree experienced humans are still better on “judgement” tasks in their field, but judgement tasks are, almost by definition, ones where there isn’t a correct answer. And even then, I think the machines’ judgement is better than a lot of humans’. Is medical diagnosis one of these high-judgement tasks? Personally I don’t think so.
▲ Calavar | 7 hours ago | parent
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.

Quite to the contrary, I think it's trivial to find a task where humans beat LLMs. For all the money that's been thrown at agentic coding, LLMs still produce substantially worse code than a senior dev. See my own prior comments on this for a concrete example [1]. These trivial failure cases show that there are dimensions to task proficiency - significant ones - that benchmarks fail to capture.

> Is medical diagnosis one of these high judgement tasks?

Situational. I would break diagnosis into three types:

1. The diagnosis comes from objective criteria: laboratory values, vital signs, visual findings, family history. I think LLMs are likely already superior to humans in this case.

2. The diagnosis comes from "chart lore": reading notes from prior physicians and realizing that new context now points to a different diagnosis. (That new context can be the benefit of hindsight into what was already tried and failed, and/or new objective data.) LLMs do pretty well at this when you point them at datasets where all the prior notes were written by humans, which means that those humans did a nontrivial part of the diagnostic work. What if the prior notes were written by LLMs as well? Will they propagate their own mistakes forward? That has yet to be studied in depth.

3. The diagnosis comes from human interaction: knowing the difference between a patient who's high as a bat on crack and one who's delirious from infection; noticing that a patient hesitates slightly before assuring you that they've been taking all their meds as prescribed; etc. I doubt that LLMs will ever beat humans at this, but if LLMs can be proven to be good at point 2, then point 3 alone will not save human physicians.

[1] https://news.ycombinator.com/threads?id=Calavar#47891432
▲ MapleMoth | 8 hours ago | parent
> But it’s getting harder and harder to define a task that humans beat LLMs on. On pretty much any easily quantifiable test of knowledge or reasoning, the machines win.

I, and likely the person you replied to, don't find that existing studies actually bear this out.
▲ eueheu | 8 hours ago | parent
LLMs operate on a mechanical form of intelligence, one that at present is not adaptive to changes in the environment. If the latter part of your post were true, why has the demand for radiologists grown? The problem with this place is that it's full of people who don't understand nuance. Your post demonstrates this emphatically.
▲ idiotsecant | 7 hours ago | parent
There are almost no real-world tasks that LLMs outperform humans on when operating by themselves. Pair them with a human for adaptability, judgement, and real-world context, and let the human drive? Sure. Just let one loose on its own? You get an ocean of slop that doesn't come close to doing what it's supposed to.