Amazing how you can just deflect any criticism of LLMs here by going “but humans suck too!” And the misanthropic HN userbase eats it up every time.

We live during the healthiest period in human history due to the fact that doctors are highly reliable and well-trained. You simply would not be able to replace a real doctor with an LLM and get desirable results.

▲ KronisLV 6 hours ago | parent | next [-]

> Amazing how you can just deflect any criticism of LLMs here by going “but humans suck too!” And the misanthropic HN userbase eats it up every time.

I think it's rather people trying to keep grounded and suggest that it's not just the hallucination machine that's bad, but also that many doctors in real life also suck - in part because of the domain being complex, but also due to a plethora of human reasons, such as not listening to your patients properly or disregarding their experiences and being dismissive (seems to happen to women more for some reason), or sometimes just being overworked.

> You simply would not be able to replace a real doctor with an LLM and get desirable results.

I don't think people should be replaced with LLMs, but we should benchmark the relative performance of various approaches:

  A) the performance of doctors alone, no LLMs
  B) the performance of LLMs alone, no human in the loop
  C) the performance of doctors, using LLMs

The problem is that a benchmark built from historical cases that humans resolved, leaving out the ones where the patient died (or otherwise suffered because the wrong calls were made), would pre-select for the stuff that humans are already good at. Some of the failures wouldn't even be properly known, since some of them are straight-up malpractice on the part of humans. Meanwhile, benchmarking just LLMs against cases like that wouldn't give enough visibility into the failings of humans either.
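To make the pre-selection problem concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `Case` type, the field names, and the eight cases are invented and are not real clinical data. It only shows how scoring on "cases a human resolved" can make the human arm look perfect even when both arms tie on the full set.

```python
# Toy illustration only: every case below is invented, not real clinical data.
from dataclasses import dataclass


@dataclass
class Case:
    doctor_correct: bool  # hypothetical: did the doctor alone get it right?
    llm_correct: bool     # hypothetical: did the LLM alone get it right?


def accuracy(cases, attr):
    """Fraction of cases where the given arm was correct."""
    return sum(getattr(c, attr) for c in cases) / len(cases)


# The full (hypothetical) population, including the cases humans got wrong.
all_cases = [
    Case(True, True), Case(True, False), Case(True, True), Case(False, True),
    Case(False, False), Case(True, False), Case(False, True), Case(True, True),
]

# A benchmark built only from cases a human resolved successfully.
resolved_only = [c for c in all_cases if c.doctor_correct]

full_doctor = accuracy(all_cases, "doctor_correct")
full_llm = accuracy(all_cases, "llm_correct")
biased_doctor = accuracy(resolved_only, "doctor_correct")
biased_llm = accuracy(resolved_only, "llm_correct")

print(f"full set:      doctor={full_doctor:.3f} llm={full_llm:.3f}")
print(f"resolved only: doctor={biased_doctor:.3f} llm={biased_llm:.3f}")
```

On the full invented set the two arms tie, while on the filtered set the doctor arm scores 100% by construction. That is the skew described above, not evidence about real doctors or real LLMs.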

Ideally you'd assess the weaknesses and utility of both at a meaningfully large scale, in search of blind spots and systemic issues. The problem is that benchmarking in a vacuum, without involving real cases, might prove difficult, while doing it on real cases would be unethical and a non-starter. You'd also have trouble getting the truly shitty doctors into the sample set (sometimes ones with good intentions who are just really overworked, other times ones whose results suggest they shouldn't be practicing healthcare at all). Without them you're skewing towards only the competent ones, which is a misrepresentation of reality.

Reminds me of an article that got linked on HN a while back: https://restofworld.org/2025/ai-chatbot-china-sick/

The fact that someone would say stuff like "Doctors are more like machines." implies failure before we even get to basic medical competency. People willingly misdirect themselves and risk getting horrible advice because humans will not give better advice and the sycophantic machine is just nicer.

▲ slopinthebag 4 hours ago | parent [-]

> I think it's rather people trying to keep grounded and suggest that it's not just the hallucination machine that's bad, but also that many doctors in real life also suck

No, you see this line of argumentation on every post critical of LLMs' deficiencies. "Humans also produce bad code", "Humans also make mistakes" etc etc.

▲ KronisLV 3 hours ago | parent [-]

> No, you see this line of argumentation on every post critical of LLMs' deficiencies. "Humans also produce bad code", "Humans also make mistakes" etc etc.

So your reading of this is that it's a deflection of the shortcomings? My reading of it is that both humans and LLMs suck at all sorts of tasks, often in slightly different ways. One being bad at something doesn't immediately make the other good if it also sucks - it might, however, suggest that there are issues with the task itself (e.g. with regard to code: no proper tests, harnesses, or scripts that push whoever is writing new code in the direction of being correct and successful).

▲ boondongle 6 hours ago | parent | prev [-]

Even in medicine, the difference between drug A and drug B is often only a statistical one. If drugs were held to the standard "works 100% of the time", no drug would ever be cleared for use. Feelings about AI and this administration are influencing this conversation far too much.

It's like people want to remove the physician or current care from the discussion. It's weird because care is already too expensive and too error-prone for the cost.