| ▲ | WarmWash 7 hours ago |
| I'd greatly prefer a blind study comparing doctors to AI, rather than a study of doctors feeding AI scenarios and seeing if it matches their predetermined outcome. Edit: People seem confused here. The study fed the AI structured clinical scenarios and evaluated its results. It was not a live analysis of AI being used in the field to treat patients. |
|
| ▲ | riskassessment 7 hours ago | parent | next [-] |
| I don't understand this reasoning. Randomizing people to AI vs standard of care is expensive and risky. Checking whether the AI can pass hypothetical scenarios seems like a perfectly reasonable approach to researching the safety of these models before running a clinical trial. |
| |
| ▲ | selridge 4 hours ago | parent | next [-] | | The issue is that those hypothetical scenarios do not have to look like how patients actually interact with the tool. Real-life use is full of ill-posed questions, open-ended statements, inaccurate assessments of symptoms, and conclusory remarks sprinkled in between. Real use of chatbots for health by non-clinicians looks very different from scenario-based evaluation. | |
| ▲ | WarmWash 7 hours ago | parent | prev | next [-] | | You would pass those hypothetical scenarios to doctors too, and then the analysis of results would be done by doctors who don't know whether a given result came from an AI or a doctor. | | |
| ▲ | riskassessment 6 hours ago | parent [-] | | From the paper > Three physicians independently assigned gold-standard triage levels based on cited clinical guidelines and clinical expertise, with high inter-rater agreement | | |
| |
| ▲ | nick49488171 7 hours ago | parent | prev [-] | | You can start by comparing "doctor" care vs "doctor who also uses AI" care |
|
|
| ▲ | GorbachevyChase 2 hours ago | parent | prev | next [-] |
| The number of people who die each year just in the United States for causes attributable to medical errors is believed to be in the hundreds of thousands. A doctor’s opinion is not the golden yardstick. It may be interesting to study if there is some kind of signal in general health outcomes in the US since the popularization of ChatGPT for this purpose. It may be a while before we have enough data to know. I could see it going either way. |
|
| ▲ | hwillis 7 hours ago | parent | prev | next [-] |
| We have standards of care for a reason. They are the most basic requirements of testing. Ignoring them is not just being a bad doctor, it's unethical treatment. It's the absolute bare minimum of a medical system. |
|
| ▲ | dekoidal 5 hours ago | parent | prev | next [-] |
| You're joking, right? This is the 'testing on mice' phase, it failed, and your idea is to start dosing humans just to see what happens? |
| |
| ▲ | selridge 4 hours ago | parent [-] | | Human use is already widespread. You might as well complain in 2015 about the use of Wikipedia among emergency room doctors. That ship has sailed. |
|
|
| ▲ | RandomLensman 6 hours ago | parent | prev | next [-] |
| Feeding scenarios is not without challenges, as some inputs, for example smell, would be "pre-processed" by humans before being fed into the AI, I think. |
|
| ▲ | lmkg 7 hours ago | parent | prev | next [-] |
| That type of experimental set-up is forbidden due to ethical concerns. It goes against medical ethics to give patients treatment that you think might be worse. |
|
| ▲ | nradov 7 hours ago | parent | prev | next [-] |
| I don't understand what you're proposing. How would you design such a study in a way that would pass IRB? |
| |
| ▲ | dec0dedab0de 6 hours ago | parent | next [-] | | I think the best would be an interface where the patient isn't told whether the doctor on the other end is human or AI. Tell them that they are going to do multiple remote exams with different care providers for the same illness in exchange for free treatment, and payment for the study. If you're worried about not catching a legitimate emergency, as in something that can't wait a day or two for them to complete the different sessions, you could have a doctor monitor the interactions with the ability to raise a flag and step in to send them to the ER. | | | |
| ▲ | SoftTalker 7 hours ago | parent | prev | next [-] | | Feed it randomly selected case histories? See if it came up with the same diagnosis as the doctors? | | |
| ▲ | nradov 7 hours ago | parent [-] | | I don't think that would tell us anything useful. The data quality in most patient charts is shockingly bad. I've seen a lot of them while working on clinical systems interoperability. Garbage in / garbage out. When human physicians make a diagnosis they typically rely on a lot of inputs that never appear in the patient chart. And in most cases the diagnosis is the easy part. I mean we see occasional horror stories about misdiagnosis but those are rare. The harder and more important part is coming up with an effective treatment plan which the patient will actually follow, and then monitoring progress while making adjustments as needed. So a focus on the diagnosis portion of clinical decision support seems fundamentally misguided. | | |
| ▲ | qsera 6 hours ago | parent [-] | | > When human physicians make a diagnosis they typically rely on a lot of inputs that never appear in the patient chart. Yea, like how rich the patient is or if they are on insurance etc. I wish I was kidding. | | |
| ▲ | PearlRiver 5 hours ago | parent [-] | | This is the real reason why some people go to ChatGPT instead of a GP. I am glad to live in a country where going to the doctor is free. |
|
|
| |
| ▲ | selridge 4 hours ago | parent | prev | next [-] | | You could absolutely randomize care between a doctor and an AI under an IRB. I’d be stunned if there aren’t a dozen studies doing something like this already. You have to justify it, but most IRB application forms have a section where you can make that justification. It’s not any different from giving one patient heart medicine that you think works and another patient a sugar pill. | | |
| ▲ | nradov 4 hours ago | parent [-] | | Huh? Do you have any actual examples of such studies? I don't think you understand how IRB actually works. In actual heart medicine studies the control arm is typically treated with the current standard of care, not a placebo. So it seems pretty clear that you don't have any actual knowledge or experience in this area. |
| |
| ▲ | dyauspitr 7 hours ago | parent | prev [-] | | It’s all case histories and text; no real person is affected by this. |
|
|
| ▲ | lkey 4 hours ago | parent | prev | next [-] |
| This 'preference' is sociopathic, illegal, and stupid. |
|
| ▲ | qsera 7 hours ago | parent | prev [-] |
| Yea, that is exactly why I don't like this. These "experts" have no problem touting anecdotes when it serves them. |
| |