Ontario auditors find doctors' AI note takers routinely blow basic facts(theregister.com)
73 points by sohkamyung 2 hours ago | 21 comments
rainsford 43 minutes ago | parent | next [-]

I have generally moved from bearish to bullish on the future of current AI technology, but the persistent inaccuracy with basic facts, even as the models significantly improve, continues to give me pause.

As an example, creating recipes with Claude Opus based on flavor profiles and preferences feels magical, right up until the point at which it can't accurately convert between tablespoons and teaspoons. It's like the point in the movie where a character is acting nearly right but something is a bit off and then it turns out they're a zombie and going to try to eat your brain. This note taking example feels similar. It nearly works in some pretty impressive ways and then fails at the important details in a way that something able to do the things AI can allegedly do really shouldn't.

It's these failures that make me more and more convinced that while current generation AI can do some pretty cool things if you manage it right, we're not actually on the right track to achieve real intelligence. The persistence of these incredibly basic failure modes even as models advance makes it fairly obvious that continued advancement isn't going to actually address those problems.

Brian_K_White 10 minutes ago | parent | next [-]

I hate to help provide possible solutions to an entire process I don't approve of, but maybe the fuzzy tools need old-style deterministic tools the same way, and for the same reasons, we do.

So instead of an LLM trying to answer a math or reasoning question by finding a statistical match with other similar groups of words it found on 4chan, the All-In podcast, and a terrible recipe for soup written by a terrible cook, it can use a calculator when it needs a calculator answer.
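The idea is simple to sketch: route anything that looks like arithmetic to a deterministic function and only fall back to the model for free-form questions. Everything below (the unit table, the regex routing) is an illustrative assumption, not any vendor's tool-calling API:

```python
import re

def convert_volume(value: float, src: str, dst: str) -> float:
    """Deterministic kitchen-unit conversion, using teaspoons as the base unit."""
    tsp_per = {"tsp": 1.0, "tbsp": 3.0, "cup": 48.0}
    return value * tsp_per[src] / tsp_per[dst]

def answer(question: str) -> str:
    """Route calculator-shaped questions to the deterministic tool
    instead of letting the language model guess at the numbers."""
    m = re.match(r"convert (\d+(?:\.\d+)?) (tsp|tbsp|cup) to (tsp|tbsp|cup)", question)
    if m:
        value, src, dst = float(m.group(1)), m.group(2), m.group(3)
        return f"{convert_volume(value, src, dst):g} {dst}"
    return "(fall back to the model for free-form questions)"

print(answer("convert 2 tbsp to tsp"))  # 6 tsp
```

Real tool-use frameworks let the model itself decide when to emit a tool call, but the division of labor is the same: the model handles language, the tool handles the numbers.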

colechristensen 7 minutes ago | parent [-]

No, they just need to be trained to have adversarial self review "thinking" processes.

You ask an LLM "What's wrong with your answer?" and you get pretty good results.
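That pattern is just a second model call over the first answer. Here's a minimal sketch where `ask_model` is a canned stand-in stub, not a real client, so the critique step can be shown catching a deliberate unit-conversion mistake in the draft:

```python
def ask_model(prompt: str) -> str:
    """Stand-in stub for a real LLM call; replace with your client of choice."""
    canned = {
        "draft": "3 tablespoons is 6 teaspoons.",  # deliberately wrong draft
        "review": "Incorrect: 1 tablespoon is 3 teaspoons, so 3 tablespoons is 9 teaspoons.",
    }
    return canned["review" if "wrong" in prompt.lower() else "draft"]

def answer_with_review(question: str) -> tuple[str, str]:
    # First pass: draft answer. Second pass: adversarial self-review.
    draft = ask_model(question)
    critique = ask_model(f"What's wrong with your answer?\nQ: {question}\nA: {draft}")
    return draft, critique

draft, critique = answer_with_review("How many teaspoons in 3 tablespoons?")
```

Whether the review pass converges on the right answer (or, as the reply below notes, flips a correct one) is exactly the open question.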

binary0010 2 minutes ago | parent [-]

Or the original output was correct and the adversarial "rethinking" flips it to an incorrect result.

themafia 22 minutes ago | parent | prev [-]

> we're not actually on the right track to achieve real intelligence.

Real intelligence means you have to say "I don't know" when you don't know, or ask for help, or even just refuse to help, with the subtext being that you don't want to appear stupid.

The models could ostensibly do this when they have low confidence in their own results, but they don't. What I don't know is whether that's because it would be very computationally difficult or because it would harm the reputation of the companies charging a good sum to use them.

colechristensen 5 minutes ago | parent | next [-]

You can TELL the models to do this and they'll follow your prompt.

"Give me your answer and rate each part of it for certainty by percentage" or similar.

bluefirebrand 9 minutes ago | parent | prev [-]

My theory is that the people building the models, and those in charge of directing where they go, love the sycophantic yes-man behavior the models display

They don't like hearing "I don't know"

zOneLetter 2 hours ago | parent | prev | next [-]

Anecdotally, we use an LLM note-taker at work for meetings. I had to intervene recently because our CIO was VERY angry at our vendor for something they promised to do and never did. He wasn't at the meeting where the "promise" was made. I was. They never promised anything, and the discussion was significantly more nuanced than what the LLM wrote in the detailed summary.

In other cases, I have seen it miss the mark when the discussion is not very linear. For example, if I am going back and forth with the SOC team about their response to a recent alert/incident. It'll get the gist of it right, but if you're relying on it for accuracy, holy hell does it miss the mark.

I can see the LLM taking great notes for that initial nurse visit at the hospital: summarizing your main issue, weight, height, recent changes, etc. I would not trust it when it comes to a detailed, technical back-and-forth with the doctor. I would think that for compliance reasons hospitals would not want to alter the records and would go only by transcripts, but what do I know...

Hobadee an hour ago | parent | prev | next [-]

The AI note taker we use at work records the meeting as well, and each note it takes has a timestamp link that takes you directly to that point in the recording so you can check it yourself. While I'm sure a solution like this is more complicated in a HIPAA environment, something like it is critical for things as important as healthcare.
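The record that makes this work is tiny: each note just has to carry the offset it was derived from, so a deep link can drop a reviewer at the exact moment in the recording. A minimal sketch (the field names and the `#t=` fragment convention are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class MeetingNote:
    text: str
    start_seconds: float  # where in the recording this note came from

    def recording_link(self, recording_url: str) -> str:
        # Deep-link straight to the moment the note summarizes,
        # so a human can verify the claim against the audio.
        return f"{recording_url}#t={self.start_seconds:.0f}"

note = MeetingNote("Vendor agreed to send the SOW by Friday", 1342.0)
print(note.recording_link("https://example.com/rec/123"))
# https://example.com/rec/123#t=1342
```

The `#t=` temporal fragment is how W3C Media Fragments URIs address a point in audio/video; any player that honors it will seek there on load.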

TonyAlicea10 31 minutes ago | parent | next [-]

When designing AI-based user experiences I refer to this as provenance. It's a vital aspect of trust, reliability, compliance, and more. If a software system includes LLM output like this but doesn't surface the provenance of that output for human evaluation and verification, then it's at best a poor user experience, and at worst a dangerous one.

an hour ago | parent | prev [-]
[deleted]
lostmsu an hour ago | parent [-]

You can check the summary immediately after the meeting, that gives some extra confidence that the notes were recognized correctly.

ceejayoz 2 hours ago | parent | prev | next [-]

> 60% of evaluated AI Scribe systems mixed up prescribed drugs in patient notes, auditors say

Not mentioned, as far as I can see: the comparative human mistake rate.

Having seen a lot of medical records, 60% sounds about normal lol.

thepotatodude 23 minutes ago | parent | next [-]

60% is insanely high and absolutely not a human mistake rate. What charts are you reading?

jmward01 31 minutes ago | parent | prev | next [-]

"AI sucks at X, but so do humans" is not a popular view, but I think it's valid, and we should take wins where we can, especially in healthcare. It is pretty clear that initial accuracy issues will become less and less of a problem as these technologies mature. The focus on accuracy now as a "see, it's bad" talking point misses the real danger, though: medical note takers have an exceptionally high chance of being hijacked for money, and that is the issue we need to bring attention to now.

They provide a real-time feed into a trillion-dollar industry. Just roll that around in your head for a second. Insurance companies are going to want to tap that feed in real time so they can squeeze out more money. Drug makers are going to want to tap into that feed so they can abuse the data. Hospitals will want to tap into it to wring more out of doctors and boost the number of billable codes for each encounter. Very few entities are looking to tap into that feed to, you guessed it, help the patient. I am for these systems (and I have been involved in building them in the past), but the feeding frenzy of business interests that will obviously get involved with them is the thing we should be yelling and screaming about, not short-term accuracy issues.

Arodex an hour ago | parent | prev [-]

But with AI, who is responsible is different.

(And if you already see 60% error rates in standard, pre-AI note taking, how does that not translate into many deaths and injuries? At least one country's health system in the world should have caught that.)

tredre3 16 minutes ago | parent | next [-]

> And if you already see 60% error rates in standard, pre-AI note taking, how does that not translate into many deaths and injury?

Presumably most doctor's visits are a one-problem, one-solution, one-doctor type of thing. Done deal; the notes are never read again. That alone would explain why a high error rate doesn't often result in injury or death.

Injury or death caused by poor notes would mostly occur when you're being followed for a serious chronic condition, or when you're handled by a team where effective communication is required.

ceejayoz an hour ago | parent | prev | next [-]

> how does that not translate into many deaths and injury?

Because most of it is just written down and never looked at again until there’s a lawsuit or something.

cyanydeez an hour ago | parent | prev [-]

Yeah, the problem is the health system has no sacrificial goat if the AI note taker provides the wrong detail. The last thing we want is the CTO being responsible!

bluefirebrand an hour ago | parent [-]

I'm not convinced the CTO would be held accountable either.

I do wonder if people would be pushing AI so hard if their organizations were planning to hold them accountable for mistakes the AI made

I bet if that were the case we'd see a lot slower rollout of AI systems

nothinkjustai 23 minutes ago | parent | prev [-]

People will eventually figure out that LLMs have no capacity for intent and are fundamentally unreliable for tasks such as summarization, note-taking, etc.