I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor (msn.com)
66 points by zdw 9 hours ago | 83 comments
chrisfosterelli 7 hours ago | parent | next [-]

Health metrics are absolutely tarnished by a lack of proper context. Unsurprisingly, it turns out that you can't reliably take a concept as broad as health and reduce it to a number. We see the same arguments over and over with body fat percentages, VO2 max estimates, BMI, lactate thresholds, resting heart rate, HRV, and more. These are all useful metrics, but it's important to consider each of them in the proper context it deserves.

This article gave an LLM a bunch of health metrics, asked it to reduce them to a single score, didn't tell us any of the actual metric values, and then compared that to a doctor's opinion. Why anyone would expect these to align is beyond my understanding.

The most obvious thing that jumps out to me is that I've noticed doctors generally, for better or worse, consider "health" much differently than the fitness community does. It's different toolsets and different goals. If this person's VO2 max estimate was under 30, that's objectively a poor VO2 max by most standards, and an LLM trained on the internet's entire repository of fitness discussion is likely going to give this person a bad score in terms of cardio fitness. But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.

I'd go so far as to say this is probably the case for most people. Your average person is in really poor fitness-shape but just fine health-shape.

Shank 2 hours ago | parent | next [-]

> But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.

This is true of many metrics and even lab results. Good doctors will counsel you and tell you that the lab results are just one metric and one input. The body acclimates to its current conditions over time, and quite often achieves homeostasis.

My grandma was living for years with an SpO2 in the 90-95% range as measured by pulse oximetry, but this was just one metric measured with one method. It doesn't mean her blood oxygen was actually repeatedly dropping, it just meant that her body wasn't particularly suited to pulse oximetry.

colechristensen 2 hours ago | parent | prev [-]

>I'd go so far to say this is probably the case for most people. Your average person is in really poor fitness-shape but just fine health-shape.

Modern medicine has failed to move into the era of subtlety and small problems and many people suffer as a result. Fitness nerds and general non-scientists fill the gap poorly so we get a ton of guessing and anecdotal evidence and likely a whole lot of bad advice.

Doctors won't say there's a problem until you're SICK and usually pretty late in the process when there's not a lot of room to make improvements.

At the same time, doctors won't do anything if you're 5% off optimal, but they'll happily give you a medicine that improves one symptom that's 50% off optimal, even if it comes along with 10 side effects. And unless you're dying or have something really straightforward wrong with you, doctors don't do much at all besides giving you a sedative and/or a stimulant.

Doctors don't know what to do with small problems because they're barely studied and the people who DO try to do something don't do it scientifically.

anon7000 33 minutes ago | parent | next [-]

A worthwhile book to read on this topic is Outlive by Peter Attia (MD). The core premise is that American healthcare focuses far too much on treating problems after they're extremely severe. It would be cheaper and healthier to invest more into conservative & preventative care, trying to prevent or minimize problems early in life before they become incredibly dangerous and expensive/difficult/impossible to treat.

I have a close friend who works in conservative care, and it's astonishing what they see. For example, someone went to a number of specialists and doctors about a throat condition where they really struggled to swallow. They even had to swallow a radioactive pill for some kind of imaging: unnecessary exposure and an expensive process that ultimately went exactly nowhere.

Meanwhile, it was a simple musculoskeletal issue which my friend was able to resolve in a single visit with absolutely no risk to the patient.

Medical schools need to stop producing MDs who reach for pills as the first line of defense without trying to root-cause issues. Do you really need addictive painkillers, or maybe some PT, exercise, massage, etc. to help resolve your pain?

lnsru an hour ago | parent | prev [-]

It's not medicine, it's the healthcare system. Doctors aren't paid enough to go thoroughly through the complaint and dig deeper. In Germany you get a 5-minute diagnosis, and that's all the health insurance covers. And that's from the better doctors; with a normal one, the diagnosis comes from a 2-minute interaction. Believing that the diagnosis is right is very naive.

gizmodo59 4 minutes ago | parent | prev | next [-]

For every sensational article about AI being useless, there are plenty of examples where using ChatGPT to find out what else could be happening, and then having a conversation with a doctor, has helped: many people I know of anecdotally, and many such reports online as well.

At the end of the day, it's yet another tool that people can use to help their lives. They have to use their brain. The culture of seeing the doctor as a god doesn't hold up anymore. So many people have had bad experiences when the entire healthcare industry, at least in the US, is primarily a business rather than a way of helping society get healthy.

wawayanda 7 hours ago | parent | prev | next [-]

A year or so ago, I fed my wife's blood work results into chatgpt and it came back with a terrifying diagnosis. Even after a lot of back and forth it stuck to its guns. We went to a specialist who performed some additional tests and explained that the condition cannot be diagnosed with just the original blood work and said that she did not have the condition. The whole thing was a borderline traumatic ordeal that I'm still pretty pissed about.

greenknight 4 hours ago | parent | next [-]

On the flip side, I had some pain in my chest... RUQ (right upper quadrant, for the medical folk).

On the way to the hospital, ChatGPT was pretty confident it was an issue with my gallbladder, due to me having a fatty meal for lunch (but it was delicious).

After an extended wait to be seen, they didn't ask about anything like that, and at the end, when they asked if there was anything else to add, I brought up the ChatGPT / gallbladder theory... discharged 5 minutes later with suspicion of gallbladder, as they couldn't do anything that night.

Over the next few weeks, I got test after test after test to try and figure out what was going on. MRI, CT, ultrasound, etc. They all came back negative for the gallbladder.

ChatGPT was persistent. It said to get a HIDA scan, a more specialised scan. My GP was a bit reluctant but agreed. Got it, and was diagnosed with a hyperkinetic gallbladder. It is still unrecognised as an issue, but mostly accepted. So much so that my surgeon initially said it wasn't a thing (then, after doing research about it, said it is a thing)... and a gastroenterologist also said it wasn't a thing.

Had it taken out a few weeks ago, and it was chronically inflamed, which means the removal was the correct path to go down.

It just sucks that your wife was on the other end of things.

tharkun__ 4 hours ago | parent [-]

This reminds me of another recent comment in some other post, about doctors not diagnosing "hard to diagnose" things.

There are probably ("good") reasons for this. But your own persistence, and today the help of AI, can potentially help you. The problem with it is the same problem as previously: "charlatans". Just that today the charlatan and the savior are both one and the same: The AI.

I do recognize that most people probably can't tell one from the other. In both cases ;)

You'll find this in my post history a few times now, but essentially: I was lethargic all the time and got migraine-type headaches "randomly" a lot, with the feeling I'd need to puke. One time I had to stop driving as it just got so bad. I suddenly was no longer able to tolerate alcohol either.

I went to multiple doctors, was sent to specialists, who all told me that they could maaaaaybe do test XYZ but essentially: it wasn't a thing, I was crazy.

Through a lot of online research I "figured out" (and that's an over-statement) that it was something about the gut microbiome. Something to do with histamine. I tried a bunch of things; I suspected it might be DAO (diamine oxidase) insufficiency. I tried a bunch of probiotics, both the "heals all your stuff" kind and the "you need to take a single strain or it won't work" kind. Including "just take Actimel". Actimel gave me headaches! Turns out one of the (prominent) strains in there makes histamine. Guess what: alcohol, especially some kinds, contains histamine, and your "hangover" is also essentially histamine (made worse by the dehydration). And guess what else: some foods, especially some I love, contain or break down into histamine.

So I figured that somehow it's all about histamine and how my current gut microbiome does not deal well with excess histamine (from whichever source). None of the doctors I went to believed this to be a "thing", nor did they want to do anything about it. Then I found a probiotic that actually helped. If you really want to check what I am taking, check the history; I'm not a marketing machine. What I do believe is that one particular bacterium helped, because it's the one thing that wasn't in any of the other ones I took: Bacillus subtilis.

A soil-based bacterium, which in the olden times you'd have gotten from slightly not-well-enough-cleaned cabbage or whatever vegetable du jour you were eating. Essentially: if your toddler stuffs their face with a handful of dirt, that's one thing they'd be getting, and it's for the better! I'm saying this because the rest of the formulation was essentially the same as the others I tried.

I took three pills per day, breakfast, lunch and dinner. I felt like shit for two weeks, even getting headaches again. I stuck with it. After about two weeks I started feeling better. I think that's when my gut microbiome got "turned around". I was no longer lethargic and I could eat blue cheese and lasagna three days in a row with two glasses of red wine and not get a headache any longer! Those are all foods that contain or make lots of histamine. I still take one per day and I have no more issues.

But you gotta get to this, somehow, through all of the bullshit people that try to sell you their "miracle cure" stuff. And it's just as hard as trying to suss out where the AI is bullshitting you.

There was exactly one doctor in my life who I would consider good in that regard. I had already figured the above out by that time, but I was doing keto and it got all of my blood markers, except for cholesterol, into normal again. She literally "googled" with me about keto a few times, did a blood test to confirm that I was in ketosis, and in general was just awesome about this. She was notoriously difficult to book and ran later than any other doctor for scheduled appointments, but she took her time, and even that would not really have been enough to suss out the stuff that I figured out through research myself, if you ask me. While doctors are the "demigods in white", I think there's just way too much stuff and way too little time for them. It's like all the bugs at your place of work: now imagine you had exactly one doctor across a multitude of companies. Of course they only figure out the "common" ones ...

tstrimple a minute ago | parent | next [-]

It's horses not zebras until it's actually a zebra and your life depends on it. I think those sorts of guidelines are useful in the general case. But many medical issues quickly move beyond the general case and need closer examination. Not sure how you do that effectively without wasting tons of money on folks with indigestion.

xenonite 23 minutes ago | parent | prev [-]

Interesting to read, thank you very much. Are you still eating ketogenic? Bacillus subtilis seems to metabolize glucose, so are yours still alive? And did you try other probiotics beforehand? I have HIT (histamine intolerance) and eat a mostly carnivore diet with mostly fresh/unfermented meat.

worldsavior an hour ago | parent | prev | next [-]

I think it's your own problem that you got stressed by a probabilistic machine answering with what you want to hear.

fn-mote 6 hours ago | parent | prev | next [-]

> I fed my wife's blood work results into chatgpt and it came back with a terrifying diagnosis

I don't get it... a doctor ordered the blood work, right? And surely they did not have this opinion or you would have been sent to a specialist right away. In this case, the GP who ordered the blood work was the gatekeeper. Shouldn't they have been the person to deal with this inquiry in the first place?

I would be a lot more negative about "the medical establishment" if they had been the ones who put you through the trauma. It sounds like this story is putting yourself through trauma by believing "Dr. GPT" instead of consulting a real doctor.

I will take it as a cautionary tale, and remember it next time I feed all of my test results into an LLM.

kolinko 11 minutes ago | parent | next [-]

At least in Poland, I can almost always see my results before my doctor does - I get a notification that the labwork is ready and I can view results online.

Also, regular bloodwork is around $50-$100 (for the uninsured or without a prescription), so many people just do it out of pocket once in a while and only bring it to a doctor if anything looks suspicious.

Finally, there is EU regulation about data that applies to the medical field as well - you always have the right to view all the data that any company has stored about you. Gatekeeping is forbidden by law.

vineyardmike 5 hours ago | parent | prev [-]

You don't need a doctor to order bloodwork. I get a full panel done yearly, just to establish a baseline and trend. I try not to overanalyze it, and just keep it around for a professional in case some real issue arises in the future.

SchemaLoad 6 hours ago | parent | prev | next [-]

I asked a doctor friend why it seems common for healthcare workers to keep the results sheets to themself and just give you a good/bad summary. He told me that the average person can't properly understand the data and will freak themselves out over nothing.

themafia 5 hours ago | parent | prev | next [-]

> it stuck to its guns

It gave you a probabilistic output. There were no guns and nothing to stick to. If you had disrupted the context with enough countervailing opinion it would have "relented" simply because the conversational probabilities changed.

nprateem 4 hours ago | parent [-]

It's amazing this still needs to be said, especially here

coffeefirst 2 hours ago | parent [-]

Here, sure.

For the general public, these tools have been advertised this way.

So if a good subset of HN still gets fooled, the layperson is screwed.

fouc 4 hours ago | parent | prev | next [-]

> it stuck to its guns

Everyone that encounters this needs to do a clean/fresh prompt with memory disabled to really know if the LLM is going to consistently come to the same conclusion or not.
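
For what it's worth, this is easy to script against the API, where every call is a fresh conversation with no memory. A minimal sketch with the OpenAI Python SDK (the model name and prompt are placeholders):

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  prompt = "Given these lab values: <...>, what is the most likely explanation?"
  answers = []
  for _ in range(5):
      resp = client.chat.completions.create(
          model="gpt-4o",  # placeholder model name
          messages=[{"role": "user", "content": prompt}],  # fresh context each call
      )
      answers.append(resp.choices[0].message.content)

  # If the runs disagree wildly, treat none of them as a conclusion.
  for a in answers:
      print(a[:200], "\n---")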

irjustin 6 hours ago | parent | prev | next [-]

Aren't these two sides of the same coin?

Shouldn't you be happy that it's not "the thing", specifically when the signs pointed towards it being "the thing"?

themafia 5 hours ago | parent [-]

You are _absolutely_ going to die in the next 30 minutes.

When it doesn't happen will you still be happy?

nprateem 4 hours ago | parent | next [-]

Depends if I'm now broke from blowing it all on crack and hookers.

irjustin 5 hours ago | parent | prev [-]

How is this apples-apples at all?

But to answer directly... yes? Yes, I am.

[edit]

To make it a bit more real: my blood pressure monitor says my BP is 200/160. Chat says you're dying, get yourself to a hospital.

I get to the hospital, and they say: oh, your BP monitor is wrong.

I'm happy? I would say that I am. Sure I'm annoyed at my machine, but way happier it's wrong than right.

vineyardmike 5 hours ago | parent [-]

This is another example of why it's still frustrating.

"Yes, I'm happy I'm not dying" ignores the "go to the hospital [and waste a day, maybe some money]" part, which happened because a machine was wrong. That's still pretty costly because a machine wasn't accurate, calibrated, or engineered well. Not dying is good, but the emotions and fear for that period of time are still bad.

irjustin 2 hours ago | parent [-]

Yeah I guess I just don't see eye-to-eye on this.

I 100% understand those frustrations: that the "detectors" should've been more accurate, and the fears, the battery of tests, and the associated costs in time and money. But if you have the means to find out that something which could have been extremely concerning is actually "nothing wrong" - isn't that worth it?

My friend is 45, had bloody stool -> colonoscopy -> polyps removed -> benign. Isn't that way better than colon cancer?

Maybe it's a glass half-empty-full thing.

jesterson 2 hours ago | parent | prev | next [-]

Never ceases to surprise me how seriously people take word-salad output.

And probably the same people laugh at ancient folks carefully listening to shamans.

orionsbelt 6 hours ago | parent | prev | next [-]

> "A year or so ago"

What model?

Care to share the conversation? Or try again and see how the latest model does?

bigbuppo an hour ago | parent | prev | next [-]

Why not just ask WebMD?

terribleperson 4 hours ago | parent | prev | next [-]

Do you have a custom prompt/personality set? What is it?

daveguy 7 hours ago | parent | prev [-]

Please keep telling your story. This is the kind of shit that medical science has been dealing with for at least a century. When evaluating testing procedures, false positives can have serious consequences. A test that's positive every time will catch every single true positive, but it's also worthless. These LLMs don't have a goddamn clue about it. There should be consequences for these garbage fires giving medical advice.
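
To make the base-rate point concrete, here's a toy calculation (the numbers are made up):

  # Positive predictive value: the chance that a positive result is a true positive.
  def ppv(sensitivity, specificity, prevalence):
      true_pos = sensitivity * prevalence
      false_pos = (1 - specificity) * (1 - prevalence)
      return true_pos / (true_pos + false_pos)

  # A 90%-sensitive, 90%-specific test for a condition with 1% prevalence:
  print(ppv(0.90, 0.90, 0.01))  # ~0.08: over 90% of positives are false alarms

  # The degenerate "always positive" test catches every true case but tells you
  # nothing: a positive is exactly as informative as the base rate.
  print(ppv(1.00, 0.00, 0.01))  # 0.01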

maerF0x0 7 hours ago | parent [-]

Part of the issue is taking its output as a conclusion rather than as a signal / lead.

I would never let an LLM make an amputate-or-not decision, but it could convince me to go talk with an expert who sees me in person and takes a holistic view.

sinuhe69 40 minutes ago | parent | prev | next [-]

My general take on any AI/ML in medicine is that without proper clinical validation, it is not worth trying. Also, AI Snake Oil is worth reading.

freedomben 8 hours ago | parent | prev | next [-]

> Despite having access to my weight, blood pressure and cholesterol, ChatGPT based much of its negative assessment on an Apple Watch measurement known as VO2 max, the maximum amount of oxygen your body can consume during exercise. Apple says it collects an “estimate” of VO2 max, but the real thing requires a treadmill and a mask. Apple says its cardio fitness measures have been validated, but independent researchers have found those estimates can run low — by an average of 13 percent.

There's plenty of blame to go around for everyone, but at least for some of it (such as the above) I think the blame more rests on Apple for falsely representing the quality of their product (and TFA seems pretty clearly to be blasting OpenAI for this, not others like Apple).

What would you expect the behavior of the AI to be? Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all, as you could never draw any conclusions from it. Even disregarding statistical outliers, it's not at all clear what part of the data is "good" vs "unreliable", especially when the company that collected that data claims that it's good data.

brandonb 8 hours ago | parent | next [-]

FWIW, Apple has published validation data showing the Apple Watch's estimate is within 1.2 ml/kg/min of a lab-measured Vo2Max.

Behind the scenes, it's using a pretty cool algorithm that combines deep learning with physiological ODEs: https://www.empirical.health/blog/how-apple-watch-cardio-fit...

itchyouch 6 hours ago | parent | next [-]

The trick with the VO2 max measurement on the Apple Watch, though, is that the person cannot waste any time during their outdoor walk and needs to maintain a brisk pace.

Then there are confounders like altitude and elevation gain that can sully the numbers.

It can be pretty great, but it needs a bit of control in order to get a proper reading.

ignoramous 6 hours ago | parent | prev [-]

The paper itself: https://www.apple.com/healthcare/docs/site/Using_Apple_Watch...

Seems like Apple's 95% accuracy estimate for VO2 max holds up.

  Thirty participants wore an Apple Watch for 5-10 days to generate a VO2 max estimate. Subsequently, they underwent a maximal exercise treadmill test in accordance with the modified Åstrand protocol. The agreement between measurements from Apple Watch and indirect calorimetry was assessed using Bland-Altman analysis, mean absolute percentage error (MAPE), and mean absolute error (MAE).

  Overall, Apple Watch underestimated VO2 max, with a mean difference of 6.07 mL/kg/min (95% CI 3.77–8.38). Limits of agreement indicated variability between measurement methods (lower -6.11 mL/kg/min; upper 18.26 mL/kg/min). MAPE was calculated as 13.31% (95% CI 10.01–16.61), and MAE was 6.92 mL/kg/min (95% CI 4.89–8.94).

  These findings indicate that Apple Watch VO2 max estimates require further refinement prior to clinical implementation. However, further consideration of Apple Watch as an alternative to conventional VO2 max prediction from submaximal exercise is warranted, given its practical utility.
https://pmc.ncbi.nlm.nih.gov/articles/PMC12080799/
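
The agreement statistics the study reports are straightforward to reproduce for any set of paired readings. A sketch with toy numbers (not the study's data):

  import numpy as np

  watch = np.array([38.2, 41.0, 35.5, 44.1])  # watch VO2 max estimates
  lab = np.array([44.0, 46.5, 41.2, 50.3])    # indirect calorimetry (treadmill)

  diff = watch - lab
  bias = diff.mean()                          # Bland-Altman mean difference
  sd = diff.std(ddof=1)
  loa = (bias - 1.96 * sd, bias + 1.96 * sd)  # limits of agreement
  mae = np.abs(diff).mean()                   # mean absolute error
  mape = (np.abs(diff) / lab).mean() * 100    # mean absolute percentage error
  print(bias, loa, mae, mape)
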
aeonfox 8 hours ago | parent | prev | next [-]

> I think the blame more rests on Apple for falsely representing the quality of their product

There was plenty of other concerning stuff in that article. And from a quick read, it wasn't suggested or implied that the VO2 max issue was the deciding factor for the original F score the author received. The article did suggest, many times over, that ChatGPT is really not equipped for the task of health diagnosis.

> There was another problem I discovered over time: When I tried asking the same heart longevity-grade question again, suddenly my score went up to a C. I asked again and again, watching the score swing between an F and a B.

brandonb 8 hours ago | parent [-]

The lack of self-consistency does seem like a sign of a deeper issue with reliability. In most fields of machine learning robustness to noise is something you need to "bake in" (often through data augmentation using knowledge of the domain) rather than get for free in training.
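
For time series, that usually means augmentations encoding what "the same health state" should look like. A tiny sketch of the kind of thing I mean (the numbers are illustrative):

  import numpy as np

  def augment(x, rng):
      """Perturbations a health score should be invariant to (x: time x channels)."""
      x = x + rng.normal(0, 0.01, x.shape)  # sensor noise
      x = x * rng.uniform(0.95, 1.05)       # per-device calibration drift
      t0 = rng.integers(0, 24)              # small phase shift: crop a window
      return x[t0:t0 + len(x) - 24]

  rng = np.random.default_rng(0)
  x = rng.normal(size=(7 * 24 * 60, 4))     # a week of minute-level data, 4 channels
  x_aug = augment(x, rng)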

jayd16 6 hours ago | parent | prev | next [-]

Well if it doesn't know the quality of the data and especially if it would be dangerous to guess then it should probably say it doesn't have an answer.

AndrewKemendo 7 hours ago | parent | prev | next [-]

> Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all as you could never draw any conclusions from it.

Yes. You, and every other reasoning system, should always challenge the data and assume it’s biased at a minimum.

This is better described as “critical thinking” in its formal form.

You could also call it skepticism.

That impossibility of drawing conclusions assumes there’s a correct answer and is called the “problem of induction.” I promise you a machine is better at avoiding it than a human.

Many people freeze up or fail with too much data - put someone with no experience in front of 500 ppl to give a speech if you want to watch this live.

hmokiguess 8 hours ago | parent | prev | next [-]

I have been sitting and waiting for the day these trackers get exposed as just another health fad, optimized to deliver shareholder value and not serious enough for medical-grade applications.

NoPicklez 8 hours ago | parent [-]

I don't see how they can be considered a health fad; they're extremely useful and accurate enough. There are plenty of studies and real-world data showing Garmin VO2max readings within 1-2 points of a real-world test.

There is this constant debate about how accurately VO2max is measured, and it's highly dependent on actually doing exercise to determine your VO2max using your watch. But yes, if you want a lab/medically precise measure, you need to do a test that measures your actual oxygen uptake.

miltonlost 8 hours ago | parent | prev [-]

> What would you expect the behavior of the AI to be? Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all as you could never draw any conclusions from it.

Well, I would expect the AI to provide the same response as a real doctor would from the same information, which the article showed the doctors were able to do.

I also would expect the AI to provide the same answer every time for the same data, unlike what it did (swinging from F to B over multiple attempts in the article).

OpenAI is entirely to blame here when they are putting out faulty products (hallucinations even on accurate data are their fault).

jdub 5 hours ago | parent [-]

Why do you have those expectations?

cameldrv 2 hours ago | parent | prev | next [-]

I dunno, if the Apple Watch said he had a vo2max of 30, that probably means he can’t run a mile in less than 12 minutes or so. He’s probably not at all healthy…
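
For a rough sanity check, the Cooper 12-minute-run estimate (VO2 max ~ (metres run in 12 minutes - 504.9) / 44.73, an approximation rather than a lab measure) can be inverted; it puts a VO2 max of 30 at only about a 10-11 minute best-effort mile, so that ballpark is in the right neighborhood:

  # Inverting the Cooper estimate: metres runnable in 12 minutes at VO2 max 30.
  vo2max = 30.0
  metres_in_12_min = 44.73 * vo2max + 504.9        # ~1847 m
  mile_pace_min = 12 / (metres_in_12_min / 1609.3)
  print(metres_in_12_min, mile_pace_min)           # ~10.5 min/mile, best effort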

dfajgljsldkjag 8 hours ago | parent | prev | next [-]

The author is a healthy person but the computer program still gave him a failing grade of F. It is irresponsible for these companies to release broken tools that can cause so much fear in real people. They are treating serious medical advice like it is just a video game or a toy. Real users should not be the ones testing these dangerous products.

nomel 7 hours ago | parent | next [-]

> It is irresponsible for these companies

I would claim that ignoring the "ChatGPT is AI and can make mistakes. Check important info." text, right under the query box in the client, is clearly more irresponsible.

I think that a disclaimer like that is the most useful and reasonable approach for AI.

"Here's a tool, and it's sometimes wrong." means the public can have access to LLMs and AI. The alternative, that you seem to be suggesting (correct me if I'm wrong), means the public can't have access to an LLM until they are near perfect, which means the public can't ever have access to an LLM, or any AI.

What do you see as a reasonable approach to letting the public access these imperfect models? Training? Popups/agreement after every question "I understand this might be BS"? What's the threshold for quality of information where it's no longer considered "broken"? Is that threshold as good as or better than humans/news orgs/doctors/etc?

coffeefirst 2 hours ago | parent | next [-]

Oh I have a plan for this.

Allow it to answer general questions about health, medicine and science.

It can’t practice medicine, it can only be a talking encyclopedia that tells you how the heart works and how certain biomarkers are used. Analyzing your specific case or data is off limits.

And then when the author asks his question, it says it’s not designed to do that.

ytoawwhra92 6 hours ago | parent | prev | next [-]

Why are you assuming that the general public ought to have access to imperfect tools?

I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.

kolinko 2 minutes ago | parent | next [-]

> I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.

You’re saying it like it’s a good thing.

nomel 5 hours ago | parent | prev [-]

> Why are you assuming that the general public ought to have access to imperfect tools?

Could you tell me which source of information do you see as "perfect" (or acceptable) that you see as a good example of a threshold for what you think the public should and should not have access to?

Also, what if a tool still provides value to the user, in some contexts, but not to others, in different contexts (for example, using the tool wrong)?

For the "tool" perspective, I've personal never seen a perfect tool. Do you have an example?

> I live in a place where getting a blood test requires a referral from a doctor, who is also required to discuss the results with you.

I don't see how this is relevant. In the above article, the user went to their doctor for advice and a referral. But, in the US (and, many European countries) blood tests aren't restricted, and can be had from private labs out of pocket, since they're just measurements of things that exist in your blood, and not allowing you to know what's inside of you would be considered government overreach/privacy violation. Medical interpretations/advice from the measurements is what's restricted, in most places.

ytoawwhra92 5 hours ago | parent [-]

> Could you tell me which source of information do you see as "perfect" (or acceptable) that you see as a good example of a threshold for what you think the public should and should not have access to?

I know it when I see it.

> I don't see how this is relevant.

It's relevant because blood testing is an imperfect tool. Laypeople lack the knowledge/experience to identify imperfections and are likely to take results at face value. Like the author of the article did when ChatGPT gave them an F for their cardiac health.

> Medical interpretations/advice from the measurements is what's restricted, in most places.

Do you agree with that restriction?

nomel 4 hours ago | parent [-]

> I know it when I see it.

This isn't a reasonable answer. No action can be taken and no conclusion/thought can be made from it.

> Do you agree with that restriction?

People should be able to perform and be informed about their own blood measurements, and possibly bring something up with their doctors outside of routine exams (which they may not even be insured for in the US). I think the restriction on medical advice/conclusion, that results in treatment, is very good, otherwise you end up with "Wow, look at these results! you'll have to buy my snake oil or you'll die!".

I don't believe in reducing society to a level that completely protects the most stupid of us.

ytoawwhra92 3 hours ago | parent [-]

> This isn't a reasonable answer.

Sure it is. The world runs on human judgement. If you want me to rephrase I could say that the threshold for imperfection should reflect contemporary community standards, but Stewart's words are catchier.

> I think the restriction on medical advice/conclusion, that results in treatment, is very good, otherwise you end up with "Wow, look at these results! you'll have to buy my snake oil or you'll die!".

Some people would describe this as an infringement on their free speech and bodily autonomy.

Which is to say that I think you and I agree that people in general need the government to apply some degree of restriction to medicine, we just disagree about where the line is.

But I think if I asked you to describe to me exactly where the line is you'd ultimately end up at some incarnation of "I know it when I see it".

Which is fine. Even good, I think.

> I don't believe in reducing society to a level that completely protects the most stupid of us.

This seems at odds with what you said above. A non-stupid person would seek multiple consistent opinions before accepting medical treatment, after all.

throwaway290 an hour ago | parent | prev | next [-]

> "ChatGPT is AI and can make mistakes. Check important info."

Is the same thing that can be said about any human

> "Doctor is human and can make mistakes"

Therefore it's really not sufficient: it doesn't make clear that the AI is wrong in different ways than a human, and worse.

zdragnar 7 hours ago | parent | prev [-]

> Popups/agreement after every question "I understand this might be BS"?

Considering the number of people who take LLM responses as authoritative Truth, that wouldn't be the worst thing in the world.

dylan604 7 hours ago | parent | prev | next [-]

What LLM should the LLM turn to ask if what the user is asking is safe for the first LLM to answer?

elzbardico 6 hours ago | parent | prev [-]

Well, what could we expect? It is a fucking Large Language Model. You're feeding it a very long multi-variable time series; it can't make any sense of it, but it is going to generate text anyway.

If you are lucky, maybe it was finetuned to see a long comma-delimited sequence of values as a table and then emit a series of tool calls to generate some deterministic code to calculate a set of descriptive statistics, which will then be close in the latent space to some hopefully current medical literature, and it will generate something that makes sense and is not absurdly wrong.

It is a fucking LLM, it is not 2001's HAL.

seemaze 7 hours ago | parent | prev | next [-]

I can't wait until it starts recommending I sign up for an OpenAI personalized multi-vitamin® subscription.

elzbardico 6 hours ago | parent | prev | next [-]

LLMs are not a mythical universal machine learning model that you can feed any input and have it magically do the same thing a specialized ML model could do.

You can't feed an LLM years of time-series meteorological data and expect it to work as a specialized weather model; you can't feed it years of medical time series and expect it to work as a model specifically trained and validated on that kind of data.

An LLM generates a stream of tokens. You feed it a giant set of CSVs and, if it was not RL'd to do something useful with them, it will just try to make whatever sense of it and generate something that most probably has no strong numerical relationship to your data. It will simulate an analysis, not perform one.

You may have a giant context window, but attention is sparse; the attention mechanism doesn't see your whole data at the same time. It can do some simple comparisons, like figuring out that if I say my current pressure is 210x180 I should call an ER immediately. But once I send it a time series of my twice-a-day blood pressure measurements for the last 10 years, it can't make any real sense of it.

Indeed, it would have been better for the author to ask the LLM to generate a python notebook to do some data analysis on it, and then run the notebook and share the result with the doctor.
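
Something like the following is what that notebook could boil down to - deterministic pandas code instead of a simulated analysis (the file and column names are hypothetical):

  import pandas as pd

  # Ten years of twice-daily readings (file and column names are made up).
  bp = pd.read_csv("blood_pressure.csv", parse_dates=["time"])

  monthly = (bp.set_index("time")[["systolic", "diastolic"]]
               .resample("ME").agg(["mean", "std", "max"]))
  print(monthly.tail(12))                          # recent trend, actual numbers
  print(bp[["systolic", "diastolic"]].describe())  # overall descriptive statistics

  # Flag months whose mean systolic drifted above a threshold worth a doctor visit.
  flagged = monthly[("systolic", "mean")] > 135
  print(monthly.index[flagged].to_list())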

rfw300 6 hours ago | parent | next [-]

This is true as a technical matter, but this isn't a technical blog post! It's a consumer review, and when companies ship consumer products, the people who use them can't be expected to understand failure modes that are not clearly communicated to them. If OpenAI wants regular people to dump their data into ChatGPT for Health, the onus is on them to make it reliable.

themafia 5 hours ago | parent [-]

> the onus is on them to make it reliable.

That is not a plausible outcome given the current technology or of any of OpenAI's demonstrated capabilities.

"If Bob's Hacksaw Surgery Center wants to stay in business they have to stop killing patients!"

Perhaps we should just stop him before it goes too far?

vineyardmike 4 hours ago | parent [-]

> That is not a plausible outcome given the current technology or of any of OpenAI's demonstrated capabilities

OpenAI has said that medical advice was one of the biggest use cases they saw from users. It should be assumed they're investigating how to build out this product capability.

Google has LLMs fine tuned on medical data. I have a friend who works at a top-tier US medical research university, and the university is regularly working with ML research labs to generate doctor-annotated training data. OpenAI absolutely could be involved in creating such a product using this sort of source.

You can feed an LLM text, pictures, videos, audio, etc - why not train a model to accept medical-time-series data as another modality? Obviously this could have a negative performance impact on a coding model, but could potentially be valuable for a consumer-oriented chat bot. Or, of course, they could create a dedicated model and tool-call that model.

elzbardico 3 hours ago | parent [-]

They are going to do the same thing they do with code.

They are going to hire armies of developing world workers to massage those models on post-training to have some acceptable behaviors, and they will create the appropriate agents with the appropriate tools to have something that will simulate the real thing in a most plausible way.

Problem is, RLVR is cheap with code, but it can get very expensive with human physiology.

protocolture 5 hours ago | parent | prev [-]

This LLM is advertising itself in a medical capacity. You aren't wrong, but the customer has been fed the wrong set of expectations. It's the fault of the marketing of the tool.

daft_pink 3 hours ago | parent | prev | next [-]

The problem with AI is that it isn't good at recognizing red flags in data. I used it to find red flags in a financial report, and it finds red flags in virtually every financial report it lays eyes on.

brandonb 8 hours ago | parent | prev | next [-]

We trained a foundation model specifically for wearable data: https://www.empirical.health/blog/wearable-foundation-model-...

The basic idea was to adapt JEPA (Yann LeCun's Joint-Embedding Predictive Architecture) to multivariate time series, in order to learn a latent space of human health from purely unlabeled data. Then, we tested the model using supervised fine-tuning and evaluation on a bunch of downstream tasks, such as predicting a diagnosis of hypertension (~87% accuracy). In theory, this model could also be aligned to the latent space of an LLM--similar to how CLIP aligns a vision model to an LLM.

IMO, this shows that accuracy in consumer health will require specialized models alongside standard LLMs.
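
For the curious, the training objective is simple to sketch: predict the latent embedding of masked patches from the visible context, regressing onto a frozen EMA target encoder. A toy PyTorch version of the general idea (an illustration only, not our production code; shapes and hyperparameters are made up):

  import torch
  import torch.nn as nn

  class PatchEncoder(nn.Module):
      def __init__(self, patch_dim, dim, n_patches):
          super().__init__()
          self.embed = nn.Linear(patch_dim, dim)
          self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
          layer = nn.TransformerEncoderLayer(
              d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)
          self.encoder = nn.TransformerEncoder(layer, num_layers=2)

      def forward(self, x):  # x: (batch, n_patches, patch_dim)
          return self.encoder(self.embed(x) + self.pos)

  patch_dim, dim, n_patches = 8 * 24, 64, 30  # e.g. 8 channels x 24 samples/patch
  context_enc = PatchEncoder(patch_dim, dim, n_patches)
  target_enc = PatchEncoder(patch_dim, dim, n_patches)  # EMA copy, never trained
  target_enc.load_state_dict(context_enc.state_dict())
  for p in target_enc.parameters():
      p.requires_grad = False
  predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
  opt = torch.optim.AdamW(
      list(context_enc.parameters()) + list(predictor.parameters()), lr=1e-4)

  def jepa_step(series):  # series: (batch, n_patches, patch_dim), unlabeled
      mask = torch.rand(series.shape[:2]) < 0.25  # patches to hide and predict
      ctx = series.clone()
      ctx[mask] = 0.0                   # the context encoder can't see them
      z_ctx = context_enc(ctx)          # attention fills masked slots from context
      with torch.no_grad():
          z_tgt = target_enc(series)    # embeddings of the unmasked series
      loss = ((predictor(z_ctx)[mask] - z_tgt[mask]) ** 2).mean()
      opt.zero_grad(); loss.backward(); opt.step()
      with torch.no_grad():             # slow EMA update of the target encoder
          for pt, pc in zip(target_enc.parameters(), context_enc.parameters()):
              pt.mul_(0.996).add_(0.004 * pc)
      return loss.item()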

siliconc0w 5 hours ago | parent | prev | next [-]

The problem is that false positives can be incredibly expensive in money, time, pain, and anxiety. Most people cannot afford (and healthcare system cannot handle) thousands of dollars in tests to disprove every AI hunch. And tests are rarely consequence free. This is effectively a negative externality of these AI health products and society is picking up the tab.

djoldman 3 hours ago | parent | prev | next [-]

I'm less interested in what "grade" the AI gave and much more interested in what therapy or remedy it would have suggested. That's curiously lacking here.

jdub 5 hours ago | parent | prev | next [-]

Why do people even begin to believe that a large language model can usefully understand and interpret health data?

Sure, LLM companies and proponents bear responsibility for the positioning of LLM tools, and particularly their presentation as chat bots.

But from a systems point of view, it's hard to ignore the inequity and inconvenience of the US health system driving people to unrealistic alternatives.

(I wonder if anyone's gathering comparable stats on "Doctor LLM" interactions in different countries... there were some interesting ones that showed how "Doctor Google" was more of a problem in the US than elsewhere.)

stego-tech 5 hours ago | parent | prev | next [-]

This is not remotely surprising.

Look, AI Healthbros, I'll tell you quite clearly what I want from your statistical pattern analyzers, and you don't even have to pay me for the idea (though I wouldn't say no to a home or Enterprise IT gig at your startup):

I want an AI/ML tool to not merely analyze my medical info (ON DEVICE, no cloud sharing kthx), but also extrapolate patterns involving weather, location, screen time, and other "non-health" data.

Do I record taking tylenol when the barometric pressure drops? Start alerting me ahead of time so I can try to avoid a headache.

Does my screen time correlate to immediately decreased sleep scores? Send me a push notification or webhook I can act upon/script off of, like locking me out of my device for the night or dimming my lights.

Am I recording higher-intensity workouts in colder temperatures or inclement weather? Start tracking those metrics and maybe keep better track of balance readings during those events for improved mobility issue detection.

Got an app where I track cannabis use or alcohol consumption? Tie that to my mental health journal or biological readings to identify red flags or concerns about misuse.

Stop trying to replace people like my medical care team, and instead equip them with better insights and datasets they can more quickly act upon. "Subject has been reporting more negative moods in his mental health journal, an uptick in alcohol consumption above his baseline, and inconsistent cannabis use compared to prior patterns" equips the care team with a quick, verifiable blurb from larger datasets that can accelerate care and improve patient outcomes - without the hallucinations of generative AI.
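
Most of what I'm asking for is lag correlations between a health log and a context signal - bog-standard dataframe work. A minimal sketch of the tylenol/pressure one (every file and column name here is invented):

  import pandas as pd

  meds = pd.read_csv("medication_log.csv", parse_dates=["time"])
  wx = pd.read_csv("weather_log.csv", parse_dates=["time"])

  # Hourly series: tylenol doses taken, and barometric pressure.
  tylenol = (meds[meds["drug"] == "tylenol"]
             .set_index("time").resample("1h").size())
  pressure = wx.set_index("time")["pressure_hpa"].resample("1h").mean()

  df = pd.concat({"tylenol": tylenol, "pressure": pressure}, axis=1)
  df["tylenol"] = df["tylenol"].fillna(0)
  df["pressure_drop_6h"] = -df["pressure"].diff(6)  # fall over the last 6 hours

  # Does a pressure drop correlate with tylenol use in the following 12 hours?
  df["tylenol_next_12h"] = df["tylenol"][::-1].rolling(12, min_periods=1).sum()[::-1]
  print(df["pressure_drop_6h"].corr(df["tylenol_next_12h"]))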

evolighting 2 hours ago | parent | prev | next [-]

Health data, medical records, even research data are very scarce in the public domain. This is not just due to so-called privacy concerns, but because such data could have generated "value" (and been sold at a good price) long before the emergence of large language models.

ThundeChile 2 hours ago | parent [-]

I think it's quite alarming that people don't even think about privacy when sending their health data to corporations that make a large percentage of their revenue selling the data onwards (or using it for things you didn't intend).

CqtGLRGcukpy 8 hours ago | parent | prev | next [-]

Original article can be read at https://www.washingtonpost.com/technology/2026/01/26/chatgpt....

Paywall-free version at https://archive.ph/k4Rxt

anonzzzies 7 hours ago | parent | prev | next [-]

The Apple Watch told me, based on VO2 max, that I'm almost dead, all the time. I went to the doctor, did a real test, and it was complete nonsense. I had the watch replaced 3 times with the same results, so I returned it and will not try again. Scaring people with stuff you cannot actually shut off (at least you couldn't before) is not great.

elzbardico 7 hours ago | parent | prev | next [-]

A simple understanding of transformers should be enough to make someone see that using an LLM to analyze multi-variate time series data is a really stupid endeavor.

nprateem 4 hours ago | parent [-]

It should be obvious to even the most dim-witted idiot with a PhD in statistics and AI

elzbardico 3 hours ago | parent [-]

You only need this if you are a researcher. Undergraduate knowledge of Calculus and Linear Algebra is more than enough to have quite a good understanding of ML in general, and LLMs in particular.

Maybe a very small bit of Information Theory (a couple of Shannon's papers are enough) and some classical books on Natural Language Processing from the late 90s and early 2000s, so you have an idea of what language models are outside the modern Deep Learning driven approach.

creatonez 8 hours ago | parent | prev | next [-]

ChatGPT Health is a completely reckless and dangerous product; they should be sued into oblivion for even naming it "health".

orionsbelt 6 hours ago | parent [-]

ChatGPT has done more for my health than any doctor. Truly.

haldujai 5 hours ago | parent [-]

How so?

maxdo 7 hours ago | parent | prev [-]

Typical Western coverage: “How dare they call me unhealthy.” In reality, the doctor said it needs further investigation and that some data isn’t great. They didn’t say “unhealthy”; they said “needs more investigation.” What’s wrong with that? Is the real issue just a bruised Western ego?