p-e-w 19 hours ago

Then the correct answer is “I can’t tell.”

Not “Here’s a random guess that I just pulled out of my ass.”

LLMs have picked up, from scientists, the bad habit of trying to give an answer when no answer can be given; scientists overall don’t say “I don’t know” nearly as often as they should.

jeroenhd 19 hours ago | parent | next [-]

I tried asking LLMs about food before. They all say "I can't tell for certain, but this is an estimate based on the ingredients I can spot/infer/guess".

You need to write a fairly specific prompt to get them to drop those warnings.

Of course a lot of people don't know what limitations LLMs have, so there's some value in a blog post about it, but it's not as black-and-white as the article's graphs might suggest.

The prompt (documented here: https://www.diabettech.com/wp-content/uploads/2026/04/Supple...) lists specific instructions and a specific output format that leaves the LLM no room for explanation or warning in the processable data (only in the notes fields). In fact, the prompt explicitly tells the LLM to ignore visual inference for some statistics and to rely on a nutrition authority instead.
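
For illustration, the output format is roughly of this shape (my own paraphrase, not the actual schema from the supplement): everything except the notes field is typed data, so a refusal or caveat has nowhere else to go.

    # Illustrative sketch only; field names are my guess, not the paper's schema.
    MEAL_SCHEMA = {
        "type": "object",
        "properties": {
            "carbs_g": {"type": "number"},      # must be a number; "I can't tell" doesn't validate
            "protein_g": {"type": "number"},
            "fat_g": {"type": "number"},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
            "notes": {"type": "string"},        # the only field where a caveat can appear
        },
        "required": ["carbs_g", "protein_g", "fat_g", "confidence", "notes"],
        "additionalProperties": False,
    }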

Even in that intentionally restricted format, the English-language output uses words like "roughly" and "estimated" in the LLMs I've tested.

Sure, if you take the numeric values and plot them in graphs, you get wildly inconsistent results, but that research method intentionally restricts the usefulness and reliability of the LLMs being researched.

What's much more troubling is this line from the preprint:

> The open-source iAPS automated insulin delivery (AID) system now offers food analysis through APIs from OpenAI, Anthropic and Google [8]

The linked app does seem to have a disclaimer, though:

> "AI nutritional estimates are approximations only. Always consult with your healthcare provider for medical decisions. Verify nutritional information whenever possible. Use at your own risk."

Ukv 19 hours ago | parent | prev | next [-]

> Then the correct answer is “I can’t tell.”

From the paper, they're using structured JSON schema mode as opposed to freeform answers, so it can't say that. Models do typically caveat their answers for questions like this, in my experience.
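
A minimal sketch of what that looks like with the OpenAI Python SDK (illustrative schema, not the paper's; image attachment omitted for brevity): constrained decoding guarantees the output parses against the schema, so "I can't tell" is literally not an available output.

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Estimate the carbs in this meal photo."}],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "meal_estimate",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {"carbs_g": {"type": "number"}},
                    "required": ["carbs_g"],
                    "additionalProperties": False,
                },
            },
        },
    )
    print(resp.choices[0].message.content)  # always valid JSON with a numeric carbs_g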

professoretc 18 hours ago | parent [-]

They'll qualify their answers in English, but as the article mentions, if your prompt asks for a confidence score, that "uncertainty" doesn't translate into a low numerical confidence.

Ukv 17 hours ago | parent [-]

Quantifying their own confidence is also something they're not good at, and the format would prevent them from refusing to do it or prefacing it with a caveat, if that's what you'd want of them. Particularly since the response format seems backwards: it asks for confidence, then the carbs estimate, then observations/notes, rather than letting the model base the carbs estimate on the observations/notes and the confidence estimate on both of those.
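
Roughly what I mean, as two illustrative schemas (not the paper's actual one, and assuming the provider emits keys in the order the schema lists them, which the strict structured-output modes generally do):

    # Generation is autoregressive: later fields can condition on earlier ones.
    BACKWARDS = {  # confidence is emitted before any observations or estimate exist
        "type": "object",
        "properties": {
            "confidence": {"type": "number"},
            "carbs_g": {"type": "number"},
            "notes": {"type": "string"},
        },
        "required": ["confidence", "carbs_g", "notes"],
        "additionalProperties": False,
    }

    REORDERED = {  # observations first, then the estimate, then confidence based on both
        "type": "object",
        "properties": {
            "notes": {"type": "string"},
            "carbs_g": {"type": "number"},
            "confidence": {"type": "number"},
        },
        "required": ["notes", "carbs_g", "confidence"],
        "additionalProperties": False,
    }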

> They'll qualify their answers in English but [...]

The key part, IMO, is that the default user-facing chat, used as a normal user would use it, gives a warning. I don't think the expectation that there's no "wrong way" to use the model can necessarily extend to API usage with a long custom system prompt and a restricted output format.

agentultra 19 hours ago | parent | prev [-]

LLMs had no agency to choose such a course of action.

They’re algorithms, and they were designed this way.