| ▲ | Aurornis 18 hours ago |
| > But the author just took pictures of food & expected a realistic response? Is this genuinely what amounts to a study in AI?

The article explains this: there are apps targeting people with diabetes that claim to count your carbs with AI.

> If you’re using AI carb counting in a diabetes app

Before you dismiss a study, try to understand where it’s coming from. The authors of the study weren’t stupid. They knew the LLMs would provide poor results. They ran the study to quantify it and create a resource to spread the information in response to the rise of AI carb counting apps. |
|
| ▲ | ijk 17 hours ago | parent | next [-] |
| > The authors of the study weren’t stupid. They knew the LLMs would provide poor results. They ran the study to quantify it and create a resource to spread the information in response to the rise of AI carb counting apps. Yeah. I think it is under-appreciated that much of science is intended for debugging purposes. Sure, you and I know that X is positive, but what's it actual value? Can we find the causes that make it that way? Et cetera. |
|
| ▲ | endymion-light 18 hours ago | parent | prev | next [-] |
| I don't believe the authors of this study are stupid. If there are apps targeting people with diabetes that claim to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim. My comment was based on the clickbait article they wrote about the study - I'll read through the study to see whether they analyse that, but it would be far more effective to see whether a 'carb-counting' AI app returns similar results to the frontier models - that's an interesting result that can actually move the discussion forward. |
| |
| ▲ | Aurornis 18 hours ago | parent | next [-] | | > If there are apps targeting people with diabetes that claims to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim. Because the apps aren’t going to let you submit 29,000 automated requests for statistical analysis. And if you did, the authors of those apps would just release an update saying they changed models and try to dismiss the study. The vitriol against this article on HN is sad. Commenters who agree with the article and its conclusions are grasping for reasons to be angry about it anyway. | | |
| ▲ | endymion-light 18 hours ago | parent [-] | | You can perform statistical analysis on frontier models and still use commercial applications as an identifier & comparison. Criticism is not vitriol - it's possible to make a wider point about being taken aback by the lack of education around AI, to the point that there's a critical mass of people using LLMs for calorie counting; but there are many studies on the effects of LLMs on psychology etc. that are far more effective. For me, though, this is like creating a study showing that performing algebra & calculus with LLMs is inaccurate. That should be common knowledge. | | |
| ▲ | hrimfaxi 18 hours ago | parent | next [-] | | It is not uncommon to study things that are considered common knowledge. | |
| ▲ | YeGoblynQueenne 17 hours ago | parent | prev [-] | | Well, for me, the comments insisting we don't need to study X because everybody knows LLMs can't do it are a very good justification to study exactly X. Not to mention that this is now a standard thought-terminating cliché: someone points out a use case where LLMs don't work at all well, and irate responses protest that LLMs aren't meant to be used that way. Says who? If you ask an LLM a question and it answers it, then that's an LLM use case. If you can ask the same question many times and evaluate the results, then that's an evaluation that is perfectly fine to make. | | |
| ▲ | endymion-light 17 hours ago | parent [-] | | Yes - my original point was not that it shouldn't be studied, but that it should be studied more deeply than just at surface level, which is my impression from what I've read on the linked site. |
|
|
| |
| ▲ | tsimionescu 16 hours ago | parent | prev [-] | | The linked "clickbait" article explains this very clearly as well. It clearly explains the methodology: they took the prompt sent to an LLM by a popular open source carb counting iOS app and sent it, together with five different pictures of food that a typical person might take, to all of the frontier models, and checked the responses. They also explain the purpose: to check the possible accuracy of this approach taken by a real app that real people use. The fact that you somehow perceived this as an attack on LLMs as a technology is a failure entirely on your part. There is nothing in the article that suggests that people shouldn't use LLMs for other purposes - just a statistical verification of the fact that they shouldn't be used for this one particular thing. | | |
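[For context, the error analysis that kind of methodology produces (repeated model estimates per food photo, compared against a known carb content) can be sketched roughly like this. This is not the study's actual code; the meal values, the estimates, and the ±20% tolerance are all hypothetical numbers chosen for illustration.]

```python
# Hedged sketch, not the study's code: given a known ground-truth carb
# count for one meal and several repeated model estimates for the same
# photo, compute per-meal error statistics of the sort such a study
# would aggregate.
from statistics import mean

def carb_error_stats(truth_g, estimates_g, tol=0.20):
    """Return (mean absolute percentage error, fraction of estimates
    within +/- tol of the truth) for one meal. tol=0.20 is a made-up
    tolerance, not a clinical threshold."""
    abs_pct = [abs(e - truth_g) / truth_g for e in estimates_g]
    within = sum(1 for p in abs_pct if p <= tol) / len(abs_pct)
    return mean(abs_pct), within

# Hypothetical example: a meal with 60 g of carbs, five model estimates.
mape, hit_rate = carb_error_stats(60, [45, 72, 50, 90, 61])
```

Run per meal and aggregated across many photos and models, numbers like these are what lets a study quantify "poor results" instead of just asserting them.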
| ▲ | endymion-light 12 hours ago | parent [-] | | I didn't take anything as an attack on LLMs. I took it as a severe misunderstanding of how the technology works. I specifically outlined that I would like to see the margin of error even when testing the actual apps that claim to achieve results, rather than tools that make no such claim. Nothing in my comment treats this as an attack on LLMs, which shows a mischaracterisation of my entire point on your part. |
|
|
|
| ▲ | ilivethere 18 hours ago | parent | prev [-] |
| Typical case of the "curse of knowledge". We deal with AI on a daily basis at the technical level, so it's very easy to forget that "common" folk really still believe that AI can replace dieticians, gym coaches, etc. |