endymion-light 18 hours ago

I don't believe the authors of this study are stupid.

If there are apps targeting people with diabetes that claim to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim.

I based my read of the study off the clickbait article they wrote about it - I'll read through the study to see whether they analyse that, but it would be far more effective to see whether the 'carb-counting' AI app returns similar results to the frontier model - that's an interesting result that could actually move the discussion forward.

Aurornis 18 hours ago | parent | next [-]

> If there are apps targeting people with diabetes that claims to count your carbs with AI, why haven't those been analysed? That would be a far more effective claim.

Because the apps aren’t going to let you submit 29,000 automated requests for statistical analysis.

And if you did, the authors of those apps would just release an update saying they changed models and try to dismiss the study.

The vitriol against this article on HN is sad. Commenters who agree with the article and its conclusions are grasping for reasons to be angry about it anyway.

endymion-light 18 hours ago | parent [-]

You can conduct statistical analysis on frontier models and still use commercial applications as a reference for comparison.

Criticism is not vitriol - it's possible to make a wider point about being taken aback that AI literacy is lacking to the point that a critical mass of people are using LLMs for calorie counting; but there are many studies on the psychological effects of LLMs that are far more effective.

But for me - this is like producing a study showing that LLMs are inaccurate at algebra & calculus. That should be common knowledge.

hrimfaxi 18 hours ago | parent | next [-]

It is not uncommon to study things that are considered common knowledge.

YeGoblynQueenne 17 hours ago | parent | prev [-]

Well, for me the comments insisting we don't need to study X because everybody knows LLMs can't do X are a very good justification to study exactly X.

Not to mention that this is now a standard thought-terminating cliché, where someone points out a use case where LLMs don't work at all well and irate responses protest that LLMs aren't meant to be used that way. Says who? If you ask an LLM a question and it answers it - then that's an LLM use case. If you can ask the same question many times and evaluate the results, then that's an evaluation that is perfectly fine to make.

endymion-light 17 hours ago | parent [-]

Yes - my original point is not that it shouldn't be studied, it's that it should be studied more deeply than surface level, which is my impression from what I've read on the linked site.

tsimionescu 16 hours ago | parent | prev [-]

The linked "click bait" article explains this very clearly. It lays out the methodology: they took the prompt sent to an LLM by a popular open-source carb-counting iOS app and sent it, together with five different pictures of food that a typical person might take, to all of the frontier models, and checked the responses. They also explain the purpose: to check the possible accuracy of this approach as taken by a real app that real people use.
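As a rough sketch of what that kind of evaluation boils down to (the numbers below are made-up illustrative values, not data from the study, and `summarise_estimates` is a hypothetical helper): you query the models repeatedly with the same prompt and photo, collect the numeric carb estimates, and summarise the spread against a known ground truth.

```python
import statistics

def summarise_estimates(estimates, ground_truth):
    """Summarise repeated carb estimates (grams) for one food photo.

    Returns the mean estimate, the sample standard deviation (how much
    the model's answers vary between identical queries), and the mean
    absolute error against the weighed ground truth.
    """
    mean = statistics.mean(estimates)
    stdev = statistics.stdev(estimates)
    mae = statistics.mean(abs(e - ground_truth) for e in estimates)
    return mean, stdev, mae

# Hypothetical responses from five identical queries for one photo,
# where the actual weighed carb content was 45 g:
mean, stdev, mae = summarise_estimates([42, 55, 38, 61, 47], 45)
```

The point of repeating the query is that a single answer tells you nothing about consistency: a per-photo standard deviation this large relative to the ground truth is exactly the kind of statistical result the study is after.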

The fact that you somehow perceived this as an attack on LLMs as a technology is a failure entirely on your part. There is nothing in the article that suggests that people shouldn't use LLMs for other purposes - just a statistical verification of the fact that they shouldn't be used for this one particular thing.

endymion-light 12 hours ago | parent [-]

I didn't take anything as an attack on LLMs. I took it as a severe misunderstanding of how the technology works. I specifically outlined that I would like to see the margin of error when testing actual apps that claim to achieve results, rather than tools that make no such claim.

Nothing in my comment perceives anything as an attack on LLMs, which shows a mischaracterisation of my entire point on your part.