But I don't see them using those commercial services in this study - instead, they're using the frontier model companies' own models? Is Gemini advertising that you get a realistic calorie count from a picture? Maybe so - in which case I'd take it back!

notahacker 18 hours ago | parent | next [-]

The commercial services likely also have frontier model dependencies...

The opening of the actual paper is quite explicit that (i) other studies have already tested commercial apps, with unimpressive results, and (ii) a popular open source app for carb counting relies directly on API calls to these frontier models, and this research batch-tested the images using the exact same models and prompts as that app.

azakai 16 hours ago | parent [-]

A carb counting app might use API calls to these frontier models and then do some kind of analysis on top. It could check whether different models, or multiple calls to the same model, agree with each other, and with how much variance.

So it would be more accurate to test the apps rather than the APIs, unless the goal is to warn people who just open ChatGPT and ask there.
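
A minimal sketch of that kind of agreement check, assuming the app already holds several carb estimates in grams for one photo (from repeated calls or from different models). The function name and the 15% threshold are hypothetical and are not taken from the paper or from any app:

```python
from statistics import mean, stdev


def summarize_estimates(estimates_g, max_cv=0.15):
    """Summarize repeated carb estimates (grams) for a single photo.

    max_cv is a hypothetical threshold on the coefficient of variation
    (standard deviation / mean); it is not taken from the paper or any app.
    """
    if len(estimates_g) < 2:
        raise ValueError("need at least two estimates to measure variance")
    avg = mean(estimates_g)
    spread = stdev(estimates_g)  # sample standard deviation
    cv = spread / avg if avg else float("inf")
    return {
        "mean_g": avg,
        "stdev_g": spread,
        "cv": cv,
        "consistent": cv <= max_cv,
    }


# Made-up numbers, for illustration only
print(summarize_estimates([42.0, 45.0, 40.0, 43.0]))
print(summarize_estimates([20.0, 60.0, 35.0, 90.0]))
```

This only measures whether the answers agree with each other, not whether any of them is right.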

notahacker 14 hours ago | parent [-]

The open source app could in theory do that, but the paper's authors would be able to determine whether it did or not by reading its code, which they evidently did in order to replicate its API calls with their own script. (And of course it would also be far more tedious to submit each picture 500 times manually through an app and manually log each response than to use a script designed to collect the data automatically as fast as API rate limits permit.)

coldtea 18 hours ago | parent | prev [-]

Are commercial services anything more than just UI facades on top of frontier model APIs?

endymion-light 18 hours ago | parent [-]

Great point - and I'd love a study to address that. If a study showed that X services sit squarely within the results found here, I think that would be fantastic, enlightening and useful.

swiftcoder 18 hours ago | parent [-]

The app the study is based on is open source, so you can verify for yourself that it does indeed just call a frontier model with the same prompts used in the study.

endymion-light 17 hours ago | parent [-]

That's not really the same thing as what I'm saying - which is to investigate the applications specifically advertising AI calorie counting capabilities.

notahacker 14 hours ago | parent [-]

They investigated an open source application specifically advertising carb counting capabilities, and replicated its prompts and API calls in a way optimised to collect data from 26000 queries (which is a lot to do using a GUI!). They also note that other people have already done [necessarily] smaller scale studies of the commercial AI carb counting apps and been similarly unimpressed by the responses. This is all in the first few paragraphs of a preprint paper describing the research in considerably more detail, which is linked at the bottom of TFA.

Meta: enjoying nearly half this HN thread being arguments that surely people who care about what's in their food don't ask ChatGPT for comment instead of looking it up properly, and most of the rest of it being people who apparently care what's in a research paper asking HN for comment instead of looking it up :)