kalleboo 18 hours ago

> But the author just took pictures of food & expected a realistic response?

There are very popular apps on the App Store right now that are going viral among non-techie people that do exactly this, and they have no concept of how AI works. My wife was talking about one and I had to give her a reality check that the AI had no idea what ingredients were used to make the food. And she's a licensed nutritionalist.

Studies like this create something to point at for people who are confused and serve as a springboard for a conversation in the media.

whazor an hour ago | parent | next [-]

The real benchmark would be comparing the model's estimates with a human guess. And as far as I know, with diabetes, if you're within 30% when guessing carbs, you should be fine.
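A toy illustration of the 30% rule of thumb cited above (the threshold comes from the comment; the function name and gram values are hypothetical):

```python
def within_carb_tolerance(estimate_g: float, actual_g: float,
                          tolerance: float = 0.30) -> bool:
    """True if a carb estimate is within `tolerance` (relative) of the actual amount."""
    if actual_g == 0:
        return estimate_g == 0
    return abs(estimate_g - actual_g) / actual_g <= tolerance

# Guessing 40 g for a 50 g-carb meal is 20% off, so it passes;
# guessing 70 g is 40% off, so it fails.
```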

endymion-light 18 hours ago | parent | prev | next [-]

That's true - I suppose I'm just disappointed that this study doesn't seem to include those apps in any analysis. Being able to point out that the top 100 calorie-counting apps on the App Store return similar results to simple frontier models would be of interest.

I think I'm just disappointed that this study doesn't go deep enough, and stays at a surface-level statistical analysis of frontier models.

dpark 15 hours ago | parent [-]

I think it’s a very useful study specifically to debunk the apps that support this flow.

None of those apps have magic. They cannot do better than the frontier models.

asdfasgasdgasdg 17 hours ago | parent | prev | next [-]

To be fair, these kinds of apps also existed before LLMs. They just used OpenCV or similar instead of the LLM APIs.

inerte 16 hours ago | parent | prev | next [-]

To be fair, my expectation is that those apps have done the prompt engineering, and schema, and tools (to query a nutrition database), etc... and although they're not 100% consistent, the margin of error should be narrow enough that it barely matters, and they should do a bit better than a random ChatGPT chat session.

Centigonal 15 hours ago | parent [-]

The problem isn't one that can be solved with prompts. If I gave a panel of food and nutrition experts (human or machine) a bunch of pictures of food, they still wouldn't be able to tell if, e.g., a slice of cake was made with whole milk or skim.

The "pic of packaged food --> LLM --> nutrition DB call" pipeline is workable, but many users of these apps are using them for fresh prepared foods, which is just an unworkable problem without either an understanding of the preparation process or a bomb calorimeter.
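A minimal sketch of the packaged-food pipeline described above. The names here (`identify_food`, `NUTRITION_DB`, `estimate_nutrition`) are hypothetical stand-ins for the vision-model call and nutrition database the comment assumes; nothing in the thread specifies an actual API:

```python
# Toy stand-in for a real nutrition database keyed by food label.
NUTRITION_DB = {
    "granola bar": {"calories": 190, "carbs_g": 29},
}

def identify_food(image_bytes: bytes) -> str:
    # In a real app this would send the image to a vision LLM and return
    # its label; hard-coded here so the sketch is runnable.
    return "granola bar"

def estimate_nutrition(image_bytes: bytes) -> dict:
    """Photo -> label -> nutrition DB lookup."""
    label = identify_food(image_bytes)
    # This is where the approach breaks for fresh prepared food: a label
    # like "cake slice" says nothing about ingredients (whole vs. skim milk).
    return NUTRITION_DB.get(label, {})
```

The lookup step works when the label maps to a fixed product with a known nutrition panel; for home-cooked food the label underdetermines the ingredients, which is the unworkable part.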

xnx 15 hours ago | parent | prev | next [-]

Even simpler examples make the limitations obvious: from an image alone you can't distinguish Diet Coke from regular Coke.

senordevnyc 17 hours ago | parent | prev [-]

licensed nutritionalist

Nutritionist?

kalleboo 6 hours ago | parent | next [-]

Haha oops. English is hard...

Insanity 17 hours ago | parent | prev [-]

[flagged]

busssard 16 hours ago | parent [-]

[flagged]