Using AI to track calories with pictures is convenient, but not accurate.
Overview
- What did they test? In this study, researchers tested the accuracy of three leading large language models (LLMs), ChatGPT, Claude, and Gemini, for estimating the weight, energy content, and macronutrient composition of 52 unique foods.
- What did they find? ChatGPT and Claude averaged just under 40% error for weight and energy estimation, while Gemini averaged 65-70% error on the same metrics. Error in macronutrient estimation ranged from 42% to 110%, with all models exceeding 60% error for protein. For all models, accuracy was worse with larger portion sizes, and there were several instances where foods were completely misidentified.
- What does it mean for you? Accurate assessment of nutritional intake can be very helpful for managing weight and body composition. However, current methods of nutritional assessment can be time-consuming and difficult, and there is a need for less burdensome ways to assess nutrition intake.
With the rise of AI, there is increased potential for using LLMs to estimate dietary intake from pictures. The findings of this study suggest that, while there is promise for future improvement, these LLMs did not assess nutrition intake accurately.
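To make the percentages above concrete, here is a minimal sketch of how an average percent error could be computed. It assumes the metric is mean absolute percentage error (the absolute difference between the estimate and the measured value, divided by the measured value, then averaged across foods), which this summary does not confirm. The function name and all numbers are hypothetical and are not data from the study.

```python
def mean_absolute_percentage_error(measured, estimated):
    """Average of |estimate - measured| / measured, expressed as a percentage.

    Assumption: this is one common way to define "average error";
    the study may have used a different formula.
    """
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and the same length")
    errors = [abs(e - m) / m * 100 for m, e in zip(measured, estimated)]
    return sum(errors) / len(errors)


# Hypothetical energy values (kcal) for three foods, NOT study data.
measured_kcal = [200.0, 500.0, 800.0]
estimated_kcal = [150.0, 400.0, 400.0]

mape = mean_absolute_percentage_error(measured_kcal, estimated_kcal)
print(f"Average error: {mape:.1f}%")
```

In this made-up example the largest portion has the largest miss, which pulls the average up. That is the same pattern the study reports for larger portion sizes.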
What’s the Problem?
Assessing nutrition intake accurately is very helpful for making informed dietary changes and for conducting research on the relationship between dietary habits and health outcomes. Unfortunately, accurate nutrition assessment can be time-consuming and difficult, and more convenient methods such as 3-day recall and food frequency questionnaires are not very accurate 1.
Recent advancements in machine learning offer a potential alternative to manual dietary assessment 2. The most popular large language models (LLMs), ChatGPT, Claude, and Gemini, can estimate nutritional intake from a picture, which is much more convenient than traditional manual input methods. However, there are several limitations to using LLMs, and it is unclear how accurate the data generated by these models can be.

Purpose
Artificial intelligence is becoming more popular as a tool for assessing nutrition intake, but the accuracy of LLMs for this task is unclear. Therefore, the purpose of this study was to assess the accuracy of the most popular LLMs, ChatGPT, Claude, and Gemini, for predicting the portion size, energy, and macronutrient composition of 52 different foods from pictures.
Hypothesis
The authors did not present a hypothesis.