alexa A Survey on Automated Food Monitoring and Dietary Management Systems | Open Access Journals
ISSN: 2157-7420
Journal of Health & Medical Informatics
Like us on:
Make the best use of Scientific Research and information from our 700+ peer reviewed, Open Access Journals that operates with the help of 50,000+ Editorial Board Members and esteemed reviewers and 1000+ Scientific associations in Medical, Clinical, Pharmaceutical, Engineering, Technology and Management Fields.
Meet Inspiring Speakers and Experts at our 3000+ Global Conferenceseries Events with over 600+ Conferences, 1200+ Symposiums and 1200+ Workshops on
Medical, Pharma, Engineering, Science, Technology and Business

A Survey on Automated Food Monitoring and Dietary Management Systems

Bruno Vieira Resende Silva and Juan Cui*

Department of Computer Science and Engineering, University of Nebraska–Lincoln, NE, USA

*Corresponding Author:
Juan Cui
Systems Biology and Biomedical Informatics (SBBI) Laboratory
Department of Computer Science and Engineering
University of Nebraska–Lincoln
NE, 68588, USA
Tel: 4024725023
E-mail: [email protected]

Received Date: June 20, 2017; Accepted Date: July 01, 2017; Published Date: July 15, 2017

Citation: Silva BVR, Cui J (2017) A Survey on Automated Food Monitoring and Dietary Management Systems. J Health Med Informat 8:272. doi: 10.4172/2157-7420.1000272

Copyright: © 2017 Silva BVR, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Visit for more related articles at Journal of Health & Medical Informatics


Healthy diet with balanced nutrition is the key to the prevention of life-threatening diseases such as obesity, cardiovascular disease, and cancer. Recent advances in smartphone and wearable sensor technologies have led to a proliferation of food monitoring applications based on automated food image processing and eating episode detection, with the goal to conquer drawbacks of the traditional manual food journaling that is time consuming, inaccurate, underreporting, and low adherent. In order to provide users feedback with nutritional information accompanied by insightful dietary advice, various techniques in light of the key computational learning principles have been explored. This survey presents a variety of methodologies and resources on this topic, along with unsolved problems, and closes with a perspective and boarder implications of this field.


Food monitoring; Food image classification; Food image dataset; Automatic nutrient assessment; Machine learning


Many people faces challenges to maintain healthy diet and manage their weight these days, while knowing bad eating habits lead to overweight and obesity that increase the risk of heart diseases, hypertension, other metabolic comorbidities such as type 2 diabetes, and cancer [1]. Personal diet management is always warranted in these scenarios, which often involves manual food logging that is time consuming and tedious [2]. By virtues of growth of smartphone use, several mobile applications have been developed to facilitate food journaling, such as MyFitnessPal, LoseIt and Fooducate, and many have demonstrated great potential in effective diet control [3]. For example, a study shows higher user retention with smartphone-based diet logging compared to the websites and paper diary in a period of six months [4]. Teenagers are willing to take food images using a mobile food recorder before eating [5]; and the dietary feedback contributes to weight loss [6]. However, many of these applications require significant manual input from users and suffer from the low performance in assessing the exact ingredients and food portion [7], which has hindered the long-term use from user.

The current consensus objective on this topic is to develop new methods that can automatically identify food items and estimate nutrients based on food images, utilizing cutting-edge techniques in Computer Vision and Machine Learning and ideally being friendly, effort free and accurate for user to keep track of their meals. Along this line of research, several key issues have been raised including the following. First, the food image databases are expected to be comprehensive, containing large number of food classes to cover the food diversity and abundant images per class to reflect the food image discrepancy when training a classification system [8]. Second, reliable food segmentation is highly recommended to identify all possible items in an image and separate them from the background regardless the lighting conditions or if the food are mixed or not [9]. Subsequently, classification will be performed on each segmented item using machine learning models that are trained based on large food datasets. Last, volume and weight estimation can be performed on each identified item, followed by the nutrient assessment [10-12]. The workflow of an automated food monitoring system that connects these components is presented in Figure 1. It is notable that every aforementioned step involves technical challenges, e.g., it is difficult to estimate food volume based on 2-dimentinoal images.


Figure 1: The workflow of an automated food monitoring system that connects various components discussed in the main text.

In addition to the image-based strategy, several wearable devices, such as glasses with load cells [13] or connected to sensors on temporalis muscle and accelerometer [14] and wrist motion track [15], have been explored to detect food intake events automatically. The collected information about eating episodes, pertinent to users’ diet habit pattern, can serve as starting point for food consumption analysis and diet interventions, e.g., providing user recommendations for physical exercise, healthier food, or eating habit [16,17].

In this paper, we review the most relevant applications on automatic food monitoring (till April 2017) that focus on addressing each aforementioned challenge. We specifically introduce current food image databases in section 2 and survey existing methods used for segmentation, feature extraction, classification, and volume and nutrient estimation in section 3. In addition, a few studies on food-monitoring wearable devices and diet invention depicted are in sections 4 and 5, respectively. Finally, we close the review by discussing the remaining challenges and presenting future outlook in this field (Figure 1).

Food Image Databases

A comprehensive collection of quality food images is key to train a food-classification model and benchmark the prediction performance, i.e., a common procedure to verify if a new classifier outperforms previous methods is to compare their classification performance on large food image databases such as Food-101 [18], UEC Food-100 [9], and UEC Food-256 [19]. Current food image datasets vary in many aspects, for examples, type of cuisine, number of food groups, and total images per food class. For instance, Menu-Match dataset [20] contains 41 food classes and a total of 646 images captured in 3 distinct restaurants while PFID [21] has 61 classes with a total of 1098 pictures captured in fast food restaurants and laboratory. Table 1 gives a summary of different databases with their respective features.

Study (Year) Database Image content Total # of class/image Acquisition Reference
Chen et al., 2009 PFID Fast food items from USA 61/1098 Images taken in restaurants and in lab, with white background [21]
Mariappan, 2009 TADA* Common food in USA 256 food+50 replicas Images collected in a controlled environment [26]
Hoashi et al., 2010 Food85* Japanese food 85/8500 Images derived from previous database with 50 Japanese food category and web [25]
Chen, 2012 Chen Chinese food 50/5000 Images downloaded from the web [22]
Matsuda et al., 2012 UEC Food-100 Popular Japanese food 100/9060 Images acquired by digital camera (each photo has a bounding box indicating the location of the food item) [9]
Farinella et al., 2014 Diabetes Selected food 11/4868 Images downloaded from the web [24]
Bossard et al., 2014 Food-101 Popular food in USA 101/101000 Images downloaded from the web [18]
Kawano and Yanai, 2014 UEC Food-256 Popular foods in Japan and other countries 256/31397 Images acquired by digital camera (each photo has a bounding box indicating the location of the food item) [19]
Meyers, 2015 Food201-Segmented* Popular food in USA 201/12625 Images derived from Food 101 dataset; segmented [11]
Beijbom et al., 2015 Menu-Match Food from three restaurants (Asian, Italian, and soup) 41/646 Images taken by authors [20]
Ciocca et al., 2016 UNIMIB2016 Food from dining hall 73/1027 Images acquired by digital camera in dining hall; segmented [8]
Chen and Ngo, 2016 Vireo Chinese dishes 172/110241 Images downloaded from the web [23]

Table 1: Food image databases.

It is noticeable that there is no default food image database for general classification purpose since most databases archive specific food type. For examples, the UNIMIB2016 database [8] has Italian food images from a campus dining hall and the UEC Food-100 [9] consists of items from Chinese culinary. Similarly, Chen [22] and PFID [21] consist of images from traditional Japanese dishes and American fast food, respectively. On the other hand, Food-101 [18] and UEC Food-256 [19] contain a mix of eastern and western food. Except for food type, other image features such as if the picture was obtained in the wild, in a controlled environment, or whether the image is segmented or not has been taken into consideration when developing those databases (Table 1).

Image Based Food Recognition

Food image segmentation

Segmentation is an important process to separate parts of a scene. When dealing with food, the objective is to localize and extract food items from the image [23-26]. It takes place before food classification when authors attempt to identify multiple food items in the image [8,27] or estimate volume [11,12], which often contributes to improved classification accuracy [9,12].

It is challenging to segment food images since they may not present specific attributes such as edges and defined contour [28]. Food items can be on top of each other or being obstructed by another component, making it hidden in the given image [28]. Meanwhile, external factors such as illumination can interfere negatively in this step, where shadows can be identified as part of the food or even a new food item [12,29].

Several methods have been proposed to address the segmentation issue, summarized in Table 2. For examples, one asks user to draw bounding boxes over food items on the smartphone screen, and performs segmentation using GrabCut algorithm over selected areas [27]. Another segments items by integrating four methods to detect candidate region, including the whole image (assuming each image has one food), Deformable Part Model (DPM, a method utilizing sliding windows to detect object regions), circle detector (detecting circular in an image), and JSEG segmentation to segment regions [9]. A similar approach in Ciocca et al. [8] combined different strategies including image saturation, binarization, JSEG segmentation, and morphological operations (noise removal) to segment multiple food items. In addition, the work presented in Yang et al. [28] tries to segment food by its ingredients and their spatial relationship applying Semantic Texton Forest (STF).

Study (Year) Approach Performance Reference
Yang et al., 2010 Semantic Texton Forest calculates the probability for each pixel to belong to one of the food classes. Output from Semantic Texton Forest is far from a precise parsing of an image [28]
Matsuda et al., 2012 Combined techniques: whole image, DPM, circle detector and JSEG segmentation Overall  accuracy to 21% (top 1) and 45% (top 5)* [9]
Kawano and Yanai, 2013 Each food item within user generated bound boxes is segmented by GrabCut algorithm Performance depending on the size of the bounding boxes [27]
Pouladzadeh et al., 2014 Graph cut segmentation algorithm to extract food items and user's finger Overall accuracy of 95% [12]
Shimoda and Yanai, 2015 CNN model searching for food item based on fragmented reference Detects correct bounding boxes around food items with mean average precision of 49.9% when compared to ground truth values [30]
Meyers, 2015 DeepLab model Classification accuracy increases with conditional random fields [11]
Zhu et al., 2015 Multiple segmentations generated for an image and selected by a classifier It outperforms normalized cut [10]
Ciocca et al., 2016 Combines saturazation, binarization, JSEG segmentation and morphological operations Achieves better segmentation than using JSEG-only approach [8]

Table 2: Food segmentation methods.

Of particular interest is that Deep Leaning approach has been used to tackle food segmentation [11,30], although at its early stage. For example, the application named Im2Calories utilized the Convolution Neural Network (CNN) model that provides unary potentials of a conditional random fields and a fully connected graph to perform edgesensitive label smoothing [11], which increased the overall classification accuracy (Table 2).

Feature extraction

Image objects can be recognized based on their characteristics, such as colour, shape and texture [31]. According to Hassannejad et al. [32], selection of relevant features is important when building a recognition model capable of identifying food items. General image features, as mentioned above, may not be descriptive enough to distinguish foods since the properties of the same good may change when the food is prepared in different ways [23]. For example, Penne and Spaghetti have same colour and texture but distinct shape.

In order to extract informative visual information from food image, descriptors such as Local Binary Patterns (LBP), color information, Gabor filter, Scale-Invariant Feature Transform (SIFT) [22], called handcrafted features, can be applied (illustrated in Table 3). Different features and their fusion often result in different classification performance. For instance, when SIFT and LBP were used individually on Chen dataset [22], it achieves accuracy of 53% and 45.9%, respectively; when they were combined with additional colour and Gabor filter, accuracy rises to 68,3%. Based on the same dataset, another study, Menu-Match [20], extracted the SIFT, LBP and colour in different settings, along with HOG and MR8 and obtained the accuracy of 77.4%. It also illustrates how sensitive a classification can be when the same feature is extracted but with different parameters.

Traditional Methods
Study (Year) Approach Database Performance** Reference
Features Classifier Top1 Acc. Top5 Acc.
Chen, 2012 SIFT, LBP, color and gabor Multi-class Adaboost Chen 68.3% 90.9% [22]
Beijbom et al., 2015 SIFT, LBP, color, HOG and MR8 SVM 77.4% 96.2% [20]
Anthimopoulos et al., 2014 SIFT and color Bag of Words and SVM Diabetes 78.0% - [31]
Bossard et al., 2014 SURF and L*a*b color values RFDC Food-101 50.8% - [18]
Hoashi et al., 2010 Bag of features, color, gabor texture and HOG MKL Food85 62.5% - [25]
Beijbom et al., 2015 SIFT, LBP, Color, HOG and MR8 SVM Menu-Match 51.2%* [20]
Christodoulidis et al., 2015 Color and LBP SVM Local dataset 82.2% - [34]
Pouladzadeh et al., 2014 Color, texture, size and shape SVM 92.2% - [12]
Pouladzadeh et al., 2014 Graph Cut, color, texture, size and shape SVM 95.0% - [12]
Kawano and Yanai, 2013 Color and SURF SVM - 81.6% [27]
Farinella et al., 2014 Bag of textons SVM PFID 31.3% - [24]
Yang et al., 2010 Pairwise local features SVM 78.0% - [28]
He et al., 2014 DCD, MDSIFT, SCD, SIFT KNN TADA 64.5% - [29]
Zhu et al., 2015 Color, texture and SIFT KNN 70.0% - [10]
Matsuda et al., 2012 SIFT, HOG, Gabor texture and color MKL-SVM UEC-Food-100 21.0% 45.0% [9]
Liu et al., 2016 Extended HOG and Color Fisher Vector 59.6% 82.9% [36]
Kawano and Yanai, 2014 Color and HOG Fisher Vector 65.3% - [39]
Yanai and Kawano, 2015 Color and HOG Fisher Vector 65.3% 86.7% [35]
Kawano and Yanai, 2014 Fisher Vector, HOG and color One x rest Linear classifier UEC-Food-256 50.1% 74.4% [38]
Yanai et al., 2015 Color and HOG Fisher Vector 52.9% 75.5% [35]
Deep Leaning Methods
Study (Year) Approach Dataset Top1 Top5 Reference
Anthimopoulos et al., 2014 ANNnh Diabetes 75.0% - [31]
Bossard et al., 2014 Food-101 Food-101 56.4% - [18]
Yanai and Kawano, 2015 DCNN-Food 70.4% - [35]
Liu et al., 2016 DeepFood 77.4% 93.7% [36]
Meyers, 2015 GoogleLeNet 79.0% - [11]
Hassannejad et al., 2016 Inception v3 88.3% 96.9% [32]
Meyers, 2015 GoogleLeNet Food201 segmented 76.0% - [11]
Menu-Match 81.4%* -
Christodoulidis et al., 2015 Patch-wise CNN Own database 84.90% - [34]
Pouladzadeh et al., 2016 Graph cut+Deep Neural Network 99.0% - [40]
Kawano and Yanai, 2014 OverFeat+Fisher Vector UEC-Food-100 72.3% 92.0% [39]
Liu et al., 2016 DeepFood 76.3% 94.6% [36]
Yanai and Kawano, 2015 DCNN-Food 78.8% 95.2% [35]
Hassannejad et al., 2016 Inception v3 81.5% 97.3% [32]
Chen and Ngo, 2016 Arch-D 82.1% 97.3% [23]
Liu et al., 2016 DeepFood UEC-Food-256 54.7% 81.5% [36]
Yanai and Kawano, 2015 DCNN-Food 67.6% 89.0% [35]
Hassannejad et al., 2016 Inception v3 76.2% 92.6% [32]
Ciocca et al., 2016 VGG UNIMINB2016 78.3% - [8]
Chen and Ngo, 2016 Arch-D VIREO 82.1% 95.9% [23]

Table 3: Traditional and deep learning classification methods.

Food classification method

Currently, there are two major classification strategies for food image recognition: 1) Traditional machine learning-based approach using handcrafted features and 2) Deep Learning-based approach. The former usually start with a set of visual features extracted from the food image and use them to train a prediction model based on Machine Learning algorithms such as Support Vector Machine [20], Bag of Features [31], or K Nearest Neighbors [8]. In contrast, emerging deep learning architectures have a large number of connected layers that are able to learn features, followed by a final layer responsible for classification [33]. Recent approaches based on Deep Learning become more popular and effective, e.g., the study in Christodoulidis et al. [34] obtained astonishing results in the ImageNet’s Large Scale Visual Recognition Challenge 2012 (ILSVRC2012).

The following example compares a classifier trained with handcrafted features with a deep learning architecture. In Yanai and Kawano [35], color and HOG features are classified using a similar strategy to Bag of Features, called Fisher Vectors, which achieved accuracy of 65.3% on UEC Food-100 [9]. On the same database, the Deep Learning architecture DCNN-FOOD [35] was created and showed an improvement of 13.5% over the handcrafted method. A major advantage of Deep Learning method is that they can learn relevant features from images automatically, which is particularly important in the cases when the pre-defined features are not discriminative enough [32]. More studies based on both methods are shown on Table 3. Clearly, a common issue with most current methods is that the performance was presented mainly based on overall accuracy where the assessment of sensitivity and specificity were missing (Table 3).

Food volume estimation and nutrient analysis

After identifying all food items from an image, it is important to assess the nutrients included, e.g., the carbohydrates, sugar, or total calorie, which will require volume/weight estimation, another challenge. In fact, not even an expert dietitian can estimate the total calories without a precise instrument, e.g., a scale. Taking image-based calorie estimation as an example: first, food candidate regions must be recognized, segmented, and classified correctly [22,36]; the volume from each segmented item will be calculated; and the nutrient can be estimated based on a nutritional facts table [37-39], such as USDA Food Composition Database [40].

The most challenging part is to estimate food’s volume from 2-dimensional image which normally does not have the depth information, unless reference objects are placed next to the meal [8,41]. Volume can be underestimated or overestimated with interference from external factors, such as lighting conditions, blurred images, and noisy background [22], only few strategies were reported for estimation of food volume and calorie intake as currently the major focus in this domain still lies in the food classification (Table 4) [32].

Study (Year) Approach Performance Reference
Noronha et al., 2011 Via crowdsourcing (e.g. users from Amazon Mechanical Turk) Better performance than other commercial app using crowdsourcing but overall it is error prone since users estimate food portion by just looking at the picture [42]
Chen, 2012 Use depth camera to acquire color and depth Preliminary result showing some limitations when estimating quantity of cooked rice and water [22]
Villalobos et al., 2012 Use Top+side view pictures with user's finger as reference Results change due to illumination conditions and image angle; standard error is in an acceptable range [44]
Beijbom et al., 2015 Use menu items from nearby restaurants Food calorie is from pre-defined restaurant’s menu [20]
Meyers, 2015 3D volume estimation by capturing images with a depth camera and reconstructing image using Convolutional Neural Network and RANSAC Using toy food; the CNN volume predictor is accurate for most of the meals; no calorie estimation outside a controlled environment. [11]
Woo et al., 2010 Use a checkerboard as reference for camera calibration and 3D reconstruction Mean volume error of 5.68% on a test of sever food items [43]

Table 4: Methods for food volume and calorie estimation.

As listed in Table 4, crowdsourcing [42] and a depth sensor camera [11,22] have been utilized for food volume estimation and nutrition assessment. Although leading to promising results, these studies were conducted either in a controlled environment or using an extra camera that is not practical in real-world events. In addition, user’s finger was also used as reference while taking a picture from the top and side views of the plate to estimate food volume [12]. The concern here is that multiple food items overlap in the side view, making it hard to distinguish. Similarly, another reference object, a checkerboard, was used to help obtaining depth information alongside camera calibration [10], which also needs users to carry additional equipment in order to estimate food’s volume.

Note that those methods are mostly tied to a controlled environment. For example, it has stated that a broader study outside the laboratory is not feasible because nutrient values vary depending on how the food was prepared and there is not a broad nutritional database for prepared foods yet [11]. On the other hand, it performed volume estimation for only 7 items in Woo et al. [43], while the study only matched classified food to annotated menu items with respective (Table 4) known calories in Beijbom et al. [20].

Wearable-Device Based Food Monitoring

Other than monitoring food intake through image processing, several wearable devices have been developed for auto-detection of eating episodes. For example, a proof of concept called Glassense [13] utilizes a pair of glasses with load cells to detect user’s digestive behaviours through facial signals. Likewise, glasses connected to a sensor placed on the temporalis muscle and an accelerometer was also presented to detect food intake when users are physically active and/ or talking [14]. In addition, a wrist motion tracker was developed to identify eating activities and measure food intake [15].

Although these approaches can detect eating activities with decent resolution, more follow-up research efforts are needed to explore the relationships between eating activities and nutrient intake and calories consumption.

Diet Intervention

Dietary intervention can be realized after the aforementioned diet management systems learn adequate information about the individual’s’ eating habits. Often it requires functionality similar to a diet advisor capable of giving users feedbacks to improve their health, e.g., eat less often or replace A by B in the meal for weight loss [6]. Recent applications are more sophisticated in this regard. For examples, Faiz et al. [16] introduces a Semantic Healthcare Assistant for Diet and Exercise (SHADE) that can identify user habits and generate suggestions not only for diet, but also for exercise for diabetic control.

Similarly, Lee et al. [17] presents a personal food recommendation agent that can creates a meal plan according to a person’s lifestyle and particular health needs towards a certain health goal.

Remaining Challenges

As mentioned above, despite of the advances in food recognition technologies, there are remaining challenges with respect to each analytical step. For example, food image datasets and classification methods are highly related since the former provide training data for the latter. Current image databases tend to grow in number of classes to incorporate different types of food, e.g., what happened to Food201- Segmented [11], Food85 [25], and UEC Food-256 [19]. Meanwhile, classifiers are developed based on new architecture that is capable of identifying new food items. Since the Deep Learning approaches can provide better classification accuracy when trained on larger datasets [33], it is possible also a need to generate more food images from existing datasets by randomly cropping images and apply distortions like brightness, contrast, saturation and hue [32].

Although segmentation of food items has shown significant improvements in Zhu et al. [10], it is still difficult to segment hidden food item and mixed food. Other factors such as lightning can also contribute negatively to segment foods. For example, shadows can be considered as part of food or candidate regions by algorithms. Methods based on manually-selected candidate items can be promising [30], however, the bounding box size may be influential [27].

Nutrient and calorie estimation remains the most challenging problem in automated diet monitoring systems since it is highly dependent on food segmentation and volume estimation [11]. Undoubtedly, calories can be overestimated or underestimated if any of the other steps is erroneous. However, as discussed above, volume estimation based on 2D images are still far from satisfactory even using the effective reference objects such as a checkerboard [43] and finger [44]. Note this problem can be solved by using stereo cameras, as illustrated in im2Calories [11], which requires extra accessories, or using SmartPlate, a device that integrates multiple scales into a dinning plate to weight food items. Obviously, once all those new more functionalities and sensors are embedded in the smartphones; all such complexity [45] of the problem can be alleviated significantly.


In this review, we have surveyed a wide range of strategies in computer vision and artificial intelligence specifically designed for automated food recognition and dietary intervention. Particularly, the entire framework can be broken down into four parts that involve developments of comprehensive food image databases, classifiers capable for food item recognition, and strategies for food volume estimation, nutrient analysis that provide information for diet intervention. Even though improved performance has been demonstrated, challenging issues still remain and desire novel algorithms and techniques. Worth mentioning is the increased appreciation of using Deep Learning models for food image classification, which has outperformed traditional methodologies using handcrafted features on large databases that cover diverse types of food. Increased application of wearable sensor devices, especially those can be integrated into smartphone, will promote this line of research in a revolutionized manner and as a whole the food monitoring system will help generating novel insights in effective health promotion and disease prevention.


Select your language of interest to view the total content in your interested language
Post your comment

Share This Article

Relevant Topics

Recommended Conferences

  • 5th International Conference on Medical Informatics and Telehealth
    August 29-30, 2017 Prague Czech Republic

Article Usage

  • Total views: 143
  • [From(publication date):
    July-2017 - Aug 19, 2017]
  • Breakdown by view type
  • HTML page views : 118
  • PDF downloads :25

Post your comment

captcha   Reload  Can't read the image? click here to refresh

Peer Reviewed Journals
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals
International Conferences 2017-18
Meet Inspiring Speakers and Experts at our 3000+ Global Annual Meetings

Contact Us

© 2008-2017 OMICS International - Open Access Publisher. Best viewed in Mozilla Firefox | Google Chrome | Above IE 7.0 version