Validation of the Food Frequency Questionnaire Used to Assess the Association between Dietary Habits and Cardiovascular Risk Factors in the NESCAV Study

Background: In epidemiological studies, validation of dietary assessment instruments is important to avoid biased associations with outcome measures. Objective: Our objective was to assess the validity of the 134-item food frequency questionnaire (FFQ) used in the Nutrition, Environment and Cardiovascular Health (NESCAV) study. Methods: The FFQ was validated against a 3-day dietary record (DR) in a sample of 29 women. The intraclass correlation coefficient (ICC) and Bland-Altman plots were used to assess absolute agreement, whereas relative agreement was appraised by Spearman's correlation coefficient and the weighted Cohen's kappa coefficient, based on cross-classification of nutrient intakes into 3 categories. Results: The two methods differed significantly for the majority of micronutrients, with the FFQ yielding higher intakes than the DR. The bias between the two methods was nonetheless acceptable, with an average overestimation by the FFQ of 11% for macronutrients and 29% for micronutrients. Regarding precision, results differed by 48% for micronutrients and 50% for macronutrients. Correlations between the two methods for energy-adjusted data were satisfactory, with an average correlation of 0.47 and 16/25 coefficients above 0.40. Only vitamin A and cholesterol showed poor correlations, of 0.02 and 0.05, respectively. On average, the correct classification rate into 3 categories was 50.3% and 19/25 weighted kappa coefficients were above 0.20. Poor agreement was found for protein, cholesterol, starch, and vitamins A, B12 and E, with weighted kappa coefficients below 0.20. Conclusion: Although absolute values of dietary intake were not always accurate, the relationship and agreement between the FFQ and the DR may be considered satisfactory. In particular, the FFQ was able to categorize subjects into 3 broad categories of intake for most nutrients. However, results for protein, cholesterol, starch, and vitamins A, E and B12 should be interpreted with caution.


Introduction
Unhealthy dietary habits are associated with chronic diseases such as cardiovascular disease and cancer [1,2]. A diet rich in energy, total fat, saturated fat and cholesterol but relatively low in unsaturated fats, fruits and vegetables has been linked to the development of cardiovascular risk factors [3]. However, further research is needed to better understand the effect of nutrition on cardiometabolic risk factors such as hypertension, hypercholesterolemia, diabetes and obesity [4]. In this context, the interregional NESCAV (Nutrition, Environment and Cardiovascular Health) study aimed to assess the dietary habits of the population of the Greater Region (Luxembourg, Wallonia in Belgium and Lorraine in France) and to explore the relationship between diet and cardiovascular risk factors (CVRF) [5].
In nutritional epidemiological studies, measuring diet is a difficult challenge because of the complex nature of the diet itself [6]. Several dietary assessment methods are available, e.g. the dietary record (DR), the 24-hour dietary recall, the food frequency questionnaire (FFQ) and the diet history. However, none of these methods can measure dietary intake without error [7]. The FFQ is one of the most common tools used to study the relation between diet and disease, because it is easy to administer and inexpensive in studies with large sample sizes [8]. Therefore, for the NESCAV study, we used a modified FFQ previously designed to assess dietary habits in Quebec [9].
Research on diet-disease relationships requires accurate collection of dietary intake data and accurate estimation of nutrient intake. Data accuracy depends on the precision and completeness of data collection, the use of a representative and comprehensive food composition database, and consistency and precision during data entry. Because errors at any of these stages can bias relative risk estimates, it is paramount to estimate the validity of the instrument used to assess dietary intake.
Validity studies are used to determine the degree of measurement error and yield information about how well an instrument measures what it is intended to measure [10]. Assessing the true validity of an FFQ would require measuring, with high accuracy, the usual self-selected diet of free-living individuals over several months, which is not feasible. Therefore, researchers assess relative validity by comparing the FFQ with alternative dietary assessment methods considered to be more valid. Since repeated dietary records are judged to be superior to the FFQ, we chose them as the reference method [11][12][13].
In epidemiological studies, the effect of diet on a health outcome is most frequently quantified as an odds ratio or relative risk. FFQs must therefore be able to rank individuals along the distribution of intake, so that individuals with low intakes can be separated from those with high intakes, thereby providing accurate risk estimates [10]. This ranking capacity is assessed through relative agreement. Additionally, as the NESCAV study also aims to assess the population's compliance with dietary guidelines, it is important that our modified FFQ measures dietary intakes accurately, which is assessed through absolute agreement. The objective of the present work was to validate the FFQ used in the NESCAV study by assessing both the relative and the absolute agreement of the FFQ with a 3-day diet record (DR).

Description of the NESCAV study
The NESCAV study is a cross-sectional study aiming to assess the prevalence of cardiovascular risk factors in the population of the Greater Region (Grand-Duchy of Luxembourg, Wallonia in Belgium, and Lorraine in France). In a representative sample of 3000 randomly selected individuals living in the Greater Region, its objectives are to assess 1) cardiovascular health and the cardiovascular risk profile, 2) the association between dietary habits and cardiovascular risk, 3) the association of occupational and environmental pollution markers with cardiovascular risk, and 4) the knowledge, awareness and level of control of cardiovascular risk factors.

Validation study population
Twenty-nine female workers were recruited from the University of Liege (Wallonia) to take part in the validation study.

FFQ (Tested method)
In the NESCAV study, dietary habits were assessed using a semiquantitative FFQ. The concept and rationale for the major food groups were developed based on a validated Canadian FFQ composed of 73 food items, designed to capture food consumption among adults living in Quebec [9]. Its reliability and its accuracy against four food records were examined in a validation study, which suggested that the original FFQ was a relatively valid instrument for determining the usual diet of Quebec adults [9]. Our FFQ was adapted to the cultural and linguistic particularities of the studied population, in order to assess the energy and nutrient intakes of subjects from different cultural backgrounds. For this purpose, considerable effort was made to extend the list of items and integrate new foods covering the diversity of dietary habits in the Greater Region.
The final version of the FFQ assesses dietary intake by asking participants to report the frequency of consumption and the portion size for approximately 134 item lines over the last three months. Items consist of foods or beverages categorized into 9 major food groups: starchy foods, fruits, cooked and raw vegetables, meat-poultry-fish-eggs, prepared dishes, dairy products, fats, drinks (alcoholic and non-alcoholic), and miscellaneous. Participants reported the frequency of consumption of each item on a 6-level scale: rarely or never; one to three times a month; one to two times a week; three to five times a week; once a day; twice or more a day. Standard serving sizes and food models from a photographic manual validated in the French 'SUpplementation en VItamines et Mineraux AntioXydants' (SU.VI.MAX) study [14] are provided as a reference to help participants estimate portion sizes. Daily food consumption (g/d) was calculated by multiplying the frequency of consumption of each food item by the chosen portion size. Food intakes (g/d) were then converted into daily nutrient intakes using the SU.VI.MAX Food Composition Database [15]. For a given nutrient, intakes from all food items were summed to obtain each individual's total nutrient intake. Computed vitamin intakes reflect food sources only (dietary supplements were not included).
The accessibility and readability of the FFQ were assessed in a pre-test phase on a multicultural group of subjects. Given the multilingual nature of the population residing in Luxembourg, the FFQ was translated from French into the three most commonly used languages, namely German, English and Portuguese, and then back-translated into French to ensure linguistic validity [16].
The FFQ was self-administered with the help of trained research nurses. At the interview, the staff gave detailed instructions on how to fill in the FFQ, helped participants individually to complete the dietary information, and then checked the questionnaire for correctness and completeness.
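The conversion from questionnaire answers to nutrient intakes described above can be sketched in Python as follows. This is a minimal illustration, not the NESCAV processing code: the mapping of the six frequency levels to servings per day (`FREQ_PER_DAY`) and the two-food composition table (`COMPOSITION_PER_100G`) are hypothetical placeholders, not values taken from the SU.VI.MAX database.

```python
# Hypothetical conversion of the 6 FFQ frequency levels to servings per day.
# These midpoint values are illustrative assumptions, not the NESCAV coding.
FREQ_PER_DAY = {
    "rarely_or_never": 0.0,
    "1-3_per_month": 2 / 30,
    "1-2_per_week": 1.5 / 7,
    "3-5_per_week": 4 / 7,
    "1_per_day": 1.0,
    "2+_per_day": 2.0,
}

# Hypothetical composition table: nutrient content per 100 g of food.
COMPOSITION_PER_100G = {
    "apple": {"energy_kcal": 52.0, "vitamin_c_mg": 4.6},
    "whole_milk": {"energy_kcal": 64.0, "vitamin_c_mg": 1.0},
}


def grams_per_day(frequency: str, portion_g: float) -> float:
    """Daily food intake (g/d) = servings per day x chosen portion size (g)."""
    return FREQ_PER_DAY[frequency] * portion_g


def nutrient_totals(answers):
    """Sum nutrient intakes over all FFQ items for one participant.

    answers: list of (food, frequency_level, portion_g) tuples.
    """
    totals = {}
    for food, freq, portion in answers:
        g_day = grams_per_day(freq, portion)
        for nutrient, per_100g in COMPOSITION_PER_100G[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + g_day * per_100g / 100
    return totals


answers = [("apple", "1_per_day", 150.0), ("whole_milk", "3-5_per_week", 250.0)]
print(nutrient_totals(answers))
```

In a real pipeline, the composition table would be the full food composition database and each FFQ item line would map to one or more of its foods.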

3-day diet records (DR) (Reference method)
Participants were asked to take home and complete an open-ended 3-day DR. The diary booklet contained instructions and pages to record foods eaten during seven time periods (before breakfast, breakfast, mid-morning, lunch, tea, evening meal, late evening) on each of the 3 days. For each participant, two weekdays and one weekend day were chosen at random. The mean daily intake over the 3 recorded days was used as the DR estimate.
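The day-selection and averaging protocol can be sketched as below, assuming a simple uniform random draw (the randomization procedure actually used is not described). `draw_record_days` and `mean_daily_intake` are hypothetical helper names.

```python
import random
from statistics import mean

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
WEEKEND = ["Sat", "Sun"]


def draw_record_days(rng: random.Random) -> list:
    """Two distinct weekdays plus one weekend day, drawn uniformly at random."""
    return rng.sample(WEEKDAYS, 2) + [rng.choice(WEEKEND)]


def mean_daily_intake(daily_values: list) -> float:
    """DR estimate for one nutrient = mean over the three recorded days."""
    return mean(daily_values)


rng = random.Random(42)  # fixed seed so the draw is reproducible
print(draw_record_days(rng), mean_daily_intake([1850.0, 2010.0, 2290.0]))
```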

Statistical analysis
Absolute agreement: Summary statistics were calculated for unadjusted data and expressed as median and interquartile range (IQR). Nutrient intakes estimated from the FFQ were compared with those derived from the DR using the Wilcoxon signed-rank test. Bland-Altman plots [17], which plot the difference between the results of the two methods against their mean, were used to assess agreement between the FFQ and the DR over the entire range of intake levels. Spearman's correlation coefficient between the between-method difference and the intake level was calculated to test for a potential relationship between them (heteroscedasticity). In the absence of a significant correlation, the mean bias (mean difference) and the 95% limits of agreement (LOA) were calculated as the mean ± 2 standard deviations (SD) of the between-method differences. Computations were also performed on log-transformed data and the antilogs were then taken, yielding limits for the FFQ/DR ratio. These ratios were expressed as percentages, with 100% representing perfect agreement. Because positive and negative differences can cancel out in the bias, the precision (mean of the absolute differences) was also presented.
Relative agreement: Since one of the major goals of the NESCAV study was to use the FFQ to assess the association between the nutrient composition of the diet (rather than absolute individual nutrient intakes) and cardiovascular risk factors, all nutrients were energy-adjusted using the regression residual method of Willett and Stampfer [18]. Energy-adjusted nutrient intakes are the residuals from a regression with energy intake as the independent variable and nutrient intake as the dependent variable. The residuals are then added to the expected nutrient intake at the mean energy intake of the sample, giving a value adjusted to the average energy intake. The agreement between energy-adjusted daily intakes from the DR and the FFQ was first measured by Spearman's correlation coefficient for all nutrients, with values > 0.40 regarded as acceptable [19]. Below 0.40, attenuation becomes so severe that it is difficult to detect associations between diet and disease [7].
For each nutrient, the distributions of FFQ and DR results were divided into 3 categories of equal frequency by means of tertiles (low, medium and high intake). Individual results were then cross-classified into FFQ and DR categories, and the FFQ correct classification rate was taken as a measure of its ranking ability. The proportion of subjects classified into opposite categories by the FFQ and the DR was also computed, giving an estimate of gross misclassification. Agreement between the two 3-category scales was also measured by the weighted Cohen's kappa coefficient (κ), with weighting factors of 1 for complete agreement (same category), 0.5 for disagreement by one category (adjacent categories) and 0 for complete disagreement (opposite categories). Kappa values were interpreted as follows: >0.80 very good agreement, 0.61-0.80 good agreement, 0.41-0.60 moderate agreement, 0.21-0.40 fair agreement, and <0.20 poor agreement [20].
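The absolute-agreement computations can be sketched with the Python standard library. This is an illustrative reconstruction under stated assumptions, not the SAS code used in the study: the heteroscedasticity check returns only Spearman's rho (a p-value would come from, e.g., `scipy.stats.spearmanr`), the limits use mean ± 2 SD as in the text, and "precision" is assumed here to be the antilog of the mean absolute log difference, one way of expressing absolute differences as a percentage ratio.

```python
import math
from statistics import mean, stdev


def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(ffq, dr):
    """Bias, limits of agreement and FFQ/DR ratios for paired intakes."""
    diffs = [f - d for f, d in zip(ffq, dr)]
    means = [(f + d) / 2 for f, d in zip(ffq, dr)]
    # Heteroscedasticity check: does the difference vary with intake level?
    rho = spearman(means, diffs)
    bias, sd = mean(diffs), stdev(diffs)
    # Same computations on the log scale; antilogs give FFQ/DR ratios.
    log_diffs = [math.log(f) - math.log(d) for f, d in zip(ffq, dr)]
    log_bias, log_sd = mean(log_diffs), stdev(log_diffs)
    # Assumed definition of precision: antilog of the mean |log difference|.
    log_abs = mean([abs(v) for v in log_diffs])
    return {
        "rho_diff_vs_mean": rho,
        "bias": bias,
        "loa": (bias - 2 * sd, bias + 2 * sd),
        "ratio_pct": 100 * math.exp(log_bias),
        "ratio_loa_pct": (100 * math.exp(log_bias - 2 * log_sd),
                          100 * math.exp(log_bias + 2 * log_sd)),
        "precision_pct": 100 * math.exp(log_abs),
    }


# Hypothetical paired energy intakes (kcal/d) for five participants.
ffq = [2100.0, 1850.0, 2400.0, 1990.0, 2250.0]
dr = [1900.0, 1800.0, 2000.0, 2050.0, 1950.0]
print(bland_altman(ffq, dr))
```

Because absolute log differences cannot cancel out, `precision_pct` is always at least as far from 100% as `ratio_pct`, which mirrors the bias-versus-precision contrast reported in the results.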
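The residual method of energy adjustment described above can be sketched as follows. It is a minimal illustration with hypothetical data; it assumes, as is usual, that FFQ nutrients are adjusted for FFQ energy and DR nutrients for DR energy, after which the adjusted FFQ and DR values would be compared with Spearman's rho.

```python
from statistics import mean


def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def energy_adjust(nutrient, energy):
    """Residual method: residual of the nutrient-on-energy regression plus
    the nutrient intake predicted at the sample's mean energy intake."""
    intercept, slope = ols_fit(energy, nutrient)
    expected_at_mean_energy = intercept + slope * mean(energy)
    return [n - (intercept + slope * e) + expected_at_mean_energy
            for n, e in zip(nutrient, energy)]


# Hypothetical FFQ data: energy (kcal/d) and fat (g/d) for four participants.
energy = [1800.0, 2000.0, 2200.0, 2400.0]
nutrient = [60.0, 70.0, 65.0, 90.0]
print(energy_adjust(nutrient, energy))
```

With ordinary least squares, the prediction at the mean energy intake equals the mean nutrient intake, so the adjusted values keep the original mean while being uncorrelated with energy intake.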
95% confidence intervals (95% CI) of correlation and kappa coefficients were also computed.
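The tertile cross-classification, gross misclassification rate and weighted kappa described in the relative-agreement paragraph can be sketched as follows. This is an illustration with hypothetical data. Tertiles are assigned by rank, with ties broken by input order (an assumption), and the agreement weights are 1, 0.5 and 0 as stated in the text. Confidence intervals are not computed here because the method used for them is not specified.

```python
def tertiles(values):
    """Assign 0 (low), 1 (medium) or 2 (high) by rank; ties broken by order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cats = [0] * len(values)
    for pos, idx in enumerate(order):
        cats[idx] = pos * 3 // len(values)
    return cats


def cross_classify(ffq, dr):
    """3x3 table, correct and gross misclassification rates, weighted kappa."""
    a, b = tertiles(ffq), tertiles(dr)
    n = len(a)
    table = [[0] * 3 for _ in range(3)]
    for i, j in zip(a, b):
        table[i][j] += 1  # rows: FFQ tertile, columns: DR tertile
    correct = sum(table[k][k] for k in range(3)) / n
    gross = (table[0][2] + table[2][0]) / n  # opposite tertiles
    # Agreement weights: 1 same, 0.5 adjacent, 0 opposite category.
    w = [[1 - abs(i - j) / 2 for j in range(3)] for i in range(3)]
    row = [sum(table[i]) / n for i in range(3)]
    col = [sum(table[i][j] for i in range(3)) / n for j in range(3)]
    p_obs = sum(w[i][j] * table[i][j] / n for i in range(3) for j in range(3))
    p_exp = sum(w[i][j] * row[i] * col[j] for i in range(3) for j in range(3))
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return table, correct, gross, kappa


# Hypothetical energy-adjusted intakes for nine participants.
ffq = [12.1, 8.4, 15.0, 9.9, 11.2, 7.5, 13.8, 10.4, 9.1]
dr = [11.5, 9.0, 13.2, 8.8, 12.0, 7.9, 14.1, 9.7, 10.3]
print(cross_classify(ffq, dr))
```

For three ordered categories, these weights are equivalent to linear weighting, so scikit-learn's `cohen_kappa_score(a, b, weights="linear")` applied to the two tertile label lists should return the same kappa.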
Results were considered significant at the 5% critical level (P<0.05). All analyses were performed using SAS statistical software (version 9.2, SAS Institute Inc).

Subjects' characteristics
The 29 female participants were aged 25 to 45 years. All were non-smokers and had an education level of high school or lower. Table 1 displays summary statistics (median, IQR) for daily intakes of energy and 25 nutrients obtained from the FFQ and the DR. The two methods differed significantly for the majority of micronutrients, with the FFQ tending to yield higher intakes than the DR. No differences were found for energy intake or for macronutrients, except sugar and water. In general, the distributions of energy and nutrient intakes were more dispersed for the FFQ than for the DR.

Absolute agreement between FFQ and DR
The results of the Bland-Altman analyses are shown in Table 2. Mean differences were not computed in case of heteroscedasticity. The overestimation by the FFQ is clearly shown by the mean ratios, almost all of which were above 100% (except for PUFA and linoleic acid). Overall, the FFQ overestimated DR intakes by 21%. This overestimation was lower for macronutrients (11%) than for micronutrients (29%). FFQ-derived estimates for sugar and vitamins C, E and A were particularly overestimated, by 42%, 87%, 52% and 49%, respectively. Because the bias is computed from positive and negative differences between FFQ and DR results, which can cancel out, the mean of the absolute differences gives an idea of the precision of the FFQ. For PUFA, linoleic acid, cholesterol, starch and vitamin B12, the bias was relatively close to 100% (90%, 91%, 107%, 100% and 108%, respectively), whereas the precision was much worse (153%, 155%, 149%, 148% and 158%, respectively). For these nutrients, there was no evidence of bias, but the estimates differed by roughly 50% or more. Overall, the FFQ and the DR differed by 46% for micronutrients and by 50% for macronutrients. This was confirmed by the low ICC values (0.07 for energy intake, and means of 0.24 for macronutrients and 0.21 for micronutrients).
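The statistical section does not specify which ICC form was used. Assuming a two-way model with absolute agreement and single measures (ICC(A,1) in McGraw and Wong's notation), which penalizes systematic bias such as the FFQ overestimation reported here, the coefficient can be computed from ANOVA mean squares. The sketch below is a hedged illustration with hypothetical data, not the authors' computation.

```python
from statistics import mean


def icc_absolute_agreement(ffq, dr):
    """ICC(A,1): two-way model, absolute agreement, single measures (k = 2)."""
    data = list(zip(ffq, dr))
    n, k = len(data), 2
    grand = mean([v for row in data for v in row])
    row_means = [mean(row) for row in data]      # per subject
    col_means = [mean(ffq), mean(dr)]            # per method
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                      # between-subject mean square
    msc = ss_cols / (k - 1)                      # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))           # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Identical results give 1; a constant offset lowers the absolute-agreement ICC.
print(icc_absolute_agreement([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]))
print(icc_absolute_agreement([11.0, 12.0, 13.0, 14.0], [1.0, 2.0, 3.0, 4.0]))
```

The second call shows why an ICC of this form can be low even when ranking is good: the two hypothetical series are perfectly correlated, but the systematic offset enters the denominator through the between-method mean square.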

Relative agreement between FFQ and DR
Spearman's correlations for unadjusted and energy-adjusted nutrient intakes obtained from the FFQ and the DR are presented in Table 3. Correlations for unadjusted data ranged from -0.10 (vitamin A) to +0.65 (vitamin C). Seven of the 26 correlation coefficients (water, PUFA, linoleic acid, calcium, and vitamins B1, B2 and C) were above 0.40. On average, correlations amounted to 0.30. For almost every nutrient, energy adjustment led to higher correlation coefficients, with an average correlation of 0.47 and 16/25 coefficients above 0.40. Correlations were good for macronutrients such as fiber, sugar, starch, MUFA, PUFA, SFA and lipids. Correlations for common micronutrients such as iron, phosphorus, potassium, sodium, zinc, calcium, and vitamins B1, B2 and C ranged from 0.51 to 0.69 and were better than those for other micronutrients. Only vitamin A and cholesterol showed poor correlations, of 0.02 and 0.05, respectively. Interestingly, despite its poor absolute agreement (bias of 187% and precision of 193%), vitamin C demonstrated the best ranking capacity (correlation of 0.5 for unadjusted data and 0.69 for energy-adjusted data). Although the 95% CIs were quite wide, most correlations were statistically significant, with the value 0 lying outside the interval. Most of the remaining correlations were also fairly high but did not reach significance, likely because of the small sample size. Table 3 also displays the results of the cross-classification of nutrients into tertile categories and the corresponding weighted kappa coefficients, which measure the ability of the FFQ to categorize individuals into broad nutrient intake categories. For unadjusted data, the correct classification rate ranged from 28% to 62% (mean 43.5%), while on average 13.5% of the values were grossly misclassified (extreme categories); only 8/25 kappa coefficients were >0.20, corresponding to fair agreement.
For almost every nutrient, energy adjustment improved the capacity to categorize individuals. For energy-adjusted data, 31-69% (mean 50.4%) of the values were correctly classified, the mean percentage of gross misclassification decreased to 9.1%, and 19/25 kappa coefficients were above 0.20. Poor agreement was found for protein, cholesterol, starch, and vitamins A, B12 and E, with weighted kappa coefficients below 0.20. As with the correlations, the 95% CIs of the kappa coefficients were quite wide.

Discussion
For most nutrients, FFQ-derived intakes were higher than those from the DR. This finding has been reported in previous studies [13,21], particularly for FFQs exceeding 100 items [22], as is the case here. The bias between the two methods was acceptable, with an average overestimation by the FFQ of 11% for macronutrients and 29% for micronutrients. However, precision was much worse, with discrepancies between the two methods of 46% for macronutrients and 50% for micronutrients. Overall, according to the intraclass correlation coefficients and the Bland-Altman plots, agreement between the two methods was "fair".
The overestimation by the FFQ may reflect the fact that the DR underestimates intake of many food groups [23]. Some food items on the FFQ were probably not consumed during the 3 recorded days, which may contribute to the observed difference. Differences between the two instruments in the method of data collection and in the way self-reported food items are converted into nutrients may also explain these dissimilarities. The higher intakes of fiber and vitamins estimated by the FFQ could be related to the number of fruits and vegetables listed in the FFQ, which offers more selection possibilities than the DR. In the FFQ, 26 food items were used to describe the fruit and vegetable groups, which could explain the relatively large difference in means for vitamin C intake (despite acceptable ranking agreement). Another possible explanation for the large differences in average nutrient intakes between the two methods is the estimation of portion sizes [24]. The FFQ, with the help of a photographic manual, proposed predefined portion sizes, whereas in the DR the amounts consumed were quantified in an open-ended manner. The dispersion of recorded values for the FFQ is therefore rather low. Additionally, participants may have had difficulty estimating portion sizes for some food groups. For instance, although plums, grapes, cherries, nectarines, peaches and apricots all belong to the same food group, one portion of each may have quite a different weight. Moreover, the FFQ asks for the frequency and amount consumed of single food items. It therefore relies on the participant's ability to quantify the consumption of a given item both as a single food and as part of mixed dishes. In contrast, foods consumed as part of mixed dishes were quantified separately in the DR [24].
As already stated in similar studies by other researchers [25][26][27], FFQ data may be considered inappropriate for estimating absolute levels of food intake, given the bias and low precision observed in this study. However, in the NESCAV study, our main concern was to classify individuals into groups according to exposure levels rather than to assess their absolute nutrient intakes. Indeed, when estimating relative risks between nutritional exposure and cardiovascular risk factors, the degree of misclassification of subjects is more important than the quantitative scale on which the ranking is made [28]. Therefore, correlations and weighted kappa coefficients were also computed. We found a mean correlation coefficient of 0.30 for unadjusted data and 0.47 for energy-adjusted data. The higher correlation coefficients after adjustment for total energy indicate that part of the variability of nutrient intakes is related to energy intake. Except for protein and cholesterol, results were similar to or better than the correlations found in comparable studies [13,21,29]. Some studies using 7-day dietary records obtained better or similar correlation coefficients [13,30], and those using 3-day dietary records obtained lower or similar results [31]. Although the validation of the original Canadian version of the FFQ [9] showed good correlation coefficients for protein (r=0.75) and cholesterol (r=0.74), the correlations obtained with the modified FFQ were not significant.
Regarding cross-classification into tertiles, the FFQ performed well. After energy adjustment, 19/25 nutrients had a weighted kappa coefficient above 0.20 and the average correct classification rate was about 50%. These percentages of agreement were comparable to those of previous studies that compared an FFQ with four 3-day dietary records [29], confirming that our FFQ may be useful for ranking.
Although the observed correlations and kappa coefficients between FFQ-derived and DR intakes were good, they are likely to underestimate the correlations between the FFQ and true intake. The 3-day DR was probably not as representative of long-term dietary habits as expected; including more days of records might have improved these results. According to the literature [32], 3 days of recording is reasonable, because recording accuracy has been shown to decline with increasing fatigue and boredom in longer records. However, a larger number of recording days spread over the whole year would have been a better reference method, since it would also take seasonal variation into account [31].
The main strength of this validation study was the choice of the DR, the most widely used reference method, which is considered the 'gold standard' among dietary assessment methods [33] because its errors are less correlated with those of the FFQ than are the errors of other reference methods [6]. This has been attributed to the fact that the two methods use different approaches to evaluate dietary intake: the DR does not rely on memory, consists of open-ended questions and involves direct estimation of portion sizes [34]. Therefore, by validating our FFQ against the DR, the risk of an inflated correlation due to shared sources of error is reduced.
However, the study has several limitations. Firstly, we did not specifically address repeatability. The reproducibility of an FFQ is generally assessed by administering it to the same group of people at two points in time. Since studies assessing FFQ reproducibility have generally shown very good results, we consider this property of FFQs to be well established. Moreover, reproducibility was already examined for the Quebec version, which was found to be reliable [9]. Additionally, according to Altman [20], a method with poor repeatability cannot agree well with another method. Therefore, given the good agreement obtained between the FFQ and the DR, it is very unlikely that the FFQ has poor repeatability.
Secondly, our study included only 29 female participants, which may limit the conclusions. The suggested sample size for an FFQ validation study generally ranges from 50 to 100 individuals [19], although other researchers have used a number of participants similar to the present study and obtained promising results. Moreover, a power calculation showed that a minimum of 29 subjects would be needed to provide 80% power to detect a correlation of 0.45 between the FFQ and the DR as significant at the 5% level. In addition, to account for the small sample size, non-parametric methods (Spearman's rank correlation, Wilcoxon signed-rank test) were used, and 95% confidence intervals were calculated for all computed statistics. Despite the small sample size, most of the observed associations were high and statistically significant. Another limitation is that the characteristics of the validation sample (only women with a moderate education level) were not fully comparable with those of the study population.
Finally, like any dietary assessment method, the DR is also prone to some misreporting, in particular item-specific under- or over-reporting. If these errors are correlated with those of the FFQ, this could lead to artificially high correlation coefficients between the DR and the FFQ. Nutritional biomarkers have recently been used for validation purposes as an alternative to self-reported measures of dietary intake; they have the advantages of being objective and unbiased, and their errors are uncorrelated with those of the FFQ. As blood and urine samples were collected for most subjects in the NESCAV study, correlations between FFQ nutrient intakes and the corresponding biochemical measures are being tested, and the results will be published in a forthcoming report to support the present findings.
Next, we will assess associations between dietary habits and cardiovascular risk factors. Since the traditional single-nutrient approach does not adequately describe the complexity of the human diet and the high level of intercorrelation among foods and nutrients, overall dietary patterns will be studied [35]. Dietary patterns consider how foods and nutrients are consumed in combination and therefore represent real-world eating more closely. We will use both 'a priori' methods, based on dietary scores that assess compliance with prevailing dietary guidelines, and 'a posteriori' methods, which apply dimension-reduction techniques to the data.
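The text does not state how the power calculation was performed. A common approximation for the sample size needed to detect a correlation uses Fisher's z-transform, n = ((z_alpha + z_beta) / atanh(r))^2 + 3. The sketch below implements that approximation as an assumption, not as the authors' procedure.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size to detect correlation r (Fisher z method)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    c = math.atanh(r)  # Fisher z-transform: 0.5 * ln((1 + r) / (1 - r))
    return math.ceil(((z_alpha + z_beta) / c) ** 2 + 3)


print(n_for_correlation(0.45), n_for_correlation(0.45, two_sided=False))
```

Under this approximation, the required n for r = 0.45, 80% power and alpha = 0.05 is 37 with a two-sided test and 30 with a one-sided test. The result is therefore sensitive to sidedness and to the exact formula chosen, and this sketch illustrates the method rather than reproducing the figure quoted above.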

Conclusion
The aim of this study was to evaluate the performance of a 134-item FFQ to be used in the NESCAV study. Validity was assessed by comparing FFQ estimates with those derived from the mean of a 3-day DR. Overall, the results presented in this study indicate that estimates of absolute dietary intakes may not be accurate. However, given several favorable elements, such as good correlations and ranking, the previous validation of the original Canadian version and the similarities with other validation studies, we conclude that our questionnaire is a reasonable tool for categorizing subjects into broad ranges of dietary intake. Nevertheless, the results for protein, cholesterol, starch, and vitamins A, E and B12 should be interpreted with caution. In light of these results and the advantages of being cost-effective and quick to administer, we believe that the FFQ is a good tool for evaluating dietary patterns in people living in the Greater Region. The planned validation study against nutritional biomarkers should further support the validity of this modified FFQ.