Statistical Issues in the Evaluation of Clustering of Metabolic Syndrome in Spousal Pairs

Background: The metabolic syndrome is intimately linked to hypertension, impaired glucose tolerance, dyslipidemia, and abdominal obesity, and is associated with an increased risk of total and cardiovascular mortality in adults. Genetic as well as environmental influences have been implicated in obesity and several cardiovascular risk factors. Because family is one of the most important factors affecting metabolic risk factors, studying co-aggregation of the components of the syndrome among family members, and in particular spousal pairs, is of interest to genetic epidemiologists and community health researchers.
Methods: Based on the clinical definition of the syndrome, we introduce three statistical models to estimate the prime parameter of interest, which measures the degree of clustering of the disease among spousal pairs. Since the focus of this paper is on the methodological approach to estimating the between-pairs clustering parameters, we use Monte Carlo simulated data for demonstration purposes, with values of the input parameters for each component taken from a well-known Korean study. We develop two models, the Bivariate Truncated Poisson Model (BTPM), which models the counts, and the Bivariate Dirichlet Multinomial Model (BDMM), which models the frequency of counts, and discuss the relative merits of each model. The two models are qualitatively different but quantitatively interrelated. Since the clinical definition of the metabolic syndrome requires that at least three of its components co-exist within a subject, we show that adhering to this definition imposes certain specifications that must be satisfied by any of the adopted models. We estimated the clustering parameters under the specified models. The models were compared on internal consistency, that is, the closeness of the estimated distribution to the observed data. The BDMM fitted the data much more closely than the BTPM.
Interpretation: In a sample of randomly selected spousal pairs, and according to the clinical definition, the number of components of the metabolic syndrome in an individual can be 0, 1, 2, 3, 4, or 5. Estimation of the clustering parameter of the counts is equivalent to the estimation of the intraclass correlation coefficient (ICCC) between pairs. Judging by the goodness of fit of the proposed models, it is more statistically sound to estimate the degree of clustering of the components of the syndrome in spousal pairs under the BDMM.
Journal of Biometrics & Biostatistics


Introduction
The metabolic syndrome is the co-aggregation of hypertension, impaired glucose tolerance, dyslipidemia, and abdominal obesity and is associated with an increased risk of total and cardiovascular mortality in adults [1,2]. Genetics as well as environmental influences have been implicated in obesity and several cardiovascular risk factors [3,4]. Family is one of the most important factors affecting metabolic risk factors in children, in that family displays an interaction between genetic and shared environmental factors [5,6]. Recent research showed that childhood and adolescent overweight has been increasing in Asian countries due to urbanization and economic development. For example, over the past 10 years, the rates of overweight among Korean children and adolescents aged 5-20 years have doubled [7], which may ultimately cause an increase in adverse cardiovascular outcomes. Globally, the prevalence of the metabolic syndrome is high among obese children and adolescents, and increases with increasing obesity [8].
Most of the research on metabolic syndrome is related to the description by Reaven [9] of syndrome X, or 'the insulin resistance syndrome'. In 1988, Reaven et al. [10] used the term syndrome X to refer to the tendency of glucose intolerance, hypertension, low high-density lipoprotein (HDL) cholesterol, raised triglycerides, and hyperinsulinaemia to occur in the same individual.
Broadly, the research on this highly prevalent condition followed two lines of work: one based on the epidemiological studies whose main concern has been to identify risk factors for cardiovascular disease; and the other based on clinical and experimental studies concerning the pathogenesis of diabetes and atherosclerosis. We focus in this paper on statistical and epidemiological aspects of metabolic syndrome.
Given the importance of the metabolic syndrome as a public health problem, estimation of its population prevalence according to the most recent definitions poses challenges. Within the framework of clustering or familial aggregation of traits, we focus on the case where the sampling strategy is cross-sectional cluster sampling with two units within each cluster: husband and wife, or spousal pairs. The major concern of our research is to formulate a model from which measures American Association of Clinical Endocrinologists [18]. Finally, the International Diabetes Federation (IDF) definition uses a lower fasting glucose level than the original NCEP definition, using the American Diabetes Association 2003 cut point for impaired fasting glucose [19,20].
Computation of MS prevalence from the prevalence of its individual components should be done using the basic principles of calculus of probability. This is because according to the clinicians' definition, a randomly selected person is said to have the MS if he/she suffers from at least three of the five components. With this definition, we illustrate the calculations using the USA data given in Table 2.
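The calculation described here can be sketched in a few lines of Python. This is only an illustration under the independence assumption: the function name `count_distribution` and the prevalence values are hypothetical and are not the Table 2 figures.

```python
from itertools import combinations
from math import prod

def count_distribution(p):
    """Return [P(S_0), ..., P(S_n)], where S_k is the event that exactly k
    components are present, assuming the components are independent."""
    n = len(p)
    dist = []
    for k in range(n + 1):
        total = 0.0
        # Sum over every way of choosing which k components are present.
        for present in combinations(range(n), k):
            total += prod(p[l] if l in present else 1.0 - p[l] for l in range(n))
        dist.append(total)
    return dist

# Hypothetical component prevalences (illustrative only, NOT Table 2).
prevalence = [0.30, 0.25, 0.20, 0.35, 0.15]
dist = count_distribution(prevalence)

# Clinical definition: at least three of the five components.
ms_prevalence = sum(dist[3:])
print([round(x, 4) for x in dist], round(ms_prevalence, 4))
```

With real data one would replace `prevalence` by the observed component prevalences; if the components are correlated, the independence assumption no longer holds and the joint distribution is needed instead.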
As an example, the prevalence of exactly one component is the probability assigned to the set

$$S_1 = \bigcup_{l=1}^{5} \Big( A_l \cap \bigcap_{m \neq l} \bar{A}_m \Big),$$

where $A_l$ denotes the event that the $l$th component is present. Assuming the independence of the individual components, this is given by

$$P(S_1) = \sum_{l=1}^{5} P(A_l) \prod_{m \neq l} \big(1 - P(A_m)\big).$$

The dash over the notation means the negation of the event. Carrying out these calculations gives $P(S_1)=0.283$. Following the above approach, the prevalences of 2, 3, 4, and 5 components are obtained in the same way.
As was noted by Laird and Lange [11], "The general concepts used in aggregation and heritability analysis are widely accepted as useful measures of the degree to which traits are inherited; most researchers would not undertake genetic analysis without evidence of aggregation or heritability of the trait." One should note that familial aggregation of a trait is a necessary but not sufficient condition for inferring the importance of genetic susceptibility, since environmental and cultural influences can also play a role in familial clustering and excess familial risk. For quantitative traits, the biometrical approach introduced by Morton [12], Rao et al. [13], and Morton and McLean [14] to evaluate the degree of resemblance among family members has relied on the well-developed multivariate normal theory. However, in assessing the degree of family resemblance, clinical epidemiologists often prefer to report the disease status of individuals on a categorical scale. Therefore, analytic approaches established under the multivariate normal model are not useful. It should also be emphasized that questions regarding familial aggregation of traits can be effectively addressed under appropriate sampling designs. Extended nuclear families, sib-ship based sampling designs, and twin studies are examples of designs that are often adopted by genetic epidemiologists to answer questions regarding hereditary and environmental impacts on traits of interest.
Spousal resemblance [15] or concordance may be due to shared environment, common behaviors, and also positive assortative mating, that is, the tendency of individuals to choose a spouse with similar characteristics. If concordance was mainly due to a cohabitation effect, then it should increase with increasing time shared by spouses. Differential effects of cohabitation and assortative mating are not mutually exclusive, and both should be considered for a correct interpretation of spousal resemblance. Therefore, spousal resemblance is a subject of interest that can be studied under the spousal-pairs design.
The paper is structured as follows. In Section 2 we estimate the clustering parameter and construct an overall estimate pooled from each component. In Section 3 we discuss the estimation based on the clinical definition of each component of the syndrome using the BTPM, and in Section 4 we use the BDMM and compare the goodness of fit of both models.

Definition of the Metabolic Syndrome (MS)
The widely used definition of metabolic syndrome is that of the World Health Organization [16]. The components of each definition and criteria for making the diagnosis of the metabolic syndrome are summarized in Table

Spousal Concordance of Metabolic Syndrome: Analysis Based on Modeling Interval-Scale Data
An important study was conducted in [15]. The main objective was to estimate the spousal concordance for each of the 5 components of the syndrome. The sample included 3141 Korean spousal pairs. As a measure of concordance, the authors used a single estimate of Pearson's correlation between spouses for each component. The data were not available for reanalysis by us. However, we used several Monte Carlo simulations from bivariate normal distributions. We used as input parameters the estimates reported in the Korean study, with a sample size of n=4000 spousal pairs. When we contrasted the summary measures in Table 3 with the corresponding estimates produced by Park et al. [15], there were no differences between their results and ours. In family studies, the ICC, not Pearson's correlation, is the most commonly used measure of clustering of traits. In the last column of Table 3 we added the estimates of the Intra-class Correlation Coefficients (ICC), denoted by $\hat{\rho}$. Equation (1) shows how the ICC is obtained as a function of the means, variances, and Pearson's correlation.
In [15] the authors dichotomized the data for each component, using the WHO definitions, and then used the odds ratio as a measure of spousal concordance. Alternatively we shall combine the ICC estimates to produce an overall estimate of clustering of the components of the syndrome.
$$\hat{\rho} = \frac{2 r s_1 s_2}{s_1^2 + s_2^2 + (\bar{y}_1 - \bar{y}_2)^2} \qquad (1)$$

In equation (1), $\bar{y}_j$ and $s_j^2$ are the sample mean and variance for spouse $j$ ($j=1,2$) and $r$ is Pearson's correlation. Note that the ICC is quite low in all the components of the metabolic syndrome. Since the ICC is defined as the ratio of the between-pairs variance to the total variance, its interpretation depends on the population under study. In societies where, for example, consanguinity is accepted, one would expect the value of the ICC to be much larger than what is reported here, due to the larger contribution of the genetic components of variation.
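As a rough illustration of this step, the sketch below simulates spousal pairs from a bivariate normal distribution and computes a concordance-type correlation from the means, variances, and covariance. It assumes that equation (1) has the form of Lin's concordance correlation, $2 r s_1 s_2 / (s_1^2 + s_2^2 + (\bar{y}_1 - \bar{y}_2)^2)$; the function names and all input values are hypothetical and are not the Korean study estimates.

```python
import random
from statistics import mean, pvariance

def concordance_correlation(x, y):
    """Concordance-type correlation from means, variances and covariance
    (assumed form of equation (1); note 2*r*s1*s2 equals 2*cov)."""
    mx, my = mean(x), mean(y)
    vx, vy = pvariance(x, mx), pvariance(y, my)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)

def simulate_pairs(n, mu1, mu2, sd1, sd2, r, seed=1):
    """Draw n (husband, wife) pairs from a bivariate normal with correlation r."""
    rng = random.Random(seed)
    husbands, wives = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        husbands.append(mu1 + sd1 * z1)
        wives.append(mu2 + sd2 * (r * z1 + (1 - r * r) ** 0.5 * z2))
    return husbands, wives

# Hypothetical inputs: means, SDs and Pearson correlation for one component.
h, w = simulate_pairs(4000, mu1=125.0, mu2=120.0, sd1=15.0, sd2=16.0, r=0.12)
print(round(concordance_correlation(h, w), 4))
```

The difference in means and variances between spouses pulls the concordance-type measure below Pearson's correlation, which is why the two are reported separately.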
To find an overall estimate of a measure of clustering of the metabolic syndrome we follow an approach proposed by Cochran [21].
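Let $\hat{\rho}_i$ denote the ICC estimate for the $i$th component ($i = 1, \dots, 5$) and let $w_i = 1/\mathrm{Var}(\hat{\rho}_i)$. Following Cochran [21], the minimum variance linear pooled estimator of the ICC is:

$$\hat{\rho} = \frac{\sum_i w_i \hat{\rho}_i}{\sum_i w_i}.$$

We can show that the variance of $\hat{\rho}$ is given by:

$$\mathrm{Var}(\hat{\rho}) = \Big( \sum_i w_i \Big)^{-1}.$$

Asymptotically, $\hat{\rho}$ is normally distributed with mean $\rho$ and the above variance. Based on the asymptotic distribution of $\hat{\rho}$, we construct a test of homogeneity for the five ICCs. This test is based on a chi-square ($X^2$) statistic and is formulated as follows:

$$T = \sum_i w_i (\hat{\rho}_i - \hat{\rho})^2,$$

computed from the estimates in Table 3, with the variances of the individual estimates taken from results given in Donner [22]. The value of the chi-square test statistic is T=16.39, giving a p-value of 0.005. This means that the level of clustering varies significantly across the components. We may construct a 95% confidence interval for the pooled ICC in the form 0.0967 ± 1.96 (0.007).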
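The pooling and homogeneity test can be sketched as follows. This is a minimal illustration of inverse-variance weighting, assuming weights $w_i = 1/\mathrm{Var}(\hat{\rho}_i)$ and, for pairs, the large-sample variance $(1-\rho_i^2)^2/(N-1)$; both the variance formula and the ICC values below are assumptions for illustration, not the Table 3 estimates.

```python
from math import sqrt

def pooled_icc(estimates, variances):
    """Inverse-variance pooled estimate, its standard error, and the
    homogeneity statistic T = sum w_i (rho_i - rho_pooled)^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * r for w, r in zip(weights, estimates)) / total
    se = sqrt(1.0 / total)
    t_stat = sum(w * (r - pooled) ** 2 for w, r in zip(weights, estimates))
    return pooled, se, t_stat

# Hypothetical ICC estimates for the five components (illustrative only).
icc = [0.12, 0.08, 0.10, 0.07, 0.11]
n_pairs = 4000
# Assumed large-sample variance of an ICC estimated from N pairs.
var = [(1 - r ** 2) ** 2 / (n_pairs - 1) for r in icc]

pooled, se, t_stat = pooled_icc(icc, var)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
print(round(pooled, 4), round(se, 4), round(t_stat, 2), ci)
```

The statistic `t_stat` would then be referred to a chi-square distribution to judge whether the component-specific ICCs can be regarded as equal.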
This information can be used to estimate the sample size (number of spousal pairs) needed to verify relevant hypotheses on the clustering parameter.
Suppose, for example, that we would like to estimate the number of spousal pairs needed to detect a departure of the common clustering parameter from its null value. This is equivalent to testing the null hypothesis $H_0: \rho = \rho_0$ against the alternative $H_1: \rho = \rho_1$, where $\rho_0$ and $\rho_1$ are not the same. We suggest applying Fisher's [21] variance-stabilizing transformation, $Z = \tfrac{1}{2} \ln\{(1+\hat{\rho})/(1-\hat{\rho})\}$, to the estimators of the ICC. Fisher showed that, for a large sample of N spousal pairs, the transformed estimator is approximately normally distributed with a variance that depends only on N. With 80% power and a 5% Type I error rate, the approximate number of pairs needed to test the above null hypothesis can then be computed for any pair $(\rho_0, \rho_1)$. For example, the first and second scenarios considered give sample sizes of 212 and 12300, respectively.
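A sample size calculation of this kind might look like the sketch below. It assumes a two-sided test and that the Fisher-transformed ICC from N pairs has approximate variance $1/(N - 3/2)$; both are assumptions made here for illustration, and the function `pairs_needed` is hypothetical, so the output should not be expected to reproduce the figures quoted in the text.

```python
from math import atanh, ceil
from statistics import NormalDist

def pairs_needed(rho0, rho1, alpha=0.05, power=0.80):
    """Approximate number of spousal pairs to detect rho1 against rho0,
    using Fisher's z transform (atanh) and an assumed variance 1/(N - 3/2)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    delta = atanh(rho1) - atanh(rho0)              # difference on the z scale
    return ceil(((z_alpha + z_beta) / delta) ** 2 + 1.5)

# Hypothetical scenarios: a large and a small departure from the null value.
print(pairs_needed(0.0, 0.20))
print(pairs_needed(0.10, 0.12))
```

The required number of pairs grows with the inverse square of the difference on the transformed scale, so small departures from the null value demand very large samples.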

Analysis Based on Clinical Definition of the Syndrome

The dichotomization scheme
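The classification of a subject into either of the two categories depends on the cut-off point specified by the guidelines adopted by the organizations (WHO, IDF, EGIR, NCEP). There is little difference among these thresholds, and for our study, whose main objective is the statistical methods applied after categorization, we shall use the recommendation made by the WHO. To formalize the presentation, we suppose that $y_{ijl}$ is the value of the $l$th component for the $j$th spouse in the $i$th family, where $i = 1, 2, \dots, k$ (number of families), $j = 1, 2$, and $l = 1, 2, 3, 4, 5$. We shall further assume that:

$$y^{*}_{ijl} = \begin{cases} 1 & \text{if } y_{ijl} > c_{jl}, \\ 0 & \text{otherwise,} \end{cases}$$

where $c_{jl}$ is the threshold value of the $l$th component for the $j$th spouse, as determined by the IDF (International Diabetes Federation). Note that the inequality sign is reversed for the case of low HDL. The cumulative probit model for the categorical variable $y^{*}_{ijl}$ can be written as:

$$P_{jl} = \Pr(y^{*}_{ijl} = 1) = 1 - \Phi\!\left( \frac{c_{jl} - \mu_{jl}}{\sigma_{jl}} \right), \qquad j = 1, 2,$$

for the components defined by exceeding the threshold, and $P_{jl} = \Phi\big( (c_{jl} - \mu_{jl})/\sigma_{jl} \big)$ for low HDL, where $\mu_{jl}$ and $\sigma_{jl}$ denote the mean and standard deviation of the $l$th component for the $j$th spouse and $\Phi$ is the standard normal distribution function.
It turns out that, conditional on the $i$th spousal pair, $\{ y^{*}_{ijl} \}$ constitutes a finite sequence of independent Bernoulli trials with success probability $P_{jl}$. Since we are interested in the sum of the components of MS, we define another random variable, the count of components present for the $j$th spouse in the $i$th pair:

$$x_{ij} = \sum_{l=1}^{5} y^{*}_{ijl}.$$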
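The dichotomization and counting steps can be illustrated with the short sketch below. The cut-off values, measurements, and the position of the reversed (low HDL type) component are placeholders chosen for illustration only; they are not the WHO or IDF thresholds.

```python
def component_count(values, cutoffs, low_is_risk):
    """Count how many components are 'present' for one spouse.
    A component is present when the value exceeds its cut-off, or falls
    below it when low_is_risk is True (as for low HDL)."""
    return sum(
        int(v < c) if low else int(v > c)
        for v, c, low in zip(values, cutoffs, low_is_risk)
    )

# Placeholder cut-offs for five components (NOT clinical thresholds).
cutoffs = [10.0, 20.0, 30.0, 40.0, 50.0]
low_is_risk = [False, False, False, True, False]  # 4th plays the role of HDL

husband = [11.0, 21.0, 29.0, 35.0, 49.0]
wife = [9.0, 19.0, 29.0, 45.0, 49.0]

x_husband = component_count(husband, cutoffs, low_is_risk)
x_wife = component_count(wife, cutoffs, low_is_risk)

# Clinical definition: the syndrome is present with at least three components.
print((x_husband, x_wife), x_husband >= 3, x_wife >= 3)
```

Each spousal pair thus contributes a pair of counts in {0, ..., 5}, which is the input to the count-based and frequency-based models.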

Categorical data analysis
Fekedulegan et al. [23] conducted a comparison among several statistical models to evaluate factors associated with metabolic syndrome. The goal of their study was to evaluate the usefulness of alternative generalized linear models for analysis of metabolic syndrome as a count outcome and compare the results with models that utilize the definition as a binary outcome (presence/absence) for the syndrome. After the dichotomization of the measured outcomes, the definition of metabolic syndrome can be modified, as the total count of syndrome components for an individual, to represent a discrete outcome y=0, 1, 2, 3, 4, or 5, where statistical models for count data can be used as an alternative to assess the association between exposure variable(s) and metabolic syndrome. They proposed using the Poisson regression to model the relationship between the syndrome as a dependent variable with limited range (0, 1, 2, 3, 4, 5) and potential risk factors associated with the syndrome. We note that:
1. The Poisson random variable has values that range from zero to infinity, and one has to modify the model to account for the right truncation at y=5.
2. Since our interest is on the clustering, a right truncated Poisson model needs to be extended to model the correlation among the counts of the spousal pairs.
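The focus of this section is on the application of the right truncated bivariate Poisson as a possible tool to model correlated count data. Let $x_1$ be a random variable having a truncated Poisson distribution with parameter $\lambda_1$, at a known truncation point $m$:

$$P(x_1 = x) = \frac{e^{-\lambda_1} \lambda_1^{x}}{x! \, F(m; \lambda_1)}, \qquad x = 0, 1, \dots, m,$$

where

$$F(m; \lambda) = \sum_{j=0}^{m} \frac{e^{-\lambda} \lambda^{j}}{j!}.$$

Denote the mean and variance of $x_1$ by $\mu_{m1}$ and $v_{m1}$, respectively. Simple computations show that the mean and variance are given respectively by:

$$\mu_{m1} = \lambda_1 \frac{F(m-1; \lambda_1)}{F(m; \lambda_1)}, \qquad v_{m1} = \lambda_1^{2} \frac{F(m-2; \lambda_1)}{F(m; \lambda_1)} + \mu_{m1} - \mu_{m1}^{2}.$$

The strategy of constructing a bivariate model from the marginal distributions is well known [24,25].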
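The moments of a right-truncated Poisson variable can be checked numerically. The sketch below assumes the standard truncated form $P(x) = e^{-\lambda}\lambda^x / (x!\,F(m;\lambda))$ for $x = 0, \dots, m$, with $F$ the untruncated Poisson distribution function; the function names and the value of `lam` are hypothetical.

```python
from math import exp, factorial

def poisson_cdf(m, lam):
    """Untruncated Poisson distribution function F(m; lam); zero for m < 0."""
    if m < 0:
        return 0.0
    return sum(exp(-lam) * lam ** j / factorial(j) for j in range(m + 1))

def truncated_pmf(x, lam, m):
    """P(X = x) for a Poisson variable right-truncated at m."""
    return exp(-lam) * lam ** x / factorial(x) / poisson_cdf(m, lam)

def truncated_mean(lam, m):
    return lam * poisson_cdf(m - 1, lam) / poisson_cdf(m, lam)

def truncated_variance(lam, m):
    mu = truncated_mean(lam, m)
    # E[X(X-1)] = lam^2 F(m-2)/F(m); Var = E[X(X-1)] + mu - mu^2
    return lam ** 2 * poisson_cdf(m - 2, lam) / poisson_cdf(m, lam) + mu - mu ** 2

lam, m = 1.5, 5  # hypothetical rate; m = 5 components
pmf = [truncated_pmf(x, lam, m) for x in range(m + 1)]
direct_mean = sum(x * p for x, p in enumerate(pmf))
direct_var = sum(x * x * p for x, p in enumerate(pmf)) - direct_mean ** 2
print(round(truncated_mean(lam, m), 6), round(truncated_variance(lam, m), 6))
```

Comparing the closed-form mean and variance with the values obtained by direct summation over the support is a quick check on the algebra.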
Define the random variables $\xi_1 = x_1 + \gamma$ and $\xi_2 = x_2 + \gamma$, where $x_1$, $x_2$, and $\gamma$ are mutually independent Poisson random variables with $\gamma \sim \text{Poisson}(\vartheta)$.
The random variables $(\xi_1, \xi_2)$ are said to have a bivariate truncated Poisson distribution. It can easily be shown that $\vartheta$ is the covariance between the two variables, which in turn determines their correlation. Because of the restrictions on the Poisson parameters, the correlation between $(\xi_1, \xi_2)$ is strictly positive. We followed the WHO guidelines, as indicated in Table 1, to dichotomize the data obtained from the Monte Carlo simulations.
The correlation coefficient estimate is obtained from the estimated count frequencies of the 5 × 5 table by substituting the MLE of the model parameters in (4), and is given by $\hat{\rho} = 0.0798 \pm 0.016$. The standard error of $\hat{\rho}$ is obtained by the delta method. The concordance correlation, as a measure of clustering, is computed using equation (1). Note that the cost of categorization, relative to the uncategorized data, is a drop of 29% in the value of the clustering parameter (0.0760 vs 0.0967) and a drop in the efficiency of estimation of about 19%. This means that, if we follow the clinical definition of the syndrome, we need a sample size almost 20% larger than that needed to analyze uncategorized data in order to maintain the same level of precision in estimating the clustering parameter.
An important question is how to assess the goodness-of-fit (GOF) of the BTPM. Traditionally the chi-square goodness-of-fit statistic is used for this purpose. One of the disadvantages of this measure is that its value is affected by a few (one or two) expected frequencies (e) that deviate markedly from their corresponding observed frequencies (o). Alternatively, we shall use three approaches to measure the GOF of the BTPM. The first is to use Lin's concordance correlation between the observed counts and the expected counts $n_{rs}$ under the present model. Using equation (1) we get an estimated measure of concordance between the observed and the expected counts of 0.798 ± 0.091. This is an indication that the BTPM fits the counts reasonably well. The second is quite different, and depends on the degree of closeness (agreement) between the observed and the expected counts under the present model. First we define the statistic $D = |o - e|/25$; if $D \le q$ then e and o are considered close, otherwise they are not. The quantity q is arbitrarily chosen, and for the current application we take q=0.01. The percent agreement (i.e. the proportion of cells for which $D \le 0.01$) is 0.375, with a standard error of 0.125, indicating poor agreement between the observed and the expected counts. The third approach is an adaptation of the technique developed by Altman and Bland [26] and Bradley and Blackwood [27] to establish agreement between two sets of measurements. As shown in [27], a nonsignificant linear regression of (o-e) on (o+e)/2 is indicative of strong agreement. In addition, we can detect agreement as indicated in [26] by plotting the limits of agreement: (o-e) ± SD(o-e) vs (o+e)/2. This is shown in Figure 1, and we conclude, based on these methods, that the BTPM does not provide a good fit to the data.
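The second and third goodness-of-fit checks can be sketched as follows. The observed and expected values are made up for illustration, the binomial standard error is an assumption about how the reported standard error was obtained, and only the slope of the difference-on-average regression is computed (no significance test).

```python
from math import sqrt

def percent_agreement(observed, expected, q=0.01, scale=25):
    """Share of cells with D = |o - e| / scale <= q, with a binomial-type
    standard error (assumed form)."""
    close = [abs(o - e) / scale <= q for o, e in zip(observed, expected)]
    p = sum(close) / len(close)
    return p, sqrt(p * (1 - p) / len(close))

def difference_on_average(observed, expected):
    """Least-squares intercept and slope of (o - e) regressed on (o + e)/2."""
    d = [o - e for o, e in zip(observed, expected)]
    a = [(o + e) / 2 for o, e in zip(observed, expected)]
    a_bar, d_bar = sum(a) / len(a), sum(d) / len(d)
    sxx = sum((x - a_bar) ** 2 for x in a)
    sxy = sum((x - a_bar) * (y - d_bar) for x, y in zip(a, d))
    slope = sxy / sxx if sxx > 0 else 0.0
    return d_bar - slope * a_bar, slope

# Made-up observed and expected cell values (illustrative only).
obs = [0.30, 0.22, 0.15, 0.10, 0.08, 0.15]
exp_ = [0.28, 0.25, 0.14, 0.12, 0.07, 0.14]

p, se = percent_agreement(obs, exp_, q=0.01, scale=1)
intercept, slope = difference_on_average(obs, exp_)
print(round(p, 3), round(se, 3), round(intercept, 4), round(slope, 4))
```

A slope and intercept close to zero suggest that the differences do not depend on the size of the cell, which is the pattern expected when the model fits well.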
The BTPM allows us to analyze the categories as counts, which are often modeled by probability distributions such as Poisson or negative binomial. Another strategy is to analyze the frequency of these counts in the sampled spousal pairs. The most commonly used tool that accounts for the correlations among categories for cross classified data is the Multinomial-Dirichlet-Model (MDM) [28,29]. As a competitor to the BTPM we shall apply the MDM to analyze correlated categorical data.

Analysis of Correlated Categorical Data with the BDMM
To analyze the cell counts in the 6 × 6 frequency table assuming that the data were generated by the MDM, we need to formalize the presentation.
Conditional on the $i$th pair, the cell counts are assumed to follow a multinomial distribution. Now, to induce a correlation among the multinomial probabilities, we usually assume that, conditional on the spousal pair, the vector of probabilities has a bivariate distribution that is specified by its marginal means and covariance structure. The correlation parameter $\rho$ of this structure will be treated as a nuisance parameter. Note that the correlation parameter so defined is not necessarily the clustering parameter that we seek to estimate. Introducing $\rho$ at this stage is necessary to create a quasi-random effect model in order to induce correlation between spousal pairs. The idea is that we need to generate expected correlated cell counts under the above set-up and then compare these counts with the observed counts. As a measure of clustering we shall use Cohen's kappa [30,31].
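The application of Cohen's kappa in clinical and epidemiological research is long standing; it is used to measure the diagonal agreement between two categorical variables in a C × C table. Let $\pi_{ij}$ denote the probability of cell $(i,j)$, with marginal probabilities $\pi_{i.}$ and $\pi_{.j}$, and let $\pi_0 = \sum_i \pi_{ii}$ and $\pi_e = \sum_i \pi_{i.} \pi_{.i}$ denote the observed and the chance agreement. The kappa coefficient is defined as:

$$\kappa = \frac{\pi_0 - \pi_e}{1 - \pi_e}.$$

Agresti [29] presented a simple expression for the estimated variance of $\hat{\kappa}$:

$$\mathrm{Var}(\hat{\kappa}) = \frac{1}{N} \left\{ \frac{\pi_0 (1 - \pi_0)}{(1 - \pi_e)^2} + \frac{2 (1 - \pi_0)(2 \pi_0 \pi_e - u_1)}{(1 - \pi_e)^3} + \frac{(1 - \pi_0)^2 (u_2 - 4 \pi_e^2)}{(1 - \pi_e)^4} \right\},$$

where $u_1 = \sum_i \pi_{ii} (\pi_{i.} + \pi_{.i})$ and $u_2 = \sum_i \sum_j \pi_{ij} (\pi_{j.} + \pi_{.i})^2$. It is more convenient to write the variance of $\hat{\kappa}$ in terms of the chance agreement and the population kappa. We would like to determine the sample size N (the required number of spousal pairs) needed to test the hypothesis $H_0: \kappa = \kappa_0$ versus $H_1: \kappa = \kappa_1$ with Type I error rate α and Type II error rate β.
Remarks: When $\kappa = 0$, the required sample size is an increasing function of $\pi_e$. That is, larger values of chance agreement require a larger number of spousal pairs to detect a significant departure from the null value. We also note that the complete specification of $\mathrm{Var}(\hat{\kappa})$ depends on the values of $u_1$ and $u_2$, which in turn depend on the cell and the marginal probabilities. Therefore one should provide guessed values for $u_1$ and $u_2$ in an attempt to estimate the sample size.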
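For a concrete feel of these quantities, the sketch below computes kappa and a large-sample standard error from a square table of counts. It assumes the standard large-sample variance expression for kappa, with $u_1 = \sum_i \pi_{ii}(\pi_{i.} + \pi_{.i})$ and $u_2 = \sum_i \sum_j \pi_{ij}(\pi_{j.} + \pi_{.i})^2$; the function name and the example table are hypothetical.

```python
from math import sqrt

def kappa_with_se(table):
    """Cohen's kappa and its large-sample standard error for a C x C table
    of counts (rows: one spouse, columns: the other)."""
    n = sum(sum(row) for row in table)
    c = len(table)
    p = [[cell / n for cell in row] for row in table]
    row = [sum(p[i]) for i in range(c)]
    col = [sum(p[i][j] for i in range(c)) for j in range(c)]
    p0 = sum(p[i][i] for i in range(c))            # observed agreement
    pe = sum(row[i] * col[i] for i in range(c))    # chance agreement
    kappa = (p0 - pe) / (1 - pe)
    u1 = sum(p[i][i] * (row[i] + col[i]) for i in range(c))
    u2 = sum(p[i][j] * (row[j] + col[i]) ** 2
             for i in range(c) for j in range(c))
    var = (p0 * (1 - p0) / (1 - pe) ** 2
           + 2 * (1 - p0) * (2 * p0 * pe - u1) / (1 - pe) ** 3
           + (1 - p0) ** 2 * (u2 - 4 * pe ** 2) / (1 - pe) ** 4) / n
    return kappa, sqrt(var)

# Hypothetical 2 x 2 table of counts (illustrative only).
kappa, se = kappa_with_se([[40, 10], [10, 40]])
print(round(kappa, 3), round(se, 3))
```

The same function accepts larger square tables, for example a cross classification of component counts for husbands and wives.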
We shall use Cohen's [30,31] kappa to measure the degree of clustering between randomly selected pairs of spouses. Borkowf et al. [32] established the asymptotic theory for construction of large sample confidence interval on the kappa statistic.
The cross classification of the data for wives and husbands is given in Table 4.
Note that category 5 was quite sparse for both males and females, and therefore we decided to collapse it with category 4. The estimated κ and its standard error are 0.022 and 0.01, respectively; the estimate is significantly different from zero (p-value=0.027). The BDMM fitted the data quite well. The three goodness-of-fit criteria give: (1) an estimated Lin's concordance correlation coefficient between observed and expected counts of 0.995 ± 0.002; (2) a percentage agreement, using the same criterion as in the case of the BTPM, of 0.92 ± 0.06; (3) limits of agreement [26], shown in Figure 2, that indicate considerable agreement between the observed and expected counts. The regression of the difference between the observed and expected counts on their average was nonsignificant, with p-value=0.133.

Marginal homogeneity
In order to establish familial resemblance between pairs of spouses, it is important to test the homogeneity of their marginal distributions. In the case of matched-pairs data, McNemar's test [33] can be applied only to the case in which there are two possible categories for the outcome. According to the clinical definition of the MS, a person with at least three components is classified as having the condition; otherwise he/she does not have it. For spousal pairs with a categorical response, a two-way contingency table with the same row and column categories summarizes the data; in this situation the contingency table is also called a square table. In this case we have a single 2 × 2 table with cell probabilities $\pi_{rs}$ ($r, s = 0, 1$) and marginal probabilities $\pi_{r.}$ and $\pi_{.s}$, for which the variance expression for kappa (21) is greatly simplified. Under marginal homogeneity the variance expression for the kappa statistic reduces further. Following the clinical definition of the MS, we collapse the data in Table 5 into the square Table 6. The response category "1" represents a count of at least three components.
McNemar's test of marginal homogeneity has a p-value of 1.00. The kappa statistic and its standard error are 0.033 and 0.017, respectively. There is some statistical implication when the clinical definition of the syndrome is used.
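McNemar's test depends only on the two discordant cells of the 2 × 2 table. The sketch below uses the large-sample chi-square version with one degree of freedom and no continuity correction; the function name and the discordant counts are hypothetical and are not taken from Table 6.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar chi-square statistic and p-value (1 df, no continuity
    correction). b and c are the two discordant cell counts, e.g. husband
    affected / wife not, and wife affected / husband not."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical discordant counts (illustrative only).
print(mcnemar(30, 30))   # equal discordant counts: statistic 0, p-value 1
print(mcnemar(10, 20))
```

Equal discordant counts give a statistic of zero and a p-value of one, which is the pattern that supports marginal homogeneity between spouses.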
For a general square contingency table Stuart [34] and Maxwell [35] developed an asymptotic chi-square test of marginal homogeneity, for which a SAS macro was provided in [36]. The value of the test statistic is 16.3 with p-value=0.092 indicating the support for marginal homogeneity between spouses' distribution over the specified categories.

The issue of sample size requirements
Sample size estimation is an essential step to guarantee that a study possesses a certain power. In the case of a 2 × 2 table, sample size requirements for statistical inference on kappa have been extensively discussed [37,38]. In the general case, we shall investigate two approaches. The first approach was proposed by Cantor [39], who used the relative error in estimation and the numerator of kappa $(\pi_0 - \pi_e)$ as the basic requirements for sample size estimation. Under the proposed equation (28), we need to recruit at least 9700 spousal pairs. Cantor's expression has two advantages. First, the sample size equation (28) does not depend on the marginal probabilities through $u_1$ and $u_2$, which would otherwise have to be guessed by the researcher. Second, it does not need specifications for the values of kappa under the two hypotheses to be tested. Alternatively, we provide the sample size requirement (29) to test the hypothesis $H_0: \kappa = 0$ versus a non-zero alternative, for Type I error rate α and power 1-β. For testing $\kappa_0 = 0$, Table 7 gives some calculated values of N for guessed values of $u_1$ and $u_2$, 80% power, and a 5% Type I error rate. The disadvantage is that for (29) to be of practical importance we need to guess the values of $u_1$ and $u_2$.
It is clear from Table 7 that, for fixed power and Type I error rate, the sample size depends on the distance between the values of kappa under the two hypotheses, and on the chance agreement. Moreover, it seems that the values of $u_1$ and $u_2$ have little effect on the sample size.

Discussion
Our main focus in this paper has been on the estimation of spousal concordance which may be the result of shared lifestyle and socioeconomic environment. In fact spousal resemblance or concordance may be due to shared environment, common behaviors, and also positive assortative mating, that is, the tendency of individuals to choose a spouse with similar characteristics. If concordance was mainly due to a cohabitation effect, then it should increase with increasing time shared by spouses. Identification of the relative contributions of shared modifiable environmental risk factors may then improve our understanding and thus enable targeting of detrimental lifestyle minimizing the rapid increase in the prevalence of the metabolic syndrome.
Because data for the components of MS for spousal pairs are not available, we simulated similar data using input parameters from the Korean study. Working with simulated data is a practical way to control the sampling error and to verify the reproducibility of the results. This strategy has been recommended in [40]. Since our interest is in modeling MS in spousal pairs, we need to define bivariate probability distributions for the continuous data and for the categorical data. We developed two models. (1) The Bivariate Truncated Poisson Model (BTPM) analyzes the categories as counts. The main purpose of the truncation is to overcome the limitations of the Poisson distribution in modeling discrete outcomes that have a finite upper limit. (2) The Bivariate Dirichlet Multinomial Model (BDMM) analyzes the frequency of the bivariate counts. The two models are not nested within each other, and we therefore had to evaluate the goodness of fit of each model separately. We have developed three approaches to goodness of fit; all showed that the BDMM fitted the data quite closely.
Since the index of clustering is our target parameter, its nature varied according to the modeling strategy. The ICCC is a natural choice for the continuous data under both the bivariate normal model and the BTPM. The kappa statistic is the natural choice to measure spousal concordance for the categorical data. It is interesting to note that the parameter estimate of spousal concordance has almost the same value under the three models, but the level of uncertainty was highest under the BTPM.
There are limitations in modeling MS under the study design proposed in [15]. The first is the absence of important covariates and/or possible confounders that affect the estimation of the clustering parameter. For example, it would be desirable to include in the proposed models individual-level covariates such as age and level of education, and possibly cluster-level covariates such as the length of spousal cohabitation.
As a final remark, we note that the evaluation of the kappa statistic from a 2 × 2 table makes the problems of likelihood-based inference, sample size estimation, and confidence interval construction tractable. But the interest in agreement between pairs when the subjects are classified into several categories has increased the potential applications of the kappa statistic as a measure of clustering. Hypothesis testing and sample size requirements are much more complicated in the multi-categorical classification. This problem was considered by many researchers by collapsing the multinomial data into binary data. Bartfay and Donner [41], Cohen [42], Donner and Eliasziw [43], and Kraemer [44] demonstrated the advantage of preserving multinomial data on the original scale. They all showed that collapsing multinomial data into two categories results in a reduction of the effective sample size. This means that a substantial increase in the sample size is required to maintain the same level of power at a given level of significance.