Received date: April 01, 2015; Accepted date: May 22, 2015; Published date: May 29, 2015
Citation: Rosner B, Wang W, Eliassen H, Hibert E (2015) Comparison of Dependent Pearson and Spearman Correlation Coefficients with and without Correction for Measurement Error. J Biomet Biostat 6:226. doi:10.4172/2155-6180.1000226
Copyright: © 2015 Rosner B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are are credited.
Visit for more related articles at Journal of Biometrics & Biostatistics
There already exist methods for comparing dependent Pearson correlation coefficients. However, each of the variables (X, Y) has associated random error; and a related question is after correcting for random error, which variable correlates most highly with the outcome variable Z. In this paper, we present methods for comparing dependent deattenuated correlation coefficients. This is a generalization of previous work for obtaining confidence limits for a single deattenuated correlation coefficient. In addition, we extend this work to the comparison of dependent Spearman correlation coefficients. The methods are illustrated with two examples. The first example concerns the comparison of nephrotoxicity of phenacetin and aspirin intake as measured by repeat biomarkers obtained from the same subjects. The second example is a comparison of the validity of different storage conditions for measuring HbA1c from dried blood specimens as compared to the gold standard of immediate processing. Results from using these methods indicate that phenacetin intake is more highly correlated with serum creatinine levels than aspirin intake and that short-term storage is preferable to long-term storage for assessment of HbA1c levels. We have available SAS software for comparing dependent deattenuated Pearson correlation and dependent Spearman correlations with and without deattentuation.
Deattenuated correlation; Dependent correlations; Measurement error correction; Pearson correlation; Spearman correlation
In 1967/68 a group of 1,256 women were recruited for the Swiss Analgesic Study [1,2]. Women were age 30-49 years and lived in the Basel, Switzerland area. The purpose of the study was to investigate whether there was an association between intake of phenacetincontaining analgesics and prevalence and incidence of kidney disease. To measure phenacetin intake a urinary metabolite (N-acetyl-Paminophenol (NAPAP)) was assayed from a urine sample in the clinic and from 2 additional urine samples collected at home on 2 separate days within 1 week of the clinic visit. Serum creatinine was used as a measure of kidney function. Women were seen at follow-up visits in 1969, 1970, 1971, 1972, 1975 and 1978. In the primary paper, mean NAPAP over the 3 replicates was categorized and related to both prevalence and incidence of kidney disease where an abnormal creatinine was defined as >1.5 mg/dl. In the present paper, we represent NAPAP and serum creatinine at the baseline clinic visit in 1968 as continuous variables and compute the correlation coefficient between these two measures. To distinguish phenacetin intake from general use of analgesics, urinary salicylates were also assessed in triplicate at baseline as an indication of intake of aspirin-containing analgesics. If we let X=NAPAP at the baseline clinic visit, Y=salicylates at the baseline clinic visit and Z=serum creatinine at the baseline clinic visit, then we wish to test the hypothesis
where ρxz is the population correlation between X and Z and ρyz is defined similarly.
There already exist methods for comparing dependent correlation coefficients for normally distributed random variables obtained from the same subjects. In an excellent review paper on this subject  it is noted that in previous simulation studies, several methods are clearly inappropriate, while other methods preserve type I error under a variety of conditions. Among the latter are the method of Williams  which is an enhancement of a procedure initially proposed by Hotelling . Hotelling’s procedure is exact, but only under the condition that the sample and population standard deviations are the same for both X and Y (i.e., sx=σx, sy=σy). To test the hypotheses in (1) Williams proposes the test statistic
Where RXZ=corr(x,z),Rxy and Ryz are defined similarly, N=sample size an
and we reject H0 if and tN-3,p=pth percentile of a tN-3 distribution.
In addition, Olkin and Finn  used a similar approach to compare more than 2 correlated correlation coefficients. Since in finite samples, the Fisher z transformation better approximates a normal distribution than the sample correlation coefficient, Dunn and Clark  proposed the test statistic
Zxz=0.5 ln [(1+ Rxz)/ (1- Rxz)], Zyz is defined similarly, and we reject H0 if , where zp=pth percentile of a N(0,1) distribution.
Furthermore, Meng, Rosenthal, and Rubin  propose a statistic that is asymptotically equivalent to the Dunn and Clark procedure, but with a simpler expression for var(Zxz -Zxz) given by:
In addition, Bilker, Brensinger, and Gur  propose a combined permutation-bootstrap approach to test the hypothesis in (1) based on the test statistic
which is computationally intensive, but avoids the necessity of specifying the underlying distribution of Zxz. Finally, the R package  cocor maintained by Diedenhofen  provides software to implement a variety of methods for comparing dependent Pearson correlation coefficients.
However, an assumption of [4-8] is that the joint distribution of (X, Y, Z) is multivariate normal. We relax this assumption and use the method of moments and the delta method to estimate var(Zxz -Zyz). Furthermore, in the Swiss Analgesic dataset, there is considerable intraindividual variation among replicate measures in both X and Y and the more important question is whether the true (underlying) mean value of X and Y at baseline, denoted by μx and μy are more highly correlated with Z. Thus, we wish to test the hypothesis
This is an extension of previous work  that provides confidence limits for a single deattenuated correlation. Another issue is that it is clear that neither X nor Y is normally distributed. Hence, the Spearman correlation coefficient may be a more appropriate measure of association than the Pearson correlation coefficient in this dataset. Hence, we extend our methodology to the comparison of dependent Spearman correlation coefficients both with and without correction for measurement error. To our knowledge, there is no previous literature on the comparison of dependent Spearman correlation coefficients.
In this paper, we first describe the methodology and propose both an asymptotic test and an exact test. Second, we present a simulation study to assess the validity of the asymptotic test in finite samples and compare it to existing procedures for comparing dependent correlation coefficients. Finally, we describe two examples based on real data illustrating the use of these methods.
We consider a classical measurement error model for Xi and Yi of the form
are respectively independent, if replicate measures of X and Y are obtained from the same subjects.
We define the true correlation between (X and Z) and (Y and Z) by
Based on equations (2) and (3), it is straightforward to show that
the intra class correlations obtained from repeated measures of X and Y based on (2). We wish to test the hypothesis:
Note that it is possible that while if the reproducibility of X is worse than that of Y (i.e., ICCx<ICCy). Similarly, it is possible that while
It is usually more efficient to base inferences for correlations based on Fisher’s z statistic. Thus, we will test the hypothesis
and Zyz,true is defined similarly.
We will use the delta method to estimate
and obtain the test statistic
and are derived in Appendices A, B and C of the supplementary materials, respectively.
Therefore, from Appendices A, B and C of the supplementary materials, we obtain and the test statistic Vxy,true in (5),
with asymptotic p-value
, where Φ=standard normal c.d.f.
Similarly, we can also test the hypothesis based on the large sample test statistic
where and based on the delta method
are given in Appendices A, B and C, respectively of the supplementary materials.
It is also of interest to obtain confidence limits for . It is not possible to translate confidence limits for Z , to corresponding confidence limits for ρ , . Instead, we will use as a point estimate of . To obtain we use the formula:
Since is a ratio estimator, we first consider and use the delta method to obtain
Furthermore, from the delta method,
where and are obtained from Appendix A of the supplementary materials. Similarly,
where and are obtained from Appendix B of the supplementary materials. Finally, from the delta method we have:
where and are given in Appendix C of the supplementary materials.
Hence, if we combine (9)-(13) and assume asymptotic normality of . we have an approximate 100% × (1-α)CI for given by
Similarly, we can obtain confidence limits for given by
When sample size is small, permutation methods can be used to estimate levels of significance. The permutation distribution is generated by randomly shuffling the X, Y labels and computing the empirical distribution of in (5) and (7), respectively. The rank of based on the observed data with reference to their permutation distributions can be used to estimate exact two-sided p-values given by
And similarly for where M is the size of the permutation distribution and I(a)=1 if a is true,=0 if a is false.
We can also consider the comparison of dependent Spearman correlations whereby we test the hypothesis
where =population Spearman correlation between X and Z and is defined similarly.
If the data are transformed to the probit scale then from Rosner and Glynn .
where and is estimated by where Φ=c.d.f. of a N(0,1) distribution, population correlation between the probit of X and the probit of Z and is defined similarly.
Thus, the hypothesis test in (17) is equivalent to the hypothesis test in (7) after transformation to the probit scale yielding a test statistic based on given by
It is also of interest to obtain confidence limits for which we estimate by . Based on the delta method and (18), we obtain:
are given in Appendices A, B and C, respectively, of the supplementary material.
Thus, a 100% × (1-α)CI for is given by
Correction for measurement error
Rosner and Glynn  defined the deattenuated Spearman correlation by
where is the deattenuated correlation between probit scores for X and Z, respectively. Hence, the hypotheses are equivalent to the hypotheses
Furthermore, the latter hypotheses can be tested using (5), where the Fisher z transformation is performed based on probit scores, yielding a test statistic based on given by
and and are obtained from equations (A3), (B1) and (C1), of Appendices A, B and C, respectively of the supplementary materials.
To obtain a 100% × (1-α)CI for we use a similar approach as in (19) and obtain:
are obtained from (11)-(13), respectively.
The corresponding large sample 100% × (1-α)CI for is given by
We conducted simulation studies to assess the validity of the asymptotic procedure in (7) for comparing dependent correlations in finite samples and to compare this procedure with the procedures of Dunn and Clark  and Meng, Rosenthal and Rubin . We simulated data for (X, Y, Z) from a multivariate normal distribution N(μ, Σ) with
We performed 4,000 simulations for each combination of N=100, 400 and
We also simulated data from a log normal distribution (X*, Y*, Z*), where X*=exp(X), Y*=exp(Y) and Z*=exp(Z). The results are given in Table 1. We see that in the case of a normal distribution that all three procedures have appropriate type I error (designs 1-4) , particularly in the case of n=400. Similarly, the power of the procedures (designs 5-6) is also comparable. However, in the case of a log normal distribution, type I error is more adequately preserved by the method of moments procedure, particularly for n=400. As the sample size increases, the inconsistent estimate of the variance provided by the Dunn and Clark and Meng procedures in the setting of a non-normal distribution yields variance estimates that are too low and type I errors that are too high. Conversely, as n increases the method of moments procedure yields consistent variance estimates and actual p-values that are close to nominal levels. Alternative approaches if non-normality is suspected would be to compare dependent Spearman correlations rather than Pearson correlations.
|Design||Distribution||ρxz||ρyz||N||Type I Error||Power|
|Method of Moments*||Dunn and Clark||Meng||Method of Moments*||Dunn and Clark||Meng|
*Based on equation 7
Table 1: Comparison of alternative methods for comparing dependent correlations.
To assess the accuracy of the procedure in (5) to compare dependent deattenuated Pearson correlations we used the same data designs as in designs 1-6. We performed each set of simulations in two ways. First, we assumed that the intra class correlation between replicate measures of X (and Y) was known without error and set to 0.67. Thus, in equations (A3), (B1) and (C2). Second, we generated replicate values of (X1, X2, Y1, Y2) from a multivariate normal distribution denoted by N (μR, ΣR) where μR=(0,0,0,0) and
and estimated ICCx and ICCy from the replicate data. For each set of simulations we estimated the empirical type I error of Vxy, true, and the bias and variance of in (5). The results are given in Table 2.
We see in Table 2 that the median theoretical variance and the empirical variance of and are very similar both for the case where the ICC is assumed known (Type 1) and where it is estimated from the simulated data (Type 2). In addition, the empirical type I error is close to 0.05 (range from 0.039 to 0.061) for all parameter combinations considered. Furthermore, the bias in estimation of is close to 0 for all parameter combinations considered. The above variances are slightly larger when the ICC is estimated (Type 2) than when the ICC is assumed known (Type 1), particularly when .
|Type||ICCx(ICCy)||ρxz (ρyz)||N||TypeI error|
Theoretical variance estimates are medians over 4000 simulated samples; Type 1 simulations assume that ICCx, ICCy are known; Type 2 simulations estimates ICCx, ICCy, from the sample data; for both Type 1 and Type 2 simulations, the underlying ICCx, ICCy are provided in the 2nd column of the table. Simulation results are based on 4000 simulations.
Table 2: Simulation results for the test procedure in (5) and (6).
In addition, we estimated the bias and variance of and the associated coverage probability based on (14). The results are given in Table 3. We note that the bias is close to 0 in all designs considered. Furthermore, the empirical type I error in designs where is close to 0.05 and the coverage probability both when and when is close to 95% both for N=100 and N=400.
|ρxz||ρyz||N||TypeI error||Coverage probability|
ICCx and ICCy estimated from simulated data (type 2 simulations). Simulation results are based on 4000 simulations
Table 3: Simulation results for the estimation of in (14).
Association between analgesic intake and level of kidney function
We use data from the Swiss Analgesic Study where in 1967/68 1,256 women provided a urine specimen in the clinic and 2 additional specimens at home on different days which were assayed for NAPAP, a metabolite indicating recent intake of phenacetin-containing analgesics and salicylates a metabolite indicating recent intake of aspirin-containing analgesics. In addition, serum creatinine, a marker of kidney function was also measured at the baseline clinic visit. We let X, Y and Z be the NAPAP, salicylates and serum creatinine at the baseline (1968) clinic visit. We wish to compare corr(X,Z) with corr(Y,Z). A total of 1168 women had complete data on X, Y and Z. Descriptive statistics for the (X,Y,Z) data are provided in Table 4.
|Pearson (Spearman) correlation||Pearson (Spearman)|
|N||mean||sd||(10th – 90th percentile)||x||y||z||ICC|
|NAPAP (X) (o.d.)||1168||0.176||0.352||(0.000,0.675)||1||0.423||0.19||0.614|
|salicylates (Y) (mg%)||1168||15.8||19.2||(10.0,35.0)||1||0.068||0.404|
|serum creatinine (Z) (mg/dL)||1168||1.02||0.37||(0.69,1.50)||1|
NAPAP value at the baseline (1968) clinic visit.
Individual salicylate values were obtained in grouped form (1=0-19 mg%/2=20-49 mg%/3=50-99 mg%/4=100+mg%) at the baseline (1968) clinic visit, and assigned scores of 10 mg%, 35 mg%, 75 mg% and 100 mg%, respectively.
Spearman correlations are estimated based on (18)
N: Number of subjects with serum creatinine, NAPAP and salicylates, available at the baseline (1968) clinic visit
ICC: Intraclass Correlation among replicate urine values (based on urine specimens obtained at the baseline clinic visit and 2 additional urine specimens obtained at home on different days).
Table 4: Descriptive statistics for NAPAP, salicylates and serum creatinine at the baseline visit, Swiss Analgesic Study, n=1168.
We note that the distributions of each of X, Y and Z are skewed and not normally distributed. There is moderate correlation between NAPAP and salicylates (Pearson correlation=0.423) in part because some analgesics contain both phenacetin and aspirin. Pearson correlations between NAPAP and salicylates vs. serum creatinine were 0.190 and 0.068, respectively. The ICC for NAPAP and salicylates over the 3 determinations were 0.614 and 0.404, respectively, indicating moderate variability of intake on different days.
Based on (7), the observed Pearson correlations were significantly different (Vxy=3.344, p=0.001) (Table 5). For comparison, we also analyzed these data using the Dunn and Clark  and Meng, et al.  procedures. Based on Dunn and Clark, we have Z*1=3.929, p-value=8.5 × 10-5. Based on Meng, et al, we have Zxy=3.919, p-value=8.9 × 10-5. As in the simulations, the se’s are inappropriately low for both of these procedures in the setting of these highly non-normal data, yielding p-values that are biased downward and type I errors that are too low. The deattenuated Pearson correlations were 0.243 and 0.108 for NAPAP (X) and salicylates (Y), respectively, vs. serum creatinine. Based on (5) there was a significant difference between these correlations, Vxy,true=2.640, p=0.008. Since NAPAP and salicylates were right-skewed, we also computed observed and deattenuated Spearman correlations which are presented in Table 6.
|Variable||observed correlation||Z transform||deattenuated correlation||Z transform|
|NAPAP, 1968 (o.d.)||0.19||0.192||0.243||0.248|
|salicylates, 1968 (mg%)||0.068||0.069||0.108||0.108|
|Difference between correlations||0.122||0.124||0.135||0.139|
|(95%CI)||(0.051, 0.193)||(0.035, 0.235)|
The V statistic for the observed and deattenuated Pearson correlations is based on (7) and (5), respectively.
Table 5: Observed and deattenuated Pearson Correlations between serum creatinine at baseline (1968) vs. each of NAPAP and salicylates at the baseline clinic visit (1968), Swiss Analgesic Study, n=1168.
|Variable||observed Spearman correlation||observed probit correlation||Z transform||deattenuated Spearman correlation||deattenuatedprobit correlation||Z transform|
|NAPAP, 1968 (o.d.)||0.155||0.162||0.163||0.212||0.221||0.225|
|salicylates, 1968 (mg%)||0.081||0.085||0.085||0.127||0.133||0.134|
|Difference between correlations||0.074||0.077||0.078||0.085||0.088||0.091|
|(95% CI)||(0.013, 0.134)||(-0.005, 0.174)|
The V statistic for the observed and deattenuated probit correlations is based on (7) and (5), respectively.
Table 6: Spearman Correlations between serum creatinine at baseline (1968) vs. each of NAPAP and salicylates at the baseline clinic visit (1968), Swiss Analgesic Study, n=168.
The observed Spearman correlations between Z and (X,Y) were 0.155 and 0.081, respectively, which were significantly different (Vxy,probit=2.395, p=0.017). After correcting for deattentuation, the Spearman correlations between Z and (X,Y) were 0.212 and 0.127, respectively, which only showed a trend towards statistical significance (Vxy,probit,true=1.849, p=0.064). Overall, the results from the comparison of correlation coefficients in Tables 5 and 6 were consistent with the results from the primary paper (Dubach, Levy and Muller, ) which showed that NAPAP and salicylates were significantly associated with serum creatinine at baseline, although these analyses were based on categorical definitions of NAPAP, salicylates and creatinine.
Comparison of storage conditions for HbA1c measurements in plasma
Recently dried blood spot assays of HbA1c have been used as biomarkers in many epidemiologic studies and simplified specimen handling would be cost-effective for these large scale community-based studies. A study was conducted at Brigham and Women’s Hospital (BWH) to test the hypothesis that dried blood spot determinations for HbA1c are valid measurements with low-intensity storage conditions . Blood samples were drawn into EDTA containing tubes and submitted for duplicate HPLC analysis of HbA1c (considered as the gold standard process). Blood for spotting was also drawn to identical EDTA tubes and then dropped randomly to blood-spot cards. After having been air-dried for at least 20 minutes these dried blood specimens were placed into single-sample, airtight bags with desiccant pouch. They were then stored at room temperature for 0, 2, or 4 weeks in the lab and were then shipped to Biosafe Laboratories, Inc. (Lake Forest IL) for immediate HbA1c analysis or placed in freezer storage [-80°C] for an additional 4 or 12 weeks. Dried blood spot determinations of HbA1c were performed in triplicate, blinded to storage conditions and protocol. Here we want to test the equality of the correlations between results from HPLC measures (Z) and results from dried blood spot measures after 2 weeks of room temperature pre-shipping followed by an additional 4 weeks of freezing after shipping (X), and the correlation between results from HPLC measures (Z) and results from dried blood spot measures after 4 weeks of room temperature pre-shipping followed by an additional 12 weeks of freezing after shipping (Y).
We refer to storage for 2 weeks at room temperature plus 4 additional weeks in the freezer as short-term storage (X), storage for 4 weeks at room temperature plus 12 additional weeks in the freezer as long-term storage (Y) and HPLC processing as Z. The observed and deattenuated Spearman correlations between Z vs. (X,Y) respectively are given in Table 7. We see that ρxz,s=0.952 while ρyz,s=0.711, which are significantly different both based on large-sample methods (p=0.005) and also more appropriate exact methods (p=0.021). After deattenuation, we obtain ρyz,s, true=0.979 and ρyz,s, true=0.776 (2-sided p-value=0.134 based on large sample methods and 0.056 based on exact methods). Thus, the length of storage has a borderline significant effect on the validity of the HbA1c assay with short-term storage preferable. Plots of , permutation and in equation 16 over the 4,096 elements of the permutation distribution are given in Figures 1 and 2, respectively. The distributions are somewhat skewed to the right, particularly for , which indicates the necessity of using exact methods in a small-sample setting.
|observed Spearman correlation||observed probit correlation||Z transform||deattenuated Spearman correlation||deattenuatedprobit correlation||Z transform||ICCs|
|2 weeks of room temperature pre-shipping plus 4 weeks of freezer (X)||0.952||0.956||1.896||0.979||0.98||2.31||0.946|
|4 weeks of room temperature pre-shipping plus 12 weeks of freezer (Y)||0.711||0.727||0.922||0.776||0.79||1.072||0.835|
|p-value (large sample)||0.005||0.134|
The V statistics are based on (7) and (5) using the probit transformation.
Table 7: Spearman Correlations between HbA1c levels determined immediately vs. HbA1c levels determined after 2 delay periods, N= NICC =12.
In this paper, we provide a method to compare two Pearson correlation coefficients (ρxz ,ρyz ) where the correlations are estimated from the same subjects. Unlike previous literature on the subject, we do not make the assumption that (X, Y, Z) are multivariate normal. It is shown in simulation studies that the method of moments estimator of the sampling variance of (Zxz-Zyz) is more appropriate than standard methods that assume normality, particularly in the setting of non-normal data such as in the analgesic intake data presented in the first example in this paper. We then extended our methods to provide for comparison of two deattenuated correlation coefficients where the correlations are estimated from the same subjects. This method is generally applicable to compare the validity of two surrogate measures (X, Y) where validity is measured by the respective correlation with a third variable (Z) and both X and Y are subject to measurement error. Although the methods are asymptotic, we have shown in finite samples that the asymptotic properties of the test statistic and confidence limits are appropriate when N ≥ 100. In addition, we provide a method for confidence interval estimation of both Δρ and ρΔ true .
The methods were extended to the comparison of dependent Spearman correlations which may be more appropriate when (X, Y, Z) are not multivariate normal, which provide for the comparison of both ordinary and deattenuated Spearman correlations. To our knowledge, this is the first paper in the literature to discuss comparison of dependent Spearman correlations.
In this paper, we have considered deattenuated correlations of the form corr(μx,Z) and corr(μy,Z) which account for error in the estimation of X and Y. However, it is likely that Z is also measured with error and a useful extension would be to estimate and compare corr(μx μz) vs corr(μy μz) , where error in X, Y and Z are taken into account.
SAS macros for the comparison of dependent ordinary and deattenuated Pearson and Spearman correlations as discussed in this paper are available from the following website:
We would like to thank Drs. Buxton and Seeman for providing the original data on HbAlc.
The research was supported by U.S. National Institute of Health grants (U54 CA155626-03, P30 CA006516-48, R01 CA050597-18 and P30 AG017265- 099002). This work was conducted with support from Harvard Catalyst | The Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health Award UL1 TR001102) and financial contributions from Harvard University and its affiliated academic healthcare centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic healthcare centers, of the National Institutes of Health.
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals