Comparison of Dependent Pearson and Spearman Correlation Coefficients with and without Correction for Measurement Error

There already exist methods for comparing dependent Pearson correlation coefficients. However, each of the variables (X, Y) has associated random error; and a related question is after correcting for random error, which variable correlates most highly with the outcome variable Z. In this paper, we present methods for comparing dependent deattenuated correlation coefficients. This is a generalization of previous work for obtaining confidence limits for a single deattenuated correlation coefficient. In addition, we extend this work to the comparison of dependent Spearman correlation coefficients. The methods are illustrated with two examples. The first example concerns the comparison of nephrotoxicity of phenacetin and aspirin intake as measured by repeat biomarkers obtained from the same subjects. The second example is a comparison of the validity of different storage conditions for measuring HbA1c from dried blood specimens as compared to the gold standard of immediate processing. Results from using these methods indicate that phenacetin intake is more highly correlated with serum creatinine levels than aspirin intake and that short-term storage is preferable to long-term storage for assessment of HbA1c levels. We have available SAS software for comparing dependent deattenuated Pearson correlation and dependent Spearman correlations with and without deattentuation. Journal of Biometrics & Biostatistics J o u rn al of Bio metrics & Bistatis t i c s


Introduction
In 1967/68 a group of 1,256 women were recruited for the Swiss Analgesic Study [1,2]. Women were age 30-49 years and lived in the Basel, Switzerland area. The purpose of the study was to investigate whether there was an association between intake of phenacetincontaining analgesics and prevalence and incidence of kidney disease. To measure phenacetin intake a urinary metabolite (N-acetyl-Paminophenol (NAPAP)) was assayed from a urine sample in the clinic and from 2 additional urine samples collected at home on 2 separate days within 1 week of the clinic visit. Serum creatinine was used as a measure of kidney function. Women were seen at follow-up visits in 1969, 1970, 1971, 1972, 1975 and 1978. In the primary paper, mean NAPAP over the 3 replicates was categorized and related to both prevalence and incidence of kidney disease where an abnormal creatinine was defined as >1.5 mg/dl. In the present paper, we represent NAPAP and serum creatinine at the baseline clinic visit in 1968 as continuous variables and compute the correlation coefficient between these two measures. To distinguish phenacetin intake from general use of analgesics, urinary salicylates were also assessed in triplicate at baseline as an indication of intake of aspirin-containing analgesics. If we let X=NAPAP at the baseline clinic visit, Y=salicylates at the baseline clinic visit and Z=serum creatinine at the baseline clinic visit, then we wish to test the hypothesis 0 1 : vs : where xz ρ is the population correlation between X and Z and yz ρ is defined similarly.
There already exist methods for comparing dependent correlation coefficients for normally distributed random variables obtained from the same subjects. In an excellent review paper on this subject [3] it is noted that in previous simulation studies, several methods are clearly inappropriate, while other methods preserve type I error under a variety of conditions. Among the latter are the method of Williams [4] which is an enhancement of a procedure initially proposed by Hotelling [5].
Hotelling's procedure is exact, but only under the condition that the sample and population standard deviations are the same for both X and Y (i.e., s x =σ x , s y =σ y ). To test the hypotheses in (1)  > and t N-3,p =p th percentile of a t N-3 distribution.
In addition, Olkin and Finn [6] used a similar approach to compare more than 2 correlated correlation coefficients. Since in finite samples, the Fisher z transformation better approximates a normal distribution than the sample correlation coefficient, Dunn and Clark [7] proposed the test statistic ( ) Furthermore, Meng, Rosenthal, and Rubin [8] propose a statistic that is asymptotically equivalent to the Dunn and Clark procedure, but with a simpler expression for var(Z xz -Z yz ) given by: In addition, Bilker, Brensinger, and Gur [9] propose a combined permutation-bootstrap approach to test the hypothesis in (1) based on the test statistic which is computationally intensive, but avoids the necessity of specifying the underlying distribution of Z xy . Finally, the R package [10] cocor maintained by Diedenhofen [11] provides software to implement a variety of methods for comparing dependent Pearson correlation coefficients.
However, an assumption of [4][5][6][7][8] is that the joint distribution of (X, Y, Z) is multivariate normal. We relax this assumption and use the method of moments and the delta method to estimate var(Z xz -Z yz ). Furthermore, in the Swiss Analgesic dataset, there is considerable intraindividual variation among replicate measures in both X and Y and the more important question is whether the true (underlying) mean value of X and Y at baseline, denoted by μ x and μ y are more highly correlated with Z. Thus, we wish to test the hypothesis This is an extension of previous work [12] that provides confidence limits for a single deattenuated correlation. Another issue is that it is clear that neither X nor Y is normally distributed. Hence, the Spearman correlation coefficient may be a more appropriate measure of association than the Pearson correlation coefficient in this dataset. Hence, we extend our methodology to the comparison of dependent Spearman correlation coefficients both with and without correction for measurement error. To our knowledge, there is no previous literature on the comparison of dependent Spearman correlation coefficients.
In this paper, we first describe the methodology and propose both an asymptotic test and an exact test. Second, we present a simulation study to assess the validity of the asymptotic test in finite samples and compare it to existing procedures for comparing dependent correlation coefficients. Finally, we describe two examples based on real data illustrating the use of these methods.

Hypothesis testing
We consider a classical measurement error model for X i and Y i of are given in Appendices A, B and C, respectively of the supplementary materials.

Interval estimation
It is also of interest to obtain confidence limits for Furthermore, from the delta method, are obtained from Appendix B of the supplementary materials. Finally, from the delta method we have: Similarly, we can obtain confidence limits for given by

Small-Sample Inference
When sample size is small, permutation methods can be used to estimate levels of significance. The permutation distribution is generated by randomly shuffling the X, Y labels and computing the empirical distribution of , and (5) and (7), respectively. The rank of , and ∆ ∆ z true z based on the observed data with reference to their permutation distributions can be used to estimate exact two-sided p-values given by And similarly forˆz ∆ , where M is the size of the permutation distribution and I(a)=1 if a is true,=0 if a is false.

Hypothesis testing
We can also consider the comparison of dependent Spearman correlations whereby we test the hypothesis where , ρ xz s =population Spearman correlation between X and Z and , ρ yz s is defined similarly.
If the data are transformed to the probit scale then from Rosner and Glynn [13].
population correlation between the probit of X and the probit of Z and , ρ yz probit is defined similarly.
Thus, the hypothesis test in (17)

Simulations
We conducted simulation studies to assess the validity of the asymptotic procedure in (7) for comparing dependent correlations in finite samples and to compare this procedure with the procedures of Dunn and Clark [7] and Meng, Rosenthal and Rubin [8]. We simulated data for (X, Y, Z) from a multivariate normal distribution N(μ, Ʃ) with ( ) We also simulated data from a log normal distribution (X*, Y*, Z*), where X*=exp(X), Y*=exp(Y) and Z*=exp(Z). The results are given in Table 1. We see that in the case of a normal distribution that all three procedures have appropriate type I error (designs 1-4) , particularly in the case of n=400. Similarly, the power of the procedures (designs 5-6) is also comparable. However, in the case of a log normal distribution, type I error is more adequately preserved by the method of moments procedure, particularly for n=400. As the sample size increases, the inconsistent estimate of the variance provided by the Dunn and Clark and Meng procedures in the setting of a non-normal distribution yields variance estimates that are too low and type I errors that are too high. Conversely, as n increases the method of moments procedure yields consistent variance estimates and actual p-values that are close to nominal levels. Alternative approaches if non-normality is suspected would be to compare dependent Spearman correlations rather than Pearson correlations.
To assess the accuracy of the procedure in (5) to compare dependent deattenuated Pearson correlations we used the same data designs as in designs 1-6. We performed each set of simulations in two ways. First, we assumed that the intra class correlation between replicate measures of X (and Y) was known without error and set to 0.67. Thus, and (C2). Second, we generated replicate values of (X 1 , X 2 , Y 1 , Y 2 ) from a multivariate normal distribution denoted by N (μ R , Ʃ R ) where μ R= (0,0,0,0) and  Table 2.
We see in Table 2 ),ln( where combinations considered. Furthermore, the bias in estimation of , ∆ z true is close to 0 for all parameter combinations considered. The above variances are slightly larger when the ICC is estimated (Type 2) than when the ICC is assumed known (Type 1), particularly when 0.5 In addition, we estimated the bias and variance of , ∆ R true and the associated coverage probability based on (14). The results are given in Table 3. We note that the bias is close to 0 in all designs considered. Furthermore, the empirical type I error in designs where

Examples Association between analgesic intake and level of kidney function
We use data from the Swiss Analgesic Study where in 1967/68 1,256 women provided a urine specimen in the clinic and 2 additional specimens at home on different days which were assayed for NAPAP, a metabolite indicating recent intake of phenacetin-containing analgesics and salicylates a metabolite indicating recent intake of aspirin-containing analgesics. In addition, serum creatinine, a marker of kidney function was also measured at the baseline clinic visit. We let X, Y and Z be the NAPAP, salicylates and serum creatinine at the baseline (1968) clinic visit. We wish to compare corr(X,Z) with corr(Y,Z). A total of 1168 women had complete data on X, Y and Z. Descriptive statistics for the (X,Y,Z) data are provided in Table 4.
We note that the distributions of each of X, Y and Z are skewed and not normally distributed. There is moderate correlation between NAPAP and salicylates (Pearson correlation=0.423) in part because some analgesics contain both phenacetin and aspirin. Pearson correlations between NAPAP and salicylates vs. serum creatinine were 0.190 and 0.068, respectively. The ICC for NAPAP and salicylates over the 3 determinations were 0.614 and 0.404, respectively, indicating moderate variability of intake on different days.    (5) and (6).

Volume 6 • Issue 2 • 1000226
Based on (7), the observed Pearson correlations were significantly different (V xy =3.344, p=0.001) ( Table 5). For comparison, we also analyzed these data using the Dunn and Clark [7] and Meng, et al. [8] procedures. Based on Dunn and Clark, we have * 1 Z =3.929, p-value=8.5 × 10 -5 . Based on Meng, et al, we have Z xy =3.919, p-value=8.9 × 10 -5 . As in the simulations, the se's are inappropriately low for both of these procedures in the setting of these highly non-normal data, yielding p-values that are biased downward and type I errors that are too low. The deattenuated Pearson correlations were 0.243 and 0.108 for NAPAP (X) and salicylates (Y), respectively, vs. serum creatinine. Based on (5) there was a significant difference between these correlations, V xy,true =2.640, p=0.008.
Since NAPAP and salicylates were right-skewed, we also computed observed and deattenuated Spearman correlations which are presented in Table 6.
The observed Spearman correlations between Z and (X,Y) were 0.155 and 0.081, respectively, which were significantly different (V xy,probit =2.395, p=0.017). After correcting for deattentuation, the Spearman correlations between Z and (X,Y) were 0.212 and 0.127, respectively, which only showed a trend towards statistical significance (V xy,probit,true =1.849, p=0.064). Overall, the results from the comparison of correlation coefficients in Tables 5 and 6   Individual salicylate values were obtained in grouped form (1=0-19 mg%/2=20-49 mg%/3=50-99 mg%/4=100+mg%) at the baseline (1968) clinic visit, and assigned scores of 10 mg%, 35 mg%, 75 mg% and 100 mg%, respectively.  The V statistic for the observed and deattenuated Pearson correlations is based on (7) and (5), respectively. The V statistic for the observed and deattenuated probit correlations is based on (7) and (5), respectively. The V statistics are based on (7) and (5) using the probit transformation. results from the primary paper (Dubach, Levy and Muller, [1]) which showed that NAPAP and salicylates were significantly associated with serum creatinine at baseline, although these analyses were based on categorical definitions of NAPAP, salicylates and creatinine.

Comparison of storage conditions for HbA1c measurements in plasma
Recently dried blood spot assays of HbA1c have been used as biomarkers in many epidemiologic studies and simplified specimen handling would be cost-effective for these large scale community-based studies. A study was conducted at Brigham and Women's Hospital (BWH) to test the hypothesis that dried blood spot determinations for HbA1c are valid measurements with low-intensity storage conditions [14]. Blood samples were drawn into EDTA containing tubes and submitted for duplicate HPLC analysis of HbA1c (considered as the gold standard process). Blood for spotting was also drawn to identical EDTA tubes and then dropped randomly to blood-spot cards. After having been air-dried for at least 20 minutes these dried blood specimens were placed into single-sample, airtight bags with desiccant pouch. They were then stored at room temperature for 0, 2, or 4 weeks in the lab and were then shipped to Biosafe Laboratories, Inc. (Lake Forest IL) for immediate HbA1c analysis or placed in freezer storage [-80°C] for an additional 4 or 12 weeks. Dried blood spot determinations of HbA1c were performed in triplicate, blinded to storage conditions and protocol. Here we want to test the equality of the correlations between results from HPLC measures (Z) and results from dried blood spot measures after 2 weeks of room temperature pre-shipping followed by an additional 4 weeks of freezing after shipping (X), and the correlation between results from HPLC measures (Z) and results from dried blood spot measures after 4 weeks of room temperature pre-shipping followed by an additional 12 weeks of freezing after shipping (Y).
We refer to storage for 2 weeks at room temperature plus 4 additional weeks in the freezer as short-term storage (X), storage for 4 weeks at room temperature plus 12 additional weeks in the freezer as long-term storage (Y) and HPLC processing as Z. The observed and deattenuated Spearman correlations between Z vs. (X,Y) respectively are given in Table 7. We see that ρ xz,s =0.952 while ρ yz,s =0.711, which are significantly different both based on large-sample methods (p=0.005) and also more appropriate exact methods (p=0.021). After deattenuation, we obtain ρ yz,s, true =0.979 and ρ yz,s, true =0.776 (2-sided p-value=0.134 based on large sample methods and 0.056 based on exact methods). Thus, the length of storage has a borderline significant effect on the validity of the HbA1c assay with short-term storage preferable. ) ∆ z true permutation , which indicates the necessity of using exact methods in a small-sample setting.

Discussion
In this paper, we provide a method to compare two Pearson correlation coefficients , ) (ρ ρ xz yz where the correlations are estimated from the same subjects. Unlike previous literature on the subject, we do not make the assumption that (X, Y, Z) are multivariate normal. It is shown in simulation studies that the method of moments estimator of the sampling variance of (Z xz -Z yz ) is more appropriate than standard methods that assume normality, particularly in the setting of non-normal data such as in the analgesic intake data presented in the first example in this paper. We then extended our methods to provide for comparison of two deattenuated correlation coefficients , , ρ ρ xz true yz true where the correlations are estimated from the same subjects. This method is generally applicable to compare the validity Volume 6 • Issue 2 • 1000226  of two surrogate measures (X, Y) where validity is measured by the respective correlation with a third variable (Z) and both X and Y are subject to measurement error. Although the methods are asymptotic, we have shown in finite samples that the asymptotic properties of the test statistic and confidence limits are appropriate when N ≥ 100. In addition, we provide a method for confidence interval estimation of both ρ ∆ and , ρ ∆ true .
The methods were extended to the comparison of dependent Spearman correlations which may be more appropriate when (X, Y, Z) are not multivariate normal, which provide for the comparison of both ordinary and deattenuated Spearman correlations. To our knowledge, this is the first paper in the literature to discuss comparison of dependent Spearman correlations.
In this paper, we have considered deattenuated correlations of the form corr(μ x ,Z) and corr(μ y ,Z) which account for error in the estimation of X and Y. However, it is likely that Z is also measured with error and a useful extension would be to estimate and compare corr(μ x μ z ) vs corr(μ y μ z ) , where error in X, Y and Z are taken into account.
SAS macros for the comparison of dependent ordinary and deattenuated Pearson and Spearman correlations as discussed in this paper are available from the following website: https://sites.google.com/a/channing.harvard.edu/bernardrosner/ channing.