alexa
Reach Us +44-1522-440391
Comparison of Dependent Pearson and Spearman Correlation Coefficients with and without Correction for Measurement Error | OMICS International
ISSN: 2155-6180
Journal of Biometrics & Biostatistics

Like us on:

Make the best use of Scientific Research and information from our 700+ peer reviewed, Open Access Journals that operates with the help of 50,000+ Editorial Board Members and esteemed reviewers and 1000+ Scientific associations in Medical, Clinical, Pharmaceutical, Engineering, Technology and Management Fields.
Meet Inspiring Speakers and Experts at our 3000+ Global Conferenceseries Events with over 600+ Conferences, 1200+ Symposiums and 1200+ Workshops on Medical, Pharma, Engineering, Science, Technology and Business
All submissions of the EM system will be redirected to Online Manuscript Submission System. Authors are requested to submit articles directly to Online Manuscript Submission System of respective journal.

Comparison of Dependent Pearson and Spearman Correlation Coefficients with and without Correction for Measurement Error

Bernard Rosner1,4*, Wei Wang2, Heather Eliassen3,4 and Eileen Hibert4

1Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA

2Department of Medicine, Harvard Medical School, Brigham and Women’s Hospital, Boston, MA 02115, USA

3Department of Nutrition and Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA

4Channing Division of Network Medicine, Harvard Medical School, Boston, MA 02115, USA

*Corresponding Author:
Bernard Rosner
Channing Division of Network Medicine, Harvard Medical School
Boston, MA 02115, USA
Tel: 617-525-2743
Fax: 617-731-1541
E-mail: [email protected]

Received date: April 01, 2015; Accepted date: May 22, 2015; Published date: May 29, 2015

Citation: Rosner B, Wang W, Eliassen H, Hibert E (2015) Comparison of Dependent Pearson and Spearman Correlation Coefficients with and without Correction for Measurement Error. J Biomet Biostat 6:226. doi:10.4172/2155-6180.1000226

Copyright: © 2015 Rosner B, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are are credited.

Visit for more related articles at Journal of Biometrics & Biostatistics

Abstract

There already exist methods for comparing dependent Pearson correlation coefficients. However, each of the variables (X, Y) has associated random error; and a related question is after correcting for random error, which variable correlates most highly with the outcome variable Z. In this paper, we present methods for comparing dependent deattenuated correlation coefficients. This is a generalization of previous work for obtaining confidence limits for a single deattenuated correlation coefficient. In addition, we extend this work to the comparison of dependent Spearman correlation coefficients. The methods are illustrated with two examples. The first example concerns the comparison of nephrotoxicity of phenacetin and aspirin intake as measured by repeat biomarkers obtained from the same subjects. The second example is a comparison of the validity of different storage conditions for measuring HbA1c from dried blood specimens as compared to the gold standard of immediate processing. Results from using these methods indicate that phenacetin intake is more highly correlated with serum creatinine levels than aspirin intake and that short-term storage is preferable to long-term storage for assessment of HbA1c levels. We have available SAS software for comparing dependent deattenuated Pearson correlation and dependent Spearman correlations with and without deattentuation.

Keywords

Deattenuated correlation; Dependent correlations; Measurement error correction; Pearson correlation; Spearman correlation

Introduction

In 1967/68 a group of 1,256 women were recruited for the Swiss Analgesic Study [1,2]. Women were age 30-49 years and lived in the Basel, Switzerland area. The purpose of the study was to investigate whether there was an association between intake of phenacetincontaining analgesics and prevalence and incidence of kidney disease. To measure phenacetin intake a urinary metabolite (N-acetyl-Paminophenol (NAPAP)) was assayed from a urine sample in the clinic and from 2 additional urine samples collected at home on 2 separate days within 1 week of the clinic visit. Serum creatinine was used as a measure of kidney function. Women were seen at follow-up visits in 1969, 1970, 1971, 1972, 1975 and 1978. In the primary paper, mean NAPAP over the 3 replicates was categorized and related to both prevalence and incidence of kidney disease where an abnormal creatinine was defined as >1.5 mg/dl. In the present paper, we represent NAPAP and serum creatinine at the baseline clinic visit in 1968 as continuous variables and compute the correlation coefficient between these two measures. To distinguish phenacetin intake from general use of analgesics, urinary salicylates were also assessed in triplicate at baseline as an indication of intake of aspirin-containing analgesics. If we let X=NAPAP at the baseline clinic visit, Y=salicylates at the baseline clinic visit and Z=serum creatinine at the baseline clinic visit, then we wish to test the hypothesis

equation         (1)

where ρxz is the population correlation between X and Z and ρyz is defined similarly.

There already exist methods for comparing dependent correlation coefficients for normally distributed random variables obtained from the same subjects. In an excellent review paper on this subject [3] it is noted that in previous simulation studies, several methods are clearly inappropriate, while other methods preserve type I error under a variety of conditions. Among the latter are the method of Williams [4] which is an enhancement of a procedure initially proposed by Hotelling [5]. Hotelling’s procedure is exact, but only under the condition that the sample and population standard deviations are the same for both X and Y (i.e., sxx, syy). To test the hypotheses in (1) Williams proposes the test statistic

equation

Where RXZ=corr(x,z),Rxy and Ryz are defined similarly, N=sample size an

equation

and we reject H0 if equation and tN-3,p=pth percentile of a tN-3 distribution.

In addition, Olkin and Finn [6] used a similar approach to compare more than 2 correlated correlation coefficients. Since in finite samples, the Fisher z transformation better approximates a normal distribution than the sample correlation coefficient, Dunn and Clark [7] proposed the test statistic

equation

where

equation

Zxz=0.5 ln [(1+ Rxz)/ (1- Rxz)], Zyz is defined similarly, and we reject H0 if equation, where zp=pth percentile of a N(0,1) distribution.

Furthermore, Meng, Rosenthal, and Rubin [8] propose a statistic that is asymptotically equivalent to the Dunn and Clark procedure, but with a simpler expression for var(Zxz -Zxz) given by:

equation

where

equation

and

equation

In addition, Bilker, Brensinger, and Gur [9] propose a combined permutation-bootstrap approach to test the hypothesis in (1) based on the test statistic

equation

which is computationally intensive, but avoids the necessity of specifying the underlying distribution of Zxz. Finally, the R package [10] cocor maintained by Diedenhofen [11] provides software to implement a variety of methods for comparing dependent Pearson correlation coefficients.

However, an assumption of [4-8] is that the joint distribution of (X, Y, Z) is multivariate normal. We relax this assumption and use the method of moments and the delta method to estimate var(Zxz -Zyz). Furthermore, in the Swiss Analgesic dataset, there is considerable intraindividual variation among replicate measures in both X and Y and the more important question is whether the true (underlying) mean value of X and Y at baseline, denoted by μx and μy are more highly correlated with Z. Thus, we wish to test the hypothesis

equation

This is an extension of previous work [12] that provides confidence limits for a single deattenuated correlation. Another issue is that it is clear that neither X nor Y is normally distributed. Hence, the Spearman correlation coefficient may be a more appropriate measure of association than the Pearson correlation coefficient in this dataset. Hence, we extend our methodology to the comparison of dependent Spearman correlation coefficients both with and without correction for measurement error. To our knowledge, there is no previous literature on the comparison of dependent Spearman correlation coefficients.

In this paper, we first describe the methodology and propose both an asymptotic test and an exact test. Second, we present a simulation study to assess the validity of the asymptotic test in finite samples and compare it to existing procedures for comparing dependent correlation coefficients. Finally, we describe two examples based on real data illustrating the use of these methods.

Comparison of Dependent Deattenuated Pearson Correlations

Hypothesis testing

We consider a classical measurement error model for Xi and Yi of the form

equation       (2)

equation

where

equationare respectively independent, equation if replicate measures of X and Y are obtained from the same subjects.

We define the true correlation between (X and Z) and (Y and Z) by

equation      (3)

equation 

Based on equations (2) and (3), it is straightforward to show that

equation                (4)

equation

where

equation

and

equationthe intra class correlations obtained from repeated measures of X and Y based on (2). We wish to test the hypothesis:

equation

Note that it is possible that equation while equation if the reproducibility of X is worse than that of Y (i.e., ICCx<ICCy). Similarly, it is possible that equation while equation

It is usually more efficient to base inferences for correlations based on Fisher’s z statistic. Thus, we will test the hypothesis

equation

Where

equation and Zyz,true is defined similarly.

We will use the delta method to estimate

equation

and obtain the test statistic

equation                       (5)

We have:

equation                       (6)

Where

equation and equationare derived in Appendices A, B and C of the supplementary materials, respectively.

Therefore, from Appendices A, B and C of the supplementary materials, we obtain equation and the test statistic Vxy,true in (5),

with asymptotic p-value

equation, where Φ=standard normal c.d.f.

Similarly, we can also test the hypothesis equation based on the large sample test statistic

equation                 (7)

where equation and based on the delta method

equation        (8)

equation

equation

and

equationare given in Appendices A, B and C, respectively of the supplementary materials.

Interval estimation

It is also of interest to obtain confidence limits for equation. It is not possible to translate confidence limits for Z ,equation to corresponding confidence limits for ρ , equation . Instead, we will use equation as a point estimate of equation. To obtain equationwe use the formula:

equation            (9)

Since equation is a ratio estimator, we first consider equation and use the delta method to obtain

equation         (10)

Furthermore, from the delta method,

equation         (11)

where equation and equationare obtained from Appendix A of the supplementary materials. Similarly,

equation       (12)

equation

where equation and equation are obtained from Appendix B of the supplementary materials. Finally, from the delta method we have:

equation      (13)

equation

where equation and equation are given in Appendix C of the supplementary materials.

Hence, if we combine (9)-(13) and assume asymptotic normality of equation. we have an approximate 100% × (1-α)CI for equation given by

equation     (14)

Similarly, we can obtain confidence limits for equation given by

equation              (15)

where

equation

equation

equation

Small-Sample Inference

When sample size is small, permutation methods can be used to estimate levels of significance. The permutation distribution is generated by randomly shuffling the X, Y labels and computing the empirical distribution of equation in (5) and (7), respectively. The rank of equation based on the observed data with reference to their permutation distributions can be used to estimate exact two-sided p-values given by

equation          (16)

And similarly for equation where M is the size of the permutation distribution and I(a)=1 if a is true,=0 if a is false.

Comparison of Dependent Spearman Correlations

Hypothesis testing

We can also consider the comparison of dependent Spearman correlations whereby we test the hypothesis

equation          (17)

where equation=population Spearman correlation between X and Z and equation is defined similarly.

If the data are transformed to the probit scale then from Rosner and Glynn [13].

equation          (18)

equation

where equation and is estimated by equation where Φ=c.d.f. of a N(0,1) distribution, equation population correlation between the probit of X and the probit of Z and equation is defined similarly.

Thus, the hypothesis test in (17) is equivalent to the hypothesis test in (7) after transformation to the probit scale yielding a test statistic based on equation given by equation

Interval estimation

It is also of interest to obtain confidence limits for equation which we estimate by equation. Based on the delta method and (18), we obtain:

equation      (19)

equation 

equation

where

equationare given in Appendices A, B and C, respectively, of the supplementary material.

Thus, a 100% × (1-α)CI for equation is given by

equation     (20)

Correction for measurement error

Rosner and Glynn [13] defined the deattenuated Spearman correlation by

equation     (21)

where equation is the deattenuated correlation between probit scores for X and Z, respectively. Hence, the hypotheses equation are equivalent to the hypotheses

equation

Furthermore, the latter hypotheses can be tested using (5), where the Fisher z transformation is performed based on probit scores, yielding a test statistic based on equation given by

equation and

equation    (22)

equation

and equationand equationare obtained from equations (A3), (B1) and (C1), of Appendices A, B and C, respectively of the supplementary materials.

To obtain a 100% × (1-α)CI for equation we use a similar approach as in (19) and obtain:

equation (23)

equation

equation

where

equationare obtained from (11)-(13), respectively.

The corresponding large sample 100% × (1-α)CI for equationis given by

equation            (24)

where

equation

Simulations

We conducted simulation studies to assess the validity of the asymptotic procedure in (7) for comparing dependent correlations in finite samples and to compare this procedure with the procedures of Dunn and Clark [7] and Meng, Rosenthal and Rubin [8]. We simulated data for (X, Y, Z) from a multivariate normal distribution N(μ, Σ) with

equation

We performed 4,000 simulations for each combination of N=100, 400 and

equation

We also simulated data from a log normal distribution (X*, Y*, Z*), where X*=exp(X), Y*=exp(Y) and Z*=exp(Z). The results are given in Table 1. We see that in the case of a normal distribution that all three procedures have appropriate type I error (designs 1-4) , particularly in the case of n=400. Similarly, the power of the procedures (designs 5-6) is also comparable. However, in the case of a log normal distribution, type I error is more adequately preserved by the method of moments procedure, particularly for n=400. As the sample size increases, the inconsistent estimate of the variance provided by the Dunn and Clark and Meng procedures in the setting of a non-normal distribution yields variance estimates that are too low and type I errors that are too high. Conversely, as n increases the method of moments procedure yields consistent variance estimates and actual p-values that are close to nominal levels. Alternative approaches if non-normality is suspected would be to compare dependent Spearman correlations rather than Pearson correlations.

Design Distribution ρxz ρyz N Type I Error Power
Method of Moments* Dunn and Clark Meng Method of Moments* Dunn and Clark Meng
1 Normal 0.305 0.305 100 0.061 0.054 0.052 --- --- ---
2 Normal 0.305 0.305 400 0.045 0.045 0.045 --- --- ---
3 Normal 0.5 0.5 100 0.063 0.054 0.052 --- --- ---
4 Normal 0.5 0.5 400 0.045 0.045 0.045 --- --- ---
5 Normal 0.305 0.5 100 --- --- --- 0.866 0.842 0.839
6 Normal 0.305 0.5 400 --- --- --- 0.987 1 1
7 Lognormal 0.305 0.305 100 0.102 0.162 0.16 --- --- ---
8 Lognormal 0.305 0.305 400 0.075 0.197 0.196 --- --- ---
9 Lognormal 0.5 0.5 100 0.116 0.247 0.244 --- --- ---
10 Lognormal 0.5 0.5 400 0.082 0.302 0.326 --- --- ---
11 Lognormal 0.305 0.5 100 --- --- --- 0.495 0.581 0.577
12 Lognormal 0.305 0.5 400 --- --- --- 0.812 0.919 0.919

Table 1: Comparison of alternative methods for comparing dependent correlations.

To assess the accuracy of the procedure in (5) to compare dependent deattenuated Pearson correlations we used the same data designs as in designs 1-6. We performed each set of simulations in two ways. First, we assumed that the intra class correlation between replicate measures of X (and Y) was known without error and set to 0.67. Thus, equation in equations (A3), (B1) and (C2). Second, we generated replicate values of (X1, X2, Y1, Y2) from a multivariate normal distribution denoted by N (μR, ΣR) where μR=(0,0,0,0) and

equation

and estimated ICCx and ICCy from the replicate data. For each set of simulations we estimated the empirical type I error of Vxy, true, and the bias and variance of equationin (5). The results are given in Table 2.

We see in Table 2 that the median theoretical variance and the empirical variance of equation and equation are very similar both for the case where the ICC is assumed known (Type 1) and where it is estimated from the simulated data (Type 2). In addition, the empirical type I error is close to 0.05 (range from 0.039 to 0.061) for all parameter combinations considered. Furthermore, the bias in estimation of equation is close to 0 for all parameter combinations considered. The above variances are slightly larger when the ICC is estimated (Type 2) than when the ICC is assumed known (Type 1), particularly when equation .

Type ICCx(ICCy) ρxz (ρyz) N   TypeI error
Mean Variance Mean Variance Mean Variance
1 0.67 0.305 100 theoretical 0.391 0.0163 0.391 0.0162 0 0.0095 0.05
        empirical 0.395 0.0177 0.398 0.0178 -0.002 0.0103 0.061
  0.67 0.305 400 theoretical 0.391 0.0041 0.391 0.0041 0 0.0024 0.05
        empirical 0.392 0.0041 0.393 0.0042 -0.001 0.0023 0.045
  0.67 0.5 100 theoretical 0.71 0.0212 0.71 0.0212 0 0.0146 0.05
        empirical 0.72 0.0229 0.723 0.0233 -0.003 0.0161 0.058
  0.67 0.5 400 theoretical 0.71 0.0053 0.71 0.0053 0 0.0036 0.05
        empirical 0.712 0.0053 0.714 0.0055 -0.001 0.0036 0.044
2 0.67 0.305 100 theoretical 0.391 0.0166 0.391 0.0168 0 0.01 0.05
        empirical 0.395 0.0183 0.398 0.018 -0.003 0.0106 0.051
  0.67 0.305 400 theoretical 0.391 0.0042 0.391 0.0042 0 0.0025 0.05
        empirical 0.394 0.0042 0.393 0.0041 0.001 0.0026 0.048
  0.67 0.5 100 theoretical 0.71 0.0226 0.71 0.0222 0 0.0167 0.05
        empirical 0.724 0.027 0.728 0.0269 -0.004 0.0201 0.039
  0.67 0.5 400 theoretical 0.71 0.0056 0.71 0.0056 0 0.004 0.05
        empirical 0.715 0.0058 0.713 0.0057 0.002 0.0044 0.048

Table 2: Simulation results for the test procedure in (5) and (6).

In addition, we estimated the bias and variance of equation and the associated coverage probability based on (14). The results are given in Table 3. We note that the bias is close to 0 in all designs considered. Furthermore, the empirical type I error in designs where equation is close to 0.05 and the coverage probability both when equation and when equation is close to 95% both for N=100 and N=400.

ρxz ρyz N   TypeI error Coverage probability
Mean Variance Mean Variance Mean Variance
0.305 0.305 100 theoretical 0.373 0.012 0.373 0.0122 0 0.0073 0.05 95
      empirical 0.37 0.0128 0.373 0.0131 -0.002 0.0075 0.051 94.6
    400 theoretical 0.373 0.0031 0.373 0.0031 0 0.0018 0.05 95
      empirical 0.373 0.003 0.372 0.0031 0.001 0.0019 0.048 95.1
0.5 0.5 100 theoretical 0.611 0.0086 0.611 0.0087 0 0.0065 0.05 95
      empirical 0.61 0.0094 0.612 0.0096 -0.002 0.0068 0.039 94.5
    400 theoretical 0.611 0.0022 0.611 0.0022 0 0.0016 0.05 95
      empirical 0.612 0.0022 0.611 0.0022 0.001 0.0017 0.048 95
0.305 0.5 100 theoretical 0.373 0.012 0.611 0.0087 -0.238 0.0071 ---- 95
      empirical 0.37 0.0132 0.612 0.0094 -0.242 0.0075 ---- 94.3
    400 theoretical 0.373 0.0031 0.611 0.0022 -0.238 0.0018 ---- 95
      empirical 0.373 0.003 0.611 0.0022 -0.238 0.0018 ---- 95

Table 3: Simulation results for the estimation of in (14).

Examples

Association between analgesic intake and level of kidney function

We use data from the Swiss Analgesic Study where in 1967/68 1,256 women provided a urine specimen in the clinic and 2 additional specimens at home on different days which were assayed for NAPAP, a metabolite indicating recent intake of phenacetin-containing analgesics and salicylates a metabolite indicating recent intake of aspirin-containing analgesics. In addition, serum creatinine, a marker of kidney function was also measured at the baseline clinic visit. We let X, Y and Z be the NAPAP, salicylates and serum creatinine at the baseline (1968) clinic visit. We wish to compare corr(X,Z) with corr(Y,Z). A total of 1168 women had complete data on X, Y and Z. Descriptive statistics for the (X,Y,Z) data are provided in Table 4.

          Pearson (Spearman) correlation Pearson (Spearman)
  N mean sd (10th – 90th percentile) x y z ICC
NAPAP (X) (o.d.) 1168 0.176 0.352 (0.000,0.675) 1 0.423 0.19 0.614
-0.353 -0.155 -0.515
salicylates (Y) (mg%) 1168 15.8 19.2 (10.0,35.0)   1 0.068 0.404
-0.081 -0.389
serum creatinine (Z) (mg/dL) 1168 1.02 0.37 (0.69,1.50)     1  

Table 4: Descriptive statistics for NAPAP, salicylates and serum creatinine at the baseline visit, Swiss Analgesic Study, n=1168.

We note that the distributions of each of X, Y and Z are skewed and not normally distributed. There is moderate correlation between NAPAP and salicylates (Pearson correlation=0.423) in part because some analgesics contain both phenacetin and aspirin. Pearson correlations between NAPAP and salicylates vs. serum creatinine were 0.190 and 0.068, respectively. The ICC for NAPAP and salicylates over the 3 determinations were 0.614 and 0.404, respectively, indicating moderate variability of intake on different days.

Based on (7), the observed Pearson correlations were significantly different (Vxy=3.344, p=0.001) (Table 5). For comparison, we also analyzed these data using the Dunn and Clark [7] and Meng, et al. [8] procedures. Based on Dunn and Clark, we have Z*1=3.929, p-value=8.5 × 10-5. Based on Meng, et al, we have Zxy=3.919, p-value=8.9 × 10-5. As in the simulations, the se’s are inappropriately low for both of these procedures in the setting of these highly non-normal data, yielding p-values that are biased downward and type I errors that are too low. The deattenuated Pearson correlations were 0.243 and 0.108 for NAPAP (X) and salicylates (Y), respectively, vs. serum creatinine. Based on (5) there was a significant difference between these correlations, Vxy,true=2.640, p=0.008. Since NAPAP and salicylates were right-skewed, we also computed observed and deattenuated Spearman correlations which are presented in Table 6.

Variable observed correlation Z transform deattenuated correlation Z transform
NAPAP, 1968 (o.d.) 0.19 0.192 0.243 0.248
salicylates, 1968 (mg%) 0.068 0.069 0.108 0.108
Difference between correlations 0.122 0.124 0.135 0.139
(95%CI) (0.051, 0.193) (0.035, 0.235)
se 0.037 0.053
V statistic 3.344 2.64
p-value 0.001 0.008

Table 5: Observed and deattenuated Pearson Correlations between serum creatinine at baseline (1968) vs. each of NAPAP and salicylates at the baseline clinic visit (1968), Swiss Analgesic Study, n=1168.

Variable observed Spearman correlation observed probit correlation Z transform deattenuated Spearman correlation deattenuatedprobit correlation Z transform
NAPAP, 1968 (o.d.) 0.155 0.162 0.163 0.212 0.221 0.225
salicylates, 1968 (mg%) 0.081 0.085 0.085 0.127 0.133 0.134
Difference between correlations 0.074 0.077 0.078 0.085 0.088 0.091
 (95% CI) (0.013, 0.134) (-0.005, 0.174)
se 0.033 0.049
V statistic 2.395 1.849
p-value 0.017 0.064

Table 6: Spearman Correlations between serum creatinine at baseline (1968) vs. each of NAPAP and salicylates at the baseline clinic visit (1968), Swiss Analgesic Study, n=168.

The observed Spearman correlations between Z and (X,Y) were 0.155 and 0.081, respectively, which were significantly different (Vxy,probit=2.395, p=0.017). After correcting for deattentuation, the Spearman correlations between Z and (X,Y) were 0.212 and 0.127, respectively, which only showed a trend towards statistical significance (Vxy,probit,true=1.849, p=0.064). Overall, the results from the comparison of correlation coefficients in Tables 5 and 6 were consistent with the results from the primary paper (Dubach, Levy and Muller, [1]) which showed that NAPAP and salicylates were significantly associated with serum creatinine at baseline, although these analyses were based on categorical definitions of NAPAP, salicylates and creatinine.

Comparison of storage conditions for HbA1c measurements in plasma

Recently dried blood spot assays of HbA1c have been used as biomarkers in many epidemiologic studies and simplified specimen handling would be cost-effective for these large scale community-based studies. A study was conducted at Brigham and Women’s Hospital (BWH) to test the hypothesis that dried blood spot determinations for HbA1c are valid measurements with low-intensity storage conditions [14]. Blood samples were drawn into EDTA containing tubes and submitted for duplicate HPLC analysis of HbA1c (considered as the gold standard process). Blood for spotting was also drawn to identical EDTA tubes and then dropped randomly to blood-spot cards. After having been air-dried for at least 20 minutes these dried blood specimens were placed into single-sample, airtight bags with desiccant pouch. They were then stored at room temperature for 0, 2, or 4 weeks in the lab and were then shipped to Biosafe Laboratories, Inc. (Lake Forest IL) for immediate HbA1c analysis or placed in freezer storage [-80°C] for an additional 4 or 12 weeks. Dried blood spot determinations of HbA1c were performed in triplicate, blinded to storage conditions and protocol. Here we want to test the equality of the correlations between results from HPLC measures (Z) and results from dried blood spot measures after 2 weeks of room temperature pre-shipping followed by an additional 4 weeks of freezing after shipping (X), and the correlation between results from HPLC measures (Z) and results from dried blood spot measures after 4 weeks of room temperature pre-shipping followed by an additional 12 weeks of freezing after shipping (Y).

We refer to storage for 2 weeks at room temperature plus 4 additional weeks in the freezer as short-term storage (X), storage for 4 weeks at room temperature plus 12 additional weeks in the freezer as long-term storage (Y) and HPLC processing as Z. The observed and deattenuated Spearman correlations between Z vs. (X,Y) respectively are given in Table 7. We see that ρxz,s=0.952 while ρyz,s=0.711, which are significantly different both based on large-sample methods (p=0.005) and also more appropriate exact methods (p=0.021). After deattenuation, we obtain ρyz,s, true=0.979 and ρyz,s, true=0.776 (2-sided p-value=0.134 based on large sample methods and 0.056 based on exact methods). Thus, the length of storage has a borderline significant effect on the validity of the HbA1c assay with short-term storage preferable. Plots of ,equation permutation andequation in equation 16 over the 4,096 elements of the permutation distribution are given in Figures 1 and 2, respectively. The distributions are somewhat skewed to the right, particularly for equation, which indicates the necessity of using exact methods in a small-sample setting.

biometrics-biostatistics-permutation-distribution

Figure 1: Permutation distribution of in equation 7 for the HbA1c data (M=212=4096).

biometrics-biostatistics-permutation-distribution

Figure 2: Permutation distribution of , in equation 5 for the HbA1c data (M=212=4096).

observed Spearman correlation observed probit correlation Z transform deattenuated Spearman correlation deattenuatedprobit correlation Z transform ICCs
2 weeks of room temperature pre-shipping plus 4 weeks of freezer (X) 0.952 0.956 1.896 0.979 0.98 2.31 0.946
4 weeks of room temperature pre-shipping plus 12 weeks of freezer (Y) 0.711 0.727 0.922 0.776 0.79 1.072 0.835
difference 0.241 0.229 0.974 0.203 0.19 1.238
se 0.348 0.827
V statistic 2.797 1.497
p-value (large sample) 0.005 0.134
p-value (exact) 0.021 0.056

Table 7: Spearman Correlations between HbA1c levels determined immediately vs. HbA1c levels determined after 2 delay periods, N= NICC =12.

Discussion

In this paper, we provide a method to compare two Pearson correlation coefficients (ρxzyz ) where the correlations are estimated from the same subjects. Unlike previous literature on the subject, we do not make the assumption that (X, Y, Z) are multivariate normal. It is shown in simulation studies that the method of moments estimator of the sampling variance of (Zxz-Zyz) is more appropriate than standard methods that assume normality, particularly in the setting of non-normal data such as in the analgesic intake data presented in the first example in this paper. We then extended our methods to provide for comparison of two deattenuated correlation coefficients equation where the correlations are estimated from the same subjects. This method is generally applicable to compare the validity of two surrogate measures (X, Y) where validity is measured by the respective correlation with a third variable (Z) and both X and Y are subject to measurement error. Although the methods are asymptotic, we have shown in finite samples that the asymptotic properties of the test statistic and confidence limits are appropriate when N ≥ 100. In addition, we provide a method for confidence interval estimation of both Δρ and ρΔ true .

The methods were extended to the comparison of dependent Spearman correlations which may be more appropriate when (X, Y, Z) are not multivariate normal, which provide for the comparison of both ordinary and deattenuated Spearman correlations. To our knowledge, this is the first paper in the literature to discuss comparison of dependent Spearman correlations.

In this paper, we have considered deattenuated correlations of the form corr(μx,Z) and corr(μy,Z) which account for error in the estimation of X and Y. However, it is likely that Z is also measured with error and a useful extension would be to estimate and compare corr(μx μz) vs corr(μy μz) , where error in X, Y and Z are taken into account.

SAS macros for the comparison of dependent ordinary and deattenuated Pearson and Spearman correlations as discussed in this paper are available from the following website:

https://sites.google.com/a/channing.harvard.edu/bernardrosner/ channing.

Acknowledgements

We would like to thank Drs. Buxton and Seeman for providing the original data on HbAlc.

The research was supported by U.S. National Institute of Health grants (U54 CA155626-03, P30 CA006516-48, R01 CA050597-18 and P30 AG017265- 099002). This work was conducted with support from Harvard Catalyst | The Harvard Clinical and Translational Science Center (National Center for Research Resources and the National Center for Advancing Translational Sciences, National Institutes of Health Award UL1 TR001102) and financial contributions from Harvard University and its affiliated academic healthcare centers. The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic healthcare centers, of the National Institutes of Health.

References

Select your language of interest to view the total content in your interested language
Post your comment

Share This Article

Relevant Topics

Article Usage

  • Total views: 14008
  • [From(publication date):
    June-2015 - Aug 18, 2019]
  • Breakdown by view type
  • HTML page views : 10073
  • PDF downloads : 3935
Top