ISSN: 2155-6180
Journal of Biometrics & Biostatistics

Correction of Verification Bias using Log-linear Models for a Single Binary-scale Diagnostic Tests

Haresh D Rochani*, Hani M Samawi, Robert L Vogel and Jing Jing Yin

Department of Biostatistics, Jiann-Ping Hsu College of Public Health, USA

*Corresponding Author:
Haresh D Rochani
Assistant Professor, Department of Biostatistics
Jiann-Ping Hsu College of Public Health
Georgia Southern University, Statesboro
GA, 30460, USA
Tel: 912.478.1101
E-mail:[email protected]

Received date: October 30, 2015; Accepted date: December 11, 2015; Published date: December 18, 2015

Citation: Rochani HD, Samawi HM, Vogel RL, Yin JJ (2015) Correction of Verification Bias using Log-linear Models for a Single Binary-scale Diagnostic Tests. J Biom Biostat 6:266. doi:10.4172/2155-6180.1000266

Copyright: © 2015 Rochani HD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify each patient because of cost-effectiveness concerns and the invasive nature of the procedures. In practice, some patients with test results are not selected for verification of disease status, which results in verification bias for diagnostic tests. The ability of a diagnostic test to correctly identify patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive values. However, these measures can be biased if we consider only the patients with test results who also underwent the gold standard procedure. The emphasis of this paper is to apply the log-linear model approach to compute the maximum likelihood estimates of sensitivity, specificity and predictive values. We also compare the estimates with Zhou’s results and apply this approach to the Hepatic Scintigraph data under the assumption of ignorable as well as non-ignorable missing data mechanisms. We demonstrate the efficiency of the estimators by using simulation studies.

Keywords

Verification bias; Diagnostic tests; Log-linear models; Missing data

Introduction

Diagnostic testing in medicine is the process of identifying patients with and without a particular disease. The accuracy of a diagnostic test is its ability to correctly identify the true disease status of the patient [1]. The test or procedure that determines the true disease status without error is called the gold standard test. However, even when a gold standard test exists, the disease status of each patient may not be verified, for reasons such as the invasive nature or high cost of the gold standard test. For example, the Prostate Specific Antigen (PSA) blood test is used as a screening test for prostate cancer at a cost ranging from $60 to $80, whereas the true diagnosis is generally confirmed by invasive procedures such as prostate biopsy at a cost ranging from $1600 to $1800. Thus patients at high risk, or with a positive screening test, are more likely to be offered the gold standard test than those at low risk or with a negative screening test. As a consequence, inference about measures of diagnostic accuracy such as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) may be biased, because individuals who are selected for the gold standard test on the basis of their diagnostic test results are not a random sample [2]. Many authors have referred to this bias as work-up bias [3] or verification bias [4]. In the presence of verification bias, the estimated accuracy of a diagnostic test depends only on those patients whose disease status has been verified. In addition, a positive association between selection for verification and the test result produces more strongly biased results [5].

When the true disease status is missing among patients who were not selected for verification, the missing data framework proposed by Rubin can be applied to correct for verification bias. If the probability of selecting patients for verification is independent of both observed and unobserved data, then missingness in the disease status is considered missing completely at random (MCAR). Missingness in the disease status is considered missing at random (MAR) when the probability of selecting patients depends only on the observed data, and it is considered missing not at random (MNAR) when the probability depends on unobserved data [6]. An MNAR mechanism is most likely to occur when there is a long time lag between the initial test and verification, when there are multiple investigators at various institutions, when the patient population is very heterogeneous or when the disease process is not well understood.
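The three mechanisms can be written compactly in terms of the verification probability. The sketch below uses V = 1 for "patient is verified", T for the test result and D for the true disease status; V is notation introduced here for illustration only and does not appear in the paper.

```latex
\begin{align*}
\text{MCAR:}\quad & P(V = 1 \mid T, D) = P(V = 1), \\
\text{MAR:}\quad  & P(V = 1 \mid T, D) = P(V = 1 \mid T), \\
\text{MNAR:}\quad & P(V = 1 \mid T, D) \text{ depends on } D .
\end{align*}
```

Under MAR the selection may depend on T because the test result is observed for every patient, whereas D is unobserved for the unverified patients.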

Begg and Greenes [4] proposed a method of verification bias correction for binary diagnostic tests by using Bayes’ theorem under the MAR assumption (i.e. the conditional independence assumption). Zhou [7] derived the maximum likelihood estimators for sensitivity and specificity and their variances under both MAR and MNAR mechanisms. However, under the MNAR assumption, Zhou’s [7] approach requires two ratios to be known. Zhou [8] showed that the estimators of PPV and NPV are unbiased and consistent if the probability of selecting patients for disease verification does not depend on the true disease status of the patient (i.e. the MAR assumption). Some authors have used multiple imputation methods to correct for verification bias under ignorable missingness [9]. However, the validity of that approach was debated by De Groot et al. [10]. For multiple diagnostic tests and in the presence of covariates, Baker [11] and Kosinski and Barnhart [12] suggested regression approaches to deal with the MNAR missing mechanism. However, their models require multiple tests or covariates in order to achieve identifiability. Martinez et al. [13] addressed verification bias in the MNAR setting by using a Bayesian approach with a beta prior distribution. Other authors have used Gibbs sampling as a Bayesian approach to correct verification bias in MNAR estimates [14]. All of these methods require iterative procedures to compute the estimates of the measures of diagnostic accuracy, especially under the MNAR mechanism. By contrast, with the log-linear model approach, explicit solutions for the estimates of the measures of diagnostic accuracy can be computed from a simple algebraic formula (which is computationally convenient) under all missing data mechanisms.

In this paper, we present the log-linear model approach to correct for verification bias when computing estimates of the diagnostic measures (i.e. sensitivity, specificity, PPV and NPV) under various assumptions about the missing data mechanism. We also confirm by simulation that the estimators are asymptotically unbiased and more efficient than complete-case estimates. In the next section, we show the impact of verification bias by using a hypothetical example. The following sections focus on the log-linear model method for correcting verification bias in the diagnostic measures and on a comparison with the results of Zhou [7]. Furthermore, we illustrate the log-linear model approach by applying it to the Hepatic Scintigraph data.

Impact of Verification Bias

In diagnostic medicine, it is common practice to refer only patients who test positive to a gold standard test for disease verification. To illustrate the impact of the resulting bias, we consider a hypothetical study of 1000 patients and measure the sensitivity, specificity, PPV and NPV (Tables 1 and 2).

        Verified         Unverified   Total
        D+      D-
T+      71      120      0            191
T-      29      780      0            809
                                      1000

Table 1: Hypothetical data.

The hypothetical data are summarized in Table 1. If every patient is referred for gold standard verification, then the true sensitivity from Table 1 is 0.71. Furthermore, the true specificity, PPV and NPV are 0.87, 0.37 and 0.96 respectively. However, if only 10% of those who have a negative test result receive the gold standard for verification, then the data are as summarized in Table 2. From Table 2, the observed sensitivity, specificity, PPV and NPV are 0.96, 0.39, 0.37 and 0.96 respectively. Overall, if patients with positive test results are more likely to receive disease verification (i.e. MAR), then the estimates of sensitivity and specificity based only on verified cases are biased: sensitivity is overestimated and specificity is underestimated. However, under the MAR assumption, the estimates of PPV = P(D+ | T+) and NPV = P(D− | T−) from complete cases are unbiased [8,15,16].
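The arithmetic behind these figures can be checked directly from the verified cells of Tables 1 and 2. The following is a small sketch; the helper name `accuracy` is introduced here for illustration and is not part of the paper.

```python
def accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity, PPV, NPV) from verified 2x2 counts."""
    return (tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn))

# Table 1: every patient is verified.
full = accuracy(tp=71, fp=120, fn=29, tn=780)
# Table 2: only about 10% of the test-negative patients are verified.
partial = accuracy(tp=71, fp=120, fn=3, tn=78)

for name, f, p in zip(("sensitivity", "specificity", "PPV", "NPV"), full, partial):
    print(f"{name:12s} all verified = {f:.2f}   verified only = {p:.2f}")
```

Sensitivity and specificity change markedly between the two tables, while PPV and NPV are essentially unchanged, which is the pattern described in the text.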

Methodology

Log-linear parameterization

Baker et al. [17] used the homogeneous log-linear model approach to adjust the expected cell counts for a two-way table with missing data. We can use a similar approach (the BRD model approach) to compute estimates of the measures of diagnostic accuracy corrected for verification bias. To demonstrate this idea, define two random variables $I$ and $J$, each with two levels, $i = 1,2$ and $j = 1,2$. Let $R_J$ be the missing data indicator for variable $J$, such that $R_J = 1$ if $J$ is observed and $R_J = 2$ if $J$ is missing. Similarly, let $R_I = 1$ if variable $I$ is observed. Consider the $I \times J$ cross-classification in Table 3, with total sample size $n_{++1+}$ and one supplemental margin (in the terminology of Dempster and Baker 1992), whose cell counts are denoted $n_{ijab}$. The cell counts with $R_I = R_J = 1$ are denoted $n_{ij11}$. The supplemental margin for variable $J$ when $R_J = 2$ is denoted $n_{i+12}$. Here the symbol '+' denotes summation over the corresponding index. Furthermore, denote the expected cell counts by $\lambda_{ij1k}$, the cell probabilities for $I$, $J$, $R_I$ and $R_J$ by $\pi_{ijkl}$, and the marginal probabilities by $\pi_{ij\cdot\cdot}$ for Table 3.

        Verified         Unverified   Total
        D+      D-
T+      71      120      0            191
T-      3       78       728          809
                                      1000

Table 2: Hypothetical data with unverified cases.

Log-linear models are widely used to analyze contingency tables. Missing data indicators for contingency tables with supplemental margins can be incorporated in log-linear models in the same way as any other variable. In terms of λ, the homogeneous log-linear model (containing only two-way interactions) for one partially observed variable is image

image (1)

Because this is an over parameterized model, we add the side conditions that the sum of each λ term over each of the indicated subscripts is zero. For computational purposes, it is convenient to use an alternative parameterization of the model. Let image, image and image such that image and image. In terms of log-linear models image image

Taking advantage of the side conditions, we can derive the following expression for image:

image

Denote,

image

image

In probabilistic terms, mij and image can be expressed as follows,

image

image

For Table 3, the total expected count is

image

and in general notation the marginal and conditional probabilities can be expressed as :

image

image

image

image

image

The likelihood ratio statistic with respect to a model fitting the data perfectly is,

image

Model parameter estimates

Baker et al. [17] identified nine different models for the two-way table with three supplementary margins, according to whether the missingness of each variable depends on the other variable or on its own realization. For Table 3, there are three different identifiable models, obtained when bij equals β.., βi. or β.j. The first and second subscripts of the parameter β correspond to variables I and J respectively. The subscript "." indicates that the parameter is constant over the corresponding index. In addition, each model has a unique interpretation. Model (β..) means that the missingness in J is constant (MCAR). Model (βi.) means that the missingness in J depends on the realization of I (MAR). When the missingness in J depends on its own realization, Model (β.j), the mechanism is MNAR.

                        Verified (RJ = 1)             Unverified (RJ = 2)
                        J = 1 (D+)    J = 2 (D-)
RI = 1    I = 1 (T+)    n1111         n1211           n1+12
          I = 2 (T-)    n2111         n2211           n2+12
                                                      n++1+

Table 3: Two way table with one supplementary margin.

Assuming a Poisson distribution for the cell counts, the maximum likelihood estimates of mij and β can be derived from the log-likelihood function of each model. Detailed derivations of the estimates are provided in the Appendix. The likelihood ratio statistics for each model can also be found in the Appendix.

MCAR estimates

image

image (2)

image (3)

From equation 2, the estimates for Sensitivity and Specificity are

image

image

Similarly, the estimates for PPV image and NPV image can be computed from equation 3.

MAR estimates

image

image (4)

image (5)

From equation 4 the estimates of the Sensitivity and Specificity can be obtained as follows

image

image

Similarly, from equation 5 the estimates of PPV image and NPV image can be computed.
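Because the equations of this section are not reproduced here, the following is only a sketch. It assumes that the MAR-corrected estimates take the familiar Begg and Greenes form, in which each test row is scaled up to its full total using the disease proportions among its verified patients. That form reproduces the Model βi. column of Table 7 when applied to Table 5; the function name and argument layout are illustrative choices. It is applied below to the hypothetical data of Table 2.

```python
def mar_corrected(ver_pos, ver_neg, unv):
    """Sensitivity and specificity corrected under MAR.

    ver_pos[i], ver_neg[i]: verified patients with D+ / D- in test row i
    unv[i]: unverified patients in test row i (row 0 = T+, row 1 = T-)
    """
    # Expected numbers of D+ and D- in each row had everyone been verified:
    # row total times the verified-case proportion of D+ (or D-) in that row.
    d_pos, d_neg = [], []
    for p, n, u in zip(ver_pos, ver_neg, unv):
        total = p + n + u
        d_pos.append(total * p / (p + n))
        d_neg.append(total * n / (p + n))
    sens = d_pos[0] / sum(d_pos)
    spec = d_neg[1] / sum(d_neg)
    return sens, spec

# Table 2: T+ row (71, 120, 0 unverified), T- row (3, 78, 728 unverified).
sens, spec = mar_corrected(ver_pos=[71, 3], ver_neg=[120, 78], unv=[0, 728])
print(round(sens, 3), round(spec, 3))
```

The corrected values are close to the true sensitivity and specificity of Table 1, whereas the verified-only values are far from them. The small remaining gap in sensitivity comes from the rounding of 2.9 verified patients to 3 in Table 2.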

MNAR estimates

image

image (8)

image (9)

For Model image , equation 8 provides the estimates of Sensitivity and Specificity as follows

image

image

while equation 9 provides the estimates of PPV image and NPV image

It is possible to obtain negative maximum likelihood estimates for image in the MNAR model. If any solution is negative ( image or image), the estimates can still be computed by maximizing the likelihood function with the limited-memory algorithm for bound-constrained optimization of Byrd et al. [18]. The optim function in the statistical software R (version 2.15.1 or higher) can be used for this purpose.
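A negative solution is easy to provoke. The sketch below assumes that, under Model β.j, the expected number of unverified patients in test row i equals the sum over j of (verified count in cell i,j) times b_j, so that the two b_j solve a 2 × 2 linear system. This reading is an assumption made here, although it reproduces the Model β.j row of Table 6; the function name is hypothetical. Applied to Table 2, where verification was driven by the test result rather than by disease status, one of the two solutions is negative.

```python
def mnar_odds(ver, unv):
    """Solve sum_j ver[i][j] * b[j] = unv[i] for (b_pos, b_neg) by Cramer's rule.

    ver[i][j]: verified count in test row i (0 = T+, 1 = T-)
               and disease column j (0 = D+, 1 = D-)
    unv[i]:    unverified count in test row i
    """
    det = ver[0][0] * ver[1][1] - ver[0][1] * ver[1][0]
    if det == 0:
        raise ValueError("verified table is singular; b is not identified")
    b_pos = (unv[0] * ver[1][1] - ver[0][1] * unv[1]) / det
    b_neg = (ver[0][0] * unv[1] - ver[1][0] * unv[0]) / det
    return b_pos, b_neg

# Hypothetical data of Table 2.
b_pos, b_neg = mnar_odds(ver=[[71, 120], [3, 78]], unv=[0, 728])
# A negative value signals that the closed form is inadmissible and that a
# bound-constrained maximization of the likelihood is needed instead.
needs_constrained_fit = min(b_pos, b_neg) < 0
print(round(b_pos, 2), round(b_neg, 2), needs_constrained_fit)
```

In such a case the closed-form values should not be used; the bound-constrained fit described above (for example method "L-BFGS-B" of R's optim) takes their place.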

Inference about diagnostic measures

In this section, the multivariate delta method is used to draw the inferences about the diagnostic measures for Model (βi.) (MAR). A similar method can be used for Model (β..) (MCAR) and Model (β.j) (MNAR). To derive the asymptotic variance of the estimates, we have

image

However, to ensure better normal approximation, we derived the asymptotic distribution of image . By using multivariate delta method for Model(βi.) , we have

image (10)

where image and I is the Fisher information. The detailed derivation of I2 is provided in the Appendix. Therefore, the 100(1−δ )% confidence interval for image is

image

If the MNAR model yields negative estimates, the variances of the measures of diagnostic accuracy can be estimated by the bootstrap method.
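The paper does not specify which bootstrap is intended, so the following is a generic sketch of a nonparametric percentile bootstrap in which patients are resampled with replacement from the six observable cells. The estimator is passed in as a function, so the same loop could wrap a constrained MNAR fit; it is demonstrated here with a MAR-type corrected sensitivity only because that estimator has a simple closed form. The cell labels, function names, number of replicates and seed are all illustrative assumptions, and the counts are those of Table 5.

```python
import random
from collections import Counter

def mar_sensitivity(c):
    """MAR-corrected sensitivity from a Counter of the six observable cells."""
    pos = (c["T+D+"] + c["T+D-"] + c["T+U"]) * c["T+D+"] / (c["T+D+"] + c["T+D-"])
    neg = (c["T-D+"] + c["T-D-"] + c["T-U"]) * c["T-D+"] / (c["T-D+"] + c["T-D-"])
    return pos / (pos + neg)

def bootstrap_ci(counts, estimator, reps=500, level=0.95, seed=1):
    """Percentile bootstrap interval: resample patients with replacement."""
    rng = random.Random(seed)
    cells = list(counts)
    weights = [counts[k] for k in cells]
    n = sum(weights)
    stats = sorted(
        estimator(Counter(rng.choices(cells, weights=weights, k=n)))
        for _ in range(reps)
    )
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[int((1 + level) / 2 * reps) - 1]
    return lo, hi

# Counts from Table 5 (U = unverified).
table5 = {"T+D+": 231, "T+D-": 32, "T+U": 166, "T-D+": 27, "T-D-": 54, "T-U": 140}
estimate = mar_sensitivity(Counter(table5))
lower, upper = bootstrap_ci(table5, mar_sensitivity)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

The standard deviation of the replicates could be used in the same way as a bootstrap standard error.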

Comparison with Zhou’s result

Zhou [7] derived the estimates for Sensitivity and Specificity under MAR and MNAR assumptions. The estimates obtained in the previous section for Sensitivity and Specificity under the MAR assumption are equivalent to Zhou’s estimates (Appendix). However, under the MNAR assumption Zhou [7] assumes that the two ratios

image

are known in order to estimate the Sensitivity and Specificity. The log-linear model approach, in contrast, does not require such assumptions to estimate the MNAR parameters.

Simulation Study

A simulation study was conducted to illustrate the reduction in bias and root mean squared error (RMSE) of the estimators for sensitivity and specificity for MAR model compared to complete case analysis (Table 4).

Two variables, “Test Results” and “Disease Status”, with different sample sizes (N) were generated from the multinomial distribution with marginal probabilities of image and image under different sensitivity image and specificity image values. We cross-classified these variables in a 2 × 2 contingency table. This process was repeated 2000 times. In each complete contingency table, we generated missing values in “Disease Status” in such a way that missingness in “Disease Status” depends on the realization of “Test Results”, to ensure the MAR missing mechanism. Table 4 presents the results from our simulation for the MAR model when 50% of negative tests and 10% of positive tests were not verified for the disease. Table 4 shows that correcting for verification bias with the log-linear model approach substantially reduces the bias and improves the performance of the estimators compared to complete case analysis. We obtained similar results with 10% missingness in T+ and different amounts of missingness in T- (70%, 60%, 40%, 30% and 20%) and/or various marginal probabilities. In addition, we carried out an extensive simulation for Model image by simulating the MNAR missing mechanism to estimate PPV and NPV. We observed a reduction in bias and RMSE compared to complete case analysis under different amounts of non-ignorable missingness in the disease status.

N     (Sen %, Sp %)   Complete cases           Model based
                      Bias (RMSE)              Bias (RMSE)
                      Sen        Sp            Sen        Sp
200   (60,60)         0.1464     0.1269        0.0623     0.0375
                      (0.1438)   (0.1279)      (0.0005)   (0.0021)
      (95,95)         0.0504     0.0229        0.0245     0.0195
                      (0.0375)   (0.0214)      (0.0004)   (0.0004)
      (60,95)         0.1470     0.0230        0.0586     0.0200
                      (0.1424)   (0.0213)      (0.0014)   (0.0004)
      (95,60)         0.1464     0.1279        0.0623     0.0375
                      (0.2235)   (0.3587)      (0.1585)   (0.1921)
500   (60,60)         0.1461     0.1310        0.0389     0.0228
                      (0.1461)   (0.1310)      (0.0017)   (0.0017)
      (95,95)         0.0396     0.0212        0.0160     0.0122
                      (0.0357)   (0.0210)      (0.0002)   (0.0008)
      (60,95)         0.1450     0.0217        0.0354     0.0121
                      (0.1450)   (0.0215)      (0.0001)   (0.0001)
      (95,60)         0.0412     0.1301        0.0159     0.0234
                      (0.0373)   (0.1301)      (0.0006)   (0.0006)

Table 4: Bias and MSE comparison for 50% missing in T - and 10% missing in T + .
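The design of this simulation can be sketched in a few lines. The version below is a simplified illustration and not the authors' code: the disease prevalence of 0.3, the number of replicates and the seed are assumptions made here (the marginal probabilities used in the paper are not reproduced in the text), only sensitivity is tracked, and the MAR correction uses the row-scaling form.

```python
import random

def simulate(n, sens, spec, prev, miss_pos, miss_neg, reps, seed=0):
    """Average bias of complete-case and MAR-corrected sensitivity estimates."""
    rng = random.Random(seed)
    cc_bias = mar_bias = 0.0
    for _ in range(reps):
        # ver[t][d]: verified counts, tot[t]: all patients; t = 0 is T+, d = 0 is D+.
        ver = [[0, 0], [0, 0]]
        tot = [0, 0]
        for _ in range(n):
            d = 0 if rng.random() < prev else 1
            p_test_pos = sens if d == 0 else 1 - spec
            t = 0 if rng.random() < p_test_pos else 1
            tot[t] += 1
            # Verification depends on the test result only (MAR).
            if rng.random() >= (miss_pos if t == 0 else miss_neg):
                ver[t][d] += 1
        cc = ver[0][0] / (ver[0][0] + ver[1][0])
        exp_pos = [tot[t] * ver[t][0] / (ver[t][0] + ver[t][1]) for t in (0, 1)]
        mar = exp_pos[0] / (exp_pos[0] + exp_pos[1])
        cc_bias += (cc - sens) / reps
        mar_bias += (mar - sens) / reps
    return cc_bias, mar_bias

# 10% of T+ and 50% of T- left unverified, as in Table 4; prev is an assumption.
cc_bias, mar_bias = simulate(n=500, sens=0.6, spec=0.6, prev=0.3,
                             miss_pos=0.1, miss_neg=0.5, reps=200)
print(round(cc_bias, 3), round(mar_bias, 3))
```

With these settings the complete-case estimate overstates sensitivity, while the corrected estimate is nearly unbiased, in line with the direction of the results in Table 4.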

Application

In this section, we will demonstrate the log-linear modeling approach to compute the measures of diagnostic accuracy under various missing mechanisms by using the Hepatic Scintigraph data published by [19]. In the Hepatic Scintigraph data, 650 patients underwent hepatic scintigraphy of which 429 patients had positive scans while 221 patients had negative hepatic scans. Only 263 patients out of 429 and 81 patients out of 221 were referred for status verification using procedures such as liver biopsy, exploratory laparotomy or autopsy within 6 weeks of their scans. Table 5 represents the verified and unverified cases by hepatic scan results. Table 6 shows the Maximum likelihood estimates for model parameters (Tables 5-7).

        Verified         Unverified   Total
        D+      D-
T+      231     32       166          429
T-      27      54       140          221
                                      650

Table 5: Hepatic scintigraph data.

Model         Estimate(s) of β          G2       P-value
Model β..     0.89                      35.84    < 0.001
Model βi.     0.63        1.73          -        -
Model β.j     0.39        2.4           -        -

Table 6: ML estimates for model parameters.

Measure        Model β..       Model βi.       Model β.j
Sensitivity    0.9             0.84            0.9
               (0.86,0.93)     (0.77,0.88)     (0.85,0.93)
Specificity    0.63            0.74            0.63
               (0.52,0.73)     (0.66,0.81)     (0.52,0.72)
PPV            0.88            0.88            0.75
               (0.83,0.91)     (0.84,0.91)     (0.62,0.84)
NPV            0.67            0.67            0.83
               (0.55,0.77)     (0.58,0.75)     (0.75,0.89)

Table 7: Estimates of diagnostic accuracy measures.

Table 7 presents the estimates of sensitivity, specificity, PPV and NPV with 95% confidence intervals under the three models, obtained by applying the log-linear model approach to Table 5. With Zhou’s [7] method, the estimates of Sensitivity and Specificity can range over (0.68,0.95) and (0.37,0.86) respectively, depending on the values of e0 and e1. Table 7 confirms that the estimates of PPV and NPV under the MAR and MCAR assumptions, and the estimates of sensitivity and specificity under the MNAR assumption, can be obtained from complete cases [8].
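The point estimates in Tables 6 and 7 can be recomputed from the counts in Table 5. The sketch below rests on assumptions made here, since the estimating equations are not reproduced in the text: the β estimates are read as ratios of unverified to verified counts, the Model β.. measures are taken from the verified cells, the Model βi. and Model β.j measures scale rows and columns by one plus the corresponding ratio, and G2 is computed as the likelihood ratio statistic of the 2 × 2 table of test result by verification status. These readings were chosen because they match the reported numbers; confidence intervals are not attempted.

```python
import math

verified = [[231, 32], [27, 54]]   # Table 5: rows T+, T-; columns D+, D-
unverified = [166, 140]            # Table 5: unverified patients per row
row_ver = [sum(row) for row in verified]

def measures(cells):
    """(sensitivity, specificity, PPV, NPV) from a 2x2 table of counts."""
    (tp, fp), (fn, tn) = cells
    return tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn)

# Model beta.. : one common ratio b; measures from the verified cells.
b_mcar = sum(unverified) / sum(row_ver)
mcar = measures(verified)

# Model beta_i. : one ratio per test result; each row is scaled by 1 + b_i.
b_mar = [u / v for u, v in zip(unverified, row_ver)]
mar = measures([[x * (1 + bi) for x in row] for row, bi in zip(verified, b_mar)])

# Model beta_.j : one ratio per disease status, solving
# verified[i][0] * b_pos + verified[i][1] * b_neg = unverified[i];
# each column is then scaled by 1 + b_j.
(a, b), (c, d) = verified
det = a * d - b * c
b_mnar = [(unverified[0] * d - b * unverified[1]) / det,
          (a * unverified[1] - c * unverified[0]) / det]
mnar = measures([[x * (1 + bj) for x, bj in zip(row, b_mnar)] for row in verified])

# G2 of the table "test result by verification status" (1 degree of freedom).
obs = [[v, u] for v, u in zip(row_ver, unverified)]
n = sum(map(sum, obs))
g2 = sum(
    2 * obs[i][j] * math.log(obs[i][j] * n / (sum(obs[i]) * (obs[0][j] + obs[1][j])))
    for i in range(2) for j in range(2)
)
p_value = math.erfc(math.sqrt(g2 / 2))   # chi-square upper tail, 1 df

print("b:", round(b_mcar, 2), [round(x, 2) for x in b_mar], [round(x, 2) for x in b_mnar])
print("G2:", round(g2, 2), "p < 0.001:", p_value < 0.001)
for name, m in (("beta..", mcar), ("beta_i.", mar), ("beta_.j", mnar)):
    print(name, [round(v, 2) for v in m])
```

The large G2 value indicates that verification is strongly associated with the scan result, so the MCAR assumption is not tenable for these data.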

Final Remarks

Verification bias is an extremely common problem in diagnostic medicine. This paper shows how the log-linear model approach corrects verification bias when estimating the diagnostic measures of a single binary-scale diagnostic test. Log-linear models also reduce the bias and improve the performance of the estimators compared to complete case analysis. In addition, explicit estimates of the measures of diagnostic accuracy under different missing mechanisms can be computed from a simple algebraic formula, while other available methods require iterative procedures. Furthermore, this approach allows us to test the MCAR assumption for a particular data set. However, the only way to confirm a MAR or MNAR missing mechanism is to recollect the missing data, and in diagnostic medicine, because of cost-effectiveness concerns, we rarely have the luxury of obtaining the missing data. As a result, careful modeling of the missing mechanism to reduce bias becomes all the more important. Therefore, if the data are not MCAR, we can use scientific knowledge about the data to assume a MAR or MNAR missing mechanism and compute estimates of the measures of diagnostic accuracy with log-linear models.

Conflict of Interest

The authors have declared no conflict of interest.

References

Appendix

Log-linear parameterization

Expected counts for all verified cases can be modeled by using equation 1 in the following way

image

image

Expected counts for all un-verified cases can be modeled as the following way :

image

image

Some of the side conditions (the sum of each λ-term over each of the indicated subscripts is zero) for the log-linear models are

image

By using these conditions, we can get

image

Maximum likelihood (ML) Estimates for Model image (MCAR)

image

image

image

image

image

image

image

image

image (11)

First solving for image

image

image

Substituting image in equation 11

image

Maximum likelihood Estimates for Model image (MAR)

image

image

image

image

image

image

image

image

image

image

Maximum likelihood Estimates for Model image (MNAR)

image

image

image

image

image

image (12)

image

Comparison of MAR Estimates with Zhou’s Result

image

image (13)

From equation 6 the Sensitivity estimates from log-linear model are

image

image

image

image

Similarly, we can show that the estimates for Specificity from log-linear models are equivalent to Zhou’s [7] Specificity estimates.

Information Matrices :
Model image

image

image

image

image

image

image

image

image

image

Model image

image

image

image

image

image

Model image

image

image

image

image

image

image

image

image

image

Likelihood Ratio Statistics Model image

image

image

image
