Correction of Verification Bias using Log-linear Models for a Single Binary-scale Diagnostic Tests

In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify each patient because of cost-effectiveness concerns and the invasive nature of the procedures. In practice, some patients with test results are not selected for verification of disease status, which results in verification bias for diagnostic tests. The ability of a diagnostic test to correctly identify patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive values. However, these measures can be biased if we consider only the patients with test results who also underwent the gold standard procedure. The emphasis of this paper is on applying the log-linear model approach to compute the maximum likelihood estimates of sensitivity, specificity and predictive values. We also compare the estimates with Zhou's results and apply this approach to the Hepatic Scintigraph data under the assumption of ignorable as well as non-ignorable missing data mechanisms. We demonstrate the efficiency of the estimators using simulation studies.
Citation: Rochani H, Samawi H, Vogel R, Yin J (2015) Correction of Verification Bias using Log-linear Models for a Single Binary-scale Diagnostic Tests. J Biom Biostat 6: 266. doi:10.4172/2155-6180.1000266. J Biom Biostat, ISSN: 2155-6180, an open access journal, Volume 6, Issue 5, 1000266.
To demonstrate the idea of correcting for verification bias with log-linear models, define two random variables $I$ and $J$, each with two levels, $i = 1, 2$ and $j = 1, 2$. Let $R_J$ be the missing data indicator for variable $J$, such that $R_J = 1$ if $J$ is observed and $R_J = 2$ if $J$ is missing. Similarly, let $R_I = 1$ if variable $I$ is observed.
Consider the $I \times J$ cross-classified Table 3 with total sample size $n_{++1+}$, in which each variable has two levels and there is one supplemental margin (in the terminology of Dempster and Baker 1992), with cell counts $n_{ijab}$. The cell counts with $R_I = R_J = 1$ are denoted $n_{ij11}$. The supplemental margin for variable $J$ when $R_J = 2$ is denoted $n_{i+12}$. Here the symbol '+' denotes summation over the corresponding index. Furthermore, denote the expected cell counts by $\mu_{ij1k}$, the cell probabilities for $I$, $J$, $R_I$ and $R_J$ by $\pi_{ijkl}$, and the marginal probabilities by $\pi_{ij..}$ for Table 3. Log-linear models are widely used to analyze contingency tables. Missing data indicators for contingency tables with supplemental margins can be incorporated in log-linear models in the same way as the other variables. In terms of $\lambda$, the homogeneous log-linear model (only two-way interactions present) for one partially observed variable is
$$\log(\mu_{ij1k}) = \lambda + \lambda^{I}_{(i)} + \lambda^{J}_{(j)} + \lambda^{R_I}_{(1)} + \lambda^{R_J}_{(k)} + \lambda^{IJ}_{(ij)} + \lambda^{IR_I}_{(i1)} + \lambda^{IR_J}_{(ik)} + \lambda^{JR_I}_{(j1)} + \lambda^{JR_J}_{(jk)} + \lambda^{R_I R_J}_{(1k)} \qquad (1)$$
Because this is an over-parameterized model, we add the side conditions that the sum of each $\lambda$ term over each of the indicated subscripts is zero. For computational purposes, it is convenient to use an alternative parameterization of the model. Let $m_{ij} \ge 0$, $b_{ij} \ge 0$ and $\sum_i \sum_j m_{ij}(1 + b_{ij}) = n_{++1+}$, such that $\mu_{ij11} = m_{ij}$ and $\mu_{ij12} = m_{ij} b_{ij}$. In terms of the log-linear model,
$$m_{ij} = \exp\{\lambda + \lambda^{I}_{(i)} + \lambda^{J}_{(j)} + \lambda^{R_I}_{(1)} + \lambda^{R_J}_{(1)} + \lambda^{IJ}_{(ij)} + \lambda^{IR_I}_{(i1)} + \lambda^{IR_J}_{(i1)} + \lambda^{JR_I}_{(j1)} + \lambda^{JR_J}_{(j1)} + \lambda^{R_I R_J}_{(11)}\}$$
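As a worked step, the ratio $b_{ij} = \mu_{ij12} / \mu_{ij11}$ can be sketched directly from Eq. (1). This is a derivation made here from the model and its sum-to-zero side conditions, written with superscripts for the variables and parenthesized subscripts for the levels; it is not a reproduction of the Appendix. Every term of Eq. (1) that does not involve $R_J$ is identical for $k = 1$ and $k = 2$ and cancels in the ratio, so

$$b_{ij} = \exp\{[\lambda^{R_J}_{(2)} - \lambda^{R_J}_{(1)}] + [\lambda^{IR_J}_{(i2)} - \lambda^{IR_J}_{(i1)}] + [\lambda^{JR_J}_{(j2)} - \lambda^{JR_J}_{(j1)}] + [\lambda^{R_I R_J}_{(12)} - \lambda^{R_I R_J}_{(11)}]\}$$

The side conditions over the index $k$ give $\lambda^{R_J}_{(2)} = -\lambda^{R_J}_{(1)}$, $\lambda^{IR_J}_{(i2)} = -\lambda^{IR_J}_{(i1)}$, $\lambda^{JR_J}_{(j2)} = -\lambda^{JR_J}_{(j1)}$ and $\lambda^{R_I R_J}_{(12)} = -\lambda^{R_I R_J}_{(11)}$, hence

$$b_{ij} = \exp\{-2[\lambda^{R_J}_{(1)} + \lambda^{IR_J}_{(i1)} + \lambda^{JR_J}_{(j1)} + \lambda^{R_I R_J}_{(11)}]\}$$

Under this sketch, setting the $\lambda^{JR_J}$ term to zero leaves $b_{ij}$ depending on $i$ only, setting the $\lambda^{IR_J}$ term to zero leaves it depending on $j$ only, and setting both to zero makes it constant over the table.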


Introduction
Diagnostic testing in medicine is the process of identifying the patients with and without a particular disease. Accuracy of the diagnostic test is the ability of the test to correctly identify the true disease status of the patient [1]. The test or procedure that determines the true disease status without error is called the gold standard test. However, even when a gold standard test exists, verification of the disease status of each patient may not be obtained, for reasons such as the invasive nature or high cost of the gold standard test. For example, the Prostate Specific Antigen (PSA) blood test is used as a screening test for prostate cancer, at a cost ranging from $60 to $80. However, the true diagnosis is generally confirmed by invasive procedures such as prostate biopsy, at a cost ranging from $1600 to $1800. Thus patients at high risk or with a positive screening test are more likely to be offered the gold standard test than those at low risk or with a negative screening test. Consequently, inference about measures of diagnostic accuracy such as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) may be biased, because individuals who are selected for the gold standard test on the basis of their diagnostic test results are not a random sample [2]. Many authors have referred to this bias as work-up bias [3] or verification bias [4]. In the presence of verification bias, the estimated accuracy of a diagnostic test depends heavily on only those patients whose disease status has been verified. In addition, a positive association between patient selection for verification and the test results produces more biased results [5].
When the true disease status is missing among patients who were not selected for verification, the framework of the missing data mechanism proposed by Rubin can be applied for verification bias correction. If the probability of selecting patients for verification is independent of both observed and unobserved data, then missingness in the disease status is considered missing completely at random (MCAR). Missingness in the disease status is considered missing at random (MAR) when the probability of selecting patients depends only on the observed data, and it is considered missing not at random (MNAR) when the probability depends on unobserved data [6]. Such non-ignorable missingness is most likely to occur when there is a long time lag between the initial test and verification, when there are multiple investigators at various institutions, when the patient population is very heterogeneous or when the disease process is not well understood.
Begg and Greenes [4] proposed a method of verification bias correction for binary diagnostic tests by using Bayes' theorem under the MAR (i.e. conditional independence) assumption. Zhou [7] derived the maximum likelihood estimators for sensitivity and specificity and their variances under both MAR and MNAR mechanisms. However, under the MNAR assumption, Zhou [7] assumed that two known ratios need to be quantified. Zhou [8] showed that the estimators of PPV and NPV are unbiased and consistent if the probability of selecting patients for disease verification does not depend on the true disease status of the patient (i.e. the MAR assumption). Some authors have tried to use multiple imputation methods to correct for verification bias under ignorable missingness [9]. However, the validity of that approach was debated by De Groot et al. [10]. In dealing with multiple diagnostic tests and the presence of covariates, Baker [11] and Kosinski and Barnhart [12] suggested regression approaches to deal with the MNAR missing mechanism. However, their models require the use of multiple tests or covariates in order to achieve identifiability. Martinez et al. [13] have tried to address the issue of verification bias in the MNAR setting by using a Bayesian approach with a beta prior distribution. Some authors have tried using Gibbs sampling techniques as a Bayesian approach to correct the verification bias in MNAR estimates [14]. All these available methods require iterative methods to compute the estimates of the measures of diagnostic accuracy.
Taking advantage of the side conditions of the log-linear model, we can derive an expression for $b_{ij}$. The detailed derivation can be found in the Appendix.
In this paper, we provide the log-linear model approach to correct for verification bias in order to compute the estimates of diagnostic measures (i.e. sensitivity, specificity, PPV and NPV) under various assumptions about the missing data mechanism. We also confirm by simulation that the estimators are asymptotically unbiased and more efficient than complete case estimates. In the next section, we show the impact of verification bias by using a hypothetical example. The following sections focus on the method of log-linear models to correct the verification bias for diagnostic measures and on the comparison with the results of Zhou [7]. Furthermore, we illustrate the log-linear model approach by application to the Hepatic Scintigraph data.

Impact of Verification Bias
In diagnostic medicine, it is a common practice to refer only patients who are test positives for a gold standard test for disease verification. To illustrate the impact of bias, we considered a hypothetical study of 1000 patients to measure the sensitivity, specificity, PPV and NPV (Tables 1 and 2).
The hypothetical data are summarized in Table 1. If every patient is referred for verification by the gold standard, then the true sensitivity from Table 1 is 0.71. Furthermore, the true specificity, PPV and NPV are 0.87, 0.37 and 0.96, respectively. However, if only 10% of those who have a negative test result receive the gold standard for verification, then the data are as summarized in Table 2. From Table 2, the observed sensitivity, specificity, PPV and NPV are 0.96, 0.4, 0.37 and 0.96, respectively. Overall, if patients with positive test results are more likely to receive disease verification (i.e. MAR), then the estimates of sensitivity and specificity based only on verified cases are biased: sensitivity is overestimated and specificity is underestimated. However, under the MAR assumption, the complete case estimates of PPV $= P(D^+ \mid T^+)$ and NPV $= P(D^- \mid T^-)$ are unbiased [8,15,16].
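The effect described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the four counts are hypothetical values chosen here to give accuracy measures close to those quoted in the text, they are not the entries of Table 1, and the helper name `accuracy` is an assumption of this example.

```python
def accuracy(tp, fn, fp, tn):
    """Return (sensitivity, specificity, PPV, NPV) from 2x2 counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sens, spec, ppv, npv


# Hypothetical counts for 1000 fully verified patients (NOT the paper's Table 1).
tp, fn, fp, tn = 70, 30, 120, 780

full = accuracy(tp, fn, fp, tn)

# Verify every test-positive patient but only 10% of test-negative patients.
# Expected verified counts are used, so no random sampling is involved.
frac_neg_verified = 0.10
verified_only = accuracy(tp, fn * frac_neg_verified, fp, tn * frac_neg_verified)

for name, a, b in zip(("sensitivity", "specificity", "PPV", "NPV"), full, verified_only):
    print(f"{name:12s} full = {a:.2f}   verified only = {b:.2f}")
```

Because verification here depends only on the test result, the rows of the table are thinned proportionally: the row-conditional measures (PPV, NPV) are unchanged, while the column-conditional measures (sensitivity, specificity) shift.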

Log-linear parameterization
Baker et al. [17] used the homogeneous log-linear model approach to adjust the expected cell counts for a two-way table with missing data. We can use a similar approach (the BRD model approach) to compute the estimates of the measures of diagnostic accuracy while correcting for verification bias.

MAR estimates
From equation 4, the estimates of sensitivity and specificity under the MAR model can be obtained in closed form.
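A minimal sketch of MAR-corrected estimates follows. It rests on assumptions stated here rather than taken from the text: $I$ is the test result and $J$ the disease status, the fitted verified cells under MAR equal the observed verified counts, and the unverified patients in each test row are allocated in the same disease proportions as the verified ones (which coincides with the Begg and Greenes correction). The function name `mar_corrected` and the example counts are hypothetical.

```python
def mar_corrected(ver_pos, ver_neg, unver_pos, unver_neg):
    """MAR-corrected sensitivity and specificity.

    ver_pos, ver_neg: (D+, D-) verified counts among T+ and T- patients.
    unver_pos, unver_neg: numbers of unverified T+ and T- patients.
    """
    # Row-wise inflation factor: (verified + unverified) / verified.
    scale_pos = (sum(ver_pos) + unver_pos) / sum(ver_pos)
    scale_neg = (sum(ver_neg) + unver_neg) / sum(ver_neg)

    # Expected full-table counts after allocating the unverified patients.
    e_tp, e_fp = ver_pos[0] * scale_pos, ver_pos[1] * scale_pos
    e_fn, e_tn = ver_neg[0] * scale_neg, ver_neg[1] * scale_neg

    sens = e_tp / (e_tp + e_fn)
    spec = e_tn / (e_tn + e_fp)
    return sens, spec


# Hypothetical data: all 190 T+ verified; 81 of 810 T- verified.
sens, spec = mar_corrected(ver_pos=(70, 120), ver_neg=(3, 78),
                           unver_pos=0, unver_neg=729)
naive_sens = 70 / (70 + 3)  # complete case estimate for comparison
print(f"corrected sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"complete case sensitivity = {naive_sens:.2f}")
```

In this example the correction moves the sensitivity from the complete case value back to the value implied by the full hypothetical table.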

Inference about diagnostic measures
In this section, the multivariate delta method is used to draw inference about the diagnostic measures. The marginal and conditional probabilities can be expressed in general notation, and the likelihood ratio statistic is computed with respect to a model fitting the data perfectly.

Models parameter estimates
Baker et al. [17] identified nine different models for the two-way table with three supplementary margins based on the dependence of missingness on one or the other or its own realization. For Table 3, when $b_{ij}$ equals $\beta_{..}$, $\beta_{i.}$ or $\beta_{.j}$, there are three different identifiable models. The first and second subscripts of the parameter $\beta$ correspond to variables $I$ and $J$, respectively. The subscript "." indicates that the parameter is constant over the corresponding index. In addition, each model has a unique interpretation. For example, Model ($\beta_{..}$) can be interpreted as the missingness in $J$ being constant (MCAR). Similarly, Model ($\beta_{i.}$) can be interpreted as the missingness in $J$ depending on the realization of $I$ (MAR), and when the missingness in $J$ depends on its own realization, we consider Model ($\beta_{.j}$) to be MNAR.
Assuming a Poisson distribution for the cell counts, the maximum likelihood estimates of $m_{ij}$ and $\beta$ can be derived from the log-likelihood function of each model. Detailed derivations of the estimates are provided in the Appendix. The likelihood ratio statistics for each model can also be found in the Appendix.
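The three models can be sketched numerically as follows. The closed forms used here were worked out for this example from the Poisson likelihood with verified counts $n_{ij11}$ having mean $m_{ij}$ and supplemental margins $n_{i+12}$ having mean $\sum_j m_{ij} b_{ij}$; they are assumed, not confirmed, to agree with the Appendix. The function name `fit_beta`, the coding (row 0 = positive test, column 0 = diseased) and the counts are hypothetical. The MNAR branch solves the two linear equations $\sum_j n_{ij11} \beta_{.j} = n_{i+12}$ and does not handle the case where a solution is negative.

```python
import numpy as np


def fit_beta(n_ver, n_unver, model):
    """Fit one of the three models; return (m, b) as 2x2 arrays.

    n_ver:   2x2 verified counts n_ij11 (rows = I, columns = J).
    n_unver: length-2 supplemental margins n_i+12.
    """
    n_ver = np.asarray(n_ver, dtype=float)
    n_unver = np.asarray(n_unver, dtype=float)
    row = n_ver.sum(axis=1)  # verified row totals n_i+11

    if model == "MCAR":      # b_ij = beta_.. (constant)
        beta = n_unver.sum() / n_ver.sum()
        m = n_ver * ((row + n_unver) / row)[:, None] / (1.0 + beta)
        b = np.full((2, 2), beta)
    elif model == "MAR":     # b_ij = beta_i. (depends on I only)
        m = n_ver.copy()
        b = np.repeat((n_unver / row)[:, None], 2, axis=1)
    elif model == "MNAR":    # b_ij = beta_.j (depends on J only)
        m = n_ver.copy()
        beta_j = np.linalg.solve(n_ver, n_unver)
        b = np.repeat(beta_j[None, :], 2, axis=0)
    else:
        raise ValueError(f"unknown model: {model}")
    return m, b


# Hypothetical counts, chosen so that the MNAR solution is non-negative.
n_ver = np.array([[60.0, 40.0], [10.0, 50.0]])
n_unver = np.array([50.0, 40.0])

for model in ("MCAR", "MAR", "MNAR"):
    m, b = fit_beta(n_ver, n_unver, model)
    full = m * (1.0 + b)  # expected counts of the completed table
    sens = full[0, 0] / full[:, 0].sum()
    spec = full[1, 1] / full[:, 1].sum()
    print(f"{model:5s} sensitivity = {sens:.3f}  specificity = {spec:.3f}")
```

In this sketch the MAR and MNAR fits reproduce the observed cells exactly, whereas the MCAR fit generally does not, which is what makes a goodness-of-fit test of MCAR possible.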

MCAR
In the case of negative estimates for the MNAR model, the variances of the measures of diagnostic accuracy can be estimated by the bootstrap method.

Simulation Study
A simulation study was conducted to illustrate the reduction in bias and root mean squared error (RMSE) of the estimators of sensitivity and specificity for the MAR model compared to complete case analysis (Table 4).
Two variables, "Test Results" and "Disease Status", were generated with different sample sizes ($N$) from the multinomial distribution with marginal probabilities $\pi_{1+} = 0.5$ and $\pi_{2+} = 0.5$ under different sensitivity values. We cross-classified these variables in a $2 \times 2$ contingency table. This process was repeated 2000 times. In each complete contingency table, we generated missing values in "Disease Status" in such a way that the missingness depends on the realization of "Test Results", to ensure the MAR missing mechanism. Table 4 presents the results of our simulation for the MAR model when 50% of negative tests and 10% of positive tests were not verified for the disease. Table 4 shows that correcting for verification bias using the log-linear model approach substantially reduces the bias and improves the performance of the estimators compared to complete case analysis. We obtained similar results with 10% missingness in $T^+$ and different amounts of missingness in $T^-$, such as 70%, 60%, 40%, 30% and 20%, and/or various marginal probabilities. In addition, we carried out an extensive simulation for Model ($\beta_{.j}$) by simulating the MNAR missing mechanism to estimate PPV and NPV. We observed a reduction in bias and RMSE compared to complete case analysis under different amounts of non-ignorable missingness in the disease status.
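A small-scale version of such a simulation can be sketched as follows. The cell probabilities, sample size, number of replications and seed are assumptions of this example (they give $P(T^+) = 0.5$ and a true sensitivity of 0.8), and the corrected estimator is the row-wise allocation form, assumed here to correspond to the MAR fit. The results are not those of Table 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cell probabilities: rows = test (T+, T-), columns = disease (D+, D-).
cell_p = np.array([[0.40, 0.10],
                   [0.10, 0.40]])
p_verify = np.array([0.90, 0.50])  # 10% of T+ and 50% of T- are left unverified
n, reps = 500, 500
true_sens = cell_p[0, 0] / cell_p[:, 0].sum()

cc_est, corrected_est = [], []
for _ in range(reps):
    full = rng.multinomial(n, cell_p.ravel()).reshape(2, 2)
    # Verification depends on the test result only (MAR).
    ver = rng.binomial(full, p_verify[:, None])
    unver = (full - ver).sum(axis=1)
    ver_row = ver.sum(axis=1)
    if (ver_row == 0).any() or ver[:, 0].sum() == 0:
        continue  # skip degenerate tables

    # Complete case estimate: verified patients only.
    cc_est.append(ver[0, 0] / ver[:, 0].sum())

    # Corrected estimate: inflate each test row to its full size.
    est = ver * ((ver_row + unver) / ver_row)[:, None]
    corrected_est.append(est[0, 0] / est[:, 0].sum())

cc_est = np.array(cc_est)
corrected_est = np.array(corrected_est)
bias_cc = cc_est.mean() - true_sens
bias_corr = corrected_est.mean() - true_sens
rmse_cc = np.sqrt(np.mean((cc_est - true_sens) ** 2))
rmse_corr = np.sqrt(np.mean((corrected_est - true_sens) ** 2))

print(f"complete case: bias = {bias_cc:.3f}, RMSE = {rmse_cc:.3f}")
print(f"corrected:     bias = {bias_corr:.3f}, RMSE = {rmse_corr:.3f}")
```

With these settings the complete case estimator of sensitivity is biased upward, since test-negative patients are under-represented among the verified cases, while the corrected estimator is centered near the true value.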

Application
In this section, we demonstrate the log-linear modeling approach to computing the measures of diagnostic accuracy under various missing mechanisms by using the Hepatic Scintigraph data published by [19]. In the Hepatic Scintigraph data, 650 patients underwent hepatic scintigraphy, of whom 429 had positive scans and 221 had negative scans. Only 263 of the 429 patients with positive scans and 81 of the 221 patients with negative scans were referred for status verification using procedures such as liver biopsy, exploratory laparotomy or autopsy within 6 weeks of their scans. Table 5 presents the verified and unverified cases by hepatic scan result. Table 6 shows the maximum likelihood estimates of the model parameters (Tables 5-7). Table 7 presents the estimates of sensitivity, specificity, PPV and NPV and their 95% confidence intervals under three different models, obtained by applying the log-linear model approach to Table 5. Using the method of Zhou [7], the estimates of sensitivity and specificity can range over (0.68, 0.95) and (0.37, 0.86), respectively, depending on the values of $e_0$ and $e_1$. From Table 7, it can be confirmed that the estimates of PPV and NPV under the MAR and MCAR assumptions, and the estimates of sensitivity and specificity under the MNAR assumption, can be obtained from complete cases [8].
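The verification margins quoted above are enough to sketch a likelihood ratio test of the MCAR assumption. The sketch assumes that the MAR model reproduces the observed table exactly and therefore serves as the reference model, in which case the statistic $G^2 = 2 \sum \text{observed} \times \log(\text{observed} / \text{fitted})$ reduces to a test that the verification fraction is the same for positive and negative scans, with one degree of freedom. This reduction was worked out for this example and is not quoted from the paper.

```python
import math

# Margins quoted in the text: 263 of 429 positive scans and
# 81 of 221 negative scans were verified.
verified = [263, 81]
totals = [429, 221]
unverified = [t - v for t, v in zip(totals, verified)]

# Under MCAR the verification fraction is common to both scan results.
p_hat = sum(verified) / sum(totals)

g2 = 0.0
for v, u, t in zip(verified, unverified, totals):
    fitted_v = t * p_hat          # fitted verified count under MCAR
    fitted_u = t * (1.0 - p_hat)  # fitted unverified count under MCAR
    g2 += 2.0 * (v * math.log(v / fitted_v) + u * math.log(u / fitted_u))

# Upper tail of a chi-square with 1 degree of freedom:
# P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(g2 / 2.0))

print(f"verified fraction: T+ = {verified[0] / totals[0]:.2f}, "
      f"T- = {verified[1] / totals[1]:.2f}")
print(f"G2 = {g2:.2f}, p-value = {p_value:.2e}")
```

A large statistic here indicates that verification depended on the scan result, so a complete case analysis that implicitly assumes MCAR would not be appropriate for these data.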

Final Remarks
Verification bias is an extremely common problem in diagnostic medicine. This paper shows how the log-linear model approach corrects verification bias when estimating the diagnostic measures for a single binary-scale diagnostic test. Log-linear models also reduce the bias and improve the performance of the estimators compared to complete case analysis. In addition, explicit estimates of the measures of diagnostic accuracy under different missing mechanisms can be computed from simple algebraic formulas, while other available methods require iterative methods to estimate the measures of diagnostic accuracy. Furthermore, this approach allows us to test the MCAR assumption for a particular data set. However, the only way to confirm a MAR or MNAR missing mechanism is to collect the missing data. In diagnostic medicine, because of cost-effectiveness concerns, we rarely have the luxury of obtaining the missing data. As a result, careful modeling of the missing mechanism to reduce the bias becomes all the more important. Therefore, if the data are not MCAR, we can use scientific knowledge about the data to assume a MAR or MNAR missing mechanism and compute estimates of the measures of diagnostic accuracy with log-linear models.
Expected counts for all unverified cases can be modeled through $b_{ij}$. By using the side conditions of the log-linear model (the sum of each $\lambda$ term over each of the indicated subscripts is zero), we can obtain the expression for $b_{ij}$.