Department of Biostatistics, Jiann-Ping Hsu College of Public Health, USA
Received date: October 30, 2015; Accepted date: December 11, 2015; Published date: December 18, 2015
Citation: Rochani HD, Samawi HM, Vogel RL, Yin JJ (2015) Correction of Verification Bias using Log-linear Models for a Single Binary-scale Diagnostic Tests. J Biom Biostat 6:266. doi:10.4172/2155-6180.1000266
Copyright: © 2015 Rochani HD, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
In diagnostic medicine, the test that determines the true disease status without error is referred to as the gold standard. Even when a gold standard exists, it is extremely difficult to verify every patient because of issues of cost-effectiveness and the invasive nature of the procedures. In practice, some patients with test results are not selected for verification of disease status, which results in verification bias for diagnostic tests. The ability of a diagnostic test to correctly identify patients with and without the disease can be evaluated by measures such as sensitivity, specificity and predictive values. However, these measures can be biased if we consider only the patients with test results who also underwent the gold standard procedure. This paper applies the log-linear model approach to compute the maximum likelihood estimates of sensitivity, specificity and predictive values. We also compare the estimates with Zhou’s results and apply the approach to the Hepatic Scintigraph data under both ignorable and non-ignorable missing data mechanisms. We demonstrate the efficiency of the estimators using simulation studies.
Verification bias; Diagnostic tests; Log-linear models; Missing data
Diagnostic testing in medicine is the process of identifying patients with and without a particular disease. The accuracy of a diagnostic test is its ability to correctly identify the true disease status of the patient [1]. The test or procedure that determines the true disease status without error is called the gold standard test. However, even when a gold standard test exists, the disease status of each patient may not be verified, for reasons such as the invasive nature or high cost of the gold standard test. For example, the Prostate Specific Antigen (PSA) blood test is used as a screening test for prostate cancer at a cost ranging from $60 to $80, whereas the true diagnosis is generally confirmed by invasive procedures such as prostate biopsy at a cost ranging from $1600 to $1800. Thus patients at high risk or with a positive screening test are more likely to be offered the gold standard test than those at low risk or with a negative screening test. Consequently, inference about measures of diagnostic accuracy such as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) may be biased, because individuals who are selected for the gold standard test on the basis of their diagnostic test results are not a random sample [2]. This bias has been referred to as work-up bias [3] or verification bias [4]. In the presence of verification bias, the apparent efficiency of a diagnostic test depends heavily on only those patients whose disease status has been verified. In addition, a positive association between selection for verification and the test result produces more biased results [5].
When the true disease status is missing among patients who were not selected for verification, the framework of missing data mechanisms proposed by Rubin can be applied to correct for verification bias. If the probability of selecting a patient for verification is independent of both observed and unobserved data, then missingness in the disease status is considered missing completely at random (MCAR). Missingness in the disease status is considered missing at random (MAR) when the probability of selection depends only on the observed data, and missing not at random (MNAR) when the probability depends on unobserved data [6]. This is most likely to occur when there is a long time lag between the initial test and verification, when there are multiple investigators at various institutions, when the patient population is very heterogeneous, or when the disease process is not well understood.
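The three mechanisms can be illustrated with a minimal sketch. The function name and the numeric verification rates below are hypothetical and are not taken from the paper; they only show which inputs the selection probability is allowed to depend on under each mechanism.

```python
def verification_probability(mechanism, test_positive, diseased):
    """Probability that a patient is sent for gold-standard verification.

    The rates are hypothetical and chosen only to illustrate the
    three missing data mechanisms.
    """
    if mechanism == "MCAR":
        # Same chance for everyone: ignores test result and disease status.
        return 0.5
    if mechanism == "MAR":
        # Depends only on the observed test result.
        return 0.9 if test_positive else 0.1
    if mechanism == "MNAR":
        # Depends on the true disease status, which is unobserved
        # for unverified patients.
        return 0.8 if diseased else 0.3
    raise ValueError(f"unknown mechanism: {mechanism}")


for mechanism in ("MCAR", "MAR", "MNAR"):
    rates = [
        verification_probability(mechanism, t, d)
        for t in (True, False)
        for d in (True, False)
    ]
    print(mechanism, rates)
```

Under MAR the rate changes with the test result but not with the disease status, which is why the disease split within each test row remains representative.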
Begg and Greenes [4] proposed a method of verification bias correction for binary diagnostic tests using Bayes’ theorem under the MAR (i.e. conditional independence) assumption. Zhou [7] derived the maximum likelihood estimators of sensitivity and specificity and their variances under both MAR and MNAR mechanisms. However, under the MNAR assumption, Zhou's [7] estimators require that two ratios be known. Zhou [8] showed that the estimators of PPV and NPV are unbiased and consistent if the probability of selecting patients for disease verification does not depend on the true disease status of the patient (i.e. the MAR assumption). Some authors have used multiple imputation methods to correct for verification bias under ignorable missingness [9]. However, the validity of that approach was debated by De Groot et al. [10]. For multiple diagnostic tests and in the presence of covariates, Baker [11] and Kosinski and Barnhart [12] suggested regression approaches to deal with the MNAR missing mechanism. However, their models require multiple tests or covariates in order to achieve identifiability. Martinez et al. [13] addressed verification bias in the MNAR setting using a Bayesian approach with a beta prior distribution. Other authors have used Gibbs sampling as a Bayesian approach to correct verification bias in MNAR estimates [14]. All of these methods require iterative procedures to compute the estimates of the measures of diagnostic accuracy, especially under the MNAR mechanism. By contrast, with the log-linear model approach, explicit solutions for the estimates can be computed from a simple algebraic formula, which is computationally convenient, under all missing data mechanisms.
In this paper, we present the log-linear model approach to correcting verification bias when estimating diagnostic measures (i.e. sensitivity, specificity, PPV and NPV) under various assumptions about the missing data mechanism. We also confirm by simulation that the estimators are asymptotically unbiased and more efficient than complete-case estimates. In the next section, we show the impact of verification bias using a hypothetical example. The following sections focus on the log-linear model method for correcting verification bias in diagnostic measures and on comparison with Zhou's [7] results. Finally, we illustrate the log-linear model approach by applying it to the Hepatic Scintigraph data.
In diagnostic medicine, it is a common practice to refer only patients who are test positives for a gold standard test for disease verification. To illustrate the impact of bias, we considered a hypothetical study of 1000 patients to measure the sensitivity, specificity, PPV and NPV (Tables 1 and 2).
Test result | Verified, D + | Verified, D - | Unverified | Total |
---|---|---|---|---|
T + | 71 | 120 | 0 | 191 |
T - | 29 | 780 | 0 | 809 |
Total | | | | 1000 |
Table 1: Hypothetical data.
The hypothetical data are summarized in Table 1. If every patient is referred for verification by the gold standard, then the true sensitivity from Table 1 is 0.71, and the true specificity, PPV and NPV are 0.87, 0.37 and 0.96 respectively. However, if only 10% of those with a negative test result receive the gold standard for verification, the data are as summarized in Table 2. From Table 2, the observed sensitivity, specificity, PPV and NPV are 0.96, 0.39, 0.37 and 0.96 respectively. Overall, if patients with positive test results are more likely to receive disease verification (i.e. MAR), then the estimates of sensitivity and specificity based only on verified cases are biased: sensitivity is overestimated and specificity is underestimated. However, under the MAR assumption, the estimates of PPV = P(D+ | T+) and NPV = P(D− | T−) from complete cases are unbiased [8,15,16].
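The arithmetic behind this example can be checked with a short sketch. The function name is hypothetical; the counts are the verified cells of Tables 1 and 2.

```python
def accuracy_from_verified(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from verified counts only."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Table 1: every patient is verified.
full = accuracy_from_verified(tp=71, fp=120, fn=29, tn=780)

# Table 2: all test positives but only about 10% of test negatives are verified.
partial = accuracy_from_verified(tp=71, fp=120, fn=3, tn=78)

for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: full = {full[name]:.2f}, verified only = {partial[name]:.2f}")
```

The predictive values are the same in both tables to two decimals, while sensitivity and specificity move in opposite directions, in line with the values quoted above up to rounding.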
Log-linear parameterization
Baker et al. [17] used the homogeneous log-linear model approach to adjust the expected cell counts for a two-way table with missing data. We can use a similar approach (the BRD model approach) to compute estimates of the measures of diagnostic accuracy corrected for verification bias. To demonstrate this idea, define two random variables I and J, each with two levels, i = 1,2 and j = 1,2. Let R_{J} be the missing data indicator for variable J, such that R_{J} = 1 if J is observed and R_{J} = 2 if J is missing. Similarly, let R_{I} = 1 if I is observed. Consider the I × J cross-classified Table 3, with total sample size n_{++1+}, two levels per variable and one supplemental margin (in the terminology of Dempster and Baker 1992), with cell counts n_{ijab}. The cell counts with R_{I} = R_{J} = 1 are denoted n_{ij11}. The supplemental margin for variable J when R_{J} = 2 is denoted n_{i+12}. Here the symbol ’+’ denotes summation over the corresponding index. Furthermore, denote the expected cell counts by λ_{ij1k}, the cell probabilities for I, J, R_{I} and R_{J} by π_{ijkl}, and the marginal probabilities by π_{ij..} for Table 3.
Test result | Verified, D + | Verified, D - | Unverified | Total |
---|---|---|---|---|
T + | 71 | 120 | 0 | 191 |
T - | 3 | 78 | 728 | 809 |
Total | | | | 1000 |
Table 2: Hypothetical data with unverified cases.
Log-linear models are widely used to analyze contingency tables. Missing data indicators for contingency tables with supplemental margins can be incorporated in log-linear models in the same way as any other variable. In terms of λ, the homogeneous log-linear model (with two-way interactions only) for one partially observed variable is
(1)
Because this is an over parameterized model, we add the side conditions that the sum of each λ term over each of the indicated subscripts is zero. For computational purposes, it is convenient to use an alternative parameterization of the model. Let , and such that and . In terms of log-linear models
Taking advantage of the side conditions, we can derive the following expression for :
Denote,
In probabilistic terms, m_{ij} and can be expressed as follows,
For Table 3, the total expected count is
and in general notation the marginal and conditional probabilities can be expressed as :
The likelihood ratio statistic with respect to a model fitting the data perfectly is,
Model parameter estimates
Baker et al. [17] identified nine different models for the two-way table with three supplementary margins, based on whether the missingness of each variable depends on the other variable or on its own realization. For Table 3, when b_{ij} equals β_{..}, β_{i.} or β_{.j}, there are three different identifiable models. The first and second subscripts of the parameter β correspond to variables I and J respectively. The subscript "." indicates that the parameter is constant over the corresponding index. Each model has its own interpretation. Model (β_{..}) means that the missingness in J is constant (MCAR). Model (β_{i.}) means that the missingness in J depends on the realization of I (MAR). Model (β_{.j}) means that the missingness in J depends on its own realization (MNAR).
R_{I} = 1 | Verified (R_{J} = 1), J = 1 (D +) | Verified (R_{J} = 1), J = 2 (D -) | Unverified (R_{J} = 2) |
---|---|---|---|
I = 1 (T +) | n_{1111} | n_{1211} | n_{1+12} |
I = 2 (T -) | n_{2111} | n_{2211} | n_{2+12} |
Total (all cells) | n_{++1+} | | |
Table 3: Two way table with one supplementary margin.
Assuming a Poisson distribution for the cell counts, the maximum likelihood estimates of m_{ij} and β can be derived from the log-likelihood function of each model. Detailed derivations of the estimates are provided in the Appendix. The likelihood ratio statistic for each model can also be found in the Appendix.
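As a sketch of the kind of likelihood involved, assume (this reading of the notation is an assumption, since the defining equations are not reproduced here) that m_{ij} is the expected count of verified patients in cell (i, j) and that β_{ij} is the ratio of the expected unverified count to the expected verified count in that cell. With independent Poisson counts, the log-likelihood up to an additive constant would then take the form:

```latex
\ell(m,\beta) \;=\; \sum_{i=1}^{2}\sum_{j=1}^{2}\Bigl( n_{ij11}\,\log m_{ij} \;-\; m_{ij} \Bigr)
\;+\; \sum_{i=1}^{2}\Bigl( n_{i+12}\,\log \sum_{j=1}^{2} m_{ij}\,\beta_{ij} \;-\; \sum_{j=1}^{2} m_{ij}\,\beta_{ij} \Bigr)
```

The three models correspond to the restrictions β_{ij} = β_{..} (MCAR), β_{ij} = β_{i.} (MAR) and β_{ij} = β_{.j} (MNAR). The second sum reflects that, for unverified patients, only the row margin n_{i+12} is observed.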
MCAR estimates
(2)
(3)
From equation 2, the estimates for Sensitivity and Specificity are
Similarly, the estimates for PPV and NPV can be computed from equation 3.
MAR estimates
(4)
(5)
From equation 4, the estimates of sensitivity and specificity can be obtained as follows
Similarly, from equation 5 the estimates of PPV and NPV can be computed.
MNAR estimates
(8)
(9)
For Model (β_{.j}), equation 8 provides the estimates of sensitivity and specificity as follows
while equation 9 provides the estimates of PPV and NPV
It is possible to obtain negative maximum likelihood estimates for β_{.j} in the MNAR model. If any solution is negative, the estimates can still be computed by maximizing the likelihood function subject to bound constraints, using the limited-memory algorithm for constrained optimization of Byrd [18]. This algorithm is available through the optim function (method "L-BFGS-B") in the statistical software R, version 2.15.1 or higher.
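A minimal sketch of how a negative solution can arise is given below. It assumes that the unconstrained MNAR fit reproduces the observed counts exactly, so that the two β values solve a 2 × 2 linear system in the verified counts and the unverified row totals. This reading is an assumption, although it agrees with the Hepatic Scintigraph estimates reported later (0.39 and 2.4). The function names are hypothetical.

```python
def mnar_beta(n11, n12, n21, n22, u1, u2):
    """Solve n11*b1 + n12*b2 = u1 and n21*b1 + n22*b2 = u2 by Cramer's rule.

    n_ij are verified counts (row i = test result, column j = disease
    status); u_i is the unverified count in test row i.
    """
    det = n11 * n22 - n12 * n21
    if det == 0:
        raise ValueError("verified table is singular; beta is not identifiable")
    b1 = (u1 * n22 - n12 * u2) / det
    b2 = (n11 * u2 - n21 * u1) / det
    return b1, b2


def needs_constrained_fit(betas):
    """True when the closed-form solution leaves the parameter space."""
    return any(b < 0 for b in betas)


hepatic = mnar_beta(231, 32, 27, 54, 166, 140)       # Hepatic Scintigraph counts
hypothetical = mnar_beta(71, 120, 3, 78, 0, 728)     # hypothetical data, Table 2

print(hepatic, needs_constrained_fit(hepatic))
print(hypothetical, needs_constrained_fit(hypothetical))
```

For the hypothetical data of Table 2 the first component is negative, which is the situation in which a bound-constrained maximization would be used instead of the closed form.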
Inference about diagnostic measures
In this section, the multivariate delta method is used to draw the inferences about the diagnostic measures for Model (β_{i.}) (MAR). A similar method can be used for Model (β_{..}) (MCAR) and Model (β_{.j}) (MNAR). To derive the asymptotic variance of the estimates, we have
However, to ensure better normal approximation, we derived the asymptotic distribution of . By using multivariate delta method for Model(β_{i.}) , we have
(10)
where and I is the Fisher information. The detailed derivation of I_{2} is provided in the Appendix. Therefore, the 100(1−δ )% confidence interval for is
When the MNAR model yields negative estimates, the variances of the measures of diagnostic accuracy can be estimated by the bootstrap method.
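The bootstrap idea can be sketched as follows. This is a percentile bootstrap, one of several variants, and the text does not say which one was used. For illustration, the estimator is the row-scaling (Begg and Greenes type) corrected sensitivity applied to the Hepatic Scintigraph counts. Any other estimator that takes the six observed counts could be passed instead. All function names are hypothetical.

```python
import random


def mar_sensitivity(counts):
    """Corrected sensitivity; counts = (tp, fp, u_pos, fn, tn, u_neg)."""
    tp, fp, u_pos, fn, tn, u_neg = counts
    pos = tp / (tp + fp) * (tp + fp + u_pos)   # estimated diseased among T+
    neg = fn / (fn + tn) * (fn + tn + u_neg)   # estimated diseased among T-
    return pos / (pos + neg)


def percentile_bootstrap(counts, estimator, reps=2000, level=0.95, seed=1):
    """Percentile interval from multinomial resampling of the observed cells."""
    rng = random.Random(seed)
    n = sum(counts)
    cells = range(len(counts))
    stats = []
    for _ in range(reps):
        resampled = [0] * len(counts)
        for cell in rng.choices(cells, weights=counts, k=n):
            resampled[cell] += 1
        try:
            stats.append(estimator(resampled))
        except ZeroDivisionError:
            continue  # skip degenerate resamples with an empty verified row
    stats.sort()
    lower = stats[int((1 - level) / 2 * len(stats))]
    upper = stats[int((1 + level) / 2 * len(stats)) - 1]
    return lower, upper


hepatic = (231, 32, 166, 27, 54, 140)
estimate = mar_sensitivity(hepatic)
lower, upper = percentile_bootstrap(hepatic, mar_sensitivity, reps=500)
print(f"estimate {estimate:.2f}, bootstrap interval ({lower:.2f}, {upper:.2f})")
```

Resampling whole patients across all six cells keeps the number of verified and unverified patients random, as it is in the original sample.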
Comparison with Zhou’s result
Zhou [7] derived the estimates of sensitivity and specificity under the MAR and MNAR assumptions. The estimates obtained in the previous section for sensitivity and specificity under the MAR assumption are equivalent to Zhou’s estimates (Appendix). However, under the MNAR assumption, Zhou [7] assumes that the two ratios
are known in order to estimate sensitivity and specificity. The log-linear model approach does not require such assumptions to estimate the MNAR parameters.
A simulation study was conducted to illustrate the reduction in bias and root mean squared error (RMSE) of the estimators for sensitivity and specificity for MAR model compared to complete case analysis (Table 4).
Two variables, “Test Results” and “Disease Status”, were generated for different sample sizes (N) from the multinomial distribution with marginal probabilities of and under different sensitivity and specificity values. We cross-classified these variables in a 2 × 2 contingency table. This process was repeated 2000 times. In each complete contingency table, we generated missing values in “Disease Status” in such a way that missingness in “Disease Status” depends on the realization of “Test Results”, to ensure the MAR missing mechanism. Table 4 presents the simulation results for the MAR model when 50% of the negative tests and 10% of the positive tests were not verified for the disease. Table 4 shows that correcting for verification bias using the log-linear model approach substantially reduces the bias and improves the performance of the estimators compared to complete case analysis. We obtained similar results with 10% missingness in T+ and different amounts of missingness in T- (70%, 60%, 40%, 30% and 20%) and/or various marginal probabilities. In addition, we carried out an extensive simulation for Model (β_{.j}) by simulating the MNAR missing mechanism to estimate PPV and NPV. We observed a reduction in bias and RMSE compared to complete case analysis under different amounts of non-ignorable missingness in the disease status.
N | (Sen %, Sp %) | Complete cases, Sen: bias (RMSE) | Complete cases, Sp: bias (RMSE) | Model based, Sen: bias (RMSE) | Model based, Sp: bias (RMSE) |
---|---|---|---|---|---|
200 | (60,60) | 0.1464 (0.1438) | 0.1269 (0.1279) | 0.0623 (0.0005) | 0.0375 (0.0021) |
200 | (95,95) | 0.0504 (0.0375) | 0.0229 (0.0214) | 0.0245 (0.0004) | 0.0195 (0.0004) |
200 | (60,95) | 0.1470 (0.1424) | 0.0230 (0.0213) | 0.0586 (0.0014) | 0.0200 (0.0004) |
200 | (95,60) | 0.1464 (0.2235) | 0.1279 (0.3587) | 0.0623 (0.1585) | 0.0375 (0.1921) |
500 | (60,60) | 0.1461 (0.1461) | 0.1310 (0.1310) | 0.0389 (0.0017) | 0.0228 (0.0017) |
500 | (95,95) | 0.0396 (0.0357) | 0.0212 (0.0210) | 0.0160 (0.0002) | 0.0122 (0.0008) |
500 | (60,95) | 0.1450 (0.1450) | 0.0217 (0.0215) | 0.0354 (0.0001) | 0.0121 (0.0001) |
500 | (95,60) | 0.0412 (0.0373) | 0.1301 (0.1301) | 0.0159 (0.0006) | 0.0234 (0.0006) |
Table 4: Bias and MSE comparison for 50% missing in T - and 10% missing in T + .
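The simulation design described above can be sketched in a few lines. This is not the authors' code: the disease prevalence of 0.5, the number of replicates, the seed and all function names are assumptions made for illustration. The corrected estimator scales each test row's verified disease fraction up to the full row, which is the usual MAR correction.

```python
import random


def simulate_table(rng, n, prevalence, sens, spec, p_verify_pos, p_verify_neg):
    """Draw one study; return verified counts [test][disease] and test-row totals."""
    verified = [[0, 0], [0, 0]]   # index 0 = positive, 1 = negative
    row_totals = [0, 0]
    for _ in range(n):
        diseased = rng.random() < prevalence
        p_test_pos = sens if diseased else 1.0 - spec
        t = 0 if rng.random() < p_test_pos else 1
        row_totals[t] += 1
        # MAR: verification depends on the observed test result only.
        if rng.random() < (p_verify_pos if t == 0 else p_verify_neg):
            verified[t][0 if diseased else 1] += 1
    return verified, row_totals


def complete_case_sens(verified):
    return verified[0][0] / (verified[0][0] + verified[1][0])


def corrected_sens(verified, row_totals):
    pos = verified[0][0] / sum(verified[0]) * row_totals[0]
    neg = verified[1][0] / sum(verified[1]) * row_totals[1]
    return pos / (pos + neg)


def mean_bias(reps=200, n=500, prevalence=0.5, sens=0.6, spec=0.6, seed=2015):
    rng = random.Random(seed)
    cc, model = [], []
    for _ in range(reps):
        # 10% of positives and 50% of negatives are left unverified.
        verified, totals = simulate_table(rng, n, prevalence, sens, spec, 0.9, 0.5)
        try:
            cc_est = complete_case_sens(verified)
            model_est = corrected_sens(verified, totals)
        except ZeroDivisionError:
            continue  # skip the rare replicate with an empty verified row
        cc.append(cc_est - sens)
        model.append(model_est - sens)
    return sum(cc) / len(cc), sum(model) / len(model)


bias_cc, bias_model = mean_bias()
print(f"complete-case bias: {bias_cc:.3f}, corrected bias: {bias_model:.3f}")
```

Because the prevalence here is assumed, the numbers will not reproduce Table 4. The sketch only shows the qualitative pattern of a large positive complete-case bias in sensitivity and a much smaller bias after correction.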
In this section, we demonstrate the log-linear modeling approach for computing the measures of diagnostic accuracy under various missing mechanisms, using the Hepatic Scintigraph data published in [19]. In these data, 650 patients underwent hepatic scintigraphy, of whom 429 had positive scans and 221 had negative scans. Only 263 of the 429 patients with positive scans and 81 of the 221 with negative scans were referred for disease verification by procedures such as liver biopsy, exploratory laparotomy or autopsy within 6 weeks of their scans. Table 5 presents the verified and unverified cases by hepatic scan result. Table 6 shows the maximum likelihood estimates of the model parameters (Tables 5-7).
Test result | Verified, D + | Verified, D - | Unverified | Total |
---|---|---|---|---|
T + | 231 | 32 | 166 | 429 |
T - | 27 | 54 | 140 | 221 |
Total | | | | 650 |
Table 5: Hepatic scintigraph data.
Model | Estimate(s) of β | G^{2} | P-Value |
---|---|---|---|
Model (β_{..}) | 0.89 | 35.84 | < 0.001 |
Model (β_{i.}) | 0.63, 1.73 | - | - |
Model (β_{.j}) | 0.39, 2.4 | - | - |
Table 6: ML estimates for model parameters.
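The MCAR and MAR rows of Table 6 can be reproduced from Table 5 with a short sketch, under the assumption (not stated explicitly in the surviving text) that β is the ratio of unverified to verified counts, overall for MCAR and within each test row for MAR. The likelihood ratio statistic compares the observed verified and unverified row totals with those expected if the verification rate were the same in both rows. Using 1 degree of freedom for the P-value is also an assumption.

```python
import math

verified = [[231, 32], [27, 54]]   # Table 5 rows: T+, T-; columns: D+, D-
unverified = [166, 140]

n_verified = sum(sum(row) for row in verified)
n_total = n_verified + sum(unverified)

# MCAR: one ratio for the whole table. MAR: one ratio per test row.
beta_mcar = sum(unverified) / n_verified
beta_mar = [u / sum(row) for row, u in zip(verified, unverified)]

# Likelihood ratio statistic of MCAR against a perfect fit.
g2 = 0.0
for row, u in zip(verified, unverified):
    row_total = sum(row) + u
    expected_verified = row_total * n_verified / n_total
    expected_unverified = row_total - expected_verified
    g2 += 2 * sum(row) * math.log(sum(row) / expected_verified)
    g2 += 2 * u * math.log(u / expected_unverified)

# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g2 / 2))

print(f"beta (MCAR) = {beta_mcar:.2f}")
print("beta (MAR)  =", [round(b, 2) for b in beta_mar])
print(f"G2 = {g2:.2f}, p = {p_value:.2e}")
```

The MNAR row would additionally require solving a 2 × 2 linear system in the verified counts, which is omitted here.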
Measure | Model β_{..} | Model β_{i.} | Model β_{.j} |
---|---|---|---|
Sensitivity | 0.9 (0.86,0.93) | 0.84 (0.77,0.88) | 0.9 (0.85,0.93) |
Specificity | 0.63 (0.52,0.73) | 0.74 (0.66,0.81) | 0.63 (0.52,0.72) |
PPV | 0.88 (0.83,0.91) | 0.88 (0.84,0.91) | 0.75 (0.62,0.84) |
NPV | 0.67 (0.55,0.77) | 0.67 (0.58,0.75) | 0.83 (0.75,0.89) |
Table 7: Estimates of diagnostic accuracy measures.
Table 7 presents the estimates of sensitivity, specificity, PPV and NPV, with 95% confidence intervals, under the three models, obtained by applying the log-linear model approach to Table 5. Using Zhou's [7] method, the estimates of sensitivity and specificity can range over (0.68, 0.95) and (0.37, 0.86) respectively, depending on the values of e_{0} and e_{1}. Table 7 confirms that the estimates of PPV and NPV under the MAR and MCAR assumptions, and the estimates of sensitivity and specificity under the MNAR assumption, can be obtained from complete cases [8].
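The point estimates in Table 7 can be reproduced from Table 5 with the sketch below, which rests on three assumptions about how each model fills in the unverified patients. The MCAR column coincides numerically with plain complete-case proportions, so those are used. For MAR, each test row is scaled up to its full size. For MNAR, each disease column is scaled by one plus the corresponding β, with the β values taken as rounded inputs from Table 6. Function and variable names are hypothetical.

```python
def measures(cells):
    """cells[t][d]: estimated full-table counts; t: 0 = T+, 1 = T-; d: 0 = D+, 1 = D-."""
    (a, b), (c, d) = cells
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


verified = [[231, 32], [27, 54]]   # Table 5, verified counts
unverified = [166, 140]            # Table 5, unverified counts per test row

# MCAR column: complete-case proportions.
mcar = measures(verified)

# MAR column: scale each test row so that it sums to verified + unverified.
mar_cells = [
    [n * (sum(row) + u) / sum(row) for n in row]
    for row, u in zip(verified, unverified)
]
mar = measures(mar_cells)

# MNAR column: scale each disease column by (1 + beta_j).
beta = (0.39, 2.40)                # rounded MNAR estimates from Table 6
mnar_cells = [[n * (1 + b) for n, b in zip(row, beta)] for row in verified]
mnar = measures(mnar_cells)

for name, result in (("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)):
    print(name, {k: round(v, 2) for k, v in result.items()})
```

Row scaling leaves PPV and NPV unchanged and column scaling leaves sensitivity and specificity unchanged, which is the pattern noted above.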
Verification bias is an extremely common problem in diagnostic medicine. This paper shows how the log-linear model approach corrects verification bias when estimating diagnostic measures for a single binary-scale diagnostic test. Log-linear models also reduce the bias and improve the performance of the estimators compared with complete case analysis. In addition, explicit estimates of the measures of diagnostic accuracy under different missing mechanisms can be computed from a simple algebraic formula, whereas other available methods require iterative procedures. Furthermore, this approach allows us to test the MCAR assumption for particular data. However, the only way to confirm a MAR or MNAR missing mechanism is to recollect the missing data, and in diagnostic medicine, because of cost-effectiveness concerns, we do not have the luxury of obtaining the missing data. As a result, careful modeling of the missing mechanism to reduce bias is all the more important. Therefore, if the data are not MCAR, we can use scientific knowledge about the data to assume a MAR or MNAR missing mechanism and compute estimates of the measures of diagnostic accuracy with log-linear models.
The authors have declared no conflict of interest.
Log-linear parameterization
Expected counts for all verified cases can be modeled by using equation 1 in the following way
Expected counts for all un-verified cases can be modeled as the following way :
Some of the side conditions (the sum of each λ-term over each of the indicated subscripts is zero) for the log-linear models are
By using these conditions, we can get
Maximum likelihood (ML) Estimates for Model (β_{..}) (MCAR)
(11)
First solving for
Substituting in equation 11
Maximum likelihood Estimates for Model (β_{i.}) (MAR)
Maximum likelihood Estimates for Model (β_{.j}) (MNAR)
(12)
Comparison of MAR Estimates with Zhou’s Result
(13)
From equation 6 the Sensitivity estimates from log-linear model are
Similarly, we can show that the estimates for specificity from log-linear models are equivalent to Zhou's [7] specificity estimates.
Information Matrices :
Model
Model
Model
Likelihood Ratio Statistics Model