
**Seongho Kim^{1}, Diane Erwin^{2} and Dongfeng Wu^{1*}**

^{1}Dept of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA

^{2}Information Management Services, Inc., Rockville, MD 20852, USA

- *Corresponding Author:
- Dr. Dongfeng Wu

Dept of Bioinformatics and Biostatistics

School of Public Health and Information Sciences

University of Louisville, Louisville, KY 40202, USA

**E-mail:**[email protected]

**Received Date:** March 14, 2012; **Accepted Date:** April 21, 2012; **Published Date:** April 23, 2012

**Citation:** Kim S, Erwin D, Wu D (2012) Efficacy of Dual Lung Cancer Screening by Chest X-Ray and Sputum Cytology Using Johns Hopkins Lung Project Data. J Biom Biostat 3:139. doi:10.4172/2155-6180.1000139

**Copyright:** © 2012 Kim S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Visit for more related articles at** Journal of Biometrics & Biostatistics

Cytology; X-ray; Sojourn time; Sensitivity; Transition probability; Lung cancer screening

Lung cancer is the leading cause of cancer death for both men and women, and more people die of lung cancer than of breast, colon, and prostate cancers combined. The lungs are located in the chest, and most lung cancers begin in the cells that line the bronchi. There are two major types of lung cancer, which require different treatments: small cell lung cancer, which makes up about 20% of all lung cancer cases, and non-small cell lung cancer, which is the most common type. The age-specific lung cancer incidence rate rises with advancing age and peaks between ages 65 and 74 [1].

The most common cause of lung cancer deaths is cigarette smoking, although lung cancer also occurs in people who have never smoked. Other causes include high levels of air pollution, radon gas, family history, and radiation therapy to the lungs [2]. Lung cancer cells may take years to develop, and people often have no symptoms until the disease has progressed to a late stage. As a result, less than 15% of lung cancers are discovered in early stages, when the possibility of curative treatment is the greatest. However, whether early detection leads to a significant reduction in lung cancer mortality is still debated. The National Lung Screening Trial (NLST) recently showed a 20% reduction in lung cancer mortality among those who received spiral computed tomography (CT) compared with those who received chest X-rays [3]. Nonetheless, there is so far no recommendation for regular lung cancer screening.

In several randomized controlled trials of lung cancer screening, chest X-ray and sputum cytology were used together. For example, Melamed et al. [4] examined annual dual screening with X-ray and cytology in the Memorial Sloan-Kettering study and concluded that cytology is not necessary as an annual screening. Chien et al. [5] studied the mean sojourn time for lung cancer under chest X-ray screening. Doria-Rose et al. [6] evaluated the benefit of each screening in terms of long-term mortality reduction and reported a modest benefit from sputum cytology screening. Wu et al. [7] investigated screening sensitivity, transition probability, and sojourn time for 4-monthly chest X-ray and sputum cytology screenings using the Mayo Lung Project data. However, none of these studies tested the dependency of the two diagnostic procedures, chest X-ray and sputum cytology, conditional on lung cancer status in terms of screening sensitivity. Therefore, in this work, the correlation coefficient of the two tests was estimated in order to test the independence of the two screening procedures using the Johns Hopkins Lung Project (JHLP) data. In addition, we investigate the overall screening sensitivity, transition probability, sojourn time, and lead time of the two screening procedures.

We assume that lung cancer develops by progressing through three states, denoted by S_{0} → S_{p} → S_{c}, corresponding, respectively, to the disease-free state; the preclinical state, in which an asymptomatic individual unknowingly has lung cancer that a screening exam can detect; and the clinical state, in which the lung cancer manifests itself in clinical symptoms. If a person enters the preclinical state (S_{p}) at age t_{1} and clinical symptoms present later at age t_{2}, then (t_{2} - t_{1}) is the sojourn time in the preclinical state. If the person is offered a screening exam at time t within the interval (t_{1}, t_{2}) and cancer is diagnosed, then the length of time (t_{2} - t) is the lead time.

The main objective of lung cancer screening is to detect lung cancer as early as possible, i.e., in the preclinical state, when a tumor is present but there are no symptoms. The screening model has three important parameters: the sensitivity of the diagnostic modality, the sojourn time distribution, and the transition probability from the disease-free state to the preclinical state. These three parameters can be called the building blocks of cancer screening modeling, since all other parameters of interest, such as the lead time and the probability of over-diagnosis, can be expressed as functions of them.

**Johns Hopkins Lung Project Data**

The design of the Johns Hopkins Lung Project (JHLP) has been reported previously [8]. The JHLP trial enrolled 10,386 men in the Baltimore metropolitan area between 1973 and 1978 who were at least 45 years old, who smoked at least one pack of cigarettes per day (or had smoked this much within one year of enrollment), and who had no prior history of respiratory tract cancer. All participants were randomized to either a chest X-ray only group or a dual-screen (chest X-ray and sputum cytology) group, resulting in 5,160 in the chest X-ray only arm and 5,226 in the dual-screen arm. Participants in the chest X-ray group received a chest X-ray screening test annually for 8 consecutive years. Participants in the dual-screen group received a chest X-ray annually and sputum cytology every 4 months for 8 consecutive years, for a total of 22 screening time points. Of the 22 screenings, only the 8 annual screenings include both the chest X-ray and the sputum cytology procedures. If any of the tests was positive, the screen was considered positive and a definitive work-up exam, such as a biopsy, was done. The data that we used include the total number of participants in each screening exam, the number of detected and confirmed cancer cases in each screening exam, and the number of interval cases. These data were stratified by age at entry, which ranges from 45 to 88 years in the JHLP. However, we only used the data for ages 45 to 67, because the other age groups have too few participants and may cause large deviations in the estimation.

**Model specification**

The methods introduced by Shen et al. [9] and Wu et al. [7,10] were employed for the analysis of the JHLP data and we follow their notations and conventions.

Suppose two screening procedures, chest X-ray and sputum cytology (hereafter x-ray and cytology, respectively), are applied independently to each individual during an early-detection trial. When lung cancer exists, a screening exam can result in four mutually exclusive cases: Case 1, the cancer is identified by x-ray only; Case 2, it is identified by cytology only; Case 3, it is identified by both x-ray and cytology; and Case 4, it is not identified by either procedure. Let X_{j} and Z be binary random variables. X_{j} denotes the outcome of procedure j (j = 1 for x-ray, j = 2 for cytology), with one for a suspicious (positive) finding and zero for a normal (negative) finding, and Z represents the true disease state of an individual, with one for having the disease and zero for not having the disease. Then

α_{ab}=Pr(X_{1} = a, X_{2} = b|Z=1), a, b = 0,1,

and

β = Pr(max(X_{1}, X_{2}) = 1|Z = 1) = 1 - α_{00},

where β is the overall sensitivity of the screening program, and the individual sensitivities of x-ray and cytology are

β_{1} = Pr (X_{1} = 1|Z = 1) = α_{10}+α_{11}

and β_{2} = Pr (X_{2} = 1|Z = 1) = α_{01} + α_{11}, respectively. The schematic representation of these probabilities can be found in **Figure 1**. Since 0 < α_{ab} < 1 for a, b = 0, 1, it follows that 0 < β_{1}, β_{2} < 1, that β_{1}, β_{2} < β, and that β < β_{1} + β_{2}. Based on [9], we obtain the correlation coefficient r between X_{1} and X_{2}, conditional on Z = 1, by the following equation:

r = (α_{11} - β_{1}β_{2}) / √[β_{1}(1 - β_{1})β_{2}(1 - β_{2})].
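As a numerical illustration of these definitions, the following minimal sketch computes β_{1}, β_{2}, β, and r from the four joint probabilities α_{ab}. The function names are hypothetical, and the correlation is the ordinary Pearson correlation of two binary variables, which is assumed here to be the quantity taken from [9]. The input values are the sensitivity estimates under H_{0} reported in Table 1.

```python
import math

def sensitivities(alpha):
    """alpha[(a, b)] = Pr(X1 = a, X2 = b | Z = 1); the four values sum to 1."""
    beta1 = alpha[(1, 0)] + alpha[(1, 1)]   # x-ray sensitivity
    beta2 = alpha[(0, 1)] + alpha[(1, 1)]   # cytology sensitivity
    beta = 1.0 - alpha[(0, 0)]              # overall (dual) sensitivity
    # Pearson correlation of two Bernoulli variables, conditional on Z = 1
    r = (alpha[(1, 1)] - beta1 * beta2) / math.sqrt(
        beta1 * (1 - beta1) * beta2 * (1 - beta2)
    )
    return beta1, beta2, beta, r

def independent_alpha(beta1, beta2):
    """Joint probabilities when the two tests are independent given Z = 1."""
    return {
        (1, 1): beta1 * beta2,
        (1, 0): beta1 * (1 - beta2),
        (0, 1): (1 - beta1) * beta2,
        (0, 0): (1 - beta1) * (1 - beta2),
    }

# Estimates under H0 from Table 1 (r = 0 by assumption)
alpha0 = independent_alpha(0.7993, 0.2698)
beta1, beta2, beta, r = sensitivities(alpha0)
print(f"beta = {beta:.4f}, r = {r:.6f}")
```

Under independence the overall sensitivity reduces to β = 1 - (1 - β_{1})(1 - β_{2}), which is how the dual-screening sensitivity of about 0.8534 follows from the two individual estimates.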

For a group of people whose first screening exam is taken at age t_{0}, we let t_{0} < t_{1} < … < t_{K-1} < T represent K ordered screening exam times, where T denotes the follow-up time past the last examination. The i^{th} screening interval is (t_{i-1}, t_{i}) for i = 1, 2, …, K. We adopt the following notation: the i^{th} screening exam happens at age t_{i-1}; n_{i,t0} is the total number of individuals examined at t_{i-1}; s_{i,t0} is the number of cases diagnosed at the exam given at t_{i-1}; and r_{i,t0} is the number of interval cases within the interval (t_{i-1}, t_{i}). For the i^{th} exam, the numbers of cases detected by x-ray only, by cytology only, and by both procedures add up to the total number of cases detected at the i^{th} exam, s_{i,t0}.

Let D_{i,t0} be the probability that an individual is definitively diagnosed at the i^{th} scheduled exam, given at t_{i-1}, and let I_{i,t0} be the probability of an interval case occurring in the i^{th} interval (t_{i-1}, t_{i}). These two probabilities are as follows:

where w(t)dt is the probability of a transition from S_{0} to S_{p} during (t, t + dt); q(x) is the probability density function of the sojourn time in S_{p}; and Q(z) = ∫_{z}^{∞} q(x)dx is the survivor function of the sojourn time in the preclinical state S_{p}. The transition probability density function is the probability density function of a lognormal distribution with parameters μ and σ^{2}, multiplied by 20%, i.e., a sub-density function:

w(t) = 0.2 / (√(2π) σ t) × exp{-(log t - μ)^{2} / (2σ^{2})}, t > 0.

Note that 20% was selected based on the previous analysis of lung cancer screening in [7]. We employed the log-logistic distribution to model the sojourn time in the preclinical state, following [10]:

q(x) = κ x^{κ-1} ρ^{κ} / [1 + (xρ)^{κ}]^{2}, Q(x) = 1 / [1 + (xρ)^{κ}],

where x is the sojourn time, and κ and ρ are positive parameters to represent the scale and location in the log-logistic family.
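The two distributions can be checked numerically. The sketch below assumes the lognormal sub-density w(t) = 0.2 × lognormal(μ, σ^{2}) and the log-logistic parametrization Q(x) = 1 / [1 + (xρ)^{κ}], and plugs in the estimates under H_{0} from Table 1. The variable names are illustrative, and the closed-form mode, median, and mean are standard properties of these families rather than formulas quoted from the paper.

```python
import math

MU, SIGMA2 = 3.7967, 0.0417    # lognormal parameters (Table 1, H0)
KAPPA, RHO = 5.2930, 0.9412    # log-logistic parameters (Table 1, H0)

def w(t, mu=MU, sigma2=SIGMA2, frac=0.2):
    """Transition sub-density: 20% of a lognormal(mu, sigma2) density."""
    sigma = math.sqrt(sigma2)
    z = (math.log(t) - mu) ** 2 / (2 * sigma2)
    return frac / (math.sqrt(2 * math.pi) * sigma * t) * math.exp(-z)

def q(x, kappa=KAPPA, rho=RHO):
    """Log-logistic density of the sojourn time."""
    return kappa * x ** (kappa - 1) * rho ** kappa / (1 + (x * rho) ** kappa) ** 2

def Q(x, kappa=KAPPA, rho=RHO):
    """Survivor function of the sojourn time."""
    return 1.0 / (1.0 + (x * rho) ** kappa)

# Closed-form summaries of the two families
w_mode = math.exp(MU - SIGMA2)                       # mode of the lognormal
sojourn_median = 1.0 / RHO                           # Q(median) = 1/2
sojourn_mode = ((KAPPA - 1) / (KAPPA + 1)) ** (1 / KAPPA) / RHO
b = math.pi / KAPPA
sojourn_mean = b / math.sin(b) / RHO                 # valid for kappa > 1

print(f"transition mode: age {w_mode:.1f}, w(mode) = {w(w_mode):.3e}")
print(f"sojourn mode {sojourn_mode:.2f}, median {sojourn_median:.2f}, "
      f"mean {sojourn_mean:.2f} years")
```

With these inputs the summaries agree with the values quoted in the results: a transition mode near age 42.7 and a sojourn time with mode near 0.99, median near 1.06, and mean near 1.13 years.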

The JHLP data have annual x-ray and 4-monthly cytology exams, so the two procedures have different screening intervals. As a result, there were two screens each year with cytology only. In other words, each year had one dual-screening exam (x-ray and cytology) and two single-screening exams (cytology only). To account for the two cytology screenings each year that lack an x-ray, we modeled the following likelihood:

(6)

where K_{d} denotes the number of dual-screenings and K_{s} the number of single-screenings, with K = K_{d} + K_{s}. Note that for the single-screenings the probabilities of being diagnosed and of being an interval case depend on the sensitivity of cytology only, i.e., β_{2} is used instead of β, since the x-ray information is missing.
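To make the role of exam-specific sensitivity concrete, the sketch below evaluates the probability of screen detection at the k-th exam under the usual progressive-disease argument: a case found at exam k entered S_{p} before that exam, was still preclinical at the exam, and was missed at every intervening exam. This is an illustration under stated assumptions, not the authors' likelihood: the schedule, the helper names, and the midpoint integration rule are all hypothetical, and the parameter values are the H_{0} estimates from Table 1.

```python
import math

MU, SIGMA2 = 3.7967, 0.0417          # lognormal parameters (Table 1, H0)
KAPPA, RHO = 5.2930, 0.9412          # log-logistic parameters (Table 1, H0)
BETA1, BETA2 = 0.7993, 0.2698        # x-ray and cytology sensitivities
BETA = 1 - (1 - BETA1) * (1 - BETA2) # dual sensitivity under independence

def w(t):
    """Transition sub-density (20% of a lognormal density)."""
    s = math.sqrt(SIGMA2)
    z = (math.log(t) - MU) ** 2 / (2 * SIGMA2)
    return 0.2 / (math.sqrt(2 * math.pi) * s * t) * math.exp(-z)

def Q(z):
    """Log-logistic survivor function of the sojourn time."""
    return 1.0 / (1.0 + (z * RHO) ** KAPPA)

def midpoint(f, lo, hi, n=2000):
    """Midpoint rule; it never evaluates f at the endpoints (so w(0) is avoided)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (j + 0.5) * h) for j in range(n))

def detection_prob(k, times, sens):
    """Probability of screen detection at the k-th exam (k = 1, 2, ...).

    times[i] is the age at exam i + 1 and sens[i] is the sensitivity used there.
    """
    t_exam = times[k - 1]
    total = 0.0
    for i in range(k):
        lo = 0.0 if i == 0 else times[i - 1]
        hi = times[i]
        # probability of being missed at every exam between entry and exam k
        missed = math.prod(1.0 - sens[j] for j in range(i, k - 1))
        total += missed * midpoint(lambda x: w(x) * Q(t_exam - x), lo, hi)
    return sens[k - 1] * total

# One year of the dual-screen arm starting at age 50 (hypothetical schedule):
# dual exam, two cytology-only exams, then the next dual exam.
times = [50.0, 50.0 + 1 / 3, 50.0 + 2 / 3, 51.0]
sens = [BETA, BETA2, BETA2, BETA]
probs = [detection_prob(k, times, sens) for k in range(1, 5)]
print([f"{p:.5f}" for p in probs])
```

Replacing β by β_{2} at a cytology-only exam lowers the detection probability at that exam and leaves more undetected preclinical cases for the next one, which is the effect the modified likelihood has to capture.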

To test the dependence of the two screening procedures, a likelihood ratio test is carried out between two log-likelihood functions: one under the null hypothesis (H_{0}), in which the correlation coefficient r is equal to zero, and the other under the alternative hypothesis (H_{1}), in which r is not equal to zero. It is well known that the log-likelihood ratio test statistic under H_{0} is

where θ = (β_{1}, β_{2}, r, μ, σ^{2}, κ, ρ) and θ* = (β_{1}, β_{2}, r, μ, σ^{2}, κ, ρ), and this test statistic is approximately χ^{2} distributed with one degree of freedom.
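As a worked check, the following sketch computes the likelihood ratio statistic as twice the difference between the two maximized log-likelihoods and converts it to a p-value with the χ^{2} survival function for one degree of freedom. The function name is hypothetical, and the inputs are the log-likelihood values reported in Table 1.

```python
import math

def lrt_pvalue_1df(loglik_null, loglik_alt):
    """Likelihood ratio test with one degree of freedom.

    For a chi-square variable with 1 df, P(X > s) = erfc(sqrt(s / 2)).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Log-likelihoods under H0 (r = 0) and H1 (r free), taken from Table 1
stat, p_value = lrt_pvalue_1df(-305.7977, -305.6527)
print(f"LRT statistic = {stat:.4f}, p-value = {p_value:.4f}")
```

The statistic is about 0.29, which corresponds to a p-value close to 0.59, in line with the value reported in Table 1.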

**Lead time estimation**

The lead time is estimated using the method developed by Wu et al. [11]. In this study, the classical approach is adopted for the lead time estimation, although Wu et al. [11] proposed a Bayesian approach based on the posterior probability calculation. One major characteristic of the lead time (L) is that its distribution is a mixture of a point mass at zero and a piecewise continuous density. That is, it is composed of the conditional probability P(L = 0|D = 1) and the conditional probability density function f_{L}(z|D = 1), for any 0 < z ≤ T - t_{0}, where D is a binary variable, with D = 1 indicating development of clinical disease and D = 0 indicating the absence of clinical disease before death; T is the human life span; and t_{0} is the individual's age at the initial screening exam. In particular, P(L = 0|D = 1) can be interpreted as the probability of “no-early-detection” or “no benefit” of a screening procedure, since it is the probability that the lead time equals zero. For a more detailed overview of lead time and its estimation, please refer to [11].

The global optima of the parameter θ under H_{0} and H_{1}, obtained using the PSO described in the Supplementary data, are shown in **Table 1**. The null model H_{0} and the alternative model H_{1} have very similar estimates for all corresponding parameters. To test whether the correlation coefficient r is significantly different from zero, the likelihood ratio test described in the previous section was performed. The resulting p-value is 0.5903, so there is no strong evidence that the correlation coefficient differs from zero. In other words, the two screening modalities can be treated as independent. In particular, the sensitivities are 79.93% for x-ray and 26.98% for cytology, and the dual screening has a sensitivity of 85.34%, suggesting that adding cytology to x-ray improves the overall sensitivity by about 5 percentage points. Note that the standard error of the estimate of κ is larger than those of the other estimates. Since the null hypothesis is not rejected, we hereafter use the parameter estimates under H_{0} for further analysis.

| Model | β_{1} | β_{2} | r | μ | σ^{2} | κ | ρ | ll* | p-value^{#} |
|---|---|---|---|---|---|---|---|---|---|
| H_{0} | 0.7993 (0.72×10^{-7}) | 0.2698 (0.87×10^{-7}) | - | 3.7967 (0.33×10^{-6}) | 0.0417 (0.38×10^{-5}) | 5.2930 (0.0038) | 0.9412 (0.0003) | -305.7977 | 0.5903 |
| H_{1} | 0.7993 (0.0009) | 0.2739 (0.0006) | -0.0039 (0.0048) | 3.7966 (0.0265) | 0.0414 (0.0035) | 5.2938 (0.0231) | 0.9305 (0.0266) | -305.6527 | |

*, Loglikelihood; #, The p-value of the loglikelihood ratio test

**Table 1:** Parameter estimates and their standard errors for two different models. The numbers in parentheses are the standard errors of the estimates. The null hypothesis (H_{0}) is that the two sensitivities are independent (r = 0) and the alternative hypothesis (H_{1}) is that the two sensitivities are not independent (r ≠ 0). The estimated sensitivity for the dual screening (β) is 0.8534 and 0.8550 for H_{0} and H_{1}, respectively.

The density curves of the transition probability and the sojourn time are displayed in **Figure 2**. The transition probability for males aged 30 to 65 ranges from 1.087×10^{-3} to 8.952×10^{-3}. That is, depending on age, about 1.087 to 8.952 of every 1000 people transition from the disease-free state to the preclinical state per year. The transition probability increases to its mode at age 42.7 and then decreases monotonically. Likewise, the sojourn time density is not monotonic, with its mode at 0.99 years and its median at 1.06 years. The mean sojourn time is 1.13 years, with a standard deviation of 0.42 years.

The lead time estimates for the x-ray, cytology, and dual procedures are reported for different screening intervals (Δ = 4, 6, 9, 12, 18, 24 months). The mean lead time increases as the screening interval decreases. If someone who will eventually develop lung cancer takes more frequent screening exams, he may have a longer lead time and hence may be treated at an earlier stage, with a potentially better prognosis. Accordingly, the probability of “no benefit” is largest at Δ = 24 months. For example, the probability of “no benefit”, P_{1}, for the dual procedure is 50.47% at Δ = 24 months and 1.36% at Δ = 4 months, a difference of 49.11 percentage points. However, because of the low sensitivity (26.98%) of the cytology procedure, its probability of “no benefit” is 38.38% even at Δ = 4 months, which is larger than for the other procedures. The density curves of the lead time for the different screening intervals are shown in **Figure 3**. The modes of all the density curves shift to the left as the screening interval increases, meaning that the mean lead time decreases.

A likelihood-based approach was applied to the JHLP data for estimating the sensitivity, the transition probability density, the sojourn time, and the lead time. In addition, the dependency of the two screening procedures was investigated using the likelihood ratio test followed by examining the efficacy of the dual screening.

In the JHLP data, the x-ray procedure is an annual exam, while the cytology procedure is a four-monthly exam. As a result, the x-ray has two fewer screening exams than the cytology each year, and the data can be regarded as having missing information on x-ray. Therefore, we modified the likelihood function to accommodate this missing data, so that the probabilities of being diagnosed and of being an interval case differ from those in [9]. However, this missing information makes it difficult for the parameter estimation to converge, because the optimization may terminate early or get stuck at a saddle point. For this reason, a derivative-free global optimization algorithm, PSO, was applied to the JHLP data so that the parameters could be estimated.
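For readers unfamiliar with this class of optimizers, the sketch below shows a generic, minimal particle swarm optimizer, which is what PSO is taken to mean here. It is not the authors' implementation (theirs is described in the Supplementary data): the inertia and acceleration constants, the swarm size, and the toy objective are all assumptions chosen for illustration. In the screening model the objective would be the negative log-likelihood.

```python
import random

def pso_minimize(f, bounds, n_particles=30, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # pull toward the particle's own best and the swarm's best
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val

def toy_objective(x):
    """Stand-in for a negative log-likelihood, with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

best_x, best_f = pso_minimize(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

Because the update uses only function values, no gradient of the likelihood is needed, which is the property that makes this kind of method attractive when derivative-based optimizers stall.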

The efficacy of the dual screening for lung cancer detection is examined through the sensitivity. The sensitivity improves from 79.93% with x-ray only to 85.34% when the x-ray exams are combined with cytology. The independence of the two screening procedures was also tested, and no significant dependency between them was found. In other words, the cytology procedure can detect lung cancer cases that the x-ray procedure misses, and vice versa. However, the benefit of the dual screening is likely to be small, since the improvement in sensitivity is only about 5 percentage points, which is consistent with the conclusion of Doria-Rose et al. [6].

On the other hand, the overall sensitivity in the JHLP study is slightly lower than in the Mayo Lung Project (MLP) study, in which the overall sensitivity of the dual screening was 89% [7]. In fact, the mode of the transition probability in the MLP study was at age 68, which is later than the age 42.7 found in the JHLP study. This means that the transition to the preclinical state occurs earlier in the JHLP study than in the MLP study. This might lower the overall sensitivity, since the youngest age at entry in the JHLP data is 45. Another reason might be that the MLP study used four-monthly exams for both x-ray and cytology.

Interestingly, the probability of “no benefit” in the JHLP study is much smaller than in the MLP study, even though the overall sensitivity of the MLP study is larger. For example, P_{1} at Δ = 12 months is 32.74% for the MLP study, whereas it is 19.01% for the dual screening in the JHLP study. This might be because the sojourn time distribution of the JHLP study is more right-skewed than that of the MLP study. In fact, the mean sojourn time of the JHLP data is much longer than that of Chien et al. [5] and shorter than that of the MLP study, causing the lead time to be shorter.

The cytology procedure is unlikely to be effective if the screening interval is longer than 6 months, since the probability of “no benefit” then exceeds 50% and reaches 71.17% at Δ = 12 months. This means that the cytology procedure should be offered at intervals shorter than 6 months to provide a benefit, which is compatible with the suggestion of Melamed et al. [4].

Although the x-ray is offered less frequently than the cytology procedure, the JHLP study shows that the x-ray and cytology procedures can be considered statistically independent and so the dual screening improves the overall sensitivity.

We want to thank Dr. Philip Prorok for allowing us to have access to the Johns Hopkins Lung Project (JHLP) data.

We declare that we have no conflict of interest.

1. Altekruse SF, Kosary CL, Krapcho M, Neyman N, Aminou R, et al. (2010) SEER Cancer Statistics Review, 1975-2007. National Cancer Institute, Bethesda, MD, USA.
2. Alberg AJ, Ford JG, Samet JM, American College of Chest Physicians (2007) Epidemiology of lung cancer: ACCP evidence-based clinical practice guidelines (2nd edn). Chest 132: 29S-55S.
3. The National Lung Screening Trial Research Team (2011) Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 365: 395-409.
4. Melamed MR, Flehinger BJ, Zaman MB, Heelan RT, Perchick WA, et al. (1984) Screening for early lung cancer. Results of the Memorial Sloan-Kettering study in New York. Chest 86: 44-53.
5. Chien CR, Lai MS, Chen TH (2008) Estimation of mean sojourn time for lung cancer by chest X-ray screening with a Bayesian approach. Lung Cancer 62: 215-220.
6. Doria-Rose VP, Marcus PM, Szabo E, Tockman MS, Melamed MR, et al. (2009) Randomized controlled trials of the efficacy of lung cancer screening by sputum cytology revisited: a combined mortality analysis from the Johns Hopkins Lung Project and the Memorial Sloan-Kettering Lung Study. Cancer 115: 5007-5017.
7. Wu D, Erwin D, Rosner GL (2011) Sojourn time and lead time projection in lung cancer screening. Lung Cancer 72: 322-326.
8. Berlin NI, Buncher CR, Fontana RS, Frost JK, Melamed MR (1984) The National Cancer Institute Cooperative Early Lung Cancer Detection Program. Results of the initial screen (prevalence). Early lung cancer detection: introduction. Am Rev Respir Dis 130: 545-549.
9. Shen Y, Wu D, Zelen M (2001) Testing the independence of two diagnostic tests. Biometrics 57: 1009-1017.
10. Wu D, Rosner GL, Broemeling LD (2005) MLE and Bayesian inference of age-dependent sensitivity and transition probability in periodic screening. Biometrics 61: 1056-1063.
11. Wu D, Rosner GL, Broemeling LD (2007) Bayesian inference for the lead time in periodic cancer screening. Biometrics 63: 873-880.
