ISSN: 2155-6180
Journal of Biometrics & Biostatistics


Efficacy of Dual Lung Cancer Screening by Chest X-Ray and Sputum Cytology Using Johns Hopkins Lung Project Data

Seongho Kim1, Diane Erwin2 and Dongfeng Wu1*

1Dept of Bioinformatics and Biostatistics, School of Public Health and Information Sciences, University of Louisville, Louisville, KY 40202, USA

2Information Management Services, Inc., Rockville, MD 20852, USA

*Corresponding Author:
Dr. Dongfeng Wu
Dept of Bioinformatics and Biostatistics
School of Public Health and Information Sciences
University of Louisville, Louisville, KY 40202, USA
E-mail: [email protected]

Received Date: March 14, 2012; Accepted Date: April 21, 2012; Published Date: April 23, 2012

Citation: Kim S, Erwin D, Wu D (2012) Efficacy of Dual Lung Cancer Screening by Chest X-Ray and Sputum Cytology Using Johns Hopkins Lung Project Data. J Biom Biostat 3:139. doi:10.4172/2155-6180.1000139

Copyright: © 2012 Kim S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Keywords

Cytology; X-ray; Sojourn time; Sensitivity; Transition probability; Lung cancer screening


Introduction

Lung cancer is the leading cause of cancer death for both men and women, and more people die of lung cancer than of breast, colon, and prostate cancers combined. The lungs are located in the chest, and most lung cancers begin in the cells that line the bronchi. There are two major types of lung cancer, which require different treatments: small cell lung cancer, which makes up about 20% of all lung cancer cases, and non-small cell lung cancer, which is the most common type. The age-specific lung cancer incidence rate rises with advancing age and peaks between ages 65 and 74 [1].

The most common cause of lung cancer death is cigarette smoking, although lung cancer also occurs in people who have never smoked. Other causes include high levels of air pollution, radon gas, family history, and radiation therapy to the lungs [2]. Lung cancer cells may take years to develop, and people often have no symptoms until the disease has progressed to a late stage. As a result, fewer than 15% of lung cancers are discovered in early stages, when the possibility of curative treatment is greatest. However, whether early detection leads to a significant reduction in lung cancer mortality remains debated. The National Lung Screening Trial (NLST) recently showed a 20% reduction in lung cancer mortality among those who received spiral Computed Tomography (CT) compared with those who received chest X-rays [3]. Nonetheless, there is so far no recommendation for regular lung cancer screening.

In several randomized controlled trials of lung cancer screening, chest X-ray and sputum cytology were used, alone or in combination. For example, Melamed et al. [4] examined annual dual screening with X-ray and cytology in the Memorial Sloan-Kettering study and concluded that cytology is not necessary as an annual screening. Chien et al. [5] studied the mean sojourn time for lung cancer under chest X-ray screening. Doria-Rose et al. [6] evaluated the benefit of each screening in terms of long-term mortality reduction and reported a modest benefit from sputum cytology screening. Wu et al. [7] investigated screening sensitivity, transition probability, and sojourn time for 4-monthly chest X-ray and sputum cytology screenings using the Mayo Lung Project data. However, none of them tested the dependency of the two diagnostic procedures, chest X-ray and sputum cytology, conditional on lung cancer status in terms of screening sensitivity. Therefore, in this work we estimate the correlation coefficient of the two tests in order to test the independence of the two screening procedures using the Johns Hopkins Lung Project (JHLP) data. In addition, we investigate the overall screening sensitivity, transition probability, sojourn time, and lead time of the two screening procedures.

We assume that lung cancer develops by progressing through three states, denoted by S0 → Sp → Sc, corresponding, respectively, to the disease-free state; the preclinical cancer state, in which an asymptomatic individual unknowingly has lung cancer that a screening exam can detect; and the clinical state, in which the lung cancer manifests itself in clinical symptoms. If a person enters the preclinical state (Sp) at age t1 and clinical symptoms present later at age t2, then (t2 - t1) is the sojourn time in the preclinical state. If the person is offered a screening exam at time t within the interval (t1, t2) and cancer is diagnosed, then the length of time (t2 - t) is the lead time.
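The two definitions above can be made concrete with a minimal Python sketch. The function names and the ages used in the example are illustrative assumptions, not values from the JHLP data.

```python
def sojourn_time(t1: float, t2: float) -> float:
    """Time spent in the preclinical state Sp: onset at age t1, symptoms at age t2."""
    if t2 < t1:
        raise ValueError("clinical onset t2 must not precede preclinical onset t1")
    return t2 - t1


def lead_time(t1: float, t2: float, t: float, detected: bool) -> float:
    """Lead time from a screen at age t.

    The lead time is t2 - t only when the screen falls inside (t1, t2)
    and the cancer is actually diagnosed; otherwise it is zero.
    """
    if detected and t1 < t < t2:
        return t2 - t
    return 0.0


# Hypothetical person: preclinical onset at 60.0, symptoms at 61.5, screen at 60.5.
print(sojourn_time(60.0, 61.5))           # 1.5
print(lead_time(60.0, 61.5, 60.5, True))  # 1.0
print(lead_time(60.0, 61.5, 60.5, False)) # 0.0 (screen missed the cancer)
```

A missed screen, or a screen outside (t1, t2), contributes a lead time of exactly zero, which is why the lead time distribution discussed later has a point mass at zero.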

The main objective in lung cancer screening is to detect lung cancer as early as possible, i.e., in the preclinical state where one has tumor but there are no symptoms. The screening model has three important parameters, sensitivity of the diagnosis modality, the sojourn time distribution, and the transition probability from disease-free to preclinical state. These three parameters can be called the building blocks for cancer screening modeling, as all other parameters of interest, such as the lead time, and the probability of over-diagnosis can be expressed as a function of these three key parameters.

Materials and Methods

Johns Hopkins Lung Project Data

The design of the Johns Hopkins Lung Project (JHLP) has been described previously [8]. The JHLP enrolled 10,386 men in the Baltimore metropolitan area between 1973 and 1978 who were at least 45 years old, who smoked at least one pack of cigarettes per day (or had smoked this much within one year of enrollment), and who had no prior history of respiratory tract cancer. All participants were randomized to either a chest X-ray only group or a dual-screen (chest X-ray and sputum cytology) group, with 5,160 assigned to the chest X-ray only arm and 5,226 to the dual-screen arm. Participants in the chest X-ray group received a chest X-ray screening test annually for 8 consecutive years. Participants in the dual-screen group received a chest X-ray annually and sputum cytology every 4 months for 8 consecutive years, for a total of 22 screening time points. Of the 22 screenings, only the 8 annual screenings include both chest X-ray and sputum cytology. If any of the tests was positive, the screen was considered positive and a definitive work-up exam, such as a biopsy, was done. The data that we used include the total number of participants in each screening exam, the number of detected and confirmed cancer cases in each screening exam, and the number of interval cases. These data were stratified by age at entry, which ranges from 45 to 88 years in the JHLP. However, we used only the data for entry ages 45 to 67, because the other age groups have too few participants and may cause large deviations in the estimation.

Model specification

The methods introduced by Shen et al. [9] and Wu et al. [7,10] were employed for the analysis of the JHLP data and we follow their notations and conventions.

Suppose two screening procedures, chest X-ray and sputum cytology (hereafter called x-ray and cytology, respectively), are applied independently to each individual during an early-detection trial. When lung cancer exists, a screening exam can result in four mutually exclusive cases: in Case 1 the cancer is identified by x-ray only; in Case 2 by cytology only; in Case 3 by both x-ray and cytology; and in Case 4 by neither procedure. Let Xj and Z be binary random variables. Xj denotes the outcome of procedure j (j = 1 for x-ray, j = 2 for cytology), with one for a suspicious (positive) finding and zero for a normal (negative) finding, and Z represents the true disease state of an individual, with one for having the disease and zero for not having it. Then

$$\alpha_{ab} = \Pr(X_1 = a, X_2 = b \mid Z = 1), \quad a, b = 0, 1,$$

$$\beta = \Pr(\max(X_1, X_2) = 1 \mid Z = 1) = 1 - \alpha_{00},$$

where β is the overall sensitivity of the screening program, and the individual sensitivities of x-ray and cytology are

$$\beta_1 = \Pr(X_1 = 1 \mid Z = 1) = \alpha_{10} + \alpha_{11}$$

and

$$\beta_2 = \Pr(X_2 = 1 \mid Z = 1) = \alpha_{01} + \alpha_{11},$$

respectively. The schematic representation of these probabilities can be found in Figure 1. Since 0 < αab < 1 for a, b = 0, 1, it follows that 0 < β1, β2 < 1, that β1, β2 < β, and that β < β1 + β2. Based on [9], we obtain the correlation coefficient r between X1 and X2 conditional on Z = 1 by the following equation:

$$r = \frac{\alpha_{11} - \beta_1 \beta_2}{\sqrt{\beta_1 (1 - \beta_1)\, \beta_2 (1 - \beta_2)}}.$$

Figure 1: The Venn diagram between the sensitivities of x-ray and cytology procedures. β1, β2, and β are the sensitivities of the chest X-ray, sputum cytology, and dual procedures, respectively.
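The relations among the joint probabilities, the sensitivities, and the conditional correlation can be checked numerically. The sketch below assumes that r is the ordinary Pearson correlation of two binary variables; the helper names are hypothetical, and the only numbers taken from the article are the sensitivities reported for the independence model (79.93% and 26.98%).

```python
import math


def screening_summary(a10: float, a01: float, a11: float):
    """Return (beta1, beta2, beta, r) from the joint detection probabilities.

    a10, a01, a11 are Pr(X1=1,X2=0), Pr(X1=0,X2=1), Pr(X1=1,X2=1) given Z=1.
    Requires 0 < beta1, beta2 < 1 so that the correlation is defined.
    """
    a00 = 1.0 - a10 - a01 - a11
    if min(a10, a01, a11, a00) < 0.0:
        raise ValueError("joint probabilities must be non-negative and sum to 1")
    beta1 = a10 + a11          # x-ray sensitivity
    beta2 = a01 + a11          # cytology sensitivity
    beta = 1.0 - a00           # dual-screen sensitivity
    r = (a11 - beta1 * beta2) / math.sqrt(
        beta1 * (1.0 - beta1) * beta2 * (1.0 - beta2)
    )
    return beta1, beta2, beta, r


def joint_from_marginals(beta1: float, beta2: float, r: float):
    """Invert the correlation formula: return (a10, a01, a11)."""
    a11 = beta1 * beta2 + r * math.sqrt(
        beta1 * (1.0 - beta1) * beta2 * (1.0 - beta2)
    )
    return beta1 - a11, beta2 - a11, a11


# Sensitivities reported in the article under independence (r = 0).
a10, a01, a11 = joint_from_marginals(0.7993, 0.2698, 0.0)
beta1, beta2, beta, r = screening_summary(a10, a01, a11)
print(round(beta, 4))  # 0.8534, the dual-screen sensitivity quoted for H0
```

Under independence the formula reduces to β = β1 + β2 - β1β2, which is why the dual-screen sensitivity follows directly from the two individual sensitivities.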

For a group of people whose first screening exam is taken at age t0, we let t0 < t1 < … < tK-1 < T represent K ordered screening exam times and let T denote the follow-up time past the last examination. The ith screening interval is (ti-1, ti) for i = 1, 2, …, K. We adopt the following notation: the ith annual screening exam happens at age ti-1; ni,t0 is the total number of individuals examined at ti-1, si,t0 is the number of cases diagnosed at the exam given at ti-1, and ri,t0 is the number of interval cases within the interval (ti-1, ti). For the ith exam, the diagnosed cases are further split into the numbers detected by x-ray, by cytology, and by both procedures, and the total number of cases detected at the ith exam, si,t0, is the sum of these three counts.

Let Di,t0 be the probability that an individual is definitively diagnosed at the ith scheduled exam given at ti-1, and let Ii,t0 be the probability of an interval case occurring in the ith interval (ti-1, ti). These two probabilities are as follows:



where w(t)dt is the probability of a transition from S0 to Sp during (t, t + dt); q(t) is the probability density function of the sojourn time in Sp; and Q(t), the integral of q(x) over x from t to infinity, is the survivor function of the sojourn time in the preclinical state Sp. The transition probability density function is the probability density function of a lognormal distribution with parameters μ and σ2 multiplied by 20%, i.e., a sub-density function:

$$w(t) = \frac{0.2}{\sqrt{2\pi}\,\sigma t} \exp\left\{ -\frac{(\log t - \mu)^2}{2\sigma^2} \right\}.$$

Note that 20% was selected based on the previous analysis of lung cancer screening in [7]. We employed the log-logistic distribution to model the sojourn time in the preclinical state according to [10]:


where x is the sojourn time, and κ and ρ are positive parameters to represent the scale and location in the log-logistic family.
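The two distributions can be sketched in Python as follows. The lognormal sub-density follows directly from the description above. The log-logistic parameterization, with survivor function Q(x) = 1 / (1 + (xρ)^κ), is an assumption based on the form commonly used in the cited screening literature and is not recovered from this article. The parameter values in the example are hypothetical and are not the fitted JHLP estimates.

```python
import math

TRANSITION_MASS = 0.20  # the 20% multiplier stated in the text


def transition_density(t: float, mu: float, sigma2: float) -> float:
    """Sub-density w(t): 0.2 times a lognormal(mu, sigma2) density at age t."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) ** 2 / (2.0 * sigma2)
    return TRANSITION_MASS * math.exp(-z) / (t * math.sqrt(2.0 * math.pi * sigma2))


def transition_mode(mu: float, sigma2: float) -> float:
    """Age at which the lognormal (sub-)density peaks."""
    return math.exp(mu - sigma2)


def sojourn_survivor(x: float, kappa: float, rho: float) -> float:
    """Assumed log-logistic survivor function Q(x) = 1 / (1 + (x*rho)**kappa)."""
    return 1.0 / (1.0 + (x * rho) ** kappa)


def sojourn_density(x: float, kappa: float, rho: float) -> float:
    """Density q(x) = -dQ/dx for the assumed parameterization."""
    return kappa * x ** (kappa - 1.0) * rho ** kappa / (1.0 + (x * rho) ** kappa) ** 2


def sojourn_median(rho: float) -> float:
    return 1.0 / rho


def sojourn_mode(kappa: float, rho: float) -> float:
    """Mode of the log-logistic density; defined for kappa > 1."""
    return ((kappa - 1.0) / (kappa + 1.0)) ** (1.0 / kappa) / rho


def sojourn_mean(kappa: float, rho: float) -> float:
    """Mean of the log-logistic distribution; finite for kappa > 1."""
    return (math.pi / kappa) / math.sin(math.pi / kappa) / rho


# Hypothetical parameter values, for illustration only.
mu, sigma2, kappa, rho = 4.0, 0.05, 5.0, 1.0
print(transition_mode(mu, sigma2))
print(sojourn_mode(kappa, rho), sojourn_median(rho), sojourn_mean(kappa, rho))
```

With κ > 1 the log-logistic density is unimodal with mode below the median and mean above it, the same ordering as the mode, median, and mean of the sojourn time reported in the results.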

In the JHLP data, x-ray was offered annually and cytology every 4 months, so the two procedures have different screening intervals. As a result, two screens each year used cytology only. In other words, each year had one dual-screening exam (x-ray and cytology) and two single-screening exams (cytology only). To account for the two cytology-only screens each year, at which x-ray information is missing, we modeled the following likelihood:


Equation (6)

where Kd denotes the number of dual-screenings and Ks the number of single-screenings, with K = Kd + Ks. Note that the probabilities of screen diagnosis and of an interval case for the single-screenings depend on the sensitivity of cytology only, i.e., β2 is used instead of β, since the x-ray information is missing.

Likelihood Ratio Test (LRT)

To test the dependence of the two screening procedures, a likelihood ratio test is performed between two log-likelihood functions. One is under the null hypothesis (H0) that the correlation coefficient r equals zero, and the other is under the alternative hypothesis (H1) that r is not equal to zero. The log-likelihood ratio test statistic under H0 is


where θ = (β1, β2, r, μ, σ2, κ, ρ) and θ* = (β1, β2, r, μ, σ2, κ, ρ), and this test statistic is approximately χ2 distributed with one degree of freedom.
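As an illustration of the test, the sketch below computes a likelihood ratio statistic from two maximized log-likelihoods and its p-value under a chi-square distribution with one degree of freedom. It assumes the usual form of the statistic, minus twice the difference between the null and alternative log-likelihoods; the function names and the log-likelihood values in the example are hypothetical.

```python
import math


def lrt_statistic(loglik_null: float, loglik_alt: float) -> float:
    """-2 * (l(H0) - l(H1)); clipped at zero to absorb optimizer noise."""
    return max(0.0, -2.0 * (loglik_null - loglik_alt))


def chi2_df1_pvalue(stat: float) -> float:
    """Upper tail of a chi-square with 1 df: P(Z**2 > stat) = erfc(sqrt(stat / 2))."""
    if stat < 0.0:
        raise ValueError("test statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods (not the JHLP values).
stat = lrt_statistic(-100.0, -99.0)
print(stat)                             # 2.0
print(round(chi2_df1_pvalue(stat), 4))  # 0.1573
```

Because the null model is nested in the alternative, the maximized log-likelihood under H1 cannot be smaller than under H0 in exact arithmetic; the clipping only guards against small numerical differences from a stochastic optimizer.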

Lead time estimation

The lead time estimation is performed using the method developed by Wu et al. [11]. Although Wu et al. [11] proposed a Bayesian approach based on posterior probability calculation, the classical approach is used for the lead time estimation in this study. One major characteristic of the lead time (L) is that its distribution is a mixture of a point mass at zero and a piecewise continuous density. That is, it is composed of the conditional probability P(L = 0 | D = 1) and the conditional probability density function fL(z | D = 1) for any 0 < z ≤ T - t0, where D is a binary variable with D = 1 indicating development of clinical disease and D = 0 indicating the absence of clinical disease before death; T is the human life span; and t0 is the individual’s age at the initial screening exam. In particular, P(L = 0 | D = 1) can be interpreted as the probability of “no early detection” or “no benefit” of a screening procedure, since it is the probability that the lead time equals zero. For a more detailed overview of lead time and its estimation, please refer to [11].
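The mixture structure of the lead time can be illustrated with a small Monte Carlo simulation. This is not the estimation method of [11]; it is a simplified sketch under loudly stated assumptions: every simulated person develops clinical disease, preclinical onset is uniform within a screening cycle (the age-dependent transition density is ignored), sensitivity is constant across screens, and the sojourn time follows the assumed log-logistic form Q(x) = 1 / (1 + (xρ)^κ). All names and parameter values are hypothetical.

```python
import random


def sample_sojourn(rng: random.Random, kappa: float, rho: float) -> float:
    """Inverse-transform draw from the assumed log-logistic sojourn distribution."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return ((1.0 - u) / u) ** (1.0 / kappa) / rho


def simulate_lead_times(n_cases, interval, sensitivity, kappa, rho, seed=0):
    """Simulate lead times (in years) for n_cases people under periodic screening."""
    rng = random.Random(seed)
    lead_times = []
    for _ in range(n_cases):
        onset = rng.uniform(0.0, interval)  # onset measured from the previous screen
        clinical = onset + sample_sojourn(rng, kappa, rho)
        lead = 0.0
        screen = interval  # first screen after onset
        while screen < clinical:
            if rng.random() < sensitivity:  # screen detects the preclinical cancer
                lead = clinical - screen
                break
            screen += interval
        lead_times.append(lead)
    return lead_times


def no_benefit_probability(lead_times) -> float:
    """Share of cases with zero lead time, i.e. an estimate of P(L = 0 | D = 1)."""
    return sum(1 for z in lead_times if z == 0.0) / len(lead_times)


# Hypothetical settings: sensitivity 0.85, sojourn parameters kappa = 5, rho = 1.
for delta in (0.33, 1.0, 2.0):
    sims = simulate_lead_times(5000, delta, 0.85, 5.0, 1.0, seed=1)
    print(delta, no_benefit_probability(sims))
```

The simulated lead times are exactly zero for cases that surface clinically before any screen detects them and strictly positive otherwise, which reproduces the point mass at zero plus a continuous part described above.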


Results

The global optima of the parameter θ under H0 and H1, obtained using the particle swarm optimization (PSO) described in the Supplementary data, are shown in Table 1. The null model H0 and the alternative model H1 have very similar estimates for all corresponding parameters. To test whether the correlation coefficient r is significantly different from zero, we performed the LRT described in the previous section. The resulting p-value is 0.5903, meaning that there is no strong evidence that the correlation coefficient differs from zero. In other words, the two screening modalities can be treated as independent in terms of sensitivity. In particular, the sensitivities are 79.93% for x-ray and 26.98% for cytology, and the dual screening has a sensitivity of 85.34%, suggesting that adding cytology improves the overall sensitivity over x-ray alone by about 5 percentage points. Note that the standard error of the estimate of κ is relatively larger than those of the other estimates. Since the null hypothesis is not rejected, we hereafter use the parameter estimates under H0 for further analysis.

Model β1 β2 r μ σ2 κ ρ ll* p-value#
H0 0.7993
- 3.7967
(0.33 × 10^-6)
(0.38 × 10^-5)
-305.7977 0.5903
H1 0.7993

Table 1: Parameter estimates and their standard errors for the two models. The numbers in parentheses are the standard errors of the estimates. The null hypothesis (H0) is that the two sensitivities are independent (r = 0), and the alternative hypothesis (H1) is that they are not independent (r ≠ 0). The estimated sensitivity for the dual screening (β) is 0.8534 under H0 and 0.8550 under H1.

The density curves of the transition probability and the sojourn time are displayed in Figure 2. The transition probability for males from age 30 to 65 ranges from 1.087 × 10^-3 to 8.952 × 10^-3. That is, depending on age, between 1.087 and 8.952 of every 1,000 people transition from the disease-free state to preclinical lung cancer per year. The transition probability increases to its mode at age 42.7 and then decreases monotonically. Likewise, the sojourn time density is not monotonic, with its mode at 0.99 years and its median at 1.06 years. The mean sojourn time is 1.13 years with a standard deviation of 0.42 years.


Figure 2: The density plots of the (a) transition probability and (b) sojourn time.

The lead time estimation for the x-ray, cytology, and dual procedures is reported for different screening intervals (Δ = 4, 6, 9, 12, 18, 24 months). The mean lead time increases as the screening interval decreases. If someone will eventually have lung cancer and takes more frequent screening exams, he might have a longer lead time and hence might be treated at an earlier stage, with a potentially better prognosis. Accordingly, the probability of “no benefit” is largest at Δ = 24 months. For example, the probability of “no benefit” P1 for the dual procedure is 50.47% at Δ = 24 months and 1.36% at Δ = 4 months, a difference of 49.11 percentage points. However, because of the lower sensitivity of the cytology procedure (26.98%), its probability of “no benefit” is 38.38% even at Δ = 4 months, which is larger than that of the other procedures. The density curves for the lead time under the different screening intervals are shown in Figure 3. The modes of all the density curves shift to the left as the screening interval increases, meaning that the mean lead time decreases.


Figure 3: Density plots for the lead time in the JHLP data according to different screening intervals (Δ). Six values of Δ, in years, are used: 0.33 (4 months), 0.5 (6 months), 0.75 (9 months), 1 (12 months), 1.5 (18 months), and 2 (24 months).

Discussion and Conclusions

A likelihood-based approach was applied to the JHLP data for estimating the sensitivity, the transition probability density, the sojourn time, and the lead time. In addition, the dependency of the two screening procedures was investigated using the likelihood ratio test followed by examining the efficacy of the dual screening.

In the JHLP data, the x-ray procedure is an annual exam, while the cytology procedure is a four-monthly screening. As a result, x-ray has two fewer screening exams than cytology each year, so the data can be considered to have missing information on x-ray. Therefore, we modified the likelihood function to fit these missing data, with probabilities of screen diagnosis and of interval cases that differ from those in [9]. However, this missing information makes it difficult for the parameter estimation to converge, because of unwanted early termination or getting stuck at a saddle point. For this reason, a derivative-free global optimization algorithm, PSO, was applied to the JHLP data so that the parameters could be estimated.

The efficacy of the dual screening for lung cancer detection is examined through the sensitivity. The sensitivity improves from 79.93% with x-ray only to 85.34% when the screening exams are combined with cytology. The independence test of the two screening procedures concludes that there is no significant dependency between them. In other words, the cytology procedure can detect lung cancer cases that the x-ray procedure misses, and vice versa. However, the benefit of the dual screening is likely to be small, since the improvement in sensitivity is only about 5 percentage points, which is consistent with the conclusion of Doria-Rose et al. [6].

On the other hand, the overall sensitivity of the JHLP study is slightly lower than that of the Mayo Lung Project (MLP) study, in which the overall sensitivity of the dual screening was 89% [7]. In fact, the mode of the transition probability in the MLP study was at age 68, later than the mode at age 42.7 in the JHLP study. This means that the transition to the preclinical state occurs earlier in the JHLP study than in the MLP study. As a result, the overall sensitivity might be lower, since the youngest entry age in the JHLP data is 45. Another reason might be that the MLP study offered four-monthly exams for both x-ray and cytology.

Interestingly, the probability of “no benefit” in the JHLP study is much smaller than that in the MLP study, even though the overall sensitivity of the MLP study is larger. For example, P1 at Δ = 12 months is 32.74% in the MLP study, whereas it is 19.01% in the JHLP study for the dual screening. This might be because the sojourn time distribution of the JHLP study is more right-skewed than that of the MLP study. In fact, the mean sojourn time of the JHLP data is much longer than that of Chien et al. [5] and shorter than that of the MLP study, causing the lead time to be shorter.

The cytology procedure is unlikely to be effective if the screening interval is longer than 6 months, since the probability of “no benefit” then exceeds 50% and reaches 71.17% at Δ = 12 months. The cytology procedure should therefore be offered at intervals shorter than 6 months to provide a benefit, which is compatible with the suggestion of Melamed et al. [4].

Although the x-ray is offered less frequently than the cytology procedure, the JHLP study shows that the x-ray and cytology procedures can be considered statistically independent and so the dual screening improves the overall sensitivity.


Acknowledgements

We thank Dr. Philip Prorok for granting us access to the Johns Hopkins Lung Project (JHLP) data.

Conflict of Interest Statement

We declare that we have no conflict of interest.

