Over-diagnosis in Lung Cancer Screening using the MSKC-LCSP Data

We applied a newly developed probability method to predict long-term outcomes and over-diagnosis in lung cancer screening using data from the Memorial Sloan-Kettering Cancer Center Lung Cancer Screening Program (MSKC-LCSP). All participants were categorized into four mutually exclusive groups depending on their diagnosis status and ultimate disease status: symptom-free-life, no-early-detection, true-early-detection and over-diagnosis. The probability of each group is a function of three key parameters: screening sensitivity, sojourn time in the preclinical state, and transition density from the disease-free to the preclinical state. We first estimated these three key parameters from the MSKC-LCSP data using the likelihood function with a Bayesian approach, and then calculated the probability of each group by inserting the Bayesian posterior samples into the probability formulae, to predict future long-term outcomes of lung cancer screening using chest X-ray. Human lifetime was treated as a random variable derived from the life table of the US Social Security Administration (SSA), so the number of future screening exams is a random variable as well. The results show that over-diagnosis is not a major issue in lung cancer screening, given that it accounts for only about 4.56% to 7.43% of the screen-detected cases, depending on the age at the first screening.
Journal of Biometrics & Biostatistics


Introduction
Lung cancer is the second most common cancer and the leading cause of cancer-related death in both males and females in the United States. An estimated 224,210 new cases and 159,260 deaths due to lung cancer were expected in 2014 [1]. The number of estimated deaths due to lung cancer is approximately equal to the estimated deaths from breast, prostate, pancreatic and colon cancers combined. The lifetime risk of lung cancer is 6.88% [2], which means that about 1 in 15 people will be diagnosed with lung cancer during their lifetime. Worldwide, lung cancer causes an estimated 1.5 million deaths annually [3].
Smoking is the major cause of lung cancer. Chronic exposure to carcinogens found in cigarette smoke is associated with nearly 90% of all cases of lung cancer. Since 1950, several epidemiological studies have provided strong scientific evidence of the link between lung cancer and smoking. Besides smoking, occupational and non-occupational exposures, such as radon, air pollution, polycyclic aromatic hydrocarbons, and asbestos, are responsible for lung cancer as well [4][5][6]. For example, asbestos textile workers have a 10-fold risk of lung cancer [7], and 10% of lung cancer deaths are due to indoor radon exposure. It may take years for lung cancer cells to develop, and patients usually do not have any symptoms until the disease has progressed to a late stage [7,8].
The main purpose of cancer screening is to detect lung cancer at an earlier stage, so that the patient can receive curative interventions and mortality can be reduced. The potential impact of screening is greater when a high-risk population can be identified, such as smokers in the case of lung cancer. Although the overall survival of lung cancer is poor, prognosis is better when a lung cancer patient can receive a complete surgical resection at an earlier stage. Therefore, early detection by a screening technique may improve the overall outcome of lung cancer patients.
The lung cancer screening techniques evaluated so far are chest X-ray, sputum cytology and computed tomography (CT). Since the 1970s, chest X-ray and sputum cytology have been evaluated in high-risk populations of smokers [9][10][11]. The sensitivity of chest X-ray ranges from 54% to 84% [12][13][14][15], with a specificity above 90% [12]. The sensitivity of sputum cytology ranges from 27% to 66% [13,16], with a specificity of 99% [16]. In these studies, cigarette smokers were assigned to chest X-ray and/or sputum cytology at 12-month and <6-month intervals, respectively [11,13,17,18]. Furthermore, a shorter screening interval means a lower probability of no-early-detection [15]. However, those studies showed that neither chest X-ray nor sputum cytology improved lung cancer-specific or overall mortality, even though these techniques could detect lung cancer at an earlier stage. One possible explanation is that these screening tools may detect non-aggressive cancer cells at an early stage, rather than the aggressive cancer cells that cause the majority of lung cancer deaths. Another possibility is the lack of effective treatment for lung cancer even with early detection.
The Lung Cancer Screening Program at the Memorial Sloan-Kettering Cancer Center (MSKC-LCSP) began recruiting participants in the New York City area in 1974 [11]. The main purpose was to evaluate sputum cytology as a supplement to the annual chest X-ray examination for early detection and diagnosis. Briefly, the trial enrolled 10,040 men aged 45 years and older who smoked at least 1 pack of cigarettes per day (or who had smoked this much within 1 year of enrollment), and who had no prior history of respiratory tract cancer. All eligible participants were randomized by computer to either a dual-screen or an X-ray only group, and were invited to attend annual exams during which posterior-anterior and lateral chest X-rays were obtained. Screening continued for 5 to 8 years at Sloan-Kettering. A total of 288 participants developed lung cancer, including both small cell and non-small cell cases; over 40% of them were diagnosed in stage I, with 76% surviving at least 5 years. In this study, we used only the data from participants in the X-ray only group and from the age groups that had more than 2 participants diagnosed with lung cancer.
Lung cancer was assumed to progress through three states: $S_0 \to S_p \to S_c$. $S_0$ represents the disease-free state, $S_p$ represents the preclinical state, in which an asymptomatic individual unknowingly has disease that a screening exam can detect, and $S_c$ represents the clinical state, in which the disease manifests itself in clinical symptoms. The three key parameters in cancer screening are the sensitivity, the sojourn time distribution, and the transition probability density. If a person enters the preclinical state $S_p$ at age $t_1$ and clinical symptoms present later at age $t_2$, then $(t_2 - t_1)$ is the sojourn time in the preclinical state. If the person is offered a screening exam at age $t$ within the interval $(t_1, t_2)$ and cancer is diagnosed, the length of time $(t_2 - t)$ is called the lead time, that is, the length of time by which the diagnosis is advanced by screening. The sensitivity is the probability that the screening exam is positive, given that the individual is in the preclinical state. The transition probability density is the probability density function (PDF) of the time of transition from the disease-free state to the preclinical state. Many other quantities can be expressed as functions of these three key parameters.
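As a small illustration of these definitions, the sketch below computes the sojourn time and the lead time for one hypothetical individual. The function names and the ages are assumptions made for illustration; they are not taken from the MSKC-LCSP data.

```python
def sojourn_time(t1: float, t2: float) -> float:
    """Time spent in the preclinical state: entry at age t1, symptoms at age t2."""
    if t2 <= t1:
        raise ValueError("clinical onset t2 must come after preclinical onset t1")
    return t2 - t1


def lead_time(t: float, t1: float, t2: float) -> float:
    """Lead time t2 - t for a cancer detected by a screen at age t in (t1, t2)."""
    if not (t1 < t < t2):
        raise ValueError("the screen must fall inside the preclinical interval (t1, t2)")
    return t2 - t


# Hypothetical individual: enters the preclinical state at 62.0,
# is screen-detected at 63.5, and would have shown symptoms at 65.0.
print(sojourn_time(62.0, 65.0))     # 3.0
print(lead_time(63.5, 62.0, 65.0))  # 1.5
```

A screen outside the interval $(t_1, t_2)$ cannot produce a lead time under these definitions, which is why the second function rejects such input.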
The main purpose of this project is to evaluate the long-term effect of chest X-ray screening and the percentage of over-diagnosis using the MSKC-LCSP data. All initially apparently healthy individuals who plan to be screened for lung cancer are categorized into four mutually exclusive groups: (1) symptom-free-life, (2) no-early-detection, (3) true-early-detection, and (4) over-diagnosis. Which category a participant eventually falls into depends on whether he would be diagnosed with lung cancer, and whether he would die from this cause [19,21]. The probability of each outcome is a function of the three key parameters, which can be estimated using the likelihood function based on the MSKC-LCSP data [18,20].

Method
We briefly summarize the method derived by Wu et al. [19,20] that we used for this research. All initially apparently healthy people who entered the screening program were categorized in the following way:
• Outcome 1 (symptom-free-life [SympF]): a male heavy smoker who took part in screening exams that never found lung cancer, and who ultimately died of other causes.
• Outcome 2 (no-early-detection [NoED]): a male heavy smoker who took part in screening exams, but whose disease manifested itself clinically and was not detected by a scheduled screening.
• Outcome 3 (true-early-detection [TrueED]): a male heavy smoker whose lung cancer was diagnosed at a scheduled screening exam and whose clinical symptoms would have appeared before his death.
• Outcome 4 (over-diagnosis [OverD]): a male heavy smoker who was diagnosed with lung cancer at a scheduled screening exam, but whose clinical symptoms would not have appeared before his death.
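The four outcomes can be read as a decision rule on three ages. The following is a minimal sketch, assuming we could observe a person's age at death, the age at which symptoms would appear without screening, and the age at screen detection (if any). The function name and arguments are hypothetical, and false-positive screens are ignored.

```python
from typing import Optional


def classify_outcome(death_age: float,
                     clinical_age: Optional[float] = None,
                     screen_detect_age: Optional[float] = None) -> str:
    """Return SympF, NoED, TrueED or OverD.

    clinical_age: age at which symptoms would appear without screening
                  (None if the person never enters the preclinical state).
    screen_detect_age: age at diagnosis by a scheduled screen (None if never).
    """
    symptoms_before_death = clinical_age is not None and clinical_age < death_age
    if screen_detect_age is not None:
        # Screen detection presupposes preclinical disease, before symptoms and death.
        if clinical_age is None or screen_detect_age >= min(clinical_age, death_age):
            raise ValueError("inconsistent ages for a screen-detected case")
        return "TrueED" if symptoms_before_death else "OverD"
    return "NoED" if symptoms_before_death else "SympF"


print(classify_outcome(80, None, None))  # SympF
print(classify_outcome(80, 70, None))    # NoED
print(classify_outcome(80, 70, 68))      # TrueED
print(classify_outcome(80, 85, 78))      # OverD
```

In practice these ages are not all observable, which is why the probabilities of the four outcomes are modeled rather than counted directly.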
The probability of each outcome was derived by Wu et al. [19], using a random variable to represent human lifetime, and the results are summarized in the Appendix. The probability is a function of three key parameters: screening sensitivity, sojourn time in the preclinical state, and transition density from the disease-free to the preclinical state. Define $\beta(t)$ to be the screening sensitivity at age $t$, that is, the probability of a positive screening result given that the individual is in the preclinical state, where $t$ is the individual's age at the exam. Define $w(t)$ as the PDF of the transition from $S_0$ to $S_p$ at age $t$. Let $q(t)$ be the PDF of the sojourn time in $S_p$, and let $Q(z) = \int_z^{\infty} q(x)\,dx$ be the survival function of the sojourn time. Throughout this paper, the time variable $t$ represents an individual's age. The capital letter $T$ represents a person's lifetime, which is a continuous random variable with PDF $f_T(t)$.
To make predictive inference for lung cancer screening using chest X-ray, the three key parameters (sensitivity, sojourn time distribution, and transition probability density) must first be estimated accurately, because the derived probabilities are functions of these parameters. The parameters were estimated from the MSKC-LCSP data using Bayesian inference. The parametric models we used are as follows. The sensitivity was treated as constant with respect to age, based on our previous studies [15,18]. The transition density function is a sub-PDF with 0.3 as the upper bound of its total probability, since not all people make a transition from the disease-free to the preclinical state during their lifetime. According to Villeneuve and Mao [21], the lifetime risk for male smokers is 17.2%. Since the MSKC-LCSP participants were male heavy smokers, their risk should be much higher than that; therefore, in this paper, 30% was chosen as a reasonable upper limit. As before, the sojourn time was given a log-logistic density for computational convenience. The parameters to be estimated in the above models are $\theta = (b_0, \mu, \sigma^2, \kappa, \rho)$. For detailed justification of how these functions were chosen, see Wu et al. [18,20].
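One way to write down models of this kind is sketched below. The functional forms are assumptions made for illustration: a logistic transform of $b_0$ for the constant sensitivity, a lognormal density scaled by 0.3 for $w(t)$, and a standard log-logistic density for the sojourn time. The exact parameterization used by the authors is given in Wu et al. [18,20] and may differ; the parameter values in the usage lines are hypothetical.

```python
import math


def sensitivity(b0: float) -> float:
    """Age-independent sensitivity on the logistic scale (assumed form)."""
    return 1.0 / (1.0 + math.exp(-b0))


def transition_density(t: float, mu: float, sigma2: float, upper: float = 0.3) -> float:
    """Sub-PDF w(t): lognormal density scaled to integrate to `upper` (assumed form)."""
    if t <= 0:
        return 0.0
    sigma = math.sqrt(sigma2)
    z = (math.log(t) - mu) / sigma
    return upper * math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def sojourn_pdf(x: float, kappa: float, rho: float) -> float:
    """Log-logistic density kappa * rho^kappa * x^(kappa-1) / (1 + (rho*x)^kappa)^2."""
    if x <= 0:
        return 0.0
    u = (rho * x) ** kappa
    return kappa * u / (x * (1.0 + u) ** 2)


def sojourn_survival(x: float, kappa: float, rho: float) -> float:
    """Survival function 1 / (1 + (rho*x)^kappa) of the log-logistic sojourn time."""
    if x <= 0:
        return 1.0
    return 1.0 / (1.0 + (rho * x) ** kappa)


# Hypothetical parameter values, for illustration only.
print(sensitivity(0.0))                    # 0.5
print(sojourn_survival(2.0, 2.0, 0.5))     # 0.5, since the median is 1/rho
print(transition_density(70.0, math.log(70.0), 0.04))
```

Under this parameterization the median sojourn time is $1/\rho$, and the total mass of $w(t)$ equals the chosen upper limit, so the remaining probability corresponds to never entering the preclinical state.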
We used Markov chain Monte Carlo (MCMC) to generate a posterior sample from the joint posterior distribution of the parameters. The posterior simulation was run as two chains with different starting values that were overdispersed with respect to the target distribution. Each chain was run for 50,000 steps, with a burn-in of 25,000 steps. After the burn-in, the chain was sampled every 50 steps, providing 500 posterior samples of the parameter vector $\theta$. Convergence diagnostics showed that the chains had converged. The 500 posterior samples from each of the two chains were pooled for the analysis, giving a total of 1000 posterior samples $\theta_j^*$, $j = 1, \ldots, 1000$.
Given the MSKC-LCSP data, the posterior predictive probabilities for each category can be estimated by:
$$P(\text{Outcome } i \mid T \ge t_0, \text{Data}) \approx \frac{1}{1000} \sum_{j=1}^{1000} P(\text{Outcome } i \mid T \ge t_0, \theta_j^*),$$
where $P(\text{Outcome } i \mid T \ge t_0, \theta_j^*)$ represents the probability of each outcome as defined in the Appendix, and $\theta_j^*$, $j = 1, \ldots, 1000$, are the posterior samples from the MCMC simulation.
The posterior estimates of the parameters $\theta$ and their standard errors are listed in Table 1.
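The thinning arithmetic and the averaging over posterior samples can be sketched as follows. This is a minimal illustration, assuming the predictive probability is approximated by the sample mean over posterior draws. The function names are hypothetical, and the toy `prob_fn` stands in for the outcome probability formulae of the Appendix.

```python
from typing import Callable, Sequence


def thinned_sample_count(n_steps: int, burn_in: int, thin: int) -> int:
    """Number of draws kept from one chain after burn-in and thinning."""
    return (n_steps - burn_in) // thin


def posterior_predictive(prob_fn: Callable[[float], float],
                         samples: Sequence[float]) -> float:
    """Monte Carlo estimate: mean of prob_fn(theta) over the posterior draws."""
    return sum(prob_fn(theta) for theta in samples) / len(samples)


per_chain = thinned_sample_count(50_000, 25_000, 50)
print(per_chain, 2 * per_chain)  # 500 1000

# Toy stand-in: theta is a single number and the outcome probability is theta itself.
print(posterior_predictive(lambda theta: theta, [0.25, 0.5, 0.75]))  # 0.5
```

Evaluating the outcome probability at each draw and then averaging, rather than plugging in a single point estimate of $\theta$, carries the posterior uncertainty through to the predicted probabilities and also yields their standard errors.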
The posterior median of the screening sensitivity is 92.05%, with a 95% highest posterior density (HPD) interval of (58.37%, 99.29%). The posterior mean sojourn time is 3.35 years, with a posterior median of 1.18 years for heavy smokers. The 95% HPD interval for the sojourn time is (0.58, 12.61) years, which means that 95% of the lung cancer cases have a sojourn time between 0.58 and 12.61 years before symptoms present. For those who will make a transition from the disease-free to the preclinical state, the transition probability density is skewed to the right, with an average time spent in the disease-free state of 78.48 years and a standard error of 3.13 years; the median is 78.13 years, and the density has a mode at 70 years. The estimated PDF curves for the time spent in the disease-free state, w(t), and for the sojourn time, q(z), are plotted with the corresponding 95% point-wise confidence bands in Figure 1.
The above method was applied to make predictive inference for a screening program consisting of periodic lung screening tests for male heavy smokers. We assumed three cohorts with ages 40, 50, and 60 at the initial screening exam, and calculated the probability of each category under three screening intervals: 6, 12, and 24 months. Lifetime was treated as a random variable, and its distribution was obtained from the actuarial life table published online by the Social Security Administration. Details on how to transform the period life table into a PDF are provided in Section 4 of Wu et al. [23].
The simulation results are summarized in Table 2. The probability of symptom-free-life is quite high, above 84.5% for all age groups, and is almost constant as the screening interval changes. This could be due to the fact that about 84% of male heavy smokers will not develop clinical lung cancer before death; that is, about 84% of male heavy smokers will die from causes other than lung cancer.
The probability of no-early-detection increases as the screening interval increases, ranging from 2.4% to 9.4% across all cohorts. The probability of true-early-detection decreases as the screening interval increases, from 11.9% to about 4.7%. The probability of over-diagnosis is low: it is less than 1% for all hypothetical cohorts. The standard deviations are reported in parentheses in Table 2. The trend of the probabilities over different screening intervals for initial age 50 is shown in Figure 2.
The probabilities of symptom-free-life and over-diagnosis do not vary much over different screening intervals, while the probabilities of true-early-detection and no-early-detection move in opposite directions as the screening interval increases.
The predictive conditional probabilities of no-early-detection, true-early-detection and over-diagnosis, given that the individual is a diagnosed lung cancer case (either an interval case or a screen-detected case), are provided in Table 3, with standard errors in parentheses. The pattern is clearer across the three age groups: within the same screening interval, the probabilities of no-early-detection and true-early-detection decrease slightly as the initial age increases, while the probability of over-diagnosis increases slightly. This suggests that older people may suffer a little more from over-diagnosis.
The conditional probabilities of true-early-detection and over-diagnosis among the screen-detected cases are provided in Table 4, with their 95% confidence intervals. These are the figures that concern people most. The probability of over-diagnosis increases slightly with increasing initial age, and it also increases slightly with increasing screening interval, which seems contrary to intuition, while the probability of true-early-detection decreases slightly with increasing screening interval. Together, these two groups constitute the screen-detected cases; when the screening interval increases, the probability of being a screen-detected case decreases, as shown in Table 3, which is compatible with intuition. However, when over-diagnosis and true-early-detection are compared within the screen-detected cases, the two probabilities have to add up to 100%, so one of them must increase when the other decreases, as shown in Table 4. The change is very slight, less than 1.5% within each age group in Table 4. Overall, the probability of over-diagnosis is small, less than 8%.
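The renormalization within the screen-detected cases can be sketched as follows. The function name and the input probabilities are hypothetical and chosen only to illustrate the arithmetic; they are not values from Tables 2 to 4.

```python
from typing import Tuple


def screen_detected_split(p_true_ed: float, p_over_d: float) -> Tuple[float, float]:
    """Shares of true-early-detection and over-diagnosis among screen-detected cases."""
    detected = p_true_ed + p_over_d
    if detected <= 0:
        raise ValueError("no screen-detected probability mass")
    return p_true_ed / detected, p_over_d / detected


# Hypothetical unconditional probabilities for one cohort and screening interval.
true_share, over_share = screen_detected_split(0.09, 0.006)
print(round(true_share, 4), round(over_share, 4))  # 0.9375 0.0625
```

Because the two shares sum to 1, the over-diagnosis share can rise with a longer screening interval even when the unconditional probability of over-diagnosis barely changes, as long as the probability of true-early-detection falls faster.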

Discussion and Conclusions
We applied the probability calculation method in Wu et al. [19,20] to the Lung Cancer Screening Program data from the Memorial Sloan-Kettering Cancer Center. Bayesian analysis was applied because it can incorporate uncertainty, and it is easy to calculate the variations and the CIs of the percentages. The results provide policy makers with some useful information regarding screening among male heavy smokers by giving important estimates of the probability of true-early-detection, no-early-detection, symptom-free-life, and over-diagnosis.
The mean sojourn time for male heavy smokers is about 3.35 years, with a 95% credible interval of (0.58, 12.61) years. The sensitivity of chest X-ray is 86.64%. The transition probability density from the disease-free to the preclinical state peaks around age 70. We compared these results with those from the Mayo Lung Project, which has a study design similar to this one. In the Mayo Lung Project, the mean sojourn time was shorter (2.2 years), the mean sensitivity was greater (89%), and the transition probability density peaked at age 68 [18].
Furthermore, the estimated probability of symptom-free-life is about 84% for male heavy smokers. That is, male heavy smokers have a lifetime risk of 16% for lung cancer, which is higher than the lifetime risk for the general population (about 7%) published by the NCI's "SEER Stat Fact Sheets" database [1]; it is close to the estimated risk for male smokers (17.2%) from Villeneuve and Mao [21]. The proportion of over-diagnosis among the screen-detected cases is about 4.56%-7.43% across all age groups, showing that more than 92% of the screen-detected cases are true-early-detection cases for which immediate treatment is needed.
We used the actuarial life table from the Social Security Administration website to estimate these probabilities. The life table is built for the general US population. Most people believe that smokers have a shorter expected lifetime: even if they do not die of lung cancer, they might have higher death rates from many other causes. We agree with this, but we could not find any documentation on the lifetime of smokers. However, a shorter expected lifetime would make the probability of over-diagnosis even smaller.
According to Patz et al. [24], the over-diagnosis of lung cancer by low-dose computed tomography (LDCT) screening could be more than 18%, much higher than our estimate here. Their method is simple: the number of lung cancer cases detected in the LDCT arm minus the number detected in the chest X-ray (CXR) arm, divided by the total number of screen-detected cases in the LDCT arm.

[Table: mean probability (S.E.), %, of P(SympF), P(NoED), P(TrueED) and P(OverD), by screening interval Δ (months) and age at first screening t_0]
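The excess-incidence calculation described for Patz et al. [24] can be sketched as follows. The function name and the counts are hypothetical, chosen only to show the arithmetic; they are not the trial's actual figures.

```python
def excess_incidence_overdiagnosis(ldct_cases: int,
                                   cxr_cases: int,
                                   ldct_screen_detected: int) -> float:
    """(cases in LDCT arm - cases in CXR arm) / screen-detected cases in LDCT arm."""
    if ldct_screen_detected <= 0:
        raise ValueError("need at least one screen-detected case in the LDCT arm")
    return (ldct_cases - cxr_cases) / ldct_screen_detected


# Hypothetical counts: 1000 cases in the LDCT arm, 900 in the CXR arm,
# 500 of the LDCT cases screen-detected.
print(excess_incidence_overdiagnosis(1000, 900, 500))  # 0.2
```

This estimate treats all excess cases in the LDCT arm as over-diagnosed, whereas the approach in this paper models the disease process and lifetime explicitly, so the two figures are not directly comparable.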
Using our model, we can evaluate and compare the characteristics of different cohorts under different future screening frequencies, covering not only over-diagnosis but the whole cohort. The advantage of modeling is that it can provide predictive answers regarding long-term effects and over-diagnosis based on existing screening data. For example, what is the percentage of symptom-free-life in a given age cohort if people plan to be screened on a fixed schedule in the future? What is the percentage of true-early-detection versus over-diagnosis among the screen-detected cases? Scientists and policy makers could use this method to evaluate screening techniques and improve population health. The limitation of the model is that, since the probability of each outcome is a function of the screening sensitivity, the sojourn time in the preclinical state, and the transition density from the disease-free to the preclinical state, reliable estimation of these three key parameters from screening data is a priority. If the estimates of these three key parameters are not accurate, then the predicted probability of each outcome may be biased.

[Figure: each graph is a boxplot of the probabilities (from the 2000 posterior samples) for different future screening intervals of 6, 12 and 24 months.]
Appendix
The probability of each outcome is first derived given the individual's lifetime. For an individual at current age $t_0$, the number of future screens is unknown; however, if he plans to follow a prefixed screening schedule, $t_0 < t_1 < \cdots < t_K < \cdots$, then the probability of each outcome can be obtained by averaging the probability given his lifetime over the distribution of the lifetime $T$, conditional on $T \ge t_0$. The probabilities for these four outcomes always add up to 1, given that an individual is asymptomatic before taking the first exam.
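One way to write this marginalization over the lifetime is given below. It is a sketch that assumes the conditional lifetime density $f_T(t \mid T \ge t_0) = f_T(t) / P(T \ge t_0)$ for $t \ge t_0$; the exact expressions for the probability of each outcome given $T = t$ are those derived by Wu et al. [19] and are not reproduced here.

```latex
% Law of total probability over the lifetime T, conditional on survival to t_0
P(\text{Outcome } i \mid T \ge t_0)
  = \int_{t_0}^{\infty} P(\text{Outcome } i \mid T = t)\, f_T(t \mid T \ge t_0)\, dt,
  \qquad i = 1, \ldots, 4,

% The four outcomes are mutually exclusive and exhaustive
\sum_{i=1}^{4} P(\text{Outcome } i \mid T \ge t_0) = 1 .
```

The number of screens an individual actually receives is determined by how many scheduled ages $t_0 < t_1 < \cdots$ fall before his lifetime $t$, which is why it becomes a random variable once $T$ is random.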