ISSN: 2155-6180
Journal of Biometrics & Biostatistics

Posterior Inference for White Hispanic Breast Cancer Survival Data

Hafiz MR Khan1*, Anshul Saxena2 and Alice Shrestha1

1Department of Biostatistics, Robert Stempel College of Public Health & Social Work, Florida International University, Miami, FL 33199, USA

2Department of Health Promotion & Disease Prevention, Robert Stempel College of Public Health & Social Work, Florida International University, Miami, FL 33199, USA

*Corresponding Author:
Hafiz MR Khan
Department of Biostatistics
Robert Stempel College of Public Health & Social Work Florida International University
Miami, FL 33199, USA
Tel: +001-305-348-4908
Fax: +001-305-348-4901
E-mail: [email protected]

Received date: November 18, 2013; Accepted date: January 11, 2014; Published date: January 18, 2014

Citation: Khan HMR, Saxena A, Shrestha A (2014) Posterior Inference for White Hispanic Breast Cancer Survival Data. J Biomet Biostat 5: 183. doi: 10.4172/2155-6180.1000183

Copyright: © 2014 Khan HMR, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Abstract

The purpose of this paper is to develop a statistical probability model and to obtain posterior inference for its parameters given the survival times of White Hispanic female breast cancer patients. A stratified random sample of White Hispanic female patients’ survival data was used to derive a best-fit statistical probability model. The study sample was extracted from the Surveillance, Epidemiology, and End Results (SEER) cancer registry database. Three model selection criteria, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC), were used to measure goodness of fit. We found that the exponentiated Weibull model fits the survival times better than other widely known statistical probability models. The Bayesian approach is employed to derive the posterior inference for the parameters.

Keywords

Breast cancer survival data; Statistical inference; Bayesian inference

Introduction

Breast cancer is a cancer that begins in the tissues of the breast. It develops as a result of uncontrolled growth of altered breast cells [1]. These cells form a tumor, which some women can feel as a lump or mass during breast self-examination. Recent studies have indicated that one in eight women in the United States (US) will develop invasive breast cancer during their lifetime [2]. Regardless of race and ethnicity, breast cancer is the most common cancer among women in the US, excluding some kinds of skin cancer [3]. It accounts for a very high share, 16%, of all cancer types [4]. In 2008, breast cancer claimed 458,400 lives [5]. Moreover, approximately 60% of deaths due to breast cancer occurred in developing nations, which contrasts with the common belief among cancer researchers that breast cancer is prevalent mainly in developed countries. In 2010, 206,966 women were diagnosed with breast cancer and 40,996 women died from the disease in the US [6]. The incidence rate of breast cancer among White females is 119.5 per 100,000 population, the highest among all races [3].

Breast cancer is the most common cancer among American women accounting for the highest overall incidence rate of 123.1 per 100,000 population among all cancers [1,6]. Some major known risk factors of breast cancer are age, smoking, excessive alcohol drinking, obesity, lactation, and family history [7,8]. It was believed that level of awareness, fewer mammograms, socio-economic factors, and lack of access to health care are strong risk factors for breast cancer. However, recent studies indicate that ethnic differences could also be an important factor associated with breast cancer mortality and incidence rates [1].

In 2013, an estimated 232,340 new cases of invasive breast cancer were expected to be diagnosed among US women, along with an estimated 64,640 additional cases of in situ breast cancer. Approximately 39,620 women were expected to die from breast cancer [9]. The incidence rate of breast cancer among all racial and ethnic groups remained stable from 2004 to 2008. The death rates among all ethnic groups began decreasing in the early 1990s, except among American Indians/Alaska Natives [2]. According to the American Cancer Society, sufficient evidence shows that breast cancer death rates differ by state, socioeconomic status, and race/ethnicity [2]. Research focusing on each of these determinants is necessary in order to understand the breast cancer epidemic among American women. Although age remains the strongest risk factor for breast cancer, race and ethnicity also contribute to the probability of developing breast cancer [1]. White women are more likely to develop breast cancer than women of any other racial group in the United States [1].

According to the US Census Bureau, 50.0 million Americans, or 16% of the total population, identified themselves as Hispanic or Latino in 2010 [4]. About 17,100 Hispanic women were expected to be diagnosed with breast cancer, and 2,400 were expected to die from it, in 2012 [4]. Among all ethnicities in the US, the overall death rates of breast cancer are highest among Hispanic women [3]. Lack of awareness and limited access to health care are possible factors that explain the elevated risk of breast cancer among Hispanic women as compared to other ethnic groups [1].

Although previous studies suggest that Mexican women have a lower risk of developing breast cancer, current research indicates an increase in the incidence rates of breast cancer among Mexican-American women. The Arizona Cancer Center and three Mexican universities have collaborated in the Ella Binational Breast Cancer Study (EBBCS) to gather data that can provide insight into the breast cancer differences between Mexican native and Mexican-American women [1]. According to the EBBCS, Mexican women who live in the US have an increased risk of breast cancer due to lifestyle and reproductive factors. Studies conducted on all other US-born Hispanic women correspond to the EBBCS findings. Physical inactivity, early menarche, late menopause, postmenopausal obesity, and alcohol consumption are responsible for the increase in the risk of breast cancer among Mexican-American women [4]. On the other hand, having more children, breast-feeding for a longer period of time, an active lifestyle, and higher fiber consumption lower the risk of breast cancer among women born in Mexico [1].

Cancer survival data are recorded and stored at various hospitals and cancer registries so that they can be used for future analysis. There is a high demand for novel statistical methods to understand this type of data in a scientific manner. Statistical analysis supports inferences about the existing survival data and their probability model.

The main goals of this paper are (i) to review certain right-skewed models; (ii) to justify that the given sample data set follows a specific model by using model selection criteria as goodness of fit measures; and (iii) to perform a Bayesian analysis of the posterior distribution of the parameters.

This paper is organized as follows. Section 2 presents a real breast cancer survival data example, covering goodness of fit, reparameterization, and posterior inference for the model parameters for White Hispanic patients. Finally, conclusions are given in Section 3.

Probability Model Testing

The data extracted from healthcare experiments may follow several statistical probability models, for example, the exponential, gamma, lognormal, Weibull, exponentiated exponential (EE), exponentiated Weibull (EW), and beta generalized exponential (BGE) models. Statistical methodologies are necessary to understand such data and to draw scientific conclusions from them.

Many statistical probability models have been used in modeling survival data. In this paper we consider the exponentiated exponential model (EEM), the beta generalized exponential model (BGEM), the exponentiated Weibull model (EWM), and the beta inverse-Weibull model (BIWM), because for specific values of the parameters they reduce to certain well-known statistical probability models.

The exponentiated exponential model (EEM) is used in modeling the data from engineering and biomedical sciences. The EEM has two parameters, scale and shape. A random variable x is said to have an exponentiated exponential distribution if its probability density function (pdf) is given by

$$f(x;\alpha,\lambda)=\alpha\lambda\left(1-e^{-\lambda x}\right)^{\alpha-1}e^{-\lambda x},\qquad x>0,$$

where α>0 and λ>0 are the shape and scale parameters, respectively.
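As a minimal sketch of how this density can be evaluated, the Python snippet below codes the standard two-parameter exponentiated exponential form, f(x) = αλ(1 − e^{−λx})^{α−1} e^{−λx}, which is assumed here to be the form intended. The function names and the trapezoid helper are illustrative and not part of the original analysis. The parameter values are the posterior means from Table 2, used only to check that the density integrates to about one.

```python
import math

def ee_pdf(x, alpha, lam):
    """Exponentiated exponential density (assumed standard form), zero for x <= 0."""
    if x <= 0:
        return 0.0
    return alpha * lam * (1.0 - math.exp(-lam * x)) ** (alpha - 1.0) * math.exp(-lam * x)

def ee_cdf(x, alpha, lam):
    """Matching distribution function, F(x) = (1 - exp(-lam * x)) ** alpha."""
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-lam * x)) ** alpha

def trapezoid(f, lo, hi, steps):
    """Plain composite trapezoid rule, enough for a sanity check."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Posterior means from Table 2 (alpha = 8.152, lambda = 0.0351).
area = trapezoid(lambda x: ee_pdf(x, 8.152, 0.0351), 0.0, 1000.0, 100000)
print(round(area, 6))
```

The numerical area should agree with `ee_cdf(1000.0, 8.152, 0.0351)`, which is a quick way to confirm that the density and distribution function are coded consistently.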

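With B(a,b) denoting the beta function, the probability density function of the beta generalized exponential model is given by

$$f(x;a,b,\alpha,\lambda)=\frac{\alpha\lambda}{B(a,b)}\,e^{-\lambda x}\left(1-e^{-\lambda x}\right)^{\alpha a-1}\left\{1-\left(1-e^{-\lambda x}\right)^{\alpha}\right\}^{b-1},\qquad x>0,$$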

where the shape parameter, α>0 and the scale parameter, λ>0. There are two additional parameters, a>0 and b>0 whose role is to introduce skewness and to vary tail weight [10]. The BGE model generalizes some well-known models; beta exponential and generalized exponential models are the special cases.
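The special-case claim can be checked numerically. The sketch below assumes the usual beta generalized exponential density, f(x) = αλ/B(a,b) · e^{−λx}(1 − e^{−λx})^{αa−1}{1 − (1 − e^{−λx})^α}^{b−1}, and confirms that setting a = b = 1 recovers the generalized (exponentiated) exponential density. The function names are hypothetical.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b) computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def bge_pdf(x, a, b, alpha, lam):
    """Beta generalized exponential density (assumed standard form)."""
    if x <= 0:
        return 0.0
    u = 1.0 - math.exp(-lam * x)          # exponential cdf at x
    return (alpha * lam / beta_fn(a, b) * math.exp(-lam * x)
            * u ** (alpha * a - 1.0) * (1.0 - u ** alpha) ** (b - 1.0))

def ee_pdf(x, alpha, lam):
    """Generalized (exponentiated) exponential density, the a = b = 1 case."""
    if x <= 0:
        return 0.0
    return alpha * lam * (1.0 - math.exp(-lam * x)) ** (alpha - 1.0) * math.exp(-lam * x)

# With a = b = 1 the two densities should coincide at every x.
for x in (10.0, 50.0, 120.0):
    print(x, bge_pdf(x, 1.0, 1.0, 8.152, 0.0351), ee_pdf(x, 8.152, 0.0351))
```

Setting α = 1 instead gives the beta exponential density, the other special case mentioned in the text.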

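Mudholkar and Srivastava [11] first presented the exponentiated Weibull model (EWM). The probability density function for the exponentiated Weibull model is given by

$$f(x;\alpha,\beta,\lambda)=\alpha\beta\lambda\,x^{\beta-1}e^{-\lambda x^{\beta}}\left(1-e^{-\lambda x^{\beta}}\right)^{\alpha-1},\qquad x>0,$$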

where α>0 and β>0 are the shape parameters, and λ>0 is the scale parameter.
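The sketch below evaluates an exponentiated Weibull density, distribution function, and quantile function in Python. It assumes the parameterization F(x) = (1 − e^{−λx^β})^α, in which λ multiplies x^β; other papers write the same family with (λx)^β, so this choice is an assumption rather than something the text states. The function names are illustrative.

```python
import math

def ew_pdf(x, alpha, beta, lam):
    """Exponentiated Weibull density under the assumed lam * x**beta form."""
    if x <= 0:
        return 0.0
    t = lam * x ** beta
    return (alpha * beta * lam * x ** (beta - 1.0) * math.exp(-t)
            * (-math.expm1(-t)) ** (alpha - 1.0))

def ew_cdf(x, alpha, beta, lam):
    """F(x) = (1 - exp(-lam * x**beta)) ** alpha."""
    if x <= 0:
        return 0.0
    return (-math.expm1(-lam * x ** beta)) ** alpha

def ew_quantile(p, alpha, beta, lam):
    """Inverse of ew_cdf for 0 < p < 1."""
    return (-math.log1p(-p ** (1.0 / alpha)) / lam) ** (1.0 / beta)

# Medians implied by the posterior means in Tables 2 and 3.
median_ee = ew_quantile(0.5, 8.152, 1.0, 0.0351)      # beta = 1 gives the EE model
median_ew = ew_quantile(0.5, 6.338, 1.099, 0.02083)
print(median_ee, median_ew)
```

With β = 1 the functions reduce to the exponentiated exponential model, and with α = 1 to the ordinary Weibull. Under this parameterization the medians implied by the two fitted models come out close to each other, which is one informal reason for preferring it here.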

The beta inverse-Weibull (BIW) model is one of the widely used distributions for problems in medicine and reliability. It shows a good fit to several data sets, such as the times to breakdown of an insulating fluid subject to constant tension [12]. With B(a,b) denoting the beta function, the probability density function of the beta inverse-Weibull model is given by

$$f(x;\beta,a,b)=\frac{\beta}{B(a,b)}\,x^{-(\beta+1)}\exp\left(-a\,x^{-\beta}\right)\left\{1-\exp\left(-x^{-\beta}\right)\right\}^{b-1},\qquad x>0,$$

where β>0 is the shape parameter, and a>0 and b>0 are two additional parameters whose role is to introduce skewness and to vary tail weight.
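A short Python sketch of this density follows, assuming the three-parameter beta inverse-Weibull form built from the inverse Weibull distribution function G(x) = exp(−x^{−β}). The function names are hypothetical. As a sanity check, a = b = 1 should return the plain inverse Weibull density.

```python
import math

def biw_pdf(x, beta, a, b):
    """Beta inverse-Weibull density (assumed form), zero for x <= 0.

    Note: very small x combined with large beta can overflow x ** (-beta);
    this sketch does not guard against that.
    """
    if x <= 0:
        return 0.0
    g_cdf = math.exp(-x ** (-beta))                      # inverse Weibull cdf G(x)
    norm = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return (beta * x ** (-(beta + 1.0)) * g_cdf ** a
            * (1.0 - g_cdf) ** (b - 1.0) / norm)

def inverse_weibull_pdf(x, beta):
    """Baseline density g(x) = beta * x**-(beta+1) * exp(-x**-beta)."""
    return beta * x ** (-(beta + 1.0)) * math.exp(-x ** (-beta))

# a = b = 1 removes the beta weighting and leaves the baseline density.
for x in (0.5, 1.0, 3.0):
    print(x, biw_pdf(x, 2.0, 1.0, 1.0), inverse_weibull_pdf(x, 2.0))
```

The parameters a and b act as exponents on G(x) and 1 − G(x), which is how they reshape skewness and tail weight relative to the baseline.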

A Bayesian method is used to explore the posterior probability for the parameters of the EEM, BGEM, EWM, and BIWM. The purpose of the Bayesian method is to develop the posterior inference for the parameters given a set of observed data. For further information about the Bayesian method, readers can refer to several published works [13-19]. Additional applications of the Bayesian method for predictive inference have been discussed by a number of authors [20-26].

Example of Breast Cancer Survival Data

We used the breast cancer data (N=657,712) from the Surveillance, Epidemiology, and End Results (SEER, 1973-2009) cancer registry website in the USA [27]. In the USA, there are twelve states that collect breast cancer patients’ information. By gender, the total SEER data comprised 4,269 males and 653,443 females. Of a total of 608,032 females, 22,639 were White Hispanic and 531,562 were White non-Hispanic. Since breast cancer is rare in males, they were not considered in this study. A stratified random sampling scheme was used to randomly select nine of the twelve data-recording states to represent White Hispanic breast cancer cases. Excluding three states will allow other researchers to perform external validation of our findings, since the information-theoretic criteria are essentially internal validations. However, external validation is beyond the scope of the present study. Finally, simple random sampling (SRS) was used to select 2,000 White Hispanic patients from the nine selected states.
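The two-step selection described above can be sketched in Python as follows. The patient frame is synthetic: the state labels, patient identifiers, and the uniform assignment of patients to states are all hypothetical stand-ins, since the real SEER records are not reproduced here. Only the counts (twelve states, nine retained, 2,000 patients drawn) come from the text.

```python
import random

rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical frame: one (patient_id, state) pair per White Hispanic case.
states = [f"state_{i:02d}" for i in range(1, 13)]
frame = [(pid, rng.choice(states)) for pid in range(22639)]

# Step 1: randomly retain nine of the twelve states.
selected = set(rng.sample(states, 9))
held_out = [s for s in states if s not in selected]   # left for external validation

# Step 2: simple random sample of 2,000 patients from the retained states.
eligible = [rec for rec in frame if rec[1] in selected]
sample = rng.sample(eligible, 2000)

print(len(eligible), len(sample), held_out)
```

Sampling without replacement in both steps guarantees that no patient appears twice and that the three held-out states contribute no records to the analysis sample.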

Goodness of fit and reparameterization

The most commonly used measures of goodness of fit for the models are the Akaike Information Criterion (AIC), the Deviance Information Criterion (DIC), and the Bayesian Information Criterion (BIC). Among these, DIC is the most widely used. It is a Bayesian measure of fit used to compare models when samples from the posterior distribution of the parameters are obtained by Markov chain Monte Carlo (MCMC) methods, as illustrated with public data by Congdon [28,29]. DIC values can be positive or negative, and the model with the lower DIC value is considered better. BIC is an asymptotic result derived under the assumption that the data distribution belongs to the exponential family. As with AIC, given any two estimated models, the model with the lower BIC value is preferred. We used multiple information-theoretic criteria in this study to see whether they agree in selecting a best-fitting model. Achcar et al. [30] used a re-parameterization for certain skewed models. A re-parameterization can be applied to the log-likelihood functions based on x=(x1,x2,…,xn) from the models described earlier, which are given below:
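For reference, the snippet below computes AIC and BIC under their standard definitions, AIC = 2k − 2ℓ and BIC = k·ln(n) − 2ℓ, where ℓ is the maximized log-likelihood, k the number of parameters, and n the sample size. That these exact definitions were used in the study is an assumption, and the helper names are illustrative. The check converts a reported AIC into the BIC it implies and compares the result with Table 1.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * loglik

def bic_from_aic(aic_value, k, n):
    """Swap the AIC penalty (2k) for the BIC penalty (k * ln n)."""
    return aic_value - 2 * k + k * math.log(n)

n = 2000
bic_ee = bic_from_aic(19430.400, k=2, n=n)    # exponentiated exponential, 2 parameters
bic_bge = bic_from_aic(19433.300, k=4, n=n)   # beta generalized exponential, 4 parameters
print(round(bic_ee, 3), round(bic_bge, 3))
```

For these two models the implied BIC values agree with the BIC column of Table 1 to about three decimal places, which suggests the standard penalties with n = 2,000 were applied at least for those rows.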

The log-likelihood function from the EE model is given by

$$\ell(\alpha,\lambda)=n\log\alpha+n\log\lambda+(\alpha-1)\sum_{i=1}^{n}\log\left(1-e^{-\lambda x_i}\right)-\lambda\sum_{i=1}^{n}x_i.$$

Assume ρ1=log(α) and ρ2=log(λ). We further assume that ρ1 and ρ2 are independently distributed. To obtain a non-informative prior for ρ1 and ρ2, a uniform prior distribution is placed on each ρi (i=1,2).

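The log-likelihood function from the beta generalized exponential model, with B(a,b) denoting the beta function, is given by

$$\ell(a,b,\alpha,\lambda)=n\log\alpha+n\log\lambda-n\log B(a,b)-\lambda\sum_{i=1}^{n}x_i+(\alpha a-1)\sum_{i=1}^{n}\log\left(1-e^{-\lambda x_i}\right)+(b-1)\sum_{i=1}^{n}\log\left\{1-\left(1-e^{-\lambda x_i}\right)^{\alpha}\right\}.$$

Assume ρ1=log(a); ρ2=log(b); ρ3=log(α); and ρ4=log(λ). We further assume that ρ1, ρ2, ρ3, and ρ4 are independently distributed. To obtain a non-informative prior for ρ1, ρ2, ρ3, and ρ4, a uniform prior distribution is placed on each ρj (j=1,…,4).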

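The log-likelihood function from the EW model is given by

$$\ell(\alpha,\beta,\lambda)=n\log\alpha+n\log\beta+n\log\lambda+(\beta-1)\sum_{i=1}^{n}\log x_i-\lambda\sum_{i=1}^{n}x_i^{\beta}+(\alpha-1)\sum_{i=1}^{n}\log\left(1-e^{-\lambda x_i^{\beta}}\right).$$

Assume ρ1=log(α); ρ2=log(β); and ρ3=log(λ). We further assume that ρ1, ρ2, and ρ3 are independently distributed. To obtain a non-informative prior for ρ1, ρ2, and ρ3, a uniform prior distribution is placed on each ρk (k=1,2,3).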

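The log-likelihood function from the beta inverse Weibull model, with B(a,b) denoting the beta function, is given by

$$\ell(\beta,a,b)=n\log\beta-n\log B(a,b)-(\beta+1)\sum_{i=1}^{n}\log x_i-a\sum_{i=1}^{n}x_i^{-\beta}+(b-1)\sum_{i=1}^{n}\log\left\{1-\exp\left(-x_i^{-\beta}\right)\right\}.$$

Assume ρ1=log(β); ρ2=log(a); and ρ3=log(b). We further assume that ρ1, ρ2, and ρ3 are independently distributed. To obtain a non-informative prior for ρ1, ρ2, and ρ3, a uniform prior distribution is placed on each ρg (g=1,2,3).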

Table 1 presents the results of the measures of goodness of fit. The posterior summaries for the parameters are reported in Tables 2-5. The posterior kernel densities for the parameters are given in Figures 1-4. The kernel density estimation used the WinBUGS default settings.

| Model | AIC | BIC | DIC |
|---|---|---|---|
| Exponentiated exponential | 19430.400 | 19441.602 | 19430.426 |
| Exponentiated Weibull | 19425.700 | 19442.001 | 19423.700 |
| Beta generalized exponential | 19433.300 | 19455.703 | 19429.300 |
| Beta inverse Weibull | 19444.400 | 19465.700 | 19442.300 |

Table 1: Selection of the best model for White Hispanic females on the basis of the AIC, BIC, and DIC criteria.

| Node | Mean | SD | MC error | Median | 95% CI | Sample |
|---|---|---|---|---|---|---|
| alpha | 8.152 | 0.3771 | 0.005046 | 8.144 | (7.44, 8.909) | 50,000 |
| lambda | 0.0351 | 7.19E-04 | 9.50E-06 | 0.0351 | (0.03369, 0.03652) | 50,000 |
| rho1 | 2.097 | 0.04627 | 6.18E-04 | 2.097 | (2.007, 2.187) | 50,000 |
| rho2 | -3.35 | 0.02049 | 2.71E-04 | -3.35 | (-3.39, -3.31) | 50,000 |

Table 2: Summary results of the posterior parameters in the case of exponentiated exponential for White Hispanic female breast cancer patients (n=2,000).
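Because ρ1=log(α) and ρ2=log(λ) are monotone transformations, posterior medians and interval endpoints on the ρ scale should map onto those of α and λ by exponentiation. The short check below uses only values reported in Table 2; the variable names are illustrative.

```python
import math

# Posterior medians and 95% interval endpoints on the log scale (Table 2).
rho1_median, rho1_ci = 2.097, (2.007, 2.187)
rho2_median, rho2_ci = -3.35, (-3.39, -3.31)

# Back-transform to the original parameter scale.
alpha_median = math.exp(rho1_median)
alpha_ci = tuple(math.exp(v) for v in rho1_ci)
lambda_median = math.exp(rho2_median)
lambda_ci = tuple(math.exp(v) for v in rho2_ci)

print(alpha_median, alpha_ci)      # compare with 8.144 and (7.44, 8.909)
print(lambda_median, lambda_ci)    # compare with 0.0351 and (0.03369, 0.03652)
```

The back-transformed values match the alpha and lambda rows up to the rounding of the reported ρ values. The same does not hold exactly for posterior means, since the mean of exp(ρ) is not exp of the mean of ρ.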

| Node | Mean | SD | MC error | Median | 95% CI | Sample |
|---|---|---|---|---|---|---|
| alpha | 6.338 | 0.3607 | 0.0171 | 6.295 | (5.744, 7.164) | 50,000 |
| beta | 1.099 | 0.01956 | 0.001189 | 1.103 | (1.052, 1.124) | 50,000 |
| lambda | 0.02083 | 0.002238 | 1.38E-04 | 0.02019 | (0.01839, 0.02642) | 50,000 |
| rho1 | 1.845 | 0.05629 | 0.002639 | 1.84 | (1.748, 1.969) | 50,000 |
| rho2 | 0.09379 | 0.01798 | 0.001095 | 0.0981 | (0.05065, 0.1166) | 50,000 |
| rho3 | -3.877 | 0.1015 | 0.006212 | -3.903 | (-3.996, -3.634) | 50,000 |

Table 3: Summary results of the posterior parameters in the case of exponentiated Weibull (EW) for White Hispanic female breast cancer patients (n=2,000).

| Node | Mean | SD | MC error | Median | 95% CI | Sample |
|---|---|---|---|---|---|---|
| a | 4.741 | 1.319 | 0.08746 | 4.483 | (2.928, 7.226) | 50,000 |
| alpha | 1.811 | 0.4869 | 0.03249 | 1.777 | (1.093, 2.665) | 50,000 |
| b | 1.042 | 0.02052 | 2.94E-04 | 1.044 | (1.003, 1.071) | 50,000 |
| lambda | 0.03407 | 8.50E-04 | 1.64E-05 | 0.03404 | (0.03246, 0.03579) | 50,000 |
| rho1 | 1.518 | 0.2774 | 0.01853 | 1.5 | (1.074, 1.978) | 50,000 |
| rho2 | 0.04052 | 0.01977 | 2.83E-04 | 0.04303 | (0.002838, 0.0688) | 50,000 |
| rho3 | 0.5564 | 0.2759 | 0.01844 | 0.5747 | (0.08872, 0.9801) | 50,000 |
| rho4 | -3.38 | 0.02494 | 4.80E-04 | -3.38 | (-3.428, -3.33) | 50,000 |

Table 4: Summary results of the posterior parameters in the case of beta generalized exponential for White Hispanic female breast cancer patients (n=2,000).

| Node | Mean | SD | MC error | Median | 95% CI | Sample |
|---|---|---|---|---|---|---|
| a | 1.031 | 0.01789 | 9.39E-05 | 1.031 | (1.002, 1.06) | 50,000 |
| b | 1.047 | 0.02722 | 1.25E-04 | 1.046 | (1.002, 1.092) | 50,000 |
| beta | 403.2 | 0.2026 | 0.001536 | 403.3 | (402.7, 403.4) | 50,000 |
| rho1 | 6 | 5.03E-04 | 3.81E-06 | 6 | (5.998, 6.0) | 50,000 |
| rho2 | 0.03037 | 0.01736 | 9.11E-05 | 0.03054 | (0.001552, 0.05857) | 50,000 |
| rho3 | 0.04527 | 0.02601 | 1.20E-04 | 0.0453 | (0.002265, 0.08786) | 50,000 |

Table 5: Summary results of the posterior parameters in the case of beta inverse Weibull for White Hispanic female breast cancer patients (n=2,000).

Figure 1: Kernel density of the posterior parameters in the case of exponentiated exponential for White Hispanic female breast cancer patients (n=2,000).

Figure 2: Kernel density of the posterior parameters in the case of exponentiated Weibull for White Hispanic female breast cancer patients (n=2,000).

Figure 3: Kernel density of the posterior parameters in the case of beta generalized exponential for White Hispanic female breast cancer patients (n=2,000).

Figure 4: Kernel density of the posterior parameters in the case of beta inverse Weibull for White Hispanic female breast cancer patients (n=2,000).

Results of goodness of fit tests and posterior inference for the parameters from the White Hispanic survival data

The following AIC, BIC, and DIC values are calculated and the posterior inference for the parameters with their corresponding kernel densities are obtained.

Table 1 lists the AIC, BIC, and DIC values for the EE, EW, BGE, and BIW models. Lower values of AIC, BIC, and DIC indicate a better model fit. The EWM has the lowest AIC (19425.700) and the lowest DIC (19423.700). Its BIC (19442.001) is not the lowest, but it is very close to the lowest value (19441.602, for the EE model). Taking the three criteria together, the EWM fits the survival times better than the other models.
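The comparison can be made explicit with a few lines of Python that pick the lowest-scoring model under each criterion. The dictionary simply transcribes Table 1; the variable names are illustrative.

```python
# Table 1 values: model -> (AIC, BIC, DIC).
table1 = {
    "Exponentiated exponential": (19430.400, 19441.602, 19430.426),
    "Exponentiated Weibull": (19425.700, 19442.001, 19423.700),
    "Beta generalized exponential": (19433.300, 19455.703, 19429.300),
    "Beta inverse Weibull": (19444.400, 19465.700, 19442.300),
}

criteria = ("AIC", "BIC", "DIC")

# For each criterion, the preferred model is the one with the smallest value.
best = {
    name: min(table1, key=lambda model: table1[model][idx])
    for idx, name in enumerate(criteria)
}

for name in criteria:
    print(name, "->", best[name])

# Gap between the EW model's BIC and the smallest BIC in the table.
bic_gap = table1["Exponentiated Weibull"][1] - min(v[1] for v in table1.values())
print("EW BIC gap:", round(bic_gap, 3))
```

AIC and DIC both select the exponentiated Weibull model, while BIC narrowly selects the exponentiated exponential model, with a gap of about 0.4 between the two.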

Table 2 gives summary results of the posterior distribution of the parameters from the exponentiated exponential model using the White Hispanic breast cancer patients’ survival data. The log-likelihood function is derived from the exponentiated exponential model, prior distributions are assigned to ρ1 and ρ2, and the posterior distributions of α and λ are then estimated by the MCMC method. The WinBUGS software is used to obtain the summary results of the parameters. After removing the burn-in samples, the remaining samples are treated as draws from the posterior distribution. The procedure was run for 50,000 Monte Carlo repetitions to produce the inference for the posterior parameters in Table 2. Figure 1 displays the behavior of the parameters graphically. After 50,000 Monte Carlo repetitions, the kernel densities for both the shape and scale parameters are approximately symmetric.
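The procedure just described (write the log-likelihood in terms of ρ1 and ρ2, sample by MCMC, discard the burn-in) can be illustrated with a small random-walk Metropolis sampler. This is only a sketch and not the WinBUGS sampler used in the study: it runs on synthetic data, assumes flat priors on ρ1 and ρ2 without bounds, and the step size, chain length, and starting values are arbitrary choices.

```python
import math
import random

def ee_loglik(rho1, rho2, data, sum_x):
    """EE log-likelihood in terms of rho1 = log(alpha) and rho2 = log(lambda)."""
    alpha, lam = math.exp(rho1), math.exp(rho2)
    s = sum(math.log(-math.expm1(-lam * x)) for x in data)  # sum of log(1 - e^(-lam*x))
    return len(data) * (rho1 + rho2) + (alpha - 1.0) * s - lam * sum_x

def metropolis(data, n_iter, burn_in, step, rng):
    """Random-walk Metropolis on (rho1, rho2) with flat priors (an assumption)."""
    sum_x = sum(data)
    rho = (0.0, -math.log(sum_x / len(data)))   # crude starting values
    current = ee_loglik(rho[0], rho[1], data, sum_x)
    kept = []
    for it in range(n_iter):
        prop = (rho[0] + rng.gauss(0.0, step), rho[1] + rng.gauss(0.0, step))
        cand = ee_loglik(prop[0], prop[1], data, sum_x)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand - current)):
            rho, current = prop, cand
        if it >= burn_in:                        # keep only post burn-in draws
            kept.append(rho)
    return kept

rng = random.Random(1)
true_alpha, true_lam = 8.0, 0.035                # hypothetical "true" values
# Synthetic survival times by inverting F(x) = (1 - exp(-lam * x)) ** alpha.
data = [
    -math.log(1.0 - rng.uniform(1e-12, 1.0 - 1e-12) ** (1.0 / true_alpha)) / true_lam
    for _ in range(200)
]

draws = metropolis(data, n_iter=12000, burn_in=4000, step=0.05, rng=rng)
mean_rho1 = sum(d[0] for d in draws) / len(draws)
mean_rho2 = sum(d[1] for d in draws) / len(draws)
print(mean_rho1, mean_rho2)
```

Sampling on the log scale keeps α and λ positive without any constraint handling, which is one practical reason for the re-parameterization. Posterior summaries for α and λ follow by exponentiating each retained draw.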

Table 3 gives the summary results of the posterior distribution of the parameters from the exponentiated Weibull model using the White Hispanic female breast cancer patients’ survival data. The log-likelihood function is derived from the exponentiated Weibull model, prior distributions are assigned to ρ1, ρ2, and ρ3, and the posterior distributions of α, β, and λ are then estimated by MCMC methods. The WinBUGS software is used to obtain the summary results of the parameters. Figure 2 displays the posterior densities of the parameters. The posterior density of the shape parameter α deviates from symmetry, and those of β and λ are skewed. The density of ρ1 deviates from normality, and those of ρ2 and ρ3 are skewed.

Table 4 gives the summary results of the posterior distribution of the parameters from the beta generalized exponential model using the White Hispanic female breast cancer patients’ data. The WinBUGS software is used to obtain the summary results (mean, SD, MC error, median, and 95% credible intervals) of the parameters. Figure 3 displays the posterior densities of the parameters for females in the case of the beta generalized exponential model. The parameters λ and ρ4 have approximately normal posterior densities. The densities of the other parameters are neither uniform nor close to symmetric.

Table 5 gives the summary results of the posterior distribution of the parameters from the beta inverse Weibull model using the White Hispanic female breast cancer patients’ survival data. The WinBUGS software is used to obtain the summary results (mean, SD, MC error, median, and 95% credible intervals) of the parameters. Figure 4 displays the posterior densities of the parameters for females in the case of the beta inverse Weibull model. The parameters β and ρ1 have skewed posterior densities, and the other parameters have approximately uniform densities.

Conclusion

Several statistical models were compared to identify the best-fit model for the White Hispanic female breast cancer patients’ survival data. In the goodness of fit analysis, the breast cancer survival sample for this ethnicity followed the exponentiated Weibull distribution. The lowest DIC value for White Hispanic patients is 19423.700. For the EWM, the Mean ± SD values of α, β, and λ are 6.338 ± 0.3607, 1.099 ± 0.01956, and 0.02083 ± 0.002238, respectively.

We obtained the posterior inference for the parameters of the breast cancer survival models by using the Bayesian method. The inferences for the posterior parameters, which have small Monte Carlo errors, are reported in Tables 2-5. The kernel densities for each of the parameters are reported in Figures 1-4 so that one can observe their shapes.

Statistical probability models are very important for describing inferences about posterior model parameters. To select the best statistical probability model for White Hispanic patients, we used the model selection criteria AIC, BIC, and DIC. The summary results of the posterior parameters are reported after running 50,000 Monte Carlo repetitions. These posterior results based on the breast cancer patients’ survival data are a new contribution for the White Hispanic ethnicity. WinBUGS software was used to compute the goodness of fit measures, to obtain the summary results of the posterior parameters, to determine the kernel densities of the parameters, and to carry out all related calculations.

Acknowledgements

The authors would like to thank the editor-in-chief and the referees for their valuable comments and suggestions.

References
