Abilities of Statistical Models to Identify Subjects with Ghost Prognosis Factors | OMICS International | Abstract
ISSN: 2380-5439

Journal of Health Education Research & Development
Open Access

Research Article

Abilities of Statistical Models to Identify Subjects with Ghost Prognosis Factors

Nguyen JM (1,2,3)*, Gaultier A (1) and Antonioli D (3)
(1) SEB, CHU Nantes, 85 Rue Saint Jacques, 44093 Nantes Cedex 01, France
(2) INSERM, UMR892, 8 quai Moncousu, BP 70721, 44007 Nantes Cedex 01, France
(3) HWRS, Atlanpôle, Route de Gachet, 44300 Nantes, France
Corresponding Author : Nguyen JM
Rue Saint Jacques 44093 Nantes Cedex 01, France
Tel: +33-2-40-08-33-33
Received: October 22, 2015 Accepted: November 04, 2015 Published: November 06, 2015
Citation: Nguyen JM, Gaultier A, Antonioli D (2015) Abilities of Statistical Models to Identify Subjects with Ghost Prognosis Factors. J Health Edu Res Dev 3:141. doi:10.4172/2380-5439.1000141
Copyright: © 2015 Nguyen JM, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



Many tools are available to estimate prediction quality, but none assess the ability of a predictive model to identify completely missing or unknown prognostic factors, here called ghost factors (GFs). It may nevertheless be possible to predict whether a given subject carries a GF.


To simulate the presence of a GF, a significant prognostic factor and all variables correlated with it were removed before model analysis. Both public datasets and simulated data were used. A predictive statistical model was then developed to relate GF presence to the predictive capacity of a given model, based on the correlation between the predicted outcome and GF presence. Five statistical models were compared using this procedure.
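The removal step and the check against GF presence can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses ordinary least squares as the "given model", treats the residual (observed minus predicted outcome) as the measure of predictive capacity, and summarizes its link to GF presence with a point-biserial correlation. The simulated variables and the helper names `simulate` and `ghost_factor_signal` are hypothetical.

```python
import numpy as np


def simulate(n=2000, seed=0):
    """Hypothetical data: outcome y depends on x1, x2 and a binary GF."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    gf = rng.binomial(1, 0.3, size=n).astype(float)   # GF carrier status (0/1)
    x3 = gf + rng.normal(scale=0.5, size=n)           # observed variable correlated with the GF
    y = 1.0 * x1 - 0.5 * x2 + 2.0 * gf + rng.normal(scale=0.5, size=n)
    # Columns: 0 = x1, 1 = x2, 2 = x3 (GF-correlated), 3 = the GF itself
    X = np.column_stack([x1, x2, x3, gf])
    return X, y, gf


def ghost_factor_signal(X, y, gf, drop_cols):
    """Drop the GF and its correlated columns, fit OLS, and correlate residuals with GF presence."""
    keep = [j for j in range(X.shape[1]) if j not in drop_cols]
    design = np.column_stack([np.ones(len(y)), X[:, keep]])  # intercept + retained predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta                               # observed minus predicted outcome
    return np.corrcoef(residual, gf)[0, 1]                     # point-biserial correlation


X, y, gf = simulate()
print(ghost_factor_signal(X, y, gf, drop_cols=[2, 3]))  # GF and correlated variable removed
print(ghost_factor_signal(X, y, gf, drop_cols=[]))      # everything kept
```

With the GF and its correlated variable removed, the residuals are strongly correlated with carrier status, so carriers stand out through poorer predictions. When everything is kept, OLS residuals are orthogonal to every regressor, including the GF column, so the correlation is zero up to rounding.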


Across the 6 real databases evaluated, optimized regression models were the only statistical method consistently able to identify subjects with GFs. With simulated, linearly correlated data, optimized regression models reached success rates of up to 92%, whereas conventional linear models succeeded in less than 53% of cases. Random forest and classification tree models had the highest success rates of all the evaluated models.


Model-based outcome prediction was assessed with respect to the presence of GFs. Because GFs are unknown, the factors themselves cannot be identified; only the subjects who carry significant unknown prognostic factors can be. Because complex models outperformed linear models at identifying GF presence, we assume that the associations between GFs and the outcome-predictive factors are also complex rather than linear.
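The reasoning about nonlinear associations can be illustrated with a toy example. In this hypothetical setup (not taken from the article's data), GF carriers are the subjects in both tails of an observed factor `x1`. A linear score in `x1` is then almost uncorrelated with carrier status, while a nonlinear transform such as `|x1|`, the kind of pattern a classification tree can capture with two splits, recovers it. The function name `carrier_correlations` is hypothetical.

```python
import numpy as np


def carrier_correlations(n=5000, seed=1):
    """Toy case: GF carriers lie in both tails of an observed factor x1 (hypothetical)."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                          # observed prognostic factor
    gf = (np.abs(x1) > 1.0).astype(float)            # carrier if |x1| > 1
    linear = np.corrcoef(x1, gf)[0, 1]               # association visible to a linear score in x1
    nonlinear = np.corrcoef(np.abs(x1), gf)[0, 1]    # association visible after a nonlinear transform
    return linear, nonlinear


lin, nonlin = carrier_correlations()
print(f"linear: {lin:.3f}, nonlinear: {nonlin:.3f}")
```

By symmetry the linear correlation is close to zero, whereas the correlation with `|x1|` is large, which is consistent with linear models missing carriers that more flexible models can detect.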