Journal of Biometrics & Biostatistics
ISSN: 2155-6180

A Pass to Variable Selection

Yixin Fang*

Department of Mathematical Sciences, New Jersey Institute of Technology, USA

*Corresponding Author:
Fang Y
Department of Mathematical Sciences
New Jersey Institute of Technology, USA
Tel: +1 973-596-3000
E-mail: [email protected]

Received Date: September 06, 2016; Accepted Date: September 30, 2016; Published Date: October 07, 2016

Citation: Fang Y (2016) A Pass to Variable Selection. J Biom Biostat 7: 318. doi:10.4172/2155-6180.1000318

Copyright: © 2016 Fang Y. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.


Introduction

Many regularized procedures produce sparse solutions and therefore are sometimes used for variable selection in linear regression. It has been shown that regularized procedures are more stable than subset selection. Such procedures include the LASSO, SCAD, and the adaptive LASSO, to name just a few. However, their performance depends crucially on the selection of the tuning parameter. For the purpose of prediction, popular methods for tuning parameter selection include Cp, cross-validation, and generalized cross-validation. For the purpose of variable selection, the most popular method is BIC. The selection consistency of BIC for some regularized procedures has been shown. (Here selection consistency means that the probability of selecting the data-generating model tends to one as the sample size goes to infinity, assuming that the data-generating model is a subset of the full model.) However, using BIC requires knowing the degrees of freedom, and for many regularized procedures, such as those for graphical models and clustering algorithms, no formula for the degrees of freedom exists.

Recently, stability selection has become another popular method for variable selection [1,2]. However, most stability-based methods depend explicitly on some hyper-tuning parameter. For example, the method in [1] depends on a threshold (pre-set as 0.8 in [1]), and the method in [2] also depends on a threshold (pre-set as 0.9 in [2]). It is therefore desirable to develop a method that avoids such hyper-tuning parameters. One suggestion is to combine the strengths of stability selection and cross-validation. Since cross-validation is a variable selection method based on prediction, the new method is referred to as prediction and stability selection (PASS).

Prediction and Stability Selection (PASS)

Consider variable selection in linear regression, $y_i = x_i'\beta + \epsilon_i$, $i = 1,\ldots,n$. Assume $\beta = (\beta_1,\ldots,\beta_p)'$ is sparse in the sense that $q = |\mathcal{A}| < p$, where $\mathcal{A} = \{j: \beta_j \neq 0\}$. Without loss of generality, assume $\mathcal{A} = \{1,\ldots,q\}$. A general framework for regularized regression is
$$\hat{\beta}(\lambda) = \arg\min_{\beta} \Big\{ \sum_{i=1}^{n} (y_i - x_i'\beta)^2 + \lambda \sum_{j=1}^{p} p(|\beta_j|) \Big\},$$
where $p(\cdot)$ is a penalty function. This framework includes the LASSO, SCAD, and the adaptive LASSO. If $\hat{\mathcal{A}}(\lambda) = \{j: \hat{\beta}_j(\lambda) \neq 0\}$ is used to estimate $\mathcal{A}$, most regularized procedures have been shown to be selection consistent with an appropriate choice of $\lambda = \lambda_n$, the subscript emphasizing its dependence on the data. In general, as shown in [3], there are five cases (illustrated by the code sketch after the list):

Case 1: If $\lambda_n/n \to \infty$, then $\hat{\beta}(\lambda_n) = 0$ with probability tending to one.

Case 2: If $\lambda_n/n \to c \in (0,\infty)$, then $\hat{\beta}(\lambda_n) \to_p \gamma_0$, where $\gamma_0$ is fixed and its sign pattern may or may not be the same as that of $\beta$.

Case 3: If $\lambda_n/n \to 0$ and $\lambda_n/\sqrt{n} \to \infty$, then $\hat{\beta}(\lambda_n) \to_p \beta$ and the sign pattern of $\hat{\beta}(\lambda_n)$ is consistent with that of $\beta$ with probability tending to one.

Case 4: If $\lambda_n/\sqrt{n} \to c \in (0,\infty)$, then the sign pattern of $\hat{\beta}(\lambda_n)$ is consistent with that of $\beta$ on $\mathcal{A}$ with probability tending to one, while for each full sign pattern consistent with that of $\beta$ on $\mathcal{A}$, the probability of obtaining that pattern tends to a limit in $(0,1)$.

Case 5: If $\lambda_n/\sqrt{n} \to 0$, then $\hat{\beta}(\lambda_n) \to_p \beta$ and $\hat{\mathcal{A}}(\lambda_n) = \{1,\ldots,p\}$ with probability tending to one.
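
As a concrete illustration of these regimes, here is a minimal sketch (not from the article; it assumes the LASSO as the regularized procedure and uses simulated data with scikit-learn's lasso_path) that traces how the active set $\hat{\mathcal{A}}(\lambda)$ grows from empty (the heavily penalized end, Case 1) toward the full variable set (the unpenalized end, Case 5) as $\lambda$ decreases:

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, q = 200, 10, 3
beta = np.zeros(p)
beta[:q] = [2.0, -1.5, 1.0]          # A = {0, 1, 2} are the informative variables
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

# lasso_path returns coefficients on a decreasing grid of penalty values.
lambdas, coefs, _ = lasso_path(X, y, n_alphas=20)

for lam, b in zip(lambdas, coefs.T):
    active = np.flatnonzero(np.abs(b) > 1e-10)
    print(f"lambda = {lam:8.4f}  A-hat(lambda) = {active.tolist()}")
```

For intermediate penalty values the printed active set typically equals the true set {0, 1, 2}, which is the Case 3 regime a good criterion should target.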

A good criterion should aim to select $\lambda_n$ from Case 3; selecting $\lambda_n$ from Cases 1 or 2 may lead to under-fitting, while selecting it from Cases 4 or 5 may lead to over-fitting. If the two degenerate cases (1 and 5) are pre-excluded, the criterion, referred to as PASS, incorporates cross-validation, which avoids under-fitting, and the Kappa selection proposed in [2], which avoids over-fitting. To describe this criterion, consider any regularized procedure with tuning parameter $\lambda$ and randomly partition the dataset $\{(y_1,x_1),\ldots,(y_n,x_n)\}$ into two halves, $Z_1$ and $Z_2$, each containing half of the observations. Based on $Z_k$, $\hat{\beta}^{(k)}(\lambda)$ is obtained and then the submodel $\hat{\mathcal{A}}^{(k)}(\lambda)$ is selected, $k = 1,2$.

If $\lambda$ is from Case 4, both submodels, $\hat{\mathcal{A}}^{(1)}(\lambda)$ and $\hat{\mathcal{A}}^{(2)}(\lambda)$, would include non-informative variables at random. The agreement of these two submodels can be measured by Cohen's Kappa coefficient, $\kappa(\hat{\mathcal{A}}^{(1)}(\lambda), \hat{\mathcal{A}}^{(2)}(\lambda))$. On the other hand, if $\lambda$ is from Case 2, either submodel might exclude some informative variables. To avoid such under-fitting, consider the cross-validation error, $\mathrm{CV}(Z_1, Z_2; \lambda)$. We are now ready to describe the PASS algorithm, which runs the following five steps.
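
As a sketch of the agreement measure (the 0/1 encoding of submodels below is an assumption; the article does not spell out the computation), Cohen's kappa can be computed by coding each submodel as an inclusion vector over the p candidate variables:

```python
import numpy as np

def kappa(A1, A2, p):
    """Cohen's kappa between two index sets A1, A2 over {0, ..., p-1}."""
    z1 = np.zeros(p)
    z1[list(A1)] = 1
    z2 = np.zeros(p)
    z2[list(A2)] = 1
    po = np.mean(z1 == z2)                       # observed agreement
    pe = (np.mean(z1) * np.mean(z2)              # agreement expected by chance
          + np.mean(1 - z1) * np.mean(1 - z2))
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

# Example: the two halves agree on variables 0 and 1 but disagree on 2 and 5.
print(kappa({0, 1, 2}, {0, 1, 5}, p=10))
```

Kappa equals 1 under perfect agreement and is near 0 when the two halves agree no more than chance, which is what happens when non-informative variables enter the submodels at random (Case 4).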

Step 1: Randomly partition the original dataset into two halves, $Z_1^{(b)}$ and $Z_2^{(b)}$.

Step 2: Based on $Z_1^{(b)}$ and $Z_2^{(b)}$ respectively, two submodels, $\hat{\mathcal{A}}^{(1b)}(\lambda)$ and $\hat{\mathcal{A}}^{(2b)}(\lambda)$, are selected.

Step 3: Calculate $\kappa(\hat{\mathcal{A}}^{(1b)}(\lambda), \hat{\mathcal{A}}^{(2b)}(\lambda))$ and $\mathrm{CV}(Z_1^{(b)}, Z_2^{(b)}; \lambda)$.

Step 4: Repeat Steps 1-3 $B$ times, for $b = 1,\ldots,B$, and obtain the following ratio,

$$\mathrm{PASS}(\lambda) = \frac{\sum_{b=1}^{B} \kappa\big(\hat{\mathcal{A}}^{(1b)}(\lambda), \hat{\mathcal{A}}^{(2b)}(\lambda)\big)}{\sum_{b=1}^{B} \mathrm{CV}\big(Z_1^{(b)}, Z_2^{(b)}; \lambda\big)}.$$

Step 5: Compute $\mathrm{PASS}(\lambda)$ on a grid of $\lambda$ values and select $\hat{\lambda} = \arg\max_{\lambda} \mathrm{PASS}(\lambda)$ (a runnable sketch of all five steps follows).
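
Below is a minimal, self-contained sketch of the five steps, assuming the LASSO as the regularized procedure, symmetric half-sample prediction error as CV, and the kappa-to-CV ratio above as PASS(λ); the function and variable names are illustrative, not from the article.

```python
import numpy as np
from sklearn.linear_model import Lasso

def kappa(A1, A2, p):
    """Cohen's kappa between two index sets, as in the sketch above."""
    z1 = np.zeros(p); z1[list(A1)] = 1
    z2 = np.zeros(p); z2[list(A2)] = 1
    po = np.mean(z1 == z2)
    pe = np.mean(z1) * np.mean(z2) + np.mean(1 - z1) * np.mean(1 - z2)
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

def cv_error(m1, m2, Z1, Z2):
    """Symmetric half-sample error: fit on one half, predict the other."""
    (X1, y1), (X2, y2) = Z1, Z2
    return 0.5 * (np.mean((y2 - m1.predict(X2)) ** 2)
                  + np.mean((y1 - m2.predict(X1)) ** 2))

def pass_select(X, y, lambdas, B=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = []
    for lam in lambdas:
        kap_sum, cv_sum = 0.0, 0.0
        for b in range(B):
            # Step 1: randomly partition the data into two halves Z1 and Z2.
            idx = rng.permutation(n)
            Z1 = (X[idx[: n // 2]], y[idx[: n // 2]])
            Z2 = (X[idx[n // 2 :]], y[idx[n // 2 :]])
            # Step 2: select a submodel on each half.
            m1, m2 = Lasso(alpha=lam).fit(*Z1), Lasso(alpha=lam).fit(*Z2)
            A1 = set(np.flatnonzero(np.abs(m1.coef_) > 1e-10))
            A2 = set(np.flatnonzero(np.abs(m2.coef_) > 1e-10))
            # Step 3: agreement between submodels and cross-validation error.
            kap_sum += kappa(A1, A2, p)
            cv_sum += cv_error(m1, m2, Z1, Z2)
        # Step 4: aggregate the B splits into the kappa-to-CV ratio.
        scores.append(kap_sum / cv_sum)
    # Step 5: select the lambda maximizing PASS over the grid.
    return lambdas[int(np.argmax(scores))], scores
```

In line with the remark that the two degenerate cases should be pre-excluded, the grid of $\lambda$ values passed to pass_select should exclude values so large that both submodels are empty (where kappa is trivially 1) and values near zero (where both submodels contain all variables).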

Discussion

Following the above five cases, we can show that the proposed PASS criterion is selection consistent under some regularity conditions. The new criterion has several advantages. First, it does not depend on any hyper-tuning parameter. Second, its implementation is straightforward. Third, it can be applied to variable selection in many models, such as the linear model, generalized linear models, and Cox's proportional hazards model. Fourth, it can be applied to variable selection in both supervised and unsupervised learning.

References
