NRPred-FS: A Feature Selection based Two-level Predictor for Nuclear Receptors
*Corresponding Author:
Xuan Xiao
Jing-De-Zhen Ceramic Institute
Jing-De-Zhen 333403, China
E-mail: [email protected]
Received Date: January 21, 2014; Accepted Date: February 24, 2014; Published Date: February 28, 2014
Citation: Wang P, Xiao X (2014) NRPred-FS: A Feature Selection based Two-level Predictor for Nuclear Receptors. J Proteomics Bioinform S9:002. doi:10.4172/jpb.S9-002
Copyright: © 2014 Wang P, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Motivation: Nuclear receptors (NRs) play a role in all developmental and physiological processes and are important drug targets across a wide variety of disease and healthy states. In recent years, many machine learning methods have been introduced to identify NRs and their subfamilies with high throughput and at low cost. However, these predictors were all developed on older datasets from NucleaRDB, and none of them employs a feature selection technique, so their performance is limited.
Result: In this study, a feature selection based two-level predictor, called NRPred-FS, is developed. At the first level, it identifies whether a query protein is a nuclear receptor based on its sequence information alone. If it is, the prediction automatically continues to the second level, which assigns the protein to one of the following eight subfamilies: (1) Thyroid hormone like (NR1), (2) HNF4-like (NR2), (3) Estrogen like (NR3), (4) Nerve growth factor IB-like (NR4), (5) Fushi tarazu-F1 like (NR5), (6) Germ cell nuclear factor like (NR6), (7) knirps like (NR0A), and (8) DAX like (NR0B). The nuclear receptor sequences are encoded as sequence-derived feature vectors that incorporate various physicochemical and statistical features. The feature set is then optimized by a forward feature selection algorithm to reduce the feature dimension and improve classification accuracy. As a demonstration, the method was rigorously tested on a benchmark dataset derived from the latest versions of NucleaRDB and UniProt. The overall prediction accuracies in leave-one-out cross-validation were about 97% at the first level and 93% at the second level. For the convenience of users, NRPred-FS is freely accessible at https://www.jci-bioinfo.cn/NRPred-FS. We hope it will be a useful tool for identifying NRs and their subfamilies.
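To make the feature selection step concrete, the following is a minimal sketch of greedy forward feature selection scored by leave-one-out cross-validation. It is an illustration under stated assumptions, not the NRPred-FS implementation: the 1-nearest-neighbour classifier, the stopping rule (stop when no candidate feature improves accuracy), the function names `loo_accuracy` and `forward_select`, and the toy data are all hypothetical and are not taken from the paper.

```python
import random


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices (classifier is an assumption)."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if j == i:
                continue  # hold out sample i
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_select(X, y):
    """Greedily add the feature that most improves leave-one-out accuracy;
    stop as soon as no remaining feature gives an improvement."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Highest accuracy wins; ties go to the lowest feature index.
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Toy data (hypothetical): feature 0 separates the two classes,
# features 1 and 2 are uniform noise.
rng = random.Random(0)
X, y = [], []
for label in (0, 1):
    for k in range(10):
        X.append([label + 0.01 * k, rng.random(), rng.random()])
        y.append(label)

selected, best_acc = forward_select(X, y)
print(selected, best_acc)
```

On this toy data the search keeps only the informative feature, because adding either noise feature cannot raise the leave-one-out accuracy any further. In a two-level setting such as the one described above, the same procedure could be run separately for the NR versus non-NR task and for the subfamily task, although whether the paper does so is not stated in this section.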