ISSN: 0974-276X
Journal of Proteomics & Bioinformatics

Predicting secondary structure of Oxidoreductase protein family using Bayesian Regularization Feed-forward Backpropagation ANN Technique

Brijesh Singh Yadav2*, Mayank Pokhariyal1, Barkha Ratta1, Gaurava Rai1, Meeta Saxena1, Bhaskar Sharma1 and K.P.Mishra2
1Division of Biochemistry, Indian Veterinary Research Institute, Izatnagar, Bareilly, India 243122.
2Nehru Gram Bharti University, Allahabad, India 221505.
Corresponding Author : Brijesh Singh Yadav,
Nehru Gram Bharti University,
Allahabad, India 221505,
Email: brijeshbioinfo@gmail.com
Received April 09, 2010; Accepted May 19, 2010; Published May 19, 2010
Citation: Yadav BS, Pokhariyal M, Ratta B, Rai G, Saxena M, et al. (2010) Predicting Secondary Structure of Oxidoreductase Protein Family Using Bayesian Regularization Feed-forward Backpropagation ANN Technique. J Proteomics Bioinform 3: 179-182. doi:10.4172/jpb.1000137
Copyright: © 2010 Yadav BS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords
Neural network; Human oxidoreductase; Protein; Secondary structure
Introduction
The most important level of protein structure is the secondary structure, which is composed mainly of alpha helices, beta strands and coils formed from local sequences of amino acids (Branden and John, 1991). Knowing a protein’s secondary structure helps to determine its structural properties. Several methods have been developed to determine secondary structure, with varying accuracy. One method analyzes the X-ray diffraction patterns of crystallized proteins; although time-consuming, X-ray diffraction is extremely accurate (Qian and Sejnowski, 1988). Another method, structure homology or threading, uses an amino acid sequence with a known secondary structure as a model to predict the secondary structure of a similar sequence (Holley and Karplus, 1989). Various theoretical algorithms with high accuracy have also been proposed. Two of the most prominent are the DSSP and Chou-Fasman algorithms. DSSP determines secondary structure from the three-dimensional protein structure, using hydrogen bonds and various geometrical features (Kabsch and Sander, 1983). The Chou-Fasman algorithm, in contrast, predicts secondary structure from the primary sequence using many empirically determined rules (Chou and Fasman, 1974).
A more recent approach to secondary structure prediction is the use of neural networks, which have been found to have respectable accuracy (Qian and Sejnowski, 1988). These information-processing systems consist of a large number of simple interconnected processing units that operate in parallel. The units are arranged in three types of layers: the input layer, the output layer and, in some cases, hidden layers between the input and output layers (Laurence, 1994). Each unit has an internal activation state that fluctuates according to the unit’s input: excitatory input increases the activation, while inhibitory input decreases it (Khanna, 1990). By changing the activation of the units, neural networks are capable of learning by assimilating past inputs into the activation of each unit (Rumelhart and McClelland, 1986). This capability has led to numerous applications in areas such as signal processing and pattern and speech recognition (Laurence, 1994). An earlier study aimed to develop an improved, fully automated method for predicting the physicochemical properties of catalytic residues of structural proteins in the PDB, using a carefully selected, supervised machine-learning backpropagation algorithm coupled with an optimal discriminative set of structural protein properties. That study helps in the de novo prediction of the properties of functional sites of proteins (Yadav et al., 2009). The prediction of β-turns is an important element of protein secondary structure prediction. A highly accurate neural-network-based method, BetaTPred2, has been developed for predicting β-turns in proteins using position-specific scoring matrices (PSSM) generated by PSI-BLAST and secondary structure information predicted by PSIPRED (Harpreet and Raghava, 2004).
Palodeto et al. (2009) evaluated the effects of an imbalanced data set on the training and learning of neural networks applied to protein secondary structure prediction, and applied resampling methods to tackle the class imbalance problem. Their results show that imbalanced data sets decrease the prediction rates for helices, although the distribution of the protein data set does not significantly affect the global accuracy (Q3).
Murvai et al. (2001) describe an artificial neural network (ANN) solution for the recognition of domains in protein sequences. A query sequence is first compared to a reference database of domain sequences by use of BLAST, and the output data, encoded in the form of six parameters, are forwarded to feed-forward artificial neural networks with six input and six hidden units with a sigmoidal transfer function. The recognition is based on the distribution of BLAST scores precomputed for the known domain groups in a database-versus-database comparison.
In this paper, an attempt has been made to improve the predictive capabilities of neural networks for protein secondary structure by using the predictions made by the DSSP and Chou-Fasman algorithms. With this intent, we evaluate the accuracy that the network achieves, while using information obtained from either the DSSP or the Chou-Fasman algorithm, in determining whether or not the secondary structure prediction for a given amino acid sequence is valid. The architecture of the network, which was constructed using the Bayesian Regularization Backpropagation function of MATLAB 7.0, is described, and the results produced by the network are discussed and statistically analyzed (Figure 1).
Materials and Methods
Network design
The neural network used in this investigation consists of a twenty-five-unit input layer and a three-unit output layer. No hidden layers were incorporated into the network, following the conclusion of Qian and Sejnowski (1988) that the peak performance of their network in determining protein secondary structure was nearly independent of the number of hidden units. Furthermore, the network uses a feed-forward design, in which signals are transferred forward from the input units to the output units (Kneller et al., 1990).
The twenty-five units in the input layer encode a window of twenty-five residues of an amino acid sequence, composed of the central residue and twelve residues on either side of it. The three output units represent the prediction made by the neural network as to whether the central residue belongs to an alpha helix, a beta sheet or a coil.
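The windowing described above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB program: the function name `make_windows` and the `-` padding symbol used at the ends of the sequence are assumptions, since the text does not say how residues near the termini were handled.

```python
def make_windows(sequence, half_width=12, pad="-"):
    """Return one window of 2 * half_width + 1 residues per sequence position.

    The padding symbol is a hypothetical choice; the paper does not state
    how windows that run past the ends of the sequence were filled.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    # Window i is centred on residue i of the original sequence.
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy sequence used only for illustration.
windows = make_windows("MKTAYIAKQR")
```

Each window is twenty-five characters long, and the central residue sits at index 12 of the window.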
Each unit i has an activation Xi and an output Yi, which is a real value between 0 and 1. The strength of the connection, or weight, between a unit j and another unit i is represented by a real number Wij. The activation of unit i is calculated by summing the products of every connected unit’s output Yj and weight Wij and then adding a bias term bi:
$$X_i = \sum_j W_{ij} Y_j + b_i$$
Having calculated the activation Xi for a unit, the output of that unit, Yi, can be computed using the logistic sigmoid function
$$Y_i = \frac{1}{1 + e^{-X_i}}$$
and then propagated to the next layer of the neural network.
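As a minimal sketch of this computation, assuming a single layer in which every output unit is connected to every input, the weighted sum and logistic sigmoid can be written in Python as below. The function names and the toy weights are illustrative only and do not come from the paper.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: maps any real activation to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, biases):
    """Compute the output of each unit from its weighted inputs plus a bias.

    weights[i][j] is the weight from input j to unit i; biases[i] is the
    bias of unit i.
    """
    outputs = []
    for w_row, bias in zip(weights, biases):
        activation = sum(w * y for w, y in zip(w_row, inputs)) + bias
        outputs.append(sigmoid(activation))
    return outputs


# Toy values chosen so that both activations are exactly zero.
result = forward([0.0, 1.0], [[0.5, -0.5], [1.0, 1.0]], [0.5, -1.0])
```

With these toy values both activations are zero, so both outputs equal 0.5.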
During each cycle, the inputs are presented to the network. The weights of the units are adjusted at the end of the cycle, and this procedure is repeated. Back-propagation, a type of learning algorithm, is used to optimize the adjustment of the weights. This form of supervised training, in which the desired output is presented to the network along with the inputs (Laurence, 1994), was used to train the neural network as shown in Figure 2. 
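The weight adjustment can be illustrated with a plain gradient-descent update for a network without hidden layers. This sketch is a simplification: it minimizes squared error only and does not implement the Bayesian regularization used by the MATLAB training function named in this paper. The learning rate, function names and toy data are assumptions.

```python
import math


def unit_output(inputs, w_row, bias):
    """Sigmoid output of one unit given its inputs, weights and bias."""
    x = sum(w * y for w, y in zip(w_row, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-x))


def train_step(inputs, targets, weights, biases, rate=0.5):
    """One supervised update: move each weight against the error gradient."""
    for i, target in enumerate(targets):
        y = unit_output(inputs, weights[i], biases[i])
        # Derivative of the squared error with respect to the activation.
        delta = (y - target) * y * (1.0 - y)
        for j, value in enumerate(inputs):
            weights[i][j] -= rate * delta * value
        biases[i] -= rate * delta


# Toy problem: one output unit that should respond with 1 to the input (1, 0).
weights = [[0.0, 0.0]]
biases = [0.0]
before = unit_output([1.0, 0.0], weights[0], biases[0])
for _ in range(100):
    train_step([1.0, 0.0], [1.0], weights, biases)
after = unit_output([1.0, 0.0], weights[0], biases[0])
```

After the repeated updates the output for the training input lies closer to the desired value than it did before training.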
Network training and testing sets
To train and test the neural network, the amino acid sequences of human oxidoreductase proteins were obtained from the Protein Data Bank (PDB) at Brookhaven National Laboratory (Bernstein et al., 1977), as shown in Table 1.
The input patterns for propagation through the network were created by a MATLAB program. The program first parses the secondary structure predictions and identifies the residues for which the predictions differ, each prediction being one of alpha helix, beta sheet or coil. Next, it takes each identified residue together with the twelve residues on either side of it and converts those twenty-five values, using a numerical encoding scheme, into a format understandable by the network. Table 2 shows the scheme that was used to convert the amino acids into numerical format.
Table 3 shows the scheme that was used to convert the secondary structure predictions into numerical format.
At this point, we have a comprehensive set of input patterns for a particular protein. Finally, the program creates input patterns by selecting an element for each of the twenty-five inputs from all of the elements with the same input positions in the input patterns that have just been determined.
The training and testing sets were compiled from Human Transferase family proteins.
Results and Data
During each of the tests, the neural network was trained for a maximum of one hundred epochs using the appropriate training set before predictions were made for the testing set. A typical training set contains over n windows per protein chain, comprising more than n × l training patterns in total, where l is the length of the sequence. A typical architecture is a fully connected network (25 inputs, 3 outputs). The prediction is determined by the strongest of the three network outputs. For example, the output (-1.83e-15, -1.07e-16, 1) is taken to be a coil prediction. The results of the tests are shown in Table 4.
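Selecting the strongest output can be sketched as below. The ordering of the three outputs as (helix, strand, coil) is an assumption: the example in the text only establishes that the third output corresponds to coil.

```python
# Assumed ordering of the three output units; only the position of "coil"
# is supported by the worked example in the text.
CLASSES = ("helix", "strand", "coil")


def predict_class(outputs):
    """Return the class whose output unit has the largest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return CLASSES[best]


prediction = predict_class((-1.83e-15, -1.07e-16, 1))
```

For the output quoted in the text, the third value is the largest, so the sketch returns the coil class.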
To measure the performance of the neural network, the correlation coefficient for each target class was calculated as follows:
$$C_h = \frac{pn - uo}{\sqrt{(n + u)(n + o)(p + u)(p + o)}}$$
where Ch = correlation coefficient for helix
p = patterns correctly assigned to helix
n = patterns correctly assigned to non-helix
o = patterns incorrectly assigned to helix
u = patterns incorrectly assigned to non-helix
The correlation coefficient for helix (Ch) was found to be 0.54; for strand (Ce) it was 0.83, and for coil (Cc) it was 0.42.
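A per-class correlation coefficient of this kind can be computed as sketched below, assuming the Matthews form used by Qian and Sejnowski (1988), C = (pn - uo) / sqrt((n + u)(n + o)(p + u)(p + o)). The counts in the example are invented for illustration and are not results from this study.

```python
import math


def correlation_coefficient(p, n, o, u):
    """Matthews-style correlation for one class.

    p: patterns correctly assigned to the class
    n: patterns correctly assigned to not-the-class
    o: patterns incorrectly assigned to the class (over-predicted)
    u: patterns incorrectly assigned to not-the-class (under-predicted)
    """
    denominator = math.sqrt((n + u) * (n + o) * (p + u) * (p + o))
    if denominator == 0:
        # Undefined when a row or column of the confusion table is empty;
        # returning 0.0 here is a convention, not part of the paper.
        return 0.0
    return (p * n - u * o) / denominator


# Invented counts, for illustration only.
example = correlation_coefficient(p=40, n=40, o=10, u=10)
```

A value of 1 indicates perfect agreement between predicted and observed classes, and a value near 0 indicates no better than chance.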
Discussion
Perfect prediction of protein secondary structure is probably impossible for a variety of reasons, including the fact that a conformation may also depend on environmental variables related to solvent, acidity, hydrophobicity, hydrophilicity and so forth. It is, however, encouraging to observe that steady progress is being made in this area, with an increasing number of secondary structures being predicted in the structural databases and steady improvement of classification and machine learning methods. Here, a neural network architecture has been developed that predicts the secondary structure of proteins with almost 79% correct predictions. This neural network requires a shorter training time than fully connected networks with the same number of units. Analysis of the results demonstrates that a better multi-expert system architecture with different representation schemes can yield a better and more promising solution.
Work is also under way to improve the sequence-to-structure prediction produced by the neural network, which can also be applied to other protein families.
References
  1. Branden C, John T (1991) Introduction to Protein Structure. Garland Publishing, New York 11-31.

  2. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF Jr., Brice MD, et al. (1977) The Protein Data Bank: A Computer-based Archival File for Macromolecular Structures. J Mol Biol 80: 319-24.

  3. Chou PY, Fasman GD (1974) Prediction of Protein Conformation. Biochemistry 13: 222-245.

  4. Harpreet K, Raghava GPS (2004) A neural network method for prediction of β-turn types in proteins using evolutionary information. Bioinformatics 20: 2751-8.

  5. Holley LH, Karplus M (1989) Protein Secondary Structure Prediction With a Neural Network. Proc Natl Acad Sci USA 86: 152-156.

  6. Kabsch W, Sander C (1983) Dictionary of Secondary Structure Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22: 2577-2637.

  7. Khanna T (1990) Foundations of Neural Networks. Addison-Wesley Book Express USA.

  8. Kneller DG, Cohen FE, Langridge R (1990) Improvements in Protein Secondary Structure Prediction by an Enhanced Neural Network. J Mol Biol 214: 171-182.

  9. Laurence V (1994) Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice-Hall, Englewood Cliffs, NJ.

  10. Murvai J, Vlahovicek K, Szepesvári C, Pongor S (2001) Prediction of Protein Functional Domains from Sequences Using Artificial Neural Networks. Genome Res 11: 1410-7.

  11. Palodeto V, Terenzi H, Marques JLB (2009) Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set. Springer Berlin Heidelberg 5755/2009: 258-265.

  12. Qian N, Sejnowski TJ (1988) Predicting the Secondary Structure of Globular Proteins Using Neural Network Models. J Mol Biol 202: 865-884.

  13. Rumelhart DE, McClelland JL (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA 1: 45-76.

  14. Yadav BS, Gupta S, Mishra KP (2009) Prediction of Biochemical Properties of Protein Active Site Residues with ANN Classifier. Journal of Scholar Research Library 1: 8-17.
 