ISSN: 0974-276X
Journal of Proteomics & Bioinformatics
Exploring the Interplay of Sequence and Structural Features in Determining the Flexibility of AGC Kinase Protein Family: A Bioinformatics Approach

Amit Kumar Banerjee, Neelima Arora, Varakantham Pranitha, U.S.N.Murty*
Bioinformatics Group, Biology Division, Indian Institute of Chemical Technology, Hyderabad-500607, A.P., India.
Corresponding Author: Dr. U.S.N. Murty, Deputy Director/Scientist “F”, Head,
Biology Division, Indian Institute of Chemical Technology,
Hyderabad-500607, India,
Tel: +91 40 27193134,
Fax: +91 40 27193227,
E-mail: murty_usn@yahoo.com
Received May 02, 2008; Accepted May 15, 2008; Published May 20, 2008
Citation: Amit KB, Neelima A, Varakantham P, Murty USN (2008) Exploring the Interplay of Sequence and Structural Features in Determining the Flexibility of AGC Kinase Protein Family: A Bioinformatics Approach. J Proteomics Bioinform 1:077-089. doi:10.4172/jpb.1000013
Copyright: © 2008 Amit KB, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

In this study, a data mining approach was used to generate association rules for predicting average flexibility from various derived sequence and structural features. Twenty-one parameters were calculated for 115 AGC kinase family sequences from mouse and human, and their variable importance was determined using Classification and Regression Tree (CART). Beta turns were found to have the maximum influence on average flexibility, while total beta strands were found to exert the minimum impact. Understanding variable importance should prove useful as a simple predictor of flexibility from an amino acid sequence. This will aid in better understanding of the phenomena underlying average flexibility and thus pave the way for rational design of therapeutics.

Keywords
AGC kinase; Protein flexibility; Data mining; Classification and regression tree (CART); Bioinformatics
Introduction
Every biological molecule is characterized and set apart from other biomolecules by a definite set of intrinsic properties. Average flexibility holds prime importance because it determines vital functions such as transport of metabolites (Anderson et al., 1990; Spurlino et al., 1991), catalysis (Bennett and Steitz, 1978; Remington et al., 1982) and regulation of protein activity (Perutz, 1970; Perutz, 1989). Eukaryotic proteins demonstrate higher flexibility, which influences the conformational adaptability required in important biological processes like molecular recognition, interaction, assembly and modification. Moreover, protein flexibility is also known to influence stability and folding. There has been a sudden spurt of interest in studies of protein flexibility owing to the discovery that some highly flexible proteins have implications in life-threatening diseases like AIDS (HIV gp41) and scrapie (Chan et al., 1997). A comprehensive knowledge of the fundamental nature of average flexibility will facilitate the unraveling of structure-function relationships and will also aid in the development of novel therapeutics (Teague, 2003).
The AGC protein kinase family, one of the eight ePK families defined in Kinbase, includes many important enzymes such as cyclic nucleotide and calcium-phospholipid dependent kinases, ribosomal S6-phosphorylating kinases, G protein-coupled kinases, and a few others. The AGC serine/threonine kinases, known for phosphorylating sites surrounded by basic amino acids, are involved in many intracellular signaling pathways and critical cellular processes, and control cell growth, differentiation and cell survival. Their crucial role in transmembrane signaling hints at the importance of features of AGC kinases that may be responsible for membrane localization (Peterson and Schreiber, 1999). This group of protein kinases shares similarity within the catalytic domain and is characterized by a similar mechanism of activation. Deregulation of AGC kinases is known to have implications in several diseases such as cancer, diabetes and neurodegeneration, and thus AGC kinases represent attractive targets for small-molecule inhibitors of therapeutic significance (Breitenlechner et al., 2003).
Their stringent spatio-temporal regulation is attained through loop phosphorylation and repositioning of the key catalytic and substrate-binding regions, which indicates the importance of flexibility in these proteins (Kannan et al., 2007). The literature on protein flexibility is extensive, but elucidating the effect of the parameters influencing it remains cumbersome. This study aims to explore the importance of different parameters influencing the average flexibility of the AGC kinase family using a data mining approach.
Materials and Methods
Sequence Collection and Pre-Processing
Protein sequences of the enzymes belonging to the AGC family of the protein kinase superfamily were collected in FASTA format from the non-redundant (NR) protein database of NCBI (http://www.ncbi.nlm.nih.gov). Partial sequences were excluded from the study, and the remaining sequences were filtered manually to minimize redundancy. This filtering yielded 600 sequences from the 1259 AGC family sequences available in the database. Of these, the sequences belonging to Homo sapiens (59) and Mus musculus (56) were considered for this study.
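The collection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes that FASTA headers carry the organism in square brackets and the word "partial" for incomplete sequences, and it replaces the manual redundancy filtering described above with simple exact-duplicate removal. The function names and the demo records are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(
    records: Iterable[Record],
    organisms: Tuple[str, ...] = ("Homo sapiens", "Mus musculus"),
) -> Dict[str, List[Record]]:
    """Drop partial and exactly duplicated sequences, then group by organism."""
    seen = set()
    kept: Dict[str, List[Record]] = {org: [] for org in organisms}
    for header, seq in records:
        if "partial" in header.lower():  # assumption: partial entries say so
            continue
        if seq in seen:  # stand-in for the manual redundancy filter
            continue
        seen.add(seq)
        for org in organisms:
            if f"[{org}]" in header:  # assumption: "[Organism]" in the header
                kept[org].append((header, seq))
                break
    return kept


# Hypothetical demo records, not real database entries.
demo = """>demo1 kinase A [Homo sapiens]
MKT
AAG
>demo2 kinase A, partial [Homo sapiens]
MKT
>demo3 kinase B [Mus musculus]
MLLP
>demo4 kinase A copy [Homo sapiens]
MKTAAG
>demo5 kinase C [Rattus norvegicus]
MVVV
"""
kept = filter_records(parse_fasta(demo))
for organism, recs in kept.items():
    print(organism, len(recs))
```

Exact-duplicate removal is the weakest possible redundancy filter; a similarity-based clustering step would be closer in spirit to manual curation.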
Rules derived from CART can be read as simple “If” and “Then” statements and are thus self-explanatory. For example, Rule 1 and Rule 14 can be interpreted as:
Rule 1: IF “BULKINESS <= 14.2207” & “ALPHA-HELIX <= 1.01975” & “A.A. COMPOSITION <= 5.55”, THEN “AVERAGE FLEXIBILITY = 0.457”.
Rule 14: IF “RECOGNITION FACTORS <= 89.4723” & “TRANSMEMBRANE TENDENCY <= -54225” & “ALPHA-HELIX > 1.01975” & “TOTAL BETA-STRAND > 0.95975 & <= 1.018” & “A.A. COMPOSITION <= 6.0055” & “RELATIVE MUTABILITY <= 80.0835”, THEN “AVERAGE FLEXIBILITY = 0.436563”.
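Because each rule is a conjunction of threshold tests, it maps directly onto code. The sketch below encodes Rule 1 with the thresholds quoted above. The dictionary keys, the `predict` helper and the example feature values are hypothetical names and numbers chosen for illustration; they are not part of the CART output.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]
Condition = Callable[[Features], bool]
Rule = Tuple[List[Condition], float]  # (conditions, predicted average flexibility)

# Rule 1 as quoted in the text; the key names are assumptions.
RULE_1: Rule = (
    [
        lambda f: f["bulkiness"] <= 14.2207,
        lambda f: f["alpha_helix"] <= 1.01975,
        lambda f: f["aa_composition"] <= 5.55,
    ],
    0.457,
)


def predict(features: Features, rules: List[Rule]) -> Optional[float]:
    """Return the prediction of the first rule whose conditions all hold."""
    for conditions, value in rules:
        if all(condition(features) for condition in conditions):
            return value
    return None  # no rule in the list covers this protein


# Hypothetical feature values for two proteins.
matching = {"bulkiness": 14.0, "alpha_helix": 1.0, "aa_composition": 5.0}
non_matching = {"bulkiness": 15.0, "alpha_helix": 1.0, "aa_composition": 5.0}

print(predict(matching, [RULE_1]))
print(predict(non_matching, [RULE_1]))
```

In a complete tree the rules partition the feature space, so exactly one rule fires for any protein; with only Rule 1 loaded, uncovered inputs return `None`.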
Variable Importance
The importance of the different variables was calculated from the predefined scores in CART and is summarized in Table 4.
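To give a feel for what such a score measures, the following Python sketch ranks features by the largest reduction in squared error that a single threshold split on each feature achieves, which is the least-squares splitting criterion of regression trees. It is a deliberately simplified, root-node-only illustration: the scores reported by the CART software aggregate improvements over the whole tree, so this is not the scoring used in the study. The function names and the toy data are invented for the example.

```python
from typing import Dict, List, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest drop in squared error from one threshold split on feature x."""
    parent = sse(y)
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot place a threshold between equal feature values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        best = max(best, parent - sse(left) - sse(right))
    return best


def rank_features(
    table: Dict[str, List[float]], target: List[float]
) -> List[Tuple[str, float]]:
    """Rank features by split gain, scaled so that the top feature scores 100."""
    gains = {name: best_split_gain(col, target) for name, col in table.items()}
    top = max(gains.values()) or 1.0  # avoid division by zero when no split helps
    scored = [(name, 100.0 * gain / top) for name, gain in gains.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy data: four proteins, two features, one target.
toy_features = {
    "beta_turn": [0.90, 0.95, 1.05, 1.10],
    "noise": [1.0, 2.0, 1.0, 2.0],
}
toy_flexibility = [0.40, 0.41, 0.47, 0.48]

ranking = rank_features(toy_features, toy_flexibility)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

Scaling the scores so that the most important variable receives 100 is a common reporting convention and is assumed here only for readability.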
Discussion
The dynamic nature of proteins, conferred by their structural flexibility, is associated with function. Average flexibility, an innate property of proteins, has recently been recognized to have implications in many important physiological processes (Wright and Dyson, 1999; Bright et al., 2001; Dunker et al., 2001; Namba, 2001). The recognition of several highly flexible proteins in some pathological conditions has given momentum to studies of protein flexibility. The huge gap between the number of known sequences and the number of structures in the PDB limits the use of 3-dimensional structures for deriving features that affect flexibility, such as B-factors. When such data are unavailable, sequence composition and secondary structure provide a rough estimate of structural properties. This warrants an alternative and simple approach for expressing the effect of various parameters on average flexibility as an easy-to-understand quantitative relationship. Data mining approaches based on decision trees have been successfully exploited in elucidating the importance of features affecting important biological processes (Banerjee et al., 2007). CART has been exploited in microarray studies (Boulesteix et al., 2003), ecological studies (De’ath and Fabricius, 2000), risk prediction (Gottschalk et al., 1998), disease diagnosis (Hermanek and Guggenmoos-Holzmann, 1994) and social studies (Özge et al., 2006).
The dataset comprising the various derived features was used to elucidate decision rules by CART that can serve as rules of thumb for finding the effect of different parameters on average flexibility, effects that are virtually impossible to measure simultaneously in a laboratory using conventional approaches. Among the secondary structure features, beta turn, alpha helix, coil, parallel beta strand, beta sheet and total beta strands were found to influence the average flexibility in descending order. Among sequence features, % accessible residues, transmembrane tendency, amino acid composition, bulkiness, recognition factors, molecular weight, polarity, hydrophobicity, average area buried, refractivity, no. of codons, % buried residues, and relative mutability were observed to affect the average flexibility in decreasing order (Table 4). Beta turns were found to have the maximum impact, while total beta strands were found to have the minimum effect, on the average flexibility of the proteins considered in the study. As more and more studies advocate the inclusion of protein flexibility in docking algorithms, it will be interesting to gain insight into the features influencing the flexibility of proteins. It is speculated that an extensive knowledge of protein flexibility and the various parameters contributing to it is important for rational drug design. Such an approach will lead to a better understanding of the underlying biological phenomena and aid in enzyme engineering.
Acknowledgements
The authors thank Dr. J.S. Yadav, Director, IICT, for his continuous support and encouragement. We thank the anonymous reviewers for their critical suggestions for the improvement of the manuscript.
References

  1. Anderson BF, Baker HM, Morris GE, Rumball SV, Baker EN (1990) Apolactoferrin structure demonstrates ligand-induced conformational change in transferrins. Nature 344: 784–787.

  2. Banerjee AK, Arora N, Murty USN (2007) Stability of ITS2 Secondary Structure in Anopheles: What Lies Beneath? International Journal of Integrative Biology 3: 232–238.

  3. Bennett WS Jr, Steitz TA (1978) Glucose-induced conformational change in yeast hexokinase. Proc Natl Acad Sci USA 75: 4848–4852.

  4. Bhaskaran R, Ponnuswamy PK (1988) Positional flexibilities of amino acid residues in globular proteins. Int J Pept Prot Res 32: 242–255.

  5. Boulesteix AL, Tutz G, Strimmer K (2003) A CART-based approach to discover emerging patterns in microarray data. Bioinformatics 19: 2465–2472.

  6. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall, New York, NY.

  7. Breitenlechner C, Gaßel M, Engh R, Bossemeyer D (2003) Structural Insights Into AGC Kinase Inhibition. Oncology Research Featuring Preclinical and Clinical Cancer Therapeutics 14: 267–278.

  8. Bright JN, Woolf TB, Hoh JH (2001) Predicting properties of intrinsically unstructured proteins. Prog Biophys Mol Biol 76: 131–173.

  9. Chan DC, Fass D, Berger JM, Kim PS (1997) Core structure of gp41 from the HIV envelope glycoprotein. Cell 89: 263–273.

  10. Chou PY, Fasman GD (1978) Prediction of the secondary structure of proteins from their amino acid sequence. Adv Enzymol Relat Areas Mol Biol 47: 45–148.

  11. Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in protein; in: M.O. Dayhoff (Ed.), Atlas of Protein Sequence and Structure, Nat. Biomed. Res. Foundation, Washington DC 5 Suppl 3: 345–352.

  12. De’ath G, Fabricius KE (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81: 3178–3192.

  13. Deléage G, Roux B (1987) An algorithm for protein secondary structure prediction based on class prediction. Protein Engineering Design and Selection 1: 289–294.

  14. Dunker AK, Lawson DJ, Brown CJ, Williams RM, Romero P, et al. (2001) Intrinsically disordered protein. J Mol Graph Model 19: 26–59.

  15. Fraga S (1982) Theoretical prediction of protein antigenic determinants from amino acid sequences. Can J Chem 60: 2606–2610.

  16. Gottschalk KW, Colbert JJ, Feicht DL (1998) Tree mortality risk of oak due to gypsy moth. European Journal of Forest Pathology 28: 121–132.

  17. Hermanek P, Guggenmoos-Holzmann I (1994) Classification and regression trees (CART) for estimation of prognosis in patients with gastric carcinoma. J Cancer Res Clin Oncol 120: 309–313.

  18. Janin J (1979) Surface and inside volumes in globular proteins. Nature 277: 491–492.

  19. Jones DD (1975) Amino acid properties and side-chain orientation in proteins: a cross correlation approach. J Theor Biol 50: 167–83.

  20. Kannan N, Haste N, Taylor SS, Neuwald AF (2007) The hallmark of AGC kinase functional divergence is its C-terminal tail, a cis-acting regulatory module. Proc Natl Acad Sci USA 104: 1272–1277.

  21. Kyte J, Doolittle RF (1982) A simple method for displaying the hydrophobic character of a protein. J Mol Biol 157: 105–132.

  22. Lifson S, Sander C (1979) Antiparallel and parallel β-strands differ in amino acid residue preferences. Nature 282: 109–111.

  23. McCaldon P, Argos P (1988) Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences. Proteins: Structure Function and Genetics 4: 99–122.

  24. Namba K (2001) Roles of partially unfolded conformations in macromolecular self-assembly. Genes Cells 6: 1–12.

  25. Özge C, Toros F, Bayramkaya E, Çamdeviren H, Sasmaz T (2006) Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey. Postgraduate Medical Journal 82: 532–541.

  26. Parker PJ, Parkinson SJ (2001) AGC protein kinase phosphorylation and protein kinase C. Biochemical Society Transactions 29: 860–863.

  27. Perutz MF (1989) Mechanisms of cooperativity and allosteric regulation in proteins. Q Rev Biophys 22: 139–237.

  28. Perutz MF (1970) Stereochemistry of cooperative effects in haemoglobin. Nature 228: 726–739.

  29. Peterson RT, Schreiber SL (1999) Kinase phosphorylation: Keeping it all in the family. Curr Biol 9: R521–4.

  30. Remington S, Wiegand G, Huber R (1982) Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and 1.7 Å resolution. J Mol Biol 158: 111–152.

  31. Rose GD, Geselowitz AR, Lesser GJ, Lee RH, Zehfus MH (1985) Hydrophobicity of amino acid residues in globular proteins. Science 229: 834–838.

  32. Spurlino JC, Lu GY, Quiocho FA (1991) The 2.3-Å resolution structure of the maltose- or maltodextrin-binding protein, a primary receptor of bacterial active transport and chemotaxis. J Biol Chem 266: 5202–5219.

  33. Teague SJ (2003) Implications of protein flexibility for drug discovery. Nat Rev Drug Discov 2: 527–41.

  34. Wright PE, Dyson HJ (1999) Intrinsically Unstructured Proteins: Re-assessing the Protein Structure-Function Paradigm. J Mol Biol 293: 321–331.

  35. Zhao G, London E (2006) An amino acid “transmembrane tendency” scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: Relationship to biological hydrophobicity. Protein Sci 15: 1987–2001.

  36. Zimmerman JM, Naomi E, Simha R (1968) The characterization of amino acid sequences in proteins by statistical methods. Journal of Theoretical Biology 21: 170–201.

Accession numbers of the considered AGC kinase protein sequences are as follows:
O70291.1, P0C605.1, P16054.1, P18654.2, P23298.1, P31750.1, P54265.1, P68181.2,
P70268.3, P70336.1, Q3UU96.2, O70293.1, P05132.3, P18653.1, P20444.3, P28867.3,
P49025.3, P63318.1, P68404.3, P70335.1, Q3U214.2, Q3UYH7.1, Q7TPS0.2,
Q7TSE6.1, Q7TSJ6.1, Q7TT50.1, Q8BSK8.1, Q8BWW9.2, Q8BYR2.2, Q8C0P0.1,
Q8C050.2, Q8K045.1, Q8VEB1.2, Q9ERE3.1, Q9QZS5.1, Q9R1L5.3,
Q9WUA6.1, Q9WUT3.1, Q9WVC6.1, Q9WVL4.1, Q9Z0Z0.1, Q9Z1M4.1, Q9Z2A0.2,
Q9Z2B9.1, Q80UW5.2, Q91VJ4.1, Q99MK8.2, Q811L6.2, Q922R0.1, Q02111.1,
Q02956.1, Q60592.1, Q60823.1, Q61410.1, Q62074.2, P41743.1, P43250.2, P51812.1,
P51817.1, Q02156.1, Q16513.1, Q16512.1, Q15835.1, Q15418.2, Q15349.2, Q15208.1,
Q13976.3, Q13464.1, Q13237.1, CAE55958.1, NP_443073.1, O00141.2, O14578.2,
O15021.2, O15530.1, O60307.2, O75116.3, O75582.1, O75676.1, O95835.1, P05129.3,
P05771.4, P14619.1, P17252.3, P17612.2, P22612.3, P22694.2, P23443.2, P24256.1,
P24723.2, P25098.2, P31749.2, P31751.2, P32298.3, P34947.1, P35626.2, Q09013.1,
Q05655.1, Q05513.4, Q04759.3, Q96GX5.1, Q96BR1.1, Q9Y243.1, Q9Y5S2.2,
Q9Y2H9.2, Q9Y2H1.3, Q9UK32.1, Q9UBS0.1, Q9NRM7.1, Q9HBY8.1, Q8WTQ7.1,
Q6P5Z2.1, Q6P0Q8.2, Q6DT37.1, Q5VT25.1.