Received date: April 15, 2014; Accepted date: April 18, 2014; Published date: April 20, 2014
Citation: Rakesh M, Lavanya R, Kishore B (2014) Prediction of Genome and Protein Structures of Homo Sapiens Neanderthalensis Using NCBI Tools. J Data Mining Genomics Proteomics R1:001. doi:10.4172/2153-0602.R1-001
Copyright: © 2014 Rakesh M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM). It supports research and education on the fundamental molecular and genetic processes that control health and disease. The introductions to the NCBI databases and maps help researchers select tools that may help in finding sequences. A particularly helpful feature is the Science Primer, which provides an easy-to-read introduction to many science topics relevant to NCBI resources, such as bioinformatics, molecular modelling, genome mapping and molecular genetics. Because efforts to sequence entire genomes roughly double the number of protein sequences each year, the gap between the sequence and structural information stored in public databases is growing rapidly. In contrast to sequencing techniques, experimental methods for structure determination are time-consuming and limited in their application, so they cannot keep pace with the flood of newly characterised gene products. The development of practical methods for predicting protein structure from sequence is therefore of significant importance in biology.
Genome; Protein structure; Homo sapiens
Neanderthals are classified either as Homo neanderthalensis or as Homo sapiens neanderthalensis. DNA consists of phosphates, sugars and bases arranged in a chain. There are four different bases: adenine, thymine, guanine and cytosine. DNA forms a double-stranded molecule in which adenine pairs with thymine and cytosine pairs with guanine across the two strands. When DNA is replicated, the bonds that hold the two strands together are broken and new nucleotides are joined to each unwound strand, generating strands with the same sequence as the original. Mutations arise when an error occurs in copying and a different base is substituted. Different combinations of bases code for different amino acids, which are the building blocks of proteins. Sometimes more than one combination of bases codes for the same amino acid. For example, both AAA and AAG code for lysine, so a DNA change that puts a G rather than an A at the third position would not affect the code for lysine. This is known as a silent change. However, if the third position changed to a C, giving AAC, the amino acid would change to asparagine. A change of this kind, which could affect the protein, is called non-synonymous. Human sequences differ from each other by 8.0 substitutions on average, whereas human and chimpanzee sequences differ by about 55.0 substitutions. The Neanderthal and modern human sequences differ by approximately 27.2 substitutions. Mitochondrial DNA from the Paglicci specimens, like that of other ancient humans, falls within the range of modern humans, but the Neanderthals remain consistently genetically distinct. This shows that early anatomically modern Homo sapiens were not very different genetically from present-day humans, but were still different from Neanderthals.
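The lysine example above can be checked with a minimal Python sketch. The codon table is deliberately partial (it contains only the codons mentioned in the text), and the function name classify_substitution is a hypothetical helper introduced for illustration.

```python
# Partial codon table: only the codons discussed in the text (DNA coding strand).
CODON_TABLE = {
    "AAA": "Lys",
    "AAG": "Lys",
    "AAC": "Asn",
}


def classify_substitution(original: str, mutated: str) -> str:
    """Return 'silent' if both codons encode the same amino acid,
    otherwise 'non-synonymous'."""
    before = CODON_TABLE[original]
    after = CODON_TABLE[mutated]
    return "silent" if before == after else "non-synonymous"


print(classify_substitution("AAA", "AAG"))  # silent
print(classify_substitution("AAA", "AAC"))  # non-synonymous
```

A full analysis would use the complete standard genetic code rather than this three-entry table.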
Although this evidence does not disprove the idea of admixture between Neanderthals and modern humans, it shows that modern humans and Neanderthals did not share additional genetic similarities during the Pleistocene that were subsequently lost.
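Substitution counts such as those quoted above are averages of pairwise differences between aligned sequences. A minimal sketch of that calculation follows; the short sequences are made-up toy data, not real mitochondrial DNA, and the function names are hypothetical.

```python
from itertools import combinations


def count_substitutions(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mean_pairwise_differences(sequences: list[str]) -> float:
    """Average the substitution count over every pair of sequences."""
    pairs = list(combinations(sequences, 2))
    return sum(count_substitutions(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (illustrative only).
toy = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
print(mean_pairwise_differences(toy))
```

The sketch assumes the sequences are already aligned and contain no gaps; real comparisons would need an alignment step first.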
In Figure 1, the gene is located on chromosome 1, in region 1p36.1. The protein encoded by this gene is a member of the GTP-binding protein family. The CDC42 gene, shown in dark red, is transcribed in the 3’ to 5’ direction (Figure 2).
Figure 3 shows the genomic regions, transcripts and products.
Secondary structure of protein
In structural biology and biochemistry, secondary structure is defined as the general three-dimensional form of local segments of biopolymers such as nucleic acids (DNA or RNA) and proteins. To understand the relationship between amino acid sequence and protein structure, an unambiguous and physically meaningful definition of secondary structure is vital [4-8].
The explosive accumulation of protein sequences from large-scale sequencing projects is in sharp contrast to the much slower experimental determination of protein structures. Improved methods for predicting structure from the gene sequence alone are therefore needed [8-12].
Protein structure is described at four levels.
1. Proteins are made up of amino acids arranged in polypeptide chains; the order of the amino acids in a chain is known as the primary structure.
2. The regular local arrangement of a polypeptide chain in space, such as helices and strands, is called the secondary structure.
3. The overall three-dimensional arrangement of a protein in space is known as the tertiary structure.
4. The arrangement of two or more polypeptide chains in combination is known as the quaternary structure.
The secondary structure of a protein can be predicted using the website http://bioinf.cs.ucl.ac.uk. The protein sequence was uploaded in FASTA format, and the results appear as follows.
Figure 4 shows the graphical representation of the protein produced by the PSIPRED server.
The predicted secondary structure (pred) is shown in three colours (see legend), one for each type of secondary structure: a pink cylinder represents a helix, a yellow arrow represents a strand and a black line represents random coil. The confidence of prediction (conf) is indicated by blue vertical bars, one bar per amino acid. The taller and darker the bar, the higher the confidence of the prediction; the shorter and paler the bar, the lower the confidence. In Figure 4, the prediction confidence is mostly high, and helix, coil and β-strands are present in roughly equal proportions.
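The composition and confidence described above can also be summarised numerically. The sketch below assumes the prediction is available as two equal-length strings: one of per-residue states (H for helix, E for strand, C for coil) and one of per-residue confidence digits from 0 to 9, as in PSIPRED's text output. The example strings are invented for illustration and are not the result shown in Figure 4; summarise_prediction is a hypothetical helper.

```python
from collections import Counter


def summarise_prediction(pred: str, conf: str) -> tuple[dict[str, float], float]:
    """Return the percentage of each state and the mean confidence (0-9)."""
    if len(pred) != len(conf):
        raise ValueError("pred and conf must have the same length")
    counts = Counter(pred)
    composition = {
        state: 100.0 * counts.get(state, 0) / len(pred) for state in "HEC"
    }
    mean_conf = sum(int(digit) for digit in conf) / len(conf)
    return composition, mean_conf


# Invented example covering 10 residues (not real PSIPRED output).
pred = "CCHHHHEEEC"
conf = "9878899765"
composition, mean_conf = summarise_prediction(pred, conf)
print(composition)
print(mean_conf)
```

Reporting the mean confidence alongside the composition gives a single number to set against the visual impression of the blue bars.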
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the secondary structure of proteins and nucleic acids from their primary structure.
Three methods were used to analyse secondary structure: the Chou-Fasman plot (three corresponding states), the PSIPRED server (colour-coded short corresponding states) and the Spom method (colour-coded states with a corresponding graphical view).
Figure 5 shows the secondary structure of the protein predicted with the basic Chou-Fasman model. The sequence shown is the protein in FASTA format for accession number NM_044472, comprising 191 amino acid residues. Below the FASTA sequence, the secondary structure of each residue is represented as helix (< ----->), sheet (E) or turn (T). Of the 191 residues, 124 (64.9%) are assigned to helix, 84 (44.0%) to sheet and 30 to turns.
Chou-Fasman prediction is a basic secondary structure prediction method compared with PSIPRED. It assigns helix, sheet and turn states to each residue and reports overall percentages, but it does not show the prediction confidence graphically.
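The percentages quoted above follow directly from the residue counts. A minimal sketch of the arithmetic, using only the counts reported in the text (the variable names are illustrative):

```python
TOTAL_RESIDUES = 191

# Residue counts per state, as reported for Figure 5.
counts = {"helix": 124, "sheet": 84, "turn": 30}

percentages = {
    state: round(100.0 * n / TOTAL_RESIDUES, 1) for state, n in counts.items()
}
print(percentages)

# The counts add up to more than the sequence length, which suggests that
# some residues are assigned to more than one state in this output.
overlap = sum(counts.values()) > TOTAL_RESIDUES
print(overlap)
```

The same calculation gives the turn percentage, which the text does not state.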
Accurate secondary structure prediction is an important step towards predicting tertiary structure. The tertiary structure is the unique three-dimensional fold of a protein and is considered to be largely determined by the protein's primary sequence of amino acids. It is held together by interactions between the side chains (the R groups). Hydrogen bonding, hydrophobic interactions, ionic bonding and disulphide bonds are the forces that give rise to the tertiary structure of a protein. The protein structure prediction server at http://www.sbg.bio.ic.ac.uk allows a protein sequence to be submitted. It then performs the chosen prediction, and the results are received via e-mail [12-16].
The figure gives a good idea of the underlying structural framework of tertiary contacts between alpha-helices, in both globular and transmembrane (TM) environments. Figure 6 shows the distances, angles and dihedral angles that together fully describe the geometry of each contact. From the figure it is possible to identify differences in the geometric requirements of tertiary contacts between alpha-helices in general and transmembrane alpha-helices. Finally, how the geometry of tertiary contacts changes with the amino acid can be studied [19-21].
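The distances, angles and dihedral angles mentioned above are all computed from atomic coordinates. As an illustration, the following sketch computes a dihedral angle from four points using the standard atan2 vector formula. The coordinates are arbitrary test values, not taken from Figure 6, and the helper names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    """Scalar product of two vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    """Vector product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees about the p1-p2 axis, in [-180, 180]."""
    b0 = sub(p1, p0)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    n1 = cross(b0, b1)  # normal to the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal to the plane through p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    # The atan2 form stays numerically stable near 0 and 180 degrees.
    y = b1_len * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Arbitrary test points forming a right-angle twist about the z axis.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

The sketch assumes that no three consecutive points are collinear; in that case the plane normals vanish and the angle is undefined.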
RasMol (Raster Display of Molecules) is a molecular graphics program used to visualise proteins, nucleic acids and small molecules. It is used mainly for teaching, display and the generation of publication-quality images [22-24].