<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD 2.3 20070202//EN" "http://dtd.nlm.nih.gov/publishing/2.3/journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
	<front>
		<journal-meta>
			<journal-id journal-id-type="nlm-ta">J Comput Sci Syst Biol</journal-id>
			<journal-id journal-id-type="publisher-id">opg</journal-id>						
			<journal-title>Journal of Computer Science &amp; Systems Biology</journal-title>
			<issn pub-type="epub">0974-7230</issn>
			<publisher>
				<publisher-name>OMICS Publishing Group</publisher-name>
				<publisher-loc>India, USA</publisher-loc>
			</publisher>
		</journal-meta>
		<article-meta>	
			<article-id pub-id-type="doi">10.4172/jcsb.1000010</article-id>			
			<article-id pub-id-type="publisher-id">000063</article-id>
			<article-categories>
				<subj-group subj-group-type="heading">
					<subject>Research Article</subject>
				</subj-group>
				<subj-group subj-group-type="Discipline">
					<subject>Biochemistry</subject>
				</subj-group>
				<subj-group subj-group-type="System Taxonomy">
					<subject>Proteomics</subject>
					<subject>Bioinformatics</subject>
					<subject>Genomics</subject>
					<subject>Transcriptomics</subject>
					<subject>Biomarkers</subject>
				</subj-group>
			</article-categories>
			<title-group>
				<article-title>Comparison of the Virulence Factors and Analysis of Hypothetical Sequences of the Strains TIGR4, D39, G54 and R6 of <italic>Streptococcus Pneumoniae</italic></article-title>
			</title-group>
			<contrib-group>
				<contrib contrib-type="author">
					<name>
						<surname>Jothi</surname>
						<given-names>R</given-names>
					</name>					
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Parthasarathy</surname>
						<given-names>S</given-names>
					</name>
					<xref ref-type="corresp" rid="cor1">&ast;</xref>
				</contrib>
				<contrib contrib-type="author">
					<name>
						<surname>Ganesan</surname>
						<given-names>K</given-names>
					</name>					
				</contrib>				
			</contrib-group>
			<aff id="a1">Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024, Tamil Nadu, India</aff>
			<author-notes>
				<corresp id="cor1">&ast; To whom correspondence should be addressed: S. Parthasarathy, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli 620 024,Tamil Nadu, India, Phone: +91 94435 33095; Fax: +91 431 2407045, E-mail: <email>bdupartha@gmail.com</email></corresp>
			</author-notes>
			<pub-date pub-type="collection">
				<month>12</month>
				<year>2008</year>
			</pub-date>
			<pub-date pub-type="epub">
				<day>26</day>
				<month>12</month>
				<year>2008</year>
			</pub-date>
			<volume>1</volume>			
			<fpage>103</fpage>
			<lpage>118</lpage>
			<history>
			<date date-type="received">
			     <day>21</day>
				 <month>10</month>
				 <year>2008</year>
			</date>
			<date date-type="accepted">
			      <day>19</day>
				  <month>11</month>
				  <year>2008</year>
			</date>
			</history>
			<permissions>
			<copyright-statement><bold>Copyright:</bold> &copy; 2008 Jothi R, et al.</copyright-statement>
			<copyright-year>2008</copyright-year>
			<license license-type="open access"> 
			<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
			</license>
			</permissions>			
			<abstract>
				<p>Whole genome sequences of four strains of <italic>Streptococcus pneumoniae</italic>, the encapsulated TIGR4, D39 and G54 and the nonencapsulated R6, are compared with respect to genome features, whole genome pairwise alignment, gene role category, and virulence factors using relevant comparative genomics tools. The study of capsular polysaccharide synthesizing genes reveals that many cps genes are unique to TIGR4, which is consistent with the high virulence of TIGR4. Further, other virulence factors of TIGR4, namely pneumococcal surface protein A, autolysin, hyaluronate lyase, pneumolysin, neuraminidase B, and pneumococcal surface antigen A, are closely related to those of the other three strains, and hence the virulence due to these factors seems to be similar among the four strains. The strains differ, however, in neuraminidase A, choline binding protein A and immunoglobulin A1 protease. Also in the present study, 4 and 22 hypothetical protein sequences of TIGR4 and R6, respectively, are predicted to be virulence factors. Among those sequences, 8 hypothetical protein sequences of R6, with 7 different functional regions, are found to be related to previously known virulence factors of TIGR4 and R6 of <italic>S. pneumoniae</italic>.</p>
			</abstract>
			<kwd-group>
				<kwd>Comparative genomics</kwd>
				<kwd><italic>Streptococcus pneumoniae</italic></kwd>
				<kwd>TIGR4</kwd>		
				<kwd>D39</kwd>
				<kwd>G54</kwd>		
				<kwd>R6</kwd>	
				<kwd>virulence factors</kwd>	
				<kwd>hypothetical protein sequences</kwd>	
			</kwd-group>
			<custom-meta-wrap>
				<custom-meta>
					<meta-name>citation</meta-name>
					<meta-value>Jothi R, Parthasarathy S, Ganesan K (2008) Comparison of the Virulence Factors and Analysis of Hypothetical Sequences of the Strains TIGR4, D39, G54 and R6 of <italic>Streptococcus Pneumoniae</italic>. J Comput Sci Syst Biol 1: 103-118. doi:<ext-link ext-link-type="doi" xlink:href="10.4172/jcsb.1000010">10.4172/jcsb.1000010</ext-link></meta-value>
				</custom-meta>
			</custom-meta-wrap>
		</article-meta>
	</front>
	<body>
		<sec>
			<title>Introduction</title>
				<p>The whole genome sequences of bacteria of closely related species or strains are providing new avenues of investigation for the further understanding of microbial diversity, pathogenesis, host-parasite interaction, evolution, etc. through a comparative analysis of their genomes. <italic>Streptococcus pneumoniae</italic>, commonly called <italic>pneumococcus</italic> (<xref ref-type="bibr" rid="r7">Dowson, 2004</xref>; <xref ref-type="bibr" rid="r11">Gregory and DeSalle, 2005</xref>), a human pathogen, causes life-threatening diseases such as pneumonia, bacteremia, meningitis, sepsis, and otitis media. Genome sequencing of four <italic>S. pneumoniae</italic> strains, namely, TIGR4, D39, G54 and R6, has been completed and genome sequencing of 14 other strains is ongoing. The G54 genome sequence has not yet been added to GenBank but is included in the Comprehensive Microbial Resource (CMR), whereas the D39 genome sequence is available in GenBank but not in CMR. TIGR4, a clinical isolate, is encapsulated and highly virulent and many of its virulence factors have been studied (<xref ref-type="bibr" rid="r22">Tettelin et al., 2001</xref>). D39, an encapsulated and virulent strain (<xref ref-type="bibr" rid="r16">Lanie et al., 2007</xref>), was used by Avery, Macleod, and McCarty (<xref ref-type="bibr" rid="r2">Avery et al., 1979</xref>) in their landmark study on the role of DNA as the genetic material. G54 is an encapsulated clinical strain of type 19F (<xref ref-type="bibr" rid="r6">Dopazo et al., 2001</xref>). R6, a derivative of the serotype 2 clinical isolate D39, is nonencapsulated and avirulent. The genes encoding many virulence factors are present in the R6 genome in addition to the genes of capsular biosynthesis (<xref ref-type="bibr" rid="r12">Hoskins et al., 2001</xref>).</p>
<p>Many types of comparative studies (<xref ref-type="bibr" rid="r22">Tettelin et al., 2001</xref>; <xref ref-type="bibr" rid="r16">Lanie et al., 2007</xref>; <xref ref-type="bibr" rid="r12">Hoskins et al., 2001</xref>; <xref ref-type="bibr" rid="r1">AlonsoDeVelasco et al., 1995</xref>; <xref ref-type="bibr" rid="r5">Br&uuml;ckner et al., 2004</xref>; <xref ref-type="bibr" rid="r8">Ferretti et al., 2004</xref>; <xref ref-type="bibr" rid="r19">Silva et al., 2006</xref>) have already been carried out in <italic>Streptococcus</italic> strains on various aspects. The preliminary comparative analysis (<xref ref-type="bibr" rid="r15">Jothi et al., 2007</xref>) of the whole genomes of both the encapsulated TIGR4 and nonencapsulated R6 strains of <italic>S. pneumoniae</italic> provided some insights into the high virulence nature of TIGR4. The present study summarizes specifically how the whole genomes of the four strains, namely, TIGR4, D39, G54 and R6 of <italic>S. pneumoniae</italic> differ from each other by their genome features, genome diversity, gene role category and virulence factors. Comparison of the virulence factors among these strains can provide further insight into any strain uniqueness with relevance to virulence nature and can stimulate new approaches into disease prevention and treatment.</p>
<p><italic>S. pneumoniae</italic> has two surface layers outside the plasma membrane, namely, the cell wall and the capsule. The cell wall has a triple-layered peptidoglycan that holds the capsular and cell wall polysaccharides, and also a few proteins. The capsule completely covers the inner structure of <italic>S. pneumoniae</italic>. The cell wall polysaccharide is common to all serotypes of <italic>S. pneumoniae</italic>, but the chemical structure of the capsular polysaccharide is serotype-specific (<xref ref-type="bibr" rid="r1">AlonsoDeVelasco et al., 1995</xref>). After Avery&rsquo;s experiment (<xref ref-type="bibr" rid="r2">Avery et al., 1979</xref>), the capsule has long been recognized as the major virulence factor of <italic>S. pneumoniae</italic>. Experimental proof for this was provided by the difference in 50% lethal dose between encapsulated and nonencapsulated strains. Encapsulated strains were found (<xref ref-type="bibr" rid="r1">AlonsoDeVelasco et al., 1995</xref>) to be at least 10<sup>5</sup> times more virulent than strains lacking the capsule. Certain proteins in <italic>S. pneumoniae</italic> like pneumococcal surface protein A (PspA), autolysin (LytA), hyaluronate lyase (Hyl), pneumolysin (Ply), neuraminidases A and B (NanA and NanB), choline binding protein A (CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1) protease are important virulence factors (<xref ref-type="bibr" rid="r1">AlonsoDeVelasco et al., 1995</xref>; <xref ref-type="bibr" rid="r14">Jedrzejas, 2001</xref>; <xref ref-type="bibr" rid="r18">Rigden et al., 2003</xref>) and these could be used as potential vaccine candidates. The preliminary identification of the surface proteins and virulence factors of <italic>S. pneumoniae</italic> was done by computational analysis of its genome sequences (<xref ref-type="bibr" rid="r21">Tettelin and Hollingshead, 2004</xref>; <xref ref-type="bibr" rid="r11">Gregory and DeSalle, 2005</xref>; <xref ref-type="bibr" rid="r22">Tettelin et al., 2001</xref>; <xref ref-type="bibr" rid="r12">Hoskins et al., 2001</xref>) and continued in several subsequent studies (<xref ref-type="bibr" rid="r5">Br&uuml;ckner et al., 2004</xref>; <xref ref-type="bibr" rid="r17">Polissi et al., 1998</xref>; <xref ref-type="bibr" rid="r23">Wizemann et al., 2001</xref>). Strains of <italic>S. pneumoniae</italic> are now resistant to commonly prescribed antibiotics, such as penicillin, macrolides and fluoroquinolones (<xref ref-type="bibr" rid="r22">Tettelin et al., 2001</xref>). Because of the multidrug resistance of the <italic>S. pneumoniae</italic> strains, we need a deeper understanding of the virulence factors, for which the comparative genomics approach may provide more insight.</p>
<p>At present, only 70 % of the genes in any given genome can be predicted with reasonable confidence (<xref ref-type="bibr" rid="r3">Bork, 2000</xref>). The remaining genes are either hypothetical (do not have any known homolog) or conserved hypothetical (homologous to genes of unknown function), because it is unclear whether they encode actual proteins. The large quantity of hypothetical protein sequences in completely sequenced genomes of organisms makes their study an enormous task. Characterization of these genes or proteins of unknown function is generally recognized as an essential step towards fully understanding the biology of the pathogenic organism and towards identifying potential targets. A few studies (<xref ref-type="bibr" rid="r9">Galperin and Koonin, 2004</xref>; <xref ref-type="bibr" rid="r4">Brown, 2005</xref>; <xref ref-type="bibr" rid="r20">Sivashankari and Shanmughavel, 2006</xref>) have already been carried out on hypothetical sequences. In the present study, hypothetical protein sequences of the strains TIGR4 and R6 of <italic>S. pneumoniae</italic> are analyzed to find their virulence nature using VirulentPred. Those sequences are also examined to determine how closely they are related to previously known virulence factors of TIGR4 and R6 of <italic>S. pneumoniae</italic>.</p>
		</sec>
		<sec sec-type='material|methods'>
			<title>Materials and methods</title>
				<p>Various analyses of the whole genomes of the four strains, namely, TIGR4, D39, G54 and R6 of <italic>S. pneumoniae</italic>, such as whole genome alignment, comparison of gene role categories, location of the virulence factors in the genome and comparison of virulence regions, are carried out using the appropriate bioinformatics software tools.</p>
				<sec>
					<title>Sequence Retrieval and Whole Genome Pairwise Alignment</title>
						<p>The complete genome sequences and the list of annotated gene and protein sequences of TIGR4, D39 and R6 are retrieved from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/genomes). We used the run-mummer3 program available in the standalone MUMmer 3.20 (<ext-link ext-link-type="uri" xlink:href="http://mummer.sourceforge.net/">http://mummer.sourceforge.net/</ext-link>) and its built-in mummerplot for obtaining the whole genome pairwise alignment of <italic>S. pneumoniae</italic> strains TIGR4, D39, and R6 in different combinations. MUMmer at Comprehensive Microbial Resource (CMR) is used for the whole genome pairwise alignment of the strains TIGR4, G54 and R6 in different combinations.</p>
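						<p>To illustrate the idea behind a match-based whole genome comparison, the following minimal Python sketch reports the positions of substrings of fixed length k that occur exactly once in each of two sequences. It is a simplified stand-in and not the algorithm used by MUMmer, which finds maximal unique matches with a suffix tree and also handles the reverse strand; the function name <monospace>unique_kmer_matches</monospace> and the default k = 20 are assumptions made only for this illustration.</p>
						<preformat>
```python
from collections import Counter


def unique_kmer_matches(ref, qry, k=20):
    """Return sorted (ref_pos, qry_pos) pairs for k-mers found exactly once in each sequence."""

    def unique_positions(seq):
        # Count every k-mer, then keep the start position of those seen once.
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        counts = Counter(kmers)
        return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}

    ref_pos = unique_positions(ref)
    qry_pos = unique_positions(qry)
    # Only k-mers that are unique in both sequences give unambiguous anchors.
    shared = set(ref_pos).intersection(qry_pos)
    return sorted((ref_pos[kmer], qry_pos[kmer]) for kmer in shared)
```
						</preformat>
						<p>Plotting the returned pairs, with the reference position on one axis and the query position on the other, gives a dot plot in which conserved gene order appears as points along a diagonal.</p>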
				</sec>
				<sec>
					<title>Comparison of the Role Category of Genes and Sequence Analysis</title>
						<p>The role category pie chart tool of the CMR database (<ext-link ext-link-type="uri" xlink:href="http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi">http://cmr.tigr.org/tigr-scripts/CMR/CmrHomePage.cgi</ext-link>) is used for comparing the genome features and functional role categories of the whole genomes of TIGR4, G54 and R6. The Bacterial Annotation System (BASys - <ext-link ext-link-type="uri" xlink:href="http://wishart.biology.ualberta.ca/basys">http://wishart.biology.ualberta.ca/basys</ext-link>), a web server for automated bacterial genome annotation, is used to obtain the role categories for the three strains TIGR4, D39 and R6, whose whole genomes are already available in it. From the prediction server of the Center for Biological Sequence Analysis (CBS - <ext-link ext-link-type="uri" xlink:href="http://www.cbs.dtu.dk/services">http://www.cbs.dtu.dk/services</ext-link>), the Genome Atlas is used for the analysis of repeats of <italic>S. pneumoniae</italic>. The sequences of the various virulence factors taken for our study have been verified using the virulence factors database (<ext-link ext-link-type="uri" xlink:href="http://www.mgc.ac.cn/VFs">http://www.mgc.ac.cn/VFs</ext-link>). BioEdit (<ext-link ext-link-type="uri" xlink:href="http://www.mbio.ncsu.edu/BioEdit/bioedit.html">http://www.mbio.ncsu.edu/BioEdit/bioedit.html</ext-link>) is used to compute the sequence composition of the genomes and genes. Further, LALIGN (<ext-link ext-link-type="uri" xlink:href="http://www.ch.embnet.org/software/LALIGN_form.html">http://www.ch.embnet.org/software/LALIGN_form.html</ext-link>) is used for the pairwise global alignment of the gene sequences of the strains of <italic>S. pneumoniae</italic>.</p>
				</sec>
				<sec>
					<title>Functional Annotation of Hypothetical Sequences</title>
						<p>VirulentPred (<ext-link ext-link-type="uri" xlink:href="http://bioinfo.icgeb.res.in/virulent">http://bioinfo.icgeb.res.in/virulent</ext-link>) is an SVM (Support Vector Machine) based method to predict bacterial virulent protein sequences, which can be used to screen virulent proteins in proteomes. In the present study, this tool is used to analyse the hypothetical sequences of the strains TIGR4 and R6 of <italic>S. pneumoniae</italic>. From the proteomes of TIGR4 and R6 of <italic>S. pneumoniae</italic>, all unannotated hypothetical protein sequences are retrieved using a Perl script and those sequences are used as the data set for virulence factor prediction.</p>
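						<p>The retrieval step can be sketched as follows. This is a minimal illustration written in Python rather than the Perl script used in the study, which is not reproduced here. It assumes that the proteome is available as FASTA text and that hypothetical proteins are recognizable by the phrase "hypothetical protein" in the header line; both the function names and this keyword are assumptions. Note that the keyword also matches "conserved hypothetical protein" entries.</p>
						<preformat>
```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hypothetical_proteins(lines, keyword="hypothetical protein"):
    """Return the records whose header contains the keyword (case-insensitive)."""
    return [(h, s) for h, s in read_fasta(lines) if keyword in h.lower()]
```
						</preformat>
						<p>Passing an open proteome file to <monospace>hypothetical_proteins</monospace> would return the header and sequence of each matching entry, which could then be submitted to a prediction tool.</p>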
				</sec>
		</sec>
		<sec>
			<title>Results and Discussion</title>
				<p>Comparative genomics and <italic>in silico</italic> studies have begun to reveal insights into gene and protein functions of many organisms. Here, we compare the genomes of the strains TIGR4, D39, G54 and R6 of <italic>S. pneumoniae</italic> using the appropriate tools for whole genome comparison and the results are discussed below.</p>
			<sec>
				<title>Comparison of the Genome Features of Four Strains of <italic>S. Pneumoniae</italic></title>
						<p><xref ref-type="table" rid="t1">Table 1</xref> summarizes the general information about the genomes, including gene statistics, of these four strains, obtained and compiled from the CMR and NCBI web servers. The genome sizes of these four strains range between 2 Mb and 2.16 Mb (cf. Sl. No. 2 of <xref ref-type="table" rid="t1">Table 1</xref>). Among these four strains, D39 is the smallest and TIGR4 is the largest based on genome size. The nucleotide base (A, T, G, C, AT and GC) compositions of the four strains show that the strains have low GC (~40%) genomes. The number of protein-coding genes of these four strains ranges between 1914 and 2234 (cf. Sl. No. 3 of <xref ref-type="table" rid="t1">Table 1</xref>). Of the total base pairs of the four genomes, approximately 85 - 87% of base pairs (bps) are involved in coding and the remaining are non-coding or junk DNA. The number of genes involved in RNA synthesis (structural RNA, tRNA, and rRNA) is more or less similar in all strains. Finally, by comparing the global and local repeats of TIGR4 and R6 using the CBS web server, it is evident that both kinds of repeats are higher in TIGR4 than in R6 (cf. Sl. No. 4 of <xref ref-type="table" rid="t1">Table 1</xref>) and this may be related to the duplicated regions of the chromosome (<xref ref-type="bibr" rid="r11">Gregory and DeSalle, 2005</xref>).</p>
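						<p>The nucleotide base composition referred to above can be computed directly from a sequence. The following Python sketch is an illustration only and is not the BioEdit implementation; it assumes that ambiguous symbols such as N are excluded from the total, and the function name <monospace>base_composition</monospace> is hypothetical.</p>
						<preformat>
```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, G and C, plus combined AT and GC."""
    counts = Counter(seq.upper())
    # Only unambiguous bases contribute to the total (an assumption).
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    pct = {base: 100.0 * counts[base] / total for base in "ATGC"}
    pct["AT"] = pct["A"] + pct["T"]
    pct["GC"] = pct["G"] + pct["C"]
    return pct
```
						</preformat>
						<p>For a low GC genome such as those compared here, the value returned under the key GC would be expected to lie near 40.</p>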
				</sec>
				<sec>
					<title>Comparison of Whole Genome Pairwise Alignments</title>
						<p>The whole genome pairwise alignments of the strains TIGR4, D39 and R6 of <italic>S. pneumoniae</italic> (whose sequence data are available at NCBI) are obtained using the standalone version of MUMmer and the results are plotted using its built-in mummerplot. The whole genome pairwise alignments of the strains TIGR4, G54 and R6 are obtained using CMR, where these sequences are available, and the five possible alignments are shown in <xref ref-type="fig" rid="g1">Figure 1(a)&ndash;(e)</xref>. Generally, the genomes of prokaryotes are very dynamic, with insertions, deletions, inversions, and translocations being commonly observed among related species or even between different strains of the same species (<xref ref-type="bibr" rid="r11">Gregory and DeSalle, 2005</xref>; <xref ref-type="bibr" rid="r13">Hughes, 2000</xref>). The net result is that the particular complement of genes and their order along the chromosome are not typically conserved over evolutionary time. In some cases, genes that are grouped into operons in one species may be dispersed throughout the genome in others. We find similar results when we analyze the genomes of the four strains of <italic>S. pneumoniae</italic>. In particular, we find that the gene order is stable in the genome pairs TIGR4 vs. D39 and TIGR4 vs. R6, as shown by the fact that most of the points lie along the diagonal in <xref ref-type="fig" rid="g1">Figures 1a and 1b</xref>. These results (<xref ref-type="fig" rid="g1">Figures 1a and 1b</xref>) indicate that the stability of gene order of D39 vs. R6 must also be relatively high, which is shown in <xref ref-type="fig" rid="g1">Figure 1c</xref>. This is also consistent with the fact that R6 is a derivative of D39. The whole genome pairwise alignments of TIGR4 vs. G54 and of R6 vs. G54 do not show such a high degree of stability of gene order compared to the above results (for the D39 strain) and are shown in <xref ref-type="fig" rid="g1">Figures 1d and 1e</xref>, respectively.</p>
				<fig id="g1">
					<label>Figure 1</label>
					<caption>
						<title>Whole genome alignment of a) TIGR4 vs. D39; b) TIGR4 vs. R6; c) D39 vs. R6 using stand-alone MUMmer; Whole genome alignment of d) TIGR4 vs. G54 and e) R6 vs. G54 using built-in MUMmer of CMR, which show plasticity and stability in gene order between two strains.</title>
					</caption>
					<graphic xlink:href="JCSB-01-103-g001.tif"/>
				</fig>
				<p>Many of the gene and protein sequences among these strains are approximately the same and this is not surprising as all the strains occupy the same niche in the human respiratory system. The small differences might have arisen after the divergence of these strains from other evolutionary lineages for adaptations in their host. This increases greatly in pathogens and appears to be associated with the ability to infect eukaryotes, perhaps reflecting a mechanism for evading host immune defenses and the unique genes may be located in a plasticity zone.</p>
<p>Since the G54 genome sequence is not available at the NCBI web server and the D39 genome is not available at the CMR server, we could not obtain the whole genome alignment for D39 vs. G54. However, we are able to predict the whole genome pairwise alignment of D39 vs. G54 based on the earlier results. As <xref ref-type="fig" rid="g1">Figures 1d and 1e</xref> are similar, the alignment of D39 vs. G54 is expected to have a similar structure. This prediction may be confirmed if the whole genome sequence of G54 is made available at NCBI or the genome sequence of D39 is included in CMR.</p>
				</sec>
				<sec>
					<title>Comparison of Capsular Polysaccharide Synthesizing Genes</title>
						<p>We have compared the capsular polysaccharide (cps) synthesizing genes of the strains TIGR4, D39, G54 and R6 of <italic>S. pneumoniae</italic> and the results are shown in <xref ref-type="table" rid="t2">Table 2</xref>. There are 15 different cps genes in TIGR4, 7 in D39 and 9 in G54 and only one in R6. Their gene IDs, G+C percentage, protein length, gene length and gene coordinates are shown in <xref ref-type="table" rid="t2">Table 2</xref>. On comparison, it is estimated that 5 cps genes of TIGR4 (gi|15900275-cps4A, gi|15900276-cps4B, gi|15900278-cps4D, gi|15900046-cps-ptv &amp; gi|15901666-cps-ptv) are related to those of D39 (gi|116516963-cps2A, gi|116516159-cpsB, gi|116517023-cps2D, gi|116517199-cps and gi|116516120-cps-ptv). All the cps genes of D39 are present in TIGR4 except gi|116516773-cps2E and gi|116516341-cps-ptv.</p>
						<p>Between TIGR4 and G54, 6 cps genes are related (gi|15900275-cps4A, gi|15900276-cps4B, gi|15900277-cps4C, gi|15900278-cps4D, gi|15900046-cps-ptv &amp; gi|15901666-cps-ptv of TIGR4 with NT05SP0190-cps4A, NT05SP0191-cps4B, NT05SP0192-cps4C, NT05SP0193-cps4D, NT05SP2185-cps9E &amp; NT05SP1650-cps7G of G54). Likewise, between D39 and G54, 5 cps genes are related (gi|116516963-cps2A, gi|116516159-cpsB, gi|116517023-cps2D, gi|116517199-cps and gi|116516120-cps-ptv of D39 with NT05SP0190-cps4A, NT05SP0191-cps4B, NT05SP0192-cps4C, NT05SP2185-cps9E &amp; NT05SP1650-cps7G of G54), but gi|116516773-cps2E and gi|116516341-cps-ptv of D39 are not present in G54. Similarly, it is interesting to note that the only cps gene of R6 (gi|15902136-capD) has 99.8 % identity with the gene gi|15900046-cps-ptv of TIGR4, 100 % identity with the gene gi|116517199-cps of D39 and 99.5 % identity with the gene NT05SP2185 of G54. All the above results support Avery&rsquo;s statement (<xref ref-type="bibr" rid="r2">Avery et al., 1979</xref>) that the capsule is responsible for pathogenicity.</p>
<p>From a similar analysis, we have also noted that the genes gi|15900279-cps4E, gi|15900280-cps4F, gi|15900281-cps4G, gi|15900282-cps4H, gi|15900286-cps4I, gi|15900287-cps4J, gi|15900288-cps4K, gi|15900289-cps4L and gi|15900788-cps-ptv are unique to TIGR4. Similarly, the genes gi|116516773-cps2E and gi|116516341-cps-ptv are unique to the D39 strain. In the same way, the genes NT05SP0198, NT05SP0202 and NT05SP1909 are unique to the strain G54. But in R6, the only cps gene, gi|15902136-capD, is common to all other strains (<xref ref-type="table" rid="t2">Table 2</xref>). That the TIGR4 strain has more cps genes than the other strains is consistent with the high virulence of TIGR4. Further, the results suggest that virulence is lower in the D39 and G54 strains, and much lower in R6, compared to TIGR4.</p>
<p>Though all the cps genes of TIGR4 are not present in D39, G54 and R6 strains, they are also pathogenic. Therefore, to know the other virulence factors in addition to cps genes, we consider the other genes of the strains from the gene role category aspect.</p>
				</sec>
				<sec>
					<title>Comparison of the Role Category of Genes</title>
						<p>The role categories of genes of the different strains are compared using two different tools, namely, i. CMR &ndash; role category pie chart for TIGR4, G54 and R6 (<xref ref-type="table" rid="t3">Table 3</xref>) and ii. Bacterial Annotation System (BASys) for the strains TIGR4, D39 and R6, based on the availability of genome sequences. The genes responsible for biosynthesis of various proteins (Sl. Nos. 1-9 of <xref ref-type="table" rid="t3">Table 3</xref>) of TIGR4 are nearly the same as in G54 and R6, which suggests a basic complement of proteins required for certain cellular processes. But the genes responsible for the biosynthesis of some other proteins (Sl. Nos. 10-23 of <xref ref-type="table" rid="t3">Table 3</xref>) of TIGR4 are notably different from those of G54 and R6. This suggests that these proteins are important for strain uniqueness and that they may be involved in variations in pathogenesis among the strains of <italic>S. pneumoniae</italic>. The percentage values given for a particular role category in <xref ref-type="table" rid="t3">Table 3</xref> are specific to the genes involved in that category only and do not represent the overall gene percentage. For example, autolysin (SP1937) of TIGR4 is assigned to two role categories, cell envelope and cellular processes (Sl. Nos. 11 and 12 of <xref ref-type="table" rid="t3">Table 3</xref>), and the percentage given is specific to the respective categories.</p>
<p>The number of genes which are responsible for pathogenesis in the strains TIGR4, G54 and R6 are manually counted from CMR gene role category (sub role categories pathogenesis, toxin production and resistance) and found to be 101 (4.52 %), 47 (2.30 %) and 42 (1.89 %) respectively (Sl.No.19 of <xref ref-type="table" rid="t3">Table 3</xref>). TIGR4 has many pathogenic factors and it is highly virulent and G54 and R6 strains have approximately 50% of the pathogenic factors of TIGR4. Mobile and extra chromosomal elements comprise a significant fraction of the genome as with the 134 genes (5.99 %) in TIGR4, 71 (3.46 %) in G54 and 86 genes (3.87 %) in R6 (Sl.No.18 of <xref ref-type="table" rid="t3">Table 3</xref>). Generally transposons encode genes for antibiotic resistance (<xref ref-type="bibr" rid="r11">Gregory and DeSalle, 2005</xref>); therefore from our results, it is evident that the antibiotic resistance may be relatively higher in TIGR4 than the strains G54 and R6.</p>
<p>From the results of the comparative study on TIGR4, D39 and R6, using BASys server, we find that most of the values are more or less similar. But, there is a higher percentage for unknown functions in the strains TIGR4, D39 and G54, which indicates that the reason for the differences may also be hidden in the unknown genes or proteins (data not shown).</p>
<p>From <xref ref-type="table" rid="t3">Table 3</xref>, the numbers of hypothetical, conserved hypothetical, unclassified and unknown genes of the whole genomes of the strains TIGR4, G54 and R6 are noted and shown in <xref ref-type="table" rid="t4">Table 4</xref>. Nearly 37 - 42 % of genes are of unknown type, which shows that these sequences have yet to be annotated and assigned functions, and some of them may be responsible for the virulence nature. Using the multigenome homology comparison tool, which is available at CMR, the numbers of unique genes in TIGR4, G54 and R6 are found to be 288, 104 and 78, respectively (<xref ref-type="table" rid="t4">Table 4</xref>).</p>
<p>The unique genes of the strains TIGR4, G54 and R6 themselves include many hypothetical, conserved hypothetical, unknown and unclassified sequences, whose percentage ranges from 65 to 74; thus other possible differences among the strains may be revealed by studying these gene sequences. As far as the virulence factors are concerned, the unique genes of the strain TIGR4 include 3 capsular polysaccharide biosynthesis proteins (Sp_0351 (cps4F), Sp_0352 (cps4G) and Sp_0359 (cps4K)), 4 cell wall surface anchor family proteins (Sp_0462, Sp_0463, Sp_0464 and Sp_1772), a PspC protein (Sp_1417), a NanA protein (SP_1693) and an IgA1 protease (SP_2155). In the case of R6, the unique genes include three proteins of the type 2 capsule locus (Spr0315, Spr0317 and Spr0319). But the strain G54 does not have such virulence factors among its unique genes (<xref ref-type="table" rid="t4">Table 4</xref>). The above result is consistent with the high virulence of TIGR4 and it also suggests that those virulence factors are specific to TIGR4 and R6. The above differences might have arisen because of the species-specific adaptation to their host, particularly for the sake of defense.</p>
				</sec>
				<sec>
					<title>Comparison of Virulence Factors Other than Capsular Polysaccharide Synthesizing Genes</title>
						<p>In <italic>S. pneumoniae</italic>, the surface and cytoplasmic proteins such as pneumococcal surface protein A (PspA), autolysin (LytA), hyaluronate lyase (Hyl), pneumolysin (Ply), two neuraminidases (NanA and NanB), choline binding protein A (CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1) protease are already stated as the virulence factors (<xref ref-type="bibr" rid="r14">Jedrzejas, 2001</xref>; <xref ref-type="bibr" rid="r18">Rigden et al., 2003</xref>). The comparative results of the above mentioned sequences obtained from CMR, are given in <xref ref-type="table" rid="t5">Table 5</xref>. It provides more insight into the virulence factors of the strains TIGR4, D39, G54 and R6 of <italic>S. pneumoniae</italic>.</p>
						<p>The virulence factors of TIGR4 are taken as reference and are compared with all other related sequences of the strains such as D39, G54 and R6, likewise the virulence factors of D39 are taken as reference and are compared with all the related sequences of the strains G54 and R6. Similarly the virulence factors of G54 are taken as reference and are compared with all the related sequences of the remaining strain R6 using the pairwise sequence alignment tool LALIGN, with default parameters (Alignment: Global; Scoring matrix: BLOSUM50, Gap opening penalty: -14 and extension penalty: -4), and all the results are comparatively shown in <xref ref-type="table" rid="t5">Table 5</xref>.</p>
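						<p>The percent identities compared in this section are derived from global pairwise alignments. As a minimal sketch of how such a value can be obtained from an alignment that has already been computed, the Python function below counts identical columns over the alignment length. This definition of identity and the function name <monospace>percent_identity</monospace> are assumptions for illustration; LALIGN may use a different denominator, and the sketch does not perform the alignment itself.</p>
						<preformat>
```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Columns that are gaps in both sequences carry no information and are dropped.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)
```
						</preformat>
						<p>Two identical sequences give 100, while every gap or substitution lowers the value in proportion to the alignment length.</p>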
						<p>PspA is located in the cell wall of <italic>pneumococci</italic> and is present in all <italic>S. pneumoniae</italic> strains (<xref ref-type="bibr" rid="r14">Jedrzejas, 2001</xref>). PspA of TIGR4 has ~53-63% identity with that of D39, G54 and R6 (<xref ref-type="table" rid="t5">Table 5</xref>). When we compare PspA in D39 vs. G54 and G54 vs. R6, the identities between those strains are nearly 63%. The above results suggest that roughly 50-60% of the virulence nature of PspA of TIGR4 exists in the other strains D39, G54 and R6. But it is interesting to note that there is 100% identity between the PspA sequences of D39 and R6, thus the virulence nature of PspA in these two strains is expected to be the same.</p>
<p>For LytA, Hyl, Ply, NanB and PsaA, all four strains of <italic>S. pneumoniae</italic> share more than 90% identity, so the effect of these five virulence factors is expected to be similar. This similarity is also reflected in the G+C percentage, protein length and gene length, although the locations of the genes in the genomes vary; the similarities and differences can be seen in <xref ref-type="table" rid="t5">Table 5</xref>.</p>
<p>The neuraminidase A (NanA) sequences differ among all strains except G54 and R6 (~90% identity). For CbpA and IgA1 protease, TIGR4 shares high identities (~73% and ~87%, respectively) with both D39 and R6, and the D39 and R6 sequences are identical (100%). In contrast, the comparisons involving G54 give much lower identities (~40% and ~35%). It seems that the virulence properties associated with CbpA and IgA1 protease are similar among the strains TIGR4, D39 and R6 and differ in G54.</p>
<p>From <xref ref-type="table" rid="t5">Table 5</xref>, it is interesting to note that all the virulence factors of D39 are very similar to those of R6 (above 99% identity, except NanA), which confirms that the avirulent strain R6 is a derivative of strain D39 (<xref ref-type="bibr" rid="r16">Lanie et al., 2007</xref>). Based on the role category, all TIGR4 virulence factors fall under pathogenesis-related functions, which is consistent with the high virulence of TIGR4.</p>
				</sec>
				<sec>
					<title>Functional Annotation of Hypothetical Sequences Relevant to the Virulence Factors</title>
						<p>Prediction of virulence factors from the hypothetical sequences of <italic>S. pneumoniae</italic> has implications for the identification and characterization of the virulence mechanism. Using VirulentPred (<xref ref-type="bibr" rid="r10">Garg and Gupta, 2008</xref>), the present study predicted that 4 hypothetical sequences of TIGR4 and 22 of R6 are virulence factors. All these sequences are listed in <xref ref-type="table" rid="t6">Table 6</xref>. The prediction is based on protein features, namely amino acid composition, dipeptide composition, higher-order dipeptide composition, similarity search and PSSM, combined in the cascaded SVM module of VirulentPred. However, similar predictions are not possible at present with D39 and G54 as the sequence information of the latter is not fully available.</p>
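						<p>To make two of the sequence features named above concrete, the following Python sketch shows how amino acid composition and dipeptide composition are commonly computed from a protein sequence. It is a generic illustration under stated assumptions and is not VirulentPred's own code: the function names, the restriction to the 20 standard residues and the toy sequence are all hypothetical choices made for this example.</p>
<preformat>
```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; other symbols (for example X) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (a vector of 20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair in seq (a vector of 400 values)."""
    seq = seq.upper()
    valid = {a + b for a, b in product(AMINO_ACIDS, repeat=2)}
    # Overlapping windows of length two along the sequence.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(p for p in pairs if p in valid)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return {dp: counts[dp] / total for dp in sorted(valid)}


if __name__ == "__main__":
    toy = "AAKK"  # toy sequence, not a real pneumococcal protein
    print(amino_acid_composition(toy)["A"])
    print(dipeptide_composition(toy)["AK"])
```
</preformat>
						<p>Feature vectors of this kind (20 and 400 values per protein) are what an SVM-based classifier would take as input; the classifier itself and the PSSM-based features are outside the scope of this sketch.</p>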
						<p>Among the 4 predicted virulence factors of TIGR4, only one sequence (gi|15901572) is also found in R6, as a hypothetical protein (gi|15903627), and its functional region is predicted as Plasmid_Txe (PF06769). This family contains many hypothetical proteins and has no homolog among the virulence factors mentioned above. In R6, it is interesting to note that among the 22 hypothetical protein sequences predicted as virulence factors, 8 sequences (gi|15902372, gi|15903388, gi|15903446, gi|15902652, gi|15902781, gi|15903694, gi|15903627 and gi|15903771), with 7 different functional regions, are related to the already mentioned virulence factors of the strains R6 and TIGR4. Those virulence factors are hyaluronidase, immunoglobulin A1 protease, capsular polysaccharide synthesis, pneumolysin, neuraminidase and choline binding protein. The related sequences of TIGR4 and R6, except gi|15903771, are compared in <xref ref-type="table" rid="t7">Table 7</xref>.</p>
<p>The hypothetical protein sequence gi|15903771 of R6 has 71 amino acids, and its functional region is predicted as a putative cell wall binding repeat (residues 42-60) using InterProScan (ID: PF01473). The same functional region is also found repeatedly in the known virulence factors pneumococcal surface protein A, autolysin and the choline binding proteins of the strains TIGR4 and R6. Since many domain regions have been identified in these known virulence factors of TIGR4 and R6, the regions are not listed explicitly here, but they can easily be obtained using InterProScan.</p>
				</sec>
			</sec>
			<sec>
				<title>Conclusion</title>
					<p>We have compared the virulence nature of the encapsulated strains TIGR4, D39 and G54 and the nonencapsulated strain R6 of <italic>Streptococcus pneumoniae</italic> using comparative genomics tools. From the whole genome pairwise alignments, we found that the gene order is more stable in the genomes of TIGR4 vs. D39, TIGR4 vs. R6 and D39 vs. R6 than in the genomes of TIGR4 vs. G54 and R6 vs. G54. We were able to predict the possible structure of the whole genome pairwise alignment of D39 vs. G54 from the alignments of TIGR4 vs. G54 and R6 vs. G54.</p>
<p>From the comparison of the capsular polysaccharide (cps) synthesizing genes, we found that the TIGR4 strain has more cps genes than the other strains. Many cps genes are unique to TIGR4, only a few are present in D39 and G54, and none in R6, which may indicate the high virulence of TIGR4. Further, other virulence factors of TIGR4, namely pneumococcal surface protein A, autolysin, hyaluronate lyase, pneumolysin, neuraminidase B and pneumococcal surface antigen A, are closely related to those of the other three strains, which suggests that the virulence due to these factors is similar among the four strains. In contrast, neuraminidase A, choline binding protein A and immunoglobulin A1 protease of TIGR4 differ from those of the other strains of <italic>S. pneumoniae</italic>, which suggests that these factors are responsible for the differences in virulence among the four strains.</p>
<p>From the gene role category comparison, many genes of TIGR4 are nearly the same as in G54 and R6, which suggests that they encode the basic complement of proteins required for certain cellular processes in the strains of <italic>S. pneumoniae</italic>. Many other genes of TIGR4 are notably different from those of G54 and R6, which suggests that their proteins are important for strain uniqueness and may be involved in variations in pathogenesis. Since many hypothetical, conserved hypothetical, unknown and unclassified proteins exist among the genes in the dissimilar role categories, many of these genes of <italic>S. pneumoniae</italic> still have to be annotated and assigned functions, and some of them may also be responsible for virulence. Further, we found that most of the virulence factors are the same in D39 and R6, which again confirms that R6 is a derivative of strain D39.</p>
			<p>In order to annotate the uncharacterized protein sequences (hypothetical and conserved hypothetical), the present study predicted that 4 and 22 hypothetical sequences of the strains TIGR4 and R6, respectively, of <italic>S. pneumoniae</italic> are virulence factors. Among those predicted virulence factors, 1 and 8 hypothetical sequences of TIGR4 and R6, respectively, contain conserved sequences of known virulence factors such as hyaluronidase, immunoglobulin A1 protease, capsular polysaccharide synthesis, pneumolysin, neuraminidase and choline binding protein. These sequences may also be considered as desirable targets for therapeutics. This effort narrows down the search for virulence factors among all hypothetical sequences, and its conclusions will need to be confirmed experimentally.</p>
			</sec>		
	</body>
	<back>		
		<ref-list>
		<title>References</title>
		    <ref id="r1">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>AlonsoDeVelasco</surname>
					<given-names>E</given-names>
				</name>	
				<name>
					<surname>Verheul</surname>
					<given-names>AF</given-names>
				</name>
				<name>
					<surname>Verhoef</surname>
					<given-names>J</given-names>
				</name>
				<name>
					<surname>Snippe</surname>
					<given-names>H</given-names>
				</name>			
				</person-group>
				<year>1995</year>
				<article-title><italic>Streptococcus pneumoniae:</italic> virulence factors, pathogenesis, and vaccines</article-title>
				<source>Microbiol Rev</source>		
				<volume>59</volume>			
				<fpage>591</fpage>
				<lpage>603</lpage>
			</citation>
			</ref>
			<ref id="r2">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Avery</surname>
					<given-names>OT</given-names>
				</name>
				<name>
					<surname>MacLeod</surname>
					<given-names>CM</given-names>
				</name>
				<name>
					<surname>McCarty</surname>
					<given-names>M</given-names>
				</name>				
				</person-group>
				<year>1979</year>
				<article-title>Studies on the chemical nature of the substance inducing transformation of <italic>pneumococcal</italic> types. Inductions of transformation by a deoxyribonucleic acid fraction isolated from pneumococcus type III</article-title>
				<source>J Exp Med</source>
				<volume>149</volume>						
				<fpage>297</fpage>
				<lpage>326</lpage>
			</citation>
			</ref>
			<ref id="r3">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Bork</surname>
					<given-names>P</given-names>
				</name>				
				</person-group>
				<year>2000</year>
				<article-title>Powers and pitfalls in sequence analysis: the 70% hurdle</article-title>
				<source>Genome Res</source>
				<volume>10</volume>					
				<fpage>398</fpage>
				<lpage>400</lpage>
			</citation>
			</ref>
			<ref id="r4">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Brown</surname>
					<given-names>TA</given-names>
					<suffix>Jr</suffix>
				</name>
				<name>
					<surname>Ahn</surname>
					<given-names>SJ</given-names>
				</name>
				<name>
					<surname>Frank</surname>
					<given-names>RN</given-names>
				</name>
				<name>
					<surname>Chen</surname>
					<given-names>YY</given-names>
				</name><etal/>
				</person-group>
				<year>2005</year>
				<article-title>A hypothetical protein of <italic>Streptococcus mutans</italic> is critical for biofilm formation</article-title>
				<source>Infect Immun</source>
				<volume>73</volume>				
				<fpage>3147</fpage>
				<lpage>3151</lpage>
			</citation>
			</ref>			
			<ref id="r5">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Br&uuml;ckner</surname>
					<given-names>R</given-names>
				</name>
				<name>
					<surname>Nuhn</surname>
					<given-names>M</given-names>
				</name>
				<name>
					<surname>Reichmann</surname>
					<given-names>P</given-names>
				</name>
				<name>
					<surname>Weber</surname>
					<given-names>B</given-names>
				</name>
				<name>
					<surname>Hakenbeck</surname>
					<given-names>R</given-names>
				</name>
				</person-group>
				<year>2004</year>
				<article-title>Mosaic genes and mosaic chromosomes - genomic variation in <italic>Streptococcus pneumoniae</italic></article-title>
				<source>Int J Med Microbiol</source>
				<volume>294</volume>				
				<fpage>157</fpage>
				<lpage>168</lpage>
			</citation>
			</ref>
			<ref id="r6">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Dopazo</surname>
					<given-names>J</given-names>
				</name>
				<name>
					<surname>Mendoza</surname>
					<given-names>A</given-names>
				</name>
				<name>
					<surname>Herrero</surname>
					<given-names>J</given-names>
				</name>
				<name>
					<surname>Caldara</surname>
					<given-names>F</given-names>
				</name>
				<name>
					<surname>Humbert</surname>
					<given-names>Y</given-names>
				</name><etal/>
				</person-group>
				<year>2001</year>
				<article-title>Annotated draft genomic sequence from a <italic>Streptococcus pneumoniae</italic> type 19F clinical isolate</article-title>
				<source>Microb Drug Resist</source>
				<volume>7</volume>				
				<fpage>99</fpage>
				<lpage>125</lpage>
			</citation>
			</ref>
			<ref id="r7">
			<citation citation-type="book">
			    <person-group>
				<name>
					<surname>Dowson</surname>
					<given-names>CG</given-names>
				</name>				
				</person-group>
				<year>2004</year>
				<article-title>Plant simple sequence repeats: distribution, variation, and effect on gene expression</article-title>					
				<edition>Tuomanen et al. (eds)</edition>
				<publisher-loc>Washington</publisher-loc>
				<publisher-name>The Pneumococcus ASM press</publisher-name>		
				<fpage>3</fpage>
				<lpage>14</lpage>
			</citation>
			</ref>
			<ref id="r8">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Ferretti</surname>
					<given-names>JJ</given-names>
				</name>
				<name>
					<surname>Ajdic</surname>
					<given-names>D</given-names>
				</name>
				<name>
					<surname>McShan</surname>
					<given-names>WM</given-names>
				</name>				
				</person-group>
				<year>2004</year>
				<article-title>Comparative genomics of streptococcal species</article-title>
				<source>Indian J Med Res</source>				
				<volume>119</volume>				
				<fpage>1</fpage>
				<lpage>6</lpage>
			</citation>
			</ref>
			<ref id="r9">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Galperin</surname>
					<given-names>MY</given-names>
				</name>
				<name>
					<surname>Koonin</surname>
					<given-names>EV</given-names>
				</name>							
				</person-group>
				<year>2004</year>
				<article-title>Conserved hypothetical proteins: prioritization of targets for experimental study</article-title>				
				<source>Nucleic Acids Res</source>
				<volume>32</volume>				
				<fpage>5452</fpage>
				<lpage>5463</lpage>							
			</citation>
			</ref>
			<ref id="r10">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Garg</surname>
					<given-names>A</given-names>
				</name>
				<name>
					<surname>Gupta</surname>
					<given-names>D</given-names>
				</name>
				</person-group>
				<year>2008</year>
				<article-title>VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens</article-title>
				<source>BMC Bioinformatics</source>				
				<volume>9</volume>				
				<fpage>62</fpage>
			</citation>
			</ref>
			<ref id="r11">
			<citation citation-type="book">
			    <person-group>
				<name>
					<surname>Gregory</surname>
					<given-names>TR</given-names>
				</name>
				<name>
					<surname>DeSalle</surname>
					<given-names>R</given-names>
				</name>				
				</person-group>
				<year>2005</year>
				<article-title>Comparative genomics in prokaryotes</article-title>							
				<edition>Gregory (ed.)</edition>
				<publisher-loc>London</publisher-loc>
				<publisher-name>Elsevier/Academic Press</publisher-name>		
				<fpage>585</fpage>
				<lpage>660</lpage>
			</citation>
			</ref>
			<ref id="r12">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Hoskins</surname>
					<given-names>J</given-names>
				</name>	
				<name>
					<surname>Alborn</surname>
					<given-names>WE</given-names>
					<suffix>Jr</suffix>
				</name>
				<name>
					<surname>Arnold</surname>
					<given-names>J</given-names>
				</name>	
				<name>
					<surname>Blaszczak</surname>
					<given-names>LC</given-names>
				</name><etal/>		
				</person-group>								
				<year>2001</year>
				<article-title>Genome of the bacterium <italic>Streptococcus pneumoniae</italic> strain R6</article-title>
				<source>J Bacteriol</source>				
				<volume>183</volume>				
				<fpage>5709</fpage>
				<lpage>5717</lpage>							
			</citation>
			</ref>
			<ref id="r13">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Hughes</surname>
					<given-names>D</given-names>
				</name>							
				</person-group>								
				<year>2000</year>
				<article-title>Evaluating genome dynamics: the constraints on rearrangements within bacterial genomes</article-title>
				<source>Genome Biol</source>				
				<volume>1</volume>				
				<fpage>0006.1</fpage>
				<lpage>0006.8</lpage>												
			</citation>
			</ref>	
			<ref id="r14">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Jedrzejas</surname>
					<given-names>MJ</given-names>
				</name>							
				</person-group>								
				<year>2001</year>
				<article-title>Pneumococcal virulence factors: structure and function</article-title>
				<source>Microbiol Mol Biol Rev</source>				
				<volume>65</volume>				
				<fpage>187</fpage>
				<lpage>207</lpage>												
			</citation>
			</ref>	
			<ref id="r15">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Jothi</surname>
					<given-names>R</given-names>
				</name>		
				<name>
					<surname>Manikandakumar</surname>
					<given-names>K</given-names>
				</name>	
				<name>
					<surname>Ganesan</surname>
					<given-names>K</given-names>
				</name>		
				<name>
					<surname>Parthasarathy</surname>
					<given-names>S</given-names>
				</name>					
				</person-group>								
				<year>2007</year>
				<article-title>On the analysis of the virulence nature of TIGR4 and R6 strains of <italic>Streptococcus pneumoniae</italic> using genome comparison tools</article-title>
				<source>J Chem Sci</source>				
				<volume>119</volume>				
				<fpage>559</fpage>
				<lpage>563</lpage>												
			</citation>
			</ref>	
			<ref id="r16">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Lanie</surname>
					<given-names>JA</given-names>
				</name>		
				<name>
					<surname>Wai</surname>
					<given-names>LNG</given-names>
				</name>	
				<name>
					<surname>Kazmierczak</surname>
					<given-names>KM</given-names>
				</name>		
				<name>
					<surname>Andrzejewski</surname>
					<given-names>TM</given-names>
				</name>	
				<name>
					<surname>Davidsen</surname>
					<given-names>TM</given-names>
				</name>	<etal/>			
				</person-group>								
				<year>2007</year>
				<article-title>Genome sequence of Avery&rsquo;s virulent serotype 2 strain D39 of <italic>Streptococcus pneumoniae</italic> and comparison with that of unencapsulated laboratory strain R6</article-title>
				<source>J Bacteriol</source>				
				<volume>189</volume>				
				<fpage>38</fpage>
				<lpage>51</lpage>												
			</citation>
			</ref>	
			<ref id="r17">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Polissi</surname>
					<given-names>A</given-names>
				</name>		
				<name>
					<surname>Pontiggia</surname>
					<given-names>A</given-names>
				</name>	
				<name>
					<surname>Feger</surname>
					<given-names>G</given-names>
				</name>		
				<name>
					<surname>Altieri</surname>
					<given-names>M</given-names>
				</name>	
				<name>
					<surname>Mottl</surname>
					<given-names>H</given-names>
				</name>	<etal/>			
				</person-group>								
				<year>1998</year>
				<article-title>Large-scale identification of virulence genes from <italic>Streptococcus pneumoniae</italic></article-title>
				<source>Infect Immun</source>				
				<volume>66</volume>				
				<fpage>5620</fpage>
				<lpage>5629</lpage>												
			</citation>
			</ref>		
			<ref id="r18">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Rigden</surname>
					<given-names>DJ</given-names>
				</name>		
				<name>
					<surname>Galperin</surname>
					<given-names>MY</given-names>
				</name>	
				<name>
					<surname>Jedrzejas</surname>
					<given-names>MJ</given-names>
				</name>							
				</person-group>								
				<year>2003</year>
				<article-title>Analysis of structure and function of putative surface-exposed proteins encoded in the <italic>Streptococcus pneumoniae</italic> genome: A Bioinformatics-based approach to vaccine and drug design</article-title>
				<source>Crit Rev Biochem Mol Biol</source>				
				<volume>38</volume>				
				<fpage>143</fpage>
				<lpage>168</lpage>												
			</citation>
			</ref>	
			<ref id="r19">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Silva</surname>
					<given-names>NA</given-names>
				</name>		
				<name>
					<surname>McCluskey</surname>
					<given-names>J</given-names>
				</name>	
				<name>
					<surname>Jefferies</surname>
					<given-names>JM</given-names>
				</name>		
				<name>
					<surname>Hinds</surname>
					<given-names>J</given-names>
				</name>	
				<name>
					<surname>Smith</surname>
					<given-names>A</given-names>
				</name>	<etal/>						
				</person-group>								
				<year>2006</year>
				<article-title>Genomic diversity between strains of the same serotype and multilocus sequence type among pneumococcal clinical isolates</article-title>
				<source>Infect Immun</source>				
				<volume>74</volume>				
				<fpage>3513</fpage>
				<lpage>3518</lpage>												
			</citation>
			</ref>	
			<ref id="r20">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Sivashankari</surname>
					<given-names>S</given-names>
				</name>		
				<name>
					<surname>Shanmughavel</surname>
					<given-names>P</given-names>
				</name>										
				</person-group>								
				<year>2006</year>
				<article-title>Functional annotation of hypothetical proteins &ndash; A</article-title>
				<source>Bioinformation</source>				
				<volume>1</volume>				
				<fpage>335</fpage>
				<lpage>338</lpage>												
			</citation>
			</ref>	
			<ref id="r21">
			<citation citation-type="book">
			    <person-group>
				<name>
					<surname>Tettelin</surname>
					<given-names>H</given-names>
				</name>		
				<name>
					<surname>Hollingshead</surname>
					<given-names>SK</given-names>
				</name>										
				</person-group>								
				<year>2004</year>
				<article-title>Comparative genomics of <italic>Streptococcus pneumoniae</italic>: Intrastrain diversity and genome plasticity</article-title>							
				<edition>Tuomanen et al. (eds)</edition>
				<publisher-loc>Washington</publisher-loc>
				<publisher-name>The Pneumococcus ASM press</publisher-name>		
				<fpage>15</fpage>
				<lpage>29</lpage>												
			</citation>
			</ref>	
			<ref id="r22">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Tettelin</surname>
					<given-names>H</given-names>
				</name>		
				<name>
					<surname>Nelson</surname>
					<given-names>KE</given-names>
				</name>	
				<name>
					<surname>Paulsen</surname>
					<given-names>IT</given-names>
				</name>		
				<name>
					<surname>Eisen</surname>
					<given-names>JA</given-names>
				</name>	
				<name>
					<surname>Read</surname>
					<given-names>TD</given-names>
				</name>	<etal/>									
				</person-group>								
				<year>2001</year>											
				<article-title>Complete genome sequence of a virulent isolate of <italic>Streptococcus pneumoniae</italic></article-title>
				<source>Science</source>				
				<volume>293</volume>				
				<fpage>498</fpage>
				<lpage>506</lpage>												
			</citation>
			</ref>
			<ref id="r23">
			<citation citation-type="journal">
			    <person-group>
				<name>
					<surname>Wizemann</surname>
					<given-names>TM</given-names>
				</name>		
				<name>
					<surname>Heinrichs</surname>
					<given-names>JH</given-names>
				</name>	
				<name>
					<surname>Adamou</surname>
					<given-names>JE</given-names>
				</name>		
				<name>
					<surname>Erwin</surname>
					<given-names>AL</given-names>
				</name>	
				<name>
					<surname>Kunsch</surname>
					<given-names>C</given-names>
				</name>	<etal/>									
				</person-group>								
				<year>2001</year>											
				<article-title>Use of a whole genome approach to identify vaccine molecules affording protection against <italic>Streptococcus pneumoniae</italic> infection</article-title>
				<source>Infect Immun</source>				
				<volume>69</volume>				
				<fpage>1593</fpage>
				<lpage>1598</lpage>												
			</citation>
			</ref>
		</ref-list>
		 <glossary>
			<def-list>
				<title>Abbreviations</title>
				<def-item>
					<term>CMR</term>
					<def>
						<p>comprehensive microbial resource</p>
					</def>
				</def-item>
				<def-item>
					<term>cps</term>
					<def>
						<p>capsular polysaccharide</p>
					</def>
				</def-item>
				<def-item>
					<term>PspA</term>
					<def>
						<p>pneumococcal surface protein A</p>
					</def>
				</def-item>
				<def-item>
					<term>LytA</term>
					<def>
						<p>autolysin</p>
					</def>
				</def-item>	
				<def-item>
					<term>Hyl</term>
					<def>
						<p>hyaluronate lyase</p>
					</def>
				</def-item>
				<def-item>
					<term>Ply</term>
					<def>
						<p>pneumolysin</p>
					</def>
				</def-item>
				<def-item>
					<term>NanA and NanB</term>
					<def>
						<p>neuraminidases A and B</p>
					</def>
				</def-item>
				<def-item>
					<term>CbpA</term>
					<def>
						<p>choline binding protein A</p>
					</def>
				</def-item>
				<def-item>
					<term>PsaA</term>
					<def>
						<p>pneumococcal surface antigen A</p>
					</def>
				</def-item>
				<def-item>
					<term>IgA1</term>
					<def>
						<p>immunoglobulin A1 protease</p>
					</def>
				</def-item>						
			</def-list>
		</glossary>		
	</back>
	<floats-wrap>
	<table-wrap position="float" id="t1">
	<label>Table 1.</label>
  			<caption>
  				<title>Comparison of the genome features of the encapsulated strains TIGR4, D39 and G54 and the nonencapsulated strain R6 of <italic>S. pneumoniae</italic> using CMR, BioEdit and CBS tools.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Sl. No.</th>
            <th align="left">Genome Information and Features</th>
            <th align="left">TIGR4</th>
			<th align="left">D39</th>
			<th align="left">G54</th>
			<th align="left">R6</th>					
         </tr>
      </thead>
      <tbody>
         <tr>
            <td rowspan="7">1</td>
            <td>Sequencing center</td>
            <td>TIGR</td>
			<td>TIGR</td>
            <td>Geneva Biomedical Research Institute</td>  
			<td>Eli Lilly</td>          	
         </tr>
         <tr>
            <td>GenBank accession</td>
            <td>AE005672.1</td>
            <td>CP000410.1</td>
			<td>NA</td>
            <td>AE007317.1</td> 			 
         </tr>
         <tr>
            <td>Refseq</td>
            <td>NC_003028</td>
            <td>NC_008533</td>
			<td>NA</td>
            <td>NC_003098</td>           
         </tr>
         <tr>
            <td>Topology</td>
            <td>Circular</td>
            <td>Circular</td>
			<td>Circular</td>
            <td>Circular</td>           	
         </tr>	
		 <tr>
            <td>Molecule</td>
            <td>dsDNA</td>
            <td>dsDNA</td>
			<td>dsDNA</td>
            <td>dsDNA</td>           	
         </tr>
		 <tr>
            <td>Contig</td>
            <td>1</td>
            <td>1</td>
			<td>31 contigs</td>
            <td>1</td>           	
         </tr>
		 <tr>
            <td>Completed date</td>
            <td>2001/10/03</td>
            <td>2006/10/24</td>
			<td>Not yet included in NCBI</td>
            <td>2001/10/03</td>           	
         </tr>	
		 <tr>
            <td rowspan="7">2</td>
            <td>Genome size  (sequence length)</td>
            <td>2.16 Mb</td>
			<td>2 Mb</td>
            <td>2.07 Mb</td>  
			<td>2.03 Mb</td>          	
         </tr>
		 <tr>
            <td>Number of A</td>
            <td>653880 (30.26%)</td>
            <td>617717 (30.19%)</td>
			<td>628663 (30.31%)</td>
            <td>615270 (30.18%)</td>           	
         </tr> 
		 <tr>
            <td>Number of T</td>
            <td>649168 (30.04%)</td>
            <td>615968 (30.10%)</td>
			<td>624751 (30.10%)</td>
            <td>613689 (30.10%)</td>           	
         </tr>  
		 <tr>
            <td>Number of G</td>
            <td>430998 (19.95%)</td>
            <td>407646 (19.92%)</td>
			<td>404611 (19.50%)</td>
            <td>406018 (19.91%)</td>           	
         </tr>
		 <tr>
            <td>Number of C</td>
            <td>426796 (19.75%)</td>
            <td>404784 (19.78%)</td>
			<td>414824 (20.00%)</td>
            <td>403638 (19.79%)</td>           	
         </tr>
		 <tr>
            <td>No. of A+T (%)</td>
            <td>60.30</td>
            <td>60.29</td>
			<td>60.43</td>
            <td>60.28</td>           	
         </tr>
		 <tr>
            <td>No. of G+C (%)</td>
            <td>39.69</td>
            <td>39.71</td>
			<td>39.50</td>
            <td>39.71</td>           	
         </tr>
		 <tr>
            <td rowspan="7">3</td>
            <td>Total size of DNA molecule</td>
            <td>2160842 bp</td>
			<td>2046115 bp</td>
            <td>2074072 bp</td>  
			<td>2038615 bp</td>          	
         </tr>
		 <tr>
            <td>Number of coding bases</td>
            <td>1885091 bp (87.23%)</td>
            <td>NA</td>
			<td>1761820 bp (84.94%)</td>
            <td>1761157 bp (86.38%)</td>           	
         </tr>
		 <tr>
            <td>Number of genes</td>
            <td>2234</td>
            <td>1914</td>
			<td>2047</td>
            <td>2043</td>           	
         </tr>
		  <tr>
            <td>Number of genes assigned to role ids</td>
            <td>1506 (67.41%)</td>
            <td>NA</td>
			<td>1343 (65.60%)</td>
            <td>1313 (64.26%)</td>           	
         </tr>
		 <tr>
            <td>Structural RNAs</td>
            <td>70</td>
            <td>73</td>
			<td>NA</td>
            <td>73</td>           	
         </tr>
		 <tr>
            <td>tRNA genes</td>
            <td>58</td>
            <td>58</td>
			<td>51</td>
            <td>58</td>           	
         </tr>
		 <tr>
            <td>rRNA genes</td>
            <td>12</td>
            <td>12</td>
			<td>5</td>
            <td>12</td>           	
         </tr>
		 <tr>
            <td rowspan="4">4</td>
            <td>% global direct repeats</td>
            <td>8.30</td>
			<td rowspan="4" colspan="2">CBS tool does not have the whole genome data of D39 and G54</td>
            <td>5.70</td>  			        	
         </tr>
		 <tr>
            <td>% global inverted repeats</td>
			<td>7.00</td>
            <td>5.40</td>           	
         </tr>
		 <tr>
            <td>% local direct repeats</td>
			<td>6.40</td>
            <td>5.80</td>           	
         </tr>
		 <tr>
            <td>% local inverted repeats</td>
			<td>4.30</td>
            <td>4.20</td>           	
         </tr>
     </tbody>
 	  </table>
 	</table-wrap>
	<table-wrap position="float" id="t2">
	<label>Table 2.</label>
  			<caption>
  				<title>Comparison of capsular polysaccharide (cps) synthesizing genes of four strains of <italic>S. pneumoniae</italic>. Each cps sequence is compared with all cps sequences of the other three strains using LALIGN; all the cps sequences considered fall under Role Category 11 (Cell Envelope) of CMR.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Strain name</th>
            <th align="left">Gene ID and Name</th>
            <th align="left">G+C(%)</th>
			<th align="left">Protein length (aa)</th>
			<th align="left">Gene length (bp)</th>
			<th align="left">Gene coordinates</th>
			<th align="left">Comparison with cps of other strains in %Identity</th>					
         </tr>		
      </thead>
      <tbody>
         <tr>
            <td rowspan="21"><bold>TIGR4</bold></td>
            <td rowspan="2">gi|15900275-cps4A</td>
            <td rowspan="2">38.32</td>
			<td rowspan="2">481</td>
            <td rowspan="2">1446</td>
            <td rowspan="2">320077  -  321522</td>
			<td>96.0  -  D39-gi|116516963-cps2A</td>           		
         </tr>
		 <tr>
		 	<td>94.0  -  G54-NT05SP0190-cps4A</td>  
		</tr>
        <tr>
           	<td rowspan="2">gi|15900276-cps4B</td>
            <td rowspan="2">41.98</td>
			<td rowspan="2">243</td>
            <td rowspan="2">732</td>
            <td rowspan="2">321524  -  322255</td>
			<td>97.9  -  D39-gi|116516159-cpsB</td>           		
         </tr>
		 <tr>
		 	<td>86.4  -  G54-NT05SP0191-cps4B</td>  
		</tr>
		<tr>
           	<td>gi|15900277-cps4C</td>
            <td>40.29</td>
			<td>230</td>
            <td>693</td>
            <td>322264  -  322956</td>
			<td>85.7  -  G54-NT05SP0192-cps4C</td>           		
         </tr>
		 <tr>
           	<td rowspan="2">gi|15900278-cps4D</td>
            <td rowspan="2">34.21</td>
			<td rowspan="2">227</td>
            <td rowspan="2">684</td>
            <td rowspan="2"> 322966  -  323649</td>
			<td>79.6  -  D39-gi|116517023-cps2D</td>           		
         </tr>
		 <tr>
		 	<td>93.8  -  G54-NT05SP0193-cps4D</td>  
		</tr>
		<tr>
           	<td>gi|15900279-cps4E</td>
            <td>33.49</td>
			<td>211</td>
            <td>636</td>
            <td>323990  -  324625</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900280-cps4F</td>
            <td>33.17</td>
			<td>409</td>
            <td>1230</td>
            <td>324634  -  325863</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900281-cps4G</td>
            <td>27.84</td>
			<td>358</td>
            <td>1077</td>
            <td>325868  -  326944</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900282-cps4H</td>
            <td>31.36</td>
			<td>372</td>
            <td>1119</td>
            <td>326937  -  328055</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900286-cps4I</td>
            <td>36.70</td>
			<td>365</td>
            <td>1098</td>
            <td>331774  -  332871</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900287-cps4J</td>
            <td>38.46</td>
			<td>351</td>
            <td>1056</td>
            <td>332875  -  333930</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900288-cps4K</td>
            <td>36.19</td>
			<td>409</td>
            <td>1230</td>
            <td>334030  -  335259</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td>gi|15900289-cps4L</td>
            <td>35.02</td>
			<td>394</td>
            <td>1185</td>
            <td>335260  -  336444</td>
			<td>--</td>           		
         </tr>
		  <tr>
           	<td rowspan="3">gi|15900046-cps-ptv<sup>&ast;</sup></td>
            <td rowspan="3">42.21</td>
			<td rowspan="3">616</td>
            <td rowspan="3">1851</td>
            <td rowspan="3">104668  -  106518</td>
			<td>99.8  -  D39-gi|116517199-cps</td>           		
         </tr>
		 <tr>
		 	<td>99.7  - G54-NT05SP2185-cps9E</td>
		 </tr>
		 <tr>
		 	<td>99.8  -  R6-gi|15902136-capD</td>
		</tr>
		<tr>
           	<td>gi|15900788-cps-ptv</td>
            <td>28.79</td>
			<td>455</td>
            <td>1368</td>
            <td> 859370  -  860737</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td rowspan="2">gi|15901666-cps-ptv</td>
            <td rowspan="2">43.93</td>
			<td rowspan="2">408</td>
            <td rowspan="2">1227</td>
            <td rowspan="2">1746322 - 1747548</td>
			<td>99.0  -  D39-gi|116516120-cps-ptv</td>           		
         </tr>
		 <tr>
		 	<td>96.6  -  G54-NT05SP1650-cps7G</td>  
		</tr>
		<tr>
			<td rowspan="8"><bold>D39</bold></td>
           	<td>gi|116516963-cps2A</td>
            <td>38.45</td>
			<td>481</td>
            <td>1446</td>
            <td>313744  -  315189</td>
			<td>96.3  -  G54-NT05SP0190-cps4A</td>           		
         </tr>
		 <tr>
			<td>gi|116516159-cpsB</td>
            <td>41.53</td>
			<td>243</td>
            <td>732</td>
            <td>315191  -  315922</td>
			<td>85.2  -  G54-NT05SP0191-cps4B</td>           		
         </tr>
		 <tr>
			<td>gi|116517023-cps2D</td>
            <td>39.06</td>
			<td>226</td>
            <td>681</td>
            <td>316633  -  317313</td>
			<td>79.3  -  G54-NT05SP0192-cps4C</td>           		
         </tr>
		 <tr>
			<td>gi|116516773-cps2E</td>
            <td>37.79</td>
			<td>455</td>
            <td>1368</td>
            <td> 317328  -  318695</td>
			<td>--</td>           		
         </tr>
		 <tr>
           	<td rowspan="2">gi|116517199-cps</td>
            <td rowspan="2">42.19</td>
			<td rowspan="2">616</td>
            <td rowspan="2">1851</td>
            <td rowspan="2">99217  - 101067</td>
			<td>99.5  -  G54-NT05SP2185-cps9E</td>           		
         </tr>
		 <tr>
		 	<td>100   -  R6-gi|15902136-capD</td>  
		</tr>
		<tr>
			<td>gi|116516341-cps-ptv</td>
            <td>30.28</td>
			<td>119</td>
            <td>360</td>
            <td>815811  -  816170</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>gi|116516120-cps-ptv</td>
            <td>44.09</td>
			<td>408</td>
            <td>1227</td>
            <td>1633887 - 1635113</td>
			<td>97.1  -  G54-NT05SP1650-cps7G</td>           		
         </tr>
		 <tr>
			<td rowspan="9"><bold>G54</bold></td>
           	<td>NT05SP0190-cps4A</td>
            <td>38.28</td>
			<td>484</td>
            <td>1455</td>
            <td>165975  -  167429</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP0191-cps4B</td>
            <td>37.56</td>
			<td>243</td>
            <td>732</td>
            <td>167431  -  168162</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP0192-cps4C</td>
            <td>38.09</td>
			<td>230</td>
            <td>693</td>
            <td>168171  -  168863</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP0193-cps4D</td>
            <td>34.64</td>
			<td>227</td>
            <td>684</td>
            <td>168873  -  169556</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP0198-cps19AI</td>
            <td>29.82</td>
			<td>445</td>
            <td>1338</td>
            <td> 173388  -  174725</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP0202-cps23FP</td>
            <td>41.70</td>
			<td>198</td>
            <td>597</td>
            <td>178230  -  178826</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP1650-cps7G</td>
            <td>43.77</td>
			<td>417</td>
            <td>1254</td>
            <td>1493392 -  1492139</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP1909-cps3E</td>
            <td>43.63</td>
			<td>436</td>
            <td>1311</td>
            <td>1726013 -  1727323</td>
			<td>--</td>           		
         </tr>
		 <tr>
			<td>NT05SP2185-cps9E</td>
            <td>42.46</td>
			<td>616</td>
            <td>1851</td>
            <td>1999333 -  2001183</td>
			<td>99.5  -  R6-gi|15902136-capD</td>           		
         </tr>
		 <tr>
		 	<td><bold>R6</bold></td>
			<td>gi|15902136-capD</td>
            <td>42.26</td>
			<td>616</td>
            <td>1851</td>
            <td>  99217  -  101067</td>
			<td>--</td>           		
         </tr>
     </tbody>
 	  </table>
	  <table-wrap-foot>
  				<fn>
  					<p><sup>&ast;</sup>ptv: putative.</p>
  				</fn>
  	  </table-wrap-foot>
 	</table-wrap>
	<table-wrap position="float" id="t3">
	<label>Table 3.</label>
  			<caption>
  				<title>Distribution of genes in the whole genomes of the TIGR4, G54 and R6 strains of <italic>S. pneumoniae</italic> by gene role category. The gene role category data are retrieved and compiled from CMR using its Gene Role Category pie chart.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">S. No.</th>
            <th align="left">Gene Role Category</th>
            <th align="left">TIGR4: no. of genes out of 2234 (%)</th>
			<th align="left">G54: no. of genes out of 2047 (%)</th>
			<th align="left">R6: no. of genes out of 2219 (%)</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td></td>
            <td><italic>Similar proteins (common proteins)</italic></td>
			<td></td>
            <td></td>  
			<td></td>          	
         </tr>
         <tr>
            <td>1</td>
            <td>Biosynthesis of cofactors, prosthetic groups, and  carriers</td>
            <td>42   (1.88%)</td>
			<td>48   (2.34%)</td>
            <td>47   (2.11%)</td> 			 
         </tr>
         <tr>
            <td>2</td>
            <td>DNA metabolism </td>
            <td>92   (4.11%)</td>
			<td>98   (4.78%)</td>
            <td>104   (4.68%)</td>           
         </tr>
         <tr>
            <td>3</td>
            <td>Fatty acid and phospholipid metabolism </td>
            <td>23   (1.02%)</td>
			<td>37   (1.8%)</td>
            <td>34   (1.53%)</td>           	
         </tr>	
		 <tr>
            <td>4</td>
            <td>Protein fate </td>
            <td>70   (3.13%)</td>
			<td>76   (3.71%)</td>
            <td>69   (3.10%)</td>           	
         </tr>
		 <tr>
		 	<td>5</td>
            <td>Protein synthesis </td>
            <td>120   (5.37%)</td>
            <td>129   (6.3%)</td>
			<td>128   (5.76%)</td>                       	
         </tr>
		 <tr>
            <td>6</td>
            <td>Purines, pyrimidines, nucleosides and nucleotides</td>
            <td>54   (2.41%) </td>
			<td>58   (2.83%)</td>
            <td>61   (2.74%)</td>           	
         </tr>	
		 <tr>
            <td>7</td>
            <td>Regulatory functions </td>
			<td>121   (5.41%)</td>
            <td>117   (5.71%)</td>  
			<td>122    (5.49%)</td>          	
         </tr>
		 <tr>
            <td>8</td>
            <td>Transcription </td>
            <td>29   (1.29%)</td>
			<td>29   (1.41%)</td>
            <td>31   (1.39%)</td>           	
         </tr> 
		 <tr>
            <td>9</td>
            <td>Transport and binding  proteins</td>
            <td>267   (11.90%)</td>
			<td>218   (10.6%)</td>
            <td>236   (10.6%)</td>           	
         </tr>  
		 <tr>
		 	<td colspan="5"></td>
		 </tr>
		 <tr>
		 	<td colspan="5"></td>
		 </tr>
		 <tr>
            <td></td>
            <td><italic>Dissimilar proteins (unique proteins)</italic></td>
			<td></td>
            <td></td>  
			<td></td>          	
         </tr>
		 <tr>
            <td>10</td>
            <td>Amino acid biosynthesis</td>
            <td>53    (2.37%)</td>
			<td>95    (4.64%)</td>
            <td>100   (4.50%)</td>           	
         </tr>
		 <tr>
            <td>11</td>
            <td>Cell envelope</td>
            <td>136    (6.08%)</td>
			<td>131   (6.39%)</td>
            <td>96    (4.32%)</td>           	
         </tr>
		 <tr>
            <td>12</td>
            <td>Cellular processes </td>
            <td>147    (6.58%)</td>
			<td>91   (4.44%)</td>
            <td>76    (3.42%)</td>           	
         </tr>
		 <tr>
            <td>13</td>
            <td>Central intermediary metabolism </td>
            <td>11   (0.49%)</td>
			<td>87   (4.25%)</td>
            <td>93   (4.19%)</td>           	
         </tr>
		 <tr>
            <td>14</td>
            <td>Disrupted reading frame </td>
			<td>92    (4.11%)</td>
            <td>0   (0%)</td>  
			<td>0   (0%)</td>          	
         </tr>
		 <tr>
            <td>15</td>
            <td>Energy metabolism </td>
            <td>143    (6.40%)</td>
			<td>185   (9.03%)</td>
            <td>197   (8.87%)</td>           	
         </tr>
		 <tr>
            <td>16</td>
            <td>Hypothetical proteins </td>
            <td>431   (19.20%)</td>
			<td>236   (11.5%)</td>
            <td>171   (7.70%)</td>           	
         </tr>
		  <tr>
            <td>17</td>
            <td>Conserved hypothetical proteins</td>
            <td>302   (13.50%)</td>
			<td>301   (14.7%)</td>
            <td>519   (23.3%)</td>           	
         </tr>
		 <tr>
            <td>18</td>
            <td>Mobile and extrachromosomal element functions</td>
            <td>134    (5.99%)</td>
			<td>71   (3.46%)</td>
            <td>86   (3.87%)</td>           	
         </tr>
		 <tr>
            <td>19</td>
            <td>Pathogen responses<sup>&ast;</sup></td>
            <td>101    (4.52%)</td>
			<td>47   (2.30%)</td>
            <td>42   (1.89%)</td>           	
         </tr>
		 <tr>
            <td>20</td>
            <td>Signal transduction </td>
            <td>79    (3.53%)</td>
			<td>4   (0.19%)</td>
            <td>4   (0.18%)</td>           	
         </tr>
		 <tr>
            <td>21</td>
            <td>Unclassified </td>
            <td>0    (0%)</td>
			<td>167   (8.15%)</td>
            <td>201   (9.05%)</td>           	
         </tr>
		 <tr>
            <td>22</td>
            <td>Unknown function</td>
            <td>174    (7.78%)</td>
			<td>72   (3.51%)</td>
            <td>51   (2.29%)</td>           	
         </tr>
		 <tr>
            <td>23</td>
            <td>Viral functions </td>
            <td>0    (0%)</td>
			<td>23   (1.12%)</td>
            <td>26   (1.17%)</td>           	
         </tr>		 
     </tbody>
 	  </table>
	  <table-wrap-foot>
  				<fn>
  					<p><sup>&ast;</sup>Manually counted.</p>
  				</fn>
  	  </table-wrap-foot>
 	</table-wrap>
	<table-wrap position="float" id="t4">
	<label>Table 4.</label>
  			<caption>
  				<title>Number of hypothetical sequences in the whole genomes, number of unique genes, and virulence factors among the unique genes of the strains TIGR4, G54 and R6 of <italic>S. pneumoniae</italic>. (D39 data are not included because the genome sequence of the D39 strain is not available in the CMR tool.)</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Types of sequences</th>
            <th align="left">Role categories</th>
            <th align="left">TIGR4</th>
			<th align="left">G54</th>
			<th align="left">R6</th>						
         </tr>
      </thead>
      <tbody>
         <tr>
            <td rowspan="6"><bold>Whole genome </bold></td>
            <td>Total no. of genes</td>
			<td>2234</td>
            <td>2047</td>  
			<td>2219</td>          	
         </tr>
         <tr>
            <td>Hypothetical</td>         
            <td>431</td>
			<td>236</td>
            <td>171</td> 			 
         </tr>
         <tr>
            <td>Conserved hypothetical</td>
            <td>302</td>
			<td>301</td>
            <td>519</td>           
         </tr>
         <tr>
            <td>Unclassified</td>            
            <td>Nil</td>
			<td>167</td>
            <td>201</td>           	
         </tr>	
		 <tr>
            <td>Unknown</td>
            <td>174</td>
			<td>72</td>
            <td>51</td>           	
         </tr>
		 <tr>
		 	<td><bold>Total</bold></td>
            <td><bold>907 - 40.6%</bold></td>
            <td><bold>776 - 37.9%</bold></td>
			<td><bold>942 - 42.5%</bold></td>                       	
         </tr>
		 <tr>
            <td rowspan="3"><bold>Number of common and unique genes</bold></td>
            <td>Number of sequences present in all comparison molecules</td>
            <td>1792</td>
			<td>1824</td>
            <td>1810</td>           	
         </tr>	
		 <tr>
            <td>Number of sequences present in at least one comparison molecule</td>
			<td>1946</td>
            <td>1943</td>  
			<td>1965</td>          	
         </tr>
		 <tr>
            <td>Number of sequences not present in any comparison molecule (unique genes)</td>
            <td>288</td>
			<td>104</td>
            <td>78</td>           	
         </tr> 
		  <tr>
            <td rowspan="5"><bold>Unique genes</bold></td>
            <td>Hypothetical</td>
			<td>158</td>
            <td>47</td>  
			<td>68</td>          	
         </tr>
		 <tr>
            <td>Conserved hypothetical</td>
            <td>13</td>
			<td>11</td>
            <td>Nil</td>           	
         </tr> 		 
		 <tr>
            <td>Unclassified</td>            
			<td>Nil</td>
            <td>Nil</td>  
			<td>Nil</td>          	
         </tr>
		 <tr>
            <td>Unknown</td>            
            <td>Nil</td>
			<td>Nil</td>
            <td>Nil</td>           	
         </tr>
		 <tr>
            <td><bold>Total</bold></td>
            <td><bold>189 (65.63%)</bold></td>
            <td><bold>58 (74.36%)</bold></td>
			<td><bold>68 (65.39%)</bold></td>            	
         </tr>
		 <tr>
            <td rowspan="13"><bold>Virulence factors among unique genes</bold></td>
            <td rowspan="3">Capsular polysaccharide biosynthesis protein</td>
            <td>Sp_0351-cps4F</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
            <td>Sp_0352-cps4G</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		  <tr>
            <td>Sp_0359-cps4K</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
            <td rowspan="3">Type 2 capsule locus</td>           
			<td>--</td>
            <td>--</td>  
			<td>Spr0315</td>          	
         </tr>
		 <tr>
            <td>--</td>
            <td>--</td>  
			<td>Spr0317</td>          	
         </tr>
		 <tr>
            <td>--</td>
            <td>--</td>  
			<td>Spr0319</td>          	
         </tr>
		 <tr>
            <td rowspan="4">Cell wall surface anchor family protein</td>
            <td>Sp_0462</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
		 	<td>Sp_0463</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
		 	<td>Sp_0464</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
		 	<td>Sp_1772</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
            <td>PspC</td>          
            <td>Sp_1417</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		  <tr>
            <td>NanA, authentic frameshift</td>          
            <td>Sp_1693</td>
			<td>--</td>
            <td>--</td>           	
         </tr>
		 <tr>
            <td>IgA1 protease, degenerate</td>          
            <td>Sp_2155</td>
			<td>--</td>
            <td>--</td>           	
         </tr>		 
     </tbody>
 	  </table>	  
 	</table-wrap>
	<table-wrap position="float" id="t5">
	<label>Table 5.</label>
  			<caption>
  				<title>Comparison of the common virulence factors, namely pneumococcal surface protein A (PspA), autolysin (LytA), hyaluronate lyase (Hyl), pneumolysin (Ply), neuraminidase A (NanA), neuraminidase B (NanB), choline binding protein A (CbpA), pneumococcal surface antigen A (PsaA) and immunoglobulin A1 (IgA1) protease, of four strains of <italic>S. pneumoniae</italic>. The LALIGN program is used to compute the identity between sequences.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">Strain</th>
            <th align="left">Gene ID</th>
            <th align="left">Virulence factors</th>
			<th align="left">G+C (%)</th>
			<th align="left">Protein length (aa)</th>
			<th align="left">Gene length (bp)</th>
            <th align="left" colspan="2">Gene Coordinates</th>
            <th align="left">Role category<sup>&ast;&ast;&ast;</sup></th>
			<th align="left">% Identity with D39</th>
			<th align="left">% Identity with G54</th>
			<th align="left">% Identity with R6</th>						
         </tr>
		 <tr>
            <th align="left"></th>
            <th align="left"></th>
            <th align="left"></th>
			<th align="left"></th>
			<th align="left"></th>
			<th align="left"></th>
            <th align="left">5&rsquo; </th>
            <th align="left">3&rsquo;</th>
			<th align="left"></th>
			<th align="left"></th>
			<th align="left"></th>			
			<th align="left"></th>				
         </tr>
      </thead>
      <tbody>
         <tr>
            <td rowspan="9"><bold>TIGR4</bold></td>
            <td>gi|15900059</td>
			<td>PspA</td>
            <td>40.23</td>  
			<td>744</td>
			<td>2235</td>
			<td>118423</td>
            <td>120657</td>  
			<td>1</td>   
			<td>53.6 </td> 
			<td>62.5</td>
			<td>53.6</td>		      	
         </tr>
		 <tr>
            <td>gi|15901761</td>
			<td>LytA</td>
            <td>46.44</td>  
			<td>318</td>
			<td>957</td>
			<td>1841361</td>
            <td>1840405</td>  
			<td>3</td>   
			<td>99.7</td> 
			<td>100</td>
			<td>99.7</td>		      	
         </tr>
		 <tr>
            <td>gi|15900247</td>
			<td>Hyl</td>
            <td>40.15</td>  
			<td>1066</td>
			<td>3201</td>
			<td>287483</td>
            <td>290683</td>  
			<td>5</td>   
			<td>98.8</td> 
			<td>97.5</td>
			<td>97.8</td>		      	
         </tr>   
		 <tr>
            <td>gi|15901747</td>
			<td>Ply</td>
            <td>41.83</td>  
			<td>471</td>
			<td>1416</td>
			<td>1833311</td>
            <td>1831896</td>  
			<td>6</td>   
			<td>99.8</td> 
			<td>100</td>
			<td>99.8</td>		      	
         </tr>    
		 <tr>
            <td>gi|15901180</td>
			<td>Nan-ptv<sup>&ast;</sup></td>
            <td>35.36</td>  
			<td>740</td>
			<td>2223</td>
			<td>1251631</td>
            <td>1249409</td>  
			<td>3</td>   
			<td>10.5</td> 
			<td>20.4</td>
			<td>19.6 </td>		      	
         </tr>  
		 <tr>
            <td>gi|15901522</td>
			<td>NanB</td>
            <td>33.38</td>  
			<td>697</td>
			<td>2094</td>
			<td>1589236</td>
            <td>1587143</td>  
			<td>3</td>   
			<td>99.1</td> 
			<td>98.9</td>
			<td>99.1</td>		      	
         </tr>   
		 <tr>
            <td>gi|15901997</td>
			<td>CbpA</td>
            <td>41.90</td>  
			<td>693</td>
			<td>2082</td>
			<td>2112096</td>
            <td>2110015</td>  
			<td>9</td>   
			<td>73.7</td> 
			<td>40.5</td>
			<td>73.7</td>		      	
         </tr>
		 <tr>
            <td>gi|15901485</td>
			<td>PsaA</td>
            <td>37.11</td>  
			<td>309</td>
			<td>930</td>
			<td>1549466</td>
            <td>1550395</td>  
			<td>14</td>   
			<td>99.7</td> 
			<td>98.1</td>
			<td>99.7</td>		      	
         </tr> 
		 <tr>
            <td>gi|15901019</td>
			<td>IgA1</td>
            <td>38.06</td>  
			<td>2004</td>
			<td>6015</td>
			<td>1083881</td>
            <td>1089895</td>  
			<td>15</td>   
			<td>87.3</td> 
			<td>35.9</td>
			<td>87.3</td>		      	
         </tr> 
		 <tr>
            <td rowspan="9"><bold>D39</bold></td>
            <td>gi|116515876</td>
			<td>PspA</td>
            <td>42.63</td>  
			<td>619</td>
			<td>1860</td>
			<td>128356</td>
            <td>130215</td>  
			<td rowspan="9">NA</td>   
			<td>--</td> 
			<td>63.4</td>
			<td>100</td>		      	
         </tr>
		  <tr>
            <td>gi|116516777</td>
			<td>LytA</td>
            <td>46.39</td>  
			<td>318</td>
			<td>957</td>
			<td>1729601</td>
            <td>1730557</td>  			
			<td>--</td> 
			<td>99.7</td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td>gi|116515977</td>
			<td>Hyl</td>
            <td>40.14</td>  
			<td>1067</td>
			<td>3204</td>
			<td>285186</td>
            <td>288389</td>  			
			<td>--</td> 
			<td>97.8</td>
			<td>99.0</td>		      	
         </tr>
		 <tr>
            <td>gi|116515376</td>
			<td>Ply</td>
            <td>42.02</td>  
			<td>471</td>
			<td>1416</td>
			<td>1721457</td>
            <td>1722872</td>  			
			<td>--</td> 
			<td>99.8</td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td>gi|116515419</td>
			<td>N.lyase-ptv<sup>&ast;&ast;</sup></td>
            <td>43.31</td>  
			<td>243</td>
			<td>732</td>
			<td>1190890</td>
            <td>1191621</td>  			
			<td>--</td> 
			<td>10.2</td>
			<td>9.9</td>		      	
         </tr>
		 <tr>
            <td>gi|116516987</td>
			<td>NanB</td>
            <td>33.38</td>  
			<td>697</td>
			<td>2094</td>
			<td>1515745</td>
            <td>1517838</td>  			
			<td>--</td> 
			<td>98.6 </td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td>gi|116515359</td>
			<td>CbpA</td>
            <td>41.26</td>  
			<td>701</td>
			<td>2106</td>
			<td>1995044</td>
            <td>1997149</td>  			
			<td>--</td> 
			<td>31.0</td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td>gi|116515973</td>
			<td>PsaA</td>
            <td>37.10</td>  
			<td>309</td>
			<td>930</td>
			<td>1478217</td>
            <td>1479146</td>  			
			<td>--</td> 
			<td>98.4</td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td>gi|116516343</td>
			<td>IgA1</td>
            <td>39.09</td>  
			<td>1963</td>
			<td>5892</td>
			<td>1037492</td>
            <td>1043383</td>  			
			<td>--</td> 
			<td>36.4</td>
			<td>100</td>		      	
         </tr>
		 <tr>
            <td rowspan="9"><bold>G54</bold></td>
            <td>NT05SP2202</td>
			<td>PspA</td>
            <td>41.36</td>  
			<td>709</td>
			<td>2130</td>
			<td>2015436</td>
            <td>2017565</td>  
			<td>1</td>   
			<td>--</td> 
			<td>--</td>
			<td>63.4 </td>		      	
         </tr>
		 <tr>
            <td>NT05SP1836</td>
			<td>LytA</td>
            <td>46.29</td>  
			<td>318</td>
			<td>957</td>
			<td>1656972</td>
            <td>1656016</td>  
			<td>4</td>   
			<td>--</td> 
			<td>--</td>
			<td>99.7</td>		      	
         </tr>
		 <tr>
            <td>NT05SP0158</td>
			<td>Hyl</td>
            <td>39.97</td>  
			<td>1078</td>
			<td>3237</td>
			<td>137159</td>
            <td>140395</td>  
			<td>5</td>   
			<td>--</td> 
			<td>--</td>
			<td>98.7</td>		      	
         </tr>
		 <tr>
            <td>NT05SP1746</td>
			<td>Ply</td>
            <td>41.94</td>  
			<td>471</td>
			<td>1416</td>
			<td>1577243</td>
            <td>1575828</td>  
			<td>7</td>   
			<td>--</td> 
			<td>--</td>
			<td>99.8</td>		      	
         </tr>
		 <tr>
            <td>NT05SP1517</td>
			<td>NanA</td>
            <td>41.48</td>  
			<td>980</td>
			<td>2943</td>
			<td>1379132</td>
            <td>1376190</td>  
			<td>8</td>   
			<td>--</td> 
			<td>--</td>
			<td>90.8</td>		      	
         </tr>
		 <tr>
            <td>NT05SP1511</td>
			<td>NanB</td>
            <td>33.19</td>  
			<td>697</td>
			<td>2094</td>
			<td>1371552</td>
            <td>1369459</td>  
			<td>8</td>   
			<td>--</td> 
			<td>--</td>
			<td>98.6</td>		      	
         </tr>
		 <tr>
            <td>NT05SP2037</td>
			<td>CbpA</td>
            <td>40.67</td>  
			<td>739</td>
			<td>2220</td>
			<td>1848760</td>
            <td>1846541</td>  
			<td>10</td>   
			<td>--</td> 
			<td>--</td>
			<td>31.0</td>		      	
         </tr>
		 <tr>
            <td>NT05SP1476</td>
			<td>PsaA</td>
            <td>37.04</td>  
			<td>313</td>
			<td>942</td>
			<td>1331767</td>
            <td>1332708</td>  
			<td>7</td>   
			<td>--</td> 
			<td>--</td>
			<td>98.4</td>		      	
         </tr>
		 <tr>
            <td>NT05SP2154</td>
			<td>IgA1</td>
            <td>36.78</td>  
			<td>1856</td>
			<td>5571</td>
			<td>1969880</td>
            <td>1975450</td>  
			<td>16</td>   
			<td>--</td> 
			<td>--</td>
			<td>36.4</td>		      	
         </tr>
		 <tr>
            <td rowspan="9"><bold>R6</bold></td>
            <td>gi|15902165</td>
			<td>PspA</td>
            <td>42.65</td>  
			<td>619</td>
			<td>1860</td>
			<td>128356</td>
            <td>130215</td>  
			<td>2</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903796</td>
			<td>LytA</td>
            <td>46.54</td>  
			<td>318</td>
			<td>957</td>
			<td>1723025</td>
            <td>1722069</td>  
			<td>4</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15902330</td>
			<td>Hyl</td>
            <td>40.01</td>  
			<td>1078</td>
			<td>3237</td>
			<td>285103</td>
            <td>288339</td>  
			<td>5</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903781</td>
			<td>Ply</td>
            <td>42.04</td>  
			<td>471</td>
			<td>1416</td>
			<td>1715341</td>
            <td>1713926</td>  
			<td>7</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903579</td>
			<td>NanA</td>
            <td>42.67</td>  
			<td>1035</td>
			<td>3108</td>
			<td>1518051</td>
            <td>1514944</td>  
			<td>8</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903574</td>
			<td>NanB</td>
            <td>33.43</td>  
			<td>697</td>
			<td>2094</td>
			<td>1510307</td>
            <td>1508214</td>  
			<td>8</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15904036</td>
			<td>CbpA</td>
            <td>41.32</td>  
			<td>701</td>
			<td>2106</td>
			<td>1989649</td>
            <td>1987544</td>  
			<td>10</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903537</td>
			<td>PsaA</td>
            <td>37.22</td>  
			<td>309</td>
			<td>930</td>
			<td>1470686</td>
            <td>1471615</td>  
			<td>12</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
		 <tr>
            <td>gi|15903086</td>
			<td>IgA1</td>
            <td>39.09</td>  
			<td>1963</td>
			<td>5892</td>
			<td>1029961</td>
            <td>1035852</td>  
			<td>13</td>   
			<td>--</td> 
			<td>--</td>
			<td>--</td>		      	
         </tr>
     </tbody>
 	  </table>	
	  <table-wrap-foot>
  				<fn>
					<p>NA - Not Available</p>
  					<p><sup>&ast;</sup>Nan-ptv: Neuraminidase, putative</p>
					<p><sup>&ast;&ast;</sup>N.lyase-ptv: N-acetylneuraminate lyase, putative</p>					
					<p><sup>&ast;&ast;&ast;</sup>Role category functions</p>
					<p>1.	Cell envelope; Cellular processes: pathogenesis</p>
					<p>2.	Mobile and extrachromosomal element functions: transposon functions</p>
					<p>3.	Cell envelope: biosynthesis and degradation of surface polysaccharides and lipopolysaccharides; Cellular processes: pathogenesis</p>
					<p>4.	Cell envelope: biosynthesis and degradation of murein sacculus and peptidoglycan</p>
					<p>5.	Cellular processes: pathogenesis</p>
					<p>6.	Cellular processes: toxin production and resistance; Cellular processes: pathogenesis</p>
					<p>7.	Unclassified: role category not yet assigned</p>
					<p>8.	Viral functions: general</p>
					<p>9.	Cell envelope; Cellular processes: pathogenesis; Cellular processes: cell adhesion</p>
					<p>10.	Cellular processes: toxin production and resistance; Fatty acid and phospholipid metabolism: degradation</p>
					<p>11.	Cell envelope: biosynthesis and degradation of surface polysaccharides and lipopolysaccharides</p>
					<p>12.	Unclassified: role category not yet assigned</p>
					<p>13.	Protein fate: degradation of proteins, peptides and glycopeptides</p>
					<p>14.	Transport and binding proteins: cations and iron carrying compounds; Cellular processes: pathogenesis; Cellular processes: cell adhesion</p>
					<p>15.	Protein fate: degradation of proteins, peptides and glycopeptides; Cellular processes: pathogenesis</p>
					<p>16.	Protein fate: degradation of proteins, peptides and glycopeptides</p>
  				</fn>
  	  </table-wrap-foot>  
 	</table-wrap>
	<table-wrap position="float" id="t6">
	<label>Table 6.</label>
  			<caption>
  				<title>List of the 4 and 22 hypothetical protein sequences predicted as virulence factors from TIGR4 and R6, respectively.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">S. No.</th>
            <th align="left">Protein ID</th>
            <th align="left">Protein length (aa)</th>									
         </tr>
      </thead>
      <tbody>
         <tr>
            <td colspan="3" align="center"><bold>TIGR4</bold></td>                    	
         </tr>
         <tr>
            <td>1</td>         
            <td>gi|15900762</td>
			<td>177</td>           
         </tr>
         <tr>
            <td>2</td>
			<td>gi|15900877</td>
            <td>1039</td>           
         </tr>
         <tr>
            <td>3</td>
			<td>gi|15901572</td>
            <td>84</td>           	
         </tr>
		 <tr>
            <td colspan="3" align="center"><bold>R6</bold></td>                    	
         </tr>	
		 <tr>
            <td>1</td>
            <td>gi|15902135</td>
			<td>385</td>                  	
         </tr>
		 <tr>
		 	<td>2</td>
            <td>gi|15902152</td>
            <td>450</td>			                   	
         </tr>
		 <tr>
            <td>3</td>
			<td>gi|15902269</td>
            <td>65</td>           	
         </tr>	
		 <tr>
            <td>4</td>
            <td>gi|15902355</td>  
			<td>57</td>          	
         </tr>
		 <tr>
            <td>5</td>
			<td>gi|15902369</td>
            <td>149</td>           	
         </tr> 
		  <tr>
            <td>6</td>
            <td>gi|15902372</td>  
			<td>1767</td>          	
         </tr>
		 <tr>
            <td>7</td>
			<td>gi|15902511</td>
            <td>111</td>           	
         </tr> 		 
		 <tr>
            <td>8</td>
            <td>gi|15902652</td>  
			<td>337</td>          	
         </tr>
		 <tr>
            <td>9</td>
			<td>gi|15902781</td>
            <td>170</td>           	
         </tr>
		 <tr>
            <td>10</td>
            <td>gi|15902826</td>
			<td>177</td>            	
         </tr>		
		 <tr>
            <td>11</td>
			<td>gi|15902850</td>
            <td>122</td>           	
         </tr>
		  <tr>
            <td>12</td>
			<td>gi|15903009</td>
            <td>368</td>           	
         </tr>
		 <tr>
            <td>13</td>
            <td>gi|15903331</td>  
			<td>330</td>          	
         </tr>
		 <tr>
            <td>14</td>
            <td>gi|15903388</td>  
			<td>202</td>          	
         </tr>
		 <tr>
            <td>15</td>
            <td>gi|15903446</td>  
			<td>2551</td>          	
         </tr>
		 <tr>
            <td>16</td>
			<td>gi|15903447</td>
            <td>502</td>           	
         </tr>
		 <tr>
		 	<td>17</td>
			<td>gi|15903627</td>
            <td>84</td>           	
         </tr>
		 <tr>
		 	<td>18</td>
			<td>gi|15903694</td>
            <td>719</td>           	
         </tr>
		 <tr>
		 	<td>19</td>
			<td>gi|15903697</td>
            <td>243</td>           	
         </tr>
		 <tr>
            <td>20</td>          
            <td>gi|15903771</td>
			<td>71</td>                   	
         </tr>
		  <tr>
            <td>21</td>
			<td>gi|15903873</td>
            <td>64</td>           	
         </tr>
		 <tr>
            <td>22</td>
			<td>gi|15903916</td>
            <td>380</td>           	
         </tr>		 
     </tbody>
 	  </table>	  
	  </table-wrap>
	  <table-wrap position="float" id="t7">
	<label>Table 7.</label>
  			<caption>
  				<title>Comparison of the hypothetical protein sequences predicted as virulence factors with the known virulence factors of TIGR4 and R6 of <italic>S. pneumoniae</italic>.</title>
  			</caption>
   <table frame="hsides" rules="groups">
      <thead>
         <tr>
            <th align="left">InterProScan ID</th>
            <th align="left">ID of hypothetical protein sequence (TIGR4 and R6)</th>
            <th align="left">Length (aa)</th>
			<th align="left">Domain position</th>
			<th align="left">ID of known virulence factor (TIGR4 and R6)</th>
			<th align="left">Name of virulence factor</th>
			<th align="left">Length (aa)</th>
			<th align="left">Domain position</th>
			<th align="left">Functional region</th>
         </tr>
      </thead>
      <tbody>
	  	  <tr>
			 <td colspan="9" align="center"><bold>TIGR4</bold></td>
		  </tr>
         <tr>
            <td>PF06769</td>
            <td>gi|15901572</td>
			<td>84</td>
            <td>5-84</td>  
			<td>gi|15903627 &ndash; VirPredR6</td>
			<td>Hypothetical</td>
            <td>84</td>
			<td>5-84</td>
            <td>Plasmid_Txe</td>        	
         </tr>
		 <tr>
			 <td colspan="9" align="center"><bold>R6</bold></td>
		  </tr>
         <tr>
            <td rowspan="4">PF00746</td>
            <td>gi|15902372</td>
            <td>1767</td>
			<td>1727-1766</td>
            <td>gi|15902330 &ndash; R6</td> 
			<td>Hyl</td>
            <td>1078</td>
			<td>1040-1077</td>
            <td rowspan="4">Surface protein from Gram-positive cocci, anchor region</td>			 
         </tr>		 
         <tr>
            <td>gi|15903388</td>
            <td>202</td>
            <td>159-199</td>
			<td>gi|15903086 &ndash; R6</td>
            <td>IgA1</td>  
			<td>1963</td>
            <td>88-127</td>           
         </tr>
         <tr>
            <td>gi|15903446</td>
            <td>2551</td>
            <td>2513-2549</td>
			<td>gi|15900247 &ndash; TIGR4</td>
            <td>Hyl</td>
			<td>1066</td>
            <td>1028-1065</td>          	
         </tr>	
		 <tr>
		 	<td></td>
			<td></td>
            <td></td>
            <td>gi|15901019 &ndash; TIGR4</td>
            <td>IgA1</td>
			<td>2004</td>
            <td>88-127</td> 		          	
         </tr>
		 <tr>
            <td rowspan="4">G3DSA: 3.40.50.720</td>
            <td rowspan="4">gi|15902652</td>
            <td rowspan="4">337</td>
			<td rowspan="4">3-237</td>
            <td>gi|15902136 &ndash; R6</td> 
			<td>CapD</td>
            <td>616</td>
			<td>289-544</td>
            <td rowspan="4">Ubiquitin-activating enzyme E1</td>			 
         </tr>			 
		 <tr>
            <td>gi|15900287 &ndash; TIGR4</td>
            <td>cps4J</td>
            <td>351</td>
			<td>3-231</td>                 	
         </tr>
		 <tr>
            <td>gi|15900288 &ndash; TIGR4</td>
            <td>cps4K</td>
            <td>409</td>			
            <td>2-129</td>           	
         </tr>	
		 <tr>
            <td>gi|15900046 &ndash; TIGR4</td>
			<td>cps putative</td>
            <td>616</td>  
			<td>289-544</td>          	
         </tr>
		 <tr>
            <td rowspan="2">PF01289</td>
            <td rowspan="2">gi|15902781</td>
            <td rowspan="2">170</td>
			<td rowspan="2">63-168</td>
            <td>gi|15903781 &ndash; R6</td> 
			<td>Ply</td>
            <td>471</td>
			<td>67-84, 84-100, 142-162</td>
            <td rowspan="2">Thiol-activated cytolysin</td>			 
         </tr>			 
		 <tr>
            <td>gi|15901747 &ndash; TIGR4</td>
            <td>Ply</td>
            <td>471</td>
			<td>63-168</td>           	
         </tr> 
		 <tr>
            <td rowspan="5">PF04650</td>
            <td>gi|15903694</td>
            <td>719</td>
			<td>15-41</td>
            <td>gi|15903579 &ndash; R6</td> 
			<td>NanA</td>
            <td>1035</td>
			<td>21-47</td>
            <td rowspan="5">YSIRK Gram-positive signal peptide</td>			 
         </tr>	
		 <tr>
            <td rowspan="4">gi|15903446</td>
            <td rowspan="4">2551</td>
            <td rowspan="4">12-38</td>
			<td>gi|15904036 &ndash; R6</td>
            <td>cbpA</td> 
			<td>701</td>
            <td>1-40</td>					 
         </tr>
		 
		 <tr>
            <td>gi|15903086 &ndash; R6</td>
            <td>IgA1</td>
            <td>1963</td>
			<td>6-32</td>                 	
         </tr>  
		 <tr>
            <td>gi|15901997 &ndash; TIGR4</td>
            <td>cbpA</td>
            <td>693</td>
			<td>6-32</td>            
         </tr>
		 <tr>
            <td>gi|15901019 &ndash; TIGR4</td>
            <td>IgA1</td>
            <td>2004</td>
			<td>6-32</td>                   	
         </tr>
		 <tr>
            <td rowspan="2">PF07501</td>
            <td rowspan="2">gi|15903446</td>
            <td rowspan="2">2551</td>
			<td rowspan="2">473-549</td>
            <td>gi|15903086 &ndash; R6</td> 
			<td>IgA1</td>
            <td>1963</td>
			<td>315-393</td>
            <td rowspan="2">G5</td>			 
         </tr>			 
		 <tr>
            <td>gi|15901019 &ndash; TIGR4</td>
            <td>IgA1</td>
			<td>2004</td>
            <td>315-393</td>           	
         </tr>
		 <tr>
            <td>PF06769</td>
            <td>gi|15903627</td>
            <td>84</td>
			<td>5-84</td>
            <td>gi|15901572 &ndash; VirPredTIGR4</td>
			<td>Hypothetical</td>
            <td>84</td>
			<td>5-84</td>
            <td>Plasmid_Txe</td>           	
         </tr>		 
     </tbody>
 	  </table>
 	</table-wrap>
  </floats-wrap>
</article>
