Motif Design for Nitrilases

Nitrilases are nitrile-metabolizing enzymes that catalyse the conversion of nitriles to the corresponding acids, and they have gained importance in green chemistry. Although individual nitrilases are substrate specific, as a group they act on a wide range of nitriles (aliphatic and aromatic) and have drawn attention because of their utility in mild hydrolysis. Most of the nitrilases reported hitherto have been physically extracted and characterized from microbial or plant sources. To identify nitrilase sequences, two groups of motifs were designed, i.e. aliphatic nitrilase motifs (MDMAl) and aromatic nitrilase motifs (MDMAr), each with four motifs specific to nitrilases with the conserved catalytic triad (Glu-48, Lys-131, Cys-165), which can be used as markers for nitrilases. Conserved regions were identified by Multiple Sequence Alignment (MSA) and with Multiple EM for Motif Elicitation (MEME). The Manually Designed Motifs (MDMs) were validated by ScanProsite, and their presence was also confirmed by PRATT, Gblocks and MEME. The ScanProsite search with MDMAr revealed some new sources of aromatic nitrilases from plants, animals and microbes, whereas MDMAl retrieved nitrilases only from microbes. Besides identifying unique motifs, and in order to confirm substrate specificity for nitriles, randomly selected sequences were validated by studying some important physicochemical parameters and position-specific amino acids.

Journal of Data Mining in Genomics & Proteomics, ISSN: 2153-0602. Citation: Sharma N, Bhalla TC (2012) Motif Design for Nitrilases. J Data Mining Genomics Proteomics 3:119. doi:10.4172/2153-0602.1000119


Background
Nitrilases catalyze the hydrolysis of nitriles into the corresponding acids and ammonia, and they are used in chemistry for the production of industrially important acids. Nitrilases frequently exhibit enantioselectivity under mild reaction conditions, making them ideal tools for the conversion of nitriles in green chemistry and for environmental remediation of nitrile-contaminated soil, water and air. Based on substrate specificity, they are grouped as aromatic and aliphatic nitrilases.
A large number of microbial, plant and animal genomes have been sequenced in the past decade, and genome sequence data have expanded tremendously over those years. Screening of genome/proteome databanks is therefore worthwhile for finding novel sources of enzymes. Many tools and techniques, such as BLAST, Hidden Markov Models (HMM) and neural network classification, have been used to date for in silico screening of genome/proteome sequence databases. Among these, motif design has been found to be one of the most reliable strategies for efficient database screening [1][2][3], because the motifs of a particular protein sequence reflect its specific structure and function, which is useful for characterizing and classifying that protein.
Nitrile-metabolizing microorganisms are mostly isolated from soil or water by the enrichment culture method, and the isolates are then cultured and tested for nitrilase activity. This classical approach of isolating nitrile-metabolizing microorganisms and subsequently screening them for nitrilase activity is time consuming and cost intensive [4][5][6][7]. At present, bioinformatics tools and techniques such as BLAST [1], HMM (Hidden Markov Model) [8][9], neural network classification [10,11] and Motif Identification Neural Design (MOTIFIND) [12] are used to screen genome and sequence databases for genes coding for novel enzymes. The present communication reports Manually Designed Motifs (MDMs) for aliphatic and aromatic nitrilases and their validation using Prosite, ScanProsite, BLAST, PRATT and Gblocks. On the basis of earlier in silico analyses of the amino acid sequences of aromatic and aliphatic nitrilases [13], and using bioinformatics approaches, motifs were manually designed to differentiate and identify aromatic and aliphatic nitrilases. Computational analysis of amino acid sequences and studies of the physicochemical properties [14] of nitrilases have revealed differences between aromatic and aliphatic nitrilases [13].

Retrieval of sequences and designing of motifs
The protein sequences were obtained from the UniProtKB/Swiss-Prot database (release 2011-12). To design the motifs manually, six microbial aliphatic and aromatic nitrilases were selected on the basis of our earlier study (Table 1) [13].

Multiple sequence alignment and phylogenetic analyses
Multiple sequence alignments were performed using CLUSTAL W [15] and CLUSTAL X (version 2.1) with default settings and verified with MAFFT (Multiple Alignment with Fast Fourier Transform) and MEME (Multiple EM for Motif Elicitation). The aliphatic and aromatic nitrilase sequences were used to infer phylogenetic relationships. A phylogenetic tree was generated by the Neighbor Joining (NJ) method using CLUSTAL X (Figure 2). After multiple alignment, and on the basis of the presence of the catalytic triad, i.e. glutamate, lysine and cysteine (Glu-48, Lys-131 and Cys-165), motifs were manually designed (MDM) for aliphatic (MDMAl) and aromatic (MDMAr) nitrilases. To verify the manually designed motifs and to eliminate poorly aligned or divergent regions of the aligned proteins, Gblocks software was used [16,17]. The bootstrap value was calculated using the default setting (1000).

*Corresponding author: Dr. Tek Chand Bhalla, Sub-Distributed Information Centre, Himachal Pradesh University, Summer-Hill, Shimla-171005, India, E-mail: bhallatc@rediffmail.com

In silico physicochemical characterization and validation of motifs
After confirmation with MEME version 4.8.1 (Figure 1a and Figure 1b) and the Gblocks tool, the Manually Designed Motifs (MDMs) were subjected to a database search through ScanProsite to validate them and to determine their specificity in identifying nitrilase sequences. ScanProsite was used to search both the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL databases with the match mode set to "not greedy, not overlaps" and "no includes". The protein sequences returned by ScanProsite were counted and grouped into plants, bacteria and uncultured organisms (Table 2). The MDMs were further verified using the Pratt tool to identify conserved patterns in protein sequences. To determine the nature of the nitrilase sequences, i.e. aliphatic or aromatic, we also studied some physicochemical properties of randomly selected protein sequences. The parameters studied included the number of negatively charged residues (Asp+Glu), molecular weight, alanine content (%) and instability index. The values of these parameters for the predicted/reported nitrilase sequences were computed with ProtParam (http://web.expasy.org/protparam/) of the Expert Protein Analysis System (ExPASy).
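The pattern search described above can be illustrated with a minimal Python sketch that converts a PROSITE-style pattern into a regular expression and lists its non-overlapping matches in a sequence. The pattern and the sequence in the usage lines are hypothetical and are not the MDMs of this study; the function names are illustrative, and Python's `re.finditer` only approximates the ScanProsite match-mode options.

```python
import re

# One PROSITE element: x, a residue, [allowed] or {forbidden}, optionally
# followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
)

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def scan(pattern: str, sequence: str):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

# Hypothetical pattern and toy sequence, for illustration only.
print(prosite_to_regex("C-x(2,4)-[ST]-{P}-E"))
print(scan("C-x(2,4)-[ST]-{P}-E", "MACAASGEKKCWWTPE"))
```

A regular expression is used here because the PROSITE pattern elements map almost one to one onto regex constructs; a database-scale search would still be run through ScanProsite itself.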

Results
Manually designed motifs were validated for the two groups of nitrilases, i.e. aliphatic and aromatic, and the results are shown in Table 1. The hits for each manually designed motif (MDM) of aliphatic and aromatic nitrilases in the Swiss-Prot and TrEMBL databases were obtained through a motif search. These motifs are conserved in aliphatic and aromatic nitrilases (Figure 1a and Figure 1b). MEME version 4.8.1 (2012) also confirmed the presence of all the MDMs, which, when analyzed through ScanProsite, returned protein sequences belonging only to the nitrilase family. These sequences have the conserved catalytic triad, i.e. glutamate, lysine and cysteine (Glu-48, Lys-131, Cys-165), which is essential for nitrilase activity [18][19][20][21].
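As a rough illustration of the catalytic triad check, the sketch below tests whether a sequence carries Glu, Lys and Cys at positions 48, 131 and 165. It assumes that the sequence already follows the numbering used in the text; for an arbitrary sequence the positions would first have to be mapped through an alignment, which is not shown. The function names are hypothetical.

```python
# Triad positions (1-based) and one-letter residue codes, as given in the text.
CATALYTIC_TRIAD = {48: "E", 131: "K", 165: "C"}

def triad_residues(sequence: str, triad=CATALYTIC_TRIAD) -> dict:
    """Return the residue found at each triad position (None if too short)."""
    return {
        position: sequence[position - 1] if position <= len(sequence) else None
        for position in triad
    }

def has_catalytic_triad(sequence: str, triad=CATALYTIC_TRIAD) -> bool:
    """True if every triad position carries the expected residue."""
    found = triad_residues(sequence, triad)
    return all(found[position] == residue for position, residue in triad.items())

# Toy sequence: 170 alanines with the triad residues placed by hand.
toy = list("A" * 170)
toy[47], toy[130], toy[164] = "E", "K", "C"
print(has_catalytic_triad("".join(toy)))
```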

Designing and validation of motifs for aliphatic nitrilase
In the present study, the manually designed motifs for aliphatic nitrilases (MDMAl) returned 578 protein sequences in the ScanProsite search. Among these 578 protein sequences, there were 164 nitrilase sequences from uncultured microorganisms, as shown in Table 2.
The total number of amino acids in the resulting sequences ranged from 330 to 382. For the aliphatic nitrilase sequences, molecular masses ranged from 37226 to 41971 daltons, alanine content from 10 to 16%, the number of negatively charged residues from 37 to 52 and the instability index from 30.32 to 44.13 (Table 3a).
In silico physicochemical analysis of the protein sequences retrieved from ScanProsite using the first, second, third and fourth MDMAr showed that, for the aromatic nitrilase sequences, the total number of amino acids ranged from 306 to 331, molecular masses from 32273 to 36940 daltons, alanine content from 4.9 to 11.0%, the number of negatively charged residues from 34 to 44 and the instability index from 39.82 to 48.68 (Table 3b).
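Three of the parameters reported above (molecular weight, Asp+Glu count and alanine content) can be computed directly from a sequence, as in the following minimal sketch. The residue masses are commonly tabulated average values and are an assumption of this sketch; ProtParam's own values and rounding may differ slightly. The instability index is omitted because it requires a dipeptide weight table.

```python
# Commonly tabulated average residue masses in daltons (assumed values).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added for the free N- and C-termini

def physicochemical_summary(sequence: str) -> dict:
    """Length, Asp+Glu count, alanine content (%) and molecular weight."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return {
        "length": len(sequence),
        "acidic": sequence.count("D") + sequence.count("E"),
        "alanine_pct": 100.0 * sequence.count("A") / len(sequence),
        "molecular_weight": sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)
        + WATER_MASS,
    }

# Toy tetrapeptide, for illustration only.
print(physicochemical_summary("ADEA"))
```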

Phylogenetic analysis
Twelve nitrilase protein sequences (aliphatic and aromatic) were used for construction of the phylogenetic tree. The tree topology generated by the Neighbor Joining (NJ) method showed that aliphatic and aromatic nitrilases clustered into distinct groups that were well supported by bootstrap analysis with 1000 replicates (Figure 2).
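The two ingredients that feed a bootstrapped NJ analysis, a pairwise distance matrix and resampling of alignment columns, can be sketched as follows. This is not the CLUSTAL X implementation: the p-distance (proportion of differing sites) is a simplifying assumption, the toy alignment is hypothetical, and the tree-building step itself is not shown.

```python
import random

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either row."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name, name)."""
    return {
        (i, j): p_distance(alignment[i], alignment[j])
        for i in alignment
        for j in alignment
    }

def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(row[c] for c in columns) for name, row in alignment.items()
    }

# Hypothetical toy alignment; real input would be the aligned nitrilases.
toy = {"seq1": "ACDEK", "seq2": "ACDFK", "seq3": "A-DFR"}
rng = random.Random(0)
replicates = [distance_matrix(bootstrap_alignment(toy, rng)) for _ in range(1000)]
print(distance_matrix(toy)[("seq1", "seq2")], len(replicates))
```

Each replicate matrix would be passed to a tree-building routine, and the bootstrap support of a cluster is the fraction of replicate trees in which that cluster appears.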

Discussion
The expansion of molecular sequence and genomic databases has made genome sequence data analysis more challenging and has created interest in developing tools for rapid and reliable searching and analysis of the data. In this endeavor, database searching against gene/protein families with motifs has emerged as an important strategy for efficient similarity searching [22]. While several domain/motif databases are being compiled, it is important to develop database search tools that fully utilize the conserved structural and functional information embedded in the sequence data to enhance the reliability of the search. Sequence motifs typically occur in a specific and known order in a sequence family. The ordering and spacing of motifs therefore provide powerful additional criteria for classifying sequences into families. In this paper, we have manually designed two groups of motifs (MDM), i.e. MDMAl and MDMAr, each with four motifs, for rapid and reliable nitrilase identification and compared them with currently available methods, including the BLAST search, the PROSITE pattern search, Gblocks [23], MEME version 4.8.1 (Multiple EM for Motif Elicitation) and the HMM method. Each of the designed motifs contains an amino acid residue of the active site. Therefore, when the MDMs were analyzed and validated through ScanProsite, the resulting protein sequences belonged specifically to nitrilases with the conserved catalytic triad Glu-48, Lys-131, Cys-165 [18,24,20,15], and the total number of hits was found to be considerably higher (578 aliphatic and 392 aromatic) than with BLAST and PROSITE searches (Table 2).
Multiple sequence alignment (MSA) in the present study revealed that specific amino acids are present at specific positions and are responsible for the activity of nitrilases. Phylogenetic analysis by the Neighbor Joining (NJ) method clearly distinguishes the two groups of nitrilases, i.e. aliphatic and aromatic (Figure 2).
In our previous communication [13] we reported that some physicochemical properties play a significant role in the substrate specificity of aliphatic and aromatic nitrilases. Analysis of the physicochemical parameters of these sequences further confirmed that all sequences having a total number of amino acids between 330 and 382, molecular weight between 37226 and 41971 daltons, alanine content of 10-16%, negatively charged residues (Asp+Glu) numbering 37-52 and an instability index of 30.32-44.13 belonged to aliphatic nitrilases. Similarly, sequences having a total number of amino acids between 300 and 330, molecular weight between 32273 and 36940 daltons, alanine content of 8-11%, negatively charged residues (Asp+Glu) numbering 37-52 and an instability index of 39.82-48.68 were aromatic nitrilases (Table 3a and 3b).
It is well known that uncultured organisms are new sources of enzymes, antibiotics and drug leads [25,26]. In the present study, we found nitrilase sequences belonging to uncultured organisms in the database (aliphatic: 164, aromatic: 41). The protein sequences of these uncultured organisms have all the specific properties responsible for nitrilase activity. Further studies on these sequences may lead to specific nitrilases needed for the transformation of nitriles in organic chemistry and industry.
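The ordering criterion mentioned above can be illustrated with a simple left-to-right check that a set of motifs occurs in the expected order without overlap. The motifs in the usage lines are hypothetical regular expressions, not the MDMs of this study, and the greedy earliest-match strategy is a simplification.

```python
import re

def motifs_in_order(sequence: str, motif_regexes):
    """Return the 1-based start of each motif if all occur in the given order
    without overlapping; return None as soon as one motif cannot be placed."""
    position = 0
    starts = []
    for regex in motif_regexes:
        # Search only downstream of the previous motif's end.
        match = re.compile(regex).search(sequence, position)
        if match is None:
            return None
        starts.append(match.start() + 1)
        position = match.end()
    return starts

# Hypothetical motifs and toy sequence, for illustration only.
print(motifs_in_order("AAGEPPKWWCD", ["GE", "K.W", "C[DE]"]))
print(motifs_in_order("AAGEPPKWWCD", ["C[DE]", "GE"]))
```

The spacing between consecutive motifs can be read off the returned start positions and compared with the spacing observed in the reference alignment.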
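The parameter ranges quoted in this paragraph can be turned into a simple rule-based check, sketched below. The ranges are copied from the paragraph; the dictionary keys and the function name are hypothetical, and because some ranges of the two groups overlap, a sequence may match neither or, in principle, both profiles.

```python
# (low, high) ranges as quoted in the discussion; keys are illustrative names.
PROFILES = {
    "aliphatic": {
        "length": (330, 382),
        "molecular_weight": (37226, 41971),
        "alanine_pct": (10, 16),
        "acidic": (37, 52),
        "instability_index": (30.32, 44.13),
    },
    "aromatic": {
        "length": (300, 330),
        "molecular_weight": (32273, 36940),
        "alanine_pct": (8, 11),
        "acidic": (37, 52),
        "instability_index": (39.82, 48.68),
    },
}

def matching_profiles(params: dict, profiles=PROFILES) -> list:
    """Names of the profiles whose every range contains the given value."""
    return [
        name
        for name, ranges in profiles.items()
        if all(low <= params[key] <= high for key, (low, high) in ranges.items())
    ]

# Hypothetical parameter set, for illustration only.
example = {
    "length": 350,
    "molecular_weight": 39000,
    "alanine_pct": 12.0,
    "acidic": 40,
    "instability_index": 35.0,
}
print(matching_profiles(example))
```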

Conclusion
Nitrilases are distributed among microorganisms, plants and some animal species. Here we present a computational analysis of two classes of nitrilase, comprising motif design, physicochemical analysis and phylogenetic analysis, which could be used as a tool to predict and differentiate nitrilases on the basis of their substrate specificity. The present analysis predicts new sources of nitrilases through the study of common and important physicochemical/biochemical properties, as these enzymes share a common ancestry, and these observations will be useful for predicting the function of randomly selected nitrilases. Computational analysis of various properties of nitrilases revealed differences between the two classes, and MEME, Pratt and Gblocks analyses confirmed the presence of the four motifs. This approach has also led us to new sources of nitrilases across the Swiss-Prot/TrEMBL databases. The MDMs for aliphatic and aromatic nitrilases have been validated by these results, and the motifs have potential application in screening genome/protein sequence databases to find novel sources of nitrilases.