Prediction of micro RNAs against H5N1 and H1N1 NS1 Protein: a Window to Sequence Specific Therapeutic Development

Influenza is a disease of concern world-wide. Recent pandemic of swine influenza (H1N1) virus has raised questions regarding efficacy of current chemotherapy against influenza viruses. It is known that birds especially wetland birds such as ducks and geese are the reservoirs of influenza viruses [1-3]. The spread of influenza viruses to humans is from wetland birds to poultry birds via poultry animals such as pigs [3]. 1996 avian influenza (AI) H5N1 virus was a highly pathogenic influenza virus but it was largely restricted to poultry birds. The potential of AI H5N1 virus to cause a pandemic was speculated earlier and hence many groups started working on H5N1 inhibitory chemotherapy largely focusing on specific therapeutics. Current chemotherapeutic agents used as anti-influenza medicines are protein inhibitors such as neuraminidase inhibitor and M2 ion channel inhibitors.


Introduction
Influenza is a disease of concern world-wide. Recent pandemic of swine influenza (H1N1) virus has raised questions regarding efficacy of current chemotherapy against influenza viruses. It is known that birds especially wetland birds such as ducks and geese are the reservoirs of influenza viruses [1][2][3]. The spread of influenza viruses to humans is from wetland birds to poultry birds via poultry animals such as pigs [3]. 1996 avian influenza (AI) H5N1 virus was a highly pathogenic influenza virus but it was largely restricted to poultry birds. The potential of AI H5N1 virus to cause a pandemic was speculated earlier and hence many groups started working on H5N1 inhibitory chemotherapy largely focusing on specific therapeutics. Current chemotherapeutic agents used as anti-influenza medicines are protein inhibitors such as neuraminidase inhibitor and M2 ion channel inhibitors.
These drugs show good results as an anti-influenza drug but they are disadvantageous to the patient in terms of severe side effects they might cause [4,5]. Sequence specific therapeutics (SSTs) is the latest addition to anti-influenza drugs. Advantages of SSTs include reliability and specificity in their action. Previously small interfering RNAs (siRNAs) have been shown to be effective against influenza [5][6][7][8]. Although siRNAs and miRNAs do not differ much in their physicochemical properties, the source of siRNAs is generally exogenous as compared to the endogenous source of miRNA biogenesis [9,10]. These drugs show good results as an anti-influenza drug but they are disadvantageous to the patient in terms of severe side effects they might cause [4,5]. Endogenous sources of action are useful because they are key elements in gene regulation. miRNAs are important in gene regulation, as they act at sequence level either by degrading mRNA of target protein or by causing translational repression of target protein by inhibiting mRNA of target protein.

NS1 protein
Influenza A virus contains eight segments of negative sense linear RNA genome [11,12]. These 8 segments code for 11 different proteins [12]. Hemagglutinin (HA) and Neuraminidase (NA) are the major antigens of the virus. Other than these proteins non-structural protein 1 (NS1) of segment 8 of AI H5N1 virus have been discovered to effect viral propagation by inhibiting host cell's antiviral pathways [13][14][15][16][17][18]. NS1 protein binds double stranded (ds) RNA of virus during its replication thereby preventing it from undergoing degradation mediated by RNA interference pathway of host cell (supplementary files) [19]. It also inhibits synthesis of interferons, which are key molecules of antiviral pathway. Palese et al. [20] found that repression of NS1 protein leads to reduction in viral propagation. This makes NS1 a very good target to raise sequence specific therapeutics against for.

miRNA prediction and filtering
A variety of web platforms can be used for prediction of miRNAs and their target genes. Although recent developments in the field of sequence specific therapeutics and bioinformatics has resulted in the release of miRNA prediction and target prediction web platforms, many of these possess innate problems with the low specificity and sensitivity in prediction [21]. Also, a variety of factors such as miRNA binding site location (3'UTR, CDS, 5'UTR), number of G:U wobble base pairs in the seed region of miRNA, the kinetics and thermodynamics of miRNA and its target interaction are important to understand the scope of miRNA functionality [22]. Common miRNAs and common targets can be found out by using filtering methods. Thermodynamic stability of miRNA molecules can be deduced from minimum folding energy (MFE) calculations. MFE values and thermodynamic stability of miRNA and target RNA duplex formation can be used to understand efficacy of miRNAs. This reduces noise in the prediction and time consumption. The experimentally validated targets of miRNAs provide reliable data to build up regulatory networks of miRNAs. Although in sillico prediction reduces time in SST development, experimental validation of predicted miRNAs is important to fully understand miRNA action on target genes [23].

Regulatory networks
miRNAs are diverse in terms of their action in gene regulation. They show multiplicity and cooperativity in their action; this means a single miRNA may possess multiple targets, at the same time; multiple miRNAs may regulate a single gene [24]. This diverse nature of action of miRNAs is important in developing regulatory networks on miRNA actions. Filtering at target prediction level is useful to avoid false positives. The regulatory networks may be qualitative, commenting only on the interactions of miRNA and its targets or they may be quantitative, commenting on kinetics and other details about the miRNA and its targets.

Selection of genomes
For H5N1 (A/Goose/Guangdong/1/96 (H5N1)) segment 8 was selected as it was the only complete sequence found in Genbank. For H1N1 14 sequences from different strains world-wide were selected randomly, most of them being from Asiatic countries. The complete list of genomes and their accession is provided as supplementary information.

Prediction of miRNAs
Softberry findmiRNA (Softberry webserver) and miReval Ritchie et al. [25] web platforms were used to predict sequences from segment 8 of H5N1 and H1N1 viruses. The predicted sequences were searched in miRBase for getting homologous metazoan miRNAs. These miRNAs were sorted out according to their e values. Only Mus musculus and Homo sapiens miRNAs were selected. Common miRNAs from predictions were sorted out for further analysis. Sequences of precursors of miRNAs were obtained from miRBase [26].

Analysis of miRNAs and pre-miRNAs
Length distribution and % sequence position analysis for miRNAs and pre-miRNAs were done. To avoid noise caused by lengths of pre-miRNAs, optimal lengths were considered for sequence analysis. For each sequence cumulative data of A, U, G, C content was obtained from RNAfold web platform. Statistical analysis including mean, maximal, minimal, standard deviation, covariance and correlation was carried out.

Filtering of miRNA prediction
Total of five filters were used to sort out miRNAs with therapeutic potential. The first filter used is determination of thermodynamic stability of miRNAs based on MFE values [27,28]. miRNAs falling below threshold value were removed. The second filter used is genomic locations of pre-miRNAs [29] and miRNA expression profile data [30]. According to this filter, miRNA genes located on sex chromosomes were removed as they cannot be used therapeutics. UCSC miRNA gene location maps are provided in the supplementary information files. Third filter setup was that of analysis of binding site. Here, target sites predicted by miRNA and target alignment using ClustalW were subjected to MFE value determination. To avoid variation in result due to variable lengths of target site bindings, AMFE values were considered. AMFE was calculated using following equation Zhang et al. [31] :AMFE = (MFE/Length of pre-miRNA sequence)*100 According to the third filter, relatively less stable binding sites and their respective miRNA pairs were removed. Fourth and fifth filters consisted of miRNA and target interaction analysis. Fourth filter is PAIRFOLD analysis Andronescu et al. [32] and fifth filter is RNAHybrid analysis [33]. The third, fourth and fifth filters were applied to both the target sequences NS1 and segment 8 for H5N1 predicted miRNA binding. Whereas filters 3-5 were applied only for segment 8 sequences of different H1N1 strains for H1N1 predicted miRNA binding, because complete sequences of NS1 of chosen H1N1 strains are not available.

Filtering of target prediction
Total of six filters were used to sort out possible targets of predicted miRNAs. These include DIANA micro T 3.0 strict, DIANA micro T 3.0 loose, DIANA micro T 4.0 beta, PicTar, Target ScanS Papadopoulos et al. [34] and miRGen (Intersection of miRanda, PicTar and TargetScanS) [35]. Genes common to 4 or more filters were considered as significant targets. The filtering was also setup according to the -ln (p-value) of prediction web platforms.

Qualitative miRNA network construction
Sorted out genes after target prediction filtering was considered for miRNA regulatory network construction. Common target genes for predicted miRNAs were sorted out and their involvement in various pathways was searched accordingly. Following is the flowchart made by our group to predict miRNAs against virulent proteins.

Segment 8 of H5N1 shows distant homology with human genes
The source genome of H5N1 selected was Influenza A virus (A/ Goose/Guangdong/1/96 (H5N1)). Segment 8 when aligned with non redundant protein database of human, shows distant homology with Glyceraldehyde-3-phosphate dehydrogenase (GAPDH), heparin sulfate N-deacetylase/N-sulfotransferase (NDST3) and PRKCA-binding protein (PICK-1) proteins. Out of which GAPDH is homologous with NS2, while NDST3 and PICK-1 proteins are found to be distantly homologous with NS1 protein of H5N1. These proteins hence were used for miRNA prediction against segment 8. Homologous sequences showed presence of putative domain of NS superfamily protein.

Prediction of micro RNAs
Total of 22 miRNAs were predicted using various miRNA prediction algorithms, out of which 16 were metazoan and 6 were viral. Predicted miRNAs showed involvement in various cell signaling and regulatory pathways.

Percentage sequence composition and length distribution of miRNAs
Predicted miRNAs show significant skew of 25% each. In case of metazoan miRNAs, Guanine (G) dominates over other bases followed by adenine (A) (Figure 1a). Guanine was found to be dominant (30.38%) at fifth (50%) and thirteenth (63%) as well as all the positions from 21-23. Whereas adenine was found to be dominant (24.70%) at positions 1-3 and at fifteenth (56%) position. Highest enrichment of guanine was found to be at positions 5, 13 and 23. Highest enrichment of adenine is at positions 3 and 15. In case of viral miRNAs (Figure 1b), Uracil (32.60%) was found to be dominant at positions 10, 11, 15-

Percentage sequence composition and length distribution of pre-miRNAs
Precursors of predicted metazoan miRNAs showed a length distribution varying from 72 to 124 nucleotides. To avoid the noise during percentage sequence distribution analysis sequence length of 85 nucleotides was considered. For precursors of predicted viral miRNAs the sequence length of 83 nucleotides was considered for sequence analysis. Both the viral and metazoan pre-miRNA sequences showed Uracil as the predominant base, for which percentage values were found to be 26.91% for metazoan pre-miRNAs and 30.63% for viral pre-miRNAs (supplementary files). The next predominant base found was guanine which differed slightly from average Uracil composition. For metazoan pre-miRNAs all the bases were distributed evenly with a standard deviation of 2.28 and that of viral pre-miRNAs were less evenly distributed with a standard deviation of 7.84. The percentage values of AU and GC content varied largely for metazoan and viral pre-miRNAs. For metazoan pre-miRNAs GC content (51.10%) was higher than AU content (49.04%); whereas for viral pre-miRNAs GC content (42.16%) was found to be lower than AU content (57.22%). Although GC and AU content showed variation in percentage content value, it was found to follow a certain range for both metazoan [(66.91±0.06)%] and viral (83.33%) pre-miRNAs. The overall % GC and % AU content was found to be (49.91±0.3) %. The variation in dinucleotide content and in nucleotide base composition relates with stability and thermodynamic parameters of pre-miRNA sequences, in terms of MFE and AMFE values.

MFE values of miRNAs
MFE values of miRNAs acts as the first filter to sort out miRNAs. MFE values of predicted miRNAs range from -4.2 to 2.04 kcal/mol (Figure 2.0). Average value of MFE given by these miRNAs is -1.844 kcal/mol. The maximum value of MFE is given by hsa-miR-504 (-4.20 kcal/mol). Hsa-miR-504, hsa-miR-141, mmu-miR-141, hsa-miR-22, hsa-miR-138, hsa-miR-124 and hsa-miR-525-3p can be considered for further analysis, as according to the MFE data, their structures are relatively stable than other miRNAs. All other miRNAs show less MFE values suggesting that these miRNAs are less stable as compared to the others.

miRNA locations and their tissue level expression
Although hsa-miR-504 forms the most stable structure, the gene coding for hsa-miR-504 is located of X chromosome. This restricts action of hsa-miR-504 as a therapeutic agent. Studies using mimiRNA web server showed that hsa-miR-504 is maximally expressed in testis, placenta and heart tissue. A wide distribution of expression of predicted miRNAs was found in a variety of cell types and disease conditions such as teratocarcinoma and other types of cancer. Heart cells are also a common location for many miRNAs. mimiRNA could not provide information on hsa-miR-22 and hsa-miR-124.

Analysis of binding site
For a miRNA to work efficiently the binding site of the targets sequence should be stable enough. Analysis of binding site (   stable binding site was shown by hsa-miR-504 target site (-27.14 kcal/mol for segment 8 binding and -23.81 kcal/mol for NS1 mRNA binding)) followed by has-miR-22 target site (-25.27 kcal/mol for segment 8 binding and -24.75 kcal/mol for NS1 mRNA binding). Mmu-miR-141 showed lowest values of binding site AMFE (-17.2 kcal/mol for segment 8 binding and -18.20 kcal/mol for NS1 mRNA binding). All the miRNAs other than mmu-miR-141 hence were sorted for further filtering.

PAIRFOLD and RNAHybrid analysis
miRNA and its target interactions were studied by using PAIRFOLD, Softberry TargetmiRNA and RNAHybrid web servers (Figure 4a, 4b).

Qualitative miRNA network construction
Selected miRNAs with sequence specific therapeutic potential shows interactions with many genes. miRNAs show cooperativity and multiplicity in their function; hence a single miRNA may possess more than one target as well as a single gene may get regulated by multiple miRNAs. To understand the scope of selected miRNA action, a qualitative network of miRNA involvement in gene regulation was constructed ( Figure 5.0). Hsa-miR-138 shows targets in axon guidance and small cell lung cancer, whereas hsa-miR-124 shows targets in variety of cancer types such as chronic myeloid leukemia, glioma, acute myeloid leukemia, prostate cancer, melanoma, non small cell lung cancer, pancreatic cancer, colorectal cancer and endometrial cancer as well as in gap junction, insulin signaling pathway and MAP kinase pathway. Hsa-miR-525-3p possesses targets in axon guidance and melanoma. miRNA network construction of predicted miRNAs thus suggests roles of miRNAs in cell regulatory pathways and pathways related to cancer. Upregulation or downregulation of hsa-miR-124 may result in drastic change in cell physiology at the same time the role of hsa-miR-124 in variety of cancers implies that hsa-miR-124 might possess important roles in cancer down regulation of genes involved in cancer.

Segment 8 do not show homology with human genes:
Considering world-wide impact of recent H1N1 virus, 14 segment 8 genomes from 14 different H1N1 strains isolated from different parts world-wide were used for analysis.

Percentage sequence composition and length distribution of miRNAs
Sequence distribution data showed that Uracil predominates at position 13 and its maximum value found is 100%. Also, cumulative data of percentage values of nucleotide sequences shows that Uracil occurs most commonly with an average value of 35.75% followed by Adenine (25.45%). This is in favor of (A+U)%; where it is found to be 61.21% as compared to (G+C)% which is 38.78%. Highest enrichment of Uracil was found to be at position 13 (100%), whereas that of adenine was found to be at position 2 (60%). The length of predicted miRNAs varied between 21-22 nucleotides with an average of 22 nt. The cumulative average value of 22 nt length was found to be 60% (supplementary files).

Percentage sequence composition and length distribution of pre-miRNAs
Precursors of predicted miRNAs of 1302 family showed a length distribution varying from 72 to 151 nucleotides with an average value of 126 nucleotides, whereas precursors of predicted miRNAs of other than 1302 family showed length distribution varying from 61-97 with an average value of 86 nucleotides (supplementary files). To avoid the noise during percentage sequence distribution analysis, sequence length of 86 and 126 nucleotides was considered for pre-miRNAs other than 1302 family and for 1302 family pre-miRNAs respectively. 1302 family pre-miRNAs showed adenine as the predominant base (34.60%) followed by uracil (34.50%), whereas pre-miRNAs other than 1302 family showed Uracil as the predominant base (30.70%) followed by adenine (27.23%). For pre-miRNAs other than 1302 family all the bases were distributed relatively evenly with a standard deviation of 5.68 whereas those of 1302 family pre-miRNAs were less evenly distributed with a standard deviation of 11.17. The percentage values of AU and GC content did not show large variation for both the types. %AU content was found to be higher for both the types as compared to %GC content. 1302 family pre-miRNAs showed higher % value of AU content (69.10%) than other pre-miRNAs (57.94%); whereas %GC content for pre-miRNAs other than 1302 family showed higher %

miRNA locations and their tissue level expression
All the predicted miRNAs show moderate expression levels for various cell types and tissues such as breast tissue, brain, heart tissue and liver. Hsa-miR-520f shows high expression in breast malignant tumor cells as well as moderate expression level in thymus, whereas other miRNAs show poor to moderate expression in various tissues. 520 series of miRNAs are located on chromosome 19, that of 548

Qualitative miRNA network construction
The potential sequence therapeutic miRNAs predicted against NS1 of H1N1 possess roles in gene regulation. To understand a qualitative miRNA network was constructed (Figure 10), which showed that 520 series miRNAs, possess targets in various cell regulatory pathways such as Map Kinase pathway, mTOR pathway and TGF β pathway. These miRNAs possess targets in various cancer types such as glioma, melanoma, chronic myeloid leukemia, pancreatic cancer etc. miRNA network construction provides an insight regarding miRNA and target gene interaction as well as possible impact on gene regulation and host physiology when miRNA expression is altered or changed due to artificial conditions or by natural cellular processes.  nucleotides. Although guanine predominates in H5N1 metazoan miRNAs (30.38%), Uracil predominates in all other miRNAs with a value of 32.60% for H5N1 viral miRNAs and 35.75% for H1N1 miRNAs. Adenine is the second most abundant base in all the miRNAs and its enrichment is found to be at positions 1-3 and 5 for metazoan miRNAs, 3-5 for viral miRNAs and position 2 for H1N1 miRNAs. This nucleotide position falls in the seed region of miRNAs, which is the primary determinant of miRNA functionality. This suggests that adenine might be important base for determining miRNA functionality in case of predicted miRNAs. Uracil was found predominant base in all the precursors of predicted miRNAs. The second most abundant base found was guanine for H5N1 pre-miRNAs and adenine for H1N1 pre-miRNAs. This has reflected in the variable GC and AU content of the predicted pre-miRNAs. Also, it was found that standard deviation for pre-miRNA sequence composition was highest for 1302 family pre-miRNAs (11.17) , followed by H5N1 viral pre-miRNAs (7.84

Discussion
miRNAs are key regulators in gene regulation networks found in metazoan and viruses. miRNAs essentially play roles in regulation of viral replication and propagation by regulating host gene regulatory mechanisms. The action of miRNAs is specific as they act at sequence level which is the crucial stage in the central dogma. Viruses, especially RNA viruses such as influenza viruses show highly variable nature of antigenic proteins, which is due to antigenic shift, antigenic drift and erroneous replication of viral genome by viral RNA polymerase. NS1 protein and segment 8 genome of influenza A viruses which code for it are the major targets as these sequences are fairly conserved across different strains and also they are not variable in nature as conferred by ClustalW analysis.
Filtering by RNAfold, PAIRFOLD and RNAHybrid web platforms reduces number of false positives in the prediction. Filtering by Target ScanS, DIANA micro T, PicTar, miRanda web platforms reduces number of false positives in case of miRNA targets. Construction of miRNA regulatory networks provides insights into possible roles of predicted miRNAs in gene regulation. The target genes of predicted miRNAs are related with stress related pathways and various cancer types. This suggests that action of NS1 protein inside the host cell might be analogous to that of proteins involved in cancer induction. Correlation between predicted miRNAs clearly shows that Uracil is the predominant base in pre-miRNAs followed by adenine. Also, adenine predominates in the seed region of miRNAs, suggesting importance of adenine in miRNA functionality. Also correlation values show that Uracil contributes to thermodynamic stability of miRNAs. These observations are important in developing therapeutic miRNAs considering delivery miRNAs which might pose problems in their action in vivo.