Received date: April 25, 2017; Accepted date: April 28, 2017; Published date: May 08, 2017
Citation: Pervaiz T, Lotfi A, Haider MS, Haifang J, Fang J (2017) High Throughput Sequencing Advances and Future Challenges. J Plant Biochem Physiol 5:188. doi:10.4172/2329-9029.1000188
Copyright: © 2017 Pervaiz T, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
High-throughput sequencing (HTS) technologies have become indispensable for genomic investigation and are among the most active research topics in genomics; they can generate over 100 times more data than the most sophisticated capillary sequencers. Recent advances in HTS using next-generation sequencing (NGS) techniques have become essential for digital gene expression profiling and for studies in epigenomics, genomics, and transcriptomics. These methods sequence multiple DNA molecules in parallel, allowing hundreds of millions of DNA molecules to be sequenced within a short period of time. Although cost and time have been significantly reduced, the error profiles and limitations of the new platforms differ considerably from those of earlier sequencing techniques. Technical developments and the decreasing cost of NGS have made RNA sequencing (RNA-seq) a popular technique worldwide for gene expression projects. Various approaches to the normalization of RNA-seq data have appeared in the literature, and they differ both in the type of bias correction and in the statistical approach. Yet, as data continue to accumulate, there is no clear consensus on the appropriate normalization techniques or on the impact of the chosen method on downstream analysis. In this article, we discuss the key features of HT-NGS, including the main HTS platforms, different sequencing applications, ethical limitations, and future prospects.
Next generation sequences; High throughput; RNA sequences; Genomics; miRanalyzer
The advancement of high-throughput sequencing (HTS) techniques has fundamentally changed our understanding of the genetic and epigenetic molecular bases underlying human health and plant diseases [1-3]. The influence of these methodologies grew from the routine sequencing of genomic regions of interest, for example exons and protein-binding sites [4,5]. The methodologies involve processing large numbers of sequencing reads contained in raw data sets ranging from many megabytes to over 25 gigabytes. HTS is a novel and rapidly advancing sequencing technology that is commonly used in transcriptomics, genomics and epigenomics. Over the last decades, alternative sequencing strategies have become available [8,9] that aim to entirely redefine "high-throughput sequencing technology". The present technologies outperform the earlier Sanger sequencing tools by a factor of 100–1,000 in daily throughput and, ultimately, lower the cost of sequencing one million nucleotides (1 Mb) to 4–0.1% of that of Sanger sequencing, making sequencing technologies mainstream [10,11]. To reflect these massive changes, many researchers, recent reviews and companies use the term "NGS" (next-generation sequencing) as an alternative to high-throughput sequencing (HTS). Current developments in sequencing techniques have dramatically changed the field of genomics, making it possible even for individual research groups to generate huge amounts of sequence data very quickly and at considerably reduced cost. These HTS techniques make full genome sequencing, transcript quantification, resequencing and deep transcriptome sequencing available to researchers.
MicroRNAs (miRNAs) play essential regulatory roles in many organisms through direct cleavage of transcripts, chromatin modification or translational repression, and modulate gene expression in both plants and animals. miRNAs have been identified in most plant species. Since the first detection of plant miRNAs in Arabidopsis, plant miRNAs have been extensively investigated using experimental and computational techniques. According to a miRBase release, the number of identified miRNAs reached 4011. miRNAs play a significant role in plant biological processes, ranging from organ differentiation to biotic and environmental stress responses. With recent computational, analytical and experimental advances, interest in these small molecules has grown considerably during recent decades. To date, 25141 mature miRNA sequences from 193 species (from viruses to human) have been reported in the miRBase catalog. The plant microRNA repository PMRD, released on June 11, 2012, contains 10597 miRNAs discovered in 127 plant species. HTS with NGS tools has become valuable in digital gene expression profiling. Many researchers have used these tools to identify and verify the expression of new and conserved miRNAs at tissue-specific or developmental stage-specific levels. In addition, methylation sequencing and chromatin immunoprecipitation sequencing can help to characterize epigenetic changes, while ribosome sequencing is used to determine which mRNA transcripts are being actively translated.
For woody plants with high heterozygosity, such as Chinese bayberry, whole-genome sequencing is time-consuming and costly; as a result, it is presently limited to only a few species. As an alternative, it has been more practical to obtain UniGene information through transcriptome sequencing [19,20]. RNA-Seq is an HTS technology that overcomes the shortcomings of microarrays in exploring unidentified genes. It also has advantages in investigating the fine structure of the transcriptome, for instance splice-junction variation and the detection of allele-specific expression. In this review, we summarize the main high-throughput sequencing technologies, their applications and limitations, and the future of HTS.
Sanger capillary sequencing: Contemporary Sanger capillary sequencing (SCS), as widely used in the GE Healthcare MegaBACE or Applied Biosystems 3xxx series instruments, is based on the same general scheme that was used in 1977 for the φX174 genome. First, millions of copies of the sequence to be determined are purified or amplified, depending on the source of the sequence. Reverse strand synthesis is performed on these copies using a known priming sequence upstream of the sequence to be determined and a mixture of deoxynucleotides (dNTPs) and dideoxynucleotides (ddNTPs, modified nucleotides lacking the hydroxyl group at the third carbon atom of the sugar). Because incorporation of a ddNTP terminates the extension reaction irreversibly, the dNTP/ddNTP mixture causes random termination, so that the multiple copies are extended to varied lengths. After denaturation and cleanup of free enzyme, primers and nucleotides, the resulting molecules are sorted by their molecular weight (corresponding to the point of termination), and the labels attached to the terminating ddNTPs are read out successively in the order produced by the sorting step. Current Sanger sequencing instruments can process up to 384 sequences in parallel, ranging from 600 to 1,000 nt in length [24,25], although 384-capillary systems are rare. The more common 96-capillary instruments produce a maximum of approximately 6 Mb of DNA sequence per day, with consumables costing about $500 per 1 Mb.
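As a rough worked example, the Sanger figures above (about 6 Mb per day and about $500 per Mb on a 96-capillary instrument) can be combined with the improvement factors quoted earlier in this review (100–1,000 times higher daily throughput, cost per Mb reduced to 4–0.1%). The sketch below only multiplies these quoted numbers; the variable names are illustrative, and the results are order-of-magnitude estimates rather than measured platform specifications.

```python
# Figures quoted in the text for a 96-capillary Sanger instrument.
SANGER_MB_PER_DAY = 6.0
SANGER_COST_PER_MB = 500.0  # US dollars, consumables only

# Improvement ranges quoted in the text for HTS relative to Sanger.
THROUGHPUT_FACTORS = (100, 1000)   # 100x to 1,000x daily throughput
COST_FRACTIONS = (0.04, 0.001)     # 4% down to 0.1% of the Sanger cost

# Implied HTS daily throughput (Mb/day) and cost per Mb (US dollars).
hts_mb_per_day = tuple(SANGER_MB_PER_DAY * f for f in THROUGHPUT_FACTORS)
hts_cost_per_mb = tuple(SANGER_COST_PER_MB * f for f in COST_FRACTIONS)

print("Implied HTS throughput (Mb/day):", hts_mb_per_day)
print("Implied HTS cost (USD per Mb):", hts_cost_per_mb)
```

Under these assumptions the implied HTS throughput is in the range of hundreds to thousands of Mb per day, at a consumables cost between tens of dollars and well under one dollar per Mb.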
De novo sequencing generates the DNA sequence of a molecule without any prior knowledge of the sequence. For genome research, extremely high throughput and high-end robotics are required to support the sequencing workflow. During the last decade, next-generation sequencing technologies have outpaced "Moore's Law": DNA sequencing throughput has grown at a faster rate than computer technology, so researchers face difficulties in loading and processing the data in computer memory. There is therefore demand for de novo assemblers that can efficiently handle large-scale sequencing data using scalable commodity servers in the cloud. Chang, Chen reported CloudBrush, a parallel algorithm that runs on the MapReduce framework of cloud computing for de novo assembly of high-throughput sequencing data. The algorithm uses Myers's bi-directed string graph as its basis and consists of two major phases: graph construction and graph simplification. Various workflow strategies have been developed to perform de novo sequencing. They include primer walking, shotgun sequencing, using transposons to randomly prime sites for sequencing, PCR amplification of template, nested deletions and mRNA sequencing.
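The two phases named above, graph construction and graph simplification, can be illustrated on a toy scale. The following is a minimal single-machine sketch and not the CloudBrush implementation: it assumes error-free reads, considers the forward strand only (so the graph is not bi-directed), and uses a naive all-against-all overlap search instead of MapReduce. All function names are hypothetical.

```python
from itertools import permutations


def overlap_len(a, b, min_len):
    """Length of the longest proper suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b)) - 1
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len):
    """Phase 1 (construction): edge a -> b weighted by suffix/prefix overlap."""
    graph = {r: {} for r in reads}
    for a, b in permutations(reads, 2):
        k = overlap_len(a, b, min_len)
        if k:
            graph[a][b] = k
    return graph


def remove_transitive_edges(graph):
    """Phase 2 (simplification): drop a -> c when a -> b and b -> c exist.

    Naive version for illustration; it does not handle repeats or cycles.
    """
    reduced = {a: dict(nbrs) for a, nbrs in graph.items()}
    for a, nbrs in graph.items():
        for b in nbrs:
            for c in graph[b]:
                reduced[a].pop(c, None)
    return reduced


# Toy reads sampled from the made-up sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
graph = build_overlap_graph(reads, min_len=2)
reduced = remove_transitive_edges(graph)
print(graph)
print(reduced)
```

For these three reads, the short overlap between the first and the third read is implied by the path through the second read, so the simplification step removes it and leaves a single linear path that spells the original sequence.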
De novo sequencing of a complete or partial genome can be addressed through a variety of common approaches. The initial generation of a genomic sequence for a new species, and its detailed genetic analysis, are possible only after de novo sequencing has been performed. RNA-Seq has recently developed into a powerful high-throughput sequencing technology that produces millions of short sequence reads in a short time by deep sequencing. Long reads are more useful and paired-end reads are essential; they allow gene expression profiling that reveals new transcribed regions, splice forms, SNPs and the precise location of transcription boundaries. ESTs are partial sequences derived from complementary DNA (cDNA). A number of ESTs may be produced from a single gene, and they represent gene expression in individual samples. Full-length cDNAs, which represent whole transcription units, are more useful than partial sequences for genome annotation and transcriptome analysis [28,29]. Full-length cDNAs can be selected on the basis of the 5′-cap, a unique feature of mRNA structure [30,31]. Additionally, the genes predicted from de novo assemblies have to be validated to ensure the reliability of the assembly process. Because reverse transcription PCR (RT-PCR) assists the detection and quantification of target mRNA transcripts, Kim, Lim applied RT-PCR to predicted tissue-specific candidate genes in order to confirm the reliability of the transcriptome assembly in Brassica oleracea. Using RT-PCR to examine the tissue-specific genes discovered by de novo assembly and analysis of deep-sequencing data can thus experimentally confirm the existence of the assembled genes. In addition to trimming the low-quality bases at the ends of reads, merging the contigs produced by multiple assemblies can also improve the assembly outcome [33,34].
Tissue-specific genes are expressed and function in specific cell types or tissues. They are useful not only for the experimental validation of de novo assembled genes; their spatial or time-course expression patterns also show when and where particular genes are active. This evidence allows us to infer links between temporal or growth stage-specific expression, genes and tissues, and novel gene functions.
Metagenomics, in the form of massively parallel sequencing (MPS) of metagenomic DNA without targeted amplification, greatly increases the volume of data generated. Furthermore, the cost of MPS is declining quickly. Untargeted MPS involves extracting total RNA or DNA from the tissue or population of interest. The cDNA or DNA is then sequenced, without targeted amplification, on a massively parallel sequencing platform, for instance the Illumina GAIIx. Untargeted MPS of rumen microbial populations has discovered many novel gene sequences by deep sequencing of a single pooled sample. However, differences among individuals of the same group have not yet been resolved into within-rumen variation (sampling error), technical variation or true biological variation. The method developed to evaluate differences in rumen metagenome profiles involves a "reference metagenome", for instance a set of contig sequences from previously reported experiments, to which sample sequence reads are aligned. The "rumen metagenome profile" is then the number of reads that align to each contig. For instance, with a database of 200,000 contigs, the profile is a 200,000 × 1 vector of counts. These profiles can then be analyzed through hierarchical clustering and bootstrap analysis or linear mixed models.
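The profile construction described above amounts to counting aligned reads per reference contig. The sketch below assumes that alignment has already been done and that its result is available as (read, contig) pairs; the contig names, read names and the choice of a Manhattan distance on proportions are illustrative assumptions, not part of the published method.

```python
from collections import Counter


def metagenome_profile(alignments, contig_ids):
    """Count aligned reads per reference contig, in a fixed contig order."""
    counts = Counter(contig for _read, contig in alignments)
    return [counts.get(c, 0) for c in contig_ids]


def to_proportions(profile):
    """Scale a count vector so samples of different depth are comparable."""
    total = sum(profile)
    return [x / total for x in profile] if total else [0.0] * len(profile)


def manhattan(p, q):
    """Simple distance between two profiles; usable as clustering input."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical reference metagenome with three contigs.
contig_ids = ["contig_1", "contig_2", "contig_3"]

# Hypothetical alignment results for two rumen samples.
sample_a = [("r1", "contig_1"), ("r2", "contig_1"), ("r3", "contig_3")]
sample_b = [("r1", "contig_2"), ("r2", "contig_3")]

profile_a = metagenome_profile(sample_a, contig_ids)
profile_b = metagenome_profile(sample_b, contig_ids)
distance = manhattan(to_proportions(profile_a), to_proportions(profile_b))
print(profile_a, profile_b, distance)
```

With a real reference of 200,000 contigs each profile would be a vector of length 200,000, and a pairwise distance matrix of this kind could be passed to a hierarchical clustering routine.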
Targeted genome enrichment is a powerful means of directing the substantial throughput of new DNA-sequencing instruments. A simple and scalable procedure has been presented for multiplex amplification of target regions based on the selector method. The updated version shows improved coverage and compatibility with NGS library-construction methods for shotgun sequencing on NGS platforms.
NGS techniques, which enable the rapid production of whole genome sequences, have revolutionized genetic research. With the emergence of economical benchtop NGS platforms, genomic investigations are now being carried out in applied or translational research laboratories rather than only in state-of-the-art genome centers [39,40]. Despite this technological progress, de novo sequence assembly and complete finishing (genome closure) continue to challenge researchers. As a result, large numbers of incomplete genome sequences have been submitted to databases. Streamlined techniques to assemble high-quality whole genome sequences are needed, particularly in the microbial genomics of multidrug-resistant Gram-negative bacteria, where de novo sequencing is often required because of the genetic diversity and dynamic genome rearrangements that occur [43,44]. Whole-genome mapping (WGM) uses single-molecule restriction analysis to obtain information about the size of the restriction fragments and their physical positions along the DNA strand. Whole-genome mapping has been used in a number of applications, including phylogenetic analyses and genotyping of related microbial isolates [46,47], discovery of large genomic structural rearrangements or variations [48,49], and quality control or verification of assembled genome sequences [50,51]. Restriction-based physical genome maps can also serve as a scaffold for the correct ordering of NGS contigs and help to close the gaps between mapped contigs. Meanwhile, the term "re-sequencing" refers to sequencing several samples from the same species for which a reference genome has been reported; the reference is used to support interpretation of the collected data. For example, re-sequencing of human genomes has been used to discover both mutations and polymorphisms.
Despite major developments in NGS, assembly of sequencing data, particularly from re-emerging pathogens or novel microorganisms, remains limited by the lack of suitable reference sequences. De novo assembly is the main technique for obtaining an accurate finished sequence, but multiple sequencing platforms or paired-end libraries are frequently needed to achieve whole genome coverage. Onmus-Leone, Hang introduced a technique in which entire bacterial genome sequences are assembled by integrating shotgun Roche 454 pyrosequencing with optical WGM. The whole genome restriction map (WGRM) is used as the reference to scaffold de novo assembled sequence contigs in a stepwise process (Figure 1). Large de novo contigs are placed in the correct order and orientation through alignment to the WGRM. De novo contigs that cannot be aligned to the WGRM are merged into scaffolds using contig branching evidence from the assembly (Figure 1). These extended scaffolds are then aligned to the WGRM to identify the overlaps to be removed, along with the gaps and mismatches to be resolved with unused contigs. The procedure is repeated until a sequence with full coverage and alignment across the whole genome map is achieved. Using this technique it is possible to achieve 100% WGRM coverage without a paired-end library.
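Aligning a contig to a whole genome restriction map requires the contig's own predicted pattern of restriction fragment sizes. A minimal sketch of such an in silico digest is shown below. It assumes a linear molecule, an exact recognition site and a single enzyme; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example and is not stated in the source to be the enzyme used for the maps. The toy contig and the function name are hypothetical.

```python
def restriction_fragments(sequence, site, cut_offset):
    """Return the ordered fragment lengths of a linear molecule.

    `site` is the recognition sequence and `cut_offset` is the number of
    bases from the start of the site to the cut position on this strand.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy contig containing two EcoRI sites (G^AATTC, so cut_offset is 1).
contig = "AA" + "GAATTC" + "TTTT" + "GAATTC" + "CC"
fragments = restriction_fragments(contig, site="GAATTC", cut_offset=1)
print(fragments)
```

The resulting ordered list of fragment sizes is the kind of pattern that would be compared with the optical map to place a contig in the correct position and orientation; real maps also require a tolerance for sizing error, which this sketch omits.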
The exome is the protein-coding content of the genome, which makes up 1%–2% of the genome in all. Because sequencers can read only so many bases per run, researchers sequencing exomes can produce more of them more quickly, at greater resolution and lower cost. Exome sequencing combines DNA-enrichment tools with massively parallel nucleotide sequencing to discover all protein-coding variation in the genome. By restricting the scope of the experiment to protein-coding sequences, only about five percent of the human genome is sequenced. Together with growing public databases of known variants, exome sequencing allows the detection of genetic mutations in cases that were considered insufficiently informative for earlier genetic studies. Whole-genome and whole-exome sequencing have been extremely successful in identifying the causes of Mendelian disorders in humans, and NGS has also advanced the identification of the causes of genetic conditions [56,57]. The major challenge associated with whole-exome sequencing is finding the disease-causing mutation(s) among the abundant candidate variants. Fuchs, Peeters-Scholte described a number of approaches to handle this wealth of data, including comparison with control databases, increasing the number of patients and controls, and reducing the genomic region under consideration through homozygosity mapping. Recently, a number of rare disorders of copper metabolism with a suspected but unidentified monogenetic cause have been proposed as an attractive target for this approach. It is also anticipated that the application of these novel technologies will, in the next few years, uncover the basic defect in the disorders described, as well as in other genetic disorders of metal metabolism.
Experimental miRNA investigations are frequently supplemented by bioinformatic techniques, which are used to process raw sequencing data, discover mature sequences, miRNA genes, targets and precursors, determine isoforms, and organize small RNAs into known miRNA families [59,60]. These computational and experimental techniques not only permit economical, qualitative and quantitative analysis of small RNAs; they also produce more precise results in a limited time. Furthermore, alongside these sophisticated laboratory techniques, a new generation of bioinformatics methods has emerged as a necessary requirement for further progress and improved productivity. HT-NGS is one of the major challenges in genomic research.
The establishment of high-throughput tools and deep sequencing analysis has allowed the discovery of many miRNAs that are not conserved or are expressed at low levels, such as those found in Arabidopsis, wheat, rice, tomato and poplar. RNA sequencing (RNA-Seq) is extensively used in genomics and in exploring novel approaches to analyze the functional complexity of transcriptomes. Solexa/Illumina sequencing in particular has many benefits as a technology for transcriptome investigation, such as high coverage at relatively low cost. It has been used to study transcriptomes in many plants, such as Arabidopsis, rice and berry [64,65]. Moreover, the Helicos BioSciences platform is suitable for applications that require quantitative insight in RNA-seq or direct RNA sequencing, as it sequences RNA templates directly without the need to convert them into cDNA.
A large number of algorithms have been used to process this large volume of data [68,69]. miRanalyzer, a tool for the detection of known microRNAs and the prediction of novel ones in HTS experiments, was developed a few years ago and has since been entirely redesigned to include various new features. The discovery of novel microRNAs is an important task because for many species only very few microRNAs have been identified. miRanalyzer has been developed as a web-based tool that implements all the steps needed for a comprehensive analysis of deep-sequencing data from small RNA molecules. NGS platforms such as the Genome Sequencer FLX or the Genome Analyzer (Illumina Inc.) have become readily available for small RNA sequencing; they allow both the discovery of new microRNA sequences and the detection of expression levels with high speed and sensitivity, at a cost affordable for ordinary research groups. However, each sequencing run yields up to 3 Gbp of sequence data, and the data analysis is a key bioinformatics challenge. First, alignments are performed with the ultrafast short-read aligner Bowtie [71,72], which provides full color-space support, allows mismatches in the alignment of a read to the genome, and is faster and more memory-efficient than the previously used alignment algorithm. Second, the tool covers 31 species and allows new ones to be added easily. Third, the tool places no limit on the number of input sequences for the prediction of new microRNAs, and the training of the prediction models takes into account differences between animal and plant microRNAs. Fourth, a newly implemented module identifies differential expression of microRNAs between two conditions based on the DESeq package.
In addition, taking advantage of the fact that several samples are required for this last module, the generation of consensus sequences for predicted mature and precursor microRNAs has also been implemented. This helps to assess the reliability of the predictions, i.e. microRNAs predicted in several samples are more likely to be genuine than those predicted in just a single sample. In summary, differences between animal and plant microRNAs are taken into account in the prediction models, and the differential expression of known and predicted microRNAs between two conditions can be calculated. Finally, a stand-alone version of miRanalyzer based on a local, easily customized file-based database is also available to researchers; this gives the user more control over certain parameters and makes it possible to use specific data, for example unpublished assemblies or other libraries that are not publicly accessible.
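Before expression can be compared between two conditions, read counts have to be made comparable across libraries of different depth. The DESeq approach referred to above does this with per-sample size factors computed by the median-of-ratios method. The following plain-Python sketch illustrates only that normalization step under simplifying assumptions; it is not the DESeq package or the miRanalyzer module, and the microRNA names and counts are invented.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps each feature (e.g. a microRNA) to a list of raw read
    counts, one per sample. Features with a zero count in any sample are
    skipped because their geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Raises statistics.StatisticsError if every feature contains a zero.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by the size factor of its sample."""
    return {f: [c / s for c, s in zip(row, factors)] for f, row in counts.items()}


# Hypothetical counts: the second library is sequenced twice as deeply.
raw = {
    "miR-a": [10, 20],
    "miR-b": [100, 200],
    "miR-c": [5, 10],
    "miR-d": [0, 4],
}
factors = size_factors(raw)
normalized = normalize(raw, factors)
print(factors)
print(normalized)
```

In this toy case the second sample receives a size factor twice that of the first, so features that differ only because of sequencing depth end up with equal normalized counts. Testing for differential expression would then be carried out on such depth-adjusted data with an appropriate count model.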
Next-generation sequencing (NGS) tools have been applied in a variety of systems to investigate the whole transcriptome for gene expression analysis [75,76]. NGS helps to discover target genes and changes in their expression by detecting differences in mRNA expression and by functional annotation. Comprehensive analysis of the transcriptome also provides an important platform for mining large numbers of polymorphic molecular markers and is a high-quality source of expressed sequence tag (EST) collections.
RNA sequencing, or transcriptome sequencing, is a recently developed HTS technology that can generate millions of short cDNA reads in parallel. RNA-seq can be used to determine the sequences and abundance of transcripts, even at the single-cell level. RNA sequencing has been extensively used to characterize the transcriptomes of model species, such as Arabidopsis and rice. It has also been used successfully to detect stress-responsive long non-coding RNAs and alternatively spliced transcripts in Arabidopsis [80,81]. RNA sequencing provides a global view of a transcriptome, including new transcriptionally active regions and the precise position of transcription boundaries. RNA-seq is particularly useful for investigating the transcriptomes of non-model species [83,84], as no prior knowledge of transcript sequences is required.
Bioinformatics has become a fundamental element of research and development in the biomedical sciences. It plays an important role in deciphering the genomic, proteomic and transcriptomic data produced by high-throughput experimental tools, as well as in organizing information gathered from conventional biology. Bioinformatics generally deals with several facets of analysis: protein structure prediction, DNA sequence analysis, proteomics, functional genomics and systems biology. High-throughput sequencing, with its rapidly declining cost and growing range of applications, is replacing many other research technologies. Nonetheless, considerable challenges remain with NGS, including data processing and storage. The very large data files require massive computing power (CPUs), data storage, and security and privacy safeguards (for human samples), as well as the development of more efficient, robust and reproducible data analysis workflows. Summarizing and processing the huge quantity of data produced by HTS presents a nontrivial challenge to bioinformatics. One of the most successful efforts to standardize HTS workflows was the development of the Sequence Alignment/Map (SAM) format for the storage of aligned sequencing reads, along with a corresponding set of efficient programs that work on SAM files.
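To illustrate why a common format simplifies downstream tools, the sketch below reads the eleven mandatory tab-separated fields of a SAM alignment line (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) and interprets two of the FLAG bits. It is a minimal illustration in plain Python rather than a replacement for established SAM libraries, and the example records are invented.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities
    tags: Tuple[str, ...]  # optional fields, left unparsed

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line) into a SamRecord."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5],
                     f[6], int(f[7]), int(f[8]), f[9], f[10], tuple(f[11:]))


def read_sam(lines: Iterable[str]) -> Iterator[SamRecord]:
    """Yield records, skipping header lines that start with '@'."""
    for line in lines:
        if line.strip() and not line.startswith("@"):
            yield parse_sam_line(line)


# Invented example: one header line and two aligned reads.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0",
    "read2\t16\tchr1\t250\t42\t5M\t*\t0\t0\tTTGCA\tIIIII",
]
records = list(read_sam(example))
for rec in records:
    print(rec.qname, rec.rname, rec.pos, "reverse" if rec.is_reverse else "forward")
```

Because every aligner writes the same columns, a single reader of this kind can serve any downstream step that needs read positions, for example counting reads per contig or per microRNA.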
Sequencing of all genes is considered a predictive procedure, since analysis of such large data sets can yield incidental findings and information about future disease. Biomedical research is becoming increasingly data- and computation-intensive, and "big data science" is moving into the medical field. Unfortunately, regulators, ethicists, and policy-makers have barely begun to explore the social, legal and ethical issues raised by the variety of analytical and computational advances under way or under development in medicine and biology. Most funding for big-data bioscience has focused on security, a very important consideration, but not the only one. The challenges raised by new computational techniques also include questions about safety, justice, and how to obtain appropriate informed consent. These technologies also raise a multitude of regulatory concerns that could affect the prospects of translating new assays or computational technologies to the public health or clinical spheres. In Norway, biotechnology and predictive procedures are regulated by the "bioteknologiloven" (Biotechnology Act). Although this raises several challenges, under this law written consent and genetic counseling before, during and after predictive gene testing are required.
High-throughput sequencing data were analyzed according to the three pipelines shown in Figure 2. These methods aim to determine miRNA expression with different techniques: the traditional method of read alignment, and two newer techniques, isoform counting, which is independent of any database entries, and seed analysis, which focuses the analysis on the bases that are most likely to be functionally significant. Several software packages and procedures have been developed in this area, including nine analysis tools: RMAP, BSMAP, mrsFAST, SOCS-B, BS-seeker, BRAT, MethylCoder, Bismark and NGSmethPipe.
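The two database-independent techniques mentioned above, isoform counting and seed analysis, can be sketched as simple tallies over the read sequences. The source does not define the seed region; the sketch assumes the common convention of nucleotides 2 to 8 from the 5′ end, and this window is exposed as a parameter. The reads are toy examples in the DNA alphabet, and the function names are hypothetical.

```python
from collections import Counter


def count_isoforms(reads):
    """Count each distinct read sequence without reference to any database."""
    return Counter(reads)


def count_seeds(reads, start=1, end=8):
    """Count reads by seed, here assumed to be nucleotides 2-8 (0-based 1:8).

    Reads shorter than the seed window are ignored.
    """
    return Counter(r[start:end] for r in reads if len(r) >= end)


# Toy small-RNA reads: three variants sharing one seed, plus one unrelated read.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGT",    # 3' truncated isoform
    "TGAGGTAGTAGGTTGTGTGGTT",   # different 3' region, same seed
    "TAGCTTATCAGACTGATGTTGA",
]
isoforms = count_isoforms(reads)
seeds = count_seeds(reads)
print(isoforms.most_common())
print(seeds.most_common())
```

Isoform counting keeps the three related sequences apart, whereas seed analysis pools them because they share the same seed, which reflects the different questions the two pipelines address.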
Third-generation sequencing technology
As technology moves forward, progress is being made toward third-generation sequencing tools, which include nanopore sequencing and real-time monitoring of PCR activity through fluorescence resonance energy transfer. The benefits of these techniques include scalability, simplicity, improved DNA polymerase activity and yield, and reduced error-proneness, with the ultimate goal of achieving accurate real-time results.
The third-generation technology already used in SMRT sequences many DNA fragments in parallel on a chip. The chip consists of a 100 nm-thick layer of aluminum deposited on a glass microscope coverslip. The aluminum contains an array of cylindrical wells 70–100 nm in diameter. The improvements of third-generation sequencing technologies over existing sequencing techniques are: (i) higher throughput; (ii) longer read lengths, which improve de novo assembly and allow direct detection of haplotypes and even whole-chromosome phasing; (iii) faster turnaround time; (iv) higher consensus accuracy to enable rare variant detection; (v) small amounts of starting material (theoretically only a single molecule may be required for sequencing); and (vi) low cost, where sequencing the human genome at high fold coverage for less than $100 is now a realistic goal for the community.
Progress in DNA sequencing techniques has permitted comprehensive examination of the genome. Insights from sequencing the genomes, exomes or transcriptomes of diseased and healthy cells in patients are already enabling improved diagnosis, prognosis, classification, and therapy selection for many diseases. Understanding the data generated by new HTS-DNA tools, the choices made in sequencing strategies, and common issues in data analysis and genotype-phenotype correlation is vital if clinicians, geneticists, and pathologists are to understand the growing technical literature in this field. Extensive genetic heterogeneity is a major concern for genetic counseling and molecular diagnosis. Although various approaches have been proposed to optimize mutation detection, they either fail to detect mutations in many patients or are time-consuming and expensive.
The authors declared that they have no competing interests.
This work is supported by grants from the important National Science and technology Specific Projects (No. 2012FY110100-3).