ISSN: 0974-276X
Journal of Proteomics & Bioinformatics

Efficiency of Corynebacterium pseudotuberculosis Cp31 Genome Assembly with the Hi-Q Enzyme on an Ion Torrent PGM Sequencing Platform

Adonney AO Veras1, Pablo HCG de Sá1, Kenny C Pinheiro1, Diego Assis das Graças1, Rafael Azevedo Baraúna1, Maria Paula Cruz Schneider1, Vasco Azevedo2, Rommel TJ Ramos1#* and Artur Silva1#*

1Institute of Biological Sciences, Federal University Pará, Belém, Pará, Brazil

2Institute of Biological Sciences, Federal University Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

#These authors contributed equally to this work

*Corresponding Author:
Rommel TJ Ramos
Institute of Biological Sciences,
Federal University Pará
Belém, Pará, Brazil
Tel: (91) 32018426
E-mail: [email protected]

Artur Silva
Institute of Biological Sciences
Federal University Pará, Belém
Pará, Brazil
E-mail: [email protected]

Received Date: October 21, 2014; Accepted Date: December 14, 2014; Published Date: December 21, 2014

Citation: Veras AAO, Sá PHCG, Pinheiro KC, Graças DA, Baraúna RA, et al. (2014) Efficiency of Corynebacterium pseudotuberculosis 31 Genome Assembly with the Hi-Q Enzyme on an Ion Torrent PGM Sequencing Platform. J Proteomics Bioinform 7: 374-378. doi: 10.4172/jpb.1000342

Copyright: © 2014 Veras AAO, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Abstract

High-throughput sequencing (HTS) platforms achieve high accuracy using methods such as pyrosequencing (Roche 454), sequencing by ligation (SOLiD System) and, more recently, post-light sequencing on the Ion Torrent PGM, which detects ions released during sequencing. Nevertheless, many errors inherent to the chemistry of each platform remain: insertions and deletions (indels) are very common on the Roche 454 and Ion Torrent PGM platforms, whereas substitutions abound on the Illumina and SOLiD platforms. Sequencing companies have therefore made efforts to address these problems. To improve the accuracy of Ion Torrent PGM reads, Life Technologies has developed an enzyme called Hi-Q.

This work aims to demonstrate the performance of the genome assembly of Corynebacterium pseudotuberculosis Cp31 using the enzyme Hi-Q. To evaluate the results, we used an Ion Torrent dataset obtained without the enzyme for the same strain.

Sequencing with the Hi-Q enzyme improved the accuracy of the reads and, consequently, the assembly quality. As a result, more complete genes relative to the reference genome were obtained than with the previous data (without Hi-Q). Furthermore, the use of Hi-Q reduced the number of contigs. After evaluating the GC bias in the genome produced with Hi-Q, we identified a reduction in GC bias, which can reduce the number and size of gaps generated in the genome assembly process. Furthermore, a comparison of the pseudogenes observed in the genome annotation of C. pseudotuberculosis 31 available at NCBI with those in the genome sequenced on the Ion Torrent PGM with the Hi-Q enzyme showed 3-fold fewer pseudogenes in the genome obtained with the Hi-Q enzyme.

Thus, the high efficiency of Hi-Q was validated. Owing to its higher accuracy compared to the previous chemistry, it will be useful for whole-genome sequencing and RNA-Seq projects.

Keywords

NGS; Hi-Q enzyme; Genome assembly; Ion Torrent; Bioinformatics

Introduction

NGS (Next-generation sequencing) platforms have brought about a revolution in the growth of biological knowledge [1]. These devices feature high data throughput at a reduced cost compared to platforms using the Sanger method [2,3].

Benchtop sequencing platforms have been launched in the past three years, for instance the MiSeq by Illumina (www.illumina.com) and the Ion Torrent PGM (Personal Genome Machine) by Life Technologies [4].

The MiSeq system performs sequencing by synthesis, similar to other Illumina sequencers [4]. However, its technology has dramatically reduced the time required per run compared to the Illumina HiSeq [5,6]. In 2011, Life Technologies released the Ion Torrent PGM, which uses sequencing based on semiconductor technology. This system detects hydrogen ions released during DNA sequencing, thereby inaugurating the post-light era [4,5].

Despite the increased accuracy of data produced by such sequencers, many errors remain (substitutions and indels), which are inherent to the chemistry used in these platforms, the method of library preparation and the type of sequencing used. These errors increase the complexity of data processing, and each of these devices introduces specific errors, such as the substitution errors that are prevalent on the Illumina and SOLiD platforms [7].

An error common to the Ion Torrent PGM and Proton platforms, which was also observed on the Roche 454, is related to the recognition of homopolymers, because each of these regions is sequenced in a single cycle. Additionally, there is the issue of detecting the byproducts of the nucleotide incorporation reactions, i.e., pyrophosphates (454) and hydrogen ions (Ion Torrent and Proton). These are synthesis-based methods in which the signal intensity measured in each nucleotide flow is expected to be directly proportional to the number of nucleotides incorporated. However, the relationship between the measured intensity and the number of nucleotides incorporated is nonlinear in homopolymeric regions, causing frequent errors in determining the length of such regions, which results in insertions and deletions [8].
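To make the notion of a homopolymeric region concrete, the following minimal Python sketch lists the runs of identical bases in a sequence. The function name, the example sequence and the default threshold of six bases are illustrative assumptions and are not part of the analysis pipeline used in this study.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=6):
    """Return (start, base, length) for each run of identical bases
    that is at least min_len long. min_len=6 is an illustrative choice."""
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


# Hypothetical sequence: a 7-base T run and a 6-base A run.
example = "ACG" + "T" * 7 + "GC" + "A" * 6 + "C"
print(homopolymer_runs(example))
```

Regions reported by such a scan are the ones where the length of the run is most likely to be miscalled as an insertion or deletion.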

These errors affect the de novo and reference-based assembly processes through spurious insertions and deletions, which can cause frame-shifts that are identified during genome annotation and have been described mainly for the Ion Torrent PGM [3,9,10].
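The effect of a single spurious deletion on the reading frame can be illustrated with a short Python sketch. The sequence below is a hypothetical toy open reading frame, not taken from the Cp31 genome, and the helper names are assumptions made for this example.

```python
def codons(seq, frame=0):
    """Split seq into complete 3-base codons starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def delete_base(seq, index):
    """Return seq with the base at `index` removed (a 1-base deletion)."""
    return seq[:index] + seq[index + 1:]


# Toy ORF: start codon, a 6-base A homopolymer, and a TGA stop codon.
reference = "ATGGCCAAAAAAGGTTGA"
# Simulate an undercalled homopolymer: one A is missing from the read.
read = delete_base(reference, 6)

print(codons(reference))  # codons of the intact frame
print(codons(read))       # codons after the deletion
```

In this toy case every codon downstream of the deletion changes and the stop codon is no longer read in frame, which is how an annotation pipeline comes to report a broken gene.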

To improve the quality of data produced on the Ion Torrent PGM sequencing platform, Life Technologies has developed a sequencing enzyme called Hi-Q with site-directed mutations in its molecular structure that reduce insertion and deletion error rates compared to the traditional chemistry.

This study aimed to demonstrate the sequencing and assembly performance of a fragment library from the Corynebacterium pseudotuberculosis Cp31 genome using the traditionally marketed enzyme and Hi-Q, and to compare the occurrence of pseudogenes between the genome available at NCBI and that assembled with the Hi-Q enzyme.

Materials and Methods

Origin of the isolate, organism growth and DNA extraction

The model organism used in this study was C. pseudotuberculosis 31, which was isolated from a buffalo in Egypt [11]. The bacterium was grown and the DNA was extracted according to Ramos et al. [12].

Construction of libraries and sequencing

The steps to prepare the library included enzymatic DNA fragmentation into 400 bp fragments (Ion Plus Fragment Library Kit, PN#4471252 and Ion Shear Kit, PN#4471248), the binding of specific adapters, size selection on a 2% E-Gel (PN#G661002), the amplification and dilution of the library, and, finally, emulsion in the Ion OneTouch 2 with the Ion PGM Template OT2 400 kit, PN#4479882. The Ion OneTouch ES system was used in the enrichment step.

The Ion PGM Hi-Q Sequencing Solutions kit PN#A24569 was used for sequencing, along with the reagents available in the Ion PGM Hi-Q Sequencing Reagents Kit, PN#A24568. The sequencing was performed on an Ion 318 v2 chip, PN#4484354, following the manufacturer's recommended protocol for producing 400 bp reads.

Data analysis

The quality of the raw data generated by sequencing with the two kits was assessed using the FastQC tool (http://www.bioinformatics.babraham.ac.uk/), followed by data processing using the Fastx-toolkit (http://hannonlab.cshl.edu/).
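As a rough illustration of what a per-base quality check measures, the sketch below decodes a FASTQ quality string into Phred scores and reports the fraction of bases at or above a threshold. It assumes the common Phred+33 encoding; the function names are hypothetical and this is not a reimplementation of FastQC.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def fraction_at_or_above(quality_line, threshold=20):
    """Fraction of bases whose Phred score is >= threshold."""
    scores = phred_scores(quality_line)
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Hypothetical quality string: 'I' is Q40, '5' is Q20, '+' is Q10.
print(phred_scores("I5+"))
print(fraction_at_or_above("I5+"))
print(error_probability(20))
```

By the Phred definition, a score of 20 corresponds to an expected error probability of 1 in 100 for that base call.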

De novo assembly

The assembly process for both datasets was performed using two assembly tools, MIRA version 4.0.2 (http://www.chevreux.org/) and SPADES version 3.1.1 (http://bioinf.spbau.ru/).

The parameters used in the SPADES assembler were as follows: --iontorrent, which enables a pipeline specific to data from the Ion Torrent PGM platform, and --careful; additionally, the error-correction step was performed before the assembly process. For MIRA, the default parameters were used.
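A command line consistent with this description could be assembled as in the Python sketch below. Only the --iontorrent and --careful options are stated in the text; the use of -s for an unpaired read file and -o for the output directory, as well as both file names, are assumptions made for illustration.

```python
import shlex


def spades_command(reads, outdir):
    """Build a SPAdes invocation for single-end Ion Torrent reads.

    --iontorrent and --careful come from the text; -s (unpaired reads)
    and -o (output directory) are assumed here for illustration.
    """
    return ["spades.py", "--iontorrent", "--careful", "-s", reads, "-o", outdir]


# Hypothetical input file and output directory names.
cmd = spades_command("Cp31HiQ.fastq", "spades_Cp31HiQ")
print(shlex.join(cmd))
# To run it, pass `cmd` to subprocess.run(cmd, check=True).
```

Keeping the command as a list of arguments avoids shell quoting problems if the read file path contains spaces.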

Assembly quality assessment

The assembly quality was assessed using the QUAST computational tool (http://quast.bioinf.spbau.ru/), which uses an annotated reference genome as the template to validate gene completeness, misassemblies, GC content, and other statistical metrics.

The software also allows multiple assemblies to be analyzed together, thus facilitating comparisons between different assemblies. Therefore, it is possible to evaluate the performance of the Hi-Q enzyme for genome assembly.
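One of the metrics mentioned above, GC content, is simple enough to sketch directly. The following Python functions compute the GC fraction of a sequence and of consecutive windows along it; the window size and function names are illustrative assumptions, and QUAST's own implementation may differ.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window=100):
    """GC fraction of consecutive, non-overlapping windows along seq."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence: a GC-rich half followed by an AT-rich half.
print(gc_content("GGCCAATT"))
print(windowed_gc("GGGGAAAA", window=4))
```

Comparing read coverage across windows with different GC fractions is one common way to look for GC bias in a sequencing run.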

Genome annotation

The genome annotation was performed using the RAST platform [13].

Results/Discussion

Bacterial growth, the extraction of chromosomal DNA and library construction for sequencing were performed using the same protocol for both samples. The raw data obtained from the two datasets, Cp31fragments and Cp31Hi-Q, were evaluated using FastQC (Figure 1). This analysis showed the higher quality (above Phred 20) of the Cp31Hi-Q data throughout the reads, so no quality filter was applied to the data sets.

Figure 1: Quality assessment for the Cp31fragments (A) and Cp31Hi-Q (B) libraries.

The results for the assemblies obtained using the computational tools MIRA and SPADES are listed in Table 1. Even when different assembly software was used, the Hi-Q library yielded better results than the library without Hi-Q. Furthermore, for the Hi-Q library, SPADES yielded far fewer contigs (15 versus 143) and the longest contig (655,218 bp), although MIRA reached a slightly higher N50 (374,657 versus 345,051) and more total bases.

Description | N50 | Longest Contig | Shortest Contig | Total Contigs | Total Bases
Cp31 (Mira) | 5,228 | 22,717 | 509 | 687 | 2,381,462
Cp31Hi-Q (Mira) | 374,657 | 528,805 | 502 | 143 | 2,481,848
Cp31 (SPADES) | 809 | 2,912 | 500 | 2,145 | 1,726,405
Cp31Hi-Q (SPADES) | 345,051 | 655,218 | 1,582 | 15 | 2,387,472

Table 1: Assembly results using the MIRA and SPADES assemblers.
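For readers unfamiliar with the N50 statistic reported in the table, it is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal Python sketch follows; the contig lengths are hypothetical and are not taken from the assemblies in this study.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical contig lengths: total 290, half 145.
# Cumulative sums 80, 150: the second contig (70) crosses the halfway point.
print(n50([80, 70, 50, 40, 30, 20]))
```

A higher N50 with a similar total length indicates that the assembly is made of fewer, longer contigs.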

The assemblies were evaluated using the QUAST software [14], and the Hi-Q data were of higher quality, as shown in Figure 2. Figure 2A shows the gene completeness of all four assemblies. The two assemblies using Hi-Q (blue and purple) featured better completeness relative to the reference (black dashes) and had fewer contigs than the assemblies without the enzyme (red and green), which had more contigs and lower gene completeness. These results demonstrate that the Hi-Q enzyme yields better performance in the assembly process.

Figure 2: Assembly quality assessment. (A) Analysis of gene completeness and (B) Plot of contig length.

Furthermore, the use of the Hi-Q enzyme was associated with the generation of longer contigs (Figure 2B), which is reflected by the blue and purple lines representing contigs >200 kb and >500 kb, respectively. In contrast, the data without Hi-Q (red and green) had contigs well below 100 kb. These results demonstrate once again the higher quality of the data produced with the Hi-Q enzyme.

These results demonstrate a significant improvement in the chemistry used in the sequencing kit, despite the improvements made for the generation of 400 bp reads. According to Loman et al. [4], there was lower accuracy (with a rate below 60% for homopolymers with a length equal to or greater than six bases) in the assembly of data from the Ion Torrent PGM platform compared to the results obtained on the 454 GS Junior and MiSeq platforms. This result is directly linked to difficulty in recognizing homopolymers.

After comparing the genome annotation of C. pseudotuberculosis 31 present in the NCBI database with that of the genome sequenced with the Hi-Q enzyme, 12 pseudogenes were found to be shared between them, and a 3-fold reduction in the number of pseudogenes was observed in the sequence produced with the Hi-Q enzyme (Figure 3). The list used to produce the Venn diagram (Figure 3) is available in the supplementary material. As an example, Figure 4 shows how the Hi-Q data improved the assembly in homopolymeric regions and fixed a frame-shift.

Figure 3: Venn diagram of annotated pseudogenes for the C. pseudotuberculosis 31 (Cp31) sequences. Cp31_Hi-Q: the sequence produced with the Hi-Q enzyme; Cp31_NCBI: the sequence present in the NCBI database; Shared: pseudogenes shared by both genome sequences.
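The counts behind a two-set Venn diagram of this kind can be derived with plain set operations, as in the Python sketch below. The identifiers are hypothetical placeholders rather than the pseudogenes from the supplementary list, and the toy numbers are chosen only to show the arithmetic.

```python
def venn_counts(a, b):
    """Counts for a two-set Venn diagram: unique to a, shared, unique to b."""
    a, b = set(a), set(b)
    return {"only_a": len(a - b), "shared": len(a & b), "only_b": len(b - a)}


# Hypothetical pseudogene identifiers for two annotations of one genome.
ncbi = {"pg01", "pg02", "pg03", "pg04", "pg05", "pg06"}
hiq = {"pg02", "pg05"}

print(venn_counts(ncbi, hiq))
print(len(ncbi) / len(hiq))  # fold difference in pseudogene count
```

In practice the identifiers would first have to be matched between the two annotations, for example by genomic coordinates or by product name, before the sets can be compared.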

Figure 4: Frame-shift correction using Ion Torrent PGM data sequenced with the Hi-Q enzyme. (A) The Artemis interface shows a frame-shift generated by three deletion events, which are highlighted. (B) The same region as represented by the Ion Torrent PGM Hi-Q data, in which the deletions were fixed. (C) A BLAST search against Corynebacterium pseudotuberculosis 258 confirmed that the deletions were erroneous. This example shows the efficiency of the Hi-Q enzyme in addressing the problem of representing homopolymeric regions.

To evaluate how Hi-Q improved the results, another analysis was performed using data produced on the same sequencer (Ion Torrent PGM) with a mate-pair library and the traditional enzyme, and the resulting assembly was compared to the Hi-Q genome assembly. Despite the use of a mate-pair library (Cp31_mate), its assembly statistics were inferior to those of the Hi-Q enzyme data (Table 2), which presented the highest N50 and total bases.

Description | N50 | Longest Contig | Shortest Contig | Total Contigs | Total Bases
Cp31_mate_Mira | 10,217 | 53,090 | 506 | 461 | 2,441,500
Cp31_mate_Spades | 94,741 | 219,175 | 1,465 | 50 | 2,405,149
Cp31Hi-Q (Mira) | 374,657 | 528,805 | 502 | 143 | 2,481,848
Cp31Hi-Q (SPADES) | 345,051 | 655,218 | 1,582 | 15 | 2,387,472

Table 2: Comparison between mate-pair and Hi-Q libraries.

Conclusion

The efficiency of the Hi-Q enzyme was initially evident in the quality assessment of the raw data: better quality data were obtained from the Cp31Hi-Q library than data from the Cp31fragments library that was sequenced without the Hi-Q enzyme.

After assembly, an assessment of gene completeness combined with the statistical parameters of the assembly is essential for judging the accuracy of the assembly process. These analyses revealed better results for data produced with the Hi-Q enzyme on all metrics evaluated.

This result directly implies a reduction of frameshift errors, given that the number of broken genes is directly proportional to the number of such errors in the annotation.

Thus, the use of the Hi-Q enzyme is expected to reduce problems associated with sequencing errors in genome assembly and in other experiments that rely on the Ion Torrent PGM sequencing platform.

Acknowledgements

This work was part of the Rede Paraense de Genômica e Proteômica supported by Fundação de Amparo a Pesquisa do Estado do Pará. AAOV, PHGS, VA, AS, RTJR were supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). KCP was supported by Fundação Amazônia Paraense de Amparo à pesquisa (FAPESPA).

References
