Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California, USA
Received date: May 12, 2016; Accepted date: Jul 21, 2016; Published date: Jul 23, 2016
Citation: Li WL (2016) Interpretation of Clinical Next-Generation Sequencing Data: A Hurdle to Jump Over. Next Generat Sequenc & Applic 3:130. doi:10.4172/2469-9853.1000130
Copyright: © 2016 Li WL. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Next-generation sequencing (NGS) has been revolutionary for the clinical diagnostics field. With its high-throughput sequencing power and plummeting cost, it is increasingly used in clinical labs. Instead of testing candidate genes one at a time by Sanger sequencing, a lab can now test a group of candidate genes simultaneously using NGS. For example, many clinical labs now offer epilepsy gene panel tests that usually sequence 100-500 genes known to cause, or to be associated with, different kinds of epilepsy. Gene panel tests are also offered for genetically heterogeneous diseases such as neurodevelopmental disease, cardiomyopathy, and immunodeficiency disease. This approach dramatically increases diagnostic efficiency and helps clinicians zero in on the genetic cause of a disease in a timely manner. In addition, NGS technology is used to diagnose patients who have been through diagnostic odysseys. Exome sequencing is currently used for this purpose in clinical labs. An exome test sequences the exome, which contains all the protein-coding regions; the exome comprises 1.5% of the genome but contains 80% of recognized disease-causing mutations. Numerous examples have illustrated that exome sequencing can efficiently identify genetic causes of undiagnosed diseases, which not only helps clinicians obtain accurate diagnoses but also guides them in the personalized care and treatment of their patients [1,2].
Although NGS technology is transforming the clinical diagnostics field, it is still not easy for routine clinical molecular labs to adopt it. The hurdle is the interpretation of clinical NGS data, which usually runs to gigabytes or terabytes. Normally, a clinical exome sequencing test detects 20,000-30,000 variants in protein-coding regions per patient, and identifying a disease-causing variant among them poses a serious challenge for routine clinical molecular labs, which usually deal with only one gene and several variants at a time. Without a bioinformatics team to help with sequence alignment, variant calling, and variant filtration, it is impossible for lab directors to make sense of the huge amount of NGS data, let alone to make clinical interpretations from it. Commercial software is available; however, it often carries an annual subscription fee plus a per-sample analysis fee, which a small clinical lab cannot afford.
In addition, even after bioinformatics data analysis and filtration, the identified genes and variants still need manual interpretation. Big academic labs such as Baylor Genetics Lab and UCLA Clinical Exome Sequencing Lab have an NGS data review board, which includes lab directors, physicians, and researchers in molecular biology and genetics, to manually review exome sequencing data and make a final clinical interpretation. This manual review ensures that the genetic variant found can explain the patient's clinical presentation and is consistent with the inheritance pattern of the disease, if it is known. Recently, bioinformatics pipelines have been reported that can efficiently prioritize genetic variants according to phenotype information and the inheritance pattern of disease-causing genes [3-5].
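The filtration and prioritization steps described above can be illustrated with a minimal sketch. It is not the method of any pipeline cited here; the `Variant` fields, the 1% allele-frequency cutoff, the set of consequence labels, and the gene names are all hypothetical choices made for illustration. The sketch drops common or likely benign variants, keeps only genes linked to the patient's phenotype, and then checks whether the remaining variants in each gene fit that gene's expected inheritance mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "synonymous" (illustrative labels)
    population_af: float  # allele frequency in a reference population
    zygosity: str         # "het" or "hom"


# Assumption: consequence classes treated as potentially damaging.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


def filter_variants(variants, max_af=0.01):
    """Keep rare variants with a potentially damaging consequence."""
    return [
        v for v in variants
        if v.population_af <= max_af and v.consequence in DAMAGING
    ]


def fits_inheritance(gene_variants, mode):
    """Check whether the variants in one gene fit its inheritance mode.

    "AD": one qualifying variant is enough.
    "AR": one homozygous variant, or at least two heterozygous variants
    (a compound-heterozygous candidate; phasing is not checked here).
    """
    if mode == "AD":
        return len(gene_variants) >= 1
    if mode == "AR":
        n_het = sum(v.zygosity == "het" for v in gene_variants)
        return any(v.zygosity == "hom" for v in gene_variants) or n_het >= 2
    return False


def prioritize(variants, gene_modes, max_af=0.01):
    """Return {gene: [variants]} for phenotype-linked genes that fit.

    gene_modes maps each gene associated with the patient's phenotype
    to its inheritance mode ("AD" or "AR").
    """
    by_gene = {}
    for v in filter_variants(variants, max_af):
        if v.gene in gene_modes:
            by_gene.setdefault(v.gene, []).append(v)
    return {
        gene: vs for gene, vs in by_gene.items()
        if fits_inheritance(vs, gene_modes[gene])
    }


# Toy data with placeholder gene names.
patient_variants = [
    Variant("GENE_A", "missense", 0.0, "het"),
    Variant("GENE_A", "synonymous", 0.2, "het"),
    Variant("GENE_B", "missense", 0.0005, "het"),
    Variant("GENE_C", "missense", 0.0001, "het"),
    Variant("GENE_C", "frameshift", 0.0, "het"),
    Variant("GENE_D", "stop_gained", 0.0, "hom"),
]
phenotype_genes = {"GENE_A": "AD", "GENE_B": "AR", "GENE_C": "AR"}
candidates = prioritize(patient_variants, phenotype_genes)
```

In this toy example, GENE_B is dropped because a single heterozygous variant does not fit a recessive gene, and GENE_D is dropped because it is not linked to the phenotype. The surviving candidates would still go to manual review.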
Clinical labs that already offer clinical exome sequencing tests have shown that the diagnostic yield of the clinical exome is only around 25% [6-9]. The remaining 75% of clinical cases remain undiagnosed, which prevents appropriate treatment of these patients. Possible explanations for this low diagnostic yield include the following:
• Disease-causing variants are located outside of protein-coding regions, such as gene promoters, enhancers, deep intronic regions, and noncoding intergenic regions. Limited knowledge of these non-protein-coding regions prevents interpretation of the functional impact of variants seen in them. For example, a gene promoter is important for gene expression, and it is well known that gene promoter mutations can lead to genetic diseases or cancer [11-14]. However, the pathogenicity of a promoter variant cannot be easily inferred without functional analysis.
• Disease-causing genes or variants are novel or have few functional studies and associated clinical reports; therefore, it is difficult for clinical labs to interpret them clinically. Often, variants identified by exome sequencing have not been seen previously; the majority of these are missense mutations, and their pathogenicity remains to be tested functionally. In addition, some novel genes found in exomes might be associated with human diseases (as inferred from animal studies); however, because they are not documented in the OMIM and HGMD databases, clinical interpretation of the variants seen in these genes is impossible.
• Multiple genes contribute to disease onset. Currently, clinical labs analyze genomic data only according to simple Mendelian inheritance patterns. However, the disease etiology could be multigenic.
• Genetic cause alone is not enough for disease onset; environmental insult also plays an important role in disease manifestation. To date, how genetic and environmental factors interact to initiate disease onset is not clear; however, active research is under way for diseases like autism, allergy, and cancer [15-18]. The knowledge gleaned from this research will undoubtedly help us understand these complicated diseases.
In conclusion, with the increasing adoption of genomic sequencing technology in clinical labs, the main challenge is to interpret clinical genomic data accurately and in a timely manner so that the lab can return helpful clinical reports to physicians. The main obstacles to the interpretation of genomic data are the unreadiness to process huge amounts of data, the lack of knowledge of genetic variation in normal populations, insufficient clinical and research studies on important disease genes and variants, and the unknown functions of non-protein-coding regions of the genome.
Currently, genomic data processing is becoming more efficient because of the substantial research effort invested in it, and more advanced software tools have been made freely available. Such tools can be adapted for clinical use with patient privacy in mind, and with automatic upload of clinical reports to patients’ medical records.
In addition, for clinical testing, each patient’s clinical features also have to be considered in data analysis. With more and more applications of whole-genome sequencing, transcriptome sequencing, and epigenomic sequencing, cutting-edge bioinformatics tools are also being created to integrate various genomic data so that clinical lab directors can review the complete genomic profile of a patient before making a final clinical interpretation.
Finally, large population sequencing projects, sequencing projects on specific patient cohorts, and functional studies on non-coding regions of the genome will allow us, in the near future, to discover the meaning of genomic codes that we do not yet understand and to unveil the secrets hidden in our genome. With the advancement of our tools and knowledge in genomics, we will accumulate unprecedented capabilities to interpret our genome, and thus unleash the full potential of NGS technology.