Utility of Exome Sequencing in a Familial Trio as a Diagnostic Tool in Cardiomyopathies

Introduction: Hereditary heart diseases are a group of highly prevalent diseases associated with a risk of sudden death. They frequently affect young individuals, and their genetic basis has only become known in recent years. More than 100 genes are involved in these diseases. The emergence of next-generation sequencing is enabling a greater understanding of the genes involved; despite this, the associated genetic defect is not detected in a significant number of patients. Methods: This study analyzed the utility of exome sequencing in a familial trio as a diagnostic tool. A retrospective study was performed on a case of hypertrophic cardiomyopathy with an identified genetic cause. The data generated in the sequencing study were analyzed with three bioinformatics tools: ANNOVAR, Ion Reporter™ Software (Thermo Fisher, USA) and QIAGEN's Ingenuity® Variant Analysis™ software (QIAGEN, Redwood City). Results: In the index case, the exome study detected two possibly pathogenic variants: c.1292G>A in the KCND3 gene and c.67087C>T in the TTN gene. Additionally, five probably non-pathogenic variants were identified. Conclusion: Next-generation exome sequencing is a more useful tool than Sanger or panel sequencing because of its flexibility. It allows the analysis of regions of interest associated with the pathology as well as the discovery of new variants and their association with disease.


Introduction
Sudden cardiac death (SCD) is a major contributor to mortality in the general population. It accounts for almost 20% of all-cause mortality in developed countries [1], causing around 800,000 deaths annually in Western countries. The incidence of SCD in the US is approximately 180,000 to 250,000 cases per year [2]. In Spain, the incidence is lower than in other industrialized countries, at around 30,000 cases per year [2].
Sudden cardiac death may be due to very different cardiac conditions. Cardiac ischemia is by far the most common cause in the adult population. By contrast, in the first year of life the most frequently detected causes are congenital cardiac lesions and myocarditis. In the child and adolescent population, in addition to electrical diseases such as catecholaminergic tachycardia, SCD is due to hereditary heart diseases [3]. These conditions are defined as a set of major diseases that often affect young individuals and whose genetic basis has only recently begun to be elucidated. They can be classified into two groups: cardiomyopathies and channelopathies. Cardiomyopathies are defined as myocardial disorders in which the myocardium is structurally and/or functionally abnormal in the absence of a definite disease able to cause the myocardial pathology. Cardiomyopathies are traditionally classified according to morphological and functional criteria into four categories: dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), restrictive cardiomyopathy (RCM) and arrhythmogenic right ventricular (RV) cardiomyopathy/dysplasia (ARVC/D) [4]. Channelopathies are a heterogeneous group of disorders resulting from the dysfunction of ion channels located in the membranes of all cells and many cellular organelles. These include long QT syndrome, short QT syndrome, Brugada syndrome and catecholaminergic polymorphic ventricular tachycardia [5].
Hypertrophic cardiomyopathy is an autosomal dominant disease characterized by unexplained hypertrophy of the left ventricle (and sometimes of the right ventricle) in the absence of cardiac overload [6]. It is the most common inherited heart condition, with an estimated prevalence of 1 in 500 in the general population [7]. The cardiac phenotype is variable, even in patients carrying the same causal mutation [8]. SCD may be the first manifestation of the disease [9].
Genetic causes for HCM have been described, with over 20 genes discovered to date, most affecting the sarcomere; however, mutations in genes encoding proteins of the Z-disk or intracellular calcium modulators have also been identified. Sarcomere mutations are found in 60% to 70% of patients with a family history of HCM and in 30% to 40% of apparently sporadic cases. Mutations in myosin heavy chain (MYH7) and myosin binding protein C (MYBPC3) are the most frequent [7]. Mutations identified in sarcomeric genes are typically single nucleotide substitutions; the only exception is MYBPC3, in which deletions or insertions occur that lead to a frameshift and result in haploinsufficiency [10].
Other disease genes have been implicated in HCM, including the genes encoding troponin T (TNNT2), troponin I (TNNI3) and tropomyosin (TPM1). Co-segregation in large families with members affected by hypertrophic cardiomyopathy supports pathogenic roles for mutations in CSRP3, which encodes muscle LIM protein, and in ACTN2, which encodes alpha-actinin-2. Rare variants of TCAP (telethonin), ANKRD1 (encoding cardiac ankyrin repeat protein, or CARP), JPH2 (junctophilin-2) and MYOZ2 (myozenin-2) have been described in candidate gene analyses and studies of small families, but their role in the disease is unclear [6]. There is an emerging hypothesis that double (or compound) mutations may confer a gene dosage effect, predisposing patients to adverse disease progression [11].
Practice guidelines and expert opinions on clinical management and genetic testing recommend: a) genetic testing of the MYBPC3, MYH7, TNNI3, TNNT2 and TPM1 genes for any patient in whom a cardiologist has established a clinical diagnosis of HCM based on examination of the patient's clinical history, family history and electrocardiographic/echocardiographic phenotype [12]; b) mutation-specific genetic testing for family members and appropriate relatives following the identification of the HCM-causative mutation in an index case [13].
Knowledge of the molecular basis of these diseases is still insufficient because of their great clinical and genetic heterogeneity. The large number of different genes involved in this pathology makes it very difficult, laborious and expensive to study the genetic causes with traditional sequencing. Next Generation Sequencing (NGS) has revolutionized genetic diagnosis in recent years and is making it possible to identify the genetic alterations responsible for this disease.
In the 1980s and 1990s, Sanger sequencing became the reference technique for the diagnosis of genetic alterations in the human genome and came to be known as first-generation technology. Until then, restriction enzyme techniques (RFLP) had been used [14].
DNA sequencing technologies should ideally be fast, accurate, easy to operate and cheap. In recent years, they have undergone tremendous development. In 1987, the first automatic sequencing machine was introduced, adopting capillary electrophoresis, which made sequencing faster and more accurate. It could detect 96 bases at a time and 500 K bases a day, and the read length could reach 600 bases. The current model can output 2.88 M bases per day, and the read length can reach 900 bases [15].
Several second-generation sequencing technologies have emerged since the Human Genome Project. They allow gene panels, exomes and even entire genomes to be sequenced in a single experiment. They also have several advantages: they do not require a priori knowledge of the gene(s) responsible for a disorder, they can simultaneously sequence a large number of candidate genes, and they have a lower cost than Sanger sequencing [16]. They include different methods that are grouped according to template preparation, sequencing technology and data analysis [17]. Two methods are used to generate clonally amplified libraries: emulsion PCR and solid-phase amplification. There are four main types of sequencing techniques. In semiconductor sequencing, an incorporation event is measured by the pH change caused by the release of protons during incorporation [14]. Pyrosequencing uses a cascade of reactions triggered by the pyrophosphate released in each incorporation reaction, which leads to the emission of a photon by the enzyme luciferase [17]. Sequencing by ligation is an approach in which DNA polymerase is replaced by DNA ligase and fluorescent probes bind to the complementary library sequences [17]. Reversible chain terminator sequencing is a cyclic method in which reversible terminator nucleotides, labeled with a removable fluorescent dye at the 3'-hydroxyl terminus, are used for step-by-step DNA synthesis [18].
NGS platforms have a very high capacity for generating genetic information. Data on this scale require a combination of bioinformatic, genetic and functional interpretation in order to identify the key DNA changes relevant to the individual patient. Various computational strategies have therefore been developed to estimate the harmful potential of genomic variants.

Materials and Methods
This is a retrospective study of a 12-year-old Asian patient who came to the Emergency Room with exertional syncope, palpitations and chest pain. He was diagnosed with mild non-obstructive asymmetric hypertrophic cardiomyopathy.
Later, a trio exome study was conducted in our laboratory, including samples from the index patient, his mother and his father.
His mother is a healthy 36-year-old Chinese woman. His father is a 39-year-old man who was diagnosed in 2011, following his son's diagnosis, with asymmetric hypertrophic cardiomyopathy without dynamic obstruction. The procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional or regional) and with the Helsinki Declaration of 1975, as revised in 1983. Informed consent was obtained from the patients for participation in research.

Library preparation and sequencing
10 mL of whole blood was drawn into an EDTA-K2 collecting tube (Terumo Europe®, Spain). Genomic DNA was extracted with the QIAamp DNA Blood Midi Kit (Qiagen, Hilden, Germany) and quantified in parallel by spectrophotometry (Nanodrop ND-1000, BioTek Instruments®, USA) and fluorometry (Qubit, Life Technologies®, Carlsbad, California, USA). The extracted DNA was used for library preparation in accordance with the manufacturer's protocol using the Ion AmpliSeq™ Exome Library Preparation kit (Life Technologies®, Carlsbad, California, USA). Libraries were quantified by quantitative PCR using the Ion Library Quantitation kit (Life Technologies®, Carlsbad, California, USA). The barcoded libraries were pooled in a 3-plex manner and clonally amplified with the Ion PI Template OT2 200 kit v3 on the Ion OneTouch™ 2 System (Life Technologies®, Carlsbad, California, USA) according to the manufacturer's protocol.
Template spheres were loaded on an Ion PI™ Chip v2, and sequencing was performed using the Ion PI™ Sequencing 200 Kit v3 on an Ion Proton™ System (Life Technologies®, Carlsbad, California, USA) running Torrent Suite™ Software 4.2 (Figure 1).

Bioinformatics analysis
Ion Torrent's mapping program (TMAP, version 3.4.1; https://github.com/iontorrent/TMAP) was used to align the generated sequence data to the hg19 (GRCh37) human reference genome (UCSC Genome Browser), and Torrent Suite, version 4.0.2 (Ion Torrent Systems, Life Technologies®, USA) was used to detect variants. In the whole exome sequencing analysis with ANNOVAR, called strategy A, variants were required to have coverage greater than 45x and a quality score greater than 75. Intronic and synonymous variants were removed. Variants not related to the patient's pathology, cardiac disease, were also removed. Variant pathogenicity was assessed using the PhyloP score [19], which measures evolutionary conservation at individual alignment sites. Homozygous variants and variants classified as benign by PolyPhen [20] were removed.
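The filtering steps of strategy A can be summarized as a short sketch. This is only an illustration of the filter logic described above, not the ANNOVAR workflow actually run: the `Variant` record, its field names and the category labels are hypothetical, and the PhyloP criterion is omitted because no threshold is stated in the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is a hypothetical layout.
MIN_COVERAGE = 45
MIN_QUALITY = 75


@dataclass
class Variant:
    gene: str
    hgvs: str
    region: str            # hypothetical labels: "exonic", "intronic", ...
    effect: str            # hypothetical labels: "missense", "synonymous", ...
    coverage: int          # read depth at the variant site
    quality: float         # variant caller quality score
    zygosity: str          # "het" or "hom"
    polyphen: str          # "benign", "possibly_damaging", "probably_damaging"
    cardiac_related: bool  # gene related to cardiac disease


def passes_strategy_a(v: Variant) -> bool:
    """Return True if the variant survives every filter of strategy A."""
    return (
        v.coverage > MIN_COVERAGE
        and v.quality > MIN_QUALITY
        and v.region != "intronic"
        and v.effect != "synonymous"
        and v.cardiac_related
        and v.zygosity != "hom"
        and v.polyphen != "benign"
    )


def filter_strategy_a(variants):
    """Keep only the variants that pass all strategy A filters."""
    return [v for v in variants if passes_strategy_a(v)]
```

Expressing the filters as one conjunction makes it easy to see why a variant was lost: a low-coverage site in an exon-flanking region fails the first condition regardless of its predicted effect.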
In the second bioinformatics analysis, strategy B, performed with Ion Reporter (Torrent Suite™ Software, Life Technologies®, USA), variants not associated with HCM or SCD and variants with coverage of less than 45x were removed. To establish pathogenicity, variants not listed in OMIM, variants with a negative PhyloP value and variants that do not produce a chemical change in the protein according to the Grantham score [21,22] were removed.
The last strategy, C, used Ingenuity Variant Analysis (Sample & Assay Technologies, Qiagen) to annotate variants. The default filters associated with the biological context of SCD were used.
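As a rough sketch, the exclusion criteria of strategy B can be written as a single predicate. The dictionary keys are hypothetical, and the Grantham cut-off is an assumption: the text only says that variants producing no chemical change were removed, so the sketch treats a missing score (as for intronic variants) or a score not above `min_grantham` as no change.

```python
def passes_strategy_b(variant, min_coverage=45, min_grantham=0):
    """Return True if a variant survives the strategy B filters.

    `variant` is a dict with hypothetical keys:
      hcm_or_scd_related (bool), coverage (int), in_omim (bool),
      phylop (float), grantham (float or None).
    `min_grantham` is an assumed cut-off, not a value from the study.
    """
    grantham = variant.get("grantham")
    # No Grantham score (e.g. non-coding variant) counts as no chemical change.
    chemical_change = grantham is not None and grantham > min_grantham
    return (
        variant["hcm_or_scd_related"]
        and variant["coverage"] >= min_coverage   # coverage below 45x is removed
        and variant["in_omim"]
        and variant["phylop"] >= 0                # negative PhyloP is removed
        and chemical_change
    )


def filter_strategy_b(variants, **kwargs):
    """Keep only the variants that pass all strategy B filters."""
    return [v for v in variants if passes_strategy_b(v, **kwargs)]
```

Because the Grantham score is defined only for amino acid substitutions, any filter of this kind removes intronic variants by construction.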

Results
In the panel sequencing analysis of genes related to myocardial disease, four mutations were found, two of them probably pathogenic, both in the TTN gene: c.67087C>T and c.5602G>A (Table 1).
In trio exome sequencing, the chip generated 87,155,630 reads (15.2 G), of which 4,092,000 reads had a quality value greater than Q20. The mean read length was 193 bp. The summary of mapped reads is given in Table 2.
In the proband exome analysis, 48,500 variants were detected with the Variant Caller plugin. These variants were annotated following the workflow described in Figure 2.
The variants detected in the panel were also detected in exome sequencing, but some of them were lost in the filtering and annotation steps (Table 3).

ANNOVAR analysis
Twenty-four variants were annotated. Three of the panel variants were found: A, B and D (Table 3). Variant C, located at position 17962789 of the TTN gene, was lost. This was due to the low coverage obtained in the flanking regions of exons when exome sequencing was performed instead of panel sequencing.

Ion Reporter analysis
Variants A and C were lost. The PhyloP and Grantham scores were used to annotate variants, and both exclude intronic variants such as these.

Ingenuity analysis
None of the panel variants were detected, owing to the biological context filter associated with heart diseases.
For the detection of new variants, only those detected by at least two bioinformatics strategies were considered in the analysis of results (Table 4).
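The two-strategy rule can be illustrated with a minimal sketch that counts how many strategies report each variant. The strategy labels and variant identifiers in the usage example are placeholders, not results from this study.

```python
from collections import Counter


def consensus_variants(calls_by_strategy, min_strategies=2):
    """Return the variants reported by at least `min_strategies` strategies.

    `calls_by_strategy` maps a strategy name to an iterable of variant
    identifiers (any hashable key, e.g. "gene:HGVS" strings).
    """
    counts = Counter()
    for variants in calls_by_strategy.values():
        # set() ensures a variant counts once per strategy.
        counts.update(set(variants))
    return {variant for variant, n in counts.items() if n >= min_strategies}


# Placeholder identifiers, for illustration only.
example_calls = {
    "ANNOVAR": ["v1", "v2", "v3"],
    "IonReporter": ["v2", "v4"],
    "Ingenuity": ["v1", "v5"],
}
retained = consensus_variants(example_calls)
```

Requiring agreement between independent annotation pipelines trades sensitivity for specificity: a variant seen by a single strategy is dropped even if it is real.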
Variant 1, classified as pathogenic, was detected by Ingenuity and ANNOVAR. It is located in the KCND3 gene, which is involved in the function of voltage-gated potassium channels [12], and it is inherited from the mother. Variant 2, also detected in panel sequencing, is classified as pathogenic. It is located in the TTN gene, which encodes the protein titin [23]. This protein is involved in the contraction of striated muscle and associated tissues. The variant was detected with ANNOVAR and Ion Reporter and is inherited from the father, who is also affected by cardiomyopathy.
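The parental origin assigned to each variant follows from comparing the three members of the trio. A minimal sketch of that comparison is given below; the function name and return labels are illustrative, and a variant absent from both parents is only flagged as a candidate de novo event, since low parental coverage can produce the same pattern.

```python
def inheritance_origin(in_proband, in_mother, in_father):
    """Classify the parental origin of a variant from trio presence flags.

    Each argument is True if the variant was called in that sample.
    The returned labels are illustrative, not a standard vocabulary.
    """
    if not in_proband:
        return "absent in proband"
    if in_mother and in_father:
        return "both parents"
    if in_mother:
        return "maternal"
    if in_father:
        return "paternal"
    # Could also reflect a missed call in a parent, so it needs confirmation.
    return "candidate de novo"


# Pattern described in the text for the two pathogenic variants.
kcnd3_origin = inheritance_origin(in_proband=True, in_mother=True, in_father=False)
ttn_origin = inheritance_origin(in_proband=True, in_mother=False, in_father=True)
```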
The other variants can be classified as variants of uncertain significance (VOUS). These are variants that do not meet the pathogenicity criteria of the databases and tools consulted, such as PolyPhen2, PhyloP, SIFT, PhastConsElements and Fathmm, or that are not listed and whose positive or negative effect therefore cannot be confirmed.
Table 4: New variants detected in exome analysis with at least two bioinformatics strategies. ID: Variant identity.

Discussion
[...] mutations in over 20 different genes [10], so detecting the genetic cause by Sanger sequencing is not feasible. In practice, Sanger sequencing is limited to no more than 10 genes [16]. In contrast, NGS allows the simultaneous analysis of a greater number of genes. Most laboratories use massive sequencing panels of multiple genes associated with each patient's disease, but this strategy is beginning to be replaced by exome sequencing, which allows the simultaneous analysis of all coding genes. In addition, the rapid decline in the cost of NGS has made exome sequencing cheaper than panel sequencing.
Each exome analysis generally detects around 2,000 to 3,000 single nucleotide polymorphisms (SNPs). Only 40% of them translate into amino acid changes; an even smaller portion, in the order of hundreds, are missense mutations, and fewer still are pathogenic [24].
Many factors need to be considered in determining the biological significance and pathogenicity of the variants identified: the presence or absence of the variant in normal populations, whether the variation results in an amino acid change and of what type, the conservation of the amino acid residue among species and isoforms and, where possible, the co-inheritance of the DNA variant in other family members [25].
Compilation of sufficient experimental knowledge to verify the effects of variants detected in genetic analysis is always laborious and time-consuming, so various tools have been developed to provide plausible hypotheses regarding the effects of such variants. However, these tools have some limitations: a) differences in methods, reference databases, training datasets and alignment algorithms may lead to conflicting results; b) data are generated in varying and often incompatible formats that cannot be automatically parsed because they are not textual; c) several tools are no longer maintained or are rarely updated [18].
When the results of exome sequencing are compared with those of cardiac gene panel sequencing, few intronic variants are detected. This is because exome sequencing is focused on coding sequences, not on intronic sequences. These variants are lost not only because sequencing of exon-flanking regions is less effective, but also because the scores used to predict pathogenicity are based on conservation and on chemical change in the protein, so only exonic variants are annotated with them. Despite this, exome sequencing can be considered validated, because the pathogenic variants are detected.
In addition to the pathogenic variant in the TTN gene, detected in panel sequencing and inherited from the father, another pathogenic variant was identified. It is located in KCND3 and inherited from the mother. This gene encodes a voltage-gated potassium channel, subfamily D member 3 (Kv4.3), related to a variant of Brugada syndrome.
The severity of hypertrophic cardiomyopathy is influenced by gene dosage and modifier genes [26], which may explain the severe symptoms suffered by the child and not by the father.
The rest of the variants detected are VOUS: there is not enough evidence to establish that they cause disease, owing to the lack of current data [27]. A genetic cause can be identified in only 35-45% of HCM patients, a figure that increases to 60-65% when the family history is positive for HCM.
It is expected that, with the increasing use of NGS, more reliable results will be obtained and VOUS will be resolved, giving a better diagnosis.
In our study, only variants in genes encoding proteins associated with SCD were analyzed. However, exome sequencing can reveal many incidental findings, and it must be established how these should be handled.

Conclusion
Sudden cardiac death is often one of the first manifestations of hereditary cardiomyopathies. It is expected that, with the development of bioinformatics tools for the analysis of data generated by exome sequencing and even whole genome sequencing, all genes associated with these pathologies can be identified. This will allow us to give patients an early diagnosis and provide better management of the disease, sparing them this manifestation.