Received Date: March 18, 2014; Accepted Date: April 01, 2014; Published Date: April 11, 2014
Citation: Alharbi KK, Khan IA, Tejaswini YRSN, Devi YA (2014) The Role of Genome Sequencing in the Identification of Novel Therapeutic Targets. J Glycomics Lipidomics 4:112. doi: 10.4172/2153-0637.1000112
Copyright: © 2014 Alharbi KK, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Remarkable technological innovations have emerged in recent years, allowing rapid and cost-effective direct sequencing of whole genomes. Massive amounts of genomic data have been generated on the assumption that a better understanding of genomics would aid in identifying new causes of genetic disorders and in discovering new therapeutic targets. On this assumption, many genomes from different organisms, including humans, have been sequenced, resulting in an immense amount of genetic data. However, to make the best use of these data, a similar expansion in our ability to process and analyze data on a large scale will be necessary. The present review focuses on the impact of genome sequencing projects on the identification of novel genes and proteins, with a special focus on the role of sequencing pathogen genomes in potential drug development.
Keywords: DNA; Human genome; Genome sequencing; Genome techniques
The increased need to identify novel targets that are both relevant to disease and chemically tractable has posed a challenge to the pharmaceutical industry. It is extremely difficult to identify a target that is specific to a given disease and free of off-target effects. The genomic era has brought with it a basic change in experimentation, enabling researchers to look more comprehensively at a biological system and rekindling hope for the identification of novel targets for disease treatment. It is estimated that, after proper analysis of genomic data, the number of potential therapeutic molecular targets will increase from the approximately 1,000 currently used by the pharmaceutical industry to as many as 10,000. However, the analysis of the vast amounts of genomic data available presents the drug discovery community with new challenges. For example, G-protein coupled receptors (GPCRs) are among the targets most commonly used to inhibit pathological processes. These targets have high therapeutic relevance and have been studied exhaustively in terms of basic science. Drug discovery for these common targets therefore relies on a meticulous understanding of their function. In contrast, the novel potential targets recently identified by genome projects and environmental sequencing are not usually as well researched. Before the genomic era, about 100 literature references relating to each target had been published. Now, each newly identified target has approximately ten references ascribed to it. It has been recognized [3,4] that the increased number of targets, together with the correspondingly decreased amount of information about each, is ultimately generating a bottleneck in the target validation process. The cure therefore lies not in identifying many targets for a particular disease but in a proper mechanistic understanding of how genetic change causes aberrant function.
Figure 1 summarizes the flow of target identification for drug discovery, from sequence identification to exploiting the proteome.
The sequencing of the genomes of humans [5,6] and related organisms represents one of the most significant and historic scientific accomplishments. The method for determining the sequence of DNA, developed by Sanger, has led to the sequencing of the entire genomes of organisms. The first finished genome to be published was that of Haemophilus influenzae. Within three years of this groundbreaking publication, the genomes of more than 10 other species were published, including model bacteria such as Escherichia coli and Bacillus subtilis and pathogens such as Helicobacter pylori and Mycobacterium tuberculosis. The Institute for Genomic Research (TIGR) was a substantial leader in the genomic field. The full sequencing of the yeast genome was a tremendous effort, involving about 600 scientists from over 100 laboratories and representing the largest decentralized experiment in modern molecular biology.
In parallel with the Human Genome Project, a complementary effort has been made to analyze the mouse genome [14,15]. The mouse genome is an important resource, as the mouse serves as a powerful model organism for improving our understanding of human disease mechanisms and of biological processes in general.
Genomics has led to a more global approach to biological problems, replacing the “one gene at a time” approach, which is limited because it does not take into account the full diversity and complexity of gene expression. The field has been permanently altered by the development of high-throughput DNA sequence analysis. The development of highly automated methods of DNA sequencing in the 1990s considerably reduced the time needed to sequence a genome. DNA microarray technologies, in turn, have enabled the simultaneous measurement of hundreds of thousands of DNA molecules, as well as RNA. They have made it possible to measure the extent to which each gene in the genome is switched on or off in a microscopic tissue sample, allowing the construction of an “expression profile.” Additionally, DNA microarrays can be used to determine an individual’s DNA sequence at thousands or millions of specified locations in the genome, thereby creating a “genome profile.” They have also provided a powerful way to investigate the role of single or multiple genes, along with DNA sequence variants, in disease processes, both in individuals and in populations.
Bacterial genomes in the race
In its quest to develop new antibiotics, the pharmaceutical industry has adopted these new sequencing technologies in place of the traditional approach, in which new active molecules were found by random screening, with simple antibiotic activity as the primary selection criterion, followed by chemical optimization. New bioinformatics tools have been developed for comparative genomic analysis to better understand the evolutionary and phylogenetic relationships of organisms. One comparison, between a species with a small genome, Mycoplasma genitalium (469 putative genes), and a more typical pathogen, H. influenzae (1703 genes), revealed the existence of 233 conserved genes. This study suggested that a “minimal genome” of at most 250 genes was important enough to be conserved across species. The same approach has been followed for the analysis of a variety of different genomes (e.g., those of pathogens such as H. influenzae, Streptococcus pneumoniae, Mycoplasma pneumoniae, and other streptococci). Comparative analysis of the genomes of Chlamydia trachomatis and Chlamydophila pneumoniae has identified specific genes that might be responsible for the different pathologies caused by these two organisms.
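The idea of a conserved gene set can be illustrated with a minimal sketch. It assumes that each genome has already been reduced to a set of ortholog-family identifiers, a step the comparative studies above perform with sequence-similarity searches that are not shown here. The function name `conserved_families` and the toy family labels are hypothetical and do not come from the cited work.

```python
from typing import Dict, Set


def conserved_families(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Return the ortholog-family identifiers present in every genome."""
    if not genomes:
        return set()
    family_sets = iter(genomes.values())
    # Start from a copy of the first genome's families, then intersect.
    shared = set(next(family_sets))
    for families in family_sets:
        shared &= families
    return shared


# Toy, made-up family labels (assumption): not real annotation data.
toy_genomes = {
    "small_genome": {"famA", "famB", "famC"},
    "typical_pathogen": {"famA", "famB", "famD", "famE"},
}

print(sorted(conserved_families(toy_genomes)))
```

In practice the difficult part is assigning genes to ortholog families in the first place. Once that is done, the conserved core is a plain set intersection, and adding further genomes can only shrink it.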
To understand the genomic differences between pathogenic and non-pathogenic variants of closely related organisms (e.g., Mycobacterium tuberculosis and Mycobacterium bovis), DNA-array technology can be used with comparative hybridization analysis to find regions that differ between the variant genomes. These regions potentially contain genes of relevance for the development of new antibiotics and/or vaccines. Other approaches identify essential genes through genetic manipulation that affects organism survival. Often, transposons are used to inactivate genes by random insertion [2-25]. Genetic footprinting, using diverse hybridization and PCR techniques, is then used to map the insertion sites in the genome. Other genome-wide gene inactivation studies have used homologous recombination methods [26-30]. In these studies, resistance markers are normally introduced into the genome to aid in screening. However, markerless gene deletion represents the most accurate method of gene inactivation. Such techniques are more difficult and have mainly been reported in E. coli [31,32]. Furthermore, the observation that a gene cannot be inactivated is not final proof of its essentiality for the organism. Table 1 lists genome-wide gene inactivation studies from which sets of potentially essential genes have been identified.
|Organism|Number of genes|Method|Reference|
|---|---|---|---|
|Escherichia coli|4279|Transposon mutagenesis|Gerdes et al.|
|Bacillus subtilis|4101|Plasmid insertion mutagenesis. Conditional mutants. Estimations derived from literature study.|Kobayashi et al.|
|Haemophilus influenzae|1709|Transposon mutagenesis|Akerley et al.|
|Helicobacter pylori|1552|Transposon mutagenesis|Salama et al.|
|Mycoplasma genitalium|484|Transposon mutagenesis|Hutchison et al.|
|Staphylococcus aureus|2595|Antisense RNA expression|Jiang et al.|
|Streptococcus pneumoniae|2043|Plasmid insertion mutagenesis|Thanassi et al.|
Table 1: Number of potential essential genes identified in genome-wide gene inactivation studies.
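The logic behind transposon-based screens can be sketched as follows. This is a simplified illustration, not the procedure of any study in Table 1. It assumes that insertion sites have already been mapped to genome coordinates and that gene boundaries are known. The function name, gene names, and coordinates are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def genes_without_insertions(
    genes: Dict[str, Tuple[int, int]],
    insertion_sites: List[int],
) -> List[str]:
    """Return genes (inclusive start, end) that contain no mapped insertion."""
    sites = sorted(insertion_sites)
    candidates = []
    for name, (start, end) in genes.items():
        # Index of the first insertion at or after the gene start.
        i = bisect_left(sites, start)
        hit = i < len(sites) and sites[i] <= end
        if not hit:
            # No viable mutant carries an insertion in this gene, so it is
            # only a *candidate* essential gene, not a proven one.
            candidates.append(name)
    return candidates


# Toy coordinates and insertion positions (assumption): illustrative only.
toy_genes = {"geneA": (100, 400), "geneB": (450, 900), "geneC": (950, 1200)}
toy_sites = [120, 310, 1000]

print(genes_without_insertions(toy_genes, toy_sites))
```

A gene that lacks insertions may simply have been missed by chance, particularly if it is short or the library is small. The output is therefore a list of candidates, in keeping with the caveat that failure to inactivate a gene is not final proof of essentiality.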
The increasing resistance of bacterial pathogens to present-day antibiotics demands more innovative and efficient approaches to the development of new drugs. To date, bacterial genomics has substantially increased the rate at which novel targets are identified and validated. Furthermore, it is very likely that “next-generation” genomic technologies will further accelerate target identification and generic assay development. The application of quantitative structure-activity relationship (QSAR) techniques can then help to shorten the later stages of antimicrobial development, such as lead optimization, toxicology, and clinical trials. However, more effort is needed to improve existing methodology and so decrease the lag period between lead identification and the marketing of a new drug.
Pathogenic protozoan genomes
The resurgence of infectious diseases worldwide has been a major impetus for increased research activity. The World Health Organization has identified African trypanosomiasis, Chagas disease, dengue fever, lymphatic filariasis, leishmaniasis, leprosy, malaria, onchocerciasis, schistosomiasis, and tuberculosis as ten major, yet neglected, infectious diseases. There are ongoing and intense efforts to control or even eradicate the organisms that cause these diseases. Four of these diseases are caused by protozoan parasites (African trypanosomiasis, Chagas disease, leishmaniasis, and malaria) and together account for more than 1.3 million deaths annually. Hence, more focus has been placed on the genomics of these parasites, which is believed to be a major resource for the effective translation of basic research into applications pertinent to disease control.
Leishmania is a genus of protozoan pathogens that cause a range of diseases in humans, resulting in extensive suffering and death. In recent years, only one compound, miltefosine, has been added to the list of promising anti-leishmaniasis drugs. The history of miltefosine, an alkylphosphocholine that likely interferes with lipid metabolism, demonstrates another paradigm in the development of antiparasitic drugs. Miltefosine was first developed as an antitumor agent but turned out to be clinically ineffective. Only much later was its effectiveness against leishmaniasis discovered [38,39]. With recent advances in technology, the Leishmania genome has been sequenced. The organism is diploid, with 36 chromosome pairs. Sequence analysis revealed that Leishmania chromosome 1 contains a 257 kb information-rich region with 79 protein-coding genes and has an unusual gene organization. This organization is suggestive of novel transcription processes. These new findings could potentially allow more drugs to be developed to treat Leishmania infections.
Targeting human genome
“A more important set of instruction books will never be found by human beings. When finally interpreted, the genetic messages encoded within our DNA molecules will provide the ultimate answers to the chemical underpinnings of human existence. They will not only help us understand how we function as healthy human beings but will also explain, at the chemical level, the role of genetic factors in a multitude of diseases such as cancer, Alzheimer’s disease and schizophrenia— that diminish the individual lives of so many millions of people.” James Watson.
The Human Genome Project has been described as the “Holy Grail” or the “Rosetta Stone” because of its work on deciphering the secrets of human life contained within the genome’s 3 billion bases. These bases encode about 35,000 genes, far fewer than the expected 100,000. This number is only a little more than twice the number of genes found in a fruit fly, a mosquito, or a worm. Still, the massive amount of genomic data associated with the larger human genome indicates a considerable increase in complexity. The human genome, the first vertebrate genome to be sequenced, is around 30 times larger than those of the fly, worm, and mosquito, and 250 times larger than the first sequenced eukaryotic genome, that of yeast.
Analysis of the completed human genome suggests that there are tens of thousands of genes [42,43] and at least as many proteins. Many of these proteins are potential targets for drug intervention to control human disease or injury; popular estimates are in the range of 2,000 to 5,000 potential target proteins. However, drugs discovered in the past 100 years have targeted only approximately 500 of these proteins. Compiling a more complete list of all potential drug targets from genomic analyses is a start, but is unlikely by itself to revolutionize downstream research or drug development throughput. The principal value of the human genome sequence will come with the ability to produce drugs for these targets. Designing drugs with the desired physical properties and specificities should become an active process occurring at the very earliest stages of target selection, rather than a process driven primarily by trial and error.
Better and earlier development of drug targets is especially crucial for addressing the complexity of treating cancer. According to the literature, approximately 100,000 somatic mutations have been reported in cancer genomes, the first of which was found in the gene HRAS. At present, no single technology can detect all the types of abnormalities associated with cancer (i.e., deletions, point mutations, frameshift mutations, copy number variation, network dynamics, and epigenetic changes). New molecular technologies such as exome sequencing, next-generation sequencing, microarrays, and gene chip analysis are beginning to uncover some key genomic regulators. Over the next few years, it is estimated that several hundred million more factors will be identified by large-scale, complete sequencing of cancer genomes. Many clinical trials now include genomic profiles of cancer patients as prognostic and diagnostic indicators. Genomic profiles are even used to monitor where and how the cancer genome has been affected by molecularly targeted therapies. These studies will enable earlier and better therapeutic interventions for cancer patients. The resulting data will provide a detailed picture of the evolutionary processes that give rise to our most common disease, offering new insight into the origins and treatment of cancer. However, a comprehensive analysis of the cancer genome remains a daunting challenge. Mining and sharing of data should eventually help oncologists to better integrate the genotypic and phenotypic changes that occur in the many phases of cancer.
Many computational methods have been designed to handle extremely long genomic sequences (e.g., LAGAN, GenomeScan) and can be used to analyze large volumes of sequence data in a high-throughput manner. These methods were created to compensate for the inability of database-searching algorithms, such as BLAST, FASTA, or Smith-Waterman [46-48], to handle large sequence queries. Open reading frame prediction algorithms, such as GRAIL or GENSCAN [49-55], are ~70% accurate in predicting exons from eukaryotic genomic sequences. These algorithms, however, have more difficulty in determining which exons together constitute a single open reading frame than in identifying a single short exon [57-61]. Updated lists of genome sequencing projects and sequence data are available at the World Wide Web sites of the Multipurpose Automated Genome Project Investigation Environment, the National Center for Biotechnology Information, and the Institute for Genomic Research.
Just as chemists once systematically organized the elements in a table that represented their differences and similarities, the Human Genome Project will allow modern scientists to construct a biological periodic table that relates units of nucleotides according to their evolutionary and functional relationships. If we are to use the fruits of genomic research effectively, we must re-engineer drug discovery and development.
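To illustrate why exhaustive alignment becomes costly for the long sequence queries discussed above, the following minimal sketch computes a Smith-Waterman local alignment score. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration, not parameters taken from the cited tools.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Every cell of a len(a) by len(b) matrix is visited, so the running time grows with the product of the two sequence lengths. That is manageable for gene-sized queries but quickly becomes impractical at chromosome scale, which is one reason heuristic and genome-scale methods were developed.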
“The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for its funding of this research through the Research Group Project no RGP-VPP-244”.