ISSN: 0974-276X
Journal of Proteomics & Bioinformatics

GExMap: An Intuitive Visual Tool to Detect and Analyze Genomic Distribution in Microarray-generated Lists of Differentially Expressed Genes

Nicolas Cagnard1*, Carlo Lucchesi2, Gilles Chiocchia3
1Plateforme Bioinformatique Paris Descartes, Université Paris Descartes, IRNEM IFR-94, Paris 75730
2Institut Curie, INSERM U830, Paris 75248
3Institut Cochin, Université Paris Descartes, CNRS (UMR 8104), Paris, F-75014 France; Inserm, U567, Département d’Immunologie, Paris, F-75014 France; AP-HP, Hôpital Ambroise Paré, Service de Rhumatologie, Boulogne-Billancourt, F 92000 France
Corresponding Author: Dr. Nicolas Cagnard, Plateforme Bioinformatique Paris Descartes, Université Paris Descartes, IRNEM IFR-94, Paris 75730
Received December 01, 2008; Accepted January 13, 2009; Published January 15, 2009
Citation: Nicolas C (2009) GExMap: An Intuitive Visual Tool to Detect and Analyze Genomic Distribution in Microarray-generated Lists of Differentially Expressed Genes. J Proteomics Bioinform 2:051-059. doi:10.4172/jpb.1000060
Copyright: © 2009 Nicolas C. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

High-throughput technologies such as DNA microarrays generate huge amounts of data that are difficult to interpret. Biologists require easy-to-use tools to analyze these data and explore new scientific hypotheses. We propose a visual data mining tool that extracts a new type of useful genomic information buried in gene lists generated by differential expression studies. We compare the genomic distribution observed within the gene list with the expected distribution, which is estimated from public genomic databases. A search algorithm and a statistical test are combined to give reliable, optimized results. GExMap helps identify genomic regions that are enriched in genes whose differential expression is of potential interest for target diseases. The software is freely available [20] and easily customized. Since the data sources are frequently updated, it provides tools to update them at any time. GExMap can be used with any commercially or publicly available microarray platform. Furthermore, GExMap helps in interpretation by showing an ordered list for each gene ontology. At present, GExMap can analyse gene lists from the Homo sapiens, Mus musculus and Rattus norvegicus genomes.

Background
Microarray technology allows parallel and simultaneous monitoring of the whole genome. It is used to detect genes that are differentially expressed under different conditions. General studies of microarray gene expression generate lists of differentially expressed genes and molecular signatures of diseases. In some cases, the list of genes can be used as a powerful diagnostic and/or prognostic tool for the disease in question [-7].
Since its beginnings with cDNA microarrays [8-9], which targeted only parts of the genome, microarray technology has improved in accuracy [10-12, 21] and now targets the whole genome.
Microarray data analysis produces large lists of genes that are difficult to interpret. Although still imperfect, several programs are able to extract gene functions and bibliographic associations from ontology databases [18], while others are able to document [13-14] or propose biological pathways [15-16]. Few programs, however, can analyze the genomic distributions buried in the data. GExMap has been designed to help fill this gap by providing solutions to specific requests and by mapping and analyzing several types of microarray data [17].
Implementation
GExMap is an R software package implementing visual and statistical tools to study and map the genomic distribution of differentially expressed genes, in order to facilitate gene list interpretation. With visual representation, the genomic distribution observed in a gene list can be compared with the expected genomic distribution. The results obtained are also statistically tested to reveal genomic regions of potential interest. Moreover, GExMap comes with user-friendly documentation, extensively annotated source code and a “visual technical manual” available online [20]. GExMap can also produce scored lists and graphical illustrations of Gene Ontology [18] identifiers to facilitate gene list interpretation and highlight specific molecular functions, biological processes or cellular components.
GExMap produces one individual genomic map per chromosome, complete with statistical information. By default, no map is created for chromosomes that contain fewer than five of the listed genes, because genomic density comparisons would not be meaningful in those cases. This threshold can easily be modified by the user, bearing in mind that the false-positive rate for regions of potential interest then increases.
To facilitate correspondence between the identifiers of several microarray types and genome databases, we have chosen the ENSEMBL genome as the reference. Once the identifier type is recognized, the appropriate correspondence file is loaded automatically. All identifiers are then replaced by ENSEMBL annotations and the corresponding genomic information. The ENSEMBL genome presently comprises around 33,000 identifiers corresponding to all known genes and pseudogenes. Microarray probes do not always correspond to a known gene or to an ENSEMBL identifier, which means that probes without a correspondence cannot be allocated to a genomic location. This problem will be solved when the genome is fully and scrupulously annotated. In the meantime, unprocessed identifiers are listed in the final report, and the main GExMap data file will be frequently updated and freely available to limit the problem. Identifiers without an ENSEMBL correspondence are not taken into account in the subsequent steps.
There is virtually no limit to the size of the tested gene list. In terms of content, we recommend that the gene list be generated by a rigorous mathematical, statistical or experimental approach designed to identify differentially expressed genes. If any arbitrary selection has been performed, the gene list may introduce a significant bias into the results of further tests.
We define a genomic map associated with a gene list as the information about the genomic location of each gene of the list, in the corresponding genome draft.
GExMap produces the genomic map of a gene list, aggregates genomic locations into genomic units, calculates the observed gene frequency per unit, and then statistically compares the observed genomic distribution with the one expected from the full genome. The graphical and computational unit can be chosen by the user; the default scale is 1 million bp. The hazard curve represents the expected genomic distribution, that is, the distribution obtained when the genome-wide number of genes is scaled to the size of the tested gene list. Two hazard computations are available. The first takes into account only the ENSEMBL genes located in units that contain at least one gene of the tested list. The second takes into account the ENSEMBL genes of the whole genome (Fig. 1). The hazard frequency is used to test the hypothesis that the expected and observed genomic distributions differ, the latter being expected to vary with the disease studied. The total number of genes is also broken down into two regulation curves showing the number of up- and down-regulated genes per unit.
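The aggregation into genomic units can be illustrated with a minimal Python sketch. It is not taken from the GExMap R source: the function name `count_per_unit`, the tuple layout and the "up"/"down" encoding are assumptions made for illustration. Only the default unit size of 1 million bp comes from the text.

```python
from collections import Counter

UNIT_SIZE = 1_000_000  # default scale given in the text: 1 million bp per unit


def count_per_unit(genes, unit_size=UNIT_SIZE):
    """Count genes per (chromosome, unit index), split by regulation direction.

    `genes` is an iterable of (chromosome, position_bp, direction) tuples,
    where direction is "up" or "down" (a hypothetical encoding).
    """
    total, up, down = Counter(), Counter(), Counter()
    for chrom, pos, direction in genes:
        key = (chrom, pos // unit_size)  # unit 0 covers bp 0 to unit_size - 1
        total[key] += 1
        if direction == "up":
            up[key] += 1
        else:
            down[key] += 1
    return total, up, down


# Invented example data: three genes on chromosome 1, one on chromosome 2.
example = [
    ("1", 1_250_000, "up"),
    ("1", 1_900_000, "down"),
    ("1", 5_100_000, "up"),
    ("2", 300_000, "up"),
]
total, up, down = count_per_unit(example)
print(total[("1", 1)], up[("1", 1)], down[("1", 1)])  # 2 1 1
```

The three counters correspond to the total curve and the two regulation curves; changing `unit_size` changes the resolution of all three at once.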
When the tested gene list curve is higher than the hazard curve, the cluster of genes encompassed in the genomic unit may be of interest. GExMap has been set to detect these clusters automatically. In some particular cases, however, it may be interesting to detect regions where genes are absent. For this reason, this parameter can be customized through the use of a variable. A statistical test is performed to validate each detected cluster, and the result is depicted graphically. Significant regions are highlighted on the map by a specific green line. The complete analysis process can be summarized in six simple steps (Fig. 2).
Data Sources
The program loads the preformatted source file “Data.Rdata”, which contains all ENSEMBL identifiers and their genomic locations. The algorithm is designed to support several processing modes. The user can define the scale of the resulting graphics, the path of the source file, and the results folder. At the moment, there is only one default mode, which maps lists of differentially expressed genes. Other modes, such as CGH and SNP mapping, are under development. The mode is selected automatically by the program through the analysis of the source file’s column names.
Data Identification
The appropriate file of correspondences between identifiers and the ENSEMBL genome is loaded. Replicates are filtered out and their number is recorded in a report text file. A table is then created with a unique ENSEMBL identifier for each selected gene, together with its genomic location. When no correspondence can be found, the identifier is added to a “none mapped” list in the results folder. After replicates and unassigned identifiers have been filtered out of the gene list, a data matrix is created. Genomic locations are calculated by taking the center of each gene, (start + end) / 2. Genes are then sorted by chromosome and by chromosomal location.
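The identification step described above can be sketched as follows. This is a Python illustration under stated assumptions, not GExMap's own code: the function `build_gene_table`, the dictionary layouts and the identifiers (`p1`, `ENSG_A`, and so on) are all hypothetical.

```python
def build_gene_table(probe_ids, probe_to_ensembl, ensembl_locations):
    """Map probe identifiers to unique ENSEMBL genes with a center position.

    probe_to_ensembl:  dict probe id -> ENSEMBL id (hypothetical correspondence table)
    ensembl_locations: dict ENSEMBL id -> (chromosome, start, end)
    Returns (table, unmapped, n_replicates).
    """
    table = {}
    unmapped = []       # plays the role of the "none mapped" list
    n_replicates = 0    # identifiers that resolve to an already seen gene
    for probe in probe_ids:
        ens = probe_to_ensembl.get(probe)
        if ens is None or ens not in ensembl_locations:
            unmapped.append(probe)
            continue
        if ens in table:
            n_replicates += 1
            continue
        chrom, start, end = ensembl_locations[ens]
        table[ens] = (chrom, (start + end) / 2)  # center of the gene
    # Sort by chromosome name (as text), then by position on the chromosome.
    ordered = sorted(table.items(), key=lambda kv: (kv[1][0], kv[1][1]))
    return ordered, unmapped, n_replicates


# Invented identifiers and coordinates, for illustration only.
probe_map = {"p1": "ENSG_A", "p2": "ENSG_A", "p3": "ENSG_B"}
locations = {"ENSG_A": ("1", 1000, 3000), "ENSG_B": ("1", 500, 700)}
gene_table, none_mapped, replicates = build_gene_table(
    ["p1", "p2", "p3", "p4"], probe_map, locations
)
print(gene_table)   # [('ENSG_B', ('1', 600.0)), ('ENSG_A', ('1', 2000.0))]
print(none_mapped)  # ['p4']
print(replicates)   # 1
```

Note the parentheses in `(start + end) / 2`: without them only `end` would be halved.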
Hazard Estimation
The matrix used to calculate the genomic distribution is computed chromosome by chromosome. The observed number of genes from the tested list and the expected number of genes from the ENSEMBL genome are summed per scale unit and stored in a matrix. At this stage the hazard computation can start. The hazard computation scales the ENSEMBL genomic distribution to the size of the gene list generated by microarray analysis. For each unit, the hazard is computed as follows:
Specific hazard of a unit = Number of ENSEMBL genes in the current unit × (Total number of tested genes / N ENSEMBL genes).
N ENSEMBL genes is computed by summing all genes of the ENSEMBL genome in the first step and, in the second step, by taking into account only the ENSEMBL genes located in units that contain at least one gene from the tested list (Fig. 3).
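As a worked example of the hazard formula, the following Python sketch computes both variants on invented counts. The function name `hazard_per_unit`, the `local` flag and the numbers are assumptions for illustration; only the formula itself comes from the text.

```python
def hazard_per_unit(ensembl_counts, observed_counts, local=False):
    """Scale per-unit ENSEMBL gene counts to the size of the tested gene list.

    ensembl_counts:  dict unit -> number of ENSEMBL genes in the unit
    observed_counts: dict unit -> number of tested genes in the unit
    local=False: N is the sum of all ENSEMBL genes.
    local=True:  N only counts ENSEMBL genes in units that contain
                 at least one gene from the tested list.
    """
    n_tested = sum(observed_counts.values())
    if local:
        n_ensembl = sum(
            count for unit, count in ensembl_counts.items()
            if observed_counts.get(unit, 0) > 0
        )
    else:
        n_ensembl = sum(ensembl_counts.values())
    if n_ensembl == 0:
        raise ValueError("no ENSEMBL genes to scale against")
    scale = n_tested / n_ensembl
    return {unit: count * scale for unit, count in ensembl_counts.items()}


# Invented counts for three units of one chromosome.
ensembl = {0: 40, 1: 10, 2: 50}
observed = {0: 6, 1: 4}          # 10 tested genes, none in unit 2

global_hazard = hazard_per_unit(ensembl, observed)
local_hazard = hazard_per_unit(ensembl, observed, local=True)
print(global_hazard)  # unit 0: 40 * (10 / 100), about 4 expected genes
print(local_hazard)   # unit 0: 40 * (10 / 50), about 8 expected genes
```

With the local variant, the hazards of the units that contain tested genes sum to the size of the tested list (8 + 2 = 10 here), whereas the global variant spreads the same total over every unit of the genome.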
Statistical Tests and Progressive Optimization
First, two global statistical tests are performed to select chromosomes of potential interest (Fig. 3A). The chi-square goodness-of-fit test and the Wilcoxon paired-sample test are the two global tests used to compare expected with observed gene distributions, chromosome by chromosome.
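For the chi-square goodness-of-fit test, the statistic is the sum over units of (observed − expected)² / expected. A minimal Python sketch is given below; the function name and the choice to skip units with a zero expected count are assumptions, not details confirmed by the source, and the p-value would normally be obtained from a statistics library rather than computed by hand.

```python
def chi_square_statistic(observed, expected):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit comparison.

    Units whose expected count is zero are skipped to avoid division by zero
    (an assumption of this sketch).
    """
    statistic = 0.0
    used = 0
    for obs, exp in zip(observed, expected):
        if exp == 0:
            continue
        statistic += (obs - exp) ** 2 / exp
        used += 1
    return statistic, used - 1


# Invented per-unit counts for one chromosome: observed genes vs. hazard.
stat, dof = chi_square_statistic([6, 4, 0], [4.0, 1.0, 5.0])
print(stat, dof)  # 15.0 2
```

With 2 degrees of freedom the 5% critical value of the chi-square distribution is about 5.99, so a statistic of 15.0 would flag this invented chromosome for the local tests.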
A local test is then performed on the chromosomes validated by at least one global test (this step is customizable by the user) (Fig. 3B). The binomial test is used to detect units in which the observed gene population is statistically different from the expected one (Fig. 3C to D). These units of potential interest are then filtered according to their significance and to the difference between the expected and observed distributions (this second selection is customizable by the user), before being extended to regions of potential interest (Fig. 3E to G). A final optimization step adjusts the statistically selected regions in order to find the largest significant regions of interest (Fig. 4).
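The local step can be sketched in Python as a per-unit binomial test followed by the merging of adjacent significant units. The source does not specify how the test is parameterized, so the choices below are assumptions: a one-sided upper-tail test, a success probability equal to the unit's share of ENSEMBL genes, and a 0.05 threshold. The exact summation is only suitable for small illustrative lists; a statistics library should be used for lists of thousands of genes.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), by exact summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def significant_units(observed, ensembl, alpha=0.05):
    """Indices of units holding more tested genes than expected by chance."""
    n = sum(observed)        # size of the tested gene list
    total = sum(ensembl)     # number of ENSEMBL genes considered
    hits = []
    for unit, (k, genes) in enumerate(zip(observed, ensembl)):
        if k == 0 or genes == 0:
            continue
        if binomial_upper_tail(k, n, genes / total) < alpha:
            hits.append(unit)
    return hits


def merge_adjacent(units):
    """Merge consecutive unit indices into (first, last) regions."""
    regions = []
    for unit in sorted(units):
        if regions and unit == regions[-1][1] + 1:
            regions[-1][1] = unit
        else:
            regions.append([unit, unit])
    return [tuple(region) for region in regions]


# Invented counts for six units of one chromosome.
observed_counts = [0, 5, 4, 1, 0, 0]
ensembl_counts = [20, 10, 10, 30, 20, 10]
units = significant_units(observed_counts, ensembl_counts)
print(units)                  # [1, 2]
print(merge_adjacent(units))  # [(1, 2)]
```

Units 1 and 2 each hold far more tested genes than their 10% share of ENSEMBL genes would predict, so they are merged into a single candidate region. The filtering on effect size and the final optimization described in the text are not reproduced here.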
To preserve clarity, only significant results are graphically reported. All statistical results, even those that are nonsignificant, are reported in the final result file.
To facilitate the interpretation of the results, the total curve is separated into a curve of up-regulated genes (red) and a curve of down-regulated genes (blue) (Fig. 5). In an optional step, the annotated genes are sorted and analyzed by Gene Ontology (Fig. 6).
All the data produced and formatted by GExMap are saved in a text file. These data allow the user to identify genes located in a region of interest, or to verify data used by the statistical tests.
Conclusion
GExMap provides helpful new types of information for interpreting data generated by microarray studies. The software combines ease of use with easy interpretation of results. The source code is freely available, optimized and annotated to facilitate customization, such as the implementation of different statistical tests or customized detection of interesting regions. All results are generated separately to preserve the clarity of the graphical representations. We have tested the software with a list of 3855 genes generated from microarray analysis [19]. GExMap allowed us to focus on a limited number of interesting regions, which are now under biological investigation. Further optimizations are planned to complete the generated information. We are working on the integration of new statistical tests, such as a t test adapted to small sample sizes with a resampling pre-processing step. To improve compatibility with most database identifiers and microarray types, a tool dedicated to generating correspondence files from text data files will soon be available.
Additional functions are also under development to compare microarray gene lists or genomic association studies, and to map and analyze several types of genetic markers, genetic association studies, and genetic maps of all kinds.
Availability and Requirements
GExMap is publicly accessible from the URL http://gexmap.site.voila.fr/. Questions and comments are welcomed through the site.
References
  1. The R Project for Statistical Computing.

  2. Flicek P, Aken BL, Beal K, Ballester B, Caccamo M (2008) Ensembl 2008. Nucleic Acids Res 36: D707-D714.

  3. Ensembl.

  4. Schena M, Shalon D, Davis RW, Brown P (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467-470.

  5. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531-7.

  6. Klein U, Tu Y, Stolovitzky GA, Mattioli M, Cattoretti G (2001) Gene expression profiling of B cell chronic lymphocytic leukemia reveals a homogeneous phenotype related to memory B cells. J Exp Med 194: 1625-38.

  7. Hans CP, Weisenburger DD, Greiner TC, Gascoyne RD, Delabie J (2004) Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 103: 275-82.

  8. Devauchelle V, Marion S, Cagnard N, Mistou S, Falgarone G (2004) DNA microarray allows molecular profiling of rheumatoid arthritis and identification of pathophysiological targets. Genes Immun 5: 597-608.

  9. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675-80.

  10. Harrington CA, Rosenow C, Retief J (2000) Monitoring gene expression using DNA microarrays. Curr Opin Microbiol 3: 285-91.

  11. Larkin JE, Frank BC, Gavras H, Sultana R, Quackenbush J (2005) Independence and reproducibility across microarray platforms. Nat Methods 2: 329-30.

  12. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S (2005) Multiple-laboratory comparison of microarray platforms. Nat Methods 2: 329-30.

  13. Petersen D, Chandramouli GV, Geoghegan J, Hilburn J, Paarlberg J (2005) Three microarray platforms: an analysis of their concordance in profiling gene expression. BMC Genomics 6: 63.

  14. Genespring 7.2.

  15. Ingenuity.

  16. Pathway Assist.

  17. Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV (2005) A network-based analysis of systemic inflammation in humans. Nature 437: 1032-7.

  18. Hoheisel JD (2006) Microarray technology: beyond transcript profiling and genotype analysis. Nat Rev Genet 7: 200-10.

  19. The Gene Ontology.

  20. Cagnard N, Letourneur F, Essabbani A, Devauchelle V, Mistou S (2005) Interleukin-32, CCL2, PF4F1 and GFD10 are the only cytokine / chemokine genes differentially expressed by in vitro cultured rheumatoid and osteoarthritis fibroblast-like synoviocytes. Eur Cytokine Netw 16: 289-92.

  21. GExMap web site.

  22. MAQC consortium (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24: 1151-1161.

