ISSN: 2379-1764
Advanced Techniques in Biology & Medicine

Metagenomic Analysis of Molecular Profile of Breast Cancer Using Genie a Literature Based Gene Prioritizing Tool: A Novel Approach

Amit Kumar Yadav* and Vidya Jha
Division of Molecular Pathology, Department of Pathology, Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
Corresponding Author : Amit Kumar Yadav
Assistant Professor, Division of Molecular Pathology, Department of Pathology
Vardhman Mahavir Medical College and Safdarjung Hospital, New Delhi, India
Tel: 091-26707408
E-mail: [email protected]
Received: October 14, 2015; Accepted: October 24, 2015; Published: October 31, 2015
Citation: Yadav AK, Jha V (2015) Metagenomic Analysis of Molecular Profile of Breast Cancer Using Genie a Literature Based Gene Prioritizing Tool: A Novel Approach. Adv Tech Biol Med 3:147. doi: 10.4172/2379-1764.1000147
Copyright: © 2015 Yadav AK, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Background: The biological complexity and heterogeneity of breast cancer can be explained by its molecular profile. Microarray analysis is the method of choice for studying this profile, but it has certain limitations. Computational techniques offer a novel way to study the gene expression profile.

Materials and methods: Genie, a freely available web-based tool, was used to analyze literature, gene and homology information from the MEDLINE, NCBI Gene and HomoloGene databases. The inputs were the target species (Homo sapiens) and the biomedical topic (breast cancer). Genie prioritizes the genes of the target species according to these inputs.

Results: The ranking assigned to the 1906 reported genes was not along expected lines, so the genes were manually re-ranked by number of hits and narrowed down to 70. The proteins encoded by these genes and their functions were obtained from the NCBI database. On the basis of their function and role in carcinogenesis, the genes were then grouped into distinct categories.

Conclusion: A novel computational approach to studying the molecular profile of breast cancer has been demonstrated, and a panel of 70 genes for studying its gene expression profile is suggested. These genes comprehensively cover all aspects of the molecular pathogenesis of breast cancer and are recommended for future clinical studies.

Keywords
Breast cancer; Molecular profile; Genie; Gene prioritization; Metagenomic analysis
Introduction
Breast cancer is a very complex and heterogeneous disease. The vast majority of cases are morphologically infiltrating ductal carcinoma NOS, yet these morphologically similar cases differ in biological behaviour. This leads to differences in response to therapy and in outcome. Clinical parameters such as age, tumour stage and grade, and routinely used biomarkers such as estrogen receptor (ER), progesterone receptor (PR) and HER2-neu, cannot fully explain this heterogeneity [1,2], which arises from differences in the molecular profile of each case.
Traditional classification of breast cancer is based on morphology. The study of gene expression profile has led to a new system of classification. In 2000, Perou and co-workers [3] in a seminal paper classified breast cancer into intrinsic subtypes based on gene expression profile. Subsequent work of numerous authors has fundamentally changed the way breast cancer is understood and classified [4-6].
The microarray technique, which allows simultaneous analysis of the expression of thousands of genes, has been the method of choice for studying the gene expression profile of breast cancer [7]. Its disadvantages are the limited sample and variability [8,9]. In addition, a comprehensive summary of the genes is not possible because of the large number of genes and abstracts involved.
A novel approach to studying the molecular profile is to use computational techniques that infer gene function by analyzing the vast literature now available. Fontaine and co-workers [10] developed the Genie algorithm and web server. The input for the software is a biological topic. Genie evaluates the entire MEDLINE database for relevance to that topic, and then evaluates all the genes of the user's requested organism according to the relevance of their associated MEDLINE records. The advantage of this approach is that a large amount of published data on the genes involved in a particular disease can be analyzed. Analysis at this multi-genomic scale is not possible without a computational approach. To the best of our knowledge, this approach has not previously been used to study the molecular profile of breast cancer.
Materials and Methods
The system requires two basic inputs: a target species (e.g., Homo sapiens) and a biomedical topic ideally related to a gene function (breast cancer in the present study). According to the input provided the genes of the target species are prioritized. The target species is defined by its scientific name or its taxonomic ID (e.g., Homo sapiens - 9606).
The biomedical topic is ultimately defined by a set of biomedical references represented by MEDLINE records. After the inputs were given, the software initialised in 9 seconds and completed the analysis in 73 seconds. It went through 796 abstracts on PubMed. All relevant protein-coding genes for the given inputs were analyzed. To ensure that only significant genes were reported, the cut-offs were set at p<0.01 for abstracts and false discovery rate<0.01 for genes. The algorithm carried out a one-sided Fisher's exact test to determine the significance of each gene-to-topic relationship, comparing the number of selected abstracts with the number observed in a simulation using a set of ten thousand randomly selected abstracts.
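To illustrate the significance test described above, the following is a minimal sketch of a one-sided Fisher's exact test on a 2x2 table of abstract counts. It is not Genie's implementation: the function name and the example counts are hypothetical, and only the background size of ten thousand abstracts is taken from the text.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test on the table [[a, b], [c, d]].

    a: gene abstracts selected as topic-relevant
    b: gene abstracts not selected
    c: background abstracts selected
    d: background abstracts not selected
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    # Sum the hypergeometric probabilities of all tables that are at least
    # as enriched as the observed one (k >= a selected gene abstracts).
    tail = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / denom


# Hypothetical counts: 12 of a gene's 20 abstracts are selected, versus
# 50 selected in a random background of 10,000 abstracts.
p_value = fisher_exact_greater(12, 8, 50, 9950)
print(f"one-sided p = {p_value:.3g}")
```

A gene whose abstracts are selected far more often than random background abstracts receives a small p-value and therefore a high priority.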
Literature extension by orthology was not needed and was therefore not performed. This option is intended for genes from poorly studied organisms: the genes are then ranked using, in addition to the abstracts directly associated with them, the abstracts associated with their orthologs in other species.
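Although this option was not used in the present study, the idea of literature extension by orthology can be sketched as pooling each gene's abstracts with those of its orthologs. This is an illustrative assumption about the data layout, not Genie's code; the function name, gene identifiers and abstract IDs below are all hypothetical placeholders.

```python
def extend_by_orthology(gene_abstracts, orthologs):
    """Return a gene -> abstract-ID set that also includes ortholog abstracts.

    gene_abstracts: dict mapping a gene ID to a set of abstract IDs
    orthologs: dict mapping a gene ID to an iterable of ortholog gene IDs
    """
    extended = {}
    for gene, abstracts in gene_abstracts.items():
        pooled = set(abstracts)
        for ortholog in orthologs.get(gene, ()):
            # Add abstracts linked to the ortholog in another species.
            pooled |= gene_abstracts.get(ortholog, set())
        extended[gene] = pooled
    return extended


# Hypothetical placeholder data: a sparsely studied gene and its ortholog.
gene_abstracts = {
    "speciesA:gene1": {101},
    "speciesB:gene1": {201, 202, 203},
}
orthologs = {"speciesA:gene1": ["speciesB:gene1"]}

extended = extend_by_orthology(gene_abstracts, orthologs)
print(sorted(extended["speciesA:gene1"]))
```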
The genes are then presented in a list sorted by false discovery rate (FDR), with hyperlinks to the most significant abstracts and to the Entrez Gene and HomoloGene databases. A list of the words found to be relevant to the topic is also provided to facilitate interpretation of the results.
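The sorting and filtering by FDR described above can be illustrated with a short sketch. The text does not state how Genie derives the FDR, so the Benjamini-Hochberg adjustment used here is an assumption chosen only because it is a common procedure; the gene labels and p-values are hypothetical, and only the 0.01 threshold comes from the text.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values (placeholders, not results from the study).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
pvalues = [0.0004, 0.2, 0.003, 0.0001]

fdr = benjamini_hochberg(pvalues)
FDR_CUTOFF = 0.01  # threshold stated in the text
reported = sorted(
    ((g, q) for g, q in zip(genes, fdr) if q < FDR_CUTOFF),
    key=lambda pair: pair[1],
)
for gene, q in reported:
    print(f"{gene}\tFDR={q:.4g}")
```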
Results
A total of 1906 genes were reported by the software to be associated with breast cancer. These genes were prioritized and ranked according to the abstracts directly associated with them. The ranking given by the software was not along expected lines, as it did not correlate with the data available from previous work [11-14]. For example, ZNF703 (zinc finger protein 703) was ranked first, while ERBB2 (erb-b2 receptor tyrosine kinase 2) and ESR1 (estrogen receptor 1) were ranked third and seventh respectively.
To remove this discrepancy, an alternative method of ranking was devised. The genes were ranked manually according to the number of hits; that is, the gene with the maximum number of hits was ranked first and the one with the minimum number of hits was ranked last. To identify the most important genes, two cut-offs for the number of hits were used, 50 and 100 (Table 1). These cut-off values were arrived at by trial.
A cut-off of 50 selected a total of 70 genes, whereas a cut-off of 100 yielded 33 genes. After re-ranking, the first 5 ranks were occupied by ESR1 (estrogen receptor 1), ERBB2 (erb-b2 receptor tyrosine kinase 2), EGFR (epidermal growth factor receptor), TP53 (tumor protein p53) and BRCA1 (breast cancer 1, early onset). Previously these ranks were held by ZNF703 (zinc finger protein 703), GREB1 (growth regulation by estrogen in breast cancer 1), ERBB2 (erb-b2 receptor tyrosine kinase 2), CST6 (cystatin E/M) and WISP2 (WNT1 inducible signalling pathway protein 2).
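The manual re-ranking described above amounts to sorting genes by hit count and keeping those that reach a cut-off. A minimal sketch follows. The hit counts are hypothetical placeholders, since the actual counts are in Table 1 and not reproduced here, and treating the cut-off as inclusive is an assumption because the text does not specify it.

```python
def rerank_by_hits(hit_counts, cutoff):
    """Return (gene, hits) pairs with hits >= cutoff, highest hits first."""
    kept = [(gene, hits) for gene, hits in hit_counts.items() if hits >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical hit counts, for illustration only (not the study's data).
example_hits = {
    "ZNF703": 40,
    "GREB1": 60,
    "ERBB2": 700,
    "CST6": 30,
    "ESR1": 900,
}

for cutoff in (50, 100):  # the two cut-offs used in the text
    panel = rerank_by_hits(example_hits, cutoff)
    print(cutoff, [gene for gene, _ in panel])
```

With real hit counts, the same call with cut-offs of 50 and 100 would produce the two candidate panels compared in the text.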
The proteins encoded by these genes and their functions were obtained from the NCBI database and are summarized in Table 2. On the basis of their function and role in carcinogenesis, the genes were then grouped into the following distinct categories (Figure 1):
a) Proliferation.
b) Evading apoptosis.
c) Invasion and metastasis.
d) Sustained angiogenesis.
e) Tumour suppressor genes.
f) Estrogen.
g) HER2-neu.
h) Miscellaneous.
Discussion
Breast cancer is a leading cause of cancer-related mortality in women. According to WHO statistics [15], nearly 1.7 million new cases were diagnosed in 2012, making it the second most common cancer overall. This represents about 12% of all new cancer cases and 25% of all cancers in women. Traditionally, clinical parameters such as age, tumour stage and grade, and routinely used biomarkers such as estrogen receptor (ER), progesterone receptor (PR) and HER2-neu, have been used to evaluate prognosis and guide therapy [16,17].
A vast amount of molecular information on breast cancer is now available from gene expression profiling studies. These have become an extremely important tool for assessing prognosis and guiding appropriate management [18]. The technique of choice for this is the microarray. The data from gene expression profiling studies have influenced the management not only of breast cancer but also of other tumours such as lung [19,20] and colon cancer [21]. However, the microarray technique suffers from the disadvantages mentioned previously [8,9]. Thus there is a need to explore alternative methods of studying the gene expression profile of breast cancer.
In the present study the authors used one such technique, the Genie algorithm and web server, to evaluate the molecular profile of breast cancer. The utility of a computational approach for finding human genes associated with a disease has been shown previously [22-24]. However, Genie goes one step further: besides highlighting well-known genes, it brings out new candidate genes [10]. This helps in better characterization of a disease.
There are alternative gene prioritizing tools that perform automatic gene name extraction and normalization [25,26]. The basic concept behind such analysis is that when two words repeatedly occur together in an abstract, they are likely to be functionally related. However, these methods can wrongly identify genes in text, which leads to ambiguous results [27,28]. Genie overcomes this limitation by using NCBI curated gene associations and unambiguous gene identifiers [10].
A total of 1906 genes were found to be associated with breast cancer. These genes were manually ranked using the number of hits as the parameter. The number of genes was then narrowed down using two cut-offs, 50 and 100 hits, which yielded 70 and 33 genes respectively. The authors feel that 33 genes are not sufficient for a comprehensive analysis of the gene expression profile, and therefore recommend that at least 70 genes be studied. When the genes were categorized by function, the largest number belonged to the proliferation-related group (24 genes), followed by tumour suppressor genes (12 genes). Overall, the genes reported by the software comprehensively cover all aspects of the biology of breast cancer.
The availability of new data from microarray studies led to the development of many multigene prognostic tests to improve the assessment of prognosis and therapeutic response in breast cancer [29]. The most widely used of these are Oncotype DX [14] and MammaPrint [13].
Oncotype DX (Genomic Health, Redwood City, CA, USA) is based on high-throughput, real-time reverse transcriptase polymerase chain reaction (RT-PCR) analysis of formalin-fixed paraffin-embedded (FFPE) tumor tissue [30,31]. Thus, it can also be used on archival blocks. The test uses 16 genes that have been shown to have the highest correlation with distant recurrence after 10 years, along with five housekeeping genes. The test algorithm calculates a recurrence score (RS) from 0 to 100. A higher RS is associated with a greater probability of recurrence at 10 years, and vice versa.
MammaPrint (Agilent, Amsterdam, Netherlands) is a microarray-based test that measures the expression of 70 genes. It is recommended as an adjunctive prognostic test for breast cancer patients who are less than 61 years of age with stage I/II disease that is lymph node-negative or has one to three positive lymph nodes [12]. MammaPrint stratifies patients into low-risk or high-risk prognostic groups [13]. In ER-positive patients the assay provides good prognostic risk discrimination. However, almost all ER-negative cases are stratified as high risk, which makes the prognostic score of limited clinical value in this group [32].
A large multicenter retrospective study suggested that adjuvant chemotherapy was beneficial only in high-risk ER-positive patients [33]. MammaPrint as originally described needed fresh-frozen tissue. This was a major drawback and a reason for its limited clinical use. However, a recently described version of the test can be performed on FFPE tissue [34].
Using a novel computational technique to carry out molecular profiling at a multi-genomic scale the authors suggest an alternative panel of genes (Table 1) to study gene expression profile of breast cancer. It is believed that this panel includes some of the important genes which were not present in the initial panels.
The panel suggested in the present study was compared with the genes included in Oncotype DX [14] and MammaPrint [13]. Of the 16 breast cancer-related genes in Oncotype DX, 6 (38%) are also present in the new panel. Of the 70 genes in MammaPrint, only a single gene, MMP9, is shared with our panel. Thus the overlap with Oncotype DX was greater than that with MammaPrint.
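The overlap figures quoted above can be checked with a short calculation. This is only a sketch of the arithmetic; the function name is hypothetical, and the counts (6 of 16 and 1 of 70) are the ones given in the text.

```python
def shared_percent(shared: int, total: int) -> float:
    """Percentage of a reference panel's genes that also appear in the new panel."""
    return 100.0 * shared / total


oncotype_dx = shared_percent(6, 16)  # 6 of the 16 cancer-related genes
mammaprint = shared_percent(1, 70)   # only MMP9 out of 70 genes

print(f"Oncotype DX: {oncotype_dx:.1f}%")  # 37.5%, reported as 38% in the text
print(f"MammaPrint: {mammaprint:.1f}%")    # about 1.4%
```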
In a recent review of the clinical utility of gene expression profiling in women with early breast cancer, Marrone and co-workers [35] reported that five systematic reviews found no direct evidence of clinical utility for either Oncotype DX or MammaPrint. Indirect evidence showed that Oncotype DX was able to predict the treatment effects of adjuvant chemotherapy, whereas no evidence of predictive value was found for MammaPrint. No studies provided direct evidence that using gene expression profiling tests to direct treatment decisions improved outcomes in women with breast cancer. The authors believe that one of the main reasons for this apparent failure of the two tests to influence treatment outcome is probably inappropriate gene selection. On going through the list of genes in both tests, it was felt that some important genes may not have been included. Thus there is a need for an alternative panel.
Conclusion
The present study has demonstrated a novel computational approach to studying the molecular profile of breast cancer using Genie, a gene prioritizing tool. A novel panel of 70 genes for studying the gene expression profile of breast cancer has been suggested, in which the majority of genes differ from those in currently used panels. We believe these genes comprehensively cover all aspects of the molecular pathogenesis of breast cancer. However, the clinical utility of this panel can only be established by well-designed clinical studies in the future.
References