alexa Medaka Proteome Study: Use of Cross-species Matching and Expressed Sequence Tag Data for Protein Identification | Open Access Journals
ISSN: 0974-276X
Journal of Proteomics & Bioinformatics
Like us on:
Make the best use of Scientific Research and information from our 700+ peer reviewed, Open Access Journals that operates with the help of 50,000+ Editorial Board Members and esteemed reviewers and 1000+ Scientific associations in Medical, Clinical, Pharmaceutical, Engineering, Technology and Management Fields.
Meet Inspiring Speakers and Experts at our 3000+ Global Conferenceseries Events with over 600+ Conferences, 1200+ Symposiums and 1200+ Workshops on
Medical, Pharma, Engineering, Science, Technology and Business

Medaka Proteome Study: Use of Cross-species Matching and Expressed Sequence Tag Data for Protein Identification

Mezhoud Karim1*, Praseuth Danièle2, Marie Arul3,4 Puiseux-Dao Simone1 and Edery Marc1

1FRE 3206 MNHN-CNRS, Cyanobactéries, cyanotoxines et environnement, Muséum national d’Histoire naturelle, 57 rue Cuvier, F-75231, Paris Cedex 05, France

2CNRS, UMR5153; Inserm, U565: USM0503, Muséum national d’Histoire naturelle, 57 rue Cuvier, F-75231, Paris Cedex 05, France

3Plateforme de Spectrométrie de Masse et de Protéomique, Département Régulation Développement et Diversité Moléculaire, Muséum national d’Histoire naturelle, 63, rue Buffon, F-75231, Paris Cedex 05, France

4Molécules de Communication et Adaptation des Micro-Organismes, FRE 3206 CNRS, Paris, F-75005 France

*Corresponding Author:
Dr. Karim Mezhoud
Cyanobactéries, cyanotoxines et environnement
Muséum national d’Histoire naturelle
57 rue Cuvier, F-75231
Paris Cedex 05, France
Tel: (33)140793212,
Fax: (33)140793594,
E-mail: [email protected]

Received Date: January 06, 2009; Accepted Date: February 09, 2009; Published Date: February 20, 2009

Citation: Mezhoud K, Praseu th D, M arie A, Puiseux-Dao S, Edery M (2009) Medaka Proteome Study: Use of Cross-species Matching and Expressed Sequence Tag Data for Protein Identification. J Proteomics Bioinform 2: 067-077. doi: 10.4172/jpb.1000063

Copyright: © 2008 Mezhoud K, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Visit for more related articles at Journal of Proteomics & Bioinformatics


Proteomics aims to understand gene function and molecular processes of the living cell through a large-scale study of the expressed proteins. Although proteomics approach is largely applied on experimental models such as rodents, other animal models such as medaka fish (Oryzias latipes) also attracts considerable interest in proteomics field. Medaka is one of the most studied animal model in reproductive or developmental biology. Although, its genome has been sequenced, only a few proteins are available in the databases. We present the procedure used in one experiment as a model of identification of proteins, in this case from liver of medaka treated by a hepatotoxic cyanotoxin, the microcystin. Because O. latipes is not listed in mascot search engine, the identification of proteins (selected because modified by the treatment) was done on closed related species. First, 15 spots were analyzed using peptide mass fingerprinting (PMF). However none could be reliably identified using mascot engine. The reason for low sequence coverage is due to the fact that the outcome of the research depends on the completness of the protein database obtained from NCBInr and the availability of cross-species for protein identification. In order to improve the identification, the PMF search was combined with a search in a medaka nucleotide specific database and confirmed by MS/MS ion searches which revealed successful. This identification procedure has been successfully applied in different experiments with the medaka model.


Bioinformatics; Databases; Mass spectrometry; Medaka; Protein identification; Proteomics


Blast: Basic Local Alignment Search Tool; Blastp: Blast for protein database; EBI: European Bioinformatics Institute; EST: Expressed Sequence Tag; i.d., Identify; IDA: Information-Ddependent Acquisition; MS: Mass Spectrometry; MS/MS: tandem Mass Spectrometry; NCBI: National Centre of Biological Information; NCBInr: NCBI Nonredundant protein database; NCBI-EST: NCBI Expressed Sequence Tag database; PAGE: Polyacrylamide Gel Electrophoresis; PMF: Peptide Mass Fingerprinting; rpsblast: Reverse- Position Specific Blast (search conserved domains on a protein); tblastn: Translate Blast Nucleotide; 2-D: Two Dimensional; 2-DE: Two Dimensional Electrophoresis.


Proteomics is aiming at understanding gene function and at characterizing molecular processes of the living cell through a large-scale study of proteins demonstrated as more or less expressed or post-transcriptionally modified in a given biological context. In such studies, protein identification represents a key step. In toxicology proteomics is largely in use but on a limited number of experimental models such as rodents. However other animal models are more and more experimented: the fish medaka (Oryzias latipes) is one of the most studied in reproductive or developmental toxicology for example (Wittbrodt et al., 2002; Wester et al., 2004). But if its genome is sequenced (Kasahara et al., 2007), only a few proteins have been characterized in databases which restricts proteomics studies (Henrich et al., 2003).

Such studies require the combination of two-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS) technology, in association with protein or nucleotide database search. Two MS identification strategies are possible. Peptide mass-fingerprinting (PMF) has become a highthroughput method that is easy to handle and rapid; however, the results largely depend on the range of proteins from various species listed in the database searched. A more sophisticated technology is based on the use of sequence tags for peptide-fragment information and subsequent matching to the nucleotide database using Mascot engine (Perkins et al., 1999). If protein sequences of the specific organism under study are not available, resorting to crossspecies protein identification is suitable (Cordwell et al., 1995). Depending on the organisms studied, the possibility and reliability of cross-species spot identification through PMF-matching vary greatly. Considering peptide masses, the results of cross-species identification yield low scores, with limited sequence coverage. Consequently, applying tandem mass spectrometry (MS/MS) becomes useful and permits identification of the amino acid sequences; in this case, matching the data with that in the nucleotide database (NCBI-EST) generally produces more results than the theoretical peptide-mass database (NCBInr). Moreover, modern sequencing technologies and projects generate a large quantity of DNA sequences at high speeds. Medaka is one of the interesting organisms whose genome sequence has been published (Kasahara et al., 2007); nevertheless, its proteome has not yet been characterised. Large-scale in situ hybridisation screenings have been carried out, and the expression data sorted in the databases are accessible through web interfaces. The medaka fish home page has grouped the major medaka genome-sequencing projects ( About 527,715 nucleotides and 1,933 proteins have been characterised in the European Bioinformatics Institute ( ) and the National Center for Biotechnology Information (NCBI) ( projects (as of December 2008). However, compared to a few other species, the nucleotide and protein databases of Oryzias latipes are still incomplete in public databases. Therefore, the finishing step and annotation process is time-consuming. It has been proved in this article that medaka protein identification using PMF and orthologous proteins can yield valuable information, but the procedure needs to be completed using tandem mass technology. These results could be used as guidelines for determining the feasibility of the shotgun proteomics approach using medaka fish as an experimental model.


2-dimensional Polyacrylamide Gel Electrophoresis and Staining

Freshly isolated fish livers were fractionated into cytosolic, membranous/organellar, nuclear and cytoskeletal fractions using the Subcellular proteome extraction kit from Calbiochem, according to the manufacturers instructions. For sodium laurylsulfate PAGE (SDS-PAGE), approximately 20 g proteins of the different subcellular fractions were electrophoresed in 12 % acrylamide gels according to Laemmli, (1970). For two dimension PAGE (2DPAGE), the samples were further treated with trichloroacetic acid precipitation so as to remove salts. The electrofocusing was then performed using commercial IPG strips (Immobiline dry strips, 7 cm, pH 4-7 non linear, GE Healthcare, England). The strips were passively rehydrated overnight with 200 ìg proteins in solution in 125 µ L 2 M thiourea, 6 M urea and 2 % CHAPS (w/v). The isoelectrofocusing was then performed at 16°C with the following potential ramp program: 0-50 V in 2 min, 50 V for 30 min, 50-200 V in 15 min, 200 V for 30 min, 200-2000 V in 30 min, 2000 V for 3 hours (maximum current per strip: 25 mA). Prior to the second dimension, the IPG strips were equilibrated twice for 20 min with gentle shaking in 8 mL SDS-containing equilibration buffer [50 mM Tris-HCl, pH 8.8, 6 M urea, 30 % glycerol (v/v), 2 % SDS (w/v) and traces of bromophenol blue]. DTT (1 % final, w/v) was added at the first equilibration step. Iodoacetamide (2.5 % final w/v) was added to the equilibration buffer at the second step. The strips were loaded on top of a 1 mm-thick 11 % acrylamide SDSPAGE gel, along with molecular weight markers (Molecular Probes). Each 2-D gel was stained with SYPRO Ruby stain (Molecular Probes) and was imaged on Typhoon 9410 (GE Healthcare, England) apparatus.

Protein Chemistry and Sample Preparation

2-D PAGE protein spots of interest were excised and washed thrice in 25 mM NH4HCO3 for 10 min and dehydrated in 100 % acetonitrile (ACN) for 5 min. Acetonitrile was evaporated under vacuum and the gel pieces were rehydrated with 15 µL of a 12.5 ng/µL trypsin solution (25 mM NH4HCO3, 5 mM CaCl2; trypsin from Promega) on ice for 45 min. The excess liquid was removed and the gel pieces were incubated overnight at 30°C in 20 µL 25 mM NH4HCO3. The resulting peptide mixture was extracted from the gel by sonication. Desalting the peptides was performed either using Zip-Tips (Millipore). Elution of the desalted peptides was with 80 % ACN and the peptide mixture was vacuum-dried. Peptides were redissolved in 2–5 µL 1 % formic acid.

MS techniques

Matrix-assisted Laser Desorption/Ionisation Time of Flight MS

Peptidic mixtures (2 µL) were mixed with an equal volume of a saturated solution of α-cyano-4-hydroxy-cinnamic acid (Sigma) in 50% acetonitrile (ACN), 0.05% trifluoroacetic acid. The new mixtures were allowed to co-crystallise on the matrix-assisted laser desorption/ionisation (MALDI) target plate (dried-droplet method). Mass spectra were recorded in the positive ion mode on a MALDI time of flight (MALDI-TOF) mass spectrometer (Voyager-DE Pro, Applied Biosystems). Mass/charge (m/z) ratios were measured in the reflection/delayed extraction mode, with an accelerating voltage 20 kV, grid voltage 150 V, 64.5% guide wire voltage and low mass gate 600. Protein identification using PMF was conducted using the MASCOT search engine ( The search parameters were defined as follows: NCBInr database, all taxa, trypsin digest (allowed up to 1 missed cleavage), cystein optionally modified by carbamidomethylation, methionine optionally modified by oxidation, monoisotopic, positive mode, maximal mass tolerance of 0.1–0.5 Da.

Electrospray-ionisation Quadrupole-TOF MS

Peptide mixture was mixed with 90% ACN, 10% methanol. Nano-electrospray-ionisation (Nano-ESI) experiments were carried out using a hybrid quadrupole timeof- flight (Q-TOF) mass spectrometer (Pulsar i, Applied Biosystems), equipped with a nano-ESI source. About 2 µL of the digest were loaded into the nanocapillary; MS and MS/MS data were acquired using an information-dependent acquisition (IDA) method with the following settings: chargestate selection from [M+2H]2+ to [M+4H]4+; intensity threshold: 5 counts; the collision energy setting was automatically determined by the IDA method based on the charge state of each precursor ion. Following the IDA-based gas-phase fragmentation data acquisition, precursor ions were excluded for 60 s. The acquired data were formatted into Mascot-compatible data using the mascot.dll script in Analyst QS v1.1 software shipped with the mass spectrometer. A no redundant protein database was initially searched (NCBInr). If the results were inconclusive, a nucleic acid database was searched by choosing expressed sequence tag (EST)-others from the master result page. The search was queried using the acquired mass-spectral data with the following settings: taxonomy: all taxa; trypsin digest (allowed up to 1 missed cleavage); cysteinyl residues optionally modified by carbamidomethylation; methionyl residues optionally modified by oxidation; maximal mass tolerance for precursor and fragment ions: 100 ppm.

Sequence Search and Medaka Nucleotide Databases

Database search was carried out by feeding the MS data to the Mascot search engine ( The search was conducted for all Taxa. The returned results (generally in orthologous species) were matched to specific medaka EST database. Two databases were used: the medaka gene-expression pattern database (Henrich et al., 2003) and the medaka EST database ( (see results for more details).


Proteins from liver medaka fishes were resolved by 2D gel technology and staining. Spots were then selected for protein identification because modified by the treatment (Figure 1). In the first step, fifteen sample spots were subjected to MALDI-TOF MS analysis and PMF search. Since the protein sequences of medaka fish are not yet characterised and Oryzias latipes is a species that is not listed in the database (taxonomy field) using the available search engine, the search was conducted without taxonomy restriction.


Figure 1: 2-DE gel of medaka hepatocyte cytosolic fraction stained with Sypro Ruby.

Most of the samples could not be identified by PMF due to the poor sequence coverage in spite of a significant number of peptides in the spectrum (Figure 2). Four spots with a modest score level defined by Mascot could be identified by cross-species matching (Table 1, Id. 1, 2, 3, 4).


Figure 2: MALDI-TOF MS spectra of mono protonated ions derived from spot 3 (Selenium binding protein 1 or SeBP) in Figure 1.

Spot Id. Protein identity Cov.%
Matched peptide
1 Phenylalanine hydroxylase
32442452 (nr, D. rerio)
BJ911084(MEPD, O. latipes)
13/86 peptides
2 Phenylalanine hydroxylase
32442452 (nr, D. rerio)
BJ737725 (MEPD, O. latipes)
12/86 peptides
3 Selenium binding protein 1
XP_707845 (nr, D. rerio)
BJ007475 (MEPD, O. latipes)
AAH5690(nr, D. rerio)
BJ007475 (MEPD,O. latipes )
BJ007475 (est, O. latipes)
4 Keratin 18 type 1
CAA74664 (nr, O. mykiss)
BJ498018 (MEPD, O. latipes)
AAC38007 (nr, C. auratus)
BJ747804 (MEPD, O. latipes)
5 Unidentified
Spot Id. Protein identity Cov.%
Matched peptide
6 DJ-1
BAD67176 (nr, O. latipes) Unnamed protein (RKIP)
CAG08164 (nr, T. nigroviridis)
BJ747456 (MEPD,O. latipes )
Raf kinase inhibitor protein
AU170544 (est, O. latipes)
126/50 14b
7 Acidic ribosomal
phosphoprotein P0
AAP20211 (nr, P. major)
BJ003458 (MEPD, O. latipes)
8 Natural killer enhancing factor
AAY25400 (nr, P. olivaceus)
BJ714211 (MEPD,O. latipes) RahpC-TCA family
AV669883 (est, O. latipes)
172/47 11b
9 Unidentified    
Spot Id. Protein identity Cov.%
Matched peptide
10 Proteasome α type 1
AAP20159 (nr, P. major) BJ01318 (est, O. latipes)
11 Hypoxanthine guanine
phosphoribosyl transferase
CAA35648 (nr, C. longicaudatus)
BJ729370(MEPD, O. latipes)
12 F-actin capping protein A
AAR16282 (nr, T. rubripes)
AM151914 (MEPD, O. latipes)
13 Actin capping protein B subunit
AAA52222 (nr, G. Gallus)
BJ729321 (MEPD, O. latipes )
14 Enolase
AAA70080 (nr, S. pombe)
BJ722994 (MEPD,O. latipes)
15 14-3-3 protein zeta/delta (RKIP-1)
P29361 (nr, O. aries )
BJ727482 (MEPD, O. latipes)

Table 1: Peptide sequences of identified proteins.

The matching peptides were then searched by off-line translate-blast nucleotide (tblastn) algorithm, using Expressed Pattern Database (MEPD, Germany, http:// and/or medaka EST database (Japan, (Figure 3). The output of the search was a list of clones from Oryzias latipes. The identification of the possible corresponding proteins was considered complete only if the three following conditions were verified for each clone: (1) the clone had a reasonable length; (2) the number of matched peptides was significant; (3) the virtual protein sequence obtained by translation of the clone had a computed mass compatible with the apparent mass observed on the gel. For each clone, an alignment of its polypeptidic sequence was carried out against the protein sequence initially obtained through cross-species identification to compare both protein sequences. To obtain additional information, tandem MS/MS was carried out. However information on O. latipes proteins is poorly available in public databases, and their homologies to proteins from other teleostean species whose genomes are available (D. rerio, O. mykiss and Tetraodon fugu) are insufficient. Then, online matching of the data spectrum was carried out with the NCBI-EST database. The remaining proteins were identified through approaches previously described (Altschul, 1989). In this case, one or multiple clones coding for the sequenced peptides were selected, and the corresponding polypeptide was reconstructed. If the computed mass of this polypeptide was compatible with the apparent mass observed on the gel, this specific sequence was used for a blastp/rpsblast search in NCBInr (Altschul, 1989) to confirm that the reconstituted sequence and the initial protein were homologous and could have the same putative function.


Figure 3: Protein-identification procedure.


This study was undertaken to establish a strategy to interrogate the databases in order to characterize proteins from the medaka fish by proteomic approach. The mass of the peptides (during MS scan) and their corresponding fragments (during MS/MS experiments) were used to search the selected database.

The feasibility of cross-species proteome protein identification using PMF approaches has been studied by relying on conservation of some of the genes and associated protein sequences. In general, the data suggest that standard PMF, with a mass accuracy of 0.1 Da results in unsatisfying identification scores. In spite of the relatively high conservation of tryptic cleavage sites, the few mutations that modify internal residues during evolution significantly change the peptide masses, making difficult the identification since many missed peptides thus fall in the “unmatched peptides” category. To facilitate the identification procedure, some additional information for database searching should be used. This includes protein features such as protein mass and isoelectric point, which are both relatively well conserved. However, using these criteria, only four proteins were identified, with poor scores (Table 1). Although medaka genome has been completely sequenced (Kasahara et al., 2007), only 1933 proteins have been characterised in the NCBI protein database (December 2008). Matching PMF data to NCBInr frequently yielded insignificant results through cross species identification in related species.

MS/MS sequencing was systematically performed to complete PMF analyses and EST database was selected since data on O. latipes is available. Several EST sequences were returned with high scores and allowed identification in cases where PMF was unsuccessful using the same prepared samples.

This approach provides a framework to identify proteins after 2D electrophoretic separation. It should be fruitful to apply it after use of the shotgun technique, resulting in many isolated proteins being accurately identified.


Gildas Mouta-Cardoso is warmly thanked for his technical help with the 2D gels and Lionel Dubost for massspectrometry analysis. This work was supported by grants from the ANR 07 SEST CYANOTOX 005, the AFSSET APR EST 2007 10, the Institut National des Sciences de l’Univers/Ecodyn and was further supported by a grant to Dr. Marc Edery from L’Association pour la Recherche sur le Cancer. This study is a contribution to the IRD research group CYROCO UR167.


  1. Altschul SF (1989) Gap costs for multiple sequence alignment. J Theor Biol 138: 297-309. » CrossRef » PubMed » Google Scholar
  2. Cordwell SJ, Wilkins MR, Cerpa PA, Gooley AA, Duncan M, et al. (1995) Cross-species identification of proteins separated by two-dimensional gel electrophoresis using matrix-assisted laser desorption ionisation/time-offlight mass spectrometry and amino acid composition. Electrophoresis 16: 438-443. » CrossRef » PubMed » Google Scholar
  3. Henrich T, Ramialison M, Quiring R, Wittbrodt B, Furutani SM, et al. (2003) MEPD: a Medaka gene expression pattern database. Nucleic Acids Res 31: 72-74. » CrossRef » PubMed » Google Scholar
  4. Kasahara M, Naruse K, Sasaki S, Nakatani Y, Qu W, et al. (2007) The medaka draft genome and insights into vertebrate genome evolution. Nature 447: 714-719. » CrossRef » PubMed » Google Scholar
  5. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227: 680-685. » CrossRef » PubMed » Google Scholar
  6. Mezhoud K, Praseuth D, Puiseux DS, François JC, Bernard C, et al. (2008) Global quantitative analysis of protein expression and phosphorylation status in the liver of the medaka fish (Oryzias latipes) exposed to microcystin-LR I. Balneation study. Aquat Toxicol 86:166-175. » CrossRef » PubMed » Google Scholar
  7. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data.Electrophoresis 20: 3551-3567. » CrossRef » PubMed » Google Scholar
  8. Wester PW, Van der Len LT, Vos JG (2004) Comparative toxicological pathology in mammals and fish: some examples with endocrine disrupters. Toxicology 205: 27-32. » CrossRef » PubMed » Google Scholar
  9. Wittbrodt J, Shima A, Schartl M (2002) Medaka — a model organism from the far east. Nature Reviews Genetics 3: 53-64. » CrossRef » PubMed » Google Scholar
Select your language of interest to view the total content in your interested language
Post your comment

Share This Article

Relevant Topics

Recommended Conferences

  • 9th International Conference on Bioinformatics
    October 23-24, 2017 Paris, France
  • 9th International Conference and Expo on Proteomics
    October 23-25, 2017 Paris, France

Article Usage

  • Total views: 11661
  • [From(publication date):
    February-2009 - Sep 20, 2017]
  • Breakdown by view type
  • HTML page views : 7891
  • PDF downloads :3770

Post your comment

captcha   Reload  Can't read the image? click here to refresh

Peer Reviewed Journals
Make the best use of Scientific Research and information from our 700 + peer reviewed, Open Access Journals
International Conferences 2017-18
Meet Inspiring Speakers and Experts at our 3000+ Global Annual Meetings

Contact Us

Agri, Food, Aqua and Veterinary Science Journals

Dr. Krish

[email protected]

1-702-714-7001 Extn: 9040

Clinical and Biochemistry Journals

Datta A

[email protected]

1-702-714-7001Extn: 9037

Business & Management Journals


[email protected]

1-702-714-7001Extn: 9042

Chemical Engineering and Chemistry Journals

Gabriel Shaw

[email protected]

1-702-714-7001 Extn: 9040

Earth & Environmental Sciences

Katie Wilson

[email protected]

1-702-714-7001Extn: 9042

Engineering Journals

James Franklin

[email protected]

1-702-714-7001Extn: 9042

General Science and Health care Journals

Andrea Jason

[email protected]

1-702-714-7001Extn: 9043

Genetics and Molecular Biology Journals

Anna Melissa

[email protected]

1-702-714-7001 Extn: 9006

Immunology & Microbiology Journals

David Gorantl

[email protected]

1-702-714-7001Extn: 9014

Informatics Journals

Stephanie Skinner

[email protected]

1-702-714-7001Extn: 9039

Material Sciences Journals

Rachle Green

[email protected]

1-702-714-7001Extn: 9039

Mathematics and Physics Journals

Jim Willison

[email protected]

1-702-714-7001 Extn: 9042

Medical Journals

Nimmi Anna

[email protected]

1-702-714-7001 Extn: 9038

Neuroscience & Psychology Journals

Nathan T

[email protected]

1-702-714-7001Extn: 9041

Pharmaceutical Sciences Journals

John Behannon

[email protected]

1-702-714-7001Extn: 9007

Social & Political Science Journals

Steve Harry

[email protected]

1-702-714-7001 Extn: 9042

© 2008-2017 OMICS International - Open Access Publisher. Best viewed in Mozilla Firefox | Google Chrome | Above IE 7.0 version