Muscle Proteomics of the Indian Major Carp Catla (Catla catla, Hamilton)

Introduction
Proteomics is an unbiased, technology-driven approach for the comprehensive cataloguing of entire protein complements and represents an ideal analytical tool for the high-throughput discovery of protein alterations in health and disease [1]. Mass spectrometry-based proteomics is concerned with the global analysis of protein composition, posttranslational modifications and the dynamic nature of expression levels. The generation of large data sets on protein expression levels makes proteomics a preeminent hypothesis-generating approach in modern biology [2,3]. Organisms with no available genome sequence data can be studied using a comparative proteogenomic approach [3]. To better understand proteome alterations, it is vital to have a reference proteome map for a specific tissue and species. In this respect, proteogenomics is a thorough approach for the detailed biochemical analysis of heterogeneous and plastic types of tissue, such as muscles.
Muscle plays a central role in whole-body protein metabolism by serving as the principal reservoir of amino acids for maintaining protein synthesis in vital tissues and organs [4]. Skeletal muscle fibers represent one of the most abundant cell types in vertebrates [5], and the contractile fibers of skeletal muscle tissues provide coordinated excitation-contraction-relaxation cycles for voluntary movements and postural control [6]. Skeletal muscle also plays a central physiological role in heat homeostasis and is a crucial metabolic tissue that integrates various biochemical pathways [7]. Skeletal muscle proteomics aims at the global identification, detailed cataloguing and biochemical characterization of the entire protein complement of voluntary contractile tissues in normal and pathological specimens [8]. Muscle proteomics has been applied to the comprehensive biochemical profiling of developing, mature and ageing muscle, as well as to the analysis of contractile tissues undergoing physiological adaptations seen in disuse atrophy, physical exercise and chronic muscular transformations [2].
Catla (Catla catla) is a commercially important carp species and contributes a major share of the freshwater aquaculture production in the Indian subcontinent. In the present study, we have generated a reference muscle proteome map of Catla catla, identified 70 spots on the 2-D gel and compared these proteins with the muscle proteins identified across species. As genome sequence data are not available for this organism, a proteogenomic approach was adopted to generate partial gene sequence information on the identified muscle proteins.

Fish
Apparently healthy major carp Catla catla (n=12), weighing 800-1000 g (length 35-40 cm), were procured from a reputed fish farm and hatchery in Kolkata. The species status of the specimens was confirmed by analysis of species-specific RAPD markers for catla [19].

Preparation of muscle extracts and protein quantification
Axial white skeletal muscle from midway down the body, under the dorsal fin and above the lateral line, was swiftly dissected out from fish euthanized with MS-222 (>100-200 mg L⁻¹). For muscle protein extraction, white muscle tissues were pooled and mechanically homogenized in ice-cold PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4·7H2O, 1.4 mM KH2PO4), pH 7.3, containing protease inhibitor cocktail (Sigma) [20]. To minimize protein modification or degradation, all dissection and sample processing was performed on ice. The homogenates were centrifuged in a high-speed refrigerated centrifuge (Biofuge FRESCO, Heraeus) at 10,000 rpm at 4°C for 10 min and the supernatants (representing the soluble protein extracts) were aspirated. The protein concentration of the extracts was determined by the Bradford method [21] with BSA (Sigma) as the standard. The samples were stored as aliquots at -40°C until further use.
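The Bradford quantification step can be illustrated with a short calculation. The sketch below fits a straight line to a BSA standard curve and reads a sample concentration from it; the standard concentrations, absorbance readings, dilution factor and function names are hypothetical values chosen for illustration and are not data from this study.

```python
# Hypothetical BSA standards: (concentration in µg/mL, absorbance at 595 nm).
# These numbers are illustrative only, not measurements from the study.
standards = [(0, 0.00), (125, 0.10), (250, 0.20), (500, 0.40), (1000, 0.80)]


def fit_line(points):
    """Ordinary least-squares fit of absorbance = slope * conc + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration(a595, slope, intercept, dilution=1.0):
    """Back-calculate concentration (µg/mL) from an absorbance reading."""
    return (a595 - intercept) / slope * dilution


slope, intercept = fit_line(standards)

# Hypothetical sample: A595 = 0.32 measured on a 1:10 dilution of the extract.
sample_conc = concentration(0.32, slope, intercept, dilution=10)

# Volume of extract (µL) that would contain a 150 µg protein load.
load_volume_ul = 150 / sample_conc * 1000
print(f"{sample_conc:.0f} µg/mL; {load_volume_ul:.1f} µL for 150 µg")
```

In practice the standard curve is only used within its linear range, so samples reading above the highest standard would be diluted further and re-measured.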

Gel electrophoresis
Prior to 2-D GE analysis, the proteins were analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) [22] to check the protein quality and to ensure equal loading. For 2-D GE analysis, the first-dimension run (isoelectric focusing) was performed using a Bio-Rad Protean IEF Cell with 11 cm immobilized pH gradient (IPG) strips (pH 5-8, Sigma) following a standard protocol [23], as described earlier [20]. The protein sample (~150 μg) was premixed with rehydration buffer (8 M urea, 2 M thiourea, 2% CHAPS, 50 mM DTT, 0.2% Bio-Lyte 5/8 ampholyte, and 0.001% bromophenol blue) and rehydration of the IPG strips was carried out for 12 h. The rehydrated strips were isoelectrofocused at a current of 50 μA/strip with the following voltage gradient: 250 V for 20 min, 500 V for 30

MALDI-TOF-TOF-MS
Protein spots of interest were cut from the 2-D polyacrylamide gel, destained in methanol and ammonium bicarbonate buffer, and digested overnight with trypsin. The resulting peptides were extracted following standard techniques [24] by two 20-min incubations with 10-20 µL ACN containing 1% TFA, depending on the size of the gel piece. The resulting tryptic peptide extract was dried by rotary evaporation and stored at -20°C for further analysis by MS. The peptides were analysed by MALDI-TOF-TOF mass spectrometry using a 5800 Proteomics Analyzer (AB Sciex).
The dry peptide samples were reconstituted in 10 µL standard diluent (30:70; ACN:water). The resulting solution was diluted 1:10 with matrix solution (CHCA, 10 mg/mL) and spotted on a 384-well Opti-TOF stainless steel plate. The spotted samples were analysed using a first run of standard TOF-MS. The system was set to perform a second run of MS/MS focused on the 20 most intense peaks of the first MS run (excluding peaks known to originate from trypsin). The laser was set to fire 400 times per spot in MS mode and 2000 times per spot in MS/MS mode. Laser intensity was 2800 J and 3900 J for MS and MS/MS, respectively. A mass range of 800-4000 amu with a focus mass of 2100 amu was used.
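The precursor selection rule described above (the 20 most intense MS peaks within the acquired mass range, excluding known trypsin peaks) can be sketched as a simple filter. This is an illustrative reimplementation, not the instrument software: the function name, the exclusion tolerance of 0.2 Da and the two exclusion masses (commonly cited trypsin autolysis peaks near m/z 842.51 and 2211.10) are assumptions.

```python
# Assumed exclusion list: commonly cited trypsin autolysis peaks (m/z).
# The actual exclusion list used on the instrument is not given in the text.
TRYPSIN_AUTOLYSIS_MZ = (842.51, 2211.10)


def select_precursors(peaks, top_n=20, exclude=TRYPSIN_AUTOLYSIS_MZ,
                      tol_da=0.2, mz_range=(800.0, 4000.0)):
    """Return the top_n most intense peaks eligible for MS/MS.

    peaks is a list of (mz, intensity) tuples. Peaks outside mz_range or
    within tol_da of an excluded mass are dropped before ranking.
    """
    lo, hi = mz_range
    kept = [
        (mz, intensity)
        for mz, intensity in peaks
        if lo <= mz <= hi and all(abs(mz - ex) > tol_da for ex in exclude)
    ]
    kept.sort(key=lambda peak: peak[1], reverse=True)
    return kept[:top_n]


# Hypothetical peak list for illustration only.
example_peaks = [
    (842.50, 9000), (1000.5, 500), (1500.7, 800),
    (4500.0, 10000), (2211.12, 7000), (1200.6, 650),
]
print(select_precursors(example_peaks, top_n=2))
```

Ranking after filtering, rather than before, ensures that excluded trypsin peaks do not take up any of the 20 MS/MS slots.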
For protein identification, peptide masses from trypsin digests obtained by MALDI-TOF-MS were used to search against the Ludwig NR Database using the MASCOT program (www.matrixscience.com) [25]. The MASCOT search parameters were as follows: peptide mass accuracy, 100 ppm; protein modifications, cysteine as S-carbamidomethyl derivative and oxidation of methionine allowed. The default search parameters used were: enzyme, trypsin; maximum missed cleavages, 1; fixed modifications, carbamidomethyl (C); variable modifications, oxidation (M); peptide tolerance, ±0.4 Da; fragment mass tolerance, ±0.4 Da; protein mass, unrestricted; instrument, default.
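To make the search parameters concrete, the following sketch shows what a peptide mass fingerprint search does in principle: an in silico tryptic digest allowing one missed cleavage, a fixed carbamidomethyl modification on cysteine, and a mass comparison within 100 ppm. It is a simplified illustration and not the MASCOT algorithm; the function names and the test sequence are hypothetical, variable methionine oxidation and scoring are not modelled, and cleavage is assumed not to occur before proline.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
CARBAMIDOMETHYL = 57.02146  # fixed modification added to every cysteine


def cleave(seq):
    """Split after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces


def tryptic_peptides(seq, missed=1):
    """All peptides with up to `missed` missed cleavages."""
    pieces = cleave(seq)
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed, len(pieces) - 1) + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def mh_plus(peptide):
    """Singly protonated monoisotopic mass [M+H]+ with carbamidomethyl Cys."""
    mass = sum(MONO[aa] for aa in peptide)
    mass += peptide.count("C") * CARBAMIDOMETHYL
    return mass + WATER + PROTON


def within_ppm(observed, theoretical, tol_ppm=100.0):
    """True if the observed mass is within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical sequence, used only to show the digest rules.
print(tryptic_peptides("AAKRPGGKCC", missed=1))
```

The text lists both a 100 ppm accuracy and a ±0.4 Da tolerance; the sketch uses the ppm form, in which the allowed absolute error grows with peptide mass.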

Data and network pathway
Proteins were identified by MS analyses and protein database (Mascot) searches, and basic descriptions of the identified protein spots, including protein names, accession numbers, theoretical molecular mass and pI values, were provided. In addition to these descriptions, other fundamental information relevant to the identified proteins was acquired through further searches of databases available in the public domain. Database accession numbers for the identified proteins were provided as inputs to the UniProt database (http://www.uniprot.org/uniprot/) to obtain gene ontology annotation and sequence length. The consensus lists of proteins obtained were then analyzed for their respective molecular function, biological process, protein class and associated metabolic pathway using the Panther classification system software (www.pantherdb.org/pathway/).

Tissue collection and RNA extraction for transcript analysis
White muscle tissues were collected from anaesthetized fish and stored at -80°C in RNAlater (Sigma) to avoid RNA degradation. Subsequently, total RNA was isolated using RNA Express Reagent (Himedia). Briefly, samples (~1 g) were homogenized in 1 ml RNA Express reagent, and chloroform was added to separate the mixture into protein, DNA and RNA phases. The RNA in the supernatant was precipitated with 2-propanol (Himedia). The pellets were resuspended in 25 μl of DEPC-treated water (Himedia) and stored at -80°C until further use.

Polymerase chain reaction
To amplify the genes from the catla muscle cDNA library, primers were designed against the nucleotide sequences of zebrafish, common carp and other related species with the assistance of Primer3 software (http://frodo.wi.mit.edu/). The reaction mixture (25 μl) consisted of 10 ng cDNA, 10X buffer (Himedia), 10 mM dNTPs (Himedia), 10 pM gene-specific primer and 5 U of Taq polymerase (Himedia). PCR was carried out in a thermal cycler (Veriti, Applied Biosystems). The amplification conditions were as follows: initial denaturation at 95°C for 3 min, followed by 45 cycles of amplification (denaturation at 95°C for 45 s, annealing for 45 s at temperatures optimized for specific genes (Table 1) and extension at 72°C for 1 min) and a final extension at 72°C for 10 min. The PCR product (8 μl) was resolved on a 1.6% agarose gel and imaged with an ImageQuant LAS 4000 (GE Healthcare).
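As a worked example of the cycling conditions, the total programmed hold time can be computed directly from the step durations given above. The function name is hypothetical, and the result excludes temperature ramping between steps, which depends on the thermal cycler.

```python
def program_seconds(initial_denaturation, cycles, cycle_steps, final_extension):
    """Total programmed hold time in seconds, ignoring ramp times."""
    return initial_denaturation + cycles * sum(cycle_steps) + final_extension


# Durations taken from the amplification conditions in the text.
total_seconds = program_seconds(
    initial_denaturation=3 * 60,   # 95°C for 3 min
    cycles=45,
    cycle_steps=(45, 45, 60),      # denaturation, annealing, 72°C step (s)
    final_extension=10 * 60,       # 72°C for 10 min
)
print(total_seconds, "s =", total_seconds / 60, "min")
```

The 45 cycles account for most of the run (6750 s of the 7530 s total), so the run length is largely set by the cycle count.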

Sequencing of the transcripts
Bands of appropriate size were purified with the Hipura Quick gel purification Kit (Himedia) and sequenced with specific primers using Sanger's dideoxy sequencing protocol (ABI3730 XL). To confirm the identity of each gene, the partial sequences were subjected to BLASTn and BLASTx [26] searches against the GenBank nucleotide and protein databases.

Gel electrophoresis
SDS-PAGE of Catla catla soluble muscle proteins separated the muscle extracts into 22 bands in the MW range of 14 to >205 kDa (Figure 1A). 2-D gel electrophoresis profiles of catla muscle proteins were generated; a representative 2-D gel profile of the catla muscle proteome is shown (Figure 1B). Coomassie-silver double staining resolved the soluble white muscle proteins into 130 spots. The majority of the separated proteins fell in the range of 20-60 kDa and pI 6-8, except for a few with MW >90 kDa and pI 6.5-7.5.

MALDI-TOF-MS
Proteomic analysis led to the identification of 70 protein spots (Table 2 and Figure 2), which have been cataloged in an in-house fish proteomic database, FISHPROT (http://www.cifri.ernet.in/fishprot.html) [27]. Out of the 70 protein spots identified, 43 were identified as metabolic enzymes involved in carbohydrate (32) (Supporting Information Figure S1), protein (1), lipid (5) and nucleotide (5) metabolism. Another 19 spots were found to be cytoskeletal proteins, of which 14 are CK (Supporting Information Figure S2). Three proteins were identified to be associated with signal transduction and five were categorized separately based on their functions (Table 2 and Figure 2).
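The spot counts reported above can be checked for internal consistency with a short tally. The numbers are taken from the text; the dictionary keys and variable names are labels chosen for this illustration.

```python
# Spot counts as reported in the text.
metabolic = {"carbohydrate": 32, "protein": 1, "lipid": 5, "nucleotide": 5}

groups = {
    "metabolic enzymes": sum(metabolic.values()),
    "cytoskeletal": 19,
    "signal transduction": 3,
    "other functions": 5,
}

total_spots = sum(groups.values())

# Share of each functional group among all identified spots, in percent.
shares = {name: round(100 * count / total_spots, 1) for name, count in groups.items()}
print(total_spots, shares)
```

The sub-counts sum to the 43 metabolic enzymes and the four groups sum to the 70 identified spots, with metabolic enzymes making up roughly 61% of the identifications.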
Proteins identified in other fish muscle proteomes reported earlier via gel-based proteomics were compared, and the newly identified protein spots in catla, which had not been shown in the proteome maps of other species prior to this study, are marked (Supporting Information Table S1). Similarly, the muscle protein profiles of higher and lower vertebrates, as identified via gel-based methods, are compared with the proteins identified in catla (Supporting Information Table S2).

Pathway analysis
The major pathways associated with the identified proteins of catla, based on MALDI-TOF-MS data, are mostly metabolic pathways (Table 2). Upon analyzing the muscle proteome dataset with the Panther classification system software, the total biological pathway was divided into eight categories, viz. asparagine and aspartate biosynthesis, de novo purine biosynthesis, fructose-galactose metabolism, glycolysis, G-protein signaling pathway, phenylalanine biosynthesis, pyruvate metabolism and tyrosine biosynthesis (Supporting Information Figure S3A-i). Large groups were found to be involved in the glycolysis pathway (Supporting Information Figure S1), and the proteins associated with this pathway were subdivided into pyruvate kinase, aldolase and enolase (Supporting Information Figure S3A-ii). The G-protein signaling pathway is represented by both phosphorylases a and b (Supporting Information Figure S3A-iii).
According to biological process, the proteins are divided into three categories, viz. immune system, system process and metabolic process (Supporting Information Figure S3B-i), of which metabolic process represents the largest group and is subdivided into primary and ROS metabolic processes (Supporting Information Figure S3B-ii). Primary metabolic process is further divided into carbohydrate, cellular amino acid, lipid and nucleotide metabolism (Supporting Information Figure S3B-iii).
On the basis of molecular function, the proteins are divided into two classes, viz. catalytic activity and antioxidant activity (Supporting Information Figure S4A-i), where the catalytic activities of the identified proteins are further divided into transferase, isomerase, lyase and oxidoreductase activities (Supporting Information Figure S4A-ii). Transferase activity is further divided into glycosyltransferase, transaminase and a large group associated with kinase activity (Supporting Information Figure S4A-iii), which is in turn divided into nucleotide, amino acid and carbohydrate kinases (Supporting Information Figure S4A-iv).
The dataset proteins were grouped into five protein classes, viz. transferase, isomerase, kinase, lyase and oxidoreductase (Supporting Information Figure S4B-i). Proteins of the transferase class are further subgrouped into transaminase, kinase and phosphorylase (Supporting Information Figure S4B-ii), whereas kinases (Supporting Information Figure S4B-iii) are divided as described in Supporting Information Figure S4A-iv and oxidoreductases are further divided into peroxidase and dehydrogenase (Supporting Information Figure S4B-iv).
Pathway coverage of the major enzymes involved in carbohydrate metabolism is illustrated in Figure 3. The associated gene names and sequence coverage are presented in grey boxes adjacent to the identified proteins (Figure 3).

Discussion
Proteomics, the global analysis of protein synthesis, studies the end product of gene expression, i.e. the protein, and is therefore also termed functional genomics. The nucleic acid-based technologies for analyzing differential gene expression assay only mRNA expression, which is not always reflected in the levels of protein synthesis. Two-dimensional protein gels, combined with peptide mass mapping by MALDI-TOF MS for protein identification, are widely used for determining differential protein synthesis in biological systems. In the current study, we have generated muscle proteogenomic information on the commercially important carp Catla catla.
A number of samples were analyzed to assess the protein quality and to check for intra-individual variability, if any. Checking the protein quality for proteomic studies is important, as many preanalytical variables are known to affect it [29]. In order to investigate the proteome composition and to enable a detailed visualization of the proteins, 2-D gel electrophoresis was carried out. The combination of IEF and SDS-PAGE forms the classical separation technique in gel-based proteomics. In this study, isoelectric focusing was performed using IPG strips of pI 5-8 for muscle protein separation. Significant clusters of low molecular weight protein spots were resolved across the entire pI range of the gels (Figure 1A). A total of 70 individual protein spots were identified from the 2-D gels on the basis of their peptide masses (Table 2).
The majority of the proteins identified from 2-D PAGE were enzymes of carbohydrate metabolism, as in the previous study [14]. In many cases, the same protein was identified in multiple gel spots of similar molecular mass (Figure 1 and Table 2); these spots differed only in charge and are positional variants.
In the non-redundant Swiss-Prot database, approximately 2500 sequences correspond to ray-finned fish, of which 89 are common carp sequences [14]. Therefore, to accurately determine the identity of fish muscle proteins, we combined cross-species matching with MALDI-TOF-MS data. As the full genome sequence is not available for any of the Indian major carps, rohu (Labeo rohita), catla (Catla catla) and mrigal (Cirrhinus mrigala), the carp muscle proteins were identified by matching to other species, mainly zebrafish (Danio rerio), which has been sequenced and extensively annotated. This approach is facilitated by the close taxonomic relationship of carp to zebrafish; both are cyprinids. Whilst the availability of further sequence data for carp will enhance the identification of proteins in this type of study, the results indicate that it is possible to identify fish muscle proteins through cross-species matching to a taxonomic near neighbor.

Comparative proteomic data analysis across species
The proteins identified in this study are dominated by enzymes such as enolase, GAPDH, pyruvate kinase and creatine kinase (CK), which are associated with energy production pathways. Aldolase, enolase, pyruvate kinase, CK and their fragments have been reported with detailed characterization in the sea bream [15], snakehead [16] and common carp [14] muscle proteomes, providing a number of insights into size- and environment-related variability. Muscle proteome changes associated with development and exercise have earlier been studied in zebrafish by 2-DE and MALDI-TOF MS [13]. The proteome map of catla further strengthens the knowledge base for comparative muscle proteomics among different fish species (Supporting Information Table S1). The available datasets would act as a basis for studies related to physiological status assessment of Catla catla under different environmental conditions, screening for diseases and biomarker identification for assessment of fish quality. Twelve proteins identified in the Catla catla muscle proteome map, viz. aspartate amino transferase, glycerol-3-phosphate dehydrogenase (GPDH), CK M3, uncharacterized actin binding protein (CAPZB gene product), PDZ and LIM domain 7 (Development GDNF family signaling), Zgc:165344 protein, Zgc:91930 (adenylate kinase family), proteasome subunit alpha type 6, α-1-antitrypsin homolog, peroxiredoxin, DJ-1 (Fragment) and novel protein similar to phosphohistidine phosphatase 1 (PHPT1), are new identifications in gel-based proteomes; they have not been observed in earlier studies on zebrafish, common carp, seabream, snakehead fish and cod fish (Supporting Information Table S1) [13][14][15][16][17]. In an earlier study, proteome cataloging of cod Gadus morhua using 1-D PAGE protein separation, nano LC peptide fractionation and linear trap quadrupole (LTQ) mass spectrometry [16] identified a total of 4804 peptides.
Moreover, the proteomic signature of muscle has also been established via non-gel-based methods in rainbow trout, Oncorhynchus mykiss [28].
The proteins identified in the catla muscle proteome have also been compared with those of higher vertebrates: rat [10], rabbit [11], chicken [12] and human [9]. Using a combination of one-dimensional gel electrophoresis and HPLC-ESI-MS/MS, 954 different proteins were identified in human muscle [9]. Proteome analysis of rat skeletal muscle led to the identification of ~50 proteins [10]. A proteomic reference map for the gastrocnemius muscle of rabbit has also been generated and 45 proteins have been identified [11]. In the present study, 10 protein spots have been identified in the catla muscle proteome which have not been shown in higher vertebrate muscle proteomes by gel-based proteomics [9][10][11][12]. These are aspartate amino transferase, CK M2 and M3, Zgc:165344 protein, adenylate kinase D, proteasome subunit α type 6, α-1-antitrypsin homolog, peroxiredoxin, DJ-1 (fragment) and novel protein similar to phosphohistidine phosphatase 1 (PHPT1) (Supporting Information Table S2).

Protein dataset analysis
Proteomics technologies are continuously being improved and new technologies are being introduced, so that high-throughput acquisition of proteome data is now possible. The young and rapidly emerging field of bioinformatics in proteomics is introducing new algorithms to handle large and heterogeneous data sets and to improve the knowledge discovery process. FISHPROT, a local proteomics bioinformatics platform, is a database management system and a knowledge base for fish proteomic data.
Although all the proteins identified in catla were grouped according to their biochemical properties (Table 2 and Figure 2), they were further verified by entering the respective gene names into the Panther classification system software (http://www.pantherdb.org/pathway/), which is freely available. This system uses the gene names of the identified proteins and classifies them into different groups on the basis of their similarity to specific organisms already available in its database; in the case of fish, this is the zebrafish Danio rerio. As evident from the classification of the identified protein spots of Catla catla by this software on the basis of biological pathway (Supporting Information Figure S3A) and molecular function (Supporting Information Figure S4A), the majority of the identified proteins are housekeeping proteins, such as those involved in the metabolism of carbohydrates, proteins, lipids and nucleotides (Supporting Information Figure S3B), and musculoskeletal proteins.
According to the classification of proteins on the basis of 'Total Biological processes', a large proportion of the identified proteins were classified under 'Primary metabolic process', which is further divided into carbohydrate, cellular amino acid, lipid and nucleotide metabolism (Figure 2). Twenty-two protein spots (Table 2) identified in the catla muscle proteome represent six glycolytic enzymes: triose phosphate isomerase, pyruvate kinase, aldolase A, enolase, phosphoglycerate kinase and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (Supporting Information Figure S1). However, the Panther classification system software identified only three of these six enzymes, viz. pyruvate kinase, aldolase and enolase (Supporting Information Figure S3A-ii). This may be because the classification system is limited to the information available in its database for specific organisms (among fishes, information is available only for zebrafish); therefore, except for proteins showing homology with zebrafish, the other identified proteins were not taken into account.
Two spots (C-59, 122) were identified as glycogen phosphorylase. Glycogen phosphorylase catalyzes and regulates the entry of glucose residues from glycogen into glycolysis via the glycogenolysis pathway. It is a regulatory enzyme present in both liver and muscle. In skeletal muscle the enzyme occurs in two forms, a catalytically active phosphorylated form (phosphorylase a) and a much less active dephosphorylated form (phosphorylase b). In muscle, the rate of conversion of glycogen units into glucose 1-phosphate is regulated by the ratio of the active phosphorylase a to the less active phosphorylase b. Glycogen synthase, the rate-limiting enzyme in glycogen biosynthesis, is also regulated by glycogen phosphorylase (Figure 3).
Four spots (C-91, 92; CM-90, 124) were identified as phosphoglucomutase. Phosphoglucomutase is a key enzyme in glycogen metabolism and protein glycosylation. It is responsible for the reversible interconversion of glucose 1-phosphate and glucose 6-phosphate, both of which are key intermediates in the synthesis and breakdown of glycogen and in galactose metabolism. It is also important for the formation of UDP-glucose, an essential intermediary metabolite in protein glycosylation. Inhibition of phosphoglucomutase has drastic effects on carbohydrate metabolism: it reduces the steady-state levels of UDP-glucose, resulting in defective glycogen and trehalose biosynthesis, and inhibits galactose metabolism, leading to galactosemia and accumulation of galactose 1-phosphate and glucose 1-phosphate, i.e., poor glycogen turnover.

Lipid metabolism
Four spots (C-73, 75, 76, 77) were identified as glycerol 3-phosphate dehydrogenase 1b. Glycerol 3-phosphate and fatty acyl-CoAs are the common precursors of triacylglycerols and glycerol phosphatides. Glycerol phosphate is formed in two ways: from the dihydroxyacetone phosphate generated during glycolysis, by the action of cytosolic NAD-linked GPDH, and from glycerol, by the action of glycerol kinase.
Spot number C-111 has been identified as apolipoprotein A1 (Apo A-I) (Supporting Information Table S1). This protein has earlier been identified in Cyprinus carpio [14]. Apo A-I is the major protein component of high density lipoprotein (HDL) in plasma; it confers water solubility on the lipoprotein complex, thus facilitating lipid transport and metabolism. It promotes cholesterol efflux from tissues to the liver for excretion. It is a cofactor for lecithin cholesterol acyltransferase (LCAT), which is responsible for the formation of most plasma cholesteryl esters. In lower vertebrate species Apo A-I is also synthesized in a number of peripheral tissues, e.g., rainbow trout gill [29] and liver [30], carp optic nerve [31] and skin [32]. In the cod (Gadus morhua), Apo A-I is closely associated with the C3 component of the complement system [33]. Apo A-I has also been reported to have a restorative or protective role and to play a significant role in maintaining epithelial integrity [29].

Nucleotide metabolism
Five spots have been grouped as proteins related to nucleic acid metabolism; of these, one spot (C-8) has been identified as adenylate kinase and the other four spots (C-13, 15, 20, 109) have been identified as adenylate kinase D (Table 2). Adenylate kinase D has not previously been reported in vertebrates by gel-based proteomics.

CK and other contractile proteins
CK catalyzes the transphosphorylation between phosphocreatine and ADP and is central to the regulation of muscle bioenergetics. A large number of protein spots (13) across a broad range of MW and pI were identified as positional variants of muscle CK in the present study (Supporting Information Figure S2). It has been reported earlier that creatine/phosphocreatine interconversion played an important role in ATP regeneration since depletion of glycolytic enzymes in carp resulted in anoxia. High tissue CK activity, whether constitutive, induced or both, may directly enhance contractile responses by enhancing cellular energy and contractile reserve. This high CK activity may alter local ADP levels at the contractile proteins and contribute to enhanced contractility and myosin ATPase activity.
Complex patterns of CK isoforms exist in the skeletal muscle of fish [34][35][36] and three isoforms of CK have previously been identified in carp skeletal muscle [37]. These are referred to as M1, M2 and M3 and have predicted masses of 43 kDa and pI 6.22-6.32 [37]. In the muscle proteome of catla, 14 spots have been identified as CK; out of these, six (C-57 Table 2; Supporting Information Figure S2). This observation is consistent with the earlier reports on presence of multiple forms of this enzyme in muscle tissue of other fish [35,37]. CK M-3 has not been reported earlier in any vertebrates by gel-based proteomics.
Four spots (C-2, CM-126, 137, 138) were identified as actin (Table 2). Actin, a 42-kDa globular protein found in all eukaryotic cells, participates in many important cellular processes including muscle contraction, cell motility, cell division and cytokinesis, vesicle and organelle movement, cell signaling, and the establishment and maintenance of cell junctions and cell shape.
CM-141, identified as PDZ and LIM domain 7 protein, is a muscle-specific protein [9]. It is representative of a family of proteins composed of conserved PDZ and LIM domains. PDZ is an acronym combining the first letters of the three proteins that were first discovered to share the domain: post synaptic density protein (PSD95), Drosophila disc large tumor suppressor (Dlg1) and zonula occludens-1 protein (ZO-1) [38]. The PDZ domain is a common structural domain of 80-90 amino acids found in the signaling proteins of bacteria, yeast, plants, viruses and animals. LIM domains are protein structural domains composed of two contiguous zinc finger domains separated by a two-amino acid residue hydrophobic linker; they are named after their initial discovery in the proteins Lin11, Isl-1 and Mec-3. LIM domain-containing proteins have been shown to play roles in cytoskeletal organization, organ development and oncogenesis. LIM domains mediate protein-protein interactions that are critical to cellular processes [39] and are proposed to function in protein-protein recognition in a variety of contexts, including gene transcription, development and cytoskeletal interaction. The LIM domains of this protein bind to protein kinases, whereas the PDZ domain binds to actin filaments. The gene product is involved in the assembly of an actin filament-associated complex essential for transmission of ret/ptc2 mitogenic signaling. The biological function is likely to be that of an adapter, with the PDZ domain localizing the LIM-binding proteins to actin filaments of both skeletal muscle and non-muscle tissues [9]. This protein has not been reported earlier in any fish species by gel-based proteomics.

Signal transduction
Protein phosphorylation is a key regulatory mechanism for signal transduction in both prokaryotic and eukaryotic cells. Most of our understanding of signaling events in eukaryotes comes from tyrosine and serine/threonine kinases and phosphatases. Histidine phosphorylation-dependent signaling in eukaryotes is less well characterized. The first vertebrate protein histidine phosphatase, PHPT1, was identified in 2002 [40], and it was found that PHPT1 is ubiquitously expressed in eukaryotes, from C. elegans to Homo sapiens. In the present study, one protein spot, CM160, has been identified as PHPT (Table 2); this protein has not been reported earlier in other vertebrates by gel-based proteomics (Supporting Information Table S1).
Two spots, C14 and C18, were identified as peroxiredoxin and DJ-1, respectively (Table 1, Supporting Information Table S1 and S2). Peroxiredoxin controls cytokine-induced peroxide levels and thereby mediates signal transduction in mammals. DJ-1, a chaperone protein with antioxidant properties, is involved in the cellular response to stress and has been described to play a role in apoptosis.

Transcript analysis
Partial gene sequences for the identified proteins (16/22) and for some additional proteins, including hsp47, hsp60, hsp70, hsc71, hsp90 and 18S RNA, have been generated. Thus, this is the first study on this commercially important species with a proteogenomics approach. Proteogenomics aims mainly to use proteomics data and technologies for identifying novel genomic features, such as novel un-annotated genes, and for improving, correcting or confirming structural and functional genome annotation. This approach is ideal for harnessing the wealth of information available at the proteome level and applying it to the available genomic information of organisms [3,41]. It would help to focus on the functional genome rather than the whole genome, and could possibly help to identify protein variants that could cause diseases, to identify protein biomarkers, to study genome variation and to identify QTLs associated with production traits. Multi-pronged approaches such as transcriptomics and proteomics, in addition to genomics, should be included in future studies, as proteogenomic analyses provide a more accurate catalog of protein-coding genes [3,41].
In fish, flesh quality depends on environmental factors, mainly water and food quality for product safety and food composition for flesh nutritional quality. Among the sensory qualities, however, flesh texture is mainly determined by biological factors such as muscle organization, protein content and composition. In fish, the best quality is firm and cohesive flesh with good water-holding capacity. These traits are mainly determined by the nature and properties of the proteins, so proteomic tools are of particular interest for studying fish flesh quality. However, very few studies have been undertaken to identify flesh quality biomarkers [42][43][44][45], and this remains a research gap that needs attention.
The primary objective of the study was to establish a reference muscle proteome map for Catla catla and to identify the muscle proteins, which has been achieved to a great extent. The proteins identified in this study have been cataloged in the database http://www.cifri.ernet.in/fishprot.html and would act as baseline information on the proteogenomics of Catla catla and other aquacultured species. The information generated could also be useful for biotechnological interventions in fish health and disease management, besides adding to the existing knowledge base on comparative muscle proteomics.