Shotgun Proteomics Analysis of Differentially Expressed Urinary Proteins Involved in Hepatocellular Carcinoma

1 Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Taiwan
2 Division of Hepatobiliary Surgery, Department of Surgery, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
3 Department of Biotechnology, College of Life Science, Kaohsiung Medical University, Taiwan
4 Center for Research Resources and Development, Kaohsiung Medical University, Taiwan
5 Department of Medical Research, Kaohsiung Medical University Hospital, Taiwan
# These authors contributed equally to this work


Abstract
Numerous studies of hepatocellular carcinoma (HCC) diagnosis and early detection have been reported in the literature. These studies are based on serum hepatitis B surface antigen (HBsAg) concentrations and urinary aflatoxin metabolites. However, these biomarkers, although specific and accurate, are not applicable to HCC arising from all causative factors. In addition, potential biomarkers may be present at low concentrations amid abundant interfering proteins spanning a wide dynamic range. The aim of this study was to establish an effective, noninvasive and highly sensitive analytical platform for exploring urinary protein expression profiles. The platform uses shotgun proteomics with nano-liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS) and stable isotope dimethyl labeling. Differentially expressed urinary proteins were identified and compared on the basis of the mass spectral patterns of peptide fragments generated by protease digestion. Quantitative proteomic analysis of the differentially expressed urinary proteins identified at least 21 candidate protein biomarkers with high confidence. These included 14 up-regulated proteins (stable isotope D/H ratio ≥ 1.5) and 7 down-regulated proteins (D/H ratio ≤ 0.6). The systematic decrease or increase of these marker proteins may reflect the morphological aberrations and disease stages of the liver during the progressive development of HCC. These results provide a foundation for future validation and clinical translation of selected biomarkers into targeted diagnosis and therapy for various classes of HCC.

Introduction
Hepatocellular carcinoma (HCC) is ranked as the seventh most common malignant tumor in women and the fifth in men worldwide [1]. It has been the leading cause of cancer death in Taiwan in recent decades [2]. HCC is prevalent in middle and western Africa and in eastern and south-eastern Asia, and over 80% of HCC cases occur in developing countries [1]. HCC has many etiological factors and a wide spectrum of clinical manifestations, but it is generally associated with chronic hepatitis B or C virus (HBV, HCV) infection and cirrhosis [3]. Several environmental factors, including alcoholism, tobacco smoking and dietary exposure to aflatoxins, may also partly account for the high incidence of HCC [4,5]. Many patients with HCC are found to have cirrhosis when first diagnosed with chronic liver disease, often after a lack of long-term clinical care and appropriate treatment. In cirrhosis, normal liver tissue is progressively replaced by fibrous tissue, leading to the loss of functional liver cells and the development of HCC [6].
The mortality rate for HCC exceeds 30 cases per 10,000 population, and most cases are resistant to conventional chemotherapy and radiotherapy [7]. Chemotherapeutic agents currently in use include fluorouracil, doxorubicin, mitoxantrone, cisplatin, mitomycin C, epirubicin, interferon-alpha and tegafur; however, no curative regimen has been found to date. Drug responses and survival prolongation are usually minimal (a few months or less), and these poorly effective treatments are associated with significant morbidity [8]. Curative surgery is feasible for only about 30% of patients. Transarterial embolization or chemoembolization (TAE/TACE) has been shown to provide some survival benefit when tumors are confined to a localized area of the liver and there is no evidence of portal vein thrombosis [9]. Diagnosis at an early stage of HCC is therefore considered essential to allow effective clinical treatment and to increase the life expectancy of HCC patients. Screening tools have also been applied to early detection of HCC in high-risk populations, such as regular measurement of serum alpha-fetoprotein (AFP) or abdominal ultrasonography. Unfortunately, the poor sensitivity and specificity of AFP and the operator expertise required for ultrasonographic evaluation have limited their usefulness.
Combined use of AFP testing and ultrasonography has also been reported to increase false-positive rates [10]. Other tests are currently being evaluated and validated, including Lens culinaris agglutinin-reactive AFP and des-gamma-carboxyprothrombin (DCP) [11]. Because accurate and specific biomarkers for early-stage HCC are lacking, the disease is difficult to detect early and is usually diagnosed at a late, incurable stage. There is therefore an urgent need to discover biomarkers with prognostic potential, together with curative therapies, for the effective management of HCC.
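Simple arithmetic shows why combining AFP with ultrasonography can raise false-positive rates. The Python sketch below rests on two assumptions: the two tests are conditionally independent, and a subject is called positive if either test is positive. The sensitivity and specificity values are hypothetical and are not figures from the cited studies.

```python
"""Parallel ("either test positive") combination of two screening tests.

Assumes conditional independence of the tests given disease status;
all performance values used here are hypothetical.
"""


def combine_either_positive(sens_a: float, spec_a: float,
                            sens_b: float, spec_b: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when a subject is called positive
    if at least one of the two tests is positive."""
    # A diseased subject is missed only if both tests miss it.
    sensitivity = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
    # A healthy subject is called negative only if both tests are negative.
    specificity = spec_a * spec_b
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical single-test figures, for illustration only.
    sens, spec = combine_either_positive(0.60, 0.90, 0.70, 0.90)
    print(f"combined sensitivity = {sens:.2f}")
    print(f"combined false-positive rate = {1 - spec:.2f}")
```

With these made-up inputs, sensitivity rises to 0.88. The false-positive rate also rises, from 0.10 for either test alone to 0.19.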
Biological research increasingly emphasizes comprehensive global analysis of protein expression profiles for biomarker discovery. Even so, reliable, high-throughput, proteome-wide comparative analyses for some diseases were not possible until the advent of current proteomics instrumentation. Qualitative and quantitative studies using fast-evolving, state-of-the-art proteomics methods have provided a firm basis for understanding complex protein mixtures from sources such as tissues, cells, plasma and urine [12,13]. A central step in many proteomics strategies is determining the identity of a protein of interest (protein ID) from analytical "fingerprints", or peptide mass fingerprints (PMF). These are generated by digesting proteins with cleavage-specific enzymes such as trypsin or other well-characterized proteases. Tandem mass (MS/MS) spectra of the resulting peptide fragments are then compared against available sequence databanks to confirm protein ID. The "bottom-up" approach analyzes peptides from protein digestion directly by high-resolution nano-liquid chromatography coupled with tandem mass spectrometry (nanoLC-MS/MS). It has enabled so-called "shotgun proteomics" for identifying proteins in mixtures from any tissue of interest. Shotgun proteomics samples peptide fingerprints across the whole proteome in a random, statistical manner, analogous to shotgun sequencing in the decoding of the human genome. The acquired MS/MS spectra are compared algorithmically with spectra predicted from sequence databases to identify the corresponding proteins. This approach can characterize proteins directly from whole tissue or cell lysates [14][15][16].
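The digestion step behind bottom-up identification can be sketched in a few lines of Python. This minimal illustration rests on several assumptions: the common trypsin rule (cleave after K or R unless the next residue is proline), no missed cleavages, standard monoisotopic residue masses, and carbamidomethylated cysteine. The protein sequence is a made-up toy example, and search engines such as Mascot do far more than this, including probabilistic spectrum scoring.

```python
import re

# Monoisotopic residue masses (Da); C includes carbamidomethylation (+57.02146).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276


def tryptic_digest(protein: str) -> list[str]:
    """Cleave C-terminal to K or R unless followed by P; no missed cleavages."""
    # Zero-width split (Python 3.7+): position after K/R, not before P.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def neutral_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of a peptide (Cys carbamidomethylated)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    toy_protein = "GASPKLLEDRPAGKTR"  # hypothetical sequence, not a real protein
    for pep in tryptic_digest(toy_protein):
        m = neutral_mass(pep)
        print(f"{pep:<12} M = {m:.4f}  (2+) m/z = {mz(m, 2):.4f}")
```

The RP bond in "LLEDRPAGK" stays intact because of the proline rule. With no missed cleavages allowed, each protein therefore maps to a fixed, small set of candidate peptide masses.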
In this study, we aimed to establish an effective, high-throughput and noninvasive diagnostic platform for the early detection of HCC biomarkers. To this end, we characterized and compared urinary proteins between disease and control groups to identify potential biomarker candidates. We used gel-free shotgun proteomic analysis combined with stable isotope dimethyl labeling [12,16,17] and nanoLC-MS/MS [17][18][19]. The global proteomic analysis reported here is intended to lay a foundation for future validation and clinical translation of the identified biomarkers into targeted diagnosis and therapy for HCC.

Sample collection
All procedures used in this study were approved by the clinical research ethics committee of Kaohsiung Medical University Hospital. Urine was collected from patients diagnosed with HCC who had never undergone cholecystectomy (disease group). With their consent, urine was also collected from patients without HCC who had undergone cholecystectomy (normal control group). For each individual, 50 mL of urine was collected, concentrated by centrifugation, assayed for total protein concentration using Coomassie protein assay reagent, and stored at -80°C until analysis.

Dimethyl labeling and peptide preparation
Volumes of urine containing 100 μg of total protein were adjusted to 60 μL. They were treated with 0.7 μL of 1 DTT and 9.3 μL of 7.5% SDS at 95°C for 5 min for reduction. The lysates were then alkylated with 8 μL of 50 mM IAM at room temperature in the dark for 30 min. Proteins were precipitated by adding 52 μL of 50% TCA and incubating on ice for 15 min. After centrifugation at 13,000 × g for 5 min and removal of the supernatant, the protein pellets were washed with 150 μL of 10% TCA, vortexed and centrifuged at 13,000 × g for 10 min. The pellets were then washed three times with 250 μL of distilled H2O, each time with vortexing and centrifugation under the same conditions. The resulting pellets were resuspended in 50 mM NH4HCO3 (pH 8.5) and digested with 4 μg of trypsin for 8 h at 37°C. The digests were dried in a vacuum centrifuge to remove NH4HCO3. The lyophilized peptides from control and HCC urine were redissolved in 180 μL of 100 mM sodium acetate (pH 5.5). They were treated with 20 μL of 4% formaldehyde-H2 and 20 μL of 4% formaldehyde-D2, respectively [17][18][19], and mixed thoroughly. The mixtures were vortexed for 5 min, immediately followed by the addition of 10 μL of 0.6 M sodium cyanoborohydride and vortexing for 1 h at room temperature. The resulting solutions were acidified with 10% TFA/H2O to pH 2.0-3.0. For desalting, they were loaded onto an in-house reverse-phase C18 column pre-equilibrated with 200 μL of 0.1% TFA/H2O (pH 2.0-3.0). The column was washed with 200 μL of 0.1% TFA/H2O (pH 3.0). Peptides were eluted with a stepwise ACN gradient from 50% to 100% in 0.1% TFA at room temperature.
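Formaldehyde-H2 and formaldehyde-D2 add different masses, so each labeled peptide appears as a doublet. The doublet spacing depends on the number of primary amines: the peptide N-terminus plus each lysine side chain. The sketch below assumes both reactions use NaBH3CN as the reductant. Under that assumption, each dimethylated amine gains about 28.0313 Da with formaldehyde-H2 and about 32.0564 Da with formaldehyde-D2. The helper names and the example mass are hypothetical.

```python
# Per-site mass increments for reductive dimethylation with NaBH3CN.
H1 = 1.00782503207   # 1H monoisotopic mass
H2 = 2.01410177812   # 2H (deuterium) monoisotopic mass
C12 = 12.0

# Each primary amine gains two methyl groups. Each methyl adds CH2 (formaldehyde-H2)
# or CD2 (formaldehyde-D2); the remaining hydrogen comes from NaBH3CN.
LIGHT_PER_SITE = 2 * (C12 + 2 * H1)        # ~28.0313 Da
DEUTERATED_PER_SITE = 2 * (C12 + 2 * H2)   # ~32.0564 Da


def label_sites(peptide: str) -> int:
    """Primary amines available for labeling: the N-terminus plus each lysine.

    Simplification: an N-terminal proline (a secondary amine) would only be
    monomethylated; this sketch ignores that case.
    """
    return 1 + peptide.count("K")


def labeled_masses(unlabeled_mass: float, peptide: str) -> tuple[float, float]:
    """Return (light, deuterated) neutral masses of a dimethyl-labeled peptide."""
    n = label_sites(peptide)
    return (unlabeled_mass + n * LIGHT_PER_SITE,
            unlabeled_mass + n * DEUTERATED_PER_SITE)


if __name__ == "__main__":
    # Hypothetical unlabeled neutral mass for a lysine-terminated peptide.
    light, heavy = labeled_masses(1000.0, "LLEDRPAGK")
    spacing = heavy - light
    print(f"light {light:.4f}  deuterated {heavy:.4f}  delta {spacing:.4f} Da")
    print(f"doublet spacing at 2+: {spacing / 2:.4f} m/z")
```

A lysine-terminated tryptic peptide carries two labels and shows a spacing of about 8.05 Da. An arginine-terminated peptide carries one label and shows about 4.03 Da.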

NanoLC-MS/MS analysis
The lyophilized peptides were reconstituted in 10 μL of 0.1% FA in H2O and analyzed on an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, San Jose, CA). Reverse-phase nanoLC separation was performed on an Agilent 1200 series nanoflow system (Agilent Technologies, Santa Clara, CA). A total of 10 μL of each collected fraction was loaded onto an Agilent Zorbax XDB C18 precolumn (0.35 mm, 5 μm). It was then separated on an in-house C18 column (75 μm i.d. × 15 cm, 3 μm). The mobile phases were (A) 0.1% FA in water and (B) 0.1% FA in 100% ACN. A linear gradient from 5% to 95% B over 70 min was applied at a flow rate of 300 nL/min. Peptides were analyzed in positive ion mode with a voltage of 1.8 kV applied to the injection needle. The mass spectrometer was operated in data-dependent mode, with full scans (m/z 400-1600) acquired in the Orbitrap at a scan rate of 30 ms/scan. Fragmentation was performed in CID mode with a collision energy of 35 V. A repeat duration of 30 s was applied to exclude ions of the same m/z from reselection for fragmentation. Xcalibur software (version 2.0.7, Thermo Fisher Scientific, San Jose, CA) was used for instrument control, data acquisition and data processing.
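The 30 s repeat-duration exclusion can be illustrated with a simplified data-dependent selection loop. In this Python sketch, the 30 s exclusion window and the m/z 400-1600 survey range come from the text. Two values are hypothetical because the text does not state them: the number of precursors picked per cycle (`top_n`) and the 10 ppm exclusion width. This is not the instrument's actual acquisition logic.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float
    intensity: float


def select_precursors(survey_scans, top_n=3, exclusion_s=30.0,
                      tol_ppm=10.0, mz_range=(400.0, 1600.0)):
    """Simplified data-dependent precursor selection with dynamic exclusion.

    survey_scans: list of (retention_time_s, [Peak, ...]) in acquisition order.
    Returns (retention_time_s, precursor_mz) pairs chosen for CID MS/MS.
    top_n and tol_ppm are hypothetical; exclusion_s and mz_range follow the text.
    """
    exclusions = []  # (excluded m/z, time at which the exclusion expires)
    selected = []
    for rt, peaks in survey_scans:
        # Drop exclusions whose window has elapsed.
        exclusions = [(m, t_end) for m, t_end in exclusions if t_end > rt]
        in_range = [p for p in peaks if mz_range[0] <= p.mz <= mz_range[1]]
        picked = 0
        # Consider the most intense ions first.
        for p in sorted(in_range, key=lambda peak: peak.intensity, reverse=True):
            if picked == top_n:
                break
            if any(abs(p.mz - m) / m * 1e6 <= tol_ppm for m, _ in exclusions):
                continue  # fragmented recently: skip to avoid redundant MS/MS
            selected.append((rt, p.mz))
            exclusions.append((p.mz, rt + exclusion_s))
            picked += 1
    return selected


if __name__ == "__main__":
    scans = [
        (0.0, [Peak(500.25, 1e6), Peak(350.00, 9e6), Peak(700.30, 5e5)]),
        (10.0, [Peak(500.2501, 2e6)]),  # within 10 ppm of 500.25: still excluded
        (31.0, [Peak(500.25, 1e6)]),    # exclusion expired: selectable again
    ]
    for rt, mz in select_precursors(scans):
        print(f"t = {rt:5.1f} s  MS/MS on m/z {mz:.4f}")
```

The exclusion window lets lower-abundance peptides be sampled instead of repeatedly fragmenting the same intense ion as it elutes.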

Protein database search and characterization
Peak lists generated from the nanoLC-MS/MS spectra were searched for exact matches against the Homo sapiens entries of the Swiss-Prot database. The search used the Mascot search program (http://www.matrixscience.com) [23,24] with the following parameters:
- precursor ion mass tolerance: 10 ppm
- fragment ion mass tolerance: 0.8 Da
- missed cleavages allowed for trypsin: none
- fixed modification: carbamidomethyl cysteine
- quantification method: dimethylation
- optional (variable) modifications: oxidized methionine and deamidated asparagine/glutamine

Peptides were considered positively identified if their Mascot individual ion score was higher than 20 (p<0.05).
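The tolerance criteria and the score cutoff can be expressed as a simple filter. The Python sketch below uses the values from the text: 10 ppm precursor tolerance, 0.8 Da fragment tolerance, and an ion-score cutoff of 20. The `PSM` record and its fields are hypothetical, and the sketch does not reimplement Mascot's own probability-based scoring.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    observed_mass: float      # measured neutral precursor mass (Da)
    theoretical_mass: float   # neutral mass calculated from the sequence (Da)
    ion_score: float          # Mascot individual ion score


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed precursor mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def matched_fragments(observed_mz, theoretical_mz, tol_da=0.8):
    """Count theoretical fragment ions with an observed peak within tol_da."""
    return sum(any(abs(o - t) <= tol_da for o in observed_mz) for t in theoretical_mz)


def accept(psm: PSM, precursor_tol_ppm=10.0, min_score=20.0) -> bool:
    """Keep a match if its precursor error is within tolerance and its score exceeds the cutoff."""
    within_tol = abs(ppm_error(psm.observed_mass, psm.theoretical_mass)) <= precursor_tol_ppm
    return within_tol and psm.ion_score > min_score


if __name__ == "__main__":
    psm = PSM("TR", 275.1610, 275.159355, 25.0)  # hypothetical values
    print(f"error {ppm_error(psm.observed_mass, psm.theoretical_mass):.2f} ppm, accepted: {accept(psm)}")
```

The precursor check is relative (ppm) because Orbitrap full scans are high-resolution. The fragment check is absolute (Da) because the CID fragments are typically read out at lower resolution.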
Peptide quantification ratios (D/H) between control urine (hydrogen labeling) and HCC urine (deuterium labeling) were then calculated with the Mascot Distiller program (version 2.3, Matrix Science Ltd., London, U.K.). The calculation used the average area of the first three isotopic peaks across the elution profile. The program also merged the Mascot search data and quantification results from each fraction. It combined the ratios of peptides with the same sequence obtained from different fractions or at different retention times and charge states [17]. The up- and down-regulated proteins were further categorized by biological process and molecular function using the PANTHER classification system (http://www.pantherdb.org), as described previously [25][26][27].
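The ratio calculation and the merging of repeated observations can be sketched as follows. As described in the text, the D/H ratio is the mean area of the first three isotopic peaks of the deuterium-labeled form divided by that of the hydrogen-labeled form. Merging repeated observations of the same sequence by their median is an assumption for illustration and is not necessarily the rule Mascot Distiller applies.

```python
from collections import defaultdict
from statistics import median


def dh_ratio(h_areas, d_areas, n_peaks=3):
    """D/H ratio from the mean area of the first n isotopic peaks of each form."""
    if len(h_areas) < n_peaks or len(d_areas) < n_peaks:
        raise ValueError("need at least n_peaks isotopic peaks for each label")
    mean_h = sum(h_areas[:n_peaks]) / n_peaks
    mean_d = sum(d_areas[:n_peaks]) / n_peaks
    if mean_h == 0:
        raise ValueError("hydrogen-labeled signal is zero; ratio undefined")
    return mean_d / mean_h


def merge_by_sequence(observations):
    """Merge (sequence, fraction, charge, ratio) observations per peptide sequence.

    Taking the median across fractions and charge states is an assumption.
    """
    grouped = defaultdict(list)
    for sequence, _fraction, _charge, ratio in observations:
        grouped[sequence].append(ratio)
    return {seq: median(ratios) for seq, ratios in grouped.items()}


if __name__ == "__main__":
    # Hypothetical peak areas: the fourth isotopic peak is ignored.
    print(dh_ratio([100, 80, 40, 10], [200, 160, 80, 5]))
    obs = [("LLEDRPAGK", 3, 2, 1.8), ("LLEDRPAGK", 4, 2, 2.0),
           ("LLEDRPAGK", 4, 3, 2.4), ("TR", 7, 1, 0.5)]
    print(merge_by_sequence(obs))
```

Using only the first three isotopic peaks keeps the envelopes of the two labels apart, given their spacing of at least about 4 Da.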

Construction of signaling pathways and network analysis of protein interaction
Ingenuity Pathways Analysis software (IPA; Ingenuity Systems, Redwood City, CA; www.ingenuity.com) was used to derive pathways and protein interaction networks. Proteins characterized by the proteomic analysis were analyzed for their association with the canonical pathways deposited in the IPA library.
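Pathway association of this kind is often scored with an over-representation test. The Python sketch below computes a right-tailed hypergeometric (one-sided Fisher's exact) p-value for observing at least `k` of the query proteins in a pathway. It is a generic illustration, not IPA's proprietary implementation, and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Right-tailed hypergeometric p-value P(X >= k).

    N: proteins in the background, K: background proteins annotated to the pathway,
    n: proteins in the query list, k: query proteins annotated to the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical: 21 query proteins, 3 of them in a 50-member pathway,
    # against a 20,000-protein background.
    print(f"p = {enrichment_p(3, 21, 50, 20000):.2e}")
```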

Protein expression levels analyzed by nanoLC-MS/MS
Quantitative proteome analysis by a shotgun approach coupled with stable isotope dimethyl labeling has been used to identify candidate biomarkers or target factors in different types of samples. It is suited to this purpose because it can detect differentially released proteins at relatively low abundance [28][29][30][31]. In this study, we used a bottom-up shotgun approach to compare the urine proteomes of HCC patients and a control group. Figure 1 shows a schematic of the workflow: sample processing, separation, trypsin digestion, dimethyl labeling and shotgun analysis. First, 100 μg of total urinary protein from each of the HCC and control groups was subjected to trypsin digestion and dimethyl labeling. The labeled tryptic peptides were mixed in a 1:1 (w/w) ratio and enriched on the reverse-phase C18 column. The enriched peptide mixture was too complex to be fully characterized in a single LC-MS/MS run, so it was fractionated by HILIC according to polarity into 10 fractions. Each fraction was analyzed by LC-LTQ-Orbitrap, and the database search allowed no missed cleavages. Most peptides were recovered in a single HILIC fraction or in two adjacent fractions. Peptides identified by the Mascot search program (http://www.matrixscience.com) [23,24] were accepted if their individual ion score was higher than 20. This cutoff has previously been used for lower-quality MS/MS spectra [32][33][34] (Supplementary information).

Quantification of identified proteins with differential expression
The differentially released proteins were first confidently identified on the basis of dimethyl labeling, enzymatic digestion and peptide mass fingerprinting (PMF). Mascot Distiller then calculated the peptide quantification ratio (D/H) from the average area of the first three isotopic peaks across each elution profile [17,18,35]. Mascot Distiller also merged the Mascot search results and the quantification results from each fraction into one file. In doing so, it combined the ratios of peptides with the same sequence that were detected in different fractions or at different retention times and charge states. We identified 14 up-regulated (D/H ratio ≥ 1.5) and 7 down-regulated (D/H ratio ≤ 0.6) proteins, each observed in at least four of the seven comparative HCC-versus-control urine samples; they are listed in Table 1 Figure 5. Therefore, these two proteins show reduced release among patients. These data further indicate that no isotope effect was observed in the two-dimensional HILIC-C18 separation, reflecting the excellent separation of dimethylated peptides on the HILIC column. The high orthogonality of HILIC and reverse-phase C18 chromatography likely contributed substantially to this efficiency.
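The selection rule can be written as a small classifier. The sketch assumes each protein has one D/H ratio per HCC-versus-control comparison, with `None` where the protein was not quantified. A protein is called up- or down-regulated when it passes the corresponding threshold in at least four of the seven comparisons. The function name is hypothetical.

```python
def call_regulation(ratios, up=1.5, down=0.6, min_hits=4):
    """Classify one protein from its D/H ratios across paired comparisons.

    ratios: one D/H value per HCC-versus-control comparison, None if missing.
    Returns "up", "down" or "unchanged".
    """
    quantified = [r for r in ratios if r is not None]
    n_up = sum(r >= up for r in quantified)
    n_down = sum(r <= down for r in quantified)
    # With seven comparisons and min_hits=4, a protein cannot qualify for both calls.
    if n_up >= min_hits:
        return "up"
    if n_down >= min_hits:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical ratio profiles across seven comparisons.
    print(call_regulation([1.6, 2.0, 1.5, 1.7, 1.1, None, 0.9]))
    print(call_regulation([0.5, 0.6, 0.4, 0.55, 1.0, 1.2, 1.3]))
```

Requiring agreement in a majority of comparisons guards against a single outlying sample pair driving the call.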
All of these differentially expressed proteins were further categorized using the PANTHER classification system [25,27,36], and their functional distributions are shown in Figure 6. Notably, binding proteins accounted for 28% and proteins with catalytic activity for 24% of the proteins with increased release in the urine of HCC patients (Figure 6A). Receptors accounted for as much as 37% of the proteins with reduced release (Figure 6B). These differentially released proteins were also associated with a variety of biological processes, such as cellular processes (10%) and metabolic processes (21%) (Figure 6C). A high proportion of the proteins with reduced release were involved in immune system processes (12%) (Figure 6D). These observations suggest that the decreased and increased expression of these proteins may be involved in the pathological processes of carcinogenesis and differentiation of HCC cells.

Construction of signaling pathways and network analysis of protein interaction
We further clustered this panel of identified proteins into a putative interaction network based on their biochemical categorization. The network suggests a scheme for the signaling pathways that may govern the maintenance and progression of the carcinogenic state in liver tissue. In Figure 7, the identified proteins (shown in red) mapped to canonical pathways from the Ingenuity Pathways Analysis (IPA, Ingenuity Systems) databank are displayed with different shapes to indicate their diverse functions. Proteins shown in white were drawn from the literature and the canonical pathway database on the basis of their functional annotations. They were included in the association analysis and in the simulation of possible molecular interactions with our identified proteins. Gray arrows designate biological interrelationships between molecules, and every arrow is supported by at least one reference from the literature, textbooks, or canonical information stored in the Ingenuity Knowledge Base. Several proteins with increased release are involved to some extent in inflammatory responses: pro-epidermal growth factor (EGF), kininogen-1 (KNG1), beta-2-glycoprotein 1 (APOH) and polymeric immunoglobulin receptor (PIGR). In contrast, basement membrane-specific heparan sulfate proteoglycan core protein (HSPG2) and ribonuclease (RNASE1) were categorized as being involved in the proliferation of cancer cells. Thus, HCC appears to be characterized not by a single, clear-cut enzymatic or cytoskeletal alteration but by a series of complex and diverse functional changes.
Several proteins identified by our shotgun approach could not be mapped to canonical pathways in the database because they were not linked to any functional interaction. Even so, these unmapped proteins with up-regulated release may be important and should not be overlooked. Their suitability as candidate biomarkers will be assessed in subsequent verification and validation phases, in which their peptides will be measured and quantified in multiple reaction monitoring (MRM) mode by nanoLC-MS/MS. In addition, a much larger number of matched sample pairs will be essential to discriminate the subtle yet crucial differences in released proteins between HCC patients and their normal counterparts. This pilot study completed the initial biomarker discovery phase with a limited number of sample pairs. We are now embarking on the second phase, which will verify the identified candidate marker proteins in an expanded collection of urine samples from patients with HCC and with other diseases [37,38].
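In the planned MRM verification, each candidate peptide is monitored as a set of precursor-to-fragment transitions. The Python sketch below lists singly charged y-ion transitions for a tryptic peptide. It assumes unlabeled peptides, monoisotopic masses, carbamidomethylated cysteine, and y3 and longer fragments as candidate product ions. The peptide is hypothetical, and choosing the final transitions would require experimental optimization that is not described here.

```python
# Monoisotopic residue masses (Da); C includes carbamidomethylation (+57.02146).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 160.03065, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_mz(peptide: str, length: int, charge: int = 1) -> float:
    """m/z of the y ion containing the C-terminal `length` residues."""
    fragment = peptide[-length:]
    return (sum(RESIDUE_MASS[aa] for aa in fragment) + WATER + charge * PROTON) / charge


def y_transitions(peptide: str, precursor_charge: int = 2, min_length: int = 3):
    """(Q1 precursor m/z, ion label, Q3 product m/z) for y_min_length to y_(n-1)."""
    q1 = precursor_mz(peptide, precursor_charge)
    return [(q1, f"y{i}", y_ion_mz(peptide, i)) for i in range(min_length, len(peptide))]


if __name__ == "__main__":
    for q1, ion, q3 in y_transitions("LLEDRPAGK"):  # hypothetical peptide
        print(f"{q1:.4f} -> {q3:.4f}  ({ion})")
```

Dimethyl labels, if retained for verification, would shift the precursor and every y ion that contains a labeled lysine. That shift is not modeled here.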

Conclusion
Hepatocellular carcinomas (HCCs) encompass diverse etiologies and pathological manifestations, together with heterogeneous genomic alterations, which make the disease highly complex and difficult to treat. Collectively, HCC severity involves a variety of protein factors that play regulatory roles in the metabolic coordination of physiological functions. The systematic decrease and increase of these proteins may reflect the dysfunction of liver cells, followed by morphological aberrations during the progressive development of HCC. The comparative urinary proteome data may offer a novel approach to understanding the mechanism(s) and associated metabolic signaling pathways underlying liver carcinogenesis. They may also yield valuable biomarker candidates for noninvasive diagnosis and prognosis.

Figure: Schematic representation of derived pathways associated with liver cancer and inflammation. Networks of the identified proteins mapped to canonical pathways in the Ingenuity Pathways Analysis (IPA, Ingenuity Systems) library were used to analyze proteins with increased or decreased expression. Identified proteins are shown in red, with different shapes indicating different functions. Biological interrelationships between molecules are represented as arrows. Each arrow is supported by at least one reference from the literature, textbooks, or canonical information stored in the Ingenuity Knowledge Base. Fx denotes function.