Progress and Clinical Applications in Proteomics


Introduction
The innovative "omics" technologies such as genomics, transcriptomics, proteomics, and metabolomics have greatly contributed to biomedical discovery and advances. A single gene can engender multiple protein products as a consequence of modulation in the processes of protein production from DNA such as transcription, processing and translation. In addition, protein modifications such as phosphorylation, dephosphorylation, glycosylation, acetylation, sulfation, hydroxylation, carboxymethylation and prenylation occur in vivo. Furthermore, it has been reported that the correlation between mRNA and protein levels was not sufficient to predict protein expression levels from mRNA information [1,2].
Proteomics analysis allows hundreds of proteins to be identified and quantified with high speed and sensitivity, and also aids global analyses of protein function, modifications, composition and dynamics. The term proteome was first proposed by Wilkins et al. in 1995 to define the expressed protein complement of a genome [3,4]. Proteomes are presumed to include over 13 million different proteins across all species and over two million different proteins in humans [5,6]. Proteomics research investigates the entire protein content of various organisms; in other words, it is functional genomics at the level of proteins. Proteomics-based approaches have used human plasma [7], serum [8], urine [9], cerebrospinal fluid [10], nipple aspirate fluid [11], ductal lavage [12], amniotic fluid [13], bile [14], lymph [15], breast milk [16], mucus [17], pleural fluid [18], saliva [19], tears [20], and various tissues and cells [21] as protein sources. Thus, the clinical application of this approach (clinical proteomics) can be used to identify specific disease markers, biomarkers, and drug and therapeutic targets. Using a proteomic approach, serum levels of alpha B-crystallin and tropomyosin were reported to be significantly higher in cardiac allograft patients undergoing rejection than in patients who remained free from rejection [22]. Mancone et al. combined quantitative proteomics and computational molecular biology to study the onset of liver steatosis in patients with hepatitis C virus (HCV) infection; their findings may provide a new therapeutic approach for HCV [23]. Alexander et al. suggested, using proteome analysis, that the expression of alpha1-acid glycoprotein and gross cystic disease fluid protein-15 correlated with disease presence and stage in breast cancer patients [24].
This review article focuses on current proteomic approaches, including their progress and limitations, and on actual trial data to detect novel biomarkers in patients with Kawasaki disease using proteomics.

Proteomic Methods
Proteomics makes feasible the analysis of kinetic data such as the expression and localization of proteins, of post-translational modifications such as phosphorylation, and of protein-protein interactions. Recently, mass spectrometry (MS)-based proteomics has become an increasingly popular mainstream technique for disease-related research, such as the elucidation of pathogenic mechanisms and the discovery of disease biomarkers. MS-based proteomics can be divided into two main classifications: the discovery approach and the verification approach. The discovery approach aims mainly to detect and identify biomarker proteins and useful target proteins for the development of new therapies and drugs, while the verification approach aims to assess the validity of candidate biomarker proteins and target proteins in population-based studies. It is very important that the candidate peptides or proteins identified in the discovery approach are evaluated in enlarged populations and samples, which should result in promising future clinical applications.

Discovery and Verification Approaches
The combination of 2-dimensional electrophoresis (2-DE) to separate the proteins in a sample and mass spectrometry (MS) technologies to identify proteins has frequently been used for the identification and quantification of proteins. In the medical field, this technique has been used in clinical proteomics, making possible global studies of significant differences between healthy individuals and patients, and between normal and diseased conditions in cells, tissues and organisms. Such studies may result in novel biomarkers for diagnosis and disease monitoring, and in the development of new therapies and drugs. While this approach has been successful so far, some issues remain, such as the poor solubility of surface hydrophobic proteins, the limited separation range, and the limited sensitivity for detection and separation of proteins.
Shotgun proteomics analysis does not use 2-DE, yet it also allows hundreds to thousands of proteins to be identified and quantified from complex samples. This method typically uses high-throughput liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) analysis of proteins digested by enzymes such as trypsin [25-27]. Shotgun proteomics has a number of advantages: independence from 2-DE; analysis of basic proteins; analysis of high molecular weight proteins (>120 kDa); and detection of low-abundance proteins [28]. Dasari et al. developed Pepitome, a new spectral library search tool, for the identification of peptides by comparing experimental MS/MS scans to those in spectral libraries. This tool makes the automation of quality analysis and quality control for shotgun proteomics data possible [29]. Despite advances such as high throughput, high sensitivity and precision, these methods have some disadvantages. Shotgun proteomics is unsatisfactory for the global analysis of proteins and their functions, because of the number and variety of proteins that are estimated to be produced from the human genome. One reason for this is the lack of accumulated information in protein and peptide databases for identification and for functional and structural analysis. Thus, there is still a need to continue the development of valuable and efficient tools for proteomics analysis.
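The enzymatic digestion step that precedes LC-MS/MS can be illustrated with a minimal in-silico sketch. It assumes the commonly used simplified trypsin rule (cleavage C-terminal to lysine or arginine unless the next residue is proline); the function name, the `missed_cleavages` parameter and the example sequence are hypothetical and serve only as an illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in-silico trypsin digest.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    # Positions at which the chain is cut (index of the residue after the cut).
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]

    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


# Hypothetical sequence used only to show the behaviour of the rule.
example = "MKWVTFRPLLKAR"
fully_cleaved = tryptic_digest(example)
with_one_missed = tryptic_digest(example, missed_cleavages=1)
```

In this example the arginine followed by proline is not cleaved, so the middle peptide stays intact; real search engines apply additional, enzyme-specific rules.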
A variety of labeling approaches including protein labeling, Isotope-Coded Affinity Tags (ICAT) [30], Isobaric Tags for Relative and Absolute Quantification (iTRAQ) [31,32], and Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC) [33,34] are valuable techniques in proteomic analysis. Two-dimensional Difference Gel Electrophoresis (2D-DIGE) methods such as the protein tagging-induced approach were developed from conventional 2-DE analysis by using fluorescent dyes such as Cy3 and Cy5, and this has improved the sensitivity of detection and the reproducibility in MS-based proteomics [35-37]. In addition, 2D-DIGE allows a third dye, Cy2, to be used in multiple sample comparisons [38]. In 2D-DIGE, proteins extracted from two or three samples are labeled directly with different fluorescent dyes and separated by 2-DE, which makes it possible to visually assess the proteins, including identical proteins from different samples, on the same 2D gel. Thus, dynamic changes in the levels of total or individual proteins related to various states can be easily identified, allowing a more sensitive and reproducible detection of differences. In addition, it has been reported that 2D-DIGE makes the analysis of trace amounts of protein (0.5 fmol) possible, and that this method is linear over a 10,000-fold concentration range [39,40]. Despite its many advantages, some problems remain to be solved; for example, the solubility of surface hydrophobic proteins, labeling with amine-reactive DIGE dyes, and the limited dynamic range [37].

In the ICAT approach, biotinylated cysteine-containing peptides are selectively isolated. This approach can also be used for the quantification of differences in protein expression in cells and tissues by isotope dilution techniques. Two samples are labeled with light and heavy ICAT reagents, respectively, mixed, and degraded to peptide fragments by enzymatic reactions [30]. ICAT-labeled peptides are isolated by avidin affinity chromatography and analyzed by MS and MS/MS, which makes quantitative analysis and protein identification by sequence information possible. However, ICAT reagents are specific for cysteine residues and can only be used to analyze proteins that contain a cysteine residue. In the SILAC approach, cell samples to be compared are grown separately in media containing either the heavy or light form of an essential amino acid. Although this approach is based on a simple labeling process, it has been shown to be a powerful tool for quantitative proteomics. However, SILAC requires cells to be metabolically active to allow for the incorporation of the stable isotope [39]. The iTRAQ reagents can code the amino-termini and lysine residues of all the peptides in digests of a proteome through acylation [41]. When combined with an MS-based approach, iTRAQ labeling techniques can be of great benefit, because not only is the sensitivity for protein identification and quantitation enhanced but post-translational modification events can also be investigated. Furthermore, the information acquired from this approach is important for providing additional statistical validation in MS-based proteomics.
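Relative quantification with paired light and heavy labels, as in ICAT or SILAC, is often summarized as a heavy-to-light intensity ratio per protein. The following is a minimal sketch under the assumption that peak intensities for each light/heavy peptide pair have already been extracted; the function name, the use of the median, and the intensity values are illustrative choices, not a method taken from the cited studies.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Summarize a protein's relative abundance from labeled peptide pairs.

    peptide_pairs: list of (light_intensity, heavy_intensity) tuples.
    Returns the median log2(heavy / light), or None if no pair is usable.
    """
    ratios = [math.log2(heavy / light)
              for light, heavy in peptide_pairs
              if light > 0 and heavy > 0]   # skip pairs with a missing partner
    return median(ratios) if ratios else None


# Hypothetical intensities for three peptides of one protein.
pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 80.0)]
log2_ratio = protein_log2_ratio(pairs)   # median of 1, 1 and 3
```

The median is used here only because it is less sensitive to a single outlying peptide than the mean; other summary statistics are equally possible.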
In the verification approach, which follows the discovery approach, candidate proteins or peptides that have been identified must be validated on a large scale. Recent developments in the verification approach have led to a number of advances, not only in the quantitative, sensitive and high-throughput assessment of protein abundance but also in high reproducibility and wide dynamic range. Proteomic studies still have limitations, such as the identification and determination of low-abundance proteins; however, powerful techniques to overcome some of these limitations are becoming available. In this review, we focus on the targeted proteomics approach based on multiple reaction monitoring (MRM), also called selected reaction monitoring (SRM) [42-45]. MRM does not involve a peptide identification step. The MRM method is a quantitative analysis approach using triple quadrupole (QQQ) MS, where the first and third quadrupoles act as filters to select a particular peptide ion and a specific fragment ion of the peptide, respectively. The combination of m/z settings for the two quadrupoles is called a "transition". Thus, the MRM method aims to specifically detect peptides of interest. Generally, the combination of QQQ-based MRM and High-Performance Liquid Chromatography (HPLC) makes quantitative detection with good reproducibility possible. An advantage of the MRM method is that antigen-specific antibodies are not required for detection and that multiple proteins can be determined in one run.

However, a comprehensive quantitative analysis of proteins using the MRM method remains to be established. One reason for this is the lack of prior information about the peptides required for MRM analysis, including information about the selection of optimal MRM transitions and the measurement of liquid chromatography retention times, which are essential to the success of the analysis. Retention times can be obtained from previous experiments or from tools including the sequence-specific retention calculator (SSRCalc) [46]. However, because the human proteome contains a large number of proteins or protein products, the accumulation of further experimental data and the compilation of databases from this information are essential for the success of the MRM method. Regarding the selection of optimal MRM transitions, the number of transitions per peptide is limited (generally 3-8 transitions per peptide [47]) and the best ones for obtaining a high-sensitivity analysis are selected. Selection can also depend on accumulated empirical data, such as data from the shotgun proteomics approach and from QQQ instruments.
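To make the notion of a transition concrete, the sketch below computes the two m/z settings (precursor ion for the first quadrupole, y-type fragment ion for the third) from standard monoisotopic residue masses. It only enumerates candidate transitions; choosing the best ones still depends on empirical intensity data, as noted above. The function names, the default charge state and the example peptide are assumptions made for illustration, and modifications are not considered.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # mass of H2O added to the summed residues
PROTON = 1.00728    # mass of one proton


def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide ion selected in the first quadrupole."""
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(peptide, n):
    """m/z of the singly charged y-ion containing the last n residues."""
    return sum(MONO[aa] for aa in peptide[-n:]) + WATER + PROTON


def candidate_transitions(peptide, charge=2, min_len=3):
    """List (label, Q1 m/z, Q3 m/z) for y-ions of at least min_len residues."""
    q1 = precursor_mz(peptide, charge)
    return [(f"y{n}", round(q1, 4), round(y_ion_mz(peptide, n), 4))
            for n in range(min_len, len(peptide))]


# Hypothetical peptide sequence used only as an illustration.
transitions = candidate_transitions("PEPTIDE")
```

Each tuple pairs one precursor m/z with one fragment m/z; in practice, such candidates would be ranked by measured signal intensity and checked for interference before 3-8 of them are retained.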

Sample preparation
Major factors for successful proteomic analysis are to select appropriate samples for the study and to store them under optimum conditions. For instance, in studies that use cells or cell lines, the results are definitely related to the origin of the cells, and their passage number and culture environment. In particular, in clinical proteomics such as biomarker discovery, caution must be exercised in sample preparation as follows: (i) selection of an adequate source of samples such as injured cells, tissues, organs and their sub-cellular localization; (ii) different expression levels and kinds of proteins, even within one type of cell or cell line; (iii) differences in the historical and genomic background, such as ethnicity and/or diseased state, of the cells or cell lines that are used; and (iv) protein structure variations under different physiological conditions such as by endogenous enzyme activity. These factors at the very least greatly affect reproducibility and sensitivity of proteomics approaches. In addition, possible sample contamination needs to be considered. To help circumvent some of these difficulties, the selection and preservation of samples must be done carefully and monitored using standardized protocols. In addition, validation of the approach in several different cells or cell lines is required in advance.

Protein separation by 2-DE
Briefly, proteins are separated on the basis of charge (iso-electric point) in the first dimension, and then on the basis of molecular mass in the second dimension. As an example, a Coomassie brilliant blue- stability. Typically, this approach can be used to detect more than 1000 protein spots in a gel by staining with silver. High molecular weight (>150 kDa) proteins and strongly basic or hydrophobic proteins are difficult to separate [48]. In addition, low-abundance proteins or a limited protein load will result in a decreased number of identifications. To address these problems, several techniques have been developed. DeStreak rehydration solution transforms the protein thiol groups into stable disulfides and protects the disulfide groups from unspecific oxidation, which can help improve separation and reduce streaking between spots in the pH range 7-9. The albumin removal kit can improve the resolution of lower abundance proteins by allowing increased protein loads on immobilized pH gradient (IPG) gels; however, it may result in the loss of albumin-binding proteins. Also, 2-DE using an agarose iso-electric focusing (IEF) gel in the first dimension (agarose 2-DE) has been reported [48] as a tool to overcome these limitations and improve detection ability. Oh-ishi et al. [48] demonstrated advantages of the agarose 2-DE method, such as a 10-fold increase in protein load compared with the 2-DE method using IPG gels. In addition, some protein spots with basic pH values or high molecular mass (>150 kDa) could be detected or analyzed only by using the agarose 2-DE method. However, the agarose 2-DE method may suffer from low reproducibility because of technical difficulties such as the preparation of agarose IEF gels.

Protein identification
Since the 1980s, MS for the analysis of proteins and peptides has developed rapidly. MS plays a major role in protein identification, for example by Peptide Mass Fingerprinting (PMF). MS analysis can be used for the efficient identification of proteins, the analysis of their dynamics, and the detection of post-translational modifications. In addition, this method is sensitive at femtomole to attomole concentrations and is a powerful tool for high-throughput analysis [49,50]. Therefore, the development of MS has largely contributed to the development of proteomic approaches. Different methods for the ionization of proteins or peptides are available, such as Matrix-Assisted Laser Desorption Ionization (MALDI) from the solid state [51] and Electrospray Ionization (ESI) from the liquid state [52]. The selection of the most appropriate MS method for the identification or analysis of samples is crucial because of their individual properties. Various methods, such as MALDI Time-of-Flight (TOF) MS, which uses a TOF analyzer that distinguishes molecules by their arrival times at a detector, and ESI combined with quadrupole MS [53] or ion trap MS [54], are being developed to overcome some of the main problems of the proteomics approach. MALDI-TOF MS provides high throughput, high automation and good sensitivity, and allows various types of samples such as serum, saliva, urine and cerebrospinal fluid to be analyzed. Thus, the MALDI-TOF MS-based approach is at the core of clinical proteomics aimed at identifying biomarkers; on the other hand, the ESI-MS-based approach is a powerful tool for the analysis and characterization of polypeptides in clinical proteomics [55]. Tandem MS (MS/MS) has the advantage of using two-stage MS devices and is particularly useful for the measurement of complex samples because of the increased ion information derived from target materials regardless of differences in ionization or sample states. The MS/MS method has also been reported to be able to identify cross-linked peptide candidates in complexes rapidly and sensitively by using a combination of chemical cross-linking and (18)O-labeling [56].
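The way a TOF analyzer separates ions by arrival time can be illustrated with the idealized relation t = L * sqrt(m / (2 z e U)) for an ion of mass m and charge z accelerated through a potential U and drifting over a field-free length L. The sketch below is a simplification: it ignores initial velocity spread, delayed extraction and reflectron geometry, and the accelerating voltage and drift length are assumed values, not parameters of any instrument discussed here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per dalton


def flight_time(mz, voltage=20000.0, length=1.0, charge=1):
    """Idealized flight time (seconds) of an ion in a linear TOF analyzer.

    mz      : mass-to-charge ratio of the ion (Da per elementary charge)
    voltage : accelerating potential in volts (assumed value)
    length  : field-free drift length in metres (assumed value)
    """
    mass_kg = mz * charge * DALTON
    kinetic_term = 2 * charge * ELEMENTARY_CHARGE * voltage
    return length * math.sqrt(mass_kg / kinetic_term)


# Flight time scales with the square root of m/z, so an ion with four times
# the m/z takes twice as long to reach the detector.
t_small = flight_time(1000.0)
t_large = flight_time(4000.0)
```

With these assumed parameters a singly charged ion of m/z 1000 arrives after roughly 16 microseconds, which illustrates why a TOF measurement is fast enough for high-throughput work.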

Data analysis
Bioinformatic tools are essential for identifying proteins based on MS, MS/MS and other proteomic approaches, and are used in the quantitative analysis of differential patterns of protein expression in 2-DE such as 2D-DIGE [57].
Accessible databases and database tools such as Mascot Search [58], SEQUEST [59], X! Tandem [60], ExPASy [61] and Phenyx [62] are frequently used. Mascot Search supports peptide mass fingerprint searches based on peptide mass values from an enzymatic digest of a protein, as well as sequence queries and MS/MS ion searches for identification based on raw MS/MS data. SEQUEST is a software algorithm for the analysis of peptide MS/MS spectra that can be used to determine the amino acid sequence, and thereby the protein and organism, that corresponds to the mass spectrum being analyzed. X! Tandem is a search algorithm that matches MS/MS spectra with peptide sequences, which can then be used for protein identification. ExPASy is one of the main bioinformatics resources for protein sequences and identification, protein characterization and function, post-translational modification, protein structure and protein-protein interaction. Phenyx software can also be used to identify and characterize proteins and peptides from MS and MS/MS data.
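The principle behind a peptide mass fingerprint search can be sketched as follows: observed peptide masses are compared with the theoretical digest masses of each candidate protein, and candidates are ranked by the number of matches within a mass tolerance. This is a deliberately simplified illustration; it is not the scoring scheme of Mascot or of any other tool named above, and the protein names, masses and the 50 ppm tolerance are hypothetical.

```python
def count_matches(observed, theoretical, tol_ppm=50.0):
    """Count observed masses that match any theoretical mass within tol_ppm."""
    matched = 0
    for obs in observed:
        if any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical):
            matched += 1
    return matched


def rank_candidates(observed, database, tol_ppm=50.0):
    """Rank candidate proteins by number of matched peptide masses.

    database: dict mapping protein name -> list of theoretical peptide masses.
    """
    scores = {name: count_matches(observed, masses, tol_ppm)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical digests and observed peak list.
database = {
    "protein_A": [800.367, 1045.564, 1479.796],
    "protein_B": [927.493, 1305.712],
}
observed_peaks = [800.370, 1479.800, 2211.100]
ranking = rank_candidates(observed_peaks, database)
```

Real search engines additionally weigh the size of the database, peptide frequency and missed cleavages to convert such match counts into probability-based scores.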

An Approach to Detect Novel Biomarkers Using Proteomics

Kawasaki disease and anti-endothelial cell antibodies
Kawasaki disease (KD) is an acute vasculitis of unknown etiology that mostly affects children younger than five years of age and, in particular, infants around one year old. It mainly affects small and medium-sized arteries, particularly the coronary arteries, and is characterized by systemic inflammation and cardiovascular manifestations such as coronary artery aneurysm formation and endothelial dysfunction.
Anti-endothelial cell antibodies (AECA) represent a heterogeneous group of antibodies directed against a great variety of endothelial cell (EC) surface antigens. AECA have been detected in various diseases such as autoimmune, inflammatory, and infectious diseases. Interestingly, it was reported that their prevalence is high, especially in patients with systemic vasculitis and with secondary vasculitis, such as vasculitis complicating rheumatoid arthritis and systemic lupus erythematosus [63,64]. In KD, IgM- and IgG-AECA were detected in 42-73% and 5-26% of the patients, respectively [65,66]. In addition, AECA in patients with KD were reported to induce cytotoxicity to ECs, such as complement-dependent cytotoxicity [65,66], enhanced expression of adhesion molecules, and monocyte adhesion to human umbilical vein endothelial cells (HUVEC) [67,68]. Therefore, AECA are considered to play pathological roles in vasculitis such as KD; however, the detailed mechanisms remain unclear. One reason for this is a lack of detailed study of the individual auto-antigens for AECA. Thus, we used proteomics techniques to comprehensively detect the auto-antigens for AECA in patients with vasculitis.

Detection and Identification of the Target Antigens for AECA
To understand the pathogenic roles of AECA in vasculitis, we applied proteomic techniques to detect EC-specific antigens for AECA by comparing the proteomes of HUVEC and HeLa cells (control cells). The following protocol was used for the analysis: (i) Proteins were extracted from HUVEC and HeLa cells. (ii) The extracted proteins were separated by 2-DE on the basis of charge in the first dimension and relative molecular weight in the second. (iii) After electrophoresis, one gel was stained with Coomassie brilliant blue and the other gel was used for transfer of the separated proteins onto a nitrocellulose membrane for Western Blot (WB) analysis. (iv) In the WB analysis, protein spots detected in the HUVEC sample but not in the HeLa sample were selected as candidate EC-specific antigens for AECA. (v) The proteins in the selected spots were identified by peptide mass fingerprinting. (vi) Their antigenicity was confirmed by WB using prepared recombinant antigens. To confirm that the antibodies against the identified protein bind to the cell surface of live HUVEC, indirect immunofluorescence (IIF) was conducted. (vii) The clinical importance of antibodies against the identified protein was investigated by estimating the positive rate of the antibodies in vasculitis and by comparing the disease activity and laboratory data between the antibody-positive and antibody-negative patients with vasculitis.
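Step (iv) of the protocol above, the selection of spots present in HUVEC but absent from HeLa cells, amounts to a differential comparison of two spot lists. The sketch below assumes that each immunoreactive spot has been reduced to an (iso-electric point, molecular mass in kDa) coordinate pair and that two spots are considered the same when both coordinates agree within a tolerance; the tolerances, the function name and the coordinates are hypothetical, and actual gel matching is normally done with dedicated image analysis software.

```python
def ec_specific_spots(huvec_spots, hela_spots, pi_tol=0.1, mw_tol_frac=0.05):
    """Return HUVEC spots that have no counterpart among the HeLa spots.

    Each spot is a (pI, molecular_mass_kDa) tuple. Two spots are treated as
    the same protein spot if their pI values differ by at most pi_tol and
    their masses by at most mw_tol_frac of the HUVEC spot mass.
    """
    def same_spot(a, b):
        return (abs(a[0] - b[0]) <= pi_tol
                and abs(a[1] - b[1]) <= mw_tol_frac * a[1])

    return [spot for spot in huvec_spots
            if not any(same_spot(spot, other) for other in hela_spots)]


# Hypothetical spot coordinates from the two Western blots.
huvec = [(5.7, 22.0), (6.8, 45.0), (4.9, 70.0)]
hela = [(6.75, 44.5), (8.1, 30.0)]
candidates = ec_specific_spots(huvec, hela)
```

Here the HUVEC spot near pI 6.8 and 45 kDa is discarded because a matching HeLa spot exists, while the other two remain as candidate EC-specific antigens for identification in step (v).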
Although target antigens for AECA were identified using this approach, at least three problems have to be considered. First, the EC antigens for AECA can include not only constitutive proteins on the surface of the ECs but also proteins that have translocated to the membrane surface in response to various stimuli (non-constitutive proteins). In addition, the EC antigens for AECA can be modulated by cytokines such as IL-6 and TNF-α, and by physical effects such as shear stress. Considering the diversity of AECA antigens, it is not easy to investigate whether or not each of the AECA antigens is involved in the pathogenesis of vasculitis. Second, even within HUVEC, the antigen expression patterns will differ depending on the origin of the cells, their passage numbers, and environmental factors such as hypoxia and pH. Third, EC lines cannot represent the AECA antigens for all human EC types. Furthermore, the prevalence of AECA may differ depending on the EC type; for example, granulomatosis with polyangiitis patients were reported to exhibit AECA binding to human nasal ECs (61%), human kidney ECs (71%), HUVEC (7%) and human liver sinusoidal ECs (0%) [69]. In our data obtained by one-dimensional electrophoresis (1DE) and WB, different patterns of target antigens for AECA were observed in different ECs such as HUVEC, human aortic endothelial cells (HAEC) and human coronary arterial endothelial cells (HCAEC) (Figure 2). Thus, a key to the success of this method is to use appropriate EC lines based on consideration of the injured vessel size and affected organs.

Kawasaki Disease and Peroxiredoxin2
We detected about 150 candidate target proteins for AECA using differential 2-DE and WB. One of the more than 50 proteins identified was peroxiredoxin2 (Prx2), an anti-oxidative enzyme. Oxidative stress is known to cause inflammation such as vasculitis [70,71]. In an animal model, it has been shown that the oxidation status of Prx reflected oxidative stress in the vasculature and correlated with the extent of lesion formation [72]. In mammalian cells, the Prx family consists of at least six Prxs, including Prx2, which is the fastest regenerated protein in the family after oxidative stress [73,74]. Prx itself is inactivated by excessive oxidant production, which may be involved in cell injury. We have also shown that antibodies to Prx1 and Prx4 were found in patients with autoimmune diseases such as systemic lupus erythematosus and rheumatoid arthritis [75].
In our data, IgG antibodies to Prx2 were detected in 60% of untreated patients with KD, whereas no IgG antibodies were detected in healthy individuals [76]. In addition, all three patients who subsequently developed coronary artery lesions (CAL) had IgG antibodies to Prx2. IgG antibodies to Prx2 were more specific for KD patients than IgM and IgA antibodies to Prx2. In immunocytochemistry, antibodies to Prx2 were found to bind to the cell surface of unfixed ECs, and Prx2 was detected in various types of vascular ECs by WB using cell lysates. Functionally, the anti-Prx2 antibodies also significantly increased the secretion of various inflammatory cytokines; in particular, IL-6 in HUVEC, G-CSF in HCAEC, and MCP-1 in HAEC. Anti-Prx2 antibodies induced increased expression of adhesion molecules such as E-selectin and ICAM-1. Interestingly, addition of anti-Prx2 antibodies to ECs resulted in increased concentrations of H2O2 in cell lysates from the ECs. Clinically, compared to the samples taken before intravenous immunoglobulin therapy, post-treatment samples had lower ratios of anti-Prx2 IgG antibody titers to serum IgG levels in the tested KD patients. Fujieda et al. [69] reported that the duration of fever >37.5°C was significantly longer in the anti-Prx2 positive group than in the anti-Prx2 negative group. Furthermore, urinary concentrations of 8-iso-prostaglandin F2alpha, an index of oxidative stress in urine, significantly correlated with anti-Prx2 antibody titers. Thus, anti-Prx2 antibodies may cause vascular dysfunction by inducing expression of endothelial adhesion molecules, inflammatory cytokine production and/or inhibition of the anti-oxidative activity of Prx2 by binding to Prx2 on ECs.

Conclusions and Perspectives
Proteomic approaches provide a great deal of information about individual proteins, post-translational modification, protein function and protein-protein interactions. Further analysis of proteomics data is served by computerized databases that store this accumulating information. One of the most important factors for achieving the maximum performance of proteomics is the development of statistics and bioinformatics tools that can be used to mine and analyze important information from the large amounts of available data, including the protein and nucleotide sequence databases.
Recent advances in 'omics' technologies including genomics, transcriptomics and proteomics have helped to understand molecular mechanisms at the DNA, mRNA and protein levels. These combined approaches are required to understand disease pathogenesis and to identify disease biomarkers and important targets for new therapy or drugs, because the pathogenic mechanisms of disease can involve abnormalities at several levels, such as DNA and/or protein.
We attempted to identify target proteins for AECA and to clarify the pathogenic roles of AECA in KD using a proteomic approach. We demonstrated that Prx2, one of the identified proteins, was a novel autoantigen for AECA and that IgG antibody to Prx2 might be a useful biomarker for KD. The results demonstrated that the proteomic approach was a powerful tool for identifying target proteins for AECA. As a next step toward understanding the role of Prx2 in KD, this candidate biomarker should be validated precisely in large-scale studies.