U.S. Department of Agriculture, Agricultural Research Service, Soybean Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
Received date: February 05, 2014; Accepted date: February 07, 2014; Published date: February 09, 2014
Citation: Natarajan SS (2014) Analysis of Soybean Seed Proteins Using Proteomics. J Data Mining Genomics Proteomics 5:e113. doi: 10.4172/2153-0602.1000e113
Copyright: © 2014 Natarajan SS. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Soybean food products are popular because of their health benefits, and they are a major source of protein for various food supplements. Soybean seed contains 40-50% protein on a dry matter basis, making it a good source of plant protein in human consumables such as baby formula and protein concentrate. The seeds contain an abundance of storage proteins, namely β-conglycinin and glycinin, which account for ~70-80% of the total seed protein content. Less abundant proteins include β-amylase, cytochrome c, lectin, lipoxygenase, urease, Kunitz trypsin inhibitor (KTI), and the Bowman-Birk inhibitor (BBI) of chymotrypsin and trypsin.
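As a rough illustration of what these two ranges imply together, they can be multiplied to estimate storage protein as a share of seed dry matter. The Python sketch below uses only the percentages quoted above; the function and variable names are hypothetical, and the result is a back-of-the-envelope bound rather than a measured value.

```python
# Ranges quoted in the text, expressed as fractions (low, high).
PROTEIN_FRACTION_OF_DRY_MATTER = (0.40, 0.50)   # 40-50% protein on a dry matter basis
STORAGE_FRACTION_OF_PROTEIN = (0.70, 0.80)      # ~70-80% of total seed protein


def storage_protein_range(protein_range, storage_range):
    """Return (low, high) storage protein as a fraction of seed dry matter.

    The low bound multiplies the two low ends and the high bound
    multiplies the two high ends of the quoted ranges.
    """
    low = protein_range[0] * storage_range[0]
    high = protein_range[1] * storage_range[1]
    return low, high


low, high = storage_protein_range(
    PROTEIN_FRACTION_OF_DRY_MATTER, STORAGE_FRACTION_OF_PROTEIN
)
print(f"Storage proteins: roughly {low:.0%} to {high:.0%} of seed dry matter")
```

Under these assumptions, β-conglycinin and glycinin together would make up roughly 28% to 40% of seed dry matter.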
In order to determine the variation of seed proteins that may occur in the crop as a result of genetic modification, accurate and replicable methodology for protein extraction, isolation, and characterization is essential. Modern proteomic tools are being used to examine alterations in protein profiles caused by genetic mutations and environmental stress [2-10]. Proteome analysis is performed using a variety of methods, including structural proteomics such as high-throughput (HT) X-ray crystallography and HT nuclear magnetic resonance (NMR) spectroscopy; expressional or analytical proteomics such as gel-based electrophoresis (1DE, 2DE, 2D-DIGE), gel-free approaches (LC-MS/MS or multidimensional protein identification technology (MudPIT)), protein chips, DNA chips, mass spectrometry (MS), and microsequencing; and functional or interaction proteomics such as HT functional assays, ligand chips, yeast two-hybrid, deletion analysis, and motif analysis [11-14]. In this mini review, we discuss some of the expression analysis methodology that we have applied to study soybean seed proteins. Because the extraction of soybean seed proteins for accurate 2-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is challenging, we initially optimized protein extraction techniques by comparing different methods. Extraction of protein suitable for 2D-PAGE is sample-dependent and is achieved by optimizing the concentration of chaotropic agents, detergents, reducing agents, buffers, enzymes, and ampholytes. Also helpful in this regard are advances in immobilized pH gradient (IPG) technology and the development of electrophoretic instruments, which have improved the reproducibility of protein separation. The availability of commercial IPG strips in linear and nonlinear gradients with multiple narrow pH ranges allows effective protein separation for various downstream analyses.
We compared four different solubilization methods (urea/thiourea, urea, modified trichloroacetic acid (TCA)/acetone, and phenol) for extraction of proteins from soybean seeds for subsequent analysis by 2D-PAGE. Our study demonstrated that the modified TCA/acetone method with urea/thiourea solubilization resolved more protein spots than the urea or phenol extractions. Using the phenol/urea method, resolution of proteins was generally poor and spots were diffuse in the high molecular weight region of the gel, particularly when separating the proteins at pH 4.0 to 7.0. Moreover, while overall protein separation was similar in the TCA/acetone and urea/thiourea methods, low molecular weight proteins were more consistently resolved with the TCA/acetone method. In addition, characterization of low-abundance proteins is a challenge because they are often masked by highly abundant proteins. Recently, Boschetti and Righetti published a review of plant proteomic methods to isolate low-abundance proteins. In the past, we also developed methodologies that effectively remove the highly abundant proteins in soybean seed, enabling detection of previously unknown low-abundance proteins [18,19]. To improve resolution and protein identification, we used both wide (pH 3-10) and narrow (pH 4-7 and 6-11) pH gradients to separate the seed proteins, which include storage, allergen, and anti-nutritional proteins (Figure 1). Proteins were then extracted from small gel pieces, analyzed by MALDI-TOF-MS and LC-MS/MS, and identified by searches against the NCBI non-redundant protein database [20-22]. Based on the proteins we identified, we developed a comprehensive seed protein database named SoyProDB.
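To illustrate why wide and narrow gradients complement each other, the short Python sketch below lists which of the three pH ranges named above would contain a protein of a given isoelectric point (pI). The strip labels, the function name, and the example pI values are hypothetical and chosen only for illustration; they are not measurements from our gels.

```python
# pH ranges of the IPG strips named in the text: (low pH, high pH).
IPG_STRIPS = {
    "wide pH 3-10": (3.0, 10.0),
    "narrow pH 4-7": (4.0, 7.0),
    "narrow pH 6-11": (6.0, 11.0),
}


def strips_covering(pi):
    """Return the names of strips whose pH range contains the given pI."""
    return [name for name, (low, high) in IPG_STRIPS.items() if low <= pi <= high]


# Illustrative pI values only (an acidic, a near-neutral, and a basic protein).
for pi in (5.0, 6.5, 10.5):
    print(f"pI {pi}: {strips_covering(pi)}")
```

A protein that falls within a narrow range is spread over a shorter pH interval on that strip than on the wide strip, which is the motivation for running acidic and basic proteins on separate narrow gradients.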
The soybean storage proteins, β-conglycinin and glycinin, are classified based on their sedimentation coefficients. β-conglycinins are encoded by two mRNA groups. The first mRNA group encodes the α and α´ subunits of β-conglycinin, and the second mRNA group encodes the β subunit of β-conglycinin [24,25]. To adequately resolve the multiple protein subgroups, we used narrow pH strips, pH 4.0-7.0 for acidic proteins and pH 6.0-11.0 for basic proteins. This approach resolved 7 α subunit protein spots (#1-7), an α´ subunit of β-conglycinin (#8), and 6 β subunits of β-conglycinin (#9-14). Using these methods, we reported the variation of the above proteins in wild and cultivated soybean genotypes. Schuler et al. reported that β-conglycinin subunits are products of a multigene family. They suggested that the variation in the distribution of protein spots may be caused by post-translational modifications and not differences in amino acid composition [24,26]. Glycinin is composed of five subunits, G1, G2, G3, G4, and G5, the precursors of which are encoded by five non-allelic genes, Gy1, Gy2, Gy3, Gy4, and Gy5, respectively. To effectively separate glycinin proteins, we used pH 3.0-10.0, 4.0-7.0, and 6.0-11.0 gradients to resolve the acidic and basic chains of the glycinin subunits. The G1 subunit showed 3 basic polypeptides (spots #15-17); the G2 subunit showed 8 spots (#18-25) of acidic and basic polypeptides; the G3 subunit showed 8 spots (#26-33); the G4 subunit showed 12 spots (#34-40); and the G5 subunit showed 7 spots (#41-47). The relative amounts of all the aforementioned glycinin subunits varied among the 16 soybean genotypes examined.
Soybean seed also contains several allergen proteins. Gly m Bd 60K has been described as a seed storage protein as well as a major allergen in soybean. Gly m Bd 60K includes β-conglycinin and glycinin. Only the α subunit of β-conglycinin is reported to be allergenic for consumers. The acidic polypeptides of G1 and all G2 subunits of glycinin are also reported to be allergenic. However, using immunoblot and MALDI-TOF analysis, Krishnan et al. reported that all three subunits of soybean β-conglycinin are potential food allergens. Our investigation using 2D-PAGE showed seven spots (#1-7) of α subunits, as shown in Figure 1. The glycinin G2 allergens showed eight spots (#18-25). Gly m Bd 30K is another soybean protein designated as a major allergen. This protein was previously described as the 34-kD vacuolar protein P34. Using a monoclonal antibody against P34, Yaklich et al. found that P34 showed genetic diversity in soybean with respect to protein quantity. In our analyses, we reported 2 protein spots of P34 (#50, 51) in wild soybean genotypes. Gly m Bd 28K is a less abundant soybean allergen that was originally isolated from soybean meal. In our study, two spots (#61 and 62) of Gly m Bd 28K were identified. A study by Xiang et al. suggests that this allergen is probably processed into two smaller polypeptides of 240 and 212 amino acids in the soybean seed, and both portions are expected to be present in soybean-derived foods. These two spots of Gly m Bd 28K probably arise from post-translational processing of the product of the same gene.
Kunitz trypsin inhibitor (KTI) is one of the abundant anti-nutritional proteins in soybean seed and inhibits trypsin, an important animal digestive enzyme. In addition, KTIs have been characterized as food allergens in humans. In soybean, three KTI genes (KTI1, KTI2, and KTI3) have been reported, and both transcriptional and post-transcriptional processes regulate KTI gene expression. The KTI3 transcript was detected only in the soybean seed, while KTI1 and KTI2 transcripts are expressed in soybean leaf, root, and stem. Our study showed that although the overall distribution patterns of KTI protein spots are quite similar in wild and cultivated genotypes, the number of protein spots and their intensities varied between these genotypes. Three protein spots (#54-56) of KTI were identified by MALDI-TOF-MS and database searches in wild soybean varieties. Soybean agglutinins are another class of anti-nutritional proteins present abundantly in soybeans. This protein accounts for about 10% of total protein in some legumes. The wild soybeans showed 6 spots (#48-49, 57-60) of soybean agglutinins.
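The spot numbers quoted in the preceding paragraphs can be collected into a simple lookup table. The following Python sketch is one hypothetical way to do this; the protein labels are shorthand, the function names are assumptions made for illustration, and nothing here reflects the actual structure of SoyProDB. Spot numbers not assigned in the text (for example #52 and #53) return None, and spot counts are computed from the quoted ranges only.

```python
# Inclusive spot number ranges as quoted in the text; labels are shorthand.
SPOT_RANGES = {
    "beta-conglycinin alpha subunit": [(1, 7)],
    "beta-conglycinin alpha' subunit": [(8, 8)],
    "beta-conglycinin beta subunit": [(9, 14)],
    "glycinin G1": [(15, 17)],
    "glycinin G2": [(18, 25)],
    "glycinin G3": [(26, 33)],
    "glycinin G4": [(34, 40)],
    "glycinin G5": [(41, 47)],
    "soybean agglutinin": [(48, 49), (57, 60)],
    "P34 (Gly m Bd 30K)": [(50, 51)],
    "Kunitz trypsin inhibitor": [(54, 56)],
    "Gly m Bd 28K": [(61, 62)],
}


def build_spot_index(spot_ranges):
    """Expand the inclusive ranges into a {spot number: protein label} mapping."""
    index = {}
    for protein, ranges in spot_ranges.items():
        for first, last in ranges:
            for spot in range(first, last + 1):
                index[spot] = protein
    return index


SPOT_INDEX = build_spot_index(SPOT_RANGES)


def lookup(spot):
    """Return the protein label for a spot number, or None if not listed."""
    return SPOT_INDEX.get(spot)


def spots_listed(protein):
    """Count the spot numbers covered by the quoted ranges for one protein."""
    return sum(last - first + 1 for first, last in SPOT_RANGES[protein])


print(lookup(20))                          # a glycinin G2 spot
print(lookup(52))                          # not assigned in the text
print(spots_listed("soybean agglutinin"))  # spots in #48-49 and #57-60
```

A table of this kind makes it straightforward to check a spot picked from Figure 1 against the protein class it was assigned to.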
In conclusion, effective extraction methodology is important for the analysis of both abundant and low-abundance soybean seed proteins. The combination of 2D-PAGE with MS effectively differentiates several classes of soybean seed proteins. We have characterized storage, allergen, and anti-nutritional proteins among different soybean genotypes using proteomic tools. The database, SoyProDB, is useful for scientists wishing to tailor products with specific proteins to produce value-added soybean genotypes. These tools and methods will be used in ongoing investigations to study the biosafety of transgenic soybeans in our laboratory.
Funding for this research was provided by ARS project 1245-21220-232-00D.