A Free Flow Electrophoresis Separation Strategy for Segregation of High Abundant Phycobilisomes from Cyanobacterium Nostoc punctiforme PCC 73102

1 Biological and Environmental Systems Group, ChELSI Institute, Department of Chemical and Process Engineering, The University of Sheffield, Mappin St., Sheffield, S1 3JD, UK
2 BD Diagnostics, Tullastrasse 8-12, 69126 Heidelberg, Germany
3 Protein Biochemistry Laboratory, Novartis Vaccines and Diagnostics Research Centre, Via Fiorentina 1, I-53100 Siena, Italy
4 BD Diagnostics, The Danby Building, Oxford Science Park, Oxford, OX4 4DQ, UK
5 Department of Biochemistry, University of Nebraska-Lincoln, 1901 Vine Street, Beadle Center, Lincoln, NE 68588, USA


Abstract
Proteomic analysis of cyanobacteria generally has limited dynamic resolution because of the presence of highly abundant proteins. Assessment of the low abundance proteins is therefore difficult, because the proteome contains highly abundant phycobilisome proteins and Rubisco. The current investigation assesses the performance of free flow electrophoresis (FFE) in isoelectric focusing (IEF) mode, a solution-based technique that separates proteins on the basis of isoelectric point (pI). We explore the advantages of combining robust protein fractionation using FFE with 1D-LC-MS/MS to extend protein identification deeper through the dynamic range of the model cyanobacterium Nostoc punctiforme PCC 73102. Sixty-one new proteins (out of 248, all identified with ≥ 2 peptides) were successfully identified using FFE-IEF compared to all previous reports. The results demonstrate the ability of FFE to provide improved protein distributions, while also providing effective segregation of highly abundant phycobilisomes to allow access to 37 low abundance proteins (pI > 9) in the N. punctiforme proteome, which are difficult to observe using conventional methods.

Introduction
Cyanobacteria, being both oxygenic and photoautotrophic, are some of the oldest organisms on the planet [1]. Given their diverse morphology and metabolic infrastructure to produce high value biologically active precursors, they show tremendous potential in agricultural, biopharmaceutical, drug-discovery, bioprocess engineering, and bio-fuel applications [2][3][4][5][6][7]. The increasing number of sequenced cyanobacterial genomes (39 genomes as of December 2011) and the rapid advancements in systems biology have opened up many opportunities. More recently, there has been a significant push in the development of cyanobacterial proteomics research for industrial and bioprocess applications [8].
While there have been noteworthy successes in the development of robust and effective analysis platforms, a major issue that remains to be solved in cyanobacterial proteomics is the large dynamic range of the proteome. For example, the linear dynamic range of phycobiliproteins was found to be up to eight orders of magnitude [9], which makes the detection and quantification of proteins that are present in low abundance particularly difficult [10], especially since most modern mass spectrometers can typically observe only 3-4 orders of magnitude at best. The development of an effective and robust form of protein separation is therefore needed to alleviate this issue. Since proteins can be highly complex and diverse in their properties, various separation methods can be devised. These include sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE; 1-D and 2-D) and shotgun approaches, either 1-D (usually ion-exchange chromatography) or 2-D (ion-exchange followed by reversed phase). Two-dimensional SDS-PAGE is a long-standing conventional technique commonly used to separate and visualise highly complex protein mixtures [11]. In this technique, proteins are separated based on their isoelectric points (1st dimension), followed by molecular weight (2nd dimension) [12,13].
Despite their continuing popularity, gel-based separations suffer from a number of limitations, such as poor reproducibility and limited detection of low abundance proteins [14].
One of the emerging gel-free techniques, free flow electrophoresis (FFE), has been shown to provide an alternative route to separate charged analytes, such as peptides and proteins. This technique can also be applied to separate low-molecular weight organic compounds, membranes, organelles and whole cells in aqueous media, under both native and denaturing conditions [15]. Different separation approaches, based on isoelectric point (IEF), mass to charge ratio (zone electrophoresis) and electrophoretic mobility (isotachophoresis), can be realised using the FFE setup [16]. The FFE approach provides the advantages of very high sample load due to its continuous sample introduction, short separation time (usually 1 hour) and multiple sample recovery, thus allowing effective high throughput analysis [17]. This technique has been shown to be compatible with traditional 2D-SDS-PAGE and 2D-LC setups on a vast portfolio of applications, including microbial [18], human [19], model insect (Drosophila melanogaster) [17], cancer [20] and general medical research [21]. The potential role of FFE in accessing the low abundance proteome has been discussed in detail in a recent review by Nissum and Foucher [22]. Segregation of abundant proteins such as albumin into specific fractions can be achieved using FFE under native or denaturing conditions [22], providing an alternative to immunoaffinity techniques for the enrichment of low abundance proteins. A similar strategy can be used for cyanobacteria, where abundant proteins such as phycocyanin and phycobilisomes can be depleted using FFE.
Given the advantages and capabilities of FFE, we present an analysis of the proteome of the cyanobacterium Nostoc punctiforme (hereafter denoted as N. punctiforme) using the IEF mode of separation. This proteome has been studied before using ion exchange 2D-LC [23] and 2D-SDS-PAGE [24,25]. We aim to provide an alternative protein prefractionation strategy that attempts to address the issue of separation of high abundance phycobiliproteins.

Cell culture and lysis of Nostoc punctiforme
The filamentous, heterocystous cyanobacterium N. punctiforme strain PCC 73102 (also ATCC 29133) was cultured in 500 mL Erlenmeyer flasks containing 200 mL BG-11₀ medium under constant irradiation of 45 μmol of photons m⁻² s⁻¹ at 25°C [26]. Cells were harvested at mid-exponential growth phase, and cultures were centrifuged at 4000×g for 10 min at room temperature.

Protein extraction
Cells were centrifuged and washed once with extraction buffer (40 mM Tris-HCl, pH 8.5), and proteins were extracted via mechanical disruption. The crude protein extract was centrifuged at 21,000×g and the supernatant stored at -20°C prior to analysis. The total protein concentration was determined using the RC-DC Protein Quantification Assay (Bio-Rad, Hertfordshire, U.K.). One milligram of protein was used for FFE separation in IEF mode.

Free flow electrophoresis separation
FFE set-up and separation have been described elsewhere [17,27]. In brief, FFE-based IEF was performed using the BD™ Free Flow Electrophoresis System (BD GmbH, Germany). A protein sample at a concentration of 1 mg/mL was prepared in separation medium containing 7 M urea, 2 M thiourea, 250 mM mannitol and carrier ampholytes (BD™ IEF Buffer pH 3-10). An IEF pH gradient of 3-10 was generated, as confirmed by a pI marker test. The crude proteome sample was focused under the following conditions: constant voltage and current of 520 V and 13 mA, a total separation buffer flow rate of 60 mL/h and a sample flow rate of 1.0 mL/h. IEF protein separation was carried out at a constant temperature of 10°C to minimise sample degradation during focusing.
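The loading conditions above imply a simple relationship between sample concentration, sample flow rate and the time needed to introduce the sample. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and it assumes that the full 1 mg of protein was introduced continuously at the stated concentration and flow rate.

```python
def mass_throughput_mg_per_h(sample_conc_mg_per_ml, sample_flow_ml_per_h):
    """Protein mass entering the separation chamber per hour."""
    return sample_conc_mg_per_ml * sample_flow_ml_per_h


def loading_time_h(protein_mass_mg, sample_conc_mg_per_ml, sample_flow_ml_per_h):
    """Time needed to introduce a given protein mass at constant flow."""
    sample_volume_ml = protein_mass_mg / sample_conc_mg_per_ml
    return sample_volume_ml / sample_flow_ml_per_h


# Values taken from the text: 1 mg of protein, 1 mg/mL, 1.0 mL/h.
print(mass_throughput_mg_per_h(1.0, 1.0))  # mg of protein per hour
print(loading_time_h(1.0, 1.0, 1.0))       # hours of continuous loading
```

Under these assumptions the sample is introduced in about one hour, in line with the typical FFE separation time quoted in the Introduction.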

Sample clean-up and enzymatic digestion
To improve compatibility with the downstream LC-MS/MS workflow, the IEF-separated protein fractions (fractions 23-73, see later sections) were first diluted in 40 mM Tris-HCl pH 8.5 buffer to give a urea concentration of <4 M. Samples were then spin-cleaned using Centricon Ultracel YM-3 (3 kDa cutoff) membrane spin filters (Millipore, Hertfordshire, UK) according to the manufacturer's instructions. Spin-cleaned samples were further exchanged in 3 cartridge volumes of 40 mM Tris-HCl pH 8.5 to remove excess urea. Samples were subsequently prepared in solution with minimal transfer volume (<50 µL). Protein disulfide bonds were reduced with tris(2-carboxyethyl)phosphine at a final concentration of 12.5 mM at 37°C for 1 h, and alkylated with iodoacetamide at a final concentration of 55 mM (1 hour, room temperature). The individual fractions were then incubated with 1 µg of sequencing grade trypsin (Promega, Southampton, UK) at 37°C for 12 hours.
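The dilution step above follows from the usual relation C1·V1 = C2·V2. The sketch below estimates the minimum dilution needed to bring a fraction below 4 M urea. It assumes the fractions start at the 7 M urea concentration of the separation medium and ignores thiourea and any non-ideal mixing; the function names and the 1 mL example volume are hypothetical.

```python
def min_dilution_factor(c_start_molar, c_target_molar):
    """Fold-dilution at which the concentration equals the target (C1*V1 = C2*V2)."""
    if c_target_molar <= 0 or c_start_molar <= 0:
        raise ValueError("concentrations must be positive")
    return c_start_molar / c_target_molar


def buffer_volume_to_add(sample_volume_ml, c_start_molar, c_target_molar):
    """Volume of urea-free buffer to add so the sample reaches the target concentration."""
    factor = min_dilution_factor(c_start_molar, c_target_molar)
    return sample_volume_ml * (factor - 1.0)


# 7 M urea (separation medium) diluted to the 4 M limit.
print(min_dilution_factor(7.0, 4.0))        # fold-dilution to reach exactly 4 M
print(buffer_volume_to_add(1.0, 7.0, 4.0))  # mL of buffer per 1 mL of fraction
```

Any dilution greater than this factor brings the urea concentration below the 4 M limit.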

Nano LC-MS/MS quadrupole time-of-flight mass spectrometry
Nano-ESI-MS/MS was performed on a QSTAR-XL tandem mass spectrometer (Applied Biosystems, MDS-Sciex) coupled with an Ultimate 3000 nanoflow HPLC (Dionex, Surrey, U.K.). Dried FFE fraction peptides were loaded onto a 5 cm, 300 μm i.d. LC-Packings C18 PepMap trap cartridge under Buffer A (0.1% formic acid in 5% ACN) and eluted onto a 15 cm, 75 μm i.d. LC-Packings C18 PepMap analytical column via Buffer B (0.1% formic acid in 95% ACN). The nanoLC gradient was 75 min in length with the first 8 min comprising 5% B, followed by 63 min ramping from 5 to 90% B and then 7 min of 90% B before a final 5 min of 5% B. The flow rate of the gradient was 300 nL min⁻¹. Electrospray fused silica PicoTip™ needles were obtained from New Objective (Woburn, MA), and the spray voltage was set at 5.5 kV. The MS data acquisition was performed in the positive ion mode and was piloted by Sciex-Analyst (MDS Sciex, Concord, Ontario, Canada) using automatic switching between MS and MS/MS modes.

Protein identification and bioinformatic data analysis
Tandem MS data from the QSTAR XL were first converted to generic MGF peaklists using the mascot.dll embedded script in Analyst QS v. 1.5 (Applied Biosystems, Sciex; Matrix Science). Spectral data were interrogated using an in-house Phenyx algorithm cluster (binary version 2.6; Genebio, Geneva) at the ChELSI Institute, University of Sheffield. The FFE data were interrogated using the Nostoc punctiforme database downloaded from NCBI http://www.ncbi.nlm.nih.gov/ (6776 proteins, May 2009) and parsed using the Phenyx parser. The search was enzymatically restricted to trypsin, with no restriction on molecular weight or isoelectric point, and one missed cleavage was allowed; taxonomy was fixed to root so as to search all entries in the database; carbamidomethylation of cysteine was selected as a fixed modification and oxidation of methionine was allowed as a variable modification. All details of the search parameters are given in the supplementary information file (Phenyx search parameters.pdf). A turbo scoring tolerance of 0.3 Da was set. All spectra were searched against both the forward and reversed Nostoc punctiforme database, so as to estimate the false discovery rate (FDR) [28]. All proteins identified with at least 2 peptides were considered to be true hits. Physico-chemical parameters, including the grand average of hydropathy (GRAVY) index [29], were calculated using the ProtParam web-tool service [30] (http://www.expasy.ch/tools/protparam.html). Proteins with a positive GRAVY index (GRAVY value ≥ +0.3) were considered hydrophobic, and those with a negative index hydrophilic. Prediction of protein subcellular localization was carried out using PSORTb v2.0 [31], while the detection of lipoproteins was performed using the LipoP web-tool [32].
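The GRAVY index used above is the sum of the Kyte-Doolittle hydropathy values of all residues in a sequence divided by the sequence length. The following minimal sketch illustrates the calculation; it is not the ProtParam implementation. The function names and the example sequence are hypothetical, and the +0.3 cut-off is one reading of the threshold quoted in the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def is_hydrophobic(sequence, threshold=0.3):
    """Assumed cut-off: GRAVY >= +0.3 is treated as hydrophobic."""
    return gravy(sequence) >= threshold


# Hypothetical peptide, used only to illustrate the calculation.
example = "MKLLVVAAGG"
print(round(gravy(example), 3), is_hydrophobic(example))
```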

Results and Discussion
Since the first published analysis of the N. punctiforme proteome in 2004, seven separate peer reviewed investigations [23][24][25][33][34][35][36] have appeared. While their objectives and intended applications differed, most workflows were based on model techniques: either PAGE-based analysis or shotgun approaches (Figure 1a). Thus far, workflows reliant on multidimensional separation strategies have appeared to be more robust and effective in terms of identification and applicability for quantitative measurements. Current proteome coverage estimations (ca. 25%, see subsequent sections) suggest that there is still considerable room for improvement before complete or near-complete coverage can be attained [37]. The following subsections describe the biological relevance, physicochemical properties, and the general impact of the identified proteins for the proteomic understanding of N. punctiforme.

Online LC-MS/MS protein identification analysis
The LC-MS/MS analysis of the FFE-separated proteins resulted in the confident identification of 4925 peptides (global peptide level FDR [38] < 3%), yielding tandem MS ion evidence for 342 unique proteins, of which 207 were identified with ≥2 peptides. A master list of all the previously reported N. punctiforme proteins (1774) observed in proteomics studies to date [23,24,33,34,35,36,39], representing 25% of the theoretical proteome, was compiled and compared with the proteins identified in this study (Figure 1b and supplementary material S1). This comparison revealed that 94 new proteins were found here (supplementary material S2). Their distributions were also metabolically directed, as 7% and 8% of these were from the transcription and signal transduction metabolic regions, respectively. There were also a number of identifications that were unique to this FFE study, with Npun_F0166 (pI=12.024) and Npun_F3727 (pI=3.824) being the proteins at the two extremes of the pI range (for a detailed list of the proteins uniquely identified in the FFE workflow, refer to supplementary material S2).
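The peptide-level FDR quoted above comes from searching forward (target) and reversed (decoy) databases. The sketch below shows one common way to turn such search results into an FDR estimate, using the ratio of decoy hits to target hits above a score threshold. This estimator, the function names and the scores are illustrative assumptions, not the Phenyx procedure used in this study.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at or above a score threshold."""
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0
    return n_decoy / n_target


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed score at which the estimated FDR is <= max_fdr."""
    for threshold in sorted(set(target_scores) | set(decoy_scores)):
        if fdr_at_threshold(target_scores, decoy_scores, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical peptide scores from the forward and reversed database searches.
targets = [10, 9, 8, 7, 6, 5]
decoys = [6, 4]
print(fdr_at_threshold(targets, decoys, 5))
print(lowest_threshold_for_fdr(targets, decoys, 0.03))
```

Other estimators exist, for example doubling the decoy count when a concatenated database is searched, so the choice should follow the search design actually used.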

FFE separation
After IEF, we observed that the majority of the proteins in N. punctiforme were focused primarily between fractions 23-73, a region corresponding to a pH range of 2.56 to 10.73. The observed data agree well with the broad theoretical proteome map of N. punctiforme (Figure 1a). For each fraction, the theoretical pIs of the confidently identified proteins were averaged and compared with the measured pH attained as a result of the IEF gradient. In the current study, we observed differences between the theoretical pI and the pH of the fraction where the protein was found. To check this, the protein distribution of each fraction was analysed according to pI (Figure 3a and 3b). We observed a vertical spread in the theoretical pI range for the proteins observed in each fraction. We feel that this could be attributed to protein hydrophobicity and a number of other factors, such as the degree of denaturation, overall charge, critical concentration of the proteins, post-translational modifications, electrophoretic suppression, and the reliability of the theoretical predictions. Examples include Npun_F0166 (hypothetical protein, pI=12.02, focused at pH 4-6) and Npun_F3800 (acyltransferase 3, pI=9.463, focused at pH 6-7) [40]. A spread in theoretical pI versus observed pH was also reported earlier in Sulfolobus solfataricus P2 by Chong et al. [41] using strip-based IEF. A key question is whether the approach produces a good separation compared to other methods. One of the primary concerns with IEF-based FFE is the potential for highly abundant proteins with significant charge heterogeneity to adversely affect the focusing, resulting in broadening into neighbouring regions. This phenomenon in turn affects the efficacy of the separation and results in an increase of the dynamic range in the affected fractions, since poorly focused abundant proteins would interfere with the low abundance proteins distributed in multiple adjacent fractions.
In this respect, we were able to observe 12 highly abundant proteins present in consecutive fractions (refer to supplementary material S3, fractions collected at pH 6-7). These are primarily house-keeping proteins, such as the phycobilisome complexes and those present in the central metabolic pathways (see supplementary material S3), though the degree of fractional crossover appeared much better controlled than that previously reported using more conventional IEF techniques (Figure 4) [42,43].
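The per-fraction comparison described in this section, averaging the theoretical pI of the proteins identified in each fraction and comparing it with the measured pH, can be sketched as follows. The data layout, function names and values are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import defaultdict
from statistics import mean


def mean_pi_per_fraction(identifications):
    """Average theoretical pI per fraction from (fraction_number, theoretical_pI) pairs."""
    by_fraction = defaultdict(list)
    for fraction, theoretical_pi in identifications:
        by_fraction[fraction].append(theoretical_pi)
    return {fraction: mean(values) for fraction, values in sorted(by_fraction.items())}


def pi_minus_ph(identifications, measured_ph):
    """Difference between mean theoretical pI and measured pH for each fraction."""
    means = mean_pi_per_fraction(identifications)
    return {f: means[f] - measured_ph[f] for f in means if f in measured_ph}


# Hypothetical identifications and measured fraction pH values.
ids = [(30, 5.0), (30, 6.0), (31, 7.0)]
ph = {30: 5.0, 31: 7.5}
print(mean_pi_per_fraction(ids))
print(pi_minus_ph(ids, ph))
```

A large absolute difference for a fraction flags proteins that focused away from their predicted pI, as discussed above for Npun_F0166.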

Comparison of protein properties, COG, network and coverage maps
To further aid the classification and characterisation of the identified protein classes, we also assessed the localisation of the 248 proteins (all ≥2 peptides) using markers predicted by PSORTb and SOSUIsignal [44,45]. Nine proteins were predicted to be hydrophobic, as identified by their GRAVY index (Table 1). Of these, 4 proteins have a pI greater than 9. Npun_F0166 and Npun_F5190 were predicted to possess conserved motifs for signal peptidase I, while the identified glycolipid transporter (Npun_F2140) was found to possess an expected transmembrane helix. More notably, we demonstrate that an integrated FFE workflow provided an improved distribution of the identified acidic and neutral proteins as compared with existing studies [46]. Ten proteins were predicted to have a pI > 10, of which one hypothetical protein, Npun_F0166, was uniquely detected here. Eighteen cytoplasmic membrane proteins, together with five outer membrane proteins, were identified, supported by supplementary GRAVY estimations. These belong primarily to the phycobilisome linker, membrane secretion and transport protein classes. We also report 21 predicted lipoproteins, which may contain motifs for signal peptidase I and II and transmembrane helices (supplementary material S2). The consensual overlap between the different techniques (Figure 1b) results in only 29 (1.6%) common identifications, while HT-based methods show only 135 (11.53%) overlap. Of the twenty-five functional categories compared in Figure 4, functional overlaps between the identified proteins in N. punctiforme translate to 38% coverage of the deduced COG proteome classes. The largest percentage of proteins characterized so far was derived primarily from the translation, ribosomal structure and biogenesis categories (57%).
In terms of known network coverage, superimposing the current data on the KEGG metabolic network gives coverage of up to 23.35% of the predicted enzymatic network and 25.49% of the different metabolic pathways, if the 1700 proteins were measured together in an independent study. Thus, our FFE pilot achieves 18.95% of the defined E.C. network [47]. We feel that, at least with the current set of reported observations, an FFE-based analysis has provided improved access to a number of important metabolic categories such as homologous recombination, mismatch repair, nucleotide excision repair, base excision repair, carotenoid biosynthesis, sulfur metabolism and lipoic acid metabolism. There are, however, a number of metabolic pathways that are still poorly represented, such as RNA processing and modification, chromatin structure and dynamics, and cell motility. The underlying reasons for this are understandable given the dynamic expression ranges between primary and secondary metabolic processes [10].

Conclusion and Future Perspectives
In conclusion, we found FFE to be useful for the fractionation of the whole N. punctiforme cell lysate under denaturing conditions. More broadly, the technique can be used individually or in combination with other separation technologies. IEF-based electrophoresis and RPLC are orthogonal techniques, and we found that their combination is useful for high throughput protein separation in cyanobacteria. This technique can also be implemented with other high throughput and conventional proteomics techniques, both gel-based and gel-free (shotgun), as demonstrated in other organisms [15,20,21]. As shown in Figure 2, the consensual overlap between different techniques is only 1.9%, which makes FFE a highly complementary tool for mining the cyanobacterial proteome.
Using this technique, we were able to identify 32 membrane and transmembrane proteins, while gaining access to low abundance proteins which are typically difficult to identify using traditional techniques [48]. It has also enabled improved characterisation of a number of metabolic classes (i.e. nucleotide excision repair, base excision repair, carotenoid biosynthesis, sulfur metabolism and lipoic acid metabolism) which have previously been poorly observed in N. punctiforme (see supplementary material S2 for proteins found to be unique to the FFE workflow as compared to other workflows). More importantly, the ability of FFE to segregate phycobilisome proteins effectively would allow more proteins to be identified. More broadly, we envisage the potential for FFE to separate the different cell types (vegetative cells, heterocysts, hormogonia and akinetes) in N. punctiforme and other cyanobacteria, given that the technique has been demonstrated to separate different cellular compartments and living cells [49].

Table 1: Hydrophobic proteins detected in this study, with information on their physico-chemical and biological characteristics, such as GRAVY index score, instability index and the pH at which they were eluted in FFE wells. Predicted sub-cellular locations were obtained using the PSORTb and SOSUI algorithms. Proteins were also analysed using the LipoP algorithm for prediction of lipoproteins and discrimination of lipoprotein signal peptides from other signal peptides.
Abbreviations: M=membrane, C=cytosolic, Unk=unknown, CM=cytoplasmic membrane, Cyt=cytoplasmic.
Studentship support from the Ministry of Social Justice and Empowerment, Government of India (grant no. 11015/17/2005-SCD-V) to NW.