Klaus Jung^{1}, Gereon Poschmann^{2}, Katharina Podwojski^{3}, Martin Eisenacher^{2}, Michael Kohl^{2}, Kathy Pfeiffer^{2}, Helmut E. Meyer^{2}, Kai Stühler^{2}^{†} , Christian Stephan^{2*†}  
^{1}Abteilung Medizinische Statistik, GeorgAugustUniversität Göttingen, Göttingen, Germany  
^{2}Medizinisches ProteomCenter, RuhrUniversität Bochum, Bochum, Germany  
^{3}Fakultät Statistik, Technische Universität Dortmund, Dortmund, Germany  
^{†}Both authors contributed equally to this work  
Corresponding Author :  Christian Stephan, Medizinisches ProteomCenter, RuhrUniversität Bochum, Universitätsstraße 150, D44801 Bochum, Germany, Tel : +492343229288, Fax : +492343214554, Email : christian.stephan@ruhrunibochum.de 
Received December 19, 2008; Accepted February 09, 2009; Published February 20, 2009  
Citation: Klaus J, Gereon P, Katharina P, Martin E, Michael K, et al. (2009) Adjusted Confidence Intervals for the Expression Change of Proteins Observed in 2Dimensional Difference Gel Electrophoresis. J Proteomics Bioinform 2:078087. doi:10.4172/jpb.1000064  
Copyright: © 2009 Klaus J, et al. This is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
Related article at Pubmed Scholar Google 
Visit for more related articles at Journal of Proteomics & Bioinformatics
Differential proteome analyses focus on the detection and quantification of expression changes between samples from different biological groups. While the significance of an expression change is detected by some statistical test, the strength of an expression change is usually quantified by some ratio estimate, e.g. the ‘fold change’. Due to its quantitative character, the fold change is more intuitively for biologists than the decision of a statistical test. However, strong expression changes are often misleading if this change is not significant. For this reason, we propose the employment of confidence intervals, adjusted for multiple hypotheses testing, which naturally comprise both, test decision and quantification. The adjusted confidence intervals can be used for making test decisions under the control of error rates typically considered in multiple hypotheses testing (e.g. the familywise error rate or the false discovery rate). For biologists, test decisions based on adjusted confidence intervals offer a more intuitive method for selecting proteins with a significant expression change between two groups. The length of the intervals can be used for sample size planning of upcoming experiments. Our approach is primarily addressed to protein expression data recorded by twodimensional Difference Gel Electrophoresis.
Keywords 
Differential proteome analysis; Confidence intervals; Multiple hypothesis testing; 2Dimensional Difference Gel Electrophoresis 
Abbreviations 
2DE: TwoDimensional Gel Electrophoresis; DIGE: Difference Gel Electrophoresis; FWER: FamilyWise Error Rate; FDR: False Discovery Rate. 
Introduction 
Typically, a differential proteome study is performed for comparing samples from distinct biological groups, each consisting of several individuals. Stühler et al., (2006) for example compared pairwise the brain proteome of mice from groups of different developmental stages, embryonic, juvenile and adult. Another experimental design is the comparison of proteome samples from several individuals before and after treatment. Differences in the proteome of a neuroblastoma cell line before and several hours after stimulation with a nerve growth factor were for example studied by Sitek et al., (2005). The common goal of such studies is to detect candidate proteins involved in relevant biological pathways or as drug targets coregulated with the examined biological groups or with the applied treatments. Here, we consider particularly the case of two independent groups. 
The application of twodimensional gel electrophoresis (2 DE) means to compare the expression levels from thousands of proteins and their post translational modified species simultaneously. 2DE is an important tool for measuring protein expression levels by separating proteins by their isoelectric point and molecular size (Klose and Kobalz, 1995). Being covalently labelled by a fluorescent dye or silverstained before and after separation, respectively, the proteins are revealed as distinct spots on the 2D gels, where intensity of a spot can be taken as a measure of abundance. The image of each gel is digitized by a scanner and spots are detected by an automatic image analyzing algorithm. 
An improvement of 2DE is twodimensional Difference Gel Electrophoresis (2D DIGE), which allows to analyze up to three distinct proteome samples, labelled by three different fluorescent dyes (Cy2, Cy3 and Cy5), within the same gel (Ünlü et al., 1997). Usually, these three samples are one observation from the treatment group, one from the control group and one is an internal standard. Consequently, equal group sizes are normally used in such an experimental setup. The internal standard is a mixture of the proteome samples from all individuals involved in the experiment. It is used to standardize the recorded expression levels of the treatment and control group and making thus the separate gels comparable. A 2D DIGE gel in the described setup yields three different digital images, which are obtained using different emission wavelengths for scanning, specifically for each of the dyes. In this article we specifically regard data from the 2D DIGE technique, though our approach is also applicable to protein expression levels recorded by other quantitative proteome techniques. In order to match the spot maps of the individual gels, one master gel is selected (which is representative for the experiment) and all spot maps from the other gels are matched to the spot map of the master gel. Because not all protein spots that were detected on the master gel are found in the other gels the effective sample size is different for single proteins (Jung et al., 2005; Jung et al., 2006). 
Significant expression changes between treatment and control sample are usually detected by classical ttests in 2 D DIGE experiments. While the pvalues resulting from these tests can be taken as a measure of significance, they do not provide quantitative information and are often not intuitively comprehensible for nonstatisticians. For this reason, besides detecting statistical significance of expression changes by ttests, the strength of those changes is usually quantified by some ratio estimate, e.g. the ‘fold change’. If the ratio is greater than one, the protein is called upregulated and it is called downregulated if the ratio falls below one. It is a common practice in proteomics studies to preselect proteins for further laboratory analyses not only by test decisions, but also by the magnitude of the ratio. However, the applied thresholds for these ratios are mostly unjustified values. For instance, it has been suggested to utilize two times the standard deviation of an ingel analysis (DIA, Difference in Gel Analysis) as the threshold for the determination of significant expression changes (Karp et al., 2004). Furthermore, these ratio estimates can be misleading by suggesting a strong expression change for a certain protein, while the variance of this ratio is also very high and the expression change is thus not significant. Hence, a ratio of for example 1.5 can have different accuracy and does not necessarily mean that the respective protein is truly upregulated. Thus, making a test decision by using a confidence interval instead of the pvalue, gives on the one hand not only the information for rejection or nonrejection of the null hypothesis, but relativizes the view to the simple fold change on the other hand. 
Therefore, we propose here to make use of the general duality between test and confidence interval. A confidence interval is generally defined as a region that covers the true parameter with probability (1 – α). If one quantifies the strength of the expression changes by confidence intervals, one can call a particular protein significantly upregulated if the lower confidence limit exceeds one and downregulated if the upper limit falls below one. This has the benefit of testing and quantifying at the same time. 
Searching for differentially expressed proteins means to perform thousands of tests simultaneously. If the individual significance level for each test is for example α = 5%, a high number of false positive findings is to be expected in many cases. In fact, if the individual significance level for each test is α = 5%, the global significance level α^{*} increases by 1 – (1 – α)n, where n is the number of individual tests (Lehmann, 1986). Thus, the global significance level α^{*}, i.e. the probability for making a false test decission becomes nearly equal 1 if the number n of tests is just one hundred. Therefore, certain error rates are usually controlled in multiple hypothesis testing, e.g. the false discovery rate (FDR) or the more strict familywise error rate (FWER). These error rates can for example be controlled by adjustment algorithms for the raw pvalues of the tests. Test decisions based on adjusted pvalues result in smaller portions of false positive findings than those based on the raw pvalues. Here, in a twostep algorithm, we firstly calculate such adjusted pvalues for finding an adjusted α level, and secondly use this αlevel for calculating the confidence intervals. Thus, the test decisions based on the confidence intervals are also under control of the chosen error rate. Whereas the duality between test and confidence interval is a natural fact in statistical test theory, and pvalue adjustments for certain error rates are already widely used, a combination of both for analyzing highdimensional data has not yet been used within the field of proteomics. 
This paper is organized as follows: In the next section we present our experiment, formalize the data structure resulting from 2D DIGE experiments and detail the adjustment of confidence intervals. In the subsequent section we present the results of a 2D DIGE study, analysing a lung adenocarcinoma cell line A549 upon TGFβ treatment applying our method. At the end we give a discussion of our method and the results. 
Material and Methods 
Data Acquisition: 2DDIGE Analysis of a Cellular Model for Lung Fibrosis 
In order to collect a data set, large enough for in depth statistical analysis, we choose a cellular model for lung fibrosis. By stimulating the well established lung adenocarcinoma cell line A549 (Giard et al., 1973) with the growth factor TGFβ we hope to additionally get insights in the process of TGFß signal transduction and secondary cellular effects. TGFb is known to be a mediator of multiple processes in lung tissue like wound repair, inflammatory processes, senescence or survival of certain cell populations. In our 2D gel based approach we compared TGFβ stimulated with nonstimulated cells: we used cell lysates prepared from 40 independent culture dishes – 20 for each group (TGFβ treated and untreated). Expression levels of these samples were measured by the use of 2DDIGE gels (method described by Sitek et al., 2005). For this 50 µg of cell lysates were labelled with fluorescence dyes (TGFß treated: Cy3; untreated: Cy5; internal standard – a mixture of all samples: Cy2), combined and separated in two dimensions (Klose and Kobalz, 1995). Subsequently the 20 produced gels were scanned using the Typhoon 9400 scanner (GE Healthcare) and three images per gel acquired representing the three different Cydye channels resulting in 60 spot map images. 
Before image analysis with the DeCyder 2D 6.5 software (GE Healthcare) the images were cropped with ImageQuantTM software (GE Healthcare). The intragel spot detection was performed using the Differential Ingel Analysis (DIA) mode of the DeCyder software. The estimated number of spots was set to 10,000. An exclusion filter was applied to remove spots with a volume value < 20,000. The DeCyder BVA module has been used for spot matching and quantitation of spot volume data. The software algorithm first applies a normalization procedure resulting in normalized spot volumes for each spot map. Based on these normal volumes, standardization was performed by building ratios between the Cy3 and Cy2 channel (internal standard) and Cy5 and Cy2 channel respectively of each spot pair. The resulting normalized and standardized spot volumes were used for further calculations: the mean volumes of matched spots were calculated for each group and provided the basis for building spot volume ratios. 
Adjustment of Confidence Intervals 
We begin this section with a short scheme of the data structure resulting from a 2D DIGE experiment. Let n be the total number of protein spots detected on the master gel and the individual spots themselves be indexed by j (j = 1,..., n). As mentioned before, not each spot detected on the master gel is found on all other gels and available sample sizes m_{j }are thus different for the individual spots. For spot j, m_{j }denotes the number of gels were this spot has been detected by the automatic image analysis algorithm. The expression levels for spot j can be denoted by x_{j} = (x_{j1},...,x_{jmj}) for the treatment group and by y_{j} = (y_{j1},...,y_{jmj}) for the control group. We reemphasize that the available sample size m_{j} of each spot is the same in both groups, because missing spots ‘occur’ always simultaneously in a 2D DIGE experiment in the used setup. 
In the following, we assume that x_{j} and y_{j} represent already preprocessed expression levels, i.e. they are normalized, standardized and logtransformed. 
A typical measure which is often used for estimating the expression change of a particular protein j is called the ‘fold change’ R_{j}. For protein j it is defined by: 
(1) 
where X̄ ^{*}_{j}and Ȳ ^{*}_{j}are the geometric means of the expression levels in the treatment and control group, respectively, before logtransformation. Thus, a fold change of 2 means a 2fold upregulation in the treatment group, whereas a fold change of 0.5 means a 2fold downregulation. For better interpretation, fold changes less than one are often represented as their negative reciprocal. 
For assessing the significance of a protein’s expression change, Welch’s ttest is usually used in the analysis of such 2D DIGE experiments. The null hypothesis of this test is that there is no difference between the means of the logtransformed expression levels of the two groups. A confidence interval for this difference is given by 
(2) 
where X̄_{j}, Ȳ_{j},S^{x}_{j} and S^{y}_{j} are the means and standard deviations in the two groups t_{v1α/2 } and is the (1 – α / 2) quantil of the tdistribution with 
(3) 
degrees of freedom (Moser and Stewens, 1992). Simple confidence intervals for the fold change can now be obtained by exponentiating the confidence limits in equation (2). Instead of making a test decision by the pvalue of the ttest, one can use these confidence intervals for deciding whether a protein is significantly differentially regulated: a significant upregulation is given if the lower bound exceeds one, and a significant downregulation is given if the upper confidence bound falls below one. 
Testing each of the several thousand proteins with an individual significance level of α = 0.05 may possibly result in a large number of false positive test decisions. Therefore, an adjustment of pvalues with respect to some error rate is common. Two special error rates that are often desired to be controlled in multiple hypothesis testing are the familywise error rate (FWER) and the false discovery rate (FDR). The FWER is defined as the probability of having at least one false positive. This error rate can be controlled by the Bonferronicorrection, which is either made by performing each test with an adjusted significance level of α^{Bonf} =α / n, i.e. the individual significance level divided by the number n of tests, or by adjusting the raw pvalue P_{j} of protein j by 
(4) 
and comparing P^{Bonf} _{j}with the unadjusted αlevel. Either procedure guarantees an FWER of α. 
A less strict error rate is the FDR, which is defined as the portion of false positives among all positive test decisions.For controlling the FDR, pvalues can be adjusted by the procedure of Benjamini and Yekuteli (Benjamini and Yekuteli, 2001), which should particularly be used if a correlation between variables is obvious. This procedure starts by sorting the raw pvalues in an ascending order: P_{1}≦P_{2}≦....P_{n}. The adjusted pvalues are than given by 
(5) 
(Dudoit et al., 2003). Assume now that k proteins have an adjusted pvalue p^{BY} smaller than α. Instead of using these adjusted pvalues one can also compare the raw pvalues against an adjusted αlevel α^{BY} = α × k / c, where 
c=∑^{n}_{j1} 1/j,for making a test decision. 
The adjustment of the confidence intervals in equation (2) with respect to the FWER or the FDR is simply given by exchanging the raw αlevel by α^{Bonf} or by α^{BY}. 
Results 
Automatic image analysis of our gels, without user interferences, carried out by the DeCyder software, results in 3359 protein spots on the master gel. Because not all spots from the master gel were detected in all gels, different available samples sizes for the 3359 proteins were obtained (Fig. 1). For the further analyses we included only those 3149 spots that have a sample size of at least five observations. 
For detecting differentially regulated proteins we first performed a twosample ttest for each spot using individual alevels of 0.05 resulting in a total of 1916 differentially regulated proteins. According to their fold change, 948 of them are upregulated and 968 are downregulated. In order to reduce the number of false positives we adjusted the raw pvalues according the method of Benjamini and Yekuteli to obtain an FDR of 5%, yielding 1447 differentially regulated proteins, 679 of them upregulated and 668 downregulated. The stricter FWERadjustment yields 388 upregulated and 359 downregulated proteins. 
Next, instead of using the pvalues for making a test decision, we derived the unadjusted and adjusted confidence intervals, detailed in the method section, and called a protein significantly upregulated if the respective lower confidence limit exceeded one, and called it downregulated if the respective upper confidence limit fell below one. Due to the duality of test and confidence interval, these procedures, in the unadjusted and in the two adjusted cases, yield the same sets of up and downregulated proteins as the pvaluebased test decisions. i.e., the lower limit of an upregulated protein exceeds one exactly when the related ttest pvalue falls below 0.05 (Fig. 2). Reversed, the upper limit of a downregulated protein falls below one, when the related pvalue falls below 0.05. Thus a test decision can also be made based on the confidence intervals instead of using the pvalues. It is a fact that many nonstatisticians do not realy know how to interpret the pvalue. It is however very easy to understand that a spot is upregulated if the lower limit of a confidence interval for the fold change exceeds one. 
The lengths of confidence intervals naturally depend on the sample size (Fig. 3) and on the variance of the data. In particular, the lengths of the intervals decrease when the sample sizes increase. In our case the lengths also depends on the adjustment method. The smallest intervals were the unadjusted ones, whereas the FWERadjusted intervals were the largest. The largest confidence interval (reaching from 0.0015 to 1045.49) in our study was obtained when using the Bonferroniadjustment, where the samples size for the associated protein was m_{j} = 5. The largest confidence interval when using the BenjaminiYekuteliadjustment reaches from 0.31 to 24.66 (associated sample size: m_{j} = 7). 
Discussion 
We proposed to use adjusted confidence intervals, instead of pvalue based test decisions, for the detection of differentially regulated proteins observed in 2D DIGE experiments. Test decisions based on confidence intervals and those based on pvalues are naturally equivalent, so that the same sets of up and downregulated proteins are obtained. Adjustments can control typical error rates used in multiple hypotheses testing like the FWER or the FDR. Other typical adjustments of multiple hypotheses testing, that were not considered in this study are possible, as for example the FDRprocedure of Benjamini and Hochberg, (1995), which is based on stronger assumptions with regard to the correlation structure between the single tests. The confidence interval method was more intuitive for nonstatisticians and allows easier biological interpretations. It is a fact that many proteome scientists overestimate the relevance of proteins with a high fold change without considering the statistical significance of an expression change. However, proteins with the same fold change can have a completely different significance. Table 1 presents two proteins with a fold change of about 1.5. The corresponding confidence intervals showed that the protein with rank 1363, depending on the type of adjustment, can not necessarily be seen as differentially expressed. In particular, that protein with the largest positive fold change (R_{j} = 2.79) in our experiment is not statistically significant (raw pvalue = 0.09, see also Figure 5). 
The length of a confidence interval depends always on the available sample size. Therefore, this method can also be used for sample size decisions for upcoming studies, by choosing an appropriate degree of precision for the quantification of the expression changes. Especially for the most significant proteins, the length is rather small, and therefore the expression change unambiguous (Table 1). 
Before calculating ttests and confidence intervals, expression levels were normalized, standardized and logtransformed. Normalization is usually done for removing dyespecific gains in the raw expression levels. This can for example be done by normalization methods used in DNA microarray experiments (Bolstad et al., 2003). Standardization is performed by dividing all values from the treatment and control group by the corresponding values of the internal standard. This preprocessing step is needed for making the distinct gels comparable. The last step in preprocessing is usually a log or asinhtransformation for stabilizing the variance, which has been found to depend on the mean value of a spot (Kreil et al., 2004). 
A lack of our analysis is the normality assumption that is made by the ttests and the confidence intervals. Furthermore, we assume that there is still a bias between detected and true fold change. Bolstad, (2004) detected such a bias in the analysis of gene expression data recorded by DNA microarrays. 
At the end of this article we want to give a short biological discussion of our experiment. We detected a large number of differentially expressed proteins, even when using the strict Bonferroni adjustment. This can be traced back to the fact that TGFβ induced changes affect many different groups of proteins (Derynck and Akhurst, 2007). The cells change their morphology and their growing characteristics. So proteins of the cytoskeleton, energy and metabolic pathways as well as proteins of different signalling networks are affected which present a huge part of the cells’ total proteome. Indeed, a high portion of differences in untreated and TGFβtreated A549cells was also detected on the RNA level by Keating et al., (2006). Additionally, many of the differentially regulated proteins show only a small expression change and are not necessarily biologically relevant. The distribution of their fold changes is displayed in Figure 4. 551 of the 747 protein found with the Bonferroni adjustment have an absolute fold change higher than 1.3, 251 an absolute fold change higher than 1.5 and only 35 of them have an absolute fold change higher than 2.0. 
Currently a software, called Statistical DIGE Analyzer, for the analysis of data from 2D DIGE experiments is being developed at the Medizinisches ProteomCenter. This software will include the methods proposed in this article. 
Acknowledgments 
This work was supported by the Ministry of Innovation, Science, Research and Technology North Rhine Westphalia “Ziel 2Programm NRW 20002006” (grant PtJAZ0511V01 and Nachwuchsgruppe Neuroproteomics) and ProDaC. ProDaC is funded by the European Commission (6th framework programme, project number LSHG CT 2006 
References 
