
Article

Improving Label-Free Quantitative Proteomics Strategies by Distributing Shared Peptides and Stabilizing Variance Ying Zhang, Zhihui Wen, Michael Paul Washburn, and Laurence Alexandra Florens Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/ac504740p • Publication Date (Web): 03 Apr 2015 Downloaded from http://pubs.acs.org on April 10, 2015


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Improving Label-Free Quantitative Proteomics Strategies by Distributing Shared Peptides and Stabilizing Variance

Ying Zhang1‡, Zhihui Wen1‡, Michael P. Washburn1,2, and Laurence Florens1*

1 Stowers Institute for Medical Research, 1000 E. 50th Street, Kansas City, Missouri 64110, USA
2 Department of Pathology and Laboratory Medicine, The University of Kansas Medical Center, 3901 Rainbow Boulevard, Kansas City, Kansas 66160, USA

ABSTRACT: In a previous study, we demonstrated that spectral count-based label-free proteomic quantitation could be improved by distributing peptides shared between multiple proteins. Here, we compare four quantitative proteomic approaches, namely the Normalized Spectral Abundance Factor (NSAF), the Normalized Area Abundance Factor (NAAF), the Normalized Parent Ion Intensity Abundance Factor (NIAF), and the Normalized Fragment Ion Intensity Abundance Factor (NFAF). We demonstrate that label-free proteomic quantitation methods based on chromatographic peak area (NAAF), parent ion intensity in MS1 (NIAF), and fragment ion intensity (NFAF) are also improved when shared peptides are distributed based on peptides unique to each isoform. To stabilize the variance inherent to label-free proteomic quantitation datasets, we use cyclic locally weighted scatterplot smoothing (cyclic-LOWESS) and linear regression normalization (LRN). Again, all four methods are improved when cyclic-LOWESS and LRN are applied to reduce variation. Finally, we demonstrate that absolute quantitative values may be derived from label-free parameters such as spectral counts, chromatographic peak area, and ion intensity when using spiked-in proteins of known amounts to generate standard curves.

Label-free quantitation has emerged as an important and widely used tool for proteomics research.1, 2 Label-free quantitative proteomics analyses include the use of MS1 intensity or peak area,3-5 spectrum counting,6-11 and MS2 fragment ion intensity.12-15 In addition, a variety of software tools are available to implement such strategies,1, 2, 16 and combining different workflows has been shown to improve results.17-19

Additional factors are also important for label-free quantitative proteomics workflows. Normalization of data is widely used in many approaches. We developed the normalized spectral abundance factor (NSAF),11 which has been adopted by others and is frequently used as a benchmark against which other methods are compared.13, 14, 20-22 However, one issue with the original NSAF approach was the inappropriate overcounting of shared peptides. As a result, we developed the distributed NSAF (dNSAF), which we demonstrated provided superior results to the NSAF equation by analyzing a complex mixture of yeast proteins spiked with different levels of albumins from different organisms.23 Properly dealing with shared peptides is an area of increasing research in quantitative proteomics and typically leads to improved results.23-25

Several algorithms have been used for normalization transformation of microarray gene expression datasets.26, 27 One commonly used algorithm is global normalization, which scales each measurement by the total of the measurements within an experiment. In fact, NSAF is technically the global normalization of the Spectral Abundance Factor (SAF). Global normalization assumes that the total measurements within experiments are essentially similar. This assumption should often hold for technical replicates. Besides the global normalization step on SAF, the classic log2 transformation is also widely used to stabilize variance.27, 28 Besides its mathematical simplicity, the advantage of log2 transformation over other variance stabilizing transformation (VST) methods28, 29 is that it does not require technical replicates. Log2 transformation rescales datasets, hence reducing heteroskedasticity, and is therefore helpful for applying canonical statistical analyses. Log2 transformation can also improve the linearity between the explanatory variable and the response variable.27, 28 However, with log2 transformation the standard errors of the predicted means are also on the log2 scale and must be converted back mathematically through the exponential operator. This step could easily lead to errors much larger than the standard errors obtained with a linear transformation. When the purpose is to predict a response rather than to qualitatively evaluate linearity by regression, the exponentiated standard errors obtained with log2 transformation are a concern, and other variance stabilization methods should therefore be considered.

When technical replicates are available, other normalization transformation methods can adjust measurements so that replicates may be balanced appropriately to make meaningful biological comparisons. One such method is cyclic locally weighted scatterplot smoothing (cyclic-LOWESS).30 LOWESS normalization of datasets is widely used in large-scale RNA analysis.29-34 LOWESS35 is a data smoothing algorithm in which each data point is fitted using weighted least squares over its neighboring points, the width of the fitting window being set by a user-defined smoothing parameter. LOWESS is very flexible and suitable as a general framework for data smoothing because it does not require a specified global function to fit a model.
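As a worked illustration of the back-transformation concern raised above for log2-scale standard errors, consider a predicted mean $\hat{\mu}$ on the log2 scale with standard error $s$. The symbols and the numerical value $s = 1$ are illustrative assumptions, not values taken from this study.

```latex
\[
\left[\,2^{\hat{\mu}-s},\; 2^{\hat{\mu}+s}\,\right]
  \;=\; 2^{\hat{\mu}} \times \left[\,2^{-s},\; 2^{s}\,\right]
\]
% Example with s = 1: the interval is [0.5, 2] times the back-transformed mean,
% so the lower error is 0.5 * 2^{\hat{\mu}} while the upper error is 1.0 * 2^{\hat{\mu}}.
```

The interval is symmetric on the log2 scale but multiplicative and asymmetric on the linear scale, and its width grows in proportion to the predicted value itself.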


When experiments are technical replicates, linear regression normalization (LRN) may also be used. Unlike cyclic-LOWESS, which does not require a specified global function to fit a model, LRN assumes that the systematic bias is linearly dependent on the magnitude of the measurement. The algorithm first builds a reference experiment by taking the median or mean of the measurements for each protein. A least-squares linear regression is then performed with the reference experiment’s measurements as the explanatory variable and each original experiment’s measurements as the response variable. For each protein, the normalization correction is derived from the residual. Finally, the obtained normalization corrections are used to adjust the measurements. LOWESS and linear regression normalization have both been used to improve quantitative proteomics data analysis by removing systematic biases.36-38

In the current study, we reanalyzed the dataset of albumins spiked into a background of S. cerevisiae proteins that was used to develop the dNSAF approach23 to compare the strategies for dealing with shared peptides across different label-free quantitation methods. Furthermore, we applied cyclic-LOWESS and linear regression normalization to further reduce systematic biases in these label-free quantitative proteomics pipelines using our previous dataset23 as well as independently acquired datasets.39 After these steps, we found that distributed Normalized Label-Free Abundance Factors (dNXAF) could be used to estimate absolute protein amounts that closely matched abundance values published for Saccharomyces cerevisiae proteins based on GFP analysis40 or targeted proteomics41 and for Escherichia coli proteins based on 2D-PAGE.42
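The LRN steps outlined above (reference experiment, least-squares fit, residual-based correction) can be sketched in Python as follows. This is only a minimal illustration: the text does not specify how the correction is derived from the residual, so the sketch assumes that each adjusted value is the reference value plus the regression residual. The function names `fit_line` and `linear_regression_normalize` are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def linear_regression_normalize(experiments):
    """experiments: list of replicate runs, each a list of per-protein
    measurements in the same protein order. Returns the adjusted runs."""
    n_proteins = len(experiments[0])
    # Reference experiment: per-protein mean across all replicates.
    reference = [mean(run[i] for run in experiments) for i in range(n_proteins)]
    adjusted = []
    for run in experiments:
        # Reference is the explanatory variable, the run is the response.
        intercept, slope = fit_line(reference, run)
        # ASSUMPTION: remove the fitted linear bias and keep the residual
        # around the reference value.
        adjusted.append([ref + (val - (intercept + slope * ref))
                         for ref, val in zip(reference, run)])
    return adjusted
```

With two runs that differ only by a constant scale factor, the residuals are zero and both adjusted runs collapse onto the reference, which is the behavior expected when the bias is purely linear.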

EXPERIMENTAL PROCEDURES

Proteomics Datasets. Throughout this work, we used a dataset we previously generated consisting of six albumins from different species spiked at different amounts into a whole cell protein lysate from Saccharomyces cerevisiae23 and analyzed by 12 replicate runs on linear ion trap mass spectrometers (ftp://ftp.stowers.org/pub/washburn/Zhang_dNSAF_AnalChem_2010_RAW-SQT/). We also applied our approach to an independently acquired dataset consisting of Universal Proteomics Standard Set 2 (Sigma-Aldrich) diluted in a protein extract from Escherichia coli39 and analyzed in 3 to 4 replicate runs on three different types of mass spectrometers (ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2013/12/PXD000602/).

Data Analysis. As previously described,23 peptides detected when combining all replicate analyses were used to establish a master list of proteins, which could fall into three categories following the parsimony principle. “Unique” peptides included (i) peptides whose sequence matched only one protein; (ii) peptides unique to a group of indistinguishable proteins; and (iii) peptides unique to a protein/protein group after removal of subset proteins. All calculations related to chromatographic and mass spectrometric data items were performed automatically by the in-house written NSAF7 software23 coupled with Thermo Scientific XCalibur. NSAF7 was used to calculate the normalized quantitative value of each protein/protein group based on the label-free (LF) features of its identified peptides (Table S1). Four LF features, namely spectral counts (SpC), peak areas (PA), parent ion intensities in MS1 (MI), and fragment ion intensities in MS2 (MF), were used in turn to calculate the Normalized Spectral Abundance Factor (yNSAF), Normalized Peak Area Abundance Factor (yNAAF), Normalized MS1 Intensity Abundance Factor (yNIAF), and Normalized MS2 Fragment Intensity Abundance Factor (yNFAF), as defined in Table 1. The MS2 intensity values were directly collected from the .ms2 files generated using in-house developed software, RawDistiller v. 1.0.43 The MS1 intensity values and peak area values were obtained from the .raw files through the proprietary Thermo Scientific libraries XRawfile2.dll and XcaliburAnalysis.dll, respectively. C++ in Visual Studio 2005 was used to call the Thermo Scientific libraries. The Normalized Label-Free Abundance Factor (yNXAF), where X stands for any of the four LF features defined above, was calculated as follows:

$$\mathrm{yNXAF}_k = \frac{\mathrm{yXAF}_k}{\sum_{i=1}^{N} \mathrm{yXAF}_i} \tag{1}$$

where subscript k denotes a protein/protein group identity and N is the total number of proteins (indexed by i) detected in an experiment, while yXAF is a protein’s label-free abundance factor that is defined as the sum of LF features for peptides mapped to this protein divided by its length. The prefix y in equation (1) distinguishes between the three main strategies used to deal with shared label-free features as described previously:23 (i) When the prefix y equals n,

$$\mathrm{nXAF}_k = \frac{\mathrm{uLF}_k + \mathrm{sLF}_k}{\mathrm{uL}_k + \mathrm{sL}_k} \tag{2}$$

in which label-free features (sLF) and length (sL) for peptides shared between multiple proteins are not separated from LF features (uLF) and length (uL) for unique peptides. Therefore unique peptides and shared peptides are not distinguished and sLF may be allotted multiple times to different proteins;11 (ii) When y equals u (for “unique”),

$$\mathrm{uXAF}_k = \frac{\mathrm{uLF}_k}{\mathrm{uL}_k} \tag{3}$$

in which only label-free features (uLF) and unique sequence length (uL) from unique peptides are considered in the XAF calculation, while shared LF features (sLF) and shared length (sL) are dismissed;23 (iii) When y equals d (for “distributed”),

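$$\mathrm{dXAF}_k = \frac{\mathrm{uLF}_k + \sum_{j} \left( \dfrac{\mathrm{uLF}_k}{\sum_{m=1}^{M} \mathrm{uLF}_m} \times \mathrm{sLF}_{k,j} \right)}{\mathrm{uL}_k + \mathrm{sL}_k} \tag{4}$$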

in which shared label-free features (sLF) are distributed based on LF features unique to each protein k divided by the sum of all unique LF features for the M protein isoforms that share peptide j with protein k.23
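The three ways of handling shared peptides in equations (2) to (4), followed by the global normalization of equation (1), can be sketched in Python as follows. This is a minimal illustration and not the NSAF7 implementation: the function names, the dictionary-based inputs, and the toy numbers in the usage note are assumptions made for this example, and every protein is assumed to have a non-zero unique length.

```python
def abundance_factors(unique_lf, unique_len, shared_len, shared_peptides):
    """unique_lf:  {protein: summed LF feature of its unique peptides (uLF)}
    unique_len: {protein: unique sequence length (uL)}
    shared_len: {protein: shared sequence length (sL)}
    shared_peptides: list of (LF value, [proteins sharing that peptide])
    Returns three dicts: nXAF (eq. 2), uXAF (eq. 3), dXAF (eq. 4)."""
    n_shared = {p: 0.0 for p in unique_lf}
    d_shared = {p: 0.0 for p in unique_lf}
    for lf, proteins in shared_peptides:
        total_unique = sum(unique_lf[m] for m in proteins)
        for p in proteins:
            # eq. 2: the full shared LF is allotted to every sharing protein.
            n_shared[p] += lf
            # eq. 4: shared LF is split in proportion to unique LF.
            if total_unique > 0:
                d_shared[p] += lf * unique_lf[p] / total_unique
    nxaf = {p: (unique_lf[p] + n_shared[p]) / (unique_len[p] + shared_len[p])
            for p in unique_lf}
    uxaf = {p: unique_lf[p] / unique_len[p] for p in unique_lf}
    dxaf = {p: (unique_lf[p] + d_shared[p]) / (unique_len[p] + shared_len[p])
            for p in unique_lf}
    return nxaf, uxaf, dxaf

def normalize(xaf):
    """eq. 1: divide each abundance factor by the sum over all proteins."""
    total = sum(xaf.values())
    return {p: v / total for p, v in xaf.items()}
```

For instance, with two hypothetical isoforms A and B (uLF of 30 and 10, uL = sL = 100 each) sharing one peptide with an LF value of 20, the distributed strategy assigns 15 of the shared signal to A and 5 to B, whereas the n strategy counts all 20 for both proteins.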


Table 1. Summary of different yNXAF strategies and their parameters

| Label-free quantitative feature | Peptide-level quantitation: ways to handle shared peptides | Protein-level quantitation: Abundance Factor yXAF (a) | Protein-level quantitation: Normalized Abundance Factor yNXAF |
|---|---|---|---|
| Spectral counts (SpC): ySpC, ySAF, yNSAF | nSpC | nSAF (eq. 2) | nNSAF |
| | uSpC | uSAF (eq. 3) | uNSAF |
| | dSpC | dSAF (eq. 4) | dNSAF |
| Peak Area (PA): yPA, yAAF, yNAAF | nPA | nAAF (eq. 2) | nNAAF |
| | uPA | uAAF (eq. 3) | uNAAF |
| | dPA | dAAF (eq. 4) | dNAAF |
| Parent ion Intensity in MS1 (MI): yMI, yIAF, yNIAF | nMI | nIAF (eq. 2) | nNIAF |
| | uMI | uIAF (eq. 3) | uNIAF |
| | dMI | dIAF (eq. 4) | dNIAF |
| Fragment Ion Intensity in MS2 (MF): yMF, yFAF, yNFAF | nMF | nFAF (eq. 2) | nNFAF |
| | uMF | uFAF (eq. 3) | uNFAF |
| | dMF | dFAF (eq. 4) | dNFAF |

Equations:

$$\mathrm{yNSAF}_k = \frac{(\mathrm{ySpC}/\mathrm{Length})_k}{\sum_{i=1}^{N} (\mathrm{ySpC}/\mathrm{Length})_i}$$

$$\mathrm{yNAAF}_k = \frac{(\mathrm{yPA}/\mathrm{Length})_k}{\sum_{i=1}^{N} (\mathrm{yPA}/\mathrm{Length})_i}$$

$$\mathrm{yNIAF}_k = \frac{(\mathrm{yMI}/\mathrm{Length})_k}{\sum_{i=1}^{N} (\mathrm{yMI}/\mathrm{Length})_i}$$

$$\mathrm{yNFAF}_k = \frac{(\mathrm{yMF}/\mathrm{Length})_k}{\sum_{i=1}^{N} (\mathrm{yMF}/\mathrm{Length})_i}$$

(a) When the prefix y equals n, LF features from peptides shared between multiple proteins are assigned multiple times to all isoforms; when y equals u, only LF features from peptides unique to a protein/protein group are counted and shared peptides are discarded; the prefix d means that LF features from shared peptides are distributed amongst isoforms based on quantitative features for unique peptides.

Data Normalization. The routines for LOWESS were downloaded from http://www.netlib.org/go/. To adapt LOWESS to normalization transformation, an M vs. A plot was used as follows. We considered two sets of label-free quantitative features, LFi,1 and LFi,2, where LF was spectral counts (SpC), chromatographic peak areas (PA), parent ion intensities in MS1 (MI), or the sum of fragment ion intensities in MS2 (MF); subscript i denoted the protein identification; and subscripts 1 and 2 denoted the MudPIT experiment identification. First, two variables were constructed: a response variable M that was the difference in log expression values, log2(LFi,1/LFi,2), and an explanatory variable A that was the average of log expression values, 0.5*log2(LFi,1*LFi,2). The algorithm then fitted a LOWESS curve through the MA plot, using 0.4 as the smoothing parameter. For each protein, the normalization correction factor was derived from the residuals. Finally, the obtained normalization corrections were used to adjust LF. The cyclic-LOWESS normalization transformation repeated this procedure exhaustively over all possible pairs of experiments. Multiple iterations of this process were run until a predefined epsilon criterion was reached. The epsilon we used was 0.005, computed as the difference between the current iteration’s averaged standard deviation and the previous iteration’s averaged standard deviation, divided by the previous iteration’s averaged standard deviation.

Linear regression normalization (LRN) was implemented as follows. We used the means of the label-free quantitative features (LF) to construct the reference experiment. A least-squares linear regression was then performed between the constructed reference experiment’s measurements and the original experiment’s measurements. For each protein, the normalization correction was derived from the residuals. Finally, the obtained normalization corrections were used to adjust the quantitative feature measurements.
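The MA-plot construction, pairwise LOWESS correction, and epsilon stopping rule described above can be sketched in Python as follows. This is an illustrative sketch, not the netlib routine used in the study: the smoother is a simplified tricube-weighted local linear fit without robustness iterations, the correction is assumed to be split equally between the two experiments on the log2 scale (a common convention for cyclic loess), all LF values are assumed to be strictly positive, and the function names and the `max_iter` safeguard are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def ma_values(lf1, lf2):
    """M = log2(LF1/LF2) and A = 0.5*log2(LF1*LF2) for each protein."""
    m = [math.log2(a / b) for a, b in zip(lf1, lf2)]
    a = [0.5 * math.log2(x * y) for x, y in zip(lf1, lf2)]
    return m, a

def lowess_fit(x, y, span=0.4):
    """Simplified LOWESS: tricube-weighted local linear fit at every x."""
    n = len(x)
    k = max(2, math.ceil(span * n))  # number of neighbors in the window
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalize_pair(lf1, lf2, span=0.4):
    """Remove the fitted M-vs-A trend from one pair of experiments."""
    m, a = ma_values(lf1, lf2)
    trend = lowess_fit(a, m, span)
    # ASSUMPTION: half of the fitted bias is removed from each experiment,
    # so M becomes the residual (M - trend) while A is unchanged.
    new1 = [v / 2 ** (t / 2) for v, t in zip(lf1, trend)]
    new2 = [v * 2 ** (t / 2) for v, t in zip(lf2, trend)]
    return new1, new2

def averaged_sd(runs):
    """Mean over proteins of the standard deviation across replicates."""
    return mean(stdev(values) for values in zip(*runs))

def cyclic_lowess(runs, span=0.4, epsilon=0.005, max_iter=50):
    """Cycle over all pairs of runs until the relative change in the
    averaged standard deviation falls below epsilon."""
    runs = [list(r) for r in runs]
    prev = averaged_sd(runs)
    for _ in range(max_iter):
        for i, j in combinations(range(len(runs)), 2):
            runs[i], runs[j] = normalize_pair(runs[i], runs[j], span)
        curr = averaged_sd(runs)
        if prev == 0 or abs(prev - curr) / prev < epsilon:
            break
        prev = curr
    return runs
```

When two runs differ only by a constant factor, the fitted trend is flat and a single pass brings both runs to their geometric mean, after which the stopping rule ends the loop.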

The dNXAF values calculated after cyclic-LOWESS and LRN were labeled as dNXAFcLL to denote the additional normalization steps. Singleton proteins could only be normalized with log2 transformation and global normalization (from XAF to yNXAF). No other normalization was applied to proteins identified in only one replicate analysis.

Statistical Analysis. The Mann-Whitney U tests were performed using OriginPro 9.1 (Table S2A, B, D). Differences in the slopes measured for the linear fits (performed in OriginPro 9.1; Figure S1) between Log2(Amount) and Log2(uNXAF), Log2(dNXAF), or Log2(dNXAFcLL) for the six albumins were statistically assessed using the t-test in Excel (Table S2C). To assess statistically significant differences in dNXAF values before and after each normalization step (Table S2E), we used the QSPEC44-derived QPROT software (http://sourceforge.net/projects/qprot/) based on hierarchical Bayes estimation, with a burn-in parameter of 2000 and 10000 iterations.

Absolute Protein Quantitation. The slopes and intercepts of the linear regressions through log2-transformed dNSAFcLL, dNAAFcLL, and dNIAFcLL as a function of log2-transformed albumin amounts (performed in OriginPro 9.1) were used to calculate the absolute amounts (in pmol) of the soluble yeast proteins present in the albumin isoform samples (Table S3A). The slopes and intercepts of the linear regressions through log2-transformed dNSAFcLL and dNIAFcLL as a function of log2-transformed UPS2 amounts (performed in OriginPro 9.1) were used to calculate the absolute amounts (in fmol) of the soluble E. coli proteins (Table S3B). Using the solution of the linear regressions through the n standard proteins’ log2(Standard Amount) expressed as a function of log2(Standard dNXAFcLL), the unknown amount of a background protein from S. cerevisiae or E. coli was derived from its log2(dNXAFcLL).
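The standard-curve step described above for absolute quantitation can be sketched in Python as follows. The study performed these regressions in OriginPro 9.1; this sketch only illustrates the arithmetic, and the function names and the numbers in the usage note are hypothetical.

```python
import math
from statistics import mean

def fit_standard_curve(standard_amounts, standard_dnxaf):
    """Least-squares fit of log2(amount) = intercept + slope * log2(dNXAFcLL)
    over the spiked-in standards. Returns (intercept, slope)."""
    x = [math.log2(v) for v in standard_dnxaf]
    y = [math.log2(a) for a in standard_amounts]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def estimate_amount(dnxaf, intercept, slope):
    """Back-transform the predicted log2(amount) of a background protein."""
    return 2 ** (intercept + slope * math.log2(dnxaf))
```

As a usage example with made-up standards of 0.1, 1, and 10 pmol measured at dNXAFcLL values of 0.001, 0.01, and 0.1, the fitted slope is 1 and a background protein with a dNXAFcLL of 0.05 would be estimated at about 5 pmol.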


RESULTS AND DISCUSSION

Definition of Equations. Chromatographic peak area (PA) and parent ion intensity in MS1 (MI) can be used for label-free quantitation in the same way as spectral counts. In a previous study,45 we showed that in an LC-MS/MS platform with dynamic exclusion enabled, a peptide’s spectral counts could be mathematically expressed as a function of its chromatographic peak area. We hence defined the Normalized Peak Area Abundance Factor (yNAAF) and the Normalized Parent Ion Intensity in MS1 Abundance Factor (yNIAF) based on the yNSAF calculation (Table 1). Like ySAF, yAAF is defined as a protein’s nominal peak area (the sum of the chromatographic peak areas of its peptides) divided by its nominal length. Similarly, yIAF is defined as a protein’s nominal parent ion intensity in MS1 (the sum of the ion intensities in MS1 of its peptides) divided by its nominal length. It has been demonstrated that the sum of fragment ion intensities (MF) can also be used for label-free quantitation.13 We hence defined the Normalized Fragment Ion Intensity in MS2 Abundance Factor (yNFAF) similarly to the yNSAF calculation (Table 1). Like ySAF, yFAF is defined as a protein’s nominal fragment ion intensity in MS2 (the sum of its peptides’ fragment ion intensities in MS2) divided by its nominal length. nNFAF (Table 1) is the same calculation as the previously published label-free quantitative factor SIN.13 For consistency, we used the NFAF nomenclature throughout and introduced the newly defined variations in calculation that we implemented for dealing with label-free features from peptides shared between multiple proteins (u and d prefixes, for using only unique peptides and for distributing shared peptides, respectively).
To summarize, we defined variations of the normalized label-free abundance factor yNXAF (Table 1), where the prefix y represents how shared peptides are dealt with and X can be S, A, I, or F to represent the different label-free quantitative features: spectral counts (SpC), peak area (PA), parent ion intensity in MS1 (MI), and fragment ion intensity in MS2 (MF).

Comparison of Strategies. We set out to test how different label-free yNXAF strategies compared to one another using a previously established dataset consisting of trypsin digests of six albumin isoforms (Supporting Table S1) spiked at known concentrations in a background of Saccharomyces cerevisiae soluble proteins.23 The amounts of the six standard proteins were evenly distributed on a logarithmic scale from 0.1 pmol to 10 pmol, and we acquired twelve technical replicates of each sample.23 Because all other parameters involved in the normalization calculations were equal, we could directly compare the performance of spectral counts, peak areas, parent ion intensities in MS1, and fragment ion intensities in MS2 in estimating protein levels. The averaged log2(yNXAF) values were plotted as a function of log2(Protein Amount) for the six albumin standards (Figure S1A-C) and linear regressions were performed for each of the yNXAF strategies. As observed for the yNSAF calculations,23 yNAAF/yNIAF/yNFAF also showed the following behaviors: (i) nNXAF values deviated from linearity by overestimating the lowest protein amounts and by underestimating the highest amounts; (ii) uNXAF and dNXAF had a much better linearity with protein amount than nNXAF. uNXAF and dNXAF calculations eliminated the protein overestimation problem with low protein amounts, and


uNAAF/uNIAF/uNFAF and dNAAF/dNIAF/dNFAF calculations eliminated protein underestimation on the higher amount side. uNSAF and dNSAF values were still slightly underestimated for higher protein amounts, most likely due to dynamic exclusion having a dampening effect on spectral counts for proteins of higher abundance.45 Overall the R2 values for dNXAF response to protein amount were greater than 0.97, while only uNSAF had R2 values lower than 0.97 (Table 2). The Mann-Whitney U test was used to determine whether there were statistically significant differences between the distributions of the square of the Pearson product moment correlation coefficients (RSQ values calculated for each of the 12 replicates; Table S1) between uNXAF and dNXAF and between dNXAF strategies (Table S2A). RSQs between log2(dNSAF) and log2(Amount) tended to be greater than RSQs between log2(uNSAF) and log2(Amount) with a p-value of 0.04 (Table S2A), while the distributions of RSQs measured for the other three unique (uNXAF) or distributed (dNXAF) strategies were not significantly different at the 0.05 level.

Table 2. Linear regression between yNXAF values and known protein amounts

| Strategy | Slope (a) | Adj. R2 (a) |
|---|---|---|
| nNSAF | 0.33 ± 0.08 | 0.77 |
| nNAAF | 0.42 ± 0.12 | 0.67 |
| nNIAF | 0.53 ± 0.20 | 0.54 |
| nNFAF | 0.40 ± 0.09 | 0.79 |
| uNSAF | 0.68 ± 0.09 | 0.91 |
| uNAAF | 0.92 ± 0.05 | 0.99 |
| uNIAF | 1.41 ± 0.11 | 0.97 |
| uNFAF | 0.93 ± 0.08 | 0.97 |
| dNSAF | 0.67 ± 0.06 | 0.97 |
| dNAAF | 0.93 ± 0.05 | 0.98 |
| dNIAF | 1.40 ± 0.09 | 0.98 |
| dNFAF | 0.94 ± 0.06 | 0.98 |
| dNSAFcLL | 0.67 ± 0.05 | 0.97 |
| dNAAFcLL | 0.93 ± 0.05 | 0.99 |
| dNIAFcLL | 1.40 ± 0.08 | 0.98 |
| dNFAFcLL | 0.94 ± 0.06 | 0.98 |

(a) Slopes and adjusted R2 from the linear regressions (Figure S1) between log2(albumin amounts) and log2(nNXAF), log2(uNXAF), log2(dNXAF), or log2(dNXAFcLL) (obtained after cyclic-LOWESS and linear regression normalization, LRN).

We next compared the precision and sensitivity of each strategy by examining the standard errors and slopes of the linear regressions between the averaged log2(yNXAF) and log2(Amount). Amongst the uNXAF and dNXAF strategies, uNSAF and dNSAF had significantly smaller standard deviations (error bars, Figure S1B-C), as shown by the exact p-values of the U-tests comparing the standard deviations of u/dNSAF vs. u/dNAAF, u/dNIAF, or u/dNFAF, which were all less than 0.05 (Table S2B). There were no significant differences in standard deviations between uNXAF and their corresponding dNXAF calculations (Table S2B). The linear regression between log2(u/dNSAF) values and log2(Amount) appeared to


have the smallest slope, while regression through the u/dNIAF values had the steepest slope (Table 2, Figure S1B-C). Differences in the slopes measured for the linear fits (Table 2) between Log2(u/dNXAF) and Log2(Amount) (averaged across 12 technical replicates; Table S1) were statistically assessed using the t-test (Table S2C). The slopes measured for linear regression through uNXAF values did not show any statistical difference with their corresponding dNXAF values (Table S2C). The regression through u/dNIAF values had a significantly greater slope than the ones measured for regressions through u/dNSAF, u/dNAAF, or u/dNFAF (p-values < 0.01), while the slopes measured through u/dNAAF and u/dNFAF were not statistically different. The slopes measured for u/dNSAF were significantly smaller than those of the other three strategies, with p-values less than 0.01 (Table S2C). Smaller standard deviations in quantitation mean better precision, while larger slopes of linear response to protein amount mean better sensitivity. Therefore, when log2 transformation was applied, u/dNSAF had the best precision and u/dNIAF had the best sensitivity.

We assessed the reproducibility of each strategy by calculating individual coefficients of determination (RSQs) for the linear regressions between any two of the 12 replicates using log2(u/dNXAF) values for all identified and quantified proteins (Table S2D). We compared the reproducibility of each strategy using the non-parametric U-test (Table S2D). In the case of dNSAF, dNAAF, and dNFAF values, the distributions of the 66 measured RSQs tended to be greater than their corresponding uNSAF, uNAAF, and uNFAF RSQ values (p-value less than 0.05), while the RSQ distributions for uNIAF and dNIAF were not significantly different (Table S2D). Amongst the distributive strategies, dNSAF and dNIAF had the best reproducibility, followed by dNAAF, while the dNFAF RSQ distribution was significantly lower (exact p-values were all less than 2.0×10^-7).
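The pairwise reproducibility measure described above (one RSQ per pair of replicates, hence 66 values for 12 replicates) can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; it assumes each replicate is a list of log2-transformed abundance factors in the same protein order and that no replicate is constant.

```python
from itertools import combinations
from statistics import mean

def rsq(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def pairwise_rsq(replicates):
    """RSQ for every unordered pair of replicates, keyed by (i, j)."""
    return {(i, j): rsq(replicates[i], replicates[j])
            for i, j in combinations(range(len(replicates)), 2)}
```

The number of pairs is n(n-1)/2, which gives 12 × 11 / 2 = 66 for the twelve technical replicates. The resulting distributions of RSQ values are what the U-test then compares between strategies.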
The results showed that uNXAF was better than nNXAF by counting unique spectra only and dismissing shared sequence length, but dNXAF had an overall better reproducibility than uNXAF. Different LF features normalized using the dNXAF calculation could hence be used similarly for proteomic quantitation. Peptide spectral counts and chromatographic peak area are mathematically equivalent for proteomic quantitation,45 the sum of a peptide’s parent ion intensity in MS1 can be considered as an approximation of its chromatographic peak area, and the sum of a peptide’s fragment ion intensity in MS2 depends on the sum of the peptide’s parent ion intensity in MS1. Therefore it was not unexpected that yNXAF values behaved very similarly when the same strategies were used to deal with label-free features from shared peptides.

Stabilization of Variance. The RSQ values calculated between any two technical replicates (Table S2D) were plotted as heat maps for dNSAF, dNAAF, dNIAF, and dNFAF (from top to bottom, Figure 1, panel A). The dNSAF, dNAAF, dNIAF, and dNFAF datasets showed good reproducibility, with RSQs ranked as dNSAF, dNIAF > dNAAF > dNFAF, where the exact p-values of the U-test were 7.6×10^-13 for dNSAF > dNAAF, 2.1×10^-22 for dNIAF > dNAAF, and 3.0×10^-15 for dNAAF > dNFAF (Table S2D). For the 12 LC-MS/MS technical replicates acquired for the albumin mixtures, we calculated the averaged standard deviations and averages of the dNXAF values of all identified proteins. The averaged dNXAF standard deviations were 0.000335 for dNSAF, 0.000558 for dNAAF, 0.000347 for dNIAF, and 0.000652 for dNFAF, compared to averaged dNXAF averages of 0.001198 for dNSAF, 0.001478 for dNAAF, 0.001134 for dNIAF, and 0.001275 for dNFAF. The averaged variances, expressed as the ratio of the averaged standard deviation to the averaged value, were hence 28% for dNSAF, 38% for dNAAF, 31% for dNIAF, and 51% for dNFAF. We could conclude that dNSAF, dNAAF, dNIAF, and dNFAF all had prominent variance, with dNFAF having the largest variance. Label-free quantitation values based on spectral counts, chromatographic peak area, parent ion intensity in MS1, and fragment ion intensity in MS2 generally tended to have significant variation, especially for the proteins of higher abundance and when repeated measurements were acquired. Stabilizing variance should then be considered to transform or adjust the measurements of these datasets.

Different normalization transformation methods may be integrated together in one application. We first used cyclic-LOWESS on the distributed dSpC/dPA/dMI/dMF values to reduce non-linear systematic bias, and then linear regression normalization (LRN) to reduce linear systematic bias and therefore stabilize variance (Figure 1B). LRN requires that the systematic bias be linearly dependent on the magnitude of the measurement; otherwise it cannot stabilize variance and may even worsen it. We found that the SpC, PA, MI, and MF quantitative features usually met this condition amongst technical replicates. The averaged standard deviation of the distributed dSpC/dPA/dMI/dMF values decreased with increasing number of normalization iterations when cyclic-LOWESS and LRN were applied (Figure 1B). The averaged standard deviations of dSpC/dPA/dMI/dMF were sharply reduced with the first iteration of cyclic-LOWESS and then reached the method’s limit after 2 to 8 more iterations until LRN was subsequently applied (Figure 1B). The distribution of the standard deviation of dSAF/dAAF/dIAF/dFAF was narrower when cyclic-LOWESS and LRN were applied (Figure 1C).

dNXAFcL values were calculated after cyclic-LOWESS and dNXAFcLL after LRN on dSpC/dPA/dMI/dMF (Table S2D), and the corresponding RSQ values measured for any two of the 12 technical replicates were improved after stabilization of variance (Figure 1D). The averaged RSQs between any two replicates increased from 0.93 to 0.99 for dNSAFcLL, from 0.86 to 0.99 for dNAAFcLL, from 0.94 to 0.99 for dNIAFcLL, and from 0.78 to 0.99 for dNFAFcLL (Table S2D). The variance reduction in dSpC/dPA/dMI/dMF through cyclic-LOWESS and LRN led to variance reduction in dNXAFcLL. The additional normalization steps did not have any effect on linearity and sensitivity (Figure S1D), with no statistical differences in slopes between the dNXAF and dNXAFcLL datasets (Table 2). After cyclic-LOWESS and LRN, dNIAFcLL still had the largest slope amongst all four strategies with p-values < 0.002 and dNSAFcLL had the smallest slope with p-values < 0.01 (Table S2C). The measured standard deviations were all significantly smaller after normalization (error bars in Figure S1D and Table S2B). The RSQs between any two replicates for dNXAFcLL measured for all proteins became greater than 0.99, but the U-test on RSQ values (Table S2D) indicated that dNFAFcLL reproducibility was still significantly lower than the reproducibility measured for dNSAFcLL, dNIAFcLL, and dNAAFcLL values (p-values < 8.4×10^-6).
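The averaged variances quoted earlier in this section for the unnormalized dNXAF values (28%, 38%, 31%, and 51%) follow from dividing each averaged standard deviation by the corresponding averaged dNXAF value. The short Python check below reproduces that arithmetic from the figures given in the text; the variable names are hypothetical.

```python
# Averaged standard deviations and averaged values reported in the text.
averaged_sd = {"dNSAF": 0.000335, "dNAAF": 0.000558,
               "dNIAF": 0.000347, "dNFAF": 0.000652}
averaged_value = {"dNSAF": 0.001198, "dNAAF": 0.001478,
                  "dNIAF": 0.001134, "dNFAF": 0.001275}

# Relative variation in percent: 100 * averaged SD / averaged value.
relative_variation = {name: 100 * averaged_sd[name] / averaged_value[name]
                      for name in averaged_sd}

for name, value in relative_variation.items():
    print(f"{name}: {value:.1f}%")
```

Rounded to whole percentages, the four ratios match the values stated in the text, with dNFAF showing the largest relative variation.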

Figure 1. Cyclic-LOWESS and LRN stabilize the variance of label-free abundance factors. (A) Individual coefficients of determination (RSQs) for the linear regressions between any two of the 12 replicate analyses for dNSAF, dNAAF, dNIAF, and dNFAF values (from top to bottom), calculated before variance correction of the label-free features (Table S2). (B) Average standard deviations for dSpC, dPA, dMI, and dMF (from top to bottom) before and after cyclic-LOWESS and LRN normalization iterations. (C) Distribution of standard deviations for dXAF before and after cyclic-LOWESS and LRN. (D) RSQs calculated between any two of the 12 technical replicates after cyclic-LOWESS (dNXAFcL, top-left corner of matrix) followed by LRN (dNXAFcLL, bottom-right corner of matrix). (E) RSQs calculated before and after normalization between any two technical replicates of UPS2 standards spiked into E. coli proteins and analyzed on three different mass spectrometers.39 Note that the 4th LTQ replicate and the 4th Velos replicate39 were not used in our analysis because their total numbers of spectra, peptides, and proteins were less than half of those in the other replicates.

To evaluate whether the normalization steps significantly affected the dNXAFcLL values of the 1061 identified yeast proteins, we applied the QSPEC44-derived QPROT software to test their dNXAF values before and after cyclic-LOWESS, and to test their dNXAFcL values before and after LRN. A global QPROT FDR (up or down) lower than 0.5% was used to identify proteins whose dNXAF values were significantly changed (Table S2E). Fewer than 0.4% of dNXAF values were increased after cyclic-LOWESS and fewer than 2.72% were increased after LRN. LRN affected proteins of lower abundance (1.76%) more than the more abundant ones (0.96%). Overall, cyclic-LOWESS and LRN did not significantly affect the dNXAF values of individual proteins.

We tested our variance stabilization approach more broadly on a publicly available dataset39 consisting of replicate analyses of a sample containing the Universal Protein Standard Set 2 diluted in a soluble protein extract from E. coli. This sample was analyzed on three different types of mass spectrometers of low (LTQ and LTQ-Velos) and high (Orbitrap-Velos) resolution. We calculated dNSAF and dNIAF values for each identified protein and further normalized them using cyclic-LOWESS and LRN (Figure 1E). In all cases, normalization improved the RSQ values calculated between any two replicate analyses. Although applying cyclic-LOWESS and LRN to a dataset is more involved than the classic log2 transformation, such additional normalization steps might be worthwhile when technical replicates are available because they do not rescale the data as log2 transformation does. When the amounts of standard proteins are evenly distributed over a narrow range, log2 transformation should be avoided because it would make their distribution uneven; in that case other normalization methods such as cyclic-LOWESS and LRN should be applied. On the one hand, stabilizing variance might not be imperative for statistical methods used in differential protein expression analysis, such as PLGEM46 and QPROT,44 because they take variance into consideration in their models. On the other hand, stabilizing variance is an important step when using linear regression analysis to perform absolute quantitation.

Absolute Quantitation. The albumin isoforms used in this study had been spiked into a yeast total cell lysate, so we could use the dNXAFcLL values and known albumin amounts to build a standard curve from which to derive the absolute amounts of yeast proteins in these samples. From the linear regression of the average dNAAFcLL and dNIAFcLL (after cyclic-LOWESS and LRN normalization) vs. albumin protein amount (Figure S1D), we calculated the absolute amounts (pmol) for the 1061 S. cerevisiae proteins detected in the 12 replicate datasets (Table S3A). The calculated absolute amounts for yeast proteins covered a range as large as 7 orders of magnitude (Table S3A). Adding the absolute amount for each protein and knowing that the total amount of yeast proteins was about 8 µg,23 we estimated the protein recovery rates for each label-free quantitation method (Table S3A).
The recovery from dNAAFcLL values (58%) was below the observed recovery from dNSAFcLL (112%) and dNIAFcLL (103%). While the correlations between dNIAFcLL ~ dNSAFcLL and between dNIAFcLL ~ dNAAFcLL for these 1061 yeast proteins were good, with adjusted R2 of 0.83 and 0.86, respectively (Figure 2A-B), the absolute protein amounts quantified by the three label-free methods had even better correlations, with adjusted R2 of 0.93 and 0.94 (Figure 2C-D). The slope of the linear regression between dNAAFcLL-based and dNIAFcLL-based amounts was 0.80, i.e. the absolute protein amounts quantified through dNAAFcLL were lower than the amounts based on dNIAFcLL. In addition, the regression between dNXAFcLL values deviated from linearity for the most abundant proteins (Figure 2A-B): their dNIAFcLL values were higher than their dNAAFcLL and dNSAFcLL values. These two issues could be systematically caused by the difficulty and intricacy of chromatographic peak area calculation, while dynamic exclusion45 affects dNSAF values for highly abundant proteins. Whether we chose dNSAFcLL, dNIAFcLL or dNAAFcLL to represent a protein’s relative abundance, these values might have systematic bias and be considerably different for highly abundant proteins (Figure 2A-B). However, by calculating absolute amounts based on label-free quantitation of protein standards, this systematic bias could be corrected to some extent: the absolute protein amount values based on dNXAFcLL were more equivalent than the simple dNXAF values even for highly abundant proteins (Figure 2C-D).
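The standard-curve and recovery calculations described above can be sketched as follows. This is a minimal Python illustration, assuming a straight-line fit of the abundance factor against the known standard amount and per-protein molecular weights for converting pmol to µg; all function names and the numbers in the usage note are hypothetical and are not taken from Table S3A.

```python
from statistics import mean

def fit_standard_curve(amounts_pmol, abundance_factors):
    """Least-squares line: abundance_factor = slope * amount + intercept."""
    mx, my = mean(amounts_pmol), mean(abundance_factors)
    sxx = sum((x - mx) ** 2 for x in amounts_pmol)
    sxy = sum((x - mx) * (y - my) for x, y in zip(amounts_pmol, abundance_factors))
    slope = sxy / sxx
    return slope, my - slope * mx

def absolute_amount_pmol(abundance_factor, slope, intercept):
    """Invert the standard curve to estimate an amount in pmol."""
    return (abundance_factor - intercept) / slope

def recovery_percent(amounts_pmol, mol_weights_da, loaded_ug):
    """Summed protein mass as a percentage of the mass loaded.

    pmol * (g/mol) gives picograms, and 1 µg = 1e6 pg.
    """
    total_ug = sum(a * mw for a, mw in zip(amounts_pmol, mol_weights_da)) / 1e6
    return 100.0 * total_ug / loaded_ug
```

As a hypothetical usage example, two proteins estimated at 10 and 20 pmol with molecular weights of 50 and 100 kDa sum to 2.5 µg, which would correspond to a 50% recovery if 5 µg had been loaded.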

Figure 2. Absolute Quantitation. Relative or absolute quantitative values were transformed by cubic-root extraction so that the plotted values were more evenly distributed along the axes, hence allowing for more robust linear regressions. (A-B) Linear regressions between dNIAFcLL ~ dNSAFcLL (A) and dNIAFcLL ~ dNAAFcLL (B) measured for 1061 yeast proteins (Table S3A). (C-D) Linear regressions between dNIAFcLL- and dNSAFcLL-based amounts (C) and dNIAFcLL- and dNAAFcLL-based amounts (D) calculated for 1061 yeast proteins. (E-F) Linear regressions between absolute amounts calculated based on dNXAFcLL (X=S/I/A) and molecules per cell derived from quantitative Western blotting of epitope-tagged proteins40 (E) and from SRM41 (F) for the 12 yeast proteins quantified in all three studies. (G-I) Linear regressions between absolute amounts calculated based on dNXAFcLL (X=S/I/A) and molecules per cell derived from quantitative Western blotting of TAP-tagged proteins40 for the 895 yeast proteins quantified in both studies. (J-L) Linear regressions between relative protein amounts based on 2D-PAGE analysis of E. coli proteins42 and the absolute protein amounts derived from dNSAFcLL and dNIAFcLL values (Table S3B) measured in the LTQ (J), Velos (K) and Orbitrap (L) datasets.39

We next assessed the agreement between the protein amounts quantified by dNSAFcLL, dNAAFcLL and dNIAFcLL and previously reported literature values. Out of the 1061 S. cerevisiae proteins we absolutely quantified, the abundance of 895 had also been measured by quantitative Western blotting of TAP-tagged proteins,40 while the abundance of 12 of these proteins had been reported by Selected Reaction Monitoring (SRM)41 as well as quantitative Western blotting40 (Table S3A). When comparing the 895 proteins whose abundance in yeast extracts had been previously determined by quantitative Western blots,40 the adjusted R2 were 0.62, 0.73 and 0.71 for dNSAFcLL-, dNIAFcLL- and dNAAFcLL-based protein amounts, respectively (Figure 2G-I). For the 12 yeast proteins quantified in all three studies, the adjusted R2 were 0.61, 0.61 and 0.65 for the linear regressions between Western-blot abundance values40 and dNSAFcLL-, dNIAFcLL- and dNAAFcLL-based amounts (Figure 2E). For the linear regressions against abundance values determined by SRM assays,41 the adjusted R2 reached 0.75, 0.79 and 0.84 for dNSAFcLL-, dNIAFcLL- and dNAAFcLL-based amounts, respectively (Figure 2F). The higher adjusted R2 values observed between dNXAFcLL-based protein amounts and SRM-based amounts (Figure 2F) were likely due to the fact that our absolute quantitation method and the SRM method41 were both LC-MS/MS based. The high adjusted R2 values indicate that the method we describe here may be useful and straightforward for estimating absolute quantitation values of proteins in complex mixtures.

We also derived dNSAFcLL- and dNIAFcLL-based protein amounts from the E. coli datasets where UPS2 proteins were used to generate a standard curve39 (Table S3B). Knowing that 31.3 µg of E. coli extract were present in each sample39 and summing up the absolute amounts we calculated for each protein, we obtained total protein recovery rates between 47% and 68% (Table S3B). Such recoveries were respectable yet lower than what we observed in the yeast analysis, most likely because the E. coli proteins were separated by SDS-PAGE and in-gel digested prior to LC-MS/MS analyses.39 The dNSAFcLL- and dNIAFcLL-based absolute protein amounts were compared against the relative amounts based on a 2D-PAGE analysis of E. coli proteins42 (Figure 2J-L). Although the data came from different biological samples analyzed in different laboratories with different instrument methods, the adjusted R2 values were between 0.85 and 0.90 (Figure 2J-L), demonstrating again an excellent agreement between our absolute quantitation method and literature values.
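The comparisons above rest on linear regressions of cube-root-transformed values (Figure 2) summarized by adjusted R2. A minimal Python sketch of that calculation follows, assuming non-negative abundance values and a single predictor; the function names are hypothetical and the standard adjustment formula 1 - (1 - R2)(n - 1)/(n - p - 1) is used.

```python
from statistics import mean

def cube_root(values):
    """Cube-root transform; valid for non-negative abundance values."""
    return [v ** (1.0 / 3.0) for v in values]

def adjusted_r2(x, y, n_predictors=1):
    """Adjusted R2 of the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r2 = (sxy * sxy) / (sxx * syy)  # equals squared Pearson correlation
    n = len(x)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def compare_methods(amounts_a, amounts_b):
    """Adjusted R2 between two sets of amounts after cube-root transformation."""
    return adjusted_r2(cube_root(amounts_a), cube_root(amounts_b))
```

With only 12 proteins the adjustment is noticeable, whereas with 895 proteins adjusted and unadjusted R2 are nearly identical.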

CONCLUSIONS
Several strategies are currently used for label-free quantitative proteomics analyses, including the use of MS1 intensity,3-5 spectrum counting,6-11 and MS2 fragment ion intensity.13-15, 22 We compared four quantitative proteomic approaches based on spectral counts (NSAF), peak areas (NAAF), parent ion intensity (NIAF), and fragment ion intensity (NFAF). We analyzed each of these label-free quantitative values based on the inclusion of all peptides (nNXAF), the exclusion of shared peptides (uNXAF), and the distribution of shared peptides (dNXAF). In general, the distributive dNXAF strategy had the best performance, but a detailed comparison of 12 technical replicates indicated that significant variance remained an issue. In both the analysis of RNA29-34 and in quantitative proteomics,36-38 methods like cyclic-LOWESS and linear regression have been used to reduce and stabilize variance and improve data analysis pipelines. When we applied both cyclic-LOWESS and LRN to dNXAF values, we obtained a significant reduction in variance and RSQ values that exceeded 0.99 for all methods and both of the independent datasets tested, which was a significant improvement of 8% for dNSAF, 15% for dNAAF, 6% for dNIAF, and 28% for dNFAF. A comparison of estimated protein amounts derived from dNSAFcLL, dNAAFcLL and dNIAFcLL values (stabilized by cyclic-LOWESS and LRN) to published S. cerevisiae and E. coli absolute protein abundances40, 41 revealed a strong positive correlation, especially considering that different growths in different laboratories at different times were being compared. These results demonstrate that there is a high degree of concordance between different label-free quantitative proteomic methods and that further use of variance stabilization could have widespread value in quantitative proteomics studies.

ASSOCIATED CONTENT
Supporting Information
Supporting Table 1: Label-free features measured for albumin isoforms digested with trypsin. Supporting Table 2: Statistical analyses. Supporting Table 3: Absolute quantitation of S. cerevisiae (A) and E. coli (B) proteins based on linear regression through standards of known amounts. Figure S1: Linear regression between yNXAF values and known albumin amounts. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION
Corresponding Author
* Tel.: (816) 926-4458. Fax: (816) 926-4685. Email: [email protected]

Author Contributions
The manuscript was written through contributions of all authors. ‡These authors contributed equally.

ACKNOWLEDGMENT
This work was supported by the Stowers Institute for Medical Research.

REFERENCES
1. Bantscheff, M.; Lemeer, S.; Savitski, M. M.; Kuster, B. Analytical and bioanalytical chemistry 2012, 404, 939-965.
2. Neilson, K. A.; Ali, N. A.; Muralidharan, S.; Mirzaei, M.; Mariani, M.; Assadourian, G.; Lee, A.; van Sluyter, S. C.; Haynes, P. A. Proteomics 2011, 11, 535-553.
3. Bondarenko, P. V.; Chelius, D.; Shaler, T. A. Analytical chemistry 2002, 74, 4741-4749.
4. Chelius, D.; Bondarenko, P. V. Journal of proteome research 2002, 1, 317-323.
5. Listgarten, J.; Emili, A. Molecular & cellular proteomics : MCP 2005, 4, 419-434.
6. Cooper, B.; Feng, J.; Garrett, W. M. Journal of the American Society for Mass Spectrometry 2010, 21, 1534-1546.
7. Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Molecular & cellular proteomics : MCP 2005, 4, 1265-1272.
8. Liu, H.; Sadygov, R. G.; Yates, J. R., 3rd. Analytical chemistry 2004, 76, 4193-4201.
9. Lu, P.; Vogel, C.; Wang, R.; Yao, X.; Marcotte, E. M. Nature biotechnology 2007, 25, 117-124.
10. Zybailov, B.; Coleman, M. K.; Florens, L.; Washburn, M. P. Analytical chemistry 2005, 77, 6218-6224.
11. Zybailov, B.; Mosley, A. L.; Sardiu, M. E.; Coleman, M. K.; Florens, L.; Washburn, M. P. Journal of proteome research 2006, 5, 2339-2347.
12. Colaert, N.; Vandekerckhove, J.; Martens, L.; Gevaert, K. Methods in molecular biology 2011, 753, 373-398.
13. Griffin, N. M.; Yu, J.; Long, F.; Oh, P.; Shore, S.; Li, Y.; Koziol, J. A.; Schnitzer, J. E. Nature biotechnology 2010, 28, 83-89.
14. Trudgian, D. C.; Ridlova, G.; Fischer, R.; Mackeen, M. M.; Ternette, N.; Acuto, O.; Kessler, B. M.; Thomas, B. Proteomics 2011, 11, 2790-2797.
15. Wu, Q.; Zhao, Q.; Liang, Z.; Qu, Y.; Zhang, L.; Zhang, Y. The Analyst 2012, 137, 3146-3153.
16. Nahnsen, S.; Bielow, C.; Reinert, K.; Kohlbacher, O. Molecular & cellular proteomics : MCP 2013, 12, 549-556.
17. Colaert, N.; Helsens, K.; Martens, L.; Vandekerckhove, J.; Gevaert, K. Nature methods 2009, 6, 786-787.
18. Dicker, L.; Lin, X.; Ivanov, A. R. Molecular & cellular proteomics : MCP 2010, 9, 2704-2718.
19. Chen, Y. Y.; Chambers, M. C.; Li, M.; Ham, A. J.; Turner, J. L.; Zhang, B.; Tabb, D. L. Journal of proteome research 2013, 12, 4111-4121.
20. Colaert, N.; Gevaert, K.; Martens, L. Journal of proteome research 2011, 10, 3183-3189.
21. Colaert, N.; Vandekerckhove, J.; Gevaert, K.; Martens, L. Proteomics 2011, 11, 1110-1113.
22. Gokce, E.; Shuford, C. M.; Franck, W. L.; Dean, R. A.; Muddiman, D. C. Journal of the American Society for Mass Spectrometry 2011, 22, 2199-2208.
23. Zhang, Y.; Wen, Z.; Washburn, M. P.; Florens, L. Analytical chemistry 2010, 82, 2272-2281.
24. Blein-Nicolas, M.; Xu, H.; de Vienne, D.; Giraud, C.; Huet, S.; Zivy, M. Proteomics 2012, 12, 2797-2801.
25. Fermin, D.; Basrur, V.; Yocum, A. K.; Nesvizhskii, A. I. Proteomics 2011, 11, 1340-1345.
26. McLachlan, G. J.; Do, K.-A.; Ambroise, C. In Probability and Statistics; David J. Balding, N. A. C. C., Nicholas I. Fisher, Iain M. Johnstone, J. B. K., Geert Molenberghs, Louise M. Ryan, David W. Scott, A. F. M. S., Jozef L. Teugels, Eds.; John Wiley & Sons, Inc: Hoboken, New Jersey, 2004, p 315.
27. Parmigiani, G. The Analysis of Gene Expression Data: Methods and Software; Springer, 2003.
28. Dziuda, D. M. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data; Wiley, 2010.
29. Durbin, B. P.; Hardin, J. S.; Hawkins, D. M.; Rocke, D. M. Bioinformatics 2002, 18, S105-110.
30. Bolstad, B. M.; Irizarry, R. A.; Astrand, M.; Speed, T. P. Bioinformatics 2003, 19, 185-193.
31. Ballman, K. V.; Grill, D. E.; Oberg, A. L.; Therneau, T. M. Bioinformatics 2004, 20, 2778-2786.
32. Berger, J. A.; Hautaniemi, S.; Jarvinen, A. K.; Edgren, H.; Mitra, S. K.; Astola, J. BMC bioinformatics 2004, 5, 194.
33. Hua, Y. J.; Tu, K.; Tang, Z. Y.; Li, Y. X.; Xiao, H. S. Genomics 2008, 92, 122-128.
34. Lin, S. M.; Du, P.; Huber, W.; Kibbe, W. A. Nucleic acids research 2008, 36, e11.
35. Cleveland, W. S.; Devlin, S. J. Journal of the American Statistical Association 1988, 83, 596-610.
36. Callister, S. J.; Barry, R. C.; Adkins, J. N.; Johnson, E. T.; Qian, W. J.; Webb-Robertson, B. J.; Smith, R. D.; Lipton, M. S. Journal of proteome research 2006, 5, 277-286.
37. Kultima, K.; Nilsson, A.; Scholz, B.; Rossbach, U. L.; Falth, M.; Andren, P. E. Molecular & cellular proteomics : MCP 2009, 8, 2285-2295.
38. Ting, L.; Cowley, M. J.; Hoon, S. L.; Guilhaus, M.; Raftery, M. J.; Cavicchioli, R. Molecular & cellular proteomics : MCP 2009, 8, 2227-2242.
39. Krey, J. F.; Wilmarth, P. A.; Shin, J. B.; Klimek, J.; Sherman, N. E.; Jeffery, E. D.; Choi, D.; David, L. L.; Barr-Gillespie, P. G. Journal of proteome research 2014, 13, 1034-1044.
40. Ghaemmaghami, S.; Huh, W. K.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O'Shea, E. K.; Weissman, J. S. Nature 2003, 425, 737-741.
41. Picotti, P.; Bodenmiller, B.; Mueller, L. N.; Domon, B.; Aebersold, R. Cell 2009, 138, 795-806.
42. Lopez-Campistrous, A.; Semchuk, P.; Burke, L.; Palmer-Stone, T.; Brokx, S. J.; Broderick, G.; Bottorff, D.; Bolch, S.; Weiner, J. H.; Ellison, M. J. Molecular & cellular proteomics : MCP 2005, 4, 1205-1209.
43. Zhang, Y.; Wen, Z.; Washburn, M. P.; Florens, L. Analytical chemistry 2011, 83, 9344-9351.
44. Choi, H.; Fermin, D.; Nesvizhskii, A. I. Molecular & cellular proteomics : MCP 2008, 7, 2373-2385.
45. Zhang, Y.; Wen, Z.; Washburn, M. P.; Florens, L. Analytical chemistry 2009, 81, 6317-6326.
46. Pavelka, N.; Fournier, M. L.; Swanson, S. K.; Pelizzola, M.; Ricciardi-Castagnoli, P.; Florens, L.; Washburn, M. P. Molecular & cellular proteomics : MCP 2008, 7, 631-644.
