Influence of the Digestion Technique, Protease, and Missed Cleavage


Article

Influence of the digestion technique, protease and missed cleavage peptides in protein quantitation Cristina Chiva, Mireia Ortega, and Eduard Sabidó J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/pr500294d • Publication Date (Web): 02 Jul 2014 Downloaded from http://pubs.acs.org on July 5, 2014


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

INFLUENCE OF THE DIGESTION TECHNIQUE, PROTEASE AND MISSED CLEAVAGE PEPTIDES IN PROTEIN QUANTITATION

Cristina Chiva¹,², Mireia Ortega¹,², Eduard Sabidó¹,²,*

¹ Proteomics Unit, Centre de Regulació Genòmica (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain

² Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003 Barcelona, Spain

* Corresponding author
Eduard Sabidó, PhD
Head of the UPF/CRG Proteomics Unit
Centre de Regulació Genòmica (CRG), Carrer Dr. Aiguader 88, 08003 Barcelona, Spain
Tel. +34 933 160 834
Fax +34 933 160 099
email: [email protected]

Keywords

Protein quantitation, digestion protocols, in-solution, filter-aided, trypsin, chymotrypsin, endopeptidase Lys-C, FASP, mass spectrometry

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Funding Sources

This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union Seventh Framework Programme. The CRG/UPF Proteomics Unit is part of the “Plataforma de Recursos Biomoleculares y Bioinformáticos (ProteoRed)” supported by grant PT13/0001 of the Instituto de Salud Carlos III (ISCIII).

ACS Paragon Plus Environment


ABSTRACT

Quantitative determination of absolute and relative protein amounts is an essential requirement for most current bottom-up proteomics applications, but protein quantitation estimates are affected by several sources of variability such as sample preparation, mass spectrometric acquisition, and data analysis. Among them, sample digestion has attracted much attention from the proteomics community, as protein quantitation by bottom-up proteomics relies on the efficiency and reproducibility of protein enzymatic digestion, with the presence of missed cleavages, non-specific cleavages or even the use of different proteases having been postulated as important sources of variation in protein quantitation. Here, we evaluated both in-solution and filter-aided digestion protocols and assessed their influence in the estimation of protein abundances using five E. coli mixtures with known amounts of spiked proteins. We observed that replicates of trypsin specificity digestion protocols are highly reproducible in terms of peptide quantitation, with digestion technique and the chosen proteolytic enzyme being the major sources of variability in peptide quantitation. Finally, we also evaluated the result of including peptides with missed cleavages in protein quantitation, and observed no significant differences in precision, accuracy, specificity, and sensitivity compared with the use of fully tryptic peptides.


INTRODUCTION

Quantitative determination of absolute and relative protein amounts is an essential requirement for most current proteomics applications such as biomarker validation [1], differential proteomics studies [2,3], and the modeling of certain biological processes [4]. Bottom-up proteomics is still the method of choice for most researchers to perform proteomics quantitative studies. In this approach, protein mixtures are enzymatically digested and the generated peptides are separated by liquid chromatography and subsequently analyzed by tandem mass spectrometry. Peptide sequences are then deduced from experimental spectra and protein quantities are inferred from the peptide extracted ion area, peak heights, or spectral counts, using different heuristics [5-7].

In this general workflow, there are several sources of variability that can introduce biases in the final protein quantitation estimates, sources which are derived from sample preparation, mass spectrometric acquisition, and data analysis. Among them, sample digestion has attracted much attention from the proteomics community, and over the past few years efforts have been invested in optimizing digestion protocols by testing several proteolytic enzymes and digestion conditions [8-11]. Indeed, protein quantitation by bottom-up proteomics relies on the efficiency and reproducibility of protein enzymatic digestion, with the presence of missed cleavages, non-specific cleavages or even the use of different proteases having been postulated as important sources of variation in absolute and relative protein quantitation. In this regard, a recent study reported large differences in the estimated number of protein copies per cell in yeast samples when using different proteolytic enzymes, thus providing evidence for the existence of a protease bias in absolute protein quantitation studies [12]. Similarly, digestion kinetics was also revisited as part of the efforts to standardize experimental digestion techniques [13]. There, the authors evaluated the efficiency and completeness of twelve different digestion methods using human serum albumin as standard protein and they found a substantial variation among methods. Finally, the recently performed evaluation of different in-solution protein digestion protocols revealed a reduction in the number of missed cleavages with the combined use of different proteolytic enzymes [14].

The use of peptides with missed cleavages has historically been ill-advised in quantitative proteomics [15]. Protein quantitation using a bottom-up proteomics approach often uses the peptide areas as proxies for estimating protein quantities. However, this approach has some limitations, as peptides are often not generated on a one-to-one basis due to incompleteness of protein digestion, and the fact that some cleavage points are often skipped during the proteolytic reaction. Despite the existence of some empirical rules for preferred missed cleavage sites for different proteases, there is uncertainty concerning which sites will be cleaved and to what extent. Moreover, chemical and biological modifications of a particular peptide sequence can affect the digestion efficiency and eventually, also the protein quantitation. Even if a proteotypic peptide without an apparent missed cleavage site was chosen for protein quantitation, there is no guarantee that there is not a peptide counterpart with one or more missed cleavages that contains the selected sequence.


In order to address some of the questions raised by the possible effects of the proteolytic enzymes used and of the digestion methodology on protein quantitation, it is also important to consider the contribution of incorporating, or not, missed cleavage peptides in the analyses. Indeed, efficient digestion might be important to estimate absolute protein abundances since it nowadays requires peptide recovery close to one hundred per cent. However, reproducibility of the digestion itself can be even more important than completeness to ensure reliable relative protein quantitation, since the latter is based on the consistency of the generated peptide areas. Here, we have evaluated both in-solution and filter-aided digestion protocols and assessed their reproducibility and its influence on the estimation of protein abundances. More specifically, five E. coli mixtures with known amounts of spiked-in proteins were prepared and digested following different protein digestion protocols. Accuracy and precision of protein quantitation were evaluated, as well as the impact of including peptides with missed cleavages in protein quantitation in our dataset.


EXPERIMENTAL PROCEDURE

Sample preparation

Thirty commercial proteins were prepared at 1.5 pmol/µL in three different subsets of 10 proteins each (Supplementary Table ST1). Proteins from these subsets were spiked into a 15-µg Escherichia coli background (BioRad, cat. no. 163-2110) in different proportions as shown in Table 1 to prepare five different mixtures in triplicate. The final amount of each protein in the different mixtures was either 100, 200, or 400 fmol per µg of E. coli protein extract for the subsets marked as 1, 2 and 4, respectively, in Table 1.

HeLa cells were grown at 37 ºC and 5 % CO2 in DMEM medium (Invitrogen Inc.) supplemented with 10 % FCS, and 1 % penicillin/streptomycin (Invitrogen). Cells were cultured in 100-cm² plates and passaged at 80 % confluence. Cells were then washed with ice-cold PBS, scraped from the plates and homogenized with either 6 M urea in 0.2 M NH4HCO3 for in-solution digestions, or 4 % SDS in 0.1 M HEPES for filter-aided digestion protocols. Final protein concentration was quantified using the BCA assay (Pierce).

Two different digestion techniques (filter-aided and in-solution digestion) were used with different proteases or a sequential combination of them: an in-solution digestion protocol [16] with trypsin (Promega, V-5111), Lys-C (Wako, 129-02541), chymotrypsin (Roche, 1 418 467 00) and Lys-C/trypsin; and a filter-aided sample preparation method [17] with trypsin and Lys-C/trypsin using filters with a pore size of 30 kDa (Microcon, YM-30 membrane 30 kDa, Millipore).

Briefly, for the in-solution digestion protocols 15 µg of each sample was dissolved in 30 µL of 6 M urea in 0.2 M NH4HCO3, reduced with dithiothreitol (DTT, 10 mM, 37 ºC, 60 min), and alkylated with iodoacetamide (IAM, 20 mM, 25 ºC, 30 min). In the tryptic and chymotryptic digestion protocols, samples were diluted 10-fold before being digested overnight, with trypsin at 37 ºC or with chymotrypsin at 25 ºC. In the sequential Lys-C/trypsin digestion protocol, samples were diluted to 2 M urea, digested overnight with Lys-C at 37 ºC, and then diluted 2-fold again and digested overnight with trypsin at 37 ºC. For the Lys-C protocol, samples were diluted to 4 M urea, the pH was adjusted to 8.8 with 25 mM Tris-HCl, and samples were digested with Lys-C at 25 ºC overnight.

Finally, in the filter-aided digestion protocols, 15 µg of each sample was dissolved in 30 µL of 4 % SDS in 0.1 M HEPES, and reduced with DTT (final concentration 0.1 M, 60 ºC, 30 min). Samples were loaded into a filtration device where they were washed three times with urea buffer (6 M urea in 0.1 M HEPES) to remove the SDS. Alkylation with IAM was performed in the filtration device (50 µM IAM, 25 ºC, 20 min) followed by three washing steps with 100 µL each (2 M urea in 0.1 M HEPES). The buffer was finally exchanged to 50 mM NH4HCO3 to perform the enzymatic digestion. Digestion with trypsin or Lys-C/trypsin was done in the filter with the same enzyme-substrate ratio, time and temperature as in the in-solution protocols explained above. Peptides were collected by centrifugation. In all cases, peptides were desalted using a C18 membrane packed on a pipette tip (3M, 66883-U), evaporated to dryness and dissolved in 30 µL of 0.1 % formic acid in water.
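The spiking and dilution steps described above reduce to simple arithmetic. The following Python sketch works through it using only the values given in the text; the function and variable names are illustrative and not part of the original work.

```python
# Illustrative arithmetic only: names are hypothetical, values come from the text.
BACKGROUND_UG = 15.0       # µg of E. coli extract per sample
STOCK_PMOL_PER_UL = 1.5    # concentration of each spiked protein in its subset
LEVELS_FMOL_PER_UG = {1: 100.0, 2: 200.0, 4: 400.0}  # Table 1 label -> fmol per µg


def spike_amount_fmol(level_fmol_per_ug, background_ug=BACKGROUND_UG):
    """Total amount (fmol) of one spiked protein in a sample."""
    return level_fmol_per_ug * background_ug


def stock_volume_ul(amount_fmol, stock_pmol_per_ul=STOCK_PMOL_PER_UL):
    """Volume of stock (µL) that contains the requested amount (fmol -> pmol)."""
    return (amount_fmol / 1000.0) / stock_pmol_per_ul


def diluted_molarity(start_molar, fold):
    """Urea concentration after an n-fold dilution."""
    return start_molar / fold


for label, level in LEVELS_FMOL_PER_UG.items():
    amount = spike_amount_fmol(level)
    print(f"level {label}: {amount:.0f} fmol per protein, "
          f"{stock_volume_ul(amount):.1f} µL of stock")

# 6 M urea diluted 10-fold (trypsin and chymotrypsin in-solution protocols)
print(diluted_molarity(6.0, 10), "M urea")
# 2 M urea diluted 2-fold before the trypsin step (Lys-C/trypsin protocol)
print(diluted_molarity(2.0, 2), "M urea")
```

The three spiking levels differ by factors of two, so any pair of mixtures compares a given subset at a nominal ratio of 1:1, 1:2, or 1:4 (or the inverse).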


Mass spectrometric analyses

A 2.5-µL aliquot of each peptide mixture was analyzed using an LTQ-Orbitrap Velos Pro mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA) coupled to a nano-LC (Proxeon, Odense, Denmark) equipped with a 12-cm reversed-phase chromatography column with an inner diameter of 75 μm, packed with 5 μm C18 particles (Nikkyo Technos Co. Ltd., Japan). Chromatographic gradients started at 97 % buffer A and 3 % buffer B with a flow rate of 300 nL/min, and gradually increased to 93 % buffer A and 7 % buffer B in 1 min, and to 65 % buffer A and 35 % buffer B in 60 min. After each analysis, the column was washed for 10 min with 10 % buffer A and 90 % buffer B. Buffer A: 0.1 % formic acid in water. Buffer B: 0.1 % formic acid in acetonitrile.

The mass spectrometer was operated in positive ionization mode with nanospray voltage set at 2.2 kV and source temperature at 250 ºC. Ultramark 1621 for the FT mass analyzer was used for external calibration prior to the analyses. The background polysiloxane ion signal at m/z 445.1200 was used as lock mass. The instrument was operated in data-dependent acquisition (DDA) mode, and full MS scans with 1 microscan at a resolution of 60,000 were used over a mass range of m/z 250-2,000 with detection in the Orbitrap. Automatic gain control (AGC) was set to 1e6, dynamic exclusion was set at 60 s, and the charge state filter disqualifying singly charged peptides for fragmentation was activated. Following each survey scan, the twenty most intense multiply charged ions above a threshold ion count of 5000 were selected for fragmentation at a normalized collision energy of 35 %. Fragment ion spectra produced via collision-induced dissociation (CID) were acquired in the linear ion trap, with AGC set to 5e4, an isolation window of 2.0 m/z, an activation time of 0.1 ms and a maximum injection time of 100 ms. All data were acquired with Xcalibur software v2.2.

Data analysis

Acquired data were analyzed using the Proteome Discoverer software suite (v1.3.0.339, Thermo Fisher Scientific) and the Mascot search engine (v2.3, Matrix Science [18]) was used for peptide identification. Data corresponding to the protein mixes in the E. coli background were searched against an in-house generated database containing all the spiked-in proteins (Table 1), and the E. coli Swissprot protein database (version of July 2012) plus the most common contaminants (total of 23,541 sequences). Similarly, the human Swissprot protein database (version of July 2012) plus the most common contaminants (total of 37,368 sequences) was used to analyze the acquired data derived from the HeLa cell protein extracts. In both cases, a precursor ion mass tolerance of 7 ppm at the MS1 level was used, and up to three missed cleavages for trypsin were allowed. The fragment ion mass tolerance was set to 0.5 Da. Oxidation of methionine and protein acetylation at the N-terminus were defined as variable modifications. Carbamidomethylation on cysteines was set as a fixed modification. The identified peptides were filtered using an FDR < 0.01 calculated using a decoy database strategy.

Protein relative quantitation and the corresponding mean squared error were obtained by extracting the peptide areas with the Proteome Discoverer software suite (v1.3.0.339, Thermo Fisher Scientific) followed by median normalization (Supplementary Figure S2). Only unique peptides per protein were used for the differential protein quantitation analysis with the linear mixed-effects model implemented in the R package MSstats v2.0 [19]. When a peptide was missing completely in a condition, the nointeraction (default) option was used, which indicates that the quantified interferences are random artifacts that should be considered as noise (Supplementary Tables ST2 and ST3). Sensitivity and specificity were calculated using the results from MSstats, and their variance was estimated by bootstrap resampling (n=1,000 iterations) using the boot R package.
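Two of the steps named above, median normalization and bootstrap estimation of the variability of sensitivity and specificity, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the authors' pipeline, which used Proteome Discoverer, MSstats, and the boot R package: normalization is assumed to act on log2 peptide areas, resampling is assumed to be over proteins, and all names and toy data are hypothetical.

```python
import math
import random
import statistics


def median_normalize(log_areas_by_run):
    """Shift each run's log2 areas so that every run has the same median."""
    medians = {run: statistics.median(v) for run, v in log_areas_by_run.items()}
    target = statistics.median(medians.values())
    return {run: [x - medians[run] + target for x in values]
            for run, values in log_areas_by_run.items()}


def sensitivity_specificity(calls):
    """calls: list of (truly_changed, called_significant) pairs of booleans."""
    tp = sum(1 for truth, call in calls if truth and call)
    fn = sum(1 for truth, call in calls if truth and not call)
    tn = sum(1 for truth, call in calls if not truth and not call)
    fp = sum(1 for truth, call in calls if not truth and call)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def bootstrap_sd(calls, n_iter=1000, seed=0):
    """Standard deviation of sensitivity and specificity over bootstrap resamples."""
    rng = random.Random(seed)
    sens_values, spec_values = [], []
    for _ in range(n_iter):
        resample = [rng.choice(calls) for _ in calls]  # sample with replacement
        sens, spec = sensitivity_specificity(resample)
        if not math.isnan(sens):
            sens_values.append(sens)
        if not math.isnan(spec):
            spec_values.append(spec)
    return statistics.stdev(sens_values), statistics.stdev(spec_values)


# Hypothetical outcome: 20 truly changed and 30 unchanged proteins.
calls = ([(True, True)] * 18 + [(True, False)] * 2
         + [(False, False)] * 27 + [(False, True)] * 3)
print(sensitivity_specificity(calls))
print(bootstrap_sd(calls))
print(median_normalize({"run_a": [10.0, 11.0, 12.0], "run_b": [11.0, 12.0, 13.0]}))
```

Fixing the random seed keeps the bootstrap reproducible; a different seed gives slightly different standard deviations.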


RESULTS AND DISCUSSION

To evaluate the influence of digestion protocols, proteolytic enzymes and peptides with missed cleavages on protein quantitation, we designed a set of five controlled samples, each consisting of a mixture of three subsets of commercial proteins spiked at different quantities in an E. coli background (Table 1 and Supplementary Table ST4). In contrast to the use of a single protein standard or a complex biological sample, the experimental design used here, with controlled protein amounts in a complex background, allowed us to reliably determine parameters such as accuracy and precision of protein quantitation while maintaining the sample complexity. Five E. coli mixtures with known amounts of spiked-in proteins were prepared, and three different proteases were tested in the in-solution digestion protocol: trypsin (TINSOL), chymotrypsin (CINSOL), endopeptidase Lys-C (LINSOL), and a combination of Lys-C and trypsin (LTINSOL); and two proteases were used in the filter-aided protocols: trypsin (TFASP), and a combination of Lys-C and trypsin (LTFASP). The five protein mixtures were prepared in triplicate for each of the six digestion protocols described (Figure 1A).

Influence of digestion protocols in peptide and protein identification

Initially, we evaluated the different digestion protocols in terms of the number and type of identified peptides, as well as the number of potentially quantifiable protein groups (Supplementary Table ST4). Our results showed that the digestion protocols leading to the highest number of identified peptides and proteins were those in which tryptic protease specificity was used (Figure 1B). Among them, the digestion protocol combining Lys-C and trypsin in a filter-aided approach (LTFASP) gave approximately 10 % more identified protein groups than standard tryptic digestions, these numbers being very reproducible among replicates. In contrast, TINSOL rendered the highest number of identified peptides.

The increase in the number of identified peptides was due to the presence of a higher number of peptides with missed cleavages that generated redundant information at the peptide level and, therefore, it had little impact on the final number of identified proteins. Indeed, the filter-aided digestions as well as the digestions that included Lys-C resulted in a high percentage of peptides without missed cleavages (Figure 1C), while the protocol of in-solution tryptic digestion showed a significant increase in peptides with missed cleavages with a slight bias towards missed cleavages at lysines as compared to missed cleavages at arginine (67 % vs 33 %). This is remarkable as the relative occurrence of Arg with respect to Lys in the E. coli protein database is favorable to Arg, and therefore, the observed tendency does not reflect the natural relative occurrence, and points towards a certain bias in peptide missed cleavage. This tendency was observed not only in the TINSOL protocol, but also in LTINSOL (Arg: 61 % vs. Lys: 39 %) and in TFASP (Arg: 70 % vs. Lys: 30 %), whereas it was inverted in the LTFASP protocol (Arg: 89 % vs. Lys: 11 %) (Table 2).

It is known that the efficiency of a protease is influenced by different factors such as the residues surrounding the cleavage point or the local conformation. A peptide sequence motif analysis of the peptides with missed cleavages revealed that most of these sites contained either an acidic residue (Asp/Glu) near the missed cleavage site or adjacent Arg/Lys cutting points (e.g. KK, KR, RK or RR) as previously described [20]. In the case of CINSOL digestions, an influence of the presence of acidic residues surrounding the cleavage point was also observed.

Despite the described differences among protocols, the observed percentage of peptides with missed cleavages was, in all cases, very reproducible among replicates of the same protocol, and the degree of overlap between replicates when considering all peptides was comparable to that obtained when considering only peptides with missed cleavages. This suggests that although some digestion protocols might exhibit low digestion efficiencies, the percentages of peptides with missed cleavages are reproducible when experiments are conducted in parallel and under the same conditions.

Proteolytic specificities other than trypsin gave a significantly lower number of identified proteins and peptides (Figure 1A). More specifically, in the case of CINSOL, we observed only around 60 % of the protein groups observed with tryptic specificity. Moreover, assuming cleavage after Tyr, Phe, and Trp as the normal cleavage for chymotrypsin, this protocol also resulted in a high percentage of peptides with one, two, or three missed cleavages, thus evidencing low digestion efficiency in the conditions tested (Figure 1C).
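Counting missed cleavages and tallying them by residue, as in the Lys versus Arg comparison above, can be sketched as follows. This is a minimal Python illustration and not the software used in the study: the peptide sequences are hypothetical, and the optional rule that a site followed by proline is not counted is an assumed convention that the text does not state.

```python
from collections import Counter


def missed_cleavage_sites(peptide, residues="KR", proline_rule=True):
    """Return the residues at internal cleavage sites that were left uncut.

    The C-terminal residue is skipped because it is the site that was cleaved
    to release the peptide. With proline_rule=True, a site followed by proline
    is not counted (an assumed convention, not taken from the text).
    """
    sites = []
    for i in range(len(peptide) - 1):
        if peptide[i] in residues:
            if proline_rule and peptide[i + 1] == "P":
                continue
            sites.append(peptide[i])
    return sites


def lys_arg_split(peptides):
    """Percentage of missed tryptic sites located at Lys and at Arg."""
    counts = Counter(site for p in peptides for site in missed_cleavage_sites(p))
    total = counts["K"] + counts["R"]
    if total == 0:
        return {"K": 0.0, "R": 0.0}
    return {aa: 100.0 * counts[aa] / total for aa in "KR"}


# Hypothetical peptide sequences, for illustration only.
example = ["LSSPATLNSR", "AEFVEVTKLVTDLTK", "KKAEELR", "VKPLER", "DLRGDFAAK"]
for pep in example:
    print(pep, len(missed_cleavage_sites(pep)))
print(lys_arg_split(example))

# Chymotrypsin, taking cleavage after Phe, Trp, and Tyr as the expected sites.
print(len(missed_cleavage_sites("AFGYLK", residues="FWY", proline_rule=False)))
```

Passing a different `residues` string covers the other specificities in the study, for example `"K"` for Lys-C.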

Influence of digestion protocols in peptide and protein quantitation

The reproducibility of digestion protocols was further assessed by correlating the extracted peptide areas among the different digestion protocols that shared tryptic specificity, including both peptides from the E. coli background and peptides from the spiked proteins in Mixture 1 (Figure 2A, Supplementary Table ST5). Our results showed a higher correlation within the same digestion technique (i.e., in-solution or FASP; R² = 0.79; Pearson correlation) than between protocols with the same proteolytic activity (i.e., trypsin or Lys-C/trypsin; R² = 0.68). Moreover, results of technical replicates within the same digestion protocol were also highly consistent (R² = 0.95), and the peptide areas generated were reproducible regardless of the number of missed cleavages present in the generated peptides (R² = 0.94 when only peptides with missed cleavages were considered). These observations confirmed that, given a specific digestion protocol, replicates performed in parallel and under the same conditions are reproducible in terms of the type of peptides generated and, therefore, in peptide quantitation. Moreover, our results also evidenced that there is not only a protease bias as described previously [12], but also an important technique bias (i.e., in-solution or FASP) that under the conditions tested had an even higher impact on peptide quantitation.

The influence of protein digestion protocols in protein quantitation was also assessed at the protein level by comparing the protein abundance ranks in the different digestion protocols calculated by the T3PQ approach [5]. Following this approach, the average area of the three most intense peptides per protein was calculated using the Proteome Discoverer software suite (v1.3.0.339, Thermo Fisher Scientific) and used as an estimate for protein abundance. Therefore, a rank comparison could be done for all techniques used, even among protocols with different protease specificity, because the T3PQ approach does not require that peptides in the comparison are the same. Indeed, the Spearman correlation coefficient was calculated assuming that even though the value of the T3PQ area could differ among techniques due to the probable use of different peptides, the order of proteins in the quantitation should be similar, i.e., the most abundant proteins should be the same in all replicates regardless of the digestion protocol used. For proteins with fewer than three peptides, the available peptides were used to estimate the protein amount. The comparison was done considering proteins from the E. coli background and peptides from the spiked proteins in the three replicates of Mixture 1 (Figure 2B).

Our analysis showed an excellent rank correlation among replicates of the same digestion protocol (R² = 0.97; Spearman correlation) and grouped filter-aided digestion protocols together. In-solution digestion protocols also grouped together in this analysis, except for CINSOL, which formed an independent family. It is remarkable that LINSOL groups together with the other in-solution digestion protocols despite its different protease specificity; that is, the TINSOL protocol renders results more similar to the LINSOL protocol than to the TFASP protocol. Protein-level evaluation also evidenced the presence of the technique bias as a major source of variability during protein digestion.
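The rank comparison described above can be sketched in plain Python: a top-three average per protein, followed by a Spearman correlation computed as the Pearson correlation of ranks. This is a simplified stand-in for the Proteome Discoverer based calculation; the protein names and peptide areas are hypothetical.

```python
import math


def top3_abundance(peptide_areas):
    """Mean area of the three most intense peptides (or of all, if fewer)."""
    top = sorted(peptide_areas, reverse=True)[:3]
    return sum(top) / len(top)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical peptide areas for the same proteins under two protocols.
protocol_a = {"P1": [9e6, 5e6, 2e6, 1e6], "P2": [4e5, 3e5],
              "P3": [8e7, 6e7, 1e7], "P4": [2e4]}
protocol_b = {"P1": [3e6, 2e6, 2e6], "P2": [9e5, 1e5, 5e4],
              "P3": [5e7, 1e7], "P4": [7e4, 1e4]}

proteins = sorted(protocol_a)
a = [top3_abundance(protocol_a[p]) for p in proteins]
b = [top3_abundance(protocol_b[p]) for p in proteins]
print(spearman(a, b))
```

Because only ranks enter the comparison, the two protocols may rely on entirely different peptides for the same protein, which is what makes the comparison possible across protease specificities.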

Influence of peptides with missed cleavages in peptide and protein quantitation

In order to assess the contribution of peptides with missed cleavages in relative protein quantitation, we performed a relative quantitation of the spiked control proteins and evaluated the effect of peptides with missed cleavages in terms of precision, accuracy, specificity and sensitivity. Similarly to the total protein digest, spiked control proteins presented a considerable number of peptides with missed cleavages, which represented around 46 % of the total identified peptides in the TINSOL digestion protocol, 30 % in LTINSOL, 33 % in TFASP, 21 % in LTFASP, 26 % in LINSOL, and up to 79 % in the CINSOL digestion protocol.

Protein relative quantitation was achieved by extracting the peptide areas followed by a differential protein quantitation analysis using the linear mixed-effects model implemented in the R package MSstats v2.0 [19]. Protein relative quantitation was performed using either all unique peptides regardless of the number of missed cleavages, or using only unique peptides with no missed cleavages. In both cases, the precision and accuracy of relative protein quantitation were assessed using the mean squared error (MSE), which averages the squared differences between the estimated protein fold-changes and their true values, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $x_i$ is the i-th estimated fold-change, $\bar{x}$ is its true value, and $n$ is the number of estimates (Figure 3).
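The mean squared error captures accuracy and precision in a single number because it equals the squared bias plus the variance of the estimates. The Python sketch below illustrates this identity with hypothetical log2 fold-change estimates; the use of a log2 scale and all numerical values are assumptions made for the example.

```python
def mse(estimates, true_value):
    """Mean squared difference between the estimates and the true value."""
    return sum((x - true_value) ** 2 for x in estimates) / len(estimates)


def bias_and_variance(estimates, true_value):
    """Bias (accuracy) and population variance (precision) of the estimates."""
    mean = sum(estimates) / len(estimates)
    bias = mean - true_value
    variance = sum((x - mean) ** 2 for x in estimates) / len(estimates)
    return bias, variance


# Hypothetical log2 fold-change estimates for proteins spiked at a 2-fold ratio.
true_log2_fc = 1.0
estimates = [0.7, 0.9, 0.8, 1.0, 0.6]

error = mse(estimates, true_log2_fc)
bias, variance = bias_and_variance(estimates, true_log2_fc)
print(f"MSE = {error:.3f}")
print(f"bias^2 + variance = {bias ** 2 + variance:.3f}")
```

In this toy case the estimates are both shifted (bias of about -0.2) and scattered (variance of about 0.02), and each part contributes to the total error.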

We evaluated the MSE of the different protein fold-changes using all the peptides or using only peptides with no missed cleavages. MSE values were very similar whether the quantitation was performed with all the peptides or only with peptides without any missed cleavage. In fact, precision and accuracy were even slightly better when using all peptides in cases such as LINSOL and LTFASP (Figure 3), and the observed results were consistent regardless of the fold-change analyzed. The analysis of CINSOL peptides without missed cleavages showed erratic behavior, probably due to the reduced number of peptides present in this category. Similarly, the obtained values for sensitivity and specificity were not significantly different between the analyses with only peptides without missed cleavages and the analyses in which peptides with missed cleavages were also included (Supplementary Figure S1).

Enzymatic specificities other than trypsin underperformed in comparison with the other digestion protocols, thus evidencing that for all the protein ratios considered in this controlled study, tryptic specificity renders, in general, better precision and accuracy for relative protein quantitation. No clear differences were observed in terms of sensitivity and specificity among the different digestion protocols (Supplementary Figure S1).

To discard any sample-dependent effect on our observations, we evaluated the effect of peptides with missed cleavages in a human sample (obtained from HeLa cells, Supplementary Table ST6). For each of the four digestion protocols with tryptic specificity (TINSOL, TFASP, LTINSOL, LTFASP), the reproducibility of the peptide areas among replicates of the same technique was assessed for each set of peptides, with and without missed cleavages. Our results confirmed that the reproducibility of peptide areas does not differ significantly between the two sets of peptides (Figure 4A), meaning that peptides with missed cleavages do not introduce any variability in protein quantitation beyond that due to peptides with no missed cleavages. This observation could be easily explained by the previously discussed reproducibility in the generation of peptides with missed cleavages during protein digestion.

Therefore, the incorporation of peptides with missed cleavages in relative protein quantitation not only has no remarkable effect on variability, but also normally increases protein coverage and might even improve protein relative abundance estimates due to the larger number of experimental points used. Finally, we evaluated the impact of including peptides with missed cleavages in the estimates of protein abundances. For this purpose, we compared the rank correlation coefficient between T3PQ areas of each protein including and excluding the peptides with missed cleavages. This analysis provided evidence of the very good correlation exhibited by replicates of the same technique (Figure 4B, Supplementary Table ST5). The technique used (filter-aided vs. in-solution) was the most important source of variation among samples, followed by the enzyme used (Lys-C vs. Lys-C plus Trypsin), with variation due to the inclusion of peptides with missed cleavages being almost indistinguishable from that attributable to technical replicates. The low impact of the inclusion of peptides with missed cleavages in estimating protein absolute abundances detected in our study had already been suggested in two recent publications by Schmidt and colleagues in which several protein quantity indexes were evaluated using peptides with missed cleavages and with peptide modifications. These authors found no obvious reasons to exclude peptides with missed cleavages to estimate protein absolute abundances [14,21].

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
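The T3PQ comparison described above can be illustrated with a minimal Python sketch. It follows the description given in the Figure 4 caption (median intensity of the three most intense peptides per protein, using fewer when three are not available) and computes a Spearman rank correlation between the estimates obtained with and without peptides bearing missed cleavages. The function names, the tuple layout, and the toy intensities are assumptions made for illustration; they are not taken from the authors' analysis pipeline.

```python
from statistics import median

def t3pq(peptides, include_missed_cleavages=True, top_n=3):
    """Median intensity of the top_n most intense peptides per protein.

    `peptides` is a list of (protein, intensity, n_missed_cleavages) tuples
    (hypothetical layout). If fewer than top_n peptides remain for a protein,
    all of them are used.
    """
    by_protein = {}
    for protein, intensity, n_mc in peptides:
        if n_mc > 0 and not include_missed_cleavages:
            continue  # noMC mode: drop peptides with missed cleavages
        by_protein.setdefault(protein, []).append(intensity)
    return {p: median(sorted(v, reverse=True)[:top_n])
            for p, v in by_protein.items()}

def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes both inputs contain at least two distinct values.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy data (invented): (protein, peptide area, number of missed cleavages)
peptides = [
    ("P1", 100.0, 0), ("P1", 80.0, 1), ("P1", 60.0, 0), ("P1", 10.0, 0),
    ("P2", 50.0, 0), ("P2", 40.0, 1),
    ("P3", 5.0, 0), ("P3", 8.0, 1),
]
all_q = t3pq(peptides, include_missed_cleavages=True)
nomc_q = t3pq(peptides, include_missed_cleavages=False)
shared = sorted(set(all_q) & set(nomc_q))  # proteins quantified in both modes
rho = spearman([all_q[p] for p in shared], [nomc_q[p] for p in shared])
print(all_q, nomc_q, round(rho, 3))
```

Only proteins quantified in both modes enter the correlation, which mirrors the removal of NA values mentioned in the figure captions.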

CONCLUSIONS
Here, we evaluated both in-solution and filter-aided digestion protocols and assessed their influence on the estimation of protein abundances using five E. coli mixtures with known amounts of spiked proteins. Overall, our results confirmed that replicates of digestion protocols with tryptic specificity, performed in parallel and under the same conditions, are highly reproducible in terms of peptide quantitation regardless of the number of missed cleavages, with LTFASP being the digestion protocol that rendered the highest number of quantified proteins. Moreover, we observed not only a protease bias, but also an important technique bias (i.e., in-solution vs. filter-aided) that, under the conditions tested, had an even higher impact on peptide quantitation. Finally, we also evaluated the contribution of including peptides with missed cleavages in protein quantitation. Far from being residual cases, peptides with missed cleavages are abundant and in some cases even major products of the digestion reaction. Although the percentage of peptides with missed cleavages varies with the kinetics of the digestion reaction, these peptides are present even when the digestion reaction has reached equilibrium. Therefore, selecting a peptide with no missed cleavages does not guarantee a one-to-one ratio to the original protein, because another peptide with missed cleavages that includes the selected sequence may be present but not identified in shotgun proteomics experiments, or not even considered in targeted strategies. Nevertheless, our results show that, under reproducible digestion procedures, the inclusion of peptides with missed cleavages in relative protein quantitation does not introduce a higher degree of variability, and no significant differences in precision, accuracy, specificity, and sensitivity were found compared with the use of fully tryptic peptides. 
These observations suggest that peptides with missed cleavages can be included as proteotypic peptides in quantitative proteomics experiments without major concerns, as their addition renders quantitative results similar to those obtained using only peptides considered to be fully tryptic.


FIGURES
Figure 1 A) Schematic representation of the six digestion protocols used in this study; B) Number of protein groups, peptides, and peptide-spectrum matches (PSM) identified in the E. coli sample spiked with controlled proteins under the different digestion protocols used; C) Percentage of peptides bearing zero, one, two or three missed cleavages as a result of the different digestion protocols tested.
Figure 2 A) Correlation of the extracted peptide areas among the digestion protocols with tryptic specificity in Mixture 1, including peptides with and without missed cleavages. The number of points used for this correlation analysis (after removing NA) was n=1846; B) Rank correlation of estimated protein abundance in Mixture 1 calculated by the T3PQ approach for the different digestion protocols. The number of points used for this correlation analysis (after removing NA) was n=563.
Figure 3 Precision and accuracy of relative protein quantitation assessed by the mean squared error (MSE) for protein fold-change 1 (log2FC = 0) (A), protein fold-change 2 (log2FC = 1) (B), and protein fold-change 4 (log2FC = 2) (C) when including all peptides (all) or only peptides without any missed cleavages (noMC). The MSE was calculated with the formula $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$, which averages the squared differences between the N estimated protein fold-changes ($x_i$) and their true value ($\bar{x}$), and its variability was estimated by data re-sampling (bootstrap method, n=1,000).
Figure 4 A) Standard deviation distribution of the extracted peptide areas among digestion protocol replicates in a HeLa cell extract for all peptides (red) and for peptides without missed cleavages (blue); B) Rank correlation of estimated protein amounts in a HeLa cell extract calculated by the T3PQ approach for the different digestion protocols. T3PQ was calculated with the median intensity of either the three most intense peptides per protein regardless of the number of missed cleavages, or the three most intense peptides per protein without missed cleavages (noMC). In those cases in which three peptides were not available, the remaining most intense peptides were used instead. The number of points used for this correlation analysis (after removing NA) was n=979.
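As an illustration of the MSE and bootstrap procedure described in the Figure 3 caption, the following Python sketch computes the mean squared error of estimated log2 fold-changes against a known true value and resamples the estimates with replacement 1,000 times. The function names, the fixed random seed, and the toy estimates are assumptions for illustration only; the resampling unit used by the authors is not specified in the caption.

```python
import random
from statistics import mean, stdev

def mse(estimates, true_value):
    """Mean squared error of the estimates around the known true value."""
    return sum((x - true_value) ** 2 for x in estimates) / len(estimates)

def bootstrap_mse(estimates, true_value, n_boot=1000, seed=0):
    """MSE values from n_boot resamples drawn with replacement.

    Each resample has the same size as the original set of estimates.
    The seed is fixed here only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    n = len(estimates)
    values = []
    for _ in range(n_boot):
        sample = [estimates[rng.randrange(n)] for _ in range(n)]
        values.append(mse(sample, true_value))
    return values

# Toy log2 fold-change estimates (invented) for proteins whose true log2FC is 1
log2fc_estimates = [0.9, 1.1, 1.2, 0.8]
point_estimate = mse(log2fc_estimates, true_value=1.0)
boot = bootstrap_mse(log2fc_estimates, true_value=1.0, n_boot=1000)
print(point_estimate, mean(boot), stdev(boot))
```

The spread of the bootstrap values gives a rough measure of how stable the MSE is for a given digestion protocol and peptide set.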


TABLES
Table 1 Composition of the five different controlled mixtures prepared in this work with ratios 1:2:4 among the different protein subsets.
Table 2 Percentage of missed cleavages at lysines and arginines.


SUPPLEMENTARY MATERIAL
Supporting Information Available: Description of the material. This material is available free of charge via the Internet at http://pubs.acs.org.
Supplementary Table ST1 List of the SwissProt accession numbers divided by subsets corresponding to the controlled proteins spiked in an E. coli background.
Supplementary Table ST2 Complete set of inputs and outputs from MSstats.
Supplementary Table ST3 Summary table of the outputs from MSstats for the assessment of protein quantitation of different sample digestion protocols.
Supplementary Table ST4 List of protein groups and peptides identified in each digestion protocol in the E. coli samples with spiked-in control proteins.
Supplementary Table ST5 Pearson and Spearman correlation values of the extracted peptide areas corresponding to Figures 2A, 2B and 4B.
Supplementary Table ST6 List of protein groups and peptides identified in each digestion protocol in the HeLa sample.
Supplementary Figure S1 Sensitivity and specificity of relative protein quantitation for all the different values of protein fold-changes calculated with all peptides or only including peptides without any missed cleavages. Sensitivity was calculated as the proportion of proteins known to change that had a significant p-value. Similarly, specificity referred to the proportion of non-changing proteins that were correctly identified as such (i.e., having a non-significant p-value).
Supplementary Figure S2 Distribution of median-normalized peptide log2 intensities from the different digestion protocols, represented as a boxplot.
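The sensitivity and specificity definitions given for Supplementary Figure S1 can be written as a short Python sketch. It classifies each protein as significant or not by comparing its p-value with a threshold and then tallies the outcome against the known ground truth of the spiked mixture. The threshold of 0.05, the function name, and the toy p-values are assumptions; the significance cutoff and any multiple-testing adjustment used in the study are not stated in this section.

```python
def sensitivity_specificity(p_values, is_changing, alpha=0.05):
    """Sensitivity and specificity from per-protein p-values.

    `is_changing[i]` is True when protein i is known to change between the
    compared mixtures. `alpha` is an assumed significance threshold.
    """
    tp = fn = tn = fp = 0
    for p, changing in zip(p_values, is_changing):
        significant = p < alpha
        if changing and significant:
            tp += 1  # changing protein correctly detected
        elif changing:
            fn += 1  # changing protein missed
        elif significant:
            fp += 1  # non-changing protein wrongly flagged
        else:
            tn += 1  # non-changing protein correctly left unflagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Toy example (invented): three changing and three non-changing proteins
p_values = [0.001, 0.03, 0.20, 0.60, 0.04, 0.80]
is_changing = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(p_values, is_changing)
print(sens, spec)
```

Running the same function on p-values obtained with all peptides and with only peptides without missed cleavages would give the two sets of values that the figure compares.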


REFERENCES
1. Cerciello, F., Choi, M., Nicastri, A., Bausch-Fluck, D., Ziegler, A., Vitek, O., Felley-Bosco, E., Stahel, R., Aebersold, R., & Wollscheid, B. (2013) Identification of a seven glycopeptide signature for malignant pleural mesothelioma in human serum by selected reaction monitoring. Clin Proteomics 10, 16.
2. Sabidó, E., Quehenberger, O., Shen, Q., Chang, C.-Y., Shah, I., Armando, A. M., Andreyev, A., Vitek, O., Dennis, E. A., & Aebersold, R. (2012) Targeted proteomics of the eicosanoid biosynthetic pathway completes an integrated genomics-proteomics-metabolomics picture of cellular metabolism. Mol Cell Proteomics 11, M111.014746.
3. Sabidó, E., Wu, Y., Bautista, L., Porstmann, T., Chang, C.-Y., Vitek, O., Stoffel, M., & Aebersold, R. (2013) Targeted proteomics reveals strain-specific changes in the mouse insulin and central metabolic pathways after a sustained high-fat diet. Mol Syst Biol 9, 681.
4. Oliveira, A. P., Ludwig, C., Picotti, P., Kogadeeva, M., Aebersold, R., & Sauer, U. (2012) Regulation of yeast central metabolism by enzyme phosphorylation. Mol Syst Biol 8, 623.
5. Silva, J. C., Gorenstein, M. V., Li, G.-Z., Vissers, J. P. C., & Geromanos, S. J. (2006) Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol Cell Proteomics 5, 144-56.
6. Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J., & Mann, M. (2005) Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 4, 1265-72.
7. Schwanhäusser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W., & Selbach, M. (2011) Global quantification of mammalian gene expression control. Nature 473, 337-42.
8. Proc, J. L., Kuzyk, M. A., Hardie, D. B., Yang, J., Smith, D. S., Jackson, A. M., Parker, C. E., & Borchers, C. H. (2010) A quantitative study of the effects of chaotropic agents, surfactants, and solvents on the digestion efficiency of human plasma proteins by trypsin. J Proteome Res 9, 5422-37.
9. Walmsley, S. J., Rudnick, P. A., Liang, Y., Dong, Q., Stein, S. E., & Nesvizhskii, A. I. (2013) Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J Proteome Res 12, 5666-80.
10. León, I. R., Schwämmle, V., Jensen, O. N., & Sprenger, R. R. (2013) Quantitative assessment of in-solution digestion efficiency identifies optimal protocols for unbiased protein analysis. Mol Cell Proteomics 12, 2992-3005.
11. Loziuk, P. L., Wang, J., Li, Q., Sederoff, R. R., Chiang, V. L., & Muddiman, D. C. (2013) Understanding the role of proteolytic digestion on discovery and targeted proteomic measurements using liquid chromatography tandem mass spectrometry and design of experiments. J Proteome Res 12, 5820-9.
12. Peng, M., Taouatas, N., Cappadona, S., van Breukelen, B., Mohammed, S., Scholten, A., & Heck, A. J. R. (2012) Protease bias in absolute protein quantitation. Nat Methods 9, 524-5.
13. Lowenthal, M. S., Liang, Y., Phinney, K. W., & Stein, S. E. (2014) Quantitative bottom-up proteomics depends on digestion conditions. Anal Chem 86, 551-8.
14. Glatter, T., Ludwig, C., Ahrné, E., Aebersold, R., Heck, A. J. R., & Schmidt, A. (2012) Large-scale quantitative assessment of different in-solution protein digestion protocols reveals superior cleavage efficiency of tandem LysC/trypsin proteolysis over trypsin digestion. J Proteome Res 11, 5145-56.
15. Lange, V., Picotti, P., Domon, B., & Aebersold, R. (2008) Selected reaction monitoring for quantitative proteomics: a tutorial. Mol Syst Biol 4, 222.
16. Dephoure, N. & Gygi, S. P. (2011) A solid phase extraction-based platform for rapid phosphoproteomic analysis. Methods 54, 379-86.
17. Wiśniewski, J. R., Zougman, A., Nagaraj, N., & Mann, M. (2009) Universal sample preparation method for proteome analysis. Nat Methods 6, 359-62.
18. Perkins, D. N., Pappin, D. J., Creasy, D. M., & Cottrell, J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-67.


19. Chang, C.-Y., Picotti, P., Hüttenhain, R., Heinzelmann-Schwarz, V., Jovanovic, M., Aebersold, R., & Vitek, O. (2012) Protein significance analysis in selected reaction monitoring (SRM) measurements. Mol Cell Proteomics 11, M111.014662.
20. Siepen, J. A., Keevil, E.-J., Knight, D., & Hubbard, S. J. (2007) Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. J Proteome Res 6, 399-408.
21. Ahrné, E., Molzahn, L., Glatter, T., & Schmidt, A. (2013) Critical assessment of proteome-wide label-free absolute abundance estimation strategies. Proteomics 13, 2567-78.


FIGURE 1


FIGURE 2


FIGURE 3


FIGURE 4


TABLE 1

TABLE 2

Digestion protocol   TINSOL      TFASP       LTINSOL     LTFASP
% Missed K           69% ± 0%    74% ± 1%    38% ± 0%    7% ± 0%
% Missed R           31% ± 0%    26% ± 1%    62% ± 0%    93% ± 0%
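Because the two rows of Table 2 add up to 100% for each protocol, the values appear to describe how the observed missed cleavage sites are split between lysine and arginine. A minimal Python sketch of such a tally is shown below. It assumes that a missed cleavage is an internal K or R that is not followed by proline, which is a common convention for trypsin but is not confirmed by this section as the rule used in the study. The function names and peptide sequences are invented for illustration.

```python
from collections import Counter

def missed_cleavage_sites(peptide):
    """Return the internal K/R residues that were left uncleaved.

    Assumed rule: K or R at any position except the C-terminus counts as a
    missed cleavage unless the next residue is P.
    """
    sites = []
    for i, residue in enumerate(peptide[:-1]):  # skip the C-terminal residue
        if residue in "KR" and peptide[i + 1] != "P":
            sites.append(residue)
    return sites

def missed_cleavage_split(peptides):
    """Percentage of missed cleavage sites located at K and at R."""
    counts = Counter()
    for peptide in peptides:
        counts.update(missed_cleavage_sites(peptide))
    total = counts["K"] + counts["R"]
    if total == 0:
        return {"K": 0.0, "R": 0.0}
    return {aa: 100.0 * counts[aa] / total for aa in "KR"}

# Toy peptide list (invented sequences)
peptides = ["AAKDDR", "GGRPLLK", "TTKEERVVK", "LLLR"]
split = missed_cleavage_split(peptides)
print(split)
```

In this toy list, "GGRPLLK" contributes no missed cleavage because its internal arginine precedes a proline, and "LLLR" is fully cleaved.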


GRAPHICAL ABSTRACT
