
Article

MethylQuant: a tool for sensitive validation of enzyme-mediated protein methylation sites from heavy-methyl SILAC data Aidan P. Tay, Vincent Geoghegan, Daniel Yagoub, Marc R. Wilkins, and Gene Hart-Smith J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.7b00601 • Publication Date (Web): 23 Oct 2017 Downloaded from http://pubs.acs.org on October 25, 2017


Journal of Proteome Research is published by the American Chemical Society.


Journal of Proteome Research

MethylQuant: a Tool for Sensitive Validation of Enzyme-mediated Protein Methylation Sites from Heavy-methyl SILAC Data

Aidan P. Tay,1‡ Vincent Geoghegan,2‡ Daniel Yagoub,1 Marc R. Wilkins,1 Gene Hart-Smith1*

1 NSW Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia

2 Centre for Immunology and Infection, University of York, Heslington, York, YO10 5DD, United Kingdom

‡ These authors contributed equally to this work

Address reprint requests to Gene Hart-Smith, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, New South Wales 2052, Australia. Phone: +61-2-93853633; Fax, +61-2-9385-3950; E-mail: [email protected]

ACS Paragon Plus Environment

Abstract

The study of post-translational methylation is hampered by the fact that large-scale LC-MS/MS experiments produce high methylpeptide false discovery rates (FDRs). The use of heavy-methyl SILAC can drastically reduce these FDRs; however, this approach is limited by a lack of heavy-methyl SILAC-compatible software. To fill this gap we recently developed MethylQuant. Here, using an updated version of MethylQuant, we demonstrate its methylpeptide validation and quantification capabilities and provide guidelines for its best use. Using reference heavy-methyl SILAC datasets, we show that MethylQuant predicts with statistical significance the true or false positive status of methylpeptides in samples of varying complexity, degree of methylpeptide enrichment, and heavy to light mixing ratios. We introduce methylpeptide confidence indicators – MethylQuant Confidence and MethylQuant Score – and demonstrate their strong performance in complex samples characterized by a lack of methylpeptide enrichment. For these challenging datasets, MethylQuant identifies 882 of 1165 true positive methylpeptide spectrum matches (i.e. >75% sensitivity) at high specificity ( 0.5) isotopomer pairs only (the ‘H/L ratio #2’ output). Using the above outputs, MethylQuant indicates for each PSM the likelihood that it is associated with a heavy and light peptide pair. These indicators – ‘MethylQuant Confidence’ and ‘MethylQuant Score’ – are detailed in the Results.

Statistical analysis of MethylQuant outputs

For heavy-methyl SILAC datasets, logistic regression analyses were performed using SPSS Statistics (version 24, IBM) with the dependent variable being the true or false positive status of each methyl-PSM. Analyses were performed for data obtained individually from each combination of sample and instrument, and for data obtained from each sample using all instruments combined.

In each case, two analyses were performed. For Analysis 1, peptide pair detection was specified as a categorical predictor variable. PSMs were considered to have putative peptide pairs when 3 sets of m/z values matching those of theoretical heavy and light isotopomers were identified by MethylQuant, resulting in H/L Ratio #1 outputs > 0. For Analysis 2, the following MethylQuant outputs were specified as predictor variables: Isotope Distribution Correlation, H/L Ratio #1, H/L Ratio #2 and the number of Elution Profile Correlations > 0.5. Only PSMs identified by MethylQuant to have a putative peptide pair were used in Analysis 2.
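The analyses above were run in SPSS. As a hedged illustration of what a binary logistic regression with Wald p-values computes (not the SPSS implementation), the sketch below fits the model by Newton-Raphson in plain Python. The function names and the toy Analysis 1-style data (a single 0/1 "putative peptide pair detected" predictor, with invented counts) are hypothetical; an Analysis 2-style fit would pass several predictor columns per row instead. SPSS reports the Wald statistic (B/SE)² with 1 degree of freedom, whose p-value equals the two-sided normal p-value of z = B/SE computed here.

```python
import math


def _sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _score_and_information(beta, rows, y):
    """Log-likelihood gradient and Fisher information matrix at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, target in zip(rows, y):
        p = _sigmoid(sum(b * x for b, x in zip(beta, row)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (target - p) * row[i]
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    return grad, info


def fit_logistic(X, y, iterations=25):
    """Hypothetical helper: Newton-Raphson logistic regression with Wald p-values.

    X: predictor rows (an intercept column is prepended here).
    y: 1 = true positive methyl-PSM, 0 = false positive methyl-PSM.
    Returns (coefficients, standard errors, p-values), intercept first.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    beta = [0.0] * len(rows[0])
    for _ in range(iterations):
        grad, info = _score_and_information(beta, rows, y)
        cov = _invert(info)
        # Newton step for logistic regression: beta += I^-1 * gradient
        beta = [b + sum(c * g for c, g in zip(cov_row, grad))
                for b, cov_row in zip(beta, cov)]
    _, info = _score_and_information(beta, rows, y)
    cov = _invert(info)
    se = [math.sqrt(cov[i][i]) for i in range(len(beta))]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


# Analysis 1-style toy data (invented): first 20 rows are true positives
# (18 with a putative pair), last 20 are false positives (4 with a pair)
X = [[1]] * 18 + [[0]] * 2 + [[1]] * 4 + [[0]] * 16
y = [1] * 20 + [0] * 20
beta, se, p = fit_logistic(X, y)
print(beta[1], se[1], p[1])  # slope should equal ln(36), the 2x2 log odds ratio
```

With a single binary predictor the model is saturated, so the fitted slope is simply the log odds ratio of the 2×2 table, which makes the sketch easy to check by hand.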

MaxQuant and Proteome Discoverer quantification of peptide pairs

To benchmark MethylQuant’s quantification performance, H/L ratio outputs from conventional SILAC datasets were compared to those obtained using MaxQuant (version 1.5.8.0), run using standard parameters,33 and Proteome Discoverer (version 2.1, Thermo Scientific). For MaxQuant searches, H/L ratios were obtained for PSMs identified via Andromeda, run using the following parameters: precursor ion and peptide fragment mass tolerances were ±4.5 ppm and ±20 ppm, respectively; carbamidomethyl (C) was included as a fixed modification; oxidation (M) and N-terminal protein acetylation were included as variable modifications; enzyme specificity was trypsin with up to two missed cleavages; and the Swiss-Prot database (July 2017 release, 555,100 sequence entries) was searched using human sequences only. For Proteome Discoverer 2.1 searches, H/L ratios were obtained using the default SILAC 2plex (Arg10, Lys8) quantification method for PSMs identified via Mascot, run using the parameters described in Sequence database searches. Only H/L ratios associated with high scoring PSMs identified by both Andromeda (PEP ≤ 0.05) and Mascot (Percolator-derived q-value ≤ 0.01) were considered for benchmarking analyses. High scoring PSMs were kept irrespective of whether or not Andromeda and Mascot produced identical peptide sequence identifications.
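The spectrum selection rule above (keep a spectrum only if both Andromeda and Mascot report a high-scoring PSM for it, whether or not the two sequences agree) can be sketched as follows. This is a minimal sketch: the dictionary layouts, spectrum IDs and function names are hypothetical, and the signed log2 deviation is one plausible way of expressing agreement between two H/L ratios, not necessarily the measure used in this study.

```python
import math


def shared_high_scoring_ratios(andromeda, mascot, pep_max=0.05, q_max=0.01):
    """Pair up H/L ratios for spectra that pass both score filters.

    andromeda: {spectrum_id: (pep, maxquant_hl_ratio)}          (hypothetical layout)
    mascot:    {spectrum_id: (q_value, proteome_discoverer_hl)} (hypothetical layout)
    Spectra are matched by ID only, so differing sequence assignments
    between the two search engines do not cause a spectrum to be dropped.
    """
    kept = {}
    for spectrum_id, (pep, mq_ratio) in andromeda.items():
        if spectrum_id not in mascot or pep > pep_max:
            continue
        q_value, pd_ratio = mascot[spectrum_id]
        if q_value <= q_max:
            kept[spectrum_id] = (mq_ratio, pd_ratio)
    return kept


def log2_deviation(reference_ratio, test_ratio):
    """Signed log2 difference between two H/L ratios (0 means perfect agreement)."""
    return math.log2(test_ratio) - math.log2(reference_ratio)


# Invented example: only scan_101 passes both filters
andromeda = {"scan_101": (0.01, 1.00), "scan_102": (0.20, 1.10), "scan_103": (0.04, 0.50)}
mascot = {"scan_101": (0.005, 1.05), "scan_103": (0.02, 0.48), "scan_104": (0.001, 2.00)}
kept = shared_high_scoring_ratios(andromeda, mascot)
print(kept)                       # {'scan_101': (1.0, 1.05)}
print(log2_deviation(1.0, 2.0))   # 1.0
```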


Results

The MethylQuant outputs summarized in Figure 1, and described in detail above, have been designed to facilitate the validation and relative quantification of methyl-SILAC pairs (and, by implication, methyl-PSMs). The results presented below first evaluate the performance of these MethylQuant outputs for methylpeptide validation. The ability of MethylQuant to accurately quantify high confidence peptide pairs is then evaluated.

MethylQuant outputs are significant predictors of true and false positive methyl-SILAC pairs

To evaluate the efficacy of MethylQuant for methyl-SILAC pair validation, MethylQuant was tested against heavy-methyl SILAC datasets in which true and false positive methyl-PSMs had previously been unambiguously determined (vide supra).26, 31 These datasets, summarized in Figure 2, were obtained from a total of 369 LC-MS/MS experiments conducted using 3 instrument platforms on 3 sample types. Figure 2 shows that each of these datasets presents a different set of challenges when characterizing methyl-SILAC pairs. In combination, they allow the performance of MethylQuant to be assessed in relation to sample complexity (i.e. frequency of peptide feature co-elution), H/L protein mixing ratios, methylpeptide enrichment, methylpeptide ion intensities and XIC resolution.


Figure 2. Characteristics of the heavy-methyl SILAC reference datasets used in the present study. Average numbers of peptide features detected per 20 m/z by 0.5 min region in each LC-MS/MS experiment are shown for each dataset. Representative XICs for the light and heavy monoisotopic peaks of doubly charged RMe.GGFGGR, derived from S. cerevisiae 40S ribosomal protein S2, are shown for each dataset. Very challenging and challenging characteristics of each dataset in relation to methyl-SILAC pair characterization are highlighted in red and yellow boxes, respectively.


Figure 3. MethylQuant outputs from the reference heavy-methyl SILAC datasets of Figure 2 when outputs from instrument-specific datasets are combined. For each panel outputs are shown for false positive methylpeptide spectrum matches (top; orange), true positive methylpeptide spectrum matches (middle; green) and unmethylated methionine-containing peptide spectrum matches (bottom; grey). (A) Proportions of peptide spectrum matches detected with putative heavy and light peptide pairs (i.e. H/L Ratio #1 outputs > 0). (B) MethylQuant outputs for peptide spectrum matches detected with putative peptide pairs. Histograms of Isotope Distribution Correlations (bin intervals = 0.02; lines show cumulative frequency) are on the left. Histograms of log2(H/L Ratio #1) outputs (bin intervals = 0.01) are in the middle. Histograms of the number of Elution Profile Correlations > 0.5 are on the right.


Figure 3 provides a summary of MethylQuant outputs obtained from the abovementioned datasets when outputs from instrument-specific datasets are combined for each sample type (results shown for biological replicate 1 of the SDS-PAGE samples; results for biological replicate 2 shown in Figure S3). For each dataset, MethylQuant outputs associated with false positive methyl-PSMs (top), true positive methyl-PSMs (middle), and unmethylated methionine-containing PSMs (bottom) are shown separately. False positive methyl-PSMs are not associated with genuine heavy and light peptide pairs, whereas true positive methyl-PSMs and ~99% of unmethylated methionine-containing PSMs are. Table 1 complements Figures 3 and S3 by summarizing the probability values at which MethylQuant outputs serve as predictors of the true and false positive status of methyl-PSMs. Results derived from the same MethylQuant outputs, separated according to instrument type, are shown in Figures S1, S2 and S4, and Table S1. H/L Ratio #2 outputs, not shown in the above figures, are illustrated in Figure S5.


Table 1. Probability values at which MethylQuant outputs predict the true and false positive status of methylpeptide spectrum matches (significant predictors of p < 0.01 are labeled **). Data are from logistic regression analyses conducted using MethylQuant outputs from the reference heavy-methyl SILAC datasets of Figure 2 when peptide spectrum matches from instrument-specific datasets are combined, with the dependent variable being the true or false positive status of each methylpeptide spectrum match. For Analysis 1, peptide pair detection (Y/N) was a categorical predictor variable. For Analysis 2 the listed MethylQuant outputs were predictor variables, and only peptide spectrum matches detected with putative peptide pairs were analyzed.

Dataset | Analysis 1: Peptide pair detection | Analysis 2: Isotope Distribution Correlation | Analysis 2: H/L Ratio #1 | Analysis 2: H/L Ratio #2 | Analysis 2: Number of Elution Profile Correlations > 0.5
HILIC samples | ** p = 3.4E-68 | ** p = 0.0061 | n.s. p = 0.498 | ** p = 9.1E-4 | ** p = 9.8E-7
SDS-PAGE samples (replicate 1) | ** p = 8.3E-197 | ** p = 7.6E-5 | ** p = 2.0E-6 | ** p = 0.014 | ** p = 2.9E-23
SDS-PAGE samples (replicate 2) | ** p = 1.4E-80 | ** p = 1.5E-26 | ** p = 6.7E-9 | ** p = 2.1E-4 | ** p = 1.8E-10
Immunoaffinity enriched sample | n/a | n/a | n/a | n/a | n/a

Figure 3A, and the related Figures S1-S4, show the proportions of PSMs identified by MethylQuant to have a putative heavy and light peptide pair (i.e. PSMs that produce H/L Ratio #1 outputs > 0). For each sample type, putative peptide pairs are detected more frequently for true positive methyl-PSMs and unmethylated methionine-containing PSMs than for false positive methyl-PSMs. Tables 1 and S1 show that, for each instrument and sample type (other than the immunoaffinity enriched sample), identification of putative peptide pairs serves as a significant predictor of the true or false positive status of methyl-PSMs. For the immunoaffinity enriched sample, logistic regression analysis was not performed due to an insufficient quantity of data. Nonetheless, for this dataset 89% of true positive methyl-PSMs are detected with putative heavy and light peptide pairs, whereas no such pairs are detected for false positive methyl-PSMs.


Despite these positive results, it is apparent that for complex datasets (e.g. the present HILIC- and SDS-PAGE-derived datasets) putative peptide pair detection cannot categorize true and false positive methyl-PSMs with perfect specificity. Additional predictors of true or false positive status are therefore required for methyl-PSMs detected with putative peptide pairs. Figure 3B and the related Figures S1-S4 show histograms of such candidate predictors, i.e. MethylQuant’s Isotope Distribution Correlation, H/L Ratio and Elution Profile Correlation outputs. When considering MethylQuant’s Isotope Distribution Correlation outputs, false positive methyl-PSMs produce broad distributions of good and poor correlations. This is true for each instrument and sample type, with the exception of the immunoaffinity enriched sample, in which no putative peptide pairs were detected for false positive methyl-PSMs. In contrast, true positive methyl-PSMs and unmethylated methionine-containing PSMs produce distributions skewed towards high Isotope Distribution Correlations. These observations are most pronounced in datasets derived from samples with relatively high (> 0.33) H/L mixing ratios. For the HILIC-derived dataset, which is associated with a relatively low (~0.08) H/L mixing ratio, heavy peptide ion intensities are consistently near the linear dynamic range limits of the employed Orbitrap mass analyzers, resulting in an overall lowering of Isotope Distribution Correlations. Nonetheless, for the datasets derived from both SDS-PAGE and HILIC samples, Isotope Distribution Correlations serve as significant predictors of the true or false positive status of methyl-PSMs, as summarized in Table 1. When considering MethylQuant’s H/L Ratio outputs, Figure 3B and Figures S1-S4 show that false positive methyl-PSMs often have low H/L Ratios. This is most marked in the datasets derived from SDS-PAGE samples, in which overall ion intensities are high.
To explain this observation, it should be noted that, when performing heavy-methyl SILAC, theoretical heavy partner peptide ions can have m/z values matching those of light peptide isotopomers. Thus, when the ion intensities of false positive methylpeptides are high, multiple isotopomers matching those of putative heavy peptide ions may be observed, producing a putative peptide pair with a very low (< 0.06) H/L Ratio. In contrast, the histograms for true positive methyl-PSMs and unmethylated methionine-containing PSMs contain relatively few very low H/L Ratios and are approximately normally distributed. In the SDS-PAGE-derived datasets, both of MethylQuant’s H/L Ratio outputs serve as significant predictors of the true or false positive status of methyl-PSMs, although the H/L Ratio #2 output is a less reliable predictor when datasets derived from each instrument type are considered separately (see Tables 1 and S1). The predictive capacities of the H/L Ratio outputs are lower in the HILIC-derived dataset: only the H/L Ratio #2 output serves as a significant predictor, and neither H/L Ratio output serves as a consistent predictor when datasets for each instrument type are considered separately. This is because, with the low H/L mixing ratio of the HILIC sample, H/L Ratios for true positive methyl-SILAC pairs are not substantially different from those of the false positive pairs with very low H/L Ratios described above.

When considering MethylQuant’s Elution Profile Correlation outputs, Figure 3B and Figures S1-S4 show that false positive methyl-PSMs predominantly produce no well-correlated XIC pairs, and very few instances in which multiple XIC pairs are well correlated (when using MethylQuant’s default Pearson correlation coefficient threshold of 0.5). This is true for every dataset in which putative peptide pairs are detected for false positive methyl-PSMs. In comparison, true positive methyl-PSMs produce a higher proportion of well-correlated XIC pairs. This is particularly marked in the SDS-PAGE-derived datasets, where true positive methyl-SILAC pairs are typically observed with 3 Elution Profile Correlations > 0.5. In the HILIC-derived dataset, the low H/L mixing ratio reduces the general quality of heavy isotopomer XIC peak shapes, resulting in fewer true positive heavy isotopomer XICs correlating with their light counterparts. Nonetheless, for both SDS-PAGE- and HILIC-derived datasets, the number of Elution Profile Correlations > 0.5 consistently serves as a significant predictor of the true or false positive status of methyl-PSMs (see Tables 1 and S1).
Notably, this predictive capacity is observed in all datasets derived from LTQ Orbitrap Velos Pro ETD instrumentation, in which slow ETD scan speeds (relative to CID and HCD) compromised XIC resolution. This indicates that MethylQuant’s Elution Profile Correlation output is tolerant of low XIC resolution. Another notable finding relating to MethylQuant’s Elution Profile Correlation output is that true positive methyl-PSMs produce a higher proportion of Elution Profile Correlations > 0.5 than unmethylated methionine-containing PSMs. This unexpected difference stems from the observation that heavy methionine-containing peptide ions consistently elute marginally earlier than their light counterparts under the reverse-phase LC conditions employed here (see Figure 4, elaborated upon below), resulting in an overall reduction in the number of well-correlated XIC pairs for methionine-containing PSMs. In contrast, true positive heavy and light methylpeptide ions consistently elute at identical retention times, so the correlations of their XIC pairs are not similarly affected by elution time differences. This suggests that the exposed deuterons on heavy methionine residues slightly reduce the hydrophobicity of heavy peptides relative to their light counterparts. On the other hand, the exposed deuterons in peptides with heavy methylated arginine or lysine residues either produce a proportionally smaller reduction in peptide hydrophobicity that is not observable, or do not alter peptide hydrophobicity at all; they therefore do not affect MethylQuant’s ability to identify methyl-SILAC pairs (provided these peptides do not also contain methionine).
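The Elution Profile Correlation behaviour discussed above can be sketched as a Pearson correlation between each light isotopomer XIC and its heavy counterpart, counting the pairs with r > 0.5. This is a minimal sketch rather than MethylQuant's code: the synthetic Gaussian XICs, the helper names, the assumption that the 3 isotopomer pairs are the monoisotopic, +1 and +2 peaks, and the 2-scan offset (deliberately exaggerated relative to the marginal shifts seen for methionine-containing peptides) are all illustrative choices. It shows that a retention-time offset between heavy and light ions lowers r even when the peak shapes are identical, whereas an intensity difference alone does not.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace carries no elution-profile information
    return sxy / math.sqrt(sxx * syy)


def count_well_correlated(light_xics, heavy_xics, threshold=0.5):
    """Number of light/heavy isotopomer XIC pairs with Pearson r above threshold."""
    return sum(1 for lx, hx in zip(light_xics, heavy_xics)
               if pearson(lx, hx) > threshold)


def gaussian_xic(apex, height, width=1.0, n_scans=21):
    """Hypothetical synthetic XIC: a Gaussian peak sampled at scans 0..n_scans-1."""
    return [height * math.exp(-((t - apex) ** 2) / (2 * width ** 2))
            for t in range(n_scans)]


# Three light isotopomer XICs (assumed monoisotopic, +1, +2), all apexing at scan 10
light = [gaussian_xic(10, h) for h in (100.0, 60.0, 25.0)]
# Heavy partners at lower intensity: co-eluting vs. eluting 2 scans earlier
heavy_coeluting = [gaussian_xic(10, h) for h in (40.0, 24.0, 10.0)]
heavy_shifted = [gaussian_xic(8, h) for h in (40.0, 24.0, 10.0)]

print(count_well_correlated(light, heavy_coeluting))  # 3
print(count_well_correlated(light, heavy_shifted))    # 0 (r is about 0.24 per pair)
```

Because Pearson correlation is invariant to scaling, the lower heavy intensities do not affect r; only the shift in elution time does.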

Performance of MethylQuant for methylpeptide validation

The above results indicate that all MethylQuant outputs are capable of serving as predictors of the true or false positive status of methyl-PSMs. They are tolerant of high sample complexity and low XIC resolution, and are particularly effective when H/L mixing ratios of > 0.33 are used. Despite this, none of these outputs can categorize true and false positive methyl-PSMs with perfect specificity when used as a stand-alone predictor. MethylQuant outputs must therefore be used in combination for sensitive and specific methylpeptide validation. Based on the evaluations described above, MethylQuant provides two different confidence indicators for methyl-PSMs (or other PSMs of interest). These confidence indicators, ‘MethylQuant Confidence’ and ‘MethylQuant Score’, are derived from the combinations of MethylQuant outputs detailed below, and indicate the confidence with which each PSM of interest can be associated with a true positive methyl-SILAC pair (or other co-eluting heavy and light peptide pair of interest). These confidence indicators are designed to serve complementary purposes (elaborated upon in the Discussion), and to be applicable across datasets of differing complexity, H/L mixing ratios and XIC resolution.

The MethylQuant Confidence indicator designates PSMs as ‘Very High’, ‘High’ or ‘Low’ confidence. PSMs designated by MethylQuant as Very High confidence have: a putative heavy and light peptide pair; an Isotope Distribution Correlation ≥ 0.99 for this peptide pair; 3 well-correlated isotopomer pairs (i.e. 3 Elution Profile Correlations > 0.5 when using MethylQuant’s default Pearson correlation coefficient setting); and an H/L Ratio #1 output ≥ 0.06 (when considering mass shifts associated with methyl-SILAC pairs only). PSMs designated by MethylQuant as High confidence have: a putative heavy and light peptide pair; an Isotope Distribution Correlation ≥ 0.75 for this peptide pair; at least 2 well-correlated isotopomer pairs (i.e. at least 2 Elution Profile Correlations > 0.5 when using MethylQuant’s default Pearson correlation coefficient setting); and an H/L Ratio #1 output ≥ 0.06 (when considering mass shifts associated with methyl-SILAC pairs only). PSMs that are not designated as Very High or High confidence are designated Low confidence. The MethylQuant Score is derived from a logit model produced using data from biological replicate 1 of the SDS-PAGE samples (i.e. the largest of the 4 datasets studied here), using combined outputs from instrument-specific datasets. Logistic regression analysis was performed with the dependent variable being the true or false positive status of each methyl-PSM; predictor variables were the detection of putative peptide pairs (0 when pairs were not detected and 1 when pairs were detected), the Isotope Distribution Correlation output, and the number of Elution Profile Correlations > 0.5 (0, 1, 2 or 3). The coefficients and probability values for the predictor variables in the resulting model are provided in Table S2. To produce the final MethylQuant Score, logit model-derived probability values for each methyl-PSM (i.e. values ranging from 0 to 1.0) are linearly converted to scores ranging from 0 to 50.
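The two confidence indicators defined above can be sketched as follows. The thresholds come from the text; the function and constant names are hypothetical. Because the fitted logit coefficients are given only in Table S2, the score function takes them as an argument, and the usage lines pass all-zero placeholders rather than real values. The example inputs resemble the outputs described for Figure 4A-C, with an H/L Ratio #1 of 0.8 assumed because it is not reported there.

```python
import math

# Thresholds taken from the text; names are hypothetical.
MIN_HL_RATIO_1 = 0.06


def methylquant_confidence(pair_detected, isotope_corr, n_elution_corr, hl_ratio_1):
    """Return 'Very High', 'High' or 'Low' following the criteria in the text.

    n_elution_corr: number of Elution Profile Correlations > 0.5 (0 to 3).
    hl_ratio_1: H/L Ratio #1 for methyl-SILAC mass shifts only.
    """
    if not pair_detected or hl_ratio_1 < MIN_HL_RATIO_1:
        return "Low"
    if isotope_corr >= 0.99 and n_elution_corr == 3:
        return "Very High"
    if isotope_corr >= 0.75 and n_elution_corr >= 2:
        return "High"
    return "Low"


def methylquant_score(pair_detected, isotope_corr, n_elution_corr, coefficients):
    """Map the logit-model probability linearly onto 0..50.

    coefficients: (intercept, b_pair, b_isotope_corr, b_n_elution_corr).
    The fitted values are in Table S2 and are NOT reproduced here.
    """
    b0, b_pair, b_iso, b_elu = coefficients
    eta = (b0 + b_pair * (1 if pair_detected else 0)
           + b_iso * isotope_corr + b_elu * n_elution_corr)
    # numerically stable logistic function
    if eta >= 0:
        probability = 1.0 / (1.0 + math.exp(-eta))
    else:
        probability = math.exp(eta) / (1.0 + math.exp(eta))
    return 50.0 * probability


# Inputs resembling Figure 4A-C; H/L Ratio #1 = 0.8 is an assumed value
print(methylquant_confidence(True, 0.995, 3, 0.8))  # Very High
print(methylquant_confidence(True, 1.00, 0, 0.8))   # Low
print(methylquant_confidence(True, 0.83, 2, 0.8))   # High

PLACEHOLDER_COEFFICIENTS = (0.0, 0.0, 0.0, 0.0)  # NOT the Table S2 values
print(methylquant_score(True, 0.995, 3, PLACEHOLDER_COEFFICIENTS))  # 25.0
```

With all-zero coefficients the model probability is 0.5 for every PSM, so every score is 25.0; reproducing scores such as 49.1 or 9.4 would require the Table S2 coefficients.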


Figure 4. Examples of MethylQuant characterizations of putative peptide pairs associated with peptide spectrum matches. Mass spectra and XICs measured by MethylQuant for the isotopomers of putative heavy and light peptide pairs are shown alongside MethylQuant outputs. (A) Data from a true positive methylpeptide spectrum match to doubly charged NVSVK2Me.EIR, derived from S. cerevisiae elongation factor 1-α. (B) Data from a false positive methylpeptide spectrum match. (C) Data from a true positive methylpeptide spectrum match to doubly charged ASLFAQGKMe.R, derived from S. cerevisiae 60S ribosomal protein L42-A/B. (D) Data from an unmethylated methionine-containing peptide spectrum match to doubly charged TPAEMSRPATTTR, derived from S. cerevisiae fatty acid synthase subunit-α.


Figure 4 provides practical examples showing how MethylQuant outputs are used in combination to produce the abovementioned MethylQuant confidence indicators. Figure 4A illustrates MethylQuant outputs obtained from a true positive methyl-PSM, NVSVK2Me.EIR from elongation factor 1-α. Multiple strong predictors of the true positive status of this methyl-PSM are observed: MethylQuant identifies that this methyl-PSM is associated with a putative peptide pair, that this peptide pair has a high Isotope Distribution Correlation (> 0.99), and that all 3 putative isotopomer pairs are well correlated (Elution Profile Correlations > 0.5). Together, these MethylQuant outputs define this methyl-PSM as Very High confidence, and produce a MethylQuant Score of 49.1. In comparison, Figure 4B illustrates MethylQuant outputs obtained from a false positive methyl-PSM. Despite being a false positive, this methyl-PSM shows two strong predictors of true positive status: it is associated with a putative peptide pair, and this putative peptide pair has a high Isotope Distribution Correlation (1.00). However, all 3 Elution Profile Correlations are poor, indicating that the apparent heavy and light peptide pair instead stems from the co-elution of unrelated peptide ions. Together, these MethylQuant outputs define this methyl-PSM as Low confidence, and produce a MethylQuant Score of 9.4. Figure 4C illustrates MethylQuant outputs obtained from another true positive methyl-PSM, ASLFAQGK2Me.R from 60S ribosomal protein L42-A/B. Relative to the example shown in Figure 4A, fewer strong predictors of the true positive status of this methyl-PSM are observed. MethylQuant identifies that this methyl-PSM is associated with a putative peptide pair; however, this peptide pair has a relatively low Isotope Distribution Correlation (0.83). XICs reveal that this is due to co-elution of an unrelated peptide ion whose m/z matches that of the heavy monoisotopic peak, resulting in a low Elution Profile Correlation 1 output.
Despite the interference from this co-eluting peptide ion, the Elution Profile Correlation outputs still strongly predict the true positive status of this methyl-PSM, as multiple Elution Profile Correlation outputs > 0.5 are observed. Together, these MethylQuant outputs define this methyl-PSM as High confidence (but not Very High confidence), and produce a MethylQuant Score of 44.3. Interestingly, for this methyl-PSM the accuracy of the H/L Ratio #1 output is reduced by the abovementioned co-eluting peptide ion; however, the accuracy of the H/L Ratio #2 output remains high, as this output dismisses data obtained from poorly correlated XIC pairs.

Figure 4D shows the combination of MethylQuant outputs observed for an unmethylated methionine-containing PSM: TPAEMSRPATTTR from fatty acid synthase subunit-α. MethylQuant correctly identifies the heavy and light peptide pair associated with this PSM, and shows that this peptide pair has a high Isotope Distribution Correlation (> 0.99). However, the earlier elution of heavy peptide isotopomers relative to their light counterparts results in Elution Profile Correlation outputs < 0.5 for all 3 isotopomer pairs. Together, these MethylQuant outputs define this PSM as Low confidence, and produce a MethylQuant Score of 9.4. This underscores that MethylQuant confidence indicators are not designed to identify heavy and light methionine-containing peptide pairs, although for many methionine-containing PSMs these confidence indicators will nonetheless remain high.


Figure 5. FDRs and sensitivities of methylpeptide spectrum matches identified using MethylQuant’s confidence indicators. Results are from the reference heavy-methyl SILAC datasets of Figure 2 when outputs from instrument-specific datasets are combined (results are shown for biological replicate 1 of the SDS-PAGE samples). Sensitivities refer to the proportion of true positive methylpeptide spectrum matches of q-value ≤ 0.01 identified using each method. (A) Results obtained using MethylQuant Confidence designations (above) and target-decoy approach-based sequence database search filtering using Mascot Ion Score thresholds (below). Pie chart sizes reflect the sensitivities of true positive methylpeptide spectrum match identifications. (B) Results obtained using MethylQuant Scores.
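The FDRs and sensitivities in Figure 5 follow directly from counts of accepted true and false positive methyl-PSMs: FDR is the fraction of accepted methyl-PSMs that are false positives, and sensitivity is the fraction of true positive methyl-PSMs (q-value ≤ 0.01) that are accepted. A minimal sketch, assuming PSMs are represented as hypothetical (score, is_true_positive) tuples and accepted when their MethylQuant Score meets a threshold, as in panel B; the toy scores are invented. The last usage lines reproduce the arithmetic behind the 0.53% FDR quoted in the text for 4 false positives among 750 Very High confidence methyl-PSMs.

```python
def fdr_and_sensitivity(psms, threshold, n_true_reference):
    """FDR and sensitivity of the PSMs accepted at a score threshold.

    psms: iterable of (score, is_true_positive) tuples (hypothetical layout).
    n_true_reference: number of true positive methyl-PSMs with q-value <= 0.01
    in the unfiltered search, i.e. the sensitivity denominator.
    """
    accepted = [is_true for score, is_true in psms if score >= threshold]
    n_true = sum(accepted)  # True counts as 1
    n_false = len(accepted) - n_true
    fdr = n_false / len(accepted) if accepted else 0.0
    return fdr, n_true / n_true_reference


# Toy data (invented) showing the trade-off as the threshold is lowered
toy = [(49.1, True), (44.3, True), (35.0, False), (30.0, True), (9.4, False)]
print(fdr_and_sensitivity(toy, 40.0, 3))  # (0.0, 0.6666666666666666)
print(fdr_and_sensitivity(toy, 25.0, 3))  # (0.25, 1.0)

# Arithmetic check: 746 true + 4 false accepted PSMs give an FDR of 4/750
very_high = [(1.0, True)] * 746 + [(1.0, False)] * 4
fdr, _ = fdr_and_sensitivity(very_high, 1.0, 746)
print(round(100 * fdr, 2))  # 0.53
```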


Figure 5 illustrates, for the present datasets, the FDRs and sensitivities of methyl-PSMs identified using MethylQuant’s confidence indicators when outputs from all instrument types are combined (results are shown for biological replicate 1 of the SDS-PAGE samples; similar results for biological replicate 2 are shown in Figure S6). Figures 5A and S6A also illustrate the FDRs and sensitivities of methyl-PSMs identified using traditional target-decoy approach-based filtering. The data underlying Figures 5A and S6A are shown in Tables S3-S6 for datasets separated according to instrument type (with the exception of data obtained from the immunoaffinity enriched sample, which contained insufficient numbers of PSMs for target-decoy approach-based filtering). These figures and tables reinforce our previous finding that the target-decoy approach is not an effective predictor of the true or false positive status of methyl-PSMs.31 Actual methyl-PSM FDRs substantially exceed the 1% methyl-PSM FDRs estimated using both the global and separate methylpeptide target-decoy approaches. Additionally, datasets filtered using the separate methylpeptide target-decoy approach are severely compromised by low methyl-PSM sensitivities. For the respective HILIC-, SDS-PAGE (biological replicate 1)- and SDS-PAGE (biological replicate 2)-derived datasets, only 6.1(±1.8)%, 9.5(±1.7)% and 13.4(±1.1)% of true positive methyl-PSMs in unfiltered sequence database searches (with q-value ≤ 0.01) survive this methyl-PSM filtering approach. In contrast to these traditional data filtering methods, MethylQuant confidence indicators are capable of identifying true positive methyl-PSMs with both high specificity and high sensitivity. When considering MethylQuant Confidence designations (Figures 5A and S6A), methyl-PSMs designated as Very High confidence are almost always true positives. Of the 750 methyl-PSMs given this designation across the present datasets, only 4 are false positives (an overall FDR of 0.53%).
Moreover, all 4 of these false positive methyl-PSMs stem from a single peptide ion that was subjected to multiple MS/MS events in only one dataset. These very low FDRs are, however, accompanied by sub-optimal sensitivity. For the respective HILIC-, SDS-PAGE (biological replicate 1)- and SDS-PAGE (biological replicate 2)-derived datasets, only 19.2(±6.0)%, 41.0(±4.6)% and 32.7(±4.6)% of true positive methyl-PSMs identified in sequence database searches are designated as Very High confidence, while only 11.1% are designated as such in the dataset derived from the immunoaffinity enriched sample. (The particularly poor sensitivity associated with this dataset stems from its very low methylpeptide ion intensities; these findings are elaborated upon in the Discussion.) However, when methyl-PSMs designated by MethylQuant as High confidence are also considered, methyl-PSM sensitivities increase. While this increase is not large for the HILIC-derived dataset, it is substantial for the two SDS-PAGE-derived datasets: 75.7(±4.1)% and 75.2(±4.0)% of true positive methyl-PSMs identified in sequence database searches of these datasets are designated as High confidence. The remaining false negative methyl-PSMs stem from methylpeptides with long elution profiles that are subjected to multiple MS/MS events, producing a series of redundant methyl-PSMs (data not shown). Almost all of these methylpeptides are designated as High confidence from at least one of their redundant methyl-PSMs. These high methyl-PSM sensitivities are accompanied by low methyl-PSM FDRs. Of the 1,421 methyl-PSMs designated as High confidence across the present datasets, only 25 are false positives (an overall FDR of 1.7%). Figures 5B and S6B illustrate, for the present SDS-PAGE- and HILIC-derived datasets, the FDRs and sensitivities of methyl-PSM identifications obtained using MethylQuant Scores. For both SDS-PAGE-derived datasets, methyl-PSM identification sensitivities of > 70% can be achieved alongside FDRs of < 5%. However, MethylQuant Scores do not perform as well for the HILIC-derived dataset; this is predominantly because the low H/L mixing ratio of the HILIC sample compromises the Elution Profile Correlation outputs (vide supra). Nonetheless, for this dataset methyl-PSM identification sensitivities of > 20% (corresponding to > 110 methyl-PSMs) can still be achieved alongside FDRs of < 5%.
Together, these results indicate that MethylQuant's confidence indicators can validate methyl-PSMs with very high specificity, and that high-sensitivity analyses can be achieved while keeping methyl-PSM FDRs acceptably low. Guidelines for the best practice use of MethylQuant's confidence indicators are outlined in the Discussion.
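The FDR and sensitivity values quoted above are simple ratios of methyl-PSM counts. As a minimal sketch (the function names are hypothetical, and the sensitivity counts in the usage line are invented for illustration), the observed FDR is the fraction of accepted methyl-PSMs that are false positives, and sensitivity is the fraction of all true positive methyl-PSMs from the search that are accepted:

```python
def observed_fdr(false_positives: int, accepted: int) -> float:
    """Fraction of accepted methyl-PSMs that are known false positives."""
    return false_positives / accepted if accepted else 0.0


def sensitivity(true_positives_accepted: int, true_positives_total: int) -> float:
    """Fraction of all true positive methyl-PSMs (q-value <= 0.01) that are accepted."""
    return true_positives_accepted / true_positives_total if true_positives_total else 0.0


# Pooled counts quoted in the text
very_high_fdr = observed_fdr(4, 750)   # Very High confidence: 4 false positives of 750
high_fdr = observed_fdr(25, 1421)      # High confidence: 25 false positives of 1,421

# Hypothetical counts: 41 of 100 true positive methyl-PSMs accepted
example_sensitivity = sensitivity(41, 100)

print(f"Very High FDR: {very_high_fdr:.2%}")  # Very High FDR: 0.53%
print(f"High FDR: {high_fdr:.2%}")
print(f"Sensitivity: {example_sensitivity:.1%}")
```

Note that the true/false positive status used here comes from the reference datasets' known labeling, not from the target-decoy approach.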


Performance of MethylQuant for peptide pair quantification

MethylQuant has been designed to provide accurate relative quantification information for the heavy and light peptide pairs that it identifies. This section evaluates MethylQuant's H/L Ratio #1 and H/L Ratio #2 outputs for this purpose. To benchmark these outputs, MethylQuant was run against a complex peptide dataset prepared from unlabeled and conventionally SILAC labeled (R10K8) proteins mixed in a 1:1 ratio. As these heavy and light proteins were obtained from different cellular conditions, H/L ratios deviating from 1:1 were expected. MethylQuant H/L Ratio outputs were compared to equivalent outputs obtained from MaxQuant and Proteome Discoverer 2.1, which use different methods for calculating H/L ratios. MaxQuant calculates its ratios from three-dimensional (intensity × m/z × retention time) peak areas obtained from peptide features,33 while Proteome Discoverer 2.1 uses a simpler measure, individual peptide ion peak intensities. In contrast, MethylQuant's H/L Ratio #1 uses summed heavy and light isotopomer peak intensities from an averaged mass spectrum, while its H/L Ratio #2 uses summed heavy and light isotopomer XIC areas from well correlated isotopomer pairs, as detailed earlier.
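The difference between the two MethylQuant ratio definitions can be illustrated with a simplified sketch. This is not MethylQuant's implementation: the function names are hypothetical, all XICs are assumed to share one retention-time grid, areas are integrated with the trapezoidal rule, and an isotopomer pair counts as "well correlated" when the Pearson correlation of its heavy and light XICs is at least a placeholder cutoff of 0.5 (MethylQuant's actual criterion may differ).

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either trace is flat."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def xic_area(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Trapezoidal area under an extracted ion chromatogram."""
    return sum(
        (t1 - t0) * (i0 + i1) / 2
        for t0, t1, i0, i1 in zip(times, times[1:], intensities, intensities[1:])
    )


def hl_ratio_1(heavy_peaks: Sequence[float], light_peaks: Sequence[float]) -> Optional[float]:
    """Summed heavy / summed light isotopomer intensities from an averaged spectrum."""
    light = sum(light_peaks)
    return sum(heavy_peaks) / light if light > 0 else None


def hl_ratio_2(times, heavy_xics, light_xics, min_corr: float = 0.5) -> Optional[float]:
    """Summed XIC areas, keeping only isotopomer pairs whose heavy and light
    elution profiles correlate at or above min_corr (placeholder cutoff)."""
    heavy_area = light_area = 0.0
    for heavy, light in zip(heavy_xics, light_xics):
        if pearson(heavy, light) >= min_corr:
            heavy_area += xic_area(times, heavy)
            light_area += xic_area(times, light)
    return heavy_area / light_area if light_area > 0 else None


times = [0.0, 1.0, 2.0, 3.0, 4.0]
light_xics = [[0, 10, 20, 10, 0], [0, 6, 12, 6, 0], [0, 2, 4, 2, 0]]
# The third heavy isotopomer is dominated by a co-eluting interferent
heavy_xics = [[0, 5, 10, 5, 0], [0, 3, 6, 3, 0], [8, 8, 0, 0, 8]]

print(hl_ratio_1([50, 30, 40], [100, 60, 20]))    # inflated by an interfering third peak
print(hl_ratio_2(times, heavy_xics, light_xics))  # 0.5
```

In this toy example the interfered third isotopomer pair is anti-correlated and is therefore excluded from the XIC-based ratio, whereas the spectrum-based ratio has no such safeguard.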


Figure 6. Correlations of MethylQuant-, MaxQuant- and Proteome Discoverer 2.1-derived log2(H/L ratios) obtained from stimulated (R10K8 SILAC labeled) and unstimulated (unlabeled) human primary T lymphocyte samples mixed 1:1. Regression lines and their associated R² values are shown for the data subsets defined in the figure legends. Dotted lines represent theoretical perfect positive correlations. Results are presented following removal of 2 outliers with Proteome Discoverer 2.1 H/L ratios > 30 and 1 outlier with a MethylQuant H/L Ratio #1 > 30. (A) Data from all SILAC pairs commonly identified across the MethylQuant, MaxQuant and Proteome Discoverer 2.1 software platforms, categorized by MethylQuant Confidence. (B) Data from SILAC pairs commonly identified across the MethylQuant, MaxQuant and Proteome Discoverer 2.1 software platforms with Isotope Distribution Correlations < 0.75.


Figure 6 illustrates the results of these benchmarking experiments. Figure 6A displays data for all SILAC pairs commonly identified across the MethylQuant, MaxQuant and Proteome Discoverer 2.1 software platforms. In each scatter plot, SILAC pairs are categorized according to their MethylQuant Confidence. As expected, all H/L ratio outputs obtained across the different software platforms are strongly positively correlated, and strong positive correlations are also observed between the two MethylQuant H/L Ratio outputs. Despite these overall positive correlations, the different methods of measuring H/L ratios nonetheless deviate from one another to some extent. These deviations are most apparent when all SILAC pairs are considered, and decrease when only SILAC pairs with high MethylQuant Confidence are considered. This indicates that the different methods of measuring H/L ratios deviate most substantially when peptide pairs are either subject to extensive interference from co-eluting peptide ions or have very low ion intensities, and are therefore designated by MethylQuant as Low confidence. For well behaved peptide pairs designated by MethylQuant as Very High confidence, all of the different H/L ratio measurements show very strong positive correlations. Interestingly, of the different H/L ratio measurements studied in Figure 6A, MethylQuant's H/L Ratio #1 outputs deviate most substantially from the others. This is likely because this MethylQuant output uses 3 isotopomer peak intensities per peptide ion (6 peaks in total for a peptide pair) to calculate H/L ratios, and is therefore subject to more interference from co-eluting peptide ions than the other H/L ratio measurements. Figure 6B shows scatter plots that test this hypothesis. The data points in Figure 6B are from SILAC pairs with poorly correlated isotope distributions (MethylQuant Isotope Distribution Correlations < 0.75); co-eluting peptide ions are therefore likely to have interfered with the majority of these SILAC pairs.
While all of the different H/L ratio outputs remain positively correlated in Figure 6B, MethylQuant's two H/L ratio measurements deviate from one another substantially more than they do for the data obtained from all SILAC pairs. It can also be seen that MethylQuant's H/L Ratio #2 outputs remain strongly positively correlated with both the MaxQuant and Proteome Discoverer 2.1 H/L ratios, whereas MethylQuant's H/L Ratio #1 outputs are only weakly positively correlated with them. This indicates that interference from co-eluting peptide ions has a larger impact on MethylQuant's H/L Ratio #1 than on the other H/L ratio measurements. In addition, MaxQuant and Proteome Discoverer 2.1 H/L ratios are more strongly correlated with each other than either is with MethylQuant's H/L Ratio #2 outputs. This indicates that co-eluting peptide ions affect MethylQuant's H/L Ratio #2 outputs differently from the MaxQuant and Proteome Discoverer 2.1 H/L ratios. As MethylQuant's H/L Ratio #2 outputs are calculated after removal of XIC peak areas that show evidence of interference from co-eluting peptide ions, it is possible that such interference reduces the accuracy of both the MaxQuant and Proteome Discoverer 2.1 H/L ratios, but not that of MethylQuant's H/L Ratio #2 outputs. Taken together, the above results indicate that, for broad-scale relative quantification of peptide pairs, both of MethylQuant's H/L Ratio outputs correlate strongly with those obtained from MaxQuant and Proteome Discoverer 2.1. The very strong H/L ratio correlations observed for peptide pairs designated by MethylQuant as Very High confidence suggest that, for these particular peptide pairs, all 3 software platforms produce very accurate quantification data. For peptide pairs that show evidence of interference from co-eluting peptide ions (i.e. relatively low Isotope Distribution Correlation outputs), decreased accuracy can be expected from MethylQuant's H/L Ratio #1 outputs, whereas MethylQuant's H/L Ratio #2 outputs can be expected to remain particularly robust. The best practice use of MethylQuant's two H/L Ratio outputs is elaborated upon in the Discussion.
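The R² values of the kind reported in Figure 6 come from regressing one platform's log2(H/L ratios) on another's. A minimal sketch follows, assuming an ordinary least-squares line with an intercept (for which R² equals the squared Pearson correlation); the data rows and function names are hypothetical and this is not the plotting code used for the figure. The optional `max_idc` argument mimics the Figure 6B subset of pairs with Isotope Distribution Correlations < 0.75.

```python
import math
from typing import List, Optional, Tuple

# Each row: (H/L ratio from platform A, H/L ratio from platform B,
#            MethylQuant Isotope Distribution Correlation). Values are hypothetical.
Row = Tuple[float, float, float]


def r_squared(x: List[float], y: List[float]) -> float:
    """R^2 of an OLS line fitted to (x, y); equals the squared Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def log2_ratio_r_squared(rows: List[Row], max_idc: Optional[float] = None) -> Tuple[float, int]:
    """R^2 between log2 ratios, optionally restricted to rows with IDC < max_idc."""
    subset = [r for r in rows if max_idc is None or r[2] < max_idc]
    x = [math.log2(a) for a, _, _ in subset]
    y = [math.log2(b) for _, b, _ in subset]
    return r_squared(x, y), len(subset)


rows = [
    (1.0, 1.0, 0.99), (2.0, 2.0, 0.98), (4.0, 4.0, 0.95), (0.5, 0.5, 0.97),
    (2.0, 0.5, 0.60), (0.5, 2.0, 0.70), (4.0, 1.0, 0.50),
]
print(log2_ratio_r_squared(rows))                # all pairs
print(log2_ratio_r_squared(rows, max_idc=0.75))  # interference-prone subset only
```

Working in log2 space makes reciprocal ratios (e.g. 2:1 and 1:2) symmetric about zero, which is why the figure plots log2(H/L ratios).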

Discussion

To date, the utility of heavy-methyl SILAC and its offshoots for validating methylpeptide identifications has been constrained by a lack of software. This has limited the use of these techniques in large-scale characterizations of enzyme-mediated methylation.31 The results presented here illustrate that MethylQuant is capable of filling this software gap. MethylQuant can specifically and sensitively identify individual heavy and light peptide pairs in samples in which not all PSMs exist in such pairs. In this way it is capable of classifying true and false positive methyl-PSMs identified from heavy-methyl SILAC (or related) datasets. MethylQuant thus offers an automated means of validating methylpeptide identifications in large-scale LC-MS/MS experiments, while also providing accurate relative quantification information for identified methyl-SILAC pairs. The results presented in this study indicate that MethylQuant is tolerant of high sample complexity and a lack of methylpeptide enrichment. Nonetheless, for any given MethylQuant analysis, the specificity, sensitivity and accuracy of methylpeptide identification and quantification will be affected by sample preparation and experimental design. This is elaborated upon below in guidelines for the best practice use of MethylQuant.

Best practice use of MethylQuant for methylpeptide validation

To maximize the specificity and sensitivity of methylpeptide identifications when using MethylQuant, carefully considered sample preparation can be of benefit. Regarding specificity, it is useful to note that all false positive methyl-SILAC pairs identified by MethylQuant stem from the misidentification of unrelated peptide ions as co-eluting heavy and light peptide pairs. The rates of such misidentifications are directly related to sample complexity (i.e. peptide feature density in LC-MS/MS data). The HILIC- and SDS-PAGE-derived datasets used in the present MethylQuant evaluations are of high complexity, as described earlier. It is therefore likely that methylpeptide FDRs even lower than those described in Figures 5 and S6 can be obtained if, when preparing samples for heavy-methyl SILAC analysis, overall sample complexity can be decreased while maintaining or increasing relative methylpeptide abundances. For example, there were no false positive methyl-PSMs in the MethylQuant searches of the data collected from the immunoaffinity enriched sample studied here. Thus, although MethylQuant can be expected to operate well for high complexity datasets, the use of MethylQuant in conjunction with the growing list of available methylpeptide enrichment strategies15, 19, 23, 30, 39-41 is highly recommended. When considering the sensitivity of MethylQuant analysis (i.e. the proportion of true positive methyl-PSMs identified from sequence database searches that are validated by MethylQuant), our evaluations show that peptide ion intensities are particularly important. For this reason we recommend H/L mixing ratios between 3:1 and 1:3, because MethylQuant's ability to detect peptide pairs becomes compromised when the intensities of the isotopomers associated with these pairs fall near or below mass analyzer detection limits. The likelihood of this scenario for any given methyl-SILAC pair is increased when heavily uneven H/L ratios are used, such as the ~1:12 H/L ratio of the present HILIC-derived dataset. We do, however, note that 1:1 H/L mixing ratios do not appear to offer advantages in sensitivity over slightly uneven H/L mixing ratios. For example, in the conventional SILAC dataset studied here, 31.4% and 77.0% of PSMs were respectively designated by MethylQuant to be Very High and High confidence members of peptide pairs (whereas every PSM in this dataset should theoretically be associated with a heavy and light peptide pair). These identification sensitivities for predominantly 1:1 SILAC pairs are comparable to the identification sensitivities for the ~1:3 methyl-SILAC pairs obtained from the SDS-PAGE-derived datasets evaluated here (shown in Figures 5A and S6A). This suggests that slightly uneven H/L mixing ratios, which may be used to increase overall peptide identification rates in metabolic labeling experiments,42 will not compromise the sensitivity of MethylQuant searches. In addition to sample preparation, consideration must also be given to sequence database search methodologies and to the lists of methyl-PSMs supplied to MethylQuant searches. This is because MethylQuant's classification of true and false positive heavy and light peptide pairs is independent of sequence database searching. Thus, although very poorly filtered sequence database search results do not diminish MethylQuant's capacity to identify genuine heavy and light peptide pairs, we find that the pairs identified from such data are frequently associated with incorrectly sequenced PSMs or methyl-PSMs with incorrect methylation site localizations (data not shown).
To account for this, when attempting to validate methyl-PSMs using MethylQuant we recommend that sequence database searches are initially filtered to estimated 1% FDRs based on the global target-decoy approach. This brings the number of falsely sequenced methylpeptides down to levels that are appropriate for MethylQuant searches, without the drastic reduction in overall methyl-PSM sensitivity that occurs when applying the separate methylpeptide target-decoy approach.31 Following MethylQuant searching, we suggest that MethylQuant's two confidence indicators, MethylQuant Confidence and MethylQuant Score, can be used for complementary purposes. To identify validated methyl-PSMs we recommend using MethylQuant Confidence designations, and filtering datasets to include only Very High confidence PSMs. When filtering datasets in this manner, false positive methyl-PSMs will be extremely rare when analyzing highly complex datasets and negligible when analyzing less complex datasets with low peptide feature densities. To obtain larger pools of methyl-PSMs that are predominantly true positives, we recommend also including PSMs of High MethylQuant Confidence. When filtering datasets in this manner, low methyl-PSM FDRs can be expected when analyzing highly complex datasets (~2%), and even lower FDRs can be expected for less complex datasets (exact FDRs will depend on peptide feature densities). Moreover, we suggest that methyl-PSMs identified by MethylQuant as High confidence across multiple biological replicates can be considered validated. This suggestion is informed by our evaluations of the present biological replicate SDS-PAGE-derived datasets, which indicate that peptide ions that are mistaken for heavy and light peptide pairs do not co-elute with perfect consistency across biological replicate LC-MS/MS experiments. For more flexible analyses in which the tolerated methyl-PSM FDR may be chosen, we recommend using the MethylQuant Score. As MethylQuant Scores are independent of sequence database search scores, MethylQuant Score performance can be accurately estimated for individual datasets using the target-decoy approach. For such analyses we recommend identifying statistically significant target and decoy methyl-PSMs using sequence database search utility metrics (such as the Mascot Expect value), and running these PSMs through MethylQuant. Methyl-PSM FDRs may then be estimated using the target-decoy approach for different MethylQuant Score thresholds. The logit model underlying the MethylQuant Score was derived from a highly complex dataset, as mentioned earlier.
It can therefore be expected that when the MethylQuant Score is applied to less complex datasets with otherwise equivalent methylpeptide abundances, produced using the H/L mixing ratios suggested earlier, higher methyl-PSM sensitivities than those shown in Figures 5B and S6B will be observed. For samples that produce very low methylpeptide ion intensities, such as the particular immunoaffinity enriched sample studied here, MethylQuant's confidence indicators may not operate with high sensitivity. However, in these instances MethylQuant may still be used to aid manual data interpretation. For example, for the present immunoaffinity enriched sample, MethylQuant identified evidence for peptide pairs (i.e. H/L Ratio #1 or H/L Ratio #2 outputs > 0) for every true positive methylpeptide, with no false positives. We do, however, recommend that if MethylQuant's confidence indicators are not used, putative methylpeptides should be treated with caution and independently validated using orthogonal techniques. Finally, as MethylQuant classifies true and false positive heavy and light peptide pairs independently of sequence database searching, it cannot provide information on the accuracy of methylation site localizations. When MethylQuant returns confident matches for methyl-PSMs with multiple possible methylated residues (for example, multiple lysine or arginine residues), we suggest that site localizations be evaluated further (e.g. using Andromeda's modification localization probabilities43).
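The recommended procedure for choosing a MethylQuant Score threshold (pass statistically significant target and decoy methyl-PSMs through MethylQuant, then estimate FDRs at different score thresholds) can be sketched as follows. Assumptions, none of which come from MethylQuant itself: higher MethylQuant Scores indicate greater confidence, the FDR is estimated as decoys divided by targets at or above each threshold (one common target-decoy estimator; other variants exist), the function names are hypothetical, and the scores are made up.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple

# Each PSM: (MethylQuant Score, is_decoy). Assumes higher scores mean higher confidence.
Psm = Tuple[float, bool]


def fdr_by_threshold(psms: List[Psm], thresholds: Iterable[float]) -> Dict[float, float]:
    """Estimated FDR (decoys / targets) among PSMs scoring at or above each threshold."""
    estimates = {}
    for t in thresholds:
        targets = sum(1 for score, decoy in psms if score >= t and not decoy)
        decoys = sum(1 for score, decoy in psms if score >= t and decoy)
        # Undefined (NaN) when no targets pass; NaN never satisfies a <= comparison
        estimates[t] = decoys / targets if targets else math.nan
    return estimates


def loosest_threshold(psms: List[Psm], thresholds: Iterable[float],
                      max_fdr: float) -> Optional[float]:
    """Lowest threshold whose estimated FDR does not exceed max_fdr, or None."""
    estimates = fdr_by_threshold(psms, thresholds)
    passing = [t for t in sorted(estimates) if estimates[t] <= max_fdr]
    return passing[0] if passing else None


psms = [(95, False), (90, False), (85, False), (80, True),
        (70, False), (60, True), (50, False), (40, True)]
print(fdr_by_threshold(psms, [50, 75, 90]))
print(loosest_threshold(psms, [50, 75, 90], max_fdr=0.05))  # 90
```

With real data many more PSMs are needed for stable estimates; this toy list only shows how the estimated FDR rises as the threshold is relaxed.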

Best practice use of MethylQuant for methylpeptide quantification

In addition to providing a means of confidently identifying methylpeptides, MethylQuant can facilitate their relative quantification in metabolic labeling experiments conducted across different conditions or sample types.23 Guidelines for such studies, which require careful experimental design, are presented below. To perform studies of this nature, we suggest that methylpeptides should first be validated by MethylQuant from heavy-methyl SILAC (or related) experiments conducted on reference samples, using the guidelines presented above. Conventional SILAC experiments may then be performed to compare the reference condition to the test condition. Methyl-PSMs identified in these experiments can be quantified from conventional SILAC pairs using MethylQuant's H/L Ratio #2 (if this output is available; see below), while relative protein abundances may be measured using any other quantitative proteomics software platform. (Other software platforms may also be used to quantify the conventional SILAC pairs for individual methyl-PSMs; however, they may not be as sensitive as MethylQuant for this purpose.) In this manner, changes in methylpeptide abundances that are attributable to equivalent changes in parent protein abundances may be identified, while genuine changes in methylation site occupancy can be quantified. To improve the sensitivity of these SILAC experiments for methylpeptide quantification, LC-MS/MS inclusion lists designed to target the methylpeptides identified in the preceding heavy-methyl SILAC experiments are likely to be of benefit. In addition, methyl-PSMs that are confidently validated in heavy-methyl SILAC experiments need not be designated by MethylQuant as Very High or High confidence in follow-up SILAC experiments, as any decreases in methylpeptide abundance in the test condition may negatively impact MethylQuant's confidence indicators. Such decreases may also make MethylQuant's H/L Ratio #2 output unavailable. In these instances MethylQuant's more sensitive but generally less accurate quantification measure, the H/L Ratio #1 output, may be used. Finally, to assign statistical significance to quantitative changes in methylpeptide abundance, we suggest that biological replicate and reciprocal labeling experiments be performed following established procedures.44
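One simple way to separate protein-driven changes from genuine changes in site occupancy is to subtract the parent protein's log2(H/L) from the methylpeptide's log2(H/L). The sketch below assumes heavy = test condition and light = reference condition, treats a missing or zero ratio as unavailable, and prefers H/L Ratio #2 with a fallback to H/L Ratio #1, as described above. The normalization arithmetic and function names are our illustration, not a procedure specified by MethylQuant, and no statistics are computed here.

```python
import math
from typing import Optional, Tuple


def methylpeptide_log2_ratio(ratio_2: Optional[float],
                             ratio_1: Optional[float]) -> Tuple[Optional[float], str]:
    """Prefer H/L Ratio #2; fall back to the less accurate H/L Ratio #1."""
    if ratio_2 is not None and ratio_2 > 0:
        return math.log2(ratio_2), "H/L Ratio #2"
    if ratio_1 is not None and ratio_1 > 0:
        return math.log2(ratio_1), "H/L Ratio #1"
    return None, "unavailable"


def occupancy_log2_change(ratio_2: Optional[float], ratio_1: Optional[float],
                          protein_ratio: float) -> Tuple[Optional[float], str]:
    """Methylpeptide log2(H/L) minus parent protein log2(H/L).

    Near 0: the methylpeptide change tracks its parent protein abundance.
    Far from 0: a candidate change in methylation site occupancy.
    """
    peptide, source = methylpeptide_log2_ratio(ratio_2, ratio_1)
    if peptide is None:
        return None, source
    return peptide - math.log2(protein_ratio), source


print(occupancy_log2_change(None, 2.0, protein_ratio=2.0))  # (0.0, 'H/L Ratio #1')
print(occupancy_log2_change(4.0, 3.0, protein_ratio=1.0))   # (2.0, 'H/L Ratio #2')
```

The first call shows a methylpeptide whose twofold increase is fully explained by its parent protein. The second shows a fourfold increase with an unchanged protein, i.e. a candidate occupancy change.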

Conclusions

Increasing attention is being devoted to quality control in the field of proteomics.45, 46 It has been demonstrated that particularly stringent quality control measures are often required for PTMs identified in sequence database searches,47 and, for methylpeptides in particular, we have recently shown that orthogonal validation of search outputs should generally be considered a prerequisite for obtaining high confidence data.31 By eliminating bottlenecks associated with the analysis of heavy-methyl SILAC and related data, MethylQuant offers a means of conducting routine orthogonal methylpeptide validation in broad-scale proteomics experiments. Although MethylQuant is specifically tailored to the analysis of methyl-SILAC pairs, it can be applied to co-eluting light and heavy isotope labeled peptide pairs of any type, including those derived from other PTM-specific labeling workflows.48 MethylQuant's confidence indicators therefore offer a universally applicable means of reporting on the quality of heavy and light peptide pair co-elution. Moreover, MethylQuant may be used as a quality filter for any metabolic labeling-based quantitative proteomics experiment; by removing low quality peptide pairs in such experiments, MethylQuant offers a straightforward means by which overall protein quantification accuracy may be improved. Together, MethylQuant's in-depth capabilities for individual heavy and light peptide pair analysis fill a noteworthy gap in proteomics software, and clear a path toward routine high accuracy characterizations of the methylproteome.
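Using MethylQuant as a quality filter in a general metabolic labeling experiment could look like the sketch below: peptide pairs whose MethylQuant Confidence label is not in an accepted set are dropped before protein-level log2(H/L) values are summarized. The confidence labels are those named in the text; the median summary, the accepted set, the function name and the data are our assumptions for illustration.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

# Each peptide pair: (protein accession, H/L ratio, MethylQuant Confidence label).
Pair = Tuple[str, float, str]


def protein_log2_ratios(pairs: Iterable[Pair],
                        keep=("Very High", "High")) -> Dict[str, float]:
    """Median log2(H/L) per protein, using only pairs whose Confidence label is in `keep`."""
    by_protein = defaultdict(list)
    for protein, ratio, confidence in pairs:
        if confidence in keep and ratio > 0:
            by_protein[protein].append(math.log2(ratio))
    return {protein: median(values) for protein, values in by_protein.items()}


pairs = [
    ("P1", 2.0, "Very High"),
    ("P1", 16.0, "Low"),  # e.g. a pair distorted by a co-eluting interferent
    ("P1", 16.0, "Low"),
    ("P2", 0.5, "High"),
]
print(protein_log2_ratios(pairs))                                     # {'P1': 1.0, 'P2': -1.0}
print(protein_log2_ratios(pairs, keep=("Very High", "High", "Low")))  # {'P1': 4.0, 'P2': -1.0}
```

In this contrived example the unfiltered summary for P1 is dominated by two low-quality pairs, while the filtered summary reflects the single well-behaved pair.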

Acknowledgements

G.H.-S. and M.R.W. thank the Australian Research Council (ARC) for their financial support (ARC DE150100019 and DP130100349). G.H.-S. also acknowledges funding from the UNSW School of Biotechnology and Biomolecular Sciences. A.P.T. acknowledges the support of an Australian Postgraduate Award. The authors thank Dr. Ling Zhong, Ms. Sydney Liu Lau and A/Prof. Mark Raftery for their maintenance of the orbitrap mass spectrometers housed at the UNSW Bioanalytical Mass Spectrometry Facility, and Mr. Daniel L. Winter for testing and providing feedback on MethylQuant.

Supporting Information

The following files are available free of charge on the ACS website http://pubs.acs.org/: Supporting Information.docx. Graphical summaries and statistical analyses of reference heavy-methyl SILAC datasets separated according to instrument type; MethylQuant H/L Ratio #2 outputs for reference heavy-methyl SILAC datasets; results of target-decoy approach-based sequence database search filtering for reference heavy-methyl SILAC datasets separated according to instrument type; details of logit models.


References

(1) Paik, W. K.; Paik, D. C.; Kim, S., Historical review: the field of protein methylation. Trends in Biochemical Sciences 2007, 32 (3), 146-152.

(2) Bedford, M. T.; Clarke, S., Protein arginine methylation in mammals: who, what, and why. Molecular Cell 2009, 33 (1), 1-13.

(3) Huang, J.; Berger, S. L., The emerging field of dynamic lysine methylation of non-histone proteins. Current Opinion in Genetics & Development 2008, 18 (2), 152-158.

(4) Lee, Y. H.; Stallcup, M. R., Minireview: protein arginine methylation of nonhistone proteins in transcriptional regulation. Molecular Endocrinology 2009, 23 (4), 425-433.

(5) Low, J. K. K.; Wilkins, M. R., Protein arginine methylation in Saccharomyces cerevisiae. The FEBS Journal 2012, 279 (24), 4423-4443.

(6) Erce, M. A.; Pang, C. N. I.; Hart-Smith, G.; Wilkins, M. R., The methylproteome and the intracellular methylation network. Proteomics 2012, 12 (4-5), 564-586.

(7) Webb, K. J.; Zurita-Lopez, C. I.; Al-Hadid, Q.; Laganowsky, A.; Young, B. D.; Lipson, R. S.; Souda, P.; Faull, K. F.; Whitelegge, J. P.; Clarke, S. G., A novel 3-methylhistidine modification of yeast ribosomal protein Rpl3 is dependent upon the YIL110W methyltransferase. Journal of Biological Chemistry 2010, 285 (48), 37598-37606.

(8) Heurgue-Hamard, V.; Champ, S.; Mora, L.; Merkoulova-Rainon, T.; Kisselev, L. L.; Buckingham, R. H., The glutamine residue of the conserved GGQ motif in Saccharomyces cerevisiae release factor eRF1 is methylated by the product of the YDR140w gene. Journal of Biological Chemistry 2005, 280 (4), 2439-2445.

(9) Mattheakis, L.; Shen, W.; Collier, R. J., DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae. Molecular and Cellular Biology 1992, 12 (9), 4026-4037.

(10) Marr, R.; Blair, L.; Thorner, J., Saccharomyces cerevisiae STE14 gene is required for COOH-terminal methylation of a-factor mating pheromone. Journal of Biological Chemistry 1990, 265 (33), 20057-20060.

(11) Young, B. D.; Weiss, D. I.; Zurita-Lopez, C. I.; Webb, K. J.; Clarke, S. G.; McBride, A. E., Identification of methylated proteins in the yeast small ribosomal subunit: a role for SPOUT methyltransferases in protein arginine methylation. Biochemistry 2012, 51 (25), 5091-5104.

(12) Webb, K. J.; Lipson, R. S.; Al-Hadid, Q.; Whitelegge, J. P.; Clarke, S. G., Identification of protein N-terminal methyltransferases in yeast and humans. Biochemistry 2010, 49 (25), 5225-5235.

(13) Wu, J.; Tolstykh, T.; Lee, J.; Boyd, K.; Stock, J. B.; Broach, J. R., Carboxyl methylation of the phosphoprotein phosphatase 2A catalytic subunit promotes its functional association with regulatory subunits in vivo. The EMBO Journal 2000, 19 (21), 5672-5681.

(14) Hamey, J. J.; Winter, D. L.; Yagoub, D.; Overall, C. M.; Hart-Smith, G.; Wilkins, M. R., Novel N-terminal and lysine methyltransferases that target translation elongation factor 1A in yeast and human. Molecular & Cellular Proteomics 2016, 15 (1), 164-176.

(15) Cao, X.-J.; Arnaudo, A. M.; Garcia, B. A., Large-scale global identification of protein lysine methylation in vivo. Epigenetics 2013, 8 (5), 477-485.

(16) Fisk, J. C.; Li, J.; Wang, H.; Aletta, J. M.; Qu, J.; Read, L. K., Proteomic analysis reveals diverse classes of arginine methylproteins in mitochondria of trypanosomes. Molecular & Cellular Proteomics 2013, 12 (2), 302-311.

(17) Uhlmann, T.; Geoghegan, V. L.; Thomas, B.; Ridlova, G.; Trudgian, D. C.; Acuto, O., A method for large-scale identification of protein arginine methylation. Molecular & Cellular Proteomics 2012, 11 (11), 1489-1499.

(18) Bremang, M.; Cuomo, A.; Agresta, A. M.; Stugiewicz, M.; Spadotto, V.; Bonaldi, T., Mass spectrometry-based identification and characterisation of lysine and arginine methylation in the human proteome. Molecular BioSystems 2013, 9 (9), 2231-2247.


(19) Guo, A.; Gu, H.; Zhou, J.; Mulhern, D.; Wang, Y.; Lee, K. A.; Yang, V.; Aguiar, M.; Kornhauser, J.; Jia, X., Immunoaffinity enrichment and mass spectrometry analysis of protein methylation. Molecular & Cellular Proteomics 2014, 13 (1), 372-387.

(20) Lott, K.; Li, J.; Fisk, J. C.; Wang, H.; Aletta, J. M.; Qu, J.; Read, L. K., Global proteomic analysis in trypanosomes reveals unique proteins and conserved cellular processes impacted by arginine methylation. Journal of Proteomics 2013, 91, 210-225.

(21) Wang, K.; Zhou, Y. J.; Liu, H.; Cheng, K.; Mao, J.; Wang, F.; Liu, W.; Ye, M.; Zhao, Z. K.; Zou, H., Proteomic analysis of protein methylation in the yeast Saccharomyces cerevisiae. Journal of Proteomics 2015, 114, 226-233.

(22) Alban, C.; Tardif, M.; Mininno, M.; Brugière, S.; Gilgen, A.; Ma, S.; Mazzoleni, M.; Gigarel, O.; Martin-Laffon, J.; Ferro, M., Uncovering the protein lysine and arginine methylation network in Arabidopsis chloroplasts. PLoS One 2014, 9 (4), e95512.

(23) Geoghegan, V.; Guo, A.; Trudgian, D.; Thomas, B.; Acuto, O., Comprehensive identification of arginine methylation in primary T cells reveals regulatory roles in cell signalling. Nature Communications 2015, 6, 6758.

(24) Plank, M.; Fischer, R.; Geoghegan, V.; Charles, P. D.; Konietzny, R.; Acuto, O.; Pears, C.; Schofield, C. J.; Kessler, B. M., Expanding the yeast protein arginine methylome. Proteomics 2015, 15 (18), 3232-3243.

(25) Sylvestersen, K. B.; Horn, H.; Jungmichel, S.; Jensen, L. J.; Nielsen, M. L., Proteomic analysis of arginine methylation sites in human cells reveals dynamic regulation during transcriptional arrest. Molecular & Cellular Proteomics 2014, 13 (8), 2072-2088.

(26) Yagoub, D.; Hart-Smith, G.; Moecking, J.; Erce, M. A.; Wilkins, M. R., Yeast proteins Gar1p, Nop1p, Npl3p, Nsr1p and Rps2p are natively methylated and are substrates of the arginine methyltransferase Hmt1p. Proteomics 2015, 15 (18), 3209-3218.


(27) Shishkova, E.; Zeng, H.; Liu, F.; Kwiecien, N. W.; Hebert, A. S.; Coon, J. J.; Xu, W., Global mapping of CARM1 substrates defines enzyme specificity and substrate recognition. Nature Communications 2017, 8, 15571.

(28) Caslavka Zempel, K. E.; Vashisht, A. A.; Barshop, W. D.; Wohlschlegel, J. A.; Clarke, S. G., Determining the Mitochondrial Methyl Proteome in Saccharomyces cerevisiae using Heavy Methyl SILAC. Journal of Proteome Research 2016, 15 (12), 4436-4451.

(29) Larsen, S. C.; Sylvestersen, K. B.; Mund, A.; Lyon, D.; Mullari, M.; Madsen, M. V.; Daniel, J. A.; Jensen, L. J.; Nielsen, M. L., Proteome-wide analysis of arginine monomethylation reveals widespread occurrence in human cells. Science Signaling 2016, 9 (443), rs9.

(30) Carlson, S. M.; Moore, K. E.; Green, E. M.; Martín, G. M.; Gozani, O., Proteome-wide enrichment of proteins modified by lysine methylation. Nature Protocols 2014, 9 (1), 37-50.

(31) Hart-Smith, G.; Yagoub, D.; Tay, A. P.; Pickford, R.; Wilkins, M. R., Large Scale Mass Spectrometry-based Identifications of Enzyme-mediated Protein Methylation Are Subject to High False Discovery Rates. Molecular & Cellular Proteomics 2016, 15 (3), 989-1006.

(32) Ong, S. E.; Mittler, G.; Mann, M., Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nature Methods 2004, 1 (2), 119-126.

(33) Cox, J.; Mann, M., MaxQuant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology 2008, 26 (12), 1367-1372.

(34) Mitchell, C. J.; Kim, M.-S.; Na, C. H.; Pandey, A., PyQuant: a versatile framework for analysis of quantitative mass spectrometry data. Molecular & Cellular Proteomics 2016, mcp.O115.056879.

(35) Hart-Smith, G.; Raftery, M. J., Detection and characterization of low abundance glycopeptides via higher-energy C-trap dissociation and Orbitrap mass analysis. Journal of the American Society for Mass Spectrometry 2012, 23 (1), 124-140.


(36) Käll, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss, M. J., Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nature Methods 2007, 4 (11), 923-925.

(37) Hart-Smith, G.; Low, J. K.; Erce, M. A.; Wilkins, M. R., Enhanced methylarginine characterization by post-translational modification-specific targeted data acquisition and electron-transfer dissociation mass spectrometry. Journal of the American Society for Mass Spectrometry 2012, 23 (8), 1376-1389.

(38) Jones, A. R.; Eisenacher, M.; Mayer, G.; Kohlbacher, O.; Siepen, J.; Hubbard, S. J.; Selley, J. N.; Searle, B. C.; Shofstahl, J.; Seymour, S. L., The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 2012, 11 (7), M111.014381.

(39) Olsen, J. B.; Cao, X.-J.; Han, B.; Chen, L. H.; Horvath, A.; Richardson, T. I.; Campbell, R. M.; Garcia, B. A.; Nguyen, H., Quantitative profiling of the activity of protein lysine methyltransferase SMYD2 using SILAC-based proteomics. Molecular & Cellular Proteomics 2016, 15 (3), 892-905.

(40) Wu, Z.; Cheng, Z.; Sun, M.; Wan, X.; Liu, P.; He, T.; Tan, M.; Zhao, Y., A Chemical Proteomics Approach for Global Analysis of Lysine Monomethylome Profiling. Molecular & Cellular Proteomics 2015, 14 (2), 329-339.

(41) Wang, K.; Dong, M.; Mao, J.; Wang, Y.; Jin, Y.; Ye, M.; Zou, H., Antibody-Free Approach for the Global Analysis of Protein Methylation. Analytical Chemistry 2016, 88 (23), 11319-11327.

(42) Arsova, B.; Zauber, H.; Schulze, W. X., Precision, proteome coverage, and dynamic range of Arabidopsis proteome profiling using ¹⁵N metabolic labeling and label-free approaches. Molecular & Cellular Proteomics 2012, 11 (9), 619-628.

(43) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M., Andromeda: a peptide search engine integrated into the MaxQuant environment. Journal of Proteome Research 2011, 10 (4), 1794-1805.


(44) Ting, L.; Cowley, M. J.; Hoon, S. L.; Guilhaus, M.; Raftery, M. J.; Cavicchioli, R., Normalization and statistical analysis of quantitative proteomics data generated by metabolic labeling. Molecular & Cellular Proteomics 2009, 8 (10), 2227-2242.

(45) Ezkurdia, I.; Vázquez, J.; Valencia, A.; Tress, M., Analyzing the first drafts of the human proteome. Journal of Proteome Research 2014, 13 (8), 3854-3855.

(46) Bittremieux, W.; Walzer, M.; Tenzer, S.; Zhu, W.; Salek, R. M.; Eisenacher, M.; Tabb, D. L., The Human Proteome Organization–Proteomics Standards Initiative Quality Control Working Group: Making Quality Control More Accessible for Biological Mass Spectrometry. Analytical Chemistry 2017, 89 (8), 4474-4479.

(47) Fu, Y.; Qian, X., Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry. Molecular & Cellular Proteomics 2014, 13 (5), 1359-1368.

(48) Molden, R. C.; Goya, J.; Khan, Z.; Garcia, B. A., Stable isotope labeling of phosphoproteins for large-scale phosphorylation rate determination. Molecular & Cellular Proteomics 2014, 13 (4), 1106-1118.


Tables

Table 1. Probability values at which MethylQuant outputs predict the true or false positive status of methylpeptide spectrum matches (predictors significant at p < 0.01 are labeled **). Data are from logistic regression analyses of MethylQuant outputs from the reference heavy-methyl SILAC datasets of Figure 2, with the peptide spectrum matches from the instrument-specific datasets combined; the dependent variable was the true or false positive status of each methylpeptide spectrum match. In Analysis 1, peptide pair detection (Y/N) was a categorical predictor variable. In Analysis 2, the listed MethylQuant outputs were predictor variables, and only peptide spectrum matches detected with putative peptide pairs were analyzed.

| Dataset | Analysis 1: Peptide pair detection | Analysis 2: Isotope Distribution Correlation | Analysis 2: H/L Ratio #1 | Analysis 2: H/L Ratio #2 | Analysis 2: Number of Elution Profile Correlations > 0.5 |
|---|---|---|---|---|---|
| HILIC samples | ** (p = 3.4E-68) | ** (p = 0.0061) | n.s. (p = 0.498) | ** (p = 9.1E-4) | ** (p = 9.8E-7) |
| SDS-PAGE samples (replicate 1) | ** (p = 8.3E-197) | ** (p = 7.6E-5) | ** (p = 2.0E-6) | ** (p = 0.014) | ** (p = 2.9E-23) |
| SDS-PAGE samples (replicate 2) | ** (p = 1.4E-80) | ** (p = 1.5E-26) | ** (p = 6.7E-9) | ** (p = 2.1E-4) | ** (p = 1.8E-10) |
| Immunoaffinity enriched sample | n/a | n/a | n/a | n/a | n/a |
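Table 1 reports, for each predictor, a p-value from a logistic regression of true/false positive status on MethylQuant outputs. The following is a minimal sketch of that kind of analysis, not MethylQuant's code or the authors' analysis script. It assumes a maximum-likelihood logistic regression fitted by Newton-Raphson with two-sided Wald p-values, because the test statistic behind the published p-values is not stated here. The names `fit_logistic` and `significance_label` are hypothetical, and the toy data are invented. A binary predictor such as peptide pair detection (Analysis 1) enters as a 0/1 column, and continuous outputs such as an isotope distribution correlation (Analysis 2) would enter as further columns.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: logistic regression with Wald p-values (not MethylQuant's code).


def _sigmoid(eta: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small dense system a @ x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(rows, y, beta):
    # Gradient of the log-likelihood and the Fisher information matrix X^T W X.
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(rows, y):
        p = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (yi - p) * xi[i]
            for j in range(k):
                info[i][j] += w * xi[i] * xi[j]
    return grad, info


def fit_logistic(x: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 100, tol: float = 1e-10
                 ) -> Tuple[List[float], List[float], List[float]]:
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (coefficients, standard errors, two-sided Wald p-values);
    index 0 is the intercept, index i >= 1 is predictor column i - 1.
    """
    rows = [[1.0] + [float(v) for v in xi] for xi in x]  # prepend intercept column
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(rows, y, beta)
        step = _solve(info, grad)  # Newton step: information^-1 @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(rows, y, beta)  # information at the final estimate
    k = len(beta)
    # Diagonal of the inverse information matrix gives the coefficient variances.
    var = [_solve(info, [1.0 if j == i else 0.0 for j in range(k)])[i] for i in range(k)]
    se = [math.sqrt(v) for v in var]
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


def significance_label(p: float, alpha: float = 0.01) -> str:
    # Mirrors the table's convention: "**" below the threshold, otherwise "n.s.".
    return "**" if p < alpha else "n.s."


if __name__ == "__main__":
    # Invented toy data: column 0 = putative peptide pair detected (1/0);
    # y = 1 for a true positive methylpeptide spectrum match.
    x = [[1]] * 40 + [[0]] * 40
    y = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30
    coef, se, p = fit_logistic(x, y)
    print(f"slope={coef[1]:.4f} se={se[1]:.4f} p={p[1]:.2e} {significance_label(p[1])}")
```

With a single 0/1 predictor, the fitted slope equals the log odds ratio of the 2×2 table of predictor against true/false positive status, which makes the fit easy to check by hand. The sketch uses only the standard library so that it is self-contained. In practice a statistics package such as statsmodels (`Logit`) would be the more usual choice.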
