
Article

An NGS-independent strategy for proteome-wide identification of single amino acid polymorphisms by mass spectrometry
Yun Xiong, Yufeng Guo, Weidi Xiao, Qichen Cao, Shanshan Li, Xianni Qi, Zhidan Zhang, Qinhong Wang, and Wenqing Shui
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b04417 • Publication Date (Web): 25 Jan 2016



Analytical Chemistry

An NGS-independent strategy for proteome-wide identification of single amino acid polymorphisms by mass spectrometry

Yun Xiong1, Yufeng Guo1, Weidi Xiao2, Qichen Cao1, Shanshan Li1, Xianni Qi1, Zhidan Zhang1, Qinhong Wang1,*, Wenqing Shui1,*

1 Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China

2 College of Life Sciences, Nankai University, Tianjin 300071, China

*To whom correspondence should be addressed:
Qinhong Wang, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China; Tel: 86-22-84861950; email: [email protected]
Wenqing Shui, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China; Tel: 86-22-24828740; email: [email protected]


Abstract

Detection of proteins containing single amino acid polymorphisms (SAPs) encoded by nonsynonymous SNPs (nsSNPs) can help researchers study the functional significance of protein variants. Most proteogenomic approaches for large-scale SAP mapping require construction of a sample-specific database containing protein variants predicted from next-generation sequencing (NGS) data. Searching shotgun proteomic datasets against these NGS-derived databases allows identification of SAP peptides, thus validating sequence variation at the proteome level. In contrast to these conventional approaches, our study presents a novel strategy for proteome-wide SAP detection that does not rely on sample-specific NGS data. By searching a deep-coverage proteomic dataset from an industrial thermotolerant yeast strain using our strategy, we identified 337 putative SAPs relative to the reference genome. Among the SAP peptides identified with stringent criteria, 85.2% of SAP sites were validated using whole-genome sequencing data obtained for this organism, which indicates high accuracy of SAP identification with our strategy. More interestingly, for certain SAP peptides that cannot be predicted by genomic sequencing, we used synthetic peptide standards to verify expression of the peptide variants in the proteome. Our study provides a unique tool for proteogenomics that enables direct proteome-wide SAP identification and captures nongenetic protein variants not linked to nsSNPs.


Introduction

Recent advances in mass spectrometry (MS)-based proteomics and next-generation sequencing (NGS) have greatly accelerated the growth of proteogenomics, which validates genomic or transcriptomic variation at the protein level using proteomic data and thereby improves genome annotation [1, 2]. Among the known types of genetic variation, single-nucleotide polymorphisms (SNPs) have been the most intensely investigated, mainly through genome-wide association studies seeking to uncover causative SNPs for a particular disease or trait [3-5]. Because nonsynonymous SNPs (nsSNPs) lead to changes in the protein product sequence known as single amino acid polymorphisms (SAPs), many SNP functional studies have focused on SAPs to reveal their potential impact on protein stability, localization, protein-protein interactions, and signal transduction [6, 7].

MS-based proteomics has become an essential tool for validating SAPs on a global scale. To achieve the best accuracy and sensitivity in SAP mapping, most current proteogenomic studies create a customized proteome database that includes SAP variant sequences derived from whole genome sequencing (WGS) or deep RNA sequencing (RNA-seq) of the same sample source [1, 8]. Alternatively, generic gene variants can be collected from public mutation repositories to build a customized database for proteomic data search, yet this approach suffers from an increased risk of false positives in SAP identification [8]. In addition, several groups have designed search algorithms based on de novo sequencing [9, 10], peptide sequence tagging [11, 12], or a combination of both [13] to identify putative peptide variants without prior construction of a sample-specific database. These strategies circumvent the need to perform NGS experiments or collect genomic variation information for SAP detection at the proteome level. However, most of these approaches substantially increase the database search space and could result in unacceptable false discovery rates (FDR). Thus they have not been considered widely practical for proteome-wide SAP identification.

Recently, a mass-tolerant search method was devised to explain a large portion of the unassigned mass spectra in shotgun proteomics that arise from protein post-translational modifications (PTMs) or amino acid variations [14]. This type of exhaustive search tested all possible PTM forms and amino acid substitutions in each protein, and restricted the FDR in peptide variant identification by use of high-resolution MS/MS analysis. Nevertheless, this promising approach has not been specifically optimized or comprehensively evaluated for SAP mapping.

Here we integrated error-tolerant database search for SAP discovery with conventional search for SAP validation on the proteomic scale to eliminate the need for sample-specific NGS. The proposed workflow was benchmarked on a proteomic dataset from an industrial yeast strain to discover its unique SAPs relative to a reference strain. Whole genome sequencing and specific gene sequencing were performed to evaluate the accuracy of our new approach. For a subset of SAPs identified by mass spectrometry yet not mapped to genetic variants, we discuss possible causes and conducted further validation experiments to demonstrate the advantages of our approach and reveal the limits of current shotgun proteomics.


Materials and Methods

Yeast cell protein extraction and digestion. The thermotolerant industrial strain ScY01 described in our previous study [15] was cultivated at 40 °C in YPD medium with a high glucose concentration (200 g/L) for 16 h, and cells were then harvested by centrifugation. Cell pellets were lysed by glass-bead shaking in a buffer of 5% SDS, 50 mM DTT and 0.1 M Tris-HCl (pH 7.6) supplemented with a protease inhibitor cocktail (Roche, Germany). The protein extracts were quantified with the 2-D Quant Kit (GE Healthcare, USA). Thereafter, the protein extract was digested using the FASP protocol with minor adjustments [16]. In brief, the protein sample was mixed with 0.2 mL of 8 M urea in 0.1 M Tris/HCl, pH 8.0 (UA solution), loaded into the filtration devices (30K Microcon filter unit, Millipore), and centrifuged at 14,000 g for 15 min. The concentrates were diluted with 0.2 mL of UA solution and centrifuged again. After centrifugation, the concentrates were mixed with 0.1 mL of 50 mM iodoacetamide in UA solution and incubated in darkness at room temperature (RT) for 30 min, followed by centrifugation for 15 min. Then the concentrate was diluted with 0.2 mL of 100 mM NH4HCO3 and concentrated again. This step was repeated twice. The resulting concentrate was diluted with 100 mM NH4HCO3, and sequencing-grade trypsin (Promega) was added (1:50 w/w). After overnight digestion, the peptides were collected by centrifugation of the filter units for 20 min.

2D RPLC-MS/MS analysis for proteome profiling. The protein digests (~150 µg) were subjected to 2D RPLC separation and MS/MS identification. The first-dimension basic-pH RPLC was performed on a Nexera UHPLC system (SHIMADZU, Japan) using a 4.6 mm × 250 mm Durashell-C18 column (Agela, China) at a flow rate of 0.8 mL/min. Solvents were composed of water/acetonitrile/ammonium acetate (A: 100%/0%/200 mM, B: 20%/80%/200 mM). The LC method was 0-5 min 5% B, 5-30 min 5-15% B, 30-45 min 15-38% B, 45-46 min 38-90% B, 46-50 min 90% B, and re-equilibration for 10 min at 5% B. A total of 45 fractions were collected and consolidated into 15. The fractionated peptides were dried by speed vacuum and reconstituted in 0.1% formic acid prior to nanoLC-MS/MS analysis on an Eksigent nanoLC system connected to a TripleTOF™ 5600 mass spectrometer (AB SCIEX, USA). Peptide samples were loaded onto a trap column (10 mm × 100 µm, 5 µm C18 resin) and separated on an analytical column (100 mm × 75 µm) packed in-house with C18-AQ resin (3 µm, Dr. Maisch GmbH, Germany) using a gradient of 5-36% solvent B (0.1% formic acid, 98% acetonitrile) over 80 min at a flow rate of 300 nL/min. In each MS data collection cycle, one full MS scan (300-1,500 m/z) was acquired and the top 30 ions were selected for isolation and MS/MS scans (100-1,800 m/z), with an abundance threshold of 120 counts per second. The accumulation times for MS and MS/MS scans were 250 ms and 50 ms, respectively. The dynamic exclusion time was set at 30 s. Collision energy (CE) was calculated with the following formulas: CE = (m/z × 0.044) + 4 for 2+ charged peptides, and CE = (m/z × 0.051) + 3 for 3+ charged peptides [17].
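The charge-dependent collision-energy rule quoted above can be written as a small helper. This is only a minimal sketch of the two formulas as stated in the text; the function name `collision_energy` is hypothetical and is not part of any instrument software, and the behavior for other charge states is an assumption (the text gives no formula for them, so the sketch raises an error).

```python
def collision_energy(mz: float, charge: int) -> float:
    """Collision energy from precursor m/z and charge, per the formulas in the text.

    Only 2+ and 3+ precursors are covered; other charge states are rejected
    because the text does not give a formula for them (assumption of this sketch).
    """
    if charge == 2:
        return mz * 0.044 + 4
    if charge == 3:
        return mz * 0.051 + 3
    raise ValueError("formula is only stated for 2+ and 3+ precursors")


# Example: a hypothetical precursor at m/z 600 in both charge states.
for z in (2, 3):
    print(f"m/z 600.0, {z}+: CE = {collision_energy(600.0, z):.1f}")
```

For the same m/z, the 3+ formula yields a slightly higher value than the 2+ formula once m/z exceeds roughly 143, which covers the whole 300-1,500 m/z survey range used here.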

1D nanoLC-MS/MS analysis for targeted SAP peptide identification. Protein digests (~1 μg) from separate ScY01 cultures were prepared and analyzed on the 1D nanoLC-MS system described above. In some cases, the total protein digests were spiked with a low (0.05-0.1 pg on column) or high (0.1-0.2 pg on column) amount of mixed synthetic SAP peptides (GL Biochem, China) before nanoLC-MS analysis. Each data collection cycle consisted of one MS1 scan followed by 22 MS/MS scans targeting predefined precursor ions of the SAP peptides to be validated. The MS and MS/MS accumulation times were 250 ms and 100 ms, respectively.

Database searching


I. Error-tolerant search for SAP discovery. ProteinPilot™ software 4.5 (AB SCIEX) equipped with the Paragon algorithm [18] was used to search the 2D LC-MS/MS dataset against an S. cerevisiae ORF database from SGD (strain S288c, 6750 entries, 17-Oct-2014) supplemented with common protein contaminant sequences. A regular search was performed first, with trypsin as the specified enzyme (up to two missed cleavages allowed) and Cys carbamidomethylation as a static modification. In the error-tolerant search mode, all modifications included in UniMod and a set of single amino acid substitutions based on the BLOSUM 62 matrix [19] were searched simultaneously. Up to two substitutions per peptide were allowed in this search. Note that any putative substitution to Leu is reported as Leu/Ile because these two amino acids are indistinguishable in MS/MS analysis. ProteinPilot automatically clustered the identified proteins into protein groups sharing common peptides. Protein- and peptide-level FDRs were controlled below 1% using a target-decoy search strategy [20]. Peptide-spectrum matches (PSMs) for putative SAP peptides identified in this search were sequentially filtered as described in Fig. 1B and finally consolidated into 524 unique SAP peptides. The automatic error-tolerant search in Mascot was also run against the same database, with the same settings for trypsin specificity and Cys alkylation. PSMs for putative SAP peptides identified with an overall peptide FDR [...]

[...] D substituent had solid evidence for their expression in the ScY01 culture. The three strictly validated SAP peptides may imply that certain amino acid substitutions are independent of genetic or transcript variation and thus suggest a novel mechanism of post-transcriptional regulation. Notably, computational prediction categorized two such SAPs, in proteins TDH2/TDH3 and ADK1, as “severely damaging” sites that would probably disrupt conserved protein domains and affect protein function (Table 1).
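To illustrate why an error-tolerant search has to weigh modifications and substitutions together, the sketch below computes the monoisotopic mass shift produced by a single amino acid substitution. The residue masses are standard monoisotopic values; the name `substitution_delta` is hypothetical and is not a ProteinPilot or Mascot function. The sketch shows that Leu and Ile give a zero shift (the Leu/Ile ambiguity noted in the search description), and that an Asn-to-Asp change shifts the mass by about 0.984 Da, the same shift as deamidation of Asn.

```python
# Standard monoisotopic residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def substitution_delta(original: str, substitute: str) -> float:
    """Mass shift (Da) when `original` is replaced by `substitute` in a peptide."""
    return RESIDUE_MASS[substitute] - RESIDUE_MASS[original]


# Print a few shifts relevant to the discussion in the text.
for a, b in [("L", "I"), ("N", "D"), ("K", "N"), ("K", "D")]:
    print(f"{a}->{b}: {substitution_delta(a, b):+.5f} Da")
```

For the elongation factor 1-gamma 2 entry in Table 1, the proteomic call is K55D while the transcript shows K55N; by this calculation the two candidate variants differ by only about 0.984 Da, which may be one reason why separating PTMs from SAPs is difficult.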


CONCLUSIONS

In summary, our study integrated error-tolerant search with conventional search for proteome-wide SAP mapping without the need for genomic sequencing information. Our stringent data filtration workflow identified 337 SAPs in an industrial yeast strain relative to the reference and achieved an 85.2% mapping rate to the nsSNPs detected in the same sample. The accuracy of our approach for NGS-independent SAP analysis makes it possible to detect individual-specific protein variants, which has considerable potential in biomarker discovery and personalized medicine. Furthermore, analysis of the non-mapping SAPs offers three insights for proteogenomics. First, the regular SNV calling algorithm can filter out true variants, and a very small fraction of these false negatives can be rescued with our approach. Our study uncovered five SAPs originally missed in the WGS results, which amount to only 0.033% of all SAPs revealed by WGS. Second, current shotgun proteomics still requires improvement in distinguishing PTMs from SAPs and in precisely localizing variation sites. Third, our strong evidence for specific SAPs that are present in the proteomic sample yet not linked to genomic variants may indicate an unrecognized dimension of nongenetic regulation, though the molecular mechanism remains elusive. Therefore, our study provides a unique tool for proteogenomics that enables direct proteome-wide SAP identification and captures nongenetic protein variants.


Accession codes. MS raw and processed data files were deposited in ProteomeXchange (PXD003101). WGS raw data were deposited in the NCBI SRA (SRA308394).

ACKNOWLEDGEMENT We thank Prof. Shian Wu from Nankai University for critical discussion. This work was supported by grants from the National Natural Science Foundation of China (No. 31401150, 21505151 and 31470214) and the Key Projects in Tianjin Science & Technology Pillar Program (No. 14ZCZDSY00062).

COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.


REFERENCES

1. Low, T.Y.; van Heesch, S.; van den Toorn, H.; Giansanti, P.; Cristobal, A.; Toonen, P.; Schafer, S.; Hubner, N.; van Breukelen, B.; Mohammed, S.; Cuppen, E.; Heck, A.J.; Guryev, V. Cell Rep 2013, 5, 1469-1478.
2. Wu, P.; Zhang, H.; Lin, W.; Hao, Y.; Ren, L.; Zhang, C.; Li, N.; Wei, H.; Jiang, Y.; He, F. J Proteome Res 2014, 13, 2409-2419.
3. Buchanan, C.C.; Torstenson, E.S.; Bush, W.S.; Ritchie, M.D. J Am Med Inform Assoc 2012, 19, 289-294.
4. McCarthy, M.I.; Abecasis, G.R.; Cardon, L.R.; Goldstein, D.B.; Little, J.; Ioannidis, J.P.; Hirschhorn, J.N. Nat Rev Genet 2008, 9, 356-369.
5. Cooper, J.D.; Walker, N.M.; Smyth, D.J.; Downes, K.; Healy, B.C.; Todd, J.A. Genes Immun 2009, 10 Suppl 1, S85-94.
6. Helgason, H.; Sulem, P.; Duvvari, M.R.; Luo, H.; Thorleifsson, G.; Stefansson, H.; Jonsdottir, I.; Masson, G.; Gudbjartsson, D.F.; Walters, G.B.; Magnusson, O.T.; Kong, A.; Rafnar, T.; Kiemeney, L.A.; Schoenmaker-Koller, F.E.; Zhao, L.; Boon, C.J.; Song, Y.; Fauser, S.; Pei, M.; Ristau, T.; Patel, S.; Liakopoulos, S.; van de Ven, J.P.; Hoyng, C.B.; Ferreyra, H.; Duan, Y.; Bernstein, P.S.; Geirsdottir, A.; Helgadottir, G.; Stefansson, E.; den Hollander, A.I.; Zhang, K.; Jonasson, F.; Sigurdsson, H.; Thorsteinsdottir, U.; Stefansson, K. Nat Genet 2013, 45, 1371-1374.
7. Singh, P.; Schimenti, J.C. Proc Natl Acad Sci U S A 2015, 112, 10431-10436.
8. Sheynkman, G.M.; Shortreed, M.R.; Frey, B.L.; Scalf, M.; Smith, L.M. J Proteome Res 2014, 13, 228-240.
9. Kim, S.; Gupta, N.; Bandeira, N.; Pevzner, P.A. Mol Cell Proteomics 2009, 8, 53-69.
10. Frank, A.; Pevzner, P. Anal Chem 2005, 77, 964-973.
11. Dasari, S.; Chambers, M.C.; Slebos, R.J.; Zimmerman, L.J.; Ham, A.J.; Tabb, D.L. J Proteome Res 2010, 9, 1716-1726.
12. Abraham, P.; Adams, R.M.; Tuskan, G.A.; Hettich, R.L. J Proteome Res 2013, 12, 3642-3651.
13. Bern, M.; Cai, Y.; Goldberg, D. Anal Chem 2007, 79, 1393-1400.
14. Chick, J.M.; Kolippakkam, D.; Nusinow, D.P.; Zhai, B.; Rad, R.; Huttlin, E.L.; Gygi, S.P. Nat Biotechnol 2015, 33, 743-749.
15. Shui, W.; Xiong, Y.; Xiao, W.; Qi, X.; Zhang, Y.; Lin, Y.; Guo, Y.; Zhang, Z.; Wang, Q.; Ma, Y. Mol Cell Proteomics 2015, 14, 1885-1897.
16. Wisniewski, J.R.; Zougman, A.; Nagaraj, N.; Mann, M. Nat Methods 2009, 6, 359-362.
17. Lange, V.; Malmstrom, J.A.; Didion, J.; King, N.L.; Johansson, B.P.; Schafer, J.; Rameseder, J.; Wong, C.H.; Deutsch, E.W.; Brusniak, M.Y.; Buhlmann, P.; Bjorck, L.; Domon, B.; Aebersold, R. Mol Cell Proteomics 2008, 7, 1489-1500.
18. Shilov, I.V.; Seymour, S.L.; Patel, A.A.; Loboda, A.; Tang, W.H.; Keating, S.P.; Hunter, C.L.; Nuwaysir, L.M.; Schaeffer, D.A. Mol Cell Proteomics 2007, 6, 1638-1655.
19. Henikoff, S.; Henikoff, J.G. Proc Natl Acad Sci U S A 1992, 89, 10915-10919.
20. Elias, J.E.; Gygi, S.P. Nat Methods 2007, 4, 207-214.
21. Wang, L.H.; Li, D.Q.; Fu, Y.; Wang, H.P.; Zhang, J.F.; Yuan, Z.F.; Sun, R.X.; Zeng, R.; He, S.M.; Gao, W. Rapid Commun Mass Spectrom 2007, 21, 2985-2991.
22. Lohse, M.; Bolger, A.M.; Nagel, A.; Fernie, A.R.; Lunn, J.E.; Stitt, M.; Usadel, B. Nucleic Acids Res 2012, 40, W622-627.
23. Langmead, B.; Salzberg, S.L. Nat Methods 2012, 9, 357-359.
24. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. Bioinformatics 2009, 25, 2078-2079.
25. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; DePristo, M.A. Genome Res 2010, 20, 1297-1303.
26. Clevenger, J.; Chavarro, C.; Pearl, S.A.; Ozias-Akins, P.; Jackson, S.A. Mol Plant 2015, 8, 831-846.
27. Nagaraj, N.; Kulak, N.A.; Cox, J.; Neuhauser, N.; Mayr, K.; Hoerning, O.; Vorm, O.; Mann, M. Mol Cell Proteomics 2012, 11, M111.013722.
28. Li, J.; Su, Z.; Ma, Z.Q.; Slebos, R.J.; Halvey, P.; Tabb, D.L.; Liebler, D.C.; Pao, W.; Zhang, B. Mol Cell Proteomics 2011, 10, M110.006536.
29. Adzhubei, I.A.; Schmidt, S.; Peshkin, L.; Ramensky, V.E.; Gerasimova, A.; Bork, P.; Kondrashov, A.S.; Sunyaev, S.R. Nat Methods 2010, 7, 248-249.
30. Ng, P.C.; Henikoff, S. Genome Res 2001, 11, 863-874.
31. Vaudel, M.; Breiter, D.; Beck, F.; Rahnenfuhrer, J.; Martens, L.; Zahedi, R.P. Proteomics 2013, 13, 1036-1041.


Table 1. Identification of nine non-mapping SAP peptides in biological replicates by targeted PRM analysis

| Protein | SAP site | SAP peptide (a) | DDA score (b) | PRM score (c) | Transcript (d) | Polyphen-2 (e) |
| Glyceraldehyde-3-phosphate dehydrogenase 2 | Y138K | VVITAPSSTAPMFVMGVNEEKK | 74 | 53 | No SNP | 1.000 |
| Cystathionine beta-synthase | S458K | LNNFNDVSSYNENKK | 51 | 45 | No SNP | 0.003 |
| Adenylate kinase 1 | M73K | IMDQGGLVSDDIMVNK | 87 | 41 | No SNP | 0.999 |
| Elongation factor 1-gamma 2 | K55D | QAPAFLGPDGLK | 56 | 66 | K55N | 0.001 |
| 1,3-beta-glucanosyltransferase GAS1 | R207G | IPVGYSSNDDEDTGVK | 80 | 67 | No SNP | 1.000 |
| Phosphoglycerate kinase | V245T | KTLENTEIGDSIFDK | 90 | 49 | No SNP | 0.020 |
| H/ACA ribonucleoprotein complex subunit 4 | I169D | TDYESNLIEFDNKR | 51 | 32 | No SNP | 1.000 |
| Enolase 2 | A359K | VNQIGTLSESIKK | 39 | 26 | No SNP | 0.988 |
| Glucokinase-1 | P317V | LSTNVGFHLFEK | 22 | 17 | No SNP | 0.401 |

(a) SAP peptides further validated in the synthetic peptide spike-in experiment are highlighted in blue.
(b) Mascot score for the peptide identification in the 2D LC-MS/MS dataset acquired by DDA analysis.
(c) Mascot score for the peptide identification in the 1D LC-MS/MS dataset acquired by PRM analysis.
(d) Transcript sequencing results.
(e) Prediction score >0.903 indicates “probably damaging” mutations that could impact protein functions.
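The "probably damaging" cutoff given in the Table 1 footnote can be applied to the listed scores with a few lines of code. This is a minimal sketch: the scores and the 0.903 cutoff are copied from Table 1, while the dictionary layout and the function name `probably_damaging` are assumptions made for illustration and are not part of Polyphen-2.

```python
# Cutoff from the Table 1 footnote: scores above it are "probably damaging".
PROBABLY_DAMAGING_CUTOFF = 0.903

# Polyphen-2 scores copied from Table 1, keyed by protein and SAP site.
TABLE1_SCORES = {
    "Glyceraldehyde-3-phosphate dehydrogenase 2 Y138K": 1.000,
    "Cystathionine beta-synthase S458K": 0.003,
    "Adenylate kinase 1 M73K": 0.999,
    "Elongation factor 1-gamma 2 K55D": 0.001,
    "1,3-beta-glucanosyltransferase GAS1 R207G": 1.000,
    "Phosphoglycerate kinase V245T": 0.020,
    "H/ACA ribonucleoprotein complex subunit 4 I169D": 1.000,
    "Enolase 2 A359K": 0.988,
    "Glucokinase-1 P317V": 0.401,
}


def probably_damaging(scores, cutoff=PROBABLY_DAMAGING_CUTOFF):
    """Return the entries whose score is strictly above the cutoff."""
    return [name for name, score in scores.items() if score > cutoff]


flagged = probably_damaging(TABLE1_SCORES)
for name in flagged:
    print(name)
```

Applied to the nine entries, the strict `>` comparison flags only those SAPs whose score exceeds 0.903; entries such as Glucokinase-1 P317V (0.401) fall below the cutoff.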


Figure legends

Figure 1. Proteome-wide identification of SAPs in the industrial yeast strain ScY01. (A) Protein identification results by regular and error-tolerant search of a proteomic dataset collected from ScY01 cell extracts. (B) Overview of the workflow for SAP discovery and validation. In the discovery phase, the proteomic dataset was first searched against a reference proteome by error-tolerant search to acquire PSMs for SAP peptides at a global FDR