
Article

Removal of interference MS/MS spectra for accurate quantification in isobaric tag-based proteomics Mio Iwasaki, Tsuyoshi Tabata, Yuka Kawahara, Yasushi Ishihama, and Masato Nakagawa J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.9b00078 • Publication Date (Web): 30 Apr 2019 Downloaded from http://pubs.acs.org on May 1, 2019


Journal of Proteome Research

Removal of interference MS/MS spectra for accurate quantification in isobaric tag-based proteomics

Mio Iwasaki1*, Tsuyoshi Tabata1,2, Yuka Kawahara1, Yasushi Ishihama2, Masato Nakagawa1*

1 Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
2 Graduate School of Pharmaceutical Sciences, Kyoto University, Kyoto 606-8501, Japan

*Corresponding Authors: Masato Nakagawa and Mio Iwasaki, phone +81-75-366-7000, FAX +81-75-366-7023, E-mail: [email protected], [email protected]

Abbreviations

iPSC, induced pluripotent stem cell; HDF, human dermal fibroblast; nanoLC-MS/MS, nano-scale liquid chromatography – tandem mass spectrometry; RiMS, removal of interference mixture MS/MS spectra

ACS Paragon Plus Environment


ABSTRACT

Rapid progress in mass spectrometry (MS) has made comprehensive analyses of the proteome possible, but accurate quantification remains challenging. Isobaric tags for relative and absolute quantification (iTRAQ) is widely used as a tool to quantify proteins expressed in different cell types and various cellular conditions. The quantification precision of iTRAQ is quite high, but the accuracy dramatically decreases in the presence of interference peptides that are co-eluted and co-isolated with the target peptide. Here, we developed “removal of interference mixture MS/MS spectra (RiMS)” to improve the quantification accuracy of isobaric tag approaches. The presence of spectrum interference is judged by examining the overlap in the elution time of all scanned precursor ions. Removal of this interference decreased the protein identification (11% loss), but improved the quantification accuracy. Further, RiMS does not require any specialized equipment, such as MS3 instruments or an additional ion separation mode. Finally, we demonstrated that RiMS can be used to quantitatively compare human induced pluripotent stem cells and human dermal fibroblasts, as it revealed differential protein expressions that reflect the biological characteristics of the cells.

Keywords


quantification, interference problem, iTRAQ, isobaric tag, nanoLC-MS/MS


INTRODUCTION

Recent advances in mass spectrometry (MS)-based proteomics have enabled the comprehensive identification of proteins at levels comparable with next-generation sequencing1, 2. One challenge in MS-based proteomics, however, is accurate protein quantification with high coverage. Label-free quantification methods such as emPAI3, spectral counting4, APEX5, and iBAQ6 are easy to use and only minimally decrease the number of quantified peptides and proteins, but their accuracy is lower because they are sensitive to slight changes in the experimental conditions7. To improve the quantification accuracy, quantitative labeling methods such as stable isotope labeling by amino acids in cell culture (SILAC)8, dimethyl labeling9, and isobaric tandem mass tags (iTRAQ10, TMT11) have been developed. In these methods, differently labeled samples are combined before the MS analyses. In SILAC and dimethyl labeling, the relative ion abundances of the differently labeled peptides are compared using a full MS1 scan. While MS1 scan level quantification is effective12, the quantification is limited to at most three labels per sample. Also, the quantification accuracy is reduced by the high spectral complexity of the MS1 scan. On the other hand, isobaric tandem mass tags use the reporter ion intensities of differently labeled peptides in the MS/MS scan for the simultaneous quantification of up to 8 labels for iTRAQ and 11 labels for TMT13, 14. This approach is attractive because multiplexing samples shortens the total measurement time and simplifies the study design, but sometimes at the expense of the quantification accuracy. This problem is known as ratio compression12, 15, 16 and is the result of the co-isolation and co-fragmentation of interfering peptide ions within the target peptide isolation window.

Many solutions have been proposed. For example, extensive fractionation of the sample before MS analyses is effective at reducing the sample complexity, but generally results in only a slight improvement in the quantification accuracy17-19. Regarding the MS settings, the use of proton-transfer ion-ion reactions (PTR)20, high-field asymmetric waveform ion mobility spectrometry (FAIMS)21, narrowing the precursor ion selection window, and fragmentation at the peak of the peptide elution signal22 were all shown to partially remove interference. The most successful method in MS settings, triple stage MS (MS3)19, 23, removes interference by the additional isolation and re-fragmentation of selected ions in the MS/MS spectrum. Very recently, Winter et al. reported EASI-tag, a new isobaric tandem mass tag that shows higher quantification accuracy in combination with an asymmetric isolation window for precursor ions24. However, these MS-based solutions need additional ion separation modes in the MS19-21, 23, 24 and special MS instruments19, 22, 23. Another solution is to estimate the interference by bioinformatics approaches such as the use of a decoy sample25, ratio adjustment by spiking the samples with a 6-protein calibration mixture18, the use of another fragment ion cluster in TMT26, and measurement of the spectral purity in an MS1 survey scan19, 27, 28. Although these strategies are effective for the interference problem, they require additional decoy25 and calibration18 samples, and substantially decrease the number of quantified peptides and proteins19, 22, 23, 26-28. Moreover, the spectral purity measured in the MS1 survey scan can sometimes underestimate interference levels18.

As a simpler solution, here we developed “removal of interference mixture MS/MS spectra (RiMS)”, a bioinformatics method that improves the quantification accuracy of isobaric tag approaches. Our results indicate that if the number of MS/MS scan cycles is high enough for the sample complexity, the presence of spectrum interference can be judged by examining the elution time range overlap of all the scanned precursor ions. To demonstrate the value of RiMS, we show that it has better quantification accuracy and a higher number of quantified proteins than a standard method that measures spectral purity in an isolation window27, 28.
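The ratio compression described in the Introduction can be illustrated with a small numerical sketch. It assumes a simple mixing model that is not taken from the paper: a given fraction of the total reporter-ion signal comes from a co-isolated peptide whose own channel ratio differs from that of the target. The function name `observed_ratio` and its parameters are hypothetical and chosen only for this illustration.

```python
def observed_ratio(true_ratio, interference_fraction, interference_ratio=1.0):
    """Reporter-ion ratio (channel A / channel B) under a simple mixing model.

    true_ratio:            A/B ratio of the target peptide
    interference_fraction: share (0..1) of total reporter signal that comes
                           from the co-isolated interfering peptide
    interference_ratio:    A/B ratio of the interfering peptide (1.0 = 1:1)
    """
    if not 0.0 <= interference_fraction <= 1.0:
        raise ValueError("interference_fraction must lie in [0, 1]")

    target = 1.0 - interference_fraction
    # Split each contributor's signal between the two channels.
    a_target = target * true_ratio / (true_ratio + 1.0)
    b_target = target / (true_ratio + 1.0)
    a_interf = interference_fraction * interference_ratio / (interference_ratio + 1.0)
    b_interf = interference_fraction / (interference_ratio + 1.0)

    return (a_target + a_interf) / (b_target + b_interf)


if __name__ == "__main__":
    # Illustrative fractions only; they are not measurements from the study.
    for fraction in (0.0, 0.1, 0.2, 0.5):
        print(fraction, round(observed_ratio(10.0, fraction), 2))
```

Under this assumed model, a true 10:1 ratio with 20% of the reporter signal coming from a 1:1 interfering peptide is observed as 9.1/1.9, or about 4.8, which shows how modest interference pulls ratios toward 1.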

EXPERIMENTAL SECTION

Materials

C18 Empore disc cartridges and membranes were purchased from 3M. Sodium deoxycholate (SDC), sodium lauroyl sarcosinate (SLS), mass spectrometry-grade lysyl endoprotease (Lys-C), ethyl acetate, acetonitrile, acetic acid, methanol, trifluoroacetic acid (TFA), iodoacetamide (IAA), sodium carbonate, all other chemicals, and ultrapure water were purchased from Wako. Gradient polyacrylamide gels (Multi Gel II mini 5/10 (13W)) for SDS-PAGE were purchased from Cosmo Bio.

Preparation of E. coli lysate

E. coli strain DH5α grown in Luria-Bertani (LB) culture at 37°C was used in this study. The cell pellet was prepared by centrifugation at 4,500 g, 4°C for 10 min and was resuspended in 1 mL of ice-cold lysis buffer (PTS buffer: 12 mM SDC, 12 mM SLS, 100 mM Tris-HCl (pH 9.0), 1% phosphatase inhibitors, and 1% protease inhibitor). The cells were lysed by ultrasonication, and unbroken cells and debris were precipitated at 2,500 g, 4°C for 5 min. The supernatant was used for further analyses.

Sample fraction

HeLa proteins lysed with SDS buffer (1% SDS, 20 mM Tris-HCl (pH 8.0), 1% phosphatase inhibitors, and 1% protease inhibitor) were separated on a 5-10% polyacrylamide gel. Gels were cut into 3, 10 or 20 fractions according to molecular size. Then, in-gel digestion of the proteins was performed with trypsin. The digested peptides were extracted from the gel and desalted using StageTip29.

Sample preparation for MS analysis


HeLa cells and HDFs (HDF1388) were cultured in Dulbecco’s modified Eagle’s medium (Nacalai Tesque) with 10% fetal bovine serum (FBS, Gibco) and 1% penicillin-streptomycin (Gibco). Human iPSCs (201B7) were cultured in StemFit AK03N (Ajinomoto) on iMatrix-511-coated dishes. For cell lysis, the medium was removed, and the cells were washed once with ice-cold PBS (Nacalai Tesque) and directly lysed with ice-cold PTS buffer. Cell lysates were collected by scraping and pipetting. E. coli, HeLa, iPSC and HDF protein samples lysed with PTS buffer were subjected to reduction, alkylation, Lys-C/trypsin digestion (enzyme ratio: 1/100) and desalting as previously described30. The resulting peptides were labeled with isobaric tags for relative and absolute quantification (iTRAQ, Sciex). Briefly, 120 μg of desalted peptide samples were dried and dissolved in 10 μL of 500 mM triethylammonium bicarbonate. Approximately 20 μL of iTRAQ reagents (Multiplex kit, Sciex) was added to 23 μL of ethanol and mixed with the peptide sample. After incubation for one and a half hours at room temperature, 16 μL of 10% TFA and 400 μL of loading buffer (0.5% trifluoroacetic acid and 4% (v/v) acetonitrile) were added to quench the reaction, and the sample mixture was desalted using StageTip29. For the phosphoproteome analysis, 100 μg of iTRAQ-labeled peptides were used for phosphopeptide enrichment by HAMMOC, as previously reported31. For iPSC and HDF proteome analyses, 8 μg samples for whole proteome and enriched 150 μg samples for phosphoproteome were subjected to nanoLC-MS/MS analyses.


Nano-liquid chromatography (nanoLC)-mass spectrometric (MS) analysis

Samples were subjected to nanoLC-MS/MS using a TripleTOF 5600 System (AB Sciex) equipped with an HTC-PAL autosampler (CTC Analytics). Loaded peptides were separated on a self-pulled analytical column (150 mm length, 100 μm i.d.) packed with ReproSil-Pur C18-AQ (3 μm, Dr. Maisch GmbH) or a monolithic column (4 m length, 100 μm i.d., GL Science) using a Dionex UltiMate 3000 RSLCnano System. The mobile phases were composed of 0.5% acetic acid with 5% (v/v) DMSO (solution A) and 0.5% acetic acid in 80% (v/v) acetonitrile with 5% (v/v) DMSO (solution B)32. For the beads column, a flow rate of 400 nL/min of 5-10% (v/v) solution B for 5 min, 10-40% solution B for 60 min, 40-100% solution B for 5 min, 100% solution B for 10 min and 5% solution B for 30 min was used (total 110 min). For the monolithic column, a flow rate of 400 nL/min of 5-15% solution B for 205 min, 15-35% solution B for 549 min, 35-40% solution B for 103 min, 40-100% solution B for 5 min, 100% solution B for 118 min and 5% solution B for 100 min was used (total 1,080 min). For the phosphoproteome analyses, a flow rate of 400 nL/min of 5-10% solution B for 210 min, 10-28% solution B for 375 min, 28-40% solution B for 20 min, 40-100% solution B for 5 min, 100% solution B for 10 min and 5% solution B for 100 min was used (total 720 min).

The coiled monolithic capillary column was connected to a self-pulled emitter (100 μm i.d., 3-5 μm tip) formed with a Sutter P-2000 (Novato); the distal end was given a conductive coating with an Ion Coater Model IB-2 (Eiko Engineering), and the spray voltage was applied through this coating. The applied spray voltage was 2,300 V, and the interface heater temperature was 150 °C. The MS scan range was 300-1,500 m/z every 0.25 s, and the MS/MS scan range was 80-1,500 m/z every 0.1 s. The maximum number of candidate ions monitored per cycle was 10, and the cycle time was 1.3 s. The resolution of the Q1 scan was UNIT. To minimize repeated scanning, previously scanned ions were excluded for 12 s with the beads column and 30 s with the monolithic column. Analyses were performed in triplicate per sample, and blank runs were inserted between samples.
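As a quick consistency check on the acquisition settings above, the following sketch sums the gradient segments and derives the duty cycle from the stated scan times. The segment durations and scan times are transcribed from the text; the variable and function names are hypothetical, and the scan count is only an upper bound that assumes every cycle triggers all 10 MS/MS scans.

```python
# Gradient segment durations in minutes, transcribed from the text above.
GRADIENTS_MIN = {
    "beads column": [5, 60, 5, 10, 30],
    "monolithic column": [205, 549, 103, 5, 118, 100],
    "phosphoproteome": [210, 375, 20, 5, 10, 100],
}

MS1_SCAN_S = 0.25      # one MS survey scan
MSMS_SCAN_S = 0.1      # one MS/MS scan
MAX_PRECURSORS = 10    # candidate ions monitored per cycle


def total_minutes(segments):
    """Total run time of a gradient program in minutes."""
    return sum(segments)


def cycle_time_s():
    """One survey scan followed by the maximum number of MS/MS scans."""
    return MS1_SCAN_S + MAX_PRECURSORS * MSMS_SCAN_S


def max_msms_scans(gradient_min):
    """Upper bound on MS/MS scans, assuming every cycle is fully used."""
    cycles = gradient_min * 60 / cycle_time_s()
    return int(cycles * MAX_PRECURSORS)


if __name__ == "__main__":
    for name, segments in GRADIENTS_MIN.items():
        minutes = total_minutes(segments)
        print(name, minutes, "min,", max_msms_scans(minutes), "MS/MS scans at most")
```

The segment sums reproduce the stated totals of 110, 1,080 and 720 min. The scan times add up to 1.25 s per cycle, which the text rounds to 1.3 s, and the resulting upper bounds of 518,400 and 52,800 MS/MS scans for the 1,080 min and 110 min runs agree with the approximate totals of 500,000 and 50,000 reported in the Results.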

Proteome data analysis for protein identification

The raw data files were analyzed using ProteinPilot v5.0 (Sciex) with acceptable modifications of N-terminal iTRAQ, iTRAQ of lysine, carbamidomethylation of cysteine, oxidation of methionine, phosphorylation of serine, threonine or tyrosine, deamidation of asparagine or glutamine, the N-terminal pyro-glutamic acid of glutamine or glutamic acid, and protein N-terminal acetylation. Peak lists, which were generated from a ProteinPilot.group file, were analyzed by Mascot v2.5 (Matrix Science) with the carbamidomethylation of cysteine as the fixed modification, and N-terminal iTRAQ, iTRAQ of lysine, and methionine oxidation as the variable modifications. For the phosphoproteome analyses, the phosphorylation of serine, threonine and tyrosine was added to the variable modifications. Both database search engines were used against selected human and E. coli entries or human entries of UniProt/Swiss-Prot release 2016_06 (8-June-2016) with a precursor mass tolerance of 20 ppm, a fragment ion mass tolerance of 0.1 Da, and strict trypsin and Lys-C specificity, allowing up to two missed cleavages.

For the peptide identification, peptides were rejected if any of the following conditions applied: (a) the same scan was assigned to different peptides by ProteinPilot and Mascot, (b) the peptide confidence was below 0.05, (c) the charge state was more than 5, or (d) the peptide length was less than 6 amino acids. For the protein identification, at least two confidently (p < 0.05) identified peptides per protein were required. Single peptides with higher confidence (p < 0.01) were also allowed. Finally, peptides were grouped into protein groups based on previously established rules33. False discovery rates (FDRs) were estimated by searching against a decoy sequence database ( 0.75), class II (0.5 < P ≤ 0.75), or class III (P ≤ 0.5)35. Class I phosphosites were accepted automatically as unambiguous sites.

The MS/MS data have been deposited to the ProteomeXchange Consortium via jPOSTrepo36 (https://repository.jpostdb.org/) with the dataset identifiers JPST000492 (PXD011913) for E. coli and human proteome analysis, JPST000500 (PXD012368) for iPSC and HDF proteome analysis, and JPST000501 (PXD012367) for iPSC and HDF phosphoproteome analysis.
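The phosphosite classes mentioned above can be expressed as a small lookup. This is a minimal sketch, not the authors' implementation: the class II and class III bounds are as stated in the text, while the class I bound (P > 0.75) is an assumption inferred from the partly lost sentence, and the function name `phosphosite_class` is hypothetical.

```python
def phosphosite_class(localization_probability):
    """Assign a phosphosite class from a localization probability P.

    Class II (0.5 < P <= 0.75) and class III (P <= 0.5) follow the text.
    Class I is assumed here to mean P > 0.75.
    """
    p = localization_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError("localization probability must lie in [0, 1]")
    if p > 0.75:
        return "I"    # assumed bound; accepted as unambiguous in the text
    if p > 0.5:
        return "II"
    return "III"


if __name__ == "__main__":
    # Illustrative probabilities only, not values from the study.
    for p in (0.98, 0.75, 0.6, 0.5, 0.2):
        print(p, phosphosite_class(p))
```

Written this way, the boundary values 0.75 and 0.5 fall into classes II and III respectively, matching the inequalities given in the text.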

RESULTS

Approximately 50% of MS/MS spectra for identified peptides had interference

At first, we evaluated the interference in proteome datasets of E. coli and HeLa cell lysates. To measure the quantification accuracy, we generated two proteome samples using E. coli and human tryptic peptides. E. coli tryptic peptides were tagged with four different reporter ion labels (m/z: 114, 115, 116 and 117). These labeled peptides were mixed at the following ratios: 114 : 115 : 116 : 117 = 10 : 1 : 5 : 2. Human tryptic peptides were tagged with two different reporter ion labels, 114 and 115, and these samples were mixed 1:1. This labeled human peptide mix was used as interference in the labeled E. coli peptide mix as previously described19 (Figure 1A). We analyzed the labeled peptide mixes with a 400 cm long C18 monolithic column (total gradient time, 1,080 min) using a TripleTOF 5600 mass spectrometer. Using single nanoLC-MS/MS analysis, we identified 1,898 E. coli proteins (24,430 peptides) in the E. coli peptide mix, and 1,622 E. coli proteins (19,111 peptides) and 4,239 human proteins (22,258 peptides) in the E. coli and human peptide mix (average number of technical triplicate).

Then, we counted the number of interference ions whose elution time range overlapped with the MS/MS spectra time of the identified peptide (Figure 1B). As a result, about 55% of MS/MS spectra for identified peptides had an interference ion number greater than zero in the E. coli peptide mix analysis (Figure 1C; E.coli). Adding human tryptic peptides to the E. coli peptide mix increased the percentage to 62% (Figure 1C; E.coli+H). We did the same analysis with a conventional C18 beads column and a 110 min gradient time; however, the percentage was constant at approximately 30% regardless of the inclusion of human tryptic peptides (Supplementary figure 1A; E.coli, E.coli+H). We examined why sample complexity correlated with the interference ion number in the 1,080 min analysis but not in the 110 min analysis. To this end, we fractionated the human protein sample into three, ten and twenty fractions by 1D SDS-PAGE for the 110 min gradient time analysis. Again, the distribution of the interference ion number was constant at 30% regardless of the protein level fractionation (Supplementary figure 1A; E.coli+H 1/3, 1/10, 1/20). This finding indicates that our methodology for counting interference ion numbers requires a high number of MS/MS scans per peptide peak, as otherwise the existence of interference may go undetected.

The total numbers of MS/MS scans for the 1,080 min and 110 min gradient times were about 500,000 and 50,000, respectively (one cycle takes 1.25 s (8 Hz)). The peak capacity of the monolithic column was 810 (W1/2 = 0.70 min, t0 = 75.8 min) and that of the beads column was 170 (W1/2 = 0.31 min, t0 = 4.8 min). Based on these numbers, we estimated that a 110 min gradient time would require approximately 20 times more MS/MS scans. Following the above analysis and because the distribution of the interference ion number depended on the sample complexity for the 1,080 min gradient time analysis, we assumed the number of MS/MS scan cycles was appropriate. To improve the quantification accuracy, we analyzed the proteome data by removing MS/MS spectra that had an interference ion number greater than zero. We named this approach removal of interference mixture MS/MS spectra (RiMS).
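The counting and removal steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the class and function names are hypothetical, the m/z isolation half-width (`half_window_mz`) and the tolerance used to skip the target ion itself (`self_tol_mz`) are placeholder values that the text does not specify, and elution ranges are assumed to have been extracted beforehand for every scanned precursor ion.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float
    rt_start: float  # start of the elution time range (min)
    rt_end: float    # end of the elution time range (min)


@dataclass
class IdentifiedSpectrum:
    scan_id: int
    precursor_mz: float
    scan_time: float  # acquisition time of the MS/MS spectrum (min)


def count_interference(spectrum, precursors, half_window_mz=0.5, self_tol_mz=0.01):
    """Count other precursor ions that elute at the MS/MS scan time and
    fall inside the (assumed) isolation window of the target precursor."""
    count = 0
    for p in precursors:
        delta = abs(p.mz - spectrum.precursor_mz)
        if delta <= self_tol_mz:
            continue  # treat as the target ion itself (assumed tolerance)
        in_window = delta <= half_window_mz
        eluting = p.rt_start <= spectrum.scan_time <= p.rt_end
        if in_window and eluting:
            count += 1
    return count


def rims_filter(spectra, precursors, **kwargs):
    """Keep only spectra whose interference ion number is zero."""
    return [s for s in spectra if count_interference(s, precursors, **kwargs) == 0]


if __name__ == "__main__":
    # Toy values for illustration only.
    precursors = [
        Precursor(500.25, 10.0, 10.8),
        Precursor(500.60, 10.5, 11.2),
        Precursor(700.40, 10.0, 12.0),
    ]
    spectra = [
        IdentifiedSpectrum(1, 500.25, 10.6),
        IdentifiedSpectrum(2, 500.25, 10.2),
        IdentifiedSpectrum(3, 700.40, 11.0),
    ]
    print([s.scan_id for s in rims_filter(spectra, precursors)])
```

In the toy data, scan 1 is removed because a second precursor within the assumed window is eluting when the spectrum is acquired, whereas scan 2 of the same peptide is kept because that precursor has not yet started to elute. A production version would also need to handle isotope peaks of the target and charge-state information, which this sketch ignores.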

RiMS increased quantification accuracy

To measure quantification accuracy, we compared the quantification results using RiMS with those using all identified spectra (All Spectra). Average values of the six observed E. coli protein ratios were calculated (Figure 2A). With both methods, the measured ratios were consistent with the expected ratios for the E. coli peptide mix. Upon adding human tryptic peptides to the E. coli peptide mix, RiMS improved the average of three ratios compared with All Spectra: the average observed ratios for 114/115 (expected ratio of 10), 116/115 (expected ratio of 5) and 117/115 (expected ratio of 2) using All Spectra were 6.3, 2.2 and 1.0, respectively, but using RiMS they were 7.7, 2.9 and 1.2. Although RiMS had better quantification accuracy than All Spectra, the observed and expected ratios were still different. This problem was solved by simple 1D SDS-PAGE gel fractionation of the human protein sample into three fractions to reduce sample complexity (see Methods, Sample fraction). Following the fractionation, the observed average ratios using All Spectra were 7.3, 3.7 and 1.2, respectively, but using RiMS they were 9.1, 5.0, and 1.6 (Figure 2A; E.coli+H 1/3). To further analyze differences in the expected and observed ratios, we calculated the least squares value (see Methods). We found the least squares value for RiMS was significantly less than that of All Spectra for the E. coli peptide mix with human tryptic peptides (Figure 2B; black and white bars). We also measured the least squares values using MS/MS spectra with non-zero interference ion numbers (Figure 2B; gray bars). Again, the value for RiMS was significantly lower than that for interference-containing spectra. These results suggest that RiMS removed interference-containing MS/MS spectra effectively, thus providing better quantification accuracy.

Next, to evaluate the efficacy of RiMS, we measured interference using spectral purity in the MS1 survey scan. Spectral purity was defined as the abundance of the precursor peptide ion cluster divided by the total ion abundance in the isolation window19, 27, 28, 37. MS spectra that have close to 100% spectral purity were defined as having no interference, and those with close to 0% spectral purity as having 100% interference. We evaluated the m/z dependence of detected precursor ions with MS/MS and of identified peptides by a database search with or without spectral purity and RiMS in the analysis of the E. coli peptide mix with human tryptic peptides (Figure 3A). The median of the m/z distribution for detected precursor ions and identified peptides was around 600 and 700, respectively, indicating higher ion complexity at lower m/z. For selected ions using spectral purity, the distribution was similar to that of identified peptide ions, but using RiMS, the median of the m/z distribution was shifted to around 800. This shift indicates RiMS removes noisy ions effectively compared to the spectral purity approach. Then, we evaluated the precursor intensity dependence of the identified peptides using All Spectra, spectral purities and RiMS (Figure 3B). We found a clear shift to a lower median intensity for RiMS compared to All Spectra and spectral purities. This shift indicates RiMS can rescue peptides that have a lower precursor ion intensity (10^4), but tends to remove peptides that have a higher intensity (10^5).

For the protein level, we calculated the least squares value using All Spectra, RiMS and different spectral purities in the analysis of the E. coli peptide mix with human tryptic peptides (Figure 3C; see Methods and the previous section). Increasing the spectral purity decreased the least squares value, indicating improved quantification accuracy. A previous report showed that differentially expressed proteins could be reliably quantified at more than 70% spectral purity27, a level that RiMS can surpass (Figure 3C). We consider two reasons for this improved accuracy.

One is that the spectral purity sometimes gives a poor estimate if the precursor ions overlap with other ions such as singly charged non-peptide-like ions or fragment ions in the isolation window18. The other reason is that spectral purity tends to be low if the MS1 signal intensity is low38. For this point, we show an example MS spectrum with a spectral purity of 52%, which showed no interference ions (Figure 4A). In this MS spectrum, the identified and quantified precursor peptide (PTW3C_ECOLI, QTIQVIVGAK) has low intensity (less than 1500). Noisy ions were observed in the MS1 selection window, resulting in low spectral purity (Figure 4A, upper left panel, red background), but were never identified as peptides. Conditions like these (relatively low amounts of peptides) can be rescued by RiMS, which gives ratios of the reporter ions close to the expected ratios (Figure 4A, upper right panel). Further, fragment ions in the MS/MS spectrum were accurately assigned (Figure 4A, bottom panel). Although co-isolated precursor ions could give a similar quantification result as the quantified peptides, we found the least squares value for purities with RiMS was significantly less than that of spectral purities 70% spectral purity in terms of the number of quantified proteins, but was superior in terms of quantification accuracy (Figure 3C, Table 1).

We also compared the effect of the iTRAQ intensity threshold on the least squares value (Figure 3D). For >90% spectral purity, there was no difference in least squares values among intensity thresholds. We speculate this result is because the MS/MS spectra for >90% spectral purity contain fewer peptides with low precursor ion intensities, as shown in Figure 3B. To include spectra that have lower precursor ion intensities and still achieve good quantification accuracy, our data suggest RiMS is suitable at iTRAQ intensity thresholds greater than 150 (Figure 3D).
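The average ratios reported earlier in this section for the E. coli plus human mix can be compared with the expected 10 : 5 : 2 ratios using a simple error score. The sketch below uses the sum of squared differences between observed and expected ratios; this is an assumed stand-in for illustration and may differ from the least squares value defined in the Methods. The ratio values are transcribed from the text, and all names are hypothetical.

```python
# Expected reporter-ion ratios relative to channel 115.
EXPECTED = {"114/115": 10.0, "116/115": 5.0, "117/115": 2.0}

# Average observed ratios transcribed from the text of this section.
OBSERVED = {
    "All Spectra": {"114/115": 6.3, "116/115": 2.2, "117/115": 1.0},
    "RiMS": {"114/115": 7.7, "116/115": 2.9, "117/115": 1.2},
    "All Spectra, 1/3 fraction": {"114/115": 7.3, "116/115": 3.7, "117/115": 1.2},
    "RiMS, 1/3 fraction": {"114/115": 9.1, "116/115": 5.0, "117/115": 1.6},
}


def sum_squared_error(observed, expected):
    """Sum of squared differences between observed and expected ratios
    (an assumed score, not necessarily the authors' definition)."""
    return sum((observed[key] - expected[key]) ** 2 for key in expected)


scores = {name: sum_squared_error(obs, EXPECTED) for name, obs in OBSERVED.items()}

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
```

With this score, the values fall from 22.53 (All Spectra) to 10.34 (RiMS) without fractionation, and from 9.62 to 0.97 after fractionation into three, which matches the direction of the improvements described in the text.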

Quantification accuracy of cell-type specific proteins by RiMS

We applied RiMS to the quantitative proteome analysis of a human induced pluripotent stem cell (hiPSC) line and an original human dermal fibroblast (HDF) line. We analyzed peptide samples with a 400 cm long C18 monolithic column (total gradient time: 1,080 min) and a mass spectrometer. We identified 66,400 peptides and 6,799 proteins and quantified 42,042 peptides and 6,048 proteins by RiMS (37% and 11% decreases in quantified peptides and proteins, respectively; values are averages of biological triplicates, each analyzed in technical triplicate) (Figure 5A). About 57% of MS/MS spectra for identified peptides had an interference ion number greater than zero, as was observed in the E. coli and human peptide mix analysis (Figure 5B).

We also performed a phosphoproteome analysis of the cells. In total, 6,658 phosphopeptides and 1,970 phosphoproteins were identified, and 3,293 phosphopeptides and 1,386 phosphoproteins were quantified by RiMS (51% and 30% decreases in quantified peptides and proteins, respectively; averages of biological triplicates) (Figure 5C). Surprisingly, almost the same proportion of spectra with interference ions (54%) was observed for the phosphopeptide-enriched sample as for the whole proteome (57%, Figure 5B). Recent reports have shown that interference exists in the analysis of isobaric tagged phosphopeptide-enriched samples40, 41. Thus, our analysis could quantify the complexity of the phosphoproteome, showing that it is higher than expected.

hiPSCs and HDFs have specific cell characteristics that are mainly defined by the expression of proteins such as Oct3/4 (TF), Lin28a, Lin41, Dnmt3a/b and Nanog (TF), which represent the pluripotent properties of hiPSCs, and Fibronectin, Vimentin, CD13, CD44 and CD59, which represent the mesenchymal properties of HDFs. For phosphorylated proteins, we selected the phosphorylated sites of EPS15L1 (an EGF pathway ligand) in iPSCs42. We quantified these proteins using All Spectra and RiMS. For ubiquitously expressed proteins such as Gapdh and Vinculin, we found no differences in the quantified ratios (iPSC/HDF) between the two methods. However, the quantified ratios of several cell-specific proteins, including Fibronectin, Vimentin, CD13 and CD44 for HDFs (HDF/iPSC) and Oct3/4 and Lin41 for iPSCs (iPSC/HDF), were higher when using RiMS (Figure 5D). These results suggest that RiMS can reduce ratio compression at the cost of only a mild decrease in the number of quantified proteins (11%) in the analysis of whole human proteome samples (Figure 5A).
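The pattern described above, in which ratios near 1 are unchanged while large ratios increase once interfered spectra are removed, can be illustrated with a toy model of ratio compression. The sketch below is not part of the RiMS software. It assumes, purely for illustration, that co-isolated interfering peptides add the same reporter ion intensity to both channels (a 1:1 background); the function name `observed_ratio` and all intensities are hypothetical.

```python
def observed_ratio(true_a: float, true_b: float, background: float) -> float:
    """Reporter ion ratio a/b after an equal background is added to both channels."""
    return (true_a + background) / (true_b + background)


# Hypothetical reporter ion intensities (arbitrary units).
# A ubiquitously expressed protein with a true ratio of 1.
housekeeping = observed_ratio(1000.0, 1000.0, background=200.0)
# A cell-specific protein with a true ratio of 10.
cell_specific = observed_ratio(1000.0, 100.0, background=200.0)
# The same cell-specific protein when the interfered spectrum is not used.
cell_specific_clean = observed_ratio(1000.0, 100.0, background=0.0)

print(f"housekeeping:        {housekeeping:.2f}")         # 1.00
print(f"cell-specific:       {cell_specific:.2f}")        # 4.00
print(f"cell-specific clean: {cell_specific_clean:.2f}")  # 10.00
```

Under this assumption the true 1:1 ratio is reported correctly with or without interference, whereas the true 10:1 ratio is compressed to 4:1, which is consistent with the observation that only the cell-specific proteins gave higher ratios with RiMS.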

DISCUSSION

Many bioinformatic quantification methods have been developed to overcome the interference problems of the isobaric labelling method, such as calculation of the spectral purity19, 27, 28. However, previous reports have shown that these methods can incorrectly estimate interference and dramatically decrease the number of quantified proteins18, 38. New algorithms that, in combination with spectral purity, correct the experimental isobaric labelling ratios based on determined peptide interference levels have been reported28, but they assume that the majority of proteins do not change significantly between conditions. This assumption cannot be applied to different cell types such as iPSCs and HDFs, for which 30% of quantified proteins are upregulated more than 2-fold (972 proteins were differently expressed among the 3,327 quantified proteins in Figure S3).

In the present study, we used the elution time overlap of all precursor ions to evaluate interference, an approach we call RiMS (removal of interference mixture MS/MS spectra). This concept differs from the calculation of the spectral purity, because we defined interference based on the elution time overlap and not on the spectral purity in the isolation window. This approach requires enough MS/MS scan cycles, depending on the sample complexity. For this purpose, using a meter-scale long monolithic column with a long gradient time is recommended, as it improves the performance of the proteome analysis and is effective at reducing the sample complexity30, 43, 44.

As a result, we found that 50% of identified MS/MS spectra included at least one interference ion in E. coli and human proteome samples. Removing these interference mixture MS/MS spectra directly improved the quantification accuracy. Overall, we found that RiMS had better quantification accuracy than 70% spectral purity with a comparable number of quantified proteins, but it did not perform as well as 90% spectral purity. The quantification accuracy of RiMS can be improved by a simple three-fraction experiment using a 1D SDS-PAGE gel (Figure 2A), which suggests that the sample complexity of the whole proteome demands that more MS/MS scan cycles be performed to reach the level of 90% spectral purity. The drawback of RiMS, however, is that quantification accuracy comes at the cost of the number of identified peptides and proteins, similar to the spectral purity approach. This effect is much larger at the peptide level than at the protein level for RiMS. New software development would benefit RiMS performance, such as algorithms that calculate spectral similarity, peptide identification from multiple MS/MS spectra, and effective peak-picking.

To further demonstrate the applicability of RiMS, we used it to analyze the hiPSC and HDF proteomes. RiMS improved the quantification ratio for each cell-specific protein with only a mild decrease in the number of quantified proteins (11% decrease).
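The elution time overlap criterion discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' quantification program: the class names, field names, retention times and m/z values are hypothetical, and the ±0.5 m/z window is used only because it is one of the ranges listed for Figure S2. A spectrum is kept for quantification only if no potential interference precursor ion within the m/z window is eluting at the time the spectrum was acquired.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    """A potential interference precursor ion (hypothetical structure)."""
    mz: float        # precursor m/z
    rt_start: float  # start of its elution time range (min)
    rt_end: float    # end of its elution time range (min)


@dataclass
class Spectrum:
    """An MS/MS spectrum of an identified peptide (hypothetical structure)."""
    precursor_mz: float  # m/z selected for fragmentation
    rt: float            # acquisition time of the scan (min)


def interference_count(spec, candidates, mz_window=0.5):
    """Count candidates near the selected m/z that elute at the acquisition time.

    `candidates` must not include the identified peptide's own precursor ion.
    """
    count = 0
    for ion in candidates:
        in_window = abs(ion.mz - spec.precursor_mz) <= mz_window
        eluting = ion.rt_start <= spec.rt <= ion.rt_end
        if in_window and eluting:
            count += 1
    return count


def rims_filter(spectra, candidates, mz_window=0.5):
    """Keep only spectra with an interference ion number of zero."""
    return [s for s in spectra
            if interference_count(s, candidates, mz_window) == 0]


# Two spectra of the same identified peptide, acquired at different times.
i1 = Spectrum(precursor_mz=600.30, rt=50.2)
i2 = Spectrum(precursor_mz=600.30, rt=51.5)
# Two potential interference precursor ions with their elution time ranges.
ix = Precursor(mz=600.55, rt_start=49.8, rt_end=50.6)
iy = Precursor(mz=599.95, rt_start=50.0, rt_end=50.9)

kept = rims_filter([i1, i2], [ix, iy])
print(interference_count(i1, [ix, iy]))  # 2, so I1 is discarded
print(kept == [i2])                      # True, only I2 is used
```

Because the decision depends on when each interfering ion elutes rather than on what is visible in a single isolation window, the quality of the filter in such a scheme depends on how well the elution ranges are sampled, which is why sufficient MS/MS scan cycles are needed.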


Interestingly, RiMS analysis indicated that the phosphoproteome data have a degree of interference similar to that of the whole proteome, suggesting that the phosphoproteome and whole proteome have similar complexity. However, the effects on the ratio compression and the number of quantified peptides and proteins were smaller in the phosphoproteome analysis, suggesting that modifications to RiMS, such as the removal of noisy MS/MS spectra, would enhance the analysis.

In summary, we developed a new quantification method, RiMS, to improve the quantification accuracy of isobaric tag approaches. RiMS requires a relatively large proteome data set with sufficient MS/MS scan cycles, but does not require any specialized equipment, such as MS3 instruments or an additional ion separation mode.

SUPPORTING INFORMATION

The following supporting information is available free of charge at the ACS website http://pubs.acs.org:

Figure S1. Distribution of the interference ion numbers and least squares values using short gradient time analysis and 15 cm beads column
Figure S2. RiMS performance with ±0.3, 0.5 and 0.7 m/z window ranges
Figure S3. Volcano plots for the analysis with All Spectra (upper panel) and RiMS (lower panel)
Figure S4. Intensity distribution of E. coli proteins and human proteins
Table S1. Protein quantification of the E. coli and Human proteome
Table S2. Protein identification and quantification of the iPSC and HDF proteomes
Table S3. Peptide identifications by triplicate analyses in iPSCs and HDFs
Table S4. Phosphoprotein identifications in iPSCs and HDFs
Table S5. Phosphopeptide identifications by iPSC and HDF analysis

AUTHOR CONTRIBUTIONS

Y.K. performed the experiments. T.T. generated the quantification program. M.N. and Y.I. advised and directed the study. M.I. performed the experiments, generated the quantification program, directed the study and wrote the manuscript.

ACKNOWLEDGMENT

We thank members of the Shinya Yamanaka laboratory at CiRA (Kyoto University) for fruitful discussions, and Koshi Imami and Peter Karagiannis for critical reading of the manuscript. This work was supported by K-CONNEX from the Japan Science and Technology Agency (JST), the Core Center for iPS Cell Research from the Japan Agency for Medical Research and Development (AMED) and a Japan Society for the Promotion of Science KAKENHI Grant (JSPS; 13J02403 (M.I.)).


FIGURE AND TABLE LEGENDS

[Figure 1 image. (A) Bar charts of injected protein amounts (μg) per reporter channel (114, 115, 116, 117) for E. coli cell lysate (no interference), human cell lysate, and E. coli + human cell lysate (with interference). (B) Removal of interference mixture spectra (RiMS) workflow: (a), (b) examine potential interference precursor ions (Ix, Iy) around the precursor ions of an identified peptide (I1, I2) in the RT (min) versus m/z plane; (c), (d) check the overlap between the elution time ranges (Tx, Ty) of the two interference ions, taken from their XICs, and the acquisition time of each MS/MS spectrum (I1, I2) of the identified peptide. The time of MS/MS spectrum I1 overlaps with the elution time ranges (Tx and Ty) of Ix and Iy, so this spectrum is discarded for quantification. The time of MS/MS spectrum I2 does not overlap with the elution time range of any interference ion, so this spectrum is used for quantification. (C) Bar chart of interference spectra (%) by interference ion number (0, 1, 2, 3≦) for E.coli and E.coli+H.]

Figure 1 Interference modeling and workflow of the interference removal method
(A) Four-plex isobaric tags for relative and absolute quantification (iTRAQ) were used for E. coli cell lysates, and two-plex iTRAQ was used for human cell lysates. Briefly, the E. coli cell lysate analysis was used as the no-interference model, and the mixed E. coli and human cell lysate analysis was used as the interference model to evaluate quantification accuracy. Numbers above the bars indicate the injected protein amount (µg) in this study.


(B) The workflow for assigning interference mixture MS/MS spectra. See Methods (Proteome data analysis for protein quantification (RiMS) and bioinformatics) for details. XIC: extracted ion chromatogram.
(C) The distribution of the interference ion numbers in the identified MS/MS spectra is shown for the E. coli cell lysate analysis (E.coli, no interference) and the E. coli and human cell lysate analysis (E.coli+H, with interference). Error bars indicate standard deviations.

[Figure 2 image. (A) Observed reporter ion ratios (114/115, 114/116, 114/117, 116/115, 116/117 and 117/115) for All Spectra and RiMS in the E.coli, E.coli+H and E.coli+H 1/3 analyses. (B) Least squares value for the E.coli and E.coli+H analyses; legend entries: 3≦interference, 2≦interference, 1≦interference, All Spectra, RiMS. Asterisks (*, **) mark significance.]

Figure 2 Observed ratios for All Spectra and RiMS
(A) The six observed ratios are shown using All Spectra and RiMS for the analysis of E.coli, E.coli+H and E.coli+H 1/3 (simple 1D SDS-PAGE gel fractionation of the human protein sample into three fractions). The expected ratios were 10 (114/115), 2 (114/116), 5 (114/117), 5 (116/115), 2.5 (116/117), and 2 (117/115) (solid red lines). For the calculations, reporter ions were required to be present in all four channels (signal intensity ≧ 200). For the E.coli+H analysis, we observed better quantification accuracy by RiMS for the ratios 114/115, 116/115 and 117/115. Error bars indicate standard deviations. (*p