
Article

High-Confidence De Novo Peptide Sequencing Using Positive Charge Derivatization and Tandem MS Spectra Merging Mingrui An, Xiao Zou, Qingsong Wang, Xuyang Zhao, Jing Wu, LiMing Xu, Hong-yan Shen, Xueyuan Xiao, Dacheng He, and Jianguo Ji Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/ac4001699 • Publication Date (Web): 28 Mar 2013 Downloaded from http://pubs.acs.org on April 1, 2013


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


High-Confidence De Novo Peptide Sequencing Using Positive Charge Derivatization and Tandem MS Spectra Merging

Mingrui An a,‡, Xiao Zou a,‡, Qingsong Wang a, Xuyang Zhao a, Jing Wu a, Li-Ming Xu a, Hong-Yan Shen a, Xueyuan Xiao b, Dacheng He b and Jianguo Ji a,*

a State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China

b Key Laboratory for Cell Proliferation and Regulation Biology, Ministry of Education, Beijing Normal University, Beijing 100875, China

Author email address Mingrui An; [email protected] Xiao Zou; [email protected] Qingsong Wang; [email protected] Xuyang Zhao; [email protected] Jing Wu; [email protected] Li-Ming Xu; [email protected] Hong-Yan Shen; [email protected] Xueyuan Xiao; [email protected] Dacheng He; [email protected] Jianguo Ji; [email protected]

* To whom correspondence should be addressed. P. O. Box 31, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China. Phone: (86)-10-62755470. Fax: (86)-10-62751526. E-mail: [email protected].

‡ These authors contributed equally to this work.


Key words: CID, De novo peptide sequencing, ETD, SCX, TMPP-Ac-OSu


Abstract

De novo peptide sequencing holds great promise for discovering new protein sequences and modifications, but it has often been hindered by the low success rate of mass spectra interpretation, mainly because of the diversity of fragment ion types and the insufficient information in each ion series. Here, we describe a novel methodology that combines highly efficient on-tip charge derivatization with tandem MS spectra merging, which greatly boosts the performance of interpretation. TMPP-Ac-OSu (succinimidyloxycarbonylmethyl tris(2,4,6-trimethoxyphenyl)phosphonium bromide) was used to derivatize peptides at their N-termini on tips to reduce mass spectra complexity. Then, a novel spectra merging approach was adopted to combine the benefits of CID and ETD fragmentation. We applied this methodology to rat C6 glioma cells and Cyprinus carpio, and searched the resulting peptide sequences against the protein database. We obtained thousands of high-confidence peptides, a level that conventional de novo sequencing methods could not reach. Next, we identified dozens of novel peptide sequences by homology searching of sequences that had full backbone coverage but were unmatched in the database search. Furthermore, we randomly chose 34 sequences discovered in rat C6 cells and verified them. We conclude that this methodology, which combines on-tip positive charge derivatization and tandem MS spectra merging, will greatly facilitate the discovery of novel proteins and the proteome analysis of non-model organisms.


Introduction

The de novo peptide sequencing strategy derives peptide sequences directly from the mass differences between fragment ions in tandem MS spectra,1,2 which distinguishes it from the routine database search strategy that retrieves all candidate peptide sequences from a specified protein sequence database and scores each peptide-spectrum match to find the most probable hit.3 Hence, the routine database search strategy is of limited use for identifying proteins from species without databases (unsequenced genomes) or unreported protein variants, even from a genome-sequenced organism. In contrast, the de novo peptide sequencing strategy can derive peptide sequences from any organism. Once novel peptide sequences are obtained, homology searching with BLAST (Basic Local Alignment Search Tool, which finds regions of similarity between biological sequences) can be used for protein identification.

De novo peptide sequencing holds great potential for the discovery of new protein sequences and modifications,4,5 but its wide application has been limited by the low success rate of interpretation,6 which is mainly caused by insufficient fragmentation information and the overlap of different ion series (N-terminal and C-terminal ions) in tandem MS spectra.7,8 Here, we aim to obtain nearly exclusive b ions by using peptide N-terminal positive charge derivatization9 and more comprehensive sequence information by exploiting the complementarity of CID and ETD fragmentation.

TMPP-Ac-OSu (succinimidyloxycarbonylmethyl tris(2,4,6-trimethoxyphenyl)phosphonium bromide) is a positive charge derivatization reagent that derivatizes peptides in a simple, mild and N-terminus-specific manner. This derivatization adds a hydrophobic and positively charged moiety at the N-termini of peptides; it thus


greatly increases peptide ionization and simplifies the tandem MS spectra.10,11 However, one drawback is that the reaction generates a considerable amount of by-products (dominated by TMPP-Ac-OH), which hamper its application to minute amounts of sample (such as less than 1 pmol), a frequent case in proteomics studies.12-14 Here, we demonstrate an on-tip approach in which complex or minute peptide samples are concentrated, derivatized and purified on homemade C8-SCX-C8 StageTips.15 This drastically reduces the formation of side products and alleviates the loss of TMPP-Ac-derivatized peptides, allowing complex or minute samples to be successfully tagged and analyzed.

The doubly charged TMPP-Ac-peptides produce “cleaner” CID spectra with a dominant b1+ ion series.16 We found that they also produce straightforward ETD spectra with an exclusive c1+ ion series accompanied by several known background ions. For most peptides, CID spectra or ETD spectra alone do not contain sufficient fragmentation information for de novo sequencing.17 We transformed the c1+ ion series in ETD spectra into virtual b1+ ion series and inserted them into the corresponding CID spectra using an in-house computer program. Thus the complementarity of CID and ETD spectra was fully exploited, which resulted in the successful interpretation of a substantial number of peptide sequences that could not be correctly interpreted from CID or ETD spectra alone.

We applied this methodology to rat C6 glioma cells and Cyprinus carpio. Briefly, the tryptic peptide samples were tagged with the TMPP-Ac moiety and analyzed by nanoLC-MS/MS. The resulting CID and ETD spectra were merged and de novo sequenced using the commercial software PEAKS Studio.18-20 To assess the statistical significance of this methodology, the peptide sequences were searched against the database


by PEAKS search; thousands of high-confidence peptides (FPR < 1%) were identified, whereas none was identified without positive charge derivatization or mass spectra merging. Interestingly, hundreds of reliable peptide sequences remained unmatched in the database search. A subsequent BLAST step indicated that some of these peptide sequences were homologous with known ones, while others showed no homology at all. We also chose and verified 34 such peptide sequences discovered in rat C6 cells.
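The principle stated at the start of the Introduction, reading residues from the mass differences between consecutive fragment ions, can be illustrated with a minimal sketch. It assumes an idealized, complete ladder of singly charged N-terminal ions; the function names and the example starting m/z are hypothetical, the residue masses are standard monoisotopic values, and the 0.015 Da tolerance is borrowed from the fragment ion tolerance given in Materials and Methods. It is not the algorithm used by PEAKS Studio.

```python
# Standard monoisotopic residue masses in Da. Leu and Ile are isobaric,
# so only "L" is listed (the paper likewise reports both as Leu).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def ladder_from_sequence(sequence, start_mz):
    """Build an idealized ladder of singly charged N-terminal ion m/z values.

    start_mz is a hypothetical m/z for the ion preceding the first residue.
    """
    mz = start_mz
    ladder = [mz]
    for residue in sequence:
        mz += RESIDUE_MASS[residue]
        ladder.append(mz)
    return ladder


def read_ladder(peaks, tol=0.015):
    """Translate gaps between consecutive peaks into residues ('?' if unclear)."""
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        residues.append(hits[0] if len(hits) == 1 else "?")
    return "".join(residues)


ladder = ladder_from_sequence("SAMPLE", 600.0)
print(read_ladder(ladder))
```

With a 0.015 Da tolerance, Lys and Gln (which differ by about 0.036 Da) are never confused in this table, whereas a gap that matches no residue is reported as `?`. Real spectra have missing and spurious peaks, which is why a complete, single ion series matters so much for interpretation.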


Materials and Methods

Materials

TMPP-Ac-OSu (succinimidyloxycarbonylmethyl tris(2,4,6-trimethoxyphenyl)phosphonium bromide) was from Sigma (St. Louis, MO, USA). The rat C6 glioma cell line (CCL-107) was from the American Type Culture Collection (Rockville, MD, USA). C8 and SCX beads were from Agela Technologies Inc. (Newark, USA). Water used in all experiments was purified with a MilliQ purification system (Millipore, Bedford, MA). All other chemicals were purchased from commercial sources and were of analytical grade.

Extraction, separation and digestion of proteins

Proteins (50 μg) from the rat C6 glioma cell lysate or Cyprinus carpio were extracted21 and separated by one-dimensional (1D) SDS-PAGE. The entire protein gel lane was excised and cut into 24 or 32 slices. Proteins were reduced with 10 mM DTT (56 °C, 40 min) and alkylated at cysteine with 25 mM iodoacetamide (in the dark, 25 °C, 30 min), followed by digestion with trypsin (Promega) at a ratio of about 50 : 1 (37 °C, 8 h).

Preparation of C8-SCX-C8 StageTips

C8-SCX-C8 StageTips were made as follows. First, a small portion of C8 Empore material was placed in an Eppendorf pipette tip as previously described.15 Then C8 beads (5 μm, 300 Å) suspended in methanol were added to the tip and packed onto the Empore material by centrifugation at 1 000 × g for 5 min to form a C8 bead layer. The SCX (5 μm, 300 Å) layer and another C8 bead layer were then formed in the same way. All subsequent on-tip equilibration, wash and elution steps used the same centrifugation parameters.


MALDI-TOF mass spectrometry and database search

MALDI-TOF MS analysis was performed on an Ultraflex MALDI-TOF/TOF mass spectrometer (Bruker Daltonics, Germany) equipped with a nitrogen UV laser (337 nm) under the control of FlexControl™ 2.2 software. The TOF MS spectra were recorded in the positive ion reflector mode and analyzed with FlexAnalysis™ 2.2 and Biotools™ 2.2. Protein identification was performed by peptide mass fingerprinting (PMF) using the Mascot database search engine (Matrix Science, UK; version 2.2.2). Matrix peaks (such as 833.090, 855.072, 871.046, 907.994, 1060.088, 1082.070 and 1098.044) and porcine trypsin autolysis peaks (such as 842.510, 870.541, 1045.564, 2211.105, 2225.120, 2283.181 and 3337.758) were removed within a window of ± 50 ppm. A custom monoisotopic modification named TMPP-Ac (N-term) (m/z 573.18841) was added to the modifications file of Mascot to accommodate the TMPP-Ac derivatization of the peptide N-terminus. The search parameters were set as follows: Database, Swiss-Prot; Taxonomy, Mammalia; Enzyme, Trypsin; Fixed modifications, Carbamidomethyl (C); Variable modifications, Oxidation (M) and TMPP-Ac (N-term); no restrictions on protein mass; up to one missed cleavage allowed. For PMF, the peptide tolerance was set at ± 50 ppm. Protein scores greater than 61 were considered significant (p ≤ 0.05).

On-tip positive charge derivatization of tryptic peptides

After being washed with 45% acetonitrile, the first C8 layer of the StageTip was equilibrated with 1% TFA and loaded with the mixture of tryptic peptides and the derivatization reagent at a ratio of 1:2 (w/w) (in 1% TFA). The buffer on the C8 layer was then exchanged with 20 mM NaHCO3 (pH 8.2) to trigger the reaction. After incubation at room temperature for 2 h, the C8 layer was washed with 1% TFA, followed by elution with SCX buffer A (5


mM KH2PO4, 45% acetonitrile, pH 3.0), whereupon the TMPP-Ac-peptides bound to the SCX beads while the by-products (mainly TMPP-Ac-OH) ran off. After removal of the by-products, the TMPP-Ac-peptides were eluted with SCX buffer B (5 mM KH2PO4, 500 mM KCl, 5% acetonitrile, pH 3.0) and loaded onto the second C8 layer. The products were desalted with 1% TFA and eluted with 45% acetonitrile.

NanoLC-ESI mass spectrometry

The TMPP-Ac-derivatized tryptic peptides were separated using a NanoLC system (Micro-Tech Scientific, Vista, USA). The C8 needle column was prepared using a fused-silica capillary pulled with a P-2000 laser puller (Sutter Instruments Co., Novato, USA) as described before22 and packed with 10 cm of C8 beads (5 μm, 300 Å). The samples were eluted using a 6 min linear gradient from 5% to 17% acetonitrile and a 75 min linear gradient from 17% to 25% acetonitrile in 0.1% formic acid at a constant flow rate of 500 nL/min. Mass spectra were recorded on an LTQ-Orbitrap-ETD XL (Thermo Fisher Scientific Inc., Waltham, USA) mass spectrometer. The capillary temperature and the spray voltage were maintained at 200 °C and 2.0 kV, and the tube lens was set to 250 (m/z tuned for 1296). Data were acquired in a data-dependent mode: the three strongest peaks of each MS acquisition were selected for subsequent MS/MS analysis, and for every selected peak a CID scan (collision energy 35%) was performed, followed by an ETcaD23 scan with a maximum reagent ion injection time of 150 ms and a reaction time of 100 ms. The survey full-scan MS spectra (from m/z 300 to 2000) and the fragment ion spectra were both acquired in the Orbitrap; the tube lens was automatically set to ‘250’.

MS spectra merging


First, the ‘.raw’ files were converted to ‘.mzXML’ files and then to ‘.mgf’ files using the programs MzXML2Search and ReAdW (Thermo Fisher Scientific Inc., Waltham, USA). These ‘.mgf’ files were processed by the in-house program TEMC (Transforming ETD spectra and Merging them into CID spectra) (Supporting Note 1), which works according to the following principles.

(1) On ETD spectra, peaks were removed if their m/z was less than 646, larger than 2 × parent peptide2+ - 128, or within any of the zones between ‘background peak’ - 3.015 and ‘background peak’ + 3.015. The background peaks were parent peptide2+, 2 × parent peptide2+ - 168.0781 and 2 × parent peptide2+ - 533.1935.

(2) The ETD spectra then contained only the c1+ ion series; 17.02655 was subtracted from the masses of all peaks to form a virtual b1+ ion series.

(3) The intensity of all peaks in the ETD spectra was set equal to that of the strongest peak in the corresponding CID spectrum.

(4) The resulting ETD spectra were merged into the corresponding CID spectra to form new ‘.mgf’ files.

De novo peptide sequencing

The ‘.mgf’ files were de novo sequenced using the commercial software PEAKS Studio (version 5.2). The processing steps were ‘Data Files Adding’, ‘Data Refine’ and ‘De Novo’. A custom monoisotopic modification named TMPP-Ac (N-term) (573.18841) was set to accommodate TMPP-Ac derivatization. The parameters of ‘Data Refine’ were set as follows: Correct Precursor Mass, Correct Precursor Charge States (Min charge = 2, Max charge = 2) and Filter Scans (Precursor mass between 1220.0 and 5000.0 Da) were all chosen; Data Preprocess was ‘yes’. The parameters of ‘De Novo’ were set as follows: Error Tolerance for Parent ion was 10 ppm and for Fragment ion was 0.015 Da; Enzyme, Trypsin; Fixed modifications, Carbamidomethyl (C) and TMPP-Ac (N-term); Variable


modifications, Oxidation (M); up to three variable modifications per peptide were accepted; up to one candidate per spectrum was reported.

PEAKS search

The interpreted peptides were searched against the protein database by PEAKS search, using PEAKS Studio (version 5.2), to evaluate their confidence. We aimed to retain peptides with an FPR of less than 1% by applying a filtering threshold, as in the database search approach based on tandem MS spectra. The parameters were set as follows: Error Tolerance for Parent ion was 10 ppm and for Fragment ion was 0.015 Da; Enzyme, Trypsin; up to two missed cleavages per peptide were allowed; Fixed modifications, Carbamidomethyl (C) and TMPP-Ac (N-term); Variable modifications, Oxidation (M); up to three variable modifications per peptide were allowed; Database, the rat International Protein Index (IPI) (the latest version 3.87, 39,925 protein entries, downloaded February 2013) or a homemade fish database (272,530 protein entries, downloaded March 2013) generated by extraction from the National Center for Biotechnology Information (NCBI) web site (http://www.ncbi.nlm.nih.gov/protein); validation with a reverse database as decoy was selected.

MS Blast

Peptides containing a full b1+ ion series but unmatched in PEAKS search were selected for similarity searching using the MS Blast algorithm. Searches were performed at the Heidelberg

server (http://dove.embl-heidelberg.de/Blast2/msblast.html) using the following parameters: Database, non-redundant database (nrdb); unique peptides, one; score table, a hundred. Results were considered reliable if the scores were higher than the threshold score (marked in red).
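The decoy-based validation selected in the PEAKS search step, together with the goal of keeping peptides with an FPR below 1%, can be illustrated with a generic target-decoy sketch. This is not the PEAKS implementation: the function name is hypothetical, and the sketch assumes that the FPR at a given score cutoff is estimated as the number of decoy (reverse database) hits divided by the number of target hits at or above that cutoff.

```python
def score_threshold(matches, max_fpr=0.01):
    """Return the lowest score cutoff whose estimated FPR is <= max_fpr.

    matches: iterable of (score, is_decoy) pairs, one per peptide match.
    Assumption: FPR is estimated as decoy hits / target hits at or above
    the cutoff. Returns None if no cutoff satisfies the limit.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    for index, (score, is_decoy) in enumerate(ordered):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only after all matches sharing this score are counted.
        if index + 1 < len(ordered) and ordered[index + 1][0] == score:
            continue
        if targets and decoys / targets <= max_fpr:
            best = score
    return best


# Hypothetical scores: 100 target hits and 3 decoy hits.
example = [(100.0 + i, False) for i in range(100)]
example += [(120.5, True), (110.5, True), (105.5, True)]
print(score_threshold(example, max_fpr=0.01))
```

Peptides scoring at or above the returned cutoff would be the ones called high-confidence under this assumed definition; the Peptide Score thresholds reported in the Results come from PEAKS itself, not from this sketch.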


Results

On-tip derivatization and cleanup

TMPP-Ac-OSu has attracted more interest than other positive charge derivatization reagents.24,25 However, the large amount of by-products formed during traditional in-solution derivatization12,26 seriously interferes with reversed-phase (RP) chromatographic separation and mass spectrometric detection (Figure 1a), thus limiting its application to the analysis of minute amounts of sample. An SCX separation column has been used to remove side products;12 however, it is inconvenient, and it is hard to provide the proper weight ratio of SCX beads to peptide sample (20 – 80 according to our experiments), resulting in considerable sample loss. To remove by-products effectively with less sample loss, we developed a derivatization and purification method that exploits StageTips (Figure 2). With this procedure, we no longer observed by-product contamination (Figure 1b). The TMPP-Ac tag dramatically increased the hydrophobicity of the derivatized peptides, which deteriorated their chromatographic resolution on a C18 column. We separated

TMPP-Ac-peptides on a C8 column, which provided excellent chromatographic resolution (Figure 1c).

Evaluating the significance of derivatization

Regardless of charge state, native peptides typically produce complex CID spectra owing to the overlap of different ion series (Figure S1a), whereas the doubly charged TMPP-Ac-peptides produce simplified CID spectra (Figure S1b), because the ranges of N-terminal and C-terminal ions overlap little and the intensities of N-terminal ions are


enhanced.27 In addition, derivatized N-terminal ions avoid the low-mass cutoff in CID spectra because of the additional mass of the TMPP-Ac tag. After the tube lens was optimized for high mass (tuning 1221), doubly charged peptides increased to more than 90% of the parent ion population (Table S1). Owing to the positive charge derivatization, peptide intensities were not significantly compromised.

Next, we set out to demonstrate the statistical significance of TMPP-Ac derivatization on complex samples using rat C6 glioma cell lysate. After SDS-PAGE separation and in-gel tryptic digestion, peptides were extracted and divided equally into two aliquots. One aliquot was analyzed directly by LC-MS/MS as a negative control, while the other was derivatized using the on-tip method before MS analysis. The same mass spectrometry parameters were used: both the full-scan MS spectra and the CID spectra were collected in the Orbitrap, so that amino acid residues of similar masses (such as Phe and oxidized Met, or Lys and Gln) could be discriminated. The results were de novo sequenced by PEAKS Studio, in which Leu and Ile residues were both arbitrarily designated as Leu. The interpreted peptide sequences were searched against the database using PEAKS search. From the control sample, only 469 peptides (FPR range: 2.73% ~ 2.98%) (Figure S2a) were identified, none of which was high-confidence (FPR < 1%) (Figure S2b, Table 1). From the derivatized sample, however, 3801 peptides (FPR range: 0.36% ~ 3.01%) (Figure S3a) were identified, of which 2879 were high-confidence (Figure S3b).

Evaluating the significance of tandem MS spectra merging


The doubly charged TMPP-Ac-peptides produce simplified CID and ETD spectra (Figure 3). However, in most cases, the CID spectra present an incomplete b1+ ion series together with other ion types or noise ions, and the ETD spectra likewise present an incomplete c1+ ion series. It is well established that CID and ETD (or ECD) spectra belonging to the same precursor can be paired to obtain more fragmentation information.28 A few algorithms have been proposed to achieve this goal.2,29 Here, tandem MS spectra of both fragmentation types present mainly singly charged N-terminal ions owing to the positive charge derivatization; hence, we developed a program, TEMC, to merge the ETD spectra into the corresponding CID spectra before de novo sequencing. The resulting ETD & CID merged spectrum looks like a CID spectrum but contains a more straightforward and comprehensive b1+ ion series.

A large number of peptides that could not be interpreted using only their CID or ETD spectra were successfully interpreted using the merged spectra. For example, when the CID spectrum (Figure 4a) of the peptide TMPP-Ac-HADICTLPD was de novo sequenced by PEAKS Studio, the resulting sequence (Figure 4b) was incorrect because of insufficient fragmentation information. The ETD spectrum (Figure 4c) of this peptide gave an even worse de novo result (Figure 4d). Even with more mature software, the sequence could hardly be interpreted correctly from either spectrum alone, again because of the insufficient fragmentation information. When the CID spectrum and the ETD spectrum were merged (Figure 4e), comprehensive fragmentation information was obtained. Although the intensities of the virtual b1+ ions in merged spectra are changed by the refinement process, which optimizes spectra for the subsequent de novo peptide sequencing, the interpreted peptide sequence is the same as expected (Figure 4f).
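The four TEMC principles listed under "MS spectra merging" in Materials and Methods can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' program (which is given in Supporting Note 1): spectra are represented as lists of (m/z, intensity) tuples rather than ‘.mgf’ files, the precursor m/z is taken to be that of the doubly charged parent peptide, the exclusion windows are treated as inclusive, and all function and constant names are hypothetical.

```python
NH3_SHIFT = 17.02655   # subtracted from a c ion to give a virtual b ion (principle 2)
LOW_MZ_CUTOFF = 646.0  # ETD peaks below this m/z are discarded (principle 1)
HALF_WINDOW = 3.015    # half-width of the zone around each background peak


def etd_to_virtual_b(etd_peaks, precursor_mz, intensity):
    """Filter an ETD spectrum and convert the remaining c ions to virtual b ions.

    precursor_mz: m/z of the doubly charged parent peptide (assumption).
    intensity: value assigned to every virtual b ion (principle 3).
    """
    upper = 2 * precursor_mz - 128
    background = (
        precursor_mz,
        2 * precursor_mz - 168.0781,
        2 * precursor_mz - 533.1935,
    )
    virtual = []
    for mz, _ in etd_peaks:
        if mz < LOW_MZ_CUTOFF or mz > upper:
            continue
        if any(abs(mz - peak) <= HALF_WINDOW for peak in background):
            continue
        virtual.append((mz - NH3_SHIFT, intensity))
    return virtual


def merge_spectra(cid_peaks, etd_peaks, precursor_mz):
    """Insert virtual b ions from the ETD spectrum into the CID spectrum (principle 4)."""
    strongest = max(intensity for _, intensity in cid_peaks)
    virtual = etd_to_virtual_b(etd_peaks, precursor_mz, strongest)
    return sorted(list(cid_peaks) + virtual)


# Hypothetical peaks for a precursor at m/z 800.0 (2+).
cid = [(700.0, 10.0), (900.0, 50.0)]
etd = [(500.0, 5.0), (801.0, 3.0), (950.0, 2.0), (1200.0, 7.0), (1430.0, 4.0), (1500.0, 1.0)]
for mz, intensity in merge_spectra(cid, etd, 800.0):
    print(round(mz, 5), intensity)
```

In this toy input only the ETD peaks at 950.0 and 1200.0 survive the filters; the others fall below the low-mass cutoff, above the upper limit, or inside a background zone. Giving every virtual b ion the intensity of the strongest CID peak reflects principle (3) and makes the de novo software weight the ETD-derived ions heavily.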


To estimate the statistical significance of tandem MS spectra merging, peptide sequences interpreted from the CID data and from the CID & ETD merged data were each searched against the database by PEAKS search. Peptides from C6 cells with a Peptide Score higher than 72.5 in the CID data (Figure S3a) or higher than 15.7 in the CID & ETD merged data (Figure S4a) are high-confidence; in the latter, as many as 3307 such peptides were obtained (Figure S4b and Table S2).

For Cyprinus carpio (a species with an unsequenced genome), spectra merging is even more necessary. After the database search by PEAKS search, no high-confidence peptide could be identified using the CID data alone (1971 peptides with FPR between 3.68% and 14.00%) (Figure S5), whereas 985 high-confidence peptides (Figure S6 and Table S3) were identified using the CID & ETD merged data (Table 1).

BLAST of peptides with full b1+ ion series

PEAKS search only identifies peptides whose sequences exist in the protein database, allowing for a couple of reversed amino acids. Homologous substitutions, such as Val to Leu or Asp to Glu, cannot be discovered. We used PEAKS search to estimate the significance of charge derivatization and MS spectra merging and to filter out known peptide sequences. As described above, more reliable sequences can be interpreted from tandem MS spectra with more comprehensive b1+ ion series. We found many peptide sequences that contained a full b1+ ion series but were unmatched in PEAKS search; we selected 100 such sequences each from the rat and Cyprinus carpio and identified them with the MS Blast30,31 algorithm.

According to the MS Blast results, we classified these peptide sequences into three categories. The first category contained sequences existing in protein databases (data not


shown). They were not identified by PEAKS search because of the hydrolysis of amides (i.e., Gln to Glu, Asn to Asp), the arbitrary designation of Ile as Leu and/or the insufficiency of protein entries in the IPI_3.87 database. The second category contains sequences that are homologous with known ones (Table 2 and Table S4). The third category contains sequences that were not identified by MS Blast (Table S5).

We speculate that the hydrolysis of amides may take place during the sample preparation steps (such as peptide extraction using 1% TFA). Cyprinus carpio has more sequences in the second category than the rat. We speculate that this is because Cyprinus carpio, as a non-model organism whose genome is not sequenced, has very few protein entries in the database, so more homologous sequences can be discovered. Sequences in the third category may be the most species-specific sequences. In this category, Cyprinus carpio (Figure S7) also has more sequences than the rat.

Although peptide sequences with full b1+ ion series are reliable in theory, we aimed to test whether these sequences were actually correct. Thirty-four such peptide sequences from C6 cells were randomly chosen (Figure S8) and chemically synthesized. First, we verified them by comparing the tandem MS spectra of the synthesized peptides (Figure S9) with those of the peptides from C6 cells. Afterwards, we raised and purified 6 antibodies against randomly chosen synthesized peptides in New Zealand rabbits, performed Western blotting of C6 cell protein and obtained positive results (Figure S10). We also added these 34 peptides to the protein database and searched the data of the underivatized C6 cell sample against it, finally identifying 12 peptides. These three experiments suggest that the reliable peptide sequences are correct and that our methodology is feasible and powerful.
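Two of the reasons given above for first-category sequences escaping an exact database match, amide hydrolysis (Gln to Glu, Asn to Asp) and the reporting of Ile as Leu, can be made concrete with a small comparison routine. This is only an illustrative sketch with a hypothetical function name; it is neither part of PEAKS search nor of MS Blast, and it ignores the third reason (missing database entries).

```python
# Pairs of (residue read by de novo sequencing, residue in the database entry)
# that are treated as equivalent. Deamidation raises the residue mass by about
# 0.984 Da (Gln 128.05858 -> Glu 129.04259), so a hydrolyzed Gln or Asn is read
# as Glu or Asp; Leu and Ile are isobaric and both are reported as Leu.
EQUIVALENT_PAIRS = {("L", "I"), ("E", "Q"), ("D", "N")}


def matches_database(de_novo, database):
    """True if de_novo equals database up to I/L ambiguity and deamidation."""
    if len(de_novo) != len(database):
        return False
    return all(
        observed == expected or (observed, expected) in EQUIVALENT_PAIRS
        for observed, expected in zip(de_novo, database)
    )


# Hypothetical sequences for illustration only.
print(matches_database("LEDK", "IQNK"))
print(matches_database("QK", "EK"))
```

The comparison is deliberately one-directional: an observed Glu may correspond to a database Gln, but an observed Gln is not accepted for a database Glu, because hydrolysis converts amides to acids and not the reverse.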


Discussion

Many other strategies, such as negative charge derivatization32,33 and Lys-N digestion,29,34 have been used to simplify ion series in tandem MS spectra. The negative charge derivatization strategy suffers from considerably reduced sensitivity and poorer compatibility with ESI mass spectrometry, because the negative charge tag reduces the ionization yield and induces in-source fragmentation.35 The Lys-N strategy is not limited by this difficulty and has been applied to real complex samples, but only a small proportion of peptides have been sequenced, because most peptides contain an additional non-N-terminal basic amino acid and therefore do not produce simplified tandem MS spectra at all.36 Since the problems of the negative charge derivatization and Lys-N digestion strategies are inherent and can hardly be overcome, the positive charge derivatization strategy is comparatively more promising.

The derivatization reaction was carried out on the C8 layer of StageTips rather than in solution37,38 in order to concentrate the peptides and the derivatization reagent. During peptide derivatization, TMPP-Ac-OSu is also hydrolyzed by water to form TMPP-Ac-OH (the by-product). We tested tryptic peptides from BSA at three different concentrations (1 mM, 0.1 mM, 0.01 mM), derivatized in solution at the same ratio of peptides to TMPP-Ac-OSu (w/w = 1:1). We found that higher concentrations of peptides and TMPP-Ac-OSu gave higher derivatization efficiency (Figure S11). In other words, at higher concentrations of peptides and TMPP-Ac-OSu, a lower ratio of TMPP-Ac-OSu to peptides suffices for complete derivatization, so fewer by-products are generated. Fewer by-products led to less

18

ACS Paragon Plus Environment

Page 18 of 33

Page 19 of 33

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

sample loss in the subsequent cleanup step on SCX layer, which efficiently removed the by-products (dominated by TMPP-Ac-OH) to help the mass spectrometric detection of TMPP-Ac derivatized peptides. Here, this methodology greatly increased the confidence of interpreted peptide sequences. Thus, thousands of high-confidence peptides from complex samples (rat C6 cells and the cyprinus carpio) were identified by PEAKS search and FPR filtration. Since the routine database search of tandem MS spectra meets many difficulties, such as peptide MS/MS homology (not sequence homology), typically resulting wrong identifications of proteins,28 the strategy of de novo peptide sequencing combined with database search of peptide sequences were urgently needed as an complement to provide more reliable results. In this strategy, it was vital to achieve more correct peptide sequences using this methodology at the former step. Furthermore, this methodology can be applied to handling PTMs, such as fragile phosphorylation (Figure S12). MS Blast (similarity search) is a powerful tool to discover homologous protein variants and localize novel or unusual PTMs, which can hardly be handled by routine database search of tandem MS spectra. In recent years, different gene models and proteome dynamics have aroused broad concern.39 Some algorithms, such as PepSplice,40 were used to match MS/MS spectra against the genome database to identify non-specific splicing peptides. Many novel proteins from genome regions that have no annotated protein-coding capacity have been reported. Here, peptides that were full backbone covered but unmatched in PEAKS search or MS Blast are the best candidates to verify those novel proteins.
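The notion of full backbone coverage used above can be illustrated with a short sketch. It is not the procedure used in this work; it simply computes singly charged b-ion m/z values from standard monoisotopic residue masses and reports which of them are matched by observed peaks. The names `b_ion_mzs` and `b_series_coverage`, the 0.5 Da tolerance, and the `n_term_shift` parameter (a placeholder for the mass contributed by an N-terminal tag; no TMPP-Ac value is assumed here) are all hypothetical.

```python
from typing import List, Sequence, Tuple

PROTON = 1.007276  # Da

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_mzs(peptide: str, n_term_shift: float = 0.0) -> List[float]:
    """Singly charged b-ion m/z values b1 .. b(n-1).

    n_term_shift is an assumed placeholder for any N-terminal modification mass.
    """
    mzs, total = [], PROTON + n_term_shift
    for aa in peptide[:-1]:  # the last residue does not yield a b ion
        total += RESIDUE_MASS[aa]
        mzs.append(total)
    return mzs


def b_series_coverage(
    peptide: str,
    peaks_mz: Sequence[float],
    tol: float = 0.5,
    n_term_shift: float = 0.0,
) -> Tuple[float, List[bool]]:
    """Fraction of theoretical b ions matched by a peak within tol (Da)."""
    theo = b_ion_mzs(peptide, n_term_shift)
    if not theo:
        return 0.0, []
    found = [any(abs(mz - t) <= tol for mz in peaks_mz) for t in theo]
    return sum(found) / len(theo), found
```

A sequence would count as fully backbone covered in this sketch when the returned fraction equals 1.0, that is, when every entry of the returned list is `True`.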

In the 18O-water enzymolysis strategy, N-linked glycosylation is usually localized according to the sites of Asn residues that are converted to Asp residues (41). The discovery of hydrolysis of asparagine and glutamine suggests that samples should be prepared very cautiously during enzymolysis and that possible false positive results should be screened out during data processing.

Conclusions

De novo peptide sequencing using on-tip positive charge derivatization combined with merging of CID and ETD tandem MS spectra is highly valuable for obtaining high-confidence peptide sequences. Because the on-tip method efficiently tagged the positive charge moiety (TMPP-Ac) onto peptide N-termini without the troublesome by-product (TMPP-Ac-OH) that dominates in the traditional derivatization procedure, peptides from a whole-proteome sample were positively charge-derivatized and analyzed by mass spectrometry successfully for the first time. The doubly charged TMPP-Ac-peptides produce simplified CID spectra with a dominant b1+ ion series and straightforward ETD spectra with an exclusive c1+ ion series. We optimized the mass spectrometry parameters to raise the percentage of doubly charged peptide ions to more than 90%, then transformed the c1+ ion series in ETD spectra into virtual b1+ ion series and merged them into the corresponding CID spectra using a new computer program, TEMC. The complementarity of CID and ETD spectra was thus fully exploited; a large number of peptide sequences that could not be correctly interpreted using CID or ETD spectra alone were successfully interpreted. We applied this methodology to rat C6 glioma cells and Cyprinus carpio (genome unsequenced), which enabled the identification of thousands of high-confidence peptides (false positive rate, FPR < 1%) by database search, a level that the traditional procedure without charge derivatization or spectra merging could not reach. In addition, many peptide sequences, both homologous and non-homologous with reported sequences, were discovered. We randomly selected and synthesized 35 peptides and verified them in several ways. We believe this methodology will have broad application in the discovery of novel proteins and the proteome analysis of non-model organisms.
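The c-to-b transformation and merging step summarized above can be sketched as follows. This is not the TEMC program, whose implementation is not described here; it is a minimal illustration that assumes singly charged fragments, in which case a c ion is heavier than the corresponding b ion by the mass of NH3 (17.02655 Da). The function names, the peak representation as (m/z, intensity) pairs, the 0.5 Da matching tolerance, and the choice to sum intensities of matching peaks are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)

NH3 = 17.026549  # Da, monoisotopic mass difference between c and b ions


def c_to_virtual_b(etd_peaks: Sequence[Peak]) -> List[Peak]:
    """Shift singly charged c-ion peaks down by NH3 to give virtual b-ion peaks."""
    return [(mz - NH3, intensity) for mz, intensity in etd_peaks]


def merge_spectra(
    cid_peaks: Sequence[Peak],
    etd_peaks: Sequence[Peak],
    tol: float = 0.5,
) -> List[Peak]:
    """Merge virtual b ions from ETD into a CID peak list.

    A virtual b ion within tol (Da) of an existing peak adds its intensity
    to that peak; otherwise it is appended as a new peak.
    """
    merged = [list(p) for p in sorted(cid_peaks)]
    for mz, intensity in c_to_virtual_b(etd_peaks):
        match = next((p for p in merged if abs(p[0] - mz) <= tol), None)
        if match is not None:
            match[1] += intensity
        else:
            merged.append([mz, intensity])
    return sorted((mz, intensity) for mz, intensity in merged)
```

In this sketch, a b ion that is missing from the CID spectrum but whose c-ion counterpart is present in the ETD spectrum appears in the merged peak list, which is how the two spectra can complement each other.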

Supporting information: Supplemental tandem MS spectra, lists of peptides and other data are provided in the supporting information. This material is available free of charge via the Internet at http://pubs.acs.org/.

Acknowledgments: This work was supported by grants from the National Natural Science Foundation of China (No. 31270872, 31200610, 30970652) and the National Key Basic Research Program of China (No. 2010CB912203, 2011CB915504). We thank Xishu Chen and Zishen Gao from Thermo Scientific for assistance in the application of the PEAKS software.

References:
(1) Ma, B.; Zhang, K. Z.; Hendrie, C.; Liang, C. Z.; Li, M.; Doherty-Kirby, A.; Lajoie, G. Rapid Commun. Mass Spectrom. 2003, 17, 2337-2342.
(2) Chi, H.; Chen, H.; He, K.; Wu, L.; Yang, B.; Sun, R. X.; Liu, J.; Zeng, W. F.; Song, C. Q.; He, S. M.; Dong, M. Q. J. Proteome Res. 2013, 12, 615-625.
(3) Dancik, V.; Addona, T. A.; Clauser, K. R.; Vath, J. E.; Pevzner, P. A. J. Comput. Biol. 1999, 6, 327-342.
(4) Jia, C.; Hui, L.; Cao, W.; Lietz, C. B.; Jiang, X.; Chen, R.; Catherman, A. D.; Thomas, P. M.; Ge, Y.; Kelleher, N. L.; Li, L. Mol. Cell. Proteomics 2012, 11, 1951-1964.
(5) Bhatia, S.; Kil, Y. J.; Ueberheide, B.; Chait, B. T.; Tayo, L.; Cruz, L.; Lu, B.; Yates, J. R., 3rd; Bern, M. J. Proteome Res. 2012, 11, 4191-4200.
(6) Allmer, J. Expert Rev. Proteomics 2011, 8, 645-657.
(7) Madsen, J. A.; Brodbelt, J. S. Anal. Chem. 2009, 81, 3645-3653.
(8) Chi, H.; Sun, R. X.; Yang, B.; Song, C. Q.; Wang, L. H.; Liu, C.; Fu, Y.; Yuan, Z. F.; Wang, H. P.; He, S. M.; Dong, M. Q. J. Proteome Res. 2010, 9, 2713-2724.
(9) Zaia, J.; Biemann, K. J. Am. Soc. Mass Spectrom. 1995, 6, 428-436.
(10) Huang, Z. H.; Wu, J.; Roth, K. D. W.; Yang, Y.; Gage, D. A.; Watson, J. T. Anal. Chem. 1997, 69, 137-144.
(11) Huang, Z. H.; Shen, T. L.; Wu, J. A.; Gage, D. A.; Watson, J. T. Anal. Biochem. 1999, 268, 305-317.
(12) Chen, W. B.; Lee, P. J.; Shion, H.; Ellor, N.; Gebler, J. C. Anal. Chem. 2007, 79, 1583-1590.
(13) An, M. R.; Dai, J. Q.; Wang, Q. S.; Tong, Y. P.; Ji, J. G. Rapid Commun. Mass Spectrom. 2010, 24, 1869-1874.
(14) Gallien, S.; Perrodou, E.; Carapito, C.; Deshayes, C.; Reyrat, J. M.; Van Dorsselaer, A.; Poch, O.; Schaeffer, C.; Lecompte, O. Genome Res. 2009, 19, 128-135.
(15) Rappsilber, J.; Mann, M.; Ishihama, Y. Nat. Protoc. 2007, 2, 1896-1906.
(16) Adamczyk, M.; Gebler, J. C.; Wu, J. Rapid Commun. Mass Spectrom. 1999, 13, 1413-1422.
(17) Coon, J. J. Anal. Chem. 2009, 81, 3208-3215.
(18) Wilson, J. J.; Brodbelt, J. S. Anal. Chem. 2006, 78, 6855-6862.
(19) Zhang, J.; Xin, L.; Shan, B.; Chen, W.; Xie, M.; Yuen, D.; Zhang, W.; Zhang, Z.; Lajoie, G. A.; Ma, B. Mol. Cell. Proteomics 2012, 11, M111.010587.
(20) Ma, B.; Johnson, R. Mol. Cell. Proteomics 2012, 11, O111.014902.
(21) Yang, W.; Liu, P.; Liu, Y. S.; Wang, Q. S.; Tong, Y. P.; Ji, J. G. Proteomics 2006, 6, 2982-2990.
(22) Gatlin, C. L.; Kleemann, G. R.; Hays, L. G.; Link, A. J.; Yates, J. R. Anal. Biochem. 1998, 263, 93-101.
(23) Swaney, D. L.; McAlister, G. C.; Wirtala, M.; Schwartz, J. C.; Syka, J. E. P.; Coon, J. J. Anal. Chem. 2007, 79, 477-485.
(24) Mirzaei, H.; Regnier, F. Anal. Chem. 2006, 78, 4175-4183.
(25) Roth, K. D. W.; Huang, Z. H.; Sadagopan, N.; Watson, J. T. Mass Spectrom. Rev. 1998, 17, 255-274.
(26) Regnier, F. E.; Julka, S. Proteomics 2006, 6, 3968-3979.
(27) Sadagopan, N.; Watson, J. T. J. Am. Soc. Mass Spectrom. 2000, 11, 107-119.
(28) Zubarev, R. A.; Zubarev, A. R.; Savitski, M. M. J. Am. Soc. Mass Spectrom. 2008, 19, 753-761.
(29) Altelaar, A. F.; Navarro, D.; Boekhorst, J.; van Breukelen, B.; Snel, B.; Mohammed, S.; Heck, A. J. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 407-412.
(30) Shevchenko, A.; Sunyaev, S.; Loboda, A.; Shevchenko, A.; Bork, P.; Ens, W.; Standing, K. G. Anal. Chem. 2001, 73, 1917-1926.
(31) Samyn, B.; Sergeant, K.; Carpentier, S.; Debyser, G.; Panis, B.; Swennen, R.; Van Beeumen, J. J. Proteome Res. 2007, 6, 70-80.
(32) Keough, T.; Youngquist, R. S.; Lacey, M. P. Proc. Natl. Acad. Sci. U. S. A. 1999, 96, 7131-7136.
(33) Zhou, C. X.; Zhang, Y. J.; Qin, P. B.; Liu, X.; Zhao, L. Y.; Yang, S. H.; Cai, Y.; Qian, X. H. Rapid Commun. Mass Spectrom. 2006, 20, 2878-2884.
(34) Boersema, P. J.; Taouatas, N.; Altelaar, A. F. M.; Gouw, J. W.; Ross, P. L.; Pappin, D. J.; Heck, A. J. R.; Mohammed, S. Mol. Cell. Proteomics 2009, 8, 650-660.
(35) Dai, J. Q.; Wang, J. L.; Zhang, Y. J.; Lu, Z.; Yang, B.; Li, X. H.; Cai, Y.; Qian, X. H. Anal. Chem. 2005, 77, 7594-7604.
(36) Taouatas, N.; Drugan, M. M.; Heck, A. J. R.; Mohammed, S. Nat. Methods 2008, 5, 405-407.
(37) Czeszak, X.; Morelle, W.; Ricart, G.; Tetaert, D.; Lemoine, J. Anal. Chem. 2004, 76, 4320-4324.
(38) Kuyama, H.; Sonomura, K.; Nishimura, O.; Tsunasawa, S. Anal. Biochem. 2008, 380, 291-296.
(39) Baerenfaller, K.; Grossmann, J.; Grobei, M. A.; Hull, R.; Hirsch-Hoffmann, M.; Yalovsky, S.; Zimmermann, P.; Grossniklaus, U.; Gruissem, W.; Baginsky, S. Science 2008, 320, 938-941.
(40) Roos, F. F.; Jacob, R.; Grossmann, J.; Fischer, B.; Buhmann, J. M.; Gruissem, W.; Baginsky, S.; Widmayer, P. Bioinformatics 2007, 23, 3016-3023.
(41) Kuster, B.; Mann, M. Anal. Chem. 1999, 71, 1431-1440.

Figure legends:

Figure 1. Comparison of total ion chromatograms (TIC) of TMPP-Ac-derivatized tryptic peptides from C6 cell lysate obtained with different approaches. Peptides were TMPP-Ac derivatized by the traditional in-solution method followed by C18 column separation (A), by the on-tip method followed by C18 column separation (B), and by the on-tip method followed by C8 column separation (C).

Figure 2. Workflow of the on-tip TMPP-Ac derivatization of peptides. After preparation and equilibration, tryptic peptides and the derivatization reagent are loaded onto the C8-SCX-C8 StageTips, concentrated, and triggered to react on the top C8 layer. The SCX layer efficiently removes by-products, and the bottom C8 layer completes the procedure by desalting.

Figure 3. CID and ETD spectra of doubly charged TMPP-Ac-peptides from rat C6 cells (A, B, C, D) and Cyprinus carpio (E, F, G, H). CID spectra present predominantly b1+ ions, along with a few ions of other types and background noise, whereas ETD spectra present exclusively c1+ ions with a few known background ions. These two types of tandem MS spectra are generally complementary and can be used to construct spectra with more complete ion series.

Figure 4. De novo peptide sequencing results of tandem MS spectra of peptide HADICTLPDTEK. The CID (A) and ETD (C) spectra are merged into a new spectrum (E) with the in-house program TEMC. The three MS/MS spectra are refined and de novo sequenced in PEAKS Studio software. The peptide sequence is incorrectly interpreted using the CID (B) or ETD (D) spectrum alone, whereas it is correctly interpreted using the merged spectrum, which presents a complete b1+ ion series (F).

Figure TOC. A novel methodology that combines highly efficient on-tip charge derivatization and tandem MS spectra merging for de novo peptide sequencing. By simplifying and merging CID and ETD spectra, we fully exploited the merits of the two fragmentation methods and obtained complete peptide sequence information.

Table 1. Overview of PEAKS database search results of peptides from rat C6 cells and Cyprinus carpio.

| Sample | Spectra type | Total number of identified peptides | Range of PEAKS score (%) | Range of false positive rate (FPR, %) | Number of high-confidence (FPR < 1%) peptides |
|---|---|---|---|---|---|
| Native peptides from the rat C6 cell | CID spectra | 469 | 5.5 ~ 98.7 | 2.73 ~ 2.98 | 0 |
| TMPP-Ac-peptides from the rat C6 cell | CID spectra | 3801 | 5.0 ~ 99.0 | 0.36 ~ 3.01 | 2879 |
| TMPP-Ac-peptides from the rat C6 cell | CID & ETD merged | 3533 | 5.0 ~ 99.1 | 0.37 ~ 1.31 | 3307 |
| TMPP-Ac-peptides from Cyprinus carpio | CID spectra | 1971 | 5.1 ~ 99.1 | 3.68 ~ 14.00 | 0 |
| TMPP-Ac-peptides from Cyprinus carpio | CID & ETD merged | 1702 | 5.0 ~ 99.1 | 0.73 ~ 5.57 | 985 |

Table 2. MS Blast results for rat peptide sequences whose homologous sequences and corresponding proteins were identified. For each substitution, the residue before the brackets is from de novo peptide sequencing and the residue in brackets is from the rat protein database.

Homologous with known sequences (rat C6 cells):

| Peptide sequence | Protein |
|---|---|
| ALAN(D)SLACEGK | stathmin-2 |
| DSYVGE(D)EAQSK | actin, cytoplasmic 2 |
| EE(D)QTEYLEER | heat shock protein 90 |
| FNVWDTAA(G)QEK | GTP-binding nuclear protein Ran |
| NLQYYE(D)LSAK | GTP-binding nuclear protein Ran |
| TPAQFDAE(D)ELR | annexin A1 |
| Y(F)EDENFLLK | peptidyl-prolyl cis-trans isomerase A |
