
Identification of alternative splice variants using unique tryptic peptide sequences for database searches Trung The Tran, Ravi Chand Bollineni, Margarita Strozynski, Christian J. Koehler, and Bernd Thiede J. Proteome Res., Just Accepted Manuscript • Publication Date (Web): 16 May 2017 Downloaded from http://pubs.acs.org on May 16, 2017


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Identification of alternative splice variants using unique tryptic peptide sequences for database searches

Trung T. Tran, Ravi C. Bollineni, Margarita Strozynski, Christian J. Koehler, Bernd Thiede*

Department of Biosciences, University of Oslo, Oslo, Norway

*To whom correspondence should be addressed: Bernd Thiede, Department of Biosciences, University of Oslo, P.O. Box 1066 Blindern, 0316 Oslo, Norway, Tel.: +47-22840533; E-mail: [email protected]

Abbreviations: ASV, alternative splice variant; FDR, false discovery rate; LC, liquid chromatography; MS, mass spectrometry; PSM, peptide spectrum match


Abstract

Alternative splicing is a mechanism in eukaryotes by which different forms of messenger RNAs (mRNAs) are generated from the same gene. Identification of alternative splice variants requires the identification of peptides specific for alternative splice forms. For this purpose, we generated a human database that contains only unique tryptic peptides specific for alternative splice forms from Swiss-Prot entries. This database provides easy access to splice variant-specific peptide sequences that match MS data. Furthermore, we combined this database, without the alternative splice variant 1-specific peptides, with human Swiss-Prot. This combined database can be used as a general database for searching LC-MS data. LC-MS data derived from in-solution digests of two different cell lines (LNCaP, HeLa) and from phosphoproteomics studies were analyzed using these two databases. Several non-alternative splice variant-1 specific peptides were found in both cell lines, and some of them seemed to be cell line-specific. Control and apoptotic phosphoproteomes from Jurkat T cells revealed several non-alternative splice variant-1 specific peptides, and some of them showed clear quantitative differences between the two states.

Keywords: alternative splicing, isoforms, mass spectrometry, phosphorylation, protein species, proteoform, proteomics


Introduction

Most proteins exist in different forms because of alternative splicing, polymorphisms, proteolytic processing, posttranslational modifications and other processes 1. To address the different protein forms, the terms protein species 2,3 or proteoforms 4 are most commonly used. Analysis of protein species/proteoforms can be performed by top-down and bottom-up approaches. Top-down mass spectrometry of undigested proteins has improved significantly in recent years 5. A landmark paper published in 2011 showed the identification of more than 3,000 protein species of around 1,000 proteins 6. Furthermore, tremendous progress has been achieved in the identification of different posttranslational modifications by bottom-up proteomics approaches, e.g., phosphorylations, acetylations, and glycosylations, to name a few 7. The same is true for the identification of proteolytic processing events, where approaches for the enrichment and identification of N-terminal and C-terminal peptides play an important role 8. In contrast to posttranslational modifications and proteolytic processing, publications about alternative splice variants using proteomics approaches are scarce despite the described importance of this process in cancer and other diseases 9-12. Approximately 100,000 alternative splice variants were estimated in humans, and several studies demonstrated functionally different alternative splice variants 13. In addition, an expansion of protein interaction capabilities by alternative splicing was described 14. Moreover, regulation of alternative splicing by phosphorylation has been reported as well 15,16.

However, analysis of alternative splice variants by proteomics approaches requires the identification of alternative splice variant-specific unique peptides. This goal is particularly difficult to achieve because enrichment methods do not exist. Nevertheless, proteomics has been applied to identify alternative splice variants in mouse models, and more than 600 distinct alternative splice variants were identified 17-19. In these reports, peptides were mapped against a modified ECgene database with all potential protein sequences of more than 10 million entries. In another publication, more than 10 million MS2 spectra were searched against a database (GenoMS-DB) derived from an in silico digestion of available data sources (Ensembl, Vega, Augustus), which led to the discovery of 10 novel protein-coding genes and 53 alternative splice forms 20. SpliceVista, a tool for splice variant identification and visualization, retrieves gene structure and translated sequences from the alternative splice databases EVDB and ECgene. This program maps MS2 spectra to splice variants and enabled the identification of 939 alternative splice variant-specific peptides in A431 cells; several splice variant differences were found between control and gefitinib-treated cells 21.

In this report, we aimed to facilitate the simple and straightforward analysis of alternative splice variants. For this purpose, we generated two databases which include unique tryptic peptides specific for alternative splice variants. The commonly used database of human Swiss-Prot, extended with tryptic peptides of other alternative splice variants, can be used as a general search database. The other database contains only tryptic peptides of all alternative splice variants and can be used to specifically study alternative splicing events.

Experimental Section

Materials

Acetonitrile (MS grade) was purchased from Burdick Jackson, Seelze, Germany. Acetone (HPLC grade), glycolic acid, glycerol, pyrrolidine, and water (HPLC grade) were purchased from VWR, Oslo, Norway. Ammonium bicarbonate, ammonium glutamate, ammonium hydroxide, bovine serum albumin (BSA), cisplatin, formic acid, β-glycerophosphate, HEPES, iodoacetamide, sodium orthovanadate, sodium pyrophosphate, trifluoroacetic acid, and TRIS were bought from Sigma-Aldrich, Oslo, Norway. 1,4-Dithiothreitol (DTT) and urea were purchased from Bio-Rad, Munich, Germany. RPMI-1640 and fetal calf serum were obtained from Life Technologies, Oslo, Norway.

Cell culture

LNCaP cell pellets were kindly provided by Fahri Saatcioglu (Department of Biosciences, University of Oslo). HeLa cells and Jurkat T-cell line E6 were cultured in RPMI-1640 supplemented with 10% fetal bovine serum and maintained in a humid incubator at 37°C in a 5% CO2 environment. For induction of apoptosis, 1 × 10^6/ml Jurkat T cells were treated with 60 µM cisplatin for 16 hours. The appearance of apoptotic morphology on the cell surface and the detection of poly (ADP-ribose) polymerase 1 (PARP-1) cleavage by immunoblot analysis confirmed the occurrence of apoptosis. In addition, Jurkat T cells were exposed to cisplatin for 0, 1, 2, 4, 6, 8, 16 and 24 h, and the stages of apoptosis were assayed with the MUSE™ annexin V & dead cell kit combined with laser-based fluorescence detection using a MUSE™ cell analyzer (Millipore, Oslo, Norway).

HeLa cell pellets were frozen and stored in liquid nitrogen. Pellets of cells were thawed on ice and 800 µl SILAC Phosphoprotein lysis buffer B (Invitrogen, Oslo, Norway) was added. The cell slurry was homogenized with a pestle (20x) for mechanical breakage of the cells, followed by sonication using an ultrasonic processor (UP400s, Dr. Hielscher). Samples were centrifuged at 16,000 × g for 20 minutes at 4°C in a Heraeus Biofuge pico (Kendro, Hanau, Germany) and the supernatant was divided into 40 µl aliquots. Jurkat T cells were lysed in lysis buffer (20 mM HEPES, pH 8.0, 9 M urea, 1 mM sodium orthovanadate, 2.5 mM sodium pyrophosphate and 1 mM β-glycerophosphate), sonicated and cleared by centrifugation. Protein concentration was determined by spectrophotometry with BSA as the standard. Four replicates, each containing 1.7 mg of protein, of cisplatin-treated and control cells were collected for subsequent experiments.

In-solution digestion

The cell lysates were precipitated with 6 volumes of acetone overnight, centrifuged at 13,000 rpm for 10 minutes and air dried. The protein pellet was dissolved in 200 µl of 6 M urea in 100 mM ammonium bicarbonate. Then, 10 µl of 200 mM DTT in 0.1 M Tris-HCl, pH 8, was added to the dissolved protein and the sample was incubated at 30°C for 30 minutes. Next, 30 µl of freshly prepared 200 mM iodoacetamide was added and the sample was incubated at room temperature in the dark for 1 hour. Subsequently, 40 µl of 200 mM DTT was added and the sample was incubated at 30°C for 30 minutes. The sample was then diluted with 960 µl of 50 mM ammonium bicarbonate before 17 µg of Trypsin/Lys-C Mix, Mass Spec Grade (Promega, Madison, WI, USA) was added, and it was incubated at 37°C for 1 hour and then at 30°C for 15 hours. The digestion was finally quenched by adding 20 µl formic acid (50%) and peptides were cleaned by SPE using a Strata C18-E cartridge (55 µm, 70 Å, Phenomenex, Værløse, Denmark).
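The dilution step of the digestion protocol can be checked with a short calculation. The following is a minimal sketch, not part of the original protocol description; it assumes that the listed volumes are simply additive and that the protein pellet contributes no volume. The function name `diluted_concentration` is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume_ul, added_volumes_ul):
    """Concentration after adding further volumes (C1 * V1 = C2 * V2)."""
    total_volume_ul = stock_volume_ul + sum(added_volumes_ul)
    return stock_conc * stock_volume_ul / total_volume_ul


# 200 µl of 6 M urea, then 10 µl DTT, 30 µl iodoacetamide, 40 µl DTT
# and 960 µl of 50 mM ammonium bicarbonate, as listed in the text.
# Assumption: volumes are additive and the pellet volume is negligible.
urea_final_molar = diluted_concentration(6.0, 200.0, [10.0, 30.0, 40.0, 960.0])
print(f"Urea concentration at digestion: {urea_final_molar:.2f} M")
```

Under these assumptions the urea concentration is a little below 1 M when the protease mix is added.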

Enrichment of phosphopeptides

Phosphopeptide analysis was performed as described 22. Dried tryptic peptide samples were dissolved in loading buffer (1 M glycolic acid, 6% trifluoroacetic acid, 5% glycerol, and 80% acetonitrile) under continuous shaking. TiO2 beads (Titansphere, TiO2, GL Sciences Inc, Japan) were washed in loading buffer three times before transferring them to the dissolved tryptic peptide samples. After one hour of continuous shaking, the supernatant was collected and transferred to a new tube containing freshly washed TiO2 beads for a secondary incubation. The TiO2 beads were collected separately and gently washed with 200 µL of loading buffer, 200 µL 80% acetonitrile/2% trifluoroacetic acid, 200 mM ammonium glutamate, and 200 µL of 50% acetonitrile/1% trifluoroacetic acid, respectively. The TiO2 beads were dried and bound peptides were eluted sequentially in 10 minutes, first with 50 µL of 10% ammonium hydroxide, pH 11.7, then with 50 µL of 15% ammonium hydroxide/60% acetonitrile, and finally with 50 µL of 1% pyrrolidine. Eluted peptides were acidified by adding 75 µL of 50% formic acid and cleaned up using ZipTip-C18 (Millipore, Billerica, MA, USA).

Liquid chromatography-mass spectrometry (LC-MS)

Three technical replicates of the HeLa and LNCaP cell lines and three biological replicates of the phosphoproteomes of control and cisplatin-induced apoptotic Jurkat T cells were analyzed by LC-MS. The peptide samples were analyzed using an Ultimate 3000 nano-UHPLC system (Dionex, Sunnyvale, CA, USA) connected to a Q Exactive mass spectrometer (ThermoElectron, Bremen, Germany) equipped with a nano electrospray ion source. For liquid chromatography separation, an Acclaim PepMap 100 column (C18, 3 µm beads, 100 Å, 75 µm inner diameter, 50 cm) (Dionex, Sunnyvale, CA, USA) was used. A flow rate of 300 nL/min was employed with a solvent gradient of 4-35% B in 207 min, to 50% B in 20 min and then to 80% B in 2 min. Solvent A was 0.1% formic acid and solvent B was 0.1% formic acid/90% acetonitrile.

The mass spectrometer was operated in the data-dependent mode to automatically switch between MS and MS/MS acquisition. Survey full scan MS spectra (from m/z 300 to 2000) were acquired with a resolution of R = 70,000 at m/z 200, after accumulation to a target of 1e6. The maximum allowed ion accumulation time was 100 ms. Ions with unassigned charge states and charge states 1 and > 7 were excluded from acquisition. The method allowed sequential isolation of up to the ten most intense ions, depending on signal intensity (intensity threshold 1.7e4), for fragmentation using higher-energy collisional dissociation (HCD) at a target value of 10,000 charges and a resolution of R = 17,500, with normalized collision energies (NCE) of 25 and 35 for the phosphoproteomics study and HCD 28 for the other cell lines. Target ions already selected for MS/MS were dynamically excluded for 60 sec. The isolation window was m/z = 2 without offset. The maximum allowed ion accumulation time for the MS/MS spectrum was 60 ms. For accurate mass measurements, the lock mass option was enabled in MS mode for internal recalibration during the analysis.

Data analysis

Data were acquired using Xcalibur v2.5.5 and raw files were processed to generate peak lists in Mascot generic format (*.mgf) using ProteoWizard release version 3.0.7230. Database searches were performed using Mascot in-house version 2.4.0 to search the canonical Swiss-Prot (Human, 20,187 sequences), canonical & isoform Swiss-Prot (Human, 42,144 sequences, 24,279,941 residues), Swiss-Prot-plus alternative splice variant ≠ 1 (Human, 44,471 sequences, 11,826,192 residues), and the alternative splice variant-specific database (Human, 74,259 sequences, 1,434,844 residues), assuming the digestion enzyme trypsin, a fragment ion mass tolerance of 0.05 Da, a parent ion tolerance of 10 ppm, and oxidation of methionines and acetylation of the protein N-terminus as variable modifications. The false discovery rate (FDR) was adjusted to 1% in Mascot using the canonical Swiss-Prot, canonical & isoform Swiss-Prot, and Swiss-Prot-plus alternative splice variant ≠ 1 databases. Because the alternative splice variant-specific database contained only single peptide entries, we used the identity threshold obtained with the canonical Swiss-Prot database for these searches. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD006026. Reviewer account details: Username: [email protected] Password: gGMXE5jx
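To illustrate the two mass tolerances stated in the search parameters, the following minimal sketch shows how a relative tolerance in ppm and an absolute tolerance in Da are typically evaluated. It is not Mascot's implementation; the function names and the m/z values in the example are hypothetical, and the tolerances are applied here directly to m/z values as a simplifying assumption.

```python
def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_parent_tolerance(observed, theoretical, tol_ppm=10.0):
    """Parent ion check with the 10 ppm tolerance given in the text."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_fragment_tolerance(observed, theoretical, tol_da=0.05):
    """Fragment ion check with the 0.05 Da tolerance given in the text."""
    return abs(observed - theoretical) <= tol_da


# Hypothetical values chosen only for illustration.
print(within_parent_tolerance(800.0060, 800.0000))    # 7.5 ppm deviation
print(within_parent_tolerance(800.0100, 800.0000))    # 12.5 ppm deviation
print(within_fragment_tolerance(500.03, 500.00))      # 0.03 Da deviation
```

A relative tolerance scales with the mass, so 10 ppm corresponds to 0.008 at a value of 800 but to 0.016 at a value of 1600, whereas the fragment tolerance stays fixed at 0.05 Da.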

Results and Discussion

Generation of unique peptide-specific search databases to identify alternative splice variants

The workflow to generate the two novel search databases for the analysis of alternative splice variants is presented with an example in Figure 1. First, the FASTA files of human Swiss-Prot (20,187 entries) and of the additional manually curated sequences of alternative splice variants (canonical & isoform; 42,144 entries) were downloaded from UniProt. As a note, we prefer the term alternative splice variant because the term isoform is also inappropriately used for other protein modifications. To reduce the dataset to proteins with alternative splice variants, proteins with identical accession numbers and entry names were selected. To unify the accession numbers, -1 was added to alternative splice variant 1. The proteins were digested in silico with trypsin without missed cleavage sites. The generated peptide sequences with at least eight amino acids were compared against the tryptic peptides of the whole human proteome. Identical sequences were removed so that only peptides unique to a single alternative splice variant remained. Because the isobaric amino acids isoleucine and leucine cannot be distinguished by mass spectrometry, sequences differing only by isoleucine/leucine were considered identical. Testing the N-terminal peptide with and without initiator methionine is hard coded in Mascot because the methionine is removed from many proteins by aminopeptidase. To circumvent this problem, a tryptic cleavage site must be added at the N-terminus of sequences starting with an N-terminal methionine; otherwise the remaining peptide without methionine would be considered in a database search as well. The resulting database with 74,259 sequences/1,434,844 residues can be used to search for all unique alternative splice variant-specific tryptic peptides (DB1 in Figure 1).

In addition, a combined database of human Swiss-Prot with unique alternative splice variant-specific tryptic peptides (without alternative splice variant 1) with 44,471 sequences/11,826,192 residues was generated (DB2 in Figure 1). These two databases are available as supplementary material (Supplementary files 1 and 2).
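The core of the workflow described above (in silico tryptic digestion without missed cleavages, a minimum length of eight residues, and isoleucine/leucine equivalence) can be sketched as follows. This is an illustrative sketch and not the script used to build the databases. It assumes cleavage after every K or R without a proline rule, compares peptides only among the sequences supplied rather than against the whole human proteome, and omits the handling of the N-terminal methionine. The accession numbers and sequences are invented toy data.

```python
from collections import defaultdict

MIN_LENGTH = 8  # minimum peptide length stated in the text


def tryptic_digest(sequence):
    """Cleave after every K or R, without missed cleavages.

    Assumption: no proline rule is applied.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def il_key(peptide):
    """Treat the isobaric residues I and L as identical."""
    return peptide.replace("I", "L")


def unique_variant_peptides(variants):
    """Return, per accession, the peptides that occur in no other entry."""
    owners = defaultdict(set)
    for accession, sequence in variants.items():
        for peptide in tryptic_digest(sequence):
            if len(peptide) >= MIN_LENGTH:
                owners[il_key(peptide)].add(accession)
    unique = defaultdict(list)
    for accession, sequence in variants.items():
        for peptide in tryptic_digest(sequence):
            long_enough = len(peptide) >= MIN_LENGTH
            if long_enough and owners[il_key(peptide)] == {accession}:
                if peptide not in unique[accession]:
                    unique[accession].append(peptide)
    return dict(unique)


# Toy data: two invented variants sharing the first peptide; the second
# peptides differ only by I/L and are therefore not unique.
toy_variants = {
    "X00000-1": "M" + "A" * 8 + "K" + "L" * 9 + "R" + "G" * 9 + "K",
    "X00000-2": "M" + "A" * 8 + "K" + "I" * 9 + "R" + "T" * 9 + "K",
}
toy_unique = unique_variant_peptides(toy_variants)
print(toy_unique)
```

In this toy case only the third peptide of each variant is retained, because the first is shared and the second differs only by isoleucine versus leucine.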


Identification of alternative splice variants in cell line proteomes

To compare the performance of the two novel databases against the canonical Swiss-Prot and canonical & isoform Swiss-Prot databases for the identification of alternative splice variants, we first used the LC-MS data of four biological replicates of in-solution digests of two frequently used cell lines, HeLa and LNCaP. The canonical human Swiss-Prot revealed on average 3,856 proteins with 30,357 peptide spectrum matches (PSMs) for HeLa and 3,141 proteins with 23,883 PSMs for LNCaP. Lower numbers were obtained for both cell lines using the database containing the human alternative splice variants (Swiss-Prot-isoform), with 3,844 proteins and 29,602 PSMs for HeLa and 3,120 proteins with 23,300 PSMs for LNCaP (Table 1). This result is most probably due to the increase of the search space (20,187 vs. 42,144 entries). Thus, using the database which includes the alternative splice variants is not of great use for a standard search.

Alternative splice variant 1-specific peptides are covered by a search using the canonical Swiss-Prot database. To cover other alternative splice variants as well, we created a database combining the canonical human Swiss-Prot database with additional alternative splice variant-specific unique peptides (DB2 in Figure 1). Using this database, several non-alternative splice variant-1 protein splice variants were identified in HeLa and LNCaP cells (Table 2). Nearly the same number of proteins was identified (3,856 for HeLa, and 3,141 and 3,135 for LNCaP), but additional non-alternative splice variant-1 tryptic peptides were found as well (43 for HeLa, and 25 for LNCaP) (Table 1). The total number of PSMs was also almost exactly the same (Table 1). Thus, a decrease in performance was not observed here.

Considering only non-alternative splice variant-1 unique tryptic peptides, Table 2 summarizes the peptides which were identified in at least two replicates, with three PSMs and a Mascot ion score above 30. Notably, we did not use 1% FDR for the isoform-specific database, because it contains only single peptide entries. The identity threshold obtained by Mascot using the canonical Swiss-Prot database was always below 18. However, after manual inspection of MS2 spectra, we concluded that a higher Mascot ion score for these single peptides is more reliable. With these parameters, 37 alternative splice variant-specific peptides of 34 protein alternative splice variants were detected. Of these, 16 non-alternative splice variant-1 specific peptides were identified in both cell lines, 17 only in HeLa cells and 4 only in LNCaP cells. The higher number of identified splice variant-specific peptides in HeLa cells is probably due to the fact that more PSMs were obtained with this cell line (Table 1). Although fewer PSMs were obtained overall in LNCaP cells, a few alternative splice variants were found only in this cell line, which indicates cell line-specific differences.

The other database was created to search only for unique alternative splice variant-specific tryptic peptides including alternative splice variant 1 (DB1 in Figure 1). Considering the number of identified alternative splice variant-specific unique tryptic peptides, 25 of 1,041 (2.4%) in HeLa and 13 of 648 (2.0%) in LNCaP were not alternative splice variant 1. Similar proportions were obtained considering the PSMs, with 33 of 1,731 (1.9%) for HeLa and 22 of 1,178 (1.9%) for LNCaP (Table 2). Thus, the number of identified non-alternative splice variant-1 splice variants was relatively low.

We also analyzed the co-existence of multiple alternative splice variants of single proteins. In HeLa and LNCaP cells, alternative splice variant 1 was detected for 9 of 38 non-alternative splice variant-1 specific tryptic peptides (Table 2). This result showed that several alternative splice variants of the same protein can coexist in cell lines.
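The acceptance criteria used for the peptide summary (at least two replicates, three PSMs, Mascot ion score above 30) can be expressed as a simple filter. The following is a minimal sketch under one possible reading of these criteria: only PSMs with an ion score above 30 are counted, and "three PSMs" is taken as at least three. The function name, the input layout and the peptide labels are hypothetical.

```python
from collections import defaultdict


def reliable_peptides(psms, min_score=30.0, min_replicates=2, min_psms=3):
    """Select peptides that pass the score, replicate and PSM criteria.

    psms: iterable of (peptide, replicate, ion_score) tuples.
    Assumption: PSMs at or below min_score are ignored entirely.
    """
    replicates = defaultdict(set)
    counts = defaultdict(int)
    for peptide, replicate, score in psms:
        if score > min_score:
            replicates[peptide].add(replicate)
            counts[peptide] += 1
    return sorted(
        peptide
        for peptide in counts
        if len(replicates[peptide]) >= min_replicates
        and counts[peptide] >= min_psms
    )


# Invented example: pep_A passes, pep_B is seen in one replicate only,
# pep_C has a single PSM above the score cutoff.
example_psms = [
    ("pep_A", 1, 45.0), ("pep_A", 2, 38.2), ("pep_A", 2, 31.0),
    ("pep_B", 1, 50.0), ("pep_B", 1, 42.0), ("pep_B", 1, 36.0),
    ("pep_C", 1, 25.0), ("pep_C", 2, 33.0), ("pep_C", 3, 29.9),
]
selected = reliable_peptides(example_psms)
print(selected)
```

A fixed score cutoff is used here instead of an FDR estimate, in line with the reasoning given in the text for a database of single peptide entries.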

Identification of alternative splice variants in phosphoproteomics data

To compare the performance of the two novel databases with the canonical Swiss-Prot and canonical & isoform Swiss-Prot databases for the identification of phosphorylated alternative splice variants, a large-scale phosphoproteomics study using two different collision energies for HCD (HCD 25 and 35) was performed to compare Jurkat T cells undergoing cisplatin-induced apoptosis against control cells as described 22. Using the canonical human Swiss-Prot database, on average 2,816 (cisplatin-induced apoptosis) and 3,204 (control) phosphoproteins with 8,904 and 10,704 PSMs, respectively, were identified using HCD 25. In contrast, only 1,835 phosphoproteins and 4,689 PSMs were found with the same datasets using the canonical & isoform Swiss-Prot database. Applying the database which we generated, containing the human alternative splice variants in addition to Swiss-Prot, revealed 2,629 and 3,035 phosphoproteins and 8,209 and 10,147 PSMs, respectively (Table 1). A similar trend but with lower numbers was observed with HCD 35. With our database, which includes non-alternative splice variant-1 tryptic peptides, a reduction of identified proteins and PSMs of about 7-8% compared with the standard Swiss-Prot database was observed, but it performed much better than the canonical & isoform Swiss-Prot database (Table 1). Whereas the number of identified proteins and PSMs changed only moderately between the three databases in the cell lines, the effect was much more pronounced with the phosphoproteomics datasets, in particular for the Swiss-Prot database which contains isoforms (Table 1). We assume that this observation is due to the different sizes of the search space and to the fact that only a few PSMs typically match the corresponding proteins in phosphoproteomics studies.

Considering only non-alternative splice variant-1 unique phosphopeptides, fourteen phosphopeptides of non-alternative splice variant-1 protein splice variants were detected with Mascot ion scores above 30 in at least three replicates and with three PSMs (Table 3). Considering the PSMs, eight of these phosphopeptides seemed to be unchanged between control and apoptotic cells. However, the alternative splice variant-specific phosphopeptides of elongation factor 1-delta (Isoform 3), serine/threonine-protein kinase TAO2 (Isoform 2), and zinc finger CCCH-type antiviral protein 1 were only detected in apoptotic cells; those of EF-hand calcium-binding protein domain-containing protein 4B (Isoform 2) and spectrin beta chain, non-erythrocytic 1 (Isoform 3) showed a more than three-fold difference in PSMs in apoptotic cells compared to control cells; and that of Rho guanine nucleotide exchange factor 12 (Isoform 2) was only found in control cells (Table 3, Figure 2).

Three of these proteins (elongation factor 1-delta, spectrin beta chain, non-erythrocytic 1, and zinc finger CCCH-type antiviral protein 1) have been found previously in studies of apoptotic cells, whereas the other three proteins (EF-hand calcium-binding protein domain-containing protein 4B, Rho guanine nucleotide exchange factor 12, and serine/threonine-protein kinase TAO2) have not been reported in the Cancer Proteomics database (cancerproteomics.uio.no), which inter alia summarizes published quantitative differences observed in proteomics studies of apoptosis 23-25.
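The PSM-based comparison between control and apoptotic cells described above can be sketched as a simple classification. This is an illustrative sketch only: the three-fold cutoff is taken from the text, whereas the function name, the category labels and the treatment of the fold change as symmetric (higher or lower in apoptotic cells) are assumptions.

```python
def classify_by_psms(control_psms, apoptotic_psms, fold=3.0):
    """Group a phosphopeptide by its PSM counts in the two states."""
    if control_psms == 0 and apoptotic_psms == 0:
        return "not detected"
    if control_psms == 0:
        return "apoptotic only"
    if apoptotic_psms == 0:
        return "control only"
    ratio = apoptotic_psms / control_psms
    # Assumption: a difference counts in either direction.
    if ratio > fold or ratio < 1.0 / fold:
        return "more than three-fold different"
    return "no clear difference"


# Invented PSM counts for illustration.
for control, apoptotic in [(0, 5), (4, 0), (2, 7), (5, 6)]:
    print(control, apoptotic, classify_by_psms(control, apoptotic))
```

PSM counts give only a coarse, semi-quantitative picture, so such a grouping is best read as a way to select candidates for closer inspection.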

Conclusions

In this report, we present two databases to identify alternative splice variants with alternative splice variant-specific unique tryptic peptides. These databases can be easily implemented in typical proteomics platforms, as shown here with the Mascot search engine. When the alternative splice variant-specific unique tryptic peptides were appended to the canonical database, the performance was almost identical to that of the canonical Swiss-Prot database for standard large-scale proteome analyses of cell lines. The database containing only alternative splice variant-specific tryptic peptides allows the specific study of alternative splice variants, including alternative splice variant 1. Notably, a requirement for the publication of alternative splice variants of proteins is that the peptide sequence and MS2 fragmentation spectrum must be provided, and both are easily accessible using this database.

References

(1) Schluter, H.; Apweiler, R.; Holzhutter, H. G.; Jungblut, P. R., Finding one's way in proteomics: a protein species nomenclature. Chem Cent J 2009, 3, 11.

(2) Jungblut, P.; Thiede, B.; Zimny-Arndt, U.; Muller, E. C.; Scheler, C.; Wittmann-Liebold, B.; Otto, A., Resolution power of two-dimensional electrophoresis and identification of proteins from gels. Electrophoresis 1996, 17, (5), 839-47.

(3) Jungblut, P. R.; Holzhutter, H. G.; Apweiler, R.; Schluter, H., The speciation of the proteome. Chem Cent J 2008, 2, 16.

(4) Smith, L. M.; Kelleher, N. L.; Consortium for Top Down, P., Proteoform: a single term describing protein complexity. Nat Meth 2013, 10, (3), 186-7.

(5) Catherman, A. D.; Skinner, O. S.; Kelleher, N. L., Top Down proteomics: facts and perspectives. Biochem Biophys Res Commun 2014, 445, (4), 683-93.

(6) Tran, J. C.; Zamdborg, L.; Ahlf, D. R.; Lee, J. E.; Catherman, A. D.; Durbin, K. R.; Tipton, J. D.; Vellaichamy, A.; Kellie, J. F.; Li, M.; Wu, C.; Sweet, S. M.; Early, B. P.; Siuti, N.; LeDuc, R. D.; Compton, P. D.; Thomas, P. M.; Kelleher, N. L., Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 2011, 480, (7376), 254-8.

(7) Olsen, J. V.; Mann, M., Status of large-scale analysis of post-translational modifications by mass spectrometry. Mol Cell Proteomics 2013, 12, (12), 3444-52.

(8) Fortelny, N.; Pavlidis, P.; Overall, C. M., The path of no return--Truncated protein N-termini and current ignorance of their genesis. Proteomics 2015, 15, (14), 2547-52.

(9) Dvinge, H.; Kim, E.; Abdel-Wahab, O.; Bradley, R. K., RNA splicing factors as oncoproteins and tumour suppressors. Nat Rev Cancer 2016, 16, (7), 413-30.

(10) Oltean, S.; Bates, D. O., Hallmarks of alternative splicing in cancer. Oncogene 2014, 33, (46), 5311-8.

(11) Chen, J.; Weiss, W. A., Alternative splicing in cancer: implications for biology and therapy. Oncogene 2015, 34, (1), 1-14.

(12) Lee, Y.; Rio, D. C., Mechanisms and Regulation of Alternative Pre-mRNA Splicing. Annu Rev Biochem 2015, 84, 291-323.

(13) Nilsen, T. W.; Graveley, B. R., Expansion of the eukaryotic proteome by alternative splicing. Nature 2010, 463, (7280), 457-63.

(14) Yang, X.; Coulombe-Huntington, J.; Kang, S.; Sheynkman, G. M.; Hao, T.; Richardson, A.; Sun, S.; Yang, F.; Shen, Y. A.; Murray, R. R.; Spirohn, K.; Begg, B. E.; Duran-Frigola, M.; MacWilliams, A.; Pevzner, S. J.; Zhong, Q.; Trigg, S. A.; Tam, S.; Ghamsari, L.; Sahni, N.; Yi, S.; Rodriguez, M. D.; Balcha, D.; Tan, G.; Costanzo, M.; Andrews, B.; Boone, C.; Zhou, X. J.; Salehi-Ashtiani, K.; Charloteaux, B.; Chen, A. A.; Calderwood, M. A.; Aloy, P.; Roth, F. P.; Hill, D. E.; Iakoucheva, L. M.; Xia, Y.; Vidal, M., Widespread Expansion of Protein Interaction Capabilities by Alternative Splicing. Cell 2016, 164, (4), 805-17.

(15) Stamm, S., Regulation of alternative splicing by reversible protein phosphorylation. J Biol Chem 2008, 283, (3), 1223-7.

(16) Naro, C.; Sette, C., Phosphorylation-mediated regulation of alternative splicing in cancer. Int J Cell Biol 2013, 2013, 151839.

(17) Menon, R.; Omenn, G. S., Proteomic characterization of novel alternative splice variant proteins in human epidermal growth factor receptor 2/neu-induced breast cancers. Cancer Res 2010, 70, (9), 3440-9.

(18) Menon, R.; Zhang, Q.; Zhang, Y.; Fermin, D.; Bardeesy, N.; DePinho, R. A.; Lu, C.; Hanash, S. M.; Omenn, G. S.; States, D. J., Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res 2009, 69, (1), 300-9.

(19) Omenn, G. S.; Menon, R.; Zhang, Y., Innovations in proteomic profiling of cancers: alternative splice variants as a new class of cancer biomarker candidates and bridging of proteomics with structural biology. J Proteomics 2013, 90, 28-37.

(20) Brosch, M.; Saunders, G. I.; Frankish, A.; Collins, M. O.; Yu, L.; Wright, J.; Verstraten, R.; Adams, D. J.; Harrow, J.; Choudhary, J. S.; Hubbard, T., Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and "resurrected" pseudogenes in the mouse genome. Genome Res 2011, 21, (5), 756-67.

(21) Zhu, Y.; Hultin-Rosenberg, L.; Forshed, J.; Branca, R. M.; Orre, L. M.; Lehtio, J., SpliceVista, a tool for splice variant identification and visualization in shotgun proteomics data. Mol Cell Proteomics 2014, 13, (6), 1552-62.

(22) Tran, T. T.; Strozynski, M.; Thiede, B., Quantitative phosphoproteome analysis of cisplatin-induced apoptosis in Jurkat T cells. Proteomics 2017.

(23)

Arntzen, M. O.; Bull, V. H.; Thiede, B., Cell death proteomics database:

consolidating proteomics data on cell death. Journal of proteome research 2013, 12, (5), 2206-13. (24)

Arntzen, M. O.; Thiede, B., ApoptoProteomics, an integrated database for

analysis of proteomics data obtained from apoptotic cells. Mol Cell Proteomics 2012, 11, (2), M111.010447. (25)

Arntzen, M. O.; Boddie, P.; Frick, R.; Koehler, C. J.; Thiede, B., Consolidation of

proteomics data in the Cancer Proteomics database. Proteomics 2015, 15, (22), 376571.

Acknowledgement: Financial support from the Norwegian Cancer Society (Project 4514636) and MLS@UiO is gratefully acknowledged.

Conflict of interest Disclosure: The authors declare no competing financial interest.


Table 1: Number of identified proteins and PSMs using different databases

The number of identified proteins (no. proteins) and PSMs are presented for different cell lines (HeLa, LNCaP, Jurkat). General protein analysis was performed for HeLa and LNCaP cell lines. Phosphoprotein analysis was applied to Jurkat T cells to compare cisplatin-induced apoptosis (CisPt) with untreated controls (Ctrl). Two different normalized collision energies (25 and 35, respectively) were used for HCD. The number in brackets is the number of additionally identified alternative splice variants (ASVs) other than alternative splice variant 1 (≠ 1).

| Sample | Database | No. proteins (≠ 1) | PSMs (≠ 1) |
|---|---|---|---|
| HeLa | Swiss-Prot | 3856 | 30357 |
| HeLa | Swiss-Prot-isoform | 3844 | 29602 |
| HeLa | Swiss-Prot-plus ASV ≠ 1 | 3856 (+43) | 30304 (+54) |
| HeLa | ASV-specific peptides | 1041 (+25) | 1721 (+33) |
| LNCaP | Swiss-Prot | 3141 | 23883 |
| LNCaP | Swiss-Prot-isoform | 3120 | 23300 |
| LNCaP | Swiss-Prot-plus ASV ≠ 1 | 3135 (+25) | 23841 (+40) |
| LNCaP | ASV-specific peptides | 648 (+13) | 1178 (+22) |
| Jurkat pSTY CisPt HCD25 | Swiss-Prot | 2816 | 8904 |
| Jurkat pSTY CisPt HCD25 | Swiss-Prot-isoform | 1835 | 4689 |
| Jurkat pSTY CisPt HCD25 | Swiss-Prot-plus ASV ≠ 1 | 2600 (+29) | 8137 (+72) |
| Jurkat pSTY CisPt HCD25 | ASV-specific phosphopeptides | 84 (+11) | 159 (+25) |
| Jurkat pSTY CisPt HCD35 | Swiss-Prot | 2498 | 5695 |
| Jurkat pSTY CisPt HCD35 | Swiss-Prot-isoform | 1445 | 4335 |
| Jurkat pSTY CisPt HCD35 | Swiss-Prot-plus ASV ≠ 1 | 2192 (+26) | 4990 (+72) |
| Jurkat pSTY CisPt HCD35 | ASV-specific phosphopeptides | 65 (+9) | 124 (+17) |
| Jurkat pSTY Ctrl HCD25 | Swiss-Prot | 3204 | 10704 |
| Jurkat pSTY Ctrl HCD25 | Swiss-Prot-isoform | 2258 | 6351 |
| Jurkat pSTY Ctrl HCD25 | Swiss-Prot-plus ASV ≠ 1 | 2997 (+38) | 10047 (+100) |
| Jurkat pSTY Ctrl HCD25 | ASV-specific phosphopeptides | 92 (+9) | 182 (+19) |
| Jurkat pSTY Ctrl HCD35 | Swiss-Prot | 2776 | 6612 |
| Jurkat pSTY Ctrl HCD35 | Swiss-Prot-isoform | 1760 | 3742 |
| Jurkat pSTY Ctrl HCD35 | Swiss-Prot-plus ASV ≠ 1 | 2496 (+29) | 6118 (+84) |
| Jurkat pSTY Ctrl HCD35 | ASV-specific phosphopeptides | 70 (+6) | 153 (+13) |


Table 2: Identified non-alternative splice variant-1 protein splice variants in HeLa and LNCaP.

Protein name, alternative splice variant (ASV), accession number, and the number of replicates/PSMs/maximum Mascot score (R/P/M) for all biological replicates are displayed for the two cell lines. Only alternative splice variants identified in at least two replicates, with at least three PSMs and a Mascot ion score above 30, are shown. The peptide view of the replicate with the highest Mascot ion score among the LC-MS analyses of all biological replicates for each cell line and each protein is shown in supplementary figure 1. Bold protein names indicate that alternative splice variant 1-specific peptides of the same protein were also identified.

| Protein name | ASV | Accession |
|---|---|---|
| Adenylyl cyclase-associated protein 1 | 2 | Q01518-2-1 |
| Alpha-aminoadipic semialdehyde dehydrogenase | 2 | P49419-2-1 |
| Arginyl-tRNA--protein transferase 1 | ATE1-2 | O95260-2-1 |
| Caspase recruitment domain-containing protein 19 | 2 | Q96LW7-2-3 |
| Cysteine--tRNA ligase, cytoplasmic | 3 | P49589-3-3 |
| DNA fragmentation factor subunit alpha | DFF35 | O00273-2-1 |
| Elongation factor 1-delta | 3 | P29692-3-1 |
| Epimerase family protein SDR39U1 | 2 | Q9NRG7-2-1 |
| Far upstream element-binding protein 1 | 2 | Q96AE4-2-3 |
| F-box/LRR-repeat protein 18 | 4 | Q96ME1-4-3 |
| Glutaminase kidney isoform, mitochondrial | 3 | O94925-3-1 |
| Glutaminase kidney isoform, mitochondrial | 3 | O94925-3-2 |
| Heterogeneous nuclear ribonucleoprotein A1 | A1-A | P09651-2-1 |
| Heterogeneous nuclear ribonucleoprotein K | 3 | P61978-3-1 |
| Inhibitor NF-κB kinase-interacting protein | 4 | Q70UQ0-4-4 |
| Leucine-rich repeat flightless-interacting protein 1 | 4 | Q32MZ4-4-5 |
| LIM domain only protein 7 | 3 | Q8WWI1-3-1 |
| Lipopolysaccharide-responsive and beige-like anchor protein | 2 | P50851-2-2 |
| Myosin light polypeptide 6 | Sm. muscle | P60660-2-1 |
| NADH dehydrogenase [ubiquinone] flavoprotein 3, mitochondrial | 2 | P56181-2-2 |
| NADH dehydrogenase [ubiquinone] flavoprotein 3, mitochondrial | 2 | P56181-2-8 |
| Nicalin | 2 | Q969V3-2-1 |
| Nucleolar and coiled-body phosphoprotein 1 | 3 | Q14978-3-1 |
| Phosphate carrier protein, mitochondrial | B | Q00325-2-2 |
| Prelamin-A/C | C | P02545-2-1 |
| Profilin-2 | IIb | P35080-2-1 |
| Prosaposin | Sap-mu-9 | P07602-3-1 |
| Pyruvate kinase PKM | M1 | P14618-2-3 |
| Reticulon-4 | 2 | Q9NQC3-2-1 |
| Ribonucleoprotein PTB-binding 1 | 2 | Q8IY67-2-2 |
| Ribonucleoprotein PTB-binding 1 | 2 | Q8IY67-2-7 |
| Splicing regulatory glutamine/lysine-rich protein 1 | 2 | Q8WXA9-2-1 |
| Splicing regulatory glutamine/lysine-rich protein 1 | 2 | Q8WXA9-2-3 |
| Transcription factor E2-alpha | E47 | P15923-2-1 |
| Ubiquitin carboxyl-terminal hydrolase 5 | Short | P45974-2-1 |
| Unconventional myosin-VI | 6 | Q9UM54-6-1 |
| Vacuolar protein sorting-associated protein 29 | 2 | Q9UBQ0-2-1 |

HeLa (R/P/M) and LNCaP (R/P/M) values, in extraction order (assignment to rows and columns is not recoverable):

Adenylyl cyclase-associated protein 1 to Pyruvate kinase PKM:

3/3/79 2/3/45
3/7/158 4/4/56
3/3/35
3/3/45 3/6/73 4/4/35 3/3/60
3/3/42 3/3/42 4/4/80 3/3/64 3/3/40 4/4/33 4/4/112 4/6/109 4/6/225 4/6/74 4/4/45
4/6/91
2/4/81 3/3/45 2/4/65
4/8/242 4/7/69 4/10/122 4/4/38
4/8/77 4/10/127
3/6/32 4/7/155 4/13/76 4/10/125 2/5/54
3/4/78

Reticulon-4 to Vacuolar protein sorting-associated protein 29:

4/4/59 2/3/67 4/5/148 3/4/52 3/4/102 3/3/73 4/7/53 4/5/168
3/3/85
4/8/108 4/4/44


Table 3: Phosphopeptides of non-alternative splice variant-1 protein splice variants in Jurkat T cells.

Protein name, alternative splice variant (ASV), accession number, peptide sequence, and the number of replicates/PSMs/maximum Mascot score (R/P/M) for all biological replicates are displayed for the two conditions (CisPt and Ctrl). Only alternative splice variants identified in at least two replicates, with at least three PSMs and a Mascot ion score above 30, are shown. Phosphorylated amino acids are shown in bold if the modified residue could be identified with more than 95% probability. The peptide view of the replicate with the highest Mascot ion score among the LC-MS analyses of all biological replicates for each protein is shown in supplementary figure 2. Phosphorylated peptides with a more than three-fold difference in number of replicates and PSMs are marked in bold in columns CisPt and Ctrl, respectively.

| Protein name | ASV | Accession | Peptide sequence |
|---|---|---|---|
| EF-hand calcium-binding domain-containing protein 4B (EFC4B) | 2 | Q9BSW2-2-11 | IISVEEDPLPQLLDGGFEQPLSK |
| Elongation factor 1-delta (EF1D) | 3 | P29692-3-1 | QSSGPGASSGTSGDHGELVVR |
| HBS1-like protein (HBS1L) | 2 | Q9Y450-2-15 | LSSTDSLESLLSK |
| Plectin (PLEC) | 4 | Q15149-4-1 | TSSEDNLYLAVLR |
| Protein kinase C beta type (KPCB) | Beta-II | P05771-2-1 | NIDQSEFEGFSFVNSEFLKPEVK |
| Protein kinase C beta type (KPCB) | Beta-II | P05771-2-2 | HPPVLTPPDQEVIR |
| Ras GTPase-activating protein-binding protein 2 (G3BP2) | B | Q9UN86-2-1 | STTPPPAEPVSLPQEPPKPR |
| Rho guanine nucleotide exchange factor 12 (ARHGC) | 2 | Q9NZN5-2-1 | IASHDFDPTGLVQR |
| Ribonucleoprotein PTB-binding 1 (RAVR1) | 2 | Q8IY67-2-2 | SSGGSGGGPLSHFYSGSPTSYFTSGLQAGLK |
| Ribonucleoprotein PTB-binding 1 (RAVR1) | 2 | Q8IY67-2-3 | LLSPLSSAR |
| Serine/threonine-protein kinase TAO2 (TAOK2) | 2 | Q9UL54-2-1 | AASGGSGSENVGPPAAAVPGPLSR |
| Spectrin beta chain, non-erythrocytic 1 (SPTB2) | 2 | Q01082-3-1 | TSSISGPLSPAYTGQVPYNYNQLEGR |
| UV excision repair protein RAD23 homolog A (RD23A) | 3 | P54725-3-1 | AVEYLLTGIPGSPEPEHGSVQESQVSEQPATEAGENPLEFLR |
| Zinc finger CCCH-type antiviral protein 1 (ZCCHV) | 3 | Q7Z2W4-3-5 | NLVPTTPGESTAPAQVSTLPQSPAALSSSNR |

CisPt (R/P/M) and Ctrl (R/P/M) values, in extraction order (assignment to rows is not recoverable where a line holds fewer values than rows):

CisPt (R/P/M): 4/6/91
Ctrl (R/P/M): 1/1/38
CisPt (R/P/M): 3/3/47 4/4/120 4/4/111 4/10/241 6/15/148 6/19/222 4/5/104 5/5/42 4/5/110 6/26/475 6/14/158
Ctrl (R/P/M): 4/4/112 4/4/98 4/10/231 6/17/191 6/23/256 4/5/51 3/3/74 2/2/43 5/8/257 6/12/155
CisPt (R/P/M): 3/3/32
Ctrl (R/P/M): -
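The selection criteria stated in the captions of Tables 2 and 3 (at least two replicates, at least three PSMs, Mascot ion score above 30) can be sketched as a small filter over R/P/M cells. This is only an illustration: the names `RPM`, `parse_rpm` and `passes_thresholds` are hypothetical, integer scores are assumed because the tables print them that way, and the sketch does not claim to reproduce how the authors applied the criteria across cell lines or conditions.

```python
from typing import NamedTuple, Optional


class RPM(NamedTuple):
    """One 'R/P/M' table cell: replicates, PSMs, maximum Mascot ion score."""
    replicates: int
    psms: int
    max_score: int


def parse_rpm(cell: str) -> Optional[RPM]:
    """Parse a cell such as '4/6/91'; '-' or an empty cell means no identification."""
    cell = cell.strip()
    if cell in ("", "-"):
        return None
    replicates, psms, max_score = (int(part) for part in cell.split("/"))
    return RPM(replicates, psms, max_score)


def passes_thresholds(rpm: Optional[RPM],
                      min_replicates: int = 2,
                      min_psms: int = 3,
                      min_score: int = 30) -> bool:
    """Caption criteria: >= 2 replicates, >= 3 PSMs, Mascot ion score > 30."""
    return (rpm is not None
            and rpm.replicates >= min_replicates
            and rpm.psms >= min_psms
            and rpm.max_score > min_score)


if __name__ == "__main__":
    # Cell strings copied from the table residue; no row assignment is implied.
    for cell in ["4/6/91", "1/1/38", "-"]:
        print(cell, passes_thresholds(parse_rpm(cell)))
```

With these defaults a cell such as `4/6/91` passes, while `1/1/38` fails on both the replicate and the PSM count.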


Figure legends

Figure 1: Workflow to generate the two databases. A hypothetical example for a protein with two alternative splice variants is presented. Sequence differences between the two alternative splice variants are highlighted in blue for alternative splice variant 1 and in red for alternative splice variant 2. The steps comprise selection of proteins with identical accession numbers/entry names, in silico tryptic digestion, selection of sequences with at least eight amino acids, and selection of unique peptides, with leucine and isoleucine treated as equivalent. Finally, two novel databases were created (DB1 and DB2).
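The digestion and uniqueness steps of this workflow can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes cleavage after every K or R with no proline rule and no missed cleavages, the function names are hypothetical, and the full sequence given for `Q00815-1` is invented so that it yields the peptide LLEHWNSR shown in Figure 1 (the alignment gap `----` is removed from variant 1).

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_length=8):
    """Cleave after every K or R and keep peptides of at least min_length residues."""
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)
    return [p for p in pieces if len(p) >= min_length]


def unique_peptides(entries, min_length=8):
    """Return, per entry, the peptides found in no other entry (Ile treated as Leu)."""
    owners = defaultdict(set)   # I/L-normalised peptide -> entries that contain it
    digests = {}
    for name, sequence in entries.items():
        digests[name] = tryptic_peptides(sequence, min_length)
        for peptide in digests[name]:
            owners[peptide.replace("I", "L")].add(name)
    return {
        name: [p for p in peptides if len(owners[p.replace("I", "L")]) == 1]
        for name, peptides in digests.items()
    }


if __name__ == "__main__":
    entries = {
        "Q00001-1": "MNQEEFVRFDSDVGEYRAVTELGARSLEHWNSRVKVTVYPSKTQPLQHYNLLKVCS",
        "Q00001-2": "MNQEEFVRFDSDVGEYRMTLSELGARILEHWNSRVKVTVYPSKTQSSLTPLQHYNLLKVCS",
        "Q00815-1": "MKLLEHWNSR",  # hypothetical sequence, for illustration only
    }
    for name, peptides in unique_peptides(entries).items():
        print(name, peptides)
```

In this sketch ILEHWNSR is dropped from variant 2 because, once isoleucine is read as leucine, it coincides with LLEHWNSR from the other entry, which mirrors the DB1 panel of Figure 1.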

Figure 2: Non-alternative splice variant-1 specific phosphopeptides in control and apoptotic Jurkat T cells. Proteins reported in table 3 are presented with protein entry names (without _HUMAN). Protein entry names in bold represent proteins that changed significantly between apoptotic and non-apoptotic cells. For each protein, the upper box represents the sequence of alternative splice variant 1, and the lower box the identified alternative splice variant (Table 3). For larger proteins, only the sequence covering the relevant identified phosphorylation sites is shown, indicated by amino acid numbers in front of and/or at the end of the sequences. Blue boxes display sequence differences between the alternative splice variants, and black boxes indicate missing sequences. Identified sites are displayed with bold lines on top of the sequence if the site could be assigned to a single amino acid, or as grey boxes if the phosphorylation sites could not be assigned to a single amino acid (Table 2). If the identified phosphorylated amino acid residue was found in corresponding sequences of the two alternative splice variants, it was checked whether the same site(s) had been reported in the phosphosite database (www.phosphosite.org); such sites are shown as a thin line. In such cases, the different peptide sequences are displayed.


Supporting Information

Supplementary figure 1: Mascot peptide view of non-alternative splice variant 1 protein forms in HeLa and LNCaP.

Supplementary figure 2: Mascot peptide view of phosphopeptides of non-alternative splice variant 1 protein forms in Jurkat T cells.

Supplementary file 1: Database of alternative splice variant-specific proteotypic tryptic peptides.

Supplementary file 2: Database of combined human Swiss-Prot and tryptic peptides specific for alternative splice forms beside alternative splice variant 1.

Supplementary table 1: Expansion of table 2.

Supplementary table 2: Expansion of table 3.

Supplementary table 3: Mascot search results of a representative biological replicate of HeLa cells.

Supplementary table 4: Mascot search results of a representative biological replicate of LNCaP cells.


For TOC only

Isoform: >sp|P20936-3-1|RASA1_HUMAN
Peptide: GLIDLSVCSVYVVHDSLFGR


Figure 1

Swiss-Prot canonical & isoform
sp|Q00001|Example_HUMAN
MNQEEFVRFDSDVGEYRAVTELGARSLEHWNSRVKVTVYPSKTQ----PLQHYNLLKVCS
sp|Q00001-2|Example_HUMAN Isoform2
MNQEEFVRFDSDVGEYRMTLSELGARILEHWNSRVKVTVYPSKTQSSLTPLQHYNLLKVCS

Proteins with identical accession numbers/entry names
sp|Q00001-1|Example_HUMAN
MNQEEFVRFDSDVGEYRAVTELGARSLEHWNSRVHKVTVYPSKTQ----PLQHYNLLKVCS
sp|Q00001-2|Example_HUMAN Isoform2
MNQEEFVRFDSDVGEYRMTLSELGARILEHWNSRVKVTVYPSKTQSSLTPLQHYNLLKVCS

In silico tryptic digestion, minimum 8 amino acids
sp|Q00001-1|Example_HUMAN
MNQEEFVR, FDSDVGEYR, AVTELGAR, SLEHWNSR, TQPLQHYNLLK
sp|Q00001-2|Example_HUMAN Isoform2
MNQEEFVR, FDSDVGEYR, MTLSELGAR, ILEHWNSR, TQSSLTPLQHYNLLK

Proteotypic peptides (incl. Ile=Leu)
sp|Q00001-1|Example_HUMAN
AVTELGAR, SLEHWNSR, TQPLQHYNLLK
sp|Q00001-2|Example_HUMAN Isoform2
MTLSELGAR, ILEHWNSR, TQSSLTPLQHYNLLK
sp|Q00815-1|Other-Example_HUMAN
LLEHWNSR

Alternative splice variants specific proteotypic peptide (DB1)
sp|Q00001-1-1|Example_HUMAN
AVTELGAR
sp|Q00001-1-2|Example_HUMAN
SLEHWNSR
sp|Q00001-1-3|Example_HUMAN
TQPLQHYNLLK
sp|Q00001-2-1|Example_HUMAN Isoform2
AAAKMTLSELGAR
sp|Q00001-2-2|Example_HUMAN Isoform2
TQSSLTPLQHYNLLK

Swiss-Prot plus alternative splice variants ≠ 1 of DB1 (DB2)
sp|Q00001-1|Example_HUMAN
MNQEEFVRFDSDVGEYRAVTELGARSLEHWNSRVKVTVYPSKTQ----PLQHYNLLKVCS
sp|Q00001-2-1|Example_HUMAN Isoform2
AAAKMTLSELGAR
sp|Q00001-2-2|Example_HUMAN Isoform2
TQSSLTPLQHYNLLK


Figure 2

Panels (protein entry names): EFC4B, EF1D, HBS1L, PLEC, KPCB, G3BP2, ARHGC, RAVR1, TAOK2, SPTB2, RD23A, ZCCHV.

Sequences recoverable from the figure (upper: alternative splice variant 1; lower: identified alternative splice variant):

EF1D
SLAGSSGPGASSGTSGDHGELVVR
QSSGPGASSGTSGDHGELVVR

G3BP2
STTPPPAEPVSLPQEPPKAFSWASVTSK
STTPPPAEPVSLPQEPPKPR

ARHGC
500 IASHDFDPTDSSSK
481 IASHDFDPTGLVQR

RD23A
AVEYLLTGIPGSPEPEHGSVQESQVSEQPATEAAGENPLEFLRVQR
AVEYLLTGIPGSPEPEHGSVQESQVSEQPATEAGENPLEFLRVQR