Depletion of high molecular mass proteins for the identification of small proteins and short open reading frame encoded peptides in cellular proteomes
Liam Cassidy, Philipp T. Kaulich, and Andreas Tholey
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00948 • Publication Date (Web): 19 Feb 2019
Downloaded from http://pubs.acs.org on February 20, 2019

Journal of Proteome Research

Depletion of high molecular mass proteins for the identification of small proteins and short open reading frame encoded peptides in cellular proteomes

Liam Cassidy1, Philipp T. Kaulich1, Andreas Tholey1*

1 Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine, Christian-Albrechts-Universität zu Kiel, 24105 Kiel, Germany

* To whom correspondence should be addressed: Andreas Tholey Systematic Proteome Research & Bioanalytics, Institute for Experimental Medicine Christian-Albrechts-Universität zu Kiel 24105 Kiel, Germany Phone: #49 (431) 500 30300; Fax: #49 (431) 500 30308 E-mail: [email protected]

Abstract

The identification of small proteins and peptides (below ca. 100 to 150 amino acids) in complex biological samples is hampered by the dominance of higher molecular weight proteins. At the same time, the growing knowledge about alternative or short open reading frames creates a need for methods that can prove the existence of the corresponding gene products in proteomics experiments. We present an acetonitrile-based precipitation methodology which depletes the majority of proteins above ca. 15 kDa. Parameters such as depletion mixture composition, pH and temperature were optimized using a model protein mixture, and the method was evaluated in comparison to the established differential solubility method. The approach was applied to the analysis of the low molecular weight proteome of the archaeon Methanosarcina mazei by means of LC-MS. The data clearly show a beneficial effect from a reduction of complexity, especially in terms of the quality of MS/MS-based identification of small proteins. With minimal sample manipulation, this fast, detergent-free method allowed the successful identification of several short open reading frame encoded peptides in M. mazei that had not previously been identified.

Keywords Acetonitrile precipitation, Methanosarcina mazei, peptidomics, short open reading frame

Introduction

Driven by the recent and rapid (re)analyses of numerous genomes, and the development of sophisticated ribosome sequencing technologies, numerous alternative open reading frames (altORF) have been identified across almost all biological kingdoms.1 The identification of these altORF has led to the realisation that the number of potential small proteins and peptides encoded by organisms may be far greater than previously anticipated. A defining feature of altORF predictions is the size of the translated proteins. While the canonically defined proteomes of many organisms comprise proteins with an average length of approximately 460 amino acids, the average length of altORF products is closer to 46 amino acids, with a narrowly clustered distribution of sizes.2 The gene products of altORF, smallORF or shortORF (smORF or sORF) have been named short open reading frame encoded peptides (SEP).3

Compared to the number of predicted altORF or sORF,4 the number of SEP validated at the protein level is still low. This is primarily due to non-trivial issues associated with the identification of peptides and smaller proteins within complex proteomes.3,5-7 A major challenge is that small peptides/proteins yield a relatively low number of peptides upon proteolytic digestion in bottom-up proteomics. These can easily be missed against the large background of peptides derived from larger proteins, which usually represent the main body of the proteome. Further problems arise from the low abundance of the target analytes.

Several analytical pipelines to tackle these challenges have been successfully set up,3,6-7 including proteogenomics approaches.5 We recently introduced a two-dimensional separation scheme encompassing an open gel tube (“GelFree”) separation of intact proteins in the first dimension, followed by digestion of the collected protein fractions and subsequent second dimension LC-MS analysis at the peptide level.6 This method enabled the identification of 17 SEP in the archaeon M. mazei under stringent identification criteria. Further, different strategies for peptide extraction have been successfully tested for the analysis of small proteins, e.g. the use of ultrafiltration devices for a separation of peptides from high molecular weight proteins (HMWP).8 Another approach for the pre-fractionation of the low molecular weight fraction of proteomes is the differential solubilisation (DS) methodology, which is based on precipitation of all protein species from a sample before re-solubilising the “small” protein complement of the precipitated sample using acetonitrile (ACN).9 This method has been successfully applied for the analysis of peptides in plasma samples. Other approaches focused on the depletion of highly abundant proteins in plasma samples by using precipitation methods.10-11 Despite the success of all these approaches, the analysis of the low molecular weight fraction of the proteome remains challenging, creating the need for alternative approaches.

Here we present an alternative depletion method, based on selective acetonitrile precipitation of larger proteins, for the isolation of the small protein complement of a proteome. We have optimised several key parameters such as pH, ionic strength and temperature, and established a protocol that allows for the reproducible depletion of higher molecular weight proteins (higher than approximately 15 kDa). Application of this easy-to-perform method leads to a significant reduction of the background proteome. Optimisation steps were performed on a simple mixture of six proteins. Subsequently, we used the archaeon Methanosarcina mazei Gö1 to further optimise the procedure.
Initial assessment of the efficacy of the methodology was performed via Coomassie-stained SDS-PAGE, while mass spectrometry was used for the validation of results using the optimised workflow. In addition to the confirmation of earlier identified SEP,6 we identified three SEP not previously described at the protein level within the archaeon M. mazei, demonstrating the potential of this easy-to-implement sample preparation procedure.
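
The point made in the introduction, that a small protein yields only a handful of peptides on tryptic digestion, can be illustrated with a minimal in silico digest. This is a sketch only: the 46-residue sequence is invented for illustration (it is not an M. mazei SEP), the cleavage rule is the common simplification "after K or R, but not before P", and the 7 to 30 residue window for LC-MS-detectable peptides is an assumed, typical range rather than a setting taken from this study.

```python
import re

# Hypothetical 46-residue sequence, invented for illustration only.
EXAMPLE_SEP = "MSEQNLKDVAAELGISRPTVYKWLEEGKAPVTLSDIIRAHGMEFDK"


def tryptic_peptides(sequence, min_len=7, max_len=30):
    """Return fully tryptic peptides (no missed cleavages) within a length window.

    Simplified cleavage rule: C-terminal of K or R unless the next residue is P.
    min_len and max_len are assumed, typical detectability limits.
    """
    # Zero-width split after K/R when the following residue is not P
    # (splitting on empty matches requires Python 3.7 or newer).
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    peptides = tryptic_peptides(EXAMPLE_SEP)
    print(f"{len(EXAMPLE_SEP)} residues -> {len(peptides)} candidate peptides")
    for pep in peptides:
        print(pep)
```

A protein ten times longer would, on the same rule, contribute roughly ten times as many candidate peptides, which is why the few peptides from a small protein are easily lost in an undepleted digest.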

Experimental section

Samples

A simple protein mixture consisting of six proteins (bovine insulin, bovine serum albumin, lactoglobulin and casein from bovine milk, equine cytochrome c, and ovalbumin) was utilised for method optimisation. For the initial experiments, the six-protein mixture was solubilised in 100 mM triethylammonium bicarbonate (TEAB) buffer (pH 8.5). Cultivation of M. mazei Gö1 under nitrogen-limiting conditions was performed as described earlier on minimal media in an anaerobic environment until an optical density (OD) at 600 nm of 0.6 was reached.6 The cells were concentrated via centrifugation, washed in 100 mM TEAB, and then disrupted via freeze-thaw cycling and homogenisation in a mixing mill with glass beads in lysis buffer (100 mM TEAB, 1 x EDTA-free cOmplete protease inhibitor cocktail (Roche, Germany), pH 7.4). The protein concentration was determined via BCA protein assay and aliquots of the samples were stored at -20°C until required.

Depletion of the high molecular mass proteome, SDS-PAGE and protein digestion

Following optimization of the parameters for the depletion of the HMWP, the following standardised procedure was performed for all samples. During the optimisation process, minor alterations to the procedure were employed; these are stated where appropriate.

Aliquots of 100 µg of total protein lysate were concentrated to near (but not complete) dryness via vacuum evaporation on a Concentrator Plus (Eppendorf, Germany). The samples were then made up to a volume of 20 µl with either 210 mM NaCl (acidic depletion mixture with final concentrations of 50 mM NaCl/0.1% trifluoroacetic acid (TFA)), or 420 mM TEAB (basic depletion mixture with a final concentration of 100 mM TEAB). The samples were vortexed to ensure solubilisation in the solution prior to the addition of 64 µl of ACN plus 0.1% TFA. The samples were vigorously vortexed for 30 sec before incubation at 20ºC on a temperature-controlled shaker (1300 rpm) for 1 h. Following incubation the samples were centrifuged (21,000 x g for 20 min at 20ºC) and the supernatants were transferred to new Eppendorf tubes. The samples were dried down via vacuum evaporation prior to further analysis.

For visualisation of the isolated proteins, SDS-PAGE (16% T) was performed, and the gels were stained with either Coomassie Brilliant Blue (CBB) or silver staining. For the analysis of proteins via LC-MS, protein samples were suspended in 100 mM TEAB and reduced with dithiothreitol (DTT) (10 mM, 56°C, 1 h), before alkylation was performed with iodoacetamide (IAA) (50 mM, RT, 30 min). Enzymatic digestion was performed with sequencing grade Trypsin (Promega, Germany) (enzyme to protein ratio of 1:50, 37°C, overnight). All samples were then cleaned on a single-use solid phase extraction cartridge (as per the manufacturer's protocol) before the peptides were eluted (60% ACN, 0.1% TFA) and dried down via vacuum evaporation.
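
The volumes and stock concentrations given in this procedure can be checked with a short calculation. The sketch below assumes that the aqueous and acetonitrile volumes are simply additive (in practice water and acetonitrile contract slightly on mixing); the function and constant names are our own.

```python
def final_concentration_mM(stock_mM, aqueous_ul, organic_ul):
    """Concentration after adding organic solvent, assuming additive volumes."""
    return stock_mM * aqueous_ul / (aqueous_ul + organic_ul)


AQUEOUS_UL = 20.0  # sample made up in NaCl or TEAB solution
ACN_UL = 64.0      # acetonitrile plus 0.1% TFA

if __name__ == "__main__":
    print("ACN:water ratio (v:v):", ACN_UL / AQUEOUS_UL)
    print("ACN volume fraction:", round(ACN_UL / (AQUEOUS_UL + ACN_UL), 3))
    print("NaCl final (mM):", final_concentration_mM(210, AQUEOUS_UL, ACN_UL))
    print("TEAB final (mM):", final_concentration_mM(420, AQUEOUS_UL, ACN_UL))
```

Under the additive-volume assumption, 64 µl of ACN added to 20 µl of aqueous sample gives the 3.2:1 ratio, and the 210 mM NaCl and 420 mM TEAB stocks dilute to the stated 50 mM and 100 mM final concentrations.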

Depletion of high molecular weight protein using the differential solubility method

In order to evaluate our high mass protein depletion procedure, we also performed an established differential solubility (DS) procedure, which is based on precipitation under denaturing conditions followed by re-solubilisation.9 An aliquot of our sample was diluted 1:2 with 20 μL of denaturing solution (7 M urea, 2 M thiourea, and 20 mM DTT), slowly dropped into 900 μL ice-cold acetone, and immediately stirred at 4°C for 1 h, followed by centrifugation at 19,000 x g for 15 min at 4°C. The precipitate was taken up in 200 μL of 70% ACN containing 12 mM HCl and mixed at 4°C for 1 h, then centrifuged again at 19,000 x g for 15 min at 4°C. The low molecular mass proteins/peptides were extracted into the supernatant.

LC-MS conditions and database settings

Separation of samples via liquid chromatography was performed on a Dionex U3000 UHPLC system (Thermo, Dreieich, Germany) equipped with an Acclaim PepMap 100 column (2 μm, 75 μm × 500 mm) coupled online to a Q-Exactive plus Orbitrap mass spectrometer (Thermo, Bremen, Germany) utilizing HCD fragmentation at a normalised collision energy (NCE) of 27.5. A full scan MS acquisition was performed (resolution 70,000, AGC target 3e6, max IT 50 ms) with subsequent MS/MS (resolution 17,500, AGC target 1e5, max IT 100 ms) of the top 15 most intense ions; dynamic exclusion was enabled (30 sec duration). The eluents used for LC were eluent A: 0.05% formic acid (FA), eluent B: 80% ACN + 0.04% FA. The separation was performed over a programmed 215-minute run. Initial chromatographic conditions were 4% B for 2 minutes followed by linear gradients from 4% to 20% B over 120 minutes and 20% to 40% B over 60 minutes, an 8-minute increase to 90% B, and 9 minutes at 90% B. An inter-run equilibration of the column was achieved by 15 minutes at 4% B. A flow rate of 300 nl/min was used. Before injection the samples were re-suspended in loading eluent (3% ACN, 0.1% TFA), and approximately 1 μg of sample was injected per run.

The output data files from the mass spectrometer were processed using the Proteome Discoverer software package (Version 2.2.0.388). The individual files were searched using the SequestHT algorithm node against a protein database containing the canonical proteins of M. mazei (M. mazei Gö1 accessed from UniProt on 2017/06/26) as well as a collection of predicted sORF protein products 12-13 and the cRAP list of common laboratory contaminants. The search parameters were: enzyme: Trypsin (full); precursor mass tolerance: 7 ppm; fragment mass tolerance: 0.02 Da; fixed modification: Carbamidomethyl (C); dynamic modifications: Oxidation (M), Deamidation (N, Q), Acetyl (protein N-terminus). The Percolator node was used for false discovery rate (FDR) calculation and a target protein FDR of 0.01 was sought.
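
To give a feel for the two mass tolerances used in the database search, the following sketch converts between relative (ppm) and absolute (Da) tolerances. The example m/z of 800 is a hypothetical value chosen for illustration, and the helper names are our own; the search engine's internal handling of tolerances is not reproduced here.

```python
PRECURSOR_TOL_PPM = 7.0   # precursor mass tolerance from the search settings
FRAGMENT_TOL_DA = 0.02    # fragment mass tolerance from the search settings
EXAMPLE_MZ = 800.0        # hypothetical m/z, for illustration only


def ppm_to_da(mz, ppm):
    """Absolute half-width (in m/z units) of a relative tolerance at a given m/z."""
    return mz * ppm / 1e6


def da_to_ppm(mz, delta_da):
    """Relative size (in ppm) of an absolute tolerance at a given m/z."""
    return delta_da / mz * 1e6


if __name__ == "__main__":
    half_width = ppm_to_da(EXAMPLE_MZ, PRECURSOR_TOL_PPM)
    print(f"7 ppm at m/z {EXAMPLE_MZ}: +/- {half_width:.4f}")
    print(f"0.02 Da at m/z {EXAMPLE_MZ}: {da_to_ppm(EXAMPLE_MZ, FRAGMENT_TOL_DA):.1f} ppm")
```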

Determination of peptide recovery

For the tryptic digest, 100 µg bovine serum albumin (BSA) dissolved in 100 µL of 100 mM TEAB was reduced (10 mM DTT, 60°C, 1 h) and alkylated (50 mM IAA, 30 min, RT) before digestion with sequencing grade Trypsin (enzyme to protein ratio 1:100, overnight, 37°C). BSA digest was added (5 µg, 500 ng, 50 ng, 5 ng, 500 pg, 50 pg, 5 pg) to 150 µg aliquots of M. mazei proteome. The samples were processed via the standard extraction method using 50 mM NaCl and 3.2 volumes of ACN plus 0.1% TFA, and analysed using the same LC gradient and MS parameters as described.

Raw data repository

LC-MS data have been deposited to the ProteomeXchange Consortium 14 via the PRIDE partner repository with the dataset identifier PXD011996.

Results and Discussion

Experimental setup and method optimization

In order to reduce the complexity of proteome samples and allow deeper coverage of the low molecular weight fraction, we developed and optimized a method that depletes the higher molecular weight fraction via precipitation, while retaining the smaller molecules and peptides in solution. Major factors influencing protein solubility are, aside from the protein amino acid composition and structure, the hydrophilic/hydrophobic characteristics of the solvent, its acid/base properties and ionic strength, and the temperature. The effect of these parameters on the ability of our approach to deplete the higher molecular weight proteome was tested on a full proteome of M. mazei, with pre-experiments performed on a simple six-protein mixture spanning an MW range from 5.8 to 66 kDa. The protein content in the supernatant was monitored by SDS-PAGE. We used CBB staining during the method optimization, but added more sensitive silver staining for the final evaluation of the optimized procedure. Further, for the optimized protocol a full LC-MS experiment (bottom-up proteomics) was performed and compared against the results of both an analysis without depletion and an analysis using a previously published differential solubility (DS) protocol.9 Additionally, by spiking a digest of BSA into the proteome of M. mazei we determined the recovery rates of peptides in the optimized protocol.

For HMWP depletion, acetonitrile is the organic solvent of choice as it is widely compatible with analytical downstream processes and is easily removed by evaporation. At the same time, it is miscible with water, which allows the hydrophilicity of the solvent to be tuned, e.g. to keep small peptides dissolved. Acetonitrile:water mixtures ranging from 1:1 to 7:1 (v:v) were tested to deplete HMWP in an M. mazei proteome.

Optimization experiments using the six-protein mixture identified a ratio of 3.2:1 as optimal for the depletion of larger proteins while retaining the small protein complement (Suppl. Fig. S1A). This effect was further observed during optimization experiments with the more complex M. mazei sample, which showed the gradual depletion of higher molecular mass proteins for ratios from 2:1 to 3.2:1 (Suppl. Fig. S1B). At ratios higher than 4:1, very few proteins were observed in the supernatant by CBB-stained SDS-PAGE; at these ratios, proteins in the low molecular weight range were also widely depleted. Based on this observation, further optimization was performed with an acetonitrile:water ratio of 3.2:1.

The loss of low molecular weight proteins at ratios above 4:1 hinted at co-precipitation with other proteins, which could potentially be caused by the fact that we used a non-buffered aqueous mixture. In order to assess this hypothesis, we studied the effect of the salt/buffer concentration within the aqueous solution. Based on the visual assessment of CBB-stained gels, low millimolar concentrations of TEAB ( […] 1 h) did not provide a discernible benefit in the extraction. Based on these observations, an incubation time of 1 h was kept for all future analyses.

When using higher temperatures (37°C and 50°C), faint bands of higher molecular mass proteins were observable in the CBB-stained gels, whereas at 20°C the most efficient reduction of high molecular weight proteins with simultaneous retention of low molecular weight proteins (e.g. insulin) was observed for our six-protein mixture (Suppl. Fig. S6). In contrast, at 0°C (samples kept on ice), small proteins were also depleted from the samples.

Evaluation of the protein depletion efficiency

The optimized HMWP depletion protocol was finally assessed by SDS-PAGE employing more sensitive silver staining. Analysis of the full M. mazei extract showed that our method, using either TEAB or sodium chloride within the aqueous solution, reproducibly depleted the vast majority of abundant proteins greater than 14 kDa (as visible via CBB staining) (Fig. 1 A, C). Silver staining, which allows for a more sensitive detection of proteins, was also performed; at this lower level of detection it likewise confirmed that the methodology is robust in its depletion of the larger molecular mass protein complement (Fig. 1 B, D).

In the course of the study, when using the TEAB depletion methodology, a strong band at 29 kDa was visible within CBB-stained gels (Fig. 1 A). We identified the dominant protein species in this band as a methanol corrinoid protein (UniProt accession Q8PXZ3) using tryptic in-gel digestion and MALDI TOF/TOF MS. From earlier experiments, we know this protein to be highly abundant within M. mazei,6 which may have hindered its depletion via our protocol. An alternative explanation, for which, however, we have no data at present, could be that binding of smaller proteins to this larger one increases its solubility specifically within the ACN:TEAB mixture.

Assessment of the depletion of larger proteins was performed via LC-MS. We directly compared our optimized depletion method against a full proteome analysis of the same sample using identical gradients and comparable on-column injection amounts (approx. 1 µg). In addition, we assessed the differential solubility (DS) method for the depletion of the larger molecular mass proteins, which uses precipitation with cold acetone followed by re-solubilization with 70% ACN/12 mM HCl at 4°C.9 For each extraction methodology, duplicate extractions and LC-MS analyses were performed.

Using a standard workflow for a full proteome analysis via 1D LC-MS without depletion steps, we identified 1,222 protein species from M. mazei. Depletion of the larger molecular mass proteins using the procedure developed here resulted in the identification of 450 distinct protein species. The application of the DS protocol led to the identification of 1,085 proteins. Comparison of the distribution of identified protein sizes revealed a typical distribution for the full proteome sample (based on the canonical sequences of the full-length amino acid sequence of the inferred protein),15 and demonstrated the overall efficiency of the depletion (Fig. 2). A reduction in the total number of protein identifications using the DS method was observed compared to the non-depleted sample; however, on average it showed a less than 20% decrease across the majority of protein size bins, with no apparent dependency upon protein size. In contrast, using our depletion strategy, there was a negligible difference in the total number of proteins identified below 100 amino acid residues in length, an almost 50% depletion of proteins from 100 to 150 amino acid residues in length, and, on average, a more than 70% depletion of all larger protein species (Fig. 3).

Although the depletion did not increase the number of identified small proteins, it produced higher quality identifications for the proteins observed, with a higher number of PSMs found per peptide (14.20 compared to 11.25, Suppl. Table 1). In general this did not translate into an increased average sequence coverage, potentially due to the low number of possible peptides available for smaller proteins. However, the quality of the spectra was arguably improved in terms of a reduction in the number of non-assigned peaks due to precursor co-isolation, which significantly enhances identification of shorter peptides, e.g. for single peptide identifications as discussed below. An example is shown in Fig. 4A/B for a peptide derived from the short open reading frame encoded peptide SEP MM_3380.
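
The overall identification numbers reported above can be put on a common scale with a short calculation. The sketch below uses only the totals and the PSM-per-peptide values quoted in the text; the per-size-bin figures of Fig. 2 and Fig. 3 are not available here, so the percentages are whole-proteome values only, and the names are our own.

```python
# Total protein identifications reported for each workflow.
IDENTIFIED = {"non-depleted": 1222, "ACN depletion": 450, "DS": 1085}

# PSMs per peptide reported for depleted vs. non-depleted samples.
PSM_PER_PEPTIDE = {"ACN depletion": 14.20, "non-depleted": 11.25}


def percent_reduction(reference, value):
    """Percentage decrease of value relative to reference."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    ref = IDENTIFIED["non-depleted"]
    for method in ("ACN depletion", "DS"):
        drop = percent_reduction(ref, IDENTIFIED[method])
        print(f"{method}: {drop:.1f}% fewer identifications than non-depleted")
    gain = PSM_PER_PEPTIDE["ACN depletion"] / PSM_PER_PEPTIDE["non-depleted"]
    print(f"PSMs per peptide, depleted/non-depleted: {gain:.2f}")
```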

Our methodology clearly shows the intended impact on the presence, and hence the identification, of residual proteins greater than 100-150 amino acid residues in length. Here, a significant reduction in the number of both PSMs and peptides per protein, as well as a clear reduction in the sequence coverage, was achieved following extraction via our methodology (Fig. 5). Ultimately, this leads to a significant reduction of sample complexity, which will potentially allow the use of shorter LC separations in future experiments. The selectivity for a depletion of larger proteins is further supported by the observation that the number of proteins (>100 aa) identified by only a single peptide was proportionally higher following our extraction procedure, with 128 of 303 (42%) versus 230 of 1,039 (22%) of the identified proteins in the depleted versus non-depleted extractions, respectively. These one-hit wonders were in many cases also identified with a very low number of PSMs. Fewer than 2% of proteins from non-depleted samples, and more than 11% of the proteins from depleted samples, were identified by only a single PSM; these identifications would not be considered in standard proteomics approaches. Overall, these data show an efficient reduction of high molecular weight proteins by our procedure, which outperforms the DS approach.
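
The single-peptide proportions quoted above follow directly from the counts. A minimal sketch, using only the numbers given in the text (the dictionary keys and function name are our own):

```python
# (proteins >100 aa identified by a single peptide, all identified proteins >100 aa)
SINGLE_PEPTIDE_IDS = {"depleted": (128, 303), "non-depleted": (230, 1039)}


def share_percent(hits, total):
    """Percentage of identifications that rest on a single peptide."""
    return 100.0 * hits / total


if __name__ == "__main__":
    shares = {k: share_percent(*v) for k, v in SINGLE_PEPTIDE_IDS.items()}
    for sample, share in shares.items():
        print(f"{sample}: {share:.1f}% single-peptide identifications")
    print(f"ratio depleted/non-depleted: {shares['depleted'] / shares['non-depleted']:.2f}")
```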

Recovery of small proteins and peptides

A spike-in experiment was performed to determine how well peptides could be recovered through the procedure. A tryptic digest of BSA was prepared, and amounts from 5 µg down to 5 pg were spiked into 150 µg aliquots of M. mazei whole cell lysate. The depletion of the high molecular mass protein complement was performed using acetonitrile:water 3.2:1, 50 mM NaCl/0.1% TFA, and subsequent analysis of the samples employed LC-MS.

In a first experiment we analyzed the spiked-in peptides recovered in the supernatant. BSA peptides were detectable in samples spiked with 5 µg down to 500 pg of BSA (Suppl. Table 2). For the lowest BSA peptide spiked sample (500 pg), this equates to the recovery of approximately 7 fmol of digested protein through the depletion process. While this does not equate to an extremely low level of sensitivity, the value is comparable to a previous analysis of the limits of detection for both the QExactive plus and Fusion Lumos (which utilises the next generation of Orbitrap mass analyser), in which 50 pg of a spiked protein was identifiable (i.e. ten times more sensitive) within a complex proteome background.16 Additionally, it is important to note that the background proteome used for our spike-in experiments was 600 times more abundant than in this previous study.

In the second experiment the pellet of the 5 µg spike-in sample was re-solubilized and digested, and the LC-MS spectra were inspected for co-depleted BSA peptides. Twenty-two BSA-derived peptides were identified; however, in total only 88 PSMs were found. In contrast, in the supernatant the same 5 µg sample yielded 2,634 PSMs. Even taking into account the higher complexity of the pellet sample, including suppression/co-isolation effects, this hints at only a minor loss of BSA peptides using our depletion procedure.
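
The two figures used in this recovery argument, the molar amount of the lowest detectable spike and the share of BSA PSMs found in the pellet, can be reproduced as follows. The sketch assumes a rounded BSA molecular mass of 66 kDa (the upper end of the mass range quoted for the model mixture) and treats PSM counts as a rough proxy only, since they are not strictly quantitative; the names are our own.

```python
BSA_MW_G_PER_MOL = 66_000  # rounded molecular mass of BSA (assumption: 66 kDa)


def pg_to_fmol(mass_pg, mw_g_per_mol):
    """Convert a mass in picograms to an amount in femtomoles."""
    # pg -> g is 1e-12, mol -> fmol is 1e15, so the net factor is 1e3.
    return mass_pg / mw_g_per_mol * 1e3


def pellet_share_percent(pellet_psms, supernatant_psms):
    """Share of all BSA PSMs that were found in the pellet."""
    return 100.0 * pellet_psms / (pellet_psms + supernatant_psms)


if __name__ == "__main__":
    print(f"500 pg BSA = {pg_to_fmol(500, BSA_MW_G_PER_MOL):.1f} fmol")
    print(f"BSA PSMs in pellet: {pellet_share_percent(88, 2634):.1f}% of total")
```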

Identification of SEP

Assessment of the number of SEP identified in this analysis was carried out by comparison of our optimised basic (ACN/TEAB) or acidic (ACN/NaCl/TFA) depletion methodologies versus the non-depleted full proteome analysis. For this, six technical replicates for each method were analysed. We identified 11 SEP, of which only four were also identified in the non-depleted samples. All peptides identified in the full proteome were also identified in the depleted samples, albeit with either higher numbers of peptides or PSMs, or with increased total sequence coverage (Table 1).

Quality of MS/MS spectra is a key feature for SEP identification, in particular for smaller SEP, which potentially form only a limited number of peptides in a (tryptic) digest. This results in many cases in “one-hit wonder” like identifications. In order to allow unambiguous identifications interpretation guidelines were recently established,17 e.g. the manual inspection of spectra and the requirement of continuous series of at least five b- or y-ions. The improved quality of MS/MS spectra achieved after depletion, as shown in Fig 4, in this respect provides a major benefit, allowing a markedly improved identification. Interestingly, while the extraction with the basic acetonitrile solution did not increase the number of SEP identified compared to the non-depleted extraction, it did allow the highly confident identification of a previously identified and partially characterised SEP from M. mazei, sp36_3SW; this peptide was not identified with the acidic acetonitrile depletion solution. In a previous study using the same biological samples we identified 17 SEP (under stringent identification criteria) using two different two-dimensional separation schemes: a classical 2D-LC-MS approach (bottom-up) and a semi-top-down approach base on intact protein separation by gel free electrophoresis followed by digestion and LC-MS analysis.6 While the number of SEP identified after depletion in this study was lower (11) in comparison to our previous analyses, we had a markedly better SEP identification compared to the non-depleted extraction (Table 1). This was evidenced by a closer inspection into the average percentage sequence coverage obtained for SEP identified via the depletion (34%), compared to full extraction (29%), and the average number of peptides per protein that was 17% higher compared to the non-depleted analysis. This again reflects the beneficial effect of reduced complexity, achievable without a long and laboursome 2D-separation scheme. 
However, the fact that many SEP were not identified after depletion implies that both approaches deliver complementary results (Suppl. Fig. 7) and thus should be used in parallel in order to maximise the number of identified SEP and simultaneously enhance the quality of identifications. Amongst the eleven SEP identified, this study provides the first protein level evidence for three additional SEP: MM_RS05370, MM_RS06425 and spRNA1766. While their roles within M. mazei have yet to be determined, evidence of their translation provides justification towards more targeted analyses of these species and will hopefully lead to a better understanding of their role within the organism.
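The interpretation guideline mentioned above (a continuous series of at least five b- or y-ions) can be expressed as a simple check on the fragment positions matched in a spectrum. The following Python sketch is illustrative only: the function names and the example ion positions are hypothetical and are not taken from the data-processing workflow used in this study.

```python
def longest_run(positions):
    """Return the length of the longest run of consecutive integers."""
    present = set(positions)
    best = 0
    for p in present:
        if p - 1 not in present:  # p is the first member of a run
            length = 1
            while p + length in present:
                length += 1
            best = max(best, length)
    return best


def passes_series_rule(b_ions, y_ions, min_run=5):
    """True if the b- or the y-series contains at least min_run consecutive ions."""
    return max(longest_run(b_ions), longest_run(y_ions)) >= min_run


# Hypothetical matched fragment indices for an 8-residue peptide
b_matched = [2, 3, 5]
y_matched = [1, 2, 3, 4, 5, 6, 7]
print(passes_series_rule(b_matched, y_matched))
```

In this hypothetical case the y-series alone satisfies the rule; a spectrum with only short, interrupted series would be rejected and would still require manual inspection.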


Conclusions

The selective isolation of the small protein complement of a system is a useful step towards the identification and molecular characterization of potentially biologically important small proteins and peptides in a cellular context. While previous iterations of the acetonitrile-based depletion of large proteins have focused on the removal of highly abundant, predominantly large proteins from plasma and serum, the ability to fine-tune the procedure for the enrichment of small proteins from cellular proteomes offers a revival of this methodology. The approach presented here is based on the (selective) depletion of larger proteins with simultaneous retention in solution of proteins below around 150 amino acids. We achieved a depletion of more than 70% of proteins above ca. 150 amino acids, which was significantly higher than that achieved with the differential solubility method. However, as in all depletion methods, a loss of smaller proteins was also observed. In the relatively small proteome of the archaeon M. mazei analyzed here, no obvious gain in the total number of small proteins was observed compared to non-depleted samples. Nevertheless, the depletion did not lose small proteins identified with the other methods, and it additionally offered a number of real advantages compared to non-depletion. Reducing the complexity of the samples led to an improved identification of small proteins, in particular owing to the improved quality of MS/MS spectra resulting from significantly less precursor co-isolation. This factor becomes particularly important for the identification of very small proteins, for which only a few peptides are potentially present in bottom-up analysis and which are thus only identifiable by single peptides. Consequently, in future experiments the depletion will potentially allow the application of shorter LC gradients for samples of similar complexity, or will be even more beneficial for proteomes of higher complexity, e.g. from invertebrate, vertebrate or mammalian cells. Indeed, first experiments performed with our procedure using C. elegans hint in this direction.

Applying the depletion method, we were able to identify previously unidentified SEP in M. mazei; further, several SEP identified earlier could be confirmed here, with improved identification quality in terms of sequence coverage and the MS/MS spectra quality described above. Nevertheless, it has to be stated that some SEP which we identified earlier in the same samples using a semi-top-down separation scheme 6 could not be identified after application of the depletion. Hence, we conclude that the parallel use of different analytical approaches will remain key for the identification and characterization of SEP/altORF proteins and peptides, as well as of classical small proteins and products of proteolysis.


SUPPORTING INFORMATION:

The following supporting information is available free of charge on the ACS website, http://pubs.acs.org

Figure S1. Effect of increasing the acetonitrile:water ratio on the retention of proteins within the supernatant using (A) the six-protein mixture, and (B) the M. mazei proteome
Figure S2. Effect of TEAB concentration on the retention of proteins within the supernatant
Figure S3. Effect of NaCl concentration on the depletion of proteins
Figure S4. Distribution of isoelectric points of proteins identified by LC-MS in M. mazei
Figure S5. Effect of incubation time on the retention of proteins in the supernatant using a six-protein mixture
Figure S6. Temperature dependency of the low molecular mass protein isolation performed using a six-protein mixture
Figure S7. Comparison of the SEP identified in the present study with those identified in a previous analysis of M. mazei using two different approaches (2D-LC MS; GelFree LC-MS)
Table S1. Average number of proteins identified less than and greater than 100 amino acid residues in size
Table S2. Detection of bovine serum albumin (BSA) peptides spiked into 150 µg aliquots of M. mazei after extraction


Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) within the “Schwerpunktprogramm SPP2002”, project Z1. We thank Katrin Weidenbach and Ruth Schmitz-Streit for the M. mazei samples.


Figures

Figure 1. Depletion of larger molecular mass proteins from M. mazei via either the ACN/TEAB (A&B) or ACN/NaCl (C&D) methodology and visualised via either Coomassie brilliant blue or silver stain.


[Figure 2 plot. Y-axis: # proteins identified (0 to 160); x-axis: protein length (aa residues); series: ACN/NaCl, DS, Full.]

Figure 2. Distribution of (calculated) molecular weights of identified proteins (LC-MS) obtained following the two depletion methodologies tested in comparison to the non-depleted full proteome.


[Figure 3 plot. Y-axis: proteins identified (%) (0% to 100%); x-axis: protein length (aa residues); series: ACN/NaCl, DS, Full.]

Figure 3. Percentage of proteins identified across the molecular mass ranges via either our depletion method or the DS depletion strategy after LC-MS analysis relative to the number of proteins identified in the full proteome analysis of M. mazei.


[Figure 4 spectra. Panels A) and B) each show an MS isolation window spectrum (left) and an MS/MS spectrum with annotated b- and y-ions (right); axes: intensity [counts] versus m/z.
A) 180417_Mmazei_Full_Proteome_2.raw, isolation window scan #22994 (effective RT 73.5780 min) and MS/MS scan #22996 (RT 73.5836 min); FTMS, isolation 453.24 Da (452.64-453.84 Da), z = +2, mono m/z 453.24036 Da, MH+ 905.47344 Da, Match Tol. = 0.02.
B) 180503_Mmazei_ACN_NaCl_1.raw, isolation window scan #29183 (effective RT 92.2299 min) and MS/MS scan #29186 (RT 92.2377 min); FTMS, isolation 453.24 Da (452.64-453.84 Da), z = +2, mono m/z 453.23972 Da, MH+ 905.47215 Da, Match Tol. = 0.02.]

Figure 4. MS isolation window spectra (left) and MS/MS spectra (right) of the proteotypic peptide TGSIDWVK from the SEP MM_3380 in A) the full proteome extraction (charge: +2, monoisotopic m/z: 453.24036 Da, 0.8 ppm) and B) following depletion with ACN/NaCl (charge: +2, monoisotopic m/z: 453.23972 Da, 0.61 ppm).
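The precursor m/z values and ppm deviations quoted in the Figure 4 caption can be checked with a short calculation. The sketch below is a minimal illustration, assuming standard monoisotopic residue masses and an unmodified peptide; the function names are hypothetical and the calculation is not part of the software used in this study.

```python
# Monoisotopic residue masses (Da), listed only for the residues in TGSIDWVK
RESIDUE_MASS = {
    "T": 101.047679, "G": 57.021464, "S": 87.032028, "I": 113.084064,
    "D": 115.026943, "W": 186.079313, "V": 99.068414, "K": 128.094963,
}
WATER = 18.010565   # mass of H2O added to the summed residues
PROTON = 1.007276   # mass of the charge-carrying proton


def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theo = peptide_mz("TGSIDWVK", 2)
# Observed monoisotopic m/z values taken from the Figure 4 caption
for label, obs in [("full proteome", 453.24036), ("ACN/NaCl", 453.23972)]:
    print(f"{label}: theoretical {theo:.5f}, observed {obs:.5f}, "
          f"{ppm_error(obs, theo):+.2f} ppm")
```

Under these assumptions the theoretical doubly charged m/z is about 453.2400, and the two observed values deviate by roughly +0.8 and -0.6 ppm, in line with the magnitudes quoted in the caption.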


[Figure 5 plot. Left y-axis: # of peptides or PSMs; right y-axis: sequence coverage (%); series: Peptides, PSM, %Cov; proteins grouped relative to 100 aa.]

Figure 5. Comparative assessment of the number of peptides and PSMs per protein for identified proteins less than or greater than 100 amino acid residues in length, with the associated percentage sequence coverage for the ACN/50 mM NaCl/0.1% TFA depletion, the DS-methodology and the non-depleted (full) proteome.


Table 1. SEP identified in M. mazei after depletion of high molecular weight proteins using the optimised acetonitrile depletion, or in non-depleted samples (Extraction: Full). Depletion mixtures: ACN-water 3.2:1, with either 100 mM TEAB or 50 mM NaCl/0.1% TFA.

| Accession | Extraction | Coverage [%] | # Unique Peptides | # Peptides | # PSMs | # AAs | MW [Da] | calc. pI |
|---|---|---|---|---|---|---|---|---|
| sp36_3_SW | ACN/TEAB | 41 | 2 | 2 | 22 | 61 | 7108.53 | 4.49 |
| MM_3380 | Full | 17 | 2 | 2 | 9 | 59 | 6612.81 | 12.32 |
| MM_3380 | ACN/NaCl | 17 | 2 | 2 | 15 | 59 | 6612.81 | 12.32 |
| MM_3395 | ACN/NaCl | 20 | 1 | 1 | 2 | 54 | 6096.05 | 8.18 |
| MM_3395 | ACN/TEAB | 20 | 1 | 1 | 11 | 54 | 6096.05 | 8.18 |
| MM_3401 | ACN/NaCl | 43 | 2 | 2 | 5 | 51 | 6034.98 | 4.65 |
| MM_RS01660 | ACN/NaCl | 40 | 2 | 2 | 3 | 62 | 7363.58 | 7.96 |
| MM_RS05370 | Full | 19 | 1 | 1 | 1 | 80 | 9118.50 | 4.7 |
| MM_RS05370 | ACN/NaCl | 19 | 3 | 3 | 8 | 80 | 9118.50 | 4.7 |
| MM_RS05370 | ACN/TEAB | 19 | 2 | 2 | 12 | 80 | 9118.50 | 4.7 |
| MM_RS12120 | Full | 45 | 5 | 5 | 54 | 67 | 7694.52 | 8.54 |
| MM_RS12120 | ACN/NaCl | 75 | 7 | 7 | 48 | 67 | 7694.52 | 8.54 |
| MM_RS16040 | Full | 36 | 2 | 2 | 9 | 72 | 8028.01 | 8.46 |
| MM_RS16040 | ACN/NaCl | 47 | 3 | 3 | 7 | 72 | 8028.01 | 8.46 |
| MM_RS17510 | ACN/NaCl | 9 | 2 | 2 | 2 | 465 | 49034.98 | 4.15 |
| MM_RS06425 | ACN/NaCl | 77 | 1 | 10 | 60 | 43 | 4807.38 | 7.42 |
| spRNA1766 | ACN/NaCl | 16 | 1 | 1 | 1 | 38 | 4499.32 | 9.19 |

Protein sequences:

sp36_3_SW: VTIWEYDVKEIRFSEWSKAKEDLNNLGVEGWELIKFSNEIDENGMVAAVFKRPVDYVDAAF
MM_3380: MIMGKTGSIDWVKVKGRKGKVIKVQKSKSQKAHPGPAQRFTSSGHKRRFIRRSAKALVK
MM_3395: VSTMVYVTKICPVCGKEFFVLKDAEEKAIYCTLACLATAQEKFERRGRSFPSFS
MM_3401: MSTDQEILARIQEIEKRMERMEATLESINNILKKVEQNTYFGCYVEGEKLD
MM_RS01660: MFTMENTEGFDQKTLDKMNAEVKKSMSKHDPKSKNYKKICEQVEDQVFDKYCSEIFRPSLKL
MM_RS05370: MTRITDHLHKNVLSVIMDIFEMDHFREPEVEEVQLFLESTGTHGSIVDETALDMDCGTSPKIQARQSVLLNFDEPVQPIK
MM_RS12120: MAFNDRNSFRGRDNNRGGFGGAPREMHNATCSDCGAETQVPFKPDPDRPVYCRDCLPNHRKPRENRY
MM_RS16040: MKTAGYALAIIGSAYVVKQVLKCEKCRSWSGWHKIKVEDEGPDSSTGSHTIKVEEEEGERKGSRYGSNPIRY
MM_RS17510: MASAADDIGGDSGSSDAVDTGDSDSETESSDSDSDSGDSEKFNLRDLGSKYFSWRNSDPEDSSSDDDETSEEDEKSGDDEQSGNDETSSEDETSGGDVPSDNGETSGEDDQSGNDETSGGDIPSVNDETSGEDDQSGNDETSGGDIPSDEDDSTDDNPADDGSETDDSGTGSPSTDNPETENSEVSDTSSSNSGSGSSSSSDSSSTANPAAGSDPGSTANPESGPDSGDTANPDTGSSSGSSPDSDSGSSGMGSGISTEPATNIAIKELATRNVMSGYHVKYEFPQNVTCITYIEYDAERTFRKTTTVVEVLRDKSTLVKVLPRGEVYKHVNIWVGENAAGLPTSLKNGLVGFKVEKKWMEKNNVSESFITLQWYNRGWEPLDTIKTGEDEKYVYFESETPGYSFFAITEYEGEETELQKTLRSLAGSKKDGTAREPMKAAKMLLAIALPLFLVVAGYCVLKKKI
MM_RS06425: MAYESQYYPGATSVGANRRKHMSGKLEKLREISDEDLTAVLGH
spRNA1766: LWILLYDIASLKCHNTSGQFQDTGRCIPEIQRHKGRFR
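As an illustration of how a value in the Coverage [%] column of Table 1 can arise, the sketch below computes percentage sequence coverage as the fraction of residues covered by at least one identified peptide. This is an assumed, simplified definition with a hypothetical function name; the search software used in the study may treat details differently. Only the peptide TGSIDWVK (named in the Figure 4 caption) is used, so the result is a lower bound on the 17% reported for MM_3380, which is based on two peptides.

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# MM_3380 sequence as listed in Table 1 (59 residues)
MM_3380 = "MIMGKTGSIDWVKVKGRKGKVIKVQKSKSQKAHPGPAQRFTSSGHKRRFIRRSAKALVK"
coverage = sequence_coverage(MM_3380, ["TGSIDWVK"])
print(f"{coverage:.1f}% of {len(MM_3380)} residues covered")
```

For such short proteins a single additional peptide changes the coverage by several percentage points, which is why coverage is a sensitive indicator of identification quality for SEP.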


References

(1) Chugunova, A.; Navalayeu, T.; Dontsova, O.; Sergiev, P. Mining for Small Translated ORFs. J Proteome Res 2018, 17, 1-11.
(2) Samandi, S.; Roy, A. V.; Delcourt, V.; Lucier, J. F.; Gagnon, J.; Beaudoin, M. C.; Vanderperre, B.; Breton, M. A.; Motard, J.; Jacques, J. F.; Brunelle, M.; Gagnon-Arsenault, I.; Fournier, I.; Ouangraoua, A.; Hunting, D. J.; Cohen, A. A.; Landry, C. R.; Scott, M. S.; Roucou, X. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. Elife 2017, 6.
(3) Saghatelian, A.; Couso, J. P. Discovery and characterization of smORF-encoded bioactive polypeptides. Nat Chem Biol 2015, 11, 909-16.
(4) Olexiouk, V.; Van Criekinge, W.; Menschaert, G. An update on sORFs.org: a repository of small ORFs identified by ribosome profiling. Nucleic Acids Res 2018, 46, D497-D502.
(5) Yagoub, D.; Tay, A. P.; Chen, Z.; Hamey, J. J.; Cai, C.; Chia, S. Z.; Hart-Smith, G.; Wilkins, M. R. Proteogenomic Discovery of a Small, Novel Protein in Yeast Reveals a Strategy for the Detection of Unannotated Short Open Reading Frames. J Proteome Res 2015, 14, 5038-47.
(6) Cassidy, L.; Prasse, D.; Linke, D.; Schmitz, R. A.; Tholey, A. Combination of Bottom-up 2D-LCMS and Semi-top-down GelFree-LC-MS Enhances Coverage of Proteome and Low Molecular Weight Short Open Reading Frame Encoded Peptides of the Archaeon Methanosarcina mazei. J Proteome Res 2016, 15, 3773-3783.
(7) Budamgunta, H.; Olexiouk, V.; Luyten, W.; Schildermans, K.; Maes, E.; Boonen, K.; Menschaert, G.; Baggerman, G. Comprehensive Peptide Analysis of Mouse Brain Striatum Identifies Novel sORF-Encoded Polypeptides. Proteomics 2018, 18, e1700218.
(8) Tinoco, A. D.; Tagore, D. M.; Saghatelian, A. Expanding the dipeptidyl peptidase 4-regulated peptidome via an optimized peptidomics platform. J Am Chem Soc 2010, 132, 3819-30.
(9) Kawashima, Y.; Fukutomi, T.; Tomonaga, T.; Takahashi, H.; Nomura, F.; Maeda, T.; Kodera, Y. High-yield peptide-extraction method for the discovery of subnanomolar biomarkers from small serum samples. J Proteome Res 2010, 9, 1694-705.
(10) Kay, R.; Barton, C.; Ratcliffe, L.; Matharoo-Ball, B.; Brown, P.; Roberts, J.; Teale, P.; Creaser, C. Enrichment of low molecular weight serum proteins using acetonitrile precipitation for mass spectrometry based proteomic analysis. Rapid Commun Mass Spectrom 2008, 22, 3255-60.
(11) Henning, A. K.; Albrecht, D.; Riedel, K.; Mettenleiter, T. C.; Karger, A. An alternative method for serum protein depletion/enrichment by precipitation at mildly acidic pH values and low ionic strength. Proteomics 2015, 15, 1935-40.
(12) Jager, D.; Sharma, C. M.; Thomsen, J.; Ehlers, C.; Vogel, J.; Schmitz, R. A. Deep sequencing analysis of the Methanosarcina mazei Go1 transcriptome in response to nitrogen availability. Proc Natl Acad Sci U S A 2009, 106, 21878-82.
(13) Dar, D.; Prasse, D.; Schmitz, R. A.; Sorek, R. Widespread formation of alternative 3' UTR isoforms via transcription termination in archaea. Nat Microbiol 2016, 1, 16143.
(14) Vizcaino, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Rios, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P. A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H. J.; Albar, J. P.; Martinez-Bartolome, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat Biotechnol 2014, 32, 223-6.
(15) Tiessen, A.; Perez-Rodriguez, P.; Delaye-Arredondo, L. J. Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Res Notes 2012, 5, 85.
(16) Levy, M. J.; Washburn, M. P.; Florens, L. Probing the Sensitivity of the Orbitrap Lumos Mass Spectrometer Using a Standard Reference Protein in a Complex Background. J Proteome Res 2018, 17, 3586-3592.
(17) Slavoff, S. A.; Mitchell, A. J.; Schwaid, A. G.; Cabili, M. N.; Ma, J.; Levin, J. Z.; Karger, A. D.; Budnik, B. A.; Rinn, J. L.; Saghatelian, A. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat Chem Biol 2013, 9, 59-64.
