Article
Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time. Jan Muntel, Joanna Kirkpatrick, Roland Bruderer, Ting Huang, Olga Vitek, Alessandro Ori, and Lukas Reiter J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00898 • Publication Date (Web): 06 Feb 2019 Downloaded from http://pubs.acs.org on February 7, 2019
Journal of Proteome Research
Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time
Authors: Jan Muntel+1, Joanna Kirkpatrick+2, Roland Bruderer1, Ting Huang3, Olga Vitek3, Alessandro Ori*2, Lukas Reiter*1
+ contributed equally
* shared corresponding authors
Alessandro Ori: phone: +49 3641 65 6808, email: [email protected]
Lukas Reiter: phone: +41 44 7382043, email: [email protected]
Institutions:
1. Biognosys AG, Wagistrasse 21, 8952 Schlieren, Switzerland
2. Leibniz Institute on Aging - Fritz Lipmann Institute, Beutenbergstrasse 11, 07745 Jena, Germany
3. Northeastern University, 360 Huntington Avenue, Boston, MA 02115, USA
Abstract
Label free quantification (LFQ) and isobaric labelling quantification (ILQ) are among the most popular protein quantification workflows in discovery proteomics. Here, we compared the TMT SPS/MS3 10-plex workflow to a label free single shot data-independent acquisition (DIA) workflow on a controlled sample set. The sample set consisted of ten samples derived from 10 biological replicates of mouse cerebella spiked with the UPS2 protein standard in five different concentrations. For a fair comparison, we matched the instrument time for the two workflows. The LC-MS data were acquired at two facilities to assess interlaboratory reproducibility. Both methods resulted in a high proteome coverage (>5,000 proteins) with low missing values on protein level ( 8,000 proteins within one run of certain samples, the overall depth of the analysis is potentially limited by the dynamic range and sensitivity of the current LC-MS instrumentation. Typically, the dynamic range is improved by prefractionation techniques, but these methods are inherently difficult to combine with label-free quantification 26.
The ILQ workflow relies on chemical labeling of the peptides and allows multiplexing of up to 11 samples in the case of TMT. The isobaric nature of the tag makes the labeled peptides indistinguishable on MS1 level, but after fragmentation each label creates a sample-specific reporter ion, which is then used for quantification. This results in very low counts of missing values, because the quantification does not rely on the selection of the same peptides across multiple samples. The main advantage of the TMT workflow is its ability to simultaneously perform a deep proteome analysis of up to 11 samples. Since the labeled peptides share the same physicochemical properties, they are well suited for prefractionation 27–29. However, experiments with more than 11 samples can only be analyzed using multiple blocks with a reference channel in each block, which can lead to complications such as missing values across blocks. In early ILQ studies, the quantitative accuracy was rather low, because co-isolation of multiple labeled peptides led to an underestimation of the ratios 30,31. This issue has been addressed by a number of approaches such as the introduction of MS3-based methods, in which one or several fragment ions are isolated after MS2 for further fragmentation 32,33. This additional isolation step aims to ensure that spectra with only reporter ions derived from one distinct peptide are generated in MS3 and results in greatly improved quantitative accuracy. However, the lower scan speed of these MS3 or SPS (synchronous precursor selection) methods decreases the sample throughput.
In summary, both TMT and DIA are now amenable to a high proteome coverage, and precise and accurate quantification with low counts of missing values. Nevertheless, very few studies systematically compared these workflows on a controlled dataset with predefined changes in abundance. Most laboratories either have expertise in only one of the workflows, or choose the workflow based on the available resources, e.g., the number of samples. In 2018, the Gygi group demonstrated on a controlled dataset (yeast spike-in into human background) that TMT delivered a better precision and accuracy of
quantification compared to label-free quantification using data-dependent acquisition (DDA) data, as implemented in the MaxQuant software 26. The authors concluded that this advantage is most likely caused by a much lower number of missing values in TMT, and that DIA workflows might overcome this issue for label-free quantification approaches in the future 34. In other studies, the iTRAQ and SWATH workflows were compared on biological samples without a known ground truth 29. Bourassa and co-workers suggested that SWATH is a promising alternative to iTRAQ, even though the number of quantified proteins by SWATH was lower 35. A more recent study on tear samples concluded that both workflows have similar accuracy, but SWATH should be preferred for biomarker discovery studies because of better reproducibility, lower sample requirements and better scalability to large cohorts 36.
Here, we present a study based on controlled quantitative mixtures (UPS2 protein standard spike-in into mouse cerebellum samples). The UPS2 was spiked into 10 biologically different cerebellum samples to simulate a real biological experiment with varying biological background. Our study compares the TMT SPS/MS3 and DIA workflows in terms of quantitative precision and accuracy across a wide dynamic range (3 orders of magnitude in mouse cerebellum background of 5 orders of magnitude) of quantified proteins and different effect sizes. For a fair comparison, we used the same instrument time for both workflows. Workflows were exchanged between the Biognosys facility (BGS), which has long-standing experience in DIA method development, and the Fritz-Lipmann-Institute (FLI), which is well experienced with the TMT workflow. The sample set was measured in both laboratories with both workflows. Our results showed a good transferability of the two workflows across laboratories. Both methods yielded comparable results in terms of recovery of the ground truth, and demonstrated higher accuracy of the DIA workflow, but higher precision of the TMT workflow.
Material and Methods
All chemicals were purchased from Sigma (St Louis, MO) unless a different vendor is stated.
Generation of controlled sample
Ten mouse cerebellum samples were ordered from AMS biotechnology (Abingdon, UK). Tissues were lysed in lysis buffer (8 M urea, 0.1 M ammonium bicarbonate) using the TissueLyzer II (5 cycles, 30 beats per s for 30 s; Qiagen, Hilden, Germany). Afterwards the DNA was sheared using sonication in the Bioruptor (5 cycles, 30 s ON, 30 s OFF, 4 °C; Diagenode, Seraing, Belgium). After centrifugation of the lysates (20 min, 16,000 x g, room temperature), aliquots were reduced and alkylated by addition of tris(2-carboxyethyl)phosphine (TCEP, final concentration: 5 mM) and 2-chloroacetamide (CAA, final concentration: 20 mM) for 1 h at 37 °C. The urea concentration was reduced to 1.5 M by addition of 0.1 M ammonium bicarbonate buffer. Lysates were digested by trypsin (Promega, Madison, WI) using a 1:100 enzyme to protein ratio at 37 °C overnight. Digests were purified using MacroSpin clean up columns (NEST group, Southborough, MA) following the manufacturer's protocol. Eluates were dried completely by vacuum centrifugation (Savant SPD131DDA, Thermo Fisher Scientific, San Jose, CA).
The UPS2 protein standard was digested separately. The lyophilized proteins were dissolved in 1.5 M urea buffer (in 0.1 M ammonium bicarbonate, 5 mM TCEP, 20 mM CAA), incubated for 1 h at 37 °C and digested by trypsin overnight at 37 °C. After acidification by 20% (v/v) TFA, peptides were purified using MicroSpin clean-up columns (NEST group) and eluates were dried completely by vacuum centrifugation.
All samples were resuspended in solvent A (1% acetonitrile, 0.1% formic acid in water) including iRT peptides (Biognosys, Schlieren, Switzerland). The concentration of the cerebellum samples was adjusted to 1 µg/µl and spiked with 5 different concentrations of the UPS2 digest in duplicates. The UPS2 consisted
of 48 proteins organized in 6 abundance tiers each with 8 proteins, covering 5 orders of magnitude. Based on the lowest abundant tier the protein concentrations were: S1: 0.05 amol/µl, S2: 0.055 amol/µl (+10%), S3: 0.072 amol/µl (+30%), S4: 0.136 amol/µl (+90%) and S5: 0.503 amol/µl (+270%) (assuming no losses during sample preparation). An overview of the spike-in concentrations of all UPS2 proteins can be found in Supplementary Table S1. All MS experiments were performed on two Orbitrap Fusion Lumos mass spectrometers (Thermo Fisher Scientific, San Jose, CA) using the identical samples. The same MS methods were applied in both facilities (Biognosys, BGS and Fritz-Lipmann-Institute, FLI). The LC setups varied between BGS and FLI and are described in more detail below.
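The spike-in levels quoted above follow from applying the stated relative increases one after another to the S1 concentration. The short sketch below reproduces that arithmetic for the lowest abundance tier; the function name `spike_series` is a hypothetical helper introduced only for illustration, and the calculation assumes no losses during sample preparation, as in the text.

```python
# Cumulative spike-in series for the lowest UPS2 tier, starting from S1.
S1_AMOL_PER_UL = 0.05                      # S1 concentration stated in the text
STEP_INCREASES = [0.10, 0.30, 0.90, 2.70]  # S1->S2, S2->S3, S3->S4, S4->S5


def spike_series(start, increases):
    """Apply each relative increase in turn and return all concentration levels."""
    levels = [start]
    for increase in increases:
        levels.append(levels[-1] * (1.0 + increase))
    return levels


levels = spike_series(S1_AMOL_PER_UL, STEP_INCREASES)
for name, conc in zip(["S1", "S2", "S3", "S4", "S5"], levels):
    print(f"{name}: {conc:.4f} amol/µl")

# Overall fold change between the lowest and highest spike-in level.
max_fold_change = levels[-1] / levels[0]
print(f"S5/S1 fold change: {max_fold_change:.2f}")
```

The product of the four steps (1.1 x 1.3 x 1.9 x 3.7) gives an S5/S1 ratio of roughly 10, in line with the concentrations listed for S1 and S5.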
Library generation for targeted analysis of data-independent acquisition (DIA) data
For library generation, 20 µg aliquots of the 10 samples were pooled. High-pH reverse phase fractionation of the sample was performed at FLI with the following setup. An Agilent 1260 Infinity HPLC System (Agilent, Santa Clara, CA) was used, equipped with a binary pump, degasser, variable wavelength UV detector (set to 220 and 254 nm), peltier-cooled autosampler (set at 10 °C) and a fraction collector. The column was a Waters XBridge C18 column (3.5 µm, 100 x 1.0 mm, Waters, Milford, MA) with a Gemini C18, 4 x 2.0 mm SecurityGuard (Phenomenex, Torrance, CA) cartridge as a guard column. The solvent system consisted of 20 mM ammonium formate (pH 10) as mobile phase (A) and 100% acetonitrile as mobile phase (B). 100 µg peptides were injected and the separation was accomplished at a mobile phase flow rate of 0.1 mL/min using a non-linear gradient from 95 % A to 31 % B in 91 min. 48 fractions were collected along with the LC separation, which were subsequently pooled (in a non-sequential fashion) into 6 fractions (i.e., every 6th well was put together as a final fraction, 8 wells per fraction). This fractionation was repeated once to have enough sample for both laboratories. The pooled fractions from both separations were combined, dried in a Speed-Vac and then stored at -80 °C until LC-MS/MS analysis,
whereupon they were resuspended in solvent A (1% acetonitrile, 0.1% formic acid in water) containing iRT peptides (Biognosys, Schlieren, Switzerland). The peptide concentration was determined by nano-drop (Spectrostar Nano, BMG labtech, Ortenberg, Germany) and 2 µg per fraction was used to acquire DDA data to make the sample specific library. Assuming no losses and an equal distribution of the UPS2 across the fractions, each fraction contained in total 1.45 pmol of UPS2 and 8.7 pmol in all six fractions.
After fractionation the pooled fractions were analyzed by a DDA method at BGS and FLI. A DDA method with the following settings was used for library generation: MS1 scan: 60k, AGC target: 5e5, max injection time: 20 ms; MS2 scan: MIPS: peptide, intensity threshold: 2e4, charge state: 2-5, dynamic exclusion: 60 s, isolation width: 1.6 Th, activation type: HCD, HCD collision energy: 27%, orbitrap resolution: 15k, AGC target: 2e5, max injection time: 25 ms, cycle time: 3 s.
At BGS an Easy nLC 1200 (Thermo Fisher Scientific, San Jose, CA) was coupled online to the MS. Peptide mixtures were separated in direct injection mode on an in-house packed analytical column at 50 °C (Dr. Maisch ReproSil Pur 1.9 µm, ID 75 µm, 500 mm, fritted tip New Objective). Peptides were eluted by a non-linear 2 h gradient at a flow rate of 250 nl/min from 1% buffer B (85% acetonitrile, 0.1% formic acid in water) / 99% buffer A (0.1% formic acid in water) to 40% buffer B. Total runtime was 155 min, including sample loading, washing and equilibration.
At FLI a NanoAcquity UPLC (Waters, Milford, MA) was coupled online to the MS. Peptide mixtures were separated in trap/elute mode, using a trapping (nanoAcquity Symmetry C18, 5 µm, 180 µm x 20 mm) and an analytical column (nanoAcquity BEH C18, 1.7 µm, 75 µm x 250 mm). The outlet of the analytical column was coupled directly to the MS using the Proxeon nanospray source.
Solvent A was water, 0.1 % formic acid and solvent B was acetonitrile, 0.1 % formic acid. The samples were loaded with a constant flow of solvent A, at 5 µl/min onto the trapping column. Trapping time was 6 minutes. Peptides were eluted via the analytical column with a constant flow of 300 nl/min. During the elution step, the percentage of solvent B increased in a non-linear fashion from 0 % to 40 % in 120 minutes. Total runtime was 145 min, including clean-up and column re-equilibration.
DDA data were searched with MaxQuant 1.5.6.5 37 against the UniProt mouse database (downloaded on Jan 6th, 2016 as FASTA file) and the UPS2 database (www.sigma.com/ups) using the following settings:
fixed modifications: carbamidomethyl (C); variable modifications: oxidation (M), acetyl (protein N-term); enzyme: Trypsin/P; max. missed cleavages: 2; first search peptide tolerance: 20 ppm; main search peptide tolerance: 4.5 ppm; second peptide search was enabled. All other settings were set to default. Results were filtered by 1% FDR on PSM and protein level. A detailed overview of the fractionations, including cumulative precursor, peptide and protein identifications, can be found in Supplementary Figure S1 and in Supplementary Table S6, which also lists the number of MS2 scans per fraction. The MaxQuant search results were imported into Spectronaut Pulsar 11 (11.0.15038.5) 11 to generate a library using the default settings. To avoid interferences from peptides that are shared between the UPS2 proteins and the mouse proteome, these peptides were removed from the library using in-house R scripts (libraries: Supplementary Table).
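The non-sequential pooling used for the library fractionation (48 collected wells combined into 6 final fractions, every 6th well per fraction) can be sketched as follows. This is a minimal illustration, assuming wells are numbered 1 to 48 in elution order and that pool 1 starts at well 1; the exact well-to-pool assignment is not given in the text, and `pool_fractions` is a hypothetical helper.

```python
def pool_fractions(n_wells=48, n_pools=6):
    """Assign wells to pools so that every n_pools-th well ends up in the same pool."""
    pools = {pool: [] for pool in range(1, n_pools + 1)}
    for well in range(1, n_wells + 1):
        pool = (well - 1) % n_pools + 1   # wells 1, 7, 13, ... -> pool 1, and so on
        pools[pool].append(well)
    return pools


pools = pool_fractions()
for pool, wells in pools.items():
    print(f"pool {pool}: wells {wells}")
```

Each pool then contains 8 wells spread across the whole gradient, so early- and late-eluting peptides are distributed across all final fractions.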
DIA scan parameter and data analysis
At both sites, the same LC setup (2 h non-linear gradient) and mass spectrometer were used as for the generation of the library. The MS method was transferred from BGS to FLI. The method was adapted from 12. Two µg of cerebellum digest containing between 89 fmol (S1) and 894 fmol (S5) of UPS2 were injected (overview of UPS2 spike-in concentrations in Supplementary Table S1). In brief, the following MS settings were applied: 1 MS1 scan at 120k resolution with an AGC target of 4e5 and max injection time of 20 ms in the range of 350 to 1,650 Th followed by 40 DIA scans with segment widths adjusted to the precursor density (Supplementary Table S3). Scan resolution in the orbitrap was set to 30k with an AGC target of 1e6 and max injection time of 60 ms. The HCD collision energy was set to 27%. DIA data were analyzed in Spectronaut Pulsar X (12.0.20228.0, Biognosys, Schlieren, Switzerland) using the previously generated library and default settings. Default normalization strategy was applied (local regression normalization 38). The results were filtered by an FDR of 1% on precursor and protein group level (Q value 99%, the 10 samples were combined, and peptides were purified using MacroSpin clean up columns (NEST group, Southborough, MA) following the manufacturer's protocol. Eluate was dried completely by vacuum centrifugation and resuspended in solvent A. To increase the analytical depth and to match the overall acquisition time to the DIA samples, the combined TMT sample (200 µg cerebellum sample and 145 pmol of UPS2) was fractionated into 10 fractions by high pH reverse phase chromatography on a Dionex Ultimate 3000 LC (Thermo Scientific, Sunnyvale, CA) using an ACQUITY UPLC CSH 1.7 µm C18 column (2.1 x 150 mm, Waters, Milford, MA). Peptides were separated by a 30 min non-linear gradient from 1% buffer B (100% acetonitrile) / 99% buffer A (20 mM ammonium formate, pH 10) to 40% buffer B. A micro fraction was taken every 45 s and
pooled into 10 final fractions. Pooled fractions were dried completely by vacuum centrifugation and resuspended in 20 µl solvent A containing iRT peptides. Assuming equal distribution of peptides in the pooled fractions, peptide concentration per fraction was 1 µg/µl of cerebellum digest and 725 fmol/µl of UPS2. The concentration of the UPS2 proteins of tier 1 was 0.8 amol/µl, tier 2: 8.2 amol/µl, tier 3: 81.6 amol/µl, tier 4: 816 amol/µl, tier 5: 8.2 fmol/µl and tier 6: 81.6 fmol/µl.
At both sites, the same LC-MS setup was used as for the analysis of the DIA data and generation of the library. The SPS/MS3 MS method was transferred from FLI to BGS. One µg of peptides per fraction (equals 1 µl of sample, 145 fmol of UPS2) was analyzed using the same 2 h gradients as described above. In brief, the following MS settings were applied: for MS1 scan: resolution: 60k; scan range: 375-1,500; AGC target: 4e5; max injection time: 20 ms; MIPS: peptide; intensity threshold: 5e3; charge state: 2-5; dynamic exclusion: 20 s; for MS2 scan: isolation window: 1 Th; activation type: HCD; collision energy: 35%; detector type: ion trap; ion trap scan rate: rapid; AGC target: 1e4; max injection time: 50 ms; for SPS-MS3 scan: precursor selection range: 400-2,000; precursor ion exclusion: low: 18 Th, high: 5 Th; isobaric tag loss exclusion: TMT; synchronous precursor selection (SPS): enabled; number of SPS precursors: 8; MS isolation window: 2 Th; activation type: HCD; collision energy: 65%; detector type: Orbitrap; resolution: 50k; first mass: 100 Th; AGC target: 5e4; max injection time: 86 ms. The nature of the TMT label required a higher MS2 collision energy for the analysis of the TMT samples (TMT samples: 35%; unlabeled samples: 27%). This was done according to the manufacturer’s recommendations and has also been done elsewhere 40.
TMT MS data were analyzed in ProteomeDiscoverer 2.2 (Thermo Fisher Scientific, San Jose, CA) with Mascot v2.5.1 (Matrix Science), searching the same databases as described above with the following settings: Enzyme was set to trypsin, with up to 1 missed cleavage. MS1 mass tolerance was set to 10 ppm and MS2 to 0.5 Da. Carbamidomethyl cysteine was set as a fixed modification and oxidation of methionine and N-terminal acetylation as variable modifications. Other modifications included the TMT-10plex modification from the quantification method (as defined in Mascot). The quan method was set for reporter ion quantification with HCD and MS3 (mass tolerance, 20 ppm). The false discovery rate for peptide-spectrum matches (PSMs) was set to 0.01 using Percolator 41. For the reporter ion abundance only unique peptides were considered for the quantification and intensities were reported based on signal-to-noise (S/N), with no corrections applied. Protein FDR was also set to 1% in the consensus workflow.
The search results were filtered by an FDR of 1% on precursor and protein level. A detailed overview of the fractionations, including cumulative precursor, peptide and protein identifications and labeling efficiencies (>99% for all channels), can be found in Supplementary Figure S1 and in Supplementary Table S6, which also lists the number of MS2/MS3 scans per fraction. The search results (as peptide spectrum match tables) were exported and proteins were grouped based on the IDpicker algorithm 42, which is also used in Spectronaut Pulsar X for protein grouping. Afterwards, peptides shared between the UPS2 proteins and the mouse proteome were removed from the output and reporter ion channels were normalized by the median (quantitative data are summarized in Supplementary Table S7).
The mass spectrometry proteomics data including the Spectronaut Pulsar X analyses have been deposited to the ProteomeXchange Consortium via the PRIDE 43 partner repository with the dataset identifier PXD011691. The Spectronaut projects can be viewed using the free Spectronaut viewer (www.biognosys.com/technology/spectronaut-viewer).
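The median normalization of the reporter ion channels mentioned in this section can be illustrated with a short sketch. The exact procedure used in the study is not described, so the version below is an assumption: each channel is rescaled so that its median matches the mean of all channel medians. The function `median_normalize` and the toy intensities are hypothetical.

```python
from statistics import median


def median_normalize(channels):
    """Rescale every channel so that all channel medians become equal.

    channels: dict mapping a channel name to a list of reporter ion intensities.
    The common target is the mean of the per-channel medians (an assumption).
    """
    medians = {name: median(values) for name, values in channels.items()}
    target = sum(medians.values()) / len(medians)
    return {
        name: [value * target / medians[name] for value in values]
        for name, values in channels.items()
    }


# Toy data: the second channel was loaded at twice the amount of the first.
raw = {"126": [10.0, 20.0, 30.0], "127N": [20.0, 40.0, 60.0]}
normalized = median_normalize(raw)
for name, values in normalized.items():
    print(name, values, "median:", median(values))
```

After normalization both toy channels share the same median, so a systematic loading difference between channels no longer appears as a difference in abundance.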
Statistical analysis of the data
The DIA and TMT data were subjected to the same statistical analysis, to ensure that the choice of the statistical method had no effect on the comparison. Precursors with more than 20% missing values across the 10 MS runs of DIA data or 10 channels of TMT data were filtered out. The remaining precursors were mapped to protein IDs. All the intensities of the precursors uniquely mapped to a protein were summarized into a single protein-level intensity in each sample, by calculating the median intensity of precursors of the protein in a sample (Supplementary Table S8A). The two-sample t-test was used separately for each protein to detect changes in protein abundance between two groups with different spiked-in concentrations that are more systematic than expected by random chance. The resulting candidate lists of differentially abundant proteins can be found in Supplementary Table S8B (deviations from the spike-in ratios were added to Supplementary Table S8A).
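The three steps described above (missing-value filter, median summarization, two-sample t-test) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes log2-scale intensities, `None` for missing values, and a Student's t-test with pooled variance, none of which is specified in the text. Because each spike-in group has two replicates, the test has 2 degrees of freedom, for which the two-sided p value has the closed form 1 - |t| / sqrt(t^2 + 2). All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, median

MAX_MISSING_FRACTION = 0.20  # precursors with more than 20% missing values are dropped


def keep_precursor(intensities):
    """True if at most 20% of the values (None = missing) are missing."""
    missing = sum(1 for value in intensities if value is None)
    return missing / len(intensities) <= MAX_MISSING_FRACTION


def protein_level(precursor_rows):
    """Per-sample median over the retained precursors of one protein."""
    kept = [row for row in precursor_rows if keep_precursor(row)]
    if not kept:
        return None
    summary = []
    for i in range(len(kept[0])):
        observed = [row[i] for row in kept if row[i] is not None]
        summary.append(median(observed) if observed else None)
    return summary


def t_test_two_replicates(group_a, group_b):
    """Student's t-test (pooled variance) for two groups of two values, df = 2."""
    if len(group_a) != 2 or len(group_b) != 2:
        raise ValueError("this sketch only covers two replicates per group")
    var_a = (group_a[0] - group_a[1]) ** 2 / 2   # sample variance of two values
    var_b = (group_b[0] - group_b[1]) ** 2 / 2
    std_err = math.sqrt((var_a + var_b) / 2)     # pooled variance times (1/2 + 1/2)
    if std_err == 0:
        raise ValueError("zero variance in both groups")
    t_stat = (mean(group_b) - mean(group_a)) / std_err
    p_value = 1 - abs(t_stat) / math.sqrt(t_stat ** 2 + 2)  # exact for df = 2
    return t_stat, p_value


# Toy log2 intensities for one protein; columns are S1a, S1b, ..., S5a, S5b.
rows = [
    [10.0, 10.2, 10.3, 10.1, 10.6, 10.5, 11.1, 11.0, 13.3, 13.5],
    [9.8, None, 10.1, 10.0, 10.4, 10.6, 11.0, None, 13.2, 13.4],
    [None, None, None, 9.9, 10.2, 10.1, 10.9, 11.2, 13.0, 13.1],  # 30% missing
]
protein = protein_level(rows)
t_stat, p_value = t_test_two_replicates(protein[0:2], protein[8:10])
print(f"S5 vs S1: t = {t_stat:.2f}, p = {p_value:.4f}")
```

With real data, a library routine such as `scipy.stats.ttest_ind` would normally replace the hand-written test; the closed form is used here only to keep the sketch free of dependencies.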
Results
Sample set and experimental overview
For an informative quantitative comparison, we first generated a set of 10 controlled mixtures, similar to a true biological setting, but with known differences in protein abundance for some of the proteins. We achieved this by spiking digested UPS2 from Sigma into digests of mouse cerebellum from 10 distinct mice (Fig. 1). This setup allowed us to determine how many of the UPS2 proteins were quantified as differentially abundant in a background with biological variation. The UPS2 contained 48 proteins spanning 5 orders of magnitude dynamic range organized in 6 abundance tiers each with 8 proteins. We created 5 sample groups (S1 to S5), where the UPS2 proteins were spiked in different concentrations. Each sample group was represented by spiking the proteins into two biological replicates of cerebellum samples. The increase in UPS2 concentration between S1 and S2 was 10%, between S2 and S3 30%, between S3 and S4 90% and between S4 and S5 270%. The maximum fold change was ~10-fold between S1 and S5. The lowest amount of a UPS2 protein on column was 0.1 amol and the highest amount on column was 100 fmol (for the DIA data; an overview of the spike-in concentrations of all UPS2 proteins can be found in Supplementary Table S1). The spike-in amount of UPS2 was chosen so that the UPS2 proteins were well distributed across the dynamic range and were not among the highest abundant proteins in the sample (the experimental design is illustrated for the DIA dataset in Supplementary Fig. S2). As the UPS2 covered 5 orders of magnitude, we did not expect to quantify proteins of the lowest abundant tiers in the complex background.
Next, we used this sample set to compare DIA and TMT-based quantification. Typically, quantification with the DIA workflow utilizes the intensities of the fragment ions. Therefore, each sample was analyzed separately. This resulted in 10 LC-MS runs for the DIA experiment. In contrast, the TMT quantification is based on chemical modification of the peptides. We used the 10-plex TMT kit to label each of the 10 samples individually. The labeled samples were pooled. To increase the proteome coverage, we
fractionated the pooled sample into 10 high pH reverse phase fractions. The DIA and the TMT experiments were analyzed by a 2 h gradient in two facilities (Biognosys: BGS and Fritz-Lipmann-Institute: FLI), resulting in the same instrument time for each method (in total 20 h of gradient time per method). All the samples were analyzed on an Orbitrap Fusion Lumos mass spectrometer, chosen because it allows SPS-MS3 acquisition for the analysis of the TMT samples and is also suitable for DIA measurements. Prior to the mixing of the TMT samples, we checked the label efficiency with a short 1 h gradient LC-MS run per sample. This added an overhead of 10 h gradient time to the TMT experiment, which allowed us to spend a matching amount of instrument time on generating a library for the targeted analysis of the DIA data, based on 6 high pH reverse phase fractions analyzed by a 2 h gradient. Including LC overheads from column washing, column equilibration and sample loading, the library generation equaled the instrument time of the label check.
Higher number of protein identifications with TMT
First, we compared the number of identified precursors, peptides and protein groups for the DIA and TMT method separately and between the facilities (summary in Table 1). While the depth of proteome coverage was not the primary focus of this work, it provided an insight into the sensitivity of the methods. For targeted analysis of the DIA data 10, we generated a library based on DDA data from high pH reverse phase fractions of pooled non-labeled samples. We did not generate very deep libraries, in order to keep the workflow overhead times similar between TMT and DIA. The libraries had a similar size between both labs and comprised ~90,000 precursors (BGS: 87,208; FLI: 92,148) corresponding to ~61,500 peptides (BGS: 59,962; FLI: 63,312) and ~6,800 protein groups (BGS: 6,754; FLI: 7,003) at 1% FDR at precursor and protein group level. Even though the BGS library was slightly smaller, 19 UPS2 proteins were identified in it, whereas 15 UPS2 proteins were found in the FLI library (Libraries: Supplementary Table S2). We applied the facility specific library in the analysis of the DIA data in Spectronaut Pulsar X. We generated a separate library for each LC-MS setup, because best results are typically obtained when the library and the DIA data are acquired under the same settings 44. Slightly more precursors (BGS: 74,851; FLI:
63,406), peptides (BGS: 53,572; FLI: 48,058), protein groups (BGS: 5,725; FLI: 5,139) and UPS2 proteins (BGS: 15; FLI: 12) were identified in the BGS data and we noticed an 80% overlap in protein group identification (Supplementary Fig. S3). Application of the TMT workflow with 10 high pH reverse phase fractions resulted in ~91,000 precursors (BGS: 87,839; FLI: 95,453), corresponding to ~65,000 peptides (BGS: 65,837; FLI: 65,633) and ~6,500 protein groups (BGS: 6,546; FLI: 6,448) at 1% FDR at precursor and protein group level. 16 UPS2 proteins were quantified in the BGS data and 19 UPS2 proteins in the FLI data. The identification results of the TMT workflow showed less variation between the two facilities than the DIA data. We also observed for the TMT data a large overlap in protein group IDs between the facilities (82%, Supplementary Fig. S3). Compared to the respective DIA dataset, we observed a higher number of identified precursors (BGS: +15%; FLI: +33%), peptides (BGS: +19%; FLI: +27%) and protein groups (BGS: +13%; FLI: +20%).
Besides the number of identifications, we also compared the number of missing values, which are particularly important for the statistical analysis. Here, we noted a very low number of missing values on precursor level for the TMT data (BGS: 0.6%, FLI: 0.3%) and a higher value for the DIA data (BGS: 14%; FLI: 17%). Prior to calculation of the missing values on protein level, precursors with more than 20% missing values were removed from the DIA and TMT datasets. The number of missing values was greatly reduced on protein level for the DIA dataset (BGS: 1%; FLI: 2%). We also observed very low values for the TMT dataset (BGS: 0.2%; FLI: 0.2%).
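As a quick arithmetic check of the relative differences quoted above, the sketch below recomputes them from the reported identification counts. The text does not state which count served as the denominator, so both conventions are shown; `percent_difference` is a hypothetical helper and the interpretation is an assumption, not a statement from the authors.

```python
# Identification counts as (DIA, TMT) pairs, copied from the text above.
COUNTS = {
    ("precursors", "BGS"): (74851, 87839),
    ("precursors", "FLI"): (63406, 95453),
    ("peptides", "BGS"): (53572, 65837),
    ("peptides", "FLI"): (48058, 65633),
    ("protein groups", "BGS"): (5725, 6546),
    ("protein groups", "FLI"): (5139, 6448),
}


def percent_difference(dia, tmt, reference):
    """Difference TMT - DIA, expressed as a percentage of the chosen reference count."""
    return 100.0 * (tmt - dia) / reference


results = {}
for (level, lab), (dia, tmt) in COUNTS.items():
    vs_dia = percent_difference(dia, tmt, reference=dia)
    vs_tmt = percent_difference(dia, tmt, reference=tmt)
    results[(level, lab)] = (vs_dia, vs_tmt)
    print(f"{level:15s} {lab}: {vs_dia:5.1f}% of DIA count, {vs_tmt:5.1f}% of TMT count")
```

With the TMT count as the reference, the computed values fall within about one percentage point of the quoted figures, which suggests that this is the convention behind them; relative to the DIA counts the differences come out larger.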
The TMT experiment provided better quantitative precision, while the DIA experiment provided better accuracy
To compare the quantitative performance of the two workflows, we first determined the precision and the accuracy of the DIA- and TMT-based quantification. The precision on precursor and protein level was calculated as %CV across all samples, including the UPS2 precursors/proteins. The precision in our study was influenced by a combination of technical and biological variance, because the background proteome
was derived from 10 different mice. The biological variance was the same for the DIA and TMT datasets, and therefore the differences in precision were based on the technical variation of the methods. Within our study, TMT showed a slightly higher precision on precursor level, with CVs of ~10% for both labs (Supplementary Fig. S4A), and also on protein level, with a %CV of ~8% (Fig. 2A). The precision of the DIA-based quantification was between 10% (FLI) and 11% (BGS) on precursor and protein level. The observed differences between the DIA and TMT datasets were statistically significant for both laboratories and on both precursor and protein level (Wilcoxon rank sum test, p value < 2.2e-16 for both laboratories). The precision of the TMT-based quantification on protein level was higher across the whole dynamic range of the quantified proteins (Supplementary Fig. S5). The accuracy of the quantification was calculated in terms of deviations of the estimates of fold changes of the UPS2 precursors and proteins from their expected values (protein level: Fig. 2B; precursor level: Supplementary Fig. S4B). The median deviation from the expected fold-change for the DIA datasets was ~18% on precursor level and ~26% on protein level (BGS: 28%, FLI: 25%). Thus, the deviation was significantly lower than for TMT, where the deviation on precursor level was 35% (FLI) to 63% (BGS) and on protein level was ~38% (BGS: 39%, FLI: 36%) (Wilcoxon rank sum test, BGS p value = 5.5e-4, FLI p value = 3.4e-4). Statistical significance of the observed fold changes of the UPS2 precursors and proteins between the conditions was determined using a two-sample t-test on protein level. After applying a statistical significance cutoff of 0.05 for all comparisons, the median variation of the fold-change was reduced by 10 to 15% for both methods on precursor (Supplementary Fig. S4C) and on protein level (Fig. 2C).
However, the differences between the methods were still significant (BGS p value = 5.6e-4, FLI p value = 9.9e-4). The highest accuracy was observed when comparing protein abundances between S5 and S4 (abundance difference +270%, median absolute delta fold-change for both DIA datasets: 7%, BGS-TMT: 14%, FLI-TMT: 17%) and between S5 and S3 (+621%, DIA datasets: 10%, BGS-TMT: 27%, FLI-TMT: 30%, Supplementary Fig. S6A). Finally, we noticed that with both approaches the measured fold-changes on protein level were typically lower than expected based on the experimental setup (Supplementary Fig. S6B).
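The two metrics used in this section can be sketched as follows. This is only an illustration under stated assumptions: %CV is computed on linear-scale quantities with the sample standard deviation, and the deviation from the expected fold-change is expressed relative to the expected value. The exact definitions used for the figures may differ, and all function names and numbers below are hypothetical.

```python
import statistics
from typing import List


def percent_cv(quantities: List[float]) -> float:
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(quantities) / statistics.mean(quantities)


def abs_delta_fold_change(measured: float, expected: float) -> float:
    """Absolute deviation of a measured fold change from the expected one,
    in percent of the expected value (assumed definition)."""
    return 100.0 * abs(measured - expected) / expected


def median_abs_delta(measured: List[float], expected: List[float]) -> float:
    """Median absolute delta fold-change over a set of spiked-in proteins."""
    deltas = [abs_delta_fold_change(m, e) for m, e in zip(measured, expected)]
    return statistics.median(deltas)


# Invented example: one protein quantified in five samples
cv = percent_cv([100.0, 110.0, 90.0, 105.0, 95.0])

# Invented example: three spiked-in proteins with an expected 2-fold change
accuracy = median_abs_delta([1.8, 1.5, 2.2], [2.0, 2.0, 2.0])
```

With these toy numbers the %CV is about 7.9% and the median absolute delta fold-change is about 10%. Two of the three measured fold changes fall below the expected value, which mirrors the compression of fold-changes noted above.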
The TMT method resulted in ~15-20% more protein identifications, which also covered >90% of the proteins quantified by the DIA method in both facilities (Supplementary Fig. S3B). A different set of quantified proteins can also influence the protein quantification. Therefore, we repeated the analysis of precision and accuracy only for the proteins that were quantified in common between the methods (Supplementary Fig. S7). Whereas the precision of the proteins quantified in common was similar (DIA: ~9-11%, TMT: ~8%), the overall quantitative accuracy improved, and the accuracy of DIA-based quantification was still significantly better, by ~20%, than that of the TMT method (BGS-DIA: 13%, BGS-TMT: 32%, FLI-DIA: 14%, FLI-TMT: 36%).
Statistical testing for differential abundance was performed with a two-sample t-test based on protein quantities for both datasets (details in Material and Methods). We were particularly interested in the number of UPS2 proteins reported among either the top 500 or the top 100 differentially abundant proteins (Fig. 3A). We combined the statistical tests for all possible pairs of sample groups (5 sample groups resulted in 10 pairwise comparisons for each protein). Among the top 100 differentially abundant proteins, we observed 9 (FLI) to 14 (BGS) UPS2 proteins in the DIA dataset and 6 (BGS) to 15 (FLI) in the TMT dataset. In the FLI dataset, TMT-based quantification resulted in more UPS2 proteins throughout the top 500 candidate list (top 500: TMT: 33, DIA: 29). In the BGS dataset, in contrast, DIA performed better until the top 300 comparison (DIA: 23, TMT: 23), whereas further down the candidate list TMT performed better than DIA (top 500: TMT: 32, DIA: 29). When only including proteins that were quantified by both methods, the BGS results remained very similar to the analysis of all proteins, but for the FLI dataset both methods now performed basically equally (Supplementary Fig. S7D).
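The candidate-list analysis can be sketched as below. Several simplifications are assumptions rather than the published procedure: a pooled-variance Student t statistic is used, candidates are ranked by |t| instead of by p value (equivalent only when all comparisons share the same group sizes and thus the same degrees of freedom), and spiked-in proteins are counted once each within the top N. The names and data are hypothetical.

```python
import itertools
import math
import statistics
from typing import Dict, List, Set, Tuple

Ranked = List[Tuple[float, str, str, str]]  # (|t|, protein, group 1, group 2)


def t_statistic(a: List[float], b: List[float]) -> float:
    """Student's two-sample t statistic with pooled variance.
    Needs >= 2 replicates per group and a non-zero pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / na + 1 / nb))


def rank_comparisons(data: Dict[str, Dict[str, List[float]]]) -> Ranked:
    """Test every pair of sample groups for every protein and sort all
    comparisons by decreasing |t|."""
    rows = []
    for protein, groups in data.items():
        for g1, g2 in itertools.combinations(sorted(groups), 2):
            rows.append((abs(t_statistic(groups[g1], groups[g2])),
                         protein, g1, g2))
    return sorted(rows, reverse=True)


def count_spiked_in_top(ranked: Ranked, spiked: Set[str], n: int) -> int:
    """Number of distinct spiked-in proteins among the top n comparisons."""
    return len({protein for _, protein, _, _ in ranked[:n]
                if protein in spiked})


# Invented log2 quantities: one spiked-in and one background protein
toy = {
    "UPS_X": {"S1": [10.0, 10.2], "S2": [12.0, 12.1], "S3": [14.0, 14.2]},
    "BG_Y": {"S1": [20.0, 20.4], "S2": [20.1, 20.3], "S3": [20.2, 20.5]},
}
ranked = rank_comparisons(toy)
hits = count_spiked_in_top(ranked, {"UPS_X"}, 3)
```

With three groups each protein contributes 3 comparisons; with the 5 sample groups of the study the same `itertools.combinations` call yields the 10 pairwise comparisons per protein mentioned above.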
The UPS2 standard covered 5 orders of magnitude in dynamic range with 8 proteins per abundance tier (in total 48 proteins in 6 abundance tiers). A more detailed analysis of these data showed that most of the proteins in the most abundant tier of UPS2 (amount on column in DIA experiments from 10 fmol in S1 to ~100 fmol in S5; compare Supplementary Table S1) were detected as differentially abundant (p < 0.05; BGS-DIA: 32 out of 70 comparisons, BGS-TMT: 34/80, FLI-DIA: 33/70, FLI-TMT: 31/80; Fig. 3B). The main differences between the methods were observed for the low fold changes (0.05), white tones: around the significance threshold of 0.05, red tones: significant p values (