Article pubs.acs.org/jpr

Comprehensive and Scalable Highly Automated MS-Based Proteomic Workflow for Clinical Biomarker Discovery in Human Plasma

Loïc Dayon,* Antonio Núñez Galindo, John Corthésy, Ornella Cominetti, and Martin Kussmann

Molecular Biomarkers Core, Nestlé Institute of Health Sciences, Campus EPFL, Quartier de l’innovation, Lausanne CH-1015, Switzerland

ABSTRACT: Over the past decade, mass spectrometric performance has greatly improved in terms of sensitivity, dynamic range, and speed. By contrast, only limited progress has been accomplished with regard to automation, throughput, and robustness of the proteomic sample preparation process upstream of mass spectrometry. The present work delivers an optimized analysis of human plasma samples in both small preclinical and large clinical studies, enabled by the development of a highly automated quantitative proteomic workflow. Several iterative evaluation and validation steps were performed before process “design freeze” and development completion. A robotic liquid handling workflow and platform (including reduction, alkylation, digestion, TMT labeling, pooling, and purification) were shown to provide better quantitative trueness and precision than manual operation at the bench. Depletion of the most abundant human plasma proteins and subsequent buffer exchange were also developed and integrated. Finally, 96 identical pooled human plasma samples were prepared in a 96-well plate format, and each sample was individually subjected to our developed workflow. This test revealed increased throughput and robustness compared with manual or less automated workflows published to date. Our workflow is ready to use for future (pre-)clinical studies. We expect our work to facilitate, accelerate, and improve clinical proteomic discovery in human blood plasma.

KEYWORDS: automation, blood, clinical proteomics, depletion, isobaric labeling, mass spectrometry, plasma, sample preparation, tandem mass tag



In MS-based proteomics, a common “pipeline” to discover biomarker candidates for health/disease diagnosis, prognosis, and monitoring, typically looks as follows:8 (i) The initial discovery phase to generate lists of candidates is performed by relatively quantifying thousands of protein analytes in a limited number of samples, typically dozens, which are often tissues or body fluids. (ii) In the qualification and verification steps, the biomarker candidates, usually about 10−100, are then measured in human plasma samples, which is the preferred sample in routine clinical practice; these steps require targeted, multiplexed quantitative methods that may or may not be available; if not, time-consuming method design and development is required. (iii) Clinical validation is then performed on the remaining candidates using a very high number of samples. Overall, such “pipeline” results in a very long process, and a limited number of candidates have been translated into routinely used protein biomarkers.9,10 This limitation calls for a more robust and feasible alternative strategy regarding the discovery workflow, which is the basis and motivation of our

INTRODUCTION

Mass spectrometry (MS) is the main analysis technique for large-scale proteomic studies.1,2 In the past few years, great efforts have been made to improve the MS performance in terms of sensitivity, dynamic range, speed, and robustness as well as to provide easier-to-use hybrid instruments with the scope of “democratizing” the field beyond the experienced proteomic laboratories. Concomitantly with the development of robust quantitative MS-based techniques, these improvements now enable extensive quantitative proteome coverage in a reasonable amount of time with direct reversed-phase liquid chromatography tandem MS (RP-LC−MS/MS or MS2).3 By contrast, rather few developments have been accomplished with regard to throughput and automation of the proteomic sample preparation process, and only a few solutions are currently available on the market.4−7 Nonetheless, sample preparation throughput and robustness represent key elements to leverage the proteomic field and enable large-scale studies not only in terms of proteome coverage but also number of samples. This assertion is particularly relevant for clinical applications, where hundreds or thousands of samples have to be analyzed. © 2014 American Chemical Society

Received: June 24, 2014 Published: July 18, 2014

dx.doi.org/10.1021/pr500635f | J. Proteome Res. 2014, 13, 3837−3845

Journal of Proteome Research

Article

depletion of 14 abundant human proteins, multiple affinity removal system (MARS) columns, Buffer A, and Buffer B were obtained from Agilent Technologies (Wilmington, DE). BCA protein assay was purchased from Thermo Scientific. E. coli sample (Bio-Rad, Hercules, CA) was dissolved in H2O and precipitated with prechilled acetone (−20 °C) overnight. After centrifugation at 8000g for 10 min (4 °C), acetone was decanted, and the pellets were dried for 10 min.

present development, that is: (i) increasing the number of samples included in the initial phase to decrease the false-discovery rate (FDR); (ii) improving and refining the list of potential candidates; and (iii) enabling effort/delivery-optimized development of targeted, multiplexed quantitative assays for candidate qualification and verification. In addition, performing the initial discovery phase directly in human blood plasma samples is even more relevant when the ultimate goal is to develop diagnostic/prognostic/monitoring tests in this sample matrix.11 In this perspective, throughput and robustness of the whole proteomic workflow are, again, crucial. To improve both throughput and robustness of the proteomic discovery methodology, we have developed and assessed a highly automated sample preparation and MS analysis workflow. The work was primarily aimed at optimizing the proteomic analysis of (human) plasma samples for both small preclinical and large clinical studies. Several iterative evaluation and validation steps of the proteomic workflow were performed. In particular, proteome coverage, reproducibility, and quantitative trueness and precision were assessed at every stage of the sample preparation and analysis processes. We have previously published a workflow that consists of immunodepletion, reduction/alkylation/digestion of proteins, isobaric labeling of peptides to facilitate multiplexed quantification, and a 3 h long RP-LC−MS/MS analysis.12 In the present work, the developed robotic workflow, including protein reduction, alkylation, digestion, isobaric tandem mass tag (TMT) labeling,13 and peptide purification, was evaluated by comparison with standard experiments performed in parallel by manual operations. Depletion of the top 14 most abundant human plasma proteins and buffer exchange were incorporated upstream.14 Finally, 96 identical plasma samples were prepared in a 96-well plate format using the highly automated workflow.
The contribution of precursor interferences to TMT quantification performance was also evaluated in the particular context of plasma sample analysis. Overall, we report herein an extensive development and characterization (at each method step and development stage) of a highly automated sample preparation pipeline that we have since successfully deployed to analyze a cohort of human subjects encompassing more than 1000 blood plasma samples (unpublished work).



Sample Preparation

Only the final highly automated protocol is described here. From 25 μL of pooled plasma sample (diluted in 75 μL of Buffer A containing 0.0134 mg·mL−1 LACB and filtered with a 0.22 μm filter plate from Millipore), 14 abundant plasma proteins were removed, following the manufacturer's instructions, with MARS columns and high-performance (HP) LC systems (Thermo Scientific, San Jose, CA) equipped with HTC-PAL (CTC Analytics AG, Zwingen, Switzerland) fraction collectors. After immuno-depletion, samples were snap-frozen. Buffer exchange was performed with Strata-X 33u Polymeric RP (30 mg/1 mL) cartridges mounted on a 96-hole holder and a vacuum manifold, all from Phenomenex (Torrance, CA). Conditioning, equilibration/washing, and elution were performed with 100% CH3CN/0.08% TFA, H2O/0.1% TFA, and 70% CH3CN/0.08% TFA, respectively. Samples were evaporated with a vacuum centrifuge (Thermo Scientific) and stored at −80 °C. Reduction, alkylation, digestion, TMT labeling, and sample purification were implemented on a 4-channel Microlab Star liquid handler (Hamilton, Bonaduz, Switzerland) according to a previously reported procedure.12 In brief, each lyophilized sample was dissolved in 95 μL of TEAB 100 mM and 5 μL of SDS 2%. Next, a volume of 5.3 μL of TCEP 20 mM was added, and incubation was performed for 1 h at 55 °C. A volume of 5.5 μL of IAA 150 mM was then added (incubation for 1 h in the dark). Following this step, enzymatic digestion was performed by the addition of 10 μL of trypsin at 0.25 μg·μL−1 in TEAB 100 mM and incubation overnight at 37 °C. TMT labeling was performed by the addition of 0.8 mg of sixplex TMT reagent in 41 μL of CH3CN (incubation for 1 h at room temperature). After reaction, a volume of 8 μL of hydroxylamine 5% in H2O was added to each tube to react for 15 min. Samples from a given sixplex TMT experiment were pooled together in a new tube.
Pooled samples were further purified with Oasis HLB cartridges (1 cm3, 30 mg) from Waters (Milford, MA), followed by strong cation-exchange (SCX) solid-phase extraction (SPE) using SCX cartridges from Phenomenex. Samples were then evaporated to dryness before storage at −80 °C. The same procedure was followed for the two-proteome model experiments with previous spiking of the E. coli protein extract sample in 30 μL of human plasma, as detailed later in the text.
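The reagent additions listed above can be turned into approximate working concentrations with simple dilution arithmetic. The following Python sketch is illustrative only: it assumes ideal mixing and additive volumes, ignores evaporation during incubation, and the helper name `dilute` is hypothetical rather than part of the reported procedure.

```python
# Stepwise working concentrations during sample preparation (volumes in uL).
# Assumption: volumes are additive and nothing evaporates during incubation.

def dilute(stock_conc, added_vol, volume_before):
    """Return (final concentration, new total volume) after adding a reagent."""
    total = volume_before + added_vol
    return stock_conc * added_vol / total, total

volume = 95.0 + 5.0                           # 95 uL TEAB 100 mM + 5 uL SDS 2%
tcep_mM, volume = dilute(20.0, 5.3, volume)   # 5.3 uL of TCEP 20 mM
iaa_mM, volume = dilute(150.0, 5.5, volume)   # 5.5 uL of IAA 150 mM
trypsin_ug = 10.0 * 0.25                      # 10 uL of trypsin at 0.25 ug/uL
volume += 10.0

print(f"TCEP at reduction step:  {tcep_mM:.2f} mM")
print(f"IAA at alkylation step:  {iaa_mM:.2f} mM")
print(f"Trypsin added:           {trypsin_ug:.2f} ug (total volume {volume:.1f} uL)")
```

Each concentration is evaluated at the moment the reagent is added, so later additions (trypsin, TMT reagent in CH3CN, hydroxylamine) dilute it further.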

EXPERIMENTAL SECTION

Materials

Pooled human plasma samples were obtained from the DiOGenes project (http://www.diogenes-eu.org/)15 and used for method development before performing additional individual sample measurements. Iodoacetamide (IAA), tris(2-carboxyethyl) phosphine hydrochloride (TCEP), triethylammonium hydrogen carbonate buffer (TEAB) 1 M pH 8.5, sodium dodecyl sulfate (SDS), and β-lactoglobulin (LACB) from bovine milk were purchased from Sigma (St. Louis, MO). Formic acid (FA, 99%) was from BDH (VWR International, Poole, U.K.). Hydroxylamine solution 50 wt % in H2O (99.999%) was from Aldrich (Milwaukee, WI). Water (18.2 MΩ·cm at 25 °C) was obtained from a Milli-Q apparatus (Millipore, Billerica, MA), and acetonitrile was from BDH. Trifluoroacetic acid (TFA) Uvasol was from Merck Millipore (Billerica). The sixplex TMTs were purchased from Thermo Scientific (Rockford, IL). Sequencing-grade modified trypsin was from Promega (Madison, WI). For immuno-affinity

RP-LC−MS/MS

The pooled sixplex TMT-labeled samples were dissolved in 500 μL of H2O/CH3CN/FA 96.9/3/0.1 for RP-LC−MS/MS analysis. RP-LC−MS/MS was performed on a hybrid linear ion trap-Orbitrap (LTQ-OT) Elite from Thermo Scientific equipped with an Ultimate 3000 RSLC nano system (Thermo Scientific). Proteolytic peptides (injection of 5 μL of sample) were trapped on an Acclaim PepMap 75 μm × 2 cm (C18, 3 μm, 100 Å) precolumn (Thermo Scientific). Following washing, peptides were separated on an Acclaim PepMap RSLC 75 μm × 50 cm (C18, 2 μm, 100 Å) column (Thermo


Figure 1. (a) Experimental design for evaluation of reduction, alkylation, digestion, TMT labeling, and purification using a liquid handler. Each sixplex TMT experiment was performed in duplicate, and each was analyzed in triplicate by RP-LC−MS/MS. Samples were either treated individually (red circles) or split after common reduction, alkylation, and digestion (orange circles) performed at the bench. (b) Distribution of log2(in/i126) obtained from raw spectral data matched to a peptide sequence (all experiments) for the “Liquid Handler (1)” and the “Bench (2)” experiments. (c) Distribution of log2(in/i126) obtained from raw spectral data matched to a peptide sequence (all experiments) for the “Liquid Handler (1)” (also called here “Full Process (1)”), “After Digestion (3)”, and “After Labelling (4)” experiments.

maximum injection time of 250 ms. A maximum of 10 precursor ions (most intense) were selected for activation and subsequent MS2 analyses. HCD was performed at 35% of the normalized collision energy. Dynamic exclusion was set for 60 s within a ± 5 ppm window. The lock mass at m/z = 445.1200 Th was used. Each sample was analyzed in triplicate.
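Tolerances expressed in ppm, such as the ±5 ppm dynamic exclusion window, scale with m/z. As a small illustration, the Python sketch below converts a ppm tolerance into absolute m/z bounds, using the lock mass value purely as an example input. The function name `ppm_window` is hypothetical.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a symmetric ppm tolerance around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta

# Example input: the lock mass at m/z 445.1200 with a +/- 5 ppm tolerance
low, high = ppm_window(445.1200, 5.0)
print(f"{low:.4f} to {high:.4f} Th")
```

At this m/z the half-width is roughly 0.002 Th. The same tolerance applied at m/z 1500 would be more than three times wider in absolute terms.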

Scientific) coupled to a stainless-steel nanobore emitter (40 mm, OD 1/32”) (Thermo Scientific) mounted on a Nanospray Flex Ion Source (Thermo Scientific). The analytical separation was run for 150 min using a gradient that reached 30% of CH3CN after 140 min and 80% of CH3CN after 150 min. A flow rate of 220 nL·min−1 was used. For MS survey scans, the OT resolution was 120 000, and the ion population was 1 × 10^6 with an m/z window from 300 to 1500 Th. For MS2 detection in the OT with higher-energy collisional dissociation (HCD), ion population was set to 1 × 10^5 (isolation width of 2 Th), with a resolution of 15 000, first mass at m/z = 100 Th, and a

Data Processing and Analysis

Proteome Discoverer (version 1.4, Thermo Scientific) was used as data analysis interface. Identification was performed in the human UniProtKB/Swiss-Prot database (24/07/2013 release)


Table 1. Descriptive Statistics for the Protein Quantification in the Successive Workflow Evaluations (a)

| experiment no. | short description | expected ratio fold change | expected log2 (fold change) | median | mean | RSD (b) | lower 95% CI of mean (c) | upper 95% CI of mean (c) | kurtosis |
|---|---|---|---|---|---|---|---|---|---|
| (1) | liquid handler or full process | 1/2; 1/4 | −1; −2 | −1.114; −2.018 | −1.101; −2.002 | 0.08878; 0.1228 | −1.107; −2.011 | −0.9875; −1.965 | 34.48; 32.56 |
| (2) | bench | 1/2; 1/4 | −1; −2 | −0.9794; −1.945 | −0.9820; −1.951 | 0.07535; 0.2023 | −1.094; −1.993 | −0.9766; −1.936 | 14.31; 393.8 |
| (3) | after digestion | 1/2; 1/4 | −1; −2 | −1.009; −2.089 | −1.009; −2.088 | 0.04735; 0.07773 | −1.012; −2.093 | −1.005; −2.082 | 7.466; 3.394 |
| (4) | after labelling | 1/2; 1/4 | −1; −2 | −1.011; −2.029 | −1.013; −2.033 | 0.04438; 0.06618 | −1.016; −2.038 | −1.010; −2.028 | 5.516; 26.71 |
| (5) | individual | 1 | 0 | −0.1086 | −0.1121 | 0.2570 | −0.1192 | −0.1049 | 51.70 |
| (6) | after depletion | 1 | 0 | −0.05155 | −0.05895 | 0.1461 | −0.06301 | −0.05489 | 2.051 |
| (7) | after buffer exchange | 1 | 0 | −0.03965 | −0.04117 | 0.09111 | −0.04370 | −0.03864 | 5.994 |
| (8) | 96-well plate | 1 | 0 | −0.0283 | −0.02400 | 0.1654 | −0.02571 | −0.02228 | 3.844 |

(a) Details of the experiments are given in the text and displayed in the figures. In all cases, quantitative ratios, e.g., fold changes, were calculated as in/i126. (b) RSD stands for relative standard deviation. (c) CI stands for confidence interval.
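Table 1 reports statistics on the log2 scale, whereas parts of the discussion quote confidence intervals as ratio fold changes. The conversion is simply 2 raised to the log2 value. The Python sketch below applies it to the experiment (1) confidence bounds listed in Table 1 and also derives the expected log2 values from the 100, 50, and 25 μg protein amounts. The function name `log2_to_ratio` is illustrative.

```python
import math

def log2_to_ratio(log2_value):
    """Convert a log2(fold change) into a ratio fold change."""
    return 2.0 ** log2_value

# Expected log2 ratios for 50 and 25 ug relative to the 100 ug channel
expected_half = math.log2(50 / 100)
expected_quarter = math.log2(25 / 100)

# 95% CI bounds of the mean (log2 scale) for experiment (1), as listed in Table 1
ci_half = (log2_to_ratio(-1.107), log2_to_ratio(-0.9875))
ci_quarter = (log2_to_ratio(-2.011), log2_to_ratio(-1.965))

print(expected_half, expected_quarter)
print([round(v, 3) for v in ci_half], [round(v, 3) for v in ci_quarter])
```

Because the transform is monotonic, bounds convert directly. Note that an interval that is symmetric on the log2 scale is no longer symmetric on the ratio scale.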

columns.12 The same approach was followed herein with a 150 min RP-LC separation using a 50 cm column. For optimization purposes, several additional steps were considered in terms of sample loading, data acquisition, and processing. In particular, the most suitable amounts of peptide to be analyzed were determined (see Experimental Section) to optimize proteome coverage (data not shown). For MS2, HCD was used with a maximum of 10 precursor ions to analyze sixplex TMT-labeled tryptic peptides. This choice was motivated by the globally good performance of HCD with the LTQ-OT Elite17,18 and the simplicity of data processing using the bioinformatics pipeline detailed in the Experimental Section. Database searches were performed with Mascot, and the search parameters were adjusted for human plasma samples, as indicated in the Experimental Section. The Mascot result files were sequentially uploaded into the Scaffold software and merged afterward whenever needed. We opted for this data processing pipeline because the Scaffold software allows visualizing and organizing hundreds of TMT experiments together, for example, thousands of individual samples, a sine qua non condition for large studies.

including the LACB sequence (20 268 sequences in total). Mascot16 (version 2.4.0, Matrix Science, London, U.K.) was used. Variable amino acid modifications were oxidized methionine, deamidated asparagine/glutamine, and sixplex TMT-labeled peptide amino terminus (+229.163 Da). Sixplex TMT-labeled lysine (+229.163 Da) and carbamidomethylation of cysteine were set as fixed modifications. Trypsin was selected as the proteolytic enzyme, with two potential missed cleavages. Peptide and fragment ion tolerances were set to 10 ppm and 0.02 Da, respectively. All Mascot result files were loaded into Scaffold 4.2.1 (Proteome Software, Portland, OR) to be further searched with X! Tandem. Peptide and protein FDRs were both fixed at 1%, with a criterion of two unique peptides to report protein identification. Quantitative values were exported from Scaffold either as raw spectral data or as log2 of the protein ratio fold changes, that is, mean log2 values after isotopic purity correction but without normalization applied between samples and experiments. Calculation and statistics were performed with Excel 2010 (Microsoft, Redmond, WA) and GraphPad Prism 5 (GraphPad Software, San Diego, CA). For the two-proteome model experiments, an additional search was performed with Mascot in the UniProtKB/Swiss-Prot database restricted to the E. coli taxonomy. Other parameters were unchanged.
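For readers who want to reproduce this kind of descriptive statistics outside Excel or Prism, a minimal Python sketch follows. It is not the authors' procedure: the input values are made up, the 95% CI uses a normal approximation (1.96 × standard error), and kurtosis is computed as moment-based excess kurtosis. How RSD was computed in the source is not specified in this section, so the sketch reports the plain standard deviation instead.

```python
import math
import statistics

def describe(log2_ratios):
    """Summary statistics for a list of log2(fold change) values."""
    n = len(log2_ratios)
    mean = statistics.fmean(log2_ratios)
    sd = statistics.stdev(log2_ratios)          # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(n)       # normal-approximation 95% CI
    # Population central moments, used for excess kurtosis
    m2 = sum((x - mean) ** 2 for x in log2_ratios) / n
    m4 = sum((x - mean) ** 4 for x in log2_ratios) / n
    return {
        "median": statistics.median(log2_ratios),
        "mean": mean,
        "sd": sd,
        "ci95": (mean - half_width, mean + half_width),
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

# Hypothetical log2 ratios scattered around an expected value of -1
values = [-1.10, -0.95, -1.02, -1.00, -0.98, -1.05]
stats = describe(values)
print(stats)
```

With thousands of spectra per experiment, as in the evaluations reported here, the normal approximation for the CI of the mean is reasonable. For small samples a t-based interval would be more appropriate.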

Automated Sample Preparation: Reduction, Alkylation, Digestion, TMT Labeling, and Purification

Liquid-handling robots are the machines of choice to process a high number of samples in a reproducible manner. For high-throughput protein reduction/alkylation/digestion, sixplex TMT labeling of tryptic peptides, subsequent pooling, and purifications (i.e., sample cleanup), a robotic platform was developed to prepare 96 samples in parallel. To validate the successive sample preparation steps and the entire platform, we benchmarked the performance of the liquid-handling platform (“Liquid Handler (1)”) against experienced, routine manual work at the bench (“Bench (2)”). As illustrated in Figure 1a, plasma protein samples were prepared in three different quantities, that is, 100, 50, and 25 μg. Theoretical ratios of 1/2 for 50 μg and 1/4 for 25 μg, relative to the highest quantity of 100 μg, were expected from the RP-LC−MS/MS analysis. (See Table 1.) Computing all matched raw spectral data together, the distributions of the log2 of the ratio fold change as measured with MS are represented in Figure 1b. Theoretically, two lines at log2 values of −1 and −2 for ratios of 1/2 and 1/4, respectively, should have been obtained. We experimentally achieved curves with maxima centered on those theoretical



RESULTS AND DISCUSSION

The development and characterization of the highly automated plasma sample preparation and MS analysis workflow were performed block-wise and in reverse order relative to the workflow itself to ensure proper assessment of each individual step of the method. First, optimal RP-LC−MS/MS and data processing parameters were defined; then, sample preparation with a liquid handler was implemented and evaluated; finally, upstream immuno-depletion of abundant proteins and buffer exchange were incorporated. All of these method blocks were developed and assessed considering robustness, throughput, and day-to-day practicability (i.e., repeatability, convenience, robustness). Finally, validations were performed to assess the reproducibility and quantitative capabilities of the platform in real-life conditions.

RP-LC−MS/MS Analysis and Data Processing

We have previously defined and reported improved conditions for RP-LC−MS/MS such as the use of long LC gradients and


Figure 2. (a) Experimental design for evaluation of immuno-depletion and buffer exchange. Each sixplex TMT experiment was performed in duplicate, and each was analyzed in triplicate by RP-LC−MS/MS. (b) Distribution of log2(in/i126) obtained from raw spectral data matched to a peptide sequence (all experiments). (c) Representation of log2(ratio fold change) for proteins commonly quantified under all experimental conditions and all technical and measurement replicates (i.e., 166 proteins).

lines. When comparing distributions for the “Liquid Handler (1)” and the “Bench (2)” experiments, it appeared that the robotic platform generally performed as well as or better than manual operation at the bench in terms of both reproducibility and accuracy, as further exemplified by the median, mean, and kurtosis values (Table 1). In addition, Figure 1b shows that more values fitted the theoretically expected ratios, attesting to the higher performance of the automated platform. Full

width at half-maximum (fwhm) for the “Liquid Handler (1)” curve was narrower, indicating enhanced precision. Characterizing the individual sample preparation steps, we demonstrated that the protein reduction/alkylation/digestion processes induced higher variability than the TMT labeling procedure (Figure 1c). For the experiments named “After Digestion (3)” and “After Labelling (4)”, protein reduction/alkylation/digestion was performed on a higher quantity of


Figure 3. (a) Workflow used for the analysis of human plasma samples. In total, 96 identical samples were individually prepared using the highly automated MS-based proteomic workflow. RP-LC−MS/MS analysis was performed in triplicate. (b) Distribution of log2(in/i126) obtained from matched raw spectral data (all experiments). (c) List of the 15 proteins (from 149 proteins commonly quantified in the 96 samples), given by Swiss-Prot entry names, presenting the lowest variability through the whole workflow. Variability is represented as the variation of log2(ratio fold change) across the 96 analyzed samples. (d) List of the 15 proteins (from 149 proteins commonly quantified in the 96 samples), given by Swiss-Prot entry names, showing the highest variability through the whole workflow. Variability is represented as the variation of log2(ratio fold change) across the 96 analyzed samples.

represent up to 99% of the total bulk protein mass. As a result, when performing straight RP-LC−MS/MS analysis of plasma proteins, these abundant proteins largely mask the less abundant ones, as a general consequence of the matrix effect in a complex sample. Therefore, immuno-depletion of the most abundant plasma proteins is recommended to increase proteome coverage.14 Abundant-protein immuno-depletion and buffer exchange were hence integrated upstream of the previously described procedures. To optimize throughput and reproducibility, we opted for HPLC and antibody-based columns for the depletion and for SPE for buffer exchange. (The latter was selected from several alternatives, such as molecular-weight cutoff spin concentration and protein precipitation; data not shown.) Buffer exchange was required at this point because the depletion buffer is incompatible with protein reduction/alkylation/digestion. Parallel experiments were performed where identical pooled plasma samples (same amount of proteins, i.e., 30 μL of plasma sample) underwent depletion and buffer exchange. Half of the individual samples were pooled after each tested stage to measure the variability induced by each of those steps, as displayed in Figure 2a. The distribution of the log2 of the ratio fold changes of the matched raw spectral data obtained with MS showed that accuracy of quantitative measurements decreased when

sample, and only then was the resulting tryptic digest aliquoted at three different quantities as previously defined, that is, 100, 50, and 25 μg (Figure 1a). Performing TMT labeling with the liquid handler or at the bench did not show different results and statistics (Figure 1c and Table 1), yet a slight alteration of the quantitative value distribution (value at maximum and fwhm) was noticed when the protein reduction/alkylation/digestion was individually performed, as was the case for the “Liquid Handler (1)” experiment (Figure 1c and Supporting Information Figures S1 and S2). Larger relative standard deviation (RSD) values (Table 1) confirmed the observation that the reduction/alkylation/digestion induced larger variability than the TMT labeling. Overall, the entire automated process displayed state-of-the-art characteristics for quantitative proteomics with 95% confidence intervals (CIs) of the mean of 0.464 to 0.504 and 0.248 to 0.256 for ratio fold changes corresponding to theoretical ratios of 1/2 and 1/4, respectively. A clear benefit of using liquid handling robotics was confirmed for both throughput and robustness.

Hyphenated Depletion of Abundant Proteins and Buffer Exchange for Human Plasma Samples

Human plasma is a challenging sample matrix to analyze because protein concentrations span 10 to 12 orders of magnitude. In particular, only a few abundant proteins


depletion and buffer exchange were added to the sample preparation workflow (Figure 2b). The precision and trueness were, as expected, higher when individual samples were pooled and further aliquoted “After Depletion (6)” and “After Buffer Exchange (7)”. The largest variability was induced by the immuno-depletion step, as shown by the RSD and mean values (Table 1). The newly introduced buffer exchange step (see Figure 2b,c for comparison between “After Depletion (6)” and “After Buffer Exchange (7)” experiments) was easily parallelized and provided good readouts (95% CIs of the mean shifting from 0.970−0.974 to 0.957−0.963 with buffer exchange for the theoretical protein ratio fold change of 1). The addition of multiplexed one-column immuno-depletion of 14 abundant proteins (see Figure 2b,c for “Individual (5)” experiment) still guaranteed an acceptable analytical performance (Table 1), for example, 95% CIs of the mean of 0.921 to 0.930 for the theoretical protein ratio fold change of 1. This characterization exercise helped define the extent to which the workflow could be used for the determination of true ratio fold changes. (See the next paragraph.) It was finally decided to include depletion and buffer exchange in the final highly automated workflow for the analysis of human plasma samples. Some limitations of the abundant-protein immuno-depletion process have been reported.19 In particular, concomitant nonspecific and unintended removal of proteins has been demonstrated, and the study of the depletion column-retained fraction, in addition to the depleted plasma, was recommended.20 However, the analysis of such retained fraction was not performed in our study because of time and throughput considerations in our large-scale biomarker discovery programs. As a general issue, almost any analytical step in a complex proteomic workflow, as described in the present work, can cause unintended loss of proteins.
This further reinforces the crucial need for reproducibility of each and every step in the process.

0.3770 for the log2 of the ratio fold changes. Figure 3c,d shows the least and most variable plasma proteins when subjected to the complete workflow. Among the most variable ones are several proteins that should have been removed during the immuno-depletion (e.g., APOA2, FIBA, FIBB, and FIBG), showing that the immuno-affinity removal is not 100% effective; values for those proteins should not be considered in subsequent data analysis.

Assessment of Interferences with a Two-Proteome Model

Isobaric tagging such as TMT is known to suffer from interferences resulting from cofragmentation of several precursor ions at a time. As shown by us and others, this phenomenon can be alleviated or eliminated by using alternative acquisition methods for recording of the reporter ions such as gas-phase purification,21 MS3,22,23 or ion mobility MS.24,25 Although the sample preparation workflow is independent of and decoupled from the MS analysis per se, we evaluated to what extent HCD MS2, as used in this report with an LTQ-OT Elite mass spectrometer, can provide reliable data and whether this acquisition mode can be used in future studies. As an alternative, the latest generation of Tribrid mass spectrometer3 (i.e., Orbitrap Fusion), which offers straightforward MS3 analysis with a synchronous precursor selection (SPS) feature to eliminate reporter-ion interferences,26 would allow us to fully leverage our highly automated sample preparation workflow. To evaluate how reliable the quantitative measurements obtained were, we performed a calibration curve experiment by spiking increasing amounts of an E. coli protein extract (0:0.5:1:1.25:2.5:5 ratios) in identical pooled human plasma samples (1:1:1:1:1:1 ratios) using sixplex TMT (Figure 4). E. coli protein total extract was spiked in the concentration range of μg/μL to ng/μL in plasma. While the human plasma

Application to a Full 96-Well Plate

The integrated complete workflow was applied to 96 identical pooled plasma samples (Figure 3a). Each sample was prepared individually in a 96-well plate format, and relative quantification was performed with MS2 using sixplex TMT. Abundant-protein immuno-depletion remained the rate-limiting step, and it required 4 days to be completed for the batch of 96 samples; buffer exchange, including overnight evaporation, took 16 h; the entire robotic procedure, including final evaporation, delivered the samples in 2 days for RP-LC−MS/MS analysis. (See the Experimental Section.) The distribution of the ratio fold changes, which were theoretically expected to be 1, attested good overall trueness and precision (Figure 3b and Table 1, see “96-Well Plate (8)”; see also Supporting Information, Figure S3). We further determined that 95% of the raw quantitative spectral data lay between ratio fold changes of 0.769 and 1.3 (Figure 3b), which indicates that quantification differences below and above this interval can be regarded as being of “true biological” rather than of “technical” nature. An average coefficient of variation (CV) of 8% was determined on the relative abundance values of the proteins commonly identified in all 96 samples. In addition, we sorted the proteins according to their individual RSD across the 96 samples. In total, 149 proteins (see Supporting Information, Table S4) were consistently quantified in the 96 samples with RSD ranging from 0.0966 to

Figure 4. Quantitative performances obtained with a two-proteome model using the developed workflow. E. coli total protein extract was spiked at 0, 2.5, 5, 6.25, 12.5, and 25 μg in 30 μL of human plasma, that is, final concentration of 0, 0.083, 0.167, 0.208, 0.417, and 0.833 μg/μL, respectively. Samples were prepared as previously described using the complete highly automated workflow. While human plasma proteins were quantitatively stable (average ratio of 1.0; RSD = 15%), a calibration curve was obtained for the E. coli proteins (y = 0.8115 × x + 1.16102; R² = 0.9957). TMT sixplex experiments were performed four times, and each was analyzed in triplicate by RP-LC−MS/MS. The measured amounts were recalculated from the measured relative abundances obtained with MS and the total spiked amount.
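The spike-in design in this caption can be checked with a few lines of arithmetic. The Python sketch below recomputes the nominal concentrations and the 0:0.5:1:1.25:2.5:5 channel ratios from the spiked amounts, then evaluates the reported calibration line. Treating x as the spiked amount and y as the measured amount, both in μg, is an assumption based on the caption, and the function name `reported_fit` is illustrative.

```python
spiked_ug = [0.0, 2.5, 5.0, 6.25, 12.5, 25.0]   # E. coli extract per TMT channel
plasma_ul = 30.0                                 # plasma volume per sample

# Nominal spike concentration in ug/uL (volume of the spike itself is neglected)
conc_ug_per_ul = [m / plasma_ul for m in spiked_ug]

# Channel ratios relative to the 5 ug channel
ratios = [m / 5.0 for m in spiked_ug]

def reported_fit(x):
    """Calibration line reported in the source: y = 0.8115 * x + 1.16102."""
    return 0.8115 * x + 1.16102

for amount, conc, ratio in zip(spiked_ug, conc_ug_per_ul, ratios):
    print(f"{amount:6.2f} ug  {conc:.3f} ug/uL  ratio {ratio:.2f}  "
          f"fit {reported_fit(amount):.2f}")
```

A slope below 1 means the fitted line underestimates differences between channels, which is the usual signature of ratio compression, while the nonzero intercept corresponds to signal reported even in the channel without E. coli spike.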




ACKNOWLEDGMENTS

We thank Jörg Hager, Ivo Fierro Monti, Guillaume Gesquiere, Sylviane Métairon, Montserrat Ferrer, Nicolas Bouche, Christophe Boss, and Martin Jech for their help and support.

proteins were quantitatively constant across all samples, we observed the expected calibration curve for the 88 E. coli proteins consistently quantified. However, we detected quantitative ratio compression (27), as previously described for MS2 mode (visible in the slope of the calibration curve being below 1), and a residual background from human proteins, as shown by the nonzero y-axis intercept. Nonetheless, a fairly robust calibration curve was obtained (y = 0.8115 × x + 1.16102; R² = 0.9957), showing that our workflow is sufficiently accurate and can be used for real-life discovery applications.
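The effect of the reported slope and intercept can be illustrated with a short calculation. The following is a minimal sketch, not the authors' data processing: it assumes that x and y of the reported calibration line are both expressed in μg of E. coli protein (the axis units are not stated here), and all function names are hypothetical. It reproduces the spike concentrations from the Figure 4 caption, predicts the measured amount for a given spike, and shows how a nominal fold change appears compressed.

```python
# Sketch under stated assumptions: x (spiked) and y (measured) are both in
# micrograms of E. coli protein; names below are illustrative only.

SLOPE = 0.8115            # reported slope (below 1: ratio compression)
INTERCEPT = 1.16102       # reported intercept (residual background)
PLASMA_VOLUME_UL = 30.0   # plasma volume per sample, from the Figure 4 caption
SPIKED_UG = [0.0, 2.5, 5.0, 6.25, 12.5, 25.0]  # spiked amounts, from the caption


def spike_concentration(amount_ug, volume_ul=PLASMA_VOLUME_UL):
    """Final spike concentration in ug/uL for a given spiked amount."""
    return amount_ug / volume_ul


def predicted_measured(spiked_ug):
    """Measured amount predicted by the reported calibration line."""
    return SLOPE * spiked_ug + INTERCEPT


def corrected_amount(measured_ug):
    """Invert the calibration line to estimate the spiked amount."""
    return (measured_ug - INTERCEPT) / SLOPE


def apparent_fold_change(high_ug, low_ug):
    """Fold change that the calibration line predicts between two spikes."""
    return predicted_measured(high_ug) / predicted_measured(low_ug)


if __name__ == "__main__":
    for amount in SPIKED_UG:
        print(f"{amount:6.2f} ug -> {spike_concentration(amount):.3f} ug/uL, "
              f"predicted measured {predicted_measured(amount):.2f} ug")
    # A nominal 10-fold difference (25 ug vs 2.5 ug) appears smaller.
    print(f"apparent fold change: {apparent_fold_change(25.0, 2.5):.2f}")
```

Under these assumptions, a nominal 10-fold difference between the 25 μg and 2.5 μg spikes is predicted to appear as roughly a 6.7-fold difference, which is the kind of underestimation that ratio compression and background signal produce in MS2-based isobaric quantification.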



CONCLUSIONS

Sample preparation for large-scale proteomic studies, especially those aimed at clinical biomarker candidate discovery, remains a bottleneck today. When manual operation is not practically feasible, (semi)automated procedures are required. Unfortunately, only a few complete sample preparation solutions are available on the market, and many lack versatility. Therefore, we have developed a comprehensive, highly automated workflow for the analysis of human plasma proteins. With two immuno-depletion systems, one liquid handler, and one LC−MS/MS instrument, we can analyze 2 × 96 samples within 3 weeks with better quantitative trueness and precision than manual operation at the bench. The workflow can be scaled by multiplying and multiplexing the same types of resources, both instrumental and human, along with the associated costs. Because the steps follow a fixed analytical order, they cannot be parallelized within a batch; nonetheless, the platform is versatile enough that each step can be treated as optional or mandatory. Importantly, we extensively characterized the procedure and evaluated its current boundaries for future real-life applications. With slight adaptations, the platform is expected to perform well for sample types other than plasma, helping to cope with the massive number of proteomes that still need to be analyzed and compared.

ASSOCIATED CONTENT

Supporting Information

Figures S1 and S2. Fold changes and standard deviations for identified and quantified proteins in the experiments described in Figure 1. Figure S3. Representation of quantitative data for spiked LACB standard (experiment described in Figure 3). Table S4. List of proteins identified and quantified in the experiment described in Figure 3. This material is available free of charge via the Internet at http://pubs.acs.org.



REFERENCES

(1) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422 (6928), 198−207.
(2) Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science 2006, 312 (5771), 212−217.
(3) Hebert, A. S.; Richards, A. L.; Bailey, D. J.; Ulbrich, A.; Coughlin, E. E.; Westphall, M. S.; Coon, J. J. The one hour yeast proteome. Mol. Cell. Proteomics 2014, 13 (1), 339−347.
(4) Alterovitz, G.; Liu, J.; Chow, J.; Ramoni, M. F. Automation, parallelism, and robotics for proteomics. Proteomics 2006, 6 (14), 4016−4022.
(5) Hou, W.; Ethier, M.; Smith, J. C.; Sheng, Y.; Figeys, D. Multiplexed proteomic reactor for the processing of proteomic samples. Anal. Chem. 2007, 79 (1), 39−44.
(6) Switzar, L.; van Angeren, J.; Pinkse, M.; Kool, J.; Niessen, W. M. A. A high-throughput sample preparation method for cellular proteomics using 96-well filter plates. Proteomics 2013, 13 (20), 2980−2983.
(7) Whiteaker, J. R.; Zhao, L.; Anderson, L.; Paulovich, A. G. An automated and multiplexed method for high throughput peptide immunoaffinity enrichment and multiple reaction monitoring mass spectrometry-based quantification of protein biomarkers. Mol. Cell. Proteomics 2010, 9 (1), 184−196.
(8) Rifai, N.; Gillette, M. A.; Carr, S. A. Protein biomarker discovery and validation: The long and uncertain path to clinical utility. Nat. Biotechnol. 2006, 24 (8), 971−983.
(9) Anderson, N. L. The clinical plasma proteome: A survey of clinical assays for proteins in plasma and serum. Clin. Chem. 2010, 56 (2), 177−185.
(10) Anderson, N. L.; Anderson, N. G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 2002, 1 (11), 845−867.
(11) Zolg, W. The proteomic search for diagnostic biomarkers: Lost in translation? Mol. Cell. Proteomics 2006, 5 (10), 1720−1726.
(12) Dayon, L.; Kussmann, M. Proteomics of human plasma: A critical comparison of analytical workflows in terms of effort, throughput and outcome. EuPA Open Proteomics 2013, 1, 8−16.
(13) Dayon, L.; Hainard, A.; Licker, V.; Turck, N.; Kuhn, K.; Hochstrasser, D. F.; Burkhard, P. R.; Sanchez, J. C. Relative quantification of proteins in human cerebrospinal fluids by MS/MS using 6-plex isobaric tags. Anal. Chem. 2008, 80 (8), 2921−2931.
(14) Echan, L. A.; Tang, H. Y.; Ali-Khan, N.; Lee, K.; Speicher, D. W. Depletion of multiple high-abundance proteins improves protein profiling capacities of human serum and plasma. Proteomics 2005, 5 (13), 3292−3303.
(15) Larsen, T. M.; Dalskov, S. M.; Van Baak, M.; Jebb, S. A.; Papadaki, A.; Pfeiffer, A. F. H.; Martinez, J. A.; Handjieva-Darlenska, T.; Kunešová, M.; Pihlsgård, M.; Stender, S.; Holst, C.; Saris, W. H. M.; Astrup, A. Diets with high or low protein content and glycemic index for weight-loss maintenance. New Engl. J. Med. 2010, 363 (22), 2102−2113.
(16) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551−3567.
(17) Chiva, C.; Sabidó, E. HCD-only fragmentation method balances peptide identification and quantitation of TMT-labeled samples in hybrid linear ion trap/orbitrap mass spectrometers. J. Proteomics 2014, 96, 263−270.
(18) Pichler, P.; Köcher, T.; Holzmann, J.; Möhring, T.; Ammerer, G.; Mechtler, K. Improved precision of iTRAQ and TMT quantification by an axial extraction field in an orbitrap HCD cell. Anal. Chem. 2011, 83 (4), 1469−1474.






AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Phone: +41 21 632 6114. Fax: +41 21 632 6499.

Author Contributions

The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.

Notes

The authors declare no competing financial interest.


(19) Ichibangase, T.; Moriya, K.; Koike, K.; Imai, K. Limitation of immunoaffinity column for the removal of abundant proteins from plasma in quantitative plasma proteomics. Biomed. Chromatogr. 2009, 23 (5), 480−487.
(20) Yadav, A. K.; Bhardwaj, G.; Basak, T.; Kumar, D.; Ahmad, S.; Priyadarshini, R.; Singh, A. K.; Dash, D.; Sengupta, S. A systematic analysis of eluted fraction of plasma post immunoaffinity depletion: Implications in biomarker discovery. PLoS One 2011, 6 (9), e24442.
(21) Wenger, C. D.; Lee, M. V.; Hebert, A. S.; McAlister, G. C.; Phanstiel, D. H.; Westphall, M. S.; Coon, J. J. Gas-phase purification enables accurate, multiplexed proteome quantification with isobaric tagging. Nat. Methods 2011, 8 (11), 933−935.
(22) Dayon, L.; Sonderegger, B.; Kussmann, M. Combination of gas-phase fractionation and MS3 acquisition modes for relative protein quantification with isobaric tagging. J. Proteome Res. 2012, 11 (10), 5081−5089.
(23) Ting, L.; Rad, R.; Gygi, S. P.; Haas, W. MS3 eliminates ratio distortion in isobaric multiplexed quantitative proteomics. Nat. Methods 2011, 8 (11), 937−940.
(24) Shliaha, P. V.; Jukes-Jones, R.; Christoforou, A.; Fox, J.; Hughes, C.; Langridge, J.; Cain, K.; Lilley, K. S. Additional Precursor Purification in Isobaric Mass Tagging Experiments by Traveling Wave Ion Mobility Separation (TWIMS). J. Proteome Res. 2014, 13 (7), 3360−3369.
(25) Sturm, R. M.; Lietz, C. B.; Li, L. Improved isobaric tandem mass tag quantification by ion mobility mass spectrometry. Rapid Commun. Mass Spectrom. 2014, 28 (9), 1051−1060.
(26) McAlister, G. C.; Nusinow, D. P.; Jedrychowski, M. P.; Wuhr, M.; Huttlin, E. L.; Erickson, B. K.; Rad, R.; Haas, W.; Gygi, S. P. MultiNotch MS3 Enables Accurate, Sensitive, and Multiplexed Detection of Differential Expression across Cancer Cell Line Proteomes. Anal. Chem. 2014, 86 (14), 7150−7158.
(27) Karp, N. A.; Huber, W.; Sadowski, P. G.; Charles, P. D.; Hester, S. V.; Lilley, K. S. Addressing accuracy and precision issues in iTRAQ quantitation. Mol. Cell. Proteomics 2010, 9 (9), 1885−1897.
