Semi-Targeted Plasma Proteomics Discovery Workflow Utilizing Two-Stage Protein Depletion and Off-Line LC-MALDI MS/MS

Peter Juhasz,* Moira Lynch, Mahadevan Sethuraman, Jennifer Campbell, Wade Hines, Manuel Paniagua, Leijun Song, Mahesh Kulkarni, Aram Adourian, Yu Guo, Xiaohong Li, Stephen Martin, and Neal Gordon

BG Medicine, 610N Lincoln Street, Waltham, Massachusetts 02451, United States

Received June 25, 2010

A quantitative proteomics workflow was implemented that provides extended plasma protein coverage by combining extensive protein depletion with the sensitivity and breadth of two-dimensional LC-MS/MS shotgun analysis. Abundant proteins were depleted by a two-stage process using IgY and Supermix depletion columns in series. Samples were then extensively fractionated by two-dimensional chromatography, with fractions deposited directly onto MALDI plates. Decoupling sample fractionation from mass spectrometry facilitates a targeted MS/MS precursor selection strategy that maximizes measurement of a consistent set of peptides across experiments. Multiplexed stable isotope labeling provides quantification relative to a common reference sample and ensures that an identical set of peptides is measured in all samples (a set of eight) combined in a single experiment. The more extensive protein depletion provided by the addition of the Supermix column did not compromise the overall reproducibility of the measurements or the ability to reliably detect changes in protein levels between samples. The implementation of this workflow is presented for a case study aimed at generating molecular signatures for prediction of first heart attack.

Keywords: plasma • proteomics • depletion • Supermix • iTRAQ • MALDI • MS/MS • biomarkers

* To whom correspondence should be addressed. E-mail: pjuhasz@bg-medicine.com.

Journal of Proteome Research 2011, 10, 34–45. Published on Web 10/11/2010. DOI: 10.1021/pr100659e. © 2011 American Chemical Society.

Introduction

The discovery and subsequent validation of clinically relevant biomarkers are being widely pursued as an approach to provide better diagnosis and management of disease. While there are numerous sample types available to support initial biomarker discovery, the most desirable sample types for clinical implementation are universally available and minimally invasive to the patient. Consequently, circulating body fluids, such as blood or urine, continue to be the sample source of choice.

In recent years, with the development of a new breed of broad surveying or “omics” measurement techniques, combined with fully sequenced genomes offering a better framework for interpreting biological measurements, the molecular interrogation of biology and disease at a systems level is emerging, leading to novel types of molecular diagnostics. The first wave of commercial tests measures gene expression signatures in tumor tissue,1 and more recently, molecular diagnostics has also embraced protein-based signatures.2 Protein signatures, by definition, require the measurement of multiple proteins from a single sample. While conceptually it is possible to measure individual proteins sequentially, analogous to the well-accepted techniques utilized during routine “blood workup”, discovery of novel protein-based molecular signatures is greatly facilitated by high-content protein screening methodologies. The greater the number of proteins screened, the higher the likelihood of obtaining a viable signature.

From the repertoire of modern analytical technologies, mass spectrometry is the one best suited to quantify large numbers of proteins from a single sample.3,4 While the capabilities of “shotgun” proteomics are impressive from the perspective of detecting a very extensive, near-complete set of proteins from individual cell types5 or simple organisms,6 its translation to plasma-based clinical discovery studies has presented two general challenges that to date have not been overcome.

The first challenge is that the abundances of proteins constituting the plasma proteome span an extreme dynamic range:7 the concentrations of the most abundant plasma protein, serum albumin, and of circulating cytokines, drivers of numerous fundamentally important disease processes, differ by nearly 12 orders of magnitude. Serum albumin alone makes up about two-thirds of the entire protein content of plasma. In the top three decades of protein concentration (between 60 mg/mL and 60 µg/mL), likely fewer than 100 different genes are represented. Depletion of abundant plasma proteins prior to analysis is a common strategy for increasing the number of proteins detectable with mass spectrometry. Several strategies have been described in the literature,8-10 and the most common ones rely on antibody-based retention of a selected set of the abundant plasma proteins. More recently, a two-stage antibody-based depletion scheme was developed11 to bring more proteins into a similar concentration window, resulting in more than doubling of the number of detected proteins relative to single-stage depletion. The first stage depletes 12-14 abundant proteins using antibodies specifically generated to each of the 12-14 proteins. Antibodies used for the second stage of depletion are raised against an essentially undefined, complex set of proteins, using the flow-through from the first-stage depletion as immunogen for production of polyclonal antibodies. This technology has recently become commercially available and is referred to as “Supermix” depletion.12,13

An alternative to antibody-based depletion is provided by beads carrying a combinatorial peptide ligand library.14 These beads are available under the commercial name of “ProteoMiner” and utilize an approach that compresses the dynamic range of protein concentrations. When complex biological samples are applied to the beads, the high-abundance proteins saturate their high-affinity ligands and excess protein is washed away. In contrast, the medium- and low-abundance proteins are concentrated on their specific affinity ligands. This reduces the dynamic range of protein concentrations while maintaining a broad representation of all proteins in the original sample.

To date, the evaluation of these depletion technologies has been largely limited to assessing the improvements in the coverage of the plasma proteome.12,13 Less work has been published on how these depletion techniques can be utilized within a biomarker discovery workflow.15 A particularly important question is whether improved protein coverage has been achieved at the expense of analytical variability to the extent that true biological differences are overwhelmed by the increased analytical “noise” in the system.
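The dynamic-range figures quoted above, and the saturation behavior attributed to combinatorial ligand beads, can be illustrated with a small numerical sketch. This is a toy model only: the protein names, the concentrations other than the 60 mg/mL and 60 µg/mL bounds from the text, and the per-ligand `capacity` are hypothetical values chosen for illustration, and the `min()` rule is a deliberately crude stand-in for real binding behavior.

```python
import math

def orders_of_magnitude(high, low):
    """Number of decades (powers of ten) separating two concentrations."""
    return math.log10(high / low)

def compress(amounts, capacity):
    """Toy saturation model: each protein is retained up to `capacity`;
    anything above that is treated as washed away."""
    return {name: min(amount, capacity) for name, amount in amounts.items()}

# Bounds taken from the text: 60 mg/mL down to 60 ug/mL (in g/mL).
top_window = orders_of_magnitude(60e-3, 60e-6)

# Hypothetical sample, concentrations in g/mL (illustrative values only).
sample = {
    "albumin-like": 4e-2,    # tens of mg/mL
    "mid-abundance": 1e-6,   # about 1 ug/mL
    "cytokine-like": 1e-11,  # about 10 pg/mL
}

before = orders_of_magnitude(max(sample.values()), min(sample.values()))

# Hypothetical binding capacity per ligand species, same units as above.
bound = compress(sample, capacity=1e-7)
after = orders_of_magnitude(max(bound.values()), min(bound.values()))

print(f"top concentration window: {top_window:.1f} decades")
print(f"dynamic range before compression: {before:.1f} decades")
print(f"dynamic range after compression:  {after:.1f} decades")
```

In this sketch the low-abundance species is retained in full while the abundant ones are capped, so the spread between the highest and lowest retained amounts shrinks. It does not model the enrichment of low-abundance proteins that comes from loading larger sample volumes.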
Another challenge to the efficient use of shotgun proteomics in plasma-based discovery studies, which aim to detect molecular signatures of phenotypic differences among study samples, is the random nature of peptide detection in multidimensional LC-MS/MS experiments. A consequence of the large dynamic range of the plasma proteome is that only a small fraction of all the proteins in circulation are routinely measured. It is unavoidable that the random selection of peptide precursors for MS/MS-based identification and quantification yields peptide/protein sets that vary widely from experiment to experiment, or sample to sample. Repeated analyses can improve the consistency of measured proteins in large sample sets, but only at the expense of significantly increased analysis times, which renders such approaches impractical.16 This problem has led to the widespread use of targeted measurement practices utilizing multiple reaction monitoring type experiments (LC-MRM) for selected targets.17-19 Multiple reaction monitoring ensures that all the target analytes are consistently measured in every study sample, and the relatively high speed of these measurements makes them attractive as a verification tool beyond their utility in discovery.20 However, it is important to recognize that the number of proteins measurable in a single LC-MRM experiment is much smaller than that in a true shotgun experiment because the achievable sensitivity is more modest unless MRM detection is coupled with an appropriate enrichment strategy.21,22 Such enrichment strategies typically further increase the specificity of the analysis, rendering MRM more specialized as a validation tool and less useful for discovery. The present work describes a semi-targeted workflow that combines the benefits of extensive protein depletion with the sensitivity and the breadth of analysis of a two-dimensional LC-MS/MS shotgun analysis.
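The consistency problem described in this paragraph can be made concrete with a toy simulation that compares how many precursors are measured in every run under random selection and under a fixed target list. It is a sketch under stated assumptions, not the authors' acquisition method: the pool size, the number of precursors selected per run, the number of runs, and the uniform random sampling (real data-dependent selection is biased toward intense precursors) are all hypothetical.

```python
import random

def random_selection(pool, n_selected, rng):
    """Random-style acquisition: a different random subset in each run."""
    return set(rng.sample(pool, n_selected))

def targeted_selection(target_list, n_selected):
    """Targeted acquisition: always take the same prioritized precursors."""
    return set(target_list[:n_selected])

def shared_fraction(runs):
    """Fraction of the per-run selection that is measured in every run."""
    common = set.intersection(*runs)
    return len(common) / len(runs[0])

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical numbers, chosen only for illustration.
pool = [f"peptide_{i}" for i in range(5000)]  # detectable precursors
n_selected = 1000                             # MS/MS events per run
n_runs = 5                                    # runs (or samples) compared

random_runs = [random_selection(pool, n_selected, rng) for _ in range(n_runs)]
targeted_runs = [targeted_selection(pool, n_selected) for _ in range(n_runs)]

random_shared = shared_fraction(random_runs)
targeted_shared = shared_fraction(targeted_runs)

print(f"shared across all runs, random selection:   {random_shared:.3f}")
print(f"shared across all runs, targeted selection: {targeted_shared:.3f}")
```

Under these assumptions the targeted list is measured in full in every run, whereas the randomly selected sets share only a very small fraction of precursors, which is the motivation for a precursor selection strategy that favors a consistent set of peptides.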
Off-line LC-MALDI MS/MS coupling is utilized, facilitating an MS/MS precursor selection strategy that maximizes targeting consistency in two-dimen-

Table 1. Clinical Baseline Characteristics of Sample Collection for Cardiovascular Study; Cases Are Defined by the Occurrence of Myocardial Infarct within 4 Years of Baseline Examination

characteristic                 cases (n = 252)    controls (n = 499)    p-value
Age, years                     68.7 ± 10.9        68.6 ± 10.9           matched
Gender, male %                 62.3               62.3                  matched
Body mass index, kg/m2         27.7 ± 4.9         26.5 ± 4.0
Smoking status, %              42.6               47.5
Diabetes, %                    12.4               6.2
Blood pressure, mmHg
  Diastolic
  Systolic
Statin use, %
Antihypertensive therapy, %
Diuretic therapy, %