Integrated Pipeline for Mass Spectrometry-Based Discovery and Confirmation of Biomarkers Demonstrated in a Mouse Model of Breast Cancer

Jeffrey R. Whiteaker,†,‡ Heidi Zhang,†,‡,§ Lei Zhao,†,‡ Pei Wang,†,‡ Karen S. Kelly-Spratt,†,‡ Richard G. Ivey,†,‡ Brian D. Piening,† Li-Chia Feng,† Erik Kasarda,† Kay E. Gurley,† Jimmy K. Eng,† Lewis A. Chodosh,| Christopher J. Kemp,† Martin W. McIntosh,† and Amanda G. Paulovich*,†

Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, and Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104

Received April 9, 2007

Despite their potential to impact diagnosis and treatment of cancer, few protein biomarkers are in clinical use. Biomarker discovery is plagued with difficulties ranging from technological (inability to globally interrogate proteomes) to biological (genetic and environmental differences among patients and their tumors). We urgently need paradigms for biomarker discovery. To minimize biological variation and facilitate testing of proteomic approaches, we employed a mouse model of breast cancer. Specifically, we performed LC-MS/MS of tumor and normal mammary tissue from a conditional HER2/Neu-driven mouse model of breast cancer, identifying 6758 peptides representing >700 proteins. We developed a novel statistical approach (SASPECT) for prioritizing proteins differentially represented in LC-MS/MS datasets and identified proteins over- or under-represented in tumors. Using a combination of antibody-based approaches and multiple reaction monitoring-mass spectrometry (MRM-MS), we confirmed the overproduction of multiple proteins at the tissue level, identified fibulin-2 as a plasma biomarker, and extensively characterized osteopontin as a plasma biomarker capable of early disease detection in the mouse. Our results show that a staged pipeline employing shotgun-based comparative proteomics for biomarker discovery and multiple reaction monitoring for confirmation of biomarker candidates is capable of finding novel tissue and plasma biomarkers in a mouse model of breast cancer. Furthermore, the approach can be extended to find biomarkers relevant to human disease.

Keywords: biomarker • breast cancer • mouse • HER2 • mass spectrometry • osteopontin • multiple reaction monitoring • SISCAPA

Journal of Proteome Research 2007, 6, 3962-3975. Published on Web 08/21/2007. DOI: 10.1021/pr070202v. © 2007 American Chemical Society.

* To whom correspondence should be addressed. Amanda G. Paulovich, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N., Seattle, WA 98109. E-mail, [email protected].
† Fred Hutchinson Cancer Research Center.
‡ These authors contributed equally.
§ Present address: Amgen, Thousand Oaks, CA 91320.
| University of Pennsylvania School of Medicine.

Introduction

Protein biomarkers dramatically impact cancer patient care. For example, the discovery of breast tumor tissue markers such as estrogen receptor and HER2 has led to the development and directed use of targeted hormonal1 and antibody2 therapies. Despite their potential to impact breast cancer management throughout the natural history of disease, including risk assessment, early detection, treatment selection, monitoring response to therapy, and surveillance for recurrence, very few biomarkers are in clinical use.3

Several obstacles have limited the translation of protein biomarkers into the clinic. First, despite tremendous advances in proteomics, current technologies are unable to globally interrogate the human proteome; for example, discovery of tumor-derived biomarkers directly in plasma is challenging because the 20 most abundant plasma proteins account for 99% of the total protein mass and impede detection of lower-abundance tumor antigens.4 Recent work coupling mass spectrometry (MS) to multidimensional fractionation has produced a large catalog of proteins identified in serum and/or plasma;5,6 however, much of the success relies on sophisticated instrumentation and extensive fractionation, contributing to experimental variability and limiting the number of samples that can reasonably be interrogated. Second, although modern “omics” technologies routinely produce lists of hundreds of biomarker candidates, refining true positives from the candidate list is a tremendous challenge. The downstream need to develop quantitative proteomic assays (e.g., enzyme-linked immunosorbent assay, ELISA) for potential candidates is prohibitively resource- and time-intensive, creating a bottleneck at the interface of biomarker discovery and confirmation. Finally, uncharacterized genetic and environmental heterogeneity among humans, as well as practical difficulties in obtaining large quantities of human clinical specimens, severely hamper the discovery process. These variables, coupled with the high dimensionality of proteomics datasets, create a tremendous challenge to finding statistically relevant biomarkers. Because of these issues, an approach to biomarker development should be well conceived and play to the strengths of current technologies while acknowledging and addressing the limitations.7

In this work, we demonstrate an MS-based process for biomarker discovery and confirmation that employs an undirected, shotgun-based mode of MS for biomarker candidate discovery in diseased tissue followed by a targeted multiple reaction monitoring (MRM) mode of MS for candidate confirmation in tissue as well as plasma. Although ultimate clinical validation of biomarkers must contend with variability arising from genetic, environmental, and behavioral differences among humans, optimization of the discovery and candidate confirmation processes involves controlling as many biological variables as possible so that the technologies being employed can be directly evaluated. Hence, as a proof-of-principle study, we chose to initiate this work in a highly controlled inbred mouse model that recapitulates features of human breast cancer.8 Because plasma-based discovery of low-abundance tumor markers is analytically challenging,4,9 we chose to discover biomarker candidates directly in tumor tissues where they may be more abundant, hypothesizing that some proteins originating in the tissue could subsequently be monitored in the bloodstream. Leaky capillary beds,10 local production of proteases,11 and the high rates of cell death found in tumor tissue are expected to facilitate the shedding or secretion of tumor proteins into the bloodstream.
Additionally, by targeting the more abundant proteins in tumor tissue, it was not necessary to perform extensive fractionation of the sample, thus minimizing experimental variation and allowing for many technical replicates to be acquired such that the strength of individual candidates could be determined. Because a list of candidate biomarkers requires downstream confirmation and clinical validation, we develop a novel statistical algorithm (SASPECT) to prioritize biomarker candidates based on a q-value. Furthermore, we confirm the differential expression of a number of candidates in tissue using quantitative mass spectrometry and demonstrate qualification of osteopontin and fibulin-2 as circulating markers of breast cancer in the mouse plasma. On the whole, this study provides an assessment of a pipeline for biomarker discovery and confirmation that could be applied to human samples to discover biomarkers of human disease.

Experimental

Antibodies, ELISAs, and Synthetic Peptides. Antibodies against mouse Osteopontin (Western blots; AF808), Cathepsin D (AF1029), Cathepsin B (MAB965), as well as recombinant mouse Osteopontin protein (441-OP/CF) and ELISA (#DY441) were purchased from R&D Systems. Anti-Pak2 (#2608), anti-Npm1 (#3542), and anti-PCNA (#2586) antibodies were obtained from Cell Signaling Technology. Anti-Ddx5 (ab10261) antibody was purchased from Abcam, and anti-γ-Tubulin was purchased from Sigma (T5192). HRP-conjugated horse anti-mouse and goat anti-rabbit secondary antibodies were obtained from Cell Signaling Technology. HRP-conjugated mouse anti-goat was obtained from Jackson ImmunoResearch Laboratories. Affinity purified polyclonal rabbit anti-peptide antibody was generated against peptide IGPAPAFAGDTISLTITK (from mouse fibulin-2, Fbln2) as part of this study. Dynabeads Protein G magnetic beads were obtained from Dynal Biotechnology. Light and heavy stable isotope-labeled (13C, 15N lysine or arginine) synthetic peptides were obtained from Sigma-Aldrich.

Mouse Models. All mouse work was done under IACUC regulations as approved by the FHCRC animal use committee. Three breast cancer models were used: (1) doxycycline-inducible MMTV-rtTA/TetO-NeuNT mice;8 (2) MMTV LTR-driven polyoma middle T;12 and (3) MMTV-driven Wnt1.13 Squamous cell carcinomas were induced by DMBA/TPA treatment14 on p19Arf heterozygous mice;15 TPA-only treated mice were used for controls. For lung tumor induction, A/J mice were injected with urethane (i.p., 1 mg/g body weight) at 3 weeks of age. Two GI tumor models were used: Apc1638N and ApcMin mice, both of which contain a heterozygous germline mutation in the Apc gene on a C57BL/6 genetic background.16,17 The TRAMP prostate tumor model contains the rat probasin gene driving SV40 T antigen expression in secretory epithelial cells of the murine prostate.18,19 To avoid bias, experimental and control mice were paired at weaning and were matched with respect to age, sex, litter, cage, and treatment protocols. All mice were fed standard laboratory chow (Harlan Teklad, 8664) and acidified water ad libitum and kept on a 12 h light-dark cycle. All tumor-bearing mice and control pairs were euthanized back-to-back on the same day by use of CO2, and blood was obtained through cardiac puncture using a 1 cm³ syringe and a 23G needle. Blood was placed in a K3EDTA tube and plasma was isolated by centrifugation at 2000 × g for 5 min. Aliquots were transferred to cryovials, air-evacuated with argon gas, and frozen at -120 °C before being placed in liquid nitrogen for storage. For necropsy, a vertical incision was made through the skin from neck to anus without cutting through the peritoneal lining. The skin was pulled back to expose the breast tissue, fat pads, and lymph nodes.
Breast tissue was cut away from the underside of the skin and separated from the lymph node and any visible fat. Breast tumor tissue was dissected in a similar fashion, except that isolation of lymph nodes and any fat was not possible due to the extent of tumor tissue. Tissues were rinsed with phosphate buffered saline to remove any visible blood and cut into pieces. Pieces for frozen preservation were placed in cryovials and snap frozen in liquid nitrogen. Pieces for paraffin embedding were placed in a cassette in normal buffered formalin (NBF) for 4-8 h. Cassettes were transferred to 70% methanol before embedding.

Extraction of Protein from Mouse Tissues. For biomarker discovery experiments, normal or cancerous tissues were harvested and processed separately from 5 control and 5 tumor-bearing mice. Tissues were homogenized for 20 s in lysis buffer (50 mM Tris, 1 mM EDTA, 1 mM EGTA, and 0.05 M DTT) using a PowerGen 125 homogenizer (Fisher Scientific). Lysates were rocked in the cold room for 15 min and then centrifuged at 20 000 × g to remove debris. Normal and tumor tissue lysates were pooled by equal mass. Pooled tissue lysates were digested by trypsin (see below for details) prior to analysis by mass spectrometry. A second (independent) cohort of tissues from 10 control and 10 tumor-bearing mice was harvested for quantitative mass spectrometry (MRM) confirmation studies. Tissues were homogenized for 30 s in lysis buffer (50 mM Tris pH 8, 1 mM EDTA, 1 mM EGTA, 50 mM DTT) using a Brinkman Polytron homogenizer. Lysates were incubated at 4 °C for 15 min on a rocker and then centrifuged at 20 000 × g for 5 min to remove debris. Normal and tumor tissue lysates were pooled by equal mass. Heavy stable isotope-labeled peptides were spiked into the pooled control and tumor tissue protein lysates at 100 fmol/10 µg tissue lysate. Pooled tissue lysates were digested by trypsin (see below for details) prior to analysis by mass spectrometry.

Western Blot Analysis. Eleven micrograms of cell lysate or 1 µL of serum proteins were transferred to 0.45 µm nitrocellulose membrane, blocked with SuperBlock-PBS (Pierce), 0.1% Tween 20 (Sigma), and incubated with primary antibody overnight in 1× PBS, 10% SuperBlock, 0.1% Tween 20. Membranes were washed in 1× PBS, 0.1% Tween 20. Secondary antibody and Streptavidin-HRP were added for 1 h, and membranes were washed with 1× PBS, 0.1% Tween 20 and developed with LumiGlo substrate (Cell Signaling Technology).

Enrichment with Anti-Peptide Antibody. One-hundred micrograms of fibulin-2 anti-peptide antibody was added to 250 µL (7.5 mg) of Protein G beads and incubated at room temperature (RT) for 1 h to allow antibodies to attach to the surface. The beads were then washed with 1 mL of 0.2 M triethanolamine, pH 8.2. Magnetic Protein G beads with immobilized antibodies were incubated with tryptic digests of serum at 4 °C overnight as described previously.20 The beads were washed with 1 mL of water, and the bound peptides were eluted by incubating the beads in 40 µL of 5% acetic acid for 8 min at RT.

Preparation of Samples for Mass Spectrometry. Tissue lysates were clear (no visible blood contamination). The protein concentration of each lysate was determined using the Bradford QuickStart Assay (Bio-Rad Laboratories). For biomarker discovery, two pools were prepared, one containing equal mass of protein from 5 tumor lysates and one from 5 normal lysates. Additionally, for biomarker confirmation studies, two pools of plasma were prepared for quantitative MRM containing equal mass of protein from 10 tumor-bearing mice and 10 normal mice.
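The equal-mass pooling and heavy-peptide spiking described above reduce to simple arithmetic once the Bradford concentrations are known. The following is a minimal sketch of that arithmetic, not the authors' procedure: the function names, the concentration values, and the 20 µg contribution per sample are hypothetical and chosen only for illustration; the 100 fmol per 10 µg spike level is the one stated in the text.

```python
def equal_mass_volumes(concentrations_ug_per_ul, mass_per_sample_ug):
    """Volume (in µL) to draw from each lysate so that every sample
    contributes the same protein mass to the pool."""
    volumes = []
    for conc in concentrations_ug_per_ul:
        if conc <= 0:
            raise ValueError("protein concentration must be positive")
        volumes.append(mass_per_sample_ug / conc)
    return volumes


def heavy_peptide_spike_fmol(pool_mass_ug, fmol_per_10_ug=100.0):
    """Amount of each heavy peptide (fmol) for a pool, at the stated
    level of 100 fmol per 10 µg of tissue lysate."""
    return pool_mass_ug * fmol_per_10_ug / 10.0


# Hypothetical Bradford readings (µg/µL) for five lysates.
concentrations = [2.0, 4.0, 5.0, 2.5, 10.0]
mass_each = 20.0  # hypothetical µg contributed by each sample

volumes = equal_mass_volumes(concentrations, mass_each)
pool_mass = mass_each * len(concentrations)
spike = heavy_peptide_spike_fmol(pool_mass)

for conc, vol in zip(concentrations, volumes):
    print(f"{conc:5.1f} µg/µL -> draw {vol:5.1f} µL")
print(f"pool mass: {pool_mass:.0f} µg, heavy peptide spike: {spike:.0f} fmol")
```

Pooling by mass rather than by volume keeps a concentrated lysate from dominating the pool, which matters when the pool is meant to represent all animals equally.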
Pooled lysates were separately denatured and reduced with 60% methanol and 10 mM dithiothreitol (DTT) at 60 °C for 1 h and alkylated with 50 mM iodoacetamide (IAM) at room temperature in the dark for 30 min. Ammonium bicarbonate (50 mM) was added to achieve a final methanol concentration of 20%. The samples were digested with trypsin (Promega) at 37 °C for 6 h at a protein-to-enzyme ratio of 50:1 (w/w), dried in a SpeedVac, and resuspended in 50 mM NH4HCO3. It is estimated that approximately 12 pmol of peptides were injected per LC-MS/MS analysis. Pooled plasma was depleted by the mouse Multiple Affinity Removal System (MARS, Agilent Technologies). Plasma was diluted 5-fold with Buffer A and filtered using 0.22 µm filters by centrifugation at 16 000 × g at room temperature for 1 min. One-hundred microliters of diluted and centrifuged plasma was injected on a 4.6 × 50 mm MARS column at a flow rate of 0.25 mL/min for 9 min on a BioCad Vision High Performance Liquid Chromatography (HPLC) system (Applied Biosystems). The flow-through containing unbound low-abundance proteins was collected during the first 6 min (approximately 1.5 mL) and stored at -20 °C. The bound proteins were eluted by Buffer B at a flow rate of 1.0 mL/min for 5 min, and the column was regenerated by equilibrating with Buffer A at a flow rate of 1 mL/min for 7.5 min. The flow-through fractions were desalted with 50 mM NH4HCO3 using a 4 mL spin concentrator with a 5 kDa MWCO.

Nanoliquid Chromatography (LC). Agilent 1100 nano flow systems (Agilent Technologies) equipped with micro well-plate autosamplers and isocratic capillary pumps were used for liquid chromatography. Solvents used were water/0.1% formic acid (mobile phase A) and acetonitrile/0.1% formic acid (mobile phase B). Each system was connected to an Integrafrit trap column (2 cm × 100 µm, New Objective) and a RP-18 monolithic column (15 cm × 100 µm, Chromolith CapRod, Merck) via a micro-cross connector (Upchurch Scientific). The trapping column was packed in-house at a pressure of 500 psi using Atlantis C18 material (5 µm particle, Waters Corporation). Samples were loaded on the trapping column at 10 µL/min and desalted by washing with 2% B for 5 min. The LC gradient for the nano column was delivered at 0.8 µL/min and was developed from 10 to 40% B over 120 min.

Normalization of Peptide Loading. An LCT Premier time-of-flight mass spectrometer (Waters Corporation) was interfaced with the nanoLC system (Agilent 1100) for LC-MS analysis. A capillary voltage of 2000 V was applied to a platinum wire in the micro-cross connector; the cone voltage was 60 V. The source temperature was 120 °C. Mass spectra were acquired over the range m/z 400-1600 every 1.0 s with a 0.05 s interscan delay time. The instrument was mass calibrated with a sodium formate solution prior to analysis. Glu-fibrinopeptide (m/z 785.8426) was used as a reference compound for mass calibration. Six ESI-TOF runs were acquired on the trypsin-digested pooled protein lysates to normalize sample loading for LC-MS/MS analysis. Peptides were detected in the LC-MS profiles using the msInspect feature detection tool.21 Sample concentrations were normalized (prior to subsequent shotgun LC-MS/MS analysis) by adjusting the median and 98th percentile feature intensity to be equal for both case and control samples.

Electrospray Ionization Linear Ion Trap Mass Spectrometry.
The nanoLC system (Agilent 1100) was connected to a linear ion trap mass spectrometer (LTQ, Thermo Scientific) equipped with a nano electrospray interface operated in the positive ion mode. Typical instrument settings include a spray voltage of 1.5 kV, an ion transfer tube temperature of 200 °C, and a collision gas pressure of 1.3 Torr. Voltages across the capillary and the quadrupole lenses were tuned for optimal signal intensity using the +2 ion of angiotensin I (m/z 649). For data-dependent experiments, the settings for dynamic exclusion include a repeat count of 1, exclusion list of 500, exclusion duration of 180 s, and exclusion mass window of 3 Da. Ion selection threshold was set at 1000 ion counts for MS/MS. All tandem mass spectra were collected using normalized collision energy of 35%. Sequencing was conducted by acquiring one full scan spectrum (m/z 400-1600) followed by five tandem mass spectra (m/z 200-2000) of the most abundant ions. Consecutive sample runs were alternated in pairs (tumor-tumor-normal-normal, etc.) with extensive autosampler and column washing in between.

Electrospray Ionization Hybrid Triple Quadrupole/Linear Ion Trap Mass Spectrometry. The nanoLC system (Agilent 1100) was connected to a hybrid triple quadrupole/linear ion trap mass spectrometer (4000 QTRAP, Applied Biosystems/MDS Sciex) equipped with a nano electrospray interface operated in the positive ion mode. Typical instrument settings include a spray voltage of 3.0 kV, a nebulizer gas setting (GS1) of 5, and an ion source temperature of 150 °C. For quantitative multiple reaction monitoring (MRM) experiments, instrumental parameters (CE, DP, CXP, EP) were optimized by infusing synthetic peptide standards. For semiquantitative MRM experiments for which synthetic peptides were not prepared, transitions for peptides were determined empirically using data obtained in the tissue shotgun LC-MS/MS experiments. For all MRM studies, the quadrupoles were operated in unit/unit resolution, and dwell times were 15 ms. For MRM-initiated data-dependent experiments, the settings for exclusion of ions included an occurrence of 2, exclusion duration of 30 s, and exclusion mass window of 4 Da. Ion selection threshold was set at 1000 ion counts for MS/MS. All tandem mass spectra were collected using rolling collision energy. The entire sampling sequence consisted of acquiring about 200 MRM transitions followed by two tandem mass spectra (m/z 250-1500) of the most abundant ions.

Analysis of LC-MS Data. MS/MS data were searched using X!Tandem22,23 configured with a scoring plug-in24 compatible with PeptideProphet25 analysis. This entire processing pipeline is distributed freely as part of the Computational Proteomics Analysis System.26 All data were searched against version 3.02 of the IPI sequence database that was released on January 6, 2005. All searches were performed with tryptic enzyme constraint allowing for up to two missed cleavages. Peptide MH+ mass tolerances were set at ±2.0 Da. Oxidized methionine was set as a variable modification, and cysteine residues were considered alkylated. We considered peptide identifications “correct” if the PeptideProphet score was ≥0.95; peptides that were identified with both unmodified and oxidized forms of methionine were only counted once. MRM data were analyzed using MultiQuan (MDS Sciex). Transitions were grouped according to the protein of interest. Transitions for the peptide of interest were required to have the same retention times in normal and cancer samples, and multiple transitions to the same peptide also were required to have the same retention time. A signal-to-noise threshold of 3 was used to determine the limit of detection.

Estimation of the False Discovery Rate (FDR).
To evaluate differential protein expression in tumor versus normal samples, we developed a statistical procedure, SASPECT (Significant AnalysiS of PEptide CounTs), which is described in detail in the Supporting Information. An open source R-package implementing the SASPECT algorithm is publicly available at http://peiwang.fhcrc.org/research-project.html, as is a zip archive containing the data, programs, and scripts used to calculate the false discovery rates.

Public Access to the Dataset. Data are available at https://proteomics.fhcrc.org/CPAS/. Navigate to the Published Experiments folder, then to “HER2 mouse breast MS2”. Data include LC-MS/MS search results, the Pep3D images in PNG format, and input data to the FDR calculations.
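The peptide acceptance rule given under Analysis of LC-MS Data (PeptideProphet score of at least 0.95, with oxidized and unmodified methionine forms of a peptide counted once) can be sketched as a small filter. This is an illustrative sketch only, not the CPAS implementation: the input is assumed to be a list of (peptide, probability) pairs, the bracketed mass notation such as `M[147]` for oxidized methionine is an assumed convention, and the sequences and function names are hypothetical.

```python
import re

MIN_PROBABILITY = 0.95  # PeptideProphet cutoff stated in the text


def strip_modifications(peptide):
    """Remove bracketed mass annotations (assumed notation, e.g. M[147])
    so that modified and unmodified forms map to the same sequence."""
    return re.sub(r"\[[^\]]*\]", "", peptide)


def unique_confident_peptides(identifications, cutoff=MIN_PROBABILITY):
    """Set of bare peptide sequences with at least one identification
    scoring at or above the cutoff."""
    return {
        strip_modifications(peptide)
        for peptide, probability in identifications
        if probability >= cutoff
    }


# Hypothetical search results: (peptide, PeptideProphet probability).
identifications = [
    ("LVNELTEFAK", 0.99),
    ("M[147]PEPTIDEK", 0.97),  # oxidized form
    ("MPEPTIDEK", 0.96),       # unmodified form of the same peptide
    ("SLHTLFGDK", 0.80),       # below the cutoff, discarded
]

confident = unique_confident_peptides(identifications)
print(sorted(confident))
```

Collapsing modification variants before counting keeps a peptide that happens to oxidize readily from inflating the unique-peptide tally for its protein.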

Results and Discussion

Biomarker Discovery using Tissue-Based Comparative Proteomics. Traditional biomarker discovery work using 2D gel electrophoresis or immunological detection of tumor antigens has recently been supplemented by mass spectrometry-based discovery approaches that have spawned hope and controversy,27 but no consensus as to the best approach for biomarker discovery and as yet no successes in identifying new clinical diagnostics. We set out to evaluate a process for biomarker discovery and confirmation in which unbiased comparative tissue proteomics-based discovery is phased into a targeted, candidate-driven confirmation stage in both tissue and plasma. To minimize biological variation and facilitate testing of a variety of proteomic approaches for biomarker discovery, we chose to use an inbred mouse model of breast cancer as a highly controlled system where biological “noise” could be minimized. For example, the uncontrollable genetic and environmental variation among humans diminishes the statistical power of biomarker discovery. Furthermore, there are many molecular subtypes of human cancers; although some of these subtypes have been identified (e.g., estrogen receptor or HER2/Neu positive), most are poorly characterized. Despite the presence of some residual biological variation,28 inbred mouse models minimize these confounding variables, thereby increasing the statistical power of biomarker discovery experiments and allowing the performance of the technologies used to be more directly assessed.

The conditional HER2/Neu-driven mouse model of mammary cancer has been described previously.8 Briefly, the tetracycline regulatory system is used to conditionally express an activated Neu oncogene in the mammary epithelium of transgenic mice. When induced by the addition of doxycycline to the drinking water, bitransgenic MMTV-rtTA/TetO-NeuNT mice express activated Her2p in mammary epithelium and develop ductal hyperplastic changes that evolve focally into multiple invasive mammary carcinomas; in contrast, singly TetO-NeuNT transgenic control mice remain disease-free. Five tumor-bearing, bitransgenic mice and five healthy single TetO-NeuNT transgenic control mice were sacrificed when the tumor size reached ∼1 cm (after 10-14 weeks of doxycycline exposure). Although tumors were clearly visible at this point, the mice otherwise appeared to be healthy. Histological analysis of the tumors (Figure 1A) revealed invasive solid nodular carcinomas typical of Neu/ErbB2-initiated mammary tumors.29 Equivalent protein masses from tissue lysates from 5 independent tumor-bearing and 5 independent control mice were pooled and processed in parallel.
To confirm expression of the HER2/Neu protein, the individual protein lysates as well as the protein lysate pools were analyzed by Western blot analysis using an anti-ErbB2/HER2 antibody. As expected, only tumor-derived lysates showed significant expression of the transgene (Figure 1B).

To create a database of protein biomarker candidates, we generated an extensive catalog of proteins identified by LC-MS/MS from the pooled tumor and normal tissue lysates. Since our goal was ultimately to assess the strength of individual biomarker candidates, a large number of technical replicates were performed to ensure that sufficient observations of peptides were obtained to make conclusions regarding the statistical significance of individual candidates (described in more detail below). Each of the two pools, cancer and normal, was subjected to 42 independent LC-MS/MS analyses. Figure 2 shows the cumulative number of unique peptide and protein identifications in cancer and control samples. The cumulative number of unique, high-confidence peptide identifications from normal tissue was 3589 (Table 1). Despite equalization of the amount of peptides injected from the normal vs tumor tissue lysates, the cumulative number of peptides identified was higher for tumor tissue (5450). One potential cause of this differential could have been that, although caution was taken to wash away blood from the tissues at harvest, the normal lysates could have been more contaminated with blood than tumor lysates. Because of the extraordinary dynamic concentration range of plasma proteins (≥10^10) and the unusually high abundance of albumin (∼60 mg/mL),4 any significant residual contamination of blood in lysates could result in suppression of ionization of tissue peptides. Overall, serum albumin generated more peptide identifications than any other protein in these tissue samples, representing 7097 total identifications (IDs) from 55 unique peptides, suggesting that residual contamination from blood was present; however, the IDs were distributed equally between case and control pools (p ≤ 0.6455), demonstrating that residual plasma protein contamination was not the cause of the difference. An alternative explanation is that the difference in cumulative identifications in tumor vs normal tissue could be due to a greater number of cellular proteins from tumor tissue (vs normal) being present in the top 3-4 orders of magnitude of all tissue protein abundances (since these were the proteins most likely to be identified by LC-MS/MS). This could happen, for example, if the complexity of tissue varies between normal and tumor (i.e., differences in cell types). We considered using laser-captured tissue to focus on epithelial components; however, we chose against this option for 4 reasons: (i) tumors were composed overwhelmingly of epithelial cells (see Figure 1A), (ii) a growing body of literature on tumor-stromal interactions indicates that biomarkers can emanate from any cell type, (iii) the number of cells required for our analyses makes laser capture microdissection impractical, and (iv) we had concerns that the tissue preparation and laser capture procedure could compromise the quality and reproducibility of the mass spectrometry data. Hence, we chose to very stringently account for the different number of identifications in tumor vs normal in calculating false discovery rates (see below and Supporting Information).

Figure 1. Histopathology of tumor and normal mammary tissue. (a) H&E of tissue sections from MMTV/TAN normal mouse mammary tissue and solid mammary tumors from early stage (4 weeks) and late stage (14 weeks) tumorigenesis. 40× mag. (b) Western blot of individual and pools of tissue lysate probed with anti-ErbB2/HER2 antibody. ErbB2/HER2 is detected in the mammary carcinomas of tumor-bearing bitransgenic MMTV-rtTA/TetO-NeuNT mice (right), but not in the normal mammary tissue of healthy single MMTV-rtTA transgenic controls (left). (c) H&E stain of mammary tumor from Wnt-1 mice showing papillary (top) and comedo (bottom) tumor types. 40× mag.

Figure 2. Cumulative unique peptide and protein identifications from 42 shotgun LC-MS/MS runs in normal mammary tissue lysate (left panels) or cancerous mammary tissue lysate (right panels). Each box plot along the curves indicates the number of unique peptides or proteins that are identified on any combination of N runs, where N is the x-axis value ranging from 1 to 42. A PeptideProphet cutoff of ≥0.95 was used. For simplicity, in this figure, protein counts are based on single high-confidence peptide hits; more stringent criteria, requiring multiple peptide hits, are shown in Table 1 and listed in Supplemental Table 1 (annotated with q-values). Note that the purpose of the large number of technical replicates was to ensure that enough observations were accumulated to achieve statistical power to make predictions about the relative abundance of a protein in the tumor vs normal tissue (see SASPECT algorithm discussion in text).

Table 1. Summary of Protein and Peptide Identifications Accumulated Over 42 LC-MS/MS Analyses of Either Normal Breast Tissue or Breast Cancer Tissue Lysates

                peptide identifications    protein identifications based on:
tissue lysate   (PeptideProphet ≥0.95)     1 peptide    2 peptides    ≥3 peptides
normal          3589                       1000         493           350
cancer          5450                       1669         859           577
totals          9039                       2669         1352          927
total unique    6758                       1917         981           686

Determining Differential Protein Expression in Tumors Using a Peptide Counting Approach. Our goal was to discover candidate biomarkers that are differentially expressed in tumor vs normal mammary tissues. Downstream of the discovery process, considerable resources must be devoted to developing quantitative assays for confirmation and validation of candidates as biomarkers. Therefore, it is imperative to estimate the false discovery rate (FDR) in biomarker discovery experiments to ensure that only the highest confidence candidates are invested in for further work. For this discussion, we define the FDR as the percent of proteins misclassified as being differentially expressed in tumor tissue. To evaluate differential protein expression in tumor vs normal samples, we developed a statistical procedure that is related to the spectral counting approach.30-32 It has been demonstrated in LC-MS/MS data that the frequency and number of peptides sequenced for a given protein provides an estimate of the protein’s abundance;30,31 we used this parameter to rank biomarker candidates potentially over- and underrepresented in tumor vs normal tissue lysates. The assumption is that the probability of a protein’s being observed in an LC-MS/MS experiment is proportional to its expression level in the sample.
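To make the counting idea concrete, the sketch below tests whether one protein is identified at a higher rate in tumor runs than in normal runs, dividing by the total number of identifications per group so that a globally larger identification count in one group does not by itself produce a difference, and shuffling run labels to obtain a null distribution. It is a deliberately simplified stand-in and not the SASPECT algorithm, which additionally uses an EM model and PeptideProphet scores and is derived in the Supporting Information. All counts, totals, and function names here are hypothetical.

```python
import random


def rate_difference(counts, totals, labels):
    """Tumor-minus-normal difference in the fraction of all peptide
    identifications that belong to the protein of interest."""
    tumor_count = sum(c for c, lab in zip(counts, labels) if lab == "tumor")
    tumor_total = sum(t for t, lab in zip(totals, labels) if lab == "tumor")
    normal_count = sum(c for c, lab in zip(counts, labels) if lab == "normal")
    normal_total = sum(t for t, lab in zip(totals, labels) if lab == "normal")
    return tumor_count / tumor_total - normal_count / normal_total


def permutation_p_value(counts, totals, labels, n_perm=2000, seed=0):
    """One-sided p-value for over-representation in tumor runs, obtained
    by shuffling the run labels (group sizes are preserved)."""
    rng = random.Random(seed)
    observed = rate_difference(counts, totals, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if rate_difference(counts, totals, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: six runs per group. `counts` holds identifications of
# one protein per run; `totals` holds all identifications per run.
labels = ["tumor"] * 6 + ["normal"] * 6
counts = [9, 11, 10, 12, 8, 10, 2, 3, 1, 2, 3, 2]
totals = [500] * 6 + [350] * 6

observed, p_value = permutation_p_value(counts, totals, labels)
print(f"rate difference = {observed:.4f}, permutation p = {p_value:.4f}")
```

In a full analysis a statistic of this kind would be computed for every protein and the permutation distribution used to estimate q-values across the whole list rather than a single p-value.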
Thus, deciding whether one protein is differentially expressed between the tumor group and the normal group is equivalent to testing whether the protein is observed significantly more frequently in tumor tissues than in normal tissues. We developed a scoring algorithm, SASPECT (Significant AnalySis of PEptide CounTs), that calculates a q-value for each candidate, accounting for multiple sources of error, as described below. Compared with similar studies in the microarray literature (i.e., searching for differentially expressed genes), the current problem has three new challenges: (i) we observe peptides in the experiment, while we need to make inferences for proteins; (ii) errors in database searches for identification of peptides/proteins must be accounted for; (iii) the undersampling of ions by MS/MS33 may result in artificial differences (unrelated to biology) in the total number of identified peptides between normal and tumor samples (e.g., as seen in Table 1). To address these challenges, we implemented an Expectation-Maximization (EM) model34 to make inferences about proteins based on peptide observations, PeptideProphet25 scores to account for the errors from database search algorithms, and rescaling (conditioned on the total number of identified CID spectra) to remove the artificial effect due to experimental limitation. Then, to account for multiple hypothesis testing, the sample labels were permuted to generate the null distribution of the test statistics. The mathematical derivation of the SASPECT algorithm is described in detail in the Supporting Information, and an open source R package implementing the SASPECT algorithm has been made publicly available; see Methods section. Table 2 summarizes the results, wherein 314 proteins are found to be over-represented in the tumor vs normal tissue dataset at an FDR < 0.10. (A detailed list of identified proteins and their q-values is provided in Supplemental Table 1.)

Confirmation of Tissue Biomarker Candidates Using Commercially Available Reagents. We next sought to determine experimentally whether SASPECT analysis of our shotgun-based discovery data accurately predicted tissue biomarker candidates. Because building de novo assays to all candidates is prohibitively expensive, we first looked to commercial sources for available assay reagents such as antibodies for Western blotting or ELISA measurements.
Studies using Western blotting were challenging because: (i) Western blotting is neither quantitative nor high throughput, (ii) no antibody was available for most candidates, and (iii) many of the available antibodies gave very high background and therefore inconclusive data. For example, of the 158 most probable tumor biomarker candidates (Table 2; FDR < 0.01), antibodies could be found in the Biocompare database for only 24. We obtained 18 antibodies and performed Western blot analysis of tumor vs normal tissue protein lysates. Eleven antibodies yielded inconclusive results (due to a high number of background bands), 7 were conclusive and confirmed the MS data (Figure 3A), and none conclusively disproved the MS data. Note that, despite rigorous experimental control, some of the protein biomarkers show significant variation among the mice (Figure 3; Ddx5, Npm1); although these proteins had higher expression in the cancer pools (and hence were correctly “flagged” as potential biomarkers by the SASPECT algorithm), their variation among individual animals will likely confound their use as diagnostic markers. ELISA, the current gold standard for clinical proteomic assays, is a more quantitative, higher throughput alternative

Table 2. Number of Proteins Over- or Under-Represented in the Tumor vs Normal Tissue Pools at Various FDR Thresholds Calculated Using the SASPECT Algorithm to Analyze the Shotgun LC-MS/MS Data From Tumor vs Normal Breast Tissue

                                                                  protein identifications based on:
FDR cutoff   proteins overrepresented   proteins underrepresented   1 peptide   2 peptides   ≥3 peptides
             in tumor                   in tumor
0.005        126                        0                           0           22           104
0.01         158                        0                           1           29           128
0.05         243                        0                           5           40           198
0.1          314                        77                          18          71           225
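
The label-permutation step behind the FDR thresholds in Table 2 can be illustrated with a deliberately simplified sketch. This is not the SASPECT algorithm, which additionally uses an EM model, PeptideProphet scores, and rescaling as described in the Supporting Information; it only shows how permuting tumor/normal labels yields a null distribution for a peptide-count statistic. The replicate counts, the mean-difference statistic, and the Benjamini-Hochberg step used here to convert p-values into q-values are all assumptions made for illustration.

```python
from itertools import combinations

def permutation_pvalue(tumor, normal):
    """Exact one-sided p-value for mean(tumor) - mean(normal) under label permutation."""
    pooled = list(tumor) + list(normal)
    n_tumor = len(tumor)

    def mean_diff(tumor_idx):
        t = [pooled[i] for i in tumor_idx]
        n = [pooled[i] for i in range(len(pooled)) if i not in tumor_idx]
        return sum(t) / len(t) - sum(n) / len(n)

    observed = mean_diff(tuple(range(n_tumor)))
    # Enumerate every relabeling of the runs to build the null distribution.
    null = [mean_diff(idx) for idx in combinations(range(len(pooled)), n_tumor)]
    return sum(s >= observed - 1e-12 for s in null) / len(null)

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (a stand-in for SASPECT q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical peptide counts per replicate run: (tumor runs, normal runs)
counts = {
    "protein_A": ([12, 15, 11, 14], [1, 0, 2, 1]),
    "protein_B": ([5, 6, 4, 5], [5, 4, 6, 5]),
    "protein_C": ([3, 2, 4, 3], [0, 1, 0, 1]),
}
names = list(counts)
pvals = [permutation_pvalue(*counts[name]) for name in names]
qvals = dict(zip(names, bh_qvalues(pvals)))
for name in names:
    print(name, round(qvals[name], 4))
```

With four runs per group there are only 70 possible relabelings, so the smallest attainable p-value is 1/70; this illustrates why replicate runs are needed before a small FDR can be estimated at all.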

Journal of Proteome Research • Vol. 6, No. 10, 2007 3967

research articles

Whiteaker et al.

Table 3. ELISA Results Measuring Osteopontin (OPN) Protein Concentrations in Tissue and Serum Samples From the HER2/Neu Mouse Model^a

sample

tissue

class

normal

cancer

plasma (10 weeks)

normal

cancer

Figure 3. Western blot analysis confirming differential expression of biomarker candidates in normal and tumor tissue lysates (A) or plasma (B). Note that cathepsin B is differentially processed in normal and tumor tissues; the 27 kD mature enzyme is seen in normal tissue while in tumor the enzyme is processed into the 5.2 kD light chain and the 22.2 kD heavy chain.

plasma (4 weeks)

Normal

cancer

to Western blotting; however, very few ELISAs were commercially available for the candidate biomarkers identified in this study. Of the biomarker candidates identified in our dataset, an ELISA was available for osteopontin (OPN; Spp1), which was of particular interest because it was notably over-represented in the tumor tissue (Figure 3A) and had been implicated as a biomarker in human breast cancers.35,36 Originally identified as a phosphoprotein secreted by transformed cells in culture37 and subsequently shown to play roles in cell adhesion, migration, and cytokine production,38 OPN has been shown to be overexpressed in a variety of human tumors and is present at elevated levels in the blood of some patients with metastatic cancers.39-50 In human breast cancer, OPN has been detected in both tumor cells and infiltrating inflammatory cells, and expression in tumor cells is associated with poor survival.51,52 Elevated circulating OPN has also been found to correlate with tumor burden and survival in patients with advanced breast cancer.53 Using the commercially available ELISA for OPN, we were able to: (i) confirm and quantify its overexpression in tumor tissue (Table 3), (ii) demonstrate its overexpression in plasma in animals with pre-clinical (4 weeks post-doxycycline induction) as well as palpable (10 weeks post-doxycycline induction) disease (Table 3, Figure 3B), (iii) confirm its absence in urine (Table 3), and (iv) characterize its behavior as a biomarker in multiple additional mouse models of human cancer (Supplemental Table 2). The paucity and poor quality of available antibodies and assays for confirmation of biomarker candidates highlight the desperate need for protein assay technologies that allow the flexibility to test hundreds of biomarker candidates in a moderate number of samples with a minimum of upfront investment of time or money.
Despite the poor commercial availability of high quality assay reagents, the results described above overall suggested that shotgun-based discovery coupled to SASPECT analysis was able to identify true biomarkers and


urine (10 weeks)

normal

cancer

OPN (ng/mL)

%CV

3.0 2.4 0.9 5.3 1.1 171.1 857.5 216.3 886.3 140.8 11.5

7 5 2.7 7.5 3 4.4 9.3 3.4 2.3 3.3 5.8

27.7 30.5 23.5 24.3 171.1 857.5 216.3 886.3 140.8 27.9

7.0 10.0 2.7 7.8 4.4 9.3 3.4 2.3 3.3 0.0

23.4 29.3 171.1 857.5 216.3 4.4

8.5 4.2 3.8 23.8 3.7 4.4

3.8 3.1 4.3 3.4 2.7

7.9 3.4 6.0 10.0 6.0

average (ng/mL)

3

mean change (ng/mL)

451

454

24

7334

7358

27

940

967 4

1

3

a Each ELISA was performed in triplicate, and the % coefficient of variation (CV) for each sample is shown. Additionally 3-5 individual animals were used for each biospecimen. The limit of quantification was 31.3 pg/mL.

prompted us to seek alternative assay formats that could enable more comprehensive testing of our candidates.

More Comprehensive Confirmation of Tissue Biomarker Candidates Using Semiquantitative and Quantitative LC-MRM-MS. Selected reaction monitoring (SRM) and multiple reaction monitoring (MRM) are quantitative mass spectrometry (MS) techniques used extensively in analysis of small molecules.54,55 More recently, these techniques have been applied to protein analysis, wherein a proteotypic peptide is selected as a surrogate for the protein of interest and analyzed by MRM-MS in a targeted fashion.56-58 In addition, recent work has shown that coupling affinity-based enrichment with quantitative MRM-mass spectrometry makes possible an assay for investigation of low abundance candidates.20,59 Given the promise of these techniques and the poor availability and quality of commercial antibodies, we next assessed MRM-MS for the semiquantitative and quantitative confirmation of our candidates. First, MRM analysis was applied to semiquantitatively test whether a number of candidates were truly over-represented in the tumor tissues as predicted by the SASPECT analysis of the LC-MS/MS shotgun data. Specifically, 60 candidates were randomly chosen from the list of proteins determined to be over-represented in tumor tissue at an FDR < 0.10 (see Supplementary Table 3 for a list of chosen candidates). Optimum transitions were manually chosen for highly confident identified peptides originating from the protein candidates by interrogating the shotgun LC-MS/MS data. Once the MRM transitions were chosen, equal amounts of lysate from normal and tumor tissues were injected for targeted LC-MRM-MS analysis using a hybrid triple quadrupole/linear ion trap (QTRAP). Full scan MS/MS spectra were triggered when the MRM intensity reached a threshold value. The full scan MS/MS spectra provided identifications, allowing absolute confirmation that the observed transitions arose from the peptide(s)/protein(s) of interest. For proteins where identifications could not be assigned to the proteotypic peptide transitions chosen, at least one peptide per protein was required to have two MRM transitions observed. For semiquantitative confirmation, the relative signal intensities (peak areas) of the peptide transitions were compared in the normal and tumor tissue lysates; candidates with greater peak areas in the tumor tissue lysate (vs normal) were tentatively considered to be over-represented based on this semiquantitative analysis. In total, of 60 candidate proteins checked using this approach, 48 were semiquantitatively confirmed to be more abundant in the tumor tissue lysate, 1 was found to have no change, and 11 were not detected in the MRM analysis and were inconclusive (see Supplementary Table 3 for a detailed table of candidates, MRM transitions, observed retention times, and peak intensities). Further investment in stable isotope-labeled synthetic peptide standards renders MRM-MS a quantitative technique.56,57 To evaluate the accuracy of the semiquantitative confirmation, we chose a subset of our candidates to generate stable isotope standards and apply quantitative analysis.
In total, 15 candidates were chosen based on several criteria; the proteins were required to be: (i) more abundant in the tumor tissues as predicted by the shotgun MS/MS dataset, (ii) predicted to be secreted or located on the cell surface, and (iii) previously implicated in cancer. For the 15 candidates that met the criteria, a proteotypic peptide was chosen based on empirical data from the LC-MS/MS discovery dataset. Peptides were chosen that ionized efficiently, gave predictable fragmentation, and contained no predicted sites for post-translational modifications. Heavy stable isotope standards were synthesized for the 15 peptide targets and spiked (10 fmol peptide/µg tissue lysate) into a second pool of tumor and normal tissue lysates (derived from independent mice not previously used in the shotgun-based discovery experiment), digested, and analyzed by LC-MRM-MS. The peak area ratio of endogenous peptide to stable isotope standard was measured for each target, and the ratios were compared to determine the relative difference between normal and tumor tissues. Figure 4 demonstrates the application of MRM analysis with stable isotope standards for the peptide GDSLAYGLR from osteopontin, a confirmed biomarker (Figure 3, Table 3). Chromatograms for transitions 476.3 → 508.4, 579.4 (the endogenous peptide) and 481.3 → 518.4, 589.4 (the spiked synthetic stable isotope standard peptide) were measured in normal (Figure 4A, B) and tumor (Figure 4C, D) tissue lysates. The transitions for the endogenous (light) and internal standard (heavy) peptides eluted at the same retention time. In addition, the identity of the osteopontin-derived peptide was confirmed by acquisition of a full scan MS/MS spectrum of the precursor ion at m/z 476.3, as shown in Figure 4E. As can be seen in Figure 4, the endogenous peptide was detected in the cancer tissue lysates whereas no peaks were detected in the normal sample, demonstrating that osteopontin was more abundant in the tumor tissue.
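
The ratio comparison described above can be sketched in a few lines. The peak areas below are hypothetical, and the way a lower bound is formed when the endogenous peptide is undetected in normal tissue (substituting an assumed limit-of-detection peak area, `lod_area`) is an illustrative assumption; this section does not specify how the authors computed minimum fold changes.

```python
def area_ratio(light_area, heavy_area):
    """Endogenous (light) peak area divided by spiked standard (heavy) peak area."""
    if heavy_area <= 0:
        raise ValueError("internal standard peak must be detected")
    return light_area / heavy_area

def fold_change(tumor, normal, lod_area=None):
    """Return (fold, is_minimum) for (light, heavy) peak-area pairs.

    If the light peptide is undetected in normal tissue (area 0), a lower
    bound is reported using the hypothetical limit-of-detection area.
    """
    ratio_tumor = area_ratio(*tumor)
    light_normal, heavy_normal = normal
    if light_normal > 0:
        return ratio_tumor / area_ratio(light_normal, heavy_normal), False
    if lod_area is None:
        raise ValueError("normal peak undetected; supply lod_area for a lower bound")
    return ratio_tumor / area_ratio(lod_area, heavy_normal), True

# Hypothetical peak areas (arbitrary units): (light, heavy)
detected = fold_change(tumor=(6.2e5, 2.0e5), normal=(2.0e5, 2.0e5))
bounded = fold_change(tumor=(4.0e5, 2.0e5), normal=(0.0, 2.0e5), lod_area=1.0e4)
print(detected)  # fold change when both samples are measurable
print(bounded)   # minimum fold change when normal is below detection
```

Normalizing each sample to its own spiked standard before comparing means that run-to-run differences in injection or ionization largely cancel out of the fold change.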

For the 15 chosen candidates, the peak area ratio of endogenous:internal standard was calculated for normal and tumor lysates, and compared to determine the relative abundance of peptides in the samples. Table 4 summarizes the relative difference between peak area ratios measured in tumor and normal tissues using MRM with stable isotope standards for the 15 chosen biomarker candidates (see Supplemental Table 4 for a detailed table of transitions, retention times, and peak intensities). Overall, 11 of the 15 candidates were quantitatively confirmed to be more abundant in the tumor tissues compared to the normal tissue lysates. Three candidates (hypoxia upregulated 1, tenascin, and Ewing sarcoma homolog) were below the limit of detection in the tissue lysates under the conditions used and hence their relative abundance could not be determined. The final candidate (legumain) exhibited average peak area ratios 1.5 times higher in the tumor lysate, which was inconclusive due to experimental variation (see Supplemental Table 4). Overall, the quantitative results confirm the overexpression in tumor tissue of candidate biomarkers predicted by SASPECT analysis of the shotgun LC-MS/MS experiment. The MRM-MS assay provides a complementary measure of confirmation for biomarker candidates. Furthermore, the ability to multiplex target analytes, run samples at a higher throughput (vs shotgun LC-MS/MS), and analyze medium- and high-abundance targets without an affinity reagent makes MRM-MS very attractive for use at the confirmation stage of a biomarker pipeline.

Characterization of Fibulin-2 as a Tissue and Plasma Biomarker.
Once a cancer diagnosis has been established, tissue-based biomarkers (generally assessed by immunohistochemistry) are of clinical use for subclassifying tumors with respect to their likelihood of responding to targeted therapeutics.1,60,61 In contrast, screening human populations for detecting new disease requires monitoring biospecimens that can be less invasively obtained, such as blood plasma. Unfortunately discovery of tumor-derived biomarkers directly in plasma is challenging because the 20 most abundant plasma proteins account for 99% of the total protein mass and impede detection of lower abundance tumor antigens.4 An alternative to de novo plasma-based discovery is first to identify tissue-based markers using a shotgun MS/MS approach and then in a subsequent step to use more sensitive MRM-MS on plasma to search in a targeted manner for the presence of tumor tissue markers in the circulation. PSA and soluble Her2p are examples of tumor tissue-derived biomarkers that can also be monitored in the circulation. In this study, we have already demonstrated that OPN, which we identified as a tumor tissue biomarker, is also elevated in the plasma of affected animals (Figure 3B, Table 3). As OPN is already a well-known circulating biomarker in humans, we next asked whether we could identify a novel circulating biomarker using this approach. 
One of our confirmed biomarkers in the tissue lysates was fibulin-2 (Table 4), an extracellular matrix protein that serves to stabilize extracellular structures such as elastic fibers and basement membranes.62,63 Fibulins have been reported to have both tumor suppressive and oncogenic activities.64 Fibulin-2 has been identified as a metastasis-associated gene in solid tumors, but has also been reported as down-regulated in breast cancer cell lines and primary tumors.65,66 Deregulation of fibulin-2 expression or changes in post-translational modifications may lead to destabilization in the molecular architecture of extracellular structures, enhancing the ability of tumor cells to migrate. Thus, the presence or absence of fibulins may have


Figure 4. Application of MRM analysis with stable isotope standards for the peptide GDSLAYGLR from osteopontin. Chromatograms for transitions 476.3 → 508.4, 579.4 (the endogenous peptide) and 481.3 → 518.4, 589.4 (the spiked synthetic stable isotope standard peptide) were measured in normal (A, B) and tumor (C, D) tissue lysates. The identity of the osteopontin-derived peptide was confirmed by acquisition of a full scan MS/MS spectrum of the precursor ion at m/z 476.3 (E).
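
As a plausibility check on the transitions listed in the Figure 4 caption, the precursor and fragment m/z values can be recomputed from standard monoisotopic residue masses. The sketch below assumes a doubly charged precursor, singly charged y-ions, and a heavy standard labeled at the C-terminal residue with 13C6,15N4-arginine (or 13C6,15N2-lysine for lysine-terminated peptides); the label chemistry is inferred from the reported 10 Da shift and is not stated in this section.

```python
# Monoisotopic residue masses (Da) for the residues needed here
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "D": 115.02694, "K": 128.09496,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER, PROTON = 18.01056, 1.00728
# Assumed labels: 13C6,15N4-Arg (+10.008 Da) and 13C6,15N2-Lys (+8.014 Da)
HEAVY_SHIFT = {"R": 10.00827, "K": 8.01420}

def precursor_mz(seq, charge=2, heavy=False):
    """m/z of the intact peptide ion at the given charge state."""
    mass = sum(MONO[aa] for aa in seq) + WATER
    if heavy:
        mass += HEAVY_SHIFT[seq[-1]]
    return (mass + charge * PROTON) / charge

def y_ion_mz(seq, n, heavy=False):
    """m/z of the singly charged y-ion containing the last n residues."""
    mass = sum(MONO[aa] for aa in seq[-n:]) + WATER + PROTON
    if heavy:
        mass += HEAVY_SHIFT[seq[-1]]
    return mass

peptide = "GDSLAYGLR"
for heavy in (False, True):
    label = "heavy" if heavy else "light"
    print(label,
          round(precursor_mz(peptide, heavy=heavy), 2),
          round(y_ion_mz(peptide, 4, heavy=heavy), 2),
          round(y_ion_mz(peptide, 5, heavy=heavy), 2))
```

Under these assumptions the computed values fall within about 0.1 m/z of the reported transitions, which is consistent with the monitored product ions being y4 and y5.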

diagnostic or prognostic applications in the clinic, making fibulin-2 an intriguing target for further investigation. To determine whether fibulin-2 was elevated in the plasma of tumor-bearing mice, we developed a quantitative assay for the proteotypic peptide IGPAPAFAGDTISLTITK. First, we spiked the heavy stable isotope-labeled version of the fibulin-2 peptide


into pooled plasma from mice with and without mammary tumors (40 fmol peptide/µg plasma digest). The samples were then analyzed by LC-MRM-MS and the peak ratios used to determine the levels of endogenous peptide in the samples. Figure 5 shows the MRM chromatograms for the transitions 887 → 1048.6, 1434.8 (Figure 5A, C) and the internal


LC-MS-Based Biomarker Discovery in a Mouse Model

Table 4. Quantitative MRM-MS Analysis Confirming Overexpression of Biomarker Candidates in Tumor Tissue Lysates^a

ipi          description                                peptide                average ratio cancer/normal
IPI00309133  OSTEOPONTIN                                GDSLAYGLR              >238
IPI00118892  PLASTIN-2.                                 YTLNILEDIGGGQK         3.4
IPI00408626  TUMOR PROTEIN D52.                         LGISSLQEFK             2.9
IPI00132067  ISOFORM 1 OF FIBULIN-2 PRECURSOR.          IGPAPAFAGDTISLTITK     31.3
IPI00123342  HYPOXIA UP-REGULATED 1                     LYQPEYQEVSTEEQR        ND
IPI00381239  ISOFORM PLEC-1B OF PLECTIN-1.              AGTLSITEFADMLSGNAGGFR  >5.8
IPI00135186  CALUMENIN PRECURSOR.                       SFDQLTPEESK            3.1
IPI00271951  PROTEIN DISULFIDE ISOMERASE ASSOCIATED 4.  VEGFPTIYFAPSGDK        3.1
IPI00403938  ISOFORM 1 OF TENASCIN PRECURSOR.           VPGDQTSTTIR            ND
IPI00330862  EZRIN.                                     SQEQLAAELAEYTAK        29.8
IPI00130627  LEGUMAIN PRECURSOR.                        DYTGEDVTPENFLAVLR      1.5
IPI00128153  KAPPA-CASEIN PRECURSOR.                    GEKNDIVYDEQR           >11.8
IPI00132314  NUCLEOBINDIN-1 PRECURSOR.                  AATADLEQYDR            >7.1
IPI00322492  EWING SARCOMA HOMOLOG.                     GDATVSYEDPPTAK         ND
IPI00122312  FIBRINOGEN, GAMMA POLYPEPTIDE              YLQEIYNSNNQK           2.7

a Fold difference was calculated from relative peak area ratios for two transitions of endogenous:internal standard peptides measured in normal and tumor tissue lysate by LC-MRM-MS. Several candidates were readily detected in tumor tissue but were below the limit of detection in the normal tissue sample; in these cases, the minimum fold change was calculated (indicated by greater than sign in the table). Measurements were performed in triplicate; details are presented in Supplemental Table 4.

standard 891 → 1056.6, 1442.8 (Figure 5B, D) in the plasma of normal (Figure 5A, B) and tumor-bearing (Figure 5C, D) mice. Both transitions of the endogenous peptide were detected at the same retention time as the internal standard in the plasma of the tumor-bearing mice (between 40 and 45 min), whereas the peptide transitions were not detected in the plasma from healthy control animals, suggesting that fibulin-2 is a plasma biomarker whose levels are below the detection limit in plasma from healthy animals, as confirmed below. Although selected reaction monitoring (SRM-MS) has been applied to the quantitation of plasma protein-derived peptides,58,67-72 the majority of biomarker proteins (i.e., present at ≤ng/mL) cannot be detected in plasma without enrichment relative to large quantities of interfering proteins (e.g., albumin, globulin, present at 30-50 mg/mL). Fortunately, the technique Stable Isotope Standards with Capture by Anti-Peptide Antibodies (SISCAPA) has been shown to provide adequate enrichment to measure proteins in the ng/mL range with sufficient precision and accuracy for screening biomarker candidates.20,59 Briefly, in the SISCAPA approach, anti-peptide antibodies are immobilized on Protein G-coated magnetic beads and used simultaneously to capture endogenous as well as spiked stable isotope-labeled peptides from plasma tryptic digests from cancer-bearing and control animals. In a subsequent step, the eluates are subjected to MRM-MS analysis, and the concentration of the endogenous peptide is determined relative to the known concentration of the spiked stable isotope-labeled standard. Because the fibulin-2 peptide was below the limit of detection in the normal plasma sample (Figure 5), to unequivocally confirm our findings, we further measured fibulin-2 using the SISCAPA approach. We generated an affinity-purified rabbit polyclonal antibody against the fibulin-2 peptide and coupled the antibody to magnetic beads.
To ensure the accuracy in quantitation of fibulin-2 using SISCAPA, a calibration curve was constructed by spiking a series of known amounts of synthetic peptide into a mouse plasma digest, along with a constant amount of heavy stable isotope-labeled internal standard peptide (Figure 6). Following enrichment from plasma, the peptides were eluted off the magnetic beads and analyzed by LC-MRM-MS. The ratio of endogenous (light) peptide to the heavy stable isotope-labeled standard was used to construct a

calibration curve (Figure 6). The response was linear (R² = 0.9914) over the range 50-8000 ng/mL. The assay was used to measure endogenous fibulin-2 in plasma of normal and tumor-bearing mice by spiking the labeled internal standard into plasma at known concentration, enriching with the bead-coupled anti-peptide antibodies, and measuring the peak area ratio of endogenous-to-heavy stable isotope-labeled internal standard. The measured peak area ratios are indicated in Figure 6 and were within the linear range of the assay. The measured peak area ratios corresponded to plasma protein concentrations of 125.8 ± 2.6 ng/mL in the pool of normal plasma and 3638 ± 54.9 ng/mL in the pool of plasma from the tumor-bearing mice, an approximately 30-fold increase of fibulin-2 in the plasma of cancerous mice. Hence, we conclude that fibulin-2 is a novel circulating biomarker in this mouse model.
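
The calibration step can be sketched as an ordinary least-squares fit of peak area ratio against spiked concentration, followed by inversion of the fitted line to read off an unknown. The six calibration points below are hypothetical values chosen to span the reported 50-8000 ng/mL range, and the unweighted linear fit is an assumption; only the two plasma concentrations used for the fold change come from the text.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def concentration(ratio, slope, intercept):
    """Invert the calibration line: peak area ratio -> concentration."""
    return (ratio - intercept) / slope

# Hypothetical six-point calibration (ng/mL vs light:heavy peak area ratio)
conc = [50, 250, 1000, 2000, 4000, 8000]
ratio = [0.012, 0.049, 0.21, 0.39, 0.82, 1.58]
slope, intercept = fit_line(conc, ratio)
r2 = r_squared(conc, ratio, slope, intercept)
print("R^2:", round(r2, 4))
print("unknown at ratio 0.70:", round(concentration(0.70, slope, intercept)), "ng/mL")

# Fold increase from the concentrations reported in the text
fold = 3638 / 125.8
print("fold increase:", round(fold, 1))
```

The ratio of the two reported concentrations is about 28.9, consistent with the "approximately 30-fold" increase stated above.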

Conclusions

In this study we demonstrate the feasibility of a mass spectrometry-based biomarker development pipeline (Figure 7) encompassing shotgun-based discovery in diseased tissues, statistical prioritization of candidate biomarkers, and targeted high-throughput MRM-based confirmation in both tissue and plasma. This approach is attractive because targeting proteins in the top ∼3 orders of magnitude of abundance in tissues allows MS throughput during biomarker discovery to be devoted to technical and biological replicates (rather than analysis of a large number of fractions per sample). Having replicate data allows meaningful estimation of the false discovery rate, which is imperative because the downstream need to develop quantitative proteomic assays (e.g., Enzyme-Linked Immunosorbent Assay; ELISA) for potential candidates is prohibitively resource- and time-intensive, creating a bottleneck at the interface of biomarker discovery and confirmation. The pipeline described herein is also attractive because it employs a staged approach to biomarker confirmation, wherein a minimum of upfront resources is invested until evidence is accumulated to support a given candidate’s differential expression. For example, the SASPECT algorithm provides an estimate of each individual candidate biomarker’s q-value, allowing prioritization of candidates for follow-up studies. Next, semiquantitative MRM analysis (without a spiked stable isotope-labeled standard) can be used to rapidly test a large number


Figure 5. Quantitative multiple reaction monitoring (MRM) for confirmation of fibulin-2 as a biomarker in plasma. Two transitions (887 → 1048.6, 1434.8) corresponding to the endogenous peptide IGPAPAFAGDTISLTITK, from fibulin-2, were monitored in equal amounts of (A) normal plasma and (C) plasma from tumor-bearing mice. Likewise, two transitions from the heavy stable isotope-labeled version of the peptide (891 → 1056.6, 1442.8) were used as an internal standard for relative quantification and are shown for the (B) normal plasma and (D) plasma from tumor-bearing mice.

of biomarker candidates with moderate throughput. In our study, one operator tested 60 candidates in approximately 1 month’s time and was tentatively able to confirm 80% of the candidates as tumor biomarkers using semiquantitative MRM analysis. If commercial antibodies were available for Western blot analyses of these 60 candidates (an unlikely event), these confirmation studies would have cost approximately $15,000 in antibody reagents alone (assuming $250/antibody). For most candidates, no commercial antibodies are available, and making novel antibodies to the 60 candidates would have cost >$60,000 and required 6-9 months of lead time. In our pipeline, no candidate-specific reagents are generated until a protein is semiquantitatively confirmed by MRM. In the dataset described herein, 11 of the 15 candidates tested by quantitative


MRM (via a spiked stable isotope standard) were quantitatively confirmed to be more abundant in the tumor tissues compared to the normal tissue lysates, representing a ≥73% success rate and leading to very efficient use of resources. Predicting which tumor tissue-based biomarkers might be in the circulation and subsequently confirming their presence in plasma present the greatest challenges to this pipeline. Limits of detection for MRM-based analysis following immunodepletion of abundant plasma proteins have been reported to be on the order of 100-1000 ng/mL.58 Although some biomarkers may be detectable at this range, many of the most clinically useful markers are expected to be in the ng/mL range or less. Hence, MRM alone or coupled to depletion of a few abundant proteins will not be sufficient for plasma screening, and either some method of biochemical pre-fractionation of plasma or antibody-based enrichment of target peptides will need to be employed, as demonstrated for fibulin-2 in this study.

The volume of the tumor relative to blood volume is proportionally much greater in a mouse than in a human cancer patient, raising the possibility that the concentration of tumor-derived biomarkers in circulation could be much higher in mice vs humans, facilitating the detection of plasma-based biomarkers by MRM. Our data support this notion because plasma OPN levels in tumor-bearing HER2/Neu mice averaged 7358 ng/mL (Table 3), whereas circulating levels in human breast cancer patients are only ∼800 ng/mL.73 Of note, healthy humans have higher levels of circulating OPN (439 ng/mL)73 than healthy mice (24 ng/mL for the HER2/Neu controls). Hence, the presence of a breast tumor in the mouse resulted in an approximately 307-fold increase in the circulating biomarker, whereas in humans the increase was only 1.8-fold. Although the greater differential in serum marker levels between cases and controls in the mouse models bodes well for discovery efforts, it raises a question as to whether the mouse model will be as useful for validating cancer markers for early disease detection. For example, the putatively more robust serum biomarker response in the mouse might be expected to cause a serum biomarker level to increase to a detectable level much earlier in the natural history of the tumor in the mouse vs human. Hence, it is possible that many markers that achieve early detection in the mouse model may fail to do so in humans.

Figure 6. SISCAPA calibration curve and quantitation of fibulin-2 in plasma from normal and tumor-bearing mice. A six point calibration curve is plotted for the peptide IGPAPAFAGDTISLTITK, from fibulin-2. For calibration, measurements were performed in triplicate. Gray circles indicate peak area ratios determined from analyzing plasma from normal and cancer-bearing mice.

Although the inbred mouse model allows for exquisite experimental control, biological variation is minimized but not eliminated (Figure 3A).28 In this study, the observed variation turns out to be interesting because it highlights an important element of experimental design: the limitations of pooling individual biospecimens for biomarker discovery. For example, Npm1p shows significant variation among individual mice regardless of cancer status (Figure 3A); the SASPECT algorithm correctly identified this protein as a potential biomarker since it can be seen in Figure 3A that overall its level was higher in the pooled tumor lysates vs normal. However, this is likely a “jackpot” effect, stemming from the small pool size (5 mice/pool); by chance, there were more mice expressing high levels of Npm1p in the cancer pool than in the normal tissue pool. In contrast, Ddx5p is only elevated in a subset of the breast cancers tested (Figure 3A). Although all tumors in this transgenic model were initiated by expression of an activated Her2p in the mammary epithelium, secondary stochastic somatic mutations must accumulate for the tumors to progress, likely leading to subtypes of breast cancers within any given mouse model, similar to the many subtypes of human breast cancers.74,75 Hence, it is likely that there will be biomarkers specific to subtypes of tumors in the mouse, exactly as is the case for human breast cancers (e.g., Her2-overproducing, estrogen receptor positive vs negative, p53 mutant vs wildtype, etc.). Although pooling samples from several individuals will average out biological variation during biomarker discovery, such variation will ultimately need to be addressed before emergent biomarkers can be applied clinically.

Acknowledgment. This work was funded by NCI subcontract 23XS144A as well as by generous gifts from the Canary Foundation, the Keck Foundation, and the Paul G. Allen Family Foundation. We thank Drs. Norm Greenberg and Raju Kucherlapati for providing breeder mice.

Supporting Information Available: Description of SASPECT algorithm; a detailed list of ranked biomarker candidates from the tissue comparative proteomics dataset; a detailed table of MRM transitions, retention times, and peak intensities for semiquantitative confirmation studies; a detailed table of MRM transitions, retention times, and peak intensities for quantitative confirmation studies; and ELISA results of osteopontin measured in the plasma of several other mouse models of cancer. This information is available free of charge via the Internet at http://pubs.acs.org.

Figure 7. Mass spectrometry-based biomarker development pipeline encompassing shotgun-based discovery in diseased tissues, statistical prioritization of candidate biomarkers, and targeted high-throughput MRM-based confirmation in both tissue and plasma.



PR070202V
