Application of an End-to-End Biomarker Discovery Platform to Identify Target Engagement Markers in Cerebrospinal Fluid by High Resolution Differential Mass Spectrometry

Cloud P. Paweletz,†,# Matthew C. Wiener,‡,# Andrey Y. Bondarenko,§,#,[ Nathan A. Yates,|,# Qinghua Song,⊥ Andy Liaw,⊥ Anita Y. H. Lee,| Brandon T. Hunt,§,[ Ernst S. Henle,§,[ Fanyu Meng,| Holly Funk Sleph,∇ Marie Holahan,∇ Sethu Sankaranarayanan,O Adam J. Simon,O Robert E. Settlage,| Jeffrey R. Sachs,‡ Mark Shearman,¶ Alan B. Sachs,+ Jacquelynn J. Cook,∇ and Ronald C. Hendrickson*,|

Proteomics, Merck Research Laboratories, 33 Louis Pasteur Avenue, Boston, Massachusetts 02115, Applied Computer Science and Mathematics, Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, New Jersey 07065, Rosetta Biosoftware, 401 Terry Avenue North, Seattle, Washington 98109, Proteomics, Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, New Jersey 07065, Biometrics Research, Merck Research Laboratories, Rahway, New Jersey 07065, Imaging, Merck Research Laboratories, Sumneytown Pike, West Point, Pennsylvania 19486, Integrative Systems Neuroscience, Merck Research Laboratories, Sumneytown Pike, West Point, Pennsylvania 19486, Neuroscience Drug Discovery, Merck Research Laboratories, 33 Louis Pasteur Avenue, Boston, Massachusetts 02115, and SIRNA Therapeutics, 1700 Owens Street, 4th Floor, San Francisco, California 94107

Received October 14, 2009

The rapid identification of protein biomarkers in biofluids is important to drug discovery and development. Here, we describe a general proteomic approach for the discovery and identification of proteins that exhibit a statistically significant difference in abundance in cerebrospinal fluid (CSF) before and after pharmacological intervention. This approach, differential mass spectrometry (dMS), is based on the analysis of full scan mass spectrometry data. The dMS workflow does not require complex mixing and pooling strategies, or isotope labeling techniques. Accordingly, clinical samples can be analyzed individually, allowing the use of longitudinal designs and within-subject data analysis in which each subject acts as its own control. As a proof of concept, we performed multifactorial dMS analyses on CSF samples drawn at 6 time points from n = 6 cisterna magna ported (CMP) rhesus monkeys treated with 2 potent gamma secretase inhibitors (GSIs) or comparable vehicle in a 3-way crossover study that included a total of 108 individual CSF samples. Using analysis of variance and statistical filtering on the aligned and normalized LC-MS data sets, we detected 26 features that were significantly altered in CSF by drug treatment. Of those 26 features, which belong to 10 distinct isotopic distributions, 20 were identified by MS/MS as 7 peptides from CD99, a cell surface protein. Six features from the remaining 3 isotopic distributions were not identified. A subsequent analysis showed that the relative abundance of these 26 features followed the same temporal profile as the ELISA-measured levels of CSF Abeta 42 peptide, a known pharmacodynamic marker for γ-secretase inhibition. These data demonstrate that dMS is a promising approach for the discovery, quantification, and identification of candidate target engagement biomarkers in CSF.

Keywords: proteomics • biomarker • γ secretase • dMS • mass spectrometry

* To whom correspondence should be addressed. Dr. Ronald Hendrickson, Merck Research Laboratories, 126 E. Lincoln Ave., RY800-B301, Rahway, NJ, 07065.
† Proteomics, Merck Research Laboratories, Boston, MA.
‡ Applied Mathematics and Computer Sciences, Merck Research Laboratories.
§ Rosetta Biosoftware.
| Proteomics, Merck Research Laboratories, Rahway, NJ.
# These authors contributed equally to the publication.
⊥ Biometrics Research, Merck Research Laboratories.
∇ Imaging, Merck Research Laboratories.
O Integrative Systems Neuroscience, Merck Research Laboratories.
¶ Neuroscience Drug Discovery, Merck Research Laboratories.
+ SIRNA Therapeutics.
[ Present address: Microsoft Corporation, One Microsoft Way, Redmond, Washington 98052.

Journal of Proteome Research 2010, 9, 1392–1401. Published on Web 01/24/2010. 10.1021/pr900925d. © 2010 American Chemical Society.

Introduction

Cerebrospinal fluid (CSF), which is in direct contact with the brain’s interstitial fluid and circulates through the ventricular system of the brain and spinal cord, is a particularly promising fluid for biomarkers of neurologic disorders. It is reasonable to believe that proteins reflecting the pathophysiology of the brain will spread into the CSF circulation, where they can be detected and quantified.1,2 In fact, some CSF biomarkers are known: Aβ levels indicate senile plaque formation, elevated Tau levels indicate neuronal damage, and pTau is associated with the formation of neurofibrillary tangles.3-5 These protein markers are measured using immunological methods, which quantify predetermined analytes and require antibodies specific for each protein. Methods that could broadly profile proteins in CSF or other fluids, without requiring prespecification of the analytes to be examined, could be a useful complement to immunological methods.

Mass spectrometry is a powerful approach for studying complex mixtures of proteins without prior knowledge of the analytes of interest.6-10 Some techniques have used isotopic labeling strategies11-13 to provide relative or absolute quantitation of analytes in different samples. While these techniques have been shown to be useful, they do not extend readily to in vivo studies. Previously, we have described a general proteomic approach, differential mass spectrometry (dMS), that provides relative quantitation and identifies statistically significant changes in full scan mass spectrometry data without the use of chemical or isotopic labels.14,15 Unlike methods that require pooling of samples, dMS enables direct comparison of individual samples across multiple conditions, allowing the use of complex experimental designs. Furthermore, features of interest can be targeted in subsequent experiments to obtain MS/MS spectra for sequence identification.

dMS, however, remains far from routine laboratory use.16,17 To industrialize this workflow, we have developed reproducible sample processing, robust mass spectrometry methods, and end-to-end data analysis tools that support large-scale biomarker discovery experiments. Previously, we showed that we could detect and quantify peptides spiked into a complex proteinaceous matrix background.18,19 Here, we extend the dMS method to profiling CSF in a multifactorial study involving 108 individual CSF samples. We profiled CSF from 6 rhesus monkeys treated with 2 different γ-secretase inhibitors or vehicle at 6 distinct time points in a triple crossover study design. Twenty-six features, in 10 distinct isotopic distributions, were found to differ between samples from drug treated and vehicle treated animals. Interestingly, the time-dependent course of changes in these proteomic features was similar to the course of changes in levels of Aβ42, measured by ELISA, a known pharmacodynamic marker for γ-secretase inhibition. Seven of these isotopic distributions were identified as arising from peptides of a single protein, CD99.

Materials and Methods

Differential mass spectrometry (dMS) is a general workflow that includes the following steps: collect biological samples, perform reproducible biochemical sample processing, analyze the samples using liquid chromatography mass spectrometry (LCMS), align the LCMS spectra obtained for different samples, extract features from the aligned LCMS data, perform statistical tests to find features differentially expressed across experimental conditions, select features of particular interest, perform targeted analysis to obtain MS/MS spectra for those features, and determine the amino acid sequence of the analytes from the MS/MS spectra.

Synthetic Chemistry. GSI-1 (Figure 5A) and GSI-2 (Figure 5B) were prepared by standard chemical techniques as described in WO 2006123183 (Merck) and WO 0019210 (Elan), respectively.

Animal Model and CSF Sample Collection. Briefly, a catheter was permanently implanted into the cisterna magna of rhesus monkeys and connected to a subcutaneous port that allows for noninvasive, sterile access to CSF with a Huber needle.20 Freshly drawn CSF was centrifuged at 4 °C for 4 min at 4000g and stored in 250 µL aliquots at -70 °C prior to use. CMP rhesus monkeys were treated with a 10 mg/kg single oral dose of GSI-1, a 3 mg/kg single oral dose of GSI-2 and vehicle (n = 6) in a pseudo crossover design. CSF was collected 20 h prior to dose administration, immediately before dosing (t = 0), and at 4, 24, 144, and 268 h after dosing. These doses were previously demonstrated to reduce Aβ42 levels by at least 40% versus mean baseline. All procedures related to the use of animals in these studies were reviewed and approved by the Institutional Animal Care and Use Committee at Merck Research Laboratories at West Point, and conform with the Guide for the Care and Use of Laboratory Animals (Institute of Laboratory Animal Resources, National Research Council, 1996).

Biochemical CSF Sample Processing. A solution of horse myoglobin (Sigma) was spiked into 200 µL aliquots of neat CSF to a final concentration of 3 µg/mL. Samples were randomized and run 4 per block using 4 individual MARS spin columns (Agilent, Santa Clara, CA) as per manufacturer’s recommendations over a 2 week period. Briefly, samples were spun at 700g, and the flow through was collected. A total of 400 µL of Buffer A was added to the spin column and spun at 700g, and both fractions were combined and desalted with a 5000 MWCO ultrafiltration filter (VivaSpin). Concentrated samples were reconstituted to 100 µL in 50 mM NH4CO3, after which proteins were reduced with 4 mM TCEP for 30 min, alkylated with 10 mM iodoacetamide for 30 min in the dark, and digested overnight with 1:50 (w/v) sequence grade trypsin (Promega). Digests were quenched with 10 µL of neat acetic acid, desalted on a Michrom peptide trap column, concentrated to dryness, and dissolved in 20 µL of 0.1 M.

Instrumental Setup and Mass Spectrometric Conditions. One microliter (equivalent of 10 µL immunodepleted CSF) of digested sample was loaded with a Famos autosampler (LC Packings, Sunnyvale, CA) onto a capillary sample trap column (100 µm i.d., 2.5 cm) and desalted on line for 3 min at 3 µL/min with solvent A [100% HPLC grade water, 0.1 M acetic acid]. After 4 min, the flow rate was reduced to 1 µL/min and peptides were eluted into an in-house packed spray tip column (100 µm i.d., 190 µm o.d. × 8 cm; POROS R2; flame pulled tip ∼5 µm).
Peptides were analyzed on a hybrid LTQ-FTMS (ThermoFisher, San Jose, CA; positive mode, AGC = 100,000, maximum injection time = 1 s, spray voltage = 2.5 kV, capillary temperature at 240 °C) using a 75 min gradient run. The gradient was delivered by an Agilent 1100 capillary pump and had four distinct sections: (a) 100% A at a flow rate of 3 µL/min from 0 to 3 min; (b) 3.01-5 min binary gradient from 0% to 6% solvent B [90% acetonitrile, 0.1 M acetic acid] at a flow rate of 1.0 µL/min; (c) 5.01-39 min to 30% B at 1 µL/min; and (d) 30-60 min to 90% solvent B, followed by equilibration at 100% A at 1 µL/min from 60.01 to 75 min. Full scan high resolution mass spectra were recorded at a rate of 1 spectrum/s with a resolving power of 50,000 and a 300-2000 m/z scan range. Targeted MS/MS spectra were acquired on differentially expressed features. MS/MS spectra were searched by Sequest against an indexed primate database built from the nonredundant NCBI database.

Identification of Candidate Markers. Expression profiles generated from the raw LC-MS data collected for each sample were analyzed using the Elucidator proteomics data analysis suite (Rosetta Biosoftware, Seattle, WA) (Version: 3.1). The PeakTeller algorithm was used to align the data and to measure and extract m/z, retention time, and intensity data for the features contained in the data set.

Alignment. The time-alignment algorithm in PeakTeller corrects for run-to-run variations in peak retention times and allows for accurate comparison of peak intensities across runs. Alignment occurs one image at a time: one image from the data set is selected as the master image, and all other images are aligned to it. The alignment algorithm does not recognize individual peaks (peaks are not yet defined by time and m/z boundaries at this stage); dynamic time warping21 is used to align spectra across images, and the peaks within the images are therefore aligned. Because retention time shifts are assumed (and generally observed) to be the same for all m/z within a spectrum, peaks that appear in some raw data images but not in the master image are time-shifted based on the shifts of other peaks with the same approximate retention time, but different m/z.

Figure 1. Nonaligned feature (A) and aligned feature (B). Selected ion current chromatograms of the (M + 3H)3+ ions at m/z 632.630-632.674 and retention time window 33-35 min for 108 nonaligned (A) and aligned (B) data files. Time alignment was performed as described in Materials and Methods.

Feature Extraction. Once the spectra have been aligned, peaks are extracted and the total intensity (peak area) for each peak is calculated for each spectrum. (For clarity, we use “peaks” and “features” interchangeably.) Peaks that represent different isotopes from the same charge state of a peptide can be linked into a cluster or group termed an isotope group. Isotope groups that represent different charge states of the same peptide can be further grouped. Figure 1 shows selected ion chromatograms for the feature at m/z 632.630-632.674 at 33-35 min before and after alignment. Figure 2 shows a set of aligned peaks and an example of grouping peaks corresponding to different isotopes and charge states arising from a single peptide.

Statistical Analysis and Feature Selection. Standard ANOVA22 and linear mixed effects analysis23 were performed on log-transformed peak area data, taking into account that the same animals received all three treatments, as described in the Results section. Statistically significant features were filtered using tools in Elucidator or exported and processed in another environment such as R.24

Protein Identification. MS/MS spectra for the 26 features of interest were acquired and searched against a forward and reverse indexed nonredundant NCBI protein database containing human and nonhuman primate sequences using TurboSEQUEST v.27 (rev. 12).
The following search parameters were applied: precursor ion mass tolerance of 50 ppm, fragment ion mass tolerance of 0.5 Da MONO/MONO, no enzyme restriction, static modification of +57.02 Da on cysteine, and differential modification of +15.99 Da for oxidation on methionine. The database searched contained 1,002,532 protein entries. Results were ranked by XCorr and each spectrum was manually verified. Peptide sequences from human proteins were submitted for query using BLAST to confirm identity to rhesus protein sequence.

Aβ42 ELISA Measurements. CSF Aβ42 was measured using a custom sandwich ELISA consisting of a 6E10 capture antibody followed by detection with a C-terminal Aβ42 neo-epitope specific antibody conjugated with Alkaline Phosphatase (AP).


Briefly, CSF samples were diluted 1:4 and assayed in duplicate. Peptide standards were diluted into Aβ-immunodepleted CSF. A total of 50 µL of standards and CSF samples was added into the wells of the antibody coated plate along with 50 µL of detection antibody and left to incubate overnight at 4 °C. The next day, plates were washed and developed using alkaline phosphatase substrate (Cat no. T2214, Applied Biosystems, CA). Luminescence counts were measured using an LJL Analyst (Molecular Devices, CA). Counts from individual samples were converted to Aβ42 concentrations using a third order spline fit to the peptide standards. ELISA results were kept blinded to the proteomics group until all LC-MS analyses were completed and proteomics markers identified.
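The Alignment step described earlier in this section relies on dynamic time warping. As a rough illustration only, the following sketch applies the textbook DTW recurrence to two one-dimensional intensity traces. It is not the PeakTeller implementation, which operates on full LC-MS images; the function name `dtw_path`, the absolute-difference cost, and the toy traces are assumptions made for this example.

```python
def dtw_path(ref, query):
    """Align `query` to `ref` with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_index, query_index)
    pairs. Hypothetical helper for illustration; the local cost is |a - b|.
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover one optimal warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((cost[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((cost[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((cost[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy traces: the same peak, eluting one scan earlier in the second run.
master = [0, 0, 1, 5, 1, 0, 0]
shifted = [0, 1, 5, 1, 0, 0, 0]
total_cost, warp = dtw_path(master, shifted)
```

In this toy case the warping path pairs the apex of the master trace (index 3) with the apex of the shifted trace (index 2), which is the kind of retention time correction the alignment step is meant to provide.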

Results

High Resolution LC-MS Data. The analysis of complex mixtures by high resolution LC-MS results in information-rich data files, or profiles. For the digested CSF samples studied in this work, high-resolution mass spectrometry was used to record the mass-to-charge ratio (m/z), chromatographic retention time, and relative intensity of singly or multiply charged ions that arise from peptides present in the mixture. Modern high resolution mass spectrometers, with their exquisite mass resolution, can separate ions that differ by as little as 0.005 m/z units and thus typically resolve ions that differ by the incorporation of a single 13C isotope. Figure 3A shows a representative base peak chromatogram and high-resolution mass spectrum for a 10 µL equivalent of immunodepleted and digested CSF. Peptides elute over a 13-55 min retention time range. Close examination of the 400-900 m/z range reveals more than 100 distinct ions present in ∼20 distinct isotope groups that represent ∼20 different coeluting peptides. Note that the observation of low-abundance features is aided by reduction of chemical noise and the high resolution of the mass spectrometer (Figure 3B, inset). While the absolute number of peptides detected in a single sample cannot be accurately determined, we estimate that this LC-MS platform is capable of resolving in excess of 100,000 peptides per analysis.

Figure 3. (A) The FT-MS base peak chromatogram (equivalent to 10 µL of immunodepleted CSF loaded onto column). (B) Mass spectrum for all molecular ion species between 25 and 26 min. Observation of low-abundance signals in the complex environment is aided by reduction in chemical noise and increase in resolution of the FT-MS mass spectrometer (inset; expanded region of m/z 838-843).

Platform Variability. Our first objective was to assess the variability of the analytical platform, which includes the biochemical sample processing and LC-MS analysis. To accomplish this goal, we spiked horse myoglobin into 1.6 mL of rhesus monkey CSF, divided the sample into 8 aliquots (or technical replicates) of 200 µL, and processed each aliquot individually before analyzing by LC-MS. We selected peptide features readily measured by the high-resolution mass spectrometer (measured intensity >5000 in at least one sample, 2 or more isotopes, and charge state between 2 and 6) and calculated CVs (coefficients of variation) for those features across the N = 8 technical replicates. The median CV for endogenous CSF features was 41% (interquartile range, or 25th to 75th percentile, 31-60%) before log-transformation (linear data), or 4.6% (interquartile range 3.5-6.9%) on the log-transformed data, which are used in the statistical analysis. Three peptide features that correspond to horse myoglobin peptides added to the CSF had a mean CV of 32.2%, or 2.4% for the log-transformed data. Collectively, these results characterize the platform performance of the biochemical sample processing and LC-MS based profiling methods used in this study.

Figure 2. One example feature, displayed in two different views. (A) Merged chromatogram view displaying the (M + 2H)2+ at m/z 948.472. (B) Merged image view of two isotope groups, (M + 3H)3+ ions at m/z 632.651 and (M + 2H)2+ ions at m/z 948.472. Note that in this example the isotope groups contain 7 and 8 isotopes (or features), respectively. This feature is from the peptide AQGFTEDSIVFLPQTDK, a unique sequence from prostaglandin H2 D-isomerase precursor.

LC-MS Profiling of Individual Rhesus CSF Samples. In this study, six animals were subject to three treatment regimens, and CSF was acquired at six time points per treatment, yielding 108 discrete CSF samples. To avoid systematic error, samples were processed using an interwoven sample order and the laboratory personnel were blinded to sample identities. Additional controls containing 100 fmol of bradykinin, angiotensin, and neurotensin in a solvent blank, a tryptic digest of conalbumin protein, and a reaction blank were interwoven every 6 LC-MS injections to measure the coefficient of variation attributed to the mass spectrometer. Additional controls were run on each day, resulting in a total of 176 LC-MS injections. LC retention times for all 108 CSF samples were stable and did not change by more than 70 s (data not shown); a gradual decrease in retention times was observed with increasing analysis order. The coefficients of variation for bradykinin, angiotensin, and neurotensin were 24.7%, 31.6%, and 31.8%, respectively, or 2.8%, 3.1%, and 3.0% for the log-transformed data (data not shown), representing the variability of the LC-MS measurements. To estimate biologic variability (without including the treatment effect), we examined features appearing in the 36 vehicle treated CSF samples. We selected peptide features readily measured by the high-resolution mass spectrometer (maximum averaged intensity in a peak >5000 in the vehicle samples, 2 or more isotopes, and charge state between 2 and 6). The median CV was 48% (interquartile range 34-67%) for the untransformed data, or 4.9% (interquartile range 3.4-6.9%) for the log-transformed peak area data used in the analysis. Figure 4 shows that, as expected, features with lower intensities had larger CVs.

Figure 4. Coefficient of variation distribution for 3893 molecular ion features. Average and median CV are 26.6% and 36.1%, respectively. Coefficients of variation (standard deviation/mean) of proteomic features are largest for the smallest features, and get progressively smaller for larger features. Each panel shows the distribution of coefficient of variation (standard deviation/mean) for a subset of the features, from the smallest features on the bottom to the largest features on the top. Panels are divided by mean peak volume, with each panel showing the distribution for slightly more than 1/6 of the features, with a small amount of overlap between features included in adjacent panels. The dark part of each strip shows the range of mean AUCs of features included in each panel on a log scale, though actual values are not indicated.

Analysis of LC-MS Data To Select Drug Responsive Features. In the pseudo crossover design used here, all animals received all treatments. Treatments A and B were administered in random order, followed by treatment C. An advantage of this design is that each animal can serve as its own control, minimizing the effect of variability between animals and increasing statistical power. A total of 39,567 features having distinct m/z, retention time, and intensity values were extracted from the LC-MS data (averaged intensity >5000 in at least one treatment group, including vehicle; 2 or more isotopes; and charge state between 2 and 6). Using “time” and “treatment” as fixed effects and “animal” as a random effect, linear mixed-effect models were fit on the log-transformed peak area data, revealing 812 features that exhibit a statistically significant interaction between the time and treatment effects. Because peptides contain 12C and 13C, and the mass spectrometer can detect peptide ions with different numbers of 12C and 13C atoms independently, we required that features be part of isotopic distributions with at least two isotopes detected. Furthermore, an additional computational filter was used to select a subset of features of interest: the relative abundance between vehicle and treatment was not different (p > 0.05) at -20 h, 0 h, and 268 h, and different (p < 0.05) for each compound at least at one of 4 h, 24 h, or 144 h (though not necessarily at the same time point in the two treatments). One hundred fifty features passed these two filters. We further required that relative abundance at the two pretreatment measurements (-20 and 0 h) be similar and return to pretreatment levels at 268 h (all three measurements within a multiple of 1.5), as is expected for a single dose, wash-out experiment, leaving 32 features. Finally, since both compounds are GSIs, the two treatments were required to influence relative abundance in the same direction (that is, both lower than in vehicle, or both higher). This left 26 features that met all these conditions. Figure 5 shows mass spectra and selected ion current chromatograms for one example feature, averaged across animals for each experimental time point and treatment condition.
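The filtering cascade just described can be summarized as a chain of predicates applied to each feature. The sketch below is a hedged illustration, not the Elucidator or R code used in the study. The `Feature` record, its field names, and the 0.05 cutoff for the interaction term are assumptions; so are the choices to check the 1.5-fold baseline criterion within each drug treatment and to judge direction of change at each compound's most significant post-dose time point, since the text does not spell out those details.

```python
from dataclasses import dataclass

BASELINE_TIMES = (-20, 0, 268)   # hours; no drug effect expected
RESPONSE_TIMES = (4, 24, 144)    # hours; a drug effect may appear here
COMPOUNDS = ("GSI-1", "GSI-2")
ALPHA = 0.05                     # assumed cutoff for all p-value tests
MAX_BASELINE_RATIO = 1.5         # "within a multiple of 1.5"


@dataclass
class Feature:
    """Hypothetical per-feature summary used only for this illustration."""
    interaction_p: float  # p-value of the time x treatment interaction
    n_isotopes: int       # isotopes detected in the feature's isotope group
    p_vs_vehicle: dict    # compound -> {time: p-value versus vehicle}
    abundance: dict       # "vehicle" or compound -> {time: mean abundance}


def baseline_flat_and_responsive(f):
    """No difference from vehicle at baseline times; a difference after dosing."""
    for c in COMPOUNDS:
        p = f.p_vs_vehicle[c]
        if any(p[t] <= ALPHA for t in BASELINE_TIMES):
            return False
        if not any(p[t] < ALPHA for t in RESPONSE_TIMES):
            return False
    return True


def returns_to_baseline(f):
    """Pretreatment and 268 h abundances agree within a factor of 1.5."""
    for c in COMPOUNDS:
        vals = [f.abundance[c][t] for t in BASELINE_TIMES]
        if max(vals) / min(vals) > MAX_BASELINE_RATIO:
            return False
    return True


def same_direction(f):
    """Both compounds move the feature the same way relative to vehicle."""
    directions = set()
    for c in COMPOUNDS:
        t = min(RESPONSE_TIMES, key=lambda t: f.p_vs_vehicle[c][t])
        directions.add(f.abundance[c][t] > f.abundance["vehicle"][t])
    return len(directions) == 1


def passes_all_filters(f):
    return (f.interaction_p < ALPHA
            and f.n_isotopes >= 2
            and baseline_flat_and_responsive(f)
            and returns_to_baseline(f)
            and same_direction(f))
```

Applied in this order to a list of features, the predicates mirror the successive reductions reported above (812, then 150, then 32, then 26), although the counts themselves depend on the real data and on the implementation details assumed here.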
Because samples are analyzed separately rather than being pooled, we can examine how individual animals respond to the different treatments or how multiple animals respond to the same treatment. For the same example feature shown in Figure 5, Figure 6 shows the area under curve (AUC) for each animal for all experimental time points and treatment conditions. The 26 features discovered in this experiment fall into 10 isotope groups. The most abundant isotope in each group is shown in Figure 7. Interestingly, all the features found decreased and showed their maximum change from pretreatment levels at the 24 h time point post dose, although our filtering criteria did not require this.

Figure 5. Structures of GSI-1 and GSI-2 and example of individual feature data. (a) Structure of GSI-1 and GSI-2. (b) Selected ion current chromatogram for a peak at m/z 841.042 between retention times of approximately 25.1 and 25.9 min (between vertical dashed lines). (c) High-resolution mean intensity spectrum (m/z: 840.6-842.7) for the same retention time range, which contains the mean abundance for four isotopic features at m/z = 841.042, m/z = 841.376, m/z = 841.711, and m/z = 842.046 at all the time points for n = 6 animals treated with GSI-2, GSI-1, or vehicle. The gray line that approaches the top of the isotopic peaks shows the estimated isotopic distribution for a peptide ion with an accurate mass equal to 2521.102. (d) Mean value for n = 6 animals for each treatment and time point. This feature shows no change in the vehicle group, a significant decrease in abundance in CSF from GSI-2 (3 mg/kg) treated rhesus monkeys, and a significant, albeit less dramatic, change in CSF from GSI-1 (10 mg/kg) treated rhesus monkeys.

Figure 6. Data from individual animals for 1 selected feature. The area under curve (on a natural log scale) for the feature at m/z 841.042 (same feature as Figure 5) for each animal for all experimental time points and treatment conditions (vehicle, GSI-1, GSI-2).

Figure 7. Drug responsive features identified in proteomics data. The log-transformed peak area values for all features were analyzed in a linear mixed-effect model using “time” and “treatment” as fixed effects and “animal” as a random effect and filtered as described in Materials and Methods and Results. The 26 statistically significant features discovered fall into 10 isotope groups with the most abundant isotope in each group at: m/z 1004.051, 1261.552, 669.367, 790.626, 820.893, 841.373, 889.964, 940.001, 954.516, 975.512. The mean log peak area for n = 6 animals for each treatment (GSI-1 (blue ●), GSI-2 (pink ●), vehicle (green ●)) and each time point (-20, 0, 4, 24, 144, and 268 h) for the 10 isotope groups is shown. Error bars represent standard error of the mean.

To estimate the false positive rate, that is, the likelihood that features were selected by chance, we randomly permuted the condition and time labels for the 108 FTMS data files and repeated the entire analysis, including the filters discussed above, using the randomly labeled raw data. This process was repeated five times and yielded a total of 12 features (0, 2, 2, 2, and 6 on the 5 runs). This analysis gives a rough estimate of the false positive rate of about 2 features, which is small compared to the 26 features found.

Measurement of Aβ42 Levels in CSF Samples. The gamma secretase enzyme processes the amyloid precursor protein, APP, to generate a peptide fragment Aβ1-42. Thus, inhibition of gamma secretase in the brain reduces levels of Aβ1-42 peptide in the CSF, indicating that decreasing levels of Aβ1-42 are a pharmacodynamic marker of γ-secretase inhibition.25-27 We measured the quantity of Aβ1-42 peptide in the CSF samples by a quantitative ELISA (Figure 8). GSI-2 shows greater Aβ42 lowering efficacy as compared to GSI-1, which is consistent with our experience with these compounds in preclinical models (data not shown). These results confirm the desired inhibition of γ-secretase in the rhesus CSF samples. Interestingly, the time course of Aβ42, the known PD marker, matches the time course of the proteomic features, with strong Aβ42 lowering at the 24 h time point.

Figure 8. ELISA quantitation of Aβ42 peptide. Quantitation by ELISA of Aβ42 peptide levels in CSF taken from the same animals at the same time points, as described in Materials and Methods (vehicle (black ■), GSI-1 (red ●), GSI-2 (green ▲)).

Amino Acid Sequence Identification for Selected Features. For the 10 distinct isotope groups, 7 high quality MS/MS spectra were obtained, all of which were identified with a database search (Sequest search described above); the amino acid sequences and corresponding accession information are listed in Table 1. The identified peptides were checked manually using a verification viewer to link the feature (m/z and retention time) to the corresponding MS/MS spectrum from which the identification was obtained.

Discussion

Platforms that can efficiently quantify proteins in biological samples represent a critical need for the pharmaceutical industry. While biochemical assays can be used to quantify known proteins in clinical samples with excellent sensitivity and specificity, they require antibody reagents unique for each protein to be measured. These targeted assays provide a “closed” platform: they can only find what they are looking for. In contrast, “open” or “profiling” platforms gather data on hundreds or thousands of components in complex mixtures. Because profiling platforms gather information broadly, one does not need to prespecify what they should look for. Instead, such platforms can use statistical tests to find components that change in desired ways. In this study, we demonstrated that protein profiling can pull out a small number of previously unsuspected pharmacodynamic markers of enzyme activity from among thousands of features detected by the mass spectrometer.

The advantages of modern high-resolution mass spectrometers are well recognized.28 Specifically, the high resolving power of the FTMS facilitates automatic separation of peptide signals based on mass-to-charge ratios and reduces the need for extensive chromatographic separations that add time and complexity to the experiment. In addition, the part-per-million mass measurement accuracy of the FTMS provides precise mass-to-charge ratios that are consistent to several decimal places across experiments and laboratories.

It is important to recognize that the approaches used to estimate the platform variability for “open” platforms are fundamentally different from the approaches used when specific analytes are measured with a closed platform. Platform variability includes variability due to biochemical sample processing and LC-MS analysis. In a closed platform, one can measure the variability of one or a few analytes and perform power calculations based on those measurements. Profiling platforms measure many analytes simultaneously, and different analytes may well be measured with different levels of variability. Therefore, a single power calculation based on a single variability will not hold for all analytes. One approach is to characterize the distribution of variability across many of the analytes (Figure 2B) and then assume a fold change or choose an appropriate level of variability for power calculations. We might, for example, decide to use the median observed variability, or the 90th percentile of observed variability. Naturally, the greater the variability we decide to plan for, the more

Table 1. Twenty-six Drug Responsive Features and Peptide Identification (peptide sequence, XCorr, and protein description are shown on the first feature of each isotope group)

| feature no. | peak centroid m/z | molecular weight | isotope group | peptide no. | peptide sequence | XCorr | protein description |
|---|---|---|---|---|---|---|---|
| 1 | 889.964 | 1777.913 | 1 | 1 | EGEEADAPGVIPGIVGAVV | 3.229 | CD99 (sw: P14209) |
| 2 | 890.465 | 1777.913 | 1 | 1 | | | |
| 3 | 890.967 | 1777.913 | 1 | 1 | | | |
| 4 | 939.499 | 1876.98 | 2 | 2 | EGEEADAPGVIPGIVGAVVV | 2.900 | CD99 (sw: P14209) |
| 5 | 940.001 | 1876.98 | 2 | 2 | | | |
| 6 | 940.502 | 1876.98 | 2 | 2 | | | |
| 7 | 975.019 | 1948.027 | 3 | 3 | EGEEADAPGVIPGIVGAVVVA | 2.858 | CD99 (sw: P14209) |
| 8 | 975.512 | 1948.027 | 3 | 3 | | | |
| 9 | 954.015 | 1906.015 | 4 | 4 | KEGEEADAPGVIPGIVGAVV | 3.454 | CD99 (sw: P14209) |
| 10 | 954.516 | 1906.015 | 4 | 4 | | | |
| 11 | 841.039 | 2520.104 | 5 | 5 | PNPNQAGSSGSFSDADLADGVSGGEGK | 6.986 | CD99 (trm: O02788) |
| 12 | 841.373 | 2520.104 | 5 | 5 | | | |
| 13 | 841.708 | 2520.104 | 5 | 5 | | | |
| 14 | 842.042 | 2520.104 | 5 | 5 | | | |
| 15 | 1261.051 | 2520.086 | 6 | 5 | PNPNQAGSSGSFSDADLADGVSGGEGK | 2.510 | CD99 (trm: O02788) |
| 16 | 1261.552 | 2520.086 | 6 | 5 | | | |
| 17 | 820.642 | 3278.529 | 7 | 6 | PPKPKPNPNPNQAGSSGSFSDADLADGVSGGEGK | 4.469 | CD99 (trm: O02788) |
| 18 | 820.893 | 3278.529 | 7 | 6 | | | |
| 19 | 821.142 | 3278.529 | 7 | 6 | | | |
| 20 | 821.392 | 3278.529 | 7 | 6 | | | |
| 21 | 669.367 | 2005.081 | 8 | | no high quality MS/MS spectra acquired | | |
| 22 | 670.037 | 2005.081 | 8 | | no high quality MS/MS spectra acquired | | |
| 23 | 1003.549 | 2005.093 | 9 | | no high quality MS/MS spectra acquired | | |
| 24 | 1004.051 | 2005.093 | 9 | | no high quality MS/MS spectra acquired | | |
| 25 | 790.626 | 3157.485 | 10 | | no high quality MS/MS spectra acquired | | |
| 26 | 790.877 | 3157.485 | 10 | | no high quality MS/MS spectra acquired | | |
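The relationship between the m/z and molecular weight columns of Table 1 can be illustrated with a short Python sketch. It assumes positive-mode, protonated ions and uses the standard relations (isotope spacing of roughly 1.00335/z, neutral mass = z × (m/z − proton mass)); it is not a description of how the Elucidator software assigns charge or mass. The function names are hypothetical. Only the monoisotopic peak of an isotope group is expected to reproduce the tabulated weight, so the example uses features 1 and 2.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
ISOTOPE_SPACING = 1.00335   # Da, approximate 13C - 12C mass difference


def charge_from_spacing(mz_a: float, mz_b: float) -> int:
    """Estimate the charge state from two adjacent isotope peaks."""
    spacing = abs(mz_b - mz_a)
    return round(ISOTOPE_SPACING / spacing)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass of a protonated ion observed at m/z."""
    return charge * (mz - PROTON_MASS)


# Features 1 and 2 from Table 1 (isotope group 1)
z = charge_from_spacing(889.964, 890.465)
mass = neutral_mass(889.964, z)
print(z, round(mass, 3))  # 2 1777.913

# Features 11 and 12 (isotope group 5) are spaced by about 1/3, i.e. charge 3
z_group5 = charge_from_spacing(841.039, 841.373)
```

The computed value for feature 1 agrees with the molecular weight listed for isotope group 1.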

Journal of Proteome Research • Vol. 9, No. 3, 2010 1399

samples we will need to use to attain a given power. This is a price well worth paying for the ability to analyze many analytes in a single experiment. It is worth noting that traditional characterization and validation approaches can be readily applied after an analyte has been discovered using an open platform.

Measuring the variability of a platform requires several experiments. To measure technical variability associated with LC-MS analysis, we run multiple aliquots of a single sample through the mass spectrometer. To measure variability associated with biochemical sample processing, we split a sample into aliquots and run each aliquot through the entire sample preparation process and the mass spectrometer. Finally, to estimate biological variability, we run samples from different subjects through the entire process.

We analyzed the 39,567 features of specific m/z and intensity that were obtained by LC-MS analysis for rhesus CSF in Elucidator. The primary data analysis steps involve the extraction and organization of features from the native FTMS mass data. Statistical and proteomic criteria were used to select just 26 features in 10 isotope groups as candidate pharmacodynamic markers. Unlike many proteomics data analysis approaches that require up-front sequence identification of thousands of MS/MS spectra, our approach moves sequence analysis to the end of the process, where the identification of only a few peptide sequences is required. Moving peptide identification to the end of the data analysis process not only saves time by reducing the amount of MS/MS data that needs to be acquired and analyzed, but also allows specialized and/or manual sequencing techniques that would not be feasible for large shotgun-based approaches. In addition, we speculate that it may improve reproducibility of discovery proteomics experiments.
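The planning idea discussed above (pick a level of variability from the distribution across features, assume a fold change, then compute the required number of samples) can be sketched as follows. This is an illustrative normal-approximation calculation for a simple two-group comparison on log-transformed intensities, not the mixed-effects analysis used in this study. The coefficient-of-variation values are hypothetical, and `samples_per_group` and `percentile` are names introduced here for illustration.

```python
import math
from statistics import NormalDist, median


def samples_per_group(cv: float, fold_change: float,
                      alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate samples per group to detect a fold change (two-sided test).

    Assumes log-normal intensities, so the log-scale SD is sqrt(ln(1 + CV^2)).
    """
    sigma = math.sqrt(math.log(1.0 + cv ** 2))
    delta = abs(math.log(fold_change))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def percentile(values, p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-feature CVs (fractions), standing in for a measured distribution
feature_cvs = [0.12, 0.18, 0.22, 0.25, 0.30, 0.33, 0.38, 0.45, 0.60, 0.90]

n_median = samples_per_group(median(feature_cvs), fold_change=1.5)
n_p90 = samples_per_group(percentile(feature_cvs, 90), fold_change=1.5)
print(n_median, n_p90)
```

Planning for the 90th percentile of variability rather than the median requires more samples, which is the trade-off described in the text.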
Another key aspect of our approach is that the analysis is based on data from individual samples that have not been pooled, mixed, or chemically labeled. Because individual sample integrity has been preserved, this approach permits “within” subject data analysis, where an individual animal (or patient) can serve as its own control. This data analysis approach also allows mass spectrometry data to be compared and correlated with additional end point data (e.g., Aβ42) on a per-patient or per-treatment group basis. Overall, the end-to-end data analysis tools used in this study allow large amounts of data to be analyzed in a variety of ways. The multifactorial analyses on 108 CSF samples (6 rhesus animals, 3 treatment conditions, 6 time points) resulted in the identification of a small number of features (26) that behave similarly to a known pharmacodynamic marker (Aβ42). The number of positively identified features in this study is small (7 from 10 isotope groups) but not completely unexpected, given the low abundance of the features and the sheer complexity of the CSF samples. When possible, we recommend larger starting volumes of CSF to ensure a sufficient amount of material for exhaustive targeted analysis.

Gamma secretase is an aspartyl protease and a member of a small class of proteases that have the unusual property of cleaving amide bonds within the lipid bilayer.29 CD99 is a single pass type 1 transmembrane protein in which residues 123-147 span the plasma membrane. Interestingly, as shown in Table 1, features 1-12 have a nontryptic C-terminus that contains residues 135, 136, and 137, all of which are within the region of CD99 that crosses the plasma membrane and are located in the precise compartment where gamma secretase is known to reside within the cell.


Features 13-20 are tryptic peptides N-terminal of features 1-12 and are likely generated by the trypsin processing of our samples. Our results suggest that the transmembrane portion of CD99 is an interesting nonamyloid based candidate PD marker of γ-secretase inhibition in rhesus monkeys. Subsequent experiments will be needed to confirm this.

A final important aspect of the biomarker discovery platform described here is the existence of a clear path for measuring and translating candidate markers in the clinic. Triple quadrupole mass spectrometry has been shown to be robust and reproducible and can be used as a targeted assay to identify and quantitate the analyte in clinical samples. By using a triple quadrupole MS based assay and stable isotope labeled internal standards, we anticipate that, in general, the CV for the assay will improve from approximately 30-40% to 8-15%. Widespread use of mass spectrometry based assays in the clinic has already been established for the analysis of small molecule drugs,30 which suggests that the use of similar MS based assays for biomarker measurements is practical.
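One reason a stable isotope labeled internal standard is expected to tighten assay CV is that the analyte and the standard experience the same sample-to-sample losses, so their ratio is largely insensitive to them. The following Python sketch illustrates that principle only. All peak areas, recovery factors, and the spiked amount are hypothetical, the function names are introduced here for illustration, and the resulting CVs are not meant to reproduce the 30-40% or 8-15% figures quoted above.

```python
from statistics import mean, stdev


def cv_percent(values) -> float:
    """Coefficient of variation in percent."""
    return 100.0 * stdev(values) / mean(values)


def quantify(analyte_area: float, standard_area: float,
             standard_amount: float) -> float:
    """Amount of analyte from the analyte/internal-standard peak area ratio."""
    return analyte_area / standard_area * standard_amount


# Hypothetical replicate measurements of one sample.
# 'recovery' models losses shared by the analyte and the co-eluting standard;
# 'noise' models small measurement error that affects the analyte alone.
recovery = [0.70, 1.00, 1.30, 0.85, 1.15]
noise = [1.02, 0.98, 1.01, 0.99, 1.00]

analyte_areas = [1000.0 * r * e for r, e in zip(recovery, noise)]
standard_areas = [500.0 * r for r in recovery]

amounts = [quantify(a, s, standard_amount=10.0)
           for a, s in zip(analyte_areas, standard_areas)]

print(round(cv_percent(analyte_areas), 1), round(cv_percent(amounts), 1))
```

In this toy data set the shared recovery factor cancels in the ratio, so the CV of the computed amounts reflects only the small analyte-specific noise.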

Summary and Conclusions

Strategies that profile complex mixtures of proteins simultaneously and provide quantitative measurement without the need for prespecifying the analyte provide opportunities to discover new biomarkers that can accelerate the development and testing of medicines. Using a rhesus CSF model of gamma secretase inhibition, we demonstrate differential mass spectrometry (dMS) as a viable end-to-end strategy for the identification of candidate protein biomarkers. This approach uses reproducible biochemistry, high-resolution mass spectrometry, and sophisticated data analysis software to profile complex biological mixtures and detect, quantify, and identify candidate protein biomarkers. Even though mass spectrometry based measurements do not currently have the sensitivity of classical immunoassays, candidate markers discovered by differential mass spectrometry have a clear translation path for the development of robust triple quadrupole based assays that can cross the species barriers. Such protein biomarkers may provide quantitative measures of target engagement, guide dosing, and aid in the selection of patients for clinical trials.

Abbreviations: dMS, differential mass spectrometry; FT-MS, Fourier transform mass spectrometry; CSF, cerebrospinal fluid; LC-MS, liquid chromatography mass spectrometry; µLC-MS, microcapillary liquid chromatography mass spectrometry; GSI, gamma secretase inhibitor; CMP, cisterna magna ported.

Acknowledgment. Special thanks to Richard Hargreaves for the conversation that got this project going, James Conway for expert assistance with the database searches, and Stephen Friend for his support.

Supporting Information Available: Supplementary data, identified peptide sequences mapped to Rhesus CD99 protein sequence. Supplementary table with peptide sequence. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Yuan, X.; Desiderio, D. M. Human cerebrospinal fluid peptidomics. J. Mass Spectrom. 2005, 40, 176–181.
(2) Zhang, J.; Goodlett, D. R.; Quinn, J. F.; Peskind, E.; Kaye, J. A.; Zhou, Y.; Pan, C.; Yi, E.; Eng, J.; Wang, Q.; Aebersold, R. H.; Montine, T. J. Quantitative proteomics of cerebrospinal fluid from patients with Alzheimer disease. J. Alzheimer’s Dis. 2005, 7, 125–133.
(3) Hansson, O.; Zetterberg, H.; Buchhave, P.; Londos, E.; Blennow, K.; Minthon, L. Association between CSF biomarkers and incipient Alzheimer’s disease in patients with mild cognitive impairment: a follow-up study. Lancet Neurol. 2006, 5, 228–234.
(4) Kosik, K. S.; Joachim, C. L.; Selkoe, D. J. Microtubule-associated protein tau (tau) is a major antigenic component of paired helical filaments in Alzheimer disease. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 4044–4048.
(5) Grundke-Iqbal, I.; Iqbal, K.; Tung, Y. C.; Quinlan, M.; Wisniewski, H. M.; Binder, L. I. Abnormal phosphorylation of the microtubule-associated protein tau (tau) in Alzheimer cytoskeletal pathology. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 4913–4917.
(6) Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 2003, 75, 4818–4826.
(7) Hunt, D. F.; Michel, H.; Dickinson, T. A.; Shabanowitz, J.; Cox, A. L.; Sakaguchi, K.; Appella, E.; Grey, H. M.; Sette, A. Peptides presented to the immune system by the murine class II major histocompatibility complex molecule I-Ad. Science 1992, 256, 1817–1820.
(8) Hunt, D. F.; Henderson, R. A.; Shabanowitz, J.; Sakaguchi, K.; Michel, H.; Sevilir, N.; Cox, A. L.; Appella, E.; Engelhard, V. H. Characterization of peptides bound to the class I MHC molecule HLA-A2.1 by mass spectrometry. Science 1992, 255, 1261–1263.
(9) MacCoss, M. J.; McDonald, W. H.; Saraf, A.; Sadygov, R.; Clark, J. M.; Tasto, J. J.; Gould, K. L.; Wolters, D.; Washburn, M.; Weiss, A.; Clark, J. I.; Yates, J. R., III. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 7900–7905.
(10) Aebersold, R.; Mann, M. Mass spectrometry-based proteomics. Nature 2003, 422, 198–207.
(11) Everley, P. A.; Krijgsveld, J.; Zetter, B. R.; Gygi, S. P. Quantitative cancer proteomics: stable isotope labeling with amino acids in cell culture (SILAC) as a tool for prostate cancer research. Mol. Cell. Proteomics 2004, 3, 729–735.
(12) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 1999, 17, 994–999.
(13) Ong, S. E.; Mann, M. Stable isotope labeling by amino acids in cell culture for quantitative proteomics. Methods Mol. Biol. 2007, 359, 37–52.
(14) Lee, A. Y.; Paweletz, C. P.; Pollock, R. M.; Settlage, R. E.; Cruz, J. C.; Secrist, J. P.; Miller, T. A.; Stanton, M. G.; Kral, A. M.; Ozerova, N. D.; Meng, F.; Yates, N. A.; Richon, V.; Hendrickson, R. C. Quantitative analysis of histone deacetylase-1 selective histone modifications by differential mass spectrometry. J. Proteome Res. 2008, 7 (12), 5177–5186.
(15) Zhao, X.; Deyanova, E. G.; Lubbers, L. S.; Zafian, P.; Li, J. J.; Liaw, A.; Song, Q.; Du, Y.; Settlage, R. E.; Hickey, G. J.; Yates, N. A.; Hendrickson, R. C. Differential mass spectrometry of rat plasma reveals proteins that are responsive to 17beta-estradiol and a selective estrogen receptor modulator PPT. J. Proteome Res. 2008, 7, 4373–4383.
(16) Turck, C. W.; Falick, A. M.; Kowalak, J. A.; Lane, W. S.; Lilley, K. S.; Phinney, B. S.; Weintraub, S. T.; Witkowska, H. E.; Yates, N. A. The Association of Biomolecular Resource Facilities Proteomics Research Group 2006 study: relative protein quantitation. Mol. Cell. Proteomics 2007, 6, 1291–1298.
(17) Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science 2006, 312, 212–217.
(18) Meng, F.; Wiener, M. C.; Sachs, J. R.; Burns, C.; Verma, P.; Paweletz, C. P.; Mazur, M. T.; Deyanova, E. G.; Yates, N. A.; Hendrickson, R. C. Quantitative analysis of complex peptide mixtures using FTMS and differential mass spectrometry. J. Am. Soc. Mass Spectrom. 2007, 18, 226–233.
(19) Wiener, M. C.; Sachs, J. R.; Deyanova, E. G.; Yates, N. A. Differential mass spectrometry: a label-free LC-MS method for finding significant differences in complex peptide and protein mixtures. Anal. Chem. 2004, 76, 6085–6096.
(20) Gilberto, D. B.; Zeoli, A. H.; Szczerba, P. J.; Gehret, J. R.; Holahan, M. A.; Sitko, G. R.; Johnson, C. A.; Cook, J. J.; Motzel, S. L. An alternative method of chronic cerebrospinal fluid collection via the cisterna magna in conscious rhesus monkeys. Contemp. Top. Lab. Anim. Sci. 2003, 42, 53–59.
(21) Wang, C. P.; Isenhour, T. L. Time-warping algorithm applied to chromatographic peak matching gas chromatography/Fourier transform infrared/mass spectrometry. Anal. Chem. 1987, 59, 649–654.
(22) Clarke, B. R. Linear Models: The Theory and Application of Variance; John Wiley & Sons, Inc., 2008.
(23) Pinheiro, J. C.; Bates, D. M. Mixed-Effects Models in S and S-Plus (Statistics and Computing); Springer-Verlag: New York, 2000.
(24) R Development Core Team. R: A language and environment for statistical computing; R Foundation for Statistical Computing: Vienna, Austria, 2008.
(25) Wolfe, M. S. Inhibition and modulation of gamma-secretase for Alzheimer’s disease. Neurotherapeutics 2008, 5, 391–398.
(26) Abramowski, D.; Wiederhold, K. H.; Furrer, U.; Jaton, A. L.; Neuenschwander, A.; Runser, M. J.; Danner, S.; Reichwald, J.; Ammaturo, D.; Staab, D.; Stoeckli, M.; Rueeger, H.; Neumann, U.; Staufenbiel, M. Dynamics of Abeta turnover and deposition in different beta-amyloid precursor protein transgenic mouse models following gamma-secretase inhibition. J. Pharmacol. Exp. Ther. 2008, 327, 411–424.
(27) Shearman, M. S.; Beher, D.; Clarke, E. E.; Lewis, H. D.; Harrison, T.; Hunt, P.; Nadin, A.; Smith, A. L.; Stevenson, G.; Castro, J. L. L-685,458, an aspartyl protease transition state mimic, is a potent inhibitor of amyloid beta-protein precursor gamma-secretase activity. Biochemistry 2000, 39, 8698–8704.
(28) Mann, M.; Kelleher, N. L. Precision proteomics: the case for high resolution and high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 18132–18138.
(29) Lemberg, M. K.; Menendez, J.; Misik, A.; Garcia, M.; Koth, C. M.; Freeman, M. Mechanism of intramembrane proteolysis investigated with purified rhomboid proteases. EMBO J. 2005, 24, 464–472.
(30) Baillie, T. Advances in the application of mass spectrometry to studies of drug metabolism, pharmacokinetics and toxicology. Int. J. Mass Spectrom. Ion Processes 1992, 118/119, 289–314.

PR900925D
