
Anal. Chem. 2006, 78, 7986-7996

Methodology Utilizing MS Signal Intensity and LC Retention Time for Quantitative Analysis and Precursor Ion Selection in Proteomic LC-MALDI Analyses

Stephen J. Hattan*,† and Kenneth C. Parker‡

Applied Biosystems/Sciex Corporation, 500 Old Connecticut Path, Framingham, Massachusetts 01701

This study describes a methodology for performing relative quantitation in large-scale proteomic sample comparisons using an LC-MALDI mass spectrometry analytical platform without the use of isotope tagging reagents. The method utilizes replicate analyses of a sample to create a profile of constituent components that are aligned based on LC elution time and mass. Once components from individual runs have been grouped as common “features”, the Student’s t test is used to determine which components are systematically different between samples. In this study, five HPLC runs of human plasma were compared to five HPLC runs of human serum. About 3889 components were detected in all 10 runs. Of these, 1831 corresponded to ∼100 known serum proteins, based on MS/MS analysis of one run each from serum and plasma. As expected, fibrinogen α, β, and γ chains accounted for many of the most significant differences. Therefore, using MALDI, samples containing thousands of peptides can be compared in a minimal amount of time. Moreover, the results of the comparison can be used to guide further MS/MS mode sample interrogation in a result-dependent manner.

* Corresponding author. E-mail: [email protected].
† Present address: Virgin Instruments, 60-R Union Ave., Sudbury, MA 01776.
‡ Present address: Children’s Hospital, 320 Longwood Ave., Boston, MA 02115-5737.

(1) Adv. Protein Chem. 2005, 70. Medzihradszky, K. F.; Campbell, J. M.; Baldwin, M. A.; Falick, A. M.; Juhasz, P.; Vestal, M. L.; Burlingame, A. L. Anal. Chem. 2000, 72, 552-558.
(2) Beinvenut, W. V.; Deon, C.; Pasquarello, C.; Campbell, J. M.; Sanchez, J. C.; Vestal, M. L.; Hochstasser, D. F. Anal. Chem. 2000, 72, 552-558.
(3) Yergey, A. L.; Coorssen, J. R.; Backlund, P. S., Jr.; Blank, P. S.; Humphrey, G. A.; Zimmerberg, J.; Campbell, J.; Vestal, M. J. Am. Soc. Mass Spectrom. 2002, 13, 784-791.
(4) Malmstrom, J.; Larsen, K.; Malmstrom, L.; Tufvesson, E.; Parker, K.; Marchese, J.; Williamson, B.; Patterson, D. H.; Martin, S. A.; Juhasz, P.; Westergren-Thorsson, G.; Marko-Varga, G. Electrophoresis 2003, 24, 3806537.
(5) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.

7986 Analytical Chemistry, Vol. 78, No. 23, December 1, 2006

Proteomic analysis using MALDI TOF/TOF mass spectrometry for the sequencing of peptides for protein identification has proven a viable and effective analytical strategy.1-3 Typical experimental workflows begin with a complex protein mixture from which hundreds to thousands of proteins can be identified.4-8

This starting mixture may be fractionated by chromatography, electrophoresis, or affinity pull-down while retaining and concentrating the analytes of interest.9-12 In most but not all cases,13,14 initial protein isolation is followed by enzymatic protein degradation into constituent peptides. Following protein digestion, and depending on the complexity of the initial protein mixture, the resulting peptides typically undergo one or two dimensions of chromatographic separation prior to analysis by mass spectrometry. The final dimension of peptide separation is usually a reversed-phase mode separation due to its characteristics of near universal binding and high-resolution elution for peptides and delivery of the sample to the mass spectrometer in a compatible medium.

With regard to LC-MALDI, several LC hardware systems now exist that allow eluting peptides to be infused with MALDI matrix prior to deposition onto the MALDI target.15,16 This experimental scheme allows for the peptide resolution gained by chromatographic separation to be maintained and spatially fixed on the MALDI target. Once an LC-MALDI target is created, MS analysis for peptide sequencing and protein identification proceeds in two stages. First, each spot is analyzed in MS mode and the spectra are arranged according to the order of LC elution. This initial analysis is performed to create an inventory of the peptides that are present. Subsequently, peptides to be subjected to fragmentation in MS/MS mode for determination of peptide sequence and protein identification are selected based on signal intensity. To date, this experimental scheme has been successful for identifying large numbers of proteins; however, quantitative comparisons between two different samples have relied on isotope tags17-20 or SILAC reagent technology21,22 (Invitrogen Corp.) due to the perception that MALDI spectra are too variable for quantitation based on signal intensity.23-25

Although the mixture of LC effluent with liquid MALDI matrix results in a homogeneous mixture, after the sample has dried, the analyte distribution within the resulting field of matrix crystals is heterogeneous.26 On early MALDI instruments, this heterogeneity in sample distribution often required the user to search a given location for the “sweet spot” of analyte concentration and signal intensity. The often low-resolution analysis of a given location yielded highly variable results with regard to signal intensity. On this basis, MALDI mass spectrometry was considered a qualitative technique but too nonreproducible for relative signal quantitation unless isotope tags were used. However, present instrumentation has incorporated higher repetition rate lasers, faster moving stages, and the ability to run sophisticated search patterns that make it possible to accumulate several hundred to several thousand spectra across a given sample area in a matter of seconds. The ability to accumulate data and thoroughly interrogate a given location for an analyte makes it possible to average out inherent sample heterogeneity and, as this study shows, readily enables the use of MALDI mass spectrometry for relative quantitation based on signal intensity.

(6) Peng, J.; Elias, J. E.; Thoreen, C. C.; Licklider, L. J.; Gygi, S. P. J. Proteome Res. 2003, 2, 43-50.
(7) Hattan, S. J.; Marchese, J.; Khainovski, N.; Martin, S.; Juhasz, P. J. Proteome Res. 2005, 4, 1931-1941.
(8) Wolters, D. A.; Washburn, M. P.; Yates, J. R., III. Anal. Chem. 2001, 73, 5683-5690.
(9) Pappin, D. H. C.; Hojrup, P.; Bleasby, H. J. Curr. Biol. 1993, 3, 327-332.
(10) Lim, H.; Eng, J.; Yates, J. R., III; Tollaksen, S. L.; Giometti, C. S.; Holden, J. F.; Adams, M. W. W.; Reich, C. I.; Olsen, G. J.; Hays, L. G. J. Am. Soc. Mass Spectrom. 2003, 14, 957-970.
(11) Chong, B. E.; Yan, F.; Lubman, D. M.; Miller, F. R. Rapid Commun. Mass Spectrom. 2001, 15, 291-296.
(12) Zhen, Y.; Xu, N.; Richardson, B.; Becklin, R.; Savage, J. R.; Blake, K.; Peltier, J. M. J. Am. Soc. Mass Spectrom. 2004, 15, 803-815.
(13) Wang, M. Z.; Howard, B.; Campa, M. J.; Patz, E. F., Jr.; Fitzgerald, M. C. Proteomics 2003, 3, 1661-1666.
(14) Howard, B. A.; Wang, M. Z.; Campa, M. J.; Corro, C.; Fitzgerald, M. C.; Patz, E. F., Jr. Proteomics 2003, 3, 1720-1724.
(15) Schwartz, H.; Van Soest, R.; Swart, R.; Salzmann, J. P. Am. Genomic/Proteomic Technol. 2002, (May/June).
(16) Nägele, E.; Vollmer, M.; Hörth, P. J. Biomol. Tech. 2004, 15, 134-143.

10.1021/ac0610513 CCC: $33.50 © 2006 American Chemical Society. Published on Web 11/01/2006

Described herein is an LC-MALDI workflow used for the comparative qualitative and relative-quantitative analysis of two complex biological samples. The approach uses multiple measurements on each sample to create a profile of components with respect to mass, elution position, and signal intensity.
The success of the experiment hinges on reproducibility in LC elution and signal intensity and both reproducibility and accuracy in mass measurement. Once intrasample features are assigned and measurement precision established, an intersample comparison is performed in order to distinguish features that are common and unique to each sample. Although this concept is not new,27-33 we know of no demonstration of the utility of LC-MALDI for this purpose. We propose that LC-MALDI is the ideal platform for such a workflow due to the semipermanent nature of the sample preparation. Once the analyte is on the target it may be reanalyzed a number of times before the sample becomes depleted. This makes it feasible to perform iterative analyses in a result-dependent manner.34-36

This study demonstrates the comparative analysis of pooled samples of serum and plasma. The primary purposes of the study are (1) to demonstrate the viability of this analytical approach, (2) to identify as many proteins as possible from undepleted serum/plasma using a 1-h HPLC separation, and (3) to determine which proteins are differentially expressed and to discuss how these results compare to similar studies conducted by other means. The elements of the experimental design and signal acquisition deemed critical to success are discussed in detail.

(17) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(18) Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S. J.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Pukayastha, B.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. Mol. Cell. Proteomics 2004, 12, 1154-1169.
(19) Malmstrom, J.; Larsen, K.; Malmstrom, L.; Tufvesson, E.; Parker, K.; Marchese, J.; Williamson, B.; Hattan, S. J.; Patterson, D.; Martin, S.; Graber, A.; Juhasz, P.; Westergren-Thorsson, G.; Marko-Varga, G. J. Proteome Res. 2004, 3, 525-537.
(20) Choe, L. H.; Aggarwal, K.; Franck, Z.; Lee, K. H. Electrophoresis 2005, 26, 2437-2449.
(21) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Mol. Cell. Proteomics 2002, 5, 376-386.
(22) Blagoev, B.; Kratchmarova, I.; Ong, S. E.; Nielsen, M.; Foster, L. J.; Mann, M. Nat. Biotechnol. 2003, 3, 315-318.
(23) Baggerly, K. A.; Morris, J. S.; Edmonson, S. R.; Coombes, K. R. J. Natl. Cancer Inst. 2005, 97, 307-309.
(24) Baggerly, K. A.; Morris, J. S.; Edmonson, S. R.; Coombes, K. R. J. Natl. Cancer Inst. 2005, 3, 315-318.
(25) Coombes, K. R.; Morris, J. S.; Hu, J.; Edmonson, S. R.; Baggerly, K. A. Nat. Biotechnol. 2005, 23, 291-292.
(26) Hattan, S. J.; Marchese, J.; Albertinetti, M.; Krishnan, S.; Khainovski, N.; Juhasz, P. J. Chromatogr., A 2004, 1053, 291-297.
(27) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, J. G. Mol. Cell. Proteomics 2005, 4, 1487-1502.
(28) Higgs, R. E.; Knierman, M. D.; Gelfanova, V.; Butler, J. P.; Hale, J. E. J. Proteome Res. 2005, 4, 1442-1450.

EXPERIMENTAL SECTION

Serum/Plasma Enzymatic Digestion. Pooled serum and pooled plasma samples were purchased from Sigma (St. Louis, MO). Protein content of the samples was determined using the BioRad protein assay (500-0002) (Hercules, CA). Based on assay results, 12 µL of serum and 10 µL of plasma were diluted to 100 µL of 20 mM Tris, pH 8.5, 5 mM CaCl2, and 10% ACN to create 2 µg/µL samples. Samples were reduced and alkylated by the addition of 4 µL of 50 mg/mL tris(2-carboxyethyl)phosphine hydrochloride and incubated at 60 °C for 1 h, followed by equilibration to room temperature and the addition of 2 µL of 200 mM methyl methanethiosulfonate and incubation for 10 min. Trypsin digestion was performed using two additions of enzyme (at 0 and 2 h) in a 1:100 (enzyme/substrate) ratio. Samples were then incubated at 37 °C for 12 h.

Liquid Chromatography. Peptide separation was performed using an Ultimate Chromatography system (Dionex-LC Packings, Sunnyvale, CA) equipped with a Probot MALDI spotting device.
For all runs, 5 µg of sample was injected, captured on a 0.3 × 5 mm trap column (3 µm, C18, Dionex-LC Packings, Sunnyvale, CA), and then eluted and separated from the trap column hooked in series with a 0.1 × 150 mm analytical column (3 µm, C18, Dionex-LC Packings). Peptides were resolved using a binary gradient ((A) 2% ACN, 0.1% TFA; (B) 85% ACN, 5% IPA, 0.1% TFA) of 5-40% B in 60 min, 40-95% B in 2 min, and 2-min wash at 95% B, at a flow rate of 600 nL/min. Column effluent was monitored at 214 nm using a 3-nL UV flow cell. After UV detection, the column effluent was mixed with the MALDI matrix in a 1:2 ratio through a 25-nL mixing tee (Upchurch, WA) and spotted onto a 3 in. × 5 in. MALDI plate in 10-s intervals. In this manner, it was possible to spot 5 runs of 264 spots for each sample onto a single MALDI target. A 10-s spotting interval was chosen because the average baseline peak width for eluting peptides was ∼20 s; therefore, most peptides were detected in two to three fractions. For any given peptide in any given run, the fraction of greatest signal intensity was chosen for MS/MS analysis. The matrix solution was 7.5 mg/mL α-cyano-4-hydroxycinnamic acid dissolved in 75:25 ACN-H2O containing 0.15 mg/mL dibasic ammonium citrate and 10 fmol/µL ACTH 1-18 clip as an internal mass standard (m/z 2465.2).

(29) Colinge, J.; Chiappe, D.; Lagache, S.; Maniatte, M.; Bougueleret, L. Anal. Chem. 2005, 77, 596-606.
(30) Villanueva, J.; Philip, J.; Entengerf, D.; Chaparro, C. A.; Tanwar, M. K.; Holland, E. C.; Tempst, P. Anal. Chem. 2004, 76, 1560-1570.
(31) Wiener, M. C.; Sachs, J. R.; Deyanova, E. G.; Yates, N. A. Anal. Chem. 2004, 76, 6085-6096.
(32) Stewart, I. I.; Zhao, L.; Bihan, T. L.; Larsen, B.; Scozzaro, S.; Figeys, D.; Mao, G. D.; Ornatsky, O.; Dharsee, M.; Orsi, C.; Ewing, R.; Goh, T. Rapid Commun. Mass Spectrom. 2004, 18, 1697-1710.
(33) Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H. Anal. Chem. 2003, 75, 4818-4826.
(34) Bondarenko, P. V.; Chelius, D.; Shaler, T. A. Anal. Chem. 2002, 74, 4741-4749.
(35) Graber, A.; Juhasz, P.; Khainovski, N.; Parker, K. C.; Patterson, D. H.; Martin, S. A. Proteomics 2004, 4, 474-490.
(36) Rejtar, T.; Hu, P.; Juhasz, P.; Campbell, J. M.; Vestal, M. L.; Preisler, J.; Karger, B. L. J. Proteome Res. 2002, 1, 171-9.

Mass Spectrometry. Mass spectrometric analysis was performed using a 4800 MALDI TOF/TOF mass spectrometer (Applied Biosystems, Framingham, MA). MS-mode acquisitions consisted of 2000 laser shots averaged from 80 sample positions (“search pattern positions”). After completing the MS analysis of an entire sample plate, a peak list of all components that passed a selection criterion threshold of signal-to-noise ratio of ≥20 was compiled by the instrument software, starting from one HPLC analysis of serum and plasma. Masses recurring in adjacent spots were selected only from the fraction containing the peak apex based on intensity, and the signal intensity for any given mass was required to be below the detection threshold for two consecutive fractions before that mass was eligible for reselection. A mass range from 900 to 4000 Da was considered for peak selection. An exclusion filter was used to eliminate the internal mass standard and sodium and potassium precursor adducts from the peak list. The top 20 masses in each spot (10-s chromatography time) were then selected for MS/MS analysis. A total of 2500 laser shots were averaged from 50 sample positions.
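The precursor-selection rules described above (signal-to-noise cutoff, mass window, exclusion filter, apex-only selection, and a per-spot cap of 20) can be illustrated with a short Python sketch. This is not the instrument software's algorithm, which the text does not specify; the function name `select_precursors`, the dictionary keys, and the 0.1 Da matching tolerance are assumptions chosen for illustration, and the exclusion list here contains only the internal standard mass.

```python
from collections import defaultdict


def select_precursors(peaks, sn_min=20.0, mass_min=900.0, mass_max=4000.0,
                      excluded_masses=(2465.2,), mass_tol=0.1,
                      gap_spots=2, top_n=20):
    """Pick MS/MS precursors from MS-mode peaks (hypothetical sketch).

    peaks: list of dicts with keys "spot" (int), "mass", "intensity", "sn".
    Returns {spot: [apex peaks, most intense first, at most top_n]}.
    """
    # Apply the S/N threshold, the mass window, and the exclusion filter.
    eligible = [
        p for p in peaks
        if p["sn"] >= sn_min
        and mass_min <= p["mass"] <= mass_max
        and all(abs(p["mass"] - m) > mass_tol for m in excluded_masses)
    ]

    # Group detections of the same mass into elution profiles. A mass that
    # has been absent for more than `gap_spots` consecutive spots starts a
    # new profile, i.e. it becomes eligible for reselection.
    profiles = []  # each: {"mass", "last_spot", "apex"}
    for p in sorted(eligible, key=lambda q: q["spot"]):
        for prof in profiles:
            same_mass = abs(p["mass"] - prof["mass"]) <= mass_tol
            still_eluting = p["spot"] - prof["last_spot"] <= gap_spots
            if same_mass and still_eluting:
                prof["last_spot"] = p["spot"]
                if p["intensity"] > prof["apex"]["intensity"]:
                    prof["apex"] = p  # keep the most intense fraction
                break
        else:
            profiles.append({"mass": p["mass"], "last_spot": p["spot"], "apex": p})

    # Keep only each profile's apex, then cap the list for every spot.
    by_spot = defaultdict(list)
    for prof in profiles:
        by_spot[prof["apex"]["spot"]].append(prof["apex"])
    return {
        spot: sorted(apexes, key=lambda q: q["intensity"], reverse=True)[:top_n]
        for spot, apexes in by_spot.items()
    }
```

Grouping by "last seen spot" is one simple way to express the two-fraction reselection rule; a production implementation would likely also handle sodium and potassium adduct masses, which are omitted here.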
One-kiloelectronvolt collisions with air (the collision gas) were used to generate the high-energy CID spectra using a source voltage of 8 kV, a collision cell voltage of 7 kV, and a second accelerating voltage of 15 kV. Data collection was relatively rapid due to the 200-Hz repetition rate laser and high-speed sample stage.

Data Processing. For MS analyses, the 4800 MALDI TOF/TOF analyzer software was used to generate a list of masses and intensities (peak area) normally used to determine which precursors may be appropriate for MS/MS analysis from all 10 runs. This mass list, for each run, is deisotoped and reduced in the chromatography domain so that each eluting mass is listed only once (within adjustable time and mass tolerances). All lists were then imported into the MarkerView (Sciex Corp., Toronto, Canada) program software. MarkerView software was used to align the 10 runs to one another, using a time alignment tolerance of 2 min and a mass alignment tolerance of 0.1 amu. An aligned list containing 10 000 masses was generated. Normalization of samples by adjusting the intensity values by a run-specific normalization constant was not performed, as there was no more than 10% average deviation between runs to begin with. Had such normalizations been performed, many standard deviations of individual components would have improved, but nearly as many would have gotten worse. Within MarkerView software, the five plasma samples were defined as one group, while the five serum samples were defined as a second group. Two-sided t tests were then performed within MarkerView software to determine which aligned masses could differentiate plasma from serum.
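As an illustration of the group comparison described above, the following Python sketch computes a pooled-variance two-sample Student's t statistic for one aligned feature. How MarkerView computes its score internally is not stated in the text, so this is a textbook formulation rather than the vendor's implementation; the function name, the plasma-minus-serum sign convention, and the example intensities are assumptions.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Pooled-variance two-sample t statistic (positive when group_a > group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical peak areas for one aligned mass in five plasma and five serum runs.
plasma = [52000.0, 49500.0, 51000.0, 50500.0, 48000.0]
serum = [12000.0, 15000.0, 11000.0, 13500.0, 12500.0]
t_score = student_t(plasma, serum)
```

With five runs per group the statistic has 8 degrees of freedom; converting it to a two-sided probability requires the t distribution, which is available in statistical packages but not in the Python standard library.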


For MS/MS analyses, the Mascot search engine was used to obtain preliminary peptide identifications. All peptide identifications that did not correspond to well-documented abundant proteins in serum or plasma were discarded because (1) the spectrum in question was of low quality, (2) the spectrum could be attributed to chemically modified peptides from high-abundance serum proteins (especially serum albumin) upon manual inspection, or (3) the precursor appeared to be related by a neutral loss (usually of water or ammonia) from another precursor in the same well. To quantify the quality of fit, the MatSc39 parameter was calculated for all proposed identifications. This parameter was particularly useful in reconciling discrepancies (of which there were very few), and in some cases, in determining whether there was evidence that a spectrum that was not directly identified by Mascot nonetheless corresponded to a peptide that was identified by Mascot in another experiment.

The aligned masses from MarkerView software containing the t-test probability scores of these 10 000 components were then imported into MS-Access, along with the results from the MS/MS analyses. SQL was then used to align the masses aligned by MarkerView software to sequences that were identified by MS/MS analysis. It was found that nearly all of the most intense masses had been selected for fragmentation, and most of them had been identified. A new field (SVenn) was added that consisted of a string containing 10 characters that designated whether the component was found in each of the runs. Hence, a SVenn of 1111111111 indicates the component was present in all 10 runs, while SVenn 0010000000 indicates the component was found in only the third run. In addition, average values and standard deviations were calculated for the plasma and serum runs separately. For Figure 3, this table was loaded from MS-Access into Spotfire.
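The SVenn field described above is a simple presence/absence encoding, sketched here in Python. The function names are hypothetical, and the text does not say which of the 10 positions correspond to plasma and which to serum, so run numbering from 1 to 10 is an assumption.

```python
def svenn(found_in_runs, n_runs=10):
    """Encode run membership as a string of '1' (found) and '0' (not found).

    found_in_runs: collection of 1-based run numbers in which the component
    was detected. Position i of the result corresponds to run i.
    """
    return "".join(
        "1" if run in found_in_runs else "0" for run in range(1, n_runs + 1)
    )


def runs_detected(code):
    """Count the runs in which a component was found, given its SVenn string."""
    return code.count("1")


only_third = svenn({3})           # "0010000000"
everywhere = svenn(range(1, 11))  # "1111111111"
```

A string encoding of this kind makes it easy to filter in SQL, for example selecting components whose SVenn equals 1111111111 to obtain those common to all 10 runs.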
Peptides were placed into three separate bins based on the t-test score from MarkerView software. Peptides were designated “plasma enriched” if the t-test was >3, “serum enriched” if the t-test was …

Table 1

intensity bin (counts)    no. of features    SD mass (Da)    CV intensity    SD fraction    SD time (s)

Serum
all                       6314               0.0075          26.15           2.82           28.20
>1 000 000                 552               0.0040          16.49           2.60           26.04
500 000-1 000 000          610               0.0065          18.21           2.74           27.36
250 000-500 000           1028               0.0074          20.83           2.85           28.50
100 000-250 000           1888               0.0084          26.14           2.88           28.83
0-100 000                 2236               0.0081          33.16           2.83           28.32

Plasma
all                       4726               0.0075          33.39           2.96           29.52
>1 000 000                 503               0.0040          17.40           2.69           26.88
500 000-1 000 000          546               0.0066          19.81           2.83           28.32
250 000-500 000            859               0.0075          23.00           2.94           29.34
100 000-250 000           1502               0.0083          32.31           2.99           29.88
0-100 000                 1316               0.0084          53.15           3.08           30.84

a The table shows that variance in elution time and mass measurement do not depend on signal intensity. Variance in the intensity measurement does depend on signal strength.

in each run. As can be expected, the less precise the measurements of merit for alignment, the greater the probability that similar features will not be aligned or, worse, that distinct features will be inappropriately aligned. This phenomenon increases as sample complexity increases due to the increased probability of different peptides with similar mass eluting in proximity. Of the peaks identified by instrument software, the top 10 000 (based on signal intensity) were submitted to MarkerView for potential alignment and feature assignment. For the serum sample, 6314 features were common to all five runs, and for the plasma sample, 4726, representing ∼63 and 47% of the total number of possible features, respectively. However, as can be expected when considering the dynamic range of protein concentration in these samples (as with most proteomic samples), there are far more peaks detected at or near the detection threshold than there are at higher intensities,30-33 and therefore, there is a greater probability that these lower intensity peaks will be detected in one run and missed in the next.34,35 Therefore, an alternative metric of the overall success of the alignment is the percentage of total signal intensity (total peak area) assigned to features found in all runs. Using this metric, 92% of the total signal intensity could be assigned to features common to all five runs in the serum sample and 87% in the plasma sample. Table 1 shows the CV for signal intensity and the standard deviation (SD) of retention time and mass accuracy for features common to all runs, binned as a function of signal intensity. The range of signal intensities varied from 5 001 936 counts on the high end to 588 on the low end with an average intensity value of 47 887 and a median value of 11 510 counts (the features are binned as shown in the far left column).
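The two alignment metrics discussed above (the fraction of features found in every run, and the fraction of total signal intensity carried by those features) and the per-feature intensity CV can be computed as in the following Python sketch. The data layout, with one list of per-run peak areas per feature and `None` marking a missed detection, is an assumption for illustration and not the MarkerView export format.

```python
from statistics import mean, stdev


def alignment_coverage(features):
    """Return (fraction of features in all runs, fraction of intensity in them).

    features: list of per-feature lists of peak areas, one entry per run,
    with None where the feature was not detected in that run.
    """
    common = [f for f in features if all(v is not None for v in f)]
    total_intensity = sum(v for f in features for v in f if v is not None)
    common_intensity = sum(sum(f) for f in common)
    return len(common) / len(features), common_intensity / total_intensity


def percent_cv(values):
    """Coefficient of variation in percent (sample SD divided by the mean)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical example: one intense feature seen in both runs, two weak
# features each seen in only one run.
features = [[100.0, 110.0], [5.0, None], [None, 3.0]]
feature_fraction, intensity_fraction = alignment_coverage(features)
```

In this toy example only one of three features is common to both runs, yet it accounts for most of the summed intensity, which mirrors the contrast in the text between the feature-count percentages and the intensity-based percentages.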
As the table shows, the average standard deviation for retention time was ±3 spots or 30 s and varied only slightly with signal intensity. This finding is as expected since fidelity in retention time is almost exclusively dependent on the reproducibility of the chromatographic conditions and independent of the mass spectrometer. Parameters such as sample injection volume, column temperature, column conditioning, and fidelity in solvent gradient construction are critical to reproducible analyte retention time. Table 1 shows that mass accuracy measurements have a slight dependence on signal intensity. The overall average standard deviation in mass measurement is ±0.0075 Da and ranges from 0.004 Da for peaks with intensities of >1 000 000 to 0.0084 Da in peaks with intensities of