Peptide End Sequencing by Orthogonal MALDI Tandem Mass Spectrometry

Michael L. Nielsen,† Keiryn L. Bennett,† Brett Larsen,‡ Marc Moniatte,† and Matthias Mann*,†

MDS Proteomics A/S, Staermosegaardsvej 6, DK-5230 Odense M, Denmark, and MDS Proteomics, Inc., 251 Atwell Drive, Toronto M9W 7H4, Canada

Received December 14, 2001

Highly sensitive peptide fragmentation and identification in sequence databases is a cornerstone of proteomics. Previously, a two-layered strategy consisting of MALDI peptide mass fingerprinting followed by electrospray tandem mass spectrometry of the unidentified proteins has been successfully employed. Here, we describe a high-sensitivity/high-throughput system based on orthogonal MALDI tandem mass spectrometry (o-MALDI) and the automated recognition of fragments corresponding to the N- and C-terminal amino acid residues. Robotic deposition of samples onto hydrophobic anchor substrates is employed, and peptide spectra are acquired automatically. The pulsing feature of the QSTAR o-MALDI mass spectrometer enhances the low mass region of the spectra by approximately 1 order of magnitude. Software has been developed to automatically recognize characteristic features in the low mass region (such as the y1 ion of tryptic peptides), maintaining high mass accuracy even with very low count events. Typically, the summed mass of the two N-terminal residues (b2 ion), the third N-terminal residue (b3 ion), and the two C-terminal residues of the peptide (y1 and y2 ions) can be determined. Given mass accuracy in the low ppm range, peptide end sequencing on one or two tryptic peptides is sufficient to uniquely identify a protein from gel samples in the low silver-stained range.

Keywords: proteomics • peptide sequencing • database searching • high-throughput protein identification • tandem mass spectrometry

* To whom correspondence should be addressed. E-mail: mmann@mdsproteomics.dk.
† MDS Proteomics A/S.
‡ MDS Proteomics, Inc.

Journal of Proteome Research 2002, 1, 63-71. Published on Web 01/18/2002. 10.1021/pr0155174 CCC: $22.00 © 2002 American Chemical Society

Introduction

Proteomics is concerned with the large-scale determination of protein function.1 A wide definition of proteomics now includes protein chip-based methods, technologies with oligonucleotide-based “read out”, such as the yeast two-hybrid system, and even large-scale approaches to define protein structure. In a narrower sense, mass spectrometry based proteomics encompasses techniques for protein quantification and determination of protein primary structure, including posttranslational modifications as well as protein-protein interactions. In all these applications, protein identification is a main requirement. Key elements of any such method are sensitivity, certainty of identification, number of components that can be determined in a mixture, and throughput.

The first technology to become practical for the identification of “real world” biological samples was MALDI fingerprinting.2-6 Protein mixtures were usually separated by one- or two-dimensional gel electrophoresis and stained, and the bands or spots were digested with trypsin. The mixture was then micropurified or applied directly to metal substrates and analyzed by MALDI time-of-flight mass spectrometry. The mass fingerprint was at first a rather nonspecific identification tool but rapidly improved with the advent of higher mass accuracy due to delayed extraction, new matrix preparation techniques, and higher sensitivity, which led to higher sequence coverage. MALDI TOF fingerprinting has become a relatively straightforward and highly useful identification method in laboratories around the world.

Soon after MALDI fingerprinting was established as a protein identification method, electrospray tandem mass spectrometry also became an extremely powerful method for protein microcharacterization. This was due to the development of algorithms for searching protein sequence databases by tandem mass spectrometric data,7,8 as well as increasingly sensitive nanoelectrospray9 and on-line LC MS/MS methods.10,11 While ES tandem mass spectrometry is somewhat more complicated to perform, the results are much less ambiguous than MALDI fingerprinting because sequence-related information of even one or two peptides usually identifies a protein. It is now possible to automate LC MS/MS such that protein samples in 96-well microtiter plates are analyzed via LC MS/MS in a completely automated manner. Another exciting development in LC MS/MS concerns the analysis of complex mixtures of proteins. For many samples protein gels do not need to be run but the crude protein mixture is digested and analyzed directly.12 A mixture consisting of several hundred proteins can now be analyzed in a single LC MS/MS run.

Despite the advances in direct protein mixture analysis by LC MS/MS, most biological experiments result in gel-separated proteins that need to be analyzed by mass spectrometry.

Furthermore, MALDI remains an attractive method for processing large numbers of samples due to the fact that protein digests can be laid out in small spots on MALDI targets. In our hands, the largest problem in automating MALDI mass fingerprinting has been the reliability of protein identification, particularly for minor components in a protein mixture. Therefore, ideally the instrument used for MALDI fingerprinting should also be able to provide peptide sequence. MALDI post source decay can in principle sequence peptides, but it has been difficult to control the fragmentation behavior sufficiently to enable reliable protein identification.13 MALDI TOF-TOF is another potential method for obtaining peptide sequence information;14 however, this method is still under development. MALDI has also been coupled successfully to the quadrupole ion trap15-17 and to FT ICR.18 The recently introduced MALDI quadrupole TOF instrument offers a number of advantages for proteomic applications.19-21 The MALDI process takes place at relatively high pressure, allowing effective collisional cooling, and ion detection is decoupled from ion generation, allowing larger laser fluences to be used.22 Mass accuracy and resolution are very high in MS and MS-MS mode on an o-MALDI hybrid tandem mass spectrometer. A potential disadvantage of the method is the less informative fragmentation pattern of singly charged ions compared to multiply charged ions as encountered in electrospray mass spectrometry. Furthermore, the ion transmission of the quadrupole TOF combination is low for low mass ions. Here, we report on a novel concept, peptide end sequencing, which addresses these shortcomings. The pulsar function of a PE-Sciex quadrupole mass spectrometer is used in this case to selectively enhance the low mass range of the peptide mass spectrum. In the case of MALDI-generated ions, this mass range contains easily annotated ions such as the a2, b2 ion pair plus the y1, y2 ion pair.
We show that this “end sequence” information, in conjunction with the molecular mass of the peptide, is sufficient to retrieve a peptide uniquely from large protein sequence databases. Peptide end sequencing retains the advantages of MALDI for the high-throughput measurement of proteins while adding the specificity of tandem mass spectrometry-based methods.

Experimental Procedures

Materials and Reagents. Samples used throughout the experiments were obtained from internal projects at MDS Proteomics A/S, Odense, Denmark. All water used was obtained from a Milli-Q system (Millipore, Bedford, MA), and all chemicals were of analytical grade unless otherwise noted.

Proteomics Platform. Prior to mass spectrometric analysis, an internally developed proteomics platform consisting of a spot-excising robot, a streamlined digestion step, and a liquid-dispensing robot for sample deposition provides high-throughput sample preparation. The spot-excising robot, combined with the image-recognition software SoftSpot, performs the excision and placement of gel plugs into 96-well microtiter plates. Destaining, reduction/alkylation, and digestion of the excised gel plugs are achieved via a streamlined procedure utilizing our DigestStation. In situ digestion of gel-separated proteins was performed with modified porcine trypsin (Promega Corp., Madison, WI) essentially as described.23,24 Tracking both samples and sample information is an essential component of a high-throughput proteomic platform. A barcoding system linked to our laboratory information management system (LIMS) has been implemented for this purpose. During the gel excision step, the 96-well microtiter plates are assigned unique barcodes, which link to all information concerning the sample, including the position in the microtiter plate and the sample identification name. The information is retained throughout the process and retrieved from the LIMS when samples are robotically deposited onto the MALDI plate and mass spectrometric data are acquired, searched, and saved to the server.

Preparation of MALDI-MS Samples. Samples were prepared for orthogonal MALDI-MS analysis using 35 mg/mL of 2,5-dihydroxybenzoic acid (DHB) (PE Biosystems, Cambridge, MA) in 35% acetonitrile, 0.1% TFA. Samples were concentrated and desalted on customized nanoscale columns9,25 packed in the end of a GELoader tip (Eppendorf-Netheler-Hinz GmbH, Hamburg, Germany) with Oligo R2 20 reversed-phase media (PE Biosystems, Cambridge, MA).26 The peptides bound to the media were washed with 10 µL of 0.1% TFA and eluted from the column with 2 µL of matrix onto a Teflon-coated stainless steel MALDI target with anchor points for accurate sample positioning (96-well format) (MDS Proteomics, Odense, Denmark). The coated target enabled further concentration of the eluted samples into discrete sample spots. This customized target not only improved the sensitivity of the mass spectrometer but also allowed easy sample localization for automated acquisition.

Instrumentation. A hybrid QSTAR Pulsar quadrupole time-of-flight mass spectrometer (Applied Biosystems/MDS Sciex, Toronto, Canada) equipped with a prototype o-MALDI ion source was used for all experiments.19,22 The QSTAR tandem mass spectrometer can be described simply as a triple quadrupole configuration with the last quadrupole replaced by a TOF analyzer. Briefly, the system consists of an ion guide (q0) followed by two quadrupoles, analyzer (Q1) and collision cell (q2), and a reflecting TOF mass analyzer with orthogonal ion injection (Figure 1).
A skimmer lens installed between the o-MALDI ion source and q0 increases transmission of ions from the MALDI plume to the ion guide and cools the ions prior to entering the quadrupole section (Q1). A unique feature of the QSTAR is the linear acceleration (LINAC) Pulsar high-pressure collision cell technology, which allows ions to be trapped efficiently in the collision cell and pulsed into the mass analyzer in packets. A small trapping potential is applied to both ends of the collision cell, and by lowering the potential the ion packet is gated into the pusher region of the TOF analyzer and synchronized with extraction into the TOF analyzer. Pulsing of the ions allows up to 1 order of magnitude enhancement of peak intensities.27 Laser pulses were generated with a nitrogen laser (Model VSL-337ND-S, Laser Science Inc., Franklin, MA) operating at 337 nm and with pulse energy of 300 µJ. A 200 µm core diameter fused-silica optic fiber delivered the laser pulses to the target with a frequency of 30 Hz. The output of the fiber optic was directed onto the target by a focusing lens, shaping an elliptical image of the laser of approximately 0.3 mm × 1 mm on the sample.

Figure 1. Representation of the MDS Sciex QSTAR mass spectrometer equipped with an orthogonal MALDI ion source, which is used throughout this study.

Acquisition of MS and MS-MS Spectra. Single MS experiments were performed with Q1 and q2 operating in RF-only mode. Ions were transmitted efficiently, resulting in measurement of the entire mass range with high resolution and mass accuracy. MS spectra were acquired using pulsing in two regions covering the mass range from m/z 800 to 2400 with an average enhancement of a factor of 3-4 in ion intensities. Tandem MS experiments were performed with Q1 operating in the mass-filter mode. Typically, a mass window of 3 Da was selected in order to transmit the entire isotopic envelope of the precursor ion species. By selecting precursor ions of interest in Q1, structural information from the tandem mass measurements could be obtained in the TOF section by fragmenting the precursor ion in the collision cell q2, using nitrogen as collision gas. A narrow mass range from m/z 120 to m/z 500 was chosen for acquisition of tandem mass spectra, to optimize both transmission through the quadrupole and enhancement of fragment ions using the pulsing feature of the o-MALDI QSTAR. The timing conditions for pulsing from the collision cell into the TOF were optimized for m/z 200. In general, ions close to this value gained an order of magnitude in intensity. The gain was smaller for ions at both higher and lower m/z values, and almost no ions were recorded above m/z 400. The collision energy for fragmentation was adjusted by a computer script proportionally to the precursor ion mass and set at a relatively high level compared to typical collision energies for measuring large precursor ions.

Calibration. As mentioned previously, the QSTAR mass spectrometer consists of both a quadrupole and a TOF section, but it is only the latter that provides accurate mass. Once the quadrupole region has been calibrated, it can be relied upon for stability over a period of weeks to months, assuming the room temperature is stable. One of the main reasons for the high mass accuracy in the TOF section is the linearity of the mass calibration scale (the time-of-flight depends simply on the square root of the mass). Provided that the calibration points are not too close together, a simple two-point calibration is usually accurate over the entire mass range. A daily internal calibration on the two peptide fragments from porcine trypsin at m/z 842.509 and m/z 2211.104 is sufficient to keep the mass error of the instrument below 30 ppm.

Automated Acquisition. A software tool, o-MALDI Automaton (MDS Proteomics), was developed to complement the automatic acquisition features of the o-MALDI Analyst software (MDS Sciex, Toronto, Canada), which controls the instrument. It creates and automatically loads acquisition batches from the LIMS into the Analyst software. The loaded information contains sample information retrieved from the barcode on the 96-well microtiter plate. MS and MS/MS spectra are automatically acquired within Analyst and saved onto the server in a predetermined file structure.

Database Search. MS and MS/MS spectra were searched with the PepSea database search system developed by MDS Proteomics. Searches were performed against a set of nonidentical protein sequences obtained from NCBI (www.ncbi.nlm.nih.gov). There are currently more than 540 000 protein sequences contained in the database. PepSea runs on a LINUX cluster with up to 200 nodes. Search times per spectrum are below 100 ms on 100 nodes.
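The two-point calibration described under Calibration can be illustrated with a minimal sketch. It assumes the idealized model t = a·sqrt(m/z) + t0 and uses the two trypsin calibrant masses given in the text; the instrument constants `TRUE_A` and `TRUE_T0` and all function names are hypothetical and serve only to synthesize flight times, since the actual calibration routine of the instrument software is not described here.

```python
import math

def two_point_calibration(t1, mz1, t2, mz2):
    """Return (a, t0) for the assumed model t = a * sqrt(m/z) + t0."""
    a = (t2 - t1) / (math.sqrt(mz2) - math.sqrt(mz1))
    t0 = t1 - a * math.sqrt(mz1)
    return a, t0

def time_to_mz(t, a, t0):
    """Invert the calibration model to convert a flight time to m/z."""
    return ((t - t0) / a) ** 2

def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

# Hypothetical instrument constants, used only to simulate flight times.
TRUE_A, TRUE_T0 = 1.25, 0.40

def simulated_flight_time(mz):
    return TRUE_A * math.sqrt(mz) + TRUE_T0

# Trypsin autolysis calibrants quoted in the text.
CAL_LOW, CAL_HIGH = 842.509, 2211.104

a, t0 = two_point_calibration(
    simulated_flight_time(CAL_LOW), CAL_LOW,
    simulated_flight_time(CAL_HIGH), CAL_HIGH,
)

# A mass between the calibrants is recovered from its flight time.
recovered = time_to_mz(simulated_flight_time(1298.762), a, t0)
print("recovered m/z: %.4f, error: %.4f ppm"
      % (recovered, ppm_error(recovered, 1298.762)))
```

Because the model is linear in sqrt(m/z), two well-separated calibrants fix both parameters; in real data the residual error would reflect noise and drift, which this sketch does not simulate.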

Results and Discussion

Peptide End Sequencing. Figure 2 demonstrates the principle of peptide end sequencing. Peptides generated by sequence specific proteases are fragmented and give rise to specific low mass ions. In the case of trypsin, which cleaves C-terminal to Arg and Lys, there are only two possibilities for the y1 ion mass, namely 147.1134 and 175.1195 Da. A spectrum can be readily assessed for the presence of these ions. On the basis of this information, there are only 19 possibilities for the y2 ion (20 different amino acids, but Leu/Ile cannot be distinguished), which is also visible in the spectrum in almost all cases. Even when the y1 ion is of low intensity or of low signal-to-noise so that it is not observed, there are only 38 possibilities for the y2 ion and it is readily recognized. The next easily recognized feature in most spectra is the a2, b2 ion pair, which is spaced by the mass of CO. In the case of unmodified peptides, the b1 ion is not observed28 but there are only 176 possible masses for the a2, b2 ion pair (190 two-amino acid compositions, of which 14 have the same mass). This pair is very prominent in the spectra if relatively high collision energies are used. We usually also observe the b3 ion in the low mass region of the mass spectrum.

Figure 2. End sequencing principle. Peptides isolated in the QSTAR o-MALDI are fragmented by CAD. Low mass fragments such as y1, y2, b2, and b3 fragment ions plus the recalibrated accurate molecular mass of the intact peptide are used to “end sequence” the peptide. In the given example (see text), we derive the sequence pattern [A,P]A...IR. The pattern and the molecular mass uniquely identify the peptide in the database.

If we assumed equal abundance for all amino acids, knowledge of these ions would improve the search of the peptide by the following factors: a factor of 2 for the y1 ion, a factor of 19 for the y2 ion, and more than a factor of 100 for the a2, b2 pair. The precise number of combinations for each of the fragment masses can be calculated in advance, so it is known for the fragment masses actually found in the spectrum. On the basis of the found fragment masses, the amino acid occurrence can be taken into account. The b3 ion then contributes a further factor of 19 to the search specificity on average. Thus, we would expect that knowledge of the y1, y2 and a2, b2 ion pairs plus the b3 ion would contribute a factor of about 2 × 19 × 150 × 19, or more than 100 000. That is, on average, a search with a peptide mass compatible with several thousand tryptic peptide sequences would return only the correct sequence in almost all cases. A search of tryptic peptide masses in large sequence databases containing several hundreds of thousands of proteins with a mass accuracy of 20-30 ppm usually retrieves several hundred different tryptic peptides. The exact number strongly depends on the mass but is in this range for peptides of 12 amino acids or longer. Therefore, knowledge of the two C-terminal amino acids and the sum of the two N-terminal amino acids should usually result in a unique identification of the peptide. If the b3 ion has been observed, which is typically the case in o-MALDI experiments, the certainty of identification would be even higher. In the case of human proteins, there are on the order of 40 000 coding sequences. Thus, for peptides from human, or other mammalian genomes of similar size, knowledge of the end sequences should be sufficient for identification.

Finally, peptide end sequencing has several operational advantages. Because each low mass peak can arise from only a few combinations, the tandem mass spectra can be calibrated on these peaks. For example, if a y1 ion from an arginine residue is encountered in the spectrum, it can be used for calibration. Likewise, most combinations of a2, b2 ions are unique and can be used in this manner. In the case of o-MALDI on a quadrupole TOF instrument, the resolution of about 10 000 and the highly linear (square root) relationship between the time-of-flight and the mass mean that the low mass region of the spectrum can be interpreted very simply and accurately. Moreover, tryptic peptide masses can be calculated “on the fly” or precalculated, and low mass fragment ions can also be readily calculated. Therefore, database searches, with an accurate precursor mass and, for example, the y1 and y2 ion as a second database key, can be done exceedingly fast. Simple peptide modifications, such as oxidized methionine, can also be taken into account by considering both possibilities. In general, modified peptides would not be identified by this approach but would be “flagged” as not corresponding to a database entry and can then be sequenced in more detail and with longer acquisition times and mass ranges.

Description of a Typical Experiment. A typical proteomic experiment consists of separating a protein mixture according to molecular mass by 1-D gel electrophoresis (SDS-PAGE) or by both molecular mass and isoelectric point via 2-D gel electrophoresis (2D-PAGE).29 The separated proteins are visualized by silver or Coomassie-blue staining, cleaved by trypsin, and analyzed by mass spectrometry. The mass spectrometric identification of a protein usually begins by acquiring and searching a peptide mass fingerprint (PMF). In general, a protein is considered unequivocally identified when at least five to seven peptides match an in silico digest of the database. This assumes a mass accuracy better than 30 ppm and a sequence coverage of at least 15%; it also assumes that the next candidate in the database is significantly lower in score.30 Tandem mass spectral information is required when a positive identification from the PMF is not possible. Here, we report the use of a rapid end sequencing method for gathering the necessary tandem mass spectrometric information in order to identify a protein.

Figure 3. Analysis by QSTAR o-MALDI of a low-level Coomassie-stained band excised from a 1D gel. (A) Band of approximate molecular mass 30 kDa excised and digested in situ with trypsin is indicated by the arrow. (B) Peptide mass fingerprint of the digested sample. The majority of peaks in the spectrum are due to trypsin autodigestion products. The peak indicated by the arrow was isolated and analyzed by MS/MS. Calculated tryptic peptide masses from the protein identified in (C) are labeled with an asterisk. (C) o-MALDI MS/MS spectrum obtained in the low mass region without pulsing. (D) o-MALDI MS/MS spectrum obtained in the low mass region with pulsing. The increase in ion intensity is indicated.
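The counting argument for the a2, b2 ion pair in the Peptide End Sequencing discussion can be explored with a short sketch. It enumerates all unordered two-residue compositions from a table of standard monoisotopic residue masses (Leu and Ile merged into one entry) and counts how many distinct b2 masses result. The mass table, the proton and CO masses, the unmodified cysteine, the 0.001 Da grouping tolerance, and all function names are assumptions made for this illustration and are not taken from the software used in the study, so the distinct-mass count it prints need not reproduce the figure quoted in the text.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses in Da (assumed table; Leu and Ile share one
# entry because they are isobaric; Cys is listed unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728   # mass of a proton, Da
CO = 27.99491      # mass difference between a b ion and its a ion, Da

def b2_mz(first, second):
    """m/z of the singly charged b2 ion for two N-terminal residues."""
    return RESIDUE_MASS[first] + RESIDUE_MASS[second] + PROTON

def a2_mz(first, second):
    """m/z of the matching a2 ion (b2 minus CO)."""
    return b2_mz(first, second) - CO

def count_distinct(masses, tol=0.001):
    """Count groups of masses separated by more than tol Da."""
    ordered = sorted(masses)
    distinct = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > tol:
            distinct += 1
    return distinct

# Order does not matter for the b2 mass, so use unordered pairs.
pairs = list(combinations_with_replacement(RESIDUE_MASS, 2))
b2_values = [b2_mz(x, y) for x, y in pairs]
n_distinct = count_distinct(b2_values)

print(len(pairs), "compositions,", n_distinct, "distinct b2 masses")
print("A+P: a2 = %.3f, b2 = %.3f" % (a2_mz("A", "P"), b2_mz("A", "P")))
```

Grouping by a mass tolerance rather than by exact equality keeps the count robust to rounding in the mass table; a tighter or looser tolerance would model a different instrument mass accuracy.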

Figure 3B shows the peptide mass map obtained from the tryptic digest of the weakly Coomassie-stained band excised from the 1D gel shown in Figure 3A (marked by the arrow). Molecular mass markers indicated that the protein of interest has an apparent molecular mass of 32 kDa. The spectrum was acquired on the o-MALDI QSTAR, and the PMF was searched in the NCBI nonredundant database, yielding numerous hits with four to eight matching peptide masses within the 30 ppm mass accuracy constraint. Positive identification of the protein from the peptide mass map, however, was not possible. Therefore, a single tandem mass spectrum was acquired from the peptide precursor ion at m/z 935.471 (Figure 3C). Note that the intensity of the fragment ions is poor when no pulsing was used. A comparison of MS/MS spectra with and without pulsing (Figure 3C,D) clearly indicates the advantages of this technique. The two fragment ions closest to the pulsing mass (m/z 200) (y1 at m/z 175.120 and b2 at m/z 169.112) are both enhanced 15 times compared to the nonpulsed spectrum, and the b3 ion at m/z 288.205 is enhanced 10 times. This enhancement is only observed over a limited mass range and reflects pulsing settings that maximize sensitivity at the expense of mass range. It is Journal of Proteome Research • Vol. 1, No. 1, 2002 67

research articles

Nielsen et al.

clear that the fragmentation information obtained from the spectrum acquired without pulsing is much more difficult compared to the spectrum with pulsing.

sequencing the energy was set to a relatively high value that efficiently produces low mass fragment ions from a wide variety of peptides.

Identification of the C-terminal amino acid in the tandem MS as arginine is straightforward because of the diagnostic y1 ion at m/z 175.111 shown in Figure 3D. No ion was observed at m/z 147.113, confirming the absence of a lysine amino acid residue at the C-terminus of the peptide. A single-point internal recalibration of the spectrum using the accurate mass of the diagnostic y1 ion from arginine yielded low ppm mass accuracy across the entire low mass range.

As shown in this example, the low-mass region yields sufficient unique sequence information for peptide identification even for a relatively small peptide of eight amino acids. The statistics are much more favorable for longer peptides where only a few hundred sequence possibilities in the database may fit the accurate parent peptide mass. We find that peptides up to the mass range of ∼2000-2500 Da fragment efficiently to yield low mass information. We conclude that for the identification of unmodified peptides whose sequence is contained in a database, the low mass region is both very simple to interpret and sufficiently information rich for unambiguous identification. Acquisition time and sample consumption are both reduced. If the peptide is modified, it would not be identified in the database and a complete MS/MS mass spectrum could be obtained for more in depth characterization. Importantly, the end sequence information would still aid in sequencing or identification of the peptide.

Submission of the specific C-terminal information plus the accurate mass of the fragmented peptide to the NCBI database returned more than 4300 possibilities from more than 500 000 protein sequence entries. The only possibility for the y2 ion was the peak at m/z 288.205, and thus, the second amino acid from the C-terminus was determined as leucine/isoleucine. Addition of this information to the search criteria narrowed the number of returned identifications by a factor of 14. At this point, there were 300 candidates for the protein. In contrast to the y1 and y2 ions, which are readily identified in most tandem mass spectra, we have found that the third C-terminal residue is not observed for all peptides. In most cases, the peak corresponding to the y3 fragment ion appears in the range above m/z 400 and thus is outside the optimized pulsing region of the mass spectrometer (m/z 120-350). The prominent ion pair displaying a mass difference corresponding to CO, m/z 141.105/169.102, indicated the presence of an a- and b-type fragment ion pair in the tandem mass spectrum. Only the amino acid combination of AP or PA leads to this a2, b2, pair, fixing the two N-terminal amino acids in addition to the two C-terminal ones. Using the combined information from both the N- and C-termini end sequencing the search results returned from the NCBI database were reduced to a single peptide sequence. Furthermore, the only possible sequence extension from the b2 ion is the ion at m/z 240.137, indicating an alanine as the third N-terminal amino acid. Combining all the information obtained from the data unambiguously retrieved the peptide sequence APAMFNIR, which is present in the ribosomal protein S3a. The protein has a mass of 30.1 kDa, consistent with the expected mass as determined from the 1D gel (Figure 3A). 
The peptide is also found in a number of homologous ribosomal proteins from various species; however, since the sample was of human origin, we concluded that the peptide is from human S3a. After identification, the masses of the calculated tryptic peptides from this protein were marked in the peptide mass spectrum. As can be seen in Figure 3B, four of the predicted peptides match peaks in the spectrum within 30 ppm, further supporting the identification. In addition to the fragment ions used for end sequencing, there is a large peak at m/z 210.153. There is no possible a2, b2, y2, or internal ion at this mass, and therefore, the most plausible origin of the ion is from fragmentation of matrix clusters, consistent with the fact that we have observed this ion in several MS/MS spectra of previously sequenced peptides. In the experiment described above, the collision energy was proportionally adjusted according to the mass of the peptide. When larger fragments are recorded, the collision energy needs to be adjusted carefully in order to obtain the optimal balance between efficient fragmentation of the precursor mass and stability of large fragment ions. In contrast, for peptide end 68

Journal of Proteome Research • Vol. 1, No. 1, 2002

Sensitivity. Initial experiments with the o-MALDI prototype instrument revealed that the sensitivity was limited to the Coomassie-stain range (50-100 ng of protein in a band) in routine work. This level of sensitivity is not sufficient for the more interesting proteomics projects; thus, efforts were made to dramatically increase the sensitivity. Combining several developments, the hydrophobic anchor target, an improved ion optics in the ionization region plus the gain from amplifying the low mass region, the sensitivity was increased such that it is now adequate down to the silver stained range (few ng of protein per band). This is demonstrated by acquisition of peptide mass spectra from a silver-stained gel plug excised from a 2D gel (Figure 4A). The peptide mass map shown in Figure 4B did not yield any distinct protein match when searching the data in the NCBI protein database. Widening the search parameters to require only three peptides for identification still did not produce any protein hits. Therefore, the precursor ion at m/z 1298.762 (the most intense ion in the mass spectrum) was isolated and a tandem mass spectrum acquired. The result is shown in Figure 4C. The C-terminal amino acid was easily identified as a lysine residue by the appearance of the y1 ion at m/z 147.105, and the spectrum was recalibrated on this mass. Because of the high resolution of the instrument, we find that relatively small peaks still contain sufficient information for calibration. The only fragment mass corresponding to a y2 ion was at m/z 204.137, indicating a glycine in the penultimate position. In this case the series could be extended to the y3 ion at m/z 305.203, leading to the sequence TGK at the C-terminus. In this spectrum, there is only one possibility for the a2, b2 pair, namely at m/z 209.141/237.130, indicating the combination VH or HV. 
Another a, b type ion pair was observed at m/z 322.230/350.224, indicating an isoleucine/leucine residue in the third N-terminal position. Note that high mass accuracy is required to assign an ion to a definite sequence combination. The ion pair at m/z 322.230/350.224 might have been mistaken for the a2, b2 pair of WY; however, this assignment leads to a mass difference of 200 ppm from the recalibrated mass and can thus be discarded. The N-terminal sequence was therefore VHI/L or HVI/L. Combining the C- and N-terminal end sequencing searches with the PMF unequivocally identified the peptide as VHLVGIDIFTGK and the protein as eukaryotic translation initiation factor 5A. This experiment clearly demonstrates that the o-MALDI QSTAR in the peptide end sequencing mode is capable of identifying proteins from low-level silver-stained gels.

Figure 4. Analysis by QSTAR o-MALDI of a silver-stained gel piece excised from a 2D gel. (A) Magnification of a region of the 2D gel. The sample spot that was excised and digested is marked by an arrow. (B) Mass spectrum of peptides derived from the gel spot. (C) Tandem mass spectrum of the low mass region using the pulsing function of the QSTAR o-MALDI. Combination of N- and C-terminal sequencing unequivocally identified the sample as a eukaryotic translation initiation factor protein. Calculated tryptic peptide masses from this protein are labeled with an asterisk.

Automation and Throughput. Large-scale proteomics studies based on either 1-D (SDS-PAGE) or 2-D gel electrophoresis (2D-PAGE) can easily result in hundreds of gel-separated proteins that have to be identified by mass spectrometry. It is therefore essential to be able to achieve this task in a reasonable time frame and with minimal operator intervention. The different stations of the proteomic platform, described earlier, from the biological experiment to protein identification are schematically depicted in Figure 5. The estimated number of samples that can be processed per hour is given at each step. Prior to mass spectrometric analysis and acquisition, the proteomic platform consisting of excision, digestion, and deposition of samples can process an average of 100 samples per hour. The limiting step in this process is the semiautomated DigestStation, one unit of which can destain, digest, and alkylate/reduce a 96-well microtiter plate of samples within 1 h. A single excision and deposition robot can process 300 to 400 samples per hour, and the DigestStation could be scaled to this throughput by complete automation using liquid dispensing robots for the pipetting steps (Figure 5).

Figure 5. Process overview for high-throughput analysis of samples by o-MALDI QSTAR. Gel-based samples are robotically excised, digested via a streamlined process, robotically deposited onto the MALDI target, and analyzed in an automated fashion by QSTAR o-MALDI MS and MS/MS. Data from the end sequencing approach are processed and searched on a high-speed computer cluster to return identified proteins.

Current typical acquisition times for an MS and an MS/MS experiment on the o-MALDI are 60 and 30 s, respectively. These values vary with the amount of sample deposited on the target, and larger amounts can be processed in less time. In addition to acquisition time, it takes ∼5 s to move the target from one sample position to the next and begin acquisition. Assuming acquisition of five tandem mass spectra per sample, these numbers equate to an acquisition time for an o-MALDI target plate (i.e., a single 96-well microtiter plate) of just under 6 h, or approximately 20 samples per hour. Doubling the number of MS/MS spectra acquired per sample increases the acquisition time to almost 10 h; however, the amount of information gained also increases markedly. These times already compare favorably with LC MS/MS analysis, where typically not more than a few samples can be processed per hour.

Improvements that would have a direct impact on throughput include reducing the sample consumption rate and the analysis time per spectrum. Implementation of a high-repetition laser (1 kHz vs currently 30 Hz) would dramatically decrease analysis time, as suggested by Loboda et al.,19 and would be supported by the existing translation stage motors. Such a laser will reduce the energy per pulse to about 30 µJ, but a useful fluence should still be easily achieved by coupling the laser to an optical fiber with a smaller diameter (>200 µm) and focusing the output of the laser onto the target in a smaller spot size (∼0.1 mm in diameter). As demonstrated above in the peptide end sequencing strategy, the number of spectra required to return an unequivocal identification is much less than that required for full spectrum acquisition. Since almost all of the acquisition time is currently spent in accumulation of laser shots, acquisition time could decrease to a few seconds per sample.

Database searching of the mass spectra obtained can be achieved automatically, in these experiments by the PepSea search engine running on a large Linux cluster comprising several hundred nodes. Search speeds are in the millisecond range and are therefore not time limiting. In conclusion, an overall analysis speed of several seconds per protein should be possible with the system described above. Throughput would increase with the implementation of high-speed lasers and scaling of the digestion unit.
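The plate acquisition times quoted above can be checked with a short calculation. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes one MS spectrum, a fixed number of MS/MS spectra, and a single ∼5 s stage move per sample, using the 60 s and 30 s acquisition times given in the text.

```python
# Back-of-the-envelope estimate of o-MALDI acquisition time per target plate.
# Assumption (not stated in the text): exactly one stage move per sample.

def seconds_per_sample(n_msms, ms_s=60.0, msms_s=30.0, move_s=5.0):
    """Time for one MS spectrum, n_msms MS/MS spectra, and one stage move."""
    return ms_s + n_msms * msms_s + move_s


def plate_hours(n_msms, samples=96):
    """Total acquisition time in hours for a plate of `samples` spots."""
    return samples * seconds_per_sample(n_msms) / 3600.0


def samples_per_hour(n_msms):
    """Number of samples acquired per hour at the given MS/MS count."""
    return 3600.0 / seconds_per_sample(n_msms)


if __name__ == "__main__":
    for n in (5, 10):
        print(f"{n} MS/MS per sample: {plate_hours(n):.2f} h per plate, "
              f"{samples_per_hour(n):.1f} samples/h")
```

Under these assumptions, five MS/MS spectra per sample give about 5.7 h per 96-sample plate (just under 6 h) and ten give about 9.7 h (almost 10 h), consistent with the figures in the text. The computed rate of roughly 17 samples per hour is somewhat below the rounded value of approximately 20 quoted above.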


Conclusion and Perspectives

Peptide end sequencing has been described and implemented on an o-MALDI quadrupole time-of-flight system. We have shown that the low mass range of the spectrum is uniquely simple to interpret, given that only a few amino acid combinations lead to the observed low fragment masses and that this region of a quadrupole TOF instrument can be recalibrated on identified fragment ions to achieve low ppm mass accuracy. Sensitivity sufficient for the identification of low silver-stained protein spots has been demonstrated. Throughput on our proteomic system is currently about 20 samples per hour per mass spectrometer. Introduction of a high repetition rate laser could decrease analysis time per sample to several seconds. Identifications are based on tandem mass spectrometric data, which are automatically searched in less than 1 s. We conclude that the system described here has great advantages for the high-throughput identification of proteins separated by gel electrophoresis.

Further development of search software should also allow the identification of simple posttranslational modifications as well as the selective sequencing of peptide masses that are not in agreement with the calculated tryptic masses of an already identified component of the protein band. In principle, o-MALDI peptide end sequencing could also be applied to the analysis of complex peptide mixtures. For example, such mixtures could be vacuum deposited as strips on a suitable substrate.31 We speculate, however, that the main application of o-MALDI will remain the analysis of gel-separated proteins, whereas complex mixture analysis will be performed by electrospray tandem mass spectrometry. In this regard, we note that the end sequencing principle is not limited to o-MALDI tandem mass spectrometry but can also be applied to electrospray tandem mass spectrometry.
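The role of low ppm mass accuracy in the low mass region can be illustrated with the ion at m/z 350.224 discussed earlier. The following sketch is not the authors' software; it assumes standard monoisotopic residue masses and singly protonated fragments, and its function names are hypothetical. It compares the theoretical b3 ion of VHL with the theoretical b2 ion of WY, two assignments that share a nominal mass of 350.

```python
# Theoretical b- and a-ion m/z values for short N-terminal fragments.
# Assumptions: monoisotopic residue masses, singly charged ions.

PROTON = 1.007276   # mass of a proton, Da
CO = 27.994915      # mass of carbon monoxide, Da (b ion minus a ion)

RESIDUE_MASS = {
    "V": 99.06841, "H": 137.05891, "L": 113.08406,
    "I": 113.08406, "W": 186.07931, "Y": 163.06333,
}


def b_ion_mz(seq):
    """m/z of the singly charged b ion: sum of residue masses plus a proton."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + PROTON


def a_ion_mz(seq):
    """m/z of the corresponding a ion: the b ion minus CO."""
    return b_ion_mz(seq) - CO


def ppm_difference(mz, reference_mz):
    """Relative difference of mz from reference_mz in parts per million."""
    return (mz - reference_mz) / reference_mz * 1e6


if __name__ == "__main__":
    b3_vhl = b_ion_mz("VHL")
    b2_wy = b_ion_mz("WY")
    print(f"b3(VHL) = {b3_vhl:.4f}, a3(VHL) = {a_ion_mz('VHL'):.4f}")
    print(f"b2(WY)  = {b2_wy:.4f}")
    print(f"difference: {ppm_difference(b2_wy, b3_vhl):.0f} ppm")
```

With these masses the two candidate assignments differ by about 196 ppm, in line with the roughly 200 ppm discrepancy cited in the text, so a measurement accurate to a few tens of ppm is enough to rule out WY. The calculation also shows why VH and HV, and I and L, cannot be distinguished by mass alone: the sums of their residue masses are identical.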

Acknowledgment. We thank our colleagues at MDS Proteomics Odense, Toronto, and Charlottesville for fruitful discussions and assistance with experiments. Dr. Chris Lock at MDS Sciex provided o-MALDI hardware upgrades and assistance with the Analyst software. Thanks are also extended to Dr. Albrecht Gruhler and Dr. Dan Bach Kristensen for provision of the samples used in the analyses.

References
(1) Pandey, A.; Mann, M. Nature 2000, 405, 837-846.
(2) James, P.; Quadroni, M.; Carafoli, E.; Gonnet, G. Biochem. Biophys. Res. Commun. 1993, 195, 58-64.
(3) Mann, M.; Hojrup, P.; Roepstorff, P. Biol. Mass Spectrom. 1993, 22, 338-345.
(4) Yates, J. R.; Speicher, S.; Griffin, P. R.; Hunkapiller, T. Anal. Biochem. 1993, 32, 397-408.
(5) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011-5015.
(6) Pappin, D. J. C.; Hojrup, P.; Bleasby, A. Curr. Opin. Biotechnol. 1993, 3, 327-332.
(7) Wilm, M.; Mann, M. Anal. Chem. 1996, 68, 1-8.
(8) Eng, J. K.; McCormack, A. L.; Yates, J. R., III. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(9) Wilm, M.; Shevchenko, A.; Houthaeve, T.; Breit, S.; Schweigerer, L.; Fotsis, T.; Mann, M. Nature 1996, 379, 466-469.
(10) Hunt, D. F.; Yates, J. R., III; Shabanowitz, J.; Winston, S.; Hauer, C. R. Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 6233-6237.
(11) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269-295.
(12) Washburn, M. P.; Wolters, D.; Yates, J. R., III. Nat. Biotechnol. 2001, 19, 242-247.
(13) Spengler, B.; Kirsch, D.; Kaufmann, R.; Jaeger, E. Rapid Commun. Mass Spectrom. 1992, 6, 105-108.
(14) Medzihradszky, K. F.; Campbell, J. M.; Baldwin, M. A.; Falick, A. M.; Juhasz, P.; Vestal, M. L.; Burlingame, A. L. Anal. Chem. 2000, 72, 552-558.
(15) Qin, J.; Ruud, J.; Chait, B. T. Anal. Chem. 1996, 68, 1784-1791.
(16) Krutchinsky, A. N.; Kalkum, M.; Chait, B. T. Anal. Chem. 2001.
(17) Doroshenko, V. M.; Cotter, R. J. Anal. Chem. 1996, 68, 463-472.
(18) Hettich, R. L.; Buchanan, M. V. J. Mass Spectrom. Ion Processes 1991, 111, 365-380.
(19) Loboda, A. V.; Krutchinsky, A. N.; Bromirski, M.; Ens, W.; Standing, K. G. Rapid Commun. Mass Spectrom. 2000, 14, 1047-1057.
(20) Shevchenko, A.; Loboda, A.; Shevchenko, A.; Ens, W.; Standing, K. G. Anal. Chem. 2000, 72, 2132-2141.
(21) Verhaert, P.; Uttenweiler-Joseph, S.; de Vries, M.; Loboda, A.; Ens, W.; Standing, K. G. Proteomics 2001, 1, 118-131.
(22) Krutchinsky, A. N.; Loboda, A. V.; Dworschak, S. R.; Ens, W.; Standing, K. G. Rapid Commun. Mass Spectrom. 1998, 12, 508-518.
(23) Jensen, O. N.; Wilm, M.; Shevchenko, A.; Mann, M. Methods Mol. Biol. 1999, 112, 513-530.
(24) Shevchenko, A.; Wilm, M.; Vorm, O.; Mann, M. Anal. Chem. 1996, 68, 850-858.
(25) Gobom, J.; Nordhoff, E.; Mirgorodskaya, E.; Ekman, R.; Roepstorff, P. J. Mass Spectrom. 1998, 34, 105-116.
(26) Kussmann, M.; Lassing, U.; Sturmer, C. A.; Przybylski, M.; Roepstorff, P. J. Mass Spectrom. 1997, 32, 483-493.
(27) Chernushevich, I. V. Eur. Mass Spectrom. 2000, 471-479.
(28) Schlosser, A.; Lehmann, W. D. J. Mass Spectrom. 2000, 35, 1382-1390.
(29) O’Farrell, P. H. J. Biol. Chem. 1975, 250, 4007-4021.
(30) Jensen, O. N.; Podtelejnikov, A.; Mann, M. Rapid Commun. Mass Spectrom. 1996, 10, 1371-1378.
(31) Preisler, J.; Foret, F.; Karger, B. L. Anal. Chem. 1998, 70, 5278-5287.

PR0155174

Journal of Proteome Research • Vol. 1, No. 1, 2002 71