Mass Spectrometric Methods for Protein Sequencing

Klaus Biemann. Anal. Chem. 1986, 58 (13), pp. 1288A-1300A. Published November 1, 1986. DOI: 10.1021/ac00126a71...
Proteins are among the most important components of all living systems. Their functions range from catalysts (enzymes) to regulators to structural components. The building blocks of proteins are about 20 amino acids (H2N-CH(R)-COOH), all differing in the structure of R, linked together by peptide bonds (-CO-NH-) in chains that may consist of a few dozen to more than 1000 amino acids. The determination of the "primary" structure of proteins, namely, the arrangement (sequence) of the various amino acids along this chain, is a formidable task. It is generally accomplished by cleaving the large chain into smaller, more manageable segments and determining their amino acid sequences first. There is an enormous number of ways in which these 20 building blocks can be linearly assembled and, therefore, an equally large number of possible peptides that can result from the chemical or enzymatic cleavage of a protein. However, they all have an important structural feature in common: a repeating backbone of consecutive -NH-CH(R)-CO- units, with the strict limitation that R must be one of the side chains of the naturally occurring amino acids. These restrictions make it possible to determine the structure of peptides by mass spectrometry (MS) despite their almost infinite variability.
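Because every peptide is built from the same repeating unit, the mass of any linear, unmodified peptide is simply the sum of its residue (-NH-CH(R)-CO-) masses plus one molecule of water for the two termini. The short sketch below illustrates that arithmetic. It assumes standard monoisotopic residue masses rounded to five decimals and one-letter amino acid codes; the names `RESIDUE_MASS` and `peptide_mass` are hypothetical helpers introduced only for illustration.

```python
# Monoisotopic residue masses: each is the free amino acid minus H2O,
# i.e. the mass of one -NH-CH(R)-CO- backbone unit (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the H- and -OH termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


print(round(peptide_mass("GG"), 4))            # 132.0535 (glycylglycine)
print(peptide_mass("L") == peptide_mass("I"))  # True: the isomers share a mass
```

The last line anticipates a limitation that comes up again in the discussion of subtractive Edman degradation: leucine and isoleucine cannot be distinguished by mass alone.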

1288 A

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

Report
Klaus Biemann, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Mass. 02139
0003-2700/86/A358-1288$01.50/0 © 1986 American Chemical Society

The potential usefulness of MS for determining the amino acid sequence of peptides was recognized in the late 1950s (1). Cleavage of a single bond along the backbone produces a fragment whose mass is related to the sum of all side chains and the number of peptide backbone units retained in the fragment. Thus, interpreting the mass spectrum of a suitable peptide derivative to obtain a previously unknown sequence of amino acids is generally rather straightforward. During the 1960s and 1970s, however, the technique's contribution to the elucidation of protein structure was quite limited (2, 3). Determination of the amino acid sequence of small peptides was possible only after extensive chemical conversion to derivatives that were sufficiently volatile to permit their introduction into the spectrometer via a gas chromatograph or the solid-sample probe. One benefit of the gas chromatograph was that it allowed analysis of complex peptide mixtures derived from proteins; the solid-sample probe was useful only for derivatives of single peptides or very simple mixtures. The gas chromatography/mass spectrometry (GC/MS) technique required the more complicated conversion to N-trifluoroethyl-O-trimethylsilyl polyamino alcohols (4), whereas N-acetyl-N,O-permethylated derivatives (5) were generally used for direct introduction. The reactions involved in these derivatizations and the sequence-specific fragmentation upon electron ionization (EI) are illustrated in Scheme I. Most often these mass spectrometric techniques were used to answer specific questions, such as the identification of N-terminal blocking groups and of the sequences of N-terminally blocked peptides. Such blocked peptides are not amenable to manual or automated Edman degradation (6), a chemical method that removes and identifies one amino acid after the other in a stepwise fashion. It is the only widely used practical method for the direct determination of the amino acid sequence of polypeptides and proteins and can generally be carried through 10-30 steps, and even further in favorable cases.

Because both the reductive GC/MS method and the permethylation technique have now been eclipsed by an entirely different approach, they will not be discussed here in detail. It suffices to say that their advantages and disadvantages nicely complemented those of the Edman degradation. Thus, a combination of the two became an efficient strategy for the determination of the primary structure of a few small proteins in the late 1970s and early 1980s. Most notable among them was the membrane protein bacteriorhodopsin, which consists of a single chain of 248 amino acids (7). Because it is very hydrophobic and therefore has unusual solubility properties, it could not be cleaved by enzymes, normally the first step in sequencing. It is soluble only in 80% formic acid, the solvent of choice for chemical cleavage with cyanogen bromide. This reagent splits the peptide bond after methionine, resulting, in this case, in 10 peptides ranging in length from 4 to 50 amino acids, each ending in homoserine (the reaction product of methionine). Only the peptide representing the original C-terminus of the protein does not end with homoserine. Partial acid hydrolysis of each of these 10 peptides, followed by reductive derivatization and GC/MS analysis of the resulting complex mixture, provided the sequences of many di- to hexapeptides. The sequences could be assembled either by lining them up where there was sufficient redundancy in the information or by using partial sequences derived at about the same time by automated Edman degradation. The latter method often did not define the sequence all the way to the C-terminal homoserine of the particular peptide: as these hydrophobic peptides become shorter and shorter during the Edman degradation, they become increasingly soluble in the organic solvents used in the procedure and are quickly lost. On the other hand, the derivatives of nonpolar amino acids are particularly well suited for MS because they easily pass through a gas chromatograph or vaporize from a solid-sample probe.

DNA-protein correlation

Around 1980 the picture changed dramatically in a number of ways. First, the fine art of chemical sequencing of proteins reached a pinnacle when the primary structure of the first protein consisting of more than 1000 amino acids was determined (8).

Scheme I. Derivatization reactions and resulting EI fragments


But this achievement appeared to be rendered obsolete by the development of relatively simple, fast, and reliable methods (9, 10) for determining the base sequence of the gene coding for a given peptide. The amino acid sequence of the peptide could then simply be derived by translating the DNA sequence using the genetic code. This meant that one did not have to determine the amino acid sequence of the protein directly, as long as the corresponding gene could be isolated. Recombinant DNA techniques (genetic engineering) came along at the same time and made this process even simpler and less time-consuming. However, a number of problems remained. Each amino acid is encoded by a codon of three bases in the DNA chain, so for a protein of 1000 amino acids one has to identify correctly at least 3000 consecutive nucleotide bases without a single error of omission, insertion, or misidentification. In addition, because each correct DNA sequence can be translated into three entirely different strings of amino acids (the so-called reading frames), some protein data are necessary to define the correct one. Conventionally, this was achieved by determining the amino acid sequence of short portions at the N-terminal and C-terminal ends of the protein and by redundantly sequencing the entire gene to reveal any errors in an individual sequencing experiment. Here again, MS proved quite useful. Reductive derivatization of a partial acid or enzymatic hydrolysate of even a large protein produced a complex mixture of hundreds of peptide derivatives. By GC/MS it was possible to determine enough sequences of three to six consecutive amino acids, derived from regions randomly located along the protein, to fit them to the amino acid sequence predicted from the DNA results. Deletion or insertion errors in the DNA sequence were easily recognized if some of the peptides matched in one reading frame and others matched in another. The first and still the largest protein to be studied by this technique, the enzyme alanyl-tRNA synthetase, is 875 amino acids long. Its structure was determined by the combination of DNA sequencing and mass spectrometric peptide sequencing in a real-time collaborative effort (11).

FABMS

The second event that occurred in the early 1980s and dramatically changed the role of MS in peptide and protein structure research was the development of fast atom bombardment MS (FABMS) by Barber et al. (12). Although some may still argue that this method is merely a modification of secondary ion MS (SIMS), the use of a liquid matrix (and not so much the use of neutral atoms) suddenly made it possible to ionize large polar molecules simply and without any prior chemical derivatization. This had been achieved previously using either field desorption (FD) (13) or plasma desorption (PD, in which the fission products of 252Cf cause the ionization) (14), but neither technique gained much popularity, for technical reasons: chiefly the inconvenience or unavailability of the instrumentation. It should be pointed out, however, that a group in Osaka, Japan, made considerable progress in the use of FDMS to solve protein structure problems, most notably the identification of single amino acid mutations in abnormal hemoglobins that cause various inherited diseases (15, 16). Peptides have always been the favored test compounds for new ionization techniques, in part because of their structural uniqueness and their wide availability in all sizes, complexities, and amounts, and perhaps most importantly because of their intellectual appeal as essential components of biological systems. Barber's first publication describing the FAB ionization process included a peptide of mol wt 1318 as an example (12). It exhibited an abundant (M + H)+ ion, a characteristic of all FAB mass spectra of polar compounds. Immediately, the mass spectrometric world took notice, and a flood of papers concerned with the mass spectra of peptides followed. An intermediate milestone was reached with the demonstration that even a small protein such as insulin (mol wt 5803.4 for the human variety) can be successfully ionized (17). In addition to a pronounced peak corresponding to the mass of the protonated molecule, the (M + H)+ ion, FAB spectra exhibit some fragment ions. These are generally of low abundance, but their abundance increases as the molecular size of the peptide decreases and, up to a point, as the sample size increases. A further complication in interpreting the FAB spectrum of a peptide is the obscuring effect of low-mass peaks from the liquid matrix (glycerol or thioglycerol) and its cluster ions. Thus, although the literature abounds with papers correlating the FAB spectrum of a peptide with its known sequence, those seriously concerned with determining the primary structure of large peptides or proteins do not attempt to deduce an unknown sequence from a FAB mass spectrum directly. Rather, they use this information in conjunction with other chemical or mass spectrometric fragmentation procedures.

There are, however, many situations in which the molecular weight information alone suffices. The best documented area is a variant of the DNA-protein structure correlation discussed earlier. Whereas the GC/MS method provides short actual sequences that can be matched against those predicted from a DNA sequence, comparing the molecular weights (accurate to within one mass unit) of the peptides in the mixture produced by an enzyme such as trypsin with those predicted from a hypothetical amino acid sequence is an equally stringent test of the correctness of that sequence. This prediction is simple and reliable because trypsin very specifically cleaves the peptide bonds on the carboxyl side of the basic amino acids arginine and lysine. Because these two amino acids occur in all proteins with average frequency, tryptic peptides are generally 2-25 amino acids long. With the exception of the very short peptides, their molecular weights are quite distinct, so it is very rare that more than one peptide of the same molecular weight is predicted from a hypothetical (DNA-derived) amino acid sequence. Errors in a DNA sequence hardly ever result in a transposition of two or more amino acids; they normally cause either a change in reading frame (which results in an entirely different amino acid sequence, including the locations of lysines and arginines) or the replacement of one amino acid by another. As a consequence, these errors in the DNA sequence nearly always lead to a change in the molecular weights of the predicted tryptic peptides.

It follows that it is quite easy to check the correctness of a DNA sequence by the FABMS determination of the molecular weights of the tryptic peptides derived from the corresponding protein (18). Where they match, the DNA sequence is correct, whereas any mismatches directly indicate the type of error (deletion, addition, or misidentification of a base) and pinpoint its location, as illustrated in Figure 1. This information makes correction easy. Clearly, this check-and-balance strategy is most useful when carried out in parallel with DNA sequencing, because it eliminates the need for redundant resequencing of the same region of the gene just to uncover the unavoidable errors made the first time around. Table I lists the proteins whose structures were determined by these strategies.
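The check described above can be sketched in a few lines: predict the tryptic peptides of a hypothetical (DNA-derived) sequence by cutting after every Lys and Arg, compute each monoisotopic (M + H)+ value, and flag predictions that have no observed partner within one mass unit. This is a simplified illustration, not the procedure of Reference 18. It ignores missed cleavages and the usual failure of trypsin to cut before proline, and the sequence, the "observed" values, and the function names are all hypothetical.

```python
# Monoisotopic residue masses (one-letter codes), standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Cut on the carboxyl side of every Lys (K) and Arg (R).

    Simplification: no missed cleavages; the K/R-before-Pro rule is ignored.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide that does not end in K/R
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic (M + H)+: residues + H2O + proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def unmatched_peptides(predicted_sequence, observed_mh, tolerance=1.0):
    """Predicted tryptic peptides with no observed (M + H)+ within tolerance."""
    return [p for p in tryptic_peptides(predicted_sequence)
            if not any(abs(mh_plus(p) - m) <= tolerance for m in observed_mh)]


predicted = "AGLKWTERSVVK"  # hypothetical DNA-derived sequence
print(tryptic_peptides(predicted))                    # ['AGLK', 'WTER', 'SVVK']
print(unmatched_peptides(predicted, [388.3, 591.3]))  # ['SVVK']
```

A peptide that appears in the unmatched list is where one would look for a sequencing error; in practice it is read together with any unexplained observed masses, as in Figure 1.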


Figure 1. Detection of deletion and insertion errors in the gene coding for the β-subunit of glycyl-tRNA synthetase from E. coli (adapted from Reference 18). The molecular weights predicted for reading frame 1 matched those found by FABMS up to amino acid position 324 and then again starting with amino acid 387. Translation of the DNA sequence in reading frame 3 corresponded to an amino acid sequence containing two tryptic peptides whose molecular weights matched two experimentally determined ones, namely, those spanning amino acids 339-346 and 347-360. A shift to reading frame 3 indicates a missing base in the DNA sequence (insertion of a base would shift to reading frame 2). Also, between the codon of amino acid 324 and that of 339 there were only 41 bases rather than 42 (three for each of the 14 intervening codons). Similarly, between the codons of amino acids 360 and 387 there were 79 bases instead of 78. These two errors can be pinpointed further by generating all hypothetical amino acid sequences derived from these short DNA sequence regions in which one base has been consecutively inserted or deleted at each position, until a tryptic peptide appears whose molecular weight corresponds to one found in the tryptic digest of the protein (for details see Reference 18).
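The reading-frame logic of Figure 1 can be made concrete with a toy example. The sketch below translates a DNA string in the three forward reading frames with the standard genetic code, reports which frame contains a given stretch of sequence, and scans single-base insertions to localize a missing base. Frames are numbered 1-3 by starting offset 0-2, matching the caption's convention. The DNA strings are invented, and, as a simplifying assumption, the scan matches the translated sequence directly rather than the tryptic-peptide molecular weights used in Reference 18.

```python
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, ... GGG).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(dna, frame):
    """Translate from offset frame - 1 (frames 1-3); '*' marks a stop codon."""
    start = frame - 1
    return "".join(CODE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


def frames_containing(dna, peptide):
    """Reading frames whose translation contains the peptide."""
    return [f for f in (1, 2, 3) if peptide in translate(dna, f)]


def candidate_insertions(dna, peptide):
    """(position, base) pairs for which inserting one base makes frame 1
    contain the peptide, i.e. possible locations of a missing base."""
    return [(pos, base)
            for pos in range(len(dna) + 1)
            for base in BASES
            if peptide in translate(dna[:pos] + base + dna[pos:], 1)]


# Hypothetical gene encoding Met-Lys-Arg-Trp in frame 1.
correct = "ATGAAACGTTGG"
deleted = "ATGAACGTTGG"      # one A of the Lys codon was missed
inserted = "ATGAAAACGTTGG"   # one spurious A after the Met codon

print(translate(correct, 1))               # MKRW
print(frames_containing(deleted, "M"))     # [1]: upstream still in frame 1
print(frames_containing(deleted, "RW"))    # [3]: downstream shifted to frame 3
print(frames_containing(inserted, "KRW"))  # [2]: an extra base gives frame 2
print(candidate_insertions(deleted, "MKRW"))
# [(3, 'A'), (4, 'A'), (5, 'A'), (5, 'G')]
```

The last result shows why the scan localizes an error to a short region rather than to a single base: several insertions restore the same protein sequence, so the molecular weight check confirms the correction but cannot always identify the exact missing nucleotide.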

Most of them are rather large because it is much easier to determine the sequence of a few thousand bases in a DNA strand than that of a few hundred amino acids in a protein. On the other hand, it is relatively simple to generate the limited protein data, such as the molecular weights of a large fraction of all tryptic peptides, needed to check the DNA sequence.

Table I. Protein sequences deduced by a combination of DNA sequencing and FABMS

Protein | No. of amino acids | Reference
Gln-tRNA synthetase | 550 | a
Gly-tRNA synthetase (α and β) from E. coli | 303 + 687 | b
Met-tRNA synthetase from yeast | 751 | c
His-tRNA synthetase from E. coli | 324 | d
Glu-tRNA synthetase from E. coli | 471 | e
Endo-β-N-acetylglucosaminidase H | 313 | f
Rabbit muscle creatine phosphokinase | 380 | g
Rabbit brain creatine phosphokinase | 380 | h
Protein S from Myxococcus xanthus | 173 | i
Human glucose transporter | 492 | j
Cytosolic phosphoenolpyruvate carboxykinase | 626 | k

a Biemann, K. Int. J. Mass Spectrom. Ion Phys. 1982, 45, 183-94.
b Webster, T. A. et al. J. Biol. Chem. 1983, 258, 10637-41.
c Gibson, B. W. et al. Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 1956-60.
d Freedman, R. et al. J. Biol. Chem. 1985, 260, 10063-68.
e Breton, R. et al. J. Biol. Chem., in press.
f Robbins, P. W. et al. J. Biol. Chem. 1984, 259, 7577-83.
g Putney, S. et al. J. Biol. Chem. 1984, 259, 14317-20.
h Pickering, L. et al. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 2310-14.
i Takao, T. et al. J. Biol. Chem. 1984, 259, 6105-9.
j Mueckler, M. et al. Science 1985, 229, 941-45.
k Beale, E. G. et al. J. Biol. Chem. 1985, 260, 10748-60.

Genetically engineered proteins

Another area of current interest is the verification of the structure of a protein that has been prepared by recombinant DNA techniques. The large-scale synthesis of biologically active proteins that occur in extremely low concentrations in higher organisms can now be accomplished by inserting a suitably manipulated plasmid into a single-cell system that can be cultured in large quantities. Well-known examples are the production of human insulin, interferons, and interleukins. Many other proteins, and numerous modifications of them, are produced for research purposes. It is, however, important to verify that the new host organism produces the correct protein and that it is not modified in any way (other than a desired one). This verification can be accomplished by the same techniques as outlined above for DNA-derived protein sequences and is sometimes referred to as FAB mapping (19, 20). Essentially, one checks whether the peptides produced by specific cleavage (either with trypsin or with cyanogen bromide) have the expected molecular weights. This principle was demonstrated by Morris et al. (20) with insulin as a test compound and was used by Richter et al. to show that the protein eglin c, when produced by recombinant techniques in E. coli, contains an additional acetyl group (21). This modification was first recognized when the molecular weight determined by FABMS was found to be 8134 instead of the 8092 calculated from the known sequence of the natural protein, a difference of 42 mass units. Cleavage of the biosynthetic protein with various enzymes finally showed that this extra acetyl group was attached to the N-terminal threonine. More recently, site-specific amino acid replacements in bacteriorhodopsin were produced in H. G. Khorana's group via the total synthesis of the corresponding genes and expressed in E. coli (22). Here again, determination of the molecular weights of the peptides produced by cleavage of the native protein and of those obtained from the proteins produced by the synthetic genes corroborated the correctness of the desired modifications. As an example, Figure 2 shows the (M + H)+ ion regions of the N-terminal cyanogen bromide fragment of native bacteriorhodopsin, which begins with pyroglutamic acid, and of a modified analogue in which the pyroglutamic acid was deleted and replaced by methionine (which had been cleaved off as homoserine). The mass difference between the two (M + H)+ ions, at m/z 2191.4 and m/z 2080.2, respectively, corresponds to pyroglutamic acid minus H2O, that is, 111 mass units (23).
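At its core, FAB mapping compares an observed molecular weight with the one expected from the sequence and interprets the difference. The sketch below does this for the two examples in the text using a small, hypothetical lookup table of mass differences (acetyl, C2H2O, 42.0106; pyroglutamyl residue, i.e., pyroglutamic acid minus H2O, 111.0320). The tolerance of 0.5 mass units is an assumption, chosen loosely enough that it does not matter whether the quoted values are monoisotopic or average masses.

```python
# Hypothetical lookup of mass differences (monoisotopic, mass units).
KNOWN_SHIFTS = {
    "acetyl (+C2H2O)": 42.0106,
    "pyroglutamyl residue (pyroglutamic acid - H2O)": 111.0320,
}


def explain_shift(observed, expected, tolerance=0.5):
    """Names of known shifts matching |observed - expected| within tolerance."""
    delta = abs(observed - expected)
    return [name for name, shift in KNOWN_SHIFTS.items()
            if abs(delta - shift) <= tolerance]


# Recombinant eglin c (FABMS) vs. value calculated from the natural sequence.
print(explain_shift(8134, 8092))      # ['acetyl (+C2H2O)']
# Native vs. modified N-terminal CNBr fragment of bacteriorhodopsin.
print(explain_shift(2191.4, 2080.2))  # ['pyroglutamyl residue (pyroglutamic acid - H2O)']
```

A match only suggests what kind of modification is present. As with eglin c, locating it still requires cleaving the protein and checking which fragment carries the extra mass.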

Figure 2. The high-mass region of the FAB spectrum of the N-terminal peptide from (a) native and (b) modified bacteriorhodopsin (adapted from Reference 23)

Direct mass spectrometric sequencing of proteins

An entirely different approach has to be taken if the amino acid sequence of a protein is to be determined directly, without isolating and sequencing the corresponding gene. For smaller proteins, direct sequencing is more efficient because finding and identifying the gene represents the major effort and is just as difficult for small proteins as for large ones. The conventional approach applies the automated Edman procedure to the protein itself and to certain cleavage products. However, FABMS is becoming an increasingly useful alternative or complementary approach.

As mentioned earlier, a conventional FAB spectrum of a typical peptide produced by enzymatic or chemical cleavage does not contain sufficiently abundant fragment ions to define its amino acid sequence. Various methods have been developed to overcome this shortcoming, and they fall into two groups: (1) sequential chemical or enzymatic cleavage of the peptide bonds followed by FABMS of the truncated peptides, or (2) cleavage in the mass spectrometer by collision of the (M + H)+ ion with a neutral gas and mass analysis of the resulting fragment ions (tandem MS, also referred to as MS/MS).

In methods of the first type, stepwise shortening of the peptide by manual Edman degradation and redetermination of the molecular weight of the truncated peptide permit identification of the N-terminal amino acid from the difference in mass (however, leucine cannot be distinguished from isoleucine, because they have the same mass). This can be repeated step by step until the signal for the new (M + H)+ ion disappears in the background (usually 3-10 steps at most, depending on the amount of peptide one starts with). The most important advantage of this mass spectrometric subtractive Edman degradation is that it can be carried out on mixtures of peptides. It is generally possible to correlate the first set of molecular weights with the second, the second with the third, and so on, because they must represent sets of pairs that differ in mass by the residue mass of one of the 20 amino acids occurring in proteins. The Osaka group demonstrated this first using FDMS (24), and Williams et al. demonstrated it with FABMS (25). Shortening of the peptide chain from the carboxyl end with carboxypeptidases can be followed in the same way, except that the interpretation is more complicated (25, 26), because the removal of C-terminal amino acids continues as soon as one has been cleaved. It is thus necessary to follow the appearance and disappearance of (M + H)+ ions over time. This is not easy for an unknown sequence because the rate of removal depends on the nature of the amino acid to be cleaved off next.

Today, the most promising technique is MS/MS (for a general description of this methodology see Reference 27). It allows complete sequencing of medium-sized peptides, such as those in tryptic or other enzymatic digests, and can be carried out on simple mixtures (i.e., the peptides do not have to be fully separated and purified). In the author's laboratory, crude HPLC fractions containing between 1 and 10 peptides, ranging in molecular weight from 500 to 2500, have provided complete sequence information. Of the instrumentation currently available, four-sector magnetic tandem mass spectrometers have the necessary combination of precursor, (M + H)+, and product ion resolution and sufficient sensitivity to provide complete sequence information on 0.1-1.0 nmol of peptide (28). Triple quadrupole mass spectrometers are also suitable for this purpose, except that their mass range, resolution, precursor ion selection, and product ion mass accuracy are considerably lower (29). Fourier transform mass spectrometers are potentially useful (30), but the instruments currently available commercially do not yet lend themselves to routine analyses of this type. Figure 3 illustrates the layout and operation of a magnetic deflection tandem mass spectrometer currently used in the author's laboratory for protein sequencing.

Figure 3. Schematic diagram of a four-sector magnetic deflection mass spectrometer (JEOL HX110/HX110) indicating the ionization of a mixture, consecutive selection of the (M + H)+ ions of its components in MS-1 for fragmentation by collision with a neutral gas (He), and mass analysis of the daughter ions in MS-2

An example from the structure determination of the redox protein thioredoxin from Chromatium vinosum outlines the strategy followed and the interpretation of the data (28, 31). Digestion with trypsin cleaves the protein into 11 peptides. These are fractionated (but not completely separated) by HPLC. The FAB spectra of the fractions (measured by MS-1) indicate the molecular weights of the peptides present, and their (M + H)+ ions are then subjected to collisional activation with He while the second mass spectrometer (MS-2) scans the resulting fragment (daughter) ion spectrum. Figure 4 shows the molecular ion region of the normal FAB spectrum of one of the fractions, indicating the presence of two peptides whose monoisotopic (12C only) (M + H)+ ions have m/z 1762.9 and 2165.0, respectively. These ions were consecutively centered at the exit slit of MS-1 and fragmented in the collision cell, and the fragment ion spectrum was recorded by scanning MS-2. The spectrum resulting from the larger peptide is shown in Figure 5.

Figure 4. FABMS of an HPLC fraction of the tryptic digest of thioredoxin isolated from Chromatium vinosum (adapted from Reference 31)

Figure 5. Daughter ion spectrum of the (M + H)+ ion of m/z 2165.0 in Figure 4. Only the B and Y + 2H ion series are labeled; the mass assignments are all within ±0.5 mass units. Arrows pointing to the right read the N-terminal sequence (B ions), and arrows pointing to the left read the C-terminal sequence (Y + 2H ions). Cys(cm) stands for carbamidomethylated cysteine, Xle for either leucine or isoleucine (adapted from Reference 31). Partial sequence label surviving from the figure: ... Trp Ala Asp Trp Cys(cm) Gly Pro Cys(cm) Lys

Although the peptide bonds can fragment in a number of ways, only two ion types, the N-terminal B ions and the C-terminal Y + 2H ions, are labeled. Their formation is depicted in Scheme II, without indication of the details of the fragmentation mechanism (the nomenclature is adapted from Reference 32). The sequence is deduced from the mass differences of pairs of ions that correspond to the incremental mass (molecular weight minus H2O) of the amino acid found in this position. A number of features can be noted. First, the sequence of 14 of the 18 amino acids present can be deduced directly from these data. Second, the absence of the Y14 + 2H ion does not matter, because the B4 ion indicates Pro-Val and not the reverse. Third, the sum of the masses of the first two and of the last two amino acids can be calculated from the B2 and Y16 + 2H ions. A simple experiment reveals the complete structure of the peptide, which is shown at the top of Figure 5. Treatment of the peptide with phenylisothiocyanate (PITC), the Edman reagent, followed by acid treatment to complete the cleavage step, provides a truncated peptide. The FAB mass spectrum of the product reveals its (M + H)+ ion, and the difference in mass indicates the amino acid that was cleaved off. In this case, it was found to correspond to serine, and it follows that the second amino acid must be proline to account for the mass of the B2 and Y16 + 2H ions. The two C-terminal amino acids must be Cys(cm)-Lys, because the last one has to be a tryptic cleavage site; the remaining mass fits only cysteine modified by reaction with iodoacetamide, a step necessary for digestion with trypsin. The sequences of the other 10 peptides were deduced in the same way, except that the overlapping ion series extended to the N-terminus and the C-terminus, respectively, thus eliminating the need for the PITC treatment step.
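Reading a sequence from B and Y + 2H ions, or from a subtractive Edman step, comes down to looking up mass differences in a table of residue masses. The sketch below computes both ion ladders for a peptide and converts consecutive differences back into residues. Several assumptions apply. The ion-mass conventions (singly charged B ion = residues + proton; Y + 2H ion = residues + H2O + proton) are the usual ones, but they are stated here rather than taken from the text. The ±0.5 tolerance mirrors the mass accuracy quoted for Figure 5. The peptide is hypothetical and is not the thioredoxin peptide, and Cys(cm) is not in the table.

```python
# Monoisotopic residue masses (one-letter codes), standard values.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def b_ions(peptide):
    """B1..B(n-1): N-terminal fragments, sum of residues + proton."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y2h_ions(peptide):
    """(Y + 2H)1..(Y + 2H)(n-1): C-terminal fragments, residues + H2O + proton."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def mh_plus(peptide):
    """Monoisotopic (M + H)+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def read_ladder(ions, tolerance=0.5):
    """Residues implied by differences between consecutive (ascending) ions.

    Leu/Ile, and at this tolerance Lys/Gln, cannot be told apart, so such
    steps are reported as e.g. 'I/L'; '?' means no residue fits.
    """
    steps = []
    for lighter, heavier in zip(ions, ions[1:]):
        delta = heavier - lighter
        hits = sorted({aa for aa, m in RESIDUE_MASS.items()
                       if abs(delta - m) <= tolerance})
        steps.append("/".join(hits) if hits else "?")
    return steps


peptide = "GASPVLEK"  # hypothetical tryptic peptide
print(read_ladder(b_ions(peptide)))    # ['A', 'S', 'P', 'V', 'I/L', 'E']
print(read_ladder(y2h_ions(peptide)))  # ['E', 'I/L', 'V', 'P', 'S', 'A']
# Subtractive Edman step: (M + H)+ after vs. before removing one residue.
print(read_ladder([mh_plus("ASPVLEK"), mh_plus("GASPVLEK")]))  # ['G']
```

The B ladder reads the sequence from the N-terminus and the Y + 2H ladder from the C-terminus, which is why overlapping series can define a complete sequence, while the terminal residues not covered by either ladder must be inferred from sums, as in the Ser-Pro and Cys(cm)-Lys reasoning above.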

It appears that MS/MS, particularly with the data quality of four-sector instruments, will be a powerful tool for determining the primary structure of proteins and for related problems. Compared with Edman degradation, which is simpler, less expensive, and at present somewhat more sensitive, MS/MS has three strengths. It eliminates the need for complete separation of peptide mixtures, which at least partly makes up for its lower sensitivity. It is applicable to N-blocked peptides and to peptides containing modified amino acids. It also yields both N- and C-terminal sequence information.

References

(1) Biemann, K.; Gapp, F.; Seibl, J. J. Am. Chem. Soc. 1959, 81, 2274.
(2) Biemann, K. In Biochemical Applications of Mass Spectrometry; Waller, G. R., Ed.; John Wiley & Sons: New York, 1972; pp. 405-28.
(3) Biemann, K. In Biochemical Applications of Mass Spectrometry, First Supplementary Volume; Waller, G. R.; Dermer, O. C., Eds.; John Wiley & Sons: New York, 1980; pp. 469-525.
(4) Carr, S. A.; Herlihy, W. C.; Biemann, K. Biomed. Mass Spectrom. 1981, 8, 51-61.
(5) Vilkas, E.; Lederer, E. Tetrahedron Lett. 1968, 3089.
(6) Niall, H. D. Meth. Enzymol. 1973, 27, 942.
(7) Khorana, H. G.; Gerber, G. E.; Herlihy, W. C.; Gray, C. P.; Anderegg, R. J.; Nihei, K.; Biemann, K. Proc. Natl. Acad. Sci. U.S.A. 1979, 76, 5046-50.
(8) Fowler, A. V.; Zabin, I. J. Biol. Chem. 1978, 253, 5521.
(9) Maxam, A. M.; Gilbert, W. Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 560-64.
(10) Sanger, F.; Coulson, A. R.; Barrell, B. G.; Smith, A. J. H.; Roe, B. A. J. Mol. Biol. 1980, 143, 161-78.
(11) Putney, S. D.; Royal, N. J.; Neuman de Vegvar, H.; Herlihy, W. C.; Biemann, K.; Schimmel, P. R. Science 1981, 213, 1497-1501.
(12) Barber, M.; Bordoli, R. S.; Sedgwick, R. D.; Tyler, A. N. J. Chem. Soc., Chem. Commun. 1981, 325-27.
(13) Beckey, H. D. Field Ionization Mass Spectrometry; Pergamon: Oxford, England, 1971; p. 42.
(14) Macfarlane, R. D. Anal. Chem. 1983, 55, 1247-64 A.
(15) Matsuo, T.; Matsuda, H.; Katakuse, I.; Wada, Y.; Fujita, T.; Hayashi, A. Biomed. Mass Spectrom. 1981, 8, 25-30.
(16) Wada, Y.; Hayashi, A.; Fujita, T.; Matsuo, T.; Katakuse, I.; Matsuda, H. Biochim. Biophys. Acta 1981, 667, 233-41.
(17) Barber, M.; Bordoli, R. S.; Elliott, G. J.; Sedgwick, R. D.; Tyler, A. N.; Green, B. N. J. Chem. Soc., Chem. Commun. 1982, 936-38.
(18) Gibson, B. W.; Biemann, K. Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 1956-60.
(19) Biemann, K. In Methods in Protein Sequence Analysis; Elzinga, M., Ed.; Humana Press: Clifton, N.J., 1982; pp. 279-88.
(20) Morris, H. R.; Panico, M.; Taylor, G. W. Biochem. Biophys. Res. Commun. 1983, 117, 299-305.
(21) Richter, W. J.; Raschdorf, F.; Maerki, W. In Mass Spectrometry in the Health and Life Sciences; Burlingame, A. L.; Castagnoli, N., Eds.; Elsevier B. V.: Amsterdam, 1985; pp. 193-208.
(22) Khorana, H. G. Ann. N.Y. Acad. Sci. 1986, in press.
(23) Allmaier, G.; Chao, B. H.; Khorana, H. G.; Biemann, K. Presented at the 34th Annual Conference on Mass Spectrometry and Allied Topics, Cincinnati, Ohio, June 8-13, 1986.
(24) Shimonishi, Y.; Hong, Y.-M.; Kitagishi, T.; Matsuo, T.; Matsuda, H.; Katakuse, I. Eur. J. Biochem. 1980, 112, 251-64.
(25) Bradley, C. V.; Williams, D. H.; Hanley, M. R. Biochem. Biophys. Res. Commun. 1982, 104, 1223-30.
(26) Takao, T.; Hitouji, T.; Aimoto, S.; Shimonishi, Y.; Hara, S.; Takeda, T.; Takeda, Y.; Miwatani, T. FEBS Lett. 1983, 152, 1-5.
(27) Tandem Mass Spectrometry; McLafferty, F. W., Ed.; John Wiley & Sons: New York, 1983.
(28) Biemann, K.; Martin, S. A.; Scoble, H. A.; Johnson, R. S.; Papayannopoulos, I. A.; Biller, J. E.; Costello, C. E. In Mass Spectrometry in the Analysis of Large Molecules: Proceedings of Texas Symposium on Mass Spectrometry; McNeal, C., Ed.; John Wiley & Sons: Sussex, England, 1986; pp. 131-49.
(29) Hunt, D. F.; Bone, W. M.; Shabanowitz, J.; Rhodes, J.; Ballard, J. M. Anal. Chem. 1981, 53, 1704-8.
(30) Cody, R. B.; Amster, I. J.; McLafferty, F. W. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 6367-70.
(31) Biemann, K. In Methods of Protein Sequence Analysis, VI; Walsh, K. A., Ed.; Humana Press, in press.
(32) Roepstorff, P.; Fohlman, J. Biomed. Mass Spectrom. 1984, 11, 601.

Scheme II. Two of the most common fragmentation processes along the peptide bond

Note: A tetrapeptide is used as an example; cleavage between the second and third amino acids is shown. The same fragmentation can occur at any peptide bond.
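The fragmentation summarized in Scheme II can be made concrete with a short mass calculation. The Python sketch below rests on assumptions that go beyond the caption: it treats the two processes as the backbone cleavages that give N-terminal b-type and C-terminal y-type ions, in the nomenclature of Roepstorff and Fohlman (32). It also uses standard monoisotopic residue masses, and the tetrapeptide "GAVL" and the function names `fragment_ions` and `precursor_mh` are hypothetical. Each cleavage yields a complementary pair of singly protonated ions whose masses sum to [M+H]+ plus one proton. This is why a single MS/MS spectrum can carry both N- and C-terminal sequence information.

```python
"""Sketch: singly protonated b- and y-type fragment ions of a linear peptide.

Assumption (not stated in Scheme II): the two common backbone cleavages are
those producing N-terminal b ions and C-terminal y ions.
"""

# Monoisotopic residue masses (free amino acid minus H2O), in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton
WATER = 18.010565  # monoisotopic mass of H2O (N-terminal H plus C-terminal OH)


def fragment_ions(sequence: str) -> list[tuple[int, float, float]]:
    """Return (i, b_i, y_(n-i)) for each of the n-1 peptide bonds.

    Cleaving the bond after residue i leaves residues 1..i in the b ion
    and residues i+1..n, plus H2O and a proton, in the y ion.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence.upper()]
    n = len(masses)
    if n < 2:
        raise ValueError("need at least two residues to break a peptide bond")
    ions = []
    for i in range(1, n):
        b = sum(masses[:i]) + PROTON
        y = sum(masses[i:]) + WATER + PROTON
        ions.append((i, b, y))
    return ions


def precursor_mh(sequence: str) -> float:
    """Mass of the singly protonated intact peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER + PROTON


if __name__ == "__main__":
    peptide = "GAVL"  # hypothetical tetrapeptide, as in Scheme II's example size
    print(f"[M+H]+ = {precursor_mh(peptide):.4f}")
    for i, b, y in fragment_ions(peptide):
        print(f"bond {i}/{i + 1}: b{i} = {b:.4f}  y{len(peptide) - i} = {y:.4f}")
```

Reading a spectrum works in the opposite direction. The mass difference between consecutive b ions, or between consecutive y ions, equals one residue mass, so each ladder spells out the sequence from its own terminus. Residues with identical masses (Leu and Ile in the table above) cannot be told apart by this difference alone.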

Klaus Biemann obtained his Ph.D. in chemistry from the University of Innsbruck in his native Austria. He has been a professor of chemistry at MIT since 1963. In 1958, he began his work on structure determination by mass spectrometry, particularly of complex molecules of biological interest.
