Integration of mass spectrometry in analytical biotechnology


Anal. Chem. 1991, 63, 2802-2824

PERSPECTIVE: ANALYTICAL BIOTECHNOLOGY

Integration of Mass Spectrometry in Analytical Biotechnology

Steven A. Carr,* Mark E. Hemling, Mark F. Bean, and Gerald D. Roberts

Department of Physical and Structural Chemistry, SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania 19406

Mass spectrometry (MS) has become an indispensable tool for peptide and protein structure analysis because of three unique capabilities that enable it to be used to solve structural problems not easily handled by conventional techniques. First, MS is able to provide accurate molecular weight information on low-picomole amounts of peptides and proteins independent of covalent modifications that may be present. Second, this information is obtainable for peptides present in complex mixtures such as those that result from a proteolytic digest of a protein. Third, by using tandem MS, partial to complete sequence information may be obtained for peptides containing up to 25 amino acid residues, even if the peptides are present in mixtures. Sensitivity and speed of the MS-based approaches now equal (and in some cases exceed) that of Edman-based sequence analysis. In this perspective we discuss how MS, tandem high-performance MS, and on-line liquid chromatography/MS using fast atom bombardment or electrospray ionization have been integrated with more conventional techniques in order to increase the accuracy and speed of peptide and protein structure characterization. The expanding role of matrix-assisted laser desorption MS in protein analysis is also described. The unique niche that MS occupies for locating and structurally characterizing posttranslational modifications of proteins is emphasized. Examples chosen from the authors' laboratory illustrate how MS is used to sequence blocked proteins, define N- and C-terminal sequence heterogeneity, locate and correct errors in DNA- and cDNA-deduced protein sequences, identify sites of deamidation, isoaspartyl formation, phosphorylation, oxidation, disulfide bond formation, and glycosylation, and define the structural class of carbohydrate at specific attachment sites in glycoproteins.

I. INTRODUCTION

1. First and Second Generation Biopharmaceuticals.

The revolution in molecular biology has made it possible not only to produce large quantities of naturally occurring proteins but also to alter the nucleotide sequence of a gene at will (and, therefore, the sequence of the encoded protein) to produce novel, "second-generation" proteins with altered properties. Many "wild-type" proteins, the first-generation products, are being developed for therapy, diagnosis, and prophylaxis of human diseases. Several, including human insulin, human growth factor, α interferon, hepatitis B vaccine, tissue plasminogen activator, and erythropoietin, are already in use. Others, such as epidermal growth factor (for use in wound healing) and granulocyte-macrophage colony stimulating factor (which increases the concentration of circulating white blood cells), are in clinical trials or have recently been approved. In addition to proteins with obvious therapeutic potential, numerous proteins with physiologically undesirable effects, such as HIV-protease (1, 2), are also being produced by recombinant techniques to study their mechanism of action and to develop therapeutically useful antagonists.

The increased emphasis on developing second-generation recombinant proteins has resulted, in part, from problems associated with the production, characterization, and quality control of multigram amounts of recombinant wild-type proteins. Second-generation molecules are generally smaller than the native protein and are "engineered" to have improved properties such as longer serum half-life, greater potency, higher stability, and lower toxicity. Protein engineering is also used to remove potential sites of chemical or enzymatic modification (such as glycosylation sites or cysteines not involved in disulfide bonds) which complicate production, purification, and characterization of these molecules. Conformationally restricted synthetic peptides, often incorporating non-peptidic structural elements, are also being produced as mimics of their large-molecule counterparts. If efficacious, such molecules could be far easier and less expensive to produce on a large scale. Automated chemical synthesis of even small proteins is now a viable alternative to expression in heterologous cells, particularly where extensive structure/function studies are intended (3, 4). Peptide synthesis is also a key technology for epitope mapping, a procedure whereby synthetic peptides corresponding to known partial sequences of a target protein of interest are used to produce antibodies. These antibodies are used to probe the relationship of a specific part of the protein's structure to its function and to develop diagnostic reagents for the detection of that protein (5).

2. Characterization: The Challenge to the Analytical Biochemist.

Once produced, the identity, purity, potency, and safety of biopharmaceutical compounds must be demonstrated to the regulatory agencies before they can be used in humans (see refs 6-8 and references therein). Structural characterization and purity assessment of these molecules present formidable challenges to the analytical biochemist due to a number of factors. Recombinant proteins are produced in living cells by fermentation or cell culture, and complex multistep purification procedures are required to eliminate DNA, endotoxins, host cell proteins, protein aggregates, and viruses that may be present (6-8). Harsh purification conditions can create populations of proteins with structural variants such as deamidated or oxidized amino acid residues. Heterogeneity may also arise by genetic instability of the host

*To whom correspondence and reprint requests should be addressed.
Dedicated to Professor Klaus Biemann on the occasion of his 65th birthday.
0003-2700/91/0363-2802$02.50/0 © 1991 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 63, NO. 24, DECEMBER 15, 1991


Table I. Techniques for Primary Structure Analysis of Peptides and Proteins

technique                 sensitivity, pmol   comments
amino acid analysis       1-25                modified amino acids destroyed or unidentified; accuracy of composition decreases with increasing Mr
SDS-PAGE^a of proteins    0.1-100             Mr generally within 10%; noncovalent oligomers and aggregates may be detected; difficult to analyze Mr ...
MS of proteins            0.1-100
Edman microsequencing     5-50
MS and LC/MS              5-100
tandem MS (MS/MS)         10-1000

... 1000 Da) parent ions (87, 90). As a result, more than one parent ion may be selected, and uncertainty is introduced in the assignment of product ion masses. Detailed discussions of other important analytical distinctions among these tandem analyzers have been presented elsewhere (39, 56, 57, 87, 91). The FAB and high-energy tandem MS data presented here were obtained with a VG ZAB-SE 4F, a first-generation tandem double-focusing mass spectrometer equipped with a conventional electron multiplier detector (75, 87). Analytically useful product ion spectra for peptides with Mr ≤2500 are


routinely obtained using 100-1000 pmol of peptide (87). Sensitivity in tandem MS may be increased by factors of 20-50 using integrating electrooptical array detectors in a focal plane of the instrument (see ref 56 and references therein). Each double-focusing mass spectrometer of the four-sector instrument consists of a magnet sector (B) and an electric sector (E). These analyzers separate ions on the basis of differences in their momenta and kinetic energies, respectively (66). Mass- and energy-selected ions are fragmented by high-energy CID with helium or argon in a collision chamber located between the two mass spectrometers. Spectra of the resulting product ions are obtained by scanning the magnet and electric sectors of the second mass spectrometer together such that the ratio of the respective field strengths is held constant: the so-called linked scan at constant B/E (66). Linked scanning can also be accomplished on a single (vs a tandem) double-focusing mass spectrometer to produce product ion information (66). In this case, parent ions are fragmented in a collision cell located between the ion source and the first analyzer. An advantage of the B/E linked-scan experiment on a single mass spectrometer is that the sensitivity for product ion detection is generally better than in a four-sector tandem MS experiment (92). Product-ion resolution in a B/E linked scan is generally good (ca. 1000), but the resolution of parent ion selection on the single instrument is much lower, typically ≤150. With this low selectivity the potential is high for obtaining product ion spectra that are composites of the product ions of parents of different mass. In addition, interference from matrix-related ions can be significant. Therefore, linked scanning at constant B/E on a single instrument is useful only for the analysis of pure compounds or simple mixtures in which the parent ions have significantly different masses (e.g., m/z 1000 vs m/z 1007).
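The selectivity argument can be checked with a short calculation. The sketch below is illustrative only: it assumes the common definition of resolving power, R = m/Δm (taking m as the mean of the two masses), and assumes a parent-ion selection resolution of about 150 for the single instrument. The function names are hypothetical and do not come from the original work.

```python
def required_resolution(mz_a: float, mz_b: float) -> float:
    """Resolving power m/dm needed to separate two parent ions (assumed definition)."""
    if mz_a == mz_b:
        raise ValueError("ions of identical m/z cannot be separated")
    mean_mz = (mz_a + mz_b) / 2.0
    return mean_mz / abs(mz_a - mz_b)


def can_select(mz_a: float, mz_b: float, available_resolution: float) -> bool:
    """True if the available parent-ion resolution is at least the required one."""
    return available_resolution >= required_resolution(mz_a, mz_b)


if __name__ == "__main__":
    # Pair quoted in the text: m/z 1000 vs m/z 1007 needs R of roughly 143.
    print(round(required_resolution(1000.0, 1007.0), 1))
    # An assumed selection resolution of 150 is just sufficient for that pair...
    print(can_select(1000.0, 1007.0, 150.0))
    # ...but not for parents only 1 Da apart, which need R of about 1000.
    print(can_select(1000.0, 1001.0, 150.0))
```

Under these assumptions the m/z 1000 and 1007 parents sit close to the limit of what a resolution of 150 can separate, which is consistent with the text treating them as an example of "significantly different" masses.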
The use of two double-focusing mass spectrometers in tandem reduces these problems, as the first double-focusing combination provides unit mass selection of parents weighing up to several thousand daltons. Electrospray and low-energy tandem MS data presented here were acquired with a Sciex triple-quadrupole mass analyzer with pneumatically assisted electrospray (45, 93, 94). The principles of tandem MS using triple-quadrupole mass spectrometers have been reviewed recently (57).

6. Microchemical Derivatization.

Before the advent of soft ionization methods like FAB, biomolecules had to be chemically derivatized to increase their volatility and thermal stability in order to permit analysis by conventional ionization methods such as electron and chemical ionization (95, 96). Although no longer required, derivatization is still widely used in conjunction with soft ionization MS techniques (1) to detect the presence of specific functional groups (95, 96), (2) to clarify the structure of fragment or product ions (90, 97, 98), and (3) to enhance molecular ion signals (99, 100). For example, MS of a peptide prior to and after N-acetylation can be used to assess the presence (or absence) of a free amino terminus by the diagnostic shift of 42 Da (97, 98). In addition, if fragment ions are present, any that are derived from the N-terminus will be shifted upward in mass by 42 Da, thereby aiding in interpretation of the spectrum. Caution must be exercised since Lys residues will acetylate, at least partially, at the ε-NH2 of the side chain. Acetylation can also be used to differentiate between Lys and Gln, which have the same nominal mass, since the side chain of Gln is an amide and will not acylate. Similarly, the number of carboxyl groups may be determined and C-terminal ions distinguished from N-terminal ions by esterification (90). Carboxylates may also be labeled via enzyme-catalyzed introduction of 18O into the nascent carboxyl of the peptide bond cleaved (101). Comparison of the MS or tandem MS spectra of 18O-labeled peptides has been used to clarify the origin of sequence ions in the spectra (101, 102).
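The acetylation test reduces to simple arithmetic on the observed mass shift. The following sketch assumes a monoisotopic shift of 42.0106 Da per acetyl group (C2H2O) and an arbitrary matching tolerance. The function name and the (M + H)+ values in the usage lines are hypothetical illustrations, not data from the article.

```python
ACETYL_SHIFT = 42.0106  # assumed monoisotopic mass added per acetyl group (C2H2O), Da


def acetyl_groups_added(mh_before: float, mh_after: float, tol: float = 0.3) -> int:
    """Number of acetyl groups implied by the shift in (M + H)+ after acetylation.

    Raises ValueError if the shift is not close to a whole multiple of 42 Da.
    """
    delta = mh_after - mh_before
    n = round(delta / ACETYL_SHIFT)
    if n < 0 or abs(delta - n * ACETYL_SHIFT) > tol:
        raise ValueError(f"shift of {delta:.2f} Da is not a multiple of 42 Da")
    return n


if __name__ == "__main__":
    # Hypothetical peptide: no shift suggests a blocked N-terminus and no Lys.
    print(acetyl_groups_added(1000.5, 1000.5))
    # One acetyl group: a free N-terminus (or one Lys on a blocked peptide).
    print(acetyl_groups_added(1000.5, 1042.5))
    # Two acetyl groups: e.g. a free N-terminus plus one Lys side chain.
    # A Gln at the same position would not contribute a shift.
    print(acetyl_groups_added(1000.5, 1084.5))
```

Because Lys may acetylate only partially, a real spectrum can show several of these shifted species at once; the count then gives the maximum number of reactive amino groups.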


Derivatization may also be used to direct fragmentation in tandem MS by placing a fixed charge on one of the two termini or on a specific amino acid residue (103-105). In underivatized peptides, positive charge tends to be localized on the side chains of basic amino acids such as Arg and Lys rather than on the less basic amide nitrogen atoms along the backbone. Fragmentation of peptides in MS and tandem MS is believed to occur at bonds remote from these sites of protonation (106). Thus tryptic peptides, with C-terminal Arg, generally yield a dominant series of ions retaining the charged, C-terminal residue (88, 89). Similarly, when basic residues are present at or near the N-terminus of a peptide, fragment ions derived from the N-terminus of the molecule dominate. In the absence of strongly basic residues, a mixture of N-terminal and C-terminal ions is observed (88, 89). Unfortunately, these ion series are often incomplete even in tandem MS (see sections III.1 and III.6), and ambiguities may arise in the interpretation of product ion spectra with regard to the sequence deduced. Several groups have employed N- or C-terminal specific derivatization to fix a positive charge on the molecule; such a fixed charge has a stronger directing influence on the fragmentation than any basic residues that may be present, and therefore the appearance of the spectra is often dramatically altered (103-105). For example, the high-energy CID mass spectra of peptides derivatized at their N-termini with a trimethylammonium acetyl group exhibit only N-terminal product ions even if the peptides contain C-terminal or internal basic residues (104). In addition, side chain specific cleavages that permit the differentiation of isomeric amino acids are enhanced in these spectra.

III. ROLES OF MASS SPECTROMETRY IN PRIMARY STRUCTURE ANALYSIS

1. Sequence Analysis of Blocked Peptides in Mixtures.

Enzymatic acylation of the N-termini of peptides and proteins is a common posttranslational event. Most intracellular proteins in eukaryotes are Nα-acetylated (25), and an increasing number of proteins are being identified that have long-chain fatty acids, particularly myristic acid, bound to their N-termini (25, 108-110). Nα-acylation is also often used to protect synthetic peptides from the action of aminoproteases. Absence of a free primary or secondary N-terminus prevents straightforward use of Edman degradation for sequencing. In contrast, the mass spectrum of a blocked peptide will reveal the nature of the blocking group by the mass difference between the observed (M + H)+ ion and that calculated for the composition of the peptide, and a tandem MS experiment can be used to sequence the blocked peptide de novo. On the basis of the synthesis and amino acid composition analysis, the expected Mr of the N-acetylated, C-terminal amidated peptide shown in Figure 1 is 1160.5 (see footnote ‡ for the residue masses of the amino acids). The mass of the most abundant (M + H)+ ion observed in the FABMS data is at m/z 1161.5, suggesting that this component has the desired composition and blocking substituents.

‡Amino acid single letter codes and residue masses (Da) (monoisotopic): Ala (A) 71.04; Arg (R) 156.10; Asn (N) 114.04; Asp (D) 115.03; Cys (C) 103.01; Glu (E) 129.04; Gln (Q) 128.06; Gly (G) 57.02; His (H) 137.06; Leu/Ile (L/I) 113.08; Lys (K) 128.09; Met (M) 131.04; Phe (F) 147.07; Pro (P) 97.05; Ser (S) 87.03; Thr (T) 101.05; Trp (W) 186.08; Tyr (Y) 163.06; Val (V) 99.07; (carboxymethyl)cysteine 161.01; homoserine 101.05; hydroxyproline 113.05; pyroglutamic acid 111.03.

The sequence of the blocked peptide was then determined by tandem MS using a tandem double-focusing mass spectrometer (see section II.5). Several overlapping series of N- and C-terminal sequence-defining "backbone" product ions are observed (Figure 4A). The bn series defines the amino acid sequence of residues 1-9 as if being read sequentially from the N-terminus (i.e., b1, b2, b3, ...). Most importantly, the acetyl group on the N-terminus, which prevents sequencing the sample by Edman degradation, shifts the mass of b1 (and, therefore, all subsequent bn ions) upward by 42 Da and thereby establishes that this group is located on the N-terminal His. An incomplete a series is also observed. Similarly, the yn product ions from y3 to y9 enable the Gly-Trp-xLeu-xLeu-Gly-Glu (xLeu = either Leu or Ile) sequence of the peptide to be read. It should be noted that the product ions are labeled in ascending order from the terminus retaining the charge (for example, see labeled structure in Figure 4). Numerous x and z fragments are also observed. The dn and wn ions are formed by cleavage of the β,γ bond of the residue that has undergone peptide bond cleavage. One of the most frequent and important uses of these side chain fragments is to distinguish Leu from Ile (88, 89). For example, w6 is formed by loss of C3H7 from the xLeu at residue five of the peptide, demonstrating that this amino acid is Leu (Figure 4A). Similarly, the xLeu at residue four of the peptide is an Ile on the basis of the observation of two w7 fragments corresponding to the loss of CH3 and C2H5 (w7a and w7b, respectively, Figure 4A) from the side chain of this amino acid. In principle any isobaric amino acids differing in their pattern of substitution at the β-carbon (such as β-hydroxyaspartic acid and methionine, both of which have a residue mass of 131 Da) may be distinguished in a similar manner. Product ions reflecting the amino acid composition of the peptide are also commonly observed (89). Amino acids from any sequence location can give rise to immonium ions with the general structure +NH2=CHR at the low-mass end of the spectrum. In addition, the peptide may fragment by loss of a side chain of an amino acid from any location in the sequence to yield a product ion at the high-mass end of the spectrum.
In both cases these ions are denoted by the single-letter code of the amino acid of origin. Electrospray tandem MS of the (M + 2H)2+ ion (m/z 581.2, Figure 1B) of this same peptide yields, principally, bn (n = 1-7) and yn (n = 3-9) ions (Figure 5). Side chain loss fragments are not observed at the low collision energies used (... 2500) peptides. Several factors contribute to this problem. First, molecular ion currents generally decrease with increasing size of the peptide. In addition, the number of fragmentation pathways available to the peptide increases with size. This distributes the fragment ion current among a larger number of fragment ions, increasing the difficulty of detecting product ions and deciphering their structural origin. Despite these difficulties, useful tandem mass spectra of FAB-generated molecular ions have been obtained on peptides with masses exceeding 3000 Da (19).
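The mass arithmetic behind these assignments can be sketched in a few lines. The example below is a minimal illustration, not the authors' software: it uses the monoisotopic residue masses from the footnote, assumed constants for the proton (1.00728 Da), water (18.01056 Da), and the acetyl group (42.01057 Da), and hypothetical function names. The pentapeptide Gly-Ile-Arg-Ile-Phe, quoted elsewhere in this article with (M + H)+ = m/z 605.3, serves only as a convenient test sequence.

```python
PROTON = 1.00728   # assumed mass of a proton, Da
WATER = 18.01056   # assumed monoisotopic mass of H2O, Da
ACETYL = 42.01057  # assumed monoisotopic mass added by N-acetylation, Da

# Monoisotopic residue masses (Da) from the footnote; Leu and Ile are isobaric.
RESIDUE_MASS = {
    "A": 71.04, "R": 156.10, "N": 114.04, "D": 115.03, "C": 103.01,
    "E": 129.04, "Q": 128.06, "G": 57.02, "H": 137.06, "L": 113.08,
    "I": 113.08, "K": 128.09, "M": 131.04, "F": 147.07, "P": 97.05,
    "S": 87.03, "T": 101.05, "W": 186.08, "Y": 163.06, "V": 99.07,
}


def mz_from_mr(mr, charge):
    """m/z of the (M + nH)n+ ion for a neutral molecular mass mr."""
    return (mr + charge * PROTON) / charge


def mh_plus(sequence):
    """(M + H)+ of an unmodified, free-acid peptide given as one-letter codes."""
    return sum(RESIDUE_MASS[r] for r in sequence) + WATER + PROTON


def b_ions(sequence, n_term_mod=0.0):
    """Singly charged b ions b1..b(n-1), read from the N-terminus."""
    ions, total = [], PROTON + n_term_mod
    for residue in sequence[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(round(total, 2))
    return ions


def y_ions(sequence):
    """Singly charged y ions y1..y(n-1) of a free-acid peptide, read from the C-terminus."""
    ions, total = [], PROTON + WATER
    for residue in reversed(sequence[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(round(total, 2))
    return ions


if __name__ == "__main__":
    # Mr 1160.5 gives ions close to the quoted m/z 1161.5 and 581.2.
    print(round(mz_from_mr(1160.5, 1), 2), round(mz_from_mr(1160.5, 2), 2))
    print(round(mh_plus("GIRIF"), 2))
    print(b_ions("GIRIF"), y_ions("GIRIF"))
    # N-acetylation moves b1 of an N-terminal His (and every later b ion) up by 42 Da.
    print(b_ions("HG"), b_ions("HG", n_term_mod=ACETYL))
```

Reading a sequence from a spectrum runs this logic in reverse: the difference between successive bn (or yn) ions is matched against the residue mass table, which is also why Leu and Ile cannot be told apart from backbone ions alone.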

manner as the tryptic digest above. The pertinent FABMS and tandem MS sequence data that define the N-terminal region of SBP are shown in Figure 14. Tandem MS of the (M + H)+ = m/z 605.3 chymotryptic peptide provided the sequence Gly-Ile-Arg-Ile-Phe. This contains the previously defined tryptic tripeptide and the first two residues of the m/z 1174 tryptic peptide, thereby ordering these fragments. Two other signals observed in the FABMS data of the chymotryptic digest with (M + H)+ = m/z 671.3 and 1326.5 can only be rationalized if the m/z 1174.7 tryptic peptide immediately precedes the m/z 586.4 tryptic peptide. Many other molecular weight related signals that provided redundant information were observed in the chymotryptic digest and therefore have not been included in Figure 14. Two chymotryptic peptides observed at m/z 1175.7 and 1322.9 did not shift in mass after one cycle of manual Edman degradation and were presumed to derive from the blocked N-terminus. Tandem MS of the (M + H)+ = m/z 1175.7 chymotryptic peptide gave the spectrum shown in Figure 15. The fragmentation consists principally of b and y series ions that define the sequence