Strain-based Sequence Variations and Structure Analysis of Murine

Bao Quoc Tran , Christopher Barton , Jinhua Feng , Aimee Sandjong , Sung Hwan Yoon , Shivangi Awasthi , Tao Liang , Mohd M. Khan , David P.A. Kilgour ...
0 downloads 0 Views 171KB Size
Biochemistry 2001, 40, 9725-9733

9725

Strain-based Sequence Variations and Structure Analysis of Murine Prostate Specific Spermine Binding Protein Using Mass Spectrometry† Pierre Chaurand,‡ Beverly B. DaGue,‡ Shuguang Ma,‡,§ Susan Kasper,| and Richard M. Caprioli*,‡ Department of Biochemistry and the Mass Spectrometry Research Center, Departments of Urologic Surgery and Cell Biology and the Vanderbilt Prostate Cancer Center, Vanderbilt UniVersity, NashVille, Tennessee 37232-6400 ReceiVed March 1, 2001; ReVised Manuscript ReceiVed May 10, 2001

ABSTRACT: Mouse spermine binding protein (SBP) has been characterized using mass spectrometry, including its localization within the prostate, sequence verification, and its posttranslational modifications. MALDI (matrix-assisted laser desorption/ionization) mass spectrometry was employed for localization of proteins expressed by different lobes of the mouse prostate obtained after tissue blotting on a polyethylene membrane. The mass spectra showed complex protein profiles that were different for each lobe of the prostate. The prostate-specific spermine binding protein (SBP), primarily identified by its in-source decay fragment ion signals, was found predominantly expressed by the ventral lobe of the prostate. The MALDI in-source decay measurements combined with nanoESI (nanoelectrospay ionization) MS/MS measurements obtained after specific proteolysis of SBP, allowed the exact positioning of a single N-linked carbohydrate group, and the identification of a pyroglutamate residue at the sequence N-terminus. The N-linked carbohydrate component was further investigated and the general pattern of the N-linked carbohydrate identified. The presence of a disulfide bridge between cysteine78 and cysteine124 was also established. The full sequence characterization of SBP showed several strain-based sequence differences when compared to the published gene sequence.

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI MS)1 and electrospray ionization (ESI) MS have greatly facilitated the studies of protein structure and function over the past 10 years (1-5). These techniques now provide the means for sequencing femtomole amounts of peptides, measuring the molecular weights of proteins up to 200 kDa in mass, analyzing protein posttranslational modifications (e.g., phosphorylation and glycosylation), and elucidating specific binding of ligands to proteins as well as protein-protein interactions. Protein identification is routinely performed using MS analysis of protein digests to obtain sequences of fragment peptides followed by the query of protein databases (6-10). Consequently the identification of variations in sequences and posttranslational modifications of proteins of nominally similar function may be achieved relatively easily. † This work was supported by the National Institutes of Health (Grant GM 58008), the T. J. Martell Foundation and the Vanderbilt Ingram Cancer Center. * To whom correspondence should be addressed. Phone: (615) 3224336. Fax: (615) 343 8372. E-mail: [email protected]. ‡ Department of Biochemistry. § Current address: Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033. | Departments of Urologic Surgery and Cell Biology and the Vanderbilt Prostate Cancer Center. 1 Abbreviations: SBP, spermine binding protein; MALDI, matrixassisted laser desorption/ionization; ESI, electrospray ionization; NanoESI, nanoelectrospray ionization; MS, mass spectrometry; TOF, timeof-flight; MS/MS, tandem mass spectrometry; MW, molecular weight; m/z, mass to charge; CHCA, R-cyano-4-hydroxycinnamic acid; DHB, 2,5-dihydroxybenzoic acid; SA, sinapinic acid; IAA, iodoacetamide; DTT, dithiothreitol; TFA, trifluoroacetic acid; HPLC, high-performance liquid chromatography

MALDI ion sources are typically coupled to time-of-flight (TOF) mass analyzers that have delayed extraction capabilities (11). Thus, ions formed after the desorption/ionization events are kept in the source regions for several hundreds of nanoseconds. Some of the molecular species will fragment during this period in a process termed in-source decay (12). For proteins, many sequence-specific fragment ions are formed by cleavage of the protein backbone, with the most frequent being cn, an and yn ions (for fragment ion nomenclature see refs 13 and 14). From the mass differences one can deduce the sequence of a portion of the protein, and in turn, this information can be used to query protein databases and identify the protein. Although in-source decay MALDI MS can provide partial amino acid sequence determination, only a few examples of peptide and protein identification done in this manner have been reported. Katta et al. studied the in-source decay behavior of several recombinant proteins and were able to precisely locate posttranslational modifications such as disulfide bonds (15). Reiber and colleagues successfully sequenced several peptides based upon their in-source decay fragment ion spectra and identified several large proteins (16, 17). Lennon and Walsh obtained sequence information using in-source decay on several peptides and proteins and successfully located posttranslational modifications such as disulfide bonds and sites of phosphorylations (18-20). More recently, Muscat and colleagues used in-source fragmentations to characterize polyesteramides (21). The use of MALDI MS for protein profiling allows one to generate highly reproducible and specific protein patterns from fresh tissue sections (22, 23). Briefly, tissue sections

10.1021/bi010424l CCC: $20.00 © 2001 American Chemical Society Published on Web 07/21/2001

9726 Biochemistry, Vol. 40, No. 32, 2001 are contact-blotted to an organic polymer membrane, which captures proteins through hydrophobic and electrostatic interactions. The blotted areas are then coated with a matrix and analyzed by MALDI MS. Typically, from such blots over 200 individual protein signals can be detected in a mass range up to 100 kDa (24). In the current paper, we have used these methods to profile and compare the proteins expressed or secreted by the different lobes of the mouse prostate. Because of the lobe sampling process, both cytoplasmic and secretory proteins are observed (24). Prostate-specific mouse spermine binding protein (SBP) (25) was identified by its in-source decay ions and found to be expressed in high abundance in the ventral prostate lobe. After purification by HPLC, SBP was treated with proteases and the peptide fragments analyzed by nanoelectrospray (nanoESI) tandem mass spectrometry (MS/MS). These data were used to verify the primary sequence of some of these fragments and to highlight variations between the sequence of the studied strain (CD1) and the database reference SBP sequence (C57/BK). Further, we have demonstrated the presence of an N-terminal pyroglutamate residue as well as the position of an N-linked glycosyl group. The MS/MS measurements also revealed the presence of a disulfide bridge in the protein. MATERIALS AND METHODS Materials. Matrix compounds, R-cyano-4-hydroxycinnamic acid (CHCA), 2,5-dihydroxybenzoic acid (DHB), and 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid, SA) were obtained from Aldrich Chemical Co. (Milwaukee, WI). Peptides and proteins used as molecular weight calibration standards, ammonium bicarbonate, iodoacetamide (IAA), dithiothreitol (DTT), and endoproteinase Glu-C were obtained from Sigma Chemical Co. (St. Louis, MO). Biotech sequencing-grade 88% formic acid was obtained from Fisher (Fair Lawn, NJ) and sequencing-grade trifluoroacetic acid (TFA) was purchased from Burdick and Jackson (Muskegon, MI). HPLC-grade acetonitrile and methanol were EM Science Omnisolve brand (Merck KGaA, Darmstadt, Germany). Modified TPCK-treated porcine trypsin was obtained from Promega (Madison, WI), and N-deglycosidase PNGaseF from Prozyme (San Leandro, CA). Preparation and MALDI MS Analysis of Tissue Blots. The prostate glands of five ten-week old male CD1 and C57/B6 mice (Harlan, Indianapolis, IN) were dissected and the different lobes carefully separated under a stereoscope. Animal sacrifice was performed under NIH guidelines. Special care was taken not to burst either the bladder or the seminal vesicle to avoid cross-contamination. The lobes were immediately placed on a carbon-imbedded polyethylene (CPE) membrane (80 µm thick, Goodfellow, Cambridge, England) mounted on a metal MALDI-MS sample plate using double-sided conductive tape (3M Co., Minneapolis, MN) (23). A glass microscope slide was placed on the tissue sections to achieve good contact between the tissue samples and the membrane and to avoid tissue dehydration. After 5 min at room temperature, the tissue sections were removed from the C-PE membrane and the blot areas were allowed to air-dry. Cellular debris, tissue fragments, physiological salts, and blood were then removed from the blot area by washing with 20 µL water for about 30 s using an automatic pipet. The rinse water was retained for further analysis. The

Chaurand et al. blotted areas were air-dried and a 1-µL drop of SA matrix (at 20 mg/mL in 5/5/0.01, V/V/V, acetonitrile/water/TFA) was applied to several regions of each blot. After drying under ambient conditions, analysis was performed using a Voyager MALDI DE-STR TOF MS (Applied Biosystems, Inc., Framingham, MA) equipped with a nitrogen (337 nm) laser. Positive ion mass spectra were acquired in the linear mode using a 25 kV accelerating potential under optimized delayed extraction conditions. Mass spectra were acquired and averaged from 1000-2000 laser shots randomly taken at various locations within a single matrix-coated spot (∼2 mm diameter). Mass calibration was achieved using the ions from R- and β-hemoglobins observed in all blots [R-hemoglobin, m/z 7498.5 (z ) 2) and m/z 14 996.1 (z ) 1); β-hemoglobin, m/z 7809.5 (z ) 2) and m/z 15 617.9 (z ) 1)]. Protein identification using amino acid sequence tags generated from in-source decay fragment ions was performed using the search engines available on the Proteometrics LLC web site (www.proteometrics.com) to search the NCBInr protein database with the taxonomy Mus musculus (Proteometrics, New York). HPLC Purification of Spermine Binding Protein. Rinse water from several blots from the same tissue lobe was pooled and centrifuged at 16000g for 10 min to remove particulates. The cleared solution was fractionated at room temperature on a Narrowbore LC/MS C4 column (2.1 × 250 mm, 5 µm particle, 300 Å pore) from Vydac (Hesperia, CA) using an Agilent HP1100 HPLC system (Wilmington, DE). Elution solvents A and B were 0.1% TFA (aq) and 0.1% TFA in 95% acetonitrile (aq), respectively. Sample proteins were eluted from the column at a flow rate of 0.2 mL/min using the following gradient program: 5% B, 10 min isocratic; 5 to 60% B over 50 min, linear gradient; 60% B, 10 min isocratic; 60 to 80% B over 10 min, linear gradient; 80% B, 5 min isocratic. Protein peaks were detected by UV absorbance at 214 nm and collected manually in Eppendorf Safe-lock polypropylene microfuge tubes. HPLC fractions were lyophilized dry and reconstituted in 40 µL 5/5/0.01 (V/ V/V) acetonitrile/water/TFA for MALDI-MS analysis. A highly glycosylated protein subsequently identified as SBP eluted at 52-55 min (not corrected for system delay time). Conditions for Enzymatic Digestion of HPLC-Purified SBP. The final concentrated HPLC fraction (elution time 5255 min) was estimated to contain about 10 pmol of protein/ µL on the basis of the UV absorbance signal. All digestions were carried out in 50 mM ammonium bicarbonate buffer (pH 7.8-8.0). Treatments were as follows: (1) Trypsin digestion. A 7.5-µL portion of the concentrated HPLC fraction (approximately 20% of the total sample) was dried and reconstituted in 10 µL buffer containing 0.05 µg trypsin and the protein was digested for 6 or 24 h at 37 °C. (2) Endoproteinase Glu-C digestion. A 7.5 µL portion of the concentrated HPLC fraction was dried and reconstituted in 10 µL of buffer containing 0.05 µg of enzyme. After 4.5 h reaction at 37 °C, another aliquot of 0.05 µg of enzyme in 10 µL buffer was added to the reaction and digestion was continued an additional 15.5 h. (3) PNGaseF digestion was used to remove N-linked carbohydrates from both a sample of the nonreduced protein as well as from a sample that had been preincubated with trypsin for 24 h. In each case, 1 unit of PNGaseF (reconstituted at 5 units/µL with water) was added to the sample for a 24 h digestion at 37 °C. (4)

Structure of SBP by Mass Spectrometry Reduction/Alkylation/Trypsinization. A 7.5 µL portion of the concentrated HPLC fraction was dried and reconstituted in 10 µL of buffer. The protein solution was then mixed with 1 µL of 11 mM DTT (in 50 mM NH4HCO3, aq) and the reaction blanketed with argon. Reduction was allowed to proceed at 60 °C for 15 min. The reaction was then cooled to room temperature and 1 µL of 30 mM IAA (aq) was added to produce the carbamidomethyl derivative of the reduced protein. After a 30-min reaction period in the dark at 37 °C, the excess IAA was quenched by the addition of 2 µL of 11 mM DTT. The reduced and alkylated protein was then digested with trypsin (0.05 µg added in 5 µL 50 mM acetic acid) at 37 °C for 7 h. HPLC Fractionation of the Trypsin-Digested Reduced/ Alkylated SBP. A Waters Alliance HPLC system (Milford, MA) configured with a 2690 separations module and a 2487 dual-wavelength absorbance detector and computer controlled with Millenium32 software (ver. 3.05.01) was used to fractionate the trypsin digest. Eight microliters of the digest in 50 mM NH4HCO3 (aq) were diluted with 100 µL of eluent A to lower the sample pH to approximately 2. The acidified sample (100 µL) was loaded onto a Vydac Narrowbore LC/ MS C4 column (2.1 × 250 mm, 5 µm particle, 300 Å pore) maintained at 25 °C. Peptide fragments were eluted from the column at a flow rate of 0.2 mL/min with a linear gradient of 5 to 80% B over 75 min. A and B eluents were 1/1000 TFA/water (V/V) and 0.85/1000 TFA/acetonitrile (V/V), respectively. Peptide fragments were detected by UV absorbance at 214 nm. One-minute fractions were collected in Eppendorf Safe-lock microcentrifuge tubes, frozen, and lyophilized dry. Fractions were reconstituted with 10 µL of a solution of 6/3.9/0.1 (V/V/V) methanol/water/88% formic acid shortly before analysis by nanoESI QTOF MS. Mass Spectrometric Analysis. MALDI MS of protein or peptide samples isolated by HPLC or in enzymatic digestion reaction mixtures were obtained using the dried-droplet method. Samples were acidified when necessary and 0.5-1 µL mixed with an equal volume of appropriate matrix solution on a stainless steel target. Protein samples were analyzed with SA matrix dissolved at 20 mg/mL in 5/5/0.01 (V/V/V) acetonitrile/water/TFA. Peptide-containing samples were analyzed with CHCA or 2,5-DHB dissolved at 10 mg/ mL in 5/5/0.01 (V/V/V) acetonitrile/water/TFA. The sample/ matrix mixtures were air-dried on the target. Analyses were performed on a Voyager DE-STR MALDI MS. NanoESI QTOF MS was performed on an MDS Sciex (Toronto, Canada) QStar instrument with a Protana (Protana A/S, Odense, Denmark) nanospray source. Samples were usually dissolved in methanol/water/88% formic acid (6/3.9/ 0.1, V/V/V). Positive ion spectra were acquired in the profile mode with an ion count accumulation time of 1 s, a TOF pulser frequency of 6.99 kHz, and pulse duration of 13 µs. A stable spray was obtained at 800-900 V. For data acquisition in the TOF MS mode, the curtain and CAD nitrogen gas settings were 15 psi and 3-4 (arbitrary units), respectively. Peptide sequences were obtained from product ion spectra acquired for ions selected with Q1 and dissociated by increasing the CAD gas setting and Q0 offset potential. Protein spectra were deconvoluted using the Macintosh-based Q-Star software.

Biochemistry, Vol. 40, No. 32, 2001 9727

FIGURE 1: Protein profiles obtained by MALDI MS after fresh tissue blotting on a polyethylene membrane for the mouse anterior (AP), dorsal (DP), lateral (LP) and ventral prostate (VP) lobes. R+ and β+ are alpha and beta-hemoglobin, respectively.

FIGURE 2: (a,b) Enlargement from Figure 1 of the protein profile obtained for the mouse ventral prostate lobe (VP) displaying insource decay fragment ion signals. (c) Assignment of observed fragment ions. Numbering is relative to Swiss-Prot sequence P15501 for the processed mouse spermine-binding protein (SBP).

RESULTS Protein Profiling and MALDI MS In-Source Decay. The major secretory protein of mouse ventral prostate is the androgen-regulated SBP, a 22-25 kDa glycoprotein. In the course of performing MALDI MS of tissue blots to compare the protein expression of the different lobes of the prostate, a set of proteins in this molecular mass range was found specifically localized in normal ventral prostate tissue from sexually mature CD1 mice (Figure 1). The mass differences of the signals corresponded to increments of carbohydrate residues, indicating these were glycoforms of the same protein. The mass spectra obtained from the ventral prostate also exhibited a unique set of low MW signals in the mass-tocharge (m/z) range from about 1500-5000 (Figure 2a). Because the mass differences between these low molecular weight signals corresponded to amino acid residue weights, the ions were assumed to be derived from in-source decay of a high abundance protein in the blot. Both c-type and

9728 Biochemistry, Vol. 40, No. 32, 2001

FIGURE 3: (a) Protein profile obtained by MALDI MS for the rinsing solution of the ventral prostate blot, (b) UV absorbance profile obtained from the HPLC separation of the proteins present in the ventral prostate lobe rinsing solution, (c) MALDI MS spectrum of the 55-min HPLC fraction, “SBP ISD ions” are spermine-binding protein in-source decay fragment ions, and (d) MALDI MS of the 55-min HPLC fraction after deglycosylation.

a-type in-source fragment ions were accounted for by the ion sequence (Figure 2b), and several sequence tags could be derived from the data. For example, -F(I/L)SVFKFwas deduced from the ions in the range m/z 2969.3-3838.5. When used to query the NCBInr database, “FLSVFKF” proved to be specific for SBP from mouse. A larger portion of the spectrum provided the amino acid sequence QGEDQGQLKGMRIFLSVFKFIKGFQLQFGS, corresponding to the segment [32-61] of pro-SBP. All of the observed ions were 111 Da greater than was expected based on the reported sequence of functional SBP, suggesting the presence of pyroglutamic acid at the sequence N-terminus, similar to that found in the rat SBP (26). These data are summarized in Figure 2c. The mass accuracy of the measured fragment ions was within one mass unit of the predicted average mass calculated for c-type N-terminal fragment ions. No in-source decay ion signals corresponding to other proteins were detected in the different protein profiles. Interestingly, the in-source decay ion pattern abruptly terminated at m/z 4944.9, corresponding to the c-type ion formed after the amino acid serine in position 44 of SBP. The asparagine residue in position 45 of rat SBP had been shown to be glycosylated by Anderegg and co-workers (26). We, therefore, proposed that asparagine 45 of processed mouse SBP was N-glycosylated. No in-source decay ions with an ion distribution corresponding to the heterogeneous addition of sugar residues on SBP were detected. For further studies, the protein was purified from the pooled water rinses of the ventral prostate tissue blot. As shown in Figure 3, HPLC fractionation of the pooled rinses yielded a highly purified preparation of SBP, confirmed by the intense signals for in-source decay ions. Treatment of the purified glycoforms with PNGaseF for 24 h converted approximately two-thirds of the total glycoform population to the deglycosylated protein at m/z 20 682 ((2) (Figure 3d). It must be noted that a signal of low intensity at the same m/z value was also detected in the profile obtained from the

Chaurand et al.

FIGURE 4: Partial protein profiles obtained by MALDI MS after fresh tissue blotting on a polyethylene membrane of the ventral prostate lobes from the CD1 and C57/B6 strains.

blot of the ventral lobe (Figure 1) and in its rinsing solution (Figure 3a). Upon deglycosylation of the glycoforms, the intensity of signals from in-source decay ions c14-c44 was reduced significantly in the MALDI MS. However, no strong signals corresponding to c45 and higher N-terminal derived in-source decay fragment ions were observed as the result of the production of carbohydrate-free SBP. These observations suggested that under the same experimental conditions, the carbohydrate-free protein undergoes very little in-source fragmentation and that the carbohydrate moiety plays an active role in the in-source decay charge remote fragmentation process(es). Other glycoproteins are being examined in order to determine if this is a general phenomenon or more specific to SBP. Posttranslational Modification Characterization. The molecular mass observed for purified SBP after deglycosylation was 658 ((2) Da greater than that expected based on the Swiss-prot database entry (P15501) for mouse SBP (shown in Figure 13). We had, however, used a different strain of mice (CD1) than the one used for the original sequence determination (C57/BK). The strain C57/BK could not be found but the very similar strain C57/B6 was investigated. Figure 4 compares the protein profiles obtained after blotting the ventral prostate from both the CD1 and C57/B6 strains. A mass difference of 549 Da was observed between these two strains indicating either sequence differences or different posttranslational modifications. The MW of the carbohydratefree SBP from the C57/B6 mouse was found to be 111 ((2) Da greater than that expected based on the SBP database entry potentially corresponding to the presence of an Nterminal pyroglutamic acid. This is also supported by the in-source decay profile observed for the C57/B6 SBP (not shown). The strategy taken to characterize CD1 mouse SBP and to determine the source of the molecular weight difference was to utilize various proteases in combination with the N-deglycosidase PNGaseF to obtain peptide maps for the purified protein. Analysis of these peptide maps by MALDI MS and selected peptides by nanoESI QTOF MS/MS provided sufficient sequence information to account for the observed mass difference as described below and summarized in Table 1. Note that the amino acid residue numbers are based on the sequence results reported here.

Structure of SBP by Mass Spectrometry

Biochemistry, Vol. 40, No. 32, 2001 9729

Table 1: List of Peptides from SBP Expressed by CD1 Mice Sequenced by Mass Spectrometry location

sequence

calcd mass (mono.)

[1-10] [11-23] [24-26] [27-33] [34-36] [37-53] [54-70] [71-88] [91-97] [101-110] [111-126] [127-131] [132-149] [154-187] [148-187] [1-187]

Pyr-NVLGNAAGK YFYVQGEDQGQLK GMR IFLSVFK FIK GFQLQFGSN*WTDVYGTR SENFIDFLLEDGEHVIK VEGSAVICLTSLTFTTNK VATFGVR YFSDTGGSDK HLVTVNGMHAPGLCVK GMGFK WGNINANGNDHYNNKEDK DADNKDADNKDADNKDDGDEDDDGNDDDDQKDES DKADNKDADNKDADNKDADNKDDGDEDDDGNDDDDQKDES carbohydrate free SBP (CD1 mouse)

953.5 1573.7 362.2 852.5 406.3 1974.9 2004.0 1883.0 748.4 1075.4 1674.9 538.3 2101.9 3742.4 4413.7 20 678.5 (avg)

MS/MS 477.8(+2) 787.9(+2) 363.2(+1) 427.3(+2) 407.3(+1) 659.6(+3)b 669.0(+3) 971.0(+2)c 375.2(+2) 538.7(+2) 578.3(+3)c 539.3(+1) 702.0(+3) 936.6(+4) 883.7(+5)

∆M (avg)

Figure

+111.1

6

+14 {-1}

7 8

-28, {-1} +18

8 9

+543.6 +543.6 +658.7

10 13

a ∆M represents the average mass difference for each peptide from the published sequence. b N[45] modified to D by deglycosidase in studied peptide. c Carbamidomethylation of cysteines in studied peptides. {Disulfide bridge subtracts 2 u}. Bold ) sequence difference. Italic ) sequence order not confirmed by MS/MS.

FIGURE 6: NanoESI QTOF MS/MS spectrum of SBP tryptic peptide [1-10] obtained from the double-charge ion detected at m/z 477.8. FIGURE 5: Peptide maps of SBP obtained by MALDI MS using 2,5-DHBA as matrix after (a) trypsin digestion and (b) trypsin plus PNGaseF digestion.

Glycosylation Site, Tryptic Peptide [37-53]. Figure 5 shows the MALDI MS analyses of the limit tryptic digests of SBP purified from CD1 mouse obtained both with and without the addition of PNGaseF. In the absence of PNGaseF the presence of glycopeptides was evident from the ion series observed from m/z 3700-5000 where many species appeared to be related by mass differences of 162 (hexose residue) or 203 (N-acetyl hexosamine residue) daltons. Addition of PNGaseF to the tryptic digest eliminated these species and no evidence was seen for the retention of O-linked glycopeptides (sugar residues not cleaved by the PNGaseF). A unique peptide at m/z 1977.1 appeared in the deglycosylated tryptic digest. NanoESI QTOF MS/MS of the triple-charge ion of this peptide at m/z 659.6 was performed (not shown) and the sequence of (GF)QLQFGSDWTDVYGTR expected from the SBP database sequence was confirmed. Note that the method does not distinguish leucine from isoleucine or lysine from glutamine, and that asparagine (N45) would be converted to (D45) during the enzyme cleavage of the peptidecarbohydrate linkage. These observations, in conjunction with the in-source decay results described above, supported the

conclusion that mature CD1 SBP had only a single Nglycosylation site at N45. N-Terminal Pyroglutamate, Tryptic Peptide [1-10]. The presence of N-terminal pyroglutamate would result in the tryptic fragment Pyr-NVLGNAAGK (monoisotopic molecular mass 953.5 Da). The MS/MS spectrum of the doublecharge tryptic ion at m/z 477.8 obtained after HPLC fractionation of the digest is shown in Figure 6. This spectrum supported the proposed sequence. The N-terminal pyroglutamate residue was not directly recorded, due to the absence of b1 and y9 fragment ions. Nevertheless, the observed masses for the y2 through y8 fragments were within 0.002 (( 0.002) Da of the expected values. Comparison of the observed b fragments with those for the expected sequence and alternative isobaric sequences (QP)VLGNAAGK and (KP)VLGNAAGK showed average differences of only 0.003 (( 0.003) Da for the fit with the expected sequence, versus 0.037 (( 0.005) and 0.073 (( 0.005) Da for the respective alternative sequences. Sequence Differences Found for CD1 SBP. Tryptic Peptide [54-70]. The predicted tryptic peptide ([M + H]+ ) 1992.2), was not observed. However, MALDI MS analysis showed a tryptic fragment at m/z 2005.0 (Figure 5). MS/ MS fragmentation of the corresponding triple-charge ion

9730 Biochemistry, Vol. 40, No. 32, 2001

FIGURE 7: NanoESI QTOF MS/MS spectrum of SBP tryptic peptide [54-70] obtained from the triple-charge ion detected at m/z 669.0.

FIGURE 8: NanoESI QTOF MS/MS spectrum of SBP tryptic peptide [71-88]-S-S-[111-126] from the quadruple-charge ion observed at m/z 890.2.

observed in nanoESI QTOF MS at m/z 669.0, showed this peptide to be modified by the substitution of glutamic acid for aspartic acid at position 55 (Figure 7). This sequence change was confirmed by the presence of the b2 fragment ion (m/z 217.1), the double-charge y15 fragment ion (m/z 895.0), and the double-charge y16, y16-H2O, and y16-NH3 fragment ions (at m/z 959.5, 950.5, and 951.0, respectively). This modification of the protein resulted in an increase of the molecular weight by 14 Da. Tryptic Peptides [71-88] and [111-126]. Each of these peptides contains a cysteine residue and they could possibly be linked to each other through disulfide bond formation. Neither of the individual peptides (expected monoisotopic [M + H]+ values of 1884.0 and 1703.9, respectively) nor the disulfide-bridged peptide (average [M + H]+ ) 3587.2) was observed in the mass spectrum of the tryptic digest of nonreduced protein. However, an ion observed in the MALDI-MS spectrum at m/z 3558.4 (average mass) and as the quadruple-charge ion in the nanoESI spectrum at m/z 890.2, when fragmented by MS/MS, gave partial sequence coverage corresponding to the disulfide-bridged peptides. The fragmentation scheme is shown in Figure 8. The mass of the parent disulfide-bridged peptide was 28 Da lower than expected from the published sequence indicating a modification of one or both of the component peptides. Examination

Chaurand et al.

FIGURE 9: NanoESI QTOF MS/MS spectrum of SBP tryptic peptide [127-131] obtained from the single-charge ion detected at m/z 539.3.

of the MS/MS data obtained from the m/z 890.2 ion suggested the possible sequence change was located within the last six residues of the [111-126] tryptic peptide. Furthermore, the presence of a fragment ion signal at m/z 147.1 and the absence of the expected signal for an arginine y1 ion at m/z 175.1 strongly suggested the presence of a lysine at the C-terminus of tryptic fragment [111-126]. Reductive alkylation of purified SBP followed by trypsin digestion and HPLC fractionation of the digest yielded the carbamidomethylated fragment [111-126]. MS/MS sequencing of the peptide showed the presence of C-terminal lysine. This confirmed the substitution of lysine for arginine at position 126 and accounted for a 28 Da reduction in the mass of SBP. Tryptic Peptide [127-131]. An expected peptide at m/z 521.3 corresponding to the expected sequence GIGFK was not observed in any of the tryptic digests of CD1 SBP. However, a single-charge ion at m/z 539.3 was detected in nanoESI spectra, which could not be assigned to any known residue sequence. HPLC purification of the tryptic digest of the carbamidomethylated SBP produced a purified fragment of m/z 539.3 which, when subjected to MS/MS analysis, showed the fragment ions expected for the peptide GMGFK (Figure 9). Substitution of methionine for isoleucine at position 128 of the protein accounted for an 18 Da increase in the protein molecular weight. Tryptic Peptide [132-149]. On the basis of the published SBP sequence, [M + H]+ of the [132-149] tryptic peptide was expected to be 2102.9 (monoisotopic value). The MALDI MS data for the trypsin digest of SBP clearly displayed an [M + H]+ signal at m/z 2103.9, which differed from the expected m/z value by one mass unit. However, the isotope distribution at m/z 2103.9 also indicated the presence of an additional signal of lower intensity at m/z 2102.9. NanoESI QTOF MS for multiple-charge ions from this peptide also exhibited isotope distributions, which indicated the presence of two species differing in mass by one Da. The MS/MS analysis showed the fragmentation pattern expected for the tryptic fragment [132-149] with isotope distributions indicating the presence of both N138 and D138-containing peptides. The fragment ion signals derived from the asparagine-containing peptide were less abundant than those from the aspartic acid-containing peptide and the amount of asparagine-containing peptide declined with an

Structure of SBP by Mass Spectrometry

Biochemistry, Vol. 40, No. 32, 2001 9731

FIGURE 10: NanoESI QTOF MS/MS spectrum of the endo Glu-C generated SBP [148-187] C-terminal peptide obtained from the quintuple-charge ion detected at m/z 883.7.

increase in the digestion time. Since position 139 was confirmed to be occupied by glycine, asparagine was thought to be the correct amino acid at position 138. The asparagineglycine sequence has been shown to readily rearrange to aspartic acid-glycine under alkaline conditions such as those utilized for trypsin digestion (27). The same rearrangement was also observed after the HPLC separation step for the [111-126] tryptic peptide, which also contains the partial sequence asparagine-glycine. C-Terminal Tryptic Peptide [154-187] and Endoproteinase Glu-C Peptide [148-187]. Sequence analysis of the C-terminus of SBP was problematic owing to the presence of a large number of acidic residues, many of which are adjacent to trypsin cleavage sites, and to the presence of the repeating unit “DADNK.” Examination of the MALDI MS spectrum of trypsin digests of the protein revealed none of the C-terminal peptides expected on the basis of the reported mouse SBP sequence. However, nanoESI QTOF MS/MS analysis of the quadruple-charge ion for a tryptic peptide at m/z 936.6 (corresponding to the MALDI-MS signal at m/z 3747.3, average mass) provided the following sequence: DADNKDADNKDADNKDDGDEDDDGNDDDDQKDES (data not shown). The result suggested that the sequence of the protein was modified by the insertion of an aspartic acid residue at position 154. Further examination of the protein sequence by endoproteinase Glu-C mapping and peptide sequencing showed a peptide at m/z 4417.1 (average mass) by MALDI MS. The MS/MS analysis of the quintuplecharge ion for this peptide at m/z 883.7 generated the spectrum shown in Figure 10. The sequence DKADNKDADNKDADNKDADNKDDGDEDDDGNDDDDQKDES was confirmed based on the presence of the b fragment ions detected in various charge states and single-charge y1 to y15 fragment ions. These data clearly indicated that an extra “DADNK” motif was present at positions 154-158 of the mouse SBP sequence. This sequence difference accounts for a mass difference of 543.6 Da in the molecular weight of the intact protein and shows that the signal at m/z 3747.3 in the tryptic digest of SBP was derived from the [154-187] sequence, the C-terminal segment of the protein. Other Tryptic Peptides. The tryptic peptides [11-23], [24-26], [27-33], [34-36], [37-53], [71-88], [91-97], [101-110], and [111-126] were all analyzed by nanoESI QTOF MS/MS either directly from the digest solutions or after HPLC purification. All of the sequences were found to

FIGURE 11: Distribution of the SBP glycoforms detected by (a) MALDI MS, (b) NanoESI QTOF MS (deconvoluted spectrum), and (c) MALDI MS profile of tryptic [37-53] glycopeptides obtained from SBP.

agree with the published mouse SBP sequence (25). Residues 98-100 (“RGR” in the reported sequence) were not confirmed by MS/MS. From the endo Glu-C digest, a signal at m/z 8048.1 was detected by MALDI MS potentially corresponding to the [73-147] SBP sequence containing the [98RGR100] residues. Taking into account the presence of the disulfide bond between the cysteines 78 and 124 (-2 Da) and the rearrangement of the asparagine 138 to aspartic acid (+1 Da), the measured MW of the [73-147] peptide is in agreement with the presence of [98RGR100]. Partial Structure of N-Linked Carbohydrate. As seen in Figure 11, a partial structure for the N-linked carbohydrate of SBP could be derived from the distribution of signals in both the MALDI MS and nanoESI QTOF MS spectra of the glycoforms. Differences in mass corresponding to hexose (162 Da), deoxyhexose (146 Da), and N-acetylhexosamine (203 Da) were observed, and the distribution of glycoforms was mirrored with the pattern measured for the glycopeptides released by trypsin digestion. The nanoESI QTOF MS/MS fragmentation spectrum of the triple-charge ion at m/z 1303.8 (from the glycopeptide observed at m/z 3909.7 (average mass) in MALDI MS) is shown in Figure 12. Given an expected tryptic peptide residue mass of 1976.1 Da (amino acids 37-53, average mass), the carbohydrate average residue mass was calculated to be 1932.6 Da (1930.7 Da monoisotopic). This mass combined with the fragmentation data shown indicated that the carbohydrate component of the glycopeptide consisted of (Hexose)6(N-acetylhexosamine)4(Deoxyhexose). This glycopeptide originated from the SBP glycoform observed at 22 611 Da (average) by nanoESI MS. Other SBP glycoforms that were observed could be rationalized as the result of additions of hexose and N-acetylhexosamine to this basic carbohydrate structure as shown in Figure 11. DISCUSSION Combining the sequence information obtained through MALDI in-source decay and nanoESI MS/MS, numerous strain-specific sequence differences from the published SBP sequence were detected, as shown in Table 1. In particular,

9732 Biochemistry, Vol. 40, No. 32, 2001

Chaurand et al.

FIGURE 12: NanoESI QTOF MS/MS spectrum of the signal observed at m/z 3909.7 in Figure 11, corresponding to an SBP [3753] tryptic glycopeptide. The MS/MS spectrum was obtained from the triple-charge ion detected at m/z 1303.8.

we have identified the presence of a pyroglutamate residue at the N-terminus of SBP. The in-source decay signals as well as the MS/MS data from the N-terminal tryptic peptide are in agreement with previous findings by Anderegg and co-workers who studied the rat SBP and reported the presence of a pyroglutamate at the N-terminal position of the processed SBP (26). The presence of a pyroglutamate residue at the N-terminal end of the processed protein formed by the rearrangement of the glutamine residue in position 18 of the proSBP sequence, also implies that the signal peptide of the proSBP consists of the amino acids from 1 to 17 and not from 1 to 18 as indicated in the published sequence (25). It is generally accepted that the presence of an N-terminal pyroglutamate residue inhibits or minimizes N-terminal protein degradation by exopeptidases, and in some cases has been shown to be important for enzymatic activity (28). We have also identified the presence of a disulfide bridge in the sequence between cysteines 78 and 124 and a single N-glycosylation site on asparagine 45. The presence of the disulfide bridge also accounts for a difference of 2 mass units between the CD1 mouse carbohydrate free SBP and the calculated amino acid sequence (Swiss-Prot protein database, Primary Accession Number P15501). Table 1 lists the peptides from SBP expressed by CD1 mice, which were sequenced by mass spectrometry, and details the mass differences obtained after sequence corrections. These sequence variations account for a mass difference of +658.7 u (average mass), in agreement with the mass difference between the experimentally measured carbohydrate-free SBP and the published SBP sequence. On the basis of our measurements, Figure 13 presents the amino acid sequence of murine SBP as identified in the CD1 mouse strain compared to the sequence published by Mills et al. (25). Only three amino acids (98RGR100) out of 187 were not detected by MS/MS. However, the agreement (within the mass accuracy) between the measured (MW 20 680 ( 2) and the calculated mass (MW 20 678.5) of the carbohydrate-free SBP (assuming no spontaneous rearrangement of N138 to D138) suggests that this partial sequence is correct. The three amino acids, which were found different when comparing the CD1 and C57/BK SBP sequences, can all be explained by single point mutations in the SBP gene. Asp55(C57/BK) to Glu55(CD1) corresponds to the mutation of the codon GAT(C57/BK) to

FIGURE 13: Single letter amino acid sequence of the murine spermine-binding protein (SBP) as identified in the CD1 mouse strain. U indicates the N-terminal pyroglutamate, bold characters indicate differences from the published sequence (small caps), characters in italics indicate the amino acid order was not confirmed by MS/MS, and lower case italics highlight residues not detected by MS. SBP possesses an N-glycosylation site at N45 and a disulfide bridge between C78 and C124.

GA[A/G](CD1), Arg126(C57/BK) to Lys126(CD1) corresponds to the mutation of the codon AGA(C57/BK) to AAA(CD1), and finally, Ile128(C57/BK) to Met128(CD1) corresponds to the mutation of the codon ATT(C57/BK) to ATG(CD1). The codon sequences for the C57/BK strains were obtained from the SBP gene sequence published by Mills and colleagues (25). Because at this point we have only studied two forms of SBP, we cannot tell if these mutation sites are hyper-variable. SBP, primarily expressed in the ventral lobe, was detected in its mature fully processed glycoforms. SBP was also detected in the other prostate lobes displaying the same isoforms but expressed at much lower intensities. A low abundance signal at m/z 20 681 corresponding to carbohydratefree SBP, was observed in blots from the mouse ventral prostate, in the rinsing solution recovered after tissue blotting as well as in the 56-min HPLC fraction sampled on the tail end of the UV signal corresponding to SBP. These results suggest that SBP is present in the ventral prostate in both the glycosylated and unglycosylated isoforms. Because of the mass accuracy of (2 units on the measurement of the carbohydrate-free SBP, we cannot detect with certainty if the rearrangement of the asparagines 116 and 138 to aspartic acid detected on the [111-126] and [132-149] tryptic peptides, respectively, is taking place on the protein. However, it is noted that the N-deglycosylation reaction was performed in ammonium bicarbonate buffer potentially favoring these rearrangements. Both MALDI and ESI MS have been used effectively for the study of glycoproteins (29-32). From the width of the signal distribution of the glycoforms, measured in both the MALDI and nanoESI MS protein profiles, it is estimated that the carbohydrate entity is composed of up to 20 sugar residues with a mean value centered on 14. From the MS/ MS data obtained from the [37-53] glycopeptide of MW 3908.7 (average), a general structure for the carbohydrate entity of the molecule is proposed. We have used the

Structure of SBP by Mass Spectrometry molecular weight of the carbohydrate entity of the [37-53] glycopeptide (MW 1930.7 + 18 ) 1948.7, monoisotopic mass) to scan the GlycoSuite DB carbohydrate database (33) (www.glycosuite.com). Only one known N-linked carbohydrate structure composed of 11 sugar residues matching our experimental data was reported for several proteins in mammalian species. Bovine brain ribonuclease (34), porcine major seminal plasma glycoproteins PSP-I and PSP-II (35), and various mouse/human hybrid immunoglobulin G glycoproteins (36) have been found modified with N-linked oligosaccharides with this same sugar composition. The structure below is proposed as the structure of the carbohydrate entity of the [37-53] tryptic glycopeptide (corresponding to the SBP signal at MW 22 610). It is in agreement with the experimental MS/MS data and our proposed carbohydrate composition. From our data, however, we cannot identify the nature of the different carbohydrate residues nor their interconnecting chemistries.

One of the striking properties of SBP is its potential to generate N-terminal in-source decay ions in high abundance. Several structural features can lead to the favored formation of N-terminal in-source decay ions. In particular, the protein requires one or more potential ionization sites near or at its N-terminus (37, 38). The potential N-terminal ionization sites are essentially the protein N-terminus, and lysine and arginine (basic) amino acid residues. In the case of the mouse SBP protein, the N-terminal residue is a pyroglutamate, which because of its cyclic structure is not likely to capture the MALDI ionizing proton compared to a free N-terminal primary amine. This statement is supported by previous experimental data generated by Takayama and co-workers who have analyzed by MALDI in-source decay MS several peptides possessing a pyroglutamate N-terminal residue (37, 38). Considering the N-terminal sequence of SBP and the mass of the first observed in-source decay signal (at m/z 1527), the most probable location for the ionizing proton in the sequence is on the side chain of the lysine residue in position 10. ACKNOWLEDGMENT The mouse prostates were a kind gift from Dr. Robert Matusik of the Department of Urologic Surgery (Vanderbilt University). We also thank Dr. Robert Matusik for the use of his facility for animal storage and dissection. SUPPORTING INFORMATION AVAILABLE Supplemental material includes a figure presenting the nanoESI MS/MS spectrum of deglycosylated SBP [37-53] tryptic peptide obtained from the triple-charge ion detected at m/z 659.6 and a figure presenting the MALDI MS spectrum of the SBP [132-149] tryptic peptide, displaying monoisotopic signals at m/z 2102.9 and m/z 2103.9, suggesting the rearrangement of N138 to D138. This material is available free of charge via the Internet at http://pubs.acs.org. REFERENCES 1. Karas, M., Bachmann, D., Bahr, U., and Hillenkamp, F. (1987) Int. J. Mass Spectrom. Ion Processes 78, 53-68.

Biochemistry, Vol. 40, No. 32, 2001 9733 2. Hillenkamp, F., Karas, M., Beavis, R. C., and Chait, B. T. (1991) Anal. Chem. 63, 1193A-1202A. 3. Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., and Whitehouse, C. M. (1989) Science 246, 64-71. 4. Roepstorff, P. (1997) Curr. Opin. Biotechnol. 8, 6-13. 5. Yates, J. R. (1998) J. Mass Spectrom. 33, 1-19. 6. Mortz, E., Vorm, O., Mann, M., and Roepstorff, P. (1994) Biol. Mass Spectrom. 23, 249-261. 7. Chen, Y. J., Wall, D., and Lubman, D. M. (1998) Rapid Commun. Mass Spectrom. 12, 1994-2003. 8. Chaurand, P., Luetzenkirchen, F., and Spengler, B. (1999) J. Am. Soc. Mass Spectrom. 10, 91-103. 9. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R. (1999) Nat. Biotechnol. 17, 676-682. 10. Borchers, C., Peter, J. F., Hall, M. C., Kunkel, T. A., and Tomer, K. B. (2000) Anal. Chem. 72, 1163-1168. 11. Brown, R. S., and Lennon, J. J. (1995) Anal. Chem. 67, 19982003. 12. Brown, R. S., and Lennon, J. J. (1995) Anal. Chem. 67, 39903999. 13. Roepstorff, P., and Fohlmann, J. (1984) J. Biomed. Mass Spectrom. 11, 601. 14. Biemann, K. (1990) Methods Enzymol. 193, 455-479. 15. Katta, V., Chow, D. T., and Rohde, M. F. (1998) Anal. Chem. 70, 4410-4416. 16. Reiber, D. C., Brown, R. S., Weinberger, S., Kenny, J., and Bailey, J. (1998) Anal. Chem. 70, 1214-1222. 17. Reiber, D. C., Grover, T. A., and Brown, R. S. (1998) Anal. Chem. 70, 673-683. 18. Lennon, J. J., and Walsh, K. A. (1997) Protein Sci. 6, 24462453. 19. Walsh, K. A., and Lennon, J. J. (1997) FASEB J. 11, 2955. 20. Lennon, J. J., and Walsh, K. A. (1999) Protein Sci. 8, 24872493. 21. Muscat, D., Henderickx, H., Kwakkenbos, G., van Benthem, R., de Koster, C. G., Fokkens, R., and Nibbering, N. M. M. (2000) J. Am. Soc. Mass Spectrom. 11, 218-227. 22. Caprioli, R. M., Farmer, T. B., and Gile, J. (1997) Anal. Chem. 69, 4751-4760. 23. Chaurand, P., Stoeckli, M., and Caprioli, R. M. (1999) Anal. Chem. 71, 5263-5270. 24. Todd, P. J., Schaaff, T. G., Chaurand, P., and Caprioli, R. M. (2001) J. Mass Spectrom. 36, 355-369. 25. Mills, J. S., Needham, M., and Parker, M. G. (1987) Nucleic Acids Res. 15, 7709-7724. 26. Anderegg, R. J., Carr, S. A., Huang, I. Y., Hiipakka, R. A., Chang, C., and Liao, S. (1988) Biochemistry 27, 4214-4221. 27. Brennan, T. V., and Clarke, S. (1995) in Deamidation and Isoaspartate Formation in Peptides and Proteins (Aswad, D. W., Ed.) pp 65-90, CRC Press, Boca Raton, FL. 28. Abraham, G. N., and Podell, D. N. (1981) Mol. Cell. Biochem. 38, 181-190. 29. Tsarbopoulos, A., Bahr, U., Pramanik, B. N., and Karas, M. (1997) Int. J. Mass Spectrom. 169, 251-261. 30. Nemeth, J. F., Hochensang, G. P., Marnett, L. J., and Caprioli, R. M. (2001) Biochemistry 40, 3109-3116. 31. Harvey, D. J. (2000) J. Am. Soc. Mass Spectrom. 11, 900915. 32. Harvey, D. J. (2000) J. Mass Spectrom. 35, 1178-1190. 33. Cooper, C. A., Harrison, M. J., Wilkins, M. R., and Packer, N. H. (2001) Nucleic Acids Res. 29, 332-335. 34. Katoh, H., Ohgi, K., Irie, M., Endo, T., and Kobata, A. (1990) Carbohydr. Res. 195, 273-293. 35. Nimtz, M., Grabenhorst, E., Conradt, H., Sanz, L., and Calvete, J. J. (1999) Eur. J. Biochem. 265, 703-718. 36. Lund, J., Takahashi, N., Nakagawa, H., Goodall, M., Bentley, T., Hindley, S. A., Tyler, R., and Jefferis, R. (1993) Mol. Immunol. 30, 741-748. 37. Takayama, M., and Tsugita, A. (1998) Int. J. Mass Spectrom. 181, L1-L6. 38. Takayama, M., and Tsugita, A. (2000) Electrophoresis 21, 1670-1677. BI010424L