Anal. Chem. 2008, 80, 951-960
Partial Acetylation of Lysine Residues Improves Intraprotein Cross-Linking Xin Guo,†,‡ Pradipta Bandyopadhyay,‡,# Birgit Schilling,§ Malin M. Young,| Naoaki Fujii,‡ Tiba Aynechi,‡ R. Kiplin Guy,⊥ Irwin D. Kuntz,‡ and Bradford W. Gibson*,‡,§
Department of Pharmaceutics and Medicinal Chemistry, University of the Pacific, Stockton, California 95211, Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143, Buck Institute for Age Research, Novato, California 94945, Sandia National Laboratories, Livermore, California 94551-0969, and St. Jude Children’s Research Hospital, Memphis, Tennessee 38105
Intramolecular cross-linking coupled with mass spectrometric identification of cross-linked amino acids is a rapid method for elucidating low-resolution protein tertiary structures or fold families. However, previous crosslinking studies on model proteins, such as cytochrome c and ribonuclease A, identified a limited number of peptide cross-links that are biased toward only a few of the potentially reactive lysine residues. Here, we report an approach to improve the diversity of intramolecular protein cross-linking starting with a systematic quantitation of the reactivity of lysine residues of a model protein, bovine cytochrome c. Relative lysine reactivities among the 18 lysine residues of cytochrome c were determined by the ratio of d0 and acetyl-d3 groups at each lysine after partial acetylation with sulfosuccinimidyl acetate followed by denaturation and quantitative acetylation of remaining unmodified lysines with acetic-d6 anhydride. These lysine reactivities were then compared with theoretically derived pKa and relative solvent accessibility surface values. To ascertain if partial N-acetylation of the most reactive lysine residues prior to cross-linking can redirect and increase the observable Lys-Lys cross-links, partially acetylated bovine cytochrome c was cross-linked with the aminespecific, bis-functional reagent, bis(sulfosuccinimidyl)suberate. After proteolysis and mass spectrometry analysis, partial acetylation was shown to significantly increase the number of observable peptides containing Lys-Lys cross-links, shifting the pattern from the most reactive lysine residues to less reactive ones. More importantly, these additional cross-linked peptides contained novel Lys-Lys cross-link information not seen in the nonacetylated protein and provided additional distance constraints that were consistent with the crystal structure and facilitated the identification of the proper protein fold. 
* To whom correspondence should be addressed. E-mail: bgibson@buckinstitute.org. Phone: 415 209 2032. Fax: 415 209 2231.
† University of the Pacific, Stockton. ‡ University of California, San Francisco. § Buck Institute for Age Research. | Sandia National Laboratories. ⊥ St. Jude Children’s Research Hospital.
10.1021/ac701636w CCC: $40.75 © 2008 American Chemical Society. Published on Web 01/18/2008
Recently, intramolecular protein cross-linking coupled with identification of peptide cross-links using mass spectrometry and
computational modeling (MS3D)1-3 has emerged as an alternative strategy for determining protein folding. Compared with X-ray crystallography and NMR, MS3D offers structures of lower resolution (4-5 Å rms) but is more rapid and less demanding on sample quantity and purity. Moreover, a unique feature of intramolecular cross-linking technology is that the reactions can be carried out under physiological conditions, at low concentrations, and/or in mixed systems, such as in the presence or absence of a ligand or a secondary binding protein. Such features make MS3D potentially suitable for the high-throughput structure determination of the large numbers of proteins discovered from genomic and proteomic projects.4,5 In theory, MS3D should be a broadly applicable methodology, as long as the protein carries, on the majority of its surface, sufficient residues that can be cross-linked to generate ∼N/10 informative distance constraints, where N is the number of amino acids in the protein.2 Soluble proteins generally have a number of exposed primary amines, including lysine side chains and, when not blocked, the primary amino group at the N-terminus. Each of these amino groups is potentially available for amine-specific cross-linking using reagents such as bis(sulfosuccinimidyl)suberate (BS3).6 BS3, one of the most frequently used of the many commercially available cross-linking reagents, is a homobifunctional, water-soluble, and membrane-impermeable reagent containing an N-hydroxysulfosuccinimide (sNHS) ester at each end of an eight-carbon spacer arm. NHS esters selectively acylate primary amines at pH 7-9 to form stable amide bonds, along with release of the sNHS leaving group.
# Current address: School of Information Technology, Jawaharlal Nehru University, New Delhi, India 110067.
(1) Albrecht, M.; Hanisch, D.; Zimmer, R.; Lengauer, T. In Silico Biol. 2002, 2, 325-337.
(2) Young, M. M.; Tang, N.; Hempel, J. C.; Oshiro, C. M.; Taylor, E. W.; Kuntz, I. D.; Gibson, B. W.; Dollinger, G. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 5802-5806.
(3) Kruppa, G. H.; Schoeniger, J.; Young, M. M. Rapid Commun. Mass Spectrom. 2003, 17, 155-162.
(4) Elkin, P. L. Mayo Clin. Proc. 2003, 78, 57-64.
(5) Attwood, T. K.; Miller, C. J. Biotechnol. Annu. Rev. 2002, 8, 1-54.
(6) Kotite, N. J.; Staros, J. V.; Cunningham, L. W. Biochemistry 1984, 23, 3099-3104.
(7) Taverner, T.; Hall, N. E.; O’Hair, R. A.; Simpson, R. J. J. Biol. Chem. 2002, 277, 46487-46492.
(8) Pearson, K. M.; Pannell, L. K.; Fales, H. M. Rapid Commun. Mass Spectrom. 2002, 16, 149-159.
However, recent studies3,7-9 using NHS-based cross-linking reagents including BS3 yielded only a very limited number of
Analytical Chemistry, Vol. 80, No. 4, February 15, 2008 951
peptide cross-links per protein, even though the proteins under investigation (e.g., cytochrome c, ribonuclease A, and ubiquitin) carried many potentially modifiable lysine pairs that might be expected to react based on distance considerations. Compounding this problem was the fact that very few unique distance constraints could be derived from the peptide cross-links because many of the experimentally observed cross-linked peptides were variations of modification of the same limited subset of lysine residues.10 In this report, we address the challenge of diversifying the cross-linking reactions to obtain sufficient distance constraints for protein fold determination. Since a cross-linking reagent first modifies the most reactive amino acid residues,11 we hypothesize that the uneven reactivity of the lysine residues may be a limiting factor for the diversity of cross-linking reactions in the previous studies, with the most reactive residues quenching most of the cross-linking reagent. As cytochrome c has often been used as a model protein for intramolecular cross-linking reactions,3,8-10,12 we used this protein to determine the relative reactivities of its 18 amino groups to acetylation reactions and then to generate sets of intramolecular cross-links for the protein that had first undergone partial acetylation. These data were then compared with calculated pKa and relative solvent accessibility surface values of the amino groups. Our aim is to better understand the reactivity of lysine residues in proteins and find ways to improve the experimental cross-linking profile. EXPERIMENTAL SECTION Materials. Bovine heart cytochrome c and reagents for protein chemistry including iodoacetamide and DTT were obtained from Sigma (St. Louis, MO). Sequencing-grade modified trypsin (porcine, Promega, Madison, WI) and Pepsin (from porcine gastric mucosa, Sigma, St. Louis, MO) were utilized for in-solution digestion reactions. 
Sulfosuccinimidyl acetate (sNHS-Ac) and bis(sulfosuccinimidyl)suberate (BS3) were purchased from Pierce/Thermo-Fisher (Rockford, IL). The deuterated acetylation reagent, acetic-d6 anhydride, was obtained from Cambridge Isotopes (Andover, MA). High-performance liquid chromatography (HPLC) solvents, such as acetonitrile (ACN) and water, were obtained from Burdick & Jackson (Muskegon, MI).
Partial Acetylation of Bovine Cytochrome c by sNHS-Ac. Bovine heart cytochrome c was dissolved in buffer (20 mM Na2HPO4, 150 mM NaCl, pH 7.5) to a final protein concentration of 40 µM and divided into 100 µL aliquots for the following acetylation reactions. sNHS-Ac was dissolved in 5 mM sodium citrate buffer (pH 5, 4 °C) to give a 100 mM stock solution. The sNHS-Ac stock solution was immediately diluted to yield reaction solutions of 1, 2, 5, 10, 20, 50, and 100 mM. For each of these seven concentrations, a 10 µL aliquot was mixed with a 100 µL aliquot of the 40 µM cytochrome c solution, yielding 2.5, 5, 12.5, 25, 50, 125, and 250 molar equiv of sNHS-Ac, respectively, relative to protein. The reaction mixtures were agitated and incubated at 4 °C for 4 h.
(9) Dihazi, G. H.; Sinz, A. Rapid Commun. Mass Spectrom. 2003, 17, 2005-2014.
(10) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J. Am. Soc. Mass Spectrom. 2003, 14, 834-850.
(11) Novak, P.; Young, M. M.; Schoeniger, J. S.; Kruppa, G. H. Eur. J. Mass Spectrom. (Chichester, Eng.) 2003, 9, 623-631.
(12) Kalkhof, S.; Ihling, C.; Mechtler, K.; Sinz, A. Anal. Chem. 2005, 77, 495-503.
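The molar equivalents quoted in the partial acetylation protocol follow directly from the stated volumes and concentrations. The short sketch below reproduces that arithmetic; the function name and its parameters are illustrative and not part of the original protocol.

```python
def molar_equivalents(reagent_mM, reagent_uL, protein_uM, protein_uL):
    """Return moles of reagent per mole of protein (hypothetical helper)."""
    reagent_nmol = reagent_mM * reagent_uL            # mM x uL gives nmol
    protein_nmol = protein_uM * protein_uL / 1000.0   # uM x uL gives pmol, so divide by 1000
    return reagent_nmol / protein_nmol


# sNHS-Ac reaction solutions (mM) listed in the protocol
reaction_solutions_mM = [1, 2, 5, 10, 20, 50, 100]

# 10 uL of each solution is added to 100 uL of 40 uM cytochrome c
equivalents = [molar_equivalents(c, 10, 40, 100) for c in reaction_solutions_mM]
print(equivalents)
```

Each 100 µL protein aliquot contains 4 nmol of cytochrome c, so the 1 mM solution (10 nmol in 10 µL) corresponds to 2.5 equiv and the 100 mM solution to 250 equiv.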
Denaturation, Alkylation, and Subsequent Exhaustive Acetylation.13,14 An aliquot of each partially acetylated cytochrome c reaction mixture (55 µL) was mixed with 33.3 µL of acetonitrile and 9.2 µL of 100 mM DTT. The mixture was heated at 60 °C for 1 h and then cooled to room temperature. Iodoacetamide (8.5 µL, 0.1 g/mL in water) was added to the mixture, followed by incubation in the dark at 37 °C for 1 h. To quantitatively acetylate all remaining lysine residues not modified by the sNHS-Ac treatment, the denatured and alkylated protein was treated with acetic-d6 anhydride under conditions that minimize acetylation of hydroxy residues (i.e., half-saturated sodium acetate buffer). Each of the denatured protein solutions was mixed with 70 µL of saturated NaOAc buffer and then with 3.5 µL of acetic-d6 anhydride. The mixture was agitated on a vortex mixer at room temperature for 30 min, and its pH was adjusted to 8.0 with 10 N NaOH. Another aliquot (6 µL) of acetic-d6 anhydride was added to each sample mixture, and the agitation was repeated at room temperature.
Mass Spectrometric Analysis of d0- and d3-Labeled N-Acetylated Lysines. To ensure high sequence coverage, the various cytochrome c samples that had been N-acetylated and cysteine alkylated were subjected to proteolytic digestion with pepsin. First, each protein mixture was acidified with 20 µL of formic acid, diluted with 460 µL of water, and left at 4 °C overnight to ensure the complete hydrolysis of the excess acetic-d6 anhydride. These protein mixtures were then adjusted to pH 3 with formic acid, and an aliquot of pepsin (24 µL of 7 µg/mL solution in water) was added to each sample, followed by agitation in the dark (room temperature, 16 h). An 18 µL aliquot of each pepsin digestion sample was mixed with 2 µL of 1% TFA and desalted using C4 zip-tips (Millipore, Bedford, MA).
The concentration of acetonitrile in the zip-tip eluants (10 µL each) was lowered by a 10-fold dilution with water followed by a 10-fold reconcentration on a Speed-Vac apparatus (Savant, Thermo-Fisher Scientific, San Jose, CA). The proteolytic peptide mixtures were then acidified with an equal volume of 0.2% formic acid and analyzed by reverse-phase nano-HPLC-ESI-MS/MS. Briefly, peptides were separated on an Ultimate nanocapillary HPLC system equipped with a PepMap C4 nano-column (75 µm i.d. × 15 cm) (Dionex, Sunnyvale, CA) and a CapTrap Micro guard column of 0.5 µL bed volume (Michrom, Auburn, CA). Peptide mixtures were loaded onto the guard column and washed with the loading solvent (0.05% formic acid, flow rate: 20 µL/min) for 5 min, then transferred onto the analytical C18-nanocapillary HPLC column and eluted at a flow rate of 300 nL/min using the following gradient: 2% solvent B in A (from 0 to 5 min) and 2-60% solvent B in A (from 5 to 55 min). Solvent A consisted of 0.05% formic acid in 98% H2O/2% ACN, and solvent B consisted of 0.05% formic acid in 98% ACN/2% H2O. The column eluant was directly coupled to a QSTAR Pulsar i quadrupole orthogonal TOF mass spectrometer (MDS SCIEX, Concorde, Canada) equipped with a Protana nanospray ion source (ProXeon Biosystems, Odense, Denmark). Mass spectra (ESI-MS) and tandem mass spectra (ESI-MS/MS) were recorded in positive-ion mode with a resolution of 12000-15000 full-width half-maximum.
(13) Shen, S.; Strobel, H. W. Arch. Biochem. Biophys. 1993, 304, 257-265.
(14) Shen, S.; Strobel, H. W. Arch. Biochem. Biophys. 1992, 294, 83-90.
Data Analysis and Quantitation of Concentration-Dependent Lysine N-Acetylation. The HPLC-MS/MS raw data files of protein acetylation experiments using 250 equiv of sNHS-Ac were searched with an in-house Mascot 2.1 database search engine program (Matrix Science)15 to obtain a library of identified acetylated peptides (search parameters for pepsin digestion: no enzyme specificity, 100 ppm mass accuracy). A subpopulation of the observed acetylated peptides (Supporting Information Table S1) was selected with good ion abundances and signal-to-noise ratios in both their MS and MS/MS spectra. This subset of acetylated peptides was then used for quantitation and determination of the Ac% values of cytochrome c lysine residues that were subjected to the stepwise sNHS-Ac/acetic-d6 anhydride experiments described above. The monoisotopic ions of the resulting d0-acetylated peptides and their d3-acetylated isotopic counterparts were extracted from the total ion chromatograms (TIC) using the QSTAR Analyst QS software (see Supporting Information Figures S1 and S2 as examples). Under our HPLC separation and MS ion extraction conditions, the d0- and d3-labeled peptides coeluted. The ion abundance data were transferred onto a Microsoft Excel worksheet, and small contributions from (minimally) overlapping isotopic ion clusters (e.g., d0 vs d3) were subtracted to obtain the ratio of the d0- and d3-acetylated peptide pairs. For a peptide with one lysine side chain, the molecular ions containing an acetyl-d0 versus an acetyl-d3 group were observed with a mass difference of 3 Da, and these ion pairs were used to calculate relative ion abundances (see Supporting Information Figure S1).
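As a minimal sketch of the single-lysine case, the percentage of acetylation by sNHS-Ac can be taken as the d0 share of the summed d0 and d3 ion abundances. The function name and the abundance values below are hypothetical, and the sketch assumes the isotopic-overlap correction described above has already been applied to both abundances.

```python
def percent_acetylation(abundance_d0, abundance_d3):
    """Share (in %) of the acetyl-d0 form in a d0/d3 peptide ion pair.

    Assumes both abundances are already corrected for isotopic overlap.
    """
    total = abundance_d0 + abundance_d3
    if total <= 0:
        raise ValueError("ion abundances must sum to a positive value")
    return 100.0 * abundance_d0 / total


# Hypothetical extracted-ion abundances (arbitrary units) for one peptide
ac_percent = percent_acetylation(300.0, 100.0)
print(ac_percent)  # 75.0
```

A value of 75% here would mean that three quarters of the molecules carried the acetyl-d0 group from the initial sNHS-Ac step, with the remainder labeled later by acetic-d6 anhydride.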
For a peptide containing more than one lysine (or also containing the N-terminal Ac-Gly residues), the relative ion abundances of peptides containing acetyl-d0 and acetyl-d3 modifications were calculated based on molecular ions that were separated by n × 3Da, where n was the number of acetyl groups (see Supporting Information Figure S2). The relative abundance was then normalized to yield the percentage of acetylation of the peptide by sNHS-Ac. For some of the lysine residues, peptides with only one lysine residue were detected and the percentage of acetylation of that residue equals the percentage of acetylation of the peptide (Supporting Information Figure S1). Other residues were observed only in peptides with multiple lysine groups, and their percentage of acetylation was calculated by using the percentages of acetylation of more than one (overlapping) peptide. For example, the acetylation percentage of Lys-86 was calculated by the following formula:
$$A\%_{\mathrm{K86}} = 3\,A\%_{83\text{-}94} - 2\,A\%_{87\text{-}96}$$
where $A\%_{\mathrm{K86}}$ is the acetylation percentage of Lys-86, $A\%_{83\text{-}94}$ is the acetylation percentage of peptide 83-94 (AGIKKKGEREDL), and $A\%_{87\text{-}96}$ is the acetylation percentage of peptide 87-96 (KKGEREDLIA).
Cross-Linking of Bovine Cytochrome c. The cross-linking reactions were carried out as previously reported2,10 with minor modifications. For cross-linking with no prior acetylation, cytochrome c from bovine heart was dissolved in 20 mM Na2HPO4, 150 mM NaCl, pH 7.5, to give a 10 µM solution, and reacted with a 50 mol equiv excess of freshly prepared BS3 solution (5 mM citrate, pH 5.0/10 mM BS3).
(15) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
For cross-linking reactions following
partial acetylation, the acetylated bovine cytochrome c in 20 mM Na2HPO4, 150 mM NaCl, pH 7.5, was diluted with the same buffer to 10 µM and reacted with a 50 mol equiv excess of freshly prepared BS3 solution (5 mM citrate, pH 5.0/10 mM BS3). Each cross-linking reaction mixture was incubated for up to 24 h at 4 °C and quenched with an aliquot of 1 M Tris·HCl at pH 8.0 to a final concentration of 10 mM Tris·HCl.
Mass Spectrometry and Identification of Cross-Linked Peptides. Tryptic digestion of bovine cytochrome c proceeded at 37 °C with a trypsin/protein ratio of 1:20 (wt/wt). After 16 h, another aliquot of trypsin was added, and digestion continued for an additional 2 h. The enzymatic digestion was quenched with phenylmethanesulfonyl fluoride (PMSF). The tryptic hydrolysate, consisting of both unmodified and modified peptides, was then analyzed by LC/MS under the same conditions as those used for analyzing the digests of exhaustively acetylated bovine cytochrome c. The Automated Spectrum Assignment Program (ASAP; http://roswell.ca.sandia.gov/~mmyoung/asap.html) was used to first assign the MS spectral data as previously described.2 The cross-linked peptides and other types of modified peptides proposed by ASAP were then further interrogated by analyzing their corresponding MS/MS spectra using the MS2Assign program (http://roswell.ca.sandia.gov/~mmyoung/ms2assign.html).10 For final confirmation, MS/MS spectra were inspected “manually” using evaluation criteria similar to those published by Link et al.,16 such as the presence of a sequence tag of several contiguous b- and/or y-ions, considering as well the general patterns of cross-linked peptide fragmentation as previously described.10
Calculation of Lysine pKa and Solvent Accessibility in Cytochrome c.
Assuming that the sterically unhindered, free-base forms of lysine side chains are the reactive species toward acylation, we calculated the pKa and the relative solvent accessible surface area (SAS) of all 18 lysine residues in bovine cytochrome c to explore their effects on the lysine reactivity toward acetylation. The pKa calculations were carried out with the DELPHI software package,17 using a structure of the horse-heart cytochrome c (pdb ID 1HRC), which has 97% sequence similarity to the bovine cytochrome c used in our acetylation/cross-linking experiments (see below). SAS calculations were performed using the DMS program package with a probe radius of 4.0. The relative SAS of all lysine residues were determined. The DMS program package was obtained from the Computer Graphics Lab, University of California, San Francisco.
(16) Link, A. J.; Eng, J.; Schieltz, D. M.; Carmack, E.; Mize, G. J.; Morris, D. R.; Garvik, B. M.; Yates, J. R., III. Nat. Biotechnol. 1999, 17, 676-682.
(17) Honig, B.; Nicholls, A. Science 1995, 268, 1144-1149.
(18) Alexandrov, N. N.; Nussinov, R.; Zimmer, R. M. In Pacific Symposium on Biocomputing ‘96; Hunter, L., Klein, T. E., Eds.; World Scientific Publishing Co.: Singapore, 1996; pp 53-72.
(19) Hobohm, U.; Sander, C. Protein Sci. 1994, 3, 522-524.
Constraint-Based Protein Structure Prediction and Modeling. To generate a 3D structural model of bovine cytochrome c, we first used the 123D+ threading algorithm18 (http://123d.ncifcrf.gov/123D+.html) to generate sequence-structure alignments for cytochrome c against a fold family library of 1125 unique proteins (hobohm_97_45.cath).19 The threading calculation was performed using the following parameters: Costmatrix = dayhoff, Match = 0.00, Mismatch = 0.00, Gapinsert = 15.00, Gapextend = 3.00, Window length = 3, Fold Library = hobohm97_45.cath, Homology-Level = 0.00, Used Mode = global, Cost Function =
Scheme 1. Experimental Determination of Relative Reactivity of Lysine Residues in Cytochrome c
threading by contact capacity. The top 20 models were then reordered according to constraint error violations, which we have previously defined as the extent to which the model violates the cross-link-derived distance constraints.2 Structural alignments were performed using the DaliLite server (http://www.ebi.ac.uk/DaliLite/).20
RESULTS AND DISCUSSION
To investigate the acetylation differences of protein amine groups, we first developed a strategy for systematically evaluating the relative reactivity of all lysine residues using bovine cytochrome c as a model protein (see Scheme 1). Lysine residues of bovine cytochrome c were partially acetylated with different equivalents (defined as the molar ratio of the reagent to the protein) of sulfosuccinimidyl acetate (sNHS-Ac), which carries the same lysine-selective sNHS leaving group as the bifunctional cross-linking reagent BS3. After this titrated initial acetylation with sNHS-Ac, the unreacted amino groups remaining from the partially acetylated lysine residues were exhaustively acetylated13,14 under denaturing conditions with deuterated (d6) acetic anhydride. Therefore, on any single lysine, one expects to see different amounts of d0 and d3 N-acetyl modifications, whose ratio reflects the initial (partial) reaction with sNHS-Ac-d0. To obtain comprehensive lysine coverage at the peptide level for mass spectrometry analysis, we carried out an exhaustive protease digestion with pepsin of the fully acetylated proteins. These pepsin digests were then analyzed by nano-HPLC-MS/MS to provide unambiguous peptide assignments by MS/MS and to simultaneously measure the relative isotopic abundance of the precursor ions relative to their differing deuterium content.
(20) Holm, L.; Park, J. Bioinformatics 2000, 16, 566-567.
Figure 1. Acetylation of bovine cytochrome c: (A) Percentage of acetylation (Ac%, y axis) of lysine residues and N-terminus of bovine cytochrome c (x axis) at different equivalences of sNHS-Ac ranging from 0 to 250 equiv of sNHS-Ac, defined as mole of sNHS-Ac per mole of protein (z axis). An averaged Ac% is reported for Lys-99 and Lys-100 (99/100). As the N-terminus of the protein is fully acetylated (post-translational modification in the native protein), it serves here as a positive control. (B) Uneven distribution of lysine acetylation on bovine cytochrome c. Key: Top 6, the 6 most acetylated lysine residues; Least 6, the 6 least acetylated lysine residues; Mid 6, the remaining 6 of the 18 lysine residues not listed in either top 6 or least 6. Percentage of total acetylation is defined as the sum of Ac% of the 6 lysine residues divided by the sum of Ac% of all the 18 lysine residues times 100%. (C) CR of lysine residues by percentage of acetylation (Ac%) at different equivalences of sNHS-Ac. See eq 1 for definition of CR.
A complete list of acetylated peptides that were used for quantitation purposes and details of their mass spectrometric identification is provided as Supporting Information Table S1, as well as a full display of their respective Mascot annotated spectra (see Supporting Information Figure S9). The ratios of the abundances of the molecular ions of all lysine-containing peptides were then used to calculate the percentage of acetylation (Ac%) of each lysine residue at different molar equivalents of sNHS-Ac (Figure 1A and Supporting Information Figures S1 and S2). Since the N-terminal glycine of native bovine cytochrome c is fully acetylated prior to any reactions, its Ac% was used as a positive internal
control, which showed values close to 100% at all equivalents of sNHS-Ac (see Figure 1A). From the experimental data, one can readily see that N-acetylation was biased toward a subset of lysine residues (see Figure 1B), especially at low equivalences of the acetylation reagent sNHS-Ac. At 2.5 equiv of sNHS-Ac, for example, the 6 most reactive of the 18 lysines in bovine cytochrome c, i.e., Lys-13, 22, 25, 72, 86, and 87, accounted for 55% of all the acetylation events, whereas the 6 lysines showing the lowest level of acetylation, i.e., Lys-5, 7, 39, 73, 99, and 100, accounted for only 12% of all the lysine acetylations. At higher equivalences of sNHS-Ac, the acetylation pattern became more evenly distributed, with the less reactive lysine residues accounting for a larger share of the acetylations. For example, at 25 equiv of sNHS-Ac, where approximately half (47%) of all the lysine residues were acetylated, the 6 least acetylated lysine residues accounted for 21% of all the lysine acetylation while the 6 most acetylated lysine residues accounted for about 50% of all the acetylation. Moreover, the ranking of the lysine residues by Ac% changes at different equivalences of sNHS-Ac (Figure 1C). Figure 1C shows larger changes occurring at low sNHS-Ac equivalences (5 and 12.5 equiv), as indicated by a larger value of CR (change of reactivity) as defined by eq 1, where $R_{i,j}$ is the rank by Ac% (1 indicates the highest Ac% and 17 the lowest Ac%) of a lysine residue $i$ (except for Lys-99 and Lys-100, for which an averaged Ac% was determined and ranked) at a certain equivalence ($j$) of sNHS-Ac, and $R_{i,j-1}$ is the rank of the residue $i$ by Ac% at the next lower equivalence of sNHS-Ac ($j - 1$) applied to bovine cytochrome c.

$$\mathrm{CR} = \sum_{i=1}^{17} \left(R_{i,j} - R_{i,j-1}\right)^{2} \qquad (1)$$
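The CR metric of eq 1 can be illustrated with a short sketch: rank the sites by Ac% at two consecutive reagent equivalences and sum the squared rank differences. The Ac% values and the reduced four-site set below are hypothetical placeholders rather than data from Figure 1, and the handling of ties (input order) is an assumption not specified in the text.

```python
def ranks_by_ac(ac_percent):
    """Map each site to its rank by Ac% (rank 1 = highest Ac%)."""
    ordered = sorted(ac_percent, key=ac_percent.get, reverse=True)
    return {site: rank for rank, site in enumerate(ordered, start=1)}


def change_of_reactivity(ac_lower_equiv, ac_higher_equiv):
    """Sum of squared rank changes between two consecutive equivalences (eq 1)."""
    r_prev = ranks_by_ac(ac_lower_equiv)
    r_curr = ranks_by_ac(ac_higher_equiv)
    return sum((r_curr[site] - r_prev[site]) ** 2 for site in r_curr)


# Hypothetical Ac% values for four sites at two consecutive equivalences
ac_low = {"K13": 30.0, "K22": 25.0, "K5": 5.0, "K7": 2.0}
ac_high = {"K13": 60.0, "K22": 70.0, "K5": 20.0, "K7": 35.0}

cr = change_of_reactivity(ac_low, ac_high)
print(cr)  # 4: each of the four sites moves by one rank
```

In the article, the sum runs over 17 ranked entries because Lys-99 and Lys-100 are averaged into a single entry; a CR of 0 would mean the reactivity ranking did not change between the two equivalences.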
To better understand our experimental values of bovine cytochrome c lysine reactivities to acetylation by sNHS-Ac, we carried out a theoretical calculation of lysine pKa and relative SAS values. As described in the Experimental Section, the pKa values for each of the 18 lysine residues were calculated using the method described by Honig and Nicholls,17 based on the crystal structure of the horse-heart cytochrome c (pdb ID 1HRC), which has 97% sequence similarity to bovine cytochrome c. In a similar fashion, the relative SAS values for each lysine residue were also calculated using a DMS program package. The lysine residues were grouped as most reactive (red), medium reactive (green), and least reactive (blue) based on their lysine Ac% at 50 equiv sNHS-Ac as shown in Figure 1A. Lysine residues with Ac% greater than 80% are considered as most reactive, between 50 and 80% as medium reactive, and less than 50% as least reactive. As shown in Figure 2, the theoretically calculated pKa and SAS values for each individual lysine residue in cytochrome c are displayed in relation to their experimentally determined acetylation reactivities. One arrow, pointing from right to left, predicts an increase of lysine reactivity with decreasing pKa values. The second arrow, pointing from bottom to up, predicts an increase of lysine reactivity with increasing surface accessibility. The calculated lysine pKa’s show a better correlation to the experimental reactivities than the SAS values. The correlation of lysine reactivities to their predicted surface accessibilities is modest at best, while the majority of the most reactive species clearly had
Figure 2. Relationship between pKa, relative SAS, and experimentally determined reactivities of individual lysine residues in horse-heart cytochrome c toward acetylation (also see Supporting Information Table S2 for more detailed information). The crystal structure of horse-heart cytochrome c was used as a surrogate for bovine cytochrome c, which shares 97% sequence identity. Lysine reactivity as determined from Figure 1A is indicated by color coding and assigned based on the Ac% at 50 equiv sNHS-Ac: lysines are marked in red (most reactive, Ac% > 80%), green (medium reactive, Ac% = 50-80%), and blue (least reactive, Ac% < 50%).
the lowest pKa values. For example, residues Lys-87 (pKa 6.6) and Lys-13 (pKa 6.7) were considered most reactive at 50 equiv sNHS-Ac with lysine Ac% > 80%. In contrast, Lys-86 (pKa 6.5) and Lys-25 (pKa 6.7) were considered medium reactive at 50 equiv sNHS-Ac, despite starting out with the highest reactivity at the lowest level of acetylation reagent (2.5 equiv sNHS-Ac). Lysine residues with higher pKa values were mostly observed to be slowly acetylated, matching expectations of a correlation between pKa and lysine reactivity. Prime examples of this correlation are residues Lys-5 (pKa 7.7) and Lys-99 (pKa 8.1), which are considered least reactive based on results shown in Figure 1A. Two other lysine residues that were determined to have slow reactivity to acetylation, i.e., Lys-73 and Lys-100, still exhibited moderately high pKa values of 7.3 and 7.2, respectively. However, there are exceptions and borderline cases where one cannot simply rely on the pKa values for reactivity prediction, as rates are clearly influenced by other factors. For example, Lys-7 (pKa 6.8) showed slow reactivity at 2.5 equiv sNHS-Ac, but at higher levels of the acetylation reagent it was considered medium reactive (Ac% at 50 equiv reagent = 50-80%), which correlates somewhat better with its low pKa value. Changes in the Ac% rank of various lysine residues in bovine cytochrome c suggest that partial acetylation of the protein prior to cross-linking could improve the availability of the less reactive lysine residues to amine-specific cross-linking reagents, thus facilitating more diverse and more informative cross-linking reactions. To examine this possibility, bovine cytochrome c was acetylated with 5 or 12.5 equiv of sNHS-Ac, followed by cross-linking with 50 equiv of BS3 (Figure 3, Table 1, and Supporting Information Table S3). The cross-linking of bovine cytochrome c using 50 equiv of BS3 without prior acetylation is also reported for comparison.
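To make the pKa argument discussed above concrete, the Henderson-Hasselbalch relation gives the fraction of an amine present in its free-base form at a given pH. The sketch below applies it to the calculated pKa values quoted in the text at the pH 7.5 of the reaction buffer; it is only an illustration of the expected trend, it ignores steric and solvent-accessibility effects, and the function name is hypothetical.

```python
def free_base_fraction(pka, ph=7.5):
    """Fraction of an amine in the unprotonated (free-base) form at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Calculated pKa values quoted in the text
calculated_pka = {"Lys-87": 6.6, "Lys-13": 6.7, "Lys-5": 7.7, "Lys-99": 8.1}

fractions = {site: free_base_fraction(pka) for site, pka in calculated_pka.items()}
for site, fraction in fractions.items():
    print(f"{site}: pKa {calculated_pka[site]}, free base {fraction:.2f}")
```

Under this simple model, Lys-87 and Lys-13 would be mostly unprotonated at pH 7.5, whereas Lys-99 would be largely protonated, in line with the slower acetylation reported for the higher-pKa residues.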
The low equivalences of sNHS-Ac were chosen, as they induce large changes in the relative reactivity of lysine residues (large CR values, Figure 1C) and would impose the least
Figure 3. Distance constraints between lysine residues of bovine cytochrome c derived from peptide cross-link hits (see Table 1 and Supporting Information Table S3). The position (x and y values, where y ≥ x) of each symbol denotes the residue numbers of two lysine residues whose α-carbons are separated by ≤23.85 Å, the maximum distance allowed with fully extended cross-linker and lysine side chains. The type 0 modifications (dead-end lysine acylations) fall onto the diagonal line of y = x. Informative distance constraints carry y values larger than x + 4 and are presented as points above the red dotted line.
perturbation of the tertiary protein structure.13,14 No intermolecular cross-linking (i.e., dimers) was detected by gel electrophoresis analyses under such acetylation/cross-linking conditions (Supporting Information Figure S3), suggesting that low-level acetylation of cytochrome c did not induce any aberrant protein-protein interactions at these working concentrations. Cross-linking of bovine cytochrome c using BS3 cross-linking reagent without any prior modification of lysine residues yielded a large number of type 0 cross-linked peptides (one end of the cross-linker modifies the protein and the other end is hydrolyzed) as well as several cross-links between closely spaced lysine residues (less than two amino acid residues apart in primary sequence), neither of which offers informative distance constraints.2,10 Only two informative distance constraints (Lys-7 to Lys-27 and Lys-39 to Lys-53) were determined from two mass spectrometry-verified cross-linked peptides (see Table 1). The small number of distance informative cross-linked peptides supports our hypothesis that the uneven reactivity of protein residues is a limiting factor for the diversity of cross-linking reactions. Acetylation of bovine cytochrome c with sNHS-Ac (5 or 12.5 molar equiv) prior to cross-linking led to the identification of a series of novel cross-linked peptides. All these new cross-links (as well as the Lys-acetylation sites discussed earlier) were rigorously confirmed by tandem mass spectrometry. Tandem mass spectra of all potential cross-linked peptides were analyzed using the MS2Assign program10 and then inspected “manually” using several criteria as described in the Experimental Section before inclusion, including the presence of sequence tags.16 Indeed, three additional informative distance constraints not observed in the original cytochrome c cross-linking experiments without prior acetylation could now be unequivocally assigned (Table 1 and Figure 3). 
Combining all cross-linking experiments (with and without prior acetylation), a total of five useful distance constraints resulting from cross-links that span “long” sequence stretches have now been identified. For completeness, a full list of all BS3-modified peptides containing type 1 or type 2 cross-links, including those between closely spaced lysine residues that carry no useful distance information, is given in Supporting Information Table S3. Specific examples of these new cross-linked peptides are shown in Figure 4. For example, Figure 4A shows a novel peptide consisting of K39TGQAPGFSYTDANK53 (α peptide) and K100ATNE104 (β peptide) cross-linked between Lys-39 and Lys-100. Fragment ions from both peptides were observed, displaying a dominant y series (ynα and ynβ). Figure 4B shows a new peptide consisting of G(Ac)1DVEK5GK7 (α peptide) and K87K(Ac)GER91 (β peptide) cross-linked between Lys-5 and Lys-87. This type 2 cross-linked peptide also displays an extensive fragment ion series from both peptide chains. This latter peptide is of particular interest as it contains two neighboring lysine residues, Lys-87 and Lys-88, only one of which is acetylated (Lys-88), thus demonstrating the influence of N-acetylation on directing cross-linking reactions. Further MS/MS spectra of novel cross-linked peptides are shown in the Supporting Information, i.e., Figure S4 (cross-link between Lys-7 and Lys-100), Figure S5 (a different peptide from the one above, also cross-linked between Lys-5 and Lys-87), Figure S6 (cross-link between Lys-5 and Lys-8, no distance constraint), and Figure S7 (cross-link between Lys-72 and Lys-73, no distance constraint). MS/MS spectra for the peptides establishing the cross-links Lys7-Lys27 and Lys39-Lys53 (no prior acetylation) were previously published by Schilling et al.10 The beneficial effect of acetylation is evident from the fact that several of the new cross-links formed after prior acetylation of cytochrome c involve lysine residues that were identified in the differential acetylation experiments as having low reactivities (see Figures 1A and 2).
Two such residues, Lys-5 and Lys-100, also have relatively high predicted pKa values. Thus, it seems that such residues became available for cross-linking only after partial acetylation of other residues in bovine cytochrome c. In addition, prior partial acetylation can block one of two lysine residues that are close in primary sequence and thereby prevent the two from cross-linking with each other. This in turn allows the unblocked lysine residue to cross-link with alternate residues, reactions that could lead to informative distance constraints. This may have been the case for the lysine clusters Lys-5,7,8, Lys-86,87,88, and Lys-99,100. Cross-linking with prior acetylation by 12.5 equiv of sNHS-Ac yielded the same four distance constraints identified from cross-linking with prior acetylation by 5 equiv of sNHS-Ac. All five distance constraints from all experiments combined, with or without acetylation (see Table 1), are consistent with the recently reported X-ray crystal structure of cytochrome c (PDB ID 2B4Z), indicating that the acetylation and cross-linking conditions did not perturb the native fold of the protein. Protein fold identification (Table 2) was carried out in a two-step threading process as previously reported.2 First, we carried out sequence threading with 123D+18 of bovine cytochrome c to find the best 20 structural models out of a fold family database of 1125 proteins sharing less than 45% sequence identity.19 Five cytochrome c proteins are ranked first to fourth and eighth. Second, we re-ranked the models by their agreement with the cross-link-derived distance constraints, which resulted in ranking of the cytochrome c proteins from first to fifth as shown
Table 1. Observed Lys-Lys Cross-Links and Corresponding Cross-Linked Peptides from Bovine Cytochrome c, with and without Prior Acetylation, Containing Useful Distance Constraintsᵃ

| equiv Ac-sNHS | X-link typeᵇ | X-link positionsᶜ | sequence number | peptide sequences | modifications | M (exp) | M (theor) | m/z (z) | ∆M ppm |
|---|---|---|---|---|---|---|---|---|---|
| 0ᵈ | type 2 | K7-K27 | 6-8, 26-38 | HKTGPNLHGLFGR-GKK | BS3 | 1901.97 | 1902.08 | 635.00 (3) | 53 |
| 0ᵈ | type 1 | K39-K53 | 39-55 | KTGQAPGFSYTDANKNK | BS3 | 1963.93 | 1963.98 | 655.65 (3) | 24 |
| 5 | type 2 | K5-K87 | 1-7, 87-88 | G(Ac)DVEKGK-KK | BS3, Ac | 1185.62 | 1185.68 | 396.22 (3) | 46 |
| 5 | type 2 | K7-K100 | 6-8, 100-104 | GKK-KATNE | BS3 | 1030.53 | 1030.58 | 516.27 (2) | 51 |
| 5 | type 2 | (K7 or K8)-K100 | 6-13, 100-104 | GKKIFVQK-KATNE (linked at K7 or K8) | BS3 | 1687.89 | 1687.97 | 563.64 (3) | 48 |
| 5 | type 2 | K39-K100 | 39-53, 100-104 | KTGQAPGFSYTDANK-KATNE | BS3 | 2283.02 | 2283.12 | 762.02 (3) | 41 |
| 5 | type 1 | K39-K53 | 39-55 | KTGQAPGFSYTDANKNK | BS3 | 1963.86 | 1963.98 | 982.94 (2) | 62 |
| 12.5 | type 2 | K5-K87 | 1-7, 87-91 | G(Ac)DVEKGK-KK(Ac)GER | BS3, 2xAc | 1569.79 | 1569.85 | 524.27 (3) | 41 |
| 12.5 | type 2 | K7-K100 | 6-8, 100-104 | KATNE-GKK | BS3 | 1030.54 | 1030.58 | 516.28 (2) | 38 |
| 12.5 | type 2 | K39-K100 | 39-53, 100-104 | KTGQAPGFSYTDANK-KATNE | BS3 | 2283.02 | 2283.12 | 762.01 (3) | 43 |
| 12.5 | type 1 | K39-K53 | 39-55 | KTGQAPGFSYTDANKNK | BS3 | 1963.91 | 1963.98 | 982.96 (2) | 36 |

ᵃ All reactions used 50 molar equiv of the cross-linking reagent, BS3, relative to the total number of lysine residues (18) in cytochrome c with variable equivalents of sNHS-Ac (x = 0, 5, and 12.5 equiv acetyl reagent). ᵇ Cross-link type 2 (interpeptide cross-link) and type 1 (intrapeptide cross-link). ᶜ X-link positions in bold represent significant and new distance constraints that formed after cross-linking of partially acetylated bovine cytochrome c. ᵈ Values as previously reported by Schilling et al.10
in Table 2. The threading models were scored by using eq 2,2 where n is the number of constraints; di is the Cα-Cα distance in the model for the two residues in constraint i; and d0 is 23.85 Å, the maximum Cα-Cα through-space distance between BS3-cross-linked lysines.
$$
\text{error} = \sum_{i=1}^{n}
\begin{cases}
0 & \text{if } d_i < d_0 \\
d_i - d_0 & \text{if } d_i > d_0
\end{cases}
\tag{2}
$$
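The scoring described for eq 2 can be illustrated with a minimal Python sketch. It assumes that the error is the plain sum of the excess distances (di - d0) over all violated constraints, as the definitions in the text suggest. The function name `constraint_error`, the dictionary layout, and the toy coordinates are hypothetical and are not taken from the original work; only the 23.85 Å cutoff comes from the text.

```python
from math import dist

# Maximum Cα-Cα distance (Å) between BS3-cross-linked lysines, as stated in the text.
D0 = 23.85


def constraint_error(ca_coords, constraints, d0=D0):
    """Sum the excess distance (d_i - d0) over all violated constraints.

    ca_coords: dict mapping residue number -> (x, y, z) of its Cα atom, in Å.
    constraints: iterable of (residue_a, residue_b) pairs from cross-links.
    """
    total = 0.0
    for res_a, res_b in constraints:
        d_i = dist(ca_coords[res_a], ca_coords[res_b])
        if d_i > d0:  # satisfied constraints contribute nothing
            total += d_i - d0
    return total


# Hypothetical Cα coordinates for illustration only (not from any PDB entry).
toy_model = {
    5: (0.0, 0.0, 0.0),
    7: (1.0, 0.0, 0.0),
    27: (11.0, 0.0, 0.0),
    87: (30.0, 0.0, 0.0),
}

# K7-K27 is satisfied (10 Å); K5-K87 is violated by 30 - 23.85 Å.
toy_error = constraint_error(toy_model, [(7, 27), (5, 87)])
```

A model that satisfies every constraint scores zero, so ties between such models have to be resolved by other information, for example the sequence-threading rank.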
With only the two spatial constraints derived from cross-linking without prior N-acetylation, the re-ranking reports zero constraint violation for the top nine models, which belong to five different protein families; this is insufficient for definitive fold identification. When the 3 additional spatial constraints obtained from cross-linking with prior acetylation of cytochrome c were included (for a total of 5 distance constraints), all 5 cytochrome c proteins were re-ranked into the top 5 of the 20 structural models. Compared to the initial ranking of the top 20 models by sequence threading, the re-ranking based on all 5 spatial constraints confirmed the cytochrome c fold by moving two alternative fold models (β-spectrin, 1DRO, PH domain-like; pseudoazurin, 1PAZ, cupredoxin-like) out of the top 6. All top six ranked structures after the constraint-based reordering are alignable with both 2B4Z, a crystal structure of bovine cytochrome c, and 1CRC, a crystal structure of chicken cytochrome c, with RMSD values of less than 3 Å. None of the lower-ranked structures are alignable with 2B4Z or 1CRC except for β-spectrin, which is barely alignable with 2B4Z (3.8 Å RMSD over 46 amino acid residues) but not alignable with 1CRC. The only non-cytochrome c protein among the top six models is glutaredoxin (1KTE). Glutaredoxin has a different function than cytochrome c and less than 10% sequence identity with 2B4Z. However, its structure is alignable with both 2B4Z (2.7 Å RMSD over 42 amino acid residues, Figure 5) and 1CRC (2.8 Å RMSD over 46 amino acid residues). Since the goal of MS3D is fold identification rather than functional annotation or sequence alignment, we consider 1KTE (glutaredoxin) a successful hit and a demonstration of the robustness of the MS3D methodology.
CONCLUSION
Taken together, the variable and uneven nucleophilic reactivity of protein lysine amino groups toward NHS ester cross-linking reagents observed in this study can be identified as a cause of the low diversity of intramolecular protein cross-linking that is typically observed. As determined by partial (incomplete) reactions with the water-soluble amine-specific acetylation reagent sNHS-Ac, it was clear that there are reactive “hot-spots” that can dominate the reaction profile and therefore place severe restrictions on the observable lysine-lysine cross-links. Part of the variation in this reactivity can be directly attributed to pKa differences among lysine residues, which can vary over 2 units. Other factors are likely to contribute as well, such as steric considerations and relative solvent accessibility. Partial acetylation of the most reactive lysines prior to cross-linking was shown here to be a simple and effective strategy to improve the diversity of the cross-linking reactions and thereby to generate sufficient distance constraints needed for reliable protein fold identification. Nonetheless, it was the combination of cross-linking reactions, ranging from unmodified to low and moderate levels of prior protein acetylation, that provided the most complete data set of useful distance constraints for proper fold family recognition. After completion of this current study, we became aware of a paper in press by Lee et al.21 that used high-resolution Fourier transform mass spectrometry (FTMS) coupled to ultrahigh-pressure chromatography to identify cross-linked peptides from horse-heart cytochrome c after treatment with BS3, the same amine-specific bifunctional cross-linking reagent used in our study.
(21) Lee, Y. J.; Lackner, L. L.; Nunnari, J. M.; Phinney, B. S. J. Proteome Res. 2007, 6, 3908-3917.
It is interesting to note that they identified significantly more cross-links than had previously been reported for horse-heart cytochrome c,8 presumably due to the increased dynamic range of FTMS and its associated higher mass accuracy and dynamic range
Analytical Chemistry, Vol. 80, No. 4, February 15, 2008
Figure 4. Representative tandem mass spectra of novel cross-linked peptides identified only in the partially acetylated cytochrome c, showing type 2 cross-links (interpeptide cross-links between two peptide chains). (A) ESI-MS/MS spectrum of peptides K39TGQAPGFSYTDANK53 (α-peptide) and K100ATNE104 (β-peptide) cross-linked between Lys-39 and Lys-100. The [M + 3H]3+ signal at m/z 762.02 (Mexp = 2283.02) was selected as the precursor ion. (B) ESI-MS/MS spectrum of peptides G(Ac)1DVEK5GK7 (α-peptide) and K87K(Ac)GER91 (β-peptide) cross-linked between Lys-5 and Lys-87. The [M + 3H]3+ signal at m/z 524.27 (Mexp = 1569.79) was selected as the precursor ion. Two additional acetyl groups are located on the N-terminal residue Gly-1 and on Lys-88. (Note: for cross-linked peptides, the larger of the two peptides is referred to as the α-peptide and the smaller as the β-peptide according to the nomenclature of Schilling et al.10).
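The relation between the precursor m/z values quoted in the Figure 4 caption and the masses listed in Table 1 can be checked with a short Python sketch. It assumes a protonated ion [M + zH]z+ and the standard proton mass of 1.007276 Da, neither of which is spelled out in the text; the function names are hypothetical. Because the published m/z and mass values are rounded to two decimals, numbers recomputed this way agree with the tabulated M (exp) and ∆M values only approximately.

```python
# Proton mass in Da (standard value; an assumption, not quoted in the article).
PROTON_MASS = 1.007276


def neutral_mass(mz, z):
    """Neutral monoisotopic mass M from the m/z of an [M + zH]z+ ion."""
    return z * (mz - PROTON_MASS)


def ppm_error(m_exp, m_theor):
    """Absolute mass deviation in parts per million."""
    return abs(m_exp - m_theor) / m_theor * 1e6


# Precursor ions quoted in the Figure 4 caption (both triply charged).
mass_4a = neutral_mass(762.02, 3)  # caption reports Mexp = 2283.02
mass_4b = neutral_mass(524.27, 3)  # caption reports Mexp = 1569.79

# Deviation recomputed from the rounded Table 1 entries for the K39-K100 peptide.
dev_4a = ppm_error(2283.02, 2283.12)
```

Deviations of a few tens of ppm, as listed in Table 1, correspond to differences of roughly 0.05 to 0.1 Da for peptides in the 1000 to 2300 Da range.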
Table 2. Top 20 Threading Models Ranked by Distance Constraint Errorᵃ

| protein name | PDB ID | fold family | %ID | threading rankᵇ | error [Å] by two constraintsᶜ | error [Å] by five constraintsᵈ |
|---|---|---|---|---|---|---|
| ferricytochrome C | 1ccr | cytochrome C | 57.52 | 1 (1) | 0.00 | 0.00 |
| cytochrome c2 (Rhodospirillum rubrum) | 3c2c | cytochrome C | 31.90 | 2 (2) | 0.00 | 0.00 |
| cytochrome c2 (Rhodobacter sphaeroides) | 1cxc | cytochrome C | 26.98 | 3 (3) | 0.00 | 0.00 |
| cytochrome c2 (Paracoccus denitrificans) | 1cot | cytochrome C | 28.46 | 4 (4) | 0.00 | 0.00 |
| cytochrome c6 | 1ctj | cytochrome C | 18.18 | 5 (8) | 0.00 | 0.00 |
| glutaredoxin | 1kte | thioredoxin fold | 9.76 | 6 (7) | 0.00 | 0.23 |
| NTRC receiver domain | 1ntr | flavodoxin-like | 9.23 | 7 (16) | 0.00 | 0.80 |
| pseudoazurin | 1paz | cupredoxin-like | 6.35 | 8 (6) | 0.99 | 0.99 |
| pseudoazurin | 1pmy | cupredoxin-like | 7.94 | 9 (9) | 1.14 | 2.11 |
| pleckstrin | 1pls | PH domain-like | 7.58 | 10 (20) | 2.27 | 2.27 |
| profilin Ib | 1acf | profilin-like | 8.66 | 11 (19) | 0.00 | 3.02 |
| heat-labile enterotoxin | 1ltsD | OB-fold | 4.20 | 12 (15) | 3.61 | 3.61 |
| flavodoxin | 5nul | flavodoxin-like | 9.29 | 13 (11) | 1.30 | 3.76 |
| kedarcidin | 1akp | IgG-like β-sandwich | 3.97 | 14 (14) | 0.00 | 5.32 |
| β-spectrin | 1dro | PH domain-like | 8.94 | 15 (5) | 12.97 | 12.97 |
| RANTES | 1hrjA | IL8-like | 2.86 | 16 (18) | 2.80 | 13.30 |
| Rous sarcoma virus protease | 2rspA | acid proteases | 6.77 | 17 (13) | 14.08 | 14.08 |
| CD2, first domain | 1cdb | IgG-like β-sandwich | 7.56 | 18 (17) | 4.65 | 20.50 |
| lipoyl domain of dihydrolipoamide acetyltransferase | 1iyv | barrel-sandwich hybrid | 6.48 | 19 (10) | 6.12 | 34.01 |
| designed zinc finger protein | 1meyF | zinc finger design | 8.62 | 20 (12) | 6.55 | 45.88 |

ᵃ Constraint error is the extent of model violation of the cross-link-derived distance constraints, as defined by eq 2. ᵇ Ranking is listed as obtained after re-ordering with the five distance constraints; the original ranking is given in parentheses. ᶜ Distance constraints obtained from cross-linking experiments (50 equiv BS3). ᵈ Distance constraints obtained from cross-linking experiments (50 equiv BS3) without and, in addition, with prior partial protein acetylation (5 and 12.5 equiv of sNHS-Ac).
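The second threading step, re-ranking the models by constraint error, can be sketched in Python as follows. The tie-breaking rule (models with equal error keep their original threading order) is an assumption; it is consistent with the ranks listed in Table 2 but is not stated explicitly in the text. The function name `rerank` and the dictionary layout are hypothetical; the PDB IDs, original ranks, and five-constraint errors are a subset of Table 2.

```python
def rerank(models):
    """Order models by constraint error; ties keep the original threading order.

    Tie-breaking by the original rank is an assumption that matches Table 2.
    """
    return sorted(models, key=lambda m: (m["error"], m["threading_rank"]))


# Subset of Table 2: original threading rank and error [Å] by five constraints.
models = [
    {"pdb": "1ccr", "threading_rank": 1, "error": 0.00},
    {"pdb": "1dro", "threading_rank": 5, "error": 12.97},
    {"pdb": "1paz", "threading_rank": 6, "error": 0.99},
    {"pdb": "1kte", "threading_rank": 7, "error": 0.23},
    {"pdb": "1ctj", "threading_rank": 8, "error": 0.00},
]

reranked_ids = [m["pdb"] for m in rerank(models)]
```

In this subset the cytochrome c6 model (1ctj) moves ahead of β-spectrin (1dro) and pseudoazurin (1paz), mirroring the change in relative order between the two rankings in Table 2.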
Figure 5. Structural alignment of bovine cytochrome c (2B4Z, gray) and glutaredoxin (1KTE, cyan). The RMSD is 2.68 Å over 42 α-carbons.
of ion detection. It should be pointed out, however, that horse-heart cytochrome c has an additional lysine residue compared with the bovine form used here, which increases both the number of potential Lys-Lys cross-links and tryptic cleavage sites. Nonetheless, a comparison of our results (after taking into account the nonhomologous Lys-60) shows that the absolute number of informative cross-links obtained after FTMS analysis was higher, although there remained unique cross-links observed only in our data set both before and after prior N-acetylation, e.g., Lys39-Lys53 and Lys7-Lys100, respectively. The publication of the work by Lee et al.21 also provides an opportunity to better understand the two dynamic range limitations in cross-linking experiments: one, the ability to detect cross-linked
peptides covering many orders of magnitude in relative molar abundance and two, the relative reactivity of the combinatorial set of paired lysine residues in a given protein that minimally satisfy the appropriate distance constraint. The study by Lee and colleagues addressed the former issue and clearly demonstrated the advantages of optimizing chromatographic separation and mass spectrometry sensitivity and mass accuracy. In contrast, our approach using prior acetylation takes aim at the second dynamic range limitation by redirecting cross-linking away from the most reactive and entropically favored lysine pairs to those where at least one of the paired members is less reactive. Alternatively, the new cross-linked lysine pairs found only in the protein after partial N-acetylation could be the result, at least in some cases, of overcoming an entropic disadvantage due to a larger distance that the partially reacted and tethered cross-linking reagent needs to explore, a strategy which would be predicted to reduce the number of redundant and often uninformative lysine cross-links between closely spaced lysine residues. Clearly, a combination of the two approaches, prior N-acetylation followed by better-optimized mass spectrometry and separation analysis, has the potential to be a very powerful improvement in overall experimental design. With regard to our choice of lysine-modifying reagent (sNHS-Ac), it may also be possible to use other types of pretreatment regimens, such as succinylation, which might have the additional value of maintaining a charged site at lysine residues (although in this particular example, positive to negative) and therefore maintaining protein solubility and tertiary structure. Of course, care needs to be taken when carrying out any protein modification, including N-acetylation and bifunctional cross-linking, as such modifications have the potential to alter protein structure. Indeed,
this concern was one of the reasons we carried out both N-acetylation and bifunctional cross-linking at several concentrations and used 1D gel electrophoresis to check for possible changes in protein dimerization, which could yield potentially confounding interprotein cross-links. Overall, these findings have profound implications for the development of the MS3D methodology for the determination of tertiary structure and fold-family assignments into a more broadly applicable strategy for protein structure characterization.
ACKNOWLEDGMENT
This work is supported by NSF Grant CHE-0118481. We thank Mr. Abraham Lo for technical assistance.
SUPPORTING INFORMATION AVAILABLE
Tandem mass spectra of cross-linked and acetylated peptides obtained from cytochrome c, corresponding tables of acetylated and cross-linked peptides (details of mass spectrometric identification), and a table showing cytochrome c lysine pKa and SAS values. This material is available free of charge via the Internet at http://pubs.acs.org.
Received for review August 1, 2007. Accepted November 16, 2007. AC701636W