Site-Specific Labeling of Protein Lysine Residues ... - ACS Publications

Nov 20, 2015 - Department of Chemistry, University of Toronto, UTM, 3359 Mississauga Road North, Mississauga, Ontario L5L 1C6, Canada. ‡ Department ...

Article

Site-specific labeling of protein lysine residues and N-terminal amino groups with indoles and indole-derivatives
Sacha Thierry Larda, Dmitry Pichugin, and Robert Scott Prosser
Bioconjugate Chem., Just Accepted Manuscript • DOI: 10.1021/acs.bioconjchem.5b00457 • Publication Date (Web): 20 Nov 2015
Downloaded from http://pubs.acs.org on November 24, 2015


Bioconjugate Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Site-specific labeling of protein lysine residues and N-terminal amino groups with indoles and indole-derivatives

Sacha Thierry Larda†, Dmitry Pichugin†, and Robert Scott Prosser†‡

†Department of Chemistry, University of Toronto, UTM, 3359 Mississauga Road North, Mississauga, Ontario L5L 1C6, Canada
‡Department of Biochemistry, University of Toronto, 1 King’s College Circle, Toronto, Ontario M5S 1A8, Canada

Abstract

Indoles and indole-derivatives can be used to site-specifically label proteins on lysine and N-terminal amino groups under mild, non-denaturing reaction conditions. Hen egg white lysozyme (HEWL) and alpha-lactalbumin were labeled with either indole, fluoroindole, or fluoroindole-2-carboxylate via electrophilic aromatic substitutions to lysine side chain Nε- and N-terminal amino imines, formed in situ in the presence of formaldehyde. The reaction is highly site-selective, easily controlled by temperature, and does not eliminate the native charge of the protein, unlike many other common lysine-specific labeling strategies. 19F NMR was used to monitor reaction progression and, in the case of HEWL, unique resonances for each labeled side chain could be resolved. We demonstrate that the indole tags are highly selective for primary amino groups. 19F NMR demonstrates that each lysine exhibits a different rate of conjugation to indoles, making it possible to employ these tags as a means of probing surface topology by NMR or mass spectrometry. Given the site-specificity of this tagging method, the mildness of the reaction conditions (aqueous, buffered or unbuffered) and the low stoichiometry required for the reaction, indole-derivatives should serve as a valuable addition to the bioconjugation toolkit. We propose that labeling lysine side chains and N-terminal amino groups with indoles is a versatile and general strategy for bioconjugations with substituted indoles, with broad implications for protein functionalization.

Introduction

Bioconjugation reactions play an important role in the study of proteins and are increasingly used in the synthesis of novel biomaterials and functionalized therapeutics.1-6 In addition, reactions which serve to site-specifically incorporate chemical tags into proteins are of particular importance in biochemistry as they provide ways of probing and/or chemically modifying specific residues that may be responsible for key aspects of structure and/or function.7, 8 Acetylation (acylations), methylation (alkylations), as well as succinimide- and maleimide-based conjugations are all common techniques that target lysine or cysteine residues for the purpose of either conferring new properties (and possibly functionality), or serving to introduce labels (isotopes, fluorophores, radicals, complexes) into the macromolecule of interest.9-12 Lysine residues in particular serve as convenient labeling sites, due in part to their abundance of roughly 6-7% in proteins and their tendency to be solvent-exposed.13-15 As with arginine, they tend to confer favorable solubility to the macromolecule given their net positive charge at physiological pH. In addition, lysine residues are often involved in structurally and functionally important intra-, inter-domain,


and inter-protein interactions such as cation-pi, hydrogen bonds, and salt-bridges, for which the net charge is essential.16 With regard to lysine side chain and N-terminal amine tagging, many existing approaches employ reagents such as anhydrides, acyl halides, thioacetates and active esters, isocyanates and isothiocyanates, and sulfonyl halides, all of which eliminate the net positive charge by converting the amine to a neutral conjugate.17-21 The elimination of the native charge of the protein can be perturbing to both structure and function and is therefore generally undesirable, especially at higher levels of labeling. Here we report a chemical tagging approach that employs indoles to site-specifically label lysine residues under mild, non-denaturing, aqueous conditions. This labeling strategy preserves the native charge of the lysine side chain amino group and allows for the incorporation of functionalized/substituted indoles for a wide range of potential applications including pegylation, protein immobilization, isotopic tagging for NMR, isotope coded affinity tagging for mass spectrometry, and crosslinking.

Results

Indole rings are surprisingly reactive toward electrophilic species. The C3 position of the indole ring is roughly 10^13 times more reactive than benzene toward aromatic substitutions.22-24 In the presence of aqueous formaldehyde, protein side chain and N-terminal amino groups tend to form electrophilic imine intermediates. This process is identical to the first steps associated with reductive aminations/alkylations of proteins.10, 25, 26 These transient cationic intermediates are sufficiently reactive toward the C3 of indole rings such that conjugation readily proceeds under non-denaturing temperatures and pH. Reactions between protein amino groups and indoles (as well as imidazoles and/or phenols) in the presence of formaldehyde were reported by Fraenkel-Conrat et al. in 1947, though earlier evidence for such reactions has existed since the advent of the tanning industry.27 Conjugations between amino groups and tyrosine residues in proteins, which occur similarly via Mannich reactions (specifically, the Betti reaction), have been reported but tend to require far greater stoichiometric ratios of reagent:protein and are prone to side reactions with tryptophan residues.28-30

Here, we demonstrate site-specific covalent attachment of indoles to the lysine side chain and N-terminal amino groups of hen egg white lysozyme (HEWL) and alpha-lactalbumin. This bioconjugation strategy is achievable at comparatively low stoichiometric ratios of reagent to protein, thereby eliminating the common concerns associated with crosslinking at high protein concentrations in the presence of formaldehyde. We show that the reaction exhibits high specificity for lysine side chain and N-terminal amino groups. The reaction can be readily performed in buffered (pH 6-9) or unbuffered solution. The only requirement is that the buffer contain no primary or secondary amines, reducing agents, or compounds with highly acidic CH protons during the reaction, as such components can introduce cross-reactivity.


By employing 5-fluoroindole or 5-fluoroindole-2-carboxylate, fluorinated tags can be conjugated to each reactive lysine on the protein as shown below (Figure 1A, 1B, respectively).

Figure 1. Reaction schemes describing the labeling of protein lysine amino groups with 5-fluoroindole (A) and 5-fluoroindole-2-carboxylate (B). Note that lysine side chains are in equilibrium between protonated and deprotonated states and that it is the deprotonated state that reacts with formaldehyde to form the electrophilic imine intermediate. The carbon contributed by formaldehyde is shown in red. The most reactive site of the indole ring is the C3 position. (C) 19F NMR spectrum (5 °C) of ~150-300 µM 5-fluoroindole-labeled HEWL exchanged into unbuffered milli-Q H2O. (D) 19F NMR spectrum (25 °C) of 5-fluoroindole-2-carboxylate-labeled HEWL (~150-300 µM) exchanged into unbuffered milli-Q H2O. In both cases, peak deconvolution can resolve at least 6 distinct resonances. Spectra are normalized such that the largest peak was assigned an absolute intensity of 100. The reaction stoichiometry used is the same as described in the methods section.

For both fluoroindole and fluoroindole-2-carboxylate labeled HEWL we expect to observe as many as 7 resonances arising from 6 lysines and 1 N-terminal amino group (Figure 2). The NMR spectrum reflects this, exhibiting 6-7 prominent peaks. Presumably steric perturbation prevents the addition of a second indole tag.


Figure 2. Crystal structures of hen egg white lysozyme (PDB:193L, A) and bovine alpha-lactalbumin (PDB:1HFZ, B).31, 32 Lysine residues are colored in dark blue. The N-terminal residue of HEWL is a lysine whereas the N-terminal residue of bovine alpha-lactalbumin is a methionine (backbone colored in cyan). The N-terminal amino groups are also reactive toward indoles in the presence of formaldehyde and will therefore conjugate. Furthermore, the lower pKa of Nα amino groups relative to Nε of lysine side chains favors reactivity at the N-termini. For HEWL, there are seven free reactive amino groups, whereas in alpha-lactalbumin, there are 13 potential sites for conjugation. Structures prepared using UCSF Chimera and POV-Ray.33, 34

In order to definitively identify those resonances arising from fluoroindole and fluoroindole-2-carboxylate tags conjugated to protein, we performed diffusion-edited NMR to selectively eliminate all peaks arising from small molecule byproducts and reagent (Figure S1). Furthermore, to demonstrate that the indole-labeling methodology is site-specific and that only lysine side chain and N-terminal amino groups are labeled, the reaction was tested against a fully dimethylated lysozyme sample. A fully dimethylated protein has all reactive amine groups “capped” and cannot further react with formaldehyde to form the electrophilic imine intermediate that precedes indolation. As expected, the reaction does not proceed with the dimethylated protein, indicating that the reaction is indeed specific for lysine side chain and N-terminal amino groups (Figure S2A, S2B). Arginine does not react under most conditions given the elevated pKas of the guanidinium nitrogens as well as the stability conferred by resonance of the guanidinium group. 13C,1H HMBC and HSQC experiments along with chemical shift predictions confirm coupling between the 13CH2 methylene “bridge” (red in Figure 1) and the aromatic protons of the indole ring and also demonstrate that it is the C3 position of the indole ring which is the site of conjugation (Figure S3, S4, S5).


Mass spectrometry

Coupling of functionalized indoles to proteins has many potential mass spectrometry applications. We acquired ESI-MS mass spectra for indole-, 5-fluoroindole-, and 5-fluoroindole-2-carboxylate-labeled lysozyme samples equivalent to those shown in Figure 1 (Figure 3). In each case, the 19F NMR spectra reveal 6-7 resonances, while the mass spectra demonstrate that there are on average 2-3 indole tags incorporated per molecule. Therefore, whereas each molecule of protein is labeled with a small number of indole tags, on average there is complete coverage/labeling of all amino groups to some extent, with the more reactive lysines presumably exhibiting a higher degree of modification (Table 1).


Figure 3. ESI+ mass spectra for indole-, 5-fluoroindole- and 5-fluoroindole-2-carboxylate-labeled HEWL. Respective fragments added to amino groups for each indole are highlighted with blue rectangles. Yellow rectangles denote backbone (R) and amino group, which can exist in both protonated and deprotonated states both before and after labeling. (A) ESI+ mass spectrum for indole-labeled HEWL showing that up to 8 modifications are possible. (B) ESI+ mass spectrum for 5-fluoroindole-labeled egg white lysozyme. (C) ESI+ spectrum for 5-fluoroindole-2-carboxylate-labeled HEWL. Each successive mass peak corresponds to the addition of a mass equivalent of either indole+CH2 (A), 5-fluoroindole+CH2 (B) or 5-fluoroindole-2-carboxylate+CH2 (C), where the CH2 is from the formaldehyde carbon during imine formation. Figure S8 shows the mass spectrum for unlabeled HEWL with MW = 14306 Da. In (A), the small proportion (2%) of 8 modifications indicates potential non-site-specific labeling of protein (perhaps the histidine nitrogen). Labeling with fluoroindoles at levels beyond 1-3 tags/molecule leads to protein instability in the case of HEWL but is well tolerated for non-fluorinated indole.
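The spacing between successive peaks in such a mass ladder can be estimated from the reaction stoichiometry. The sketch below is illustrative only: it assumes the net change per tag is reagent + CH2O − H2O (the usual Mannich-type condensation balance), uses rounded average atomic masses, treats the carboxylate tag as the neutral acid, and takes the unlabeled HEWL mass of 14306 Da quoted in the caption. The names `TAG_FORMULAS`, `tag_mass`, and `expected_masses` are hypothetical helpers, not part of any software used in this work.

```python
# Rounded average atomic masses (g/mol).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998}

# Net atoms added to the protein per tag, ASSUMING
# amine + CH2O + indole reagent -> conjugate + H2O.
TAG_FORMULAS = {
    "indole": {"C": 9, "H": 7, "N": 1},
    "5-fluoroindole": {"C": 9, "H": 6, "N": 1, "F": 1},
    # Neutral acid form assumed for the carboxylate reagent.
    "5-fluoroindole-2-carboxylic acid": {"C": 10, "H": 6, "N": 1, "O": 2, "F": 1},
}

UNLABELED_HEWL_DA = 14306.0  # value quoted in the Figure 3 caption


def tag_mass(formula):
    """Average mass (Da) of the net fragment added by one tag."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def expected_masses(reagent, max_tags=8):
    """Expected protein masses for 0..max_tags tags of the given reagent."""
    delta = tag_mass(TAG_FORMULAS[reagent])
    return [UNLABELED_HEWL_DA + n * delta for n in range(max_tags + 1)]


if __name__ == "__main__":
    for name, formula in TAG_FORMULAS.items():
        print(f"{name}: +{tag_mass(formula):.2f} Da per tag")
```

Under these assumptions the per-tag increments come out near 129, 147, and 191 Da, which can be compared against the peak spacing in a deconvoluted spectrum.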


Table 1: Percent modification of egg white lysozyme as quantified by ESI-MS for various indoles.

Reagent                        0 mod  1 mod  2 mod  3 mod  4 mod  5 mod  6 mod  7 mod  8 mod
5-fluoroindole                 56.5   34     8.5    ~1     0      0      0      0      0
5-fluoroindole-2-carboxylate   48     36     14     2      0      0      0      0      0
Indole                         2      6.8    13.6   19.6   18.8   18.5   11.7   6.8    2

*Note that higher yields can be achieved but under the conditions employed (high protein concentration) for NMR, precipitation occurs at elevated levels of protein modification.
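A population-weighted mean number of tags per protein molecule can be read off the percentages in Table 1. The following is a minimal sketch, assuming the "~1" entry is exactly 1 and renormalizing each row by its own sum (the indole row totals 99.8); `TABLE_1` and `mean_tags` are hypothetical names introduced for illustration.

```python
# Percent of molecules carrying 0..8 modifications, copied from Table 1.
# The "~1" entry for 5-fluoroindole is taken as 1.0 (assumption).
TABLE_1 = {
    "5-fluoroindole": [56.5, 34, 8.5, 1, 0, 0, 0, 0, 0],
    "5-fluoroindole-2-carboxylate": [48, 36, 14, 2, 0, 0, 0, 0, 0],
    "indole": [2, 6.8, 13.6, 19.6, 18.8, 18.5, 11.7, 6.8, 2],
}


def mean_tags(percentages):
    """Weighted mean number of tags per molecule, unmodified protein included."""
    total = sum(percentages)
    return sum(n * p for n, p in enumerate(percentages)) / total


if __name__ == "__main__":
    for reagent, row in TABLE_1.items():
        print(f"{reagent}: {mean_tags(row):.2f} tags per molecule")
```

Because the unmodified population is included in the weighting, this number is lower than the mean computed over labeled molecules only; which of the two is more useful depends on the question being asked.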

Similarly, alpha-lactalbumin was labeled with fluoroindole until a maximum in 19F NMR peak intensity was achieved (Figure 4). At the maximum, there are on average 1-4 labels per molecule of protein. The absence of precipitate in the NMR sample at ~250 µM protein concentration suggests that the indole tags are well tolerated at this level of labeling. Higher labeling yields were found to result in sample instability.

Figure 4. Fluoroindole labeling of lactalbumin at 250 µM in milli-Q H2O at 25 °C, monitored over time by 19F NMR, and the corresponding mass spectrum after 105 min of labeling (arrow). Grey plot (squares) defines the reaction profile versus time out to 240 min and defines a plateau indicative of maximum stable levels of labeling (stable out to 300 min, not shown). Beyond 300 min, loss of 19F intensity indicates sample instability (not shown). The black plot (circles) follows the time course for an equivalent sample but was stopped after 105 minutes to obtain the mass spectrum shown. The stacked 19F NMR spectra correspond to the individual points in the black plot (circles). Unlabeled refers to lactalbumin in the absence of any fluoroindole coupling.

Prior work has demonstrated the utility of fractional labeling in preventing perturbation of protein structure.35 Given the modest number of probes conjugated to each protein (Figure 4), prolonged sample stability at elevated temperatures for the fractionally-labeled samples, and the lack of substantial inhomogeneity in the NMR spectra of fluoroindole-2-carboxylate labeled HEWL as a function of temperature (Figures S6, S7), it seems that the partial labeling approach does not induce multiple protein conformational states for this tag. Unlike fluoroindole-2-carboxylate spectra, those for fluoroindole do exhibit inhomogeneity, but only at elevated temperature.


1,3-Dimethylindole was also tested to assess reactivity at the C2 position of indoles, as this site may also undergo electrophilic aromatic substitution. The mass spectrum (Figure S8, D) shows no evidence of coupling after 48 hours under the conditions and stoichiometry described in the methods.

Probing protein topology by monitoring differential lysine reactivity

The pKas for lysine side chain amino groups are typically between 9.3 and 10.3 and are dependent on factors such as solvent exposure, intra- and intermolecular interactions (H-bonds, salt bridges), and steric factors.36-40 It is therefore reasonable to assume that one can distinguish individual lysine residues simply by their differences in reactivity toward a chemical reagent. Furthermore, any changes in their reactivity may indicate conformational changes in a protein and/or ligand binding events.9 From Figure 1C and 1D, the difference in peak integrals for the 19F resonances of fluoroindole- and fluoroindole-2-carboxylate-labeled lysines suggests variance in their relative reactivities. To further demonstrate such differences, 19F NMR spectra were acquired as a function of time for the reaction between 5-fluoroindole, formaldehyde, and lysozyme (Figure 5). The same was done for the reaction with 5-fluoroindole-2-carboxylate (Figure 6). It is apparent that many of the lysine residues can readily be distinguished by their distinct reaction profiles. Relative differences in reactivity between the two reagents are also very pronounced, with fluoroindole carboxylate exhibiting a much slower reaction profile than fluoroindole. This is most likely due to the deactivation of the C3 position by the carboxylate group at C2 for 5-fluoroindole-2-carboxylate.
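Since it is the deprotonated amine that reacts with formaldehyde, the pKa range quoted above translates into different reactive populations at a given pH. The sketch below applies the Henderson-Hasselbalch relation to the two ends of that range at pH 7.2; the N-terminal pKa of 8.0 is an assumed, illustrative value that was not measured in this work, and `deprotonated_fraction` is a hypothetical helper. The calculation ignores steric and solvent-exposure effects, which also shape reactivity.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of an amine in the neutral (deprotonated) state at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    ph = 7.2  # phosphate buffer pH used for the 5-fluoroindole time series
    sites = {
        "lysine, pKa 9.3": 9.3,
        "lysine, pKa 10.3": 10.3,
        "N-terminus, pKa 8.0 (ASSUMED)": 8.0,
    }
    for label, pka in sites.items():
        print(f"{label}: {100 * deprotonated_fraction(pka, ph):.3f}% deprotonated")
```

A one-unit difference in pKa changes the deprotonated population roughly tenfold when the pH is well below the pKa, which is one plausible contribution to the site-to-site rate differences.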

Figure 5. Relative labeling rate of 5-fluoroindole to various individual lysine residues in HEWL. (A) Plot of absolute 19F NMR peak integrals versus time for 5-fluoroindole labeling of 500 µM HEWL in 20 mM phosphate pH 7.2 at 25 °C. (B) Overlay of spectra as reaction time progresses. All peaks shown correspond to labeled protein and are numbered according to the profiles in plot A. Peak integrals were obtained by deconvolutions using MestreNova. The predicted solvent exposure and pKa for each lysine residue of HEWL are presented in Table S1.41-46


* Note that resonance 8 corresponds to a small molecule in solution. Its intensity was static and it was present from the start of the reaction.

Figure 6. Relative labeling rate of 5-fluoroindole-2-carboxylate to various lysine residues in HEWL. (A) Plot of 19F peak integral obtained from deconvolution of spectra versus time for 5-fluoroindole-2-carboxylate labeling of 500 µM HEWL in 20 mM phosphate pH 6.0. (B) Representative deconvolution of the 19F NMR spectra for 5-fluoroindole-2-carboxylate-labeled HEWL with peak numbers corresponding to each of the profiles in plot A. The integrals used to obtain plot (A) could only be reliably estimated with sufficient peak intensity; therefore the intensity starts above 200. Note that the reaction had not yet plateaued after 1030 minutes, indicating that reactions with fluoroindole-2-carboxylate are far slower than for fluoroindole at the same temperature. The deconvolutions were necessitated by the fact that peaks 4 and 5 could not be independently integrated, which would otherwise prevent reliable estimation of the rate of reaction at either site. Figure S9 shows the plots for 19F peak integral vs time for the fully deconvolved spectra.

Discussion

Methods for chemically modifying proteins have played an important role in biochemistry and materials science for over half a century and continue to be active areas of research. Site- or residue-specific modifications are more attractive strategies given that they simplify the interpretation of results and yield predictable conjugates. Whereas many approaches to chemically modifying lysines in proteins exist, the majority of these form amide linkages and therefore destroy the native charge on the amino group, which may otherwise be critical for protein function and/or stability. Indole-based conjugates have the advantage that they preserve the native protein charge and are relatively small compared to many other common lysine- or cysteine-specific tags, such as most fluorophores used for FRET and FACS (Alexa Fluor), and comparable in size to succinimide-based (NHS-) tags. Furthermore, aromatics are more readily functionalized than saturated aliphatics, making it possible to readily design a large library of indole-based probes and crosslinkers that are specific for lysines and N-terminal amino groups.


EDAC/NHS-based conjugation methods are popular and are ubiquitously employed in the preparation of lysine-specific probes/tags/crosslinkers. Yet this approach is often two-step (the first being the formation of the NHS-ester) and has the further disadvantage that it yields an amide linkage. In addition, carbodiimides are less desirable from a toxicity/environmental perspective. Reductive aminations/alkylations are a second approach to site-specifically modifying lysines and preserve the native charge. These reactions rely on imine formation between a reactive aldehyde or ketone and amino groups and are remarkably versatile for functionalizing proteins and small molecules. Yet for large aldehydes (PEG-aldehydes) and ketones in aqueous solution, the proportion of imine formed over any given time interval is vastly smaller than that generated with the use of small molecule aldehydes (formaldehyde). Consequently, reductive alkylations with large moieties can be slow, limited by the first step in the reaction mechanism, and may require greater stoichiometric equivalents of reagent to drive the reaction forward. Conversely, the indole-labeling approach relies on the rapid formation of a large population of electrophilic imine (in exchange with free aldehyde) in situ as a first step, followed by the addition of a highly reactive indole to the electrophile. This is all done in a one-pot reaction and proceeds very rapidly at or above room temperature. It should be noted that substituent effects can modulate the reactivity of the indole C3 position, as was the case for 5-fluoroindole-2-carboxylate, whereby the carboxylate at the C2 position seems to deactivate the C3 toward electrophilic aromatic substitution (EAS). Tailoring the reactivity via substituent effects may be a route toward more selective N-terminal labeling.

Note that labeling rates demonstrated in the results (Figures 5 and 6) were not optimized for time and the reactions were allowed to proceed without stirring with the intention of best differentiating reaction rates as a function of residue. The fluoroindole reagent is also relatively insoluble and the reaction is therefore dependent on passive dissolution of the material. We observed conjugation within an hour using 100 µM protein in 650 µL volumes in an NMR tube at 25 °C. Complete labeling occurs overnight with fluoroindole and within 24 hours with fluoroindole carboxylate. Above 40 °C, the reaction proceeds more rapidly, with completion (or overlabeling) within 1-2 hours using fluoroindole and overnight with fluoroindole carboxylate. As noted in the results, overlabeling by fluoroindoles can give rise to sample instability and precipitation. Normally a compromise between stability and reaction completion can be reached, such that representative resonances are obtained for all solvent-accessible amino groups. In this case, peak height is a function of protein topology, which is in principle an advantage. The use of fluoroindoles in this study was motivated by the lack of available 19F probes for labeling lysine residues which simultaneously preserve charge on the amino group. The pKas of the indole-modified residues have not been measured to date, but would presumably be comparable or elevated according to trends for reductively methylated/alkylated lysines.15, 36-40 Whereas the NMR applications of this method satisfy a niche role, the technique should serve as a general and facile strategy for bioconjugations of proteins and small molecule amines alike.

Conclusion

We describe a chemical tagging method for the site-specific labeling of lysine residues and N-terminal amino groups with functionalized indoles. The reaction takes advantage of the highly reactive C3 position of indole rings and can therefore be used to conjugate various functionalized indoles onto proteins site-specifically under mild, non-denaturing, aqueous conditions. We have shown that the


fluoroindole can be a useful probe for fluorine NMR and mass spectrometry whereas we also foresee applications toward protein pegylation, crosslinking, and immobilization, among others. The indolation reaction is rapid, requires low stoichiometric equivalents of reagent to protein and is therefore a useful addition to the toolkit of bioconjugation reactions.

Materials and Methods

Formaldehyde (37% w/w) solution, egg white lysozyme, alpha-lactalbumin (Type I), indole, 5-fluoroindole, and 5-fluoroindole-2-carboxylate were all purchased from Sigma Aldrich (Mississauga, ON, Canada).

Reactions with indole, 5-fluoroindole or 5-fluoroindole-2-carboxylate. Lyophilized hen egg white lysozyme or bovine α-lactalbumin was dissolved in milli-Q H2O (EMD Millipore) to a concentration of ~200-250 µM. An aliquot of 12C-formaldehyde solution (37% w/w) was added at a 3x stoichiometry relative to the number of free, reactive amines (number of lysines + number of N-terminal amines). Immediately following addition of formaldehyde, the solution was mixed and transferred to a vial or glass NMR tube (for time series data) containing 1.25-5 fold stoichiometric equivalents of solid indole, 5-fluoroindole, or 5-fluoroindole-2-carboxylate relative to moles of aldehyde. Batch reactions conducted in vials or 15 mL polypropylene tubes were stirred gently with a micro stir bar. Note that 5-fluoroindole and 5-fluoroindole-2-carboxylate are only slightly soluble and the reaction rate is partly dependent on reagent solubility at a given temperature. Experimentation with co-solvents (0.1-1%) to improve reagent solubility was successful but was not continued, given the adverse effects on protein stability and the less consistent reaction rates. Batch reactions (10-15 mL volumes) were carried out at room temperature (23 °C) for 2-3 days for 5-fluoroindole or 5-7 days for 5-fluoroindole-2-carboxylate prior to purification. Reactions done above 35 °C were found to achieve similar levels of conversion within 0.5-1.5 days whereas below 10 °C, conversion after 0.5 days was negligible by 19F NMR.
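The stoichiometry described above can be worked through numerically. The following is a rough sketch rather than a validated protocol calculator: the stock density of 1.09 g/mL is an assumed typical value for 37% w/w formaldehyde and should be checked against the supplier's specification, and `stock_molarity` and `reagent_plan` are hypothetical helper names.

```python
FORMALDEHYDE_MW = 30.03          # g/mol
STOCK_W_W = 0.37                 # 37% w/w, as stated in the methods
STOCK_DENSITY_G_PER_ML = 1.09    # ASSUMED typical density; verify for the actual stock


def stock_molarity():
    """Approximate molarity (mol/L) of the formaldehyde stock solution."""
    return STOCK_W_W * STOCK_DENSITY_G_PER_ML * 1000.0 / FORMALDEHYDE_MW


def reagent_plan(protein_uM, volume_mL, n_amines,
                 aldehyde_equiv=3.0, indole_equiv=1.25):
    """Return µmol of formaldehyde and indole reagent for one reaction.

    aldehyde_equiv is relative to total reactive amines;
    indole_equiv is relative to moles of aldehyde (1.25-5 in the methods).
    """
    protein_umol = protein_uM * volume_mL / 1000.0   # µM x mL = nmol, so /1000 gives µmol
    aldehyde_umol = protein_umol * n_amines * aldehyde_equiv
    indole_umol = aldehyde_umol * indole_equiv
    stock_uL = aldehyde_umol / stock_molarity()      # µmol / (mol/L) = µL
    return {
        "protein_umol": protein_umol,
        "aldehyde_umol": aldehyde_umol,
        "indole_umol": indole_umol,
        "stock_uL": stock_uL,
    }


if __name__ == "__main__":
    # HEWL: 6 lysines + 1 N-terminal amine = 7 reactive amines.
    plan = reagent_plan(protein_uM=250, volume_mL=10, n_amines=7)
    for key, value in plan.items():
        print(f"{key}: {value:.3f}")
```

For a 10 mL batch of 250 µM HEWL this gives 52.5 µmol of formaldehyde, which under the assumed density corresponds to only a few microlitres of stock, so a pre-diluted working solution may be easier to dispense accurately.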
Samples were either quenched with L-arginine and sterile filtered through 0.4 micron Millipore syringe filters (to eliminate any over-modified protein which may have precipitated during the reaction and to remove residual solid reagent), or simply rapidly diluted and buffer exchanged three to four times using Amicon15 spin concentrators (EMD Millipore, Toronto, Ontario, Canada). Spectra acquired as a function of pH were obtained from samples in which buffer was exchanged to 20 mM K+,Na+ phosphate at pH 6.0, 7.2, or 8.0. For the time-series experiments, spectra were acquired within 20 minutes of combining reagents in the NMR tube. NMR sample volumes were 500-700 µL with 7-10% D2O.

NMR. NMR experiments were performed on either a 600 MHz Varian INOVA with a cryogenic probe tunable to fluorine or a 600 MHz VarianS equipped with an HCN probe tunable to fluorine. Standard 19F, 13C, and 1H 90° pulse widths were 8.5-11 µs, 16-18 µs, and 8.5-11 µs, respectively. A pulsed-field gradient stimulated echo sequence with a 300 ms mix period was used for all diffusion experiments. The 13C,1H HMBC experiments employed a JXH of 12 Hz for selection of long-range couplings. NMR spectra were processed with MestreNova (Mestrelab Research, Santiago de Compostela, Spain).

Electrospray MS. ESI+ mass spectra were acquired on a Waters Micromass ZQ mass spectrometer (Waters Corp. Mississauga, Ontario, Canada). Each protein spectrum was acquired over a 1 minute period with scans every 0.5 seconds. 1% formic acid was added to samples shortly before injection. Maximum


entropy spectra were calculated using the MassLynx software suite (Waters Corp. Mississauga, Ontario, Canada).

Chemical Shift Predictions. 1H and 13C chemical shift predictions were conducted using either ChemDraw Ultra 12.0 (CambridgeSoft/PerkinElmer Inc, Waltham, Massachusetts, USA) or MestreNova Mnova NMR (Mestrelab Research, Santiago de Compostela, Spain).

Associated Content

Supporting information available.

References
(1) Means, G. E., and Feeney, R. E. (1990) Chemical modifications of proteins: history and applications, Bioconjug Chem 1, 2-12.
(2) Trakselis, M. A., Alley, S. C., and Ishmael, F. T. (2005) Identification and mapping of protein-protein interactions by a combination of cross-linking, cleavage, and proteomics, Bioconjug Chem 16, 741-750.
(3) Joo, C., Balci, H., Ishitsuka, Y., Buranachai, C., and Ha, T. (2008) Advances in single-molecule fluorescence methods for molecular biology, Annual review of biochemistry 77, 51-76.
(4) Biju, V. (2014) Chemical modifications and bioconjugate reactions of nanomaterials for sensing, imaging, drug delivery and therapy, Chemical Society reviews 43, 744-764.
(5) Roberts, M. J., Bentley, M. D., and Harris, J. M. (2002) Chemistry for peptide and protein PEGylation, Advanced drug delivery reviews 54, 459-476.
(6) Agarwal, P., and Bertozzi, C. R. (2015) Site-specific antibody-drug conjugates: the nexus of bioorthogonal chemistry, protein engineering, and drug development, Bioconjug Chem 26, 176-192.
(7) Jentoft, J. E., Jentoft, N., Gerken, T. A., and Dearborn, D. G. (1979) 13C NMR studies of ribonuclease A methylated with [13C]Formaldehyde, J Biol Chem 254, 4366-4370.
(8) Bokoch, M. P., Zou, Y., Rasmussen, S. G., Liu, C. W., Nygaard, R., Rosenbaum, D. M., Fung, J. J., Choi, H. J., Thian, F. S., Kobilka, T. S., et al. (2010) Ligand-specific regulation of the extracellular surface of a G-protein-coupled receptor, Nature 463, 108-112.
(9) Kahsai, A. W., Xiao, K. H., Rajagopal, S., Ahn, S., Shukla, A. K., Sun, J. P., Oas, T. G., and Lefkowitz, R. J. (2011) Multiple ligand-specific conformations of the beta(2)-adrenergic receptor, Nat Chem Biol 7, 692-700.
(10) Means, G. E., and Feeney, R. E. (1968) Reductive alkylation of amino groups in proteins, Biochemistry 7, 2192-2201.
(11) Madler, S., Bich, C., Touboul, D., and Zenobi, R. (2009) Chemical cross-linking with NHS esters: a systematic study on amino acid reactivities, Journal of mass spectrometry : JMS 44, 694-706.
(12) Chalker, J. M., Bernardes, G. J. L., Lin, Y. A., and Davis, B. G. (2009) Chemical Modification of Proteins at Cysteine: Opportunities in Chemistry and Biology, Chem-Asian J 4, 630-640.
(13) King, J. L., and Jukes, T. H. (1969) Non-Darwinian Evolution, Science 164, 788-&.
(14) Hattori, Y., Furuita, K., Ohki, I., Ikegami, T., Fukada, H., Shirakawa, M., Fujiwara, T., and Kojima, C. (2013) Utilization of lysine 13C-methylation NMR for protein-protein interaction studies, Journal of biomolecular NMR 55, 19-31.
(15) Larda, S. T., Bokoch, M. P., Evanics, F., and Prosser, R. S. (2012) Lysine methylation strategies for characterizing protein conformations by NMR, Journal of biomolecular NMR 54, 199-209.


(16) Andre, I., Linse, S., and Mulder, F. A. (2007) Residue-specific pKa determination of lysine and arginine side chains by indirect 15N and 13C NMR spectroscopy: application to apo calmodulin, Journal of the American Chemical Society 129, 15805-15813.
(17) Syed, S., Rajpurohit, R., Kim, S., and Paik, W. K. (1992) In vivo and in vitro methylation of lysine residues of Euglena gracilis histone H1, Journal of protein chemistry 11, 239-246.
(18) Means, G. E., and Feeney, R. E. (1971) Chemical modification of proteins, Holden-Day Inc., San Francisco, California.
(19) Lundblad, R. L. (2005) Chemical reagents for protein modification, 3rd ed., CRC Press, Boca Raton, Florida.
(20) Hermanson, G. T. (2008) Bioconjugate Techniques, 2nd ed., Academic Press, Elsevier, London, UK.
(21) Aitken, A. L., M.; Ward, M.; Tawfik, D. S. (1996) The Protein Protocols Handbook, Humana Press, Totowa, New Jersey.
(22) Bandini, M. (2013) Electrophilicity: the "dark-side" of indole chemistry, Organic & biomolecular chemistry 11, 5206-5212.
(23) Bandini, M., and Eichholzer, A. (2009) Catalytic functionalization of indoles in a new dimension, Angewandte Chemie 48, 9608-9644.
(24) Lakhdar, S., Westermaier, M., Terrier, F., Goumont, R., Boubaker, T., Ofial, A. R., and Mayr, H. (2006) Nucleophilic reactivities of indoles, The Journal of organic chemistry 71, 9088-9095.
(25) Borch, R. F., Bernstein, M. D., and Durst, H. D. (1971) Cyanohydridoborate Anion as a Selective Reducing Agent, Journal of the American Chemical Society 93, 2897-&.
(26) Rayment, I. (1997) Reductive alkylation of lysine residues to alter crystallization properties of proteins, Methods in enzymology 276, 171-179.
(27) Fraenkel-Conrat, H., Brandon, B. A., and Olcott, H. S. (1947) The Reaction of Formaldehyde with Proteins .4. Participation of Indole Groups - Gramicidin, Journal of Biological Chemistry 168, 99-118.
(28) Joshi, N. S., Whitaker, L. R., and Francis, M. B. (2004) A three-component Mannich-type reaction for selective tyrosine bioconjugation, Journal of the American Chemical Society 126, 15942-15943.
(29) Romanini, D. W., and Francis, M. B. (2008) Attachment of peptide building blocks to proteins through tyrosine bioconjugation, Bioconjugate Chem 19, 153-157.
(30) McFarland, J. M., Joshi, N. S., and Francis, M. B. (2008) Characterization of a three-component coupling reaction on proteins by isotopic labeling and nuclear magnetic resonance spectroscopy, Journal of the American Chemical Society 130, 7639-7644.
(31) Vaney, M. C., Maignan, S., Ries-Kautt, M., and Ducruix, A. (1996) High-resolution structure (1.33 A) of a HEW lysozyme tetragonal crystal grown in the APCF apparatus. Data and structural comparison with a crystal grown under microgravity from SpaceHab-01 mission, Acta crystallographica. Section D, Biological crystallography 52, 505-517.
(32) Pike, A. C., Brew, K., and Acharya, K. R. (1996) Crystal structures of guinea-pig, goat and bovine alpha-lactalbumin highlight the enhanced conformational flexibility of regions that are significant for its action in lactose synthase, Structure 4, 691-703.
(33) Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., and Ferrin, T. E. (2004) UCSF Chimera--a visualization system for exploratory research and analysis, J Comput Chem 25, 1605-1612.
(34) (2004) Persistence of Vision Raytracer (POV-Ray), p Persistence of Vision (TM) Raytracer, Persistence of Vision Pty. Ltd.
(35) Kitevski-Leblanc, J. L., Hoang, J., Thach, W., Larda, S. T., and Prosser, R. S. (2013) 19F NMR studies of a desolvated near-native protein folding intermediate, Biochemistry 52, 5780-5789.
(36) Gao, G., DeRose, E. F., Kirby, T. W., and London, R. E. (2006) NMR determination of lysine pKa values in the Pol lambda lyase domain: mechanistic implications, Biochemistry 45, 1785-1794.


(37) Gerken, T. A., Jentoft, J. E., Jentoft, N., and Dearborn, D. G. (1982) Intramolecular interactions of amino groups in 13C reductively methylated hen egg-white lysozyme, J Biol Chem 257, 2894-2900.
(38) Jentoft, J. E., Gerken, T. A., Jentoft, N., and Dearborn, D. G. (1981) [13C]Methylated ribonuclease A. 13C NMR studies of the interaction of lysine 41 with active site ligands, J Biol Chem 256, 231-236.
(39) Brown, L. R., De Marco, A., Wagner, G., and Wuthrich, K. (1976) A study of the lysyl residues in the basic pancreatic trypsin inhibitor using 1H nuclear magnetic resonance at 360 MHz, European journal of biochemistry / FEBS 62, 103-107.
(40) Imoto, T. J., L. N.; North, A. C. T.; Phillips, D. C.; Rupley, J. A. (1972) The Enzymes, 3rd ed., Academic Press, New York.
(41) Koradi, R., Billeter, M., and Wuthrich, K. (1996) MOLMOL: a program for display and analysis of macromolecular structures, Journal of molecular graphics 14, 51-55, 29-32.
(42) Fraczkiewicz, R., and Braun, W. (1998) Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules, J Comput Chem 19, 319-333.
(43) Tsodikov, O. V., Record, M. T., Jr., and Sergeev, Y. V. (2002) Novel computer program for fast exact calculation of accessible and molecular surface areas and average surface curvature, J Comput Chem 23, 600-609.
(44) Touw, W. G., Baakman, C., Black, J., te Beek, T. A., Krieger, E., Joosten, R. P., and Vriend, G. (2015) A series of PDB-related databanks for everyday needs, Nucleic Acids Res 43, D364-368.
(45) Li, H., Robertson, A. D., and Jensen, J. H. (2005) Very fast empirical prediction and rationalization of protein pKa values, Proteins 61, 704-721.
(46) Bas, D. C., Rogers, D. M., and Jensen, J. H. (2008) Very fast prediction and rationalization of pKa values for protein-ligand complexes, Proteins 73, 765-783.
