Quantitative Analysis of Protein Covalent Labeling Mass Spectrometry

Article pubs.acs.org/ac

Cite This: Anal. Chem. 2019, 91, 8492−8499

Quantitative Analysis of Protein Covalent Labeling Mass Spectrometry Data in the Mass Spec Studio

Daniel S. Ziemianowicz,†,‡ Vladimir Sarpe,† and David C. Schriemer*,†,‡,§

†Department of Biochemistry and Molecular Biology, ‡Robson DNA Science Centre, Arnie Charbonneau Cancer Institute, and §Department of Chemistry, University of Calgary, Calgary, Alberta T2N 4N1, Canada


Supporting Information

ABSTRACT: Covalent labeling with mass spectrometry (CL-MS) provides a direct measure of the chemical and structural features of proteins with the potential for resolution at the amino-acid level. Unfortunately, most applications of CL-MS are limited to narrowly defined differential analyses, where small numbers of residues are compared between two or more protein states. Extending the utility of high-resolution CL-MS for structure-based applications requires more robust computational routines and the development of methodology capable of reporting labeling yield accurately. Here, we provide a substantial improvement in the analysis of CL-MS data with the development of an extended plug-in built within the Mass Spec Studio development framework (MSS-CLEAN). All elements of data analysis, from database search to site-resolved and normalized labeling output, are accommodated, as illustrated through the nonselective labeling of the human kinesin Eg5 with photoconverted 3,3′-azibutan-1-ol. In developing the new features within the CL-MS plug-in, we identified additional complexities associated with the application of CL reagents, arising primarily from digestion-induced bias in yield measurements and ambiguities in site localization. A strategy is presented involving the use of redundant site labeling data from overlapping peptides, the imputation of missing data, and a normalization routine to determine relative protection factors. These elements together provide for a robust structural interpretation of CL-MS/MS data while minimizing the over-reporting of labeling site resolution. Finally, to minimize bias, we recommend that digestion strategies for the generation of useful overlapping peptides involve the application of complementary enzymes that drive digestion to completion.

© 2019 American Chemical Society
Received: April 1, 2019 Accepted: June 3, 2019 Published: June 3, 2019
DOI: 10.1021/acs.analchem.9b01625 Anal. Chem. 2019, 91, 8492−8499

Proteins display a remarkable specificity in interacting with components of their molecular environment. These interactions are mediated by motifs predominantly on the protein surface and are driven by shape complementarity and noncovalent interactions.1−3 Identifying and mapping the topography of proteins is a central activity in the study of molecular mechanisms and in the determination of structure−function relationships. It can reveal structural dynamics4,5 and protein−protein6,7 or protein−ligand8,9 interactions. Ultimately, topographical data can even be applied to de novo structural modeling10,11 or integrative modeling of protein complexes.12 Several biophysical methods have been applied to such activities (e.g., X-ray crystallography and nuclear magnetic resonance10 and electron paramagnetic resonance spectroscopies13). MS-based methods are very attractive, as there are few restrictions on the states under which they can be applied (individual proteins to whole proteomes). Among the many MS techniques available, covalent labeling (CL)-MS is particularly powerful, as it can provide a topographical analysis that returns data on both the structural and the chemical features of protein surfaces. Topographical mapping in an MS workflow requires the labeling of residues with chemical reagents, the products of which are usually quantified upon proteolytic digestion and LC-MS/MS analysis.14,15

Chemical reagents used in CL may be functional-group specific, such as diethyl pyrocarbonate (targeting D,E residues) and sulfo-N-hydroxysuccinimide acetate (targeting K residues),14,16 or functional-group nonspecific, such as hydroxyl radicals or carbenes.15,17−19 While significant, the methodological advances in reagent development,8,20−24 data collection and analysis9,25,26 have not yet transformed CL-MS into a widely used technique. CL data are usually interpreted on the premise that the labeling yield is proportional to the accessibility of a residue to the solvent or CL reagent. However, the actual per-residue labeling yield is influenced by other factors as well, such as chemical reactivity and the structural factors governing the partitioning of CL reagents near the residue in question. In its most straightforward form, CL-MS is used in a differential fashion by comparing the extent of residue modification between two protein states. In the presence and absence of a ligand, for example, one may infer which residues are involved in the interaction.27 This straightforward comparative approach avoids the need to determine the proportionality
constant for each and every residue, as would be required for a direct topographical mapping exercise.15,26 To take full advantage of CL-MS data for topographical mapping, we require a more involved normalization routine and software that better supports CL-MS workflows. We do not yet have a complete software solution for accurate analysis of residue-resolved labeling data arising from the use of any CL reagent (specific or nonspecific). Existing data analysis routines use workflows involving the concatenation of multiple tools not specifically designed for CL-MS and/or costly commercial software.7,28−30 For example, a workflow for analysis of hydroxyl radical covalent labeling data has typically involved (1) raw data conversion and extraction of spectra (e.g., Rosetta Elucidator, msconvert), (2) searching the spectra with a proteomics database-search engine (e.g., MASCOT,31 MS-GF+32), (3) additional annotation of labeled peptides with scripts (e.g., Excel macros), (4) quantification of chromatograms for labeled and nonlabeled peptides (e.g., MZmine33), and (5) manual normalization and/or statistical analysis of data (e.g., Excel, MATLAB). Some of these steps are packaged as a set of algorithms in ProtMapMS,34 but ProtMapMS remains specific to experiments using water radiolysis and does not allow for data inspection. In other work, Jones et al. developed a custom configuration of the commercial software Proteome Discoverer and an Excel add-on for data analysis.35 We also developed a solution using the Mass Spec Studio (MSS) framework v1,36 blending standalone search tools, peptide-level yield measurements, and MS/MS fragment analysis.

Here, we present MSS-CLEAN (Covalent Label Estimation And Normalization; MSS-CLEAN is available free for download at www.msstudio.ca), a substantial enhancement of the CL-MS data analysis module, rebuilt in the Mass Spec Studio software development framework v2.37 Our module offers all of the essential functions of a CL-MS analytical workflow in a user-friendly, self-contained software package with a simple graphical user interface. We embed database search algorithms and a set of new features that allow the CL-MS module to function as a complete solution. These include application to any CL reagent (specific or nonspecific), full support for chimericity, and an improved algorithm for quantification of per-residue labeling8,9,26 that compensates for incomplete fragmentation and the effect of neutral loss. We propose and test strategies for increasing residue coverage using complementary digests and for normalizing labeling data in support of topographical mapping. Testing involved the analysis of CL-MS data from the nonspecific labeling of a human kinesin, Eg5, using photoconverted 3,3′-azibutan-1-ol.

EXPERIMENTAL SECTION

CL Plug-in Design. The Mass Spec Studio framework (v2) was designed to capture and recycle basic and advanced MS-based signal processing functionality, calculators, algorithms, and resources for reuse in the building of entirely new analysis packages, focused primarily on structural mass spectrometry applications (e.g., HX-MS, XL-MS, CL-MS).36,37 The framework is a composite application whose component coupling strategy allows for effective use of shared tools as well as straightforward development of new extensions and new pluggable content. Mass Spec Studio is written in C# using the .NET framework. To allow for easy extensibility, the Studio employs MEF (Managed Extensibility Framework), AvalonDock, and Prism. All analysis packages benefit from easy access to features in the core library, such as the conversion of all MS vendor data types (either natively or through ProteoWizard38) into an optimized .mssdata binary file format for efficient, live data retrieval. The wizard-style project setup allows for easy project building, while an intuitive user interface enables streamlined data inspection. We maintain and enhance the Studio framework for these applications and continue to add new reusable content.

The CL module was refactored from v1 of the Mass Spec Studio and augmented with a wrapped version of MS-GF+ to search for variable modifications in user-supplied databases. The search capability is also supported by a “label maker” wizard, allowing for definition and editing of reaction products and labeling rules. A new algorithm was developed for the selection and grouping of LC-MS features and MS/MS spectra for all labeled and nonlabeled peptide states and combined with improved logic for assigning yields on a per-residue basis (see below). This algorithm supports the visualization of label yield mapped to sequence. File management was improved with a rebuilt project management user interface, and LC-MS/MS inspection is supported with improved spectral navigation tools. File management also supports the designation of control data sets, used in the generation and export of normalized output for further analysis and visualization (e.g., through Chimera). For the data sets described below, a database search was performed with all default parameters retained, except for the following: minPepLength = 5, maxPepLength = 40, minPepCharge = 2, maxPepCharge = 5, instrumentID = Q Exactive, massTolerance = 10 ppm, numModsPerPeptide = 2, usePercolator = False, CutoffQValue = 0.05. Additional Studio parameter values included the following: XIC selector mass tolerance = 10 ppm, fragment mass selector mass tolerance = 20 ppm.

Eg5 Protein Preparation. The motor domain of Eg5 (1−386) was expressed and purified following a previously described procedure.39 Briefly, plasmid DNA was transformed into competent BL-21 E. coli. Cells were grown at 37 °C for approximately 7 h. Expression was induced with IPTG overnight at room temperature. The resulting cell pellet was lysed, and the lysate was collected, loaded onto Ni−NTA agarose gel, and washed. Eg5 was eluted with an imidazole gradient in a gravity flow column. Collected fractions were analyzed by SDS-PAGE and buffer exchanged into 200 mM NaCl, 1 mM MgCl2, 20 mM PIPES pH 7.4 buffer; the purified protein concentration was determined to be approximately 3.3 mg/mL by BCA assay. For covalent labeling, PIPES was replaced with 20 mM HEPES buffer; all other components remained the same.

Sample Preparation. We used three classes of Eg5 motor domain samples for our labeling experiments: intact protein in neutral buffer conditions, denatured protein, and digested protein (i.e., predigested state); the latter two required additional preparative steps. Denatured Eg5 motor domain was generated by incubating 20 μM protein with 8 mM dithiothreitol (DTT) (Sigma-Aldrich) in 40 mM ammonium bicarbonate (≥99%; Sigma-Aldrich) for 30 min at 56 °C, followed by 80 mM chloroacetamide (CAA) (Sigma-Aldrich) for 30 min in the dark at room temperature (19.5 °C). The predigest was prepared by denaturation as above followed by digestion with trypsin or pepsin as follows: trypsin (Thermo Scientific, sequencing grade) was added at an enzyme-to-substrate ratio of 1:20 and incubated for 4 h at 37 °C or
porcine pepsin (Sigma-Aldrich) at a ratio of 1:20 and incubated for 1 h at 37 °C. Tryptic digests were quenched with 0.5% formic acid (ACS reagent grade ≥98%, Thermo Scientific) and extracted with Hypersep C18 SpinTips (Thermo Scientific), whereas peptic digests were quenched by immediate solid-phase extraction. Peptides were eluted with 60% acetonitrile (LC-MS grade; Thermo Scientific) and 0.1% trifluoroacetic acid (HPLC grade; Merck Millipore). Samples were lyophilized in a Savant SpeedVac (Thermo Scientific), and peptides were resuspended according to downstream use (see below).

Carbene Labeling. For labeling, samples were prepared with 10 μM protein or peptide solution and 10 mM 3,3′-azibutan-1-ol in 20 mM HEPES (≥99.5% titration; Sigma-Aldrich) pH 7.4 and 50 mM NaCl (Biotechnology grade; Amresco). Samples were equilibrated at room temperature for 10 min. Approximately 1 μL sample volumes in a windowed 450 μm i.d./670 μm o.d. fused silica capillary (Molex) were flash frozen in liquid nitrogen, followed by irradiation with 500 mJ of 355 nm light (5 × 100 mJ pulses, 10 ns pulse width at 10 Hz) from a Nd:YAG laser (YG 980; Quantel). The beam was focused with biconvex (f = 70 mm) and plano-concave cylindrical lenses (f = 75 mm) to produce a 0.8 mm × 7 mm elliptical beam shape to maximize photon deposition (≤80%) into the frozen sample. Replicates consisted of independently irradiated samples; each replicate consisted of two 1 μL irradiated samples to obtain enough material for repeated downstream analysis. Following irradiation and thawing, the protein samples were reduced, alkylated, and digested as described above. Tryptic digestion was quenched with 0.5% formic acid; peptic digestion was quenched with 90 mM NaOH (ACS grade; VWR) at 95 °C for 5 min followed by acidification with 1.4% formic acid; the final peptide concentration for each digest was estimated at 1 μM.

Propionylation of Bovine Serum Albumin (BSA; Sigma-Aldrich) was performed according to Lin and Garcia, 2012.40 Labeled and nonlabeled intact BSA were mixed at 1:50 and 1:5 ratios followed by tryptic digestion as above.

LC-MS/MS Data Acquisition. Following the quenching of digestion, 2 μL of sample was injected via an nLC-1000 (Thermo Scientific) equipped with an Acclaim PepMap 100 guard column (75 μm × 2 cm C18, 3 μm particles, 100 Å; Thermo Scientific) and separated using a self-packed C18 HPLC column (75 μm × 15 cm, Kinetex 2.6 μm particles, Peptide XB C18, 100 Å; Phenomenex). Peptides were eluted with a 40 min 3−35% B gradient at 300 nL/min. Mobile phase A consisted of 0.1% v/v formic acid in 3% acetonitrile (LC-MS grade; Thermo Scientific); mobile phase B consisted of 0.1% v/v formic acid in 97% acetonitrile. Data were acquired on an LTQ Orbitrap Velos (Thermo Scientific) in OT/OT mode. Spray voltage was set to 2.5 kV, and the transfer capillary temperature was set to 285 °C. MS scans were acquired with a resolution of 60 000 and an m/z range from 300 to 2000. The top 12 most intense ions with a charge state ≥ 2+ and signal intensity ≥ 2.0 × 10⁴ were selected for fragmentation via HCD with NCE = 35 and an isolation width of 2.0 Th. MS/MS data were acquired with a resolution of 7500 and an AGC target of 1.0 × 10⁵. Data were acquired twice, once with a dynamic exclusion of 30 s and once without dynamic exclusion.

Figure 1. CL-MS data analysis workflow in MSS-CLEAN. Experiments may include multiple proteins and any CL chemistry. High-resolution raw data (from any instrument) are processed in a single project and converted to mzML format for database searching (i.e., MS-GF+) and to the mssdata format for fast processing. MSS-CLEAN supports data validation at multiple levels. Finally, data can be exported in a flexible manner, including formats compatible with visualization (e.g., UCSF Chimera).

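Figure 1 notes that data can be exported in formats compatible with UCSF Chimera, and the Results describe a custom-formatted text file used with Chimera's “define attributes” tool. That file's exact layout is not given in this article, so the following is only a hypothetical sketch: it writes per-residue values (for example, normalized yields or protection factors) in the attribute-assignment layout that Chimera reads, as we understand it (header lines naming the attribute, match mode, and recipient, then tab-indented residue specifier and value pairs). The function name `chimera_attribute_file`, the attribute name `clyield`, and the optional chain suffix are illustrative assumptions; check the Chimera documentation before relying on the format.

```python
from typing import Dict, Optional


def chimera_attribute_file(values: Dict[int, float], name: str = "clyield",
                           chain: Optional[str] = None) -> str:
    """Format per-residue values as a Chimera attribute-assignment file (assumed layout).

    `values` maps residue numbers to numbers such as normalized yields or protection
    factors. The attribute name and the optional chain suffix are illustrative choices.
    """
    lines = [
        f"attribute: {name}",
        "match mode: 1-to-1",   # each specifier is expected to match exactly one residue
        "recipient: residues",
    ]
    suffix = f".{chain}" if chain else ""
    for residue in sorted(values):
        # Data lines: a leading tab, the residue specifier, a tab, then the value.
        lines.append(f"\t:{residue}{suffix}\t{values[residue]:.4f}")
    return "\n".join(lines) + "\n"
```

For example, `chimera_attribute_file({42: -0.5, 43: 0.25}, chain="A")` produces data lines for `:42.A` and `:43.A`; saved to a text file, these values could then be used to color the structure by residue.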


RESULTS AND DISCUSSION

Overview of Data Analysis Workflow. The Mass Spec Studio provides a graphical wizard to guide the user through project setup (Figure 1). Projects are structured to allow for differential CL-MS analyses for footprinting of molecular interactions or for standalone protein mapping exercises. Following project setup, raw data are converted to the mzML41 format for compatibility with the MS-GF+ search tool integrated into the Studio. We chose MS-GF+ as it is an optimized search tool for post-translational modifications.32 Peptide spectral matches (PSMs) are generated for both labeled and nonlabeled peptides using searches parametrized by the user. After processing, the user may evaluate all PSMs using an interactive graphical display (Figure S1A). To facilitate the rapid inspection and grouping of PSMs, the binary data are automatically retrieved, annotated, and visualized rather than drawn from the intermediate mzML-formatted data. Accessing the source data allows for better labeling inspection and/or PSM rejection (Figure S1B). Ultimately, grouped PSMs for each labeled peptide are used to generate and display calculations of per-residue labeling yields. MSS-CLEAN also exports normalized labeling yields and associated measures of significance. The computational approaches used for these processes are described below. A custom-formatted text file of normalized yields can be exported for mapping data onto a protein structure in the software UCSF Chimera (via the “define attributes” tool) for visual interpretation of CL data. Finally, we provide the user with additional rich output, including absolute per-residue labeling yields with respect to the protein sequence(s), fractional per-residue labeling yields with respect to individual peptides, fractional per-residue yields, and peptide XIC peak areas delineated by replicate. Database search results, in the native search algorithm output format, are also available in the project data directory.

Figure 2. Workflow for calculation of per-residue labeling yield from XICs and MS/MS data. (A) XICs showing nonlabeled (gray) and labeled (blue) peptide. Peptide-level labeling yields are calculated using eq 1. Triangles correspond to MS/MS triggers. (B) Integrated MS/MS spectrum of all PSMs from a singly labeled peptide with nonlabeled (black) and labeled (blue) fragment ions. (C) Fractional distribution of per-residue labeling calculated using y ions from the integrated MS/MS spectrum using eq 2. M = full peptide sequence. (D) Per-residue labeling yields calculated according to eq 3.

High-Resolution Yield Estimation. Our strategy for residue-level data analysis builds upon a method we described earlier.9 Briefly, covalent labeling yields are calculated using both the peptide- and the residue-level information (Figure 2) under an assumption that modifications have only a modest effect on ionization and fragmentation efficiencies. Proteomics experiments support the practicality of these assumptions in many situations (e.g., label-free quantification42 and phosphorylation analysis43), but we recognize that, as a result, yields are best described as estimations. Yield calculations require the detection of XIC features and the assignment of label positions in the MS/MS spectra, but incomplete chromatographic resolution of positional isomers prevents a simple assignment of chromatographic features to single-residue positions. Under the assumptions above, we first calculate the combined, fractional peptide labeling yield $Y_a$ according to

$$Y_a = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} i\,a_j}{a_0 + \sum_{i=1}^{n}\sum_{j=1}^{m} i\,a_j} \tag{1}$$

where $a_0$ represents the peak area of the unlabeled peptide a, $a_j$ represents the peak area of a labeled peptide for up to m distinct features of the labeled state (Figure 2A, blue trace), i represents the number of labels on a peptide, and n is the highest number of labels considered. To generate an accurate measure of the retention time distribution for all labeled states, we implement the strategy of “matching between runs”,44 where features are selected based on the union of PSMs between all replicates. In this manner, low-intensity features that may have avoided sampling in one run are detected (Figure 2A, blue triangles). We then generate a fully chimeric MS/MS spectrum that combines all PSMs spanning the retention time distribution of the modifications, and calculate the fractional yield $f(y_i)$ as

$$f(y_i) = \frac{I(y_i)_1}{I(y_i)_1 + I(y_i)_0} \tag{2}$$

from the intensities I of the singly modified (subscript 1) and nonmodified (subscript 0) peptide sequence fragment i. Here we restrict fractional yield calculations to PSMs of singly labeled peptides, as higher-order labeled peptides (≥2 labels) are usually of low abundance and add no additional information on label distribution. These fractional yields are then weighted by the fractional peptide labeling yield to generate the labeling yield $L(r_i)$ of a residue i according to

$$L(r_i) = Y_a\left[f(y_i) - f(y_{i-1})\right] \tag{3}$$

By taking into account all PSMs for a single labeled peptide sequence, we represent all labeled peptide positional isomers. This strategy is limited only by the sampling rate in a data-dependent acquisition (DDA) method (Figure S2). In this study, stronger sampling was achieved using two runs with different exclusion settings, once with a dynamic exclusion (DE) of 30 s (approximately 1.5× the chromatographic peak width) and again with no DE. Previous work showed that although a standard DE time interval allows for a greater number of peptide identifications overall, improved MS/MS sampling of a set of positional isomers is achieved with DE effectively disabled.21 Ultimately, targeted acquisition methods such as parallel reaction monitoring (PRM) or comprehensive acquisition methods such as data-independent acquisition (DIA) may provide more extensive sampling. Regardless, in the resulting integrated MS/MS spectrum, fragmentation may be insufficient to resolve labeling yield to a single-residue site. In this situation, labeling yield is simply averaged among the ambiguous, i.e., nonfragmented, elements of sequence and highlighted as such.

There are additional situations where the spectral input to the calculations is either insufficient to sustain the yield calculation or actually generates erroneous values. In one situation, missing values for a labeled/nonlabeled pair will prevent calculation of $f(y_i)$, requiring a data imputation strategy (Figure 3A). For example, given an integrated spectrum where the labeled fragment $y_i$ is observed but the corresponding nonlabeled fragment is not, we impute the intensity of the nonlabeled fragment $y_i$. To reduce imputation error, we require that the observed fragment peak has a signal-to-noise (S/N) ratio ≥ 10 (in a strategy based on Decon2LS45), and imputed fragment intensities are attributed a value of one-half of the calculated noise level, i.e., N/2. In this manner, the error of imputed values is limited to ≤5%. The same strategy is applied in the opposite instance (i.e., nonlabeled fragment observed, but no labeled fragment). Imputation is a frequent occurrence in our data set (78.2% of fragment ions). In the other situation, erroneous values may be returned when sequence ions generate neutral losses from labile products of the labeling reaction. We previously described this phenomenon for carbene insertion into acidic residues, creating labile esters.9 CID-type fragmentation can result in minor neutral loss in such situations. In this case, the per-residue yield measurements will be falsely low. ETD or ECD
fragmentation restores accuracy, but these modes may not be practical in many situations (e.g., low charge state ions). Neutral losses return $f(y_i)$ values that are lower than $f(y_{i-1})$ values (Figure 3B). It is not physically possible for the fragment with i residues to have a lower yield than the fragment with i − 1 residues. Thus, we remove these inconsistent fractional yields from the calculation. In such instances we then interpolate the value based on the slope between the two flanking residues. This strategy is effective, as most instances of strong neutral loss appear to depend upon particular gas-phase conformations; that is, only isolated sequence ions demonstrate this, and thus, the flanking residues seem reliable. Interpolation of fractional yields due to neutral-loss effects occurred at a low frequency in our data set (3.1%). Our approach is a conservative one: if the proximal values are both one (for example), the method will impute the correct value. If they differ, the method simply functions like a missing fragment detector, and the yields are averaged over the ambiguous sequence (Figure 3C).

Integration of Multiple Measurements. To further reduce error in yield measurement for a given residue, we combine data from multiple peptides (Figure 4). When calculating final residue labeling yields across a protein sequence, the Studio combines residue labeling yields first for all overlapping peptides within a replicate and then across replicates. We observed that intra-replicate variability of labeling yield is greater than inter-replicate variability (Table S1); that is, a residue's labeling yield varies to a greater extent between different peptide sequences than between the same peptide from different replicates. This phenomenon appears to arise because of the influence of labeling on protein digestion. The modification of cleavage site residues may inhibit proteolysis entirely, and even distal modifications can influence Km values for a given cleavage site. The result is an altered digestion pattern after labeling at the peptide level (see Supporting Information). However, if digestion is driven to completion, then combining the labeling data for a given residue across all peptides containing the residue should return a value closer to the actual yield. To test this idea, we used propionylation, a well-controlled chemistry that can completely label lysines.40 Bovine serum albumin (BSA) was propionylated and then mixed with nonlabeled BSA to simulate labeling at a 2% level and a 20% level. These mixtures were digested to completion with chymotrypsin, and label incorporation was determined for each peptide containing a lysine. The variance can be high, but accurate incorporation values are returned when the labeling levels for multiple peptides are averaged. For the 2% labeling experiment, the mean and standard deviation were 2 ± 1% (n = 3), and for the 20% labeling experiment, they were 19 ± 10% (n = 7). Thus, MSS-CLEAN first uses all overlapping peptides in a run to determine an average yield measurement for a residue and then averages this value across replicate LC-MS/MS runs.

Figure 3. (A) Fragment intensity imputation depicted with a stylized MS/MS spectrum showing three missing data scenarios. Nonlabeled fragment intensities in black; labeled fragments in blue. Red dashed line corresponds to noise level (relative intensity = 1); green dashed line corresponds to the S/N = 10 threshold for imputation. (i) Nonlabeled fragment with S/N > 10 and missing labeled fragment intensity imputed at 0.5N; (ii) nonlabeled fragment with S/N < 10, missing labeled fragment intensity, thus discarded; (iii) labeled fragment with S/N > 10 and missing nonlabeled fragment imputed at 0.5N. (B) Fractional CL yield interpolation. HCD data (orange) showing neutral loss at fragment 7 and missing data (cyan) at fragment 3. Corrected output of fractional yield (blue) includes the interpolation of values for 7 and 3, resulting in regions of averaged labeling yield (green dotted regions). (C) Per-residue labeling yields corresponding to the corrected fractional yield plot in B. Site-resolved yields in blue; ambiguously localized, averaged yields in green.

Figure 4. Concept of increased sampling of residue CL yields through measurements of multiple overlapping peptides. Labeling yield of residue V42 is averaged across all peptides containing V42. Additionally, multiple sets of peptides may be generated from different proteases (red and blue).

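The per-residue calculation described above (eqs 1−3, the N/2 imputation rule, removal and interpolation of fractional yields that drop because of neutral loss, and averaging over overlapping peptides and then replicates) can be sketched in Python. This is a minimal sketch under stated assumptions, not the MSS-CLEAN source: all function names and data layouts are hypothetical, fragments are y ions indexed from the C-terminus, f(y_0) = 0 and f(M) = 1 are assumed for a singly labeled peptide, and any fractional yield lower than the last accepted one is treated as a neutral-loss artifact. Splitting the label fraction evenly across a gap is equivalent to interpolating linearly between the flanking values, which is also how ambiguous, nonfragmented stretches end up with averaged yields.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence

SN_MIN = 10.0  # S/N the observed partner must reach before its missing twin is imputed


def peptide_yield(a0: float, labeled: Dict[int, Sequence[float]]) -> float:
    """Eq 1: labeled[i] lists the XIC peak areas a_j of the features carrying i labels."""
    weighted = sum(i * sum(areas) for i, areas in labeled.items())
    return weighted / (a0 + weighted)


def fractional_yield(i1: Optional[float], i0: Optional[float],
                     noise: float) -> Optional[float]:
    """Eq 2 for one y ion, imputing a missing partner at noise/2 (None if unusable)."""
    if i1 is None and i0 is None:
        return None
    if i1 is None or i0 is None:
        observed = i1 if i1 is not None else i0
        if observed / noise < SN_MIN:
            return None  # too weak to support imputation; treat the fragment as missing
        if i1 is None:
            i1 = noise / 2
        else:
            i0 = noise / 2
    return i1 / (i1 + i0)


def residue_yields(ya: float, f_y: Sequence[Optional[float]]) -> List[float]:
    """Eq 3 with gap handling.

    f_y[k - 1] holds f(y_k) for k = 1..N-1 of an N-residue peptide (None if missing).
    Returns L(r_k) for k = 1..N, numbered from the C-terminus as for y ions.
    """
    n = len(f_y) + 1
    points = [(0, 0.0)]                         # assumed f(y_0) = 0
    for k, v in enumerate(f_y, start=1):
        if v is None or v < points[-1][1]:      # missing, or a neutral-loss-type drop
            continue
        points.append((k, v))
    points.append((n, 1.0))                     # assumed f(M) = 1 (singly labeled)
    out = [0.0] * n
    for (k1, v1), (k2, v2) in zip(points, points[1:]):
        share = ya * (v2 - v1) / (k2 - k1)      # even split = linear interpolation
        for k in range(k1 + 1, k2 + 1):
            out[k - 1] = share
    return out


def combine(replicates: List[List[Dict[int, float]]]) -> Dict[int, float]:
    """Average each residue over overlapping peptides within a replicate, then across replicates."""
    per_replicate = []
    for peptides in replicates:
        pooled = defaultdict(list)
        for peptide in peptides:
            for residue, value in peptide.items():
                pooled[residue].append(value)
        per_replicate.append({r: mean(v) for r, v in pooled.items()})
    across = defaultdict(list)
    for rep in per_replicate:
        for residue, value in rep.items():
            across[residue].append(value)
    return {r: mean(v) for r, v in across.items()}
```

For a hypothetical six-residue peptide with Y_a = 0.2 and `f_y = [0.0, 0.5, None, 0.3, 0.9]`, y3 is missing and the y4 value (0.3 < 0.5) is discarded, so residues 3−5 share an averaged yield of about 0.027 each while the per-residue yields still sum to 0.2. On the imputation side, an observed peak exactly at S/N = 10 with its partner imputed at N/2 gives f = 10/10.5 ≈ 0.952; if the undetected partner actually lies anywhere between zero and the noise level (an assumption), the true value lies between 10/11 ≈ 0.909 and 1, so the imputed value is off by less than 0.05, consistent with the ≤5% bound stated above.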

Figure 5. (A) Mean labeling yield of tryptic peptides found in common between the native, denatured, and predigested states (error bars SD, n = 5). Peptides 5 and 6 correspond to tryptic missed cleavages (GVIIKGLEEITVHNKDEVYQILEK and IGKLNLVDLAGSENIGR). (B) Volcano plot of mean residue-level protection factors and corresponding Welch’s t-test probabilities. Protection factors of the native, i.e., structured state were calculated using either denatured or predigest unstructured states via eq 4. Normalized yield values < 0 indicate protected residues; values ≥ 0 indicate deprotected residues (n = 3−5).
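Figure 5B pairs mean residue-level protection factors with Welch's t-test probabilities. The text does not state exactly which replicate values enter the test, so the sketch below simply assumes two independent sets of replicate values for one residue (for example, hypothetical native-state and unstructured-state yields). It computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library, as an illustration of the test rather than the MSS-CLEAN code; the function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples.

    Unlike Student's t-test, the two groups are not assumed to share a variance.
    Each sample needs at least two values and nonzero total variance.
    """
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

The two-sided probability then follows from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with that p-value. A volcano plot conventionally places the mean effect (here, the protection factor) on the x-axis against −log10(p) on the y-axis.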

We explored these matters further using digests of the labeled and nonlabeled states of kinesin Eg5. We generated 55 tryptic peptides in the labeled state and only 38 peptides in the nonlabeled state. The additional peptides are the result of missed cleavages induced by labeling (see Supporting Information). Such missed cleavages skew the calculation of peptide-level yields, as the corresponding nonlabeled peptide a0 is not present (see eq 1). We tested whether a nonspecific enzyme such as pepsin would allow for better residue coverage through generation of more overlapping peptides and thus higher accuracy. Interestingly, we obtained the opposite: fewer peptides in the labeled state (n = 68) than in the nonlabeled state (n = 163). We suspect that label-altered digestion kinetics are more problematic for a nonselective enzyme like pepsin. Unlike tryptic digestion, which is driven to completion, peptic digestion must first be optimized to generate a set of mass-measurable peptides; driving peptic proteolysis to completion would result in small peptides not easily detectable by standard LC-MS/MS proteomics methods. In the measurable set, we found that residue yields from pepsin digests were not consistent with those from tryptic digests. Nevertheless, we maintain that parallel analyses with complementary proteases could improve yield measurement accuracy, provided that the proteases have limited specificity and are driven to completion (e.g., LysArginase, Glu-C, etc.).

Normalization of Yield Measurements. In order to map protein topography, the yield measurements require some form of normalization with an unstructured state. For any reagent used in covalent labeling experiments, the per-residue labeling yields are dependent on (1) the inherent chemical reactivity of a residue with primary sequence, (2) the local structural effects influencing reagent partitioning (e.g., structural motifs), and (3) the higher order structural effects of the intact protein (e.g., buried vs exposed motifs). We examined two candidate strategies for normalization: a simple denatured state inspired by Sharp et al.,26 using high temperature with disulfide bond reduction and alkylation, and a predigested state. Using the kinesin Eg5 protein as a test case, both surrogates for the unstructured state were labeled and site labeling data determined. The native and denatured states showed a range of labeling yields (0.4−98% and 0.1−81%, respectively), and the pattern of labeling was very similar (Figure 5A). It would appear that this level of denaturation did not lead to an irreversibly unstructured state for this particular protein: upon a return to labeling conditions, the protein reformed in whole or in part or may even have retained structure throughout thermal denaturation.46,47 Maintaining an unstructured protein state via denaturation would require high temperatures during labeling and/or the use of a chaotropic agent.48 Unfortunately, such extreme conditions would also influence the reactivity of any labeling reagent and further complicate normalization. As an alternative means of normalization, we explored the labeling of a predigested protein, where we assume that all residue-level and local structural influences on labeling yields will be preserved and higher order structure will be absent. Here, approximately 1 order of magnitude in the dynamic range of labeling yield was observed at the peptide and residue level. Labeling levels were also consistently higher than in the structured state; for example, the peptide yields ranged from 25% to 83% (Figure 5A), indicating effective reduction or removal of higher order structure. Note that the predigest consists mainly of fully tryptic peptides. However, when digesting a protein postlabeling, modified residues can prevent proteolysis, resulting in a higher level of missed cleavage peptides compared to the predigest (e.g., peptides 5 and 6, Figure 5A). Combining all overlapping peptides (Figure 4) helps diminish any bias induced by this effect at the residue level.

decrease in labeling yield (with a probability of α ≥ 0.05) in the native state, indicating a fraction of residue burial that roughly correlates with residues with a < 0.20 fractional solvent-exposed surface area of Eg5 (50.6%; 177/350 residues).
MSS-CLEAN was adapted to allow for either normalization strategy, and generate a calculated relative protection factor, P, for residue i Pi = log

L(ri)s L(ri)u



CONCLUSIONS The natural variation in labeling yield for any given protein chemistry, coupled with the structural influence on yield, makes detection and confident determination of labeling a difficult undertaking. MS/MS-based methods will always bear some level of ambiguity as a result, which is particularly the case for nonspecific labeling chemistries, such as carbenes or hydroxyl radicals. MSS-CLEAN provides a best estimate of yield from overlapping peptides while faithfully representing site determination ambiguity in the underlying data. MSSCLEAN also provides the complete integration of all steps required for processing and visualizing CL-MS data. A methodological insight that emerges from our development activities relates to the digestion and the manner in which residue levels should be estimated. Labeling strongly influences digestion kinetics, so unless all peptides spanning a given residue can be quantitated from a complete digestion, determinations should only be considered estimates; individual peptides are likely biased. At a minimum, only the detection of all peptides in a complete digestion can recover accuracy, as all protein forms must be represented in the peptide set. Finally, the normalization strategy is an important element in any CLMS experiment, and a useful method involves the labeling of a predigested protein. MSS-CLEAN provides a relative protection factor calculation, which will allow us to explore methods for accurate topographical mapping in subsequent studies, and ultimately will assist in refining structures determined by other methods.



ASSOCIATED CONTENT

* Supporting Information S

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.analchem.9b01625.

(4)

where L(ri) represents the labeling yield for residue i in the structured (s) and unstructured (u) states (Figure 5B). This protection factor is analogous to the protection factor calculated in hydrogen−deuterium exchange mass spectrometry.49,50 Protection factors additionally have corresponding Welch’s t-test probabilities (i.e., p values)51,52 calculated for additional filtering. In MSS-CLEAN we allow for a relative determination to allow the user to reference any state: ligand treated, denatured, or digested. We suggest that the predigested form is the best representation of the denatured state and best reflects the chemical reactivity of the insertion. In MSS-CLEAN, if the predigested state is selected, labeling yields for terminal residues are excluded because the free carboxyl and amine groups will distort the estimate. Finally, the normalization state selected for use also incorporates full peptide redundancy in the calculation of residue yields to improve the accuracy of the relative protection factor. We anticipate that the per-residue covalent labeling yields normalized using predigests would correlate with the degree of solvent exposure in some fashion. A complete analysis of this question is the subject of a future study, but here we simply note that 44% of the labeled residues detected in both the native and the predigested states (99 out of 223) show a ≥20%



Screenshots of MSS-CLEAN data analysis and validation views; comparison of DDA MS/MS sampling frequency between a 30s DE and no DE acquisition methods (PDF) Table showing the inter- and intrareplicate variability of per-residue yield quantification; lists of peptides obtained from each protein state via pepsin and trypsin digestions (XLSX)
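The relative protection factor of eq 4, together with the Welch's t-test probability attached to it for filtering, can be illustrated with a short, standard-library-only Python sketch. It is not the MSS-CLEAN implementation and rests on assumptions of ours: the logarithm is taken as base 10 (eq 4 does not state a base), $L(r_i)$ in each state is the mean of replicate residue yields, and Welch's test is applied directly to those replicate yields. The p value comes from the Student's t distribution, evaluated through the regularized incomplete beta function. All function names and the yield values in the usage block are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple

# Minimal sketch of eq 4 with a Welch's t-test filter; NOT the MSS-CLEAN code.
# Assumptions (ours): base-10 log, L(r_i) = mean of replicate yields per state,
# and Welch's test applied to the raw replicate yields. Names are hypothetical.

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the m-th step of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last (odd) update factor close to 1: converged
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction directly where it converges fastest,
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """Two-sided p value for Student's t with (possibly non-integer) df."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Welch's unequal-variance t-test; returns (t, Welch-Satterthwaite df, p)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("need at least two replicates per state")
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se2 = va + vb
    if se2 == 0.0:
        raise ValueError("both states have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_two_sided_p(t, df)


def protection_factor(structured: Sequence[float], unstructured: Sequence[float]) -> float:
    """Eq 4 with an assumed base-10 log: P_i = log10(L(r_i)_s / L(r_i)_u).

    Values below 0 mean residue i is labeled less in the structured state.
    """
    return math.log10(mean(structured) / mean(unstructured))


if __name__ == "__main__":
    # Hypothetical triplicate yields (fractions) for one residue; not data from this study.
    native = [0.11, 0.10, 0.12]
    predigest = [0.46, 0.50, 0.48]
    _, _, p = welch_t_test(native, predigest)
    print(f"P_i = {protection_factor(native, predigest):.2f}, p = {p:.2g}")
```

Because the reference state is just the second argument, the same two calls work whether the user normalizes against a predigested, denatured, or ligand-treated state; a cutoff on p could then serve as the additional filter described for eq 4.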

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

David C. Schriemer: 0000-0002-5202-1618

Author Contributions

DSZ and DCS conceptualized the project and designed the Studio CL-MS module. DSZ performed all experiments and data analysis. VS coded the Studio software. DSZ and DCS wrote the manuscript. All authors reviewed and edited the manuscript.

Funding

This work was supported by an NSERC Discovery Grant 298351−2010 (DCS). DCS acknowledges the additional support of the Canada Foundation for Innovation.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

We thank Thermo Scientific for providing the labeling reagent 3,3′-azibutan-1-ol, and Joey Sheff for preparing purified Eg5 stocks.



ABBREVIATIONS

CAA, chloroacetamide; CL, covalent labeling; CLEAN, covalent label estimation and normalization; DIA, data-independent acquisition; DTT, dithiothreitol; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; LC, liquid chromatography; MS, mass spectrometry; MS/MS, tandem mass spectrometry; MSS, Mass Spec Studio; Ni-NTA, nickel nitrilotriacetic acid; PIPES, piperazine-N,N′-bis(2-ethanesulfonic acid); PRM, parallel reaction monitoring; SDS-PAGE, sodium dodecyl sulfate−polyacrylamide gel electrophoresis



REFERENCES

(1) Chothia, C.; Janin, J. Nature 1975, 256, 705−708.
(2) McLaughlin, S.; Aderem, A. Trends Biochem. Sci. 1995, 20 (7), 272−276.
(3) Abdusamatov, A. A.; Yunusov, S. Y. Chem. Nat. Compd. 1968, 4 (6), 334−335.
(4) Chen, J.; Rempel, D. L.; Gau, B. C.; Gross, M. L. J. Am. Chem. Soc. 2012, 134 (45), 18724−18731.
(5) Vahidi, S.; Stocks, B. B.; Liaghati-Mobarhan, Y.; Konermann, L. Anal. Chem. 2013, 85 (18), 8618−8625.
(6) Barrow, A. S.; Kaminska, R.; Moses, J. E.; Manzi, L.; Kleanthous, C.; Oldham, N. J.; Hopper, J. T. S.; Robinson, C. V. Angew. Chem., Int. Ed. 2017, 56 (47), 14873−14877.
(7) Jones, L. M.; Sperry, J. B.; Carroll, J. A.; Gross, M. L. Anal. Chem. 2011, 83 (20), 7657−7661.
(8) Moses, J. E.; Layfield, R.; Barrow, A. S.; Wright, T. G.; Scott, D.; Manzi, L.; Oldham, N. J. Nat. Commun. 2016, 7 (1), 13288.
(9) Jumper, C. C.; Bomgarden, R.; Rogers, J.; Etienne, C.; Schriemer, D. C. Anal. Chem. 2012, 84 (10), 4411−4418.
(10) Hartlmüller, C.; Göbl, C.; Madl, T. Angew. Chem., Int. Ed. 2016, 55 (39), 11970−11974.
(11) Aprahamian, M. L.; Chea, E. E.; Jones, L. M.; Lindert, S. Anal. Chem. 2018, 90 (12), 7721−7729.
(12) Schmidt, C.; Macpherson, J. A.; Lau, A. M.; Tan, K. W.; Fraternali, F.; Politis, A. Anal. Chem. 2017, 89 (3), 1459−1468.
(13) Hubbell, W. L.; Altenbach, C. Curr. Opin. Struct. Biol. 1994, 4 (4), 566−573.
(14) Mendoza, V. L.; Vachet, R. W. Mass Spectrom. Rev. 2009, 28 (5), 785−815.
(15) Limpikirati, P.; Liu, T.; Vachet, R. W. Methods 2018, 144, 79−93.
(16) Mendoza, V. L.; Antwi, K.; Barón-Rodríguez, M. A.; Blanco, C.; Vachet, R. W. Biochemistry 2010, 49 (7), 1522−1532.
(17) Blencowe, A.; Hayes, W. Soft Matter 2005, 1 (3), 178−205.
(18) Ureta, D. B.; Craig, P. O.; Gómez, G. E.; Delfino, J. M. Biochemistry 2007, 46 (50), 14567−14577.
(19) Das, J. Chem. Rev. 2011, 111 (8), 4405−4417.
(20) Cheng, M.; Zhang, B.; Cui, W.; Gross, M. L. Angew. Chem., Int. Ed. 2017, 56 (45), 14007−14010.
(21) Ziemianowicz, D. S.; Bomgarden, R.; Etienne, C.; Schriemer, D. C. J. Am. Soc. Mass Spectrom. 2017, 28 (10), 2011−2021.
(22) Aye, T. T.; Low, T. Y.; Sze, S. K. Anal. Chem. 2005, 77 (18), 5814−5822.
(23) Rinas, A.; Mali, V. S.; Espino, J. A.; Jones, L. M. Anal. Chem. 2016, 88 (20), 10052−10058.
(24) Riaz, M.; Misra, S. K.; Sharp, J. S. Anal. Biochem. 2018, 561−562 (July), 32−36.
(25) Xie, B.; Sharp, J. S. J. Am. Soc. Mass Spectrom. 2016, 27 (8), 1322−1327.
(26) Xie, B.; Sood, A.; Woods, R. J.; Sharp, J. S. Sci. Rep. 2017, 7 (1), 4552.
(27) Wang, L.; Chance, M. R. Mol. Cell. Proteomics 2017, 16 (5), 706−716.
(28) Gau, B. C.; Chen, J.; Gross, M. L. Biochim. Biophys. Acta, Proteins Proteomics 2013, 1834 (6), 1230−1238.
(29) Borotto, N. B.; Vachet, R. W.; Graban, E. M.; Vaughan, R. C.; Zhou, Y.; Hollingsworth, S. R.; Hale, J. E. Anal. Chem. 2015, 87 (20), 10627−10634.
(30) Bern, M.; Kil, Y. J.; Becker, C. Byonic: Advanced Peptide and Protein Identification Software. Current Protocols in Bioinformatics; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012; Vol. 40, pp 13.20.1−13.20.14.
(31) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20 (18), 3551−3567.
(32) Kim, S.; Pevzner, P. A. Nat. Commun. 2014, 5, 5277.
(33) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. BMC Bioinf. 2010, 11, 395.
(34) Kaur, P.; Kiselar, J. G.; Chance, M. R. Anal. Chem. 2009, 81 (19), 8141−8149.
(35) Rinas, A.; Espino, J. A.; Jones, L. M. Anal. Bioanal. Chem. 2016, 408 (11), 3021−3031.
(36) Rey, M.; Schriemer, D. C.; Baker, C. A. H.; Burns, K. M.; van Dijk, M.; Bonvin, A. M. J. J.; Sarpe, V.; Buse, J.; Wordeman, L. Structure 2014, 22 (10), 1538−1548.
(37) Schryvers, A. B.; Ostan, N.; Schriemer, D. C.; Sarpe, V.; Rafiei, A.; Hepburn, M. Mol. Cell. Proteomics 2016, 15 (9), 3071−3080.
(38) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; et al. Nat. Biotechnol. 2012, 30 (10), 918−920.
(39) Sheff, J. G.; Farshidfar, F.; Bathe, O. F.; Kopciuk, K.; Gentile, F.; Tuszynski, J.; Barakat, K.; Schriemer, D. C. Mol. Cell. Proteomics 2017, 16 (3), 428−437.
(40) Lin, S.; Garcia, B. A. Examining Histone Posttranslational Modification Patterns by High-Resolution Mass Spectrometry, 1st ed.; Elsevier Inc., 2012; Vol. 512.
(41) Deutsch, E. Proteomics 2008, 8 (14), 2776−2777.
(42) Fabre, B.; Lambour, T.; Bouyssié, D.; Menneteau, T.; Monsarrat, B.; Burlet-Schiltz, O.; Bousquet-Dubouch, M. P. EuPa Open Proteomics 2014, 4, 82−86.
(43) Steen, H.; Morrice, N.; Kirschner, M. W.; Jebanathirajah, J. A.; Rush, J. Mol. Cell. Proteomics 2006, 5 (1), 172−181.
(44) Argentini, A.; Goeminne, L. J. E.; Verheggen, K.; Hulstaert, N.; Staes, A.; Clement, L.; Martens, L. Nat. Methods 2016, 13 (12), 964−966.
(45) Jaitly, N.; Mayampurath, A.; Littlefield, K.; Adkins, J. N.; Anderson, G. A.; Smith, R. D. Decon2LS: An Open-Source Software Package for Automated Processing and Visualization of High Resolution Mass Spectrometry Data. BMC Bioinf. 2009, 10, 87.
(46) Sosnick, T. R.; Trewhella, J. Biochemistry 1992, 31 (35), 8329−8335.
(47) Dobson, C. M. Curr. Opin. Struct. Biol. 1992, 2 (1), 6−12.
(48) Craig, P. O.; Gómez, G. E.; Ureta, D. B.; Caramelo, J. J.; Delfino, J. M. J. Mol. Biol. 2009, 394 (5), 982−993.
(49) Percy, A. J.; Rey, M.; Burns, K. M.; Schriemer, D. C. Anal. Chim. Acta 2012, 721, 7−21.
(50) Konermann, L.; Pan, J.; Liu, Y. H. Chem. Soc. Rev. 2011, 40 (3), 1224−1234.
(51) Welch, B. L. Biometrika 1947, 34 (1−2), 28−35.
(52) Fagerland, M. W.; Sandvik, L. Contemp. Clin. Trials 2009, 30 (5), 490−496.