Quantitative Analysis of Protein Covalent Labeling Mass Spectrometry


Article

Quantitative Analysis of Protein Covalent Labeling Mass Spectrometry Data in the Mass Spec Studio Daniel S. Ziemianowicz, Vladimir Sarpe, and David C. Schriemer Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b01625 • Publication Date (Web): 03 Jun 2019



Analytical Chemistry

Daniel S. Ziemianowicz†,‡, Vladimir Sarpe, David C. Schriemer*,†,‡,§

†Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada T2N 4N1
‡Robson DNA Science Centre, Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, Alberta, Canada T2N 4N1
§Department of Chemistry, University of Calgary, Calgary, Alberta, Canada T2N 4N1
*Corresponding author: [email protected]

ABSTRACT: Covalent labeling with mass spectrometry (CL-MS) provides a direct measure of the chemical and structural features of proteins, with the potential for resolution at the amino-acid level. Unfortunately, most applications of CL-MS are limited to narrowly defined differential analyses, where small numbers of residues are compared between two or more protein states. Extending the utility of high-resolution CL-MS to structure-based applications requires more robust computational routines and the development of methodology capable of reporting labeling yield accurately. Here, we provide a substantial improvement in the analysis of CL-MS data with the development of an extended plug-in built within the Mass Spec Studio development framework (MSS-CLEAN). All elements of data analysis, from database search to site-resolved and normalized labeling output, are accommodated, as illustrated through the nonselective labeling of the human kinesin Eg5 with photoconverted 3,3’-azibutan-1-ol. In developing the new features within the CL-MS plug-in, we identified additional complexities associated with the application of CL reagents, arising primarily from digestion-induced bias in yield measurements and ambiguities in site localization. A strategy is presented involving the use of redundant site labeling data from overlapping peptides, the imputation of missing data, and a normalization routine to determine relative protection factors. These elements together provide for a robust structural interpretation of CL-MS/MS data, while minimizing the over-reporting of labeling site resolution. Finally, to minimize bias, we recommend that digestion strategies for the generation of useful overlapping peptides involve complementary enzymes, with each digestion driven to completion.

Proteins display a remarkable specificity in interacting with components of their molecular environment. These interactions are mediated predominantly by motifs on the protein surface, and are driven by shape complementarity and noncovalent interactions1–3. Identifying and mapping the topography of proteins is a central activity in the study of molecular mechanisms and in the determination of structure-function relationships. Topographical mapping can reveal structural dynamics4,5 and protein-protein6,7 or protein-ligand8,9 interactions, and the resulting data can even be applied to de novo structural modeling10,11 or to integrative modeling of protein complexes12. Several biophysical methods have been applied to such activities (e.g., X-ray crystallography, nuclear magnetic resonance10 and electron paramagnetic resonance spectroscopies13). MS-based methods are very attractive, as there are few restrictions on the states to which they can be applied (from individual proteins to whole proteomes). Among the many MS techniques available, covalent labeling MS (CL-MS) is particularly powerful, as it provides a topographical analysis that returns data on both the structural and chemical features of protein surfaces.

Topographical mapping in an MS workflow requires the labeling of residues with chemical reagents, the products of which are usually quantified after proteolytic digestion and LC-MS/MS analysis14,15. Chemical reagents used in CL may be functional-group specific, such as diethyl pyrocarbonate (targeting D,E residues) and sulfo-N-hydroxysuccinimide acetate (targeting K residues)14,16, or functional-group nonspecific, such as hydroxyl radicals or carbenes15,17–19. Despite significant methodological advances in reagent development8,20–24 and in data collection and analysis9,25,26, CL-MS has not yet become a widely used technique. CL data are usually interpreted on the premise that the labeling yield is proportional to the accessibility of a residue to the solvent or CL reagent. However, the actual per-residue labeling yield is influenced by other factors as well, such as chemical reactivity and the structural factors governing the partitioning of CL reagents near the residue in question. In its most straightforward form, CL-MS is used in a differential fashion by comparing the extent of residue modification between two protein states. By labeling in the presence and absence of a ligand, for example, one may infer which residues are involved in the interaction27. This straightforward comparative approach avoids the need to determine the proportionality constant for each and every residue, as would be required for a direct topographical mapping exercise15,26. To take full advantage of CL-MS data for topographical mapping, we require a more involved normalization routine and software that better supports CL-MS workflows.

We do not yet have a complete software solution for the accurate analysis of residue-resolved labeling data arising from the use of any CL reagent (specific or non-specific). Existing data analysis routines use workflows that concatenate multiple tools not specifically designed for CL-MS, and/or costly commercial software7,28–30. For example, a workflow for the analysis of hydroxyl radical covalent labeling data has typically involved 1) raw data conversion and extraction of spectra (e.g., Rosetta Elucidator, msconvert), 2) searching the spectra with a proteomics database-search engine (e.g., MASCOT31, MS-GF+32), 3) additional annotation of labeled peptides with scripts (e.g., Excel macros), 4) quantification of chromatograms for labeled and non-labeled peptides (e.g., MZmine33), and 5) manual normalization and/or statistical analysis of data (e.g., Excel, MATLAB). Some of these steps are packaged as a set of algorithms in ProtMapMS34, but ProtMapMS remains specific to experiments using water radiolysis and does not allow for data inspection. In other work, Jones et al. developed a custom configuration of the commercial software Proteome Discoverer and an Excel add-on for data analysis35. We also developed a solution using the Mass Spec Studio (MSS) framework v136, blending standalone search tools, peptide-level yield measurements and MS/MS fragment analysis. Here, we present MSS-CLEAN (Covalent Label Estimation And Normalization), a substantial enhancement of the CL-MS data analysis module, rebuilt in the Mass Spec Studio software development framework v237.

Our module offers all the essential functions of a CL-MS analytical workflow in a user-friendly, self-contained software package with a simple graphical user interface. We embed database search algorithms and a set of new features that allow the CL-MS module to function as a complete solution. These include support for any CL reagent (specific or non-specific), full support for chimericity, and an improved algorithm for the quantification of per-residue labeling8,9,26 that compensates for incomplete fragmentation and the effect of neutral loss. We propose and test a strategy for increasing residue coverage using complementary digests, and a strategy for normalizing labeling data in support of topographical mapping. Testing involved the analysis of CL-MS data from the non-specific labeling of a human kinesin, Eg5, using photoconverted 3,3’-azibutan-1-ol.

Experimental Section

CL Plug-in Design
The Mass Spec Studio framework (v2) was designed to capture and recycle basic and advanced MS-based signal processing functionality, calculators, algorithms and resources for reuse in the building of entirely new analysis packages, focused primarily on structural mass spectrometry applications (e.g., HX-MS, XL-MS, CL-MS)36,37. The framework is a composite application whose component coupling strategy allows for the effective use of shared tools, as well as straightforward development of new extensions and new pluggable content. Mass Spec Studio is written in C# using the .NET framework. To allow for easy extensibility, the Studio employs MEF (Managed Extensibility Framework), AvalonDock and Prism. All analysis packages benefit from easy access to features in the core library, such as the conversion of all MS vendor data types (either natively or through ProteoWizard38) into an optimized .mssdata binary file format for efficient, live data retrieval. The wizard-style project setup allows for easy project building, while an intuitive user interface enables streamlined data inspection. We maintain and enhance the Studio framework for these applications and continue to add new reusable content. The CL module was refactored from v1 of the Mass Spec Studio and augmented with a wrapped version of MS-GF+ to search for variable modifications in user-supplied databases. The search capability is also supported by a “label maker” wizard, which allows for the definition and editing of reaction products and labeling rules. A new algorithm was developed for the selection and grouping of LC-MS features and MS/MS spectra for all labeled and non-labeled peptide states, and combined with improved logic for assigning yields on a per-residue basis (see below). This algorithm is used to support the visualization of label yield mapped to sequence. File management was improved with a rebuilt project management user interface, and LC-MS/MS inspection is supported with improved spectral navigation tools. File management also supports the designation of control datasets, used in the generation and export of normalized output for further analysis and visualization (e.g., through UCSF Chimera).

For the datasets described below, a database search was performed with all default parameters retained, except for the following: minPepLength = 5, maxPepLength = 40, minPepCharge = 2, maxPepCharge = 5, instrumentID = Q Exactive, massTolerance = 10 ppm, numModsPerPeptide = 2, usePercolator = False, CutoffQValue = 0.05. Additional Studio parameter values included: XIC selector massTolerance = 10 ppm, fragment mass selector MassTolerance = 20 ppm.

Figure 1. CL-MS data analysis workflow in MSS-CLEAN. Experiments may include multiple proteins and any CL chemistry. High-resolution raw data (from any instrument) are processed in a single project and converted to mzML format for database searching (i.e., MS-GF+) and to the mssdata format for fast processing. MSS-CLEAN supports data validation at multiple levels. Finally, data can be exported in a flexible manner, including formats compatible with visualization (e.g., UCSF Chimera).

Eg5 Protein Preparation
The motor domain of Eg5 (residues 1-386) was expressed and purified following a previously described procedure39. Briefly, plasmid DNA was transformed into competent E. coli BL21 cells. Cells were grown at 37 °C for approximately 7 h. Expression was induced by the addition of IPTG and continued overnight at room temperature. The resulting cell pellet was lysed, and the lysate was collected, loaded onto Ni-NTA agarose and washed. Eg5 was eluted with an imidazole gradient in a gravity-flow column. Collected fractions were analyzed by SDS-PAGE and buffer-exchanged into 200 mM NaCl, 1 mM MgCl2, 20 mM PIPES pH 7.4 buffer; purified protein concentrations were determined to be approximately 3.3 mg/mL by a BCA assay. For covalent labeling, PIPES was replaced with HEPES buffer at 20 mM; all other components remained the same.

Sample Preparation
We used three classes of Eg5 motor domain samples for our labeling experiments: intact protein in neutral buffer conditions, denatured protein and digested protein (i.e., a pre-digested state); the latter two required additional preparative steps. Denatured Eg5 motor domain was generated by incubating 20 µM protein with 8 mM dithiothreitol (DTT; Sigma Aldrich) in 40 mM ammonium bicarbonate (≥ 99%; Sigma Aldrich) for 30 min at 56 °C, followed by 80 mM chloroacetamide (CAA; Sigma Aldrich) for 30 min in the dark at room temperature (19.5 °C). The pre-digest was prepared by denaturation as above, followed by digestion with trypsin or pepsin: trypsin (Thermo Scientific, sequencing grade) was added at an enzyme-to-substrate ratio of 1:20 and incubated for 4 h at 37 °C, or porcine pepsin (Sigma Aldrich) was added at a ratio of 1:20 and incubated for 1 h at 37 °C. Tryptic digests were quenched with 0.5% formic acid (ACS reagent grade ≥ 98%, Thermo Scientific) and extracted with Hypersep C18 SpinTips (Thermo Scientific), whereas peptic digests were quenched by immediate solid-phase extraction. Peptides were eluted with 60% acetonitrile (LC-MS grade; Thermo Scientific), 0.1% trifluoroacetic acid (HPLC grade; Merck Millipore). Samples were lyophilized in a Savant SpeedVac (Thermo Scientific), and peptides were resuspended according to downstream use (see below).

Carbene Labeling
For labeling, samples were prepared with 10 µM protein or peptide solution and 10 mM 3,3’-azibutan-1-ol in 20 mM HEPES (≥ 99.5% titration; Sigma Aldrich) pH 7.4, and 50 mM NaCl (Biotechnology grade; Amresco). Samples were equilibrated at room temperature for 10 min. Sample volumes of approximately 1 μL in a windowed 450 μm i.d./670 μm o.d. fused silica capillary (Molex) were flash frozen in liquid nitrogen, followed by irradiation with 500 mJ of 355 nm light (5 × 100 mJ pulses, 10 ns pulse width at 10 Hz) from a Nd:YAG laser (YG 980; Quantel). The beam was focused with biconvex (f = 70 mm) and plano-concave cylindrical (f = 75 mm) lenses to produce a 0.8 mm × 7 mm elliptical beam shape to maximize photon deposition (≤80%) into the frozen sample. Replicates consisted of independently irradiated samples; each replicate consisted of two 1 μL irradiated samples to obtain enough material for repeated downstream analysis. Following irradiation and thawing, the protein samples were reduced, alkylated and digested as described above. Tryptic digestion was quenched with 0.5% formic acid; peptic digestion was quenched with 90 mM NaOH (ACS grade; VWR) at 95 °C for 5 min, followed by acidification with 1.4% formic acid. The final peptide concentration for each digest was estimated at 1 μM. Propionylation of bovine serum albumin (BSA; Sigma Aldrich) was performed according to Lin & Garcia (2012)40. Labeled and non-labeled intact BSA were mixed at 1:50 and 1:5 ratios, followed by tryptic digestion as above.

LC-MS/MS Data Acquisition
Following the quenching of digestion, 2 µL of sample was injected via an nLC-1000 (Thermo Scientific) equipped with an Acclaim PepMap 100 guard column (75 μm × 2 cm C18, 3 μm particles, 100 Å; Thermo Scientific), and separated using a self-packed C18 HPLC column (75 μm × 15 cm, Kinetex 2.6 μm particles, Peptide XB C18, 100 Å; Phenomenex). Peptides were eluted with a 40 min, 3–35% B gradient at 300 nL/min. Mobile phase A consisted of 0.1% v/v formic acid in 3% acetonitrile (LC-MS grade; Thermo Scientific); mobile phase B consisted of 0.1% v/v formic acid in 97% acetonitrile. Data were acquired on an LTQ Orbitrap Velos (Thermo Scientific) in OT/OT mode. Spray voltage was set to 2.5 kV and the transfer capillary temperature to 285 °C. MS scans were acquired with a resolution of 60,000 and an m/z range from 300 to 2000. The 12 most intense ions with a charge state ≥ 2+ and a signal intensity ≥ 2.0 × 10⁴ were selected for fragmentation by HCD with NCE = 35 and an isolation width of 2.0 Th. MS/MS data were acquired with a resolution of 7500 and an AGC target of 1.0 × 10⁵. Data were acquired twice, once with a dynamic exclusion of 30 seconds and once without dynamic exclusion.


Figure 2. Workflow for the calculation of per-residue labeling yield from XICs and MS/MS data. (A) XICs showing non-labeled (grey) and labeled (blue) peptide. Peptide-level labeling yields are calculated using equation (1). Triangles correspond to MS/MS triggers. (B) Integrated MS/MS spectrum of all PSMs from a singly-labeled peptide, with non-labeled (black) and labeled (blue) fragment ions. (C) Fractional distribution of per-residue labeling calculated using y-ions from the integrated MS/MS spectrum using equation (2). M = full peptide sequence. (D) Per-residue labeling yields calculated according to equation (3).

Results and Discussion

Overview of Data Analysis Workflow
The Mass Spec Studio provides a graphical wizard to guide the user through project setup (Figure 1). Projects are structured to allow for differential CL-MS analyses for footprinting of molecular interactions, or for standalone protein mapping exercises. Following project setup, raw data are converted to the mzML41 format for compatibility with the MS-GF+ search tool integrated into the Studio. We chose MS-GF+ as it is a search tool optimized for post-translational modifications32. Peptide-spectrum matches (PSMs) are generated for both labeled and non-labeled peptides, using searches parameterized by the user. After processing, the user may evaluate all PSMs using an interactive graphical display (Figure S1A). To facilitate the rapid inspection and grouping of PSMs, spectra are automatically retrieved from the binary data, annotated and visualized, rather than drawn from the intermediate mzML-formatted data. Accessing the source data allows for better labeling inspection and/or PSM rejection (Figure S1B). Ultimately, grouped PSMs for each labeled peptide are used to generate and display calculations of per-residue labeling yields. MSS-CLEAN also exports normalized labeling yields and associated measures of significance. The computational approaches used for these processes are described below. A custom-formatted text file of normalized yields can be exported for mapping data onto a protein structure in UCSF Chimera (via the “Define Attribute” tool) for visual interpretation of CL data. Finally, we provide the user with additional rich output, including absolute per-residue labeling yields with respect to the protein sequence(s), fractional per-residue labeling yields with respect to individual peptides, fractional per-residue yields, and peptide XIC peak areas delineated by replicate. Database search results, in the native search algorithm output format, are also available in the project data directory.

High-Resolution Yield Estimation
Our strategy for residue-level data analysis builds upon a method we described earlier9. Briefly, covalent labeling yields are calculated using both peptide- and residue-level information (Figure 2), under the assumption that modifications have only a modest effect on ionization and fragmentation efficiencies. Proteomics experiments support the practicality of these assumptions in many situations (e.g., label-free quantification42 and phosphorylation analysis43), but we recognize that yields are best described as estimates as a result. Yield calculations require the detection of XIC features and the assignment of label positions in the MS/MS spectra, but incomplete chromatographic resolution of positional isomers prevents a simple assignment of chromatographic features to single residue positions. Under the assumptions above, we first calculate the combined, fractional peptide labeling yield $Y_a$ according to

$$Y_a = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} i\,a_j}{a_0 + \sum_{i=1}^{n}\sum_{j=1}^{m} i\,a_j} \tag{1}$$

where $a_0$ represents the peak area of the non-labeled peptide a, $a_j$ represents the peak area of the jth of up to m distinct features of the labeled peptide (Figure 2A, blue trace), and i represents the number of labels on a peptide (up to n). To generate an accurate measure of the retention time distribution for all labeled states, we implement the strategy of “matching between runs”44, where features are selected based on the union of PSMs across all replicates. In this manner, low-intensity features that may have escaped sampling in one run are detected (Figure 2A, blue triangles). We then generate a fully chimeric MS/MS spectrum that combines all PSMs spanning the retention time distribution of the modifications, and calculate the fractional yield $f(y_i)$ as

$$f(y_i) = \frac{I(y_i)_1}{I(y_i)_1 + I(y_i)_0} \tag{2}$$

from the intensities I of the singly-modified (subscript 1) and non-modified (subscript 0) forms of peptide sequence fragment $y_i$. Here we restrict fractional yield calculations to PSMs of singly-labeled peptides, as higher-order labeled peptides (≥ 2 labels) are usually of low abundance and add no additional information on label distribution. These fractional yields are then weighted by the fractional peptide labeling yield to generate the labeling yield $L(r_i)$ of residue $r_i$, according to

$$L(r_i) = Y_a\left[f(y_i) - f(y_{i-1})\right] \tag{3}$$
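Equations (1)-(3) can be traced with a short Python sketch. It is an illustration under stated assumptions, not the MSS-CLEAN implementation: labeled XIC features are passed as hypothetical `(i, area)` pairs (number of labels, peak area), so the double sum of equation (1) becomes one loop over features; fractional yields are listed for y1 to yk followed by the full sequence M, which is assumed to have f = 1 for a singly-labeled peptide, with f(y0) = 0.

```python
"""Illustrative sketch of equations (1)-(3); not the MSS-CLEAN source code."""


def peptide_yield(a0, labeled_features):
    """Equation (1): fractional peptide labeling yield Y_a.

    a0 -- XIC peak area of the non-labeled peptide.
    labeled_features -- iterable of (i, area) pairs (hypothetical layout):
        i is the number of labels on that feature, area is its XIC peak area.
    Each labeled area is weighted by its label count i, in both the
    numerator and the denominator.
    """
    weighted = sum(i * area for i, area in labeled_features)
    return weighted / (a0 + weighted)


def fractional_yield(i_labeled, i_unlabeled):
    """Equation (2): labeled share of the intensity of fragment y_i."""
    return i_labeled / (i_labeled + i_unlabeled)


def per_residue_yields(y_a, f_values):
    """Equation (3): L(r_i) = Y_a * [f(y_i) - f(y_(i-1))].

    f_values -- fractional yields for y_1 ... y_k, then the full sequence M
        (assumed f = 1 for a singly-labeled peptide); f(y_0) is taken as 0.
    Returns one yield per residue, numbered from the C-terminus like y-ions.
    """
    yields = []
    previous = 0.0
    for f in f_values:
        yields.append(y_a * (f - previous))
        previous = f
    return yields
```

For example, `peptide_yield(900, [(1, 80), (1, 10), (2, 5)])` returns 0.1. Because the differences in equation (3) telescope, the per-residue yields of one peptide sum to $Y_a$ under these conventions, which is a convenient sanity check for any implementation.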

By taking into account all PSMs for a single labeled peptide sequence, we represent all labeled peptide positional isomers. This strategy is limited only by the sampling rate of a data-dependent acquisition (DDA) method (Figure S2). In this study, denser MS/MS sampling was achieved using two runs with different exclusion settings: one with a dynamic exclusion (DE) of 30 seconds (approximately 1.5 times the chromatographic peak width) and one with no DE. Previous work showed that although a standard DE time interval allows for a greater number of peptide identifications overall, improved MS/MS sampling of a set of positional isomers is achieved with DE effectively disabled21. Ultimately, targeted acquisition methods such as parallel reaction monitoring (PRM) or comprehensive acquisition methods such as data-independent acquisition (DIA) may provide more extensive sampling. Regardless, in the resulting integrated MS/MS spectrum, fragmentation may be insufficient to resolve labeling yield to a single residue site. In this situation, labeling yield is simply averaged among the ambiguous (i.e., non-fragmented) elements of sequence and highlighted as such. There are additional situations where the spectral input to the calculations is either insufficient to sustain the yield calculation or actually generates erroneous values. In one situation, missing values for a labeled/non-labeled pair prevent the calculation of $f(y_i)$, requiring a data imputation strategy (Figure 3A). For example, given an integrated spectrum where the labeled fragment $y_i$ is observed but the corresponding non-labeled fragment is not, we impute the non-labeled fragment $y_i$. To reduce imputation error, we require that the observed fragment peak has a signal-to-noise (S/N) ratio ≥ 10 (in a strategy based on Decon2LS45), and imputed fragment intensities are assigned a value of one-half of the calculated noise level (i.e., N/2). In this manner, the error of imputed values is limited to ≤ 5%. The same strategy is applied in the opposite instance (i.e., non-labeled fragment observed, but no labeled fragment).
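The imputation rule can be sketched in Python as follows. The function name and the use of `None` for an undetected fragment are illustrative assumptions, and the noise level is passed in as `noise` rather than estimated in the Decon2LS style. One way to read the ≤ 5% figure, assuming the true intensity of the missing fragment lies between 0 and N: the slope of f with respect to that intensity never exceeds 1/I, where I ≥ 10N is the observed partner, so substituting N/2 shifts f(yi) by at most (N/2)/(10N) = 0.05.

```python
def impute_pair(i_labeled, i_unlabeled, noise, min_snr=10.0):
    """Fill one missing fragment intensity in a labeled/non-labeled pair.

    i_labeled, i_unlabeled -- fragment intensities, or None if not detected.
    noise -- noise level N of the integrated spectrum (taken as given here).
    min_snr -- S/N the observed partner must reach before imputation.
    Returns (labeled, unlabeled), or None when the pair must be discarded.
    """
    if i_labeled is not None and i_unlabeled is not None:
        return i_labeled, i_unlabeled  # both observed: nothing to impute
    observed = i_labeled if i_labeled is not None else i_unlabeled
    if observed is None or observed / noise < min_snr:
        return None  # both missing, or partner too weak (Figure 3A, case ii)
    imputed = noise / 2.0  # N/2
    if i_labeled is None:
        return imputed, i_unlabeled  # labeled fragment missing (case i)
    return i_labeled, imputed  # non-labeled fragment missing (case iii)
```

The returned pair can be passed straight into equation (2); a `None` result simply removes that fragment from the fractional-yield profile.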
Imputation is a frequent occurrence in our data set (78.2% of fragment ions). In the other situation, erroneous values may be returned when sequence ions generate neutral losses from labile products of the labeling reaction. We previously described this phenomenon for carbene insertion into acidic residues, which creates labile esters9. CID-type fragmentation can result in minor neutral loss in such situations, in which case the per-residue yield measurements will be falsely low. ETD or ECD fragmentation restores accuracy, but these modes may not be practical in many situations (e.g., low-charge-state ions). Neutral losses return $f(y_i)$ values that are lower than $f(y_{i-1})$ values (Figure 3B). It is not physically possible for the fragment with i residues to have a lower yield than the fragment with i-1 residues, so we remove these inconsistent fractional yields from the calculation and instead interpolate the value based on the slope between the two flanking residues. This strategy is effective because most instances of strong neutral loss appear to depend upon particular gas-phase conformations. That is, only isolated sequence ions demonstrate this behavior, and thus the flanking residues appear reliable. Interpolation of fractional yields due to neutral-loss effects occurred at a low frequency in our data set (3.1%). Our approach is a conservative one: if the proximal values are both one (for example), the method will impute the correct value. If they are non-zero, the method simply functions like a missing fragment detector, and the yields are averaged over the ambiguous sequence (Figure 3C).

Figure 3. (A) Fragment intensity imputation depicted with a stylized MS/MS spectrum showing three missing data scenarios. Non-labeled fragment intensities in black, labeled fragments in blue. Red dashed line corresponds to noise level (relative intensity = 1); green dashed line corresponds to the S/N = 10 threshold for imputation. i) A non-labeled fragment with S/N > 10 and missing labeled fragment intensity imputed at 0.5N; ii) a non-labeled fragment with S/N < 10, missing labeled fragment intensity, thus discarded; iii) a labeled fragment with S/N > 10 and missing non-labeled fragment imputed at 0.5N. (B) Fractional CL yield interpolation. HCD data (orange) showing neutral loss at fragment 7 and missing data (cyan) at fragment 3. The corrected output of fractional yield (blue) includes the interpolation of values for 7 and 3, resulting in regions of averaged labeling yield (green dotted regions). (C) Per-residue labeling yields corresponding to the corrected fractional yield plot in B. Site-resolved yields in blue; ambiguously localized, averaged yields in green.
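A minimal Python sketch of the neutral-loss correction follows, assuming fractional yields arrive as a list for y1 to yk with `None` for missing fragments. Two details are assumptions rather than statements from the text: each value is compared with the last accepted value (which is f(yi-1) whenever that fragment is present), and leading or trailing gaps are left unfilled because they have no flanking pair.

```python
def correct_fractional_yields(f_values):
    """Remove neutral-loss dips, then linearly interpolate interior gaps.

    f_values -- fractional yields f(y_1) ... f(y_k); None marks a missing value.
    Returns a new list; leading/trailing gaps stay None (no flanking pair).
    """
    cleaned = list(f_values)  # do not modify the caller's list
    last_accepted = None
    for k, f in enumerate(cleaned):
        if f is None:
            continue
        if last_accepted is not None and f < last_accepted:
            cleaned[k] = None  # y_i cannot carry less label than y_(i-1)
        else:
            last_accepted = f
    known = [k for k, f in enumerate(cleaned) if f is not None]
    for left, right in zip(known, known[1:]):
        # slope between the two flanking accepted values
        step = (cleaned[right] - cleaned[left]) / (right - left)
        for k in range(left + 1, right):
            cleaned[k] = cleaned[left] + step * (k - left)
    return cleaned
```

Applied to a profile shaped like Figure 3B, with a dip at fragment 7 and a gap at fragment 3, the dip is discarded and both positions are filled on the line between their neighbors. Differencing the corrected profile with equation (3) then splits the yield evenly across each ambiguous stretch, which is the averaging shown in Figure 3C.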


Integration of multiple measurements

To further reduce error in yield measurement for a given residue, we combine data from multiple peptides (Figure 4). When calculating final residue labeling yields across a protein sequence, the Studio first combines residue labeling yields for all overlapping peptides within a replicate, and then combines these values across replicates. We observed that intra-replicate variability of labeling yield is greater than inter-replicate variability (Table S1). That is, a residue's labeling yield varies more between different peptide sequences than between the same peptide from different replicates. This phenomenon appears to arise because of the influence of labeling on protein digestion. The modification of cleavage-site residues may inhibit proteolysis entirely, and even distal modifications can influence Km values for a given cleavage site. The result is an altered peptide-level digestion pattern after labeling (see Supporting Information). However, if digestion is driven to completion, then combining the labeling data for a given residue across all peptides containing that residue should return a value closer to the actual yield.

To test this idea, we used propionylation, a well-controlled chemistry that can completely label lysines40. Bovine serum albumin (BSA) was propionylated and then mixed with non-labeled BSA to simulate labeling at 2% and 20% levels. These mixtures were digested to completion with chymotrypsin, and label incorporation was determined for each lysine-containing peptide. The variance can be high, but accurate incorporation values are returned when the labeling levels for multiple peptides are averaged. For the 2% labeling experiment the mean and standard deviation were 2 ± 1% (n = 3), and for the 20% labeling experiment they were 19 ± 10% (n = 7). Thus, MSS-CLEAN first uses all overlapping peptides in a run to determine an average yield for a residue, and then averages this value across replicate LC-MS/MS runs.

We explored these matters further using digests of the labeled and non-labeled states of kinesin Eg5. We generated 55 tryptic peptides in the labeled state, and only 38 peptides in the non-labeled state. The additional peptides are the result of missed cleavages induced by labeling (see Supporting Information). Such missed cleavages skew the calculation of peptide-level yields, as the corresponding non-labeled peptide a0 is not present (see equation 1). We tested whether a non-specific enzyme such as pepsin would allow for better residue coverage through the generation of more overlapping peptides, and thus higher accuracy. Interestingly, we obtained the opposite: fewer peptides in the labeled state (n = 68) than in the non-labeled state (n = 163). We suspect that label-altered digestion kinetics are more problematic for a nonselective enzyme like pepsin. Unlike tryptic digestion, which is driven to completion, peptic digestion must first be optimized to generate a set of mass-measurable peptides; driving peptic proteolysis to completion would produce small peptides that are not easily detected by standard LC-MS/MS proteomics methods. In the measurable set, we found that residue yields from pepsin digests were not consistent with those from tryptic digests. Nevertheless, we maintain that parallel analyses with complementary proteases could improve yield measurement accuracy, provided that the proteases have narrow cleavage specificity and are driven to completion (e.g., LysArginase, Glu-C).

Figure 4. The concept of increased sampling of residue CL yields through measurements of multiple overlapping peptides. The labeling yield of residue V42 is averaged across all peptides containing V42. Additionally, multiple sets of peptides may be generated from different proteases (red and blue).

Normalization of Yield Measurements

In order to map protein topography, the yield measurements require some form of normalization against an unstructured state. For any reagent used in covalent labeling experiments, the per-residue labeling yields depend on (1) the inherent chemical reactivity of a residue in its primary-sequence context, (2) the local structural effects influencing reagent partitioning (e.g., structural motifs), and (3) the higher-order structural effects of the intact protein (e.g., buried vs. exposed motifs). We examined two candidate strategies for normalization: a simple denatured state inspired by Sharp et al.26, using high temperature with disulfide bond reduction and alkylation, and a pre-digested state. Using the kinesin Eg5 protein as a test case, both surrogates for the unstructured state were labeled, and site labeling data were determined.
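Whatever the reference state, residue-level yields are obtained with the two-stage averaging described under Integration of multiple measurements: first over all overlapping peptides within a run, then across replicate runs. The sketch below illustrates that aggregation under stated assumptions. It assumes each peptide already carries a localized fractional yield for every residue (how MSS-CLEAN localizes sites is not reproduced here), uses a plain arithmetic mean at both stages, and treats "exclude terminal residues" as dropping each peptide's first and last residue, a simplification of the pre-digest rule. The function name `residue_yields` and its input layout are hypothetical, not the MSS-CLEAN API.

```python
from collections import defaultdict
from statistics import mean


def residue_yields(replicates, exclude_termini=False):
    """Two-stage residue yield estimate (hypothetical helper, not MSS-CLEAN code).

    replicates: one list per replicate LC-MS/MS run; each list holds
        (start, yields) pairs, where start is the 1-based protein position of
        the peptide's first residue and yields gives a fractional labeling
        yield for every residue of that peptide (None if not determined).
    exclude_termini: drop each peptide's first and last residue, a simplified
        version of the terminal-residue exclusion used for pre-digests.
    Returns {protein position: yield averaged over peptides, then replicates}.
    """
    per_run = []
    for peptides in replicates:
        pooled = defaultdict(list)
        for start, yields in peptides:
            last = len(yields) - 1
            for offset, value in enumerate(yields):
                if value is None or (exclude_termini and offset in (0, last)):
                    continue
                pooled[start + offset].append(value)
        # Stage 1: mean over all overlapping peptides within this run.
        per_run.append({pos: mean(vals) for pos, vals in pooled.items()})

    # Stage 2: mean of the per-run residue values across replicate runs.
    across = defaultdict(list)
    for run in per_run:
        for pos, value in run.items():
            across[pos].append(value)
    return {pos: mean(vals) for pos, vals in sorted(across.items())}


# Hypothetical peptide-level yields for two replicate runs (not study data).
run1 = [(40, [0.10, 0.20, 0.30]), (41, [0.30, 0.40])]
run2 = [(41, [0.20, 0.30])]
yields = residue_yields([run1, run2])
```

With these toy inputs, positions 40, 41 and 42 come out at approximately 0.10, 0.225 and 0.325. A residue missing from one run simply contributes from the runs where it was measured; a weighted mean (for example by peptide intensity) would be an equally plausible design choice.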


The native and denatured states showed a range of labeling yields (0.4–98% and 0.1–81%, respectively), and the pattern of labeling was very similar (Figure 5A). It would appear that, for this particular protein, this level of denaturation did not lead to an irreversibly unstructured state: upon a return to labeling conditions, the protein re-formed in whole or in part, or it may even have retained structure throughout thermal denaturation46,47. Maintaining an unstructured protein state via denaturation would require high temperatures during labeling and/or the use of a chaotropic agent48. Unfortunately, such extreme conditions would also influence the reactivity of any labeling reagent and further complicate normalization.

As an alternative means of normalization, we explored the labeling of a pre-digested protein, where we assume that all residue-level and local structural influences on labeling yields will be preserved and higher-order structure will be absent. Here, approximately one order of magnitude of dynamic range in labeling yield was observed at both the peptide and residue levels. Labeling levels were also consistently higher than in the structured state; for example, peptide yields ranged from 25 to 83% (Figure 5A), indicating effective reduction or removal of higher-order structure. Note that the pre-digest consists mainly of fully tryptic peptides. However, when a protein is digested after labeling, modified residues can prevent proteolysis, resulting in more missed-cleavage peptides than in the pre-digest (e.g., peptides 5 and 6, Figure 5A). Combining all overlapping peptides (Figure 4) helps diminish any bias induced by this effect at the residue level. MSS-CLEAN was adapted to allow for either normalization strategy and to generate a calculated relative protection factor, P, for residue i:

$$P_i = \log \frac{L(r_i)_s}{L(r_i)_u} \qquad (4)$$

where L(r_i) represents the labeling yield for residue i in the structured (s) and unstructured (u) states (Figure 5B). This protection factor is analogous to the protection factor calculated in hydrogen–deuterium exchange mass spectrometry49,50. Each protection factor also has a corresponding Welch's t-test probability (i.e., p-value)51,52, which can be used for additional filtering. MSS-CLEAN allows for a relative determination, so the user can reference any state: ligand-treated, denatured or digested. We suggest that the pre-digested form is the best representation of the denatured state and best reflects the intrinsic chemical reactivity underlying label insertion. In MSS-CLEAN, if the pre-digested state is selected, labeling yields for terminal residues are excluded because the free carboxyl and amine groups would distort the estimate. Finally, the selected normalization state also incorporates full peptide redundancy in the calculation of residue yields, to improve the accuracy of the relative protection factor.

We anticipate that per-residue covalent labeling yields normalized using pre-digests would correlate with the degree of solvent exposure in some fashion. A complete analysis of this question is the subject of a future study, but here we simply note that 44% of the labeled residues detected in both the native and pre-digested states (99 out of 223) show a ≥20% decrease in labeling yield (with a probability of ≥ 0.05) in the native state. This degree of residue burial roughly corresponds to the fraction of Eg5 residues with a fractional solvent-exposed surface area below 0.20 (50.6%; 177/350 residues).

Figure 5. (A) Mean labeling yields of tryptic peptides found in common between the native, denatured and pre-digested states (error bars, SD; n = 5). Peptides 5 and 6 correspond to tryptic missed cleavages (GVIIKGLEEITVHNKDEVYQILEK and IGKLNLVDLAGSENIGR). (B) Volcano plot of mean residue-level protection factors and corresponding Welch's t-test probabilities. Protection factors of the native (i.e., structured) state were calculated using either the denatured or the pre-digested unstructured state via equation (4). Normalized yield values < 0 indicate protected residues; values > 0 indicate deprotected residues (n = 3–5).

Conclusions

The natural variation in labeling yield for any given protein chemistry, coupled with the structural influence on yield, makes the detection and confident determination of labeling a difficult undertaking. MS/MS-based methods will always bear some level of ambiguity as a result, particularly for non-specific labeling chemistries such as carbenes or hydroxyl radicals. MSS-CLEAN provides a best estimate of yield from overlapping peptides, while faithfully representing site-determination ambiguity in the underlying data. MSS-CLEAN also fully integrates all steps required for processing and visualizing CL-MS data. A methodological insight that emerges from our development activities relates to digestion, and the manner in which residue levels should be estimated. Labeling strongly influences digestion kinetics, so unless all peptides spanning a given residue can be quantified from a complete digestion, determinations should only be considered estimates; individual peptides are likely biased. Accuracy can only be recovered by detecting all peptides in a complete digestion, as all protein forms must be represented in the peptide set. Finally, the normalization strategy is an important element in any CL-MS experiment, and a useful method involves the labeling of a pre-digested protein. MSS-CLEAN provides a relative protection factor calculation, which will allow us to explore methods for accurate topographical mapping in subsequent studies, and ultimately will assist in refining structures determined by other methods.
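The relative protection factor of equation (4) and its accompanying Welch's t-test can be sketched for a single residue as follows. This is a minimal illustration, not the MSS-CLEAN implementation, and it rests on several assumptions: a base-10 logarithm (the base is not stated above), P computed from the mean replicate yields of the residue in each state, and the t-test applied to those same replicate yields. The helper names are hypothetical. To stay dependency-free the sketch returns the t statistic and Welch–Satterthwaite degrees of freedom; a two-sided p-value can then be taken from the t distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def protection_factor(structured, unstructured):
    """Relative protection factor P_i = log10(L(r_i)_s / L(r_i)_u).

    structured / unstructured: replicate labeling yields (fractions) for one
    residue in the structured state and in the reference (unstructured) state.
    The base-10 log and the use of mean yields are assumptions of this sketch.
    """
    return math.log10(mean(structured) / mean(unstructured))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Needs at least two replicates per state and non-zero variance.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical replicate yields for one residue (not data from this study).
native = [0.10, 0.12, 0.11]
pre_digest = [0.30, 0.28, 0.33]
p_i = protection_factor(native, pre_digest)  # negative: protected in native state
t, df = welch_t(native, pre_digest)
```

A volcano plot such as Figure 5B would then place each residue at (P_i, -log10 p). Welch's form is used because the structured and reference states need not share a common variance.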

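The point that accuracy depends on complete digestion, and that labeling introduces missed cleavages (such as peptides 5 and 6 in Figure 5A), can be made concrete with an in-silico tryptic digest. The sketch below assumes the simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P) and ignores real-world exceptions. The function names are hypothetical and not part of MSS-CLEAN. `tryptic_digest` lists the fully tryptic peptides expected from a complete digest, and `missed_cleavages` counts internal, uncleaved sites in an observed peptide.

```python
import re

# Simplified trypsin rule (an assumption of this sketch): cleave C-terminal
# to K or R unless the next residue is P. Splitting on this zero-width
# pattern requires Python 3.7 or later.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence):
    """Return the fully tryptic peptides of a complete in-silico digest."""
    return [p for p in _TRYPSIN_SITE.split(sequence) if p]


def missed_cleavages(peptide):
    """Count internal K/R sites (not followed by P) left uncleaved in a peptide."""
    return sum(
        1
        for i in range(len(peptide) - 1)
        if peptide[i] in "KR" and peptide[i + 1] != "P"
    )


# Missed-cleavage peptides 5 and 6 from Figure 5A.
peptide_5 = "GVIIKGLEEITVHNKDEVYQILEK"
peptide_6 = "IGKLNLVDLAGSENIGR"
```

Under this rule, peptide 5 contains two missed cleavages and peptide 6 contains one; a complete digest of peptide 5's sequence would instead yield GVIIK, GLEEITVHNK and DEVYQILEK. Comparing observed peptides against such a list is one simple way to flag label-induced missed cleavages.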

ASSOCIATED CONTENT

Supporting Information. This material is available free of charge on the ACS Publications website at DOI: Screenshots of MSS-CLEAN data analysis and validation views; comparison of DDA MS/MS sampling frequency between 30 s DE and no-DE acquisition methods; table showing the inter- and intra-replicate variability of per-residue yield quantification; lists of peptides obtained from each protein state via pepsin and trypsin digestions. MSS-CLEAN is available free for download at www.msstudio.ca.

AUTHOR INFORMATION

Corresponding Author
*[email protected]

Author Contributions
DSZ and DCS conceptualized the project and designed the Studio CL-MS module. DSZ performed all experiments and data analysis. VS coded the Studio software. DSZ and DCS wrote the manuscript. All authors reviewed and edited the manuscript.

Funding Sources
This work was supported by an NSERC Discovery Grant 298351-2010 (DCS). DCS acknowledges the additional support of the Canada Foundation for Innovation.

ACKNOWLEDGMENT

We thank Thermo Scientific for providing the labeling reagent 3,3'-azibutan-1-ol, and Joey Sheff for preparing purified Eg5 stocks.

ABBREVIATIONS

CAA, chloroacetamide; CL, covalent labeling; CLEAN, covalent label estimation and normalization; DIA, data-independent acquisition; DTT, dithiothreitol; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; LC, liquid chromatography; MS, mass spectrometry; MS/MS, tandem mass spectrometry; MSS, Mass Spec Studio; Ni-NTA, nickel nitrilotriacetic acid; PIPES, piperazine-N,N′-bis(2-ethanesulfonic acid); PRM, parallel reaction monitoring; SDS-PAGE, sodium dodecyl sulfate–polyacrylamide gel electrophoresis.

REFERENCES

(1) Chothia, C.; Janin, J. Principles of Protein–Protein Recognition. Nature 1975, 256, 705–708.

(2) McLaughlin, S.; Aderem, A. The Myristoyl-Electrostatic Switch: A Modulator of Reversible Protein-Membrane Interactions. Trends Biochem. Sci. 1995, 20 (7), 272–276. https://doi.org/10.1016/S0968-0004(00)89042-8.

(3) Abdusamatov, A. A.; Yunusov, S. Y. Mass Spectrometric Study of the Structure of Indicaine and Plantagonine. Chem. Nat. Compd. 1968, 4 (6), 334–335. https://doi.org/10.1007/BF00569836.

(4) Chen, J.; Rempel, D. L.; Gau, B. C.; Gross, M. L. Fast Photochemical Oxidation of Proteins and Mass Spectrometry Follow Submillisecond Protein Folding at the Amino-Acid Level. J. Am. Chem. Soc. 2012, 134 (45), 18724–18731. https://doi.org/10.1021/ja307606f.

(5) Vahidi, S.; Stocks, B. B.; Liaghati-Mobarhan, Y.; Konermann, L. Submillisecond Protein Folding Events Monitored by Rapid Mixing and Mass Spectrometry-Based Oxidative Labeling. Anal. Chem. 2013, 85 (18), 8618–8625. https://doi.org/10.1021/ac401148z.

(6) Barrow, A. S.; Kaminska, R.; Moses, J. E.; Manzi, L.; Kleanthous, C.; Oldham, N. J.; Hopper, J. T. S.; Robinson, C. V. Carbene Footprinting Reveals Binding Interfaces of a Multimeric Membrane-Spanning Protein. Angew. Chemie Int. Ed. 2017, 56 (47), 14873–14877. https://doi.org/10.1002/anie.201708254.

(7) Jones, L. M.; Sperry, J. B.; Carroll, J. A.; Gross, M. L. Fast Photochemical Oxidation of Proteins for Epitope Mapping. Anal. Chem. 2011, 83 (20), 7657–7661. https://doi.org/10.1021/ac2007366.

(8) Moses, J. E.; Layfield, R.; Barrow, A. S.; Wright, T. G.; Scott, D.; Manzi, L.; Oldham, N. J. Carbene Footprinting Accurately Maps Binding Sites in Protein–Ligand and Protein–Protein Interactions. Nat. Commun. 2016, 7 (1), 13288. https://doi.org/10.1038/ncomms13288.

(9) Jumper, C. C.; Bomgarden, R.; Rogers, J.; Etienne, C.; Schriemer, D. C. High-Resolution Mapping of Carbene-Based Protein Footprints. Anal. Chem. 2012, 84 (10), 4411–4418. https://doi.org/10.1021/ac300120z.

(10) Hartlmüller, C.; Göbl, C.; Madl, T. Prediction of Protein Structure Using Surface Accessibility Data. Angew. Chemie Int. Ed. 2016, 55 (39), 11970–11974. https://doi.org/10.1002/anie.201604788.

(11) Aprahamian, M. L.; Chea, E. E.; Jones, L. M.; Lindert, S. Rosetta Protein Structure Prediction from Hydroxyl Radical Protein Footprinting Mass Spectrometry Data. Anal. Chem. 2018, 90 (12), 7721–7729. https://doi.org/10.1021/acs.analchem.8b01624.

(12) Schmidt, C.; Macpherson, J. A.; Lau, A. M.; Tan, K. W.; Fraternali, F.; Politis, A. Surface Accessibility and Dynamics of Macromolecular Assemblies Probed by Covalent Labeling Mass Spectrometry and Integrative Modeling. Anal. Chem. 2017, 89 (3), 1459–1468. https://doi.org/10.1021/acs.analchem.6b02875.

(13) Hubbell, W. L.; Altenbach, C. Investigation of Structure and Dynamics in Membrane Proteins Using Site-Directed Spin Labeling. Curr. Opin. Struct. Biol. 1994, 4 (4), 566–573. https://doi.org/10.1016/S0959-440X(94)90219-4.

(14) Mendoza, V. L.; Vachet, R. W. Probing Protein Structure by Amino Acid-Specific Covalent Labeling and Mass Spectrometry. Mass Spectrom. Rev. 2009, 28 (5), 785–815. https://doi.org/10.1002/mas.20203.

(15) Limpikirati, P.; Liu, T.; Vachet, R. W. Covalent Labeling-Mass Spectrometry with Non-Specific Reagents for Studying Protein Structure and Interactions. Methods 2018, 144, 79–93. https://doi.org/10.1016/j.ymeth.2018.04.002.

(16) Mendoza, V. L.; Antwi, K.; Barón-Rodríguez, M. A.; Blanco, C.; Vachet, R. W. Structure of the Preamyloid Dimer of β-2-Microglobulin from Covalent Labeling and Mass Spectrometry. Biochemistry 2010, 49 (7), 1522–1532. https://doi.org/10.1021/bi901748h.

(17) Blencowe, A.; Hayes, W. Development and Application of Diazirines in Biological and Synthetic Macromolecular Systems. Soft Matter 2005, 1 (3), 178–205. https://doi.org/10.1039/b501989c.

(18) Ureta, D. B.; Craig, P. O.; Gómez, G. E.; Delfino, J. M. Assessing Native and Non-Native Conformational States of a Protein by Methylene Carbene Labeling: The Case of Bacillus Licheniformis β-Lactamase. Biochemistry 2007, 46 (50), 14567–14577. https://doi.org/10.1021/bi7012867.

(19) Das, J. Aliphatic Diazirines as Photoaffinity Probes for Proteins: Recent Developments. Chem. Rev. 2011, 111 (8), 4405–4417. https://doi.org/10.1021/cr1002722.

(20) Cheng, M.; Zhang, B.; Cui, W.; Gross, M. L. Laser-Initiated Radical Trifluoromethylation of Peptides and Proteins: Application to Mass-Spectrometry-Based Protein Footprinting. Angew. Chemie Int. Ed. 2017, 56 (45), 14007–14010. https://doi.org/10.1002/anie.201706697.

(21) Ziemianowicz, D. S.; Bomgarden, R.; Etienne, C.; Schriemer, D. C. Amino Acid Insertion Frequencies Arising from Photoproducts Generated Using Aliphatic Diazirines. J. Am. Soc. Mass Spectrom. 2017, 28 (10), 2011–2021. https://doi.org/10.1007/s13361-017-1730-z.

(22) Aye, T. T.; Low, T. Y.; Sze, S. K. Nanosecond Laser-Induced Photochemical Oxidation Method for Protein Surface Mapping with Mass Spectrometry. Anal. Chem. 2005, 77 (18), 5814–5822. https://doi.org/10.1021/ac050353m.

(23) Rinas, A.; Mali, V. S.; Espino, J. A.; Jones, L. M. Development of a Microflow System for In-Cell Footprinting Coupled with Mass Spectrometry. Anal. Chem. 2016, 88 (20), 10052–10058. https://doi.org/10.1021/acs.analchem.6b02357.

(24) Riaz, M.; Misra, S. K.; Sharp, J. S. Towards High-Throughput Fast Photochemical Oxidation of Proteins: Quantifying Exposure in High Fluence Microtiter Plate Photolysis. Anal. Biochem. 2018, 561–562, 32–36. https://doi.org/10.1016/j.ab.2018.09.014.

(25) Xie, B.; Sharp, J. S. Relative Quantification of Sites of Peptide and Protein Modification Using Size Exclusion Chromatography Coupled with Electron Transfer Dissociation. J. Am. Soc. Mass Spectrom. 2016, 27 (8), 1322–1327. https://doi.org/10.1007/s13361-016-1403-3.

(26) Xie, B.; Sood, A.; Woods, R. J.; Sharp, J. S. Quantitative Protein Topography Measurements by High Resolution Hydroxyl Radical Protein Footprinting Enable Accurate Molecular Model Selection. Sci. Rep. 2017, 7 (1), 4552. https://doi.org/10.1038/s41598-017-04689-3.

(27) Wang, L.; Chance, M. R. Protein Footprinting Comes of Age: Mass Spectrometry for Biophysical Structure Assessment. Mol. Cell. Proteomics 2017, 16 (5), 706–716. https://doi.org/10.1074/mcp.o116.064386.

(28) Gau, B. C.; Chen, J.; Gross, M. L. Fast Photochemical Oxidation of Proteins for Comparing Solvent-Accessibility Changes Accompanying Protein Folding: Data Processing and Application to Barstar. Biochim. Biophys. Acta, Proteins Proteomics 2013, 1834 (6), 1230–1238. https://doi.org/10.1016/j.bbapap.2013.02.023.

(29) Borotto, N. B.; Vachet, R. W.; Graban, E. M.; Vaughan, R. C.; Zhou, Y.; Hollingsworth, S. R.; Hale, J. E. Investigating Therapeutic Protein Structure with Diethylpyrocarbonate Labeling and Mass Spectrometry. Anal. Chem. 2015, 87 (20), 10627–10634. https://doi.org/10.1021/acs.analchem.5b03180.

(30) Bern, M.; Kil, Y. J.; Becker, C. Byonic: Advanced Peptide and Protein Identification Software. Curr. Protoc. Bioinforma. 2012, 40 (1), 13.20.1–13.20.14. https://doi.org/10.1002/0471250953.bi1320s40.

(31) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-Based Protein Identification by Searching Sequence Databases Using Mass Spectrometry Data. Electrophoresis 1999, 20 (18), 3551–3567. https://doi.org/10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.

(32) Kim, S.; Pevzner, P. A. MS-GF+ Makes Progress towards a Universal Database Search Tool for Proteomics. Nat. Commun. 2014, 5, 5277. https://doi.org/10.1038/ncomms6277.

(33) Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. MZmine 2: Modular Framework for Processing, Visualizing, and Analyzing Mass Spectrometry-Based Molecular Profile Data. BMC Bioinformatics 2010, 11, 395. https://doi.org/10.1186/1471-2105-11-395.

(34) Kaur, P.; Kiselar, J. G.; Chance, M. R. Integrated Algorithms for High-Throughput Examination of Covalently Labeled Biomolecules by Structural Mass Spectrometry. Anal. Chem. 2009, 81 (19), 8141–8149. https://doi.org/10.1021/ac9013644.

(35) Rinas, A.; Espino, J. A.; Jones, L. M. An Efficient Quantitation Strategy for Hydroxyl Radical-Mediated Protein Footprinting Using Proteome Discoverer. Anal. Bioanal. Chem. 2016, 408 (11), 3021–3031. https://doi.org/10.1007/s00216-016-9369-3.

(36) Rey, M.; Schriemer, D. C.; Baker, C. A. H.; Burns, K. M.; van Dijk, M.; Bonvin, A. M. J. J.; Sarpe, V.; Buse, J.; Wordeman, L. Mass Spec Studio for Integrative Structural Biology. Structure 2014, 22 (10), 1538–1548. https://doi.org/10.1016/j.str.2014.08.013.

(37) Schryvers, A. B.; Ostan, N.; Schriemer, D. C.; Sarpe, V.; Rafiei, A.; Hepburn, M. High Sensitivity Crosslink Detection Coupled With Integrative Structure Modeling in the Mass Spec Studio. Mol. Cell. Proteomics 2016, 15 (9), 3071–3080. https://doi.org/10.1074/mcp.o116.058685.

(38) Chambers, M. C.; Maclean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; et al. A Cross-Platform Toolkit for Mass Spectrometry and Proteomics. Nat. Biotechnol. 2012, 30 (10), 918–920. https://doi.org/10.1038/nbt.2377.

(39) Sheff, J. G.; Farshidfar, F.; Bathe, O. F.; Kopciuk, K.; Gentile, F.; Tuszynski, J.; Barakat, K.; Schriemer, D. C. Novel Allosteric Pathway of Eg5 Regulation Identified through Multivariate Statistical Analysis of Hydrogen-Exchange Mass Spectrometry (HX-MS) Ligand Screening Data. Mol. Cell. Proteomics 2017, 16 (3), 428–437. https://doi.org/10.1074/mcp.m116.064246.

(40) Lin, S.; Garcia, B. A. Examining Histone Posttranslational Modification Patterns by High-Resolution Mass Spectrometry, 1st ed.; Elsevier Inc., 2012; Vol. 512. https://doi.org/10.1016/B978-0-12-391940-3.00001-9.

(41) Deutsch, E. MzML: A Single, Unifying Data Format for Mass Spectrometer Output. Proteomics 2008, 8 (14), 2776–2777. https://doi.org/10.1002/pmic.200890049.

(42) Fabre, B.; Lambour, T.; Bouyssié, D.; Menneteau, T.; Monsarrat, B.; Burlet-Schiltz, O.; Bousquet-Dubouch, M. P. Comparison of Label-Free Quantification Methods for the Determination of Protein Complexes Subunits Stoichiometry. EuPA Open Proteomics 2014, 4, 82–86. https://doi.org/10.1016/j.euprot.2014.06.001.

(43) Steen, H.; Morrice, N.; Kirschner, M. W.; Jebanathirajah, J. A.; Rush, J. Phosphorylation Analysis by Mass Spectrometry. Mol. Cell. Proteomics 2005, 5 (1), 172–181. https://doi.org/10.1074/mcp.m500135-mcp200.

(44) Argentini, A.; Goeminne, L. J. E.; Verheggen, K.; Hulstaert, N.; Staes, A.; Clement, L.; Martens, L. MoFF: A Robust and Automated Approach to Extract Peptide Ion Intensities. Nat. Methods 2016, 13 (12), 964–966. https://doi.org/10.1038/nmeth.4075.

(45) Jaitly, N.; Mayampurath, A.; Littlefield, K.; Adkins, J. N.; Anderson, G. A.; Smith, R. D. Decon2LS: An Open-Source Software Package for Automated Processing and Visualization of High Resolution Mass Spectrometry Data. BMC Bioinformatics 2009, 10. https://doi.org/10.1186/1471-2105-10-87.

(46) Sosnick, T. R.; Trewhella, J. Denatured States of Ribonuclease A Have Compact Dimensions and Residual Secondary Structure. Biochemistry 1992, 31 (35), 8329–8335. https://doi.org/10.1021/bi00150a029.

(47) Dobson, C. M. Unfolded Proteins, Compact States and Molten Globules. Curr. Opin. Struct. Biol. 1992, 2 (1), 6–12. https://doi.org/10.1016/0959-440X(92)90169-8.

(48) Craig, P. O.; Gómez, G. E.; Ureta, D. B.; Caramelo, J. J.; Delfino, J. M. Experimentally Approaching the Solvent-Accessible Surface Area of a Protein: Insights into the Acid Molten Globule of Bovine α-Lactalbumin. J. Mol. Biol. 2009, 394 (5), 982–993. https://doi.org/10.1016/j.jmb.2009.09.058.

(49) Percy, A. J.; Rey, M.; Burns, K. M.; Schriemer, D. C. Probing Protein Interactions with Hydrogen/Deuterium Exchange and Mass Spectrometry-A Review. Anal. Chim. Acta 2012, 721, 7–21. https://doi.org/10.1016/j.aca.2012.01.037.

(50) Konermann, L.; Pan, J.; Liu, Y. H. Hydrogen Exchange Mass Spectrometry for Studying Protein Structure and Dynamics. Chem. Soc. Rev. 2011, 40 (3), 1224–1234. https://doi.org/10.1039/c0cs00113a.

(51) Welch, B. L. The Generalisation of 'Student's' Problem When Several Different Population Variances Are Involved. Biometrika 1947, 34 (1–2), 28–35. https://doi.org/10.1093/biomet/34.1-2.28.

(52) Fagerland, M. W.; Sandvik, L. Performance of Five Two-Sample Location Tests for Skewed Distributions with Unequal Variances. Contemp. Clin. Trials 2009, 30 (5), 490–496. https://doi.org/10.1016/j.cct.2009.06.007.