Article pubs.acs.org/ac

Massign: An Assignment Strategy for Maximizing Information from the Mass Spectra of Heterogeneous Protein Assemblies

Nina Morgner* and Carol V. Robinson

Department of Chemistry, Physical and Theoretical Chemistry Laboratory, University of Oxford, Oxford OX1 3TA, U.K.

Supporting Information

ABSTRACT: Electrospray ionization mass spectrometry (ESI-MS) has evolved into a powerful adjunct for structural biology, helping to unravel the quaternary structure of protein complexes. Increasing interest has led to the study of ever larger multicomponent systems. Investigating these large complexes with ESI has meant that progressively more complicated mass spectra have been recorded. Correct assignment of these spectra is essential to maximize the information content available. Here we present a new assignment strategy and a supporting software package that allows the investigation of large heterogeneous systems, previously beyond the scope of full spectral assignment due to their complexity. The strategy involves two parts. The first includes a peak fitting routine to determine charge state distributions and consequently the masses of the various subcomplexes. The second module distinguishes between solution and gas phase products depending on their mass to charge ratio and assigns these charge states to different subunit combinations. These fitting and assignment routines contain many internal checks for consistency and reveal mass shifts, dependent upon desolvation conditions and small molecule binding. Using a rotary ATPase as a working example, we show how this assignment strategy is capable of determining the stoichiometry and interactions of the 8 different subunits within this 29-subunit assembly.

Received: January 6, 2012 Accepted: February 21, 2012 Published: February 22, 2012
dx.doi.org/10.1021/ac300056a | Anal. Chem. 2012, 84, 2939−2948
© 2012 American Chemical Society

In recent years electrospray ionization mass spectrometry (ESI-MS) has been developed to overcome instrumental limitations and to allow the investigation of increasingly complex multiprotein assemblies. In order to achieve this, effort has been made to improve time-of-flight detectors, modify electrospray sources, and customize quadrupoles to allow for greater transmission of high mass assemblies.1−4 However, the development of supporting software for the investigation of high mass complexes has not been pursued with the same intensity.

The process of data analysis in electrospray mass spectra involves two steps: (A) The series of charge states have to be identified from many overlapping peaks in the mass spectrum. For each series, the correct charge states have to be assigned and the resulting mass of the complex has to be established. (B) The masses have to be assigned to the correct subcomplexes.

Commercial MS software was developed to investigate small proteins or peptides. It is also well-suited for the identification of charge state series of small protein complexes, providing the charge state series are sufficiently separated. This will likely be the case for relatively small complexes containing only a few subunits. Assignment of those complexes will be possible with Tandem-MS, which can dissociate and therewith reveal the identity of one or two subunits. For larger complexes, the knowledge of one or two subunits will not be sufficient. The challenge of assigning complexes and their subcomplexes increases with the number of subunits and therewith potential subunit combinations.

For mass spectra of protein mixtures a possible assignment approach is spectral deconvolution using MaxEnt.5,6 This approach uses a maximum entropy algorithm to determine the masses in a spectrum. Excellent results can be achieved for small molecules and individual proteins where the peak shape is limited by the mass spectral resolution. For mass spectra that are complex and/or contain multiple different species with many overlapping charge states, this approach becomes problematic especially for large protein complexes, for which wide mass ranges have to be covered and peaks are often broadened due to incomplete desolvation.

A software algorithm, SUMMIT,7 can be used for assignment of complexes, subcomplexes, and interaction networks from masses. This program is important for defining interaction networks since it finds the shortest path that connects protein subunits within all subcomplexes. This algorithm works well to establish small subcomplexes with an established list of subunit masses and returns a mathematical weighting for each of the networks deduced. The program does not include a mass spectral assignment module.

An approach to simplify the interpretation of mass spectra of a distribution of oligomers of two or more constituting subunits is the simulation of charge state distributions, using a binomial8,9 or multinomial (SOMMS software10) subunit distribution, respectively. The software then allows quantitation of the candidate complexes and therewith deviation from the statistical distribution of the two or more subunits within the oligomers. Based in principle on these two approaches is the software CHAMP,11 which varies the candidate distribution to find the lowest χ² between a calculated and an experimental spectrum. With a similar goal a likelihood based algorithm has been used to estimate relative abundance of nucleotide binding.12 For these simulations, the user needs prior knowledge of the composition of potential macromolecular complexes.

For large heterogeneous systems such as the rotary ATPase exemplified here, the identity of the complexes in solution was unknown at the time of study. The aim of the investigation was therefore their complete and unambiguous assignment. The approaches mentioned above will not be applicable in these cases. Consequently, the bottleneck for experiments, especially of large dynamic complexes, is often not in obtaining the spectra but in the analysis of the data itself. For single homogeneous species mass spectra of high mass complexes are no more complicated than those of lower mass. Nevertheless these are not always of high information content.

Figure 1. (A) In a heterogeneous mass spectrum, it is not always clear which peaks belong to a particular series, especially if they are of low abundance. (B) Upon simulation of the five dominant peak series (in red: sum of previously simulated component spectra), the low abundant peaks become apparent. (C) For the selected peak (green arrow) charge states, 58, 59, 60 are tested (blue, green, red cursors respectively). The deviation of theoretical values (cursor) and experimental peak top increases with distance to the selected peak, unless the correct charge state is chosen. (D) A difference spectrum is then used to simulate this series (see the Supporting Information (SI) for details). (E) A new simulation optimizes the overall fit.

For structural investigation heteromeric assemblies are better suited to MS analysis than homomeric ones since they can be probed for the connectivity of the constituting proteins by virtue of their distinct masses. This can be approached by controlled disassembly of the complexes, to generate subcomplexes, either in solution or in the gas phase. The ability to distinguish different (sub)complexes in a heterogeneous analyte distribution is a strength of MS. One aggravating factor, besides increasing heterogeneity, is peak broadening due to attachment of buffer/water molecules in noncovalent mass spectrometry (MS). A second factor is the overlap of the charge state series in congested spectra. While smaller proteins or peptides are usually present with only a few charge states, high mass complexes may give rise to series containing 10−15 charge states. With several charge state series in the same m/z region together with peak broadening due to adducts, the likelihood of overlapping peaks is very high. These contributory factors mean that it is often difficult to decide which peaks belong to which series (Figure 1A).

In addition to identifying the series themselves, the second issue is the correct determination of the charge states.13−15 For high masses, the number of charges present on the complexes increases, which shifts peaks in a series closer together on the m/z scale. Distinguishing between charge state five or six for a selected peak in a series is relatively straightforward, but it is much more challenging if a charge state is 59 or 60.

The third challenge presents itself once all series in a spectrum have been determined: assigning the composition of a complex/subcomplex to the mass determined for a particular peak series. The assignment process becomes increasingly complex as the number of subunits increases. While for small complexes of two or three subunits every subcomplex can be assigned unambiguously by hand, this is not possible for complexes which could contain combinations of ∼20 subunits. To assign the mass spectra of, for example, a rotary ATPase, it is often necessary to include knowledge about the complex gained in previous research as well as to take into account charge distributions. For example, bimodal charge distributions may point toward interesting complex characteristics. A separate issue that is very hard to address with commercial software is the quantitative analysis of mass spectral components, especially if they are not completely resolved.

Here we present an assignment strategy for the analysis of complicated spectra from heterogeneous, high mass complexes. This strategy is backed by a software package, designed to support the two crucial steps of assignment: identification of charge state series and, hence, determination of their masses and their subsequent assignment to complexes.
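The difficulty of telling charge state 59 from 60, compared with 5 from 6, can be made concrete with a short worked relation. It assumes the usual positive-ion convention that a species of mass $M$ carrying $z$ protons (each of mass $m_p \approx 1.007$ Da) is observed at $(M + z\,m_p)/z$. The two example masses (20 kDa and 400 kDa) are illustrative values, not numbers taken from this study.

```latex
\[
\left(\frac{m}{z}\right)_{z} = \frac{M + z\,m_p}{z},
\qquad
\left(\frac{m}{z}\right)_{z} - \left(\frac{m}{z}\right)_{z+1} = \frac{M}{z\,(z+1)}
\]
\[
M = 20\,000\ \mathrm{Da},\ z = 5:\ \frac{20\,000}{5 \cdot 6} \approx 667
\qquad\qquad
M = 400\,000\ \mathrm{Da},\ z = 59:\ \frac{400\,000}{59 \cdot 60} \approx 113
\]
```

Relative to the peak position itself, the spacing to the next charge state is roughly $1/(z+1)$, so about 17% at $z = 5$ but under 2% at $z = 59$. With peaks broadened by adducts, a single peak pair is then no longer enough to fix the charge, which is why the whole series has to be considered.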

SOFTWARE/ASSIGNMENT WORKFLOW

The approach taken in our lab to assign and identify complexes and subcomplexes is a two-step process: (1) Identify all peak series in the spectrum and determine charges/masses. This is done by simulation of the component mass spectra for all complexes/proteins present in a spectrum, so that the sum of these component spectra resembles most closely the experimental spectrum. (2) Assign complexes. The output from part 1 can then be used together with knowledge of the subunit composition/connectivity of the complex to determine the identity of the (sub)complexes appearing in the mass spectrum.

IDENTIFY ALL PEAK SERIES

To determine all charge state series present, Massign offers an automatic as well as a semiautomatic approach (detailed description in the SI). It is important to point out that protein complexes usually carry with them in the gas phase many buffer molecules giving rise to rather broad peaks, with the mass of the naked protein being represented rather by the onset of the peaks, while the peak tops correspond to the complex with adducts attached. A common approach is therefore to determine the complex mass via the use of the onset of the peak. This can lead to difficulties, since determination of the onset of a peak is more operator dependent than the peak top. The strategy used here is therefore aimed at the masses determined from the peak tops; the additional mass of the adducts is then taken into account at a later stage, during assignment.

For both (automatic and semiautomatic) routines, the approach is similar in that one peak of the series is chosen (automatic: the most abundant) and then the charge state of this peak is varied and the theoretical charge state distribution compared with all the peaks in the spectrum. The automatic routine transforms the experimental spectrum into a line spectrum, prior to assignment, and selects for every peak the charge state series which fits best (see SI Figure S-2). The semiautomatic routine allows the user to evaluate the best fit by comparing theoretical peak positions with the experimental spectrum. Since the deviation between theoretical peak positions of different possible charge state distributions increases at either end of the charge state distribution, the correct assignment is readily identified in comparison with the experimental peak positions, even for broad peaks (Figure 1C).

The charge state series of the different components determined in this way can then be simulated. The simulation of each component in the mass spectrum will be a series of peaks whose intensities follow a Gaussian distribution, to mimic the statistical distribution of the charges. The peak shape used for simulation of the individual peaks will also be Gaussian unless the peaks are distorted by small molecule binding (see below). It is important that overlapping charge state series are considered simultaneously for the simulation process to avoid overrepresentation of the ion signal where peaks overlap. These simulated spectra can then be displayed simultaneously/overlaid with the experimental spectrum for further inspection. This approach has the advantage that peaks which were completely or partially overlapping and/or of low abundance become apparent, as indicated in Figure 1B. Simulation of the component spectra allows use of the whole range of charge states present in the spectrum to determine the correct charge distribution and mass. Inclusion of more charge states increases confidence that the correct charge state and hence mass has been determined. The second advantage is that more realistic mass errors are derived. The approach taken here is to determine the correct charge series and then fit each peak to a Gaussian, which determines the midpoint of each peak in the entire charge state series. The standard deviation of the masses of the complex derived from each charge state is then used as mass error.

Figure 2A shows the steps involved for simulating a spectrum. In brief:
(1) Smooth and linearize (SI Figure S-1). Spectra are smoothed to reduce noise and transformed to a linear x-axis.
(2) Combine spectra. Spectra can be combined to reduce noise.
(3) Subtract background.
(4) Find mass series. This can be done in an automated or semiautomatic way, depending on the quality of the spectra.
(5) Simulate component spectra (SI Figure S-3). Component spectra are simulated individually. The parameters are optimized to minimize deviation of the sum of the simulations from the experimental data. This can be done for up to five components in parallel. Further components can be fit in a second fit round.
(6) Obtained spectra can be overlaid with the experimental spectrum, to make visible which parts of the spectrum are not yet accounted for.
(7) Repeat steps 4−6 until all components are simulated.

The process is described in more detail in the SI. The output from this first part is the list of masses/charge distributions found in the spectrum, the component spectra and the overall simulation. These are used as input for the second part, where the aim is to assign (sub)complexes to the components identified by their masses.
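The idea of varying the charge state of one peak and checking the whole series against it can be illustrated with a minimal Python sketch. This is not Massign's routine: it assumes peak centres are already known, uses the spread of the per-peak masses as the selection criterion (a stand-in for the visual or line-spectrum comparison described in the text), and the function names and the synthetic 400 kDa series are hypothetical. The standard deviation of the per-peak masses doubles as the mass error, as described in the text.

```python
import statistics

PROTON_MASS = 1.00728  # Da

def masses_for_assignment(peak_mz, z_first):
    """Mass implied by each peak if the first peak carries z_first charges.

    peak_mz must be sorted from high to low m/z, so that the charge
    increases by one from each peak to the next.
    """
    return [(z_first + i) * (mz - PROTON_MASS) for i, mz in enumerate(peak_mz)]

def best_charge_assignment(peak_mz, z_candidates):
    """Return (z_first, mean mass, mass spread) for the most consistent candidate."""
    results = []
    for z in z_candidates:
        masses = masses_for_assignment(peak_mz, z)
        results.append((statistics.stdev(masses), z, statistics.mean(masses)))
    spread, z_best, mass = min(results)  # smallest spread wins
    return z_best, mass, spread

# Synthetic series (assumption): a 400 kDa complex observed with charges 56..62.
true_mass = 400_000.0
peaks = [true_mass / z + PROTON_MASS for z in range(56, 63)]  # high -> low m/z

z_best, mass, spread = best_charge_assignment(peaks, range(53, 60))
print(f"first peak charge: {z_best}, mass: {mass:.1f} Da, spread: {spread:.3f} Da")
```

A wrong charge assignment makes the implied mass drift systematically along the series, so the spread grows with the number of peaks included. This mirrors the observation that deviations increase toward either end of the charge state distribution.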


Figure 2. Assignment of mass spectra as a two-step process: (A) identify all mass series in the spectrum (blue lines) and (B) assign complexes to masses and develop dissociation network (green lines). The steps within the software package are boxed with dotted lines. Subprograms are explained in detail in the SI.

PEAK BROADENING FACTOR

Simulating the spectra in the manner described above will be sufficient for many cases, where spectra are well-resolved and qualitative rather than quantitative analysis is required. Nevertheless it is not always possible to obtain well-resolved spectra. Peak broadening is commonly experienced as a result of water/buffer molecules, which stay attached to the complexes, particularly when efforts to desolvate them result in the dissociation of the complex. The main problem is the asymmetry of broadened peaks. The trailing edge of one peak can mask an additional peak, or add to the intensity of the second peak. The approach described here allows us to add adducts to the peak simulation. This is done by replacing the trailing edge of every simulated peak by a broadened version of the same peak (Figure 3D).

The optimization parameter is termed the “broadening factor”. The software can determine the broadening factor, which optimizes the agreement of the experimental spectrum and the simulation (for details see SI) via minimization of the root-mean-square deviation (rmsd). If the user recognizes the need, the broadening factors for the different components can be varied independently. This is however rarely necessary, even if intuitively one might expect differences in desolvation of solution complexes and those formed via collision induced dissociation (CID). Nevertheless complexes observed within one spectrum under the same experimental conditions will have experienced the same desolvating conditions, regardless of whether these led to CID or not.

Figure 3. (A) Stability of the electrospray plume varying and leading to differences in resolution for complexes even from the same solution or nanoflow needle. Well-resolved spectra can be represented with Gaussian peak fits (B). Three charge state series (blue, green, yellow) which sum up to the overall simulation (red), which mirrors the experimental spectrum accurately. For less well-resolved spectra, (C) the sum of simulations deduced this way is not a good representation of the spectrum. (C: top) Adduct attachment broadening all peaks toward high mass, which can be accounted for by attaching additional mass to each simulated peak (D) Failure to correct for the adduct mass which can lead to an overestimation of species, present in the trailing edge of a less well-resolved peak. (E(i)) Contribution of blue series overestimated compared to green series. This effect is corrected in ii. Optimizing a spectrum simulation (C: bottom) by applying the same “broadening factor” to each peak allows for a much improved spectrum simulation.

ASSIGNING COMPLEXES

Once the masses and charges of the components in a mass spectrum are determined, the user has to assign these to the correct complexes. Knowing the mass of a complex will provide sufficient information to distinguish between a monomer and a dimer of a known protein or to establish whether or not a ligand is bound to a complex. Determining the composition of a complex with a range of subunits of unknown stoichiometry is much more challenging.

The approach taken here is to determine a list of all mathematically possible complexes, based on the masses of the subunits (preferably masses determined by LCMS or seen in isolation in an ESI spectrum). If only genome sequence data is available and post-translational modifications are unknown, the user should keep in mind a possible systematic mass error in the assignment process. The list of potential assignments, which can have several hundred entries, is then reduced by ruling out those which are known to be biologically impossible, due to compositional data from proteomics, cross-linking experiments, tandem-MS, etc. A list of rules is compiled such that complexes that do not fulfill the known requirements are excluded (workflow as in Figure 2B; details on all subroutines in the SI).

A very important feature of this program is the distinction between complexes formed in solution and those formed via CID. This is achieved on the basis of their mass to charge correlation. Complexes that result from CID will have lost a higher proportion of the overall charge and appear at lower charge values on a mass/charge plot than the same complexes formed in solution (see inset in Figure 4B). If complexes lose a subunit via CID, in general this process does not go to completion and as such 100% of the complex will not dissociate. Some of the original complex will remain. As a consequence the complex will be present as both the precursor and product complex. Complexes therefore have CID relationships which can be established, even if the identity of the complexes is as yet unknown. In a comparable fashion, different solution complexes emerge from each other by losing subunits or subcomplexes. These relationships can be established likewise.

Differences between complexes can be used as restraints for the assignment (e.g., a subunit must/must not be present in the precursor/product complex). In many cases, some restraints can be based on previous research. For example, a subcomplex of the intact complex may have been crystallized or cross-linking experiments may have revealed neighboring relationships between two proteins. These rules, as well as the maximum copy number of each protein subunit (if known), can be used as input into the assignment module.

The increase of the measured mass compared to the mass of the naked complex is another important parameter that must be considered during mass determination. The measured mass increases proportionally with the size of the complex, due to attachment of buffer and water molecules. For a complex of several hundred kilodaltons, this mass shift can easily be ∼2000 Da. This number cannot be treated as an error since such a sizable error would lead to too great an ambiguity in assignment. This mass shift follows certain rules. The extent of attachment depends on the surface area of a complex, which in turn correlates to the mass of the complex. All complexes within one spectrum will experience the same conditions in solution (buffer conditions) and in the gas phase (desolvation process). Their mass shifts therefore scale linearly with the surface area of the protein complexes to which adducts can attach. The overall shape of large complexes is, to a rough approximation, globular, which therewith correlates the mass shift with the mass of the complex (see inset in Figure 5). This correlation can still be of use for real complexes, which are usually not globular. Assignment of one or two complexes in a spectrum therefore defines the mass shifts that have to be expected during the assignment of further complexes.

In general subcomplexes of very high as well as of low mass will be the easiest to assign. A subcomplex of approximately half of the mass of the intact complex will have a much larger list of potential subunit combinations compared to a complex which has lost only one or two subunits. Consequently, if assignment of the complexes proves difficult with the default mass shift of 2 kDa, these “easy to assign” complexes are a logical choice as starting point in the assignment process and then define the mass shifts to be taken into account for other complexes in the mass spectrum.

Reducing the potential subunit combinations for each complex will often leave very few possibilities. The next step is to evaluate the likelihood of each of these possibilities to be the correct one. The subcomplexes forming from one complex do not represent a collection of random complexes but, rather, will be related to each other according to gas phase as well as solution dissociation patterns. A stable subcomplex that was found to dissociate in a pairwise manner in solution in one case can thus be expected to show this behavior for all applicable complexes. (Example: if we observe solution complexes A2B2CDE and ABCDE, we can conclude AB is readily lost and lost in pairwise interaction. If we then also observe A2B2CD, we would expect the same rule to be applicable and see ABCD.) Equally a subunit that readily dissociates under CID in one complex can be expected to dissociate from all solution phase complexes containing this subunit. If we observe the solution complex ABCD and CID complex ABC (loss of D), the observation of solution complex BCD would suggest the existence of the BC complex, formed by CID. These patterns are defined during the assignment process and give insights into the behavior of the complexes as well as aiding the assignment process, by establishing a self-consistent set of complexes, which give rise to the observed spectrum. This is explained in more detail using the assignment of the rotary ATPase from E. hirae as a worked example.
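The pairwise-loss example from this section (A2B2CDE and ABCDE imply loss of AB, so A2B2CD should be accompanied by ABCD) can be written as a small composition calculation. The sketch below is only an illustration in Python; representing a complex as a `Counter` of subunit copy numbers and the helper names `lose` and `fmt` are assumptions made here, not part of Massign.

```python
from collections import Counter

def fmt(comp):
    """Render a composition such as {'A': 2, 'B': 2, 'C': 1} as 'A2B2C'."""
    return "".join(f"{s}{n if n > 1 else ''}" for s, n in sorted(comp.items()))

def lose(complex_, unit):
    """Return complex_ minus unit, or None if unit is not fully contained."""
    if any(complex_[s] < n for s, n in unit.items()):
        return None
    remaining = Counter(complex_)
    remaining.subtract(unit)
    return +remaining  # unary plus drops subunits whose count fell to zero

# The three solution complexes from the example in the text.
a2b2cde = Counter(A=2, B=2, C=1, D=1, E=1)
abcde = Counter(A=1, B=1, C=1, D=1, E=1)
a2b2cd = Counter(A=2, B=2, C=1, D=1)

# The difference between the first two complexes gives the unit lost in solution...
lost_unit = lose(a2b2cde, abcde)
# ...and applying the same loss to the third complex gives the expected product.
expected = lose(a2b2cd, lost_unit)
print(f"lost unit: {fmt(lost_unit)}, expected additional complex: {fmt(expected)}")
```

Checking whether the expected product matches one of the measured masses is then one of the internal consistency tests for a proposed set of assignments.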



APPLICATION EXAMPLES AND DISCUSSION

The main aim of this study was the development of a strategy for the assignment of complicated mass spectra of multicomponent protein assemblies. This had to be supported by a specially designed software package, Massign. The suitability of this approach has been tested on several data sets, and Massign has proven to be extremely valuable.



ASSIGNMENT EXAMPLE

An example of the assignment process is given for a spectrum of a rotary ATPase from E. hirae, reported recently.16 This ATPase has nine different subunits: A, B, C, D, E, F, G, I, and K. ATPases/synthases are large membrane complexes, consisting of two parts. The head is composed of three subunits A and B each, which alternate around the 6-membered ring. The second part includes a species dependent membrane embedded rotor ring, which transports protons. Prior to our investigation, the number of K subunits of the E. hirae rotor was ambiguous. It had been reported as 7 (EM) as well as 10 (X-ray crystallography).17,18 The peripheral stalk in this case consists of two subunits E and F. Due to the three-fold design of the head 1, 2, or maximally 3 stalks are present in ATPases.19−21 This ATPase was thought to have only one stalk, but this was not confirmed.22 So for our assignment, the number of stalks had to be varied between 1 and 3. Summing all possible combinations given the restriction in the head, peripheral stalks, and membrane ring, the intact complex could contain between 19 and 26 proteins.

Figure 4. (A) Components of the mass spectrum which are simulated. Masses and charge states are determined. Complexes are not yet identified, and therefore, numbered 1−18. (B) Schematic used to derive the values for SI Table S-1. The complexes separate by their charge/mass ratios into solution phase (green) and CID (orange) complexes (see inset). A potential connection network between the complexes observed is constructed. All possible subunit combinations which could account for the mass difference between two complexes are calculated with a mass tolerance set to ±1000 Da. The deviation between the theoretical subunit mass and the observed mass difference is shown in every case (black/gray). Those subunit combinations which are possible theoretically, but would not allow for a self-consistent set of complexes within the established rules of stoichiometry or connectivity (as listed in SI Table S-3), are greyed out, leaving the possible ones (black). Example: the only candidate for the mass difference of complexes 5 and 6 is subunit G. The calculated mass difference is 213 Da greater than the theoretical protein mass. Possible candidates for mass difference between complexes 6 and 7 are ΔD or ΔFG. If complex 6 is derived from complex 5 via loss of subunit G, loss of ΔFG would break the stoichiometry rule since the complex cannot lose more than one G. Loss of D is the only remaining option. ΔFG is therefore shown in gray.

Figure 5. (inset) Surface area per mass, calculated for globular proteins. The dotted lines indicate the area of interest for the assignment in the main panel. The mass shift of a complex stems from adducts attached to the surface, which correlates the mass shift with the mass. (main graph) Assignment of E. hirae ATPase complexes. The default mass shift for assignment is 0−2000 Da (blue area). The first four complexes assigned in Figure 6 (red crosses) allow the user to estimate the range for the mass shifts to be expected for all complexes in this mass spectrum. The optimized mass shift range is shown in green and is used to eliminate potential assignments for the remaining complexes, which lie outside this range. Potential assignments inside this range will be very few, usually only one. This complex can then be considered to have the correct assignment (blue crosses). The error bars shown are the errors determined for the masses, since the precision of the complex mass measurement will affect the range for the mass shift that has to be expected.

Fitting of the peaks returned the component mass spectra (see Figure 4A), the masses, and charge distributions of the subunits and subcomplexes in the spectrum, listed in SI Table S-1. Plotting masses obtained versus the charge states shows separation into two groups (inset Figure 4B): those species that group at lower charges are CID products, while the others are subcomplexes which form in solution. The solution complexes will form by dissociation or loss of subunits/subcomplexes, while CID complexes in almost all cases will form from one of the solution complexes via loss of a single subunit which is sometimes followed by the loss of a second subunit. Therefore the next step is to determine CID relationships as well as relationships between solution complexes. For the E. hirae ATPase a set of relations was identified (listed in SI Tables S-2 and S-3). These relationships can be transferred into a connection network of solution and CID complexes, as shown in Figure 4B.
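Establishing a relationship between two complexes amounts to asking which subunit combinations could explain their mass difference within the ±1000 Da tolerance quoted for Figure 4B. A brute-force Python sketch of that search is given here. The subunit masses are hypothetical placeholders chosen so that D and F+G nearly coincide, as in the ΔD versus ΔFG example, and the function name is an assumption; this is not the Massign implementation.

```python
from itertools import product

# Hypothetical subunit masses in Da (placeholders, not the E. hirae values).
SUBUNIT_MASSES = {"D": 27_000.0, "E": 23_000.0, "F": 11_000.0, "G": 16_200.0}
MAX_LOSS = {"D": 1, "E": 1, "F": 1, "G": 1}  # at most one copy of each lost per step

def candidates_for_difference(delta, masses, max_copies, tolerance=1000.0):
    """List (label, deviation) for every subunit combination matching delta."""
    names = sorted(masses)
    hits = []
    for counts in product(*(range(max_copies[n] + 1) for n in names)):
        if not any(counts):
            continue  # skip the empty combination
        total = sum(c * masses[n] for c, n in zip(counts, names))
        if abs(delta - total) <= tolerance:
            label = "".join(f"{n}{c if c > 1 else ''}"
                            for n, c in zip(names, counts) if c)
            hits.append((label, delta - total))
    return hits

# Observed mass difference between two related complexes (illustrative value).
hits = candidates_for_difference(27_300.0, SUBUNIT_MASSES, MAX_LOSS)
print("mathematically possible:", hits)

# Stoichiometry rule: if the single G was already lost in the previous step,
# any candidate that contains G is ruled out.
allowed = [(label, dev) for label, dev in hits if "G" not in label]
print("after stoichiometry rule:", allowed)
```

The deviation stored with each candidate corresponds to the black/gray numbers in the connection network: candidates that survive the rules stay black, the others are greyed out.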
It is interesting to note that while the complex can lose E or F (stalk proteins) in CID, the solution complexes show that a stalk is lost only as a pairwise interaction (E and F together). Three complexes show successively a mass difference consistent with the stalk (masses of 3−4 and 4−5), which confirms the existence of at least two stalks. These findings can be used as input to assign the subcomplexes to the masses.

It is not possible to show the complete assignment process for all E. hirae ATPase subcomplexes. However, we illustrate the process for four subcomplexes (Figure 6), which we have assigned as solution complexes based on their charge/mass ratios (complexes 5, 4, 3, and 2; inset in Figure 4B). Our experience shows that for complexes in the mass range of several hundred kilodaltons, one can expect mass shifts (difference between naked protein mass and measured complex mass (peak center)) up to 2 kDa. As a consequence, starting values allow for a deviation of 2 kDa between the experimental mass and the theoretical mass. During the assignment process, the complexes assigned early on will define the mass shift to be taken into account, so that the tolerance for later assignments will be much smaller, which simplifies the assignment (Figure 5).

For every subunit, the maximum possible copy number is added as input into the software. In Figure 6, we show the selection process of the mathematically possible subunit combinations, generated by the software to match the observed masses, based on the mass of the complex and the subunits. For complex 5 with a mass determined as 387 356 Da, Massign finds 580 possible subunit combinations given the default tolerance. Subsequently the number of possible complexes is reduced by adding connectivity and stoichiometry restraints into the software. These restraints for complex 5 reduce the number of potential complexes to two (Figure 6A). The same strategy is applied to the other solution complexes. For complexes 2 and 3, the software output for both complexes is two possibilities (Figure 6B and C).
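The two stages described here, listing every mathematically possible composition within the default 0−2000 Da mass shift and then pruning with restraints, can be sketched in Python as follows. The subunit masses, copy-number limits, and measured mass are hypothetical placeholders, so the counts printed will not reproduce the 580 and 2 reported for complex 5. The restraint set (must contain D and G, must not contain C) is the one stated for complex 5 in Figure 6A. Function names are assumptions, and Massign's own search may work differently.

```python
from itertools import product

# Hypothetical subunit masses (Da) and maximum copy numbers, for illustration only.
SUBUNIT_MASSES = {"A": 65_000.0, "B": 51_000.0, "C": 16_000.0, "D": 27_000.0,
                  "E": 23_000.0, "F": 11_000.0, "G": 16_200.0}
MAX_COPIES = {"A": 3, "B": 3, "C": 1, "D": 1, "E": 2, "F": 2, "G": 1}

def enumerate_compositions(measured_mass, masses, max_copies, max_shift=2000.0):
    """All copy-number combinations whose bare mass lies 0..max_shift Da
    below the measured mass (the measured mass includes adducts)."""
    names = sorted(masses)
    found = []
    for counts in product(*(range(max_copies[n] + 1) for n in names)):
        bare = sum(c * masses[n] for c, n in zip(counts, names))
        shift = measured_mass - bare
        if bare > 0 and 0.0 <= shift <= max_shift:
            found.append((dict(zip(names, counts)), shift))
    return found

def apply_restraints(candidates, required=(), forbidden=()):
    """Keep candidates containing every required subunit and no forbidden one."""
    return [(comp, shift) for comp, shift in candidates
            if all(comp[s] >= 1 for s in required)
            and all(comp[s] == 0 for s in forbidden)]

measured = 392_400.0  # illustrative measured mass in Da
candidates = enumerate_compositions(measured, SUBUNIT_MASSES, MAX_COPIES)
restrained = apply_restraints(candidates, required=("D", "G"), forbidden=("C",))
print(len(candidates), "mathematically possible;",
      len(restrained), "after restraints")
for comp, shift in restrained:
    print({s: n for s, n in comp.items() if n}, f"mass shift {shift:.0f} Da")
```

Exhaustive enumeration is affordable here because the copy-number limits keep the search space small (1152 combinations in this toy case). The restraints are what turn a long candidate list into a handful of compositions worth inspecting.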
The potential complexes are depicted in Figure 6. A self-consistent set of complexes derived from each other can be seen for both sets of complexes. At this point it is not clear which solution is the correct one. Assignment of complex 4 then gives only one solution, which fits into only one set of solutions (Figure 6D). This allows the unambiguous assignment of all four complexes as a self-consistent set of solution complexes (Figure 6E).

dx.doi.org/10.1021/ac300056a | Anal. Chem. 2012, 84, 2939−2948

Figure 6. Assignment process. After relationships between complexes are established (Figure 4), these can be used in the final assignment of the complexes. (A) Assignment process for complex 5, a solution complex, mass 387 356 Da. The solution relationship suggests it does not include subunit C. Loss of subunits G and D via CID from this complex was observed, so the original complex must include subunits G and D. The restraints reduce the mathematically possible complexes from 580 to 2, which are listed together with the mass shift between the experimental and the theoretical protein mass. Complexes 2 and 3 are analyzed in a similar way: the software can reduce the possibilities in both cases from 385 (B) or 474 (C) to 2. These assignments deliver two possibilities for a consistent set of complexes (depicted on light blue vs purple background). Assignment of complex 4 (D) reduces the 538 possibilities to 1, which decides the final assignment for all 4 complexes to the series shown in E.

For the analysis of further complexes an additional restraint can be applied. The mass shift due to attachment of adducts can be of the order of 1 or 2 kDa, but for complexes in the same spectrum, the amount of adducts will be correlated. Therefore the default setting for the mass tolerance which has to be allowed in the complex assignment process is 2 kDa at first. The assignment of the first complexes then allows the mass tolerance to be reduced for the assignment of the remaining complexes. These first complexes to be assigned will often be the smallest or biggest ones, as mentioned earlier, but since in our example we have already assigned four complexes (Figure 6), we illustrate the effect using the complexes already assigned (Figure 5). Complexes 2−5 now define the range of the expected mass shift for all E. hirae ATPase complexes in this spectrum. The mass shift allowed for the assignment can now be minimized accordingly. If the assignment process does not produce a consistent set of complexes, it is advisable for the user to retrace their steps and to reconsider whether the restraints that were chosen for stoichiometry and connectivity could be wrong. It is worth noting that the aim of this software package is not to act as a black box, into which one inputs a spectrum and which then outputs assignments. Instead it can support the user in dealing with more and more complex sets of data, while allowing the user to stay in complete control of the entire process.
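As a rough illustration of how already assigned complexes can narrow the allowed mass shift, the following Python sketch derives a shift window from known shifts and tests a new candidate against it. The function names, the margin of 100 Da, and all numerical values are assumptions made for the example. They are not taken from Massign or from the E. hirae data.

```python
def shift_window(assigned_shifts, margin=100.0):
    """Return (low, high) bounds in Da spanning the shifts already observed.

    margin is a hypothetical safety allowance added on both sides.
    """
    low = max(min(assigned_shifts) - margin, 0.0)
    high = max(assigned_shifts) + margin
    return low, high

def is_compatible(observed_mass, theoretical_mass, window):
    """Check whether the implied mass shift falls inside the window."""
    low, high = window
    return low <= observed_mass - theoretical_mass <= high

# Invented shifts (Da) standing in for four already assigned complexes.
window = shift_window([820.0, 905.0, 870.0, 940.0])
print(window)  # (720.0, 1040.0)
print(is_compatible(300900.0, 300000.0, window))  # shift of 900 Da
print(is_compatible(301900.0, 300000.0, window))  # shift of 1900 Da
```

In this toy case a candidate implying a shift of 1900 Da would pass the initial 2 kDa tolerance but is rejected by the narrowed window, which is how the later assignments become simpler.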



ATTACHMENT OF SMALL MOLECULES

As mentioned earlier, the quality and resolution of mass spectra can vary noticeably between spectra, but in general the resolution is the same over the whole mass range for a single spectrum. Nevertheless we sometimes encounter mass spectra in which one or two peak series are much broader than the others. From experience we have found that it is worth paying attention to these irregularities. Peak series which appear noticeably broader than all other peak series in the same mass spectrum can be expected to represent not one single subcomplex but a heterogeneous distribution of complexes very close in mass. While this can be due to truncations or PTMs (depending on the size of the distribution), the cases we encountered could in general be explained by a complex carrying varying numbers of ligands that bind specifically to certain subcomplexes. For ATPases we commonly observed binding of nucleotides to complexes containing the soluble head as well as ligands and/or nucleotides binding to complexes containing the membrane ring. In some cases the attachments leading to the broad peak features might be visible as shoulders in the peaks. In any case these features are important if one wants to assign a complex to the observed mass and should therefore be kept in mind. While the general mass shift found for all complexes will be incorporated into the assignment strategy (as explained in the previous paragraph), these “complex specific” shifts can be factored in as mandatory “subunits” of the complex. This is important, for example, in the binding of six lipids and nucleotides to the membrane embedded C-ring of Thermus thermophilus ATPase.16 This lipid and nucleotide binding induced a mass shift of more than 4 kDa (Figure 7). This binding assignment was later confirmed by identification and quantitative analysis of the specifically bound lipids. If this had gone unnoticed, the assignment of the membrane containing complexes would have been impossible.
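One way to picture ligands as mandatory “subunits” is to separate the complex-specific ligand mass from the general adduct shift. The Python sketch below does this by brute force. It is only an illustration under stated assumptions: the function name, the rounded ligand masses, the naked ring mass, the observed mass, and the general shift window are all invented and do not correspond to the Thermus thermophilus measurements.

```python
from itertools import product

def ligand_combinations(observed_mass, naked_mass, ligand_masses,
                        max_ligands, general_window):
    """Find ligand counts that leave a residual inside the general shift window.

    residual = observed mass - naked protein mass - total ligand mass (Da).
    """
    names = list(ligand_masses)
    low, high = general_window
    results = []
    for counts in product(*(range(max_ligands[n] + 1) for n in names)):
        ligand_total = sum(c * ligand_masses[n] for c, n in zip(counts, names))
        residual = observed_mass - naked_mass - ligand_total
        if low <= residual <= high:
            results.append((dict(zip(names, counts)), residual))
    return results

# Hypothetical values: "L" is a lipid, "A" a nucleotide, masses rounded in Da.
hits = ligand_combinations(
    observed_mass=104900.0,
    naked_mass=100000.0,
    ligand_masses={"L": 700.0, "A": 500.0},
    max_ligands={"L": 8, "A": 4},
    general_window=(150.0, 250.0),  # assumed to come from complexes assigned earlier
)
for counts, residual in hits:
    print(counts, f"residual shift {residual:.0f} Da")
```

In this toy case the total shift of 4.9 kDa cannot be explained by the general window alone, and only one ligand combination leaves a residual compatible with the shift seen for the other complexes.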

Figure 7. (A) Mass spectrum of Thermus thermophilus ATPase, which reveals a well-resolved peak series for the soluble subcomplexes with a FWHM of around 40 Da. (B) Original assignment of a peak series in the lower m/z region, showing a broad peak series with a higher FWHM of 70 Da, which suggests heterogeneous ligand binding to the complex in question. More detailed analysis (C) reveals binding of lipids (L) and nucleotides (ATP/ADP (A)) to the membrane embedded rotor ring (forming a distribution of complexes termed here “ring Lx Ay”), which broadens the peaks and shifts the observed mass by more than 4 kDa in comparison with the naked protein mass (blue and green dotted lines in B).

QUANTITATIVE ASSIGNMENTS

The simulation of the spectra additionally allows comparison of the signal intensities represented in the component spectra to obtain quantitative information on the complex distribution. It is therefore possible to determine, for example, (de)stabilization effects due to changes in the sample environment (change of pH, addition of nucleotides, etc.) by comparing the intensities of different subcomplexes under the same instrumental conditions. A very interesting area for investigation is binding/assembly studies (depending on time, ligand concentration, etc.). The kinetics of such systems can be investigated quantitatively even for multicomponent systems. To this end, each spectrum of a time/concentration series is simulated as described previously. This allows the determination of the overall peak area for every component in comparison with the overall signal of the whole spectrum. Changes in component ratio can thus be followed easily, even if peak series overlap, which would undermine any approach comparing peak heights. This approach has been used successfully on systems with up to 14 components.23−25
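The idea of following component ratios by peak area rather than peak height can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each simulated component is a list of Gaussian peaks given as (height, FWHM) pairs. The peak shape, the function names, and the numbers are assumptions and do not describe the Massign fitting routine.

```python
import math

def gaussian_area(height, fwhm):
    """Area under a Gaussian peak with the given height and full width at half-maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return height * sigma * math.sqrt(2.0 * math.pi)

def component_fractions(components):
    """Fraction of the total simulated signal carried by each component.

    components maps a component name to its list of (height, fwhm) peaks.
    """
    areas = {name: sum(gaussian_area(h, w) for h, w in peaks)
             for name, peaks in components.items()}
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}

# Invented two-point series: equal peak heights, but the broad series carries more signal.
series = [
    {"narrow": [(1.0, 10.0)], "broad": [(1.0, 30.0)]},
    {"narrow": [(1.0, 10.0)], "broad": [(0.5, 30.0)]},
]
for index, spectrum in enumerate(series):
    fractions = component_fractions(spectrum)
    print(index, {name: round(value, 2) for name, value in fractions.items()})
```

In the first toy spectrum both components have the same peak height, yet the broad series accounts for three quarters of the signal, which is why a comparison of heights alone would be misleading.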

CONCLUSION

Overall the assignment procedures needed to assign mass spectra for heterogeneous large multicomponent systems have to be more complex than for smaller assemblies. Rigorous procedures for consistency need to be applied. Given the large MS data sets and the numerous restraints needed to reach a consistent assignment, it was necessary to develop software specifically to handle these heterogeneous multicomponent assemblies. Massign was designed explicitly to work with the mass spectra of large complexes rather than redesigning an application for proteins and peptides. Instead of treating mass spectra of multicomponent protein complexes as spectra of large individual proteins, we have considered those features peculiar to large protein assemblies. Specifically, these features include spectra of high information content, large charge state distributions with high charges, peak broadening due to incomplete desolvation, and overlapping charge state series, which often mask species of low abundance. The visual reconstruction of the mass spectrum makes it relatively straightforward to ensure that all information has been extracted from a spectrum. A second part aids the assignment once the charge states and masses of the components are determined. To succeed in the assignment of large complexes, a systematic strategy had to be developed which goes much further than a simple comparison of expected masses with those found in a spectrum. It is essential to implement a rigorous treatment of mass shifts due to water/buffer attachments and of charge restraints which link subcomplexes, and to consider the possibility of small molecule binding (lipids, nucleotides). Information available from previous research has to be taken into account, as does the awareness that one is dealing with biological systems which will disassemble in solution or in the gas phase according to certain complex-specific patterns. We have introduced an assignment strategy supported by Massign which allows the qualitative and quantitative analysis of the mass spectra of heterogeneous, dynamic complexes. The package is not fully automated and involves input from the researcher at many stages to define restraints and to check the logic of the assignment. The assignment strategy presented here makes systematic use of masses, charge states, stoichiometry, and connectivity information. Overall, using this method it becomes possible to establish connectivity networks and assembly/disassembly pathways, to perform kinetic analysis, and to study the response to changes in solution conditions. This can not only establish KDs, stable complexes in solution, connectivity, and stoichiometry but also highlight possible regulatory and allosteric interactions.16,23,24,26 The method is the subject of a Provisional US Patent Application (US provisional application serial no. 61/631,188). The software will be available for download from http://Massign.chem.ox.ac.uk for academic noncommercial use.

ASSOCIATED CONTENT

Supporting Information

Tables supporting the working example and details on the Massign software, spectra preparation, analyzing the mass series, assigning complexes, and following kinetics. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We thank Helena Hernández and Justin Benesch (University of Oxford) for critical reading of the manuscript. We acknowledge funding from the European Union Prospects grant number (HEALTHF4-2008-201648) (N.M.) together with funding from the Royal Society (C.V.R.).

REFERENCES

(1) Tahallah, N.; Pinkse, M.; Maier, C. S.; Heck, A. J. R. Rapid Commun. Mass Spectrom. 2001, 15, 596.
(2) Krutchinsky, A. N.; Chernushevich, I. V.; Spicer, V. L.; Ens, W.; Standing, K. G. J. Am. Soc. Mass Spectrom. 1998, 9, 569.
(3) Rostom, A. A.; Robinson, C. V. J. Am. Chem. Soc. 1999, 121, 4718.
(4) Sobott, F.; Hernández, H.; McCammon, M. G.; Tito, M. A.; Robinson, C. V. Anal. Chem. 2002, 74, 1402.
(5) Ferrige, A. G.; Seddon, M. J.; Green, B. N.; Jarvis, S. A.; Skilling, J.; Staunton, J. Rapid Commun. Mass Spectrom. 1992, 6, 707.
(6) Ferrige, A. G.; Seddon, M. J.; Skilling, J.; Ordsmith, N. Rapid Commun. Mass Spectrom. 1992, 6, 765.
(7) Taverner, T.; Hernandez, H.; Sharon, M.; Ruotolo, B. T.; Matak-Vinkovic, D.; Devos, D.; Russell, R. B.; Robinson, C. V. Acc. Chem. Res. 2008, 41, 617.
(8) Sobott, F.; Benesch, J. L.; Vierling, E.; Robinson, C. V. J. Biol. Chem. 2002, 277, 38921.
(9) Baldwin, A. J.; Lioe, H.; Hilton, G. R.; Baker, L. A.; Rubinstein, J. L.; Kay, L. E.; Benesch, J. L. Structure 2011, 19, 1855.
(10) van Breukelen, B.; Barendregt, A.; Heck, A. J.; van den Heuvel, R. H. Rapid Commun. Mass Spectrom. 2006, 20, 2490.
(11) Stengel, F.; Baldwin, A. J.; Bush, M. F.; Lioe, H.; Basha, E.; Jaya, N.; Vierling, E.; Benesch, J. L. P. Unpublished work.
(12) Monti, M. C.; Cohen, S. X.; Fish, A.; Winterwerp, H. H. K.; Barendregt, A.; Friedhoff, P.; Perrakis, A.; Heck, A. J. R.; Sixma, T. K.; van den Heuvel, R. H. H.; Lebbink, J. H. G. Nucleic Acids Res. 2011, 39, 8052.
(13) McKay, A. R.; Ruotolo, B. T.; Ilag, L. L.; Robinson, C. V. J. Am. Chem. Soc. 2006, 128, 11433.
(14) Liepold, L.; Oltrogge, L.; Suci, P.; Young, M.; Douglas, T. J. Am. Soc. Mass Spectrom. 2009, 20, 435.
(15) Tseng, Y.-H.; Uetrecht, C.; Heck, A. J. R.; Peng, W.-P. Anal. Chem. 2011, 83, 1960.
(16) Zhou, M.; Morgner, N.; Barrera, N. P.; Politis, A.; Isaacson, S. C.; Matak-Vinković, D.; Murata, T.; Bernal, R. A.; Stock, D.; Robinson, C. V. Science 2011, 334, 380.
(17) Murata, T.; Arechaga, I.; Fearnley, I. M.; Kakinuma, Y.; Yamato, I.; Walker, J. E. J. Biol. Chem. 2003, 278, 21162.
(18) Murata, T.; Yamato, I.; Kakinuma, Y.; Leslie, A. G.; Walker, J. E. Science 2005, 308, 654.
(19) Rubinstein, J. L.; Walker, J. E.; Henderson, R. EMBO J. 2003, 22, 6182.
(20) Esteban, O.; Bernal, R. A.; Donohoe, M.; Videler, H.; Sharon, M.; Robinson, C. V.; Stock, D. J. Biol. Chem. 2008, 283, 2595.
(21) Kitagawa, N.; Mazon, H.; Heck, A. J.; Wilkens, S. J. Biol. Chem. 2008, 283, 3329.
(22) Yamamoto, M.; Unzai, S.; Saijo, S.; Ito, K.; Mizutani, K.; Suno-Ikeda, C.; Yabuki-Miyata, Y.; Terada, T.; Toyama, M.; Shirouzu, M.; Kobayashi, T.; Kakinuma, Y.; Yamato, I.; Yokoyama, S.; Iwata, S.; Murata, T. J. Biol. Chem. 2008, 283, 19422.
(23) Ebong, I. O.; Morgner, N.; Zhou, M.; Saraiva, M. A.; Daturpalli, S.; Jackson, S. E.; Robinson, C. V. Proc. Natl. Acad. Sci. U.S.A. 2011, 108, 17939.
(24) Natan, E.; Hirschberg, D.; Morgner, N.; Robinson, C. V.; Fersht, A. R. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 14327.
(25) Natan, E.; Baloglu, C.; Pagel, K.; Freund, S. M. V.; Morgner, N.; Robinson, C. V.; Fersht, A. R.; Joerger, A. C. J. Mol. Biol. 2011, 409, 358.
(26) Hernández, H.; Makarova, O. V.; Makarov, E.; Muto, Y.; Pomeranz-Krummel, D.; Robinson, C. V. PLoS One 2009, 4, e7202.