
Publication Date (Web): August 7, 2012
Article pubs.acs.org/ac

Appropriate Degree of Trust: Deriving Confidence Metrics for Automatic Peak Assignment in High-Resolution Mass Spectrometry

David P. A. Kilgour,*,† C. Logan Mackay,‡ Patrick R. R. Langridge-Smith,‡ and Peter B. O’Connor†

†Department of Chemistry, University of Warwick, Coventry, CV4 7AL, U.K.
‡Scottish Instrumentation and Resource Centre for Advanced Mass Spectrometry, School of Chemistry, University of Edinburgh, Edinburgh, EH9 3JJ, U.K.



ABSTRACT: Techniques for deriving confidence metrics for the reliability of automatically assigned elemental formulas in complex spectra, from high-resolution mass spectrometers, are described. These metrics can help an analyst to place an appropriate degree of trust in the results obtained from automated spectral analysis of, for example, natural organic materials. To provide these metrics of confidence, common mass spectrometric tests for reliability of peak assignment (mass accuracy/error, relative ion abundance, and rings-plus-double-bonds equivalence) are combined with novel confidence metrics based on the interconnectivity and consistency of a mass difference or mass defect based peak inference network and on the confidence of the initial library matches. These are shown to provide improved peak assignment confidence over manual or simple automatic assignment methods.

The use of high-performance mass spectrometers (e.g., Fourier transform ion cyclotron resonance1−3 or Kingdon4,5 trap mass spectrometers) has allowed the analysis of highly complex mixtures, which are common in the study of naturally occurring organic materials: for example, environmental samples,6,7 natural organic matter,8 crude oil,9 biological extracts,10 food products,11−13 etc. The ultrahigh mass resolution (>250 000 to 1 million, or better) and mass accuracy (better than 1 ppm) of these spectrometers provide the analyst with the potential to assign elemental formulas10,14−18 to many compounds in the sample. However, although the instrumentation exists to analyze these samples and separate many thousands of components, this has created a problem in the ability to analyze the vast quantity and complexity of the data so produced. Manual peak assignment methods for complex spectra are available19,20 but are labor intensive and costly, both in terms of time and investment. Furthermore, as with many man-in-the-loop processes, it is difficult to assign a value for the reliability of the results which successfully includes the subjective measure of individual analysts’ performances against unknown samples. As a result, it is likely to be difficult to compare the appropriate level of confidence that one may place in the results from different analysts, or even from the same analyst across different samples, sample types, or from day to day.

Automatic peak assignment algorithms have been developed in order to help overcome the potential inherent weaknesses of the manual peak assignment approaches. Most of these methods involve the use of transformational mapping, either in a mass difference14−16 or mass defect17,18 space. The techniques we propose in this paper, which allow the analyst to have an improved ability to understand the appropriate degree of trust one may place on the output of a given automatic peak assignment, have been developed specifically for use with an in-house-developed transformation mapping algorithm but will also be equally applicable to any of the other, similar algorithms (for example, those presented in refs 14−18).

© 2012 American Chemical Society



METHODS

The n-Dimensional Kendrick Defect Inference Network. Since its publication in 1963,21 the Kendrick mass defect has become a common tool for the identification of compound classes in complex samples. Kendrick mass defects (KMDs) are obtained by rescaling the IUPAC masses in the spectrum to a Kendrick mass scale which depends on the choice of the Kendrick base:

$$\mathrm{KM}_{\mathrm{base}} = M_{\mathrm{peak}} \, \frac{[M_{\mathrm{base}}]}{M_{\mathrm{base}}} \tag{1}$$

$\mathrm{KM}_{\mathrm{base}}$ is the Kendrick mass calculated using the nominated Kendrick base, $M_{\mathrm{peak}}$ is the mass of the peak, $[M_{\mathrm{base}}]$ is the nominal mass of the Kendrick base, and $M_{\mathrm{base}}$ is the accurate mass of the Kendrick base. Therefore, if one were to choose CH2 as the Kendrick base (probably the most commonly used Kendrick base), the Kendrick mass would be the product of the peak mass and the ratio (14.00000/14.01565).

Received: May 17, 2012. Accepted: August 7, 2012. Published: August 7, 2012.

dx.doi.org/10.1021/ac301339d | Anal. Chem. 2012, 84, 7431−7435
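As a rough illustration of eq 1, and of the defect that eq 2 derives from it, the rescaling can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names and the `BASE_MASS` table are hypothetical, the CH2 mass is the value quoted in the text, and the C and O masses are standard monoisotopic values that do not appear in the text.

```python
import math

# Accurate masses of candidate Kendrick bases (assumed values, see lead-in).
BASE_MASS = {"CH2": 14.01565, "C": 12.00000, "O": 15.99491}


def kendrick_mass(m_peak: float, base: str) -> float:
    """Eq 1: rescale an IUPAC mass so that the chosen base has an integer mass."""
    m_base = BASE_MASS[base]
    nominal_base = round(m_base)  # nominal (integer) mass of the base
    return m_peak * nominal_base / m_base


def kendrick_mass_defect(m_peak: float, base: str) -> float:
    """Eq 2: next highest integer minus the Kendrick mass."""
    km = kendrick_mass(m_peak, base)
    return math.ceil(km) - km


# Two neighbours in a CH2 homologous series (illustrative neutral masses).
octanoic = 144.11503                      # C8H16O2
nonanoic = octanoic + BASE_MASS["CH2"]    # C9H18O2
print(kendrick_mass_defect(octanoic, "CH2"))
print(kendrick_mass_defect(nonanoic, "CH2"))
```

Adding one CH2 unit shifts the Kendrick mass by exactly 14 on this scale, so both members of the pair share the same defect; that is what lines a homologous series up in a KMD plot.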

The Kendrick mass defect is then calculated as the difference between the Kendrick mass of a peak in a mass spectrum and the next highest integer value:

$$\mathrm{KMD}_{\mathrm{base}} = \lceil \mathrm{KM}_{\mathrm{base}} \rceil - \mathrm{KM}_{\mathrm{base}} \tag{2}$$

where $\mathrm{KMD}_{\mathrm{base}}$ is the Kendrick mass defect calculated for a specific base unit. Conventionally, the peak masses in a complex spectrum would be plotted as the nominal Kendrick mass (NKM) versus the KMD, and this can simplify the process through which homologous series can be assigned.17 An example KMD plot, using CH2 as the Kendrick base, showing a portion of the peaks recorded in the negative mode electrospray ionization Fourier transform ion cyclotron resonance mass spectrum (FTICR MS) of a diluted malt whisky sample is shown in Figure 1; the regular spacing of points, resulting from a formulaic difference of a CH2 unit, is labeled. The spacing resulting from the regular repeat unit CH2 can be clearly seen in Figure 1, and this allows the analyst to distinguish a number of homologous series in the data.

Figure 1. Example KMD plot, using CH2 as the Kendrick base.

Our approach is related to the conventional Kendrick mass defect approach. However, instead of plotting the NKM versus the KMD for a given Kendrick base mass, we find that plotting two different KMDs (calculated using different Kendrick base masses) allows the data to be more easily manually interpreted, as is shown in Figure 2. In Figure 2, the two Kendrick base masses are O and C (for base 1 and 2, respectively). It is easy to recognize the regular spacing resulting from the addition of O and C to the peak formulas. We have found that plotting mass spectral data in this way provides the analyst with an improved ability to manually investigate a complex data set and, therefore, check the results produced by the automated processing system. At the benchmarking stage, manual review of the results is, of course, key to understanding the quality of the results of the automated processing. Kendrick bases are selected specifically for the sample set under consideration. The base selections of O and C, as shown in the whisky results above, are also used, for example, for the analysis of fulvic acid samples, but different Kendrick bases may be used for other sample types. Additionally, although for display purposes normally only two KMD dimensions are used, the algorithm will work equally well with more dimensions, should the data set be best analyzed in this manner.

Figure 2. Subset of a malt whisky FTICR mass spectrum displayed in two-dimensional Kendrick mass defect mapping space (2D KMD), including some connecting mapping vectors corresponding to known formulaic differences.

Library Search. As with any transformation mapping technique, our algorithm requires that the formulas of some points be known so that the formulas of unknown points can be inferred from these. This inference is possible because particular mapping vectors in that space have been correlated to known chemical formulaic differences.8 Therefore, the first stage in our algorithm is to search the data under analysis against a library, in order to generate a subset of the data points for which a formula can be assigned by the mass alone. We have a number of libraries available to us, including one which contains all compounds in the National Institute of Standards and Technology (NIST) GC/MS library, where the elemental formulas of these have been used to generate an accurate mass. The user can choose what elements to allow in returned library matches, based on their chemical knowledge of the sample type. We prefer the method of using a pregenerated library to the method of live recalculation of matching formulas, as it minimizes the processing time required to regenerate the same formulas for many subsequent spectra. The system invites the user to list those adducts or leaving groups which will be added to/removed from the molecular formula during sample ionization. For example, a monosodiated ion will have an adduct with the formula E−1Na (loss of one electron and addition of sodium), whereas a disodiated ion will have the adduct E−2Na2. The system classifies all peaks to which it can assign formulas from the library (or calculated from an “exact mass” calculation) as “knowns”. This is not intended to suggest that we confidently know that the formula is the correct one, simply that the class had to be given a name. Furthermore, these initial assignments will be further refined and incorrect assignments will normally become obvious as the algorithm iterates forward, as shown below.

Generating the Inference Network. The second step is to generate the inference network. The data is converted into the required KMD mapping space as described above. In order to generate the connection network, the mapping vectors between all pairs of points in this mapping space can be statistically analyzed to find common transformation vectors between points, in a manner similar to that of Kunenkov et al.15 The top 200−500 of these are converted back to the underlying mass difference, and these mass differences are searched against a library of formulaic differences. This library is different from the normal libraries, in that a formulaic difference can contain elements with negative stoichiometries. For example, the formulaic difference between n-octanoic acid (C8H16O2) and n-nonanol (C9H20O) is C1H4O−1. The formulaic differences which are used to create the connection net for a given sample can be saved and applied to generate connection nets for other, potentially related samples, a tactic which can be used to greatly improve the speed of the algorithm in cases where the samples contain similar sets of homologous series. Alternatively, or additionally, the user can simply define common formulaic differences that are expected for the sample under consideration. Points in the mapping space are connected if the mapping vector between any two points is one of the statistically significant set which was matched against a known formulaic difference. The formulaic difference concerned is, of course, linked to that connection. This interconnected data set is the inference network.

The Inference Loop. At this stage, the system has produced an inference network connecting points where there is a known formulaic difference between the points and some of those points (the “knowns”) have been assigned chemical formulas. The system now finds all “unknowns” which are directly connected to “knowns” through the inference network and uses the formulaic difference characteristic of the link to infer the formula of the “unknown”. It then assigns these to the category “inferred”. The system then enters a loop in which it identifies any “unknowns” directly connected to these newly assigned “inferred” peaks and can, therefore, assign formulas to this new layer of “unknowns”. It repeats this loop, iteratively assigning more “unknowns” through the connections in the inference network, until all peaks which are connected into the network are assigned formulas.
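The layer-by-layer propagation described in the inference loop can be sketched as a breadth-first walk over the network. The following is a minimal sketch under stated assumptions, not the authors' code: formulas are represented as plain element-count dictionaries, each link is a hypothetical tuple `(peak_a, peak_b, difference)` meaning formula(b) = formula(a) + difference, and no confidence checks of any kind are applied.

```python
from collections import Counter


def add_formulas(formula, difference):
    """Add a formulaic difference (counts may be negative) to a formula."""
    total = Counter(formula)
    total.update(difference)  # Counter.update adds counts and keeps negatives
    return {element: n for element, n in total.items() if n != 0}


def negate(difference):
    """Reverse a formulaic difference so a link can be walked backwards."""
    return {element: -n for element, n in difference.items()}


def run_inference(knowns, links):
    """Propagate formulas from "knowns" through the network, one layer per pass.

    knowns: {peak_id: formula}; links: [(peak_a, peak_b, difference), ...].
    Returns {peak_id: (formula, category)} for every peak that was reached.
    """
    neighbours = {}
    for a, b, diff in links:
        neighbours.setdefault(a, []).append((b, diff))
        neighbours.setdefault(b, []).append((a, negate(diff)))

    assigned = {peak: (formula, "known") for peak, formula in knowns.items()}
    frontier = list(knowns)
    while frontier:
        next_frontier = []
        for source in frontier:
            for target, diff in neighbours.get(source, []):
                if target not in assigned:  # first route to reach a peak wins
                    formula = add_formulas(assigned[source][0], diff)
                    assigned[target] = (formula, "inferred")
                    next_frontier.append(target)
        frontier = next_frontier
    return assigned


CH2 = {"C": 1, "H": 2}
knowns = {"p1": {"C": 8, "H": 16, "O": 2}}
links = [("p1", "p2", CH2), ("p2", "p3", {"O": 1}), ("p4", "p1", CH2)]
for peak, (formula, category) in sorted(run_inference(knowns, links).items()):
    print(peak, category, formula)
```

Because the first route to reach a peak simply wins here, a single wrong "known" would spread unchecked through every peak downstream of it, which is the weakness the confidence metrics are meant to address.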
Although this algorithm will automatically assign formulas to mass spectral peaks, it does not, so far, contain any methods by which one can monitor the confidence of any peak assignment, and there may be an unknown number of misassigned peaks in the inference network. Therefore, we have added a variety of different methods of improving the surety of the output. All metrics are user controllable.

Metrics of Confidence. Basic Mass Spectral Metrics of Confidence. The simplest classes of confidence metrics that we use to control the assignment of peaks are widely known and have been applied to mass spectral peak assignments for many decades: mass accuracy/mass error, relative ion abundance,10,22 the nitrogen rule,22 and rings-plus-double-bonds equivalence. Mass accuracy/mass error is used to define how closely a library hit must match a spectral peak before it can be returned as a match. Similarly, once a formula has been assigned to an inferred peak, this same metric is used to determine whether the assigned formula is within the acceptable range. Analogous to mass accuracy, the accuracy of the mapping vectors linking the data points in the mapping space is also used as a metric of confidence, and the threshold accuracy which must be achieved for a valid linkage to be recorded in the inference network is controllable by the user.

For all peaks which have an isotopologue in the spectrum, the relative ion abundance is measured and compared to theoretical isotope distributions. This is used as a method of improving confidence in the identity of the relevant formulas. For all inferred formulas, both the well-known nitrogen rule (molecules with an even number of nitrogens will have an even molecular mass) and the rings-plus-double-bonds equivalence calculation (eq 3)23 are applied in an attempt to check the stoichiometric possibility of the returned formula.

$$\mathrm{RDBE} = 1 + \frac{1}{2} \sum_i n_i (v_i - 2) \tag{3}$$

where RDBE is the rings-plus-double-bonds equivalence, $n_i$ is the number of atoms of element $i$ in the molecular formula, and $v_i$ is the formal valence of element $i$. An RDBE value lower than −0.5 is stoichiometrically impossible.

Artificial Immune System Metric. Artificial immune systems are a type of artificial intelligence. In the same way that artificial neural networks attempt to mimic some of the processes which may be important in the processing abilities of the brain, artificial immune systems mimic some of the control mechanisms thought to be important in the distributed intelligence of the mammalian immune system.24−27 A key part of the control mechanism in the immune system is provided by the B cells and, specifically, the degree to which they are stimulated.27 In part, B cell stimulation is moderated by the closeness of the match between the antigens expressed by that B cell and the pathogens which it encounters. The immune system uses that stimulation level as a trigger for a clonal selection process; insufficiently stimulated cells are culled from the immune system (by apoptosis), whereas B cells which are stimulated above a threshold are left to multiply and mutate, to better detect the pathogens.

We have used this concept to derive a confidence metric for the assignment of formulas to mass spectral peaks. Taking each peak in turn, we treat it as a B cell and all other peaks in the spectrum as potential pathogens. The B cell can detect a pathogen if it is directly connected to it through the inference network. Whether that connection is present depends on the accuracy of the mapping vector between the two peaks (how closely the observed vector matches the ideal vector for that formulaic difference); the acceptance threshold for this accuracy (analogous to the network affinity threshold27 in a conventional artificial immune system) can be set automatically or manually by the user.
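The B cell analogy reduces to counting, for each peak, how many other peaks it is directly connected to. The sketch below is only illustrative and makes several assumptions: the links are taken to have already passed the mapping-vector acceptance threshold, and the function names and the pair-based representation are hypothetical rather than taken from the authors' software.

```python
from collections import defaultdict


def stimulation_levels(links):
    """Count the distinct direct connections of every peak.

    links: iterable of (peak_a, peak_b) pairs that were accepted into the
    inference network. Returns {peak_id: number of connected peaks}.
    """
    partners = defaultdict(set)
    for a, b in links:
        if a != b:  # a peak cannot stimulate itself
            partners[a].add(b)
            partners[b].add(a)
    return {peak: len(others) for peak, others in partners.items()}


def sufficiently_stimulated(levels, threshold):
    """Peaks at or above the threshold; the rest are 'culled' as sources."""
    return {peak for peak, level in levels.items() if level >= threshold}


# A hub peak connected to six others, plus one weakly connected pair.
links = [("A", other) for other in "BCDEFG"] + [("X", "Y")]
levels = stimulation_levels(links)
print(levels["A"], levels["X"])
print(sorted(sufficiently_stimulated(levels, threshold=2)))
```

Using a set of partners rather than a raw count means that two different formulaic differences linking the same pair of peaks are counted once; whether the original algorithm counts them separately is not stated in the text.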
The total stimulation of a B cell is the sum of all the connections that cell has to potential pathogens; for example, in Figure 2, point A can recognize points B−G and so has a stimulation level of 6. In this way, we can record the stimulation level of all peaks in the spectrum. For “known” peaks, a low stimulation level could result because space charge, or some other effect, has caused the mass of that peak to shift or because the peak is actually an artifact or noise peak. This type of peak will be poorly connected into the homologous network for that sample. Therefore, there is an increased potential that poorly stimulated peaks are either incorrectly identified or incorrectly connected, and were any unknown peaks to have formulas inferred, based on the formula of this suspect peak, this error could be propagated through large parts of the downstream network. This spreading of an error through the inference network could potentially reduce the confidence in the assignments of many other peaks. The user, therefore, can set a threshold level of stimulation below which a peak cannot be used as an inference source; this removes this potential source of error.

An example of this type of potential error is shown in Figure 3. The data shown is part of the negative mode electrospray FTICR mass spectrum of a malt whisky, where many peaks have been automatically assigned. The presumed fatty acid peak at m/z 171, with an assigned elemental formula of C10H20O2, is sufficiently large that the peak detection algorithm used by the Bruker DataAnalysis version 4 Sp4 software also detects artifact peaks in the sidebands of the main peak (Figure 3b). By library search alone, one of these sideband peaks can be misassigned the formula C8H18ON3. However, the algorithm detects that this peak assignment is very poorly connected into the inference network of the rest of the sample (this peak has a stimulation level of only 1, which is much lower than other peaks) and so recognizes that this peak has a higher probability of being incorrectly assigned. Other methods of identifying the presence of inappropriately detected sideband peaks are also used within the algorithm, detecting peak spacing and relative intensities of very closely spaced peaks.

Figure 3. (a) Section of malt whisky FTICR mass spectrum showing assigned peaks. The region between the dotted lines has been enlarged in panel b. Peaks labeled with asterisks in this spectrum are unassigned. (b) Zoomed portion of panel a showing an artifact peak detected by the Bruker software and initially probably misassigned by the automatic peak assignment algorithm but identified as a low-confidence assignment owing to the poor stimulation level of the peak in the connection network.

For inferred peaks, the same is true, although the method of application is different. Here we allow the user to set a different stimulation threshold, and the system will not allow the formula of an inferred peak to be assigned until such time as it passes this stimulation threshold. Therefore, an inferred peak cannot be used as an inference source itself until a certain number of other confidently assigned peaks all concur as to the assignment of that new peak. This is analogous to the reverse of the culling process for poorly stimulated B cells in the immune system.

Uniqueness and Consistency. The last two metrics of confidence which we have employed so far are related to concepts which were originally developed for Raman spectroscopy.28 Here we attempt to avoid potential confusion as to the identity of assigned formulas by accepting only those results which are highly unlikely to be a result of confusion between similar potential candidates. This principle is applied in two ways.

First, although masses are measured with a typical mass accuracy of 0.1 ppm, accurate mass measurement alone does not ensure unique elemental composition assignment. So, in order to reduce the probability that a peak can be misassigned, rather than simply choosing the closest library match to a peak mass, we require that, for a peak to be assigned a formula by accurate mass alone, there must be only a single candidate formula hit from the library, within the user-defined accuracy requirements. Taking this concept further, we also require that there be no other potential library hits within a larger mass error range, which is known as the uniqueness threshold and is also defined by the user. The uniqueness threshold would usually be set to some value known to be at least 2−3 times the expected spectral mass accuracy. The use of the uniqueness threshold prevents potentially incorrectly assigned peaks from being incorporated into the starting network because if, within the uniqueness threshold, more than one potential elemental formula could be assigned to a given peak, that peak will not have a formula assigned to it. So, only those peaks which are both accurate and unique are “assigned” by this software. Obviously, this means that the number of peaks that can be uniquely assigned is substantially lower than the number of peaks that can be accurately assigned, but the false assignment rate will have been reduced.

Second, for both “known” and “inferred” peaks we apply a related concept. This concept states that for any two peaks connected through the inference network any route through the network should result in the same relative formula difference. Therefore, the network must be 100% consistent. Any peaks for which more than one formula assignment is possible through the inference network are prevented from acting as inference sources in later cycles and are specially labeled, to highlight this issue to the user. This labeling of peaks is intended to allow the user to investigate the cause of the inconsistency, which could be poor calibration, poor mass accuracy, or inappropriately chosen control parameters in the automatic assignment system.
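The "accurate and unique" rule for library matches can be sketched as a two-window test on the ppm mass error. This is a hedged illustration only: the function names and the dictionary-based library are hypothetical, the example masses are invented for the demonstration, and the default windows of 0.5 and 1 ppm simply reuse the parameter values reported in the Results.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def assign_unique(peak_mass, library, accuracy_ppm=0.5, uniqueness_ppm=1.0):
    """Return a formula only if the match is both accurate and unique.

    library: {formula: theoretical mass}. A formula is returned when it is
    the only library entry inside the (wider) uniqueness window and it also
    lies inside the (narrower) accuracy window; otherwise None is returned.
    """
    nearby = [
        formula
        for formula, mass in library.items()
        if abs(ppm_error(peak_mass, mass)) <= uniqueness_ppm
    ]
    if len(nearby) != 1:
        return None  # nothing close enough, or an ambiguous assignment
    formula = nearby[0]
    if abs(ppm_error(peak_mass, library[formula])) <= accuracy_ppm:
        return formula
    return None  # a lone candidate, but not accurate enough


# Invented library: entries A and B lie 0.75 ppm apart, C is isolated.
library = {"A": 200.00000, "B": 200.00015, "C": 250.00000}
print(assign_unique(200.00002, library))  # ambiguous between A and B
print(assign_unique(250.00005, library))  # accurate and unique
print(assign_unique(250.00020, library))  # unique but outside 0.5 ppm
```

Checking the wider window first makes the trade-off described above explicit: a peak that would be accepted on accuracy alone is still rejected whenever a second candidate sits inside the uniqueness window.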



RESULTS

A 10 year old Glenkinchie single malt whisky was used as a test sample. The whisky was diluted 1:100 in 50% MeOH. The sample was analyzed by negative mode electrospray ionization on a Bruker solariX 12T FTICR MS. A total of 100 scans was acquired for each spectrum (syringe infusion) using a 2 MW acquisition size with 10 repetitive measurements. The spectra were both externally calibrated in solarixControl (using arginine clusters) and calibrated internally, using DataAnalysis 4.0, with calibration reference lists of CxHyOz compounds. The data was analyzed using the algorithm described above. The top 1938 most intense peaks were selected for processing. The parameters used for processing were as follows: library mass accuracy (±0.500 ppm), uniqueness threshold (±1 ppm), Kendrick mass defect accuracy for valid connections ($\pm 150 \times 10^{-6}$), minimum stimulation level for “knowns” to act as inference sources (S = 32), minimum stimulation level for inferred peaks to be created (S = 8), network internal consistency (100%). The spectrum library used for this analysis contained stoichiometrically reasonable combinations of C, H, and O (up to C75), and the molecular fragment library used for identifying the formulaic difference, corresponding to statistically common mapping vectors, contained stoichiometrically reasonable combinations of C, H, O, N, and S between C−6 and C6; both of these libraries were generated in-house.

With this setup, the algorithm assigned unique formulas to 1652 peaks (77.4%). The inference network contained 797 296 connections between points. The mass accuracy and mass accuracy distribution of the assigned peaks and the connectivity data are shown in Figure 4.

Figure 4. (a) Mass accuracy of the assigned peaks vs ion mass. (b) Mass accuracy distribution of the assigned peaks. (c) Stimulation levels of the peaks vs ion mass.

CONCLUSIONS

Confidence metrics which can be used to objectively rate the reliability of automatic peak assignments in mass spectra will be necessary in order to allow the technique to be used robustly for decision making. We have found the novel metrics described here (stimulation level, peak assignment uniqueness, and inference network consistency) to be easy to implement and useful in application. They can be appended into existing algorithms to provide additional measures of confidence in the formulas assigned to mass spectral peaks.

ACKNOWLEDGMENTS

The authors acknowledge the assistance of Matthias Witt (Bruker Daltonik GmbH, Bremen, Germany) for collecting the whisky data set and Mark Neal (University of Aberystwyth, U.K.) for useful discussions on artificial immune systems. This work was supported by the University of Warwick, Department of Chemistry, and the Warwick Centre for Analytical Science (EPSRC funded by Grant EP/F034210/1).



REFERENCES

(1) Comisarow, M. B.; Marshall, A. G. Chem. Phys. Lett. 1974, 25, 282−283.
(2) Amster, I. J. J. Mass Spectrom. 1996, 31, 1325−1337.
(3) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1−35.
(4) Kingdon, K. H. Phys. Rev. 1923, 21, 408−418.
(5) Makarov, A. Anal. Chem. 2000, 72, 1156−1162.
(6) Reinhardt, A.; Emmenegger, C.; Gerrits, B.; Panse, C.; Dommen, J.; Baltensperger, U.; Zenobi, R.; Kalberer, M. Anal. Chem. 2007, 79, 4074−4082.
(7) Headley, J. V.; Peru, K. M.; Barrow, M. P. Mass Spectrom. Rev. 2009, 28, 121−134.
(8) Kim, S.; Kramer, R. W.; Hatcher, P. G. Anal. Chem. 2003, 75, 5336−5344.
(9) Hughey, C. A.; Rodgers, R. P.; Marshall, A. G. Anal. Chem. 2002, 74, 4145−4149.
(10) Weber, R. J. M.; Southam, A. D.; Sommer, U.; Viant, M. R. Anal. Chem. 2011, 83, 3737−3743.
(11) Shah, M.; Meija, J.; Caruso, J. A. Anal. Chem. 2006, 79, 846−853.
(12) Gougeon, R. D.; Lucio, M.; Frommberger, M.; Peyron, D.; Chassagne, D.; Alexandre, H.; Feuillat, F.; Voilley, A.; Cayot, P.; Gebefügi, I.; Hertkorn, N.; Schmitt-Kopplin, P. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 9174−9179.
(13) Wu, Z.; Rodgers, R. P.; Marshall, A. G. J. Agric. Food Chem. 2004, 52, 5322−5328.
(14) Kujawinski, E. B.; Behn, M. D. Anal. Chem. 2006, 78, 4363−4373.
(15) Kunenkov, E. V.; Kononikhin, A. S.; Perminova, I. V.; Hertkorn, N.; Gaspar, A.; Schmitt-Kopplin, P.; Popov, I. A.; Garmash, A. V.; Nikolaev, E. N. Anal. Chem. 2009, 81, 10106−10115.
(16) Tziotis, D.; Hertkorn, N.; Schmitt-Kopplin, P. Eur. J. Mass Spectrom. 2011, 17, 415−421.
(17) Hughey, C. A.; Hendrickson, C. L.; Rodgers, R. P.; Marshall, A. G.; Qian, K. N. Anal. Chem. 2001, 73, 4676−4681.
(18) Roach, P. J.; Laskin, J.; Laskin, A. Anal. Chem. 2011, 83, 4924−4929.
(19) Hsu, C. S.; Qian, K. N.; Chen, Y. N. C. Anal. Chim. Acta 1992, 264, 79−89.
(20) Stenson, A. C.; Marshall, A. G.; Cooper, W. T. Anal. Chem. 2003, 75, 1275−1284.
(21) Kendrick, E. Anal. Chem. 1963, 35, 2146−2154.
(22) Koch, B. P.; Dittmar, T.; Witt, M.; Kattner, G. Anal. Chem. 2007, 79, 1758−1763.
(23) Pretsch, E.; Bühlmann, P.; Badertscher, M. In Structure Determination of Organic Compounds; Springer: Berlin and Heidelberg, Germany, 2009; pp 1−43.
(24) de Castro, L. N.; Timmis, J. Artificial Immune Systems: A New Computational Intelligence Approach; Springer-Verlag: Berlin, Germany, 2002.
(25) Farmer, J. D.; Packard, N. H.; Perelson, A. S. Phys. D 1986, 22, 187−204.
(26) Hart, E.; Timmis, J. Appl. Soft Comput. 2008, 8, 191−201.
(27) Timmis, J.; Neal, M. Knowl.-Based Syst. 2001, 14, 121−130.
(28) Brown, C. D.; Vander Rhodes, G. H. United States Patent No. 20080033663, 2008.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing financial interest.