Anal. Chem. 1980, 52, 1095-1102
Computerized Mass Spectrum Prediction and Ranking
N. A. B. Gray, R. E. Carhart, A. Lavanchy, D. H. Smith,* T. Varkony, B. G. Buchanan, W. C. White, and L. Creary (7)
Departments of Computer Science and Chemistry, Stanford University, Stanford, California 94305
A program is described for predicting the mass spectrum of a given molecular structure, using a parameterized theory that can encompass a wide range of levels of specificity from general models of fragmentation to precise, class-specific fragmentation rules. Given, from some other source, a set of candidate structures for some unknown compound, this program may be used to determine which candidate(s) can, within the limits of the fragmentation theory used, provide the best rationalization of the observed mass spectral data and which are, consequently, the most plausible structure(s) for the unknown. Two example applications of the program are given. The first example shows how the program might assist in the identification of the probable structure of an unknown, given just its chemical history and high resolution mass spectral data. The second example illustrates a more routine application of the program for the identification of a simple metabolite from a known starting material. Limitations of the program are presented.
Most applications of computers to the analysis of mass spectral data are concerned with the derivation of structural information from the observed spectrum. The processing may involve spectrum classification by pattern recognition (2) or other empirical approaches (3-5), or may involve the identification of the structure, or component parts thereof (6), through the comparison of the observed spectrum with standard spectral data in files (for a recent review see (7)). In contrast with these more conventional computer applications, our work is concerned with the prediction of the spectral properties of given structures, and with the use of such predicted spectra for structure elucidation. While in some applications the predicted spectrum may itself be of value, most uses of our programs exploit comparative analysis of a predicted spectrum and observed data. One use of these programs is the identification of all fragmentation processes that, within user-specified constraints, could give rise to selected observed ions. The most frequent application of the programs is for structural analysis in association with the CONGEN structure generating program (8). In these structural analysis applications, the programs are used to identify which of a number of candidate structures may best rationalize an observed mass spectrum; generally, the candidate structures will have been created by CONGEN and will satisfy structural constraints inferred from other chemical and spectral data. The kernel of these Mass Spectral Analysis (MSA) programs is a set of functions that can be used to predict a spectrum of a given chemical structure on the basis of some model, or “theory”, of mass spectral fragmentation processes.
As the applications of these programs include both the analysis of moderately large polyfunctional molecules of arbitrary character and the detailed processing of closely related structures of known type, the models (“theories”) of mass spectral processes used by the program have to encompass a wide range of degrees of specificity.
Several computational models for mass spectrum prediction have been developed. Quasiequilibrium theory (QET) can be used to predict the relative abundances of ions as a function of internal energy; the model employed by QET for spectrum prediction envisages the fragmentation processes as a series of competing, consecutive unimolecular decomposition reactions of excited parent molecular ions (9). Applications of the QET method require either models of transition states or a knowledge of cross sections for reverse reactions (10). Most applications of QET have been to relatively small molecules such as hydrocarbons or benzene derivatives.
Pattern recognition approaches, employing trainable linear threshold logic units (TLU), have also been used for mass spectrum prediction (11). Pattern recognition methods generally attempt to establish direct relationships between molecular attributes and predicted properties; these relationships do not involve any intermediate model or interpretive steps. In the case of mass spectrum prediction, a separate TLU was trained to predict the presence/absence of each m/z of interest. These TLUs employed a set of about 60 molecular attributes such as the number of rings in the molecule, the size of the smallest ring, and the number of oxygen atoms present. The fact that these pattern recognition methods do not employ any intermediate model or theory makes them unsuitable for our applications, in which one of the objectives is the ability to explain predictions in terms of conventional models of molecular fragmentation.
The MSA functions are designed specifically for analysis of spectra of unknown compounds by predicting spectra, as either ion compositions or just nominal masses, for specific candidate structures and rank ordering the candidates on the basis of agreement between their predicted spectra and the observed spectrum for the unknown. We stress that the MSA functions require, from some source, a set of plausible candidate structures; the observed mass spectrum is not interpreted by itself to determine such candidates.
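The rank-ordering idea can be pictured with a small sketch. The scoring function used here (the fraction of observed intensity accounted for by a candidate's predicted nominal masses), the names `score` and `rank_candidates`, and the toy data are all assumptions made for illustration; they are not the ranking criteria of the MSA functions themselves.

```python
from typing import Dict, List, Set, Tuple


def score(predicted: Set[int], observed: Dict[int, float]) -> float:
    """Fraction of the total observed intensity explained by the predicted masses."""
    total = sum(observed.values())
    if total == 0:
        return 0.0
    explained = sum(intensity for mass, intensity in observed.items() if mass in predicted)
    return explained / total


def rank_candidates(
    predictions: Dict[str, Set[int]], observed: Dict[int, float]
) -> List[Tuple[str, float]]:
    """Order candidates from best to worst score (ties broken by name)."""
    scored = [(name, score(masses, observed)) for name, masses in predictions.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical observed spectrum: nominal mass -> relative intensity.
OBSERVED = {43: 100.0, 58: 40.0, 71: 10.0}

# Hypothetical predicted nominal masses for three candidate structures.
PREDICTIONS = {"A": {43, 58}, "B": {43, 71}, "C": {29}}

if __name__ == "__main__":
    for name, value in rank_candidates(PREDICTIONS, OBSERVED):
        print(name, round(value, 3))
```

A candidate that explains the intense ions ranks above one that explains only minor ions, which is the behavior one would want from any such agreement measure.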
The MSA functions use for spectrum prediction either the “half-order theory” of molecular fragmentation (12), with significant extensions, or more detailed rule-based schemes in cases where fragmentation rules for closely related structures are known. The MSA functions take as input connection tables representing the constitution of the molecules of interest and a definition of the appropriate theoretical model of molecular fragmentation, either as a parameterized version of the half-order theory or as a set of fragmentation rules. If the functions are being used to assist in the analysis of observed fragmentations, or to rank candidate structures, then the observed complete or partial spectrum must also be provided (either as nominal integral masses or actual ion compositions from high resolution mass spectrometry (HRMS)).
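To make the notion of a connection table as input more concrete, the following sketch represents a molecule as a table of atoms and bonds, cleaves one bond, and reports the nominal masses of the resulting fragments. The table layout, the function names, and the choice of 2-butanone as a toy molecule are assumptions for illustration only; bond orders are ignored, and the actual MSA data structures are not described at this level in the text.

```python
from typing import Dict, List, Set, Tuple

# Nominal (integer) atomic masses for the elements used in the toy example.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16, "N": 14}

# Hypothetical connection table: atom id -> (element, number of attached hydrogens).
Atoms = Dict[int, Tuple[str, int]]
Bond = Tuple[int, int]


def fragments_after_cleavage(atoms: Atoms, bonds: List[Bond], broken: Set[Bond]) -> List[Set[int]]:
    """Return the connected components (sets of atom ids) left after removing broken bonds."""
    broken_norm = {tuple(sorted(b)) for b in broken}
    adjacency: Dict[int, Set[int]] = {a: set() for a in atoms}
    for a, b in bonds:
        if tuple(sorted((a, b))) not in broken_norm:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in sorted(atoms):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first walk over the remaining bonds
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def nominal_mass(atoms: Atoms, fragment: Set[int], h_transfer: int = 0) -> int:
    """Nominal mass of a fragment, adjusted by hydrogens transferred in (+) or out (-)."""
    mass = sum(NOMINAL_MASS[atoms[a][0]] + atoms[a][1] * NOMINAL_MASS["H"] for a in fragment)
    return mass + h_transfer


# Toy molecule: 2-butanone, CH3-C(=O)-CH2-CH3 (nominal mass 72).
BUTANONE_ATOMS: Atoms = {1: ("C", 3), 2: ("C", 0), 3: ("O", 0), 4: ("C", 2), 5: ("C", 3)}
BUTANONE_BONDS: List[Bond] = [(1, 2), (2, 3), (2, 4), (4, 5)]

if __name__ == "__main__":
    for frag in fragments_after_cleavage(BUTANONE_ATOMS, BUTANONE_BONDS, {(2, 4)}):
        print(sorted(frag), nominal_mass(BUTANONE_ATOMS, frag))
```

Cleaving the bond between atoms 2 and 4 separates the acetyl portion from the ethyl portion; the `h_transfer` argument shifts a fragment mass by the number of hydrogens gained or lost.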
THEORY AND IMPLEMENTATION
Half-Order Theory. DENDRAL's “half-order theory” of mass spectrometry defines fragmentation modes for structures in terms of one or more consecutive cleavage steps with possible additional hydrogen transfers and “neutral losses”. (The name “half-order” is intended to convey the fact that the model is so simple that it is unreasonable to suggest that it constitutes even a first-order approximation to reality.) This theory does not attempt to describe detailed aspects of molecular fragmentations (such as specific relationships between hydrogen transfers and cleavage of certain groups of bonds), i.e., it makes no attempt to express anything about the mechanisms by which fragmentations actually take place.
© 1980 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 52, NO. 7, JUNE 1980
Table I. Fragmentation Control Parameters Used for Illustrative Processing of the Spectrum of 7-Oxoandrostane (1)

parameter                                        value
single bond breaks, plausibility                 0.95
aromatic bond breaks, plausibility               0
multiple bond breaks, plausibility               0
adjacent breaks, plausibility                    0
molecular ion, plausibility                      1.0
allow fragmentation of fused/bridged rings?      N
allow simple ring fragmentations?                Y
max steps per process                            2
2 step process, plausibility                     0.5
max bonds to break in a process                  3
neutral loss                                     none
how many different H-transfers                   5
  -2 Hs, plausibility                            0.7
  -1 Hs, plausibility                            0.9
   0 Hs, plausibility                            1.0
   1 Hs, plausibility                            0.9
   2 Hs, plausibility                            0.7
Previous work (12) simply allowed or forbade fragmentations with no differentiation among the plausibilities of various processes. We have made significant extensions to the half-order theory for MSA in order to provide greater user control over such plausibilities in two ways:
(1) The half-order theory now allows the complexity of fragmentation processes to be determined by user-defined plausibilities. Differing plausibilities, in the range 0-1.0, can be assigned to various factors including the total number of bonds that cleave, the number of consecutive cleavage steps required, the complexity of individual cleavage steps, and allowed hydrogen transfers and neutral losses.
(2) The half-order theory also allows for variation in the relative plausibility of cleavage of bonds dependent on their character. The basic model just distinguishes between the cases of single bonds, aromatic bonds, and multiple bonds, permitting a different plausibility value to be assigned to each such class of bonds. Finer discrimination, e.g., distinction between vinylic and allylic single bonds, is achieved through the use of substructural templates. These substructural templates define the structural environment of a bond for which some specific cleavage plausibility should be used. The templates are matched to a structure, by a conventional node-by-node graph-matching algorithm, and the appropriate bonds are identified; these bonds are assigned the cleavage plausibility given with the substructure. Unmatched bonds in a structure are assigned the default cleavage plausibility for their bond class.
The overall plausibility of a particular fragmentation process is taken as the product of each of its component factors: plausibilities of cleavages of the bonds involved, hydrogen transfers and neutral losses invoked, and any appropriate modifying factors such as reduced plausibility associated with multistep processes or adjacent cleavages.
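The product rule for overall plausibility can be illustrated with a short sketch. The numerical values are taken from Table I; the function name `process_plausibility`, the dictionary layout, and the representation of template matches as a per-bond override table are assumptions for illustration. Neutral-loss and adjacent-cleavage factors are omitted for brevity, and the graph matching that would produce the overrides is not shown.

```python
from math import prod
from typing import Dict, Optional, Tuple

Bond = Tuple[int, int]

# Default plausibilities per bond class and per hydrogen transfer (values from Table I).
BOND_CLASS_PLAUSIBILITY = {"single": 0.95, "aromatic": 0.0, "multiple": 0.0}
H_TRANSFER_PLAUSIBILITY = {-2: 0.7, -1: 0.9, 0: 1.0, 1: 0.9, 2: 0.7}
TWO_STEP_PLAUSIBILITY = 0.5


def process_plausibility(
    bond_classes: Dict[Bond, str],
    h_transfer: int,
    steps: int,
    template_overrides: Optional[Dict[Bond, float]] = None,
) -> float:
    """Product of the component factors for one fragmentation process.

    bond_classes maps each cleaved bond to its class; template_overrides holds
    plausibilities for bonds matched by a substructural template (hypothetical
    representation). Unmatched bonds fall back to their bond-class default.
    """
    overrides = template_overrides or {}
    bond_factors = [
        overrides.get(bond, BOND_CLASS_PLAUSIBILITY[bond_class])
        for bond, bond_class in bond_classes.items()
    ]
    step_factor = TWO_STEP_PLAUSIBILITY if steps == 2 else 1.0
    return prod(bond_factors) * H_TRANSFER_PLAUSIBILITY[h_transfer] * step_factor


if __name__ == "__main__":
    cleaved = {(9, 10): "single", (5, 6): "single"}
    # Two single-bond cleavages in a two-step process with loss of one hydrogen:
    # 0.95 * 0.95 * 0.9 * 0.5
    print(process_plausibility(cleaved, h_transfer=-1, steps=2))
    # Same process, with a template assigning bond (9, 10) a plausibility of 0.5.
    print(process_plausibility(cleaved, h_transfer=-1, steps=2, template_overrides={(9, 10): 0.5}))
```

Because the factors multiply, any factor of zero (for example, cleavage of an aromatic bond under the Table I settings) excludes the process entirely.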
The basic steps in spectrum prediction using the half-order theory are described in detail in the Appendix.
Explanatory Mode of Prediction. The "explanatory" mode for the program shows how the different possible fragmentation processes, allowed by the chosen control parameters, can from a given structure lead to selected observed ions. We previously discussed (12) how such an analysis can facilitate interpretation of a spectrum of known structures. We include a brief example here to illustrate features of the spectrum prediction process which are relevant to subsequent use and interpretation of the results of spectrum prediction/structure ranking (next section). This mode of program operation is illustrated in Table I and Figure 1. The program was used to process the (high resolution) spectrum of 7-oxoandrostane (1).
Figure 1. Partial output showing possible rationalizations for selected ions observed in the spectrum of 7-oxoandrostane (1)
The fragmentation control parameters used for this analysis are given in Table I; they assigned the molecular ion a unit plausibility, allowed for one- and two-step processes with a maximum of three bonds cleaved (but no cleavage of fused/bridged rings) with each single bond having a 0.95 plausibility of cleavage, and allowed for up to two hydrogens being transferred into or out of the ion. Within these constraints, only one rationalization could be found for the structurally significant ion at m/z 135 (13); several alternative explanations were offered for the minor ion at m/z 163 (two being one-step simple ring cleavages, the others being two-step processes with ring cleavage and loss of a methyl substituent). The second of the three processes proposed to account for m/z 178 ((9, 10) (5, 6)) is the process suggested in the literature (13) and verified by substituent labeling. No explanation could be offered for the ion at C18H25, for this requires loss of H2O and implies molecular rearrangements (H2O was not given as an allowed neutral loss). The observed spectrum for 7-oxoandrostane contained about 40 different ions with nominal masses in the range 40-274 amu; 27 of these ions were included among those predicted. The ions observed but not predicted averaged about 3% intensity and were mainly at low mass (