Anal. Chem. 1985, 57, 899-903

Reliability Ranking and Scaling Improvements to the Probability Based Matching System for Unknown Mass Spectra

Barbara L. Atwater, Douglas B. Stauffer, and Fred W. McLafferty*
Chemistry Department, Cornell University, Ithaca, New York 14853

David W. Peterson
Scientific Instrument Division, Hewlett-Packard, 1501 California Avenue, Palo Alto, California 94304

Statistical evaluations of the effects of five matching parameters on the probability of retrieving a correct answer with the probability based matching (PBM) system have been made. Combining the resulting values found in matching an unknown spectrum makes it possible to rank retrieved reference spectra according to the predicted match reliability. This ranking substantially improves the performance of PBM, and the reliability value is especially helpful in avoiding the assumption that the best matching spectrum represents the correct compound when its spectrum is actually not in the reference file. Quadratic scaling of the abundance values of the unknown compensates for spectral differences caused by instrumental variations, a critical problem in matching reference spectra. Other improvements include a more effective “flagging” technique to remove spurious reference peaks. Extensive applications with a commercial GC/MS system have demonstrated the increased effectiveness made possible by these PBM modifications.

Thousands of gas chromatograph/mass spectrometers (GC/MS) are now used daily worldwide (1). A major application is the identification of unknown compounds, which in many laboratories produces hundreds of unknown mass spectra per day, making obvious the need for computerized identification systems (2-16). For samples representing complex mixtures, incomplete GC separation is unavoidable (17, 18); for the resulting spectra, which represent more than one component, reverse searching (requiring only that the peaks of the reference be present in the unknown) improves retrieval performance (4-7). By far the most widely used retrieval algorithm of this type appears to be probability based matching (PBM) (4, 7). Although other search systems (3, 10-12) are valuable, when evaluated under various conditions (7, 10, 12) none appears clearly superior to PBM. In the last decade PBM has been used extensively by individual implementation (16), through computer networks (Cornell Computer Services, Uris Hall, Ithaca, NY 14853) (19), and on a commercial GC/MS system, resulting in a variety of helpful criticisms.

A major problem to many users, which appears to be common to all retrieval algorithms, is that match ranking factors such as the “similarity index” (3) or “confidence (K) value” (4, 7) give only a qualitative indication of the probability that the retrieved compound represents a correct answer. For example, although a K value of 150 implies a higher match confidence than a value of 100, it does not directly indicate whether the probability of a correct identification is 50% or 95%. This deficiency can cause a particularly serious problem when the unknown compound is not represented in the reference file, since the algorithm almost always retrieves a “best” match, even if it is a poor one. We show here that the effect of a variety of matching indicators can be evaluated statistically, so that a combination of these can serve as a quantitative measure of the predicted reliability of the match.

A second serious problem for mass spectral matching systems utilizing a comprehensive reference file is the variation in peak abundances caused by mass discrimination and by changes in sample concentration during the spectrum scan. As suggested independently by Dromey (8), various methods of tilting and scaling the unknown spectrum to compensate for such spectral differences are investigated here. The PBM algorithm has also been modified to improve the “peak flagging” which discards anomalous peaks in the reference spectrum.

EXPERIMENTAL SECTION

Computers used include a DEC PDP-11/45 with 56 kilobytes of memory and 64 megabytes of random-access disk storage, an IBM 370/168 multiuser system, and the H/P-1000 computer of the H/P 5985 GC/MS system. The data base was the expanded Registry of Mass Spectral Data (Electronic Data Div., Wiley, 605 Third Ave., New York, NY 10158) containing 41 429 different spectra of 32 403 different compounds, from which 2091 isotopically labeled spectra were excluded. From those compounds in the file represented by more than one spectrum (measured under other experimental conditions) 900 were selected at random, with the restriction that all spectra of the compound must have a quality index (QI) ≥ 0.5 (20). For each of these compounds the spectrum of highest QI value was used to make up the list of unknown spectra, which were excluded from the data base in testing. Every odd-numbered spectrum of this list was used to make up a second “odd” list of 450. The performances of these two lists and of the PBM program versions were evaluated by using recall/reliability plots (21, 22), which show the proportion of correct answers which are retrieved as a function of the proportion of retrieved answers which are correct. For the “odd” list poor PBM retrievals were used to correct obvious errors and delete

0003-2700/85/0357-0899$01.50/0 © 1985 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 57, NO. 4, APRIL 1985

Table I. Predicted Reliability Values (%) for K Categories

            molecular ion present                 molecular ion absent
  K    0 flags  1 flag  2 flags  3 flags     0 flags  1 flag  2 flags  3 flags
 150    99.5     99.4    98.7     96.7        95.0     94.5    89.5     81.0
 140    99.3     99.0    97.8     95.9        94.0     92.0    84.3     76.5
 130    99.0     98.5    96.4     94.0        92.0     87.5    77.8     70.5
 120    98.7     97.8    94.3     90.0        89.5     84.3    72.0     56.3
 110    98.2     96.6    91.5     83.5        86.5     79.8    62.0     40.5
 100    97.5     94.5    86.3     76.5        83.0     68.5    40.5     30.0
  90    96.4     91.5    80.5     69.5        77.8     62.0    37.0     27.7
  80    94.3     86.3    74.0     59.3        72.0     45.8    30.7     23.3
  70    91.0     80.5    64.8     49.0        45.5     33.0    22.7     18.7
  60    83.0     69.5    50.3     37.5        40.0     27.7    20.3     17.3
  50    73.0     44.3    33.5     22.0        30.3     18.5    16.5     13.0
  40    54.0     22.0    19.3     15.5        21.3     13.5    12.0      9.0
  30    21.0     15.5    14.0     12.3        16.0      9.0     8.3      6.0
  20    14.0     12.3    11.0      9.5         8.3      6.0     5.0      4.3

Table II. Predicted Reliability Values (%) for ΔK Categories

            molecular ion present                 molecular ion absent
 ΔK    0 flags  1 flag  2 flags  3 flags     0 flags
   0    97.5     97.0    94.5     91.0        91.0
  10    96.5     85.8    91.0     87.0        88.7
  20    91.0     83.8    76.3     70.7        75.0
  30    78.7     68.5    62.3     55.3        59.5
  40    68.5     52.0    48.0     32.7        47.3
  50    58.0     40.3    32.3     24.3        34.5
 ≥60    45.0     30.5    24.3     21.3        25.5

the worst spectra; the final set contained 431 unknown spectra. The matches were classified (7) as: (a) the identical compound or a stereoisomer, designated as “class I”, and (b) a compound which should give a very similar mass spectrum because its structure is closely related to that of the unknown (“class IV”). From these compounds were selected the 392 unknowns for the current Wiley/NBS file of 80 000 spectra; the list of unknowns and their class IV matches and mismatches is available from Cornell for comparative evaluations. The basic PBM matching algorithm (7) was modified with tighter window tolerances, using ±37% and ±20% for peaks of 30% of the fifth highest reliability value already retrieved); for further scaling, the first must have increased the predicted reliability by >10% absolute.

Reliability Ranking. The algorithm was modified to predict the probability that a retrieved reference is a correct match. This reliability value, RL, depends on the values of K, ΔK (the difference between the K value for a perfect match with the unknown and the K value found), the number of peak flagging operations (0-3), whether or not the reference molecular ion was used in matching, and the tilt (n) factor. The collective effect of these separate values on RL using the class IV definition was determined statistically (13) with the 431 unknowns (Tables I and II). For example, a smoothed plot of reliability vs. K value showed 9.2% of matches with K = 98-102 and zero flags to be incorrect, while matches of K = 98-102 with no molecular ion were


Table III. Adjustments (%) Made to Predicted Reliability Values of Scaled Spectra

S^a
without scaling 0 0
-150 0 -150 -150 0 1.0 1.0
m_0^c
V^b
1.5 × 5.5 × 1.5 × 6.8 × 5.5 × 6.8 × 6.8 × 6.8 × 5.5 × 6.8 × 6.8 × 6.8 × 6.8 ×
1.0 1.0
0 to 1.0, 530; 1 to 55, 1028. 5.5 × 10^-, 1106; 5.5 × to 1.5 × 1307; 1.5 × to 0.22, 1464. For minimum/maximum mass (m_0) values: 120, 1495.

incorrect 85% more often than the average of all matches of this K value, resulting in a predicted RL value of 83% for K = 100, no flags, and no molecular ion. Matching a specific reference spectrum gives separate predicted RL values from K and from ΔK; the higher of the two is used, up to a maximum of 10 units above the value from K. This value is adjusted for tilt by subtracting from the predicted RL value the following absolute values: K, negative tilt, 10 if K ≥ 80, 20 if K < 80; K, positive tilt, 8; ΔK, negative tilt, 7; ΔK, positive tilt, 3. The recommended PBM system now uses quadratic scaling (14) rather than tilting. The predicted RL value was found to relate to the actual reliability found for the 431 unknowns as follows: RL (%) = 0-5, 0; 40, RL40; 80, RL80; and 100, 100; intermediate points are interpolated linearly. The values of RL40 and RL80 depend (Table III) on the degree of scaling, which is described by three factors: scale (S, eq 3),

$$S = \sum_i (U_i - U_{av})\,\Delta A_i \qquad (3)$$

where $U_i$ is the U value of the ith reference peak, $U_{av}$ is the average U value of all reference peaks, and $\Delta A_i$ is the change caused by scaling in the A value of the ith reference peak; curvature (V), described by eq 4, which when combined with eq 1 yields eq 5; and $m_0$, the mass of the minimum or maximum of the quadratic function.

$$V = \frac{|d^2y/dx^2|}{[1 + (dy/dx)^2]^{3/2}} \qquad (4)$$

$$V = \frac{|2c|}{(4c^2m^2 + 4cbm + b^2 + 1)^{3/2}} \qquad (5)$$
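The reliability rules described above (take the higher of the K-based and ΔK-based predictions, cap it at 10 units above the K-based value, subtract the tilt penalty, then map the predicted RL onto the actual reliability by linear interpolation) can be sketched in Python as follows. This is only an illustrative reading of the text, not the authors' program: the function names are hypothetical, the choice of penalty set by whichever table supplied the higher value is an assumption, and the default RL40 and RL80 values are placeholders, since the Table III adjustments are not legible here.

```python
def combine_rl(rl_from_k, rl_from_dk, cap=10.0):
    """Higher of the K-based and dK-based predictions, limited to
    at most `cap` units above the K-based value."""
    return min(max(rl_from_k, rl_from_dk), rl_from_k + cap)


def tilt_penalty(source, tilt, k):
    """Absolute RL units to subtract for a tilted unknown.

    source: "K" or "dK", the table that supplied the value (assumption).
    tilt:   negative, zero, or positive tilt of the unknown.
    """
    if tilt == 0:
        return 0.0
    if source == "K":
        if tilt < 0:
            return 10.0 if k >= 80 else 20.0
        return 8.0
    return 7.0 if tilt < 0 else 3.0


def calibrate(rl, rl40, rl80):
    """Piecewise-linear map of predicted RL to actual reliability through
    the points 0-5 -> 0, 40 -> rl40, 80 -> rl80, 100 -> 100."""
    points = [(0.0, 0.0), (5.0, 0.0), (40.0, rl40), (80.0, rl80), (100.0, 100.0)]
    rl = min(max(rl, 0.0), 100.0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if rl <= x1:
            return y0 + (y1 - y0) * (rl - x0) / (x1 - x0)
    return 100.0


def predicted_reliability(k, rl_from_k, rl_from_dk, tilt=0, rl40=40.0, rl80=80.0):
    """Combine the table look-ups, apply the tilt penalty, then calibrate.
    rl40 and rl80 default to placeholder values (hypothetical)."""
    source = "K" if rl_from_k >= rl_from_dk else "dK"
    rl = combine_rl(rl_from_k, rl_from_dk) - tilt_penalty(source, tilt, k)
    return calibrate(rl, rl40, rl80)


if __name__ == "__main__":
    # K = 100, no flags, no molecular ion gives 83.0 in Table I;
    # the dK-based value of 75.0 is a hypothetical input.
    print(predicted_reliability(100, 83.0, 75.0))
```

Keeping the three steps as separate functions makes it easy to substitute the actual RL40 and RL80 values for a given degree of scaling once they are known.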

RESULTS AND DISCUSSION

PBM Performance before Modification. The recall/reliability performance of the subset of 450 "odd" unknowns agreed within ±2%, on average, with that of the full set of 900 unknowns using K values and class I matching criteria (13), justifying the use of the smaller subset of unknowns for testing further modifications. The recall/reliability performance found by Pesyna (7) using the same PBM algorithm with a data base only 57% as large, but using unknowns selected by average molecular weight, gave recall values which were on average 8% higher (13). Correcting errors, which reduced the 450 to 431 spectra, approximately halved this discrepancy in recall values. There also should be a larger possibility of error caused by increasing the proportion of wrong answers in the data base (15). The class I performance at low recall values was much lower with the larger data base, reflecting (13) an increased number of reference structures closely related to those of the unknowns. Class IV matching criteria are designed to be less sensitive to such deficiencies of mass spectrometry, and so are used here to be more sensitive to the

Figure 1. Effect of abundance based flagging on PBM performance (class IV, K values, class I recall): △, results with original PBM (7); ○, with abundance based flagging; -, with a minimum increment of p_min in each flagging (actually based on the difference between these and the previous conditions run with slightly modified parameters) (14). The separate data point values 20, 40, ..., 160 represent the minimum K values included in determining the recall/reliability values. [Plot not reproduced; horizontal axis: Recall, 0-100.]

performance of the retrieval algorithm itself.

Abundance-Based Flagging. Examination of condensed spectra of correct answers not retrieved using the original algorithm (7) showed that some contained more than three artifact peaks of low abundance which had been included in the condensed spectrum because they had high U values. An artifact peak produces an unusually low (or zero) value of p_min; if not all artifact peaks are eliminated after three flagging operations, the resulting low p_min value prevents the match of other reference peaks of higher abundance. The new method removes all low p value peaks of 4% abundance in the first flagging operation, those of 10% RL in the current PBM system. The original study (7) showed a substantial effect of the unknown's molecular weight on PBM performance; this should also be incorporated into the reliability ranking.

The effect of reliability ranking can be illustrated by the results from an actual unknown supplied by D. Henneberg as part of a test of different retrieval systems (10). Although the unknown spectrum was that of 1-dodecene, the first references retrieved by PBM without reliability ranking were those of undecanol and 1-tridecene, K = 99 (Table V). Their predicted class IV reliabilities are only 41% and 32%, lowered by the failure to match the molecular ion and by the three flagging operations. However, the unknown has a peak corresponding to the molecular ions of the 1-dodecene reference spectra, and for three of these fewer flagging operations were necessary. Thus the five best answers by reliability ranking are now correct.

Two-Level PBM. An extensive effort was made to use further data of the reference spectrum, not in the PBM condensed spectrum, to improve the ranking of the best matching compounds found in the original PBM search. A variety of ways of assigning new uniqueness values to the peaks of this small subset of spectra, followed by further PBM-type matching, were unsuccessful in improving the PBM performance; the additional structurally significant peaks were accompanied by an offsetting number of artifacts or misleading peaks (14).

Applicability of Reliability Ranking. These PBM improvements (except scaling) have been available for over 2 years on a commercial GC/MS system used in hundreds of laboratories. Experienced users have generally recognized these improvements as significant; some of their suggestions have resulted in further improvements to be reported separately. This concept has also been extended to the prediction of the presence of specific substructures, out of a list of 600, from unknown mass spectra using the "self-training interpretive and retrieval system" (24). For other types of libraries, retrieval systems having several parameters indicating the

degree of match should also benefit by combining these into a single value to provide a quantitative indication of the probability that the retrieved file actually represents the unknown.

Table V.
Column headings: confidence K (b); ΔK; class IV reliability, %; rank by RL value

21 29 23 23
41 32 87
87+
20 16
87 73 94
84*
23
55
81*** 80*
26 40
28 52 42 64
78** 42 77***+ 21
9 10 2 3 4
1 6 11
7 8 6

(a) The correct answer is in italics. The data base has 11 different C12H24 compounds. (b) The number of asterisks (*) indicates the number of flagging operations; plus (+) indicates that the molecular ion was matched.

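The recall/reliability measure used for the evaluations in this paper (the proportion of correct answers which are retrieved, against the proportion of retrieved answers which are correct, for a series of minimum K values) could be computed along the following lines. This is a minimal sketch under stated assumptions: the function name and the data layout are hypothetical, and the toy matches in the usage example are invented for illustration and are not data from this study.

```python
def recall_reliability(matches, n_possible, thresholds):
    """Recall and reliability (both in %) at each minimum score.

    matches:     list of (score, is_correct) pairs for retrieved references.
    n_possible:  total number of correct answers that could be retrieved.
    thresholds:  minimum scores (e.g. minimum K values) to evaluate.
    Returns a list of (threshold, recall, reliability); reliability is
    None when nothing is retrieved at that threshold.
    """
    curve = []
    for t in thresholds:
        retrieved = [ok for score, ok in matches if score >= t]
        n_correct = sum(retrieved)  # True counts as 1
        recall = 100.0 * n_correct / n_possible
        reliability = 100.0 * n_correct / len(retrieved) if retrieved else None
        curve.append((t, recall, reliability))
    return curve


if __name__ == "__main__":
    # Invented toy data: (K value, whether the retrieved reference is correct).
    toy = [(150, True), (120, True), (100, False),
           (90, True), (60, False), (40, False)]
    for point in recall_reliability(toy, n_possible=4, thresholds=[160, 100, 40]):
        print(point)
```

Lowering the minimum score raises recall at the cost of reliability, which is the trade-off a recall/reliability plot displays; ranking by a predicted reliability value instead of K only changes the score passed in.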
ACKNOWLEDGMENT

We are grateful for stimulating discussions with R. G. Dromey, R. G. Ellis, K. S. Haraki, I. K. Mun, J. W. Serum, and R. Venkataraghavan and to PBM users, too numerous to be named here, for valuable suggestions.

LITERATURE CITED

(1) Burlingame, A. L.; Whitney, J. O.; Russell, D. H. Anal. Chem. 1984, 56, 417R.
(2) Smith, D. H. "Computer-Assisted Structure Elucidation"; American Chemical Society: Washington, DC, 1977.
(3) Hertz, H. S.; Hites, R. A.; Biemann, K. Anal. Chem. 1971, 43, 681.
(4) McLafferty, F. W.; Hertel, R. H.; Villwock, R. D. Org. Mass Spectrom. 1974, 9, 690.
(5) Abramson, F. P. Anal. Chem. 1975, 47, 45.
(6) Gronneberg, T. O.; Gray, N. A. B.; Eglinton, G. Anal. Chem. 1975, 47, 415.
(7) Pesyna, G. M.; Venkataraghavan, R.; Dayringer, H. G.; McLafferty, F. W. Anal. Chem. 1976, 48, 1362.
(8) Dromey, R. G. Anal. Chim. Acta 1979, 112, 133.
(9) Atwater, B. L.; McLafferty, F. W. Anal. Chem. 1979, 51, 1945.
(10) Henneberg, D. Adv. Mass Spectrom. 1980, 8B, 1511.
(11) Domokos, L.; Henneberg, D.; Wiemann, B. Anal. Chim. Acta 1983, 150, 37.
(12) Cleij, P.; van't Klooster, H. A.; van Houwelingen, J. C. Anal. Chim. Acta 1983, 150, 23.
(13) Atwater, B. L. Ph.D. Thesis, Cornell University, 1980.
(14) Stauffer, D. B. Ph.D. Thesis, Cornell University, 1984.
(15) McLafferty, F. W.; Stauffer, D. B. Int. J. Mass Spectrom. Ion Proc. 1984, 58, 139.
(16) Shackelford, W. M.; Kline, D. M.; Faas, L.; Kurth, G. Anal. Chim. Acta 1983, 146, 15.
(17) Rosenthal, D. Anal. Chem. 1982, 54, 63.
(18) Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418.
(19) Milne, G. W. A.; Heller, S. R. J. Chem. Inf. Comput. Sci. 1980, 20, 204.
(20) Speck, D. D.; Venkataraghavan, R.; McLafferty, F. W. Org. Mass Spectrom. 1978, 13, 209-213.
(21) McLafferty, F. W. Anal. Chem. 1977, 49, 1441-1443.
(22) Salton, G. "Automatic Information Organization and Retrieval"; McGraw-Hill: New York, 1968.
(23) McLafferty, F. W. "Interpretation of Mass Spectra", 3rd ed.; University Science Books: Mill Valley, CA, 1980; pp 79-84.
(24) Haraki, K. S.; Venkataraghavan, R.; McLafferty, F. W. Anal. Chem. 1981, 53, 386-392.

RECEIVED for review August 8, 1984. Accepted December 26, 1984. This work was supported by the Industry/University Cooperative Research Program of the National Science Foundation, Grants CHE-7910400 and -8303340.