Computer prediction of molecular weights from mass spectra


Anal. Chem. 1981, 53, 179-182


In Ki Mun, Rengachari Venkataraghavan, and Fred W. McLafferty*
Department of Chemistry, Baker Laboratory, Cornell University, Ithaca, New York 14853

The best-matching spectra found by the self-training interpretive and retrieval system (STIRS) exhibit primary neutral loss data resembling those of the unknown spectrum; this information is utilized in a computer program which predicts the unknown’s molecular weight. Separate predictions are made assuming that the molecular ion is, or is not, present. The spectral data used by STIRS for the former include the corresponding primary neutral losses, while those for the latter include secondary neutral losses matched against reference spectra that do not contain molecular ions. Such data are also used to choose which set of predictions is the more reliable. For randomly selected unknowns (15% with no molecular ion) the first choice of molecular weight is correct in 91% of cases and the first or second in 95%.

Knowledge of the molecular weight of an unknown compound is of singular importance in determining its molecular structure. One of the most sensitive and accurate sources of this information, the electron-ionization mass spectrum, is seriously handicapped by the fact that molecular ion abundances are sometimes negligible (1); for 15% of the spectra in a large data base (2) the molecular-ion abundance is [...] 0.24% abundance separated by ≥2 daltons. The difference between the mass of the “high mass base peak” (HMBP), the most abundant peak in the 61 dalton range below MAPH, and the mass of each selected secondary-neutral-loss peak is used as the latter’s mass value; thus this can be positive, zero, or negative. The peaks are selected in the following order: HMBP; the most abundant peak in each cluster (other than that of HMBP), starting with MAPH; the next largest peak in each cluster with abundance ≥2% of HMBP; peaks ≥0.25% of the base peak, starting with the highest mass cluster. These secondary neutral loss data are designated as match factor 7A (MF7A); a second set of data, MF7B, involves the same peaks except that the relative mass value of each neutral lost is based on the second most abundant peak in the 61 dalton range below MAPH.

© 1981 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 53, NO. 2, FEBRUARY 1981

The match factor MF11.1 combining MF2 through MF4 was modified to include MF1:

$$\mathrm{MF11.1} = \frac{24 \times \mathrm{MF1} + \sum_i 3 \times N_i \times \mathrm{MF}_i}{24 + \sum_i 3 \times N_i}$$

where MF1 is the low ion series match factor, N_i is the number of entries in the unknown for the ith characteristic-ion data class, and MF_i is the match factor in the ith characteristic-ion data class. A new match factor MF11.3 combining MF11.1 and MF7 is

$$\mathrm{MF11.3} = \frac{6 \times \mathrm{MF11.1} + 3 \times N_7 \times \mathrm{MF7}}{6 + 3 \times N_7}$$

where MF7 is the larger of MF7A or MF7B, and N_7 is the number of entries in MF7.

The Molecular Weight Prediction Program. The STIRS program is run using the data of the unknown mass spectrum. For the molecular weight value needed to define the primary neutral losses (MF5 and MF6), the “principal peak” of the highest mass cluster found by the halogen program (19) is used if its confidence value is >39; otherwise, the mass of MAPH is used. Two sets of 15 best-matching compounds found by STIRS are examined separately: those retrieved by the overall match factor MF11.0, which combines MF1 through MF6, and those by MF11.3, which does not use the postulated molecular weight value (in the test runs with “unknown” spectra from the data base, the data from only 14 are used, omitting the first, which is the spectrum of the unknown itself). The most probable primary neutral losses of the unknown are predicted from each set of 15 best-matching compounds by tabulating their neutral losses as follows: losses of 0-53 from data class 5A, 54-75 from 5B, 76-109 from 6A, and 110-149 from 6B. These primary-neutral-loss occurrences from the MF11.0 spectra are compared against the average occurrences of the same data in the entire reference file for spectra which contain a molecular ion (as indicated by a loss of 0 in the MF5 data); the primary-neutral-loss occurrences from the MF11.3 data are compared against the average occurrences for reference spectra which do not contain the molecular ion.
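The match-factor combinations and the assignment of primary neutral losses to data classes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sum in MF11.1 is assumed to run over data classes 2 through 4, and the grouping of the numerator and denominator follows the parallel form of the MF11.3 expression rather than any listing of the original program.

```python
from typing import Mapping, Optional

# Mass ranges of primary neutral losses and the data class each is taken from.
LOSS_CLASSES = ((0, 53, "5A"), (54, 75, "5B"), (76, 109, "6A"), (110, 149, "6B"))


def mf11_1(mf1: float, mf: Mapping[int, float], n: Mapping[int, int]) -> float:
    """Weighted combination of MF1 with MF2-MF4 (assumed index range 2..4)."""
    classes = (2, 3, 4)
    weighted = sum(3 * n[i] * mf[i] for i in classes)
    weight = sum(3 * n[i] for i in classes)
    return (24 * mf1 + weighted) / (24 + weight)


def mf11_3(mf11_1_value: float, mf7a: float, mf7b: float, n7: int) -> float:
    """Combine MF11.1 with MF7, where MF7 is the larger of MF7A and MF7B."""
    mf7 = max(mf7a, mf7b)
    return (6 * mf11_1_value + 3 * n7 * mf7) / (6 + 3 * n7)


def loss_data_class(loss: int) -> Optional[str]:
    """Return the data class tabulated for a primary neutral loss, if any."""
    for low, high, name in LOSS_CLASSES:
        if low <= loss <= high:
            return name
    return None  # losses outside 0-149 are not tabulated
```

For instance, `loss_data_class(18)` returns `"5A"`, so a loss of water would be tabulated from data class 5A.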
The random drawing model used for predicting substructure presence by STIRS (14) is applied to predict the probability that the number of occurrences of a particular loss in the 15 spectra arises by chance; the negative base 2 log of this value is designated as the V value, following the reasoning used previously (14). Thus if there is only a 1% probability that the number of (M - 18)+ peaks found in the spectra of the 15 best-matching compounds occurred by chance (V = -log2 0.01 ≈ 6.6), there should be a 99% probability that this occurred because the unknown spectrum also has an (M - 18)+ peak. The sign of the V value, SV, can be positive or negative; SV = -1 shows that the number of occurrences in the top 15 compounds is less than that expected statistically, indicating that the unknown spectrum has no such neutral loss. As possible values for the molecular weight the program considers the mass of MAPH, and MAPH plus 1, 2, 15-20, and 26-100. Their probabilities are measured as a K value (20), calculated (eq 1) using the nonzero V values of each of the primary neutral losses n predicted by the STIRS results and the A value (20) of the corresponding peaks in the unknown spectrum which would result from such a loss if the true molecular weight had the value predicted.

$$K = \sum_n SV \times SA \times (|V| + A) \times FA \qquad (1)$$

The A value is defined by the peak abundance: 0.24-1.0%, A = 1; 1.0-3.4%, 2; 3.4-9%, 3; 9-19%, 4; 19-38%, 5; 38-73%, 6; and 73-100%, 7. If the abundance is [...] 100. The molecular weight program does not consider such answers, and modification of the program to do so would probably help only in special cases, based on the difficulties found by human interpreters.

Some mass spectra which do not contain molecular ions show almost no indication of the substructure(s) lost in forming the high mass peaks. Thus the spectra retrieved by STIRS for the unknown will not be of compounds containing that substructure(s), so that the primary neutral losses of the retrieved spectra will not be useful for molecular weight prediction.
For example, the mass spectrum of nitrocyclohexane exhibits a dominant (50%) peak for the loss of NO2, typical of such nitro compounds. However, this C6H11+ peak and most others in the spectrum appear to contain only C and H, providing no indication of the NO2 group. There is a small (1%) C6H11O+ (loss of NO) peak which was not selected by MF7 because other peaks of the m/z 93-99 cluster are larger (2-5%). It might be helpful to modify the MF7 peak selection procedure to choose first the highest mass nonisotopic peak in a cluster, in line with well-known peak “importance” rules (1).
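The quantities that enter the molecular weight scoring (the V value, its sign SV, the A value, and the candidate molecular weights relative to MAPH) can be illustrated with a minimal Python sketch. All function names here are hypothetical. The chance probability is taken as an input, because the random drawing model of ref 14 is not reproduced in this paper; treating each upper abundance limit as inclusive, and returning `None` below 0.24%, are assumptions. The K value of eq 1 is not computed, since SA and FA are not defined in the surviving text.

```python
import math
from bisect import bisect_left
from typing import List, Optional

# Upper abundance limits (%) of the seven A-value bins; the first bin starts at 0.24%.
A_UPPER_BOUNDS = [1.0, 3.4, 9.0, 19.0, 38.0, 73.0, 100.0]


def v_value(p_chance: float) -> float:
    """Negative base-2 log of the probability that the occurrences arise by chance."""
    if not 0.0 < p_chance <= 1.0:
        raise ValueError("p_chance must lie in (0, 1]")
    return -math.log2(p_chance)


def sign_of_v(observed: int, expected: float) -> int:
    """SV = -1 when fewer occurrences are found than expected statistically, else +1."""
    return -1 if observed < expected else 1


def a_value(abundance_percent: float) -> Optional[int]:
    """Map a peak abundance (% of base peak) to its A value, 1 through 7."""
    if abundance_percent > 100.0:
        raise ValueError("abundance cannot exceed 100%")
    if abundance_percent < 0.24:
        return None  # assumption: peaks below the 0.24% threshold receive no A value
    return bisect_left(A_UPPER_BOUNDS, abundance_percent) + 1


def candidate_molecular_weights(maph: int) -> List[int]:
    """MAPH itself, and MAPH plus 1, 2, 15-20, and 26-100."""
    offsets = [0, 1, 2, *range(15, 21), *range(26, 101)]
    return [maph + d for d in offsets]
```

With these definitions a chance probability of 1% gives a V value of about 6.64, a 50% peak such as the C6H11+ peak of nitrocyclohexane falls in the 38-73% bin (A = 6), and each unknown yields 84 candidate molecular weights.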

The program also has difficulty with mass spectra of compounds containing substructures which rarely occur in the data base. This is particularly critical for synergistic combinations of electronegative substructures; in the spectrum of citric acid the small high mass peaks correspond to losses of 35 (H2O + OH, 1%); 45 (COOH, 2%); 53 (2H2O + OH, 1%); 63 (H2O + COOH, 15%); 81 (2H2O + COOH, 20%); and 89 (CO2 + COOH, 40%). For this spectrum none of the best-matching compounds found by STIRS for MF7 contained more than one carboxyl group.

Applications of the Predicted Molecular Weight. STIRS predicts the rings-plus-double-bonds value (21) as well as the molecular weight; it appears possible to combine these with best-matching-compound data to predict the most probable elemental compositions for the unknown. The combination of this information and substructure predictions now possible with STIRS (14, 15) will be used as input to artificial intelligence programs such as CONGEN (22) to generate the possible molecular structures for the unknown which are consistent with these data.


ACKNOWLEDGMENT

H. E. Dayringer and K. S. Haraki provided helpful advice, and P. Bruck performed some initial experiments.

LITERATURE CITED

(1) McLafferty, F. W. “Interpretation of Mass Spectra”, 3rd ed.; University Science Books: Mill Valley, CA, 1980; Chapter 3.
(2) Stenhagen, E.; Abrahamsson, S.; McLafferty, F. W. “Registry of Mass Spectral Data”, extended version on magnetic tape; Wiley: New York, 1978.
(3) Mun, I. K. Ph.D. Thesis, Cornell University, 1980.
(4) Gray, N. A. B.; Carhart, R. E.; Levanchy, A.; Smith, D. H.; Varkony, T.; Buchanan, B. G.; White, W. C.; Creary, L. Anal. Chem. 1980, 52, 1095.
(5) Chapman, J. R. “Computers in Mass Spectrometry”; Academic Press: New York, 1978.
(6) Pesyna, G. M.; McLafferty, F. W. “Determination of Organic Structures by Physical Methods”; Nachod, Zuckerman, Randall, Eds.; Academic Press: New York, 1978; pp 91-155.
(7) Biemann, K.; McMurry, W. Tetrahedron Lett. 1965, 11, 647.
(8) Venkataraghavan, R.; McLafferty, F. W.; Van Lear, G. E. Org. Mass Spectrom. 1969, 2, 1.
(9) Reimendal, R.; Sjoevall, J. B. Anal. Chem. 1973, 45, 1063.
(10) O’Brien, J. F.; Morrison, J. D. Aust. J. Chem. 1973, 26, 785.
(11) Jardine, A.; Reed, R. I.; Silva, M. E. S. F. Org. Mass Spectrom. 1973, 7, [page number illegible].
(12) Dromey, R. G.; Buchanan, B. G.; Smith, D. H.; Lederberg, J.; Djerassi, C. J. Org. Chem. 1975, 40, 770.
(13) Kwok, K.-S.; Venkataraghavan, R.; McLafferty, F. W. J. Am. Chem. Soc. 1973, 95, 4185.
(14) Dayringer, H. E.; Pesyna, G. M.; Venkataraghavan, R.; McLafferty, F. W. Org. Mass Spectrom. 1976, 11, 529.
(15) Haraki, K. S. Ph.D. Thesis, Cornell University, May 1980.
(16) Office of Computer Services, Cornell University, available internationally over TYMNET and TELENET computer networks.
(17) Speck, D. D.; McLafferty, F. W.; Venkataraghavan, R. Org. Mass Spectrom. 1978, 13, 209.
(18) Knuth, D. E. “Structured Programming with GOTO Statements”; ACM Computing Surveys: 1974; Vol. 6.
(19) Mun, I. K.; Venkataraghavan, R.; McLafferty, F. W. Anal. Chem. 1977, 49, 1723-1726.
(20) Pesyna, G. M.; Venkataraghavan, R.; Dayringer, H. E.; McLafferty, F. W. Anal. Chem. 1976, 48, 1362-1368.
(21) Dayringer, H. E.; McLafferty, F. W. Org. Mass Spectrom. 1977, 12, 53-54.
(22) Smith, D. H.; Carhart, R. E. ACS Symp. Ser. 1978, No. 70, 325.


Received for review August 14, 1980. Accepted November 6, 1980. Support of this research by the National Institutes of Health (Grant GM16609) and the National Science Foundation (Grant CHE7910400) is gratefully acknowledged.