of the liquid scintillation spectrometer which employs integrated electronic circuitry, sophisticated logic, and carefully selected photomultiplier tubes, when coupled with machine computation for data handling, the argument for the advantage of a linear over a curvilinear correlation curve becomes a weak one. It has been suggested (7) that a properly shaped curvilinear correlation curve which yields constant relative sensitivity of channels ratio to counting efficiency over the full quenching range will suffer little loss in accuracy with increasing quenching and is preferable to a linear correlation curve. The method presented above can generate the desired curvilinear curves in a rational approach, whereas the conventional trial and error method cannot.

APPENDIX

Sa = Apparent specific count rate, i.e., the observed count rate per unit sample concentration in the presence of quenching
So = Specific count rate in the absence of quenching. With a nonquenched source such as toluene-3H or toluene-14C, the observed count rate of the sample is identical to So; whereas with a quenched source, So can only be obtained by extrapolation according to Equation 1. [Cf. Ref. (4)]
e = Base of natural logarithms
C = Sample concentration
C1/2 = Half-value concentration, i.e., the concentration of a quenched sample that will reduce the count rate to half of its initial value by quenching
q = Quenching constant. It equals 0.693/C1/2
Co = 1/q = 1.44C1/2. Co is the reciprocal quenching constant and is identical to Cmax given in Ref. (4), i.e., the point of sample concentration where the rate of increase of count rate balances that of quenching. (In this paper Co is used to designate the reciprocal quenching constant for the counting channel having the lowest discriminator level as base line.)
qi = Quenching constant measured at base line discriminator level i (i = 1, 2, 3, ...)
Si = Apparent specific count rate measured in integral mode of counting at base line discriminator level i (i = 1, 2, 3, ...)
S0i = Specific count rate in the absence of quenching measured in integral mode of counting at base line discriminator level i (i = 1, 2, 3, ...)

(7) B. E. Gordon, Shell Development Co., Emeryville, Calif., private communication, 1968.
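Equation 1 itself is not reproduced in this excerpt. From the definitions above, however, it presumably takes the exponential form below (our inference from q = 0.693/C1/2, not a quotation from the paper): halving the count rate for each additional C1/2 of sample concentration is exactly exponential decay in C.

```latex
% Inferred form of Equation 1 (assumption): exponential quenching,
% so that S_a falls to S_0/2 when C = C_{1/2}.
S_a = S_0 \, e^{-qC}, \qquad q = \frac{\ln 2}{C_{1/2}} = \frac{0.693}{C_{1/2}}
```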

RECEIVED for review May 17, 1966. Resubmitted May 20, 1968. Accepted September 26, 1968.

Computerized Learning Machines Applied to Chemical Problems: Molecular Formula Determination from Low Resolution Mass Spectrometry

P. C. Jurs, B. R. Kowalski, and T. L. Isenhour
Department of Chemistry, University of Washington, Seattle, Wash. 98105

A computerized learning machine has been applied to the chemical problem of determining molecular formulas from low resolution mass spectra. Learning involves applying feedback to a decision process to minimize the probability of undesired responses. The basic concepts of learning machine theory are presented, along with the computer programs which, using a training set of low resolution mass spectra, empirically develop pattern classifiers to correctly determine molecular formulas. Statistically varied spectra are used to demonstrate the high reliability of the system, and some possible applications of learning machines to other areas of chemical data interpretation are discussed.

DATA EVALUATION tasks normally considered to require human intelligence can be performed by computerized learning machines. In conjunction with computer controlled information handling systems which automatically perform routine duties, such intelligent machines can materially aid in dealing with the tremendous quantities of information produced by sophisticated scientific instrumentation. This paper deals with the development of such a computerized learning machine and its application to the interpretation of chemical data.

Scientific computer applications to date can be classified into four groups. First, computers can be used to solve problems which can, in theory, be solved by other means which are slower and perhaps more prone to error, i.e., X-ray crystallographic calculations. Second, computers can solve problems which are not solvable by slower means because of time limitations, i.e., spacecraft trajectory calculations. Third, computers can be used in the design, optimization, and simulation of experiments which would be more expensive or time consuming, or even impossible, to actually perform. Fourth, computers can learn to solve specific problems without recourse to theory.

(1) J. von Neumann, “The Computer and the Brain,” Yale University Press, New Haven, Conn., 1958.
(2) E. Feigenbaum and J. Feldman, Eds., “Computers and Thought,” McGraw-Hill Book Co., New York, N. Y., 1963.
(3) J. T. Tou and R. H. Wilcox, Eds., “Computer and Information Sciences,” Spartan Books, Washington, D. C., 1964.
(4) N. J. Nilsson, “Learning Machines,” McGraw-Hill Book Co., New York, N. Y., 1965.
(5) M. Minsky, Proc. IRE, 49, 8 (1961).
(6) M. Minsky, IRE Trans. Human Factors Electron., HFE-2, 39 (1961).
(7) P. L. Simmons, IRE Trans. Electron. Computers, EC-10, 462 (1961); EC-11, 535 (1962).
(8) R. J. Solomonoff, Proc. IEEE, 54, 1687 (1966).
(9) A. Newell, J. C. Shaw, and H. A. Simon, IBM J. Res. Develop., 2, 320 (1958).
(10) A. L. Samuel, ibid., 3, 211 (1959); 11, 601 (1967).
(11) W. H. Highleyman, Proc. IRE, 50, 1501 (1962).


Starting in the late 1940's a great many books, papers, and conference reports have dealt with the various phases of the theory, design, development, and use of learning machines (1-15). Such studies have been the province of applied mathematicians, statisticians, computer-oriented engineers, and others in several disciplines investigating biological behavior on the neural level. Work has also been done with game-playing machines (9, 10), pattern recognition machines (11-13), and the simulation of biological systems (14). This paper examines some possible applications of learning machines to the data of physical science and describes a specific application to a chemical problem.

Trainable pattern classifiers or learning machines may be considered intelligent in that they can learn to select an appropriate response upon receipt of a wide range of stimuli. Such learning is accomplished by selectively applying feedback to the decision process so as to minimize the probability of undesired responses and maximize the probability of desired responses. The learning capabilities of trainable pattern classifiers depend strongly upon the wise selection of responses to be considered correct or acceptable. Operationally, such a machine learns through the following process. A stimulus in the form of a pattern is presented and the machine is asked to categorize it among the allowed responses. If the desired response results, no action is taken because the decision process has correctly solved the problem. If the response is an undesired one, feedback alters the decision process by reducing the probability of again producing that erroneous result upon receipt of that stimulus; additionally, the probability of producing the desired result may be increased. A number of sample problems, called the training set, are presented repeatedly as long as the machine improves its performance at a rate which justifies the time and effort expended. The machine thereby learns to recognize features in the patterns which are significant in determining the category of the pattern.

For evaluating data, it is important to recognize three properties of learning machines. First, because of their ability to consider more pieces of data than can a human interpreter, learning machines can be used to investigate complex, obscure, and overdetermined data for features which may be linked to underlying causes. Second, because learning machines have adjustable memory time constants, they can select their rate of forgetting to advantage. Additionally, the two modes of forgetting (purging information from the memory, and temporarily ignoring unneeded information) can be used at will. Third, learning machines work on the literal level, using only the directly applicable data in solving problems. (To date they have no capacity to formulate intuitive hypotheses.) This characteristic is simultaneously a strength and a weakness. On the one hand, because learning machines are completely unprejudiced, they are free to draw what conclusions the data warrant and are uninfluenced by various scientific schools of thought. On the other hand, they generally cannot bring scientific intuition to bear on a particular problem. However, because intuition is apparently a learning process whose workings are not well understood, it would seem that in the future, layered networks of learning machines may overcome this handicap by learning general methods of problem solving.

[Figure 1. Schematic representation of binary pattern classifier: the pattern components are multiplied by weights and summed (Σ), and a threshold discriminator assigns the category according to the sign of the sum.]

MATHEMATICAL FOUNDATIONS OF LEARNING MACHINES

The foundations of learning machine theory can be stated concisely in mathematical terms. Many approaches to the theory of learning machines have been made (4, 11, 12, 16); the following treatment is based on geometrical concepts and derives some terminology from Nilsson's monograph on learning machines (4). A pattern containing d elements, x1, x2, ..., xd, can be considered as a vector X in d-dimensional pattern space E^d which extends from the origin to the point (x1, x2, ..., xd). A binary pattern classifier, or pattern dichotomizer, maps pattern points into two sets separated by a decision surface cutting E^d into half-spaces. This decision surface can be defined by a function f(X) which is a scalar and single-valued function of the pattern X. A linear discriminant function is of the form

s = f(X) = w1x1 + w2x2 + ... + wdxd + wd+1    (1)

where f(X) > 0 corresponds to one category and f(X) < 0 corresponds to the other. The decision surface is represented by f(X) = 0. Thus, a binary pattern classifier assigns patterns to one of two categories according to which side of the hyperplane decision surface in pattern space they fall. A schematic representation of a binary pattern classifier is shown in Figure 1. The hyperplane decision surface has its orientation determined by the parameters w1, w2, ..., wd and its position determined by wd+1.

In order to have a useful pattern classifier, it must be trained. Training consists of allowing the classifier to find a set of weights w1, w2, ..., wd, wd+1 such that the training set of patterns is dichotomized as desired. Training occurs in weight space, a specific case of multidimensional space. A pattern classifier with d + 1 weights can be represented by a point in (d + 1)-dimensional weight space, or a (d + 1)-dimensional vector W. A (d + 1)st component whose value is always one is added to the original pattern vector X, and this new augmented pattern vector is named Y. There is then a hyperplane in weight space defined by

Y · W = 0    (2)

called the pattern hyperplane, which divides weight space into two parts.

(12) H. D. Block, N. J. Nilsson, and R. O. Duda, “Determination and Detection of Features in Patterns,” in “Computer and Information Sciences,” J. T. Tou and R. H. Wilcox, Eds., Spartan Books, Washington, D. C., 1964, p 75.
(13) R. Casey and G. Nagy, IEEE Trans. Electron. Computers, EC-15, 91 (1966).
(14) F. Rosenblatt, “A Model for Experimental Storage in Neural Networks,” in “Computer and Information Sciences,” J. T. Tou and R. H. Wilcox, Eds., Spartan Books, Washington, D. C., 1964, p 16.
(15) M. A. Sass and W. D. Wilkinson, “Symposium on Computer Augmentation of Human Reasoning,” Spartan Books, Washington, D. C., 1965.
(16) D. B. Brick and J. Owen, “A Mathematical Approach to Pattern Recognition and Self-Organization,” in “Computer and Information Sciences,” J. T. Tou and R. H. Wilcox, Eds., Spartan Books, Washington, D. C., 1964, p 139.

If the categories of interest are linearly separable, then a weight vector W exists such that

Y · W > 0 for all patterns Y in one category
Y · W < 0 for all patterns Y in the other

For the example spectrum of Table III, multiplication of vector 1 with the third column (the adjusted relative intensities) yields a scalar s > 0; therefore vector 5 is the next to be used. Multiplication of vector 5 with the third column yields s < 0, so vector 6 is used next. The required multiplication is performed and a scalar s > 0 results. The carbon number is thus determined to be 6. The calculation proceeds as indicated at the bottom of Table III and the entire molecular formula is determined as C6H12O2. The spectrum used was from the original data pool and is methyl n-pentanoate; the molecular formula was correctly determined in ~0.05 second.

The successful determination of molecular formulas from low resolution mass spectra demonstrates the feasibility of applying learning machines to the evaluation of chemical data. This method of data interpretation can be implemented on a routine basis in any laboratory performing many mass spectrometric analyses. The necessary computer programs could be set up to accept new data continuously as they are developed and also to do automatic determinations on unknown spectra. Slight modifications of the procedures presented here would make this method of data evaluation very valuable in numerous industrial situations. Alternatively, the vectors developed during this study could be used by a laboratory whose equipment and data handling procedures were compatible with those of the American Petroleum Institute Research Project 44 participants. Computer programs for training the vectors and computing molecular formulas are available upon request.
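The decision and training machinery described in this section reduces to a few lines of code. The Python sketch below is illustrative, not the paper's program: the error-correction step (moving W by a multiple c of the misclassified augmented pattern) follows the standard procedure in Nilsson's monograph (4), from which this treatment takes its terminology; the function names and the parameters c and max_passes are ours.

```python
import numpy as np

def scalar(weights, pattern):
    """Compute s = w1*x1 + ... + wd*xd + w(d+1) (Equation 1): the dot
    product of W with the pattern augmented by a constant component of 1."""
    y = np.append(pattern, 1.0)          # augmented pattern vector Y
    return float(np.dot(weights, y))

def train_weights(patterns, categories, c=1.0, max_passes=100):
    """Error-correction training: leave W alone on a correct decision and
    move it toward the misclassified pattern's correct half-space on an
    error. Terminates when the training set is dichotomized as desired."""
    d = len(patterns[0])
    w = np.zeros(d + 1)                  # a point in (d+1)-dimensional weight space
    for _ in range(max_passes):
        errors = 0
        for x, cat in zip(patterns, categories):    # cat = +1 or -1
            if np.sign(scalar(w, x)) != cat:        # undesired response:
                w += c * cat * np.append(x, 1.0)    # feedback on the weights
                errors += 1
        if errors == 0:                  # linearly separable set: done
            break
    return w
```

Eleven vectors trained this way, arranged as in Table III, determine a formula by the sign-directed walk shown there.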

POTENTIAL APPLICATIONS OF LEARNING MACHINES

All data reduction devices may be regarded as interpreters which transform data into a more understandable form. The special feature of learning machines is that they display intelligence by becoming progressively better interpreters as their experience increases. The learning machine developed during this study reads mass spectra and translates what it reads into terms chemists can understand and relate to the characteristics of the compounds which produced the spectra. It begins ignorant of any relationship between the data and the desired answers and develops its own relationship without theoretical bases. It should be noted that this amounts to bridging the gap between experimental data and fundamental descriptions without stepwise translations, which can increase the noise level of data markedly.

(18) A. L. Crittenden, University of Washington, personal communication, 1968.

Table I. Examples of Trained Vectors

[The table lists, for each mass position from 12 through 132, the corresponding components of the trained oxygen and nitrogen weight vectors. The individual values are garbled beyond reliable recovery in this copy and are omitted.]

Table II. Errors in Statistically Varied Data

         Number incorrect    Per cent       Error occurrence
σ (%)    out of 1000         correct      C     H     O     N
  1            0              100.0       0     0     0     0
  5            7               99.3       2     4     0     1
 10           24               97.6      12     9     2     1
 15           30               97.0      13    10     2     6
 20           42               95.8      19    16     4     5
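Table II reports how accuracy degrades when the spectra are statistically varied. The variation procedure itself lies in a portion of the text not reproduced here; the harness below assumes (our assumption) a normally distributed relative error of standard deviation σ applied to each peak, and the function names are ours.

```python
import numpy as np

def perturb(spectrum, sigma_percent, rng):
    """Apply a normally distributed error with standard deviation equal to
    sigma_percent of each peak intensity (assumed perturbation scheme)."""
    spectrum = np.asarray(spectrum, dtype=float)
    noise = rng.normal(0.0, sigma_percent / 100.0, size=spectrum.shape)
    return np.clip(spectrum * (1.0 + noise), 0.0, None)   # keep intensities >= 0

def percent_correct(classify, spectra, formulas, sigma_percent, trials=10, seed=0):
    """Re-classify noisy copies of each spectrum and tabulate the success
    rate, reproducing the kind of tally reported in Table II."""
    rng = np.random.default_rng(seed)
    correct = 0
    for spectrum, formula in zip(spectra, formulas):
        for _ in range(trials):
            if classify(perturb(spectrum, sigma_percent, rng)) == formula:
                correct += 1
    return 100.0 * correct / (len(spectra) * trials)
```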

Table III. Molecular Formula Determination

Mass number   Relative intensity   Adjusted relative intensity
     15               12                      34
     27                6                      24
     28                3                      17
     29               28                      52
     30                1                      10
     31                5                      22
     32                3                      17
     39               10                      31
     40                2                      14
     41               38                      61
     42                3                      17
     43                2                      14
     55                3                      17
     56               25                      50
     57              100                     100
     58                4                      20
     59                7                      26
     69                3                      17
     73                8                      28
     85                7                      26
    101                3                      17
    116                4                      20
    132                1                      10

[The original table also lists the components of trained vectors 1, 5, 6, 7, 15, 16, 18, 22, 23, 24, and 25 at each mass position; those columns are garbled beyond reliable recovery in this copy.]

Vector No.     Scalar     Conclusion
     1         115748     Use vector 5
     5          -5320     Use vector 6
     6          41694     Carbon No. is 6
     7         109820     Use vector 15
    15         -13020     Use vector 16
    16          66185     Use vector 18
    18          15850     Hydrogen No. is 12
    22          24128     Use vector 23
    23           9826     Use vector 24
    24         -21037     Oxygen No. is 2
    25         -24302     No nitrogen present

Molecular formula: C6H12O2N0
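Table III's Conclusion column fully determines the walk for this spectrum. (Incidentally, the adjusted relative intensities appear to be ten times the square root of the relative intensities, truncated to integers.) The sketch below encodes only the branch outcomes recoverable from that column; the OUTCOME map and function names are ours, not the paper's program, and branches not exercised by this example are unknown.

```python
# Decision walk for the Table III example. Each entry maps a vector number
# to its outcome for this spectrum: either the next vector to apply or a
# final (element, count) pair.
OUTCOME = {
    1: 5, 5: 6, 6: ("C", 6),                # carbon branch: vectors 1 -> 5 -> 6
    7: 15, 15: 16, 16: 18, 18: ("H", 12),   # hydrogen branch
    22: 23, 23: 24, 24: ("O", 2),           # oxygen branch
    25: ("N", 0),                           # nitrogen branch
}

def walk(start):
    """Follow OUTCOME from a starting vector until an element count is reached."""
    node = start
    while not isinstance(OUTCOME[node], tuple):
        node = OUTCOME[node]
    return OUTCOME[node]

counts = dict(walk(s) for s in (1, 7, 22, 25))
formula = "C{C}H{H}O{O}N{N}".format(**counts)
print(formula)   # -> C6H12O2N0, methyl n-pentanoate
```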

If the learning machine can deduce molecular formulas from the spectra, it should also be able to deduce such characteristics as bond types, structure, bond strengths, cracking patterns, and any other characteristics of the compound which affect the mass spectrum.

In general, trainable pattern classifiers or learning machines can be used to classify any set of patterns whose components depend upon the characteristics used to label the classification categories. This is precisely a description of the type of data one tries to acquire in the physical sciences. Whether the learning machine approach should be used for the interpretation of a particular type of data depends on the comparative costs relative to alternative methods. This will depend to a large extent on whether analytical functions are known which adequately describe the stepwise translations. If they are not known, then the direct route from the data to the answer may be preferable to several intermediate numerical steps. Thus learning machine evaluation and interpretation of data should prove advantageous in a great many areas including the sciences.

Specific examples of areas suitable for learning machine data evaluation abound in the physical sciences. The following are standard experimental techniques to which the application of learning machines might considerably increase the available information: emission spectrometry contains much information, but the spectra are hard to interpret because of overdetermination; the long wavelength regions of infrared absorption spectrometry are known to contain information about bond types and strengths, but interpretation to date has been quite difficult; X-ray fluorescence studies are again overdetermined and contain more bonding information than is presently utilized; the further resolution and interpretation of complex X-ray and gamma-ray spectra would enhance nuclear techniques; other instrumental data sources such as NMR, ESR, etc., could use the most sophisticated data evaluation methods available. Hence, for scientific data evaluation, as well as in other areas of information handling, the learning machine approach should prove to be an important and powerful technique.

ACKNOWLEDGMENT

Acknowledgment is gratefully given to W. S. Chilton for his helpful suggestions regarding this work.

RECEIVED for review June 6, 1968. Accepted October 3, 1968.

Determination by Neutron Activation Analysis of the Burn-Up Indicator Neodymium-148 in Irradiated Uranium Dioxide-Plutonium Dioxide

M. R. Monsecour and A. C. Demildt
Studiecentrum voor Kernenergie, Centre d'Etude de l'Energie Nucléaire, Mol, Belgium

A radiochemical method has been developed for the isolation of Nd from irradiated UO2-PuO2 fuel elements and measurement of 148Nd by neutron activation analysis. The method is based on separation by reversed phase chromatography, with di(2-ethylhexyl)orthophosphoric acid (D2EHPA) as the stationary phase, and anion exchange on Dowex 1-X4 columns mixed with PbO2. The chemical yield, after the subsequent steps, is determined by adding 147Nd tracer at the start of the separation procedure. The 148Nd(n,γ)149Nd neutron activation reaction carried out in the BR 2 reactor is used for the radiochemical determination.

To MEASURE the burn-up of a reactor fuel element, it is sufficient to determine one or more fission product nuclides whose concentrations at every point in the fuel are directly proportional to the number of fissions. In fuel elements working at relatively low temperatures (500 °C and lower) the concentrations of 137Cs and […] are proportional to the fission density, and these nuclides may be used as burn-up indicators. However, in high temperature fast reactors the longitudinal thermal diffusion of these nuclides may become important when long residence times in the reactors are envisaged. The ideal fission indicators are the stable isotopes of the more refractory elements, provided they do not have very high burn-out cross sections and are not formed simultaneously by neutron capture from the lower chains. This kind of element provides the additional advantage of avoiding all absolute γ- or β-ray measurements, which, on a routine basis, are difficult to perform with sufficient accuracy. For these reasons certain isotopes of neodymium and molybdenum have been proposed (1-3). The nuclide 148Nd is particularly useful as a fission indicator because of its almost identical fission yield for 235U and 239Pu (1.7%).

(1) J. E. Rein and B. F. Rider, U. S. At. Energy Comm. Rept., TID-17,385 (1963).
(2) W. J. Maeck and J. E. Rein, ibid., IDO-14,656 (1965).
(3) B. F. Rider, ibid., GEAP-4053-2, 2nd quarterly report (1962).
(4) W. J. McGonnagle, “Nuclear Materials Management,” Proceedings of a Symposium, Intern. At. Energy Agency, Vienna, STI/PUB/110, 851 (1966).

Although McGonnagle (4) proposed and Rider (5) mentioned the possibility of neutron activation analysis, no experimental data were available; therefore, such a method was developed.

The composition of irradiated nuclear fuels is a complex mixture of elements. Apart from U, Pu, and certain transplutonium nuclides, all radioactive and stable fission products are present, as well as cladding constituents such as stainless steel and zircaloy. The nuclear data of interest in the 148Nd determination by neutron activation are listed in Table I. The amounts of 148Nd and 149Nd resulting from (n,f) reactions in the fuel should be several orders of magnitude larger than the quantity originating from neodymium present in the fuel as impurity. The expected 149Nd activities in fuel materials are listed in Table II.

Because one megawatt-day per ton (MWd/t) equals 2.695 × 10^15 fissions gram^-1, or 1.126 × 10^-2 μg 148Nd gram^-1 fuel, it is easily seen that a depleted reactor fuel which dissipated 10,000 MWd/t contains 112.6 μg 148Nd gram^-1. During a neutron activation of 5 minutes this amount forms 5.1 × 10^10 atoms 149Nd gram^-1 if a neutron flux (φ) of 10^14 n cm^-2 sec^-1 is available. Comparing this figure with the 149Nd produced by 235U and 239Pu fission during the activation, it is easily seen that a decontamination of the order of 10' is required if the fission contribution during the activation analysis is to be kept below 0.1%. A chemical procedure had to be developed to separate sufficiently pure neodymium from the depleted fuel material. The required decontamination factors for uranium and plutonium are so severe that a quantitative recovery of neodymium cannot be achieved. A 147Nd tracer is used to determine the over-all chemical yield. Use of this tracer imposes two limitations: the fuel must be cooled long enough to get rid of all the 147Nd resulting from fission, and the tracer used must be sufficiently free from 148Nd so that this addition may be neglected in comparison with the 148Nd formed during the burn-up. For a natural UO2-1% PuO2 mixture, for example,

(5) B. F. Rider, J. P. Peterson Jr., C. P. Ruiz, and F. R. Smith, U. S. At. Energy Comm. Rept., GEAP-4621, 10th quarterly report (1964).
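The unit arithmetic in the passage above can be checked directly. The sketch below is our own verification, not part of the paper; the 200 MeV of recoverable energy per fission is an assumed round value, and the 1.7% 148Nd fission yield is taken from the text above.

```python
# Check of the burn-up arithmetic quoted above.
AVOGADRO = 6.022e23             # atoms/mol
J_PER_MEV = 1.602e-13
E_FISSION_J = 200 * J_PER_MEV   # ~200 MeV recoverable per fission (assumed)
MWD_J = 1e6 * 86400             # joules in one megawatt-day

fissions_per_g = MWD_J / 1e6 / E_FISSION_J            # 1 MWd/t spread over 1e6 g
print(f"{fissions_per_g:.3e} fissions/g per MWd/t")   # ~2.70e15, as quoted

yield_148 = 0.017               # 148Nd fission yield for 235U and 239Pu
ug_148_per_g = fissions_per_g * yield_148 * 148 / AVOGADRO * 1e6
print(f"{ug_148_per_g:.3e} ug 148Nd/g per MWd/t")     # ~1.13e-2, as quoted

burnup = 10_000                 # MWd/t
print(f"{burnup * ug_148_per_g:.1f} ug 148Nd/g")      # ~112.6, as quoted
```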