Chemical applications of machine intelligence

Some Chemical Applications of Machine Intelligence

THOMAS L. ISENHOUR
Department of Chemistry, University of North Carolina, Chapel Hill, N.C. 27514

PETER C. JURS
Department of Chemistry, The Pennsylvania State University, University Park, Pa. 16802

The interpretation of experimental data and the corresponding establishment of cause and effect relationships are essential aspects of experimental chemistry. In general, the investigator has data which he wishes to place into certain categories. For example, infrared spectra can be used to place compounds into categories defined by functional groups, or pKa values can be used to define the degradation products of certain protein reactions. Placing data into specific categories, then, is often the basis of interpretation of experimental results.

Two approaches can be used to relate data to categories: theoretical or empirical. Theoretical data interpretation is usually preferred because it is based on explicit causal relationships derived from earlier observations or from logically constructed models. That is, scientists normally prefer interpretations based on theory because they feel they understand the measurement process in some or even all aspects. However, not even the most ardent theoretician would be likely to attempt the interpretation of the dc arc emission spectra of an iron alloy starting from first principles. Empirical methods are, however, readily applied in many common analytical situations; and, most frequently, some combination of the theoretical and empirical approaches is used. For example, while most scientists are satisfied with current theories of light absorption by molecules, it is standard procedure to measure the spectrum of a new compound and select a desirable absorption wavelength empirically in order to develop a colorimetric method.

The learning machine method, presented here, is a totally empirical method of data interpretation. The sole assumption is that a relationship between the data and the defined categories exists, i.e., the experiment measured something related to the property of interest. Even this assumption will be investigated by the empirical method itself. Hence, the learning machine method does not depend upon established theory and, while it is disadvantageous in that accepted hypotheses may not be used, it is simultaneously advantageous in that interpretation will not be restricted to current accepted schools of thought.

The term “learning” used in this

ANALYTICAL CHEMISTRY, VOL. 43, NO. 10, AUGUST 1971

context refers to a decision process which improves performance of a task as its experience at performing the task increases. The application of negative feedback causes the decision process to be modified to discriminate against wrong answers, therefore improving its performance with time. In general, empirical relationships are established between available inputs and desired outputs. In this article the inputs will be chemical measurements and the outputs will be the previously mentioned data categories.

Pattern Recognition

Starting in the late 1940's, a great many books, papers, and conference reports have dealt with the various phases of the theory, design, development, and use of learning machines (1-13). Such studies have been the province of applied mathematicians, statisticians, computer-oriented engineers, and others in several disciplines investigating biological behavior on the neural level. A recent review by Nagy (14) demonstrates the amorphous nature of the subject. Applications have appeared in such divergent scientific areas as character recognition (alphabetic and numeric),

REPORT FOR ANALYTICAL CHEMISTS

The learning machine method, a totally empirical method of data interpretation, is based on a system whereby a decision process improves its performance as its experience at performing the task increases. Negative feedback causes the decision process to be modified to discriminate against wrong answers, and thus its performance improves with time. One can envision the centralization of such calculations relating to chemical measurements so that small machines in individual laboratories could make decisions on data. Other possibilities, likely and bizarre, can be considered in artificial intelligence

particle tracking (cloud, bubble, spark), fingerprint identification, speech analysis, weather prediction, medical diagnosis, and photographic processing (cell images and aerial photography). Recently, chemical applications have started to appear in a number of areas of spectroscopy (15-29).

The pattern recognition process will be described as four stages: measurement, feature selection and preprocessing, discriminant training, and generalization.

Measurement. The measurement process is generally not a problem in chemical applications. Indeed it has been said that the modern problems of data interpretation have been generated by the incredible rate at which modern instruments can produce data. The quality of data is generally excellent in the physical sciences, and, in most cases, meaningful limits can be placed on accuracy and precision, and experiments can be repeated to check reproducibility.

Data to be used in pattern recognition studies are represented as vectors $X = (x_1, x_2, \ldots, x_d)$. The utility of this representation is demonstrated by the following example. Figure 1 shows a two-dimensional plot of the melting points and boiling points of several organic compounds. A's represent organic acids and K's represent ketones. Note that each point in the two-dimensional space completely defines the two pieces of information, and, furthermore, the points could be represented as two-dimensional vectors from the origin. It is clear that acids are high boiling and high melting, while the ketones are low boiling and low melting. Hence, from this figure an investigator who knew nothing of chemistry would immediately recognize that the acids and ketones cluster on this plot, and furthermore, that the experimental data available suggest a good method of distinguishing between the two categories, acids and ketones. Figure 2 makes clear the notion of linear separability, meaning the pattern points can be placed into their two classes by a linear decision surface (a line for this two-dimensional case).

Many types of data can be represented in vector form by providing a sufficient number of dimensions. Chemical spectroscopy data, for example, is usually a spectrum of intensities vs. frequency or wavelength. (Mass spectrometry is a notable exception where the abscissa is mass-to-charge ratio.) If the abscissa is quantized, the number of dimensions can simply be the range of the abscissa divided by the resolution. For example, an infrared spectrum recorded from 2.0 to 14.9 μm with data collected every 0.1 μm can be represented by a 130-component vector or a single point in 130-dimensional space without information loss. Approximately 150 dimensions are sufficient to represent low resolution mass spectra of simple organic compounds with up to 10 carbons. The learning machine method is one approach to finding decision surfaces in such a multidimensional space where the experimenter can no longer plot the data and simply look for clusters or other trends.

Feature Selection and Preprocessing (Transformation). The basic overall objective of the pattern recognition method is to classify the patterns into the desired categories. Preprocessing of the data includes algebraic transformations such as the extraction of roots and taking of logarithms, feature selection, or changes in variables through transforms such as the Fourier transform. Such preprocessing can be useful for


Figure 1. Melting and boiling points of organic acids and ketones plotted in two-dimensional space

Figure 2. A linear decision surface for organic acids and ketones
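The vector representation of a spectrum described in the text can be sketched in Python. This is a minimal illustration, not the authors' procedure: the function names `n_components` and `to_pattern_vector` are hypothetical, the inclusive point count is an assumption chosen because it reproduces the 130-component figure quoted for a 2.0 to 14.9 μm spectrum sampled every 0.1 μm, and the use of `log1p` (so that zero intensities stay finite) is likewise an assumption.

```python
import math


def n_components(start, stop, step):
    """Number of sampled abscissa points from start to stop, inclusive.

    Assumption: both end points are recorded, so the count is
    (range / resolution) + 1.
    """
    return round((stop - start) / step) + 1


def to_pattern_vector(intensities, transform=None):
    """Turn a list of intensities into a pattern vector X.

    transform is None (raw data), "sqrt" (extraction of roots) or
    "log" (logarithms; log1p is used here so zero intensities are allowed).
    """
    if transform is None:
        return list(intensities)
    if transform == "sqrt":
        return [math.sqrt(v) for v in intensities]
    if transform == "log":
        return [math.log1p(v) for v in intensities]
    raise ValueError("unknown transform: " + str(transform))


# Infrared example from the text: 2.0 to 14.9 micrometers, every 0.1 micrometer.
d = n_components(2.0, 14.9, 0.1)
print(d)  # 130
```

Each spectrum then becomes one point in a d-dimensional pattern space, and a transform is applied component by component before any decision surface is sought.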

the following two reasons. First, some transformations can spread the clusters of the patterns in the two categories further apart in the pattern space, making discrimination easier. Second, preprocessing can reduce the dimensionality of the pattern space, either by discarding dimensions deemed expendable or by combining dimensions (possibly in very complex ways (23)), generating cross terms (26), and using Fourier transforms (25).

Decision Development (Discriminant Training). The widely accepted optimum method for making pattern classification decisions, known as Bayes strategies, depends on having the probability density functions for the classes. Suppose that it is desired to classify patterns represented by d-dimensional vectors $X = (x_1, x_2, \ldots, x_d)$ into one of two possible categories. Let $F_1(X)$ and $F_2(X)$ be the probability density functions for the two categories, let $L_1$ and $L_2$ be the losses associated with misclassifying a member of category one or category two, and let $P_1$ and $P_2 = 1 - P_1$ be the a priori probabilities of occurrence of patterns in categories one and two. Then it can be shown (2) that the Bayes strategy says to make the decisions as follows:

If $P_1 L_1 F_1(X) > P_2 L_2 F_2(X)$, then classify X in category 1
If $P_2 L_2 F_2(X) > P_1 L_1 F_1(X)$, then classify X in category 2    (1)

This procedure can be generalized to allow decisions among more than two classes.

To use the optimum Bayes strategy, the probability density functions, loss functions, and a priori probabilities of each class must be either known or estimated. If the distribution is not known, or cannot be approximated accurately, then one must either estimate the distribution and proceed accordingly or apply some nonparametric method. For the data produced by most chemical experimentation, systems are so complex that rarely is the distribution function known or easily estimable. In most chemical experimentation, particularly that of chemical analysis, it is rare that any appreciable fraction of the universal set of data is collected under controlled conditions. In mass spectrometry, one of the areas where considerable attention has been paid to the collection of data, large files typically contain several thousand entries, far short of the more than a million known chemical compounds, and even smaller in comparison to the imaginable number of compounds. Hence, at most times we are working with a very small subset of the universal set. Any direct assumption of the universal set from such a small subset could be misleading. For these reasons we resort to an empirical method for developing decision-makers.

The principal decision process to be described here is the threshold logic unit (TLU) (5). We will be concerned with TLU's which are binary pattern classifiers capable of placing a pattern in one of two categories. (This can, however, be made into a complete solution because a series of binary pattern classifiers may be used to subdivide data to any desired degree.) The original data pattern is denoted by the vector X. The TLU implements a plane of the same dimensionality as the patterns which will separate the data into the desired two classes. The two-dimensional data shown in Figure 1 may be separated into the desired categories by any of a family of straight lines (planes in two dimensions), one of which is shown in Figure 2. To cause the decision plane to pass through the origin, an extra degree of freedom is added by augmenting the original d-dimensional pattern vector X by a (d + 1)st dimension (which has the same value for every pattern) to give a new vector, Y. Usually an arbitrary value of 1 is given for the d + 1 component of each pattern. [This value can, however, have some effect on the development of decision-makers (21), although it does not affect the separability of pattern sets.] Hence

$X = (x_1, x_2, \ldots, x_d)$ and $Y = (y_1, y_2, \ldots, y_d, y_{d+1})$    (2)

Figure 3 shows the effect for the two-dimensional case given in Figure 2. Now a three-dimensional plane which passes through the origin may be used to separate the pattern sets.

A convenient way to determine whether a point lies on one side of the plane or the other is to use a vector normal to the plane at the origin. This vector (called a weight vector, W) may be thought of as defining the locus of points which constitutes the plane separating the data classes (Figure 3). Because W is perpendicular to the plane, the dot product of W with any pattern vector (Y) will determine whether the pattern lies on one side or the other of the plane.

$s = W \cdot Y = |W|\,|Y| \cos\theta$    (3)

where $\theta$ is the angle between the two vectors. $|W|$ and $|Y|$ are always positive, and thus

$-90^\circ < \theta < 90^\circ$: $\cos\theta > 0$ and $s > 0$
$90^\circ < \theta < 270^\circ$: $\cos\theta < 0$ and $s < 0$

Figure 3. Addition of a (d + 1)st dimension so that the decision plane passes through the origin
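The augmentation of a pattern vector and the sign test on the dot product with a weight vector can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: the helper names `augment`, `dot`, and `classify` are hypothetical, the weight vector `W` was picked by hand for the example, the (melting point, boiling point) pairs in °C are approximate values chosen for illustration and are not the data plotted in Figure 1, and assigning s > 0 to category 1 is an arbitrary convention.

```python
def augment(x, extra=1.0):
    """Form Y from X by appending a (d + 1)st component, the same for every pattern."""
    return list(x) + [extra]


def dot(w, y):
    """Dot product s = W . Y of two equal-length vectors."""
    return sum(wi * yi for wi, yi in zip(w, y))


def classify(w, x):
    """Threshold logic unit: category 1 if s > 0, otherwise category 2.

    Assumption: a pattern lying exactly on the plane (s == 0) is placed
    in category 2; the text does not say how such a case is handled.
    """
    s = dot(w, augment(x))
    return 1 if s > 0 else 2


# Hand-picked weight vector for the augmented space (mp, bp, 1); illustrative only.
W = [0.5, 1.0, -100.0]

# Approximate (melting point, boiling point) pairs in degrees C; illustrative only.
acids = [(16.6, 118.0), (-20.7, 141.0), (-5.0, 163.5)]
ketones = [(-95.0, 56.0), (-86.0, 80.0), (-39.0, 102.0)]

print([classify(W, x) for x in acids])    # [1, 1, 1]
print([classify(W, x) for x in ketones])  # [2, 2, 2]
```

Because the last component of every Y is 1, the last component of W plays the role of the threshold, which is why the plane in the augmented space can pass through the origin.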

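For comparison with the empirical approach, the two-category Bayes strategy of Equation 1 can be sketched in Python for the case where the density functions are known. The function names `bayes_decide` and `gaussian` are hypothetical, the one-dimensional Gaussian densities are an assumption made only so the example runs, and returning `None` when the two sides are equal is a choice of this sketch, since Equation 1 does not cover that case.

```python
import math


def gaussian(mean, sd):
    """Return a one-dimensional normal density function (illustrative choice)."""
    def pdf(x):
        z = (x - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return pdf


def bayes_decide(x, f1, f2, p1, l1=1.0, l2=1.0):
    """Two-category Bayes decision following Equation 1.

    f1, f2: probability density functions for categories 1 and 2
    p1:     a priori probability of category 1 (p2 = 1 - p1)
    l1, l2: losses for misclassifying a member of category 1 or 2
    """
    p2 = 1.0 - p1
    side1 = p1 * l1 * f1(x)
    side2 = p2 * l2 * f2(x)
    if side1 > side2:
        return 1
    if side2 > side1:
        return 2
    return None  # equality is not covered by Equation 1


f1 = gaussian(0.0, 1.0)
f2 = gaussian(4.0, 1.0)
print(bayes_decide(1.0, f1, f2, p1=0.5))  # 1
print(bayes_decide(3.0, f1, f2, p1=0.5))  # 2
```

The sketch makes the practical difficulty visible: every call needs `f1`, `f2`, and `p1`, which are exactly the quantities the text says are rarely known for chemical data.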