High-speed algorithm for simplex optimization calculations - Analytical

Teaching the fundamentals of experimental design. Stanley N. Deming , Stephen L. Morgan. Analytica Chimica Acta 1983 150, 183-198 ...
0 downloads 0 Views 421KB Size
ANALYTICAL CHEMISTRY, VOL. 51, NO. 13, NOVEMBER 1979

p H of the solution to be titrated must be carefully adjusted to be within the recommended range using a pH meter (not p H paper) for measurement.

ACKNOWLEDGMENT T h e assistance of Michael A. Kraus in performing the sulfate titrations is amreciated. We also thank R. J. Smith for writing the progrkwhich produced the contour plots from the best-fit equation.

LITERATURE CITED (1) Fritz, J. S.; Freeland, M. Q. Anal. Chem. 1954, 26, 1593-95. (2) Fritz, J. S.;Yamamura, S. S. Anal. Chem. 1955, 27, 1461-64.

2295

(3) U.S. Dept. of Labor, OSHA: Sulfur Dioxide Occupational Exposure. Fed. Reg. 1975, 40, 54520-54534. (4) "Criteria for a Recommended Standard , . . Occupational Exposure to Sulfur Dioxide"; DHEW Pub. (NIOSH) 74-1 11, 1974. (5) "NIOSH Manual of Analytical Methods", 2nd ed.,Taylor, D. G., Ed.; DHEW Pub. (NIOSH) 77-157-A,B,C 1977; Methods 163 and S-174. (6) "Methods of Air Sampling and Analysis," 2nd ed.,Katz, M., Ed.; American Public Health Assochtion: Washington, D.C. 20036, 1977; Method 71 1, D 737. (7) '"Criteria for a Recommended Standard , , Occupational Exposure to Sulfuric acid"; DHEW Pub. (NIOSH) 74-128, 1974. (8) Butler, J. W., Locke. D. N. J . fnviron. Sci. Health-fnviron. Sci. f n g . 1976, A 7 7, 79-92.

RECEIVED for review May 18,1979. Accepted August 20,1979.

High-speed Algorithm for Simplex Optimization Calculations Gregory

F. Brissey, Robert B. Spencer,' and Charles L. Wilkins"

Department of Chemistty, University of Nebraska-

Lincoln, Lincoln, Nebraska 68588

In recent years the importance of optimization of experimental procedures and parameters affecting any given chemical analysis has been recognized. As a result, a number of workers have investigated the application of the sequential simplex method of Spendley and co-workers ( I ) , as well as the later modifications by Nelder and Mead ( 2 ) to problems of chemical interest (3-7). Denton and co-workers derived the most recent modification of the simplex procedure for chemical applications and demonstrated its efficacy in optimization of the operation of a computer-controlled flame spectrophotometer (8). A number of studies of the application of simplex optimization methods to the development of weight vectors for chemical pattern recognition studies, primarily spectral data interpretation, have also been reported (9-13). In these latter studies, one of the major disadvantages of the technique has been the large amount of computer time required when high-dimensional problems are involved. For experiment optimization applications, computation time is often a secondary consideration, since problems frequently involve adjustment of only two or three operating parameters and their actual adjustment times may significantly exceed the computer time required to determine what the new settings should be. However, with the increasing advent of instruments capable of making rapid computer-controlled adjustment of a larger number of electronically controlled parameters, the need for more rapid simplex computation procedures may soon be felt even in experiment optimization applications. In the context of the previously-mentioned pattern recognition studies, the need for optimizing as many as 60 or more parameters, which are subsequently used in an automatic discriminant function for spectral analysis, makes the speed of the simplex algorithm employed a dominant factor in that application. Accordingly, a search for an even faster algorithm than those previously reported was initiated. We report here the successful outcome of that search.

of the improved algorithm. As originally defined ( I ) , a simplex consists of a figure with d + 1 vertices located in a d dimensional space. Thus, in a two-dimensional space a simplex is a triangle, and in three-dimensional space, a tetrahedron. The coordinates of each vertex represent values of the d control variables, and the simplex algorithm provides a systematic procedure for exploring the experimental response obtained as a result of modifications in the control variables. By this means, if the simplex search is successful an optimized response is ultimately obtained. For pattern recognition, the simplex method was applied to the determination of the appropriate set of' weights in a linear discriminant function which would produce an optimum response (defined as the maximum recognition of members of a training set of patterns which are classified wikhcespect to some property of interest) (9). In order that the space searched be continuous, an additional response criterion (minimization of the perceptron (14) value) was combined with the recognition criterion. Subsequently, other response functions have been examined for that particular type of application (13). Whatever protocol is used, the simplex optimization procedure involves searching for an optimum in response space by successively replacing the worst vertex of the simplex with better vertices. This is accomplished by reflecting the worst vertex through the centroid of the remaining vertices (CR). Therefore, the new vertex is simply a linear combination of the worst vertex and CR. As originally implemented for pattern recognition applications (9), the reflection procedure required complete recalculation of the value of CR on each iteration, using Equation 1.

(w:") n

CR, = ( l / ( n

- 1))C

W,,

(1)

I &

DESCRIPTION O F THE HIGH-SPEED SIMPLEX ALGORITHM Both the Modified Simplex method ( 2 , 4 ) and the Super Modified Simplex procedure (8, 9) have been described in detail elsewhere as has their application to the problem of weight vector development for pattern recognition purposes (9-13). Therefore, only a brief summary of the fundamentals of the techniques will be given here to facilitate discussion

For the d dimensional space, n is equal to d + I , the number of vertices of the simplex and the remaining vertices, J , are summed, excluding k , the vertex (weight vector) being replaced. Thus, each of the i coordinates of the centroid are calculated. This is unnecessary, since a running overall centroid (C) may be computed on each iteration by considering only the changed vertex (for pattern recognition, this is a weight vector). Equation 2 summarizes these relationships. = C y (1/ n ) ( - Wp) (2)

'Present address: Nicolet Instrument Corp., Madison, Wis. 53711.

However, because the new summation, changes from the old one only as a result of the changes in the vertex

0003-2700/79/0351-2295$01.OO/O

cy

+

0 1979 American Chemical Society

my

w",

2296

ANALYTICAL CHEMISTRY, VOL. 51, NO. 13, NOVEMBER 1979

5. The expression for directly computing the dot products of the new vertex with the members of the training set is given by Equation 6.

Table I. Chemical Categories of the Compounds Whose Mass Spectra Comprise the Training Set Pool category

number

phenyls aldehydes and ketones ether aliphatic alcohols phenols carboxylic acids thiol/thioe thers esters amines amides nitrites

7245 887 362 463 1618 294 138 1657 370 459 121

replaced, the full summation need only be done for the initial vertex, with the subsequent centroids being computed by modifying the sum. If a weight vector (vertex) k is excluded, the values of CRi follow directly from the discussion above as expressed in Equation 3 which uses the excluded vertex to compute CRi

(3) In any of the simplex methods mentioned above, new vertices in hyperspace are chosen by placing them on a line defined by the worst vertex and the centroid of the remaining vertices. The methods differ only in the algorithm used to determine where on that line to place the new vertex. For pattern recognition, a new trial weight vector results at each iteration. Therefore, the algorithms used may be viewed as corresponding to linear combinations of the centroid and the worst vertex with a proportionality constant, a , dictated by the method used. Equation 4 summarizes this relationship. Wfrtpw = CRi + ru(CRi - wp, (4) The equation represents the linear combination of the centroid, CRi, with the vector difference of the centroid and the old weight vector. Using the relation expressed by Equation 3, it follows that w:" can be computed directly from the overall centroid, C, the old vertex, and the proportionality constant, CY.

wid,

Wfrtf"

= (1

+ a)(n/n

-

1)Ci - (1

+ (nru)/n

-

l)WP

(5)

Thus, wherever the centroid of the remaining vertices was previously used, the overall centroid, C, can be used instead. The centroid of the unreflected vertices need never be calculated. For pattern recognition applications, the response of a vertex is obtained by forming the dot products of that vertex with each member of the training set, in turn, and tabulating the recognition performance. A major increase in speed can be realized by recognizing the fact that the dot product of the new vertex with any particular member of the training set, X l i ,is also a linear combination with the dot product of the member with both the overall centroid, Ci,and the worst having the same coefficients as those in Equation vertex, K!d,

Because the dot products on the right-hand side of Equation 6 are calculated directly only once (taking 60 multiplications per training set member in our example) and subsequently these and updated values are used (requiring 4 multiplications per training set member), the speed improvement is ca. N/4, where N is the dimension of training set members. Thus, for the present 60-dimension example, a speed improvement of a factor of 15 is expected. Note that the various modifications of the simplex technique basically differ in how the coefficient a , which defines the weighting of the linear combination, is determined. For example, in the Super-Modified Simplex method described by Denton and co-workers (8), the responses of the worst vertex, the centroid of the remaining vertices, and the reflected point of the worst vertex through the centroid are calculated and fitted to a second-order polynomial curve. The position of the maximum of this curve defines a , and a new vertex is calculated. If this vertex has a better response than the reflected point it is kept, otherwise the reflected point is kept. As the dot product of a particular pattern with the overall centroid is changed, it too can be updated in an analogous fashion (Equation 7).

(CiXpw = (CiXJOld+ ( l / n ) (Wfrt,e"X,j - Wll'dXli)

Furthermore, the weight vector that bisects the means of the two classes of spectra in the training set ( Wd+,,i)is used as the initial vertex in the construction of the simplex. The remaining initial d vertices (weight vectors) are obtained by adding a constant called the "Spanning Constant" (SC) to only one of the d elements in the initial weight vector for each new weight vector created, with no two having the constant added to the same d element. This initial step can be avoided by recognizing that the dot products of the d starting weight vectors may be obtained directly from the dot products of the initial weight vector and the spanning constant (Equation 8).

As a result, for the initial, expanded, or contracted simplex, only the dot products for an initial weight vector and centroid of the simplex need to be calculated from the training set spectra. Another speed improvement which can be realized is achieved by the simple expedient of ranking the responses of vertices initially and as they are generated, using a binary search method to locate the ranking of new responses. In this way, a running pointer to the worst current vertex is easily maintained.

Table 11. Comparison of Optimum Weight Vector Development Time and Performance for the SMS andISMS Methodsagb

category phenyl carboxylic acid

thiol/ thioether ester amide

iteration/second SMS ISMS

SMS

ISMS

SMS

ISMS

392.8 1309.2 2014.6 1798.2 737.6 1250.5

0.87 1.01 1.06 1.05 0.93 0.98

342 1321 2131 1883 686 1273

532 926 2294 1092 653 1099

92 90 99 88 83 90

90 94 100 91 84 92

12.09 13.48 15.88 13.52 13.30 13.65

total iterations

% correct recognition

total CPU time ( s ) SMS ISMS 44.0 68.7 144.4 80.8 51.4 77.9

(7)

average a Calculations done using an IBM 360/65 computer. Prediction performances of the weight vectors produced were comparable. In the absence of round-off error, identical weight vectors would be expected.

ANALYTICAL CHEMISTRY, VOL. 51, NO. 13, NOVEMBER 1979 * 2297

EXPERIMENTAL Data. In order to test the efficacy of the new algorithms, a training set pool of 13614 low resolution mass spectra of monofunctional compoundswere drawn from the NIH/EPA/MSDC mass spectral data base of 25 560 spectra, leased from the Office of Standard Reference Data, National Bureau of Standards. For each of 5 categories of compounds (selected from those listed in Table I), training sets of 200 spectra were chosen. The training sets were constructed so as to contain equal numbers of class and nonclass spectra. For example, the training set for recognition of the ether function was comprised of the spectra of 100 ethers and 100 additional spectra were chosen from among the other 10 categories listed in Table I. For training with each category, the 60 most frequent m / e peak positions were chosen as the features to be used. Computations. Super-modified Simplex (SMS) and Improved Super-modified Simplex (ISMS) programs were written in Fortran IV and all computations were done using an IBM 360/65 computer. Spanning constants used were the same for both SMS and ISMS calculations. The extrapolation factor used was 50% of the distance between the worst vertex and its reflection about the centroid of remaining vertices and the safety factor was 5% of the same distance.

RESULTS AND DISCUSSION Table I1 summarizes the data relevant for comparison of the SMS and ISMS simplex pattern recognition speed and performance in development of weight vectors for recognition of each of the 5 categories chosen for testing. When total computation times are compared, it is seen that the improved algorithm described here was, on the average, 14 times faster in locating an optimal weight vector. Recognition performance of the vector thus produced was generally as good as,or better than, those obtained using the SMS method. Closer ex-

amination of the tabulated data shows that this speed improvement derives primarily (as expected) from the ability of the ISMS technique to complete many more iterations in a given amount of time. The total number of iterations carried out with each of the methods was approximately the same. Because the dot products of the vertices must be stored when the ISMS method is used, memory requirements are somewhat greater than with the SMS technique. In the present instance, the program size increased from 150 Kbytes to 200 Kbytes when the ISMS technique was implemented. However, this modest memory increase is more than compensated by the additional computational speed obtained.

LITERATURE CITED Spendley, W; Hext, G. R; Himsworth, F. R. Technometrics 1962, 4 , 441. Nelder, J. A; Mead, R. Comput. J . 1965, 7, 308. Emst, R. R. Rev. Sci. Instrum., 1966, 3 9 , 998. Deming, S . N; Morgan, S. L. Anal. Chem. 1973, 45, 278A. Long, D. E. Anal. Chim. Acta 1969, 46, 193. Deming, S. N; Morgan, S. L. Anal. Chem. 1974, 46, 1170. Johnson, E. R.; Mann, C. K; Vickers, T. J. Appl. Spectrosc. 1976, 30, 415. Routh, M. W. Swartz, P. A; Denton, M. 8 . Anal. Chenr. 1977, 49, 1422. Rtter, G. L; Lowry, S. R; Wilkins, C. L; Isenhour, T. L. Anal. Chem. 1975, 47, 1951. Brunner, T. R; Wilkins, C. L; Lam, T. F; Soltzberg, L. J; Kaberline, S. L. Anal. Chem. 1976, 4 8 , 1146. Lam, T. F; Wilkins, C. L; Brunner, T. R; Soltzberg, L. J; Kaberline, S. L. Anal. Chem. 1976, 4 8 , 1768. Kaberline. S . L; Wilkins, C. L. Anal. Chim. Acta 1978, 103, 417. Ritter, G. L; Lowry, S. R: Isenhour, T. L; Wilkins. C L., submitted for publication in J . Chem. Inf. Comput. Sci. Duda, R . 0; Hart, P. E. "Pattern Classification and Scene Analysis", Wiley-Interscience: New York, 1973; p 141.

RECEIVED for review June 25,1979. Accepted August 10,1979. The support of the National Science Foundation under grant CHE-76-21295 is gratefully acknowledged.

Quantitative Analysis of Silicates by Instrumental Epithermal Neutron Activation Using (n,p) Reactions Ernest S. Gladney" and Daniel R. Perrin University of California, Los Alamos Scientific Laboratory, P.O. Box 1663, Los Alamos, New Mexico 87545

Instrumental epithermal neutron activation (IENA) involves the use of a neutron filter to screen out the thermal portion of the reactor neutron energy spectrum. Both Cd and B are efficient neutron filters. The former is essentially opaque to neutrons with kinetic energies of less than 0.4 eV, while the absorption cross section of the latter follows a l / v relationship and reaches a small value a t approximately 280 eV (1). The advantages of epithermal over conventional thermal neutron activation for elemental analysis of geological materials have been well summarized by Steinnes ( 2 ) . The principal advantage is that the most common rock forming elements, which activate strongly with thermal neutrons (e.g., Na, Al, P, K, Fe, and Sc), have their activities suppressed, relative to elements which have cross-sectional resonances in the epithermal energy region. T h e reduction in activity from common (n,y) products permits the observation of activation products from the lower cross section (n,p) reactions. Elements with potentially useful (n,p) reaction products are shown in Table I. Cross-sectional data are taken from Steinnes ( 2 )and Erdtmann ( 3 )and are presumably for Cd filtered epithermal fluxes. The energy threshold values are from Howerton et al. (4). Only "Mn and =Co are commonly observed in spectra of thermal neutron activated geological samples. The remainder are usually obscured by the background caused by those species that are 0003-2700/79/0351-2297$01 .OO/O

produced in greater abundance by thermal neutron capture. The threshold energies for (n,p) events are also shown in the table. For most of the reactions, neutrons with 0.5 MeV kinetic energy or more are required. When the thermal component of the flux is absorbed, neutrons of this energy become a much more prominent part of the remaining spectrum. Steinnes has carefully explored the (n,p) reaction for Ni determination in silicate rocks ( 5 ) ,and Ni measurements in geological materials using this reaction have been reported by other investigators (6-9). The determination of Si through the 28A1reaction has been reported in bulk iron ore samples (10) and in lymph node samples (11). The quantitative study of the application of (n,p) reactions to the analysis of geological materials is the subject of this paper.

EXPERIMENTAL One-gram samples of various silicate standard reference materials were encapsulated in polyethylene vials and irradiated in the Los Alamos Omega West Reactor epithermal facility. The epithermal neutron flux is boron filtered and is described in detail elsewhere (12). The flux is approximately 5 X 1O1O n/cmz/a. Samples may be pneumatically transferred in a few seconds from the reactor to the counting room. Each irradiation is monitored by use of a known amount of an Au solution pipetted onto filter paper disks lodged in the rabbit caps. These monitors may be 0 1979 American Chemical Society