STOCHASTIC TEST FUNCTIONS FOR OPTIMIZATION TECHNIQUES

The effects of various methods of adding noise to object functions used in testing optimization search techniques are examined. The mean value of a noise-corrupted object function is a variable and, in the neighborhood of the extremum, is less desirable than that of the noise-free function. The variance also is a function of the independent variable even though input variances are constant. These effects are second order with respect to the independent variable and cannot be legitimately approximated by a linear noise addition to the object function.
BECAUSE of the increased use of digital computers for process control, improvement of process performance by experimental search techniques is becoming increasingly important. A factor which complicates interpretation of experimental data is "noise," or errors of a statistical nature in the data. Analytical study of the noisy search problem is extremely difficult, and effort often must be oriented toward direct trial of the proposed search procedure on stochastic test functions. This paper shows that the method of adding noise to the object function used in three recent studies of this problem (Ahlgren and Stevens, 1966; Heaps and Wells, 1965; Kushner, 1963) is not representative of the normal physical situation. Since additional studies can be expected in this important area, a detailed examination of the technique of adding random elements to test functions is presented.
Discussion
The general problem follows: A set of control or manipulative variables is specified. As a result of operation of a process at this specified state, a number of observable output variables can be measured. These observed variables and the control variables are then combined in a criterion function, so that some measure of the desirability of the operating condition can be assessed. The system of equations for this situation can be summarized as:
y = P(x)    (1)

Q = Q(y, x)    (2)
where x is the control vector, y is the output vector, P is the plant vector transfer function (usually unknown), and Q is the object function computed from the above factors. Noise and error can occur at every link of this chain of equations. There may be error in setting the control variables, x; there may be error in observing the true values of y; and there may be other inputs, so that the relationships P between y and x contain apparent stochastic factors. Normally, the computation of Q is orders of magnitude more precise than the measured values of its parameters.

A critical point in the computational procedure is the way in which randomness is imparted to the object function. The practice in the studies cited above was to assume that the random factors are additive. Thus the object function was modified by adding a random variable, i.e.,
R = Q + ke    (3)

where R becomes the stochastic object function, e is a random variable with zero mean, and k is a proportionality factor. This formulation does not approximate the noisy behavior of the usual experimental object function in the critical region near an extremum.

The crux of the argument can be demonstrated heuristically with reference to the one-dimensional system depicted in Figure 1. In the neighborhood of the extremum (a minimum is used for the example in Figure 1), the object function must have a shape similar to a quadratic curve. The random variation of Q results from random variations of x and y. The random variations of y are themselves dependent on random variations in x. Variations of x near point A in Figure 1 produce roughly proportional variations in Q. Hence in this region it may be permissible to approximate the noisy object function as given in Equation 3. However, at point B, a point near the extremum, an entirely different situation exists. If Equation 3 is retained, the variation in Q remains the same as indicated. However, assuming the functional variation in parameters y to be the same at B as at A, the deviation in Q becomes the smaller range shown at B in Figure 1. The minimum of the perturbed function cannot be less than the absolute minimum of the noise-free function.

Analytical proof of these intuitive results is obtained as follows: The noise corruption of the object function can be represented as

R = Q(x + n, y + p)    (4)

where n and p are random variables. But the observed variables y result, at least in part, from the value of the control variables, x:

y = φ(x)    (5)

where φ is a random operator on x and represents unknown random inputs as parameter changes for P. Thus, ultimately, Q may be considered as solely a function of x + p. For the small noise levels where φ can be expanded as a Taylor's series, R is a function of x + p only.
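The two noise models can be contrasted numerically. The sketch below is illustrative only: the quadratic object function Q(x) = x², the Gaussian distributions of the input noise p and the additive noise e, and the constants σ and k are all assumptions chosen for the demonstration, not taken from the text. It estimates the mean and variance of R(x + p) = Q(x + p) at a steep point and at the minimum, alongside the additive model Q(x) + ke:

```python
import random

def Q(x):
    # Hypothetical quadratic object function with minimum Q(0) = 0.
    return x * x

def stats(samples):
    # Sample mean and (population) variance.
    m = sum(samples) / len(samples)
    v = sum((s - m) ** 2 for s in samples) / len(samples)
    return m, v

random.seed(0)
sigma = 0.1   # std. dev. of the input noise p (illustrative)
k = 0.1       # scale of the additive noise ke (illustrative)
N = 200_000

for x in (1.0, 0.0):  # a steep point (like A) and the minimum (like B)
    # Noise entering through the control variable: R = Q(x + p).
    input_noise = [Q(x + random.gauss(0.0, sigma)) for _ in range(N)]
    # Noise added directly to the object function: Q(x) + k*e.
    additive = [Q(x) + k * random.gauss(0.0, 1.0) for _ in range(N)]
    m_in, v_in = stats(input_noise)
    m_add, v_add = stats(additive)
    print(f"x={x}: input-noise mean={m_in:.4f} var={v_in:.2e}; "
          f"additive mean={m_add:.4f} var={v_add:.2e}")
```

At the steep point the input-noise variance is roughly (Q'(x)σ)², comparable in spirit to the additive model; at the minimum it collapses to second order in σ while the additive variance stays at k², and the mean of Q(x + p) sits above the true minimum, as the heuristic argument above predicts.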
VOL. 7, NO. 3, AUGUST 1968
Figure 1. Schematic diagram of objective function behavior near an extremum (abscissa x; points A and B mark a steep region and the neighborhood of the minimum; the dotted curve indicates the lower limit of Q + ke)

In the neighborhood of an extremum, any surface osculates with a quadratic surface. For the small noise levels,

R(x + p) = Q(x) + Q'(x)ᵀp + pᵀQ''(x)p    (7)

where Q' is the gradient of Q, Q'' is the Hessian of Q, and pᵀ indicates the transpose of p. Since the mean of p is zero, the expectation of R(x + p) is given by

E[R(x + p)] = Q(x) + σ² tr[ρQ''(x)]    (8)

where ρ is the correlation matrix and σ² is the variance of p. The second-derivative matrix of Q is sign-definite in the neighborhood of an extremum. Hence Equation 8 shows that there is a shift in the mean in the opposite direction from the extremum. A definite limit exists, therefore, for the degree of approach to an extremum for any search algorithm. It is possible, however, to reduce the effective variance by replication and/or regression. In any case it is clear that the mean value of Q(x) + ke is not equal to the mean value of R(x + p). The variance of R(x + p) is found using Equations 7 and 8:

Var R(x + p) = E[R(x + p)²] − E[R(x + p)]²    (9)

In the neighborhood of the extremum the gradient of Q vanishes, so that the variance of R(x + p) becomes

Var R(x + p) = E{[pᵀQ''(x)p]²} − {σ² tr[ρQ''(x)]}²    (10)

It is clear that this second-order difference is smaller than the variance at points where the ∂Q/∂xᵢ are not zero. This result contrasts with the variance of Q(x) + ke, where

Var[Q(x) + ke] = k² Var(e) = constant    (11)

for all values of x.

Conclusions
The most valid way to add stochastic elements to a test object function is to add scaled noise directly to the vectors x and y. A reasonable approximation to this procedure would be to add noise to the vector x only. This route would introduce discrepancies due only to the higher order differences in the linear approximations to P(x).

There are significant differences between the behavior of the stochastic object function R(x + p) and the function Q(x) + ke. The mean value of R(x + p) is not the same as the mean of Q(x) + ke. The variances of the two expressions are also considerably different: constant in the case of Q(x) + ke, and decreasing near the extremum for R(x + p). A better approximation to the behavior of R(x + p) would be to multiply Q(x) by the factor (1 + ke). At least the variance then would exhibit more nearly equivalent behavior.

The explicit results of employing Q(x) + ke instead of R(x + p) in testing search techniques are not easily deduced. Methods employing replication or regression proceed by reducing σ² directly. This would reduce the effects of the random factor in Equations 7 and 9. Methods searching directly for extrema, but using no algorithmic memory, might find the optimal x but would converge at an atypical rate and predict a degraded value for the extremum, as shown by Equation 8.
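The suggested multiplicative model can be sketched in the same spirit. Only the form Q(x)(1 + ke) comes from the text; the quadratic Q(x) = x², the Gaussian e, and the constants below are illustrative assumptions:

```python
import random

def Q(x):
    # Hypothetical quadratic object function, minimum Q(0) = 0.
    return x * x

random.seed(1)
k, N = 0.1, 100_000

def var_multiplicative(x):
    # R = Q(x) * (1 + k*e): the variance is (k*Q(x))^2 * Var(e),
    # so it shrinks as Q(x) approaches its minimum, unlike Q(x) + k*e.
    samples = [Q(x) * (1.0 + k * random.gauss(0.0, 1.0)) for _ in range(N)]
    m = sum(samples) / N
    return sum((s - m) ** 2 for s in samples) / N

for x in (1.0, 0.5, 0.1):
    print(f"x={x}: Var ~ {var_multiplicative(x):.2e}  (theory {(k * Q(x)) ** 2:.2e})")
```

This reproduces the qualitative behavior of R(x + p) (variance decreasing toward the extremum) only when the minimum value of Q is near zero; if the noise-free minimum is nonzero, the variance floors at (kQ_min)² rather than at the second-order level of Equation 10.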
Literature Cited
Ahlgren, T. D., Stevens, W. F., Ind. Eng. Chem. Process Design Develop. 5, 290 (1966).
Heaps, H. S., Wells, R. V., Can. J. Chem. Engr. 43, 319 (1965).
Kushner, H. J., Trans. A.S.M.E., J. Basic Engr. 63, 157 (June 1963).
R. H. LUECKE
University of Missouri, Columbia, Mo.

RECEIVED for review November 29, 1967
ACCEPTED May 28, 1968
I&EC FUNDAMENTALS