Statistical Simplex Method for Experimental Design in Process

A new simplex search algorithm with a logic that resorts to correlation-based ranking of simplex vertices for reflection, expansion, contraction, and ...
1 downloads 0 Views 348KB Size
8796

Ind. Eng. Chem. Res. 2005, 44, 8796-8805

PROCESS DESIGN AND CONTROL Statistical Simplex Method for Experimental Design in Process Optimization Ernesto C. Martı´nez* INGAR-Instituto de Desarrollo y Disen˜ o (CONICET-UTN), Avellaneda 3657, S3002 GJC Sante Fe, Argentina

Experimental optimization with scarce and noisy process data is a key issue in laboratory automation for faster chemical process research and development, real-time process optimization, and the ability to embed a learning capability into the design of self-calibrating instruments and extremum-seeking controllers. To deal successfully with noise and uncontrollable factors in experimental design for process optimization, a statistical characterization of an optimum using process data is proposed. The Kendall’s tau statistic is used for identifying a minimum (maximum) in a data set as a cluster center of positively (negatively) correlated points. A new simplex search algorithm with a logic that resorts to correlation-based ranking of simplex vertices for reflection, expansion, contraction, and shrinking steps is proposed. The advantage of resorting to a data set that cumulatively provides a global perspective of the output landscape through Kendall’s tau calculations is a novel feature of the statistical simplex method. Encouraging results obtained for Rastringin’s multimodal function and in the optimization of the operating policy for a semibatch reactor are presented. 1. Introduction Finding optimal operating conditions fast for process systems is a key competitive factor in the fine chemicals and pharmaceutical industrial sectors to increase the value added in many stages of a product/process lifecycle, including recipe development, scale-up, and runto-run optimization1. Product customization and reduced time-to-market are features of central concern to speed up process development in order to succeed against an increasing number of alternative products2. The drivers for experimental process optimization include compressed timeliness to market for new products, increased throughput of projects, accelerated process screening for alternative products, and, above all, breaking the new bottleneck: process development.3,4 Experimental process optimization is of central concern for implementing real-time optimization through extremum-seeking control strategies5 resorting to setpoint variations made on purpose in order generate information about the reference-to-output map. As shown in the schema of Figure 1, the continuous probing of the controlled system is aimed at detecting and tracking a local process optimum. The high-frequency filter is required to deal with a very noisy environment. The use of direct search methods for systematic probing has already been proposed in the form of a dynamic simplex method.6 The apt name of direct search emphasizes that changing operating conditions to estimate the local improvement direction is carried out directly without the help of a process model. Direct search methods are recommended when a process model of the * Tel.: +54 342 4534451. Fax: +54 342 455 3439. E-mail: [email protected].

Figure 1. Role of direct-search methods in the design of extremum-seeking controllers.

required accuracy is difficult or expensive to obtain. It is worth noting that the minimum number of experimental points for gradient estimation in an n-dimensional space is (n + 1), which is exactly the number of vertices of a simplex in an R n space. Three factors, noise, time, and cost, constitute major stumbling blocks to the design of algorithms for experimental optimization with process data.7 Process outputs often contain significant quantities of noise, or error, and sampling bias, while time and money allows for only a reduced number of experimental runs or trials. Another complicating factor is often the requirement of a “black-box” system behavior, namely the lack of a priori fundamental knowledge (e.g., multiple optima or continuity) about the objective function and the effect of uncontrollable factors over the input domain of interest for optimization. The simplex search method proposed by Nelder and Mead8 has been successfully used for rather smooth functions but has severe difficulties to deal with noise, outliers, and process dis-

10.1021/ie050165m CCC: $30.25 © 2005 American Chemical Society Published on Web 10/18/2005

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005 8797

continuities that are ubiquitous in experimental optimization problems. As is the case for all optimization algorithms, the simplex search method has been designed to handle outputs from a mathematical function. When the method is used with process data which always incorporate perturbations, stagnation problems are often encountered.9 Perturbations can be of different natures, such as discontinuities, nonsmooth effects in the underlying input-output relation, and the effect of noncontrollable factors that bias the output and measurement errors. The Nelder-Mead algorithm can stagnate and converge to a nonoptimal point in the presence of noise or outliers, even for very simple and smooth functions, because of the “disconcerting effect” of perturbations when estimating an improvement direction. A rather simple explanation for such a poor performance is that, as a result of perturbations on sampled process outputs, the objective function looks like one with multiple local optima, such as Rastrigin’s function, Ackley’s function, and Griewangk’s function. The interested reader is referred to the book by To¨rn and Zilinskas10 for further details on these multimodal functions. The easiest way to handle output variability is doing more that one experiment for each simplex vertex. However, this is very costly in terms of both time and money. A much better approach is to increasingly exploit the full data set describing the influence of controllable input parameters on process outputs. This work proposes a statistical simplex method based on correlation-based ordering of the simplex vertices using sampled values of process outputs. The main difference of the statistical simplex with regards to the NelderMead simplex, or any of its variants, is that the ranking of points in the simplex and the overall logic of the algorithm is based not on point-wise values of the objective function but on the Kendall’s τ correlation coefficients calculated for each one of the candidate optima resulting either from a reflection, an expansion, or a contraction operation in the current simplex. Also, a selective replication operation is integrated into the logic of the statistical simplex method. The advantage of resorting to a data set that cumulatively provides a global perspective of the output landscape through Kendall’s tau calculations is a novel feature of the statistical simplex method. 2. Statistical Characterization of an Optimum To infer from process data a direction of improvement or the nearby location of a local optimum requires a statistical characterization of optimality. Unfortunately, this need has not been recognized so far, and there is little previous research in this area.7,11 In this section, a rather simple characterization of optimality based on rank correlation methods is discussed. 2.1. Problem Statement. This work proposes the combination of nonparametric statistics and the simplex search method for designing evolutionary search algorithms that can solve an experimental optimization problem involving a process performance index g(x,θ). The operating conditions to be optimized are represented by a vector x of real-valued inputs defined over a domain of interest Ω, whereas the outcome of each experiment is a scalar process output y that is a nonlinear function of inputs x, uncontrollable factors θ, and an additive noise φ:

y ) g(x, θ) + φ, x ∈ Ω

(1)

Given the constrained space of legal inputs, the optimization task is to find the input vector that is close enough to the global optimum of g(x,θ), using a rather short sequence of experiments. The main difficulty of dealing with process outputs as defined in eq 1 is making inferences about the optimum location from observations that are usually both sparse and biased and where the presence of outliers cannot be ruled out. Thus, a robust framework for characterizing optimum operating conditions using process data is required. The next section discusses the use of a local monotonicity assumption which elaborates on the relationship between optimality and correlation. The Kendall’s tau statistic is used for characterizing candidate local optima as cluster centers of strongly correlated points. 2.2. Correlation and Optimality. The performance index g(x,θ) to be optimized is typically unknown except for the scarce, noisy, and biased information y(xi), i ) 1, 2, ... provided by a sequence of experimental trials. To seek for an optimum in the face of output variability demands developing a statistical model of local optima that is more amenable to deal with process outputs rather than smooth functions. A robust and quite general assumption about sampled data regarding the hypothetical location of a minimum (maximum) is the monotonicity property, defined as follows: • Monotonicity Definition: As the distance to the true minimum (maximum) increases, there exists a monotonically increasing (decreasing) trend in the corresponding values of the objective function. Accordingly, hypothetical optima which are closer to the actual minimum (maximum) should exhibit a greater degree of positive (negative) correlation with respect to distance than those that are farther away. In mathematical terms, this assumption requires that, for the true optimum x*, there locally exists a monotonic relationship φ such that

y(x) ) φ(d), with d ) ||x - x*||

(2)

It is worth noting that this guiding model for an optimum does not impose any constraint on the shape of φ in the vicinity of the minimum (maximum) as long as the function φ is monotonically increasing (decreasing) with respect to the distance from the optimum. Furthermore, beyond the assumption of symmetry with respect to the distance metric d, the proposed local optimum characterization is independent of any continuousness assumption for the g(x,θ) or its derivatives. Also, there is no assumption about a given noise distribution for y(xi), i ) 1, 2. The only mild constraint is that the expected value of y, E(y), for a given x is equal to g(x,θ), meaning that the averaged response of replication experiments is not a biased estimator. It is assumed that unknown factors θ may vary from experiment to experiment following a stationary (yet unknown) statistical distribution. Handling of slow process drifts and temporal bias (e.g., learning curve) can be accommodated in the monotonicity property above assuming a time reference is explicitly included in the input vector x. Sampled data will locally fit the model of optima in eq 2 to different extents. The existence of a local optimum at x* requires a monotonic association between di ) ||xi - x*|| and y(xi). Under the influence of noise, bias, and outliers, the crucial concern is to find enough

8798

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005

evidence to accept or reject the hypothesis of independence, i.e., no correlation between y(xi), i ) 1, 2, ..., and di against the alternative that may correspond to a positive or negative correlation, depending if x* is a local minimum or maximum, respectively. To test for the strength of this relationship between di and y(xi), i ) 1, 2, ..., n, it is tackled here using nonparametric statistical methods that can be applied more broadly since much fewer assumptions about the data set need to be made. A typical nonparametric measure of correlation that can be used for this purpose is the Kendall’s correlation coefficient.12-14 Denoting the ordering ranks for increasing values of di, y(xi) by ri, si, respectively, the Kendall’s tau coefficient is defined to measure the strength of the association (correlation) between the distance ranks ri and the response ranks si, i ) 1, 2, ..., n. If such dependency between the ranks exists, then once the d-ranks have been arranged in ascending order, i.e., so that ri ) i, then the y-ranks should show, despite noise and bias, an increasing trend when there is positive association (local minimum) or negative association (local maximum). Accordingly, Kendall and Kendall13 and Gibbons14 proposed that, after arranging observations in the increasing order of d-ranks, we score each paired difference sj - si for i ) 1, 2, ..., n - 1 with j > i as +1 if this difference is positive and -1 if this difference is negative. These differences are called concordances and discordances, respectively, with respect to an expected monotonic trend between the ranks. Denoting the sums of observed concordances and discordances by nc and nd respectively, the equation for the sample Kendall’s coefficient τ is

1 τ ) (nc - nd)/ n(n - 1) 2

(

)

(3)

Since there exist 1/2n(n - 1) pairs sj - si, if all are concordances nc ) 1/2n(n - 1) and nd ) 0, then τ ) 1 and monotonic association is perfectly positive. Similarly, if all are discordances, monotonic association is perfectly negative with τ ) -1. If the rankings of d and y(xi) are independent, we do expect a fair mix of concordances and discordances, whence τ ≈ 0. When there are not ties in the ranking nc + nd ) 1/2n(n - 1), it is necessary to calculate only either the concordances or discordances.12,14 Figure 2 depicts graphically the idea of statistical characterization of an optimum based on a correlation using samples from the simple noisy function y(x,a) ) a(1 - x)2, where the parameter a is a parameter having a given statistical distribution with mean µ ) 1 and variance σ2. When σ ) 0, it is by all means clear from the sample data in Figure 2a that a minimum is located at x )1, which has a correlation coefficient of τ ) 1. When we assume first that parameter a follows a normal distribution with µ ) 0.0 and σ ) 0.15, the resulting sampled function values y are depicted in Figure 2b against the background of the function y(x) ) (1 - x)2. The picture for a hypothetical minimum at x ) 1 is not so clear, yet a value of τ ) 82 suggests that the hypothesis x* ) 1 still has a good chance of being true. As the level of output variability increases (σ ) 0.3, see Figure 2c), the degree of positive correlation decreases to τ ) 0.71 and the hypothesis x* ) 1 is less likely to be true. As can be seen, the statistical approach for making a hypothesis about alternative optimum

Figure 2. Kendall’s tau calculation for sampled values from the noisy objective function y(x) ) a(1 - x)2, where a is a noisy parameter having a normal distribution with mean µ ) 0 and variance σ2: (a) σ ) 0; (b) σ ) 0.15; (c) σ ) 0.3.

locations gracefully degrades as the level of output variability increases. To illustrate the importance of resorting to a nonparametric statistical measure of optimality, Table 1 summarizes results obtained for statistical distributions in the parameters a and b for the more general function y(x,a,b) ) b(1 - x)2 + a. Each calculated Kendall’s tau for the hypothetical optimum x* ) 1 is obtained by averaging over three independent samples using the same x-values as in Figure 2. It is noteworthy that the

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005 8799 Table 1. Robustness of Kendall’s Tau for Different Noise Distributions and Bias for the Function y(x) ) b(1 - x)2 + a a

b

Kendall’s tau

uniform (-0.15, 0.0) uniform (-0.1, 0.1) uniform (-0.05, 0.15) normal (0.1, 0.1) normal (0.1, 0.1) normal (0.2, 0.2) uniform (-0.05, 0.15)

1 1 1 uniform (0.9, 1.05) normal (1.0, 0.2) uniform(0.8, 1.2) uniform(0.8, 1.2)

0.833 0.905 0.905 0.760 0.841 0.840 0.714

statistical characterization of an optimum is quite robust against not only the shape of the distributions but also to the presence of a nonzero bias in the noise mean value and asymmetries in the output y distribution. 2.3. Example: Rastrigin’s Function. Consider Rastrigin’s function, defined as

Figure 3. Rastrigin’s function.

Ras(x1,x2) ) 20 + x12 + x22 - 10(cos(2π x1) + cos(2π x2)) (4) whose graph is shown in Figure 3. As can be seen, this function has many local minima and maxima but only one global minimum at (0, 0). The appearance of this function typically resembles the effects of noise and variability when sampling the quadratic function (x12 + x22), which makes it difficult for standard, directsearch methods such as the Nelder-Mead8 method to find the global minimum. It is worth noting that, in direct-search optimization using a sequence of simplices, the optimization logic centers around defining a local improvement direction along which reflection, expansion, and contraction operations allow defining a new simplex which is, on average, better than the current one. The problem to be addressed is finding an approximation to the global minimum of Rastrigin’s function over the domain Ω ) {x1, x2| - 5.12 e x1 e 5.12, -5.12 e x2 e 5.12} by resorting to sampled data of the Ras function in eq 4. To illustrate the relationship between correlation and optimality, let’s consider the data in Table 2, comprising a sample of 12 values of Rastrigin’s function. To assess the statistical significance of alternative hypothesis for the optimum location, assume first that x* is a local minimum of the Rastrigin function. To assess for this optimum’s degree of fit, the data set in Table 2 is used for testing for a positive monotonic association between y(x) and d ) ||x - x*||. This relationship is illustrated in Figure 4. To test for the hypothesis that x* is close to a local maximum, the value of τ and its significance need to be calculated. Table 3 provides ranked data from Figure 4; the paired ranks are used to calculate the statistic τ for the hypothetical optimum yielding τ ) 0.4242 with nc ) 47 concordances and nd ) 19 discordances. Testing for the significance of the hypothesis H0 ) τ ) 0 against the alternative H1 ) τ > 0 requires a precalculated value of the Kendall’s tau critical values for n ) 12 and a chosen significance level (1 - R)%. Data tables give nominal 5% and 1% critical values for

Figure 4. Positive monotonic association between y(x) and d ) ||x - x|| for x* ) (x1 ) 0.058; x2 ) 0.058)T.

significance when n ) 12 in one-tail tests as 0.3929 and 0.5455, respectively. The interested reader is referred to the Table C for a small-sample Kendall’s tau distribution that can be found in the book of Gibbons14 for further details. Corresponding values for nd - nc are 26 and 36, whereas an approximated Monte Carlo estimate of the exact one-tail probability P of rejecting H0 when in fact it is true is 0.0396. Since 0.4242 > 0.3929, we reject H0 and accept the postulated optimum with a confidence of 95%; on the other hand, the available data does not provide strong statistical support to accept the postulated optimum at x* ) (x1 ) 0.058; x2 ) 0.058)T with a confidence level of 99%. From the data in Table 2, it seems more advisable to assume that x* ) (x1 ) 0.01; x2 ) 0.04)T is indeed closer to the global optimum of Rastrigin’s function. Kendall’s tau for this postulated optimum is τ ) 0.8182, with nc ) 60 concordances and nd ) 6 discordances, which provides enough support for the hypothesis of x* ) (x1 ) 0.01; x2 ) 0.04)T being the true global optimum with a very high level of confidence (>99%). For the actual minimum of Rastrigin’s function, x* ) (x1 ) 0.0;

Table 2. Rastrigin’s Function Sampled Data y(x) x1 x2

1.4486 0.05 0.07

4.1645 0.15 0.01

0.3356 0.01 0.04

0.7851 0.02 0.06

1.565 0.09 0.001

1.1347 0.07 0.03

3.1296 0.09 0.09

0.6315 0.04 0.04

3.84 0.1 0.1

2.23 0.04 0.1

1.3201 0.058 0.058

3.2636 0.07 0.11

8800

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005

Figure 5. Bisection approach: a geometric understanding of Kendall’s tau. Table 3. Ranked Data for Kendall’s Tau Calculation d-ranks y-ranks

1 4

2 6

3 2

4 5

5 3

6 9

7 8

8 1

9 10

10 11

11 7

12 12

Table 4. Kendall’s Tau Sampled Values Using Rastrigin’s Noisy Function for the Hypothetical Location of the Optimum at (0, 0) µ

σ2

τ1

τ2

τ3

τ4

10 10 10 10 11 8 9 7.5 8 12.5 12.5

0.1 0.2 0.4 0.5 0.2 0.2 0.2 0.2 0.3 0.3 0.4

0.9697 0.9091 0.6667 0.7576 0.8788 0.5455 0.7879 0.8485 0.7273 0.6061 0.6364

0.9394 0.8182 0.7576 0.3636 0.9394 0.7879 0.8182 0.8788 0.5152 0.7576 0.5758

0.8485 0.7273 0.6364 0.5758 0.8788 0.8485 0.7273 0.8182 0.8788 0.8485 0.7576

1.0 0.8485 0.5758 0.5455 0.9091 0.7576 0.8788 0.6667 0.5758 0.7273 0.8788

x2 ) 0.0)T, Kendall’s tau is 1 with 66 concordances and no discordances. To illustrate the robustness of such a statistical characterization of a given optimum location, e.g., the actual global minimum, the following noisy version of Rastrigin’s function incorporating an unsystematic variation will be used,

Rasnoisy(x1,x2) ) 20 + x12 + x22 - A(cos(2π x1) + cos(2π x2)) (5) where A is a sampled value from a normal distribution with mean µ and variance σ2. Four calculations of Kendall’s tau for each µ and σ2 are given in Table 4. Although the statistical significance of each Kendall’s tau is sensitive to an increase in the variance of the parameter A, the degree of positive correlation for the hypothetical optimum x* ) (x1 ) 0.0; x2 ) 0.0)T is significantly high. Furthermore, the influence of up to 25% bias (µ ) 10 ( 2.5) on Kendall’s tau is also shown to be rather marginal. 2.4. Meaning of Kendall’s Tau. At first glance, it may seem clear that any correlation measure such as Spearman’s Rho or Kendall’s tau12-14 is equally useful to make a statistical characterization of a hypothetical optimum. However, Kendall’s tau has a very appealing interpretation as it is shown in Figure 5 for a two-input dimension.15 The underlying idea is very simple. Given a pair of points x1 and x2, a perpendicular bisector

between them is created; then the proposed model of a local minimum is satisfied if the hypothetical minimum is on the same side of the bisector as the point with the smaller y(x). This test of concordance can be carried out for all possible pairs of points in the data set. The number of satisfied bisectors (concordances) minus the number of unsatisfied bisectors (discordances), divided by the total number of bisectors will exhibit a Kendall’s tau distribution. Grounded on this interpretation of Kendall’s tau used for a statistical characterization of optimality, it is possible to argue that the assumption of symmetry referred to in eq 2 becomes almost irrelevant. Note that no reference whatsoever is made to any property of the objective function, such as symmetry, when concordances and discordances are calculated. Some comments regarding Figure 5 are worth making. Each circle corresponds to a data point in a twoinput space, with the output value x* inside. Solid lines define the perpendicular bisectors between each pair of points. For a given pair, a solid triangle indicates the direction of the smaller of the two points. A +1 or -1 indicates whether the postulated optimum x* is on the same side of the bisector as the smaller of the two points. The statistic τ is calculated as the sum of these numbers over all pairs in the data set divided by the number of bisectors. 3. Statistical Simplex Method Simplex search methods8,16 resort to an effective device for parsimoniously sampling the input space in the search for an optimizer. A simplex is a set of n + 1 points in R n. Thus, a simplex is a triangle in R2, a tetrahedron in R 3, etc. A nondegenerate simplex is one for which the set of edges adjacent to any vertex in the simplex forms a basis for the input space. The simplex search algorithm mainly resorts to four operations: reflection, expansion, contraction, and shrinking. The reflection operation is aimed at determining a local direction for improving the objective function. The logic of the algorithm seeks to take advantage of such a direction when the new vertex is better that all the vertices in the current simplex by an expansion operation. On the other hand, when the reflected vertex does not represent an improvement over any of the current vertices, a contraction operation is carried out. Shrinking toward the best vertex is done when even internal contraction fails to produce a vertex that is at least better that the worst one in the current simplex. In the statistical simplex method, these operations are based on the correlation coefficient τ for hypothetical optima located in the simplex vertices and trial points resulting from simplex operations. Additionally, the statistical simplex algorithm uses a new operation named replication, meaning remeasuring the process output for one of the simplex vertices, e.g., the vertex exhibiting the greatest degree of correlation τ. 3.1. Statistical Simplex Algorithm. • Initialize. Start kth iteration with a nondegenerated simplex in R n. • Ranking. To label the best (x1), the worst (xn+1), and the next-to-worst (xn) vertices in the current simplex, hypothetical reflections of each vertex at a time and their corresponding Kendall’s tau are calculated. It is noteworthy that ranking does not require doing new experiments or function evaluation. See Figure 6 for details in a two-input example. Assuming that we are

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005 8801

• Correlation Ordering. Once the new data obtained in the sampling step has been incorporated into the data set, the Kendall’s tau τn+1, τr, τic, and τoc for xn+1, xr, xic, and xoc are calculated. Then the maximum τ* ) max{τn+2, τr, τic, τoc} is obtained. If τr ) τ*, then consider doing a simplex expansion. If τn+1 ) τ*, then consider shrinking the current simplex. If the maximum τ* corresponds to either τic or τoc, then replace xn+1 by xic or xoc, respectively, and do a replication of the best vertex before terminate iteration. • Expansion. An expansion experiment is done at

xe ) 3 x j - 2xn+1

Figure 6. Hypothetical reflections (circles) for ranking experimental points (dots) in a simplex using the values of Kendall’s tau.

(9)

Data obtained is incorporated into the data set. If τe > τr, accept xe in place of xn+1 to form a new simplex; otherwise, accept xr in place of xn+1. Whatever the case, always do a replication operation before terminate iteration. • Shrinking. To avoid being trapped in a local optimum, shrinking the current simplex is only advisable if x1 is the best point found so far, i.e., the one that has the highest Kendall’s tau of all points in the data set. If done, the shrinking operation generates a smaller simplex by only retaining the best vertex x1. The coordinates of new n vertices which are closer to the best point are calculated as

xi′ ) x1 + ϑ(xi - x1), 0 < ϑ < 1, i ) 2, ..., n, n + 1 (10)

Figure 7. Sampling points along the improvement direction.

seeking to minimize the process output, the worst vertex r xn+1 is the one that provides a reflected point xn+1 with the highest Kendall’s tau compared to alternative reflections, whereas the best vertex x1 corresponds to a hypothetical reflection xr1 with the lowest Kendall’s tau. In the case of ties, the corresponding function values for the involved vertices are used for ranking. • Sampling. Following vertex ordering, a local direction for improvement is known. Sampling along this line is done at the following coordinates (see Figure 7 for details): • Reflection. A reflection of the current simplex gives the coordinates xr defined as follows,

xr ) 2 x j - xn+1

(6)

n xi/n is the centroid of the n best points. where x ) ∑i)1 • Contraction. Outer and inner contraction of the current simplex provides, respectively, the following coordinates:

3 1 xoc ) x j - xn+1 2 2

(7)

1 1 xic ) x j - xn+1 2 2

(8)

Then do a replication experiment and terminate iteration. • Replication and Convergence Test. The rationale of replicating a previously tried input is to improve the point-wise estimation of the process output for that input. For the sake of economy, a replication of one of the vertices in the new simplex must be done selectively. The replication step gives rise the question of how best to choose which vertex in the current simplex should be used for remeasuring the process output. Here, the best vertex x1 is chosen for replication only if the new vertex entering the simplex is ranked as the best one; otherwise, no replication is done. The logic for this selective replication strategy is to reject optimistic outliers. Before terminating the iteration, the convergence criterion is evaluated.

yjk - yjk-1 eδ yk

(11)

where yjk stands for the average response for all the vertices in the kth-iteration simplex and δ is a small tolerance. The overall logic of the statistical simplex search method is delineated in Figure 8. A very important feature of the logic of the statistical simplex algorithm is that the entire data set is used throughout. This provides the statistical simplex search with a global view of the function landscape, as will be demonstrated in Rastrigin’s function. Moreover, the data set can incorporate to advantage any previous data available regardless of the way it was generated. The original simplex algorithm was conceived primarily for unbounded domain problems. With bounded variables, generated points can leave the domain after either a reflection or expansion operation. It is straightforward to account for any type of constraints by projection along the improvement direction. For the

8802

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005

Figure 8. Statistical simplex search method. Table 5. Performance of the Statistical Simplex Method under Noise and Outliers initial point

noise distribution

outlier probability, pi

iterations done

solution found

function value

(5, 5) (5, 5) (-5, -5) (5, 5) (-5, 5) (5, -5) (5, 5)

normal (10, 0.1) normal (10, 0.2) normal (10, 0.1) normal (10, 0.2) normal (10, 0.1) normal (10, 0.15) normal (10, 0.2)

1/200 1/400 1/200 1/400 1/100 1/100 1/50

323 298 380 343 390 385 400

[0.001, 0.004] [1 × 10-4, 4 × 10-5] [0.038, 0.067] [3 × 10-3, 7 × 10-4] [0.03, 0.006] [0.02, 0.005] [0.091, 0.044]

-0.0756 0.00032 0.07 0.025 0.056 0.088 -0.12

sake of simplicity, consider the case of upper and lower bounds. Projection for each variable is accounted for using

{

min (xi < xmin i ) w xi ) xi (xi > xmax ) w xi ) xmax i i

}

(12)

As a result of projection onto the active constraints, the simplex will shrink into the subspace of the saturated variables. 3.2. Example: Rastrigin’s Function (Continued). Beginning from the point (5, 5), the Nelder-Mead simplex search (fminsearch function in Matlab) is trapped in the nearest local minimum (4.97, 4.97). Surprisingly, for the multimodal Rastrigin’s function in Figure 3, the statistical simplex converges in 240 iterations to a final value function 1 × 10-5 at the solution (0.000 23, 0.000 14). The influence of different orientations of the initial simplex was tested, and the results obtained were almost identical. The standard simplex cannot leave the valley where the initial point is located, whereas the statistical simplex always converges very close to the global minimum at (0, 0). To test further the behavior of the statistical simplex method, the presence of noise and outliers in the data was simulated. For introducing noise in the Rasnoisy function in eq 5, a normal distribution with mean µ and variance σ2 was used for parameter A. To simulate infrequent outliers, in each call to the Rastrigin noisy function, the value of parameter A is set to 100 with a

probability pi, whereas it is set to a statistical distribution with mean µ ) 10 with probability (1 - pi). The value of pi is chosen so that a handful of outliers over a maximum of 400 Rasnoisy function evaluations are generated. Results obtained are summarized in Table 5. The resistance to noise and outliers of the statistical simplex method in the search for the global optimum at (0, 0) is very remarkable. It is worth noting that the standard Nelder-Mead method is always trapped at the closest local minimum to the initial point. 4. Semibatch Reactor Optimization As an application of the statistical simplex method for experimental process optimization, run-to-run improvement of the operating policy for the acetoacetylation of pyrrole with diketene in an isothermal semibatch reactor is studied using simulated data. Unsystematic variations in kinetic parameters along with some performance outliers due to faults in the temperature control system are considered. The reaction system considered is described in detail by Ruppen et al.17 and comprises the following main reactions

8 C; 2B 9 8 D; B 9 8 E; C + B 9 8F A+B9 k k k k 1

2

3

4

where A ) pyrrole, B ) diketene, C ) 2-acetoacetyl pyrrole, D ) dehydroacetic acid, E ) oligomers, and F ) a pool of undesired byproducts. In Table 6, nominal values of kinetic parameters are given along with initial

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005 8803

Figure 9. Learning curve for the performance index PI without noise or outliers using the statistical simplex method.

Figure 10. Learning curve for the performance index PI with noise and outliers using the statistical simplex method.

Table 6. Nominal Kinetic Parameters and Initial Conditions for the Semibatch Reactor Example

0.0149), and the resulting value of the performance index is PI ) 0.1570, a very suboptimal operating policy. As can be seen, the statistical simplex algorithm converges rather quickly to a solution with l1 ) 0.0012 L/min and t1 ) 227.5 min; the overall cycle (batch period included) is 249 min, giving an optimal PI ) 0.5045. It is noteworthy that the standard simplex search carried out using the fminsearch Matlab algorithm provides a rather poor operating policy with l1 ) 0.0017 L/min and t1 ) 136 min; the overall cycle (batch period included) is 160 min, giving an optimal PI ) 0.4444. The superior performance of the statistical simplex is possibly due to the cumulative effects of data used in Kendall’s tau calculations which allows it to have a much broader perspective in comparison with the standard simplex search method. To assess the effect of persistent output noise and occasional outliers in each simulated experiment, the following approach was followed. With probability 0.1, the measured value PI° of the PI index is measured as an outlier value of the true value for the corresponding operating policy,

k1 k2 k3 k4 CBin CA0 CB0 CC0 CD0 V0 reactor cycle time

0.053 [L/(mol min)] 0.128 [L/(mol min)] 0.028 [min-1] 0.001 [L/(mol min)] 5 [mol/L] 0.72 [mol/L] 0.05 [mol/L] 0.08 [mol/L] 0.01 [mol/L] 1 [L] e 300 [min]

conditions and constraints on the final state of the batch. For safety reasons, the final concentration of diketene cannot exceed 0.025 mol/L. To guarantee compliance of this constraint, a batch time is added to the semibatch period if [B] > 0.025. The maximum concentration of D at the end of a run is preferred not to exceed a value of 0.15 mol/L so as to prevent precipitation at room temperature. The objective is to obtain as much product as possible without violation of the maximum allowable concentrations for B and D,

PI ) Vf‚[C]f - δ([D]f)

(13)

where δ([D]f) ) 0 if ([D]f) e 0.15 whereas δ([D]f) ) 10.0([D]f - 0.15) otherwise. The reactor is initially filled with reactant A, whereas reactant B is added during a semibatch period without exceeding a maximum threshold rate of 0.002 L/min. It is assumed that concentrations of species A, B, C, and D can be measured at the final time, whereas the pool of impurities and undesirable byproducts are calculated using mass balances. 4.1. Basic Operating Policy. The easiest way to define an operating policy is by choosing a constant flow rate l1 for charging reactant B and the duration of the semibatch period t1. The duration of the batch period (if necessary) is not subject to optimization since it is defined so as to comply with the safety constraint [B]final e 0.025. The experimental budget is limited to 40 runs including replications (if any). This means that up to 20 iterations of the statistical simplex method can be carried out. In Figure 9, the evolution of the index PI for the ideal case where noise or outliers are not present is shown. The simplex search begins with a very “safe” result: l1 ) 0.0002 L/min and t1 ) 115 min. There does not exist the need for a batch period ([B]final )

PI° ) PI‚0.5(0.5 - Rand)

(14)

where Rand is a random number over the interval [0, 1]. With probability 0.9, the output value of the index PI incorporates noise as follows:

PI° ) PI + 0.05(0.5 - Rand)

(15)

In Figure 10, the results obtained using the statistical simplex search method in the presence of ubiquitous noise and occasional outliers are presented. The proposed statistical simplex approach exhibits a significant robustness to outliers and is almost insensitive to a significant level of run-to-run variability. 4.2. More Complex Operating Policies. In principle, it can be expected that using a more elabsorated operating policy may provide additional room for improving the basic policy discussed above. The first attempt to be made is adding a new degree of freedom by splitting the semibatch period in two steps, as suggested by Srinivasan et al.18 In the first step, the reactant B is added at the maximum rate of l1 ) 0.0002 L/min for a variable time period t1. Furthermore, there is a second feeding step where the feed rate l2 and duration t2 are both subjected to optimization. Without

8804

Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005

Figure 11. Learning curve for the performance index PI for 3DoF policy with noise and outliers using the statistical simplex method (with replications of the best current vertex). Table 7. Expected Performance for Policies with Increasing Degrees of Freedom policy DoF

expected PI (mol)

batch cycle (min)

3 4 6 8 10

0.51 0.512 0.513 0.515 0.515

276 304 327 354 369

noise or outliers, the statistical simplex quickly converges (in