Article pubs.acs.org/IECR
Development of an Adaptive Experimental Design Method Based on Probability of Achieving a Target Range through Parallel Experiments

Atsuyuki Nakao, Hiromasa Kaneko, and Kimito Funatsu*

Department of Chemical System Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan

ABSTRACT: In materials design and development, experimenters must perform experiments under various conditions until they achieve a required physical property, yield, cost, or other objective. Use of a regression model based on existing experimental results is one suitable way to reduce the number of experiments required and thus the development costs. Although adaptive experimental design methods using regression models for sequential and parallel experiments have previously been developed, those methods are sequential sampling methods or maximization and minimization methods, which are not always suitable for material design. Therefore, we have developed an adaptive experimental design method for parallel experiments in the field of material design, which uses the probability that at least one experimental result will fall within a target range of a property on the next set of parallel experiments. We used Gaussian process regression to consider the correlation of the predicted values of a property in multiple experiments. The probability of achieving results within a required property range on the next set of parallel experiments is calculated on the basis of this correlation. Using five case studies, we demonstrate that the proposed method can select experimental conditions more efficiently than traditional methods without requiring any parameters to be set in advance.
1. INTRODUCTION
In materials design and development, high-functionality materials require high levels of various properties, such as strength, density, and electrical conductivity. Target values or ranges of these properties are decided based on the purposes of the material development process. Experimenters select experimental conditions such as reaction times, reaction temperatures, and amounts of additives and catalysts, and perform repeated experiments to achieve these target requirements. The number of cycles of experimental condition selection and experimentation must be reduced to lower development costs. Use of the experimental data obtained in past experiments offers one way to reduce the number of experimental cycles. The response surface methodology (RSM)1 is a valid tool in such situations. In RSM, a regression model is constructed based on a combination of statistical theories and training data, that is, past experimental results. Various regression methods are used in RSM, depending on the specific situation, including linear regression methods such as multiple linear regression (MLR)2 and partial least-squares (PLS),3 and nonlinear regression methods such as support vector regression (SVR),4 Gaussian process regression (GPR),5 and artificial neural networks (ANN).6 A regression model predicts a property in a short time, and this model can be used for detailed analyses, including optimization using genetic algorithms,7 sensitivity analysis,8 and experimental design.9 Shieh et al. considered the relationship between the conversion ratio from soybean oil to biodiesel and the reaction conditions as a
second-order polynomial function, constructed a regression model based on the experimental data, and found the optimal reaction conditions.10 Khalajzadeh et al. constructed a regression model based on simulation results for a vertical ground heat exchanger and optimized the design parameters of the heat exchanger.11 Yan et al. analyzed the relationship between the components and the abilities of a catalyst for selective hydrogenation of cinnamaldehyde using a GPR model and subsequently developed an optimal catalyst.12 RSM has been applied to a wide range of material design fields, including drugs,13 starch nanocrystals,14 polymers,15 catalysts,16 and metal arrays.17 However, RSM can also lead material design and development processes toward failure. For example, when the relationship between the experimental conditions and a property is nonlinear and complex, selection of experimental conditions based on predicted property values that lie close to a target range often falls into a local minimum or local maximum, because a candidate near the experimental condition whose property value is closest to the target range in the existing results tends to be selected. Experiments are then simply repeated around a local optimum. We can avoid this failure by considering the prediction ability of the regression model for each experimental condition. The accuracy of any value predicted by a regression model depends
Received: March 2, 2016
Revised: April 25, 2016
Accepted: April 27, 2016
DOI: 10.1021/acs.iecr.6b00852 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
on the experimental conditions. A predicted value is reliable when the experimental conditions are similar to many training data, and unreliable when there are no similar experimental conditions in the training data. On the basis of a predicted value and its accuracy, we can evaluate the potential of a candidate experimental condition. For example, a candidate experimental condition whose predicted value is not within but close to a target range, and whose prediction accuracy is very high, is not promising, because there is a very low probability that the actual property lies within the target range. In contrast, another candidate, whose predicted value is not close to the target range but is unreliable, actually has some probability of reaching the target range. Therefore, it is necessary to consider the prediction accuracy of property values when selecting the next set of experimental conditions in RSM. Some regression methods can handle prediction accuracy. GPR18 is a regression method based on Bayes' theorem,19 and a GPR model can predict values of a property while simultaneously estimating the prediction accuracy based on the distance from the training data in the explanatory variables (X-variables). Relevance vector machine regression (RVMR)20 is another regression method based on Bayes' theorem that considers the prediction accuracy of each predicted value. An RVMR model is sparser than a GPR model because the number of experimental results used in the prediction can be reduced in RVMR modeling. Ensemble learning21 can also be used to handle prediction accuracy. Multiple submodels, which are regression models based on random subsets of an original training data set, are constructed, and the standard deviation of the values predicted by the submodels is taken as a measure of the prediction accuracy for each candidate.
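The ensemble-learning idea above can be sketched as follows. This is a minimal illustration with synthetic data, not any specific published implementation: the training function, noise level, and the choice of bootstrap-resampled polynomial submodels are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data (hypothetical): y = sin(x) + noise, with x in [0, 10]
X = rng.uniform(0.0, 10.0, 30)
y = np.sin(X) + rng.normal(0.0, 0.1, 30)

def ensemble_predict(x_new, n_models=100, degree=3):
    """Fit submodels on bootstrap resamples of the training data and return
    the mean prediction and the std across submodels; the std serves as a
    prediction-accuracy measure for each candidate condition."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), len(X))       # bootstrap resample
        coef = np.polyfit(X[idx], y[idx], degree)   # polynomial submodel
        preds.append(np.polyval(coef, x_new))
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)

mean_in, sd_in = ensemble_predict(np.array([5.0]))    # inside the training range
mean_out, sd_out = ensemble_predict(np.array([15.0])) # far outside it
# The submodels disagree where data are absent, so sd_out is typically
# much larger than sd_in, mirroring how GPR variance grows away from data.
```
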
GPR has been used as a prediction accuracy estimation method for adaptive experimental design in previous studies.22−24 Kishio et al. proposed an index (P), the probability that a property value will be within a target range, and applied this index to adaptive experimental design.22 By calculating the P-values for all candidate experimental conditions and then selecting the candidate with the highest P-value, a candidate experimental condition that meets a target property range can be found efficiently, even when the property has local maxima and local minima. Jones et al. proposed an index called expected improvement (EI), the expected value of how much an actual property will improve on the previous optimal value of a property.23 Their adaptive experimental design method based on EI can optimize the design parameters of a simulation in a small number of iterations by running simulations at the design parameters with the highest EI values, even when the simulation has high computational costs and nonlinearities. Seo et al. proposed an adaptive experimental design method to minimize the prediction errors of a regression model.24 Their index is based on the estimated reduction of prediction errors by the next experiment, derived from the theorems of GPR, and the X-variable candidate with the highest value of this index is selected for the next experiment to minimize the prediction errors of the regression model. However, those methods are sequential sampling strategies, which are inefficient for material design because parallel experiments are standard practice. There are some studies of optimization by parallel experiments based on the EI-value.25−27 Those studies are based on q-EI, which is defined as follows:
q\text{-}\mathrm{EI}(X_q) = E\left[\max\left(0,\; f_{\min,\mathrm{obs}} - \min_{i}\, f(\mathbf{x}_i)\right)\right]   (1)
where f_min,obs and E(y) denote the minimum of the observations and the expectation of y, respectively, and X_q = [x_1^T, x_2^T, ..., x_q^T]^T is the matrix of the q candidate input vectors. However, calculation of q-EI is time-consuming because Monte Carlo simulation28 is required to integrate the product of a multidimensional probability distribution and the improvement of f(x) achieved by the next simulations. In this study, we extend the evaluation index P22 and propose an adaptive experimental design method that enables effective parallel experiments for material design. The calculation of the P-value is easier than the calculation of the EI-value because fast procedures exist to calculate the cumulative distribution function of the multivariate Gaussian distribution.26 Moreover, maximization or minimization methods are not necessarily the most efficient in the field of material design. The purpose of material design is to develop materials with appropriate values of some properties. If we use a minimization method for material design, the deviation from the target property values is minimized. However, the information on whether property values are higher or lower than the target values is lost by the transformation from property values to deviations. This information contributes greatly to the predictive ability of statistical models in regression and classification analyses in material design, because if the database of experimental results contains property values both higher and lower than the target value, we know that appropriate experimental conditions for materials with the target property values exist in the interpolation region. Our proposed method based on P can utilize this information because it uses property values directly. The theorems of GPR are used to consider correlations between the predicted values of a property, and P is extended to Pmultiple in this study.
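To make the cost of q-EI concrete, the expectation in eq 1 can be approximated by Monte Carlo sampling from the joint predictive distribution of the candidate points. The means, covariance, and best observed value below are hypothetical numbers chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical joint GPR predictive distribution for q = 2 candidate conditions
mu = np.array([1.0, 1.2])              # predicted means f(x1), f(x2)
cov = np.array([[0.25, 0.10],
                [0.10, 0.25]])         # predictive covariance
f_min_obs = 0.9                        # best (minimum) observed value so far

# q-EI of eq 1 by Monte Carlo: draw joint samples of (f(x1), f(x2))
# and average the improvement max(0, f_min_obs - min_i f(x_i)).
samples = rng.multivariate_normal(mu, cov, size=100_000)
q_ei = np.maximum(0.0, f_min_obs - samples.min(axis=1)).mean()
```

Every candidate set requires a fresh batch of samples like this, which is what makes searching over sets of parallel conditions expensive; the P index instead reduces to multivariate normal CDF evaluations.
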
The probability that one or more of the experiments will reach a target property range can be reasonably obtained with Pmultiple. The candidates with the highest Pmultiple values are used as the next experimental conditions. This paper is organized as follows. Section 2 includes the basic concept of P and the details of our proposed method; section 3 compares our proposed method with a traditional method via case studies; and section 4 gives the conclusions of this paper.
2. METHODS

2.1. Gaussian Process Regression (GPR). GPR18 is a regression method based on Bayes' theorem. In GPR, when two sets of X-variables x_1 ∈ R^{1×m} and x_2 ∈ R^{1×m}, where m is the number of X-variables, are similar to each other, the corresponding y-values y_1 and y_2 are assumed to be similar as well. This means that y_1 and y_2 are correlated, and this correlation increases as the similarity between x_1 and x_2 becomes higher. On the basis of this assumption, a prior probability based on the X-variables of a training data set X_n = [x_1^T, x_2^T, ..., x_n^T]^T ∈ R^{n×m} and an input vector x_{n+1} ∈ R^{1×m} is assumed to be the following (n+1)-dimensional normal distribution:

p(\mathbf{y}_{n+1}) = N(\mathbf{0}, C_{n+1})   (2)

where y_{n+1} ∈ R^{(n+1)×1} denotes the vector of y-values corresponding to X_{n+1} = [x_1^T, x_2^T, ..., x_{n+1}^T]^T ∈ R^{(n+1)×m}, and C_{n+1} ∈ R^{(n+1)×(n+1)} denotes the (n+1) × (n+1) covariance matrix based on X_{n+1}. C_{n+1} is assumed to satisfy the following equation:

C_{n+1} = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_{n+1}) \\ \vdots & \ddots & \vdots \\ k(x_{n+1}, x_1) & \cdots & k(x_{n+1}, x_{n+1}) \end{pmatrix} + \beta I_{n+1}   (3)

where I_{n+1} ∈ R^{(n+1)×(n+1)} and β denote the identity matrix and a hyperparameter that indicates the magnitude of the measurement noise, respectively. k(x, x′) is a kernel function29 that determines the relationship between the correlation of the y-values and the similarity between the X-variables. In this study, we use the following kernel function:

k(x, x') = \theta_1 \exp\left(-\sum_{i=1}^{m} \eta_i (x_i - x'_i)^2\right) + \theta_2   (4)

where x_i and x′_i denote the ith X-variables of x and x′, respectively. θ_1, θ_2, and η_i are hyperparameters that depend on the specific relationship between the X-variables and the y-variable. θ_1, θ_2, η_i, and β are determined by maximizing the likelihood of eq 5, following the maximum likelihood method.30 We used the conjugate gradient method for this maximization in this study.

p(\mathbf{y}_n) = N(\mathbf{0}, C_n)   (5)

C_n = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix} + \beta I_n   (6)

where y_n ∈ R^{n×1} denotes the vector of observed y-values of the training data. To calculate a posterior probability, the observed y-values of the training data are substituted into the prior probability, which is thereby transformed into the following one-dimensional normal distribution:

p(y_{n+1} \mid \mathbf{y}_{n,\mathrm{obs}}) = N(\mathbf{k}_n^T C_n^{-1} \mathbf{y}_{n,\mathrm{obs}},\; \beta - \mathbf{k}_n^T C_n^{-1} \mathbf{k}_n)   (7)

\mathbf{k}_n = \begin{pmatrix} k(x_{n+1}, x_1) \\ \vdots \\ k(x_{n+1}, x_n) \end{pmatrix}   (8)

where y_{n+1} denotes the y-value at x_{n+1}. Equation 7 is a probability distribution of the y-value corresponding to x_{n+1}. Therefore, the mean of the posterior probability is used as the predicted value y_pred(x_{n+1}), and the standard deviation σ(x_{n+1}) of the posterior probability is used as a prediction accuracy criterion.

2.2. Probability of Achieving a Target Range through a Single Experiment. In this section, we determine the probability of achieving a value in a target range for the y-variable.22 The probability is based on the normal distribution that is output by a GPR model:

p(y_{n+1} \mid \mathbf{y}_{n,\mathrm{obs}}) = N(y_\mathrm{pred}, \sigma^2)   (9)

This equation is a probability distribution for the y-value at x_{n+1}. Therefore, the probability that the y-variable, that is, a property, for a candidate experimental condition x_{n+1} achieves a value in the target range y_l ≤ y ≤ y_u is calculated by integrating eq 9 from y_l to y_u. This index is named P. The P-value is expressed as follows:

P(x_{n+1}) = \int_{y_l}^{y_u} N(y_\mathrm{pred}, \sigma^2)\, dy = \Phi\left(\frac{y_u - y_\mathrm{pred}}{\sigma}\right) - \Phi\left(\frac{y_l - y_\mathrm{pred}}{\sigma}\right)   (10)

where Φ is the cumulative distribution function of the standard normal distribution. The candidate experimental condition with the highest P-value among all the candidates is selected as the next experimental condition.

2.3. Gaussian Process Regression for Multiple Candidates. In this study, we use an extension of GPR to perform multiple experiments simultaneously. A simple method for selecting multiple candidates using P is to select candidates in descending order of their P-values. However, the candidates selected in this way are concentrated around the candidate with the highest P-value. The selected experimental conditions therefore resemble each other, the results of these experiments will be almost identical, and only information on a small domain can be obtained from the experiments. Selection of candidates in descending order of their P-values is therefore inefficient. Evaluation of the correlation of the y-values corresponding to each pair of candidates is required for a more effective selection process. GPR can consider the correlation of the y-values corresponding to plural input vectors, which are combinations of experimental conditions in the case of material design. GPR for plural input vectors is based on the same assumptions as normal GPR. A prior probability based on the X-variables of a training data set X_n = [x_1^T, x_2^T, ..., x_n^T]^T ∈ R^{n×m} and on k input vectors X_k = [x_{n+1}^T, x_{n+2}^T, ..., x_{n+k}^T]^T ∈ R^{k×m} is assumed to be the following (n+k)-dimensional normal distribution:

p(\mathbf{y}_{n+k}) = N(\mathbf{0}, C_{n+k})   (11)

C_{n+k} = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_{n+k}) \\ \vdots & \ddots & \vdots \\ k(x_{n+k}, x_1) & \cdots & k(x_{n+k}, x_{n+k}) \end{pmatrix} + \beta I_{n+k}   (12)

where y_{n+k} denotes the vector of y-values corresponding to X_{n+k} = [x_1^T, x_2^T, ..., x_{n+k}^T]^T ∈ R^{(n+k)×m}, and C_{n+k} ∈ R^{(n+k)×(n+k)} denotes the (n+k) × (n+k) covariance matrix based on X_{n+k}. To calculate the posterior probability, the observed y-values of the training data are substituted into the prior probability in the same way as in normal GPR. The posterior distribution is then expressed by the following equation:

p(\mathbf{y}_{n+1 \sim n+k} \mid \mathbf{y}_{n,\mathrm{obs}}) = N(K_{nk}^T C_n^{-1} \mathbf{y}_{n,\mathrm{obs}},\; \beta I_k - K_{nk}^T C_n^{-1} K_{nk})   (13)

K_{nk} = \begin{pmatrix} k(x_{n+1}, x_1) & \cdots & k(x_{n+k}, x_1) \\ \vdots & \ddots & \vdots \\ k(x_{n+1}, x_n) & \cdots & k(x_{n+k}, x_n) \end{pmatrix}   (14)
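To make these quantities concrete, the following sketch implements the GPR prediction and the P index for a one-dimensional toy problem. It is a minimal illustration under stated assumptions, not the authors' implementation: the hyperparameters of eq 4 are fixed by hand instead of being fitted by maximum likelihood, the training function is synthetic, and the predictive variance is written in the standard GPR form k(x, x) + β − k_n^T C_n^{-1} k_n.

```python
import numpy as np
from scipy.stats import norm

def kernel(a, b, theta1=1.0, eta=1.0, theta2=0.1):
    # Gaussian kernel of eq 4 with fixed (assumed) hyperparameters
    return theta1 * np.exp(-eta * (a - b) ** 2) + theta2

# Toy training data (hypothetical): one X-variable, y = sin(x)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_obs = np.sin(X)
beta = 1e-2                                  # measurement-noise hyperparameter

C_n = kernel(X[:, None], X[None, :]) + beta * np.eye(len(X))   # eq 6

def predict(x_new):
    # Posterior mean and standard deviation at x_new (eqs 7 and 8)
    k_n = kernel(x_new, X)
    y_pred = k_n @ np.linalg.solve(C_n, y_obs)
    var = kernel(x_new, x_new) + beta - k_n @ np.linalg.solve(C_n, k_n)
    return y_pred, np.sqrt(var)

def p_value(x_new, y_l, y_u):
    # P index of eq 10: probability that y falls in the target range [y_l, y_u]
    y_pred, sigma = predict(x_new)
    return norm.cdf((y_u - y_pred) / sigma) - norm.cdf((y_l - y_pred) / sigma)

# Scoring each candidate with p_value and choosing the highest realizes
# the selection rule of section 2.2.
```

Note how the posterior standard deviation grows for candidates far from the training data, which is exactly what lets P reward unexplored but promising conditions.
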
where y_{n+1∼n+k} ∈ R^{k×1} denotes the vector of y-values corresponding to X_k. K_{nk}^T C_n^{-1} y_{n,obs} is a k-dimensional vector of the predicted values corresponding to X_k. βI_k − K_{nk}^T C_n^{-1} K_{nk} is a k × k covariance matrix whose diagonal elements correspond to σ^2 of normal GPR and whose (i, j) element is the covariance of the predicted y-values corresponding to x_{n+i} and x_{n+j}. The interdependence of the predicted y-values corresponding to X_k is then evaluated using this covariance matrix. Consequently, the posterior probability can be used to calculate the probability that one or more experiments achieve a value in a target range in the next set of experiments.

2.4. Probability of Achieving a Target Range through Parallel Experiments. When two experiments are conducted at the same time, the experimenters need the first and/or the second experimental result to attain the required range of a property. The probability of achieving this through two experiments is expressed as follows:

P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)   (15)

where E_1 and E_2 denote the events where the first and second experiments attain the required range of a property, respectively. P(E_1) and P(E_2) can be calculated using eq 10. A correlation between the y-values corresponding to the first and second experiments is required to calculate P(E_1 ∩ E_2); the covariance matrix in eq 13 contains information on this correlation. The output of the GPR for plural input vectors (see eq 13) is a joint probability distribution of the y-values corresponding to X_k. Therefore, the probability that one or more of the actual y-values lie within the required range is calculated by integrating this joint probability distribution. When an experimenter conducts two experiments simultaneously, the two-dimensional normal distribution shown in Figure 1 is output from the GPR model. The horizontal and vertical axes denote the y-values at x_1 and x_2, respectively, and the numerals in the graph indicate the probability densities of the actual y-values at x_1 and x_2 at each coordinate. y_l and y_u denote the lower and upper bounds of the required range of y-values, respectively. P(E_1) is calculated by integrating the joint probability distribution over area A, and P(E_2) is calculated similarly by integrating the distribution over area B. P(E_1 ∩ E_2) is calculated by integrating the distribution over the area where both y(x_1) and y(x_2) are in the range y_l ≤ y ≤ y_u, that is, area C:

P(E_1 \cap E_2) = \iint_C f(y_1, y_2)\, dy_1\, dy_2   (16)

Here, y_1, y_2, and f denote y(x_1), y(x_2), and the two-dimensional normal distribution whose mean vector comprises the estimated y-values and whose variance−covariance matrix is given in eq 13, respectively. On the basis of these three integrated values, P(E_1 ∪ E_2) is calculated using eq 15. When an experimenter performs more than two experiments simultaneously, the probability of achieving the target range of a property through the next set of experiments can be calculated in the same way. This probability index, named P_multiple, is given as follows:

P_\mathrm{multiple} = P\left(\bigcup_{i=1}^{n} E_i\right)   (17)
where E_i denotes the event where the ith experiment achieves the required range of a property. This equation is decomposed into the following equation by the inclusion−exclusion principle:

P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i) - \sum_{i,j:\, i<j} P(E_i \cap E_j) + \cdots
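As a sketch of the two-experiment case, the rectangle probability P(E_1 ∩ E_2) of eq 16 can be evaluated with a multivariate normal CDF, one of the fast procedures mentioned in section 1, and combined with the marginal P-values via eq 15. The predicted means, covariance, and target range below are hypothetical numbers chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Hypothetical joint GPR output (eq 13) for two candidate conditions
mu = np.array([0.8, 1.1])              # predicted y-values at x1 and x2
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])         # predictive covariance matrix
y_l, y_u = 0.9, 1.2                    # target property range

# P(E1) and P(E2) from the marginal distributions (eq 10)
sd = np.sqrt(np.diag(cov))
p1, p2 = norm.cdf((y_u - mu) / sd) - norm.cdf((y_l - mu) / sd)

# P(E1 ∩ E2): probability mass of the joint distribution over the square
# [y_l, y_u] x [y_l, y_u] (area C of eq 16), via the bivariate normal CDF
mvn = multivariate_normal(mean=mu, cov=cov)
p_both = (mvn.cdf([y_u, y_u]) - mvn.cdf([y_l, y_u])
          - mvn.cdf([y_u, y_l]) + mvn.cdf([y_l, y_l]))

# Pmultiple for two parallel experiments (eq 15)
p_multiple = p1 + p2 - p_both
```

The candidate pair maximizing p_multiple would then be chosen for the next round; because the covariance penalizes highly correlated candidates, the selected conditions tend to be spread out rather than clustered around the single best P-value.
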