Article pubs.acs.org/IECR

Cost-Effective Process Modeling and Optimization Methodology Assisted by Robust Migration Techniques

Linkai Luo,† Yuan Yao,‡ and Furong Gao*,†

†Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
‡Department of Chemical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan

ABSTRACT: While data-based models are frequently used as an effective means of investigating a variety of complex processes in engineering, they are typically restricted to the specific process being modeled. This paper demonstrates the application of migration-based approximation to relate similar yet nonidentical processes, using an extension to the model calibration method. A robust method is also used to identify the plausible migration function most consistent with experimental data. In addition, we apply the migration-based modeling method to process optimization wherein a specified target of the process is to be explored; this is achieved by using an infill criterion that assesses the conditional likelihood that a process response will attain the target. Two examples, namely, a pair of test functions and real-world injection molding processes, are used to demonstrate the method.

Received: January 29, 2015. Revised: April 28, 2015. Accepted: May 7, 2015. Published: May 7, 2015. DOI: 10.1021/acs.iecr.5b00411. Ind. Eng. Chem. Res. 2015, 54, 5736−5748. © 2015 American Chemical Society.

1. INTRODUCTION

Data-based models are frequently used as an effective means to investigate complex physical and engineering processes. Once a model has been constructed, it can readily be used to predict the results of future experiments or to explore optimum process conditions. While a well-constructed model can be an efficient means of gaining insight into a process, building it can still be experimentally costly. Consequently, it is desirable to perform a limited number of experimental runs when trying to understand and model a process efficiently and effectively. Traditional modeling approaches such as response surface methodology, artificial neural networks, radial basis functions, Gaussian processes, and support vector machines have been widely used in engineering applications.1,2 These methods, however, typically require expensive or time-consuming experimentation. While experimentation on new processes can be very costly, other sources of information may be present: prior information can be elicited from expert opinions;3 simulators of the physical process may be available for describing the same system;4,5 and accurate engineering models developed from similar, but nonidentical, processes may exist.6−8 The focus of this paper is the use of existing models in the efficient development of models for new processes, with the goal of achieving a desired accuracy using the lowest possible number of experimental runs.

The use of existing models to assist in the modeling of new processes has a long history in many disciplines. Related work includes model calibration,4,5,9−12 calibration transfer,13,14 process transfer,15,16 and model migration.6−8,17−19 Model migration can loosely be viewed as an extension of the calibration transfer methods, in which a model developed from a well-studied process is adopted to describe a similar but nonidentical process. To be specific, migration addresses similar but nonidentical systems, that is, systems that may be substantially different in their operations but that share many similar characteristics. Such phenomena are common in engineering applications. In this context, the part that is shared between similar process models can be considered as though "migrated" from one to the other. For convenience, we shall refer to the well-studied process as the past process, and to the new yet similar process as the current process, here and throughout this article.

This paper presents a cost-effective process modeling technique that allows for migration among models that describe similar but nonidentical processes. The proposed approach aims to achieve satisfactory accuracy of a current process model using the lowest possible number of experimental runs. In essence, this is achieved by an extended calibration applied to the past model predictor together with well-planned experiments performed on the current process. We describe the calibration model using a Bayesian treatment and accordingly call it the migration model.

The Bayesian migration model used in this work has the advantage of giving reliable predictions when training data are limited, because it inherits the benefits of Bayesian inference, namely, the incorporation of information already at hand and the avoidance of overfitting, the latter of which can easily occur with conventional techniques such as least-squares solutions. In addition, the Bayesian predictive approach can account for different sources of variability in a process by providing a full description of the model and parameter uncertainties instead of only a point estimate. While there is a rich literature on Bayesian calibration methods, two things set us apart from previous work. First, we emphasize the need to deal with measurement errors in migration models, as opposed to studies that use interpolation without taking noise into account.10,12 Second, we do not limit our migration function to a fixed form; instead, it is updated whenever new data are generated. In our work, the choice of migration function is made sequentially through a Bayesian model selection technique.20 We characterize our method as robust migration since it can automatically identify the best migration model among a set of candidates.

This paper also addresses process optimization for a current process, the aim of which is to find conditions that will generate a predetermined target response of the process. This target may be set by benchmarking against competitive products, a desired level of product quality, the minimum cost of process manufacturing, or some other outcome. In particular, we achieve the desired target using a one-stage method, which works concomitantly with model development and refinement. Additionally, we propose using another infill criterion, one that assesses the conditional likelihood that a specified process condition will yield the target response;21,22 that is, we choose the design point that maximizes the conditional likelihood at each step and perform subsequent experiments around this point. The cases studied show that our method can reduce the number of experiments significantly compared with the traditional optimization approach.

The article is organized as follows. We describe our methodology in section 2, presenting the Bayesian robust migration in section 2.1, followed by considerations of ways to perform sequential design for process optimization in section 2.2. We validate our methodology in section 3 by means of two examples, namely, a pair of test functions (section 3.1) and real-world injection molding processes (section 3.2). Finally, in section 4, we summarize our conclusions.

2. METHODOLOGY

As mentioned earlier, a past model usually has good predictive capability for the past process, and it can be built using a variety of modeling methods. In our work, we limit our discussion to Gaussian process (GP) modeling. The GP has gained popularity over many years because of its powerful ability to model the class of complex processes that often occur in science and engineering. A brief review of GPs is given in section 2.1.2.

During the typical study of a process (e.g., at the beginning of past model development), there is often a lack of deep understanding of the true behavior underlying the process. In such a case, experimental plans that explore the entire feasible space are favored, as a means of understanding and modeling the process as a whole. To that end, experimental designs such as Latin hypercube design (LHD), space-filling design, uniform design (UD), and Hammersley sequence sampling (HSS) are commonly employed. It has been found that HSS often provides more uniform coverage than the others.1,23 In addition, although similar performance can be achieved using optimization-based LHD, HSS is more straightforward to implement. We therefore adopt HSS in this study for choosing design points for the past process. A rule of thumb24 for the number of design points is n = 10d, where d is the input dimension. When developing the past model, the number can be greater than this, allowing us to build a more accurate model; conversely, if the past and current models are highly related, the required number for the current model may be somewhat less, which is one of the advantages of model migration. The procedure for generating HSS points is presented in section 2.1.3.

It is important to emphasize the role of the initial experiments in building the current model. Although experiments are performed one after another in the sequential strategy for the current process, it is essential to carry out a few experimental runs initially in order to build the future plan. The objective of experimentation at this stage is to explore the entire feasible design region to the extent possible, so as to provide supportive points for migrating the past model to the current one. To this end, the choice of design method can again be one of the aforementioned space-filling designs; in this study, we use HSS to allocate the initial runs. The other issue is the number of initial runs, which is rather subjective and remains an open question. Some researchers have recommended that no more than 25% of the budget be allocated to this initial stage.25 If (though not theoretically rigorous) n = 10d is taken to be the budget for current model development, then according to the 25% rule, the number of initial points is ⌈2.5d⌉, where ⌈a⌉ denotes the smallest integer greater than or equal to a.

The methodology is outlined in Figure 1.

Figure 1. Flowchart summarizing the key steps of model migration, sequential experimentation, and process optimization.

The flowchart describes a migration−experimentation−optimization cycle consisting of three key components. To begin with, it is assumed that the past and current processes are similar according to expert opinions or engineering experience. In our work, we use a GP to model the past process, using abundant experimental data generated by HSS. Once the past model has been constructed and is ready for use, we advance to robust migration (section 2.1). This uses a flexible migration model, as opposed to methods that use fixed adjustment models.7,10,12 Next, a leave-one-out cross-validation technique is applied to assist in determining the migration function most consistent with the current experimental data. With the prediction results from the robust migration, the current model is then optimized to find the optimum process condition given the search target. If the resulting response is satisfactory (e.g., within the prescribed engineering tolerance) or if the experimental budget runs out, the procedure ends; otherwise, sequential experiments are performed, with design points chosen to maximize the conditional likelihood of the response data at the given target, each gaining one more observation that is expected to converge quickly to the desired value (section 2.2). Meanwhile, the current model is iteratively updated with the new current data to obtain

a more accurate approximation. The process is repeated until the termination criterion is met.

2.1. Robust Migration Framework. We consider the d-dimensional input x = (x1,...,xd)′ and the univariate response y = y(x) for both processes in this paper. Without loss of generality, assume that the input space is the unit hypercube 𝒳 = [0,1]^d (we recommend normalizing the original data into this region). The proposed robust migration framework borrows the idea of Bayesian adjustment of multifidelity simulators and improves the methodology with a robust strategy suited to our problem of interest. We detail these pieces in the sections that follow.

2.1.1. Basic Structure. We first consider how to build an approximation of a current process that is enhanced by borrowing strength from an available past model. Combining a past model with a current model naturally leads to complex notation, which we simplify by using the superscripts "p" and "c" (abbreviations for "past" and "current", respectively) to distinguish past-process data from current-process data. Assume that we have current data consisting of n values y^c = (y1^c,...,yn^c)′ evaluated at points Xn^c = (x1^c,...,xn^c)′. Given the past model ŷ^p(·) (elaborated in section 2.1.2), we consider the following migration model to "link" the two models:

y^c(x^c) = ρ(x^c)·ŷ^p(x^c) + λ(x^c) + z(x^c) + ϵ(x^c)   (1)

where ρ(·) accounts for the scale change from the past model to the current one, and λ(·) characterizes the potential bias (discrepancy) between the two models. We assume z(·) to be a GP with zero mean and covariance function R0(·,·). This stochastic term expresses the covariance between the values at xi^c and xj^c:

Cov(z(xi^c), z(xj^c)) = σ² R0(xi^c, xj^c) = σ² exp(−∑_{k=1}^{d} lk (xik^c − xjk^c)²)   (2)

which is parametrized by l = (l1,...,ld). Finally, ϵ(x^c) represents homogeneously distributed normal noise N(0, τ²I), where I is the n × n identity matrix. Equation 1 is called the migration model in this study. Note that a past model is assumed to be readily available for migration; it can be any predictive model in the feasible domain, regardless of its structure. We restrict ourselves to GPs in this study, because the GP has proven to be a powerful tool capable of modeling many nonlinear systems in science and engineering. Interested readers can refer to the work6,17−19 in which the past model is developed using non-GP techniques.

2.1.2. Gaussian Processes (GPs). We next briefly describe the GP technique used throughout this paper. Let X = (x1,...,xn)′ be the n × d experimental design matrix, with the outputs held in the n-dimensional vector y = y(X) = (y1,...,yn)′. In GP modeling, a function y(x) is a Gaussian random function if, for any choice of x1,...,xn, the vector (y(x1),...,y(xn))′ has a multivariate normal distribution completely specified by its mean function μ(x) and covariance function C(xi,xj). We adopt a widely used mean function with a mean of zero and the covariance function26

C(xi, xj) = σ² exp(−∑_{k=1}^{d} lk (xik − xjk)²) + τ²δij = σ²(R0(xi, xj) + αδij)   (3)

where R0(xi,xj) is the correlation function between the noise-free responses at points xi and xj, σ² is the constant process variance, and l = (l1,...,ld)′ is a vector of roughness parameters representing the rate at which the correlation between y(xi) and y(xj) decays to zero as xi and xj diverge; δij = 1 if i = j and 0 otherwise. In addition, τ² is the noise variance, and α = τ²/σ² is the variance ratio. Denote the covariance function by C(xi,xj) = σ²R(xi,xj). The prior on the observations is then y ∼ N(0, σ²R), where N denotes the (multidimensional) normal distribution, 0 is a vector of zeros of length n, and R is the n × n covariance matrix whose ijth element is defined by R(xi,xj). It can be shown that the conditional predictive distribution for a new data point x* is again normal, with mean and variance

ŷ(x*) = r(x*)′R⁻¹y,  s²(x*) = σ²(1 − r(x*)′R⁻¹r(x*))   (4)

respectively, where r(x*) = (R(x*,x1),...,R(x*,xn))′. Finally, the covariance function has free parameters to be estimated; in our case it is parametrized by ψ = (σ, l, α). One approach is to optimize the marginal likelihood function.26 Software implementing GPs is publicly available from the Gaussian Processes Web Site, http://www.gaussianprocess.org/.

2.1.3. Hammersley Sequence Sampling (HSS). We use the HSS technique to generate the data required for model development. The HSS design can be briefly described as follows:27
1. Any non-negative integer n can be represented in the radix notation of a prime p as
n = nm nm−1 ... n0 = n0 + n1 p + n2 p² + ... + nm p^m
with integers ni ∈ [0, p − 1] and m being the integer part of log_p n.
2. A point in [0,1] is obtained by reversing the order of the digits of n and moving the radix point:
φp(n) = .n0 n1 ... nm = n0 p⁻¹ + n1 p⁻² + n2 p⁻³ + ... + nm p^(−m−1)
3. For the first (d − 1) primes p1,...,pd−1, the Hammersley points in the d-dimensional space are given by
xk = 1 − (k/n, φp1(k), ..., φpd−1(k)),  k = 1,...,n
with 1 being the 1 × d unity vector. Note that xk ∈ [0,1]^d for k = 1,...,n.

2.1.4. Estimation. We now describe how to estimate migration model 1. For simplicity, we denote the data sets of the current process by symbols omitting the superscripts here and hereafter, unless confusion could otherwise arise. Thus, the migration model can be written as

y^c(x) = ρ(x)·ŷ^p(x) + λ(x) + z(x) + ϵ(x)   (5)

where z(x) ∼ GP(0, σ²R0) and ϵ(x) ∼ N(0, τ²I).
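The HSS construction above can be sketched in a few lines. The following is an illustrative sketch, not the authors' implementation; the function names are our own.

```python
def radical_inverse(n, p):
    """phi_p(n): reverse the base-p digits of n across the radix point (HSS step 2)."""
    inv, denom = 0.0, 1.0
    while n > 0:
        denom *= p
        n, digit = divmod(n, p)  # strip the lowest base-p digit
        inv += digit / denom
    return inv

def hammersley(n, d, primes=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)):
    """Generate n Hammersley points in [0,1]^d using the first d-1 primes (HSS step 3)."""
    pts = []
    for k in range(1, n + 1):
        coords = [k / n] + [radical_inverse(k, primes[j]) for j in range(d - 1)]
        # x_k = 1 - (k/n, phi_p1(k), ..., phi_p(d-1)(k))
        pts.append([1.0 - c for c in coords])
    return pts
```

For example, φ2(3) reverses the binary digits 11 to .11 = 0.75, and each generated coordinate stays in [0,1].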


A common choice for the scale function is a linear regression function with main effects only:

ρ(x) = ρ0 + ∑_{k=1}^{d} ρk xk   (6)

When the above linear relationship is insufficient to account for the change in effects from the past model to the current one, the bias function may be expanded in terms of continuous basis functions so as to accommodate a simple or complex model discrepancy between the two models:

λ(x) = ∑_{k=1}^{m} λk bk(x)   (7)

where {b1(x), b2(x), ..., bm(x)} is a set of known functions and the λk's are unknown parameters. In this work, we use the shifted Legendre orthogonal polynomial basis functions because they are defined on the interval [0,1]; other bases, such as wavelets and Fourier series, can also be used, depending on the problem of interest. An nth-degree shifted Legendre polynomial of a variable x is given by P̃n(x) = Pn(2x − 1),28 where

Pn(x) = (1/(2^n n!)) d^n/dx^n [(x² − 1)^n]

is typically defined on [−1,1]. It is easy to see that P̃0(x) = 1, P̃1(x) = 2x − 1, P̃2(x) = (1/2)(3(2x − 1)² − 1), P̃3(x) = (1/2)(5(2x − 1)³ − 3(2x − 1)), ..., which can be used to define the overall mean, linear, quadratic, cubic, etc., effects of the variable x in [0,1]. Shifted Legendre polynomials of two or more variables can be multiplied together to define their interaction effects. Thus, the migration model becomes

y^c(x) = (ρ0 + ∑_{k=1}^{d} ρk xk)·ŷ^p(x) + ∑_{k=1}^{m} λk bk(x) + z(x) + ϵ(x)   (8)

where the bk's are chosen from the shifted Legendre polynomials. The model is parametrized by the unknown parameters ρ0, ρ1, ..., ρd, λ1, ..., λm, which we collect in a (d + m + 1) × 1 vector θ. In addition to θ, model 8 is parametrized by a set of hyperparameters ψ = (σ, l, α) that governs the Gaussian process z(x). Model 8 can be further simplified to

y^c(x) = h′(x)θ + z(x) + ϵ(x)   (9)

where h(x) = (ŷ^p(x), x1·ŷ^p(x), ..., xd·ŷ^p(x), b1(x), ..., bm(x))′. Given n data points, we write the above in matrix form:

y^c = Hθ + Zϵ   (10)

where

H = [ ŷ^p(x1)  x11·ŷ^p(x1)  ...  x1d·ŷ^p(x1)  b1(x1)  ...  bm(x1) ;
      ⋮         ⋮          ⋱   ⋮            ⋮       ⋱   ⋮       ;
      ŷ^p(xn)  xn1·ŷ^p(xn)  ...  xnd·ŷ^p(xn)  b1(xn)  ...  bm(xn) ]

is the model matrix, Zϵ = z(Xn^c) + ϵ(Xn^c) is a stochastic Gaussian process GP(0, σ²R), and R is an n × n matrix whose ijth element is defined by the covariance function

R(xi, xj) = exp(−∑_{k=1}^{d} lk (xik − xjk)²) + αδij   (11)

with α = τ²/σ².

As demonstrated in the literature,4,29,30 the Bayesian framework has the added advantage that it can easily incorporate various sources of uncertainty, such as those present in approximating past models. Therefore, we follow a Bayesian approach and develop the necessary estimation and uncertainty quantification techniques for the proposed migration model. We first specify a proper prior probability distribution for the parameters:

θ|σ² ∼ N(0, σ²(H′H)⁻¹)   (12)

Such priors are frequently used in Bayesian variable selection, and it has been proven that conjugate priors for a normal likelihood lead to a closed-form posterior probability distribution.31 Let

Φ(Xn^c) = (y^c − Hθ)′R⁻¹(y^c − Hθ) / (2σ²)   (13)

denote the data-misfit function. The likelihood function is then

p(y^c|θ, σ²) = (2πσ²)^(−n/2) |R|^(−1/2) exp(−Φ(Xn^c))   (14)

Bayes' rule gives

p(θ|y^c, σ²) ∝ p(y^c|θ, σ²) p(θ|σ²)

from which we can deduce that the posterior distribution p(θ|y^c,σ²) of the parameter θ is

θ|y^c, σ² ∼ N(θ̂, σ²A)   (15)

where θ̂ = (H′R⁻¹H + H′H)⁻¹H′R⁻¹y^c and A = (H′R⁻¹H + H′H)⁻¹. The manner of solving for the hyperparameters ψ = (σ, α, l1,...,ld) can be problem dependent. A fully Bayesian solution assigns prior distributions to these unknown quantities and gives posterior distributions that quantify their uncertainties.32 Such a solution is entirely Bayesian; nevertheless, it can be computationally intensive because the resulting posteriors usually have no closed form and must rely on numerical methods such as Monte Carlo. This treatment would inevitably increase the computational burden, which is undesirable for optimization problems. Therefore, we turn to maximum-likelihood estimation (MLE) for finding ψ. We write the log-likelihood (ignoring constant terms) as

−(n/2) log(σ²) − (1/2) log|R| − Φ(Xn^c)   (16)

Setting the derivative of 16 with respect to σ² to 0 and solving, we find the maximum-likelihood estimate

σ̂² = (y^c − Hθ̂)′R⁻¹(y^c − Hθ̂) / n   (17)

Substituting 17 into 16 and maximizing, we find

α̂, l̂ = arg max_{α,l} −(n/2) log(σ̂²) − (1/2) log|R|   (18)

Once the parameters and hyperparameters have been estimated, the migration prediction of the current model is given by substituting the estimated values:

ŷ^c(x*) = h(x*)′θ̂ + r̂(x*)′R̂⁻¹(y^c − Hθ̂)   (19)
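The closed-form posterior mean θ̂ of eq 15 and the migration prediction of eq 19 can be sketched numerically. This is a minimal illustration with fixed hyperparameters (the MLE step of eqs 16−18 is omitted); the helper names and the toy regressor map `h` are our own.

```python
import numpy as np

def corr_matrix(X, l, alpha):
    """R(x_i, x_j) = exp(-sum_k l_k (x_ik - x_jk)^2) + alpha*delta_ij (eq 11)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2 * l).sum(-1)
    return np.exp(-d2) + alpha * np.eye(len(X))

def corr_vector(X, xs, l):
    """r(x*) = (R(x*, x_1), ..., R(x*, x_n))' without the nugget."""
    return np.exp(-(((X - xs) ** 2) * l).sum(-1))

def fit_and_predict(X, yc, h, xs, l, alpha):
    """theta_hat = (H'R^-1 H + H'H)^-1 H'R^-1 y^c (eq 15), then the
    migration prediction h(x*)'theta_hat + r'R^-1 (y^c - H theta_hat) (eq 19).
    `h` maps a design point to its regressor vector h(x)."""
    H = np.array([h(x) for x in X])
    R = corr_matrix(X, l, alpha)
    Ri = np.linalg.inv(R)
    theta = np.linalg.solve(H.T @ Ri @ H + H.T @ H, H.T @ Ri @ yc)
    r = corr_vector(X, xs, l)
    return h(xs) @ theta + r @ Ri @ (yc - H @ theta)
```

With a toy past model ŷ^p(x) = x² and current data y^c = 2·ŷ^p (a pure scale change), the prediction at a new point recovers the scaled response, since the GP residual term corrects whatever the shrinkage prior leaves in the regression part.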

where r̂(x*) = (R̂(x*,x1),...,R̂(x*,xn))′. Additionally, the estimated variance at this point is

s²(x*) = σ̂²(1 − r̂(x*)′R̂⁻¹r̂(x*))   (20)

2.1.5. Model Selection. The concept of adjustment between models of various sources was perhaps originally initiated in Kennedy and O'Hagan's framework.4 Many variants and applications of this seminal approach exist in the literature. In the work of Qian et al.,10 followed by that of Yan et al.,7 a linear regression scale function ρ(·) was suggested. That approach was extended by Vanli et al.,12 who introduced an additional regression function λ(·). Both approaches have pros and cons. Although the method of Qian et al. is flexible in its modeling strategy, it appears applicable only in situations without measurement errors, and it is perhaps too simple to adequately capture the discrepancy between two models. The method of Vanli et al. improves the model fit to some extent; however, it is sometimes unnecessarily complex, introducing numerous basis terms into the bias function. Hence, the compromise between model fit and model complexity should be carefully addressed. Because the complete behaviors of the two models are unknown, it is often difficult to choose which bias functions to use for a given problem. This challenge is precisely what motivates our proposal for a robust migration methodology.

In this section, we present a method for carrying out robust migration by means of a model selection technique. Notice that the scale function ρ(x) is already determined; thus, we need to select a number of basis terms from a candidate set {b1(x), b2(x), ...}. In this work, the candidate basis functions are linear effects, quadratic effects, and two-factor interactions, where the two-factor interactions include only the linear-by-linear interactions. This is because the term ρ(x)·ŷ^p(x) already captures most of the trends of the current model, and higher-order effects would make the migration unnecessarily complex. There are therefore a total of t = (d + 1)(d + 2)/2 candidate basis functions (including the constant term). According to the shifted Legendre polynomials, the linear and quadratic effects can be defined respectively as

xjl = √(3/2)·(2xj − 1)  and  xjq = √(1/2)·(3(2xj − 1)² − 2)   (21)

for j = 1,...,d. Note that the shifted Legendre polynomials are scaled so that they have the same length √3 over the values xj = 0, 0.5, and 1. The possible basis functions bk are therefore 1, x1l, ..., xdl, x1l·x2l, ..., x(d−1)l·xdl, x1q, ..., xdq. The migration model can then be written as the combination of a primary term and potential terms:

y^c(x) = (ρ0 + ∑_{k=1}^{d} ρk xk)·ŷ^p(x)  [primary term]  +  ∑_{k=1}^{t} λk bk(x)  [potential terms]  +  z(x) + ϵ(x)   (22)

The migration with the primary term only is called the simplest migration. The goal of robust migration is thus to efficiently determine, from among the potential terms, the basis functions that capture the most variance in the experimental data. The first part of eq 22 is the primary term of the migration function; hence, the parameter ρ = (ρ0, ρ1, ..., ρd)′ can be determined independently of λ = (λ1, ..., λt)′. The estimate of λ provides a relevance score for the potential terms; that is, the relative importance of including a given potential term is revealed by the estimate of its corresponding λ value.20 For assigning the scores, a least-squares solution would be a straightforward ranking method. However, this is not always possible, as the number of potential terms may exceed the number of training data; for instance, considering all two-factor linear-by-linear interactions in 10 dimensions, the number of potential terms is t = 66. To address this, we propose a Bayesian approach, introducing a Gaussian prior density for λ:

λ ∼ N(0, τ²S)   (23)

where 0 is a vector of zeros of length t and S is a t × t diagonal matrix. Following Joseph and Delaney33 and Joseph et al.,20 the matrix S can be constructed as follows. Let ιij = 1 if λi includes the linear effect of factor j and 0 otherwise; likewise, let qij = 1 if λi includes the quadratic effect of factor j and 0 otherwise. Then the ith diagonal element of S is given by ∏_{j=1}^{d} (sjι)^(ιij) (sjq)^(qij), where

sjι = (3 − 3e^(−lj)) / (3 + 4e^(−0.5lj) + 2e^(−lj))  and  sjq = (3 − 4e^(−0.5lj) + e^(−lj)) / (3 + 4e^(−0.5lj) + 2e^(−lj))   (24)

It can be shown that the posterior probability density is also Gaussian, with mean

λ̂ = (τ²/σ²) S Ht′R⁻¹(y^c − Hm θ̂m)   (25)

where Ht is the n × t model matrix of all potential terms, Hm is the model matrix of all currently chosen terms, θ̂m is the associated parameter vector, and R is the correlation matrix of the data. The parameter λ̂ obtained via this ranking method quantifies the importance of the associated potential terms with respect to the data. To identify the best set of potential terms, we adopt a greedy forward selection procedure that iteratively adds the potential term having the largest value of |λ̂i|. This method is equivalent to that used by Joseph and Delaney,33 delivering similar results while being easier to compute. Note that, as we are only interested in maximizing |λ̂i|, we set the constant τ²/σ² = 1 without loss of generality. Once λ̂ has been estimated, a given potential function is declared important if its absolute parameter value is large. Thus, the potential term to be added at each step m = 0, 1, 2, ... is the one with the largest |λ̂i|. In short, the model selection under consideration proceeds through the following steps:
1. Start with the empty set of candidate functions (migration with the primary term only).
2. Select the next basis function bi(x) with the largest |λ̂i|.
3. Incorporate it and update the migration function.
4. Go to step 2.
There remains an important question to resolve in this forward selection strategy: when should we stop adding potential terms to the primary part? In other words, what is the best value of m given the knowledge we have from the current data set? A number of approaches are described in the literature, such as the Akaike information criterion (AIC),34 the Bayesian information criterion (BIC),35 and the deviance information criterion (DIC).36 The difficulty in choosing m here is that the migration predictor tries to interpolate the data and thus gives a good fit. As a result, AIC will treat each
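The scaled effects of eq 21 and the t = (d + 1)(d + 2)/2 candidate set can be sketched as below; this is a minimal illustration with our own function names, and the λ̂-ranking step of eq 25 is omitted.

```python
import numpy as np
from itertools import combinations

def linear_effect(x):
    """Scaled shifted Legendre linear effect, sqrt(3/2)*(2x - 1) (eq 21)."""
    return np.sqrt(1.5) * (2 * x - 1)

def quadratic_effect(x):
    """Scaled shifted Legendre quadratic effect, sqrt(1/2)*(3(2x-1)^2 - 2) (eq 21)."""
    return np.sqrt(0.5) * (3 * (2 * x - 1) ** 2 - 2)

def candidate_terms(d):
    """All t = (d+1)(d+2)/2 candidate basis terms: constant, linear effects,
    quadratic effects, and linear-by-linear two-factor interactions."""
    terms = [lambda x: np.ones_like(x[..., 0])]
    terms += [(lambda j: lambda x: linear_effect(x[..., j]))(j) for j in range(d)]
    terms += [(lambda j: lambda x: quadratic_effect(x[..., j]))(j) for j in range(d)]
    terms += [(lambda i, j: lambda x: linear_effect(x[..., i]) * linear_effect(x[..., j]))(i, j)
              for i, j in combinations(range(d), 2)]
    return terms
```

Evaluated over xj = 0, 0.5, 1, both scaled effects indeed have length √3, and d = 10 yields the 66 potential terms mentioned in the text.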


point that is most consistent with the hypothesis given n data:

candidate term indifferently, and BIC and DIC will favor those terms that, in combination with the primary term, are “simple” in terms of model structure. To overcome this, we use leaveone-out cross-validation (loo-cv) errors. Let yc−i(x) be the migration predictor after removing ith data point. Then the loo-cv error at this point is defined as cvi = y c (xi) − y−c i (xi)

n 1 x n + 1 = arg maxc − log(σ 2) − log|RT| x∈+ 2 2 (y c − μT )T RT−1(y c − μT ) − 2σ 2

(26)

for i = 1, ..., n. Define the average loo-cv error with respect to m by

    eloo-cv(m) = (1/n) ∑i=1..n cvi²    (27)

Now we can choose the value of m that minimizes eloo-cv(m).

2.2. Process Optimization. In this section we consider how to find the value of xn+1, given the current n data pairs {(x1, yc1), ..., (xn, ycn)}, that achieves our target value T, i.e., yc(xn+1) = T. There are basically two types of methods: two-stage methods and one-stage methods.21 Let us illustrate a two-stage method first. In the first stage, the migration model is fit to the observed data, and all of the relevant parameters are estimated. In the second stage, these parameter estimates are taken as correct, and a calculation determines where to search for the target. We then perform the next experiment at the resulting condition and check whether the corresponding response meets the requirement. If at any point the target T is found to the required degree of accuracy, the search stops. Otherwise, the new data are added to the training set, and the process repeats until the target is located. A two-stage method is limited by the accuracy of the search model; thus, when only limited data are available, the resulting points may converge quite slowly toward the true optimum because the target value is never directly observed. To avoid this pitfall, it is better not to estimate the parameters of the migration model from the observed data alone. While this may seem impractical, it is precisely what the one-stage method allows us to do. In the one-stage method, we postulate a hypothesis about the location of the point that achieves the goal T and use a criterion to measure the plausibility of this hypothesis. More specifically, suppose we hypothesize that the goal T is achieved at a point x. To evaluate this hypothesis, we compute the likelihood of the observed data conditional on the assumption that the model passes exactly through the point (x, T).

Accordingly, we write the joint probability distribution of the observed data yc (of size n) and the target T as follows:

    [yc; T] ~ N([ŷc; T̂], σ²[R, r(x); r(x)′, 1])    (28)

where ŷc = Hθ̂ and T̂ = h(x)′θ̂; note that H and θ̂ are evaluated on the n data points. The other notation corresponds to that presented in section 2.1. Equation 28 is derived under the assumption that z(x) is a GP realization with additive normal noise, as given in eq 22. From the joint normal distribution 28, we can derive the conditional distribution

    yc | T ~ N(μT, σ²RT)    (29)

where μT = ŷc + r(x)(T − T̂) and RT = R − r(x)r(x)′. We then maximize the log-likelihood of 29 to yield a data point

    xn+1 = arg max_x {−(1/2)[n ln σ² + ln|RT| + (yc − μT)′RT⁻¹(yc − μT)/σ²]}    (30)

where xn+1 is the (n + 1)th run condition for locating the target. Note that μT and RT continue to update as new data come in and the robust migration identifies the best terms. In our computation, the optimization of 30 is implemented with a simulated annealing algorithm,37,38 a probabilistic method that can find global optima under constraints. When using the conditional log-likelihood to evaluate the hypothesis that the model passes through (x, T), we also optimize the relevant parameters θ, σ, and li (i = 1, ..., d) to maximize the conditional likelihood. That is, the parameters are adjusted to make the hypothesis appear as likely as possible, and the resulting optimized likelihood can be taken as a measure of the plausibility of the hypothesis. This one-stage method can deliver more rapid iterative convergence toward the optimum condition, thus requiring fewer points than the two-stage method. Finally, we need to specify when to stop searching for the target. A common criterion is to minimize the absolute error between the target and the resulting response. Under this criterion, we terminate the experiment at run n if either the budget is depleted or

    |T − ŷc(xn)| ≤ κ    (31)

where κ is a predetermined threshold, often defined by an engineering tolerance requirement in practical applications. We call the aforementioned procedure the migration-based search method, which is summarized in Figure 2.

Figure 2. Flowchart summarizing the key steps of the migration-based goal-seeking method.

Note that the one-stage method can also be applied to the model without the model migration strategy, in which case it
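The one-stage infill criterion of eqs 28−30 can be sketched in a few lines. The following Python code is a simplified illustration, not the authors' implementation: it assumes a constant mean, a Gaussian correlation with known hyperparameters l and σ², and uses SciPy's `dual_annealing` for the simulated annealing step mentioned in the text:

```python
import numpy as np
from scipy.optimize import dual_annealing

def gauss_corr(A, B, l):
    # Gaussian correlation: R(u, v) = exp(-sum_k l_k * (u_k - v_k)**2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2 * l).sum(axis=2)
    return np.exp(-d2)

def cond_loglik(x, X, y, T, l, sigma2, nugget=1e-6):
    # Log-likelihood of the observed data under the hypothesis that the
    # response surface passes through (x, T): y | T ~ N(mu_T, sigma2 * R_T),
    # cf. eqs 28-29 (constant-mean simplification).
    x = np.atleast_2d(np.asarray(x, float))
    n = len(y)
    R = gauss_corr(X, X, l) + nugget * np.eye(n)
    r = gauss_corr(X, x, l).ravel()
    mu = y.mean()
    mu_T = mu + r * (T - mu)            # conditional mean
    R_T = R - np.outer(r, r)            # conditional correlation
    resid = y - mu_T
    _, logdet = np.linalg.slogdet(R_T)
    quad = resid @ np.linalg.solve(R_T, resid)
    return -0.5 * (n * np.log(sigma2) + logdet + quad / sigma2)

def next_run(X, y, T, bounds, l, sigma2):
    # Eq 30: choose x_{n+1} by maximizing the conditional log-likelihood,
    # with simulated annealing as in the text.
    res = dual_annealing(lambda x: -cond_loglik(x, X, y, T, l, sigma2),
                         bounds=bounds, maxiter=100, seed=0)
    return res.x
```

In the full method the hyperparameters would be re-optimized at each candidate x rather than held fixed, as described above.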
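The loop summarized in Figure 2 can be sketched as follows; `run_experiment` and `propose_next` are hypothetical callbacks (the latter would implement the maximization in eq 30), and the stopping test is eq 31:

```python
import numpy as np

def seek_target(run_experiment, propose_next, X, y, T, kappa, budget):
    # Sequential goal seeking: propose x_{n+1}, run it, augment the data,
    # and stop when |T - y_new| <= kappa (eq 31) or the budget is exhausted.
    for _ in range(budget):
        x_new = np.atleast_1d(propose_next(X, y, T))
        y_new = float(run_experiment(x_new))
        X = np.vstack([X, x_new])
        y = np.append(y, y_new)
        if abs(T - y_new) <= kappa:
            break
    return X, y
```

Every run, successful or not, is folded back into the training set, which is what lets the migration model sharpen between proposals.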

DOI: 10.1021/acs.iecr.5b00411 Ind. Eng. Chem. Res. 2015, 54, 5736−5748

would be a GP-based method. In this scenario, μT is replaced by μT = T·r(x) in eq 30, and r and R are defined as described in section 2.1.2.

3. ILLUSTRATION

This section provides two examples to illustrate the effectiveness of the proposed methodology. The first example is an evaluation of two-dimensional test functions by simulation. This simple example allows us to show the key aspects of the methodology in detail. We then apply the proposed method to an injection molding process to examine whether it can meet experimenters' requirements in practice.

3.1. Test Function Example. Two test functions that produce different outputs but share similar trends are chosen for study.39 For this example, the current model is

    yc(x1, x2) = [1 − exp(−1/(2x2))] × (2300x1³ + 1900x1² + 2092x1 + 60)/(100x1³ + 500x1² + 4x1 + 20)

and the past model is

    yp(x1, x2) = (1/4)[yc(x1 + a, x2 + a) + yc(x1 + a, max(0, x2 − a)) + yc(x1 − a, x2 + a) + yc(x1 − a, max(0, x2 − a))]

with a = 1/20. Both models are evaluated on the unit square [0,1]². When generating data from the models, Gaussian noise with standard deviation 0.01 is added to simulate measurement errors.

Focusing on the past model, we develop a GP model ŷp(·) to approximate its true behavior. Of course, the prediction accuracy is influenced in part by the number of training data points. To show this, we select a series of training sample sizes: 10, 20, 30, and 40. The results are shown in Figure 3, where the contours show at a glance the fit to the data and to the response surface. In addition, the standard root-mean-square errors (SRMSEs)

    SRMSE = sqrt((1/ntest) ∑i=1..ntest {[ŷ(xi) − y(xi)]/y(xi)}²)

are used for model assessment, where ŷ is the predicted value, y is the true response, and ntest is the number of testing points (200 in this example). It is clear from the figures that at least 40 data points are needed to develop an accurate model (with hyperparameters σ² = 9.07, l = (1, 0.59), and α = 0.0055), considerably more than the 20 advised by the rule of thumb. Nevertheless, we will use this model to assist in developing the current model, in the hope of using far fewer than 40 data points.

Figure 3. Comparison between the true model (blue surface) and the estimated model (red surface) with n = 10, 20, 30, and 40. Dotted points are those generated by Hammersley sequence sampling (HSS) for model training. "SRMSE" is standard root-mean-square error.

Robust Migration. To begin, an initial design of size ⌈40 × 0.25⌉ = 10 is determined using HSS. These HSS points spread across the entire design region [0,1]². The design points, together with the responses, are displayed in Table 1.

Table 1. Ten Initial Hammersley Sequence Sampling Points and the Responses for the Current Test Function

    run    x1        x2        yc
    1      0.5000    0.0500    11.7115
    2      0.2500    0.1500    13.2255
    3      0.7500    0.2500     9.1728
    4      0.1250    0.3500     9.5043
    5      0.6250    0.4500     7.3788
    6      0.3750    0.5500     7.5727
    7      0.8750    0.6500     5.5356
    8      0.0625    0.7500     4.3645
    9      0.5625    0.8500     5.0271
    10     0.3125    0.9500     5.4222
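For reference, the two test functions above are easily coded; the noise-free Python sketch below reproduces the Table 1 responses up to the added 0.01-σ measurement noise (the x2 → 0 limit of the exponential factor is 1, which the code handles):

```python
import numpy as np

def y_current(x1, x2):
    # Current test function of section 3.1; the factor tends to 1 as x2 -> 0.
    x1 = np.asarray(x1, float)
    x2 = np.asarray(x2, float)
    with np.errstate(divide="ignore"):
        factor = 1.0 - np.exp(-1.0 / (2.0 * x2))
    poly = ((2300 * x1**3 + 1900 * x1**2 + 2092 * x1 + 60)
            / (100 * x1**3 + 500 * x1**2 + 4 * x1 + 20))
    return factor * poly

def y_past(x1, x2, a=1 / 20):
    # Past model: average of the current function at four shifted points.
    return 0.25 * (y_current(x1 + a, x2 + a)
                   + y_current(x1 + a, np.maximum(0.0, x2 - a))
                   + y_current(x1 - a, x2 + a)
                   + y_current(x1 - a, np.maximum(0.0, x2 - a)))
```

For instance, y_current(0.5, 0.05) ≈ 11.71 matches run 1 of Table 1 up to noise, and y_current(0, 1) ≈ 1.18 is the minimum used as the search target later in this section.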


Using these points, two methods are employed to develop the current model: (i) directly from the data via the GP technique and (ii) by means of the robust migration method. The pure GP model is built according to section 2.1.2. Next we construct the simplest migration model, with the primary term only. The maximum likelihood estimates of the hyperparameters are l̂ = (1.6818, 0.2293), σ̂² = 0.0120, and α̂ = 4.9333, and the parameter θ0 is estimated as θ̂0 = (1.0469, −0.0783, 0.0059)′. We obtain eloo-cv = 0.2272. The simplest migration predictor is then given by

    ŷc(x) = (1.0469 − 0.0783x1 + 0.0059x2)ŷp(x) + r̂(x)′R̂⁻¹(yc − H0θ̂0)    (32)

where r̂(x) is a vector of length 10 whose ith element is R(x, xi) = exp(−∑k=1..2 l̂k(xk − xik)²) + α̂δ(x, xi), R̂ is a 10 × 10 matrix whose ijth element is R(xi, xj) = exp(−∑k=1..2 l̂k(xik − xjk)²) + α̂δ(xi, xj), and H0 is a 10 × 3 model matrix with ith row (ŷp(xi), xi1·ŷp(xi), xi2·ŷp(xi)).

We now consider the robust migration method. To apply the Bayesian forward selection method in ref 20, we need to construct the S matrix. Since the input dimension is two, this matrix is diagonal:

    Ŝ = diag(1, ŝ1ι, ŝ2ι, ŝ1ιŝ2ι, ŝ1q, ŝ2q)

where

    ŝjι = (3 − 3e^(−l̂j))/(3 + 4e^(−0.5l̂j) + 2e^(−l̂j))  and  ŝjq = (3 − 4e^(−0.5l̂j) + e^(−l̂j))/(3 + 4e^(−0.5l̂j) + 2e^(−l̂j))

We then calculate the parameter

    λ̂ = ŜH′tR̂⁻¹(yc − H0θ̂0)

where Ht is a 10 × 6 matrix whose first column is 1 and whose other columns correspond to the values of x1ι, x2ι, x1ιx2ι, x1q, and x2q. We find that the maximum value of |λ̂i| occurs for the coefficient of the quadratic term x1q. Thus, we take b1 = x1q, add this term to the bias function, and re-estimate the migration model, for which the parameter is θ̂1 = (1.0447, −0.0747, 0.0152, −0.0958)′. Next, we compute the parameter

    λ̂ = ŜH′tR̂⁻¹(yc − H1θ̂1)

where H1 is a 10 × 4 matrix whose columns 1 to 3 are those of H0 and whose fourth column contains the values of b1. Note that the matrices Ŝ, Ht, and R̂ in this computation remain the same as before. At this step, the linear term x1ι is identified, since it has the largest value of |λ̂i|. Thus, we set b2 = x1ι and continue the forward selection procedure. In the next four steps, the terms x2q, x2ι, x1ιx2ι, and 1 are selected, respectively. The value of eloo-cv(m) decreases at first and then increases upon entry of the third term x2q, as shown in Figure 4 (where the simplest migration case is denoted by "SM"). Therefore, we choose m = 2 and obtain eloo-cv(2) = 0.1060. The final migration model in view of the 10 data points is hence

    ŷc(x) = (1.0797 − 0.4696x1 − 0.0299x2)ŷp(x) + 0.0367x1q + 1.6553x1ι + r̂(x)′R̂⁻¹(yc − H2θ̂2)    (33)

where H2 is a 10 × 5 matrix whose columns 1−3 are those of H0 and whose columns 4−5 contain the values of b1 and b2, respectively; θ̂2 = (1.0797, −0.4696, −0.0299, 0.0367, 1.6553)′ and l̂ = (1.1892, 2.1810).

Figure 4. Plot of the leave-one-out cross-validation errors eloo-cv(m) in the test function example (described in section 3.1). "SM" denotes the simplest migration case.

We also tried other existing methods on this example. Four models are fitted, and the SRMSE values (based on 200 testing points generated by HSS) for each method are listed in Table 2. In this table, GP prediction refers to the pure GP predictor without migration strategies; simple migration refers to Kennedy and O'Hagan's method,4 with no bias terms and a constant scale term; Qian's method corresponds to that used by Qian et al.,10 with a constant bias term; and the full migration method includes all linear and quadratic terms. Overall, the application of migration strategies achieves lower SRMSEs than the GP predictor, confirming that the migration methods perform better. In addition, the robust migration method has the lowest SRMSE and is thus the best among the migration strategies. This clearly shows the importance of careful selection of the bias function as well as the superiority of the robust migration method over the other three. In addition to the SRMSE values, a real-value-versus-prediction plot for the different methods is reported in the Supporting Information (Figure A.1), which reveals the same results.

Table 2. Comparison of Different Methods in Terms of Standard Root-Mean-Square Errors (SRMSEs). "GP" Is the Gaussian Process

    method              no. of bias terms    no. of parameters    SRMSE
    GP prediction       —                    4                    4.0981
    simple migration    0                    5                    0.1799
    Qian's method       1                    6                    0.1579
    robust migration    2                    7                    0.1554
    full migration      6                    11                   0.2544

Process Optimization. We next consider how to find optimum conditions for the specified target T. Since we know the true function of the new system, we set the minimum as the target value, that is, T = 1.18; in addition, we set κ = 0.01 in this example. We can also determine that the true point corresponding to the target is (0, 1). To achieve the target, we begin with the 10 initial data points in Table 1 and implement the proposed one-stage scheme described in section 2.2.
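Before moving on, note that each forward-selection iteration above reduces to a single linear-algebra operation. A minimal Python sketch, with generic matrices standing in for Ŝ, Ht, R̂, and the current residual:

```python
import numpy as np

def select_next_term(S, Ht, R, y, H, theta, chosen):
    # lambda_hat = S Ht' R^{-1} (y - H theta): scores for all candidate
    # bias terms; the not-yet-chosen term with the largest |lambda_i|
    # enters the bias function next.
    lam = S @ Ht.T @ np.linalg.solve(R, y - H @ theta)
    scores = np.abs(lam)
    scores[list(chosen)] = -np.inf      # skip terms already in the model
    return int(np.argmax(scores)), lam
```

Repeating this call, refitting θ after each entry, and tracking eloo-cv(m) gives the full robust migration procedure.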


After selecting the potential terms (i.e., x1q and x1ι), we maximize the conditional log-likelihood function 30. With such sparse data, we constrain l to [0.001, 10] to obtain a more reasonable prediction. In the first exploration trial, the optimization gives the optimum point x11 = (0.3149, 1). We then evaluate the current process at x11 and obtain yc(x11) = 5.2104. Clearly, the absolute error exceeds κ = 0.01, so we add this point to the data set and rebuild the migration model. At this step, the two potential terms x1q and x1ι are again identified as the best set. We repeat the process until the target is found; in fact, we reach the target after just five updates, as depicted in Figure 5. The follow-up points are x12 = (1, 0.9999), x13 = (0, 0.4255), x14 = (0, 0.9653), and the final optimum point x15 = (0, 0.9926), which yields a response of 1.1812. This result is extremely close to the true point (0, 1).

Figure 5. Establishment of the optimum condition (T = 1.18) using conditional likelihood maximization and the migration-based method in the test function example (described in section 3.1). The large square (□) is the starting point, the small circles (○) are the optimum points at different stages, and the large × is the true optimum. Solid curves are the true contours of the current model.

To test the effectiveness of the migration strategy against the other one-stage method, we also tried the one-stage strategy without robust migration (i.e., the GP-based method). The initial data points are the same as before, and l is again constrained to [0.01, 10] to guarantee a more reasonable prediction. It can be seen in Figure 6 that the resulting optimum converges slowly to the true one, compared with that of the proposed method, which locates the target quickly after the third run. The total number of updated points for this method, not surprisingly, grows to 11, much greater than that needed by the proposed methodology; the reason is that the training data are not sufficient to construct an accurate GP predictor of the current process until another 11 points are added. The final optimum for this method is x21 = (0, 0.9867), which yields a response of 1.1735. This result shows that because the robust migration takes advantage of the past model, fewer updates are required to locate the target, and the method is likely to find a better optimum while using a smaller number of experiments.

Figure 6. Establishment of the optimum condition (T = 1.18) using conditional likelihood maximization and the GP-based method in the test function example (described in section 3.1). The large square (□) is the starting point, the small circles (○) are the optimum points at different stages, and the large × is the true optimum. Solid curves are the true contours of the current model.

3.2. Injection Molding Process. Injection molding is an important polymer processing technique that transforms polymer granules into products of various shapes and types.40 As a cyclic process, injection molding consists of three stages: filling, packing-holding, and cooling. During the filling stage, an injection screw moves forward and pushes the polymer melt into the mold cavity. Once the mold is completely filled, the process switches to the packing-holding stage, during which additional polymer is added to the mold under a specified pressure to compensate for the shrinkage caused by the material's cooling and solidification. The packing-holding stage continues until the gate freezes, isolating the material in the mold from that within the injection unit. During the cooling stage, the polymer inside the mold continues to cool; meanwhile, fresh material is melted and conveyed to the front of the barrel via rotation of the screw. The process is then repeated. Figure 7 shows a simplified diagram of a typical reciprocating-screw injection molding machine with instrumentation.

Figure 7. Diagram of injection molding machine.

One of the key objectives of injection molding is to maintain product quality with respect to specific requirements. In terms of internal and external properties, quality can be affected by key variables at each stage. Empirical models are typically built to describe these relationships and to determine the operating conditions at which the process will yield the highest quality. Other factors, such as variations in materials, injection molds, and operating conditions, are essential to quality as well. In this study, we treat these factors as qualitative categories contributing to similarity, because the underlying processes are similar in the overall injection molding operation but slightly different in their underlying mechanisms. Many studies have shown that product quality (e.g., part weight41) can be maintained by controlling barrel temperature, injection velocity, and packing pressure during the three stages. Sophisticated tools such as neural networks have been used, fitted to large numbers of experimental data.42 However, substantial effort is needed to build such models, particularly if the process continually changes to satisfy customer demands (e.g., changes in materials or mold shapes or both). It is easy to appreciate that model migration, with its reduced experimentation requirements, could be useful for building a new model and locating the target of interest. In this study, we apply our method to two similar processes that differ in both the materials and the molds used in the experiments. The experimental setup and analysis are described in the sections that follow.

3.2.1. Experimental Setup. The machine used in this work is a Chen Hsong reciprocating-screw injection molding machine, model MJ-55. The machine is equipped with advanced controllers and sensors with which the key variables can be tracked and measured precisely. First, the barrel temperature measurement consists of four insulated K-type thermocouples installed at the middle thickness of the barrel wall. Second, a Vishay electronic scale, type W06003C, is installed to measure the injection ram displacement, an indirect measurement of injection velocity. Third, the packing pressure is measured using a Kristal Ceraline-S pressure transducer, type SAG25-A200B. All of these data are recorded by a data logger task in the control unit and interchanged via shared memories. Finally, the part weight is measured manually using an electronic balance. For the well-studied past process, the mold used was a flat mold, as shown in Figure 8a, and the material was high-density polyethylene (PE016). For the current process, however, the mold was changed to a plate mold, as shown in Figure 8b, and the material was changed to polypropylene (PPk8003). As a result, the two processes discussed here are assumed to be similar in the overall injection molding operation but slightly different in their underlying mechanisms. The operating ranges selected in the experiments for both processes are listed in Table 3.

3.2.2. Analysis. The past process was thoroughly investigated by performing 30 experimental runs. After collecting the experimental data, we performed a sensitivity analysis43 on all relevant factors that could affect the part quality (g), denoted by y. The results showed that three factors, namely, barrel temperature (°C), injection velocity (mm/s), and packing pressure (bar),

Table 3. Operating Ranges of Process Factors in the Injection Molding Processes

    factors                      range of the past process    range of the current process
    barrel temperature (°C)      180−220                      190−220
    injection velocity (mm/s)    20−50                        25−45
    packing pressure (bar)       20−45                        25−45

were the key variables; henceforth, we denote these variables by x1, x2, and x3, respectively. Thus, we developed a data-based GP model for the relationship between barrel temperature, injection velocity, and packing pressure on the one hand and the part weight of the final product on the other.

Focusing on the current process, we performed an initial experimental design to allocate the data points at which the responses would be observed. The initial eight points were HSS-generated. Table 4 shows the design points along with the results from the experimental runs, consisting of eight runs for the current model and their associated predictions from the past model. The values of the design variables are given in columns 2−4, predictions from the past model in column 5, and responses from the current process in column 6.

Table 4. Eight Initial Hammersley Sequence Sampling Points for the Injection Molding Process

    run    x1 (°C)    x2 (mm/s)    x3 (bar)    ŷp (g)    yc (g)
    1      191.25     31.67        30.00       19.94     20.94
    2      193.75     38.33        27.50       19.97     20.96
    3      196.25     27.22        32.50       19.91     20.98
    4      198.75     33.89        26.25       20.02     21.00
    5      201.25     40.56        31.25       19.92     20.94
    6      203.75     29.44        28.75       20.02     21.12
    7      206.25     36.11        33.75       19.89     20.87
    8      208.75     42.78        25.63       20.04     21.05

These initial experimental data were then used to migrate the past model to the current one. To apply the robust migration strategy, we first scaled x1, x2, and x3 to [0,1]³. For the simplest migration, we obtain θ̂0 = (1.0534, −0.0057, −0.0077, 0.0090)′, l̂ = (1.7411, 3.0314, 5.2780), and eloo-cv(0) = 0.1535. The values of eloo-cv(m) are plotted in Figure 9 against the potential terms identified by the Bayesian selection technique. We find that the first four terms, 1, x3q, x2ι, and x1ι, give the minimum eloo-cv(4) = 0.1196.
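The fitted migration predictors in both examples (eq 32 above and eq 34 below) share one generic form: a linear scale on the past prediction, an additive bias, and a GP residual correction. A Python sketch of its evaluation, with hypothetical inputs (the coefficient vector, bias function, and precomputed α = R̂⁻¹(yc − Hθ̂) would come from the fitting step):

```python
import numpy as np

def migration_predict(x, y_past, scale_coef, bias_fn, Xtr, alpha_vec, l):
    # y_c(x) = rho(x) * y_p(x) + b(x) + r(x)' alpha, where
    #   rho(x) = scale_coef[0] + sum_k scale_coef[k] * x_k  (linear scale),
    #   b(x)   = bias_fn(x)                                 (selected bias terms),
    #   alpha  = R^{-1} (y_c - H theta)                     (precomputed).
    x = np.asarray(x, float)
    rho = scale_coef[0] + scale_coef[1:] @ x
    r = np.exp(-(l * (Xtr - x) ** 2).sum(axis=1))   # Gaussian correlation vector
    return rho * y_past(x) + bias_fn(x) + r @ alpha_vec
```

This makes explicit how cheap the current-process predictor is once the past model ŷp and the small set of migration parameters are in hand.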

Figure 8. Molds used in injection molding processing.


Re-estimating the parameters, we obtain θ̂4 = (1.0602, −0.0049, −0.0076, −0.0345, 0.0103, 0.0490, 0.3692, 0.0147)′ and l̂ = (1.4142, 1.0718, 1.7411). Therefore, the final part-weight model of the current process, obtained by migrating the past model, is

    ŷc(x) = (1.0602 − 0.0049x1 − 0.0076x2 − 0.0345x3)ŷp(x) + 0.0103 + 0.0490x3q + 0.3692x2ι + 0.0147x1ι + r̂(x)′R̂⁻¹(yc − H4θ̂4)    (34)

where r̂, R̂, and H4 are defined in the same manner as in the first example. We note that eloo-cv in this example initially decreases after the first step, increases slightly after the second and third steps, and subsequently decreases again. Although it comes down slightly after the eighth step, the overall value increases after the fifth step. This shows that we should not stop the procedure immediately upon observing an increase in eloo-cv; rather, the procedure should be continued for a few more steps before determining the value of m that minimizes eloo-cv.

Figure 9. Plot of the leave-one-out cross-validation errors eloo-cv(m) in the injection molding example (described in section 3.2). "SM" denotes the simplest migration case.

Product part weight is a good indicator of process stability, especially for the molding of complex products. To maintain good product quality, the part weight of the current process should be within the range 21.2 ± 0.05 g (according to prior information such as the material density and the mold volume). To establish the conditions that will maintain this level of quality, we maximize the log-likelihood 30 of the conditional distribution and sequentially determine experimental conditions that converge to the targeted result. In fact, we reached the target range after an additional four runs. Table 5 shows the data for those four experimental runs. The first two updates from the current process (runs 9 and 10) fall outside the target range. This arises simply because of the poor prediction accuracy of the migration model with only eight training data. After adding the information from these first two runs into the migration model, the third update (run 11) produces a value of 21.14, much closer to the target and a significant improvement over the previous two runs. Not surprisingly, the fourth update locates the target immediately, yielding a value of 21.22, only 0.02 g greater than the goal.

Table 5. Follow-up Points for Locating the Target 21.2 g

    run    x1 (°C)    x2 (mm/s)    x3 (bar)    ŷp (g)    yc (g)
    9      210.00     33.21        29.60       20.01     21.06
    10     207.41     27.66        27.38       20.05     21.10
    11     210.00     31.41        26.62       20.06     21.14
    12     210.00     27.99        35.94       20.06     21.22

The migration-based method was also compared with maximization of the conditional log-likelihood of a GP model built using only the experimental data. Under this strategy, we reach the target after 17 experimental updates. Because the GP-based search strategy requires abundant experimental data to build an accurate model, more updates are needed to locate the target. The results of the comparison between the two strategies are listed in Table 6. Clearly, the migration-based search strategy consistently outperforms the GP-based search and finds a better optimum with fewer experiments needed for goal seeking.

Table 6. Comparison of Different Methods for Locating the Target 21.2 g

                                  optimum condition
    method             no. of runs    x1 (°C)    x2 (mm/s)    x3 (bar)    identified response (g)
    migration-based    4              210.00     27.99        35.84       21.22
    GP-based           17             205.42     26.34        36.94       21.24

Admittedly, in this application the optimum is not unique; other combinations of barrel temperature, injection velocity, and packing pressure may also result in a final part weight that meets the experimenter's requirement. In reality, the problem of finding the optimum conditions for achieving the target has multiple solutions. Moreover, the uncertainty in the migration model (which is only an approximation of the underlying process) as well as the measurement errors contribute to uncertainty in the optimum. In recognition of this, we repeated the foregoing procedures using both strategies and found two additional optima for achieving the target. The results are shown in Table A.1 (available in the Supporting Information). In practice, when determining the settings of barrel temperature, injection velocity, and packing pressure, the experimenter has a variety of choices. Nevertheless, those settings must be physically feasible as well as safe for the operators. Provided such conditions are guaranteed, our migration-based search strategy demonstrates superior performance over the GP-based method.

4. CONCLUSION

We have presented a fast process-modeling methodology assisted by robust migration techniques, and we have applied this methodology to the process optimization problem. The methodology adapts existing, well-studied past models to new but similar current processes, thus borrowing the strength of previous investigations and any available prior knowledge. To connect past and current models, we use a migration function originally proposed in ref 4. Our method, however, is more robust in that it can identify the migration function that best approximates the relationship, the one most consistent with the data. Specifically, this is achieved by utilizing Bayesian forward model selection and cross-validation techniques. With these techniques, the desired model accuracy can be achieved at a reduced cost. The


resulting model can also be used to find optimal solutions more quickly. To accomplish this, we have presented a global optimization strategy using a migration-based method that converges toward the optimum via conditional likelihood maximization.

We have demonstrated our methodology by means of two examples: a pair of test functions and a real-world injection molding process. In the first case, we showed that the proposed robust migration is superior, as evidenced by smaller standard root-mean-square errors compared with those of the other existing methods. To illustrate how robust migration works, we began with an initial sample of size 10 and subsequently identified the most plausible terms, those that yield a minimum leave-one-out cross-validation error. In this example, we set the search target T to 1.18, the minimum of the current function. With the search termination threshold κ set to 0.01, only five more updates were needed to locate the target with the proposed migration-based method, while the GP-based method required 11 updates to achieve the desired target. Moreover, we found that the proposed method converged toward the target more quickly than did the GP-based method, confirming this additional advantage of the robust migration technique.

In the second case, we applied the proposed methodology to a real-world injection molding process. Here we limited our investigation to three input variables, namely barrel temperature, injection velocity, and packing pressure, and one output variable, the part weight of the final product, after applying sensitivity analysis to the experimental data. We selected two injection molding processes with different feed materials and mold geometries as the past and current processes; the two processes are considered similar in that they share similar mechanisms. We applied direct process optimization by setting a target T = 21.2 g and κ = 0.05 g for the current process. According to our results, the migration-based method required four updates to find the target, while the total number of updates increased to 17 under the GP-based method. Because of the existence of measurement errors, we repeated the foregoing procedure twice, with similar conclusions. These findings show that the proposed methodology is a modeling and optimization technique that holds promise for many engineering applications where similar processes are found. We limited the total sample size of the past model to 30; we note, however, that if more training data, and thus a more accurate past model, were available, then substantial improvement in process optimization might be achieved. This remains to be verified through experimentation.

Extensions to the methodology could be the subject of future work. For instance, other prior information, such as monotonicity, is sometimes available for the past and current models; ways of incorporating such information to assist in model development and process optimization would be an interesting line of research. Second, the extension of migration-based techniques to higher dimensions is a particularly active area of research; the migration function would become more complex in this situation. Further developments in this area would be of interest.



Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.iecr.5b00411.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel.: (852) 2358 7139. Fax: (852) 2358 0054. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS Luo was supported in part by Guangzhou Scientific and Technological Project (under Project No. 12190007). Yao was supported in part by Ministry of Science and Technology, Taiwan (under Grant No. MOST 103-2622-E-007-025-).



REFERENCES

(1) Chen, V. C.; Tsui, K.-L.; Barton, R. R.; Meckesheimer, M. A review on design, modeling and applications of computer experiments. IIE Trans. 2006, 38, 273. (2) Forrester, A. I.; Keane, A. J. Recent advances in surrogate-based optimization. Prog. Aerosp. Sci. 2009, 45, 50. (3) Casciato, M. J.; Vastola, J. T.; Lu, J.; Hess, D. W.; Grover, M. A. Initial experimental design methodology incorporating expert conjecture, prior data, and engineering models for deposition of iridium nanoparticles in supercritical carbon dioxide. Ind. Eng. Chem. Res. 2013, 52, 9645. (4) Kennedy, M. C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc.: Ser. B 2001, 63, 425. (5) Goh, J.; Bingham, D.; Holloway, J. P.; Grosskopf, M. J.; Kuranz, C. C.; Rutter, E. Prediction and computer model calibration using outputs from multifidelity simulators. Technometrics 2013, 55, 501. (6) Lu, J.; Yao, K.; Gao, F. Process similarity and developing new process models through migration. AlChE J. 2009, 55, 2318. (7) Yan, W.; Hu, S.; Yang, Y.; Gao, F.; Chen, T. Bayesian migration of Gaussian process regression for rapid process modeling and optimization. Chem. Eng. J. 2011, 166, 1095. (8) Luo, L.; Yao, Y.; Gao, F. Iterative improvement of parameter estimation for model migration by means of sequential experiments. Comput. Chem. Eng. 2015, 73, 128. (9) Higdon, D.; Kennedy, M.; Cavendish, J. C.; Cafeo, J. A.; Ryne, R. D. Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput. 2004, 26, 448. (10) Qian, Z.; Seepersad, C. C.; Joseph, V. R.; Allen, J. K.; Wu, C. J. Building surrogate models based on detailed and approximate simulations. J. Mech. Des. 2006, 128, 668. (11) Qian, P. Z.; Wu, C. J. Bayesian hierarchical modeling for integrating low-accuracy and high-accuracy experiments. Technometrics 2008, 50, 192. (12) Vanli, O. A.; Zhang, C.; Chen, L.-J.; Wang, K. K.; Wang, B. 
A Bayesian approach for integration of physical and computer experiments for quality improvement in nano-composite manufacturing. Qual. Reliab. Eng. Int. 2010, 26, 749. (13) Fearn, T. REVIEW: Standardisation and calibration transfer for near infrared instruments: A review. J. Near Infrared Spectrosc. 2001, 9, 229. (14) Feudale, R. N.; Woody, N. A.; Tan, H.; Myles, A. J.; Brown, S. D.; Ferré, J. Transfer of multivariate calibration models: A review. Chemom. Intell. Lab. Syst. 2002, 64, 181. (15) Jaeckle, C. M.; MacGregor, J. F. Product transfer between plants using historical process data. AIChE J. 2000, 46, 1989. (16) García Muñoz, S.; MacGregor, J. F.; Kourti, T. Product transfer between sites using Joint-Y PLS. Chemom. Intell. Lab. Syst. 2005, 79, 101. (17) Lu, J.; Yao, Y.; Gao, F. Model migration for development of a new process model. Ind. Eng. Chem. Res. 2008, 48, 9603.

ASSOCIATED CONTENT

Supporting Information

Figure A.1, real value versus prediction plots for different methods, and Table A.1, comparison of different methods for locating the target, 21.2 g, in two additional repetitions.

DOI: 10.1021/acs.iecr.5b00411 Ind. Eng. Chem. Res. 2015, 54, 5736−5748


(18) Lu, J.; Gao, F. Model migration with inclusive similarity for development of a new process model. Ind. Eng. Chem. Res. 2008, 47, 9508.
(19) Lu, J.; Gao, F. Process modeling based on process similarity. Ind. Eng. Chem. Res. 2008, 47, 1967.
(20) Joseph, V. R.; Hung, Y.; Sudjianto, A. Blind kriging: A new method for developing metamodels. J. Mech. Des. 2008, 130, 031102.
(21) Jones, D. R. A taxonomy of global optimization methods based on response surfaces. J. Global Optim. 2001, 21, 345.
(22) Forrester, A.; Sobester, A.; Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide; John Wiley & Sons: 2008.
(23) Pronzato, L.; Müller, W. G. Design of computer experiments: Space filling and beyond. Stat. Comput. 2012, 22, 681.
(24) Loeppky, J. L.; Sacks, J.; Welch, W. J. Choosing the sample size of a computer experiment: A practical guide. Technometrics 2009, 51, 366.
(25) Box, G. E.; Hunter, J. S.; Hunter, W. G. Statistics for Experimenters: Design, Innovation, and Discovery, 2nd ed.; John Wiley & Sons: 2005.
(26) Rasmussen, C. E.; Williams, C. K. I. Gaussian Processes for Machine Learning; MIT Press: 2006.
(27) Kalagnanam, J. R.; Diwekar, U. M. An efficient sampling technique for off-line quality control. Technometrics 1997, 39, 308.
(28) Bayin, S. S. Mathematical Methods in Science and Engineering; John Wiley & Sons: 2006.
(29) Bastos, L. S.; O’Hagan, A. Diagnostics for Gaussian process emulators. Technometrics 2009, 51, 425.
(30) Gratiet, L. L. Bayesian analysis of hierarchical multifidelity codes. SIAM/ASA J. Uncert. Quantif. 2013, 1, 244.
(31) Zellner, A.; Min, C.-K. In Physics and Probability; Grandy, W. T., Jr., Milonni, P. W., Eds.; Cambridge University Press: 1993; p 195.
(32) Le Gratiet, L. Multi-fidelity Gaussian process regression for computer experiments. Ph.D. Thesis, Université Paris-Diderot-Paris VII, 2013.
(33) Joseph, V. R.; Delaney, J. D. Functionally induced priors for the analysis of experiments. Technometrics 2007, 49, 1.
(34) Akaike, H. Selected Papers of Hirotugu Akaike; Springer: 1998; p 199.
(35) Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461.
(36) Ando, T. Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models. Biometrika 2007, 94, 443.
(37) Bertsimas, D.; Tsitsiklis, J. Simulated annealing. Stat. Sci. 1993, 8, 10.
(38) Kirkpatrick, S.; Gelatt, C.; Vecchi, M. Optimization by simulated annealing. Science 1983, 220, 671.
(39) Xiong, S.; Qian, P. Z.; Wu, C. J. Sequential design and analysis of high-accuracy and low-accuracy computer codes. Technometrics 2013, 55, 37.
(40) Gao, F.; Yang, Y.; Shao, C. Robust iterative learning control with applications to injection molding process. Chem. Eng. Sci. 2001, 56, 7025.
(41) Chen, X.; Gao, F. A study of packing profile on injection molded part quality. Mater. Sci. Eng., A 2003, 358, 205.
(42) Chen, X. A study on profile setting of injection molding. Ph.D. Thesis, The Hong Kong University of Science and Technology, 2002.
(43) Sobol, I. M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271.
