
Ind. Eng. Chem. Res. 2008, 47, 9508–9516

Model Migration with Inclusive Similarity for Development of a New Process Model

Junde Lu and Furong Gao*

Department of Chemical and Biomolecular Engineering, Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong

* Corresponding author. E-mail: [email protected]. Tel.: +852-2358-7139. Fax: +852-2358-0054.

In the processing industries, operating conditions change to meet the requirements of the market and customers. Under different operating conditions, data-based process modeling must be repeated for the development of a new process model. Obviously, this is inefficient and uneconomical. Effective use and adaptation of the existing process model can reduce the number of experiments needed in the development of a new process model, resulting in savings of time, cost, and effort. In this paper, a particular process similarity, inclusive similarity, is discussed in detail. A model migration strategy for processes with this type of similarity is developed to model a new process by taking advantage of existing models and data from the new process. The new model is built by aggregating the existing models using a bagging algorithm. As an illustrative example, the development of a new soft-sensor model for online prediction of the melt-flow length for a new mold geometry in an injection molding process is demonstrated by taking advantage of existing models for different molds.

1. Introduction

The increasing emphasis on product quality, economic process performance, and environmental issues in the chemical industries is placing significant demands on operation and product optimization. Enhanced process performance generally requires increased process knowledge, which is typically represented by process models. A process model is, mathematically, a relationship mapping an input space, the input variables (X) or process conditions, to an output space, the response variables (Y) or quality properties. A process model can be developed from first principles, through an empirical approach, or through the combination of first-principles models with other models (hybrid models).

First-principles models, based on fundamental conservation laws, are widely used. They allow a certain degree of extrapolation beyond the regions over which experimental data are available. However, this advantage is sometimes degraded by the inability of a first-principles model to predict real plant behavior with sufficient accuracy and its failure to take full advantage of all available empirical knowledge or data. An alternative is to use a data-based approach to develop a cost-effective process model. Such data-based approaches, also known as "black-box" tools, including artificial neural networks (ANNs), fuzzy logic modeling (FLM), partial least-squares (PLS), and support vector machines (SVMs), extract knowledge directly from experimental data under certain assumptions about the true underlying process behavior.1-4 Purely empirical approaches permit only limited extrapolation beyond the domain of the data from which they were derived. Moreover, data-based modeling approaches usually require large amounts of data to obtain reliable predictions. To obtain good extrapolation properties, hybrid models, combinations of first-principles models with traditional data-driven models, have been developed for many chemical processes.5,6

Data-based modeling approaches are widely used for process model development because of their cost efficiency and low requirements for prior process knowledge. In most processing industries, operating conditions frequently change to meet the requirements of the market or customers.

Different specifications of products from the same family are produced by changing the recipe from the first (or old) process to a second (or new) process. These processes might be of different sizes, process conditions, or configurations, although their intrinsic physical principles are the same. For example, in injection molding, different specifications of products are produced with different grades of plastic granules and/or different molds. These modified process conditions might make existing process models invalid because of the limited extrapolation ability of data-based models. Therefore, the modeling procedure, including experimental design, data collection, and model training, has to be repeated for the development of a model for the new process. Obviously, this is inefficient, time-consuming, and uneconomical, and it is particularly problematic for processes with expensive materials and high operating costs.

Although these processes might be of different sizes and configurations, use different raw materials and technologies, or be operated under different operating conditions, the underlying physical principles behind them are similar. As a consequence, effective use and adaptation of an existing process model can allow us to conduct fewer experiments in the development of a new process model, resulting in savings of time, cost, and effort. This is a challenging issue because adaptation algorithms must fulfill conflicting requirements: they must generalize from as little new data as possible while, at the same time, making the performance of the new model as good as possible. The concept of migrating model development was recently proposed by Lu and Gao.7 The key challenges and problems were discussed, and a simple case was presented for demonstration.7 For clarity, we call the process model describing the old process the base model and the model to be developed for the new process the new model.

There are few papers on the subject of how to adapt existing process models to fit a new process. The multivariate calibration model is used for extracting chemical information from spectroscopic signals. Changes in spectral variations, due to different instrument characteristics or environmental factors, may render the original calibration model invalid for prediction in the new system. Feudale et al.8 reviewed various methods for calibration model transfer that avoid full recalibration by taking advantage of the existing model.



Most calibration models are linear and one-dimensional; such methods are not suitable for process modeling. Case-based reasoning (CBR) has been found to be an effective approach for finding the solution to a new case by adapting solutions that were used to solve past similar cases.9 Although CBR systems are widely used in medical diagnosis, process control, planning, and product design, most of them cannot propose good solutions to a given new task without assistance from domain experts or system managers.10 Moreover, depending on the specific application background, different CBR systems use different approaches that are mostly quantitative, descriptive, constructive, and experience-based. Recently, Jaeckle et al.11,12 utilized historical process data to find a window of process operating conditions within which a product with a new specified quality can be produced. Similar concepts were developed using a latent variable model and a joint-Y partial least-squares model with historical process data from both sites.13,14 The focus of these papers was to find the new process conditions for new plants rather than to develop a new model.

This paper seeks to develop a modeling strategy, called model migration, for building a new process model by taking advantage of the availability of base models. Less experimental data, instead of full-scale retraining, can then be used in the development. The remainder of the paper is organized as follows. The concept of inclusive similarity is presented in section 2. For this type of similarity, the corresponding model migration strategy is proposed in section 3. Section 4 presents an illustrative application in the development of a soft-sensor model for online prediction of the melt-flow length in injection molding. Finally, concluding remarks are offered in section 5.

2. Inclusive Similarity

The basic requirement for model migration is process similarity. Different types of process similarity require different model migration strategies. Lu and Gao15 defined and classified process similarity, providing the basis for the selection and development of different model migration strategies. For scale similarity, Lu and Gao7 proposed a model migration strategy using slope and bias correction in the development of the new model. In this paper, a model migration strategy is proposed for the case of inclusive similarity.

If all or some of the attribute values describing process i are a subset of the corresponding attribute values for process j, the processes are said to show inclusive similarity. Inclusive similarity occurs in natural set relations between a class and an element or between a whole and its parts. For example, a process, P, is represented by a set of process attributes A = [A1, A2, ..., Ar] and the corresponding attribute values V = [V1, V2, ..., Vr], where r is the number of process attributes. Assume that the attribute values of processes i and j differ for some attributes, indexed by u, and that the corresponding attribute values describing process i are a subset of those describing process j, as eq 1 shows:

V_{iu} ⊂ V_{ju},   u ∈ [1, 2, ..., r]   (1)

In other words, process j has at least one attribute that carries all of the characteristics of process i. For example, if the operating range of a particular process variable in process i is included in that of process j, the two processes can be viewed as having inclusive similarity in operating range. In the injection molding process, for instance, different molds are frequently used to produce different products. Molds used in one process include some basic shapes of the molds used in another process, and processes with such molds can be viewed as having inclusive similarity of attributes.
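As a concrete illustration of the subset test in eq 1, the following short Python check treats each attribute value as an operating interval and tests whether every interval of process i lies inside the corresponding interval of process j. This is only a sketch; the attribute names and ranges are made-up placeholders, not data from the paper.

```python
# Hypothetical operating ranges (lower, upper) per attribute; illustrative only.
process_i = {"barrel_temperature": (190, 210), "injection_velocity": (15, 25)}
process_j = {"barrel_temperature": (180, 230), "injection_velocity": (10, 30)}

def inclusive_similar(vi, vj):
    """Eq 1: every attribute interval of process i is a subset of process j's."""
    return all(vj[a][0] <= lo and hi <= vj[a][1] for a, (lo, hi) in vi.items())

print(inclusive_similar(process_i, process_j))  # True
```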

Figure 1. Procedure of the model migration strategy.

3. Model Migration Strategy

Inclusive similarity means that the attribute values of one process are a subset of the attribute values of another process. Before proposing a particular model migration method, we need to introduce some process assumptions. Assume, first, that the two processes have identical inputs (the same input variables and operating ranges). Second, assume that one or several base models are available, that each corresponding old process has one or several attribute values that are, to a certain degree, a subset of those of the new process, and that the remaining attribute values are the same.

Because of the inclusive similarity, the new process is similar to the old process. On the other hand, the new model must reflect process changes that the base models do not include. The model migration strategy for inclusive similarity must therefore meet two requirements: (1) inherit the behaviors of all available base models and (2) reflect process variations that are not included in the base models. To meet these two requirements simultaneously, we propose a modeling method that develops the new model by combining the behavior of each base model with the new process knowledge through ensemble building.

Ensemble building is a common way to improve the generalization ability of the resulting model in classification and regression tasks. As shown in ref 16, an ensemble of individual predictors performs better, on average, than a single predictor. The bagging algorithm, an acronym for "bootstrap aggregation", is such an ensemble method.17 It generates a set of component predictors and combines them into an aggregated predictor, each component predictor being formed from a bootstrap replicate of the training set. In this paper, we use a bagging algorithm to develop the new process model, with each component predictor developed as an ANN; the training data are formed from the base models and the new process data.

As shown in Figure 1, the procedure of the model migration strategy involves the following steps: (1) generate a training data set from the base models, (2) form training subsets with which to identify the ensemble members, (3) develop the individual ensemble members with the training subsets, and (4) combine these ensemble members. As described so far, this strategy utilizes only old process knowledge. In reality, although the new process is similar to the old processes, it shows different process patterns; this mismatch can be compensated by also training the individual ensemble members on the new data. The new model, a combination of the individual ensemble members, is expected to perform well, as the demonstration example below shows. Detailed descriptions of each step are given as follows.

3.1. Base Model or Old Process Description. Past process knowledge to be used for development of a new model can be classified into two types: the first is past process knowledge stored in a set of process models; the second is knowledge stored in a database in the form of process data. For the first type, the base model structure and parameters could in principle be modified to adapt to the new situation. However, the base model is often given to the user as a black-box model or a commercial package, so both the base model structure and parameters are unknown. Such a black-box model maps only input-output relationships, which creates significant challenges for the development of a new process model; past process knowledge, therefore, can only be retrieved by generating data sets from the base models. In this paper, a set of black-box base models is assumed to be available. For the second type, the available process data can be used directly for development of the new process model.

3.2. Generating Training Data. The procedure for generating the training data set from a set of base models is shown in Figure 2. A sub-data set is generated from each base model, and the sub-data sets are put together to form the training data set. The generated training data set can be sampled directly to generate the individual training and test data sets. However, if the training data set is too large, each individual training data set will contain a large amount of data, which adds a significant computation burden during ensemble member development. An alternative is to cluster the training data into a compact but representative form before sampling; a set of individual training and test data sets is then formed from the clustered training data set by a sampling technique and used to develop the individual ensemble members. The detailed development steps are described as follows.

Generation of the training data set starts with discretization of the input space of each base model by gridding X with equidistant lines. The gridded inputs are combined with the corresponding y values of the base model to form input-output data pairs:

Z_i = [X^T; y],   i = 1, ..., n

where X denotes the inputs of the base model, y denotes the corresponding output, Z_i is an input-output data pair, and n is the number of data pairs in each sub-data set. Suppose m base models are available. Then m sub-data sets can be generated, and N = n × m data pairs are put together to form the training data set. The number of data pairs in each sub-data set depends on how finely the X inputs are gridded. To describe the nonlinearity of the base model, a sufficient number of data pairs is needed, which significantly increases the computation burden when complex modeling techniques, e.g., neural networks, are used. A clustering algorithm18 can group a large number of data points into sparse cluster centers, each center representing a local region in which the data points have similar patterns. When the clustering algorithm is applied to the training data, a great number of data points can be compressed into a small number of cluster centers without significant information loss. This compact but representative character of the cluster centers can reduce the computation burden significantly without noticeably decreasing modeling efficiency, as indicated later by the comparison of the prediction ability of new ensemble models built from training data with and without the clustering algorithm.

Figure 2. Procedure for generation of training data sets.

Figure 3. Mold geometries of old processes.

Figure 4. Mold geometries of new processes.

The effectiveness of an ensemble model can be measured by the extent to which its members show different patterns of generalization.19 The ideal is a set of members in which each member generalizes well and the members are diverse, so that the errors they make on new data are not shared with the other members.19 To make the ensemble members generalize differently and diversely, it is common to use some form of sampling, such that each ensemble member is trained on a different subsample of the training data. One of the resampling methods used for this purpose is bootstrapping.17 A set of training and test data set pairs, [S_1, S̄_1], [S_2, S̄_2], ..., [S_T, S̄_T], is generated from the training data set S of size N, where S_T is the Tth individual training data set, sampled N times from the original training data set with replacement, and S̄_T is a test data set consisting of the data pairs from the original data set that do not occur in the corresponding training set. Because the sampling is done with replacement, some examples are likely to be repeated in each training data set. If N is large enough, S_T is expected to contain about 63.2% of the distinct data pairs of S, the rest being duplicates:

S_T = (1 - (1 - 1/N)^N) S ≈ 0.632 S   (2)
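Steps 1 and 2 of the procedure can be sketched in a few lines of code. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: base_models is assumed to be a list of black-box callables y = f(x), and the grid bounds, grid size, and ensemble size T are placeholder values.

```python
import numpy as np

def generate_training_set(base_models, lower, upper, n_grid=20):
    """Step 1: grid the input space of every black-box base model and pool the
    resulting input-output pairs Z_i = [X^T; y] into one training data set."""
    axes = [np.linspace(lo, hi, n_grid) for lo, hi in zip(lower, upper)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, len(lower))
    sub_sets = []
    for f in base_models:                       # one sub-data set per base model
        y = np.array([f(x) for x in grid])      # query the black-box model
        sub_sets.append(np.column_stack([grid, y]))
    return np.vstack(sub_sets)                  # N = n x m pooled data pairs

def bootstrap_pairs(S, T=10, rng=None):
    """Step 2: form T bootstrap training sets and their out-of-bag test sets."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = len(S)
    splits = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)        # sample N times with replacement
        oob = np.setdiff1d(np.arange(N), idx)   # held-out pairs (about 36.8%, cf. eq 2)
        splits.append((S[idx], S[oob]))
    return splits
```

When a few runs from the new process are available, each bootstrap pair can simply be augmented with those new data points before training, as described above.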

The ensemble members are fitted to the above bootstrap data pairs with any modeling method. This training data generation process assumes that no data from the new process are available; when new data become available, each individual training data set and test data set is augmented with the new data, as shown in Figure 2. This is expected to improve the generalization performance of each ensemble member.

3.3. Development of Ensemble Members. Once a set of individual training and test data sets has been generated, a learning method must be chosen to develop each ensemble member. The performance of the ensemble model largely depends on the stability of the learning method used to develop the ensemble members; significant improvement occurs for unstable learning methods.17 A learning method is said to be unstable if minor changes in the training data set and/or the training parameters have serious consequences for the generalization ability of the resulting ensemble members. Instability was studied by Breiman, who pointed out that neural networks, classification and regression trees, and subset selection in linear regression are unstable, whereas k-nearest-neighbor methods are stable.20 Neural networks, therefore, are used to develop the ensemble members in this paper. The training data set obtained by sampling with replacement is used to train each neural network member, and the corresponding test data set is used to determine the learning structure. As mentioned above, if a small amount of new data from the new process is available, these data can be added to the training and test data sets for the development of the ensemble members; the prediction performance of a new model trained with new data can be expected to be better than that of a new model trained without it.

3.4. Combination of Ensemble Members. The next step in the model migration strategy is to find an effective way to combine the ensemble members. Although there are several possible ways of combining ensemble members, a cooperative combination is the most common. In this combination method, it is assumed that all of the ensemble members make some contribution to the ensemble decision, even though this contribution may be weighted in some way. Methods of combining the ensemble members in a cooperative fashion include averaging and weighted averaging.

Averaging. A linear combination of the outputs of the ensemble members is one of the most popular aggregation methods. Without any information on the new process, a single output can be created from the set of ensemble member outputs via simple averaging:

f(X) = (1/T) Σ_{i=1}^{T} f_i(X)   (3)

where f(X) is the ensemble model output, f_i(X) is the output of the ith ensemble member, X is the input vector, and T is the number of ensemble members.

Weighted Averaging. Individual ensemble members may, however, make different contributions to the ensemble model. Therefore, when new data are available, the overall output of the ensemble model is defined as a weighted combination of the outputs of the individual ensemble members:

f(X) = Σ_{i=1}^{T} w_i f_i(X)   (4)

where w_i is the weight for combining the ith ensemble member. A direct way to determine the weights is multiple linear regression. However, an ensemble model whose weights come from unconstrained multiple linear regression does not appear to give good results; a better solution is to constrain the weights to be non-negative.21 Therefore, the following constraints are imposed:

0 ≤ w_i ≤ 1   (5)

Σ_{i=1}^{T} w_i = 1   (6)

Based on eqs 4-6, determining the weights with the new data can be viewed as a constrained optimization problem:

W_opt = arg min_{w} Σ_{j=1}^{K} ( Y_j − Σ_{i=1}^{T} w_i f_i(X_j) )^2   (7)

subject to eqs 5 and 6, where Y_j is the actual value of the jth new data point from the new process, K is the number of new data points, and W_opt denotes the optimized weights.
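The two combination rules translate into a short code fragment. The sketch below is illustrative only and is not the authors' implementation; in the case study the weights are actually optimized with a genetic algorithm (section 4.5), whereas here SciPy's SLSQP solver is used as a stand-in for the constrained problem of eqs 5-7. member_preds has one row per ensemble member.

```python
import numpy as np
from scipy.optimize import minimize

def average_ensemble(member_preds):
    """Eq 3: simple average over the T member predictions (rows = members)."""
    return np.mean(member_preds, axis=0)

def optimize_weights(member_preds_new, y_new):
    """Eqs 4-7: weights in [0, 1] that sum to 1 and minimize the squared
    prediction error on the K new-process data points."""
    T = member_preds_new.shape[0]
    sse = lambda w: np.sum((y_new - w @ member_preds_new) ** 2)
    res = minimize(sse, x0=np.full(T, 1.0 / T), method="SLSQP",
                   bounds=[(0.0, 1.0)] * T,
                   constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0})
    return res.x  # W_opt; the new-model output is then res.x @ member_preds
```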

In the end, the new model is a weighted combination of the individual ensemble members, as eq 4 shows.

4. Case Study

4.1. Background Introduction. To demonstrate the proposed method, a new soft-sensor model for online measurement of the melt-flow length (MFL) in injection molding is developed. As a batch process, injection molding operates sequentially in stages, including filling, packing-holding, and cooling. During the filling stage, the screw moves forward and pushes the melt through the runner and gate into the mold until the cavity is completely filled; the development of the melt in the cavity during mold filling is therefore critically important to the quality of the process. The MFL, defined as the distance that the melt front has traveled in the mold from the gate during the filling phase, is an important parameter reflecting the melt-flow status in the mold cavity. However, it is difficult to measure the MFL online. A hardware capacitive transducer was developed to measure the MFL.22 Because it is neither economical nor practical to equip every mold with such a capacitive transducer, a soft-sensor measurement of the MFL during mold filling was developed by our group.23 In that study, on the basis of nine purposely designed basic mold inserts, we built a neural network model, using data collected from the nine basic mold inserts, to predict the MFL of a new mold.

Table 1. Prediction Results from Base Models

base model   1        2        3        4        5        6        7        8        9        average
R-RMSE       0.101%   0.131%   0.116%   0.108%   0.121%   0.138%   0.113%   0.136%   0.125%   0.121%

In this study, a new soft-sensor model predicting the MFL for two new molds is developed through model migration from a set of base models. Processes operating with mold inserts 1-9 (shown in Figure 3) are considered the old processes, whereas processes with mold inserts 10 and 11 (shown in Figure 4) are the new processes. To isolate the mold's influence on product quality, we assume that the same material (high-density polyethylene, HDPE), the same injection molding machine, and the same control strategy are used. Since the nine mold inserts share the same basic geometric features with the new mold inserts, including a constant area, a gradually increasing/decreasing area, and an abruptly increasing/decreasing area, the similarity between the old and new processes can be viewed as inclusive similarity.

Figure 5. Architecture of recurrent neural networks.

4.2. Base Model Description. As stated by Chen et al.,23 for a given material and mold, the MFL at any time is mainly determined by the nozzle pressure (NP), the screw displacement (SD), the injection velocity (IV), and the nozzle temperature (NT) and can be represented by

MFL_n = f(MFL_{n-1}, NP_n, ΔNP_n, SD_n, ΔSD_n, IV_n, ΔIV_n, NT_n, ΔNT_n)   (8)

where the subscript n denotes the current sample and n - 1 the preceding one; MFL_{n-1} is the preceding MFL; NP_n is the current nozzle pressure, representing the driving force for mold filling; ΔNP_n represents the change in the force required for the current MFL increment; SD_n is the current screw displacement, which represents the total amount of melt in the mold; ΔSD_n denotes the amount of melt entering the mold during the current time increment; IV_n is the current screw injection velocity, which directly affects the melt-front velocity and, consequently, the MFL; ΔIV_n is the injection velocity change in the current sample; and NT_n and ΔNT_n are the current nozzle temperature and its change, respectively. Although the MFL is a time-related process variable, the injection time is excluded as an explicit input because it can be inferred from the screw displacement and the injection velocity. The correlation between the MFL and the process variables can be treated as a "black-box" input-output relation. As eq 8 shows, the MFL of the nth sample, MFL_n, depends not only on the process information of the current sample but also on the previous status, as reflected in the measurements of the last sample and the preceding MFL, MFL_{n-1}.

A number of experiments were conducted on a Chen Hsong reciprocating-screw injection-molding machine (MK-III J88). For each old-process mold insert, as depicted in Figure 3, 11 different injection velocity profiles, including constant, step-change, and ramp profiles, were set to generate the training data for base model development. The model structures are the same as in ref 23. Validation data collected with the hardware measurement are used to evaluate the performance of the base models; the results show that the predictions from the nine base models are in good agreement with the actual values in terms of the relative root-mean-square error (R-RMSE), as shown in Table 1.
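The nine-element regressor of eq 8 can be assembled directly from the sampled machine signals. The fragment below is an illustrative Python sketch; the array names and sampling convention are assumptions, not the authors' code.

```python
import numpy as np

def eq8_inputs(mfl, np_, sd, iv, nt, n):
    """Input vector of eq 8 at sample n: previous MFL, current NP/SD/IV/NT,
    and their changes over one sampling interval."""
    return np.array([mfl[n - 1],
                     np_[n], np_[n] - np_[n - 1],
                     sd[n],  sd[n] - sd[n - 1],
                     iv[n],  iv[n] - iv[n - 1],
                     nt[n],  nt[n] - nt[n - 1]])
```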

R-RMSE is defined as

R-RMSE = sqrt( (1/N) Σ_{i=1}^{N} ( (Y_i − Ŷ_i) / Y_i )^2 )   (9)

where Y_i is the validation data, Ŷ_i is the estimated value from the base model, and N is the number of validation data points.

Figure 6. Recurrent neural networks model.
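Eq 9 translates directly into a small helper function (a sketch; the variable names are assumptions):

```python
import numpy as np

def r_rmse(y_true, y_pred):
    """Relative root-mean-square error, eq 9."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))
```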

Figure 7. Comparison of prediction and measurements for mold insert 10.

4.3. Training Data Set Generation. Based on eq 8, discretizing all nine input variables with equidistant lines would generate an enormous amount of data. In fact, there are only four independent input variables, NP, SD, IV, and NT; the other input variables and the time can be derived from IV and SD. To describe each base model sufficiently, a discretization size of 20 is used, so each base model generates 160 000 (20^4) input-output data pairs. Given nine base models, 9 × 160 000 data pairs are generated to form the original training data set, which would significantly increase the computation burden of running the model. A subtractive clustering algorithm18 is therefore applied to compress the original training data. Considering the tradeoff between the computation burden and the amount of data required for modeling, the final number of clusters is reduced to about 50 000 to form the training data set. So that each ensemble member has largely different patterns of generalization, a bootstrap sampling strategy is used to generate the individual training data sets. The number of individual training data sets, i.e., the ensemble size, is chosen as 10, which is sufficient for ensemble modeling.24
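The compression step can be sketched as follows. The paper uses Chiu's subtractive clustering;18 the fragment below substitutes scikit-learn's MiniBatchKMeans purely as an illustrative stand-in, and the target of 50 000 centers simply mirrors the number quoted above.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def compress_training_set(Z, n_clusters=50_000, seed=0):
    """Replace the pooled input-output pairs Z (rows = [X, y]) by cluster
    centers, giving a compact but representative training data set."""
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, batch_size=4096)
    km.fit(np.asarray(Z, dtype=float))
    return km.cluster_centers_
```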

4.4. Ensemble Member Development. In modeling nonlinear and complex processes, neural networks have been found to be efficient tools because of their fast computation and their ability to learn from examples. In this work, an unstable learning method, the recurrent neural network, is used to model the dynamic relation of eq 8 for ensemble member development.

4.4.1. Inputs and Output. The correlation between the melt-flow length and the process variables is treated as a "black-box" input-output relation represented and learned by a neural network. To make the ensemble members more general and to facilitate learning in the recurrent neural network, all input and output variables are normalized to the range 0-1. The normalized neural network output (the melt-flow length) is the ratio of the melt-flow length to the total flow length. Similarly, the screw displacement input, SD_n, is divided by the total injection stroke required by the molding. The nozzle pressure, nozzle temperature, and injection velocity are divided by their filling maxima obtained from the measurements. The changes in the screw displacement, nozzle pressure, nozzle temperature, and injection velocity are derived from these normalized data.

4.4.2. Neural Network Architecture. A typical recurrent neural network has at least one feedback loop; for example, it may consist of a layer of neurons, some of which feed their outputs back as inputs to neurons in previous layers, as depicted in Figure 5. Owing to this structure, a recurrent network can store information for future reference and is thus able to learn temporal as well as spatial patterns,25 which makes it useful in signal processing and prediction, where time plays an important role. In eq 8, the melt-flow length of the nth sample, MFL_n, depends not only on the process information of the current sample but also on the previous status, as reflected by the measurements of the last sample and the preceding melt-flow length, MFL_{n-1}. The corresponding recurrent neural network, as depicted in Figure 6, is therefore adopted for modeling this dynamic relation, taking the last output as one of the current inputs.
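In prediction, this feedback is realized by feeding the model's own last MFL estimate back as an input. A minimal closed-loop sketch is given below (illustrative only: member stands for any trained one-step model, e.g., one ensemble member, and the normalized signal arrays are assumptions):

```python
import numpy as np

def predict_mfl_profile(member, np_, sd, iv, nt, mfl0=0.0):
    """Roll eq 8 forward in time: the MFL predicted at the previous sample is
    fed back as an input at the current sample (normalized variables assumed)."""
    mfl = [mfl0]
    for n in range(1, len(sd)):
        x = np.array([mfl[-1],
                      np_[n], np_[n] - np_[n - 1],
                      sd[n],  sd[n] - sd[n - 1],
                      iv[n],  iv[n] - iv[n - 1],
                      nt[n],  nt[n] - nt[n - 1]])
        mfl.append(float(member(x)))   # one-step-ahead prediction, fed back
    return np.array(mfl)
```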

Figure 8. Comparison of predictions and measurements for mold insert 11.

Table 2. Prediction Results of Different Modeling Strategies (R-RMSE)

training data              | without new data                     | with new data                                                              |
clustering                 | without clustering | with clustering | without clustering              | with clustering                 | Chen's method
combining ensemble members | averaging          | averaging       | averaging | weighted averaging  | averaging | weighted averaging  |
mold 10                    | 0.221%             | 0.225%          | 0.212%    | 0.191%              | 0.216%    | 0.198%              | 0.243%
mold 11                    | 0.136%             | 0.151%          | 0.149%    | 0.126%              | 0.154%    | 0.132%              | 0.156%

4.4.3. Training Algorithm. During training, the weights and biases of the network are iteratively adjusted to minimize the mean-squared error (MSE), the average squared error between the network outputs and the target outputs. The Levenberg-Marquardt (LM) algorithm, a training algorithm well-known for its fast convergence and small residual training error,26 is employed in this work. The updating rule of the LM algorithm is

ΔW = (J^T J + μI)^{-1} J^T e   (10)

where ΔW is the weight increment, J is the Jacobian matrix of the derivatives of each error with respect to each weight, I is the identity matrix, μ is a scalar, and e is the vector of errors whose mean square is the MSE. When μ is zero, LM becomes the Gauss-Newton method with an approximate Hessian matrix; when μ is large, LM approaches the gradient-descent method with a small step size.

As the Gauss-Newton method is faster and more accurate in approaching the error minimum, μ is adjusted so as to shift the algorithm toward the Gauss-Newton method as quickly as possible: μ is decreased after each successful step (a reduction in the MSE) and increased only when a tentative step increases the error. In this way, a rapid reduction of the objective function is ensured. This method has a faster convergence rate and a smaller residual training error than the commonly used back-propagation (BP) algorithm.26
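A single LM update of eq 10 can be written in a few lines of numpy. This is only a sketch, not the training code used in the paper, and it takes e to be the vector of output errors:

```python
import numpy as np

def lm_step(J, e, mu):
    """Levenberg-Marquardt weight increment, eq 10: dW = (J^T J + mu I)^-1 J^T e."""
    n_params = J.shape[1]
    return np.linalg.solve(J.T @ J + mu * np.eye(n_params), J.T @ e)
```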


As stated above, the individual training data sets used to train the recurrent neural network members come from two different kinds of training data: one kind is generated from the base models without clustering, and the other is the clustered training data set. For both kinds, the corresponding test data sets, S̄_T, are used to determine the structure of the recurrent neural networks. The number of hidden neurons for each recurrent neural network was determined by considering several networks with between 10 and 25 hidden neurons; the network giving the smallest error on the test data was selected. If some new data are available from the new processes, these data can be added to the training and test data to form hybrid data sets for determining the structure of the individual neural networks, and networks trained with such hybrid data can be expected to have better prediction ability than those trained without the new data. In this example, new data are collected from mold inserts 10 and 11 with three injection velocity profiles: a constant profile (20 mm/s), a step-up profile (10-30 mm/s), and a ramp-down profile (30-10 mm/s).

4.5. Combination of Ensemble Members for Development of a New Model. Methods of combining the ensemble members include averaging and weighted averaging, depending on whether or not new data are available. Without new data, the output of the ensemble neural network model, i.e., the new model, is the average of the outputs of the individual neural network models. If new data are available, the weight of each individual neural network model is optimized with a genetic algorithm according to eq 7, and the final output of the ensemble model is the weighted average of the outputs of the individual models.

4.6. Results and Discussion. Several new ensemble models are developed, depending on whether or not the generated training data are clustered and whether or not new data are available. In each case, recurrent neural networks with two hidden layers, 10 neurons in the first layer and 15 in the second, are trained as ensemble members; the numbers of neurons are decided based on a combination of experience and trial-and-error tests. The trained members are then combined into a new ensemble model using the averaging or the weight-optimization strategy. Validation data collected from molds 10 and 11 under different injection velocity profiles, such as a constant profile (30 mm/s), step-change profiles (15-25 and 30-10 mm/s), and a ramp profile (10-30 mm/s), are used to validate the new ensemble models. The proposed methods are compared with Chen's method, in which experimental data obtained from molds 1-9 are used to train a recurrent network to predict the MFL for the new molds 10 and 11, as shown in Figures 7 and 8. The solid line represents the measurement from the capacitive-transducer hardware, i.e., the actual value of the MFL; the crosses represent the MFL predicted by the new ensemble model; and the circles represent the MFL predicted by Chen's method.23 All of the results suggest that the new ensemble model gives better predictions than Chen's method, especially for the step-down and ramp-up velocity profiles in parts c and d of Figure 7, respectively, for mold 10. Prediction results from the different new ensemble models are summarized in Table 2 in terms of the relative root-mean-square error (R-RMSE). A comparison between the columns without the clustering algorithm and those with the clustering algorithm shows no significant loss of prediction accuracy; at the same time, using the clustering algorithm reduces the computation burden significantly, although this is not shown in the table.
With the availability of new data, the prediction results are, in general, better than those obtained without new data; this is because, when the new data are used as training and test data for the individual neural networks, the resulting network structures are in better agreement with the new process behavior. Similarly, with new data, the ensemble neural network model with optimized weights gives better predictions than the model using the averaging strategy. Furthermore, the results for mold insert 11 are better than those for mold insert 10, because the attribute values (mold geometry) of mold insert 11 are geometrically closer to the basic geometries of the mold inserts of the old processes. A comparison between Tables 1 and 2 indicates that, with only four injection velocity profiles for training, the best new model has a prediction precision similar to that of the base models calibrated with 11 injection velocity profiles. This is a large reduction in the amount of training data without loss of prediction ability. The good results can be attributed to the new model's ability to take effective advantage of the combination of the existing base models and the new information.

5. Conclusion

In this paper, we presented an economical method for developing a new process model by taking advantage of a set of existing base models and a limited number of experiments. The method is a model-based learning system: the new model is built by aggregating the outputs of a set of individual neural networks developed from data generated from the base models together with the new data. As an illustrative example, new ensemble models for online prediction of the melt-flow length for new mold geometries in injection molding were developed using the proposed modeling strategy. Compared with Chen's method, the results show that the proposed method is effective.

Acknowledgment

This work is supported in part by the Hong Kong Research Grants Council under project number 613107.

Literature Cited

(1) Lotti, C.; Ueki, M. M.; Bretas, R. E. S. Prediction of the shrinkage of injection molded iPP plaques using artificial neural networks. J. Injection Molding Technol. 2002, 6 (3), 157–176.
(2) Li, E.; Li, J.; Yu, J. A genetic neural fuzzy system and its application in quality prediction in the injection process. Chem. Eng. Commun. 2004, 191 (3), 335–355.
(3) Lu, N.; Gao, F. Stage-based process analysis and quality prediction for batch processes. Ind. Eng. Chem. Res. 2005, 44 (10), 3547–3555.
(4) Lee, D. E.; Song, J. H.; Song, S. O.; Yoon, E. S. Weighted support vector machine for quality estimation in the polymerization process. Ind. Eng. Chem. Res. 2005, 44 (7), 2101–2105.
(5) Thompson, M. L.; Kramer, M. A. Modeling chemical processes using prior knowledge and neural networks. AIChE J. 1994, 40 (8), 1328–1340.
(6) van Lith, P. F.; Betlem, B. H. L.; Roffel, B. Combining prior knowledge with data driven modeling of a batch distillation column including start-up. Comput. Chem. Eng. 2003, 27 (7), 1021–1030.
(7) Lu, J.; Gao, F. Process modeling based on process similarity. Ind. Eng. Chem. Res. 2008, 47 (6), 1967–1974.
(8) Feudale, R. N.; Woody, N. A.; Tan, H.; Myles, A. J.; Brown, S. D.; Ferre, J. Transfer of multivariate calibration models: a review. Chemom. Intell. Lab. Syst. 2002, 64 (2), 181–192.
(9) Kolodner, J. L. Case-Based Reasoning; Morgan Kaufmann: San Mateo, 1993.
(10) Watson, I. Applying Case-Based Reasoning: Techniques for Enterprise Systems; Morgan Kaufmann: San Francisco, 1997.
(11) Jaeckle, C. M.; MacGregor, J. F. Product design through multivariate statistical analysis of process data. AIChE J. 1998, 44 (5), 1105–1118.
(12) Jaeckle, C. M.; MacGregor, J. F. Industrial applications of product design through the inversion of latent variable models. Chemom. Intell. Lab. Syst. 2000, 50 (2), 199–210.
(13) Jaeckle, C. M.; MacGregor, J. F. Product transfer between plants using historical process data. AIChE J. 2000, 46 (10), 1989–1997.
(14) Garcia Munoz, S.; MacGregor, J. F.; Kourti, T. Product transfer between sites using Joint-Y PLS. Chemom. Intell. Lab. Syst. 2005, 79 (1-2), 101–114.

(15) Lu, J.; Gao, F. Process similarity and developing new process models through migration. AIChE J., submitted for publication, 2008.
(16) Hansen, L. K.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12 (10), 993–1001.
(17) Breiman, L. Bagging predictors. Mach. Learn. 1996, 24 (2), 123–140.
(18) Chiu, S. L. Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 1994, 2, 267–278.
(19) Rogova, G. Combining the results of several neural network classifiers. Neural Networks 1994, 7 (5), 777–781.
(20) Breiman, L. Heuristics of instability and stabilization in model selection. Ann. Stat. 1996, 24 (6), 2350–2383.
(21) Breiman, L. Stacked regressions. Mach. Learn. 1996, 24 (1), 49–64.
(22) Chen, X.; Chen, G.; Gao, F. Capacitive transducer for in-mold monitoring of injection molding. Polym. Eng. Sci. 2004, 44 (8), 1571–1578.

(23) Chen, X.; Gao, F.; Chen, G. A soft-sensor development for melt-flow-length measurement during injection mold filling. Mater. Sci. Eng. A 2004, 384 (1-2), 245–254.
(24) Zhang, J.; Martin, E. B.; Morris, A. J.; Kiparissides, C. Inferential estimation of polymer quality using stacked neural networks. Comput. Chem. Eng. 1997, 21 (Suppl. 1), S1025–S1030.
(25) Mandic, D. P.; Chambers, J. A. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures, and Stability; John Wiley: New York, 2001.
(26) Hagan, M. T.; Menhaj, M. B. Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Networks 1994, 5 (6), 989–993.

Received for review April 14, 2008
Revised manuscript received October 8, 2008
Accepted October 14, 2008

IE800595A