Dynamic Data-Driven Modeling of Pharmaceutical ... - ACS Publications

Purchase temporary access to this content. ACS Members purchase additional access options · Ask your library to provide you and your colleagues site-w...
2 downloads 8 Views 4MB Size
ARTICLE pubs.acs.org/IECR

Dynamic Data-Driven Modeling of Pharmaceutical Processes F. Boukouvala, F. J. Muzzio, and Marianthi G. Ierapetritou* Department of Chemical and Biochemical Engineering, Rutgers University, Piscataway, New Jersey 08854, United States ABSTRACT: In many pharmaceutical process operations, first-principle models describing the behavior of the processed powders are difficult to derive or computationally very expensive and thus can be considered as “black-box” processes. Kriging is an efficient data-driven methodology that has been shown previously to predict steady-state operation of black-box processes while providing error estimates for each prediction. In this work, the ability of Kriging to capture the dynamic behavior of processes is evaluated and compared to the performance of dynamic Neural Network (NN) models. Design of Experiment (DoE) concepts are employed to collect the necessary dynamic inputoutput information that forms the initial multivariate database of the model. The most important feature of the proposed dynamic Kriging methodology is the ability to improve its predictive efficiency as the database is augmented by additional useful information while the process operates. An error tolerance is used in the adaptive procedure, so that only predictions that satisfy a strict accuracy constraint are added to the database. The performance of the proposed method is compared to that of dynamic NN modeling, through their application to a roller compactor process. Results indicate that dynamic Kriging has a better ability to adapt to continuous transitions in the operating conditions, compared to NN modeling. This on-the-fly approach can have significant applications in real-time optimization and control of systems for which a closed-form model is not available.

1. INTRODUCTION Dynamic systems modeling and simulation problems lacking closed-form model equations are difficult to solve when multivariate inputoutput data are the only information available. Because of the lack of model equations, such systems are characterized as a “black boxes”.1 This problem is often encountered in the case of emergent technologies, where first-principle models have not yet been developed and it contains several challenges, including: (a) What type of model should be used to approximate a black-box process? (b) How can the model capture dynamically the true behavior of the system instead of noise? (c) Can a measure of uncertainty prediction be provided for the time-dependent model output? (d) Given that accuracy of data-driven models increases if the sampling is symmetrically arranged, what is the best sampling strategy for dynamic black-box processes? Oral solid dosage form drug production in the pharmaceutical industry is one characteristic example where first-principle models governing the powders undergoing through a series of processes are unknown or difficult to be resolved, because of the complex behavior of particulate systems.2 In addition, the recent trend of the pharmaceutical industry and regulatory agencies to move toward continuous manufacturing3 signifies the need for efficient modeling techniques that can be used for online dynamic predictions and real-time optimization and control. More specifically, this work aims at demonstrating the need for, and effectiveness of, a methodology that can predict the dynamic evolution of a system for which model equations do not exist, by adapting continuously to the changes in operation of the process and simultaneously providing an error interval of the predictions. It has been shown that Kriging is an efficient black-box method for predicting steady-state performance of pharmaceutical r 2011 American Chemical Society

processes such as powder feeding and mixing.2,4 Kriging was originally applied in geostatistics for the identification of optimum drilling locations for mining. Specifically, Kriging was first developed as an inverse distance weighting method to describe the spatial distribution of mineral deposits.5,6 Accurate Kriging models can be constructed even from dispersed sampling data at lower sampling expense, relative to that required by an experimental design for response surface construction. The Kriging predictor is modeled after a normally distributed random function (Gaussian process), so a prediction variance is also obtained at the test point. As a result, the sampling value is expected to fall within the interval specified by the prediction and corresponding variance. The variance mapping describes prediction uncertainty, which will be high in regions with low number of sampling points. Kriging predictors can be improved by incorporating additional sampling information obtained within these regions, thereby reducing model uncertainty.7 This method has attracted much attention recently, because of its capability of modeling complex functions while also providing error estimates.2,4,8,9 Kriging has been used not only to generate models based on experimental process data, but also when inputoutput information is obtained via computer experiments based on simulations of complex processes.8 Kriging is a probabilistic, nonparametric black-box modeling technique that also has been used in the literature to capture the behavior of dynamic nonlinear systems.7 One interesting feature of Kriging is the possibility to include prior knowledge of the system into the model.10 In this Bayesian modeling framework, a future time step prediction is achieved based on prior knowledge Received: November 15, 2010 Accepted: April 14, 2011 Revised: April 9, 2011 Published: April 14, 2011 6743

dx.doi.org/10.1021/ie102305a | Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research of the system’s behavior and output values sampled at the current time step. Kriging has recently been employed in dynamic systems modeling of black-box functions, mainly for control purposes. In ref 11, the identification of dynamic biosystems is based on iterative one-step-ahead-prediction Gaussian process models, which form a general category in which Kriging belongs.7 In ref 12, a dynamic mapping Kriging algorithm is introduced, where a discrete time state-space formulation is used to represent the evolution of a simple dynamic process. The same group extended the dynamic Kriging method in ref 13 to approximate stochastic functions, specifically to describe the evolution of nanoparticle size distribution over time. In this work, a dynamic Kriging approach is proposed to capture the complete evolution of the system behavior while also providing a time-dependent mapping of the error of prediction. Using results from experiments—performed to capture the dynamic response of the studied processes under different operating conditions—the output at each time step is modeled as a function of the previous time step output and the input values at the current time step. The proposed approach is used to predict the dynamic behavior of a common pharmaceutical process—roller compaction—for which a closed-form model has been developed and validated. The choice of this case study allows the comparison of the results of the proposed data-driven approach with the actual simulated dynamic response of the process. A series of simulations are performed under different operating conditions of the process, based on a Design of Experiments (DoE) template, in order to characterize the dynamic behavior of the process and provide the necessary input for the dynamic Kriging predictor. The results of this approach are also compared to results obtained from modeling the same conditions with dynamic Neural Networks (NNs). NNs have been widely used in the pharmaceutical industry for modeling many black-box processes,14 because of their efficiency in modeling highly nonlinear behaviors. A steady-state NN can be transformed to a dynamic system by simply adding time as an extra dimension to the input of the model.15 The paper is organized as follows. Section 2 contains a thorough analysis of the history, theory, and programming steps of Kriging modeling. Section 3 describes the extension of Kriging to a dynamic and adaptive predictor model. A concise description of NN methodology is presented in section 4, and the results of the proposed approach are exemplified through a real case study of a roller compactor in section 5. The last section of the paper contains the conclusions and discussion of the results of this work.

2. KRIGING INTERPOLATION Using Kriging interpolation, the prediction is expressed as a weighted sum of the observed function values at sampling points that fall within a specific range around the point that is predicted.1,8 To define the basic structure of the Kriging model, let us first define the necessary vectors and spaces. Ntotal denotes the total number of sampling points that must be collected. An input vector xi contains the values of all the independent variables of the process at a specific point. The number of independent variables defines the dimensionality of the problem (n), such that xi = [x1,x2,...xn] i = 1, ..., Ntotal. Thus, an input matrix X contains all the observations of the system under different conditions X ∈ R Ntotal n . Similarly, the output vector yi = [y1,y2, ..., yr] contains the values of all measured outputs (r) and the output

ARTICLE

matrix Y ∈ R Ntotal r is the collection of all output vectors at all sampled locations. Given a new unsampled point xk (test point), each output is predicted as a weighted sum of a subset of the total number of samples N ⊂ Ntotal: yk, j ¼

N

∑ wi yi i¼1

j ¼ 1, :::, r

ð1Þ

The most important step of the Kriging algorithm is the determination of the set of weights assigned to each group of N clustered points near xk. This is done using the fitted variogram model and is subject to the unbiased constraint forcing the sum of weights to be equal to unity. Kriging is classified as an inverse distance weighting method, since the weights are a decreasing function of their Euclidean distance from point xk. It is important to understand the basic principles behind the calculation of each Kriging prediction and its variance. For the calculation of a Kriging predictor, function values for sampled points xi that are near xk will be more influential in determining the Kriging predictors yk. This is illustrated through a simple example in Figure 1a. If the number of nearby sampling points is high, then

Figure 1. Kriging weights assignment of (a) a test point xk and (b) a test point xk with higher uncertainty. 6744

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

complementary function of the semivariance, since the relationship between them is given by the following equation: CovðhÞ ¼ σ max 2  γðhÞ

ð5Þ

where σmax is the maximum variance of the variogram function (sill). The covariance function is used to calculate the Kriging weights (wk) by solving the following system: 3 2 3 2 Covðd1, 1 Þ 3 3 3 Covðd1, N Þ 1 w1 7 6 7 6 6 7 6 3 l l7 l 33 6 76 l 7 6 7 6 Covðd Þ 7 N, 1 4 3 3 3 CovðdN , N Þ 1 5 4 wN 5 λ 1 1 0 333 2 3 Covðd1, k Þ 6 7 6 7 l 6 7 ¼6 ð6Þ 7 4 CovðdN , k Þ 5 1 2

Figure 2. (a) Plot of squared function differences, with respect to sampling pair distance. (b) Generation of the variogram model, based on a least-squares fit to smoothed data, using variance scatterpoints. (c) Generation of covariance function by reflecting the variogram model between 0 and σmax2.

the confidence of prediction will be higher, resulting to a low variance interval for the output (Figure 1b). Therefore, the first step of the Kriging methodology is determination of the variogram coefficients from an experimental sampling set consisting of Ntotal sampling points. The variogram is a quantitative descriptive statistic that graphically characterizes dataset roughness and the information it provides complements that from histograms and other common descriptive statistics. The variogram γ(h) (eq 4) has an x-axis of Euclidean distance between pairs of points (eq 2) and a y-axis of squared differences of output evaluations (eq 3). h ¼ di, j ¼ jj x i  x j jj fi, j ¼ ðyi  yj Þ2 γðhÞ ¼

 1  var yi  yj 2

ð2Þ ð3Þ

where di,j is the distance between sampling point xi and sampling point xj, and di,k is the distance between sampling point xi and test point xk. Similarly, Cov(di,j) and Cov(di,k) represent the modeled covariances between sampled function data whose corresponding input vectors are a distance di,j or di,k apart, respectively. In eq 6, λ are the Lagrangian multipliers of the optimization problem solved, associated with the unbiased constraint of the sum of Kriging weights to be equal to 1. For each test point xk, a variance is also obtained as follows:

ð4Þ

where var( 3 ) is the variance. Note, at this point, that often in the geostatistical literature, the variogram is represented by the function 2γ(h), while γ(h) is referred to as semivariance. The prefix “semi” originates from the fact that points are compared in pairs and one should account for the variance of each point once.16 In the current study, γ(h) will be referred to as a variogram. If two sampling points are close to each other, in terms of distance h (eq 2), then it is expected that their values will be similar and the difference (yi  yj) will be small. This can be seen through the typical form of a variogram model (Figure 2b). The variogram coefficients are determined by fitting an optimum variogram model to the data. Data smoothing is often used to improve the fit by replacing clustered scatterpoints falling within an interval with average values. Variogram model coefficients are then obtained from regression of the variance scatterpoints to an exponential elementary type of model. This model is chosen in order to capture the trend of the variogram characteristic—to be a decreasing function of sampling pair distance. In previous work,1,2,4,8,17 the fit of a series of other elementary-type models were also evaluated and the one that captures the trend of the variogram best (minimum least-squares error) is selected. In this work, the purpose is to use a fast model that can be used for an on-the-fly prediction of the current time step output. Thus, this feature is eliminated to reduce computational cost and the variogram is assumed to have a predefined exponential functional form. From this fitted model, the covariance function is obtained. Based on the covariance definition, when two samples are close to each other, they have a high covariance (correlation) and, as the distance increases, the covariance approaches zero (see Figure 2c). Specifically, the covariance function is a

σk 2 ¼ σmax 2 

N

∑ wi Covðdi, k Þ  λ i¼1

ð7Þ

where Cov(di,k) corresponds to the right-hand side of eq 6. The steps of the Kriging algorithm that is employed are described as follows: (1) Obtain inputoutput data XY for Ntotal sampling points. (2) For any combination of a pair of sampling points xixj, calculate the Euclidian distance based on eq 2. Overall, there will be [Ntotal(Ntotal  1)]/2 sampling pairs. (3) For all sampling pairs, calculate the squared difference between the corresponding output values, based on eq 3. (4) The next step is to plot the corresponding squared differences, with respect to their distances to form a scatterplot, upon which an exponential-type variogram function γ(h) is fitted. The variogram model is necessary for the computation of the Kriging weights. However, because of noise, it is often not possible to identify the trend of the scattered squared differences. In that case, data smoothing is first applied to the scatterplot before the variogram fitting. (5) The peak value of the chosen variogram model is identified as σmax2, and it is used to calculate the covariance function, which is based on eq 5. Given a new test point, first, the number of sampled points that will affect the predicted value of the test point is chosen. During this step of the algorithm, only points that lie within a small enough maximum radius are included in the subset of samples that influence the prediction. The Kriging prediction of the test point is expressed as a weighted sum of the Kriging weights multiplied by their 6745

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research corresponding sampled point output values (eq 1). The Kriging weights depend on the covariances between sampling point pairs as well as the covariances between sampling points and the test point, and these are given by the solution of the system of equations given by eq 6. 6 The final step is the calculation of the variance estimate of the test point that is given by eq 7.

3. DYNAMIC KRIGING The purpose of the present work is to introduce a fast Kriging algorithm that can predict the dynamic evolution of a system based solely on experimental data and to allow the flexibility of model improvement by adding information as runtime data are collected. The first step of the proposed procedure is the use of Design of Experiment ideas in order to perform a set of dynamic response characterization experiments/simulations of the blackbox process of interest. The goal of this step is to collect the maximum amount of useful information about the type of dynamic response that a system exhibits, when a step change on one of its input variables is imposed. Experiments must be carried out within a representative range of operating conditions, according to the normal ranges under which the process is expected to operate. This concept is illustrated through a simple example of a process with three input variables: x = [x1,x2,x3]. In order to fully characterize the effect in the output of a step change of variable x3, different experiments must be carried out at different levels of variables x1 and x2. In Figure 3, a 2  2 factorial design with a center point is shown for the aforementioned example. Once the data are collected, a multivariate dataset connecting the input and output variables is formed which serves as the initial input of the dynamic Kriging predictor. This dataset forms the initial information that Kriging uses to interpolate and predict future states of the process. In order to implement a dynamic Kriging methodology, the initial dataset is formed, containing all the information from the dynamic DoE experiments. Subsequently, the Kriging algorithm is iteratively called at each time step. Based on the current time step values of the input (operating) variables, Kriging makes a prediction of the output and also provides an error estimate

Figure 3. Design of Experiment for dynamic response data: 2  2 factorial design with a center point for two variables at two levels.

ARTICLE

for this prediction. The actual form of the dynamic Kriging model can be represented as y k þ 1 ðxk þ 1 Þ ¼ f ðxk þ 1 , y k Þ

ð8Þ

Specifically, at each iteration, the algorithm uses all the available information and, using the methodology described in the previous section, identifies the most similar nearby sampled points in order to obtain the current output. Equation 8 denotes that the prediction of a point at a future time point k þ 1 is a function of the current value of the input variables and the form of the Kriging model of the previous time step. The error estimate calculated at each iteration reflects the certainty of this prediction. If the uncertainty is very high, this is an indication that this region has not been efficiently covered by the experimental points and, therefore, the predictions are not very reliable. The key feature of this adaptive algorithm is that every new predicted point is added to the initial dataset as long as its variance estimate does not exceed a certain threshold value. The predictions that have a higher variance estimate than a given threshold are not considered reliable and should not contaminate the dataset. As this procedure advances, the dataset is augmented by additional information that affects all future predictions and improves the predictive ability of the model. Updating the database takes advantage of the Kriging property that provides an error estimate for each prediction. By selecting a maximum allowed variance of prediction as a criterion for adding the surrogate prediction to the database, only reliable predictions are added to the model update. The steps of the algorithm are shown in Figure 4.

4. DYNAMIC NEURAL NETWORKS Inspired by biological nervous systems, neural networks (NN) are composed of many nonlinear and closely interconnected processing elements (neurons), which are arranged in groups called layers.18 The basic NN structure usually consists of three layers: the input layer, where the data are introduced to the network; the hidden layer or layers, where data are processed; and the output layer, where the results of given input are produced (Figure 5). NNs have been trained to model complex problems in many fields of science and engineering, such as pattern recognition, identification, classification, finance, and control, simply because a NN can approximate any practical function.19 Despite their wide range of applications and their flexibility, the design of a NN is case-specific, dependent upon the designer’s experience. The architecture of a NN is designed through a time-consuming iterative trial-and-error approach until the designer identifies the correct number of layers and neurons to best approximate the specific process. Thus, the main goal is to select the appropriate number of neurons and hidden layers to capture the correct trend of the underlying function without overfitting the data. The first step to develop a NN model consists of the selection of the model architecture, which involves choosing the number of neurons and layers. Next, the learning procedure follows, during which the weights and biases are modified to optimally approximate the target output. During this step, the training performance criterion function, which is given by the sum of squared errors between predictions and target values, is minimized. A wide variety of transfer functions can be used to model each neuron, ranging from linear to more-complex radial basis functions. In this work, the commonly used log-sigmoid function 6746

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

Figure 4. Adaptive Kriging Algorithm.

is used, which has the following form: tan sigðnÞ ¼

1 1 þ en

ð9Þ

The NN is trained to map a set of input data by iterative adjustment of the weights. Optimization of the weights is performed by back-propagation of the error during the training

phase. To summarize the procedure, once the architecture is chosen, the NN algorithm changes the weight values to minimize the error between the predicted output and the target values. The main advantage of this approach is shared with all other black-box modeling methods—namely, the fact that the information about the complex nature of the underlying process under consideration is not necessarily explicitly available in mathematical form. 6747

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

Figure 5. Neural network (NN) structure.

Figure 6. Schematic representation of the roller compactor with the assumption of ribbon width = roll gap.

In pharmaceutical engineering, NNs have been used for a plethora of applications14,20 including validation of pharmaceutical processes,21 modeling of powder flow,22 and as decision support systems in the development of pharmaceutical drug formulations,23 to name a few. The ability of NNs to model nonlinearities between dependent and independent variables without requiring a closed-form solution of the system’s model equations makes NNs very attractive for many pharmaceutical processes.14 The requirement of a large sample size and the risk of overfitting due to the large number of parameters involved in the models are the major disadvantages of NNs. In addition, each produced network system is trained to model one particular state of a process, limiting their ability to adapt to transitions in the operation of a continuous process.

5. ROLLER COMPACTOR CASE STUDY In order to illustrate the applicability of the proposed dynamic Kriging approach, the model developed in ref 24 for the simulation of a powder roller compactor was used. Roller compaction is a process that is employed to overcome problems such as poor flowability, segregation, and dust generation of powders with very small particle sizes. Generally, it is observed that dry granulation processes improves flow properties and prevents segregation of ingredients of a powder mixture. The process involves the continuous feeding of powder to the roller compactor by augers while the two rotating rolls compress the powder into a ribbon (see Figure 6). The roll gaps are adjustable,

ARTICLE

resulting in possible variations in the roll gap width, according to the desired operating conditions. The model in ref 24 is based on Johanson’s rolling theory, which predicts the steady-state stress and density profiles of the ribbon during compaction. In order to incorporate the dynamic behavior of the adjustable roll gap, a material balance across the width is added to the system of equations. Simulations of the model can predict the roll gap and ribbon density as a function of feed speed, roll speed, and hydraulic pressure (see Figure 6). Throughout the process, it is assumed that the ribbon thickness is equal to the roll gap. The results of the simulation show that ribbon density is affected mainly by changes in pressure, while roll gap is affected by all three input variables. More specifically, simulation results show that an increase of the hydraulic pressure causes the increase of the ribbon density and the decrease of the roll gap width. On the other hand, an increase in the roll speed causes a positive change in the two outputs, but the change in density is not as significant. This case study was chosen because the dynamic response surfaces generated by the simulation have been validated experimentally. The dynamic simulator used in this work is gPROMS,25 while an electronic version of the dynamic model is available online at https://pharmahub.org/resources/244. The procedure followed for the validation of the dynamic Kriging prediction involves the design of a dynamic response set of experiments, as described in section 4. Different sizes and sequences of step changes are imposed on two of the operating parameters of the process: the hydraulic pressure and roll speed. A detailed description of the performed simulations and the results are described in the following sections. The implementation of the algorithm is performed in MATLAB. In terms of computational cost, the CPU time necessary for this fast Kriging algorithm recalibration for the current study of three independent variables is on the order of a few seconds. 5.1. Effect of Hydraulic Pressure. For the characterization of the effect of changes in the pressure to the two outputs of the model, a 2  2 factorial design with a center point was performed at different levels of the two remaining input variables: roll speed and feed speed. The range of the roll speed is chosen to be 412 rpm, while the feed speed range is 1555 cm/s. After executing the entire set of simulations, the multivariate dataset that consists of the values of all input variables and output variables at each time step from t = 0 to t = 6 min is considered by the dynamic Kriging algorithm as the initial dataset. In Figure 7, the dynamic response of the ribbon density of the roller compactor at a roll speed equal to 6 rpm and a feed speed equal to 20 cm/s, when a 1 MPa step change is imposed to the pressure at t = 1 min, is shown. Similarly, Figure 8 represents the effect of the same step change to the roll gap. The results show that dynamic Kriging predicts the simulated response surface with great accuracy, and this is verified by the Kriging error mapping in the same figures. The input used for the NN training was identical to the initial dynamic Kriging input. The MATLAB Neural Network Toolbox was used to produce the models.26 In order to keep the NN architecture simple, one hidden layer was chosen for all simulations. The disadvantage of this method is, first, the fact that, for each case study, a trial-and-error approach had to be employed to identify the optimum number of neurons—and, in some cases, the bias terms that need to be added—in order to approximate the dynamic response with accuracy. This time-consuming procedure renders NN unsuitable for online prediction of the 6748

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

Figure 7. Dynamic response and Kriging error of ribbon density at a roll speed of 6 rpm, a feed speed of 20 cm/s, and a 1 MPa step change in pressure.

Figure 8. Dynamic response and Kriging error of roll gap at a roll speed of 6 rpm, a feed speed of 20 cm/s, and a 1 MPa step change in pressure.

time-dependent output. However, this method can be proven to be efficient for offline dynamic systems modeling, once the optimum architecture of the system is identified. Next, a more-complex sequence of step changes is imposed on the system to verify the ability of the dynamic Kriging algorithm to adapt to continuously varying conditions of the process. Initially, a 1 MPa positive step change is imposed to the pressure at t = 1 min, while, at t = 3 min, the pressure is returned to its original value. The effects of this sequence to the ribbon density and roll gap are shown in Figures 9 and 10, respectively. It can be observed from Kriging predictions and error values in Figures 69 that there is an increase in the prediction error whenever there is change in the response. Note that, in these

regions, if a predicted point contains a variance of >5%, it is reported as a prediction but it is not added to the database from which dynamic Kriging interpolates. The Kriging error calculated for each prediction provides an interval within which an uncertain prediction might lie. Figures 69 show the results of a NN model with 20 neurons, with regard to the dynamic response of the ribbon density and the roll gap, respectively. The predicted dynamic response surfaces are not as accurate as the corresponding surfaces obtained by dynamic Kriging. In addition, NN methodology does not provide online error estimation—as opposed to Kriging—and, therefore, there is no measure of uncertainty of the dynamic prediction as the process evolves. The main 6749

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

Figure 9. Dynamic response and Kriging error of ribbon density at a roll speed of 6 rpm, a feed speed of 20 cm/s, and a 1 MPa step change in pressure from t = 1 min to t = 3 min.

Figure 10. Dynamic response and Kriging error of roll gap at a roll speed of 6 rpm, a feed speed of 20 cm/s, and a 1 MPa step change in pressure.

limitation of NN modeling is shown in Figure 10, where the method fails to predict the correct trend of the dynamic response of the output variables. The reasons for this limitation is that, as reported in the literature,10 NNs need a large amount of data in order to efficiently train the large number of parameters of the model. The purpose of this case study, however, is to compare the performance of the two methods when a limited set of data is available. In terms of the choice of the number of neurons used in the NN structure, a trial-and-error approach is followed, starting from a minimum number of five neurons and gradually increasing until the prediction error is satisfactory. Although there is proof that any function can be modeled if enough neurons are

used,19 in terms of comparison of the two methods, the optimum NN structure search stops at a maximum of 20 neurons. This set point is chosen because of the fact that, for every new neuron that is added to the network, an additional n þ 1 parameters are added to the model (where n is the number of input variables). Compared to the much smaller number of parameters necessary in adaptive Kriging models (no more than 3), it is fair to conclude that Kriging has a better performance using a smaller number of fitted parameters. 5.2. Effect of Roll Speed. In order to characterize the effect of modifications in the roll speed of the roller compactor to the two outputs of the model, a 2  2 factorial design with a center point 6750

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

Figure 11. Dynamic response and Kriging error of ribbon density at a pressure of 10 MPa, a feed speed of 20 cm/s, and a 2 rpm step change in roll speed at t = 1 min.

Figure 12. Dynamic response and Kriging error of roll gap at a pressure of 10 MPa, a feed speed of 20 cm/s, and a 2 rpm step change in roll speed at t = 1 min.

was performed at different levels of the two remaining input variables: hydraulic pressure and feed speed. The range of the hydraulic pressure is chosen to be 220 MPa, while the feed speed range is 1540 cm/s. After executing the entire set of simulations, the multivariate dataset that consists of values of all input and output variables at each time step from t = 0 to t = 6 min is fed into the dynamic Kriging algorithm as the initial lookup dataset. In Figure 11, the dynamic response of the ribbon density of the roller compactor for a hydraulic pressure equal to 10 MPa and a feed speed equal to 30 cm/s, where a 2 rpm step change is

imposed onto the roll speed at t = 1 min is shown. Similarly, Figure 12 represents the effect of the same step change to the roll gap. The results show that dynamic Kriging predicts the simulated response surface with great accuracy, and this is verified by the Kriging error values in the same figures. In addition, NN models are fitted to predict the dynamic response of the same operating conditions and step changes of the roll speed. The input used for the NN training is the same as the initial dynamic Kriging input. Similar to section 5.1, the MATLAB Neural Network Toolbox was used to produce the 6751

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

Figure 13. Dynamic response and Kriging error of ribbon density at a pressure of 10 MPa, a feed speed of 20 cm/s, and a 2-rpm step change in roll speed at t = 1 min, or a 1-rpm step change at t = 3 min.

Figure 14. Dynamic response and Kriging error of roll gap at pressure = 10 MPa, feed speed = 20 cm/s, and a 2-rpm step change in roll speed at t = 1 min, or a 1-rpm step change at t = 3 min.

models. In order to identify the optimal architecture of the network, a different number of neurons is used, while keeping the number of layers constant and equal to one. Figure 11 illustrates a case where the NN approach fails to predict the real magnitude of the dynamic response of the ribbon density, when the roll speed is modified. It is possible that the predicted response can be further improved by adjusting the number of layers and neurons and their biases. However, the need for this procedure makes the case-specific nature of NN systems even more apparent. Figure 12 is a typical case where a bias term needs to be added to further improve

the prediction; however, for illustration purposes, it is left in this form. The dynamic response of the two outputs of a more-complex sequence of step changes on the roll speed is illustrated in Figures 13 and 14. The sequence consists of an initial 2-rpm increase in the roll speed, followed by a 1-rpm decrease. The NN model can predict the dynamic response of the ribbon density with greater accuracy than in the case of the roll speed. Following the same validation procedure of the dynamic Kriging method, a more-complex step change is simulated to verify the ability of the method to predict the correct response 6752

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research surface. The initial step change of the roll speed is a 2-rpm increase at t = 1 min, followed by a 1-rpm decrease at t = 3 min. Figures 13 and 14 show the Kriging predicted response of the ribbon density and roll gap, compared to the simulated responses accompanied by the corresponding time mapping of the Kriging calculated errors. In both cases, the Kriging error increases during the steep transition curves, but this level of variance is not considered to be significant. This fact is substantiated by the accuracy of the predicted curves. In order to assess the importance of model updating, the same simulations were performed without the augmentation of the initial database as runtime data are collected and the average error of prediction, in certain cases, reached 30%. In addition, for certain step changes, Kriging was not able to capture the correct magnitude of the dynamic response of the output.

6. CONCLUSIONS AND FUTURE WORK The purpose of this work is to bring together aspects from Design of Experiment (DoE) and black-box modeling tools, not only to illustrate, but also to compare their results on the ability to predict a dynamic response of a complex system when the model equations are not available. Carefully designed experimental or simulated results are necessary in order to capture the dynamic response of the process when step changes are performed under different operating conditions. Results of the dynamic Kriging predictor method proposed in this work illustrate that the method can capture the dynamic evolution of a black-box process. The advantages of this approach include computational efficiency, speed, and ability to dynamically adapt to the transitions of the system. These advantages make this method very attractive for use in dynamic systems identification and control as well as real-time optimization. The key aspect of this work is the exploitation of the most important advantage of Kriging—the quantification of prediction error—in order to selectively update the model database and continuously improve the predictive ability of the model. Compared to Neural Network (NN) modeling, the dynamic Kriging approach is shown to have a better predictive ability with the predefined number of input data points. NN systems are also, by nature, less adaptive to internal changes of the operating process, which makes them less attractive for on-the-fly one-stepahead predictions. However, it should be mentioned that NN modeling is indeed a very powerful tool with many advantages and applications. In fact, even though a very simple NN model is used in this study for comparison purposes, it is true that morecomplex NN structures might have better predictive ability for the case studies of this work. The only reason of comparison between the two is to highlight the flexibility of Kriging and the use of a rather small number of parameters, compared to NN. The computational cost of the proposed methodology is a very important aspect for its application in an online setting. However, this will not increase significantly if the dimensionality of the problem increases. Specifically, it is reported by our group in refs 1,8, and 17 that the same algorithm used in a different framework for optimization purposes requires CPU time on the order of seconds for problems with a larger number of independent variables (67). The overall computational cost of the proposed approach, however, will increase the initial stage of the collection of samples for building the database for the surrogate model. As the dimensionality of the problem increases, then the design of computer experiments will include a

ARTICLE

significantly larger number of required simulations. Consequently, if the cost of a single simulation is high and the number of independent variables is large, then the computational burden of obtaining the initial database can be very time-consuming. However, even though the data collection stage will increase, the overall computational cost will lead to a reduced-order Kriging model, which will be a fast approximation of the real model. In fact, computationally expensive problems that are not suitable for an online setting can benefit greatly from this methodology. In our current work, we are studying the effect of different DoE strategies on the quality of the produced surrogate model. As proposed in the literature of surrogate-based optimization using Kriging, space-filling designs such as Latin hypercube sampling require less function calls than factorial designs and are still efficient for high-dimensional problems. In addition, the greatest disadvantage of any surrogate-based model is the strong dependence on the experimental data. Thus, if the experimental region is not sampled correctly, evenly, and sufficiently, the predictive ability of the model will be poor and the overall quality of the model will be affected. Future work includes the implementation of this dynamic Kriging approach to real dynamic experimental data of additional powder processing equipment, such as powder mixing. Reallife experimental data—especially when coming from powder processes—are inherently uncertain and noisy. Thus, the future challenge will be the integration of the current approach with filtering techniques in order to minimize the effects of noise in model prediction.

’ AUTHOR INFORMATION Corresponding Author

*Tel.: 732-445-2971; Fax: 732-445-2581. E-mail: marianth@ soemail.rutgers.edu.

’ ACKNOWLEDGMENT The authors would like to acknowledge the support provided by the ERC (NSF-0504497, NSF-ECC 0540855). ’ REFERENCES (1) Davis, E.; Ierapetritou, M. A Kriging Method for the Solution of Nonlinear Programs with Black-Box Functions. AIChE J. 2007, 53 (8), 20012012. (2) Jia, Z.;Predictive Modeling for Pharmaceutical Processes Using Kriging and Response Surface. J. Pharm. Innovation 2009, 4 (4), 174–186. (3) Plumb, K. Continuous Processing in the Pharmaceutical Industry: Changing the Mind Set. Chem. Eng. Res. Des. 2005, 83 (6), 730–738. (4) Boukouvala, F.; Muzzio, F. J.; Ierapetritou, M. G. Predictive Modeling of Pharmaceutical Processes with Missing and Noisy Data. AIChE J. 2010, DOI: 10.1002.aic.12203. (5) Cressie, N. Statistics for Spatial Data; Wiley Series in Probability and Statistics; WileyInterscience: New York, 1993. (6) Isaaks, E. S. R. Applied Geostatistics; Oxford University Press: New York, 1989. (7) Rasmussen, C. E. Williams, C. K. I. Gaussian Processes for Machine Learning; MIT Press: Boston, 2006. (8) Davis, E.; Ierapetritou, M. A Kriging-Based Approach to MINLP Containing Black-Box Models and Noise. Ind. Eng. Chem. Res. 2008, 47 (16), 6101–6125. (9) Jones, D. R.; Schonlau, M.; Welch, W. J. Efficient Global Optimization of Expensive Black-Box Functions. J. Global Opt. 1998, 13 (4), 455–492. 6753

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754

Industrial & Engineering Chemistry Research

ARTICLE

(10) Lucifredi, A.; Mazzieri, C.; Rossi, M. Application of Multiregressive Linear Models, Dynamic Kriging Models and Neural Network Models to Predictive Maintenance of Hydroelectric Power Systems. Mech. Syst. Signal Process. 2000, 14 (3), 471–494. (11) Azman, K.; Kocijan, J. Application of Gaussian processes for black-box modelling of biosystems. ISA Trans. 2007, 46 (4), 443–457. (12) Hernandes, A.; Martha, G. G. An Exploratory Study of Discrete Time State-Space Models Using Kriging. Presented at the 2008 American Control Conference, 2008. (13) Andres, H.; Gallivan, M. G. Stochastic Dynamic Predictions Using Kriging for Nanoparticle Synthesis. In 10th International Symposium on Process Systems Engineering, 2009. (14) Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22 (5), 717–727. (15) Shaw, A. M.; Doyle, F. J.; Schwaber, J. S. A dynamic neural network approach to nonlinear process modeling. Comput. Chem. Eng. 1997, 21 (4), 371–385. (16) Bachmaier, M.; Backes, M. Variogram or semivariogram? Understanding the variances in a variogram. Precis. Agric. 2008, 9 (3), 173–175. (17) Davis, E.; Ierapetritou, M. A kriging based method for the solution of mixed-integer nonlinear programs containing black-box functions. J. Global Opt. 2009, 43 (2), 191–205. (18) Johann, G.; Jure, Z. Neural Networks Chem. 1993, 503–527. (19) Basheer, I. A.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43 (1), 3–31. (20) Ichikawa, H. Hierarchy neural networks as applied to pharmaceutical problems. Adv. Drug Delivery Rev. 2003, 55 (9), 1119–1147. (21) Behzadi, S. S.;Comparison between two types of Artificial Neural Networks used for validation of pharmaceutical processes. Powder Technol. 2009, 195 (2), 150–157. (22) Kachrimanis, K.; Karamyan, V.; Malamataris, S. Artificial neural networks (ANNs) and modeling of powder flow. Int. J. Pharm. 2003, 250 (1), 13–23. (23) Mendyk, A.; Jachowicz, R. Neural network as a decision support system in the development of pharmaceutical formulation—Focus on solid dispersions. Expert Syst. Appl. 2005, 28 (2), 285–294. (24) Hsu, S.-H., Reklaitis, G.; Venkatasubramanian, V. Modeling and Control of Roller Compaction for Pharmaceutical Manufacturing. Part I: Process Dynamics and Control Framework. J. Pharm. Innovation, in press. (25) gPROMS Advanced User Guide; Process Systems Enterprise: London, U.K., 2003. (26) Demuth, H.; Beale, M.; Hagan, M. Neural Network Toolbox User’s Guide; MathWorks, Inc.: Nattick, MA, 2010.

6754

dx.doi.org/10.1021/ie102305a |Ind. Eng. Chem. Res. 2011, 50, 6743–6754