Ind. Eng. Chem. Res. 2006, 45, 7856-7866

Optimization of Batch Operating Policies. Part I. Handling Multiple Solutions#

Salvador García-Muñoz,†,‡ Theodora Kourti,† John F. MacGregor,*,† Francesca Apruzzese,|,§ and Marc Champagne|,⊥

Department of Chemical Engineering, McMaster University, 1280 Main Street West, Hamilton, ON L8S 4L7, Canada, and Tembec, 33 Kipawa Road, Témiscaming, QC J0Z 3R0, Canada

Jaeckle and MacGregor (AIChE J. 1998, 44, 1105-1118) introduced a data-driven technique to estimate conditions at which a process should operate (i.e., temperature, pressure, and reactant amounts: the recipe) in order to yield a final product with a desired set of quality characteristics. Their proposed technique utilizes empirical latent variable models that are fitted to historical process data from existing process grades. This paper extends the methodology to include estimation of the entire set of time-varying profiles for the manipulated variables for batch processes. The problem is formulated in an optimization framework to include both equality and inequality constraints in the objective function. Since often the solution is not unique, the locus of the multiple solutions (defined as the null space) is studied, and approaches to selecting the best solution for the final variable settings and trajectories are discussed. Finally, a parallel approach based on a derivative-augmented model is suggested that avoids considering null spaces to select the final design. An industrial batch digester from the pulp and paper industry is considered throughout the paper to explain and illustrate the key concepts.

* Corresponding author. Tel.: (905) 525-9140 ext 24951. Fax: (905) 521-1350. E-mail: [email protected]. † McMaster University. ‡ Current address: Pfizer Global Research and Development, 445 Eastern Point Rd, Groton, CT 06340. E-mail: [email protected]. | Tembec. § Current address: Tembec-Skookumchuck Operations, 4501 Farstad Way, Skookumchuck, BC V0B 2E0. ⊥ Current address: Effective Assets, Inc., P.O. Box 84, Stn Main, Sault Ste Marie, ON P6A 5L2, Canada. # The concepts presented here were first presented at the Ninth Scandinavian Chemometrics Conference, SSC9, in Reykjavik, Iceland, August 21-25, 2005, and at the Annual Meeting of the American Institute of Chemical Engineers in Cincinnati, Ohio, Oct 30-Nov 4, 2005.

1. Introduction

Batch processes play an important role in today's industry; specialty chemicals and most of the pharmaceutical industry use batch systems. One of the reasons to use batch processes is their flexibility to produce a wide range of products; this capability is due to the large range of process conditions in which a batch unit can be operated. Typically, any batch system will have a defined set of operating conditions for each one of the products. For a continuous system, these operating conditions correspond to the desired values of the process variables at steady state, and it is left to the control system to maintain these operating conditions in the presence of disturbances and eventually to change them when transitioning from one product grade to another. Because batch systems are dynamic, it is not enough to define the starting point, or initial conditions, for the process variables; the path to follow (e.g., the heating profile vs time) from the beginning to the end of the batch must also be defined. All of these conditions have a great effect on the physical and chemical properties of the final product.

Determining these operating conditions for batch systems is not an easy task, because the process variables are highly autocorrelated and cross-correlated2 and because of additional constraints such as equipment capacity. The estimated operating conditions should achieve a certain set of

specifications in the final product (e.g., molecular weight, tensile strength, and particle size distribution) while consuming the minimal resources for its operation (e.g., catalyst, raw material, and time) and respecting constraints in all operating procedures.

The optimal selection of batch process operating conditions has been widely studied. The different methodologies used can be grouped and classified as shown in Figure 1: fundamental model-based methodologies, experimental response surface methods, and empirical model-based methods.

Fundamental model-based approaches can be split into off-line, one-time optimization approaches and recursive optimization based on batch-to-batch iterations using fundamental models and on-line data. The fundamental model-based optimization for the design of batch process operation involves the use of a highly nonlinear differential algebraic equation system, and several approaches have been proposed to solve the optimization of these dynamic systems. This deterministic optimization will give the trajectories for the manipulated variables3-5 under some assumed nominal conditions. The problem is that the real conditions will change over time, and so the optimal trajectories must be updated. Therefore, a second approach within the fundamental model-based solutions is to use some simplifications of the full deterministic models and recursive optimization to "move" the operation in real time, updating the solution each time a new batch is available. Examples of this approach are as follows: (i) the use of necessary conditions for optimality (NCO) to define a set of solution "arcs" which are determined from fundamental knowledge, with the solutions then continuously updated in a batch-to-batch mode;6,7 (ii) the use of tendency models, where the kinetics of a reaction system are simplified, the models are updated from batch to batch,8,9 and the optimization is re-solved; and (iii) the use of nonlinear controllers and nonlinear observers10 to move the operation to the desired conditions. The reader is referred to the review by Srinivasan et al.,11 which covers many of the deterministic solutions available and offers a comparison among them.

Experimental response surface methods (RSM) based on the design of experiments have also been used to optimize batch conditions. This approach rarely involves the trajectory of a



Figure 1. Classification of the different approaches to batch operation design.

variable and is restricted to the choice of recipe and process conditions that are to be kept constant throughout the batch.12,13 This approach is extensively used in the development of pharmaceutical processes.

Empirical model-based approaches can also be split into off-line approaches and on-line recursive optimization approaches using historical databases. Examples of the latter are approaches based on batch-to-batch and iterative learning control (ILC)14,15 that iteratively adjust the conditions from batch to batch, eventually converging to a solution. An example of the former, off-line approach is the use of latent variable regression (LVR) models16 developed from historical data on existing product grades to estimate the set of operating conditions necessary to yield desired characteristics for a new product.1,17,18

This paper is concerned with the empirical off-line latent variable model approaches. Its contributions are as follows: (i) To extend the methods to the optimal selection of the full trajectories for batch processes. (ii) To provide more insight into the case where multiple solutions exist (a null space) and into methods of restricting this space or selecting a desirable solution from the space. (iii) To allow for the incorporation of constraints through posing the solution in an optimization framework. (iv) To introduce an alternative methodology that utilizes a multiway projection to latent structures (MPLS) model augmented with the derivatives of the process variables with respect to their evolution index. Since inverting the augmented model gives a solution that must match both the variable trajectories and the derivatives of the trajectories, there is generally a unique solution and no need to consider the null space. (v) To illustrate the methods through application to an industrial process.

The paper is structured as follows. Section 2 summarizes the model inversion approach suggested by Jaeckle and MacGregor.1 Section 3 describes the data structure of the industrial example that will be used to illustrate the approaches. The problem is defined as an optimization problem with constraints in Section 4, and results from this approach are illustrated with the industrial data in the same section. An augmented derivative approach is outlined in Section 5 and illustrated on the industrial digester application. Concluding remarks are given in Section 6.

1.1. Notation. This paper follows the same notation as in past work19-21 on multivariate modeling of batch processes. A data set from a batch process consists of I batches, and during each run there are J variables that are sampled K times throughout the batch. After the unfolding of the batch data set as proposed by Nomikos and MacGregor,19 the two-way matrix of variable trajectories (X) consists of I rows and JK columns. At the beginning of the run, there might be a set of L initial conditions, which are recorded for each run, and at the end of the run, the quality of the final product is characterized by a set of M product properties. The initial conditions form an I × L matrix referred to as Z, and the set of final product quality

Figure 2. Unfolded data structure of a batch process.

characteristics form an I × M matrix referred to as Y. The unfolded data structure is illustrated in Figure 2. For any multivariate model in this work, A refers to the number of significant latent variable components. These indexes in lower case refer to one specific element of the data: i refers to a specific batch, j to a specific process variable, l to a specific initial condition, m to a specific final property, and a to a specific component in a latent variable model.

2. Design of Operating Conditions by LVR Model Inversion

2.1. LV Models. The advantage of LVR batch models is that not only do they describe the relationship between Z, X, and Y, but they also model the multivariate temporal and contemporaneous correlations in the variable trajectories (X) and the correlations among the variables in the initial conditions (Z) and final responses (Y). A latent variable model represents the matrices in Figure 2 in terms of a much smaller number of latent variable scores T (I × A):

X = TP^T + E_X                (1)

Z = TR^T + E_Z                (2)

Y = TQ^T + E_Y                (3)

T = [X  Z] [ W*_X ]
           [ W*_Z ]                (4)

where the matrices P, R, and Q are loading matrices relating the data matrices (X, Z, and Y) to the latent variable scores in T, and T is estimated as a linear combination of [X Z] with the weight matrices (W*_X, W*_Z). The loading and weight matrices are estimated using a latent variable regression method such as PLS.

2.2. LV Model Inversion Approach to Product Design. In the approach to product design proposed by Jaeckle and MacGregor,1 it is assumed that there exists a database (Figure 2) on past steady-state operating conditions for continuous processes or on past batches (Z, X) corresponding to several grades of products with different final product properties (Y). A multiblock PLS model (eqs 1-4) is then built and used to design the operating conditions necessary to yield products with specified properties y_des.
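As a concrete illustration of eqs 1-4, the sketch below (not part of the original paper) builds the unfolded arrays of Figure 2 from synthetic data and fits a two-component PLS model to the concatenated [Z X] block with scikit-learn. It is a simplified stand-in for the multiblock PLS algorithm used by the authors; all dimensions, array names, and the random data are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Hypothetical dimensions: I batches, J variables sampled K times,
# L initial conditions, M final quality properties, A latent variables.
I, J, K, L, M, A = 6, 1, 200, 1, 2, 2

rng = np.random.default_rng(0)
trajectories = rng.normal(size=(I, J, K))   # batch-wise variable trajectories
Z = rng.normal(size=(I, L))                 # initial conditions / recipe
Y = rng.normal(size=(I, M))                 # final quality properties

# Batch-wise unfolding (Figure 2): each batch becomes one row of J*K columns.
X = trajectories.reshape(I, J * K)

# Mean-center and scale each column before concatenating the Z and X blocks
# (a simple stand-in for the multiblock scaling discussed later in the paper).
def autoscale(M_):
    mean, std = M_.mean(axis=0), M_.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (M_ - mean) / std

ZX = np.hstack([autoscale(Z), autoscale(X)])
Ys = autoscale(Y)

# PLS with A latent variables: T = [Z X] W*, [Z X] = T P' + E, Y = T Q' + F
pls = PLSRegression(n_components=A, scale=False).fit(ZX, Ys)
T = pls.x_scores_          # latent variable scores (I x A)
P = pls.x_loadings_        # loadings for the [Z X] block
Q = pls.y_loadings_        # loadings for Y
Wstar = pls.x_rotations_   # weights such that T = ZX @ Wstar

print(T.shape, P.shape, Q.shape, Wstar.shape)
```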


Figure 3. Kappa and viscosity for a product family of pulp.

Their approach can be summarized in the following three steps.

2.2.a. Determine the Rank of the Y-Space and Check That the Desired Set of Properties (y_des) Falls in the Space of the PLS Model. Consider the data on past pulp grades (A-F) shown in Figure 3 (solid dots) that have been manufactured in an industrial pulp digester. The two y properties, the kappa number and the viscosity, are clearly highly correlated, and the effective rank of the Y data is close to one, implying that only one property can be selected independently. Hence, any PLS model built on these data could not be used to design conditions for a desired new grade 2 shown in Figure 3, but would be well-suited to do this for the new grade 1. A simple test for the effective rank of Y is to perform a PCA on Y and determine (e.g., by element-wise cross-validation) the number of important principal components (latent variables). We define this as the effective statistical rank of Y (R_y). Any model will be valid only in the lower-dimensional plane defined by these latent variables. A simple test to check that the new y_des is a condition for which the model is valid is to project it onto the PCA model of Y and test that its squared prediction error (SPE),

SPE_new = (y_des − ŷ_PCA)^T (y_des − ŷ_PCA)

is not significantly larger than, say, a 99% limit based on the data used to build the model.

2.2.b. Use the Inverse of the Model to Compute the Scores That Should Yield y_des. From the PLS model for Y (eq 3), the model for the design conditions is given by

ŷ_des = Qτ_new                (5)

where τ_new is the vector of scores [t_{1,new}, t_{2,new}, ..., t_{A,new}] corresponding to y_des. In the absence of constraints, the vector τ_new can be estimated using the pseudo-inverse of eq 5:

τ_new = (Q^T Q)^{-1} Q^T y_des                (6)

2.2.c. Estimate the New Process Operating Conditions x_new That Correspond to τ_new. A major advantage of latent variable models is that they provide a model not only for the Y space but also for the X and Z spaces (eqs 1 and 2). Given τ_new, these models can be used to estimate the required initial batch conditions (i.e., recipe) and operating trajectories as

ẑ_new^T = τ_new^T R^T                (7)

x̂_new^T = τ_new^T P^T                (8)

Use of these models ensures that the new operating conditions will have the same covariance structure as the ones already experienced at the plant and will lie on the LV model plane (eq 4) describing past operating conditions. The estimated x̂_new is then rescaled, and the mean is added back (the inverse of the preprocessing of the data).
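A minimal sketch of the three-step inversion (eqs 5-8) follows, with hypothetical loading matrices standing in for those of a fitted model; the SPE check of step 2.2.a appears in simplified form, and rescaling back to engineering units is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
A, M, L, JK = 2, 2, 1, 200           # latent variables, y's, z's, unfolded x's

# Loading matrices as they would come from a fitted LV model (eqs 1-3);
# random placeholders here so the sketch runs on its own.
Q = rng.normal(size=(M, A))          # Y loadings
R = rng.normal(size=(L, A))          # Z loadings
P = rng.normal(size=(JK, A))         # X loadings

y_des = np.array([0.5, -0.2])        # hypothetical desired quality (scaled units)

# Step 2.2.b (eq 6): tau_new = (Q'Q)^-1 Q' y_des, i.e., least squares on eq 5.
tau_new, *_ = np.linalg.lstsq(Q, y_des, rcond=None)

# Step 2.2.c (eqs 7 and 8): reconstruct recipe and trajectory from the scores.
z_hat = R @ tau_new                  # estimated initial conditions
x_hat = P @ tau_new                  # estimated (scaled) trajectory

# Step 2.2.a in simplified form: squared prediction error of y_des against its
# reconstruction from the scores; a value well above the training SPE limit
# would flag y_des as outside the space the model can design for.
spe_new = float((y_des - Q @ tau_new) @ (y_des - Q @ tau_new))
print(tau_new, spe_new)
```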

Table 1. Grades of Pulp Produced in the Pilot Digester

grade    scaled kappa no.    scaled viscosity
A        14.24               639.93
B        18.72               1706.9
C        25.71               2372.37
D        29.76               2796.3
E        36.4                3347.63
F        47                  4175.21

In the framework proposed by Jaeckle and MacGregor,1 steps b and c can be combined by expressing the new score vector as a function of the new process conditions. The solution (x̂_new) given by the method is unique except for the scenario where the rank of Y is smaller than the number of latent dimensions in [X Z]. The extra dimensions in [X Z] (which have no impact on Y) define a null space. This null space represents an operational region (multiple solutions to the problem) within which x̂_new can "move" and still yield the same ŷ_des. The final operating conditions along this null space can be defined by specifying additional criteria.

2.3. Extensions to the Model Inversion Approach. To introduce constraints in the desired quality, Yacoub and MacGregor18 suggest using an optimization approach to inverting the model (step b) and imposing constraints such as one on Hotelling's T² statistic for the new set of scores:

min_{τ_new} (y_des − Qτ_new)^T G_1 (y_des − Qτ_new)                (9a)

s.t.   T² = Σ_{a=1}^{A} (τ_{new,a}² / s_a²) ≤ const                (9b)

In eq 9, s_a² represents the variance of each of the columns of T in the PLS model used, const represents the magnitude of the constraint (e.g., the 99% confidence limit of Hotelling's T² for the training data set), and G_1 is a weighting matrix; other constraints can be added to this formulation. This model inversion approach finds the solution closest to the existing trajectories (in terms of the distance T²) and has been successfully applied to industrial problems.18 In the absence of the Hotelling's T² constraint, the result obtained by minimizing eq 9 is identical to the direct model inversion.

3. Industrial Example: Description of Data Structure

A relatively simple process is chosen as the illustration example: a pilot-scale batch pulp digester. The equipment is under tight control, and disturbances are minimal (pilot-plant environment). The only process variable considered is the temperature trajectory in the reactor (the initial acid content is not considered in the design exercise, since it is difficult to manipulate this variable in the process and there were no measurements available for it). The final product is characterized by two properties: the kappa number and the viscosity. To reduce common-cause variation in the data, each observation in the data set is constructed by averaging three repeats per grade. In general, the higher the temperature and the longer the cook time (batch length), the lower the viscosity and the kappa number. The data set was provided by Tembec, Inc. (Canada), and it consists of six grades of pulp; the final properties of each of the grades are given in Table 1 and plotted in Figure 3, while the temperature profiles used to achieve each of these grades in the pilot reactor are shown in Figure 4.


Figure 4. Scaled temperature profiles for different grades of pulp (average of three runs each).

The quantities in Table 1 and Figures 3 and 4 have been scaled for confidentiality reasons. Because this data set is going to be fitted with a multiway PLS model, it is necessary to perform variable alignment22,23 on the temperature profiles, because each batch run has a different length. To synchronize this data set, a simple resampling is done to achieve 200 samples for all batches (interpolating linearly when needed), because the profiles are not monotonically increasing or decreasing. Other ways of alignment are possible, for example by segmenting the batch run into different stages;23 however, for illustration purposes, this simple linear alignment proved adequate. The temperature profiles after the alignment (Figure 5) are contained in the matrix X (6 × 200), the total time for each batch is stored in a separate vector referred to as z (a 6 × 1 column vector), and the final properties for each grade are collected in Y (a 6 × 2 matrix). In general, the trajectories of a batch system will be referred to as X, the initial conditions as Z, and the set of properties of the final product as Y. These three matrices z, X, and Y are used to illustrate the design method developed in the next sections.

In each of the "design exercises" throughout the next sections, one grade at a time is left out from the known data set, and the final properties of the left-out grade are specified as the desired characteristics (or desired quality). The design is computed, and the estimated temperature profile is compared with the left-out temperature profile, which represents at least one known solution to the design problem. Because it might be possible to achieve the same quality in more than one way (multiple solutions to the problem), this last comparison is only a suboptimal way to benchmark the estimates. The best way to validate any estimate would be to run the process with the estimated conditions and then measure the properties of the final product; however, this option was not available in this study.

Figure 5. Aligned temperature profiles for different grades of pulp.
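The simple alignment described above can be sketched as follows; the profile lengths and variable names are hypothetical, and np.interp performs the linear resampling to 200 points.

```python
import numpy as np

def align_profile(profile, n_points=200):
    """Resample a 1-D trajectory to n_points samples by linear interpolation."""
    profile = np.asarray(profile, dtype=float)
    old_grid = np.linspace(0.0, 1.0, num=profile.size)
    new_grid = np.linspace(0.0, 1.0, num=n_points)
    return np.interp(new_grid, old_grid, profile)

# Hypothetical batches with different cook times (different numbers of samples).
raw_profiles = [np.linspace(20, 160, n) for n in (180, 230, 205)]
batch_times = np.array([float(p.size) for p in raw_profiles])    # stored in z

X = np.vstack([align_profile(p) for p in raw_profiles])          # (batches x 200)
print(X.shape, batch_times)
```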

As a reference for future sections, Figure 6 presents a summary of a multiblock PLS (MBPLS) model24 relating the operating conditions z, X, and Y. The model uses two components, and the percentages of variance captured per component are 97.82% and 0.58% for z, 73.52% and 17.34% for X, and 85.43% and 12.61% for Y (total R² of 98% for z, 91% for X, and 98% for Y). The residual SPE plots for each of these matrices are shown in Figure 6 along with the PLS scores. The block scaling for z and X is set equal; that is, the total time required for a batch is given the same weight as the temperature profile data.

4. Batch Trajectory Estimation using Optimization with Constraints

In this section, the complete time profiles for each of the process variables are considered, and the model inversion solution is presented with a thorough analysis of the null space containing the multiple solutions to the problem. The process described in Section 3 is used later in this section as an illustrative industrial example. The general methodology to solve the design problem is now segmented into two steps: (i) determine the point or locus of points in the latent variable space, represented by a score vector (τ_new), that corresponds to the desired set of properties (y_des) and (ii) use this score vector to reconstruct the operating conditions using the correlation structure captured by the LV model of the z and X spaces. This corresponds to steps b and c in the existing model inversion approach, as outlined in Section 2.2.

4.1. Solving for τ_new That Will Yield y_des. Jaeckle and MacGregor1 proposed the direct unconstrained inversion of the latent variable model shown in eq 6, once Y is reduced to full rank (τ_new = (Q^T Q)^{-1} Q^T y_des). However, in some circumstances, the new desired quality may not be entirely defined by equality constraints (where all values of y_des must take a specified value) but may include inequality constraints on some properties in y_des. In such cases, the corresponding desired score vector is not easily found by exploration or direct inversion. To find the desired score vector in such a scenario, the following optimization is proposed:

min_{τ_new} { (y_des − Qτ_new)^T G_1 (y_des − Qτ_new) + F Σ_{a=1}^{A} (τ_{new,a}² / s_a²) }                (10)

s.t.   B ŷ_des = BQτ_new < b

The elements of the diagonal matrix G_1 may be zero for those elements of y_des which are not subject to equality constraints. The elements of y_des having inequality constraints will have a corresponding weight in the B matrix and a constraint value given in b. Also in this equation, Hotelling's T² statistic has been included as a soft constraint with a given weight F (the last term in eq 10). Minimizing this soft constraint is particularly useful for selecting a reasonable solution that is as close as possible to past plant operation when there are multiple solutions to the problem (i.e., a null space in X). This objective function gives the same solution as the direct model inversion1 in the unconstrained case. Imposing a hard constraint on Hotelling's T² statistic will anchor the solution at the


Figure 6. Score plot (t1-t2) and squared prediction errors (SPE) for a MBPLS model24 with Z, X, and Y.

intersection of the null space and the given constraint; a soft constraint, on the other hand, will enable the solution to be at the minimum possible Mahalanobis distance from the origin of the plane (see Figure 9), given a balanced adjustment of the weights. G_1 is usually a diagonal matrix, and its diagonal elements determine how much weight is given to meeting the equality constraint on each quality variable in y_des in the solution. This weighting matrix can impose more weight on those properties that are of more interest to the customer. In this example, since there was no preference for either value in y_des, the approach taken was to assign the weighting to be proportional to the fraction of explained variance of each response in the PLS model (eq 11).

G_{1,m} = 1 − (RSS_{Ym} / TSS_{Ym}) = R_m²                (11)

where R_m² is the fraction of the sum of squares of the property y_m explained by the model, TSS_{Ym} is the total sum of squares per variable (Y is mean-centered), and RSS_{Ym} is the residual sum of squares per variable, (Y − TQ^T)^T(Y − TQ^T). This approach will down-weight a variable that is poorly predicted by the PLS model. This is important to consider in eq 10, since not all columns of Y have the same predictability.

4.2. Reconstructing the Operating Conditions x̂_new from τ_new. Once the new score vector (τ_new) is calculated (eq 9 or 10), the complete process variable trajectories (x̂_new) and the new required batch time (ẑ_new) are estimated from the MBPLS model as in eqs 7 and 8. If the effective rank of Y is equal to the number of latent variables in X, the estimate of τ_new is unique.1 Under these circumstances, there is only one point in the score space that corresponds to the desired y_des and the sought [ẑ_new x̂_new]. However, as is often the case, the effective rank of X is higher than the effective rank of Y, and then a null space exists. The

Figure 7. Null space (gray straight line) and minimum Euclidean distance (●) and minimum Mahalanobis distance (◆) solutions along the null space.

dimension of the null space in this case is the difference in the effective ranks.

4.2.1. Understanding the Concept of the Null Space. When a null space exists (infinitely many solutions), the solution given by the pseudo-inverse of Q (τ_new = (Q^T Q)^{-1} Q^T y_des) is the one with minimal Euclidean norm.1 This solution is not the best possible solution, since Euclidean distances in the score space are not meaningful if the scores have different variances. A minimal Hotelling's T², or Mahalanobis distance, would be better. To find this point along the null space, eq 10 is used instead of eq 9 (using a soft constraint instead of a hard constraint on Hotelling's T²). This formulation is easy to solve, since the Mahalanobis distance is a quadratic function as well. Figure 7 illustrates a null space in the case where X is two-dimensional, while the Y space is one-dimensional. In this case, the null space is a one-dimensional subspace (a line) in the two-dimensional score plot, and any solution (score) along this line


corresponds to the same ŷ_des from the PLS model. Two solutions along the null space are shown in the figure. The minimum Euclidean distance is found with a circle of minimum radius, which is tangent to the null space, and the minimum Mahalanobis distance is found with an elliptic function (given by the variances of the scores), which is tangent to the null space. (Note that the symbols in this figure are unrelated to those used in Figure 9.) The calculation of the null space is given by Jaeckle and MacGregor.17 The null space can also be found and understood in a more intuitive way, as shown below. A null space will exist when the effective rank of Y (R_y) is smaller than the effective rank of X (R_x). The dimension of the null space is given by R_x − R_y, implying that there are (R_x − R_y) dimensions in which τ_new can move without changing the predicted value of y_des. This can be expressed as

ŷ_des = Q(τ_new + Δτ_new)                (12)

In order for the movement (Δτ_new) not to affect the prediction of y_des, which is the definition of a null space, it is necessary that

QΔτ_new = 0                (13)

The Q matrix has M rows and A columns, so eq 13 is a linear equation system with M equations and A variables,

q_{1,1}Δτ_1 + q_{1,2}Δτ_2 + ··· + q_{1,A}Δτ_A = 0
q_{2,1}Δτ_1 + q_{2,2}Δτ_2 + ··· + q_{2,A}Δτ_A = 0
      ⋮
q_{M,1}Δτ_1 + q_{M,2}Δτ_2 + ··· + q_{M,A}Δτ_A = 0                (14)

If R_y < R_x, the system in eq 14 reduces to a system that has more variables than equations, and there is an infinite number of solutions along the space described by the linear system in eq 14 (which depends on the values of Q). This approach to computing the null space gives the same solution as the one proposed by Jaeckle and MacGregor17 (which performs a singular value decomposition (SVD) of the Q matrix) if the columns of Y are independent. Each of the rows in the system described by eq 14 represents a univariate null space for each of the variables in Y (a subspace in which τ_new can move and still give the same y_{i,des}). Consider the case where R_x − R_y = 1 and there is a null space of dimension 1 (a line in the latent variable space of the PLS model). If there were no measurement or other errors and the rank deficiency were exact, then all the equations in eq 14 would be identical and would define the null space. In practice, they will be slightly different (since only the effective rank is being used here). To define the null space, one could select the row of eq 14 corresponding to the most important y variable or define a pseudo-null space as the singular vector direction corresponding to the smallest singular value (≈ zero) of the Q matrix. The latter pseudo-null space defines the best approximation to the set of equations (eq 14).

4.3. Application to an Industrial Pulp Digester. 4.3.1. Unconstrained Solutions. The product quality data for the existing grades are listed in Table 1 and plotted in Figure 3. The design exercise is first performed (leaving out one grade at a time) with no constraints on Hotelling's T² in eq 10, using both quality variables (kappa and viscosity) (i.e., ignoring the near collinearity evident in Figure 3), and the [ẑ_new x̂_new] solution is computed as Pτ_new (eq 1). The final results are presented in their raw values (reapplying the time scale to the trajectories).

The MBPLS model for each case is built with two latent variables. From Figure 3, it is clear that the design exercise for grades A and F represents an extrapolation of the model, while the design for grades B, C, D, and E requires an interpolation. Figure 8 illustrates the solution for grade C (which is representative of all other intermediate grades) and for grades A and F. In Figure 8, each one of the subplots contains the temperature trajectories and the corresponding scores for the new design grades and the existing grades used in the model; the score plots clearly show how the designs for grades A and F represent an extrapolation in the score space. In Figure 8, the dashed ellipses correspond to the 90% confidence limits around the origin,20 based on the five grades used to build each model. In the case of the design of the intermediate grade C (Figure 8, top), the solution appears to be very accurate. This does not happen when the desired grade scores are outside the normal region of operation (Figure 8, bottom subplots); notice that, for these extrapolation cases (grades A and F), the unique solutions in the score space appear far outside the confidence region; hence, the corresponding trajectories obtained from these scores are very different from those previously experienced in the plant. These trajectory solutions for grades A and F obtained by direct inversion using all y variables and no constraints would clearly be unacceptable, because they are too far away from existing experience. In the following section, it is shown that much better solutions are obtained by recognizing that the Y space is ill-conditioned (of lower effective rank) and performing constrained optimization to obtain the best solution in the resulting null space.

4.3.2. Reduced Rank, Constrained Solution. The solution obtained in the previous section was unique because it was assumed that Y had an effective rank of two (the same rank as X). The unique solutions (for grades A and F) to this ill-conditioned inversion problem are obviously undesirable (after an engineering examination), since the estimated trajectories are very different from those previously experienced in the plant. Nevertheless, these solutions deserve a close analysis, since they can uncover some features and capabilities of the process which were not known before. Solutions which fall closer to existing operating policies (the origin in the score plot) are more desirable, since the shape of the corresponding trajectories will be similar to those in previous runs. To obtain such a solution, one has to recognize the existence of a null space and perform constrained optimization to obtain a desirable solution from the null space. Table 2 lists the percentage variance captured by a PCA model fitted to [z X] and Y separately. From this table, it is clear that [z X] is well-explained with two latent variables and Y is well-explained with one. Therefore, the problem involves a projection from a lower-dimensional space (Y) to a higher-dimensional space ([z X]),1,16 implying that there will exist multiple solutions lying along a one-dimensional null space (a line in the score plots). The third component is insignificant by cross-validation and is not used in the final model. Figure 9 illustrates the reduced-rank solution to the design of the operating trajectory for grade A. The full-rank unconstrained solution previously obtained in Figure 8 for this grade is indicated by the + location in the plot.
These null spaces are shown depending upon how one reduces the rank of the Y space. If one simply eliminates one of the y variables and defines the null space as the constrained eq 14, corresponding to the retained response (Kappa or viscosity), one obtains the dotted and dashed null space lines shown in Figure 9. Any score values along these lines should provide the same predicted Kappa or viscosity,


Figure 8. Score plots (O, model; crossed square, τnew) and design estimates (dotted line), known solution (solid black), and model-building grades (gray lines), for grades C (top), A (bottom left), and F (bottom right).

Table 2. Percentage Captured Variance of X and Y in Individual PCA Models

component    PCA R2 ([z X])    PCA R2CUM ([z X])    PCA R2 (Y)    PCA R2CUM (Y)
1            75.68%            75.68%               99.00%        99.00%
2            17.54%            93.22%               1.00%         100.00%
3            6.78%             100.00%              -             -

respectively. The null space obtained from the SVD solution using both responses1,16 is the solid line lying halfway between those based on retaining only one variable. All remaining discussion will be based on using this latter pseudo-null space to provide the best estimate of the locus of points in the score space that will provide the same solution y_des for both responses. Since the pseudo-null space provides, in principle, equivalent y_des solutions, the question is then how to select the "best" solution among these alternatives; this will be the point on the null space that is closest to the existing plant operation, where the distance is defined as the Mahalanobis distance T², and the solution is obtained by the optimization shown in eq 10. For F > 0 and constraint coefficients B and b defined as those corresponding to the pseudo-null space line, this optimization provides the solution shown by the ◆ marker in Figure 9. That solution lies at the tangent of the null space line to the smallest T² ellipse centered on the origin; hence, it is the solution

closest to existing operations. Solutions are also shown if one were to use the null space lines defined based on using the kappa number or viscosity responses by themselves. As expected, these solutions lie close to that obtained from using the pseudo-null space based on both responses. Also shown are solutions along the pseudo-null space line obtained by imposing hard constraints on Hotelling's T². It is interesting to notice that a known solution (* in Figure 9), which is the score vector obtained when the known trajectory for grade A from actual plant operation is projected onto the PLS latent space, falls on the pseudo-null space and projects very close to the constrained solutions obtained with soft constraints (◆, circled solid triangle, and ▼) on T². The corresponding temperature trajectories for these solutions are all close to the known trajectory for grade A (Figure 10). The temperature profiles corresponding to the solutions obtained with hard constraints on T² are substantial extrapolations and, hence, are still very different from the past runs and might not be as desirable as the former ones (Figure 11). From an operational point of view, there is no doubt that the solution obtained with soft constraints along the pseudo-null space is the best solution of all. It falls between the solutions obtained by following either of the univariate null spaces, and as seen in Figure 10, it lies very close to that used by the company for grade A.
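A sketch of how the pseudo-null space and the soft-constrained solution of eq 10 can be computed is given below. Q, the score variances, y_des, and the weight rho (standing in for F in eq 10) are hypothetical placeholders for quantities taken from a fitted model, and the inequality constraints of eq 10 are omitted for brevity.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical model quantities from a 2-component PLS model with a nearly
# rank-one Y block, so a 1-D (pseudo-)null space exists.
Q = np.array([[0.90, 0.02],
              [0.85, 0.03]])          # Y loadings (M x A), nearly collinear rows
s2 = np.array([4.0, 1.0])             # variances of the score columns t1, t2
y_des = np.array([0.60, 0.55])        # desired (scaled) quality
rho = 0.1                             # weight on the soft Hotelling T^2 term

# Pseudo-null space: right singular vector of Q for its smallest singular value.
_, _, Vt = np.linalg.svd(Q)
null_dir = Vt[-1]                     # direction along which Q @ tau is ~unchanged

def objective(tau):
    resid = y_des - Q @ tau
    t2 = np.sum(tau**2 / s2)          # Hotelling's T^2 of the candidate scores
    return resid @ resid + rho * t2   # eq 10 without the inequality constraints

res = minimize(objective, x0=np.zeros(2), method="BFGS")
tau_soft = res.x                      # solution pulled toward past operation

# For comparison: the minimum-Euclidean-norm direct inversion, and how far the
# two solutions are separated along the null-space direction.
tau_pinv = np.linalg.pinv(Q) @ y_des
print(tau_soft, tau_pinv, (tau_soft - tau_pinv) @ null_dir)
```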


Figure 9. Univariate null spaces, pseudo-null space, and several solutions for the design of grade A. Legend: (●) grades B to F, (*) known conditions for grade A currently used in the plant, (+) analytical optimum (model inversion), (■) solution with hard constraints on Hotelling's T², (◆) solution with soft constraints (SC) on Hotelling's T², (▼) solution with SC on Hotelling's T² designing for viscosity, (circled upward-facing solid triangle) solution with SC on Hotelling's T² designing for kappa.

Figure 10. Temperature trajectories for the several solutions along the univariate null and pseudo-null spaces for the design grade A using soft constraints in the estimation of τnew.

5. Alternative Approaches using Derivative Augmented Models

In Section 4, it was shown that some unconstrained design estimates exhibited undesired structural characteristics, such as drastic changes in slope and "jumps". These undesirable solutions occurred when the solution in the score space lay well outside the 95% confidence region of the training data. The inclusion of multivariate constraints (on Hotelling's T²) was seen to correct the problem. From the above discussion, one might conclude that the multiway PLS model for the batch system lacked some structural information about the trend (slope) of the trajectories in order for it to be incorporated into the design estimates. Therefore, it might be desirable to augment the model to include the derivatives of the trajectories. Such additional information in

Figure 11. Temperature trajectories for the several solutions along the pseudo-null space for the design grade A using hard constraints in the estimation of τnew.

Figure 12. Data structure and unfolded X as ordered to be used in the augmented model.

the PLS model (on the normal behavior of the derivatives for existing grades) might lead to design solutions with acceptable trend and curvature behavior even without introducing constraints. The reason for this is that, by inverting the augmented PLS model, the derivatives of the solution will also be forced to be consistent with past operation performance (i.e., constrained to lie on the model plane).


Figure 13. Score plots (O, model, crossed square, τnew) and design estimates (dotted line), the one known solution (solid black) and model building grades (gray lines) for grades A (left) and F (right) in the pilot batch digester using augmented PLS model.

Information about the trends is contained in the derivative of the trajectories of the process variables with respect to its evolution index (EI). Mean centering the batch data X matrix as suggested by Nomikos and MacGregor 19 removes the major nonlinear component of the batch trajectories. Because of this, the derivatives of the mean-centered and scaled trajectories are the ones to be incorporated into the model. As shown in the work by Garcia et al.,23 when a set of batch trajectories is synchronized (or aligned), the time itself becomes an important process variable to be included in the analysis. The indicator variable chosen to align the rest of the variables against becomes a variable in the system that will be known a priori and will not change for all batches. For example, if the trajectories are aligned against conversion (conversion becomes the evolution index), then the conversion trajectory is a known vector for all batches. The fact that this evolution index (EI) is a known vector allows the derivatives of the time-varying trajectories (with respect to this EI) to be expressed as a linear function of the EI. For a data set consisting of I batches, with L initial conditions, and J variables measured at K times during the batch, one of the ways of ordering the unfolded X matrix is that shown in Figure 12. The derivatives of the trajectories (X) with respect to their evolution index can then be expressed as eq 15, where D is a JK × JK matrix; modifying D by adding L columns of zeros (D modified is referred to as DA) allows the derivative to be a function of the augmented [Z X] matrix (eq 16). DA is a JK × [L + JK] matrix (see Appendix)

d(X) ) XDT d EI

(15)

d([ZX]) ) [ZX]DAT d EI

(16)
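A sketch of the augmentation in eqs 15 and 16 follows, using a simple first-difference operator in place of the higher-order stencils given in the Appendix; all data, names, and dimensions are synthetic placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

I, K, L, M, A = 6, 200, 1, 2, 2      # batches, samples, z's, y's, components
rng = np.random.default_rng(2)

Z = rng.normal(size=(I, L))          # initial conditions (e.g., total batch time)
X = rng.normal(size=(I, K))          # aligned, centered/scaled trajectories
Y = rng.normal(size=(I, M))          # final quality

# Simple derivative operator with respect to the evolution index: a K x K
# first-difference matrix (the paper uses higher-order stencils, eqs A2-A4).
D = np.eye(K) - np.eye(K, k=-1)
D[0, :] = 0.0                        # no backward difference for the first sample
dX = X @ D.T                         # eq 15: derivatives of each trajectory

# In the spirit of eq 16, the derivative block (a linear function of [Z X]) is
# appended as an extra block before fitting the augmented model.
ZX_aug = np.hstack([Z, X, dX])

pls_aug = PLSRegression(n_components=A, scale=False).fit(ZX_aug, Y)
print(pls_aug.x_scores_.shape)
```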

For this particular application, the matrices D and D_A are built using numerical derivatives as shown in the Appendix. The steps to be taken now are as follows: to build a PLS model using Z, X, [Z X]D_A^T, and Y, and then to reformulate the optimization problem to use this augmented PLS model; to estimate the new operating conditions [ẑ_new^T x̂_new^T]; and to illustrate how the estimates compare with the ones obtained in the previous sections.

5.1. Building the Augmented Multiway PLS Model. The multiway PLS model to be built has to incorporate three different types of data into the X matrix: the initial conditions, the

trajectories, and the derivatives of the trajectories. The best way to balance these three types of data is to scale each block of data to unit variance, as in a multiblock PLS model.24 This implies equal importance of each block in the analysis; however, this scaling is left to the discretion of the user. To achieve this, first the Z and X matrices are mean-centered and block-scaled to unit variance, as shown in the work of Westerhuis et al.;24 then the matrix of derivatives with respect to the EI is computed (with eq 16 and referred to as d[Z X]) and scaled to have block-unit variance as well. A PLS model between the augmented matrix [Z X [Z X]D_A^T] and Y is then obtained. This augmented model will capture the information about the structure of the trajectories and the structure of the derivatives of the trajectories. Using this multiway augmented PLS model with the product design formulation in eq 9a is a powerful way to simultaneously incorporate multivariate constraints on both the trajectories and the derivative structure of the trajectories of the estimated design (x_new) to yield a product with a certain desired quality (y_des).

5.2. Application to the Industrial Pulp Digester. The results for the design of the temperature trajectories for the extreme grades A and F for the pilot batch digester, obtained by building an augmented multiway PLS model and solving the optimization problem in eq 9a, are illustrated in Figure 13. The solutions shown in Figure 13 come from the direct inversion of the augmented PLS model. One can easily notice (a) the lack of drastic changes in slope in the trajectories (compare these with Figure 8, bottom) and (b) that the scores are no longer outside the boundaries of the normal operating region. Hotelling's T² constraint is no longer required for these grades, and it is no longer essential to explore the null space to find a realistic solution. Hence, the inclusion of the derivatives with respect to the EI in the multiway PLS model improves the design solution without the need for bounds or extra constraints, because the trend structure of the trajectories is being modeled.25

6. Conclusions

The data-based design methodology proposed by Jaeckle and MacGregor1 for finding process operating conditions capable of yielding desired product properties is extended to the design of complete batch trajectories along with their initial conditions.


The design technique is also extended to include constraints on the solution in an objective function framework when the statistical model of the matrices is such that multiple solutions exist (a null space). An alternative design approach is proposed that uses the historical data augmented with the derivatives of the trajectories with respect to its evolution index. This approach allows for more of the trend structure of the trajectories to be taken into account, and the constraints considered in the former technique might not be necessary. Derivative augmented multiway PLS models could also bring in some benefit when the batch control problem is solved with latent variable regression models, where robustness is crucial and may require more trend structure to be incorporated into the model.15,26 Each of the methodologies is illustrated using data from an industrial batch digester. Temperature trajectories designed by these approaches are shown to be very close to those used in actual plant operation to achieve the specified product quality.

Acknowledgment

The authors thank Dr. Paul Nomikos (Invista Canada) for his valuable feedback. Salvador García-Muñoz thanks the McMaster Advanced Control Consortium and the Natural Sciences and Engineering Research Council of Canada for their funding.

Appendix. Numerical Derivative Calculation for Batch Trajectories

In this appendix, the derivatives of a set of batch trajectories with respect to their evolution index (EI) are formulated as a linear function of the trajectories. Consider, to begin with, the case of a single trajectory χ (a column vector) which has k measurements (χ ∈ R^{k×1}). It is assumed that each sample of this trajectory corresponds to a unique point in the evolution of the batch, which can be time, or conversion in the case of a chemical reactor (e.g., for time = k, there is only one corresponding value of χ, referred to as χ_k), and for simplicity, it is assumed that the evolution index is equally sampled at intervals of h (this is not a requirement, and the formulation can still be obtained if this assumption does not hold for a specific case). A simple solution for the computation of the derivative of χ_k with respect to its EI (referred to as dχ_k) is to calculate a single-point derivative (eq A1). This, however, may yield very noisy derivatives due to the presence of noise in the process measurements, so a better calculation of the numerical derivatives is required.

dχ_k = χ_{k-1} − χ_k                (A1)

High-accuracy numerical derivatives can be obtained if a Taylor series is analytically differentiated; the higher the order of the Taylor series, the more points the numerical derivative will require for its calculation. In this work, a three-point forward derivative (eq A2) is used for the first two samples in χ, a three-point backward derivative (eq A3) is used for the last two samples of χ, and a four-point centered derivative (eq A4) is used for the rest of the samples.

dχ_k = (−χ_{k+2} + 4χ_{k+1} − 3χ_k) / 2                (A2)

dχ_k = (χ_{k-2} − 4χ_{k-1} + 3χ_k) / 2                (A3)

dχ_k = (−χ_{k+2} + 8χ_{k+1} − 8χ_{k-1} + χ_{k-2}) / 12                (A4)

The actual magnitude of h is not included so that the derivatives will have the same units as the evolution index. These formulas can be expressed as matrix operations. For example, if χ consists of seven samples, the derivatives can be computed as dχ^T = χ^T D^T with

D = [ −3/2   4/2  −1/2    0      0      0      0   ]
    [   0   −3/2   4/2  −1/2     0      0      0   ]
    [  1/12 −8/12    0    8/12  −1/12    0      0   ]
    [   0    1/12 −8/12    0     8/12  −1/12    0   ]                (A5)
    [   0     0    1/12 −8/12     0     8/12  −1/12 ]
    [   0     0      0    1/2   −4/2    3/2     0   ]
    [   0     0      0     0     1/2   −4/2    3/2  ]

where the first two and the last two rows implement eqs A2 and A3 and the middle rows implement eq A4. If this numerical matrix is referred to as D, the derivative of the trajectory can be written compactly as

dχ = Dχ                (A6)

D is a square matrix which has as many rows as there are samples in χ. In the case where the batch trajectory χ is preceded by L elements corresponding to the initial conditions, the D matrix can be modified by adding L columns of zeros. For the case where there are J variable trajectories, a bigger matrix D_A can be built with D and as many columns of zeros as there are initial conditions L in Z (eq A7 illustrates the case for L = 2 and J = 6), such that the complete derivative of the trajectories can be expressed as eq A9 after ordering the trajectories as in eq A8, where X_u^T is the transpose of the unfolded batch data matrix with the columns grouped by variables.

D_A = [ 0  0  |  D  0  0  0  0  0 ]
      [ 0  0  |  0  D  0  0  0  0 ]
      [ 0  0  |  0  0  D  0  0  0 ]                (A7)
      [ 0  0  |  0  0  0  D  0  0 ]
      [ 0  0  |  0  0  0  0  D  0 ]
      [ 0  0  |  0  0  0  0  0  D ]

Here each of the first two columns is a column of zeros (one per initial condition in Z), each D block is the K × K matrix of eq A6, and each remaining 0 is a K × K block of zeros, so that D_A is a JK × (L + JK) matrix.

X_u^T = [ χ_{1,1}  χ_{1,2}  χ_{1,3}  ···  χ_{1,I} ]
        [ χ_{2,1}  χ_{2,2}  χ_{2,3}  ···  χ_{2,I} ]
        [ χ_{3,1}  χ_{3,2}  χ_{3,3}  ···  χ_{3,I} ]                (A8)
        [    ⋮        ⋮        ⋮             ⋮    ]
        [ χ_{J,1}  χ_{J,2}  χ_{J,3}  ···  χ_{J,I} ]

which is a JK × I matrix (each entry χ_{j,i} denotes the K × 1 trajectory of variable j in batch i).

dX_u^T = D_A [ Z^T   ]
             [ X_u^T ]                (A9)

[JK × I] = [JK × (JK + L)] × [(JK + L) × I]

The approach described here is the one used in this work. Other approaches can also be used, if the derivative of the trajectories can be posed as a linear function of the original trajectories (eq A6).

Literature Cited

(1) Jaeckle, C. M.; MacGregor, J. F. Product Design Through Multivariate Statistical Analysis of Process Data. AIChE J. 1998, 44 (5), 1105-1118.
(2) Kourti, T. Multivariate dynamic data modeling for analysis and statistical process control of batch processes, start-ups and grade transitions. J. Chemom. 2003, 17, 93-109.
(3) Biegler, L. T.; Cervantes, A. M.; Wachter, A. Advances in Simultaneous Strategies for Dynamic Process Optimization. Chem. Eng. Sci. 2002, 57 (4), 575-593.
(4) Vassiliadis, V. S.; Sargent, W. H.; Pantelides, C. C. Solution of a Class of Multistage Dynamic Optimization Problems. 1. Problems without Path Constraints. Ind. Eng. Chem. Res. 1994, 33, 2111-2122.
(5) Vassiliadis, V. S.; Sargent, W. H.; Pantelides, C. C. Solution of a Class of Multistage Dynamic Optimization Problems. 2. Problems with Path Constraints. Ind. Eng. Chem. Res. 1994, 33, 2123-2133.
(6) Srinivasan, B.; Bonvin, D.; Visser, E.; Palanki, S. Dynamic optimization of batch processes. II. Role of measurements in handling uncertainty. Comput. Chem. Eng. 2003, 27, 27-44.
(7) Clarke-Pringle, T.; MacGregor, J. F. Optimization of molecular-weight distribution using batch-to-batch adjustments. Ind. Eng. Chem. Res. 1998, 37, 3660-3669.
(8) Filippi, C.; Greffe, J. L.; Bordet, J.; Villermaux, J. Tendency Modeling of Semibatch Reactors for Optimization and Control. Chem. Eng. Sci. 1986, 41 (4), 913-920.
(9) Fotopoulos, J.; Georgakis, C.; Stenger, H. G. Use of tendency models and their uncertainty in the design of state estimators for batch reactors. Chem. Eng. Process. 1998, 37, 545-558.
(10) Kozub, D. J.; MacGregor, J. F. Feedback control of polymer quality in semi-batch copolymerization reactors. Chem. Eng. Sci. 1992, 47 (4), 929-942.
(11) Srinivasan, B.; Palanki, S.; Bonvin, D. Dynamic optimization of batch processes. I. Characterization of the nominal solution. Comput. Chem. Eng. 2003, 27, 1-26.
(12) Rao, R.; Manohar, B.; Sambaiah, K.; Lokesh, B. R. Enzymatic acidolysis in hexane to produce n-3 or n-6 FA-enriched structured lipids for coconut oil: Optimization of reactions by response surface methodology. J. Am. Oil Chem. Soc. 2002, 79 (9), 885-890.
(13) Gorret, N.; bin Rosli, A. K.; Oppenheim, S. F.; Willis, L. B.; Lessard, P. A.; Rha, C. K.; Sinskey, A. J. Bioreactor culture of oil palm (Elaeis guineensis) and effects of nitrogen source, inoculum size, and conditioned medium on biomass production. J. Biotechnol. 2004, 108 (3), 253-263.
(14) Chin, I. S.; Lee, K. S.; Lee, J. H. A Technique for Integrated Quality Control, Profile Control and Constraint Handling for Batch Processes. Ind. Eng. Chem. Res. 2000, 39, 693-705.
(15) Flores-Cerrillo, J.; MacGregor, J. F. Control of batch product quality by trajectory manipulation using latent variable models. J. Process Control 2004, 14, 539-553.
(16) Burnham, A.; MacGregor, J. F.; Viveros, R. Latent variable multivariate regression modeling. Chemom. Intell. Lab. Syst. 1999, 48, 167-180.
(17) Jaeckle, C. M.; MacGregor, J. F. Industrial applications of product design through the inversion of latent variable models. Chemom. Intell. Lab. Syst. 2000, 50, 199-210.
(18) Yacoub, F.; MacGregor, J. F. Product optimization and control in the latent variable space of nonlinear PLS models. Chemom. Intell. Lab. Syst. 2004, 70, 63-74.
(19) Nomikos, P.; MacGregor, J. F. Monitoring Batch Processes using Multiway Principal Components. AIChE J. 1994, 40 (8), 1361-1375.
(20) Nomikos, P.; MacGregor, J. F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37 (1), 41-58.
(21) Nomikos, P.; MacGregor, J. F. Multi-way partial least squares in monitoring batch processes. Chemom. Intell. Lab. Syst. 1995, 30, 97-108.
(22) Kassidas, A.; MacGregor, J. F.; Taylor, P. Synchronization of batch trajectories using dynamic time warping. AIChE J. 1998, 44 (4), 864-875.
(23) Garcia, S.; Kourti, T.; MacGregor, J. F.; Matheos, A.; Murphy, G. Troubleshooting of an industrial batch process using multivariate methods. Ind. Eng. Chem. Res. 2003, 42 (15), 3592-3601.
(24) Westerhuis, J.; Kourti, T.; MacGregor, J. F. Analysis of Multiblock and Hierarchical PCA and PLS Models. J. Chemom. 1998, 12, 301-321.
(25) Garcia, S. Batch Process Improvement Using Latent Variable Methods. Ph.D. Thesis, McMaster University, Hamilton, Ontario, Canada, 2004.
(26) Flores-Cerrillo, J.; MacGregor, J. F. Latent variable MPC for trajectory tracking in batch processes. J. Process Control 2005, 15 (6), 651-663.

Received for review March 15, 2006. Revised manuscript received August 8, 2006. Accepted August 10, 2006.

IE060314G