Multivariate Statistical Analysis of an Emulsion Batch Process

Ind. Eng. Chem. Res. 1998, 37, 3971-3979


Debashis Neogi and Cory E. Schlags*
Air Products and Chemicals, Inc., 7201 Hamilton Boulevard, Allentown, Pennsylvania 18195

The power of multivariate statistical methodologies, namely, MPCA (multiway principal component analysis) and MPLS (multiway projection to latent structures or multiway partial least squares), for batch process analysis, monitoring, fault diagnosis, product quality prediction, and improved process insight is illustrated. These techniques were successfully applied to an industrial emulsion polymerization batch process. One key feature of this work is that reaction extent was used as the common reference scale to align batches with varying time durations. MPCA/MPLS technology (1) detected potential process abnormalities, (2) determined the time an abnormal event occurred, and (3) indicated the likely variable or variables which caused the abnormality. The results also indicated that variations in an ingredient trajectory and heat removal variables were primarily associated with viscosity variability. The resultant PLS model predicted the product viscosities within measurement error, thereby improving our workflow process. Process knowledge played a key role in variable selection and interpretation of the results.

1. Introduction

In Air Products’ portfolio of chemical businesses, emulsion polymers play a strategic role. Many of these products are manufactured via batch processing technology. These operations are characterized by finite processing time, unsteady-state operation, and few product quality measurements. Additionally, complex chemistry and mixing phenomena coupled with a lack of on-line quality variable sensors pose challenging control problems. Presently, we help ensure consistent product quality by automating the sequence of various steps of our batch operations, by controlling individual process trajectories, and by monitoring reaction extent.
* To whom all correspondence should be addressed. Email: [email protected], [email protected]. Phone: (610) 481-6695. Fax: (610) 706-7556.

Due to the multivariate and highly correlated nature of the measured variables in these emulsion processes, univariate SPC and SQC charts do not always provide us with complete information for monitoring and estimating product quality. A research program was therefore initiated at Air Products to develop multivariable statistical methodologies for analyzing and monitoring our emulsion processes and estimating the product quality in our manufacturing facilities. The specific methodology investigated was multiway principal component analysis (MPCA), an extension of principal component analysis (PCA) for batch processes. While MPCA examines process variability only, its adjunct methodology, multiway projection to latent structures/partial least squares (MPLS), can be used to examine the potential relationships between process trajectories and product quality variables. These methodologies are cutting-edge statistical techniques; their applications to batch processes have only been established within the past few years. Interested readers should refer to Kourti and MacGregor,1 Nomikos and MacGregor,2 and Wold et al.3 for more detailed discussions of these techniques.

To exploit the MPCA procedure, the information needed is a historical database of pertinent process data from past batches which span the operating conditions of interest coupled with a knowledge of some fundamental process relationships. MPCA is primarily used to compress the information contained in the batch data into low-dimensional spaces that describe the operation of the past batches. MPCA leads to simple monitoring charts, consistent with the philosophy of statistical process control, which are capable of tracking the progress of new batch runs and detecting the occurrence of observable upsets. The potential uses of these methodologies include on-line/off-line process monitoring, location of the sources of batch-to-batch variability, fault diagnosis, and product quality prediction.

While several applications of these multivariate techniques to batch processes have been reported, only a few of these have been applied to industrial processes. Nomikos and MacGregor2 first applied MPCA to a simulated semibatch polymerization reactor for analysis and process monitoring purposes and then subsequently to an industrial batch process.4 Gallagher et al.5 also utilized MPCA methods to monitor the processing conditions of a nuclear waste storage tank. By extending MPCA to MPLS technology, Nomikos and MacGregor6 applied these techniques to a simulated polymerization reactor for the purposes of obtaining on-line inferences of final product quality variables. Kourti et al.7 used multiblock and multiway PLS for the purposes of monitoring and fault diagnosis of two industrial polymerization processes. Albert et al.8 also reported applying various multivariate statistical techniques to a simulated batch polymerization reactor as well as a pilot-scale penicillin fermenter. Additionally, Kosanovich et al.9 applied these techniques to a polymerization process primarily for the purposes of improved process understanding.
Finally, Dong and McAvoy10 compared both linear and nonlinear MPCA methods on the same simulated process of Nomikos and MacGregor.2 To the best of our knowledge, however, all the prior applications utilized batch duration as the basis for comparing batch processing trajectories. If processing time varies from batch to batch, as is often encountered in industrial processes, then batch duration is no longer

S0888-5885(98)00243-7 CCC: $15.00 © 1998 American Chemical Society Published on Web 09/04/1998

3972 Ind. Eng. Chem. Res., Vol. 37, No. 10, 1998

an acceptable comparison basis. Nomikos and MacGregor4 recommend replacing time with another variable which (1) varies monotonically throughout the batch duration and (2) provides the same processing signature for each batch. Only Kourti et al.,11 in their PLS application to a semibatch industrial process, used the total amount of one ingredient as a means to monitor the course of the reaction. Due to the variability in industrial reactor conditions, such as fouling, the total ingredient flow may not always provide an adequate comparison basis. Additionally, this method is not applicable to batch processes which have all the ingredients added at the start of the reaction.

At Air Products, we have successfully developed MPCA and MPLS models for several of our products. The primary objective of this work is to demonstrate the application of multivariate analysis techniques to one particular emulsion polymerization batch process/product. A significant component of this effort is that we used reaction extent as a means to compare batches with varying time durations. First, we show how MPCA/MPLS technology offers superior capabilities for process monitoring, fault detection, and diagnosis purposes. Second, we show how this technology is used to identify the process variables likely associated with the variability in the product viscosity. This technique also confirmed the hypothesis that our production cycle results in two different sets of process conditions, which, in turn, influence product viscosity. We subsequently developed two separate MPLS models for the two sets of process conditions we found this product to encounter. Finally, we illustrate how these models can be utilized to improve our process insight as well as our work practices. Neogi and Schlags12 presented part of this investigation at the 1997 American Control Conference.

This paper is organized as follows. Section 2 provides the necessary theoretical background for utilizing these methods.
In section 3, the process is described. Section 4 discusses the nature of the data and its challenges and pretreatment. Section 5 provides an analysis of the process, while section 6 focuses on the analysis of the process/product quality relationships. Finally, implementation considerations and conclusions are presented.

2. Theory

The primary objective of this work is to apply multivariate statistical techniques for process improvement and not to describe with mathematical rigor what MPCA/MPLS analysis is. With this focus, only a brief overview of the technique is presented. The interested readers looking for a detailed discussion of PCA and PLS, development of control charts, and computation of control limits for statistical process control are referred to Jackson,13 Nomikos and MacGregor,2 Wold et al.,3 and Kaspar and Ray.14

To understand the nature of the data available in a batch monitoring problem, consider a typical batch run (Figure 1) in which j = 1, 2, ..., J variables (e.g., temperature, pressure in the reactor) are measured at k = 1, 2, ..., K intervals of reaction extent throughout the duration of the batch. Data exist for a number of similar batch runs, i = 1, 2, ..., I. Thus, the operational space can be summarized by the X (I × J × K) matrix. Data describing the final product quality (e.g., viscosity, pH) are also available; these measurements are taken

Figure 1. Data structure.

Figure 2. Decomposition of the three-way X matrix by MPCA.

at the end of each batch, for a few variables, m = 1, 2, ..., M. The quality space will thus be summarized by the Y (I × M) matrix. Generally, the X and the Y matrixes are mean-centered and scaled to unit variance. This is done in order to eliminate any effects the variable units may have on analyzing the results. By subtracting the average trajectory, we also remove much of the process nonlinearity, leaving deviations about the average trajectory.

Principal component analysis was initially applied to continuous processes where the two-dimensional data matrix comprises J variables and I observations. PCA reduces the dimensionality of the original data structure via orthogonal projection.2 This results in the creation of latent variables which are weighted linear combinations of the original variables. These latent variables, referred to as the principal components or t scores, are generally fewer than the number of original variables.

The MPCA technique used for our batch analysis is equivalent to unfolding the three-dimensional array X, which contains information on multiple measurements during a batch for several variables over several batches, and then performing a regular PCA on the resultant two-dimensional matrix, X, as depicted in Figure 2. The PCA algorithm decomposes the unfolded data matrix, X, into a summation of R products of score vectors, t_r, and loading vectors, p_r, plus a residual matrix, E, which is as small as possible in a least-squares sense:

$$X = \sum_{r=1}^{R} t_r p_r' + E \tag{1}$$
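As an illustration only, the unfolding, scaling, and decomposition of eq 1 can be sketched in Python with NumPy. This is a minimal sketch and not the BatchSPC implementation: the function names, the synthetic data, the column ordering of the unfolded matrix, and the use of a singular value decomposition to obtain the scores and loadings are all assumptions introduced here.

```python
import numpy as np

def unfold_batchwise(X3):
    """Unfold an (I, J, K) array into an (I, J*K) matrix, one row per batch.
    Column index = j*K + k (an assumed ordering; any fixed ordering works)."""
    I, J, K = X3.shape
    return X3.reshape(I, J * K)

def autoscale(X2):
    """Mean-center each column and scale it to unit variance."""
    mean = X2.mean(axis=0)
    std = X2.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # guard against constant columns
    return (X2 - mean) / std, mean, std

def mpca(X3, R):
    """Return scores T (I x R), loadings P (J*K x R), residuals E, scaled data Xs."""
    Xs, _, _ = autoscale(unfold_batchwise(X3))
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    T = U[:, :R] * s[:R]   # score vectors t_r (mutually orthogonal)
    P = Vt[:R].T           # loading vectors p_r (orthonormal)
    E = Xs - T @ P.T       # residual matrix of eq 1
    return T, P, E, Xs

# Synthetic stand-in for a historical database (hypothetical numbers).
rng = np.random.default_rng(0)
X3 = rng.normal(size=(43, 13, 106))
T, P, E, Xs = mpca(X3, R=6)
```

Each row of T holds the t scores of one batch, so plotting its first two columns against each other gives a score plot of the kind used for batch classification.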

The elements of the t vector or principal component correspond to a single batch and depict the overall variability of this batch with respect to the other batches in the database throughout the batch duration. The loading vector p provides us with the relative contribution of the process variables and gives a simpler picture of the covariance structure of the data. The score vectors are orthogonal and the loading vectors are


orthonormal. In essence, this results in the principal components being uncorrelated with each other, facilitating further analysis. Usually, a few of the principal components can express most of the variability in the data. Ideally, the dimension R is chosen such that there is no significant process information left in the residuals E.

One important step in PCA is the selection of the correct number of components to explain the systematic variability in the data. Since this is a nontrivial task, a number of techniques are reported to do this. A detailed discussion of these techniques is found in Jackson.13 One systematic way of doing this is cross validation, where a part of the data is excluded from the analysis and used later to check the performance of the PCA model. This is an iterative procedure, and user-supplied limits are used to stop the extraction of components.15

The squared prediction error (SPE) is the measure of the model mismatch between a batch and the multivariate model of the historically “good” batches. It is calculated as the sum of squares of the errors between the data and the estimates. The SPE for the ith batch at any given pseudotime k is given by

$$\mathrm{SPE}_{ik} = \sum_{j=1}^{J} \left( x_{ijk} - \hat{x}_{ijk} \right)^2 \tag{2}$$
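A minimal sketch of eq 2 for completed batches follows, assuming the autoscaled data have been unfolded so that the column index equals j*K + k and that the loadings P are orthonormal. The function name and the synthetic data are hypothetical; the estimate of each batch is taken here as its projection onto the loadings.

```python
import numpy as np

def spe_per_interval(Xs, P, J, K):
    """SPE_ik of eq 2 for every batch i and interval k.

    Xs : (I, J*K) autoscaled, unfolded data (column index = j*K + k)
    P  : (J*K, R) orthonormal loadings
    """
    X_hat = Xs @ P @ P.T                     # model estimate of each batch
    resid = (Xs - X_hat).reshape(-1, J, K)   # back to (I, J, K)
    return (resid ** 2).sum(axis=1)          # sum over variables j -> (I, K)

# Synthetic example (hypothetical sizes).
rng = np.random.default_rng(1)
I, J, K, R = 20, 4, 10, 2
Xs = rng.normal(size=(I, J * K))
Xs -= Xs.mean(axis=0)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:R].T
spe = spe_per_interval(Xs, P, J, K)
```

Plotting one row of `spe` against k gives the SPE trajectory of that batch, which is then compared with a control limit derived from the historical data.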

SPE charts can be used to monitor the new batches. When the process is “in-control”, the value of SPE should be small. Upper control limits for this statistic can be computed using the historical data.1,2

MPLS, which relates the quality variables with the process variables, simultaneously reduces the dimensions of X and Y spaces. It finds latent vectors for X that explain not only the variation in the process data but also the variation in X that is most predictive of the variability in the product quality data, Y. The MPLS analysis is built on the NIPALS algorithm,3,16 which is a successive principal component selection procedure. The MPLS model consists of outer relations (X and Y blocks separately) and an inner relation (linking X and Y) as shown in eq 3. MPLS analysis has the following objectives: (1) to approximate the X and Y space and (2) to maximize the correlation between X and Y. The MPLS model accomplishing these objectives can be expressed as

$$
\begin{aligned}
X &= TP' + E \\
Y &= UQ' + F \\
U &= cT + H
\end{aligned}
\tag{3}
$$

where T is the matrix of scores that summarizes the X variables, P is the matrix of loadings showing the influence of the X variables, U is the matrix of scores that summarizes the Y variables, Q is the matrix of weights expressing the correlation between Y and T (X), c is a constant, and E, F, and H are the matrixes of residuals. Figure 3 summarizes the decomposition of the data matrixes X and Y. The MPLS model geometrically corresponds to fitting a line, plane, or hyperplane to both X and Y data represented as points in a multidimensional space. The objective is to define reduced spaces in both X and Y that approximate as closely as possible the original data, while simultaneously maximizing the covariance between these two reduced spaces. A linear MPLS algorithm is used in the present study; this algorithm computes the P and Q matrixes in such a way that the underlying inner relationship between T and U becomes linear. The advantages of using the MPLS algorithm are that it can handle correlation within both X and Y (unlike multiple regression) and that its results are robust to the noise in the data. Further, it can handle data sets with missing data. The results of the MPLS regression are dependent on the scaling used and should be interpreted with caution. Good discussions on general PLS methods and their mathematical aspects are available in Kourti et al.,7 Kaspar and Ray,14 Lorber et al.,16 Geladi and Kowalski,17 and Hoskuldsson.18

Utilizing this technique, an empirical model is developed from batch data obtained under normal operating conditions. Subsequently, by monitoring only the process variables and projecting them onto the reduced dimensional space defined by MPLS, we monitor the variation in the process variables that are more influential on the product quality variables.7,19

Figure 3. MPLS modeling of data sets X and Y showing the decomposition procedure.

Figure 4. Emulsion polymer process.

3. Process Description

For the emulsion polymer product investigated, a simplified process flow diagram is shown in Figure 4. The main chemical reaction takes place in a single batch vessel; posttreating of the product occurs in a downstream processing unit. Raw materials from the premix vessel are first charged to the reactor. To start the polymerization reaction, one or more monomers are fed to the reactor and mixed with initiator. Various sensors provide us with measurements related to the state of the reactor. We utilized measurements only from the reactor for both MPCA and MPLS analysis, as the dotted line in Figure 4 indicates.


Figure 5. Reactor process variable versus time.

Figure 6. Reactor process variable versus reaction extent.

Quality measurements become available only several hours after the finished products have been transferred out of the reactor. For this product, one of the key quality variables which our customers require to be the most consistent is product viscosity. Batches whose viscosities are outside our manufacturing specifications require adjustments, thus incurring processing inefficiencies and additional manufacturing costs.

4. Nature of the Data

Forty-six batches based on several months of operating data were utilized as the basis for analysis. Each batch consisted of 106 measurements, taken during the batch duration, for each of the 20 process variables initially considered to potentially affect the process operations. Product viscosity was used as the quality variable. BatchSPC software, originally developed by McMaster Advanced Control Consortium, McMaster University (Hamilton, Ontario, Canada) and enhanced by Air Products, was used for most of the multivariate analysis.

4.1. Data Challenges. Interpretation of the data from this batch process, representative of many at Air Products, posed significant challenges. Most importantly, the reaction duration for each batch differed due to varying degrees of reactor fouling. The data acquisition frequency varied as well. These two aspects of data collection made it impossible to align the raw data from the batches to a common time scale. Additionally, the reaction sensitivity of this process varies over the course of the batch, so a disturbance to the process could yield dramatically different results depending on its timing in the batch sequence.

4.2. Data Pretreatment. To overcome these challenges, the raw data were first preprocessed. Specifically, reaction extent was used instead of time to align the batches to a common scale. This is one key feature of this work; to the best of our knowledge, this use of reaction extent on an industrial process represents a first-time application in the published literature.
Linear interpolation techniques were utilized for this transformation. To illustrate the point of pseudotime scaling, consider three statistically normal batches which differ only in batch processing time. Figure 5 shows the trajectories of one of the process variables which represents the instantaneous state of the reactor. Batches 10, 30, and 2 represent batches of relatively short, medium, and long duration, respectively, due to varying degrees of reactor fouling. This figure illustrates the difficulties encountered when comparing batch process trajectories on a temporal basis. Not only do the batches end at different times but each batch follows a unique trajectory after an initial period. If reaction extent is instead used as a comparison basis, all three batch trajectories align for the batch duration as well as reach a common end point as shown in Figure 6. This figure also highlights the subtle differences between the batch trajectories in the initial portion of the reaction that were not apparent in Figure 5. These results confirm that reaction extent, not batch duration, is the proper choice for a comparison basis.

In addition to pseudotime scaling, a higher sampling rate of data collection was used for the initial portion of the reaction to account for the increased process sensitivity during this period. Specifically, we recorded data every 0.3 unit of the reaction extent in the initial period; thereafter, we recorded at every 2 units. By using a higher concentration of data in the beginning of the process, we effectively force the MPCA/MPLS models to focus more on the initial portion of the batch. To the best of our knowledge, this feature of our analysis has not been presented in any other related publications.

Because of our knowledge of this process, calculated variables such as heat-transfer rate and feed ratios were also used to supplement the original data set. Adding calculated variables to the raw data significantly enriches the MPCA/MPLS models with valuable process information. Statistical models are only as good as the data the user provides for building them.

4.3. Univariate Control Charts. As we will show in subsequent sections, we utilized multivariate analysis methodologies for batch classification, process diagnosis, and improved understanding purposes. To aid root-cause analysis efforts, we found it helpful to develop univariate statistical process control (SPC) charts for each process variable as well. While these charts did not capture the interactions among variables, they permitted us to compare process trajectories of individual variables against a statistical norm. These charts also linked the multidimensional domain which MPCA/MPLS technology is based upon with their univariate counterparts, which most of us are more familiar with. On the basis of the multivariate analysis, we selected a 10-batch subset of the 43 statistically normal batches to develop the univariate control charts. These 10 batches were selected such that they spanned the normal variability of the first 3 principal components.
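The pseudotime alignment described in section 4.2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the preprocessing code used in this work: the function names are hypothetical, the point at which the grid switches from the fine step (0.3 unit) to the coarse step (2 units) is an assumed value because it is not stated above, and the two synthetic batches are invented so that their trajectories depend on reaction extent only.

```python
import numpy as np

def extent_grid(switch=15.0, end=100.0, fine=0.3, coarse=2.0):
    """Non-uniform reaction-extent grid: fine steps below `switch`,
    coarse steps from `switch` onward. `switch` and `end` are assumed values."""
    n_early = int(round(switch / fine))
    n_late = int((end - switch) // coarse) + 1
    early = np.arange(n_early) * fine
    late = switch + np.arange(n_late) * coarse
    return np.concatenate([early, late])

def align_to_extent(extent, values, grid):
    """Linearly interpolate one variable's trajectory onto the common grid.
    The reaction extent must increase monotonically during the batch."""
    extent = np.asarray(extent, dtype=float)
    if np.any(np.diff(extent) <= 0):
        raise ValueError("reaction extent must be strictly increasing")
    return np.interp(grid, extent, np.asarray(values, dtype=float))

grid = extent_grid()

# Two synthetic batches of different duration and sampling frequency.
t_short = np.linspace(0.0, 4.0, 200)            # hours
t_long = np.linspace(0.0, 6.0, 350)
ext_short = 100.0 * t_short / 4.0               # extent reached over 4 h
ext_long = 100.0 * (t_long / 6.0) ** 1.5        # slower start, 6 h in total
var_short = 60.0 + 0.2 * ext_short              # variable driven by extent only
var_long = 60.0 + 0.2 * ext_long

short_aligned = align_to_extent(ext_short, var_short, grid)
long_aligned = align_to_extent(ext_long, var_long, grid)
```

On the time axis the two trajectories differ, whereas on the common extent grid they coincide, which is the behavior illustrated by Figures 5 and 6.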


Figure 7. MPCA as a batch classification tool.

Figure 8. SPE plot for batch 47 (20-variable model).

Figure 9. Contribution plot for batch 47 at observation 5 (20-variable model).

5. Process Analysis

5.1. Batch Classification. MPCA technology was first used as a batch classification tool to distinguish statistically similar batches from potentially abnormal ones. Figure 7, which plots the t scores of the first two principal components, represents a typical batch classification chart indicating potentially abnormal batches. Based on this initial examination of the data and other statistical measures, namely, SPE charts, a group of batches was selected which represented the normal range of variability for this process. The group of normal batches lies within the cluster as shown in Figure 7. On further investigation, we found that batch 16 experienced an obvious equipment malfunction. Batches 29 and 45 differed from the cluster more subtly: they had an unusually large degree of variability of one particular ingredient. This analysis illustrates the ease by which MPCA technology facilitates batch outlier detection by monitoring a few multivariate charts instead of 20 univariate SPC charts, especially when the process variables are highly correlated.

5.2. Model Development for Process Monitoring. An initial multivariate statistical (MPCA) model was developed based on the cluster of statistically normal batches shown in Figure 7. Based on cross validation, we selected six principal components which explained 57% of the overall variability of this process. We then tested the model against a new batch (no. 47) that was not included in the original model building set. In Figure 8, the SPE plot for batch 47 indicates that some detectable event differentiated this batch at the outset as compared to the historical norm. To ascertain the likely cause of this apparent abnormality, we interrogated the initial model at observation 5 and obtained a contribution plot as shown in Figure 9. This plot shows the relative contributions of the 20 process variables to the overall model error at this particular instant.
Figure 10. MPCA as a process monitoring tool.

We chose this observation since it was obvious from Figure 8 (i.e., the large values for the prediction error compared to the model’s confidence limits) that something had profoundly affected the process early into the batch. Figure 9 illustrates that variable 10, one of the variables associated with the reactor’s cooling system, was the likely cause of this batch’s abnormality. Upon closer examination, we noticed that a shift in some of the variables related to the reactor’s cooling system did indeed occur between batches 46 and 47; this shift was evident in several subsequent batches. On the basis of our process knowledge, we concluded, however, that this variable played no significant role in the variability of this product’s viscosity. We also noticed that batch 47 appeared to be normal in every other way and also produced an acceptable product. Including variable 10 in our multivariate statistical model would have actually produced a “false positive” for this batch and all others produced in this fashion. Such a process model would not have been well received in our manufacturing environment.

We then examined the relative importance of the 20 variables in the model and eliminated other variables based on the following criteria: insignificant contribution to the process variability and process changes which we determined to have no impact on the final product quality. The final MPCA model was then based on a reduced data set comprising 13 variables and 6 principal components; this model explained 67% of the overall process variability. The resulting multivariate control charts were then used for off-line monitoring of subsequent batches upon their completion. Figure 10 shows the corresponding process monitoring chart for batch 47. Based on the SPE statistic for our revised model, it is evident that batch 47 remained in statistical control for the duration of the batch. From this example, it is apparent that one needs to judiciously select the process variables that are to be


Figure 11. MPCA as a fault detection tool: SPE and contribution plots for batch 51.

Figure 12. Univariate control chart of the ingredient B trajectory for batch 51.

Figure 13. Predicted versus actual viscosity using the MPLS model.

included in any sort of statistical model. Thus, familiarity with both the underlying emulsion polymerization chemistry and the manufacturing process is required to successfully apply these multivariate techniques.

5.3. Fault Detection. In industrial practice, classical univariate control charts of the quality variables are generally used to detect out-of-control situations. The process engineers then attempt to find assignable causes based on their prior experience and inspection of the process variables. However, MPCA control charts provide us with a much greater capability of fault detection when an unusual event is observed. Figure 11 depicts the fault detection and diagnostic capabilities of MPCA techniques for an abnormal batch (no. 51). The SPE plot shows that the process was out of control, especially during the initial portion of the batch. The contribution plot of the SPE was generated by interrogating the model at observation 8. This plot reveals to us the variables most likely responsible for this process upset. From the relative magnitude of this bar graph, we found that ingredient B (variable 5) was mostly responsible for batch 51 going out-of-control early on in the batch. The corresponding univariate SPC chart for this ingredient is shown in Figure 12. This result expedited the detection of the root cause, which was traced to a likely concentration mischarge of another ingredient. The viscosity of this batch was eventually found to be outside our manufacturing specifications. Such “contribution plots” can be obtained by analyzing the MPCA model at the instant an upset is detected.
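One common way to build such a contribution plot is to split the SPE of eq 2 at the chosen observation k into its J per-variable terms, (x_ijk - x̂_ijk)². The sketch below assumes this definition; the exact calculation performed by the software used in this work is not described here. The function name and the synthetic upset are hypothetical.

```python
import numpy as np

def spe_contributions(x_row, P, J, K, k):
    """Contribution of each variable j to the SPE at interval k for one batch.

    x_row : (J*K,) autoscaled, unfolded batch (column index = j*K + k)
    P     : (J*K, R) orthonormal MPCA loadings
    """
    resid = (x_row - P @ (P.T @ x_row)).reshape(J, K)
    return resid[:, k] ** 2

# Synthetic model of "normal" batches (hypothetical sizes).
rng = np.random.default_rng(2)
I, J, K, R = 30, 5, 12, 2
Xs = rng.normal(size=(I, J * K))
Xs -= Xs.mean(axis=0)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
P = Vt[:R].T

# New batch with an artificial upset in variable 3 during the first 4 intervals.
x_new = rng.normal(size=J * K)
x_new[3 * K : 3 * K + 4] += 6.0

contrib = spe_contributions(x_new, P, J, K, k=2)
suspect = int(np.argmax(contrib))   # index of the largest bar
```

The entries of `contrib` are the bar heights of the contribution plot at observation k, and they sum to the SPE at that observation.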

These plots may not explicitly reveal the cause of the event, but they point to the group of process variables that are no longer consistent with normal operating conditions. This focuses the attention of the operators/engineers and allows them to use process knowledge to deduce the possible causes. In all cases thus far, MPCA technology has been able to (1) detect potential process abnormalities, (2) determine the time an abnormal event occurred, and (3) indicate the likely variable or variables which caused the abnormality. Furthermore, fault detection is both simpler and easier when one monitors a few multivariate, instead of numerous univariate, control charts.

6. Product Quality Analysis

While the preceding MPCA analysis deals with just the process variables, MPLS methodology examines both the process and the product quality variables with the intent of developing an empirical model relating the two. Using the entire domain of past statistically similar batches, we developed an MPLS model relating the process variable trajectories to product viscosity.

6.1. Product Quality Prediction. The MPLS model, based on all normal batches, was able to predict the product viscosity for batches that were used to generate the model and for subsequently produced batches having trajectories similar to those in the historical data set. Figure 13, a plot of the predicted versus the observed product viscosity, shows the predictive capability of this model. The viscosity predictions fall within the error tolerance of the viscosity measurement. The model predictions are unreliable for batches that are statistically dissimilar to those used to generate the MPLS model. The current variability in this product’s viscosity occasionally exceeds its sales specifications. When an


Figure 14. Process contribution to product viscosity variability based on the MPLS model (C, calculated variables; HR, heatremoval variables).

off-spec batch is produced, prime storage tank space must then be reallocated to hold this material for later adjustment or disposal. The value in using this MPLS model for quality prediction is that information regarding the final product viscosity is now available several hours earlier than actual product quality measurements. More timely detection of off-spec batches leads to an improvement in our work-flow practices. These benefits include more efficient storage tank utilization and more responsive adjustments to production schedules, if necessary, to satisfy customer demands.

6.2. Improved Process Insight. By interrogating the MPLS model, we can also find the process variables which contribute primarily to the variation in product quality. A Pareto diagram indicating the relative contribution of each process variable to the final product viscosity is shown in Figure 14. This figure shows that variations in the ingredient A and heat-removal-related variables, HR, are primarily associated with viscosity variability. The calculated variables, C, in Figure 14 were derived from a prior knowledge of the effect of the process variables on the product performance. The relative contribution of these calculated variables to product viscosity signifies the important role process knowledge plays in determining product performance.

Figure 14 demonstrates one additional advantage that multivariate methodologies offer over their univariate counterparts. Traditionally, plant personnel utilized just the reactor agitator load signal to estimate final product viscosity. This signal, by itself, does not always provide accurate viscosity predictions. Figure 14 shows that the combination of the first 10 process variables provides more information than just the agitator signal alone. In fact, the agitator load is near the bottom of the Pareto diagram list. Utilizing empirical techniques coupled with process knowledge also results in more accurate viscosity predictions.
In addition to the contribution plots, one may examine the plot relating the scores of the reduced process space to the corresponding scores of the quality space, commonly known as a t versus u plot. This plot for the first principal component is shown in Figure 15 for all 43 statistically normal batches. Similar plots can be generated for subsequent principal components. The parity line indicates a linear fit. Although there is some scatter, this plot indicates that the underlying relationship between the scores is linear. Thus, a linear MPLS model structure is sufficient to explain most of the correlation between the process and quality variables. Figure 15 also provides us with further process insight. This product is made in a multiproduct batch reactor. Depending upon customer demand, this product is made at two different positions in our product

Figure 15. Quality space scores versus process space scores plot.

sequence, which naturally results in somewhat different reactor processing conditions. In Figure 15, batches are classified into these two groups: A denotes one position in our product sequence and B denotes the other. The plot clearly indicates that (1) the two different sets of operating conditions produce two different clusters and (2) the trajectories of the available process measurements contain enough information to model the overall variability in viscosity, as noted by the linear fit in Figure 15. We had suspected that these conditions might contribute to viscosity variability but, due to a variety of confounding factors, were unable to easily identify the magnitude of their effect. Upon further investigation, we noted that operating condition A produces a lower viscosity product; operating condition B produces a higher viscosity product. When viewing the data in this fashion, it became apparent that to further understand the effect of processing conditions on product viscosity, we needed to develop two MPLS models, one for each subcluster. Interrogation of these separate models identified the true nature of the product quality variability within a cluster. With this further process insight, we examined variables primarily associated with reducing the number of (1) low-viscosity batches produced under processing condition A and (2) high-viscosity batches produced under condition B. This analysis resulted in recommended changes to the existing control strategy to produce a product with more consistent viscosity. It should be noted that the overall MPLS model provided us with information regarding the variability from one cluster of operating conditions to another. It was necessary to develop separate models to identify the variables associated with viscosity variability within a cluster.
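The t versus u comparison described for Figure 15 can be illustrated with a small sketch. The Python fragment below computes the first-component process-space scores t and quality-space scores u for a PLS model with a single response, then checks how linear their relationship is and whether two labeled groups separate along t. It is not the BatchSPC implementation; the function names and the six-batch data set are hypothetical, and the data are only mean-centered for brevity, whereas a real unfolded batch matrix would normally also be scaled.

```python
import math


def mean_center_columns(rows):
    """Subtract each column mean from every entry in that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - m for value, m in zip(row, means)] for row in rows]


def first_pls_scores(x_rows, y_values):
    """Return (t, u) for the first latent variable of a single-response PLS model."""
    x = mean_center_columns(x_rows)
    y_mean = sum(y_values) / len(y_values)
    y = [v - y_mean for v in y_values]
    # Weight vector w is proportional to X'y, normalized to unit length.
    w = [sum(row[j] * yi for row, yi in zip(x, y)) for j in range(len(x[0]))]
    norm = math.sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]
    # Process-space score t = Xw.
    t = [sum(xij * wj for xij, wj in zip(row, w)) for row in x]
    # Quality loading q = y't / t't; with one response, u = y / q.
    q = sum(yi * ti for yi, ti in zip(y, t)) / sum(ti * ti for ti in t)
    u = [yi / q for yi in y]
    return t, u


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical data: six batches, two unfolded process columns, one viscosity value each.
x_rows = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [2.0, 3.1], [2.1, 2.9], [1.9, 3.2]]
viscosity = [10.0, 10.5, 9.8, 14.9, 15.2, 15.0]
labels = ["A", "A", "A", "B", "B", "B"]

t, u = first_pls_scores(x_rows, viscosity)
r = pearson(t, u)
group_mean = {
    g: sum(ti for ti, lab in zip(t, labels) if lab == g) / labels.count(g)
    for g in ("A", "B")
}
print(f"t-u correlation: {r:.3f}")
print(f"mean t score, group A: {group_mean['A']:.3f}, group B: {group_mean['B']:.3f}")
```

With a single quality variable, u is simply the centered response rescaled by the loading q, so a t versus u plot carries the same information as a plot of t against centered viscosity.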
While the preceding analysis was focused on factors affecting the final product viscosity, we can again utilize this technology to estimate the outcome of a new batch from measurements obtained at any point in time during a run. Figure 16 illustrates this point with a representative batch from each of the subclusters of Figure 15. For each point in time in Figure 16, we do the following: (1) collect process measurements and input them into the MPLS model and (2) run the model and obtain the predicted viscosity at the end of the batch. Continuing this procedure throughout the batch duration yields the final viscosity trajectory (shown with confidence limits). After the batch is completed, we compare the model prediction with the actual product quality values as shown in Figure 16. Note that the viscosity predictions for both batches in this figure were based upon the MPLS model generated from all statistically normal batches.
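The two-step procedure above can be sketched in a few lines of Python. A model built on complete batches needs values for the sample points that have not yet occurred, and the text does not say how these were supplied here. This sketch assumes the simplest option, that the remainder of the batch follows the mean trajectory (zero future deviation), so only elapsed sample points contribute. The function names, coefficients, mean trajectory, and readings are all hypothetical, and the confidence limits shown in Figure 16 are not computed.

```python
def predict_final_quality(measured, mean_trajectory, coefficients, y_mean):
    """Predict end-of-batch quality from the sample points collected so far.

    measured: list of rows, one per elapsed sample point, each a list of
        process variable readings.
    mean_trajectory, coefficients: full-length lists of rows with the same
        layout as a completed batch.
    Future deviations from the mean trajectory are assumed to be zero, so
    sample points that have not occurred yet add nothing to the prediction.
    """
    if len(measured) > len(mean_trajectory):
        raise ValueError("more sample points than the model was built for")
    prediction = y_mean
    # zip stops at the last elapsed sample point, which excludes the future.
    for row, mean_row, beta_row in zip(measured, mean_trajectory, coefficients):
        for value, mean_value, beta in zip(row, mean_row, beta_row):
            prediction += beta * (value - mean_value)
    return prediction


def prediction_profile(batch, mean_trajectory, coefficients, y_mean):
    """Re-run the prediction after each new sample point of a batch."""
    return [
        predict_final_quality(batch[: k + 1], mean_trajectory, coefficients, y_mean)
        for k in range(len(batch))
    ]


# Hypothetical model: three sample points, two variables per sample point.
mean_trajectory = [[50.0, 1.0], [55.0, 1.2], [60.0, 1.5]]
coefficients = [[0.5, 10.0], [0.5, 10.0], [1.0, 20.0]]
y_mean = 100.0

# Hypothetical batch that drifts above the mean trajectory as it proceeds.
batch = [[50.0, 1.0], [57.0, 1.2], [61.0, 1.6]]

profile = prediction_profile(batch, mean_trajectory, coefficients, y_mean)
for k, estimate in enumerate(profile):
    print(f"after sample point {k}: predicted final viscosity {estimate:.1f}")
```

Because the early predictions rest mostly on the assumed mean trajectory, estimates made near the start of a batch sit close to the average quality and move toward the batch-specific value only as measurements accumulate.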


Figure 16. Viscosity estimation profiles for low- and high-viscosity batches.

Each batch in Figure 16 produced product that was within specification but at opposite extremes of our viscosity specification. The initial viscosity estimates for each batch start far from their final values but end up agreeing quite well with the actual values. These typical results indicate that, with our current measurement technology, accurate viscosity predictions are available only near the end of the batch. This prediction is still several hours earlier than our current work-flow process permits. More importantly, we noted that one cannot distinguish between a high- and a low-viscosity batch, even with multivariate methods, until after some initial portion of the reaction. Afterward, the predictions diverge to their respective high and low values. This observation has two possible explanations: either (1) there are genuine differences in processing conditions between the two batches at the outset that we cannot detect with our current set of recorded variables, or (2) batch differences occur only after some initial common processing period. We are utilizing the information contained in figures such as Figures 14-16 as the basis for increased process insight into this manufacturing process.
7. Implementation Considerations
The analysis discussed in this paper utilized actual plant data from past (historical) batches. One of the longer term goals of this effort is to provide our manufacturing personnel with the capabilities to perform batch monitoring as well as product quality prediction. This analysis can be done on batches in progress (6) or, at the very least, at the end of each batch. To facilitate on-line/at-line process monitoring via multivariate methods, several challenges need to be addressed. First, the system architecture to extract process data from our control systems must be installed. A simultaneous effort to provide these capabilities is underway in our batch plants as was previously executed for our continuous facilities.
Second, process data must be extracted seamlessly from these data historians and preprocessed, if necessary. Third, the ability to run the multivariate analysis software in real-time mode must be provided. Finally, our manufacturing personnel must be trained in the use of these technology tools to yield the benefits that this technology offers.
8. Concluding Remarks
Analyzing this batch polymerization process via multivariate statistical techniques benefits us in several ways. First, these techniques, coupled with process knowledge, provide exceptional process monitoring capability. These methods are capable of detecting and diagnosing process upsets in a faster, more efficient manner than conventional methods. Second, MPLS extensions allow early detection of off-spec batches, thereby improving our work-flow processes. Third, MPLS technology successfully confirmed that the reactor operates in two different modes or states of operation, thereby producing products in two different viscosity ranges. The improved process knowledge garnered by developing and analyzing MPLS models for each mode of operation has led us to recommend how this process can be better controlled. Implementation of these technologies at Air Products is currently underway. The ultimate goal of this effort is to minimize both process and product quality variability, reducing manufacturing cost and providing our customers with enhanced product consistency.
Acknowledgment
We acknowledge Cajetan F. Cordeiro, Carlos A. Valenzuela, and Theresa M. O. Liu for their help and advice during this work. A special thanks goes to Professor J. F. MacGregor and Dr. T. Kourti from MACC (McMaster Advanced Control Consortium) for their insightful discussions and BatchSPC software. Finally, we thank the reviewers for their helpful comments.
Literature Cited
(1) Kourti, T.; MacGregor, J. F. Multivariate SPC Methods for Process and Product Monitoring. J. Qual. Technol. 1996, 8, 409-428.
(2) Nomikos, P.; MacGregor, J. F. Monitoring of Batch Processes Using Multi-way Principal Component Analysis. AIChE J. 1994, 40, 1361-1375.
(3) Wold, S.; Geladi, P.; Esbensen, K.; Ohman, J. Multi-way Principal Components and PLS Analysis. J. Chemometrics 1987, 1, 41-56.
(4) Nomikos, P.; MacGregor, J. F. Multivariate SPC Charts for Monitoring Batch Processes. Technometrics 1995, 37, 41-59.
(5) Gallagher, N. B.; Wise, B. M.; Stewart, C. W. Application of Multi-way Principal Components Analysis to Nuclear Waste Storage Tank Monitoring. Comput. Chem. Eng. 1996, 20, S739-S744.
(6) Nomikos, P.; MacGregor, J. F. Multi-way Partial Least Squares in Monitoring Batch Processes. Chemo. Intell. Lab. Sys. 1995, 30, 97-108.
(7) Kourti, T.; Nomikos, P.; MacGregor, J. F. Analysis, Monitoring and Fault Diagnosis of Batch Processes Using Multi-block and Multi-way PLS. J. Process Control 1995, 5, 277-284.
(8) Albert, S.; Martin, E. B.; Montague, G. A.; Morris, A. J. Multivariate Statistical Process Control in Batch Process Monitoring. 13th World IFAC Conference, San Francisco, CA, 1996.
(9) Kosanovich, K. A.; Dahl, K. S.; Piovoso, M. J. Improved Process Understanding Using Multiway Principal Component Analysis. Ind. Eng. Chem. Res. 1996, 35, 138-146.
(10) Dong, D.; McAvoy, T. J. Batch Tracking via Nonlinear Principal Component Analysis. AIChE J. 1996, 26, 2199-2208.
(11) Kourti, T.; Lee, J.; MacGregor, J. F. Experiences with Industrial Applications of Projection Methods for Multivariate Statistical Process Control. Comput. Chem. Eng. 1996, 20, S745-S750.
(12) Neogi, D.; Schlags, C. E. Application of Multivariate Statistical Techniques for Monitoring Emulsion Batch Processes. Proc. Am. Control Conf. 1997, 2, 1177-1181.
(13) Jackson, J. E. A User’s Guide to Principal Components; Wiley-Interscience: New York, 1991.
(14) Kaspar, M. H.; Ray, W. H. Dynamic PLS Modeling for Process Control. Chem. Eng. Sci. 1993, 48, 3447-3461.

(15) Wold, S. Cross-Validatory Estimation of Number of Components in Factor and Principal Component Models. Technometrics 1978, 20, 397-405.
(16) Lorber, A.; Wangen, L.; Kowalski, B. A Theoretical Foundation for the PLS Algorithm. J. Chemometrics 1987, 1, 19-31.
(17) Geladi, P.; Kowalski, B. Partial Least Squares Regression: A Tutorial. Anal. Chim. Acta 1986, 185, 1-17.
(18) Hoskuldsson, A. PLS Regression Methods. J. Chemometrics 1988, 2, 211-228.
(19) Kresta, J.; MacGregor, J. F.; Marlin, T. E. Multivariate Statistical Monitoring of Process Operating Performances. Can. J. Chem. Eng. 1991, 69, 35-47.

Received for review April 20, 1998
Revised manuscript received July 13, 1998
Accepted July 29, 1998
IE980243O