A Systematic Methodology for Comparing Batch ... - ACS Publications

Apr 20, 2016 - A Systematic Methodology for Comparing Batch Process Monitoring. Methods: Part I Assessing Detection Strength. Tiago J. Rato,. †,#...
0 downloads 0 Views 1MB Size
Article pubs.acs.org/IECR

A Systematic Methodology for Comparing Batch Process Monitoring Methods: Part IAssessing Detection Strength Tiago J. Rato,†,# Ricardo Rendall,†,# Veronique Gomes,‡ Swee-Teng Chin,§ Leo H. Chiang,§ Pedro M. Saraiva,† and Marco S. Reis*,† †

CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra Portugal CITAB-Centre for the Research and Technology of Agro-Environmental and Biological Sciences, University of Trás-os-Montes e Alto Douro, Vila Real, 5001-801, Portugal § Analytical Tech Center, Dow Chemical Company, Freeport, Texas 77541, United States ‡

ABSTRACT: A significant number of batch process monitoring methods have been proposed since the first groundbreaking approaches were published in the literature, two decades ago. The proper assessment of all the alternatives currently available requires a rigorous and robust assessment framework, in order to assist practitioners in their difficult task of selecting the most adequate approach for the particular situations they face and in the definition of all the optional aspects required, such as the type of preprocessing, infilling, alignment, etc. However, comparison methods currently adopted present several limitations and even some flaws that make the variety of studies available not easily generalizable or, in extreme situations, fundamentally wrong. Without a proper comparison tool decisions are made on a subjective basis and therefore are prone to be at least suboptimal. In this article we present a structured review of comparison methods and figures of merit adopted to assess batch process monitoring approaches, analyzing their strengths and limitations, as well as some common missuses. Furthermore, we propose a comparison and assessment methodology (CAM) and figures of merit that should be adopted in order to circumvent some of the limitations of the current procedures. We begin by addressing, in this article, the analysis of the methods’ “detection strength”. Detection strength regards the ability to correctly detect abnormal situations without resulting in excessive false alarms. We describe in detail how the comparison framework is implemented and illustrate its application in two case studies, encompassing a rich variety of testing scenarios (99 000 batch runs) and monitoring methods (2-way, 3-way, and dynamic methods, amounting to a total of 60 different combinations of methods and their variants). From this analysis, it stands out that the CAM methodology is suitable for comparing different monitoring methods, even when the number of methods and variants is very large. It provides simple, informative, and statisticalbased metrics to assess a given method’s performance and is able to quantify the added value of alternative approaches. When applied to the two case studies considered, 2-way methods (with batch-wise unfolding) combined with missing data (MD) infilling and dynamic time warping (DTW) were found to provide adequate solutions. If a more parsimonious model is required, dynamic methods such as autoregressive principal components analysis (ARPCA) provide good alternatives and Tucker3 with current deviations (CD) infilling also presents an interesting performance for some fault scenarios. In general, no method can claim absolute supremacy in all faulty scenarios.



INTRODUCTION Batch processes are present in virtually all industrial sectors. They are often associated with the production of high addedvalue products and can span different time scales: from seconds to minutes in semiconductor manufacturing, hours in chemical and petrochemical industries, days in pharmaceutical units, and even weeks to years in the food and beverage sector (wine aging being one example of an industrial process with a long batch cycle time). The dominating role played by this type of processes in the last 30 years arises mainly from their higher operational flexibility and production scalability. But, when associated with their intrinsic complexity (nonstationary behavior coupled with additional characteristics such as multistage operations), this otherwise desirable flexibility also © 2016 American Chemical Society

creates plenty of opportunities for undesirable sources of variation to enter the process and interfere with its normal course. There is, therefore, a demand for effective monitoring methodologies specifically designed to handle batch processes (hereafter referred as batch process monitoring approaches, or simply by its initials, BPM). In the context of BPM, there are a significant number of approaches which have been put forward over the years, with some recent acceleration pattern. Since the appearance of the Received: Revised: Accepted: Published: 5342

December 19, 2015 April 20, 2016 April 20, 2016 April 20, 2016 DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research pioneering approaches1,2 based on the unfolding of the 3-way batch data matrix into a 2-way matrix by preserving the batch mode, followed by the application of principal component analysis (PCA) or partial least-squares (PLS), which are still being actively researched and applied,3−9 others have been proposed to address some of their shortcomings, such as the adaptive monitoring scheme proposed in ref 10. However, these alternative approaches, even though able to simplify the model structure, do it to an extent in which it may not be possible to capture properly all the cross- and autocorrelations present in the data.11−13 Dynamic methods, such as dynamic PCA and dynamic PLS, when applied to monitoring batch operations,14−16 fall somewhat in between batch-wise unfolding and variable-wise unfolding: the model is constructed using lagged measurements to account for some of the cross- and autocorrelation, as in batch-wise unfolding, but the unfolding is still made variable-wise. Other classes of methods directly address the 3-way tensor structure of batches × variables × time (with dimensions, I × J × K, respectively) generated in batch processes, using multiway methods,17 namely Tucker3 and PARAFAC.18 Independent component analysis (ICA) has also been applied to online BPM for extracting independent components under non-Gaussian distributions19 and, more recently, kernel methods have also gained some interest for addressing nonlinear effects.19−22 The available literature on BPM is indeed vast and covers many other areas than batch unfolding and modeling. One important topic is data alignment and synchronization, for which several approaches have been proposed, such as the use of indicator variables (IV)2 and dynamic time warping (DTW),23,24 among others.25,26 Another topic regards variable/trajectory scaling and centering,2,27,28 which may also have a significant impact on the methods performance. There are also different ways to compute the monitoring statistics18,29 and their control limits. With all the aforementioned methods and variants, practitioners face today the challenging task of selecting the best methodologies and all the associated configuration options that best suit their own particular processes. In this regard, the technical literature has been much less generous in providing clear guidelines on which approaches to follow and when. Even though some comparison studies have been conducted in the past,11,30−32 we can verify that no clear agreement currently exists on which figures of merit should be adopted to compare the proposed approaches in a rigorous and unbiased way. This lack of agreement and consistency in the way batch process monitoring methods are assessed makes the comparison of the proposed approaches a virtually impossible task, as each author tends to adopt the approach of its preference without necessarily taking into account the requirements that would enable more comprehensive comparison studies. An analysis of the literature shows that the following main categories of comparison paradigms can be found: graphical evaluation, conformance analysis, detection strength, and detection speed. These will be now briefly addressed, along with their main features, advantages, and limitations. Graphical Evaluation. This approach consists of comparing the behavior of the monitoring statistics for the proposed method against that obtained with a reference method. This approach is widely adopted in practice when proposing new methods,15,16,19 usually encompassing a small set of examples where some positive features of the new methods are illustrated (speed of detection, magnitude of the statistics under faulty conditions, etc.). This is a fast and easy approach to present

results of the new method. However, its qualitative nature makes it difficult to reuse the analysis in the future. It also has a case-based nature, and the results are hardly generalizable to different contexts, other than the ones depicted. Another disadvantage is that only a reduced set of methods can be compared in this way. Finally, being graphical, it does not provide a formal way to accommodate batch variability in the assessment of the numerical significance of the performance differences between different methods. Still, it is an acceptable approach to illustrate the way the new method works with the support of some graphical illustrations, as long as the characterization does not end there. Conformance Analysis. Here the purpose is to check whether the monitoring method conforms to certain assumptions or settings established a priori. Most often this analysis involves comparing the observed false alarm rate with that established initially when defining the significance level for the control limits.1 In principle, both should be the same within the applicable uncertainty bounds, if all the hypothesis considered in the derivation of the control limits are valid. However, it turns out that there are usually deviations to such hypothesized scenarios or even simplifications made in the derivations, which lead to tangible differences between the observed false alarm rate and the expected (or theoretical) false alarm rate set a priori by the significance level. This approach is useful for assessing the theoretical consistency of the proposed monitoring schemes, but does not address their detection ability, being therefore of limited value when used in an isolated fashion. Therefore, it is usually adopted in complement to other figures of merit. Detection Strength. One key aspect of monitoring methods is their ability to correctly detect abnormal situations, without incurring in excessive false alarms. We call this aspect detection strength. It is usually characterized by the true positive rate (TPR, also known as true detection rate, TDR33), defined by the ratio between the number of detections in the set of observations corresponding to abnormal operation (called true positives, TP, because they correspond to faulty states) and the total number of observations in such set (i.e., the total number of positives, P, or faulty observations, which must be equal to the sum of the number of correct detections, TP, and the number of missed detections or false negatives, FN).34,35 TPR =

TP TP = P TP + FN

(1)

Sometimes the missed detection rate, also known as overall type II error, is also reported (MDR = 1 − TPR), especially with the purpose of underlining the abnormal events still missed by the methodology.36 The TPR is also known as the method’s sensitivity. Its computation depends on the significance level used to establish the control limits, that is, the false positive rate (FPR, also known as false alarm rate, FAR, or overall type I error), defined by, FPR =

FP FP = N FP + TN

(2)

where FP represents the false positives and N is the total number of negatives (i.e., the sum of false positives, FP, and true negatives, TN). One common mistake found in practice, consists of assuming that the method’s FPR coincides with the theoretical significance level set a priori to compute the control limits. 5343

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Detection strength and detection speed are usually required to make a thorough assessment of the monitoring performance of a new or existing method. Figure 1 illustrates three scenarios

This value is used as input in the analytical formulas for the control limits, but usually does not lead to exactly the same level of observed FPR, due to the reasons detailed above. Therefore, this quantity should rather be considered as a tuning parameter that should be manipulated in order to achieve a given target for the observed FPR. A good strategy for implementing this tuning task consists in training the monitoring model with a reference normal operating conditions (NOC) data set (the training set) and use a validation data set to compute the observed FPR.37 Failure to set all the observed FPR for the different monitoring schemes under comparison to the same level renders the comparison of the associated TPR biased and therefore incorrect. This approach has the advantage of being quantitative and free from distributional assumptions. The main disadvantages are that it is valid just for a single FPR, which requires a (often cumbersome) process of tuning the control limits for all methods involved; is rather prone to error, if the aspect of the similar FPR is not properly considered; is case dependent (i.e., also depends on the particular data set analyzed); and it does not measure performance variations caused by the process natural variability and randomness. Another approach used in the context of assessing the detection strength, which does incorporate some features of the natural variability and randomness of the process, consists in computing the p-value for the change in the level of the monitoring statistics associated with abnormal situations.18 However, this approach is seldom used in practice, as it is strongly dependent on distributional assumptions, which are often complex to obtain and not accurate enough to make reliable computations. Detection Speed. The other key aspect for characterizing the performance of process monitoring techniques regards the speed in signaling an abnormality, after it occurs. This responsiveness is usually assessed through the number of observations (runs) that, on average, the method takes to detect an abnormal event of a given type and magnitude, i.e., the average detection delay. The figure of merit usually adopted for characterizing the detection speed is the average run length (ARL), or the related quantity average time to signal (ATS).32,34,38−41 These two quantities are related by ATS = ARL·h, where h is the sampling interval. The most common type of perturbation is the step change, but others can also be adopted, such as ramps, sinusoids, the level of autocorrelation, etc.34 These performance metrics are intuitive and have found wide acceptance in practice. However, some criticisms have also been raised. First, it requires that all methods under comparison show the same ARL(0), that is, ARL in the absence of a fault. This requirement implies that, once again, all the techniques must be preliminarily tuned and the analysis is only valid for a given ARL(0). Second, the variance of the ARL is very high (of the order of magnitude of the mean) and so the average is a weak summary of the spread of values that typically can be found in the conditions tested. Furthermore, for continuous stationary processes one may assume ergodicity and the speed of response of the monitoring method is, under this hypothesis, independent of the particular time where the fault occurs. However, in batch processes, faults can occur at different stages and in different phases of the batch and, given the complexity and nonstationary nature of this type of processes, the run length may depend on the time where the fault happens. Some faults may be readily detected if they occur in specific stages but may be difficult to detect or undetectable in others. The ARL is not able to capture this behavior.

Figure 1. Three scenarios of performance regarding detection strength and detection speed: (a) a monitoring method with high detection strength and speed; (b) a method with a good detection speed but a low detection strength; (c) a method with a low detection speed and a high detection strength.

regarding different combinations of performance regarding detection strength and detection speed: (a) the method shows good performance in both aspects; (b) the detection speed is good (low ARL), but the change in the statistic is rather weak, leading to many missing detections (low TPR); (c) the ARL is high (more delay in detecting a fault), but the TPR is also high, indicating a consistent signaling of an abnormal situation. To circumvent the limitations of the above-mentioned figures of merit, we present in this article a framework for conducting rigorous and robust comparisons among existing batch process monitoring methods or to assess the added value of new approaches and modifications to be proposed in the future. In this context, we put forward figures of merit and comparison procedures that mitigate or circumvent some of the limitations presented by the current approaches. This article is dedicated to the analysis of the monitoring performance in the dimension of “detection strength”. In a sequel publication, we will address the dimension of “detection speed”. This article is structured as follows. In section 2, we propose the figures of merit for detection strength to be used in the comparison and assessment framework and describe in detail 5344

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 2. Examples of ROC curves and the associated area under the curve (AUC): (a) the case of a perfect classifier; (b) the case of a typical classifier where some samples are misclassified; (c) ROC curve for a classifier that attributes the class label by chance, with a 0.5 probability for each category.

alarms. It is fundamentally related to the capability of a given process monitoring methodology to effectively segregate between observations arising from NOC and those collected under abnormal situations. In this context, process monitoring methods can be interpreted as binary classification approaches that produce, at regular time intervals, one of two possible outcomes concerning the process operation status: {normal, abnormal} or {good, faulty}. More precisely, process monitoring methods fall under the category of the so-called one-class classification methods,43 a particular type of classification approaches where the classification of future samples is to be made, starting from the availability of a preliminary data set containing information for just one of the classes under analysis (in this case, the one regarding normal operation conditions). Recast as a classification problem, process monitoring can then benefit from the methodologies developed to assess the performance of classification algorithms. In this context, the classification performance of binary classifiers can be characterized, in a robust way, through the methodology based on the construction of the ROC curves. In simple terms, a ROC curve is the locus of the TPR vs FPR obtained for a given data set (in the present case, for a given batch). For each significance level (α), the control limits are drawn, and the TPR(α) and FPR(α) pair is obtained. This represent a single point in the ROC curve and therefore in order to obtain the full ROC curve, the significance level is varied from very low values (in which case the FPR and TPR → 0, as in this situation the control limits are too wide and virtually every observation, faulty or not, lies inside the region delimited by them) to very high values (for which FPR and TPR → 1, as the limits would be so low that any observation lies outside them). For a perfect classifier, that is, one that is able to segregate completely the observations from the two classes (i.e., normal/abnormal operation conditions), as soon as the control limit lowers until a point where some false detection occurs, all the existent positives are detected. In terms of the ROC curve this means that, in these circumstances, as soon as FPR > 0 ⇒ TPR = 1. This situation is depicted in Figure 2a, and we can verify that the ROC curve consists of an horizontal line located at 1. For facilitating the assessment and comparison in ROC studies, the ROC curves are summarized by the Area under the Curve (AUC). For the perfect separation case, the AUC reaches the maximum (which is 1). In practice values lower than this limit are likely to be found (Figure 2b), due to some superposition between classes (in the case of process monitoring, the superposition is between observations arising from normal and abnormal situations). The minimum admissible value for the

the methodology followed in it. Then, in section 3 we present the large scale comparison study in which the framework will be applied, which contemplates a wide variety of testing conditions (99 000 batch runs) and a diversity of batch process monitoring methods and associated options (60 different methods and variants). In section 4 we present the results obtained, which are discussed with more detail in section 5. Finally, we conclude with a summary of the main results and contributions of this work in section 6.

2. COMPARISON FRAMEWORK In this section we describe in detail the comparison and assessment methodology (CAM) proposed in this article. We begin with the fundamental aspect of selecting an adequate figure of merit to characterize the detection strength of batch process monitoring methods. Then, we address the methodological aspects of the framework, including the way simulations are conducted and results compiled and processed in order to generate valuable information, namely through the computation of the key performance indicator (KPI), based on which rankings of methods in specific testing scenarios and other performance summaries can be prepared. The proposed approach was designed to circumvent or mitigate some of the disadvantages and limitations of the current approaches. The main features of the proposed comparison framework are as follows: (1) They are based on a quantitative, robust, errorproof and results-oriented figure of merit for the detection strength of batch process monitoring methods: the receiver operating characteristic (ROC) curve.34,42 (2) Even though real world industrial data sets can be incorporated and analyzed, the base CAM framework relies on simulated processes containing realistic components of natural variability, dynamics, nonstationarity, and nonlinearity, among other features. Each case study (a case study regards a different system or process) develops as a pyramid of results corresponding to different types of faults; each type of fault is simulated with different magnitudes and each fault/magnitude is replicated several times. This provides the basis for computing statistically valid comparative inferences about the performance of the methods. (3) The diversity of results is processed according to a welldefined workflow, directly aligned with the relative performance of the methods, such that in the end the results can be properly interpreted by practitioners. 2.1. Baseline Figures of Merit for Assessing the Detection Strength of Batch Process Monitoring Methods. Detection strength regards the ability to correctly detect abnormal situations, without incurring in excessive false 5345

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

current observation with probability 0.5 (i.e., throwing an unbiased coin to perform the classification). Thus, an AUC = 0.5 constitutes the lower bound for admissible performance of a monitoring methodology. 2.2. Comparison and Assessment Methodology (CAM) and Key Performance Indicators (KPI). Having described the fundamental quantity (figure of merit) based on which the comparison and assessment methodology is built (the ROC and the associated AUC), we address now the CAM’s structure and workflow. The framework is composed by five main stages: I. definition of testing conditions and methods/versions to analyze; II. generate training data sets to estimate the monitoring models and testing data sets for all the conditions to be considered; III. run the selected process monitoring approaches over all conditions and replicates and compute the ROC curves and the associated AUCs; IV. process the results and compute KPI; V. analyze the results at different levels of aggregation. In the remainder of this subsection, more details are provided regarding each one of these CAM’s stages. I. Definition of Testing Conditions and Methods/Versions to Analyze. In this first stage, the analyst should clearly specify the scope of the study. This will translate into the selection of the testing conditions and methods to be implemented. In a simulation oriented work, CAM considers the hierarchy of conditions to be defined as depicted in Figure 3.

AUC of a classifier is 0.5 and corresponds to the situation depicted in Figure 2c, where the classification of each sample is made by allocating at random one of the labels to each observation with a probability of 0.5 for each label. In batch process monitoring, the decision of labeling the current observation as normal or abnormal is often made on the basis of the simultaneous analysis of the value of two complementary monitoring statistics, T2 and Q, and the associated control limits (see section 3.1 for more details), each one having an AUC for a given batch. This situation is handled in this paper by considering the maximum of the AUCs obtained with the T2 and Q statistics for a given method 2

i, in situation/batch j: AUCi,j = max {AUCi,jT , AUCi,jQ}. (Note: j is a compounded index, which specifies the particular combination of case study (cs =1:CS)/fault ( f = 1:F)/ magnitude (m = 1:M) and replicate (r = 1:R); that is, j ≡ CS × F × M × R). By using the maximum AUC only the dominant monitoring statistic is considered for characterizing the performance of a method. This is based on the observation that a given fault is usually more easily detected by either the T2 or the Q statistic. As only one of them is required to trigger an alarm, the method can be deemed as good as the best discriminating monitoring statistic, which is conveyed by the maximum AUC. Assessing the detection strength of process monitoring methods using the ROC/AUC methodology brings a number of advantages over other alternative approaches. It avoids the problem of selecting a specific significance level (or false alarm rate) for computing the detection performance, as well as the limitation of using just that one particular value. This methodology also eliminates the misuse of the theoretical significance levels for setting the control limits in the comparison studies, instead of the true, observed significance levels. The last approach, which is the correct one, involves a lengthy tuning process until the same false alarm rates are finally achieved for all methods under comparison, which can be an issue when several methods are being considered. In summary, using the ROC approach has the advantages of extending the analysis to all significance levels (and not just one) and avoiding the tedious tuning of all methods to the same false alarm rate (and the larger data sets with normal operation data that would be required for performing this task in a more rigorous way). It also is error proof and robust to misuse, and is directly related to the object of the studythe ability to separate abnormal from normal observations collected from the process. Another important aspect is that any future work can capitalize on previous comparison studies made on the same data sets, as the AUCs do not depend on the particular significance level considered in the control limits. Therefore, this figure of merit brings efficiency, objectivity, and consistency to the field. In the description of the CAM framework in the following subsection, each batch run or replicate corresponding to a particular combination of case study/fault/magnitude, will give rise to a single value of the AUC, representing the ability of a process monitoring methodology to separate observations arising from abnormal conditions from those collected under normal operation conditions. As mentioned above, the correct interpretation is that the higher the AUC is (the closer to 1) the better is the performance of the monitoring method. On the other hand, a value of 0.5 represents the same performance of a classifier that randomly attributes one of the two classes to the

Figure 3. Hierarchy of testing conditions considered in the CAM framework in a simulation oriented study.

Each case study regards a different system or process, thus with different dynamic models and operating conditions. For each case study (cs = 1:CS), several faults are considered ( f = 1:F(cs)), which are simulated with different levels of magnitude (m = 1:M(cs)). Finally, each fault/magnitude combination is replicated R times. Each replication includes independent realizations of all normal variability sources so that the effect of the process variability can also be accounted when assessing the monitoring methods’ performance. Only through replication it is possible to draw statistically valid inferences regarding the relative performance of the methods. For this reason it was contemplated in the CAM framework. However, this poses a disadvantage in regard to industrial scenarios and data sets where the replication of a batch run is usually infeasible, undermining the ability to estimate the variation of the 5346

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

established based on the AUC obtained in the replicate. This last issue could however be readily solved by taking the average ranking over all replicates. Still, the first limitation remains: the natural variability is not explicitly incorporated in the definition KPI. Score the methods according to their relative performance evaluated through pairwise statistical tests. This approach comprises two steps. In the first step, a paired t test is conducted between all the monitoring methods and their variants under analysis. The test statistic is given by t0 = D̅ /(sD/ R ), where D̅ is the sample average of the difference between the AUCs of two methods over R replicates (for a given combination of conditions cs/f/m), D1, D2, ..., DR, and sD is the sample standard deviation of these differences.44 From this statistical test, we can conclude about the statistical significance of the superiority of one method regarding the other in the testing condition cs/f/m, or if the difference is not significant, in which case they are declared similar in performance (the significance level adopted was 5%). In this way, the natural variability of the process present in the replicates is integrated in the methodology. The second step consists of counting the number of victories, losses, and ties, each method obtained in all pairwise tests conducted, and attribute a score: 2 for each victory, 1 for ties, and 0 for losses. This means that in each testing condition, one method can have a score between 0 (all losses) and 2 × (number of methods − 1) (all wins, except against himself). The procedure of attributing points for wins, ties and losses is similar, in spirit, to competition tournaments, where the winner has better performance in most situations. This procedure is consistent with the aim of comparison studies, which is to determine the best monitoring method, and facilitate the comparison by summarizing the performance of the methods (across case studies, faults, magnitudes) by an overall score, which can also be segmented for a more detailed analysis. As an example, if one were to compare 3 monitoring methods in a given test condition cs/f/m and one method presented a statistical superior performance (assessed by 2 pairwise comparisons), it would receive a score of 4 points. If the remaining 2 methods had similar performances, each one would receive a score of 1 point but if one of them was superior, it would receive 2 points whereas the other would have 0 points. This scoring system is also commensurable as the ranking approach proposed in the previous alternative, but is able to integrate variability in the computation of the KPI. Therefore, the total score obtained for each method in each testing condition (cs/f/m) is the KPI adopted in the comparison study. V. Analyze the Results at Different Levels of Aggregation. The final stage consists in analyzing the KPI at the level of each condition cs/f/m (the replicates level was aggregated into the scores obtained in step IV). Different strategies can now be followed, depending on the interests of the researchers and purpose of the study. They can use different levels of aggregation (by class of methods, by fault types, by magnitudes, etc.) and resort to different tools to perform the aggregation. In section 4, we present the aggregation workflow followed in our comparison study. The tools used were mainly of graphical nature, due to their highly informative and interpretative nature, while still being able to depict the variability of the results obtained. Among these, we have intensively used stratified box plots and multivary charts, among other less used tools.

monitoring methods’ performance. Thus, this stage can only be carried out in a simulation scenario. II. Generate Training Data Sets To Estimate the Monitoring Models and Testing Data Sets for All the Conditions To Be Considered. Here, all the testing conditions initially defined are simulated. This encompasses running a predefined number of NOC batches for estimating the monitoring models, as well as all the hierarchy of testing conditions depicted in Figure 3 (as will be described in the next section, all tested scenarios led to the generation of 99 000 simulated batches in the study there described). III. Run the Selected Process Monitoring Approaches over All Conditions and Replicates and Compute the ROC Curves and the Associated AUCs. In this stage, all of the different combinations of methods and their variants to be compared are implemented to the data sets generated during stage II. This encompasses the computation of the profile for the T2 statistic and the Q statistic for each batch run, data@(cs, f, m, r). Once these profiles are obtained, the associated ROC curves are computed, from which the AUC are finally obtained by numerical integration. As mentioned before, the AUC to be retained for each method corresponds to the maximum of the AUCs obtained with the T2 and the Q statistics. Thus, in the end of this stage, each simulated batch run give rise to an AUC score: data@(cs, f, m, r) → AUC@(cs, f, m, r). IV. Process the Results and Compute KPI. At the end of Stage III we have the figure of merit (AUC) computed for all the testing conditions. This information is available at the replicate level, the finer level of the testing conditions considered in the study (see the pyramid in Figure 3). The AUCs obtained for each method/variant over the set of replicates will now be aggregated to produce better indicators of performance. Different alternatives were considered and compared for producing the KPI: (1) Computing the averages of the AUC obtained in all the replicates for a given condition (cs/f/m). This is perhaps the simplest approach, and allows for smoothing out the diversity of results obtained under similar conditions into a single, more representative figure. However, it presents some drawbacks. First, it does not make an explicit use of the data variability in the analysis (just uses its central tendency). Second, when analyzing different faults, the AUC are likely to be very different. By taking the averages over replicates, they would remain very different. In this context, further aggregation of results (for instance to the level of a given fault type, aggregating over all magnitudes and replicates) would contemplate, under this conditions, highly heterogeneous sets of AUCs, leading to erroneous and misleading average values. A quantity whose scale is consistent in different conditions (namely, different fault type and magnitudes) will therefore be highly preferable. Ranking the methods based on the AUC obtained for each replicate. One approach to make the KPI commensurable, that is, consistent in scale in widely different conditions, is to use rankings (i.e., order statistics). For instance, a method that tends to perform better than the others in widely different conditions would have always similar low values for its ranking, irrespectively of how diverse its AUCs are in such conditions. This is a good feature that indeed enables the aggregation of results from the replicates level to higher levels. However, the variability arising from natural sources of process variation is still not incorporated in the analysis and in fact the information remains at the replicate level, as the rankings of the methods are 5347

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

3. APPLICATION OF THE CAM FRAMEWORK TO A LARGE SCALE COMPARISON STUDY In this section, we describe the elements considered in a comparison study that serves as a realistic illustration of an application of the CAM framework. These elements comprise the batch process monitoring methods and their variants that will be the object of assessment and comparative performance analysis (section 3.1), as well as the testing conditions and simulation procedures providing the raw data for application of the process monitoring methodologies (section 3.2). 3.1. Batch Process Monitoring Approaches Considered. We have considered representative methodologies from the three main classes of BPM methodologies currently in use: 2-way, 3-way, and dynamic methods. All methods were compared under online implementation conditions in the test data sets. A brief overview of these methodologies will be provided in the next subsections, together with details on the way they were implemented in this work. 3.1.1. 2-Way Methods. In the class of 2-way methods, multiway PCA (MPCA)1 was the modeling framework selected for the present comparison study. The main reasons for this choice were its ability to handle simultaneously the cross- and autocorrelation structure of the process, as well as the implicit removal of process nonstationary dynamics through batch-wise unfolding and preprocessing. MPCA is based on the batch-wise unfolding of the 3-way data matrix, X(I × J × K), into a 2-way matrix, X(I × JK). The variables of the unfolded X matrix are then mean centered, and their variance is scaled to unity. This corresponds to a batch-wise normalization, for which the batch mean trajectory (i.e., data nonstationarity) is removed. The remaining variation is then modeled by PCA. As PCA is computed for the entire batch duration, online monitoring taking place while the batch is not complete requires the estimation of future observations. This can be accomplished through one of the infilling strategies described below. In the online implementation of MPCA, the mean and covariance of the scores were modeled as time-dependent quantities. In this case, they were determined through a leaveone-out (LOO) approach, where each batch from the reference NOC data set is iteratively left out from the model building stage, and a model is constructed using the retained batches. The model is then used to monitor online the left out batch, leading to the associated online scores and residuals for each time k. The procedure is repeated until all batches have been left out once. Using the set of scores for each sample time, k, a time-dependent mean and covariance are computed, which are then used during online implementation to normalize the scores. The control limit for the T2 statistic, obtained from the normalized scores, was computed as described in ref 18. Other approaches, such as using an additional set of NOC data, can also be used to estimate the control limits. Ultimately, this would constitute another testing scenario for performance comparison. However, as a large amount of NOC batches might be difficult to obtain in practice, only the LOO approach was considered. In the present study, alternative versions of MPCA were tested. In particular different infilling methods were considered for predicting future measurements. The three approaches originally proposed by Nomikos and MacGregor1 were tested: zero deviation (ZD), which considers that future measurements will not deviate from their mean value; current deviations (CD), that assumes the deviations observed in the current

measurements will remain constant throughout the batch duration; and a missing data imputation approach (MD), namely projection to the model plane (PMP), which uses the ability of PCA to handle missing observations. Additionally, several window lengths were used in the computation of the control limits for the residual Q (SPE) statistic. The classical way of computing the control limits of the Q statistic at time k involves the use of the variable residuals just for time k (window size of 1, WS1). We have also considered the case where the control limits are obtained from Q statistics at times k ± 1 (window size of 3, WS3), and k ± 2 (window size of 5, WS5), in order to improve the estimation of the parameter characterizing the variability at each time (necessary to compute the associated control limits) since no large changes are expected to occur in adjacent times. Following this windowed approach, the reference data set is used in order to compute the Q/SPE statistics at adjacent time instants, k ± L (L = 1,2), fit a distribution to them, and compute the associated (1 − α) × 100 percentile, which will be the value for the control limit at the central time instant, k. This approach is only used for the Q statistic, since it tends to be highly noisy, whereas the T2 statistic is smoother and does not benefit from such procedure. During online implementations, only the instantaneous Q (or SPE) statistic was computed (which is usually significantly more sensitive than the overall Q/SPE). The control limits for the Q/SPE statistic were obtained by using a gχ2h distribution to approximate the distribution of the squared prediction errors at each time point.1 3.1.2. 3-Way Methods. In this class of methods, trilinear modeling formalisms are directly applied to the 3-way array, X, thus avoiding the need to perform any preliminary unfolding operation. The methods adopted in the present study are Tucker3 and PARAFAC. Tucker3 was proposed by Ledyard Tucker,45 and consists of decomposing the 3-way array into three orthogonal matrices of loadings (A, B, and C) and a core array (G), plus an additional residual array (E), leading to the following model for an observation arising from batch i, at sample time k, for the jth variable, xijk: P

xijk =

Q

R

∑ ∑ ∑ aipbjqckrgpqr + eijk p=1 q=1 r=1

(3)

where p, q, and r are the indices of the model components regarding the first (p = 1, ..., P), second (q = 1, ..., Q) and third modes (r = 1, ..., R), and aip, bjq, and ckr are the elements of the loadings matrices; gpqr is an element of the G array and eijk is an element of the array residual (E). The Tucker3 model has rotational freedom (ambiguity), meaning that it does not provide unique solutions (factors can be rotated without penalizing the model fit), unless additional conditions are imposed. PARAFAC18 was independently developed in 1970 by Caroll and Chang,46 under the heading of canonical decomposition, and by Harshman,47 with the name of parallel factor analysis. It decomposes the X array into three loadings matrices, with general elements ain, bjn, and ckn, plus the array of residuals, with elements, eijk, in the following way: N

xijk =

∑ ainbjnckn + eijk n=1

5348

(4) DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research where n is the index for the model components (n = 1, ..., N). This 3-way methodology differs from Tucker3 mainly by its reduced model flexibility (absence of core array and equal number of components for all the loading matrices). However, it has the advantage of not requiring any additional constraints in order to obtain unique solutions (as in Tucker3). Both 3-way methods require data preprocessing. However, this issue is more involved for 3-way arrays X, than for 2-way matrices.17,48 The number of ways to consider is higher and the impact of an incorrect selection can be even more critical to the quality of the final results. In this context, there are several different ways of centering and/or scaling 3-way arrays, depending on the analysis purposes and ways involved.17,48 In the current work, centering the first mode (batch mode) and scaling the third mode (time mode), was the strategy adopted. As for the 2-way methods, the online implementation of 3-way approaches also requires the estimation of future values. In this context, and following the same procedures described for the 2way methods, the same data infilling methods were adopted for predicting future measurements (ZD, CD, MD) and the same windows of data were contemplated in the definition of the statistical control limits for the Q/SPE statistic. Also, a LOO approach was adopted for estimating the monitoring parameters for computing the T2 and Q statistics and the associated control limits. 3.1.3. Dynamic Methods. Regarding the class of dynamic methodologies, we have considered two alternative approaches proposed in the literature. One of such approaches, called batch dynamic PCA (BDPCA), was proposed by Chen and Liu15 and extends the dynamic PCA (DPCA) method of Ku et al.49 to BPM. As in DPCA, this methodology is based on the construction of an extended data matrix of time-shifted replicates for each batch. That is, for batch i, the extended data matrix, X̃ (i), with L lags is constructed as, X̃ (i) = [X(0)

X(1)

X(L)]

...

predictions of the current measurements. Afterward, the residuals are determined and organized in a variable-wise fashion, prior to PCA analysis. As for the 2-way and 3-way methods, a LOO approach was followed to compute the time-dependent mean and covariance of the scores for both the BDPCA and ARPCA methodologies. These quantities can then be used to locally normalize the scores, resulting in a T2 statistic with a constant control limit. The original T2 statistic (i.e., without local normalization of the scores) was also considered for monitoring. In this case, the control limits were set based on the gχ2h approximation of the distribution of the T2 obtained from the LOO procedure. The same approximation was employed for deriving the limits of the Q statistic. Since the BDPCA procedure is a direct extension of the continuous DPCA scheme of Ku et al.,49 it can be easily adapted to a new batch monitoring procedure based on Dynamic PCA with decorrelated residuals (DPCA-DR).37 In this case, the DPCA model can be constructed on the basis of S(pool) (see eq 7). This method consists of assuming that, in a first stage, the current observations are missing. Then, based on past data and the available PCA model, their estimates are obtained. This essentially amounts to a one-step-ahead prediction of the scores (t)̂ and the associated observations in the original X space, and can be readily carried out through application of a missing data imputation technique (in this study trimmed score regression was employed).50,51 Then, by integrating in a second stage the observations that are actually available at the current time, the process state can be monitored by the following statistics:

X̃ (i)T X̃ (i) K−L−1

(6)

Afterward, the sample covariance matrices of all batches are combined into a pooled sample covariance matrix estimate, as follows: I

S(pool) =

(K − L − 1) ∑i = 1 S(i) I (K − L )

(8)

2 Tres = r TS−res1r = (x − Pt)̂ T S−res1(x − Pt)̂

(9)

where Sprev is the sample covariance matrix of the difference ̂ and Sres is between the observed and estimated scores, (t − t ), the sample covariance matrix of the residuals in the ̂ reconstructed data (r = x − Pt ). These monitoring statistics are expected to present low autocorrelation levels and good detection abilities. Although this approach shares some similarities with ARPCA, namely the fact that both procedures monitor the residuals of one-stepahead predictions, they do this in a very different manner. While ARPCA requires a construction of a multivariate time series model using a different modeling formalism (PLS), DPCA-DR does it implicitly within the same DPCA model. To determine the number of lags it is recommended to use of the method proposed by Rato and Reis,52 which gives more accurate estimates of the lagged structure of the process than the lag selection method proposed by Ku et al.49 The control limits for the DPCA-DR monitoring statistics are determined by approximating their distribution by a gχ2h distribution of the LOO statistics. 3.1.4. Synchronization Strategies. Besides comparing different types of monitoring methodologies, this study also contemplates the assessment of the impact of batch synchronization on their performance. In this regard, González-Martı ́nez et al.53 demonstrated that synchronization can play a relevant role in model development and fault detection. The authors also point out that even batches with the same duration may require alignment, and batches with different characteristics may call for different alignment

(5)

where X(l) is an ((K − L) × J) matrix of variables shifted l times into the past. Thus, the sample covariance matrix of batch i is computed as S(i) =

2 1 Tprev = (t‐t)̂ T S−prev (t‐t)̂

(7)

Given S(pool), the standard PCA model can be obtained, as well as its monitoring statistics. As this procedure uses time-shifted replicates of the variables, it can only be employed after the first L + 1 observations become available. Moreover, this method also requires the selection of the number of lags to construct the PCA model, for which the method of Ku et al.49 has been employed so far. Even though Chen and Liu15 did not mention any particular normalization of the data, the use of a batch-wise normalization is recommended.14 The second approach considered in our comparison study is autoregressive PCA (ARPCA), proposed by Choi et al.16 In simple terms, this method fits a multivariate autoregressive model to the data and then uses it to compute one-step-ahead 5349

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research strategies. In this study, two synchronization approaches were considered for performing online batch alignment: indicator variable (IV) and dynamic time warping (DTW). The first methodology makes use of an indicator variable that replaces time as the reference of batch evolution.54 The monitored variables are then indexed relatively to the values assumed by the IV. As necessary conditions for IV, the variable must represent the mechanism of the batch, progress monotonically (increasing or decreasing) in the full duration of the batch and cover the entire operational range of interest. Furthermore, it should also have the same start and end values for all the batches.1,54 Usual candidates for IVs, presenting these characteristics, are variables such as the conversion of a component or the amount of substance fed to a process. After selecting an appropriate IV, the monitoring data matrix is reconstructed by linear interpolation of the original measurements at the time stamps in which the IV reaches some prespecified values (IV levels). The use of linear interpolation might however cause the addition of artifacts or even break the correlation structure of the data. Therefore, the IV and its evaluation levels should be chosen carefully in order to minimize such effects. To this end, the IV levels were set to match the trajectory of a reference in-control batch, thus preserving the IV nonlinear growth. After IV alignment, the analysis follows the usual workflow, the subsequent stages being, in order, preprocessing (after unfolding if necessary), modeling, detection and diagnosis. The second synchronization approach adopted is based on dynamic time warping (DTW). DTW has its origins in the area of speech recognition and was introduced as a batch synchronization tool by Kassidas et al.23 The main feature of DTW is its ability to displace, compress, or expand the monitored batch trajectories (Xm), taking as reference a given batch (Xr), in order to match specific events. The warping process proceeds by finding the best path, through the time grid of both batches, that minimizes the total distance between the batches’ measurements.23 The resulting nonlinear path is thus a function, mapping the time stamps in which similar events occur. A detailed description of the mathematics underlying DTW can be found in ref 23. The univariate version of DTW uses a reference variable to find the optimal path between the batches to be aligned, which is then applied to the remaining variables. As a compromise between multivariate DTW complexity and univariate DTW lack of generality, a PCA-based DTW was employed in this study. As in the univariate DTW, the proposed approach finds the optimal warping path by aligning a single variable. However, to make this alignment variable as representative as possible, it is taken as the first score of the principal components analysis. This latent variable explains most of the variability within the batch and is here computed following a variable-wise PCA approach.27 By using variable-wise PCA, a local metric of the batch’s maturity is obtained, rather than its overall performance, as would be the case of batch-wise PCA. The DTW code implemented by Tomasi et al.55 (available at http://www. models.kvl.dk/dtw_cow) is then used to perform online alignment and to obtain the associated mapping path. This is done by matching the time series of the main latent variable of the monitored batch with the respective vector of the reference batch. Afterward, the same alignment path is applied to all variables, following the procedure described in Table 1. 3.1.5. Summary of the BPM Methodologies Tested. Table 2 summarizes all the methodologies analyzed in this study. A

Table 1. Pseudo-code for Applying DTW in the Latent Variable Subspace of the Batch Modeling 1. unfold the three-way X (I batches × J variables × K time samples) reference data set in a variable-wise way, resulting in a X (I·K × J) matrix; 2. autoscale X using the columns grand mean and standard deviation; 3. select a reference batch Xr and compute its first latent variable over time, Tr; Monitoring 4. for each new observation of the monitoring batch, Xm, compute its first latent variable until the current time, Tm; 5. Use univariate DTW to find the optimal path that maps the time stamps of Tm on the time stamps of Tr; 6. Apply the path obtained in step 5 to all variables in Xm.

total of 60 different variants of batch process monitoring methods were contemplated. This provides an interesting dimension to test the proposed CAM framework (to the best of the authors present knowledge, it is the highest number of methods contemplated in a single study of batch process monitoring methods). 3.2. Testing Scenarios. In this section, a brief description of the two simulated case studies is provided as well as an account of the type of faults introduced for testing the different monitoring approaches. These systems generated the data used later on to implement all the methods under comparison. The variety of scenarios tested (case studies, fault types, magnitudes, replicates) aims at providing the necessary diversity to evaluate the performance, consistency, and robustness of the different monitoring approaches under comparison, as it is impossible to foresee which faults will actually occur in real world industrial implementations. 3.2.1. Case Study 1: PENSIM. The first simulated system56 implements a mathematical model from a fed-batch fermentation process for the production of penicillin (PENSIM, version 2.0, developed at the Illinois Institute of Technology and available at http://www.chee.iit.edu/~cinar/). The PENSIM simulator is a well-known testing system used in several batch process monitoring studies.19,57−60 It generates profiles exhibiting several realistic features, such as nonstationarity, nonlinearity, multistage behavior, noise, and natural process variation. Furthermore, it allows for full control of the operations conditions, normal or abnormal, and has the capability for simulating several types of faults, with different magnitude, at specific times, which enables the computation of accurate figures of merit for the monitoring approaches under analysis. The model has five input (manipulated) variables, nine process related variables, and five quality variables.56 For monitoring purposes, the nine variables considered in this study were aeration rate, substrate feed temperature, dissolved oxygen concentration, culture volume, CO2 concentration, pH, bioreactor temperature, generated heat, and cooling/heating water flow rate. The duration of each batch was set to 200 h with a sampling interval of 0.5 h. The preculture stage lasts for about 45 h, and the fed-batch stage has a duration of approximately 155 h. More details about the simulation conditions can be found elsewhere.56 A total of 200 batches representing normal operation conditions were simulated for training the monitoring methodologies. Three types of step faults were simulated reflecting abnormal situations: aeration rate (fault 1), agitator power (fault 2), and substrate feed rate (fault 3), with six different magnitudes each, from the lower (magnitude 1) to the highest (magnitude 6). Each combination 5350

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Table 2. Summary Table with the Different Variants of Methods Tested in the Comparison Study. Also Shown, Is the Nomenclature Adopted in the Results Section modeling approach

representative of the class

synchronization

2-way (2W) 3-way (3W) dynamic (DYN)

MPCA PARAFAC Tucker3 ARPCA BDPCA

none, none, none, none,

IV, IV, IV, IV,

DTW DTW DTW DTW

infilling approach

window sizea WS1, WS3, WS5

versions of dynamic methods

number of versions in the line

ZD, CD, MD ZD, CD, MD ORIG, NORMb ORIG, NORM, DPCA-DR

total a

27 18 6 9 60

For computing the control limits of the instantaneous Q statistic. bWithout scores normalization (ORIG) and with scores normalization (NORM).

of fault/magnitude was replicated 50 times. Table 3 summarizes the testing conditions generated in the PENSIM case study.

Table 4. Summary of the Testing Conditions Considered for the SEMIEX Case Studya

Table 3. Summary of the Testing Conditions Considered for the PENSIM Case Studya fault 1 2 3 total

description step fault in the aeration rate step fault in the agitator power step fault in the substrate feed rate

magnitudes small-high (6 levels) small-high (6 levels) small-high (6 levels)

fault 1

replicates

total in line

50

300

2

50

300

3

50

300

4

900 5 total

a

The number of NOC batches in the reference data set used to train the monitoring methods is 200.

description

magnitudes

replicates

total in line

step decrease in the inlet flow rate of reactant B fault in the inlet flow sensor for reactant B lag in the response of the control signal fault in the sensor measuring the reactor temperature reactor leakage

small, medium, high

50

150

small, medium, high

50

150

small, medium, high

50

150

small, medium, high

50

150

small, medium, high

50

150 750

a

The number of NOC batches in the reference data set used to train the monitoring methods is 200.

3.2.2. Case Study 2: SEMIEX. The second simulated system consists of a semibatch reactor equipped with a cooling jacket, where an exothermic second order reaction takes place (A + B → C).61 The reactor is initially charged with reactant A and is continuously fed with a constant flow of reactant B. The temperature inside the reactor is controlled with resort to a PID controller, manipulating the flow of cooling fluid circulating in the jacket in order to maintain the reactor temperature at approximately 25 °C. To mimic real process variability, a variety of noise patterns were simulated: Gaussian noise affects all reaction kinetics and heat transfer parameters, while most temperatures and flow rates are subject to autoregressive drifting patterns. For monitoring purposes, seven variables were considered: the reactor volume, the concentration of reactants A and B, the concentration of product C, the reactor temperature, the inlet temperature, and flow rate of the cooling fluid. A total of 200 NOC batches were simulated for model building and 50 faulty batch replicates were simulated for each combination of fault type and magnitude. Five different faults were studied to reflect abnormal situations found in practice: a step decrease in the inlet flow rate of reactant B (fault 1); a fault in the inlet flow sensor for reactant B (fault 2); a lag in the response of the control signal (fault 3); a fault in the sensor measuring the reactor temperature (fault 4); and finally a reactor leakage (fault 5). For each fault, three magnitudes were simulated: small, medium, and high. Table 4 summarizes the testing conditions generated in the SEMIEX case study.

procedure detailed before (see section 2), the KPI was computed (stage IV of CAM). We recall that the proposed KPI is the score of the methods based on their relative performance in each testing condition (case study/fault/ magnitude) evaluated through pairwise statistical tests using the 50 replicates available. As 60 different combinations of methods are considered, each particular combination will receive a score ranging from 0 (worst performance−all losses) to 118 (best performance−all victories) in each particular test condition case study/fault/magnitude (as explained in section 2.2). This leads to 1980 values of KPIs for the different methods in all the test conditions, based on which Stage V of the CAM will be conductedanalyze the results at dif ferent levels of aggregation. The amount of information generated until stage V prevents any attempt to perform direct comparisons, even when restricted to one particular test scenario, where 60 different combinations of methods still have to be handled and analyzed. Thus, some level of aggregation must be used in order to extract useful information and overall trends. This was the approach followed in Stage V, where different levels of aggregation were employed, which turned out to be adequate to extract general trends on the performance of the methods and useful insights to consider in the selection of BPM schemes. Each case study will be treated separately, to investigate which results are consistent and which vary and therefore are more clearly case-dependent. The aggregation of results follows a top-down approach: first the analysis is centered at the level of the modeling approaches (2-way, 3-way, and dynamic; this is called the intermethods aggregation level), and then evolves to finer details and features of each particular modeling methodology (intramethods aggregation level).

4. RESULTS The 60 methods and their variants described in section 3.1 were applied to the 1 650 testing data sets (900 + 750) presented in section 3.2, leading to 99 000 values of AUCs (60 × 1650 = 99 000). This represents a challenging context for application of the proposed CAM framework. Following the 5351

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 4. Score (KPI) distribution stratified by class of modeling approach for the SEMIEX process (a) and the PENSIM process (b). Higher scores are related to higher performance.

Figure 5. Multivary chart for the performance scores, stratified by class of BPM models and synchronization: (a) SEMIEX process and (b) PENSIM process. Higher scores are related to higher performance.

Figure 6. Multivary chart for the performance scores, stratified by class of BPM methods and type of process fault: (a) SEMIEX process and (b) PENSIM process. Higher scores are related to higher performance.

4.1. Aggregation Level: Intermethods. The distribution of the KPI or performance scores (not to be confused with the latent variable scores of the process monitoring approaches; see section 2.2) obtained by 2-way, 3-way, and dynamic methods for both the PENSIM and SEMIEX case studies, are presented in Figure 4. One can observe that for the SEMIEX system, dynamic methods present a small edge over 2-way methods, while 3-way methods tend to perform worse, despite that, for some instances, they do show higher scores. Therefore, some combinations in the group of 3-way methods do not follow the same pattern performance as the majority. In the PENSIM case

study, 2-way methods have a slight advantage, followed by dynamic methods. 3-Way approaches present again the worst performance, once more with some noticeable outliers. As 3way methods present clear outliers in both case studies, further investigation of their origin is required in order to assess if some combinations are systematically better than the others in this class or if the signaled outliers are not the same, resulting from different approaches that locally outperform the others. Figure 4 depicts the general trends across classes of BPM methods but one can also see a high variability within each class leading to a significant overlap of scores. This variability arises 5352

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 7. Multivary chart for the performance scores, stratified by class of infilling methods and window sizes: (a) SEMIEX process and (b) PENSIM process. Higher scores are related with higher performance.

Figure 8. Multivary chart for the performance scores, stratified by the different 3-way models and infilling options: (a) SEMIEX process and (b) PENSIM process. Higher scores are related with higher performance.

because of the modeling alternatives within each class (e.g., the infilling approaches and window sizes in 2-way models), but also due to other common features across all classes of BPM (e.g., synchronization) and, of course, the different testing scenarios within each case study. Figure 5 presents the multivary charts for the KPI, stratified by class of methodology and synchronization method. This type of charts is very useful for analyzing the main sources of variation of a given quantity. They show the mean of the output quantity (in this case the mean of the performance scores or KPI), within a primary stratifying variable (in this case, the class of modeling approach) and also within another secondary stratification variable nested with the first one (type of synchronization). Synchronization seems to be an important factor for BPM since it is one of the main sources of variability within each class of modeling approach. IV shows poor performance in the PENSIM process and DTW does not perform well in general in the SEMIEX case study. Scores can also be stratified by fault and/or magnitude. After analyzing both stratifications, it was found that faults have a higher contribution to the observed variability when compared to their magnitude. This is an expected result, since different faults represent fundamentally different phenomena affecting the process while different magnitudes only correspond to the extent to which the fault is present. Figure 6 depicts the scores stratified by fault, for both case studies. Analyzing Figure 6a, one can observe that the dispersion of performances found for

the 2-way methods across different faults is smaller when compared to that of the 3-way or dynamic methods. Therefore, the type of fault in the SEMIEX process is critical to the performance of 3-way and dynamic methods, while the 2-way methods seem to be rather consistent in their performance. However, for some faults dynamic methods clearly present the best monitoring performance (faults 3, 4, and 5 in the SEMIEX case study). Within the class of 3-way methods, faults 1 and 2 are easily detected. Faults 3, 4, and 5 are detected more effectively by dynamic methods. For the PENSIM case study, Figure 6b presents more stable results and the type of fault has little effect in the performance. As previously observed, 2-way methods are suitable choices for this process, followed by dynamic methodologies. The above trends suggest that 2-way and dynamic methods should be considered as preferred modeling approaches for monitoring processes with similar characteristics as the two case studies considered. In terms of synchronization methods, “no synchronization” resulted in higher scores than IV or DTW. However, practical scenarios arise in which batches have uneven lengths and synchronization is imperative in order to apply the different BPM models studied. In such cases, the results suggest DTW to be a better approach, presenting similar scores as “no synchronization”. 4.2. Aggregation Level: Intramethods. We analyze now the relative performance of the methodologic variants within each category of modeling approach. 5353

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 9. Multivary chart for the performance scores, stratified by type of dynamic methods: (a) SEMIEX process and (b) PENSIM process. Higher scores are related to higher performance.

4.2.1. 2-Way Methods. The alternative choices within the class of 2-way methods clearly point to significant differences in BPM performance, as depicted in Figures 4 and 5. Therefore, it is worth assessing if some modeling choices are consistently better for BPM or, on the other hand, if the scores are “randomly” distributed across faults and magnitudes. Figure 7 presents the scores for the class of 2-way methods in both case studies, stratified by infilling method and window size. For the SEMIEX case study, a positive effect is observed when one moves from ZD to CD to MD. An opposite effect is observed for the window size: larger windows lead to lower scores. For the PENSIM case study, on the other hand, the results are more ambiguous: CD is the worst infilling method while ZD and MD present better mean scores. However, ZD shows a rather poor performance when used with a window size of 1. Furthermore, the window size effect observed in the SEMIEX process is not consistent with that in the PENSIM, where choosing a higher window size improves the BPM performance for the ZD and MD options. From the two case studies, the MD infilling option for 2-way methods seems to lead to consistently higher scores. 4.2.2. 3-Way Methods. Figure 8 presents the scores for the 3-way methods, stratified by the type of 3-way model used and the infilling approach. The results are consistent for both case studies and it is clear that Tucker3 should be the preferred modeling choice. In terms of the infilling option, CD tends to be associated with higher performances. From this analysis focused on 3-way methods, an observation can be made regarding the intermethods comparison results previously shown in Figure 4: the outliers detected are the result of a single 3-way approach, namely the one where Tucker3 is combined with CD infilling. Its scores for both case studies are significantly higher when compared with the remaining 3-way options. In fact, this combination stands out as one of the best methods for conducting BPM in the studied processes. 4.2.3. Dynamic Methods. In the class of dynamic methods, Figure 9 presents the performance scores for both case studies, stratified by the dynamic modeling employed and the different normalization options. In the SEMIEX case study, ARPCA modeling leads to higher scores and a little edge is obtained when combined with a scores normalization step. For the PENSIM process, ARPCA is again the best method but the normalization step does not improve the monitoring performance. Thus, for modeling purposes, ARPCA should be the

preferred approach, while BDPCA modeling presents the worst performance across the case studies.

5. DISCUSSION This work proposes a CAM framework and describes its application to a large scale comparison study. The scope of the analysis regards the methods detection strength, a monitoring feature of primary interest in practical applications. The comparison study illustrates the potential of applying the CAM methodology in future studies. New methods can easily be tested in the same conditions and their results readily compared with the remaining approaches getting a direct assessment of their relative positioning in the scenarios tested. Regarding the comparison study conducted to illustrate the application of the CAM framework, it revealed a heterogeneous behavior in terms of BPM performance and pointed to the importance of a careful choice of the modeling approach and type of preprocessing. Some model classes improve their performance significantly when combined with appropriate secondary options. Furthermore, one can also observe that none of the monitoring schemes for BPM is consistently better than the others, always, a finding that is in accordance with results obtained in other studies.62 The best methodologic combination is related with the process’ characteristics and the type of faults. This point has already been stated in the literature by a theoretical analysis of the models properties12 and simulation with a simplified system,63 and was again confirmed here, now with resort to two simulated case studies whose characteristics are closer to conditions found in practice. The choices during BPM modeling should result in a model that is flexible enough to account for the characteristics likely to be found in the process and provide an accurate description of the system. Furthermore, the model should also be parsimonious, avoiding overparametrization and overfitting. In this context, parsimony can be quantified by the number of model parameters, which decreases when a model is more parsimonious. Matching the process and model correlation structure is a fundamental aspect for achieving good monitoring performances, a topic that will be considered in the course of the following discussion. 2-Way models were the more flexible models analyzed: the unfolding step originates a large correlation matrix where all variables’ cross- and autocorrelations are incorporated. This flexibility is an advantage for batches where events across its duration are correlated but can also lead to overfitting due to 5354

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 10. Distribution of the AUC for the best BPM models for the SEMIEX case study. Higher AUC are related to higher performance.

Figure 11. Distribution of the AUC for the best BPM models for the PENSIM case study. Higher AUC are related to higher performance.

the high number of model parameters, in particular when correlations between distant events are not present. When compared to the other model classes, 2-way methods presented more stable results: their performance in the SEMIEX process was much less dependent on the type of fault, as depicted in Figure 6, although they were not the best methods in general. The SEMIEX process is a rather simple system with only one batch stage, and consequently the model is probably overparameterized. The model has been able to capture the process

main characteristics and was sensitive to the simulated faults. However, fictitious relations were likely to be modeled, decreasing its overall performance. 2-Way models also provided the best performance for the PENSIM case study, which is a more complex system characterized by two distinct stages. The complexity of this system justifies a more flexible model able to cope with the changing dynamics. For both case studies, the MD infilling method led to higher performance and thus this is considered a rather robust choice. Overall, 2-way methods are 5355

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

Figure 12. Distribution of the scores for each synchronization option: (a) SEMIEX process and (b) PENSIM process.

preferred for both case studies and are recommended as a first choice when implementing BPM. 3-Way methods presented results spanning the worst and best scenarios. The mean score was lower than all the remaining classes of BPM models, although the outliers in Figure 4 resulted from a single 3-way approach: Tucker3 with CD infilling. Tucker3 is generally better than PARAFAC for process monitoring, a fact that can be attributed to its higher modeling flexibility,18 which is capable of capturing the process correlation characteristics in a more effective way. When compared against the remaining models, Tucker3 with the CD infilling was also superior in some conditions, which can be a consequence of its parsimonious description of the process when compared to 2-way methods. However, Figure 6 showed that its performance is dependent on the type of fault, implying that some process characteristics were not so well described as they were for the 2-way methods. Overall, 3-way methods did not present robust performance scores and were considered poor choices for the current case studies because of the more inconsistent nature of the results. Dynamic methods resulted in good scores for both case studies. They were the best methods for the SEMIEX case study, a result of matching the simple dynamics of the single stage system with the more parsimonious nature of the model. However, their performance depended on the type of fault. For the PENSIM case study, the flexibility of the model was not enough to account for the dynamics in the different stages of the process and resulted in lower scores. This opens however the possibility to use multiple dynamic models, one for each batch phase. Overall, dynamic methods were considered a good second choice for BPM since they are able to account for the process correlation structure and provide simpler models. To further investigate the performance of the BPM models, a final comparison was made using only the modeling choices that led to good scores for each case study. The results presented regard now the distributions of AUC for each method: in Figure 10 for the SEMIEX case study and in Figure 11 for the PENSIM process. Figure 10 clearly illustrates the variability associated with the model choices and supports the claim that there is no such thing as a “best” BPM method. The monitoring schemes usually have similar AUC values for most faults. For instance, in the SEMIEX process, fault 4 is easily detected, whereas for fault 1 the methods have more difficulties. For the PENSIM process (Figure 11), the same general conclusions can be drawn regarding the set of adequate methods. However, most BPM

models achieve higher AUC values, and the variability is lower. These results illustrate the advantages of using the KPI based on the AUC figure of merit and how different models can be compared using simple plots, such multivary charts and box plots. Synchronization options are also important for the performance of BPM methods. In the case studies analyzed, “no synchronization” was a good option, leading to high scores even though the trajectories were not completely aligned. As previously noted, this option is seldom used in practice since batches usually have different lengths. DTW is then the preferred option. For the PENSIM case study, DTW presented similar scores as “no synchronization” and was significantly better than IV. For the SEMIEX case study, DTW proved to be detrimental in some situations. Again, since the SEMIEX is a simple one stage system, the process variables have an already high degree of synchronization and the observed variability is related with the normal fluctuations around the mean values. Therefore, applying DTW to such conditions is probably distorting the fault patterns when searching for the optimal path in the grid. This distortion may mask the fault and prevents a more effective detection. The general effects of the synchronization method can be observed in Figure 12, where the scores have been grouped by synchronization option. In Figure 12, the negative effect of IV is evident and therefore it is not the most suitable approach for aligning data in these processes. Nevertheless, we would like to point out that this performance can be affected by the selected IV variables, and, in general, is case-dependent. In this study, the volume (for the SEMIEX) and biomass concentration (for the PENSIM) were used as IVs, since they present a consistent and continuous increasing pattern in all of the training batches. These systems have a reduced number of possible IVs, most of them highly correlated with the selected ones. Therefore, no significant performance improvement is expected from a change in the IV variables in these cases studies. Still, in other systems, the IV approach may perform better. As a final remark, we would like to point out that, irrespectively of the variety of conditions and methods contemplated in this study, the results are inevitably constrained to the scope of the case studies and methods analyzed. We also underline that the central contribution of this article is not the comparison study itself, but the comparison and assessment methodology followed. 5356

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research

(4) Gallagher, N. B.; Wise, B. M.; Stewart, C. W. Application of multi-way principal components analysis to nuclear waste storage tank monitoring. Comput. Chem. Eng. 1996, 20, S739−S744. (5) Gregersen, L.; Jørgensen, S. B. Supervision of fed-batch fermentations. Chem. Eng. J. 1999, 75 (1), 69−76. (6) Dong, W.; Yao, Y.; Gao, F. Phase Analysis and Identification Method for Multiphase Batch Processes with Partitioning Multi-way Principal Component Analysis (MPCA) Model. Chin. J. Chem. Eng. 2012, 20 (6), 1121−1127. (7) Zhaomin, L.; Qingchao, J.; Xuefeng, Y. Batch Process Monitoring Based on Multisubspace Multiway Principal Component Analysis and Time-Series Bayesian Inference. Ind. Eng. Chem. Res. 2014, 53 (15), 6457−6466. (8) Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on DataBased Process Monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543− 3562. (9) Westad, F.; Gidskehaug, L.; Swarbrick, B.; Flåten, G. R. Assumption free modeling and monitoring of batch processes. Chemom. Intell. Lab. Syst. 2015, 149 (B), 66−72. (10) Rännar, S.; MacGregor, J. F.; Wold, S. Adaptive batch monitoring using hierarchical PCA. Chemom. Intell. Lab. Syst. 1998, 41 (1), 73−81. (11) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Comparing alternative approaches for multivariate statistical analysis of batch process data. J. Chemom. 1999, 13 (3−4), 397−413. (12) Camacho, J.; Pico, J.; Ferrer, A. Bilinear modelling of batch processes. Part I: theoretical discussion. J. Chemom. 2008, 22 (5), 299−308. (13) Kourti, T. Multivariate dynamic data modeling for analysis and statistical process control of batch processes, start-ups and grade transitions. J. Chemom. 2003, 17 (1), 93−109. (14) Van den Kerkhof, P.; Gins, G.; Vanlaer, J.; Van Impe, J. F. M. Dynamic model-based fault diagnosis for (bio)chemical batch processes. Comput. Chem. Eng. 2012, 40, 12−21. (15) Chen, J.; Liu, K.-C. On-line batch process monitoring using dynamic PCA and dynamic PLS models. Chem. Eng. Sci. 2002, 57 (1), 63−75. (16) Choi, S. W.; Morris, J.; Lee, I.-B. Dynamic model-based batch process monitoring. Chem. Eng. Sci. 2008, 63 (3), 622−636. (17) Smilde, A.; Bro, R.; Geladi, P. Multi-way Analysis: Applications in the Chemical Sciences; Wiley: Chichester, UK, 2004. (18) Louwerse, D.; Smilde, A. Multivariate statistical process control of batch processes based on three-way models. Chem. Eng. Sci. 2000, 55 (7), 1225−1235. (19) Yoo, C. K.; Lee, J.-M.; Vanrolleghem, P. A.; Lee, I.-B. On-line monitoring of batch processes using multiway independent component analysis. Chemom. Intell. Lab. Syst. 2004, 71 (2), 151−163. (20) Lee, J.-M.; Yoo, C.; Lee, I.-B. Fault detection of batch processes using multiway kernel principal component analysis. Comput. Chem. Eng. 2004, 28 (9), 1837−1847. (21) Jia, M.; Chu, F.; Wang, F.; Wang, W. On-line batch process monitoring using batch dynamic kernel principal component analysis. Chemom. Intell. Lab. Syst. 2010, 101 (2), 110−122. (22) Cho, J.-H.; Lee, J.-M.; Wook Choi, S.; Lee, D.; Lee, I.-B. Fault identification for process monitoring using kernel principal component analysis. Chem. Eng. Sci. 2005, 60 (1), 279−288. (23) Kassidas, A.; MacGregor, J. F.; Taylor, P. A. Synchronization of batch trajectories using dynamic time warping. AIChE J. 1998, 44 (4), 864−875. (24) González-Martínez, J. M.; Ferrer, A.; Westerhuis, J. A. Real-time synchronization of batch trajectories for on-line multivariate statistical process control using Dynamic Time Warping. Chemom. Intell. Lab. Syst. 2011, 105 (2), 195−206. (25) Fransson, M.; Folestad, S. Real-time alignment of batch process data using COW for on-line process monitoring. Chemom. Intell. Lab. Syst. 2006, 84 (1−2), 56−61. (26) Gins, G.; Van den Kerkhof, P.; Van Impe, J. F. M. Hybrid Derivative Dynamic Time Warping for Online Industrial Batch-End Quality Estimation. Ind. Eng. Chem. Res. 2012, 51 (17), 6071−6084.

6. CONCLUSIONS AND FUTURE WORK In this article, a comparison and assessment methodology (CAM) for batch process monitoring approaches is suggested, which is based on adequate figures of merit and a well-defined workflow. The framework can be used either in the retrospective evaluation of existing methods (as illustrated in this article) or in the analysis of new contributions, where the true added value should be demonstrated in a rigorous, unambiguous, and as extensive as possible way. The proposed framework has in its base robust figures of merit for assessing the methods’ detection strength and detection speed. This article deals with the first dimension of assessment, for which the adopted figure of merit was the area under the ROC curve (AUC). On the basis of this quantity, we propose a procedure to compute the relevant key performance indicators and for conducting the analysis of results at different aggregation levels. The CAM framework was applied, for illustrative purposes, to a large scale comparison study of BPM approaches, comprising two case studies (SEMIEX and PENSIM), each one of them containing several types of faults, with different magnitudes, and replicated 50 times. The study involved 60 methods and their variants. The following remarks can be extracted from the domain of methods and conditions considered. The results suggest that 2-way methods combined with MD infilling and DTW are adequate first choices for the case studies considered. In particular, their performance was robust to most process conditions and also provided the best results on a complex simulated system. After adopting such modeling framework, further alternatives should be pursued: different infilling methods, window sizes, and synchronization methods may be tested in order to assess the potential for improving the monitoring performance. The CAM framework can also assist in this regard by quantifying the improvement or decrease in performance brought by the different modeling alternatives. The evidence found that other modeling formalisms, such as ARPCA with the normalization step option, and Tucker3 with CD infilling, were more sensitive to specific faults, highlight the use of the CAM as a supporting methodology in the construction of synergetic monitoring approaches. Future work will address the analysis of the performance dimension, speed of detection, for which figures of merit suitable for batch process monitoring applications will be proposed and integrated in the CAM framework.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel.: +351 239 798 700. Fax: +351 239 798 703. Author Contributions #

T.R. and R.R. have contributed equally to this work.

Notes

The authors declare no competing financial interest.



REFERENCES

(1) Nomikos, P.; MacGregor, J. F. Multivariate SPC Chart for Monitoring Batch Processes. Technometrics 1995, 37 (1), 41−59. (2) Nomikos, P.; MacGregor, J. F. Monitoring Batch Processes Using Multiway Principal Component Analysis. AIChE J. 1994, 40 (8), 1361−1375. (3) Lennox, B.; Montague, G.; Hiden, H.; Kornfeld, G.; Goulding, P. Process monitoring of an industrial fed-batch fermentation. Biotechnol. Bioeng. 2001, 74 (2), 125−135. 5357

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358

Article

Industrial & Engineering Chemistry Research (27) Wold, S.; Kettaneh, N.; Fridén, H.; Holmberg, A. Modelling and diagnostics of batch processes and analogous kinetic experiments. Chemom. Intell. Lab. Syst. 1998, 44 (1−2), 331−340. (28) González-Martínez, J. M.; Camacho, J.; Ferrer, A. Bilinear modeling of batch processes. Part III: parameter stability. J. Chemom. 2014, 28 (1), 10−27. (29) Joe Qin, S. Statistical process monitoring: basics and beyond. J. Chemom. 2003, 17 (8−9), 480−502. (30) García-Muñoz, S.; Kourti, T.; MacGregor, J. F. Model Predictive Monitoring for Batch Processes. Ind. Eng. Chem. Res. 2004, 43 (18), 5929−5941. (31) van Sprang, E. N.; Ramaker, H.-J.; Westerhuis, J. A.; Gurden, S. P.; Smilde, A. K. Critical evaluation of approaches for on-line batch process monitoring. Chem. Eng. Sci. 2002, 57 (18), 3979−3991. (32) Ramaker, H.-J.; van Sprang, E. N.; Westerhuis, J. A.; Smilde, A. K. Fault detection properties of global, local and time evolving models for batch process monitoring. J. Process Control 2005, 15 (7), 799−805. (33) Rato, T. J.; Reis, M. S. Non-causal data-driven monitoring of the process correlation structure: a comparison study with new methods. Comput. Chem. Eng. 2014, 71, 307−322. (34) Reis, M. S.; Bakshi, B. R.; Saraiva, P. M. Multiscale statistical process control using wavelet packets. AIChE J. 2008, 54 (9), 2366− 2378. (35) Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P. A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process. J. Process Control 2012, 22 (9), 1567−1581. (36) Chiang, L. H.; Russel, E. L.; Braatz, R. D. Fault Detection and Diagnosis in Industrial Systems; Springer-Verlag: London, 2001. (37) Rato, T. J.; Reis, M. S. Fault detection in the Tennessee Eastman benchmark process using dynamic principal components analysis based on decorrelated residuals (DPCA-DR). Chemom. Intell. Lab. Syst. 2013, 125 (15), 101−108. (38) Rato, T. J.; Reis, M. S. On-line process monitoring using local measures of association. Part I: Detection performance. Chemom. Intell. Lab. Syst. 2015, 142, 255−264. (39) Woodall, W. H. Controversies and Contradictions in Statistical Process Control. J. Qual. Technol. 2000, 32 (4), 341−350. (40) Montgomery, D. C. Introduction to Statistical Quality Control, 4th ed.; Wiley: New York, 2001. (41) Kenett, R. S.; Pollak, M. On Assessing the Performance of Sequential Procedures for Detecting a Change. Qual. Reliab. Eng. Int. 2012, 28 (5), 500−507. (42) Van der Heijden, F.; Duin, R. P. W.; De Ridder, D.; Tax, D. M. J. Classification, Parameter Estimation and State Estimation; Wiley: Chichester, 2004. (43) Kittiwachana, S.; Ferreira, D. L. S.; Lloyd, G. R.; Fido, L. A.; Thompson, D. R.; Escott, R. E. A.; Brereton, R. G. One-class classifiers for process monitoring illustrated by the application to online HPLC of a continuous process. J. Chemom. 2010, 24, 96−110. (44) Montgomery, D. C.; Runger, G. C. Applied Statistics and Probability for Engineers; John Wiley & Sons, Inc.: 2003. (45) Tucker, L. R. Some mathematical notes on three-mode factor analysis. Psychometrika 1966, 31 (3), 279−311. (46) Carroll, J. D.; Chang, J. Analysis of individual differences in multidimensional scaling via an n-way generalization of ″eckart-young″ decomposition. Psychometrica 1970, 35, 283−319. (47) Harshman, R. A. Foundations of the PARAFAC procedure: models and condictions for ’an exploratory’ multi-mode factor analysis. UCLA Work. Pap. Phonet. 1970, 16, 1−84. (48) Gurden, S. P.; Westerhuis, J. A.; Bro, R.; Smilde, A. K. A comparison of multiway regression and scaling methods. Chemom. Intell. Lab. Syst. 2001, 59 (1−2), 121−136. (49) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30 (1), 179−196. (50) Arteaga, F.; Ferrer, A. Dealing with missing data in MSPC: several methods, different interpretations, some examples. J. Chemom. 2002, 16, 408−418.

(51) Nelson, P. R. C.; Taylor, P. A.; MacGregor, J. F. Missing data methods in PCA and PLS: Score calculations with incomplete observations. Chemom. Intell. Lab. Syst. 1996, 35, 45−65. (52) Rato, T. J.; Reis, M. S. Defining the structure of DPCA models and its impact on process monitoring and prediction activities. Chemom. Intell. Lab. Syst. 2013, 125 (15), 74−86. (53) González-Martínez, J. M.; Vitale, R.; de Noord, O. E.; Ferrer, A. Effect of Synchronization on Bilinear Batch Process Modeling. Ind. Eng. Chem. Res. 2014, 53 (11), 4339−4351. (54) Ü ndey, C.; Ertunç, S.; Ç ınar, A. Online Batch/Fed-Batch Process Performance Monitoring, Quality Prediction, and VariableContribution Analysis for Diagnosis. Ind. Eng. Chem. Res. 2003, 42 (20), 4645−4658. (55) Tomasi, G.; van den Berg, F.; Andersson, C. Correlation optimized warping and dynamic time warping as preprocessing methods for chromatographic data. J. Chemom. 2004, 18 (5), 231− 241. (56) Birol, G.; Ü ndey, C.; Ç inar, A. A modular simulation package for fed-batch fermentation: penicillin production. Comput. Chem. Eng. 2002, 26 (11), 1553−1565. (57) Li, Y.; Zhang, X. Variable moving windows based non-Gaussian dissimilarity analysis technique for batch processes fault detection and diagnosis. Can. J. Chem. Eng. 2015, 93 (4), 689−707. (58) Guo, H.; Li, H. On-line Batch Process Monitoring with Improved Multi-way Independent Component Analysis. Chin. J. Chem. Eng. 2013, 21 (3), 263−270. (59) Ü ndey, C.; Tatara, E.; Ç ınar, A. Intelligent real-time performance monitoring and quality prediction for batch/fed-batch cultivations. J. Biotechnol. 2004, 108 (1), 61−77. (60) Lee, J. M.; Yoo, C. K.; Lee, I. B. Enhanced process monitoring of fed-batch penicillin cultivation using time-varying and multivariate statistical analysis. J. Biotechnol. 2004, 110 (2), 119−36. (61) Ingham, J.; Dunn, I. J.; Heinzle, E.; Prenosil, J. E.; Snape, J. B. Chemical Engineering Dynamics: An Introduction to Modelling and Computer Simulation; John Wiley & Sons: 2008; Vol. 3. (62) Chiang, L. H.; Leardi, R.; Pell, R. J.; Seasholtz, M. B. Industrial experiences with multivariate statistical analysis of batch process data. Chemom. Intell. Lab. Syst. 2006, 81 (2), 109−119. (63) Camacho, J.; Picó, J.; Ferrer, A. The best approaches in the online monitoring of batch processes based on PCA: Does the modelling structure matter? Anal. Chim. Acta 2009, 642 (1), 59−68.

5358

DOI: 10.1021/acs.iecr.5b04851 Ind. Eng. Chem. Res. 2016, 55, 5342−5358