Ind. Eng. Chem. Res. 2010, 49, 5080–5093

Statistical Monitoring of Complex Chemical Processes Using Agent-Based Systems

Sinem Perk,* Fouad Teymour, and Ali Cinar

Illinois Institute of Technology, Department of Chemical and Biological Engineering, Chicago, Illinois 60616

* To whom correspondence should be addressed. E-mail: [email protected].

It is highly desirable to have a statistical process monitoring (SPM) system that detects abnormalities in process operations quickly, with as few missed and false alarms as possible, while the process operates under various operating conditions. An agent-based combined monitoring and fault detection framework is proposed in this study. In this framework, different SPM techniques compete with and complement each other to enhance detection speed and accuracy. SPM techniques from the literature, such as principal component analysis (PCA), multiblock PCA (MBPCA), and dynamic PCA (DPCA), are implemented in this agent-based process supervision system. An agent performance assessment and agent management layer provides dynamic adaptation of the supervision system and improves the performance of SPM. The statistical information coming from each of the statistical techniques is summarized through a consensus mechanism. The performance of the agent-based consensus mechanism using different consensus criteria is tested for system disturbances of various magnitudes. The effectiveness of the proposed agent-based framework with different consensus criteria is evaluated based on fault detection times and missed alarm rates, and the adaptation of the supervision system is illustrated.

Introduction

Statistical process monitoring (SPM) techniques have been used in chemical process operations to detect deviations from normal operation caused by variations in process conditions, environmental effects, or equipment failure. SPM is part of process supervision activities, which include timely detection of any deviation from normal operation that will affect the end-product quality, identification of problematic variables in the production scheme, diagnosis of the root cause of the abnormality, and determination of remedies to prevent off-spec production. Complex and distributed chemical manufacturing processes have various modes of operation to produce different product grades, and nonlinearities that cause different responses to a specific disturbance depending on the current state of the process. It is important to use the most appropriate SPM and supervision technique for rapid detection of abnormalities and diagnosis of their causes. A hierarchical agent-based system can provide the environment for coordinating the operation of various SPM tools, dynamically changing the weight of SPM tools to emphasize those that have performed better under specific process conditions, and evaluating SPM tool performance on an ongoing basis. A combined monitoring and fault detection environment with agent-based systems has been developed at IIT as part of a complex framework for monitoring, analysis, diagnosis, and control with agent-based systems (MADCABS). MADCABS contains agents that perform tasks such as data preprocessing, process monitoring, fault diagnosis, and system identification and control. This paper outlines the structure of the system and illustrates its performance for SPM.

Agents are autonomous software entities that enable learning and adaptation in a system through their interactions at various scales, according to their embedded decision-making rules. Agent-based systems are well suited for engineering applications where multiple methods can compete to reach the same objective. Their performances depend on and change with different operating conditions, and they have parameters that need to be tuned for better performance.

The agents' actions are defined by unique methods; agents adapt to changing operating conditions by changing the methods' tuning parameters. Agent-based systems enable the division of complex decision-making processes into hierarchical layers, where the agents that reside in different layers communicate with other agents and collaborate according to predefined rules to improve performance. Another set of agents can be developed to evaluate the performance of process supervision agents and identify the techniques (and agents) that work better for specific operating regimes of the process, so that process supervision adapts to perform better for specific states of process operation.

The continuous process monitoring and fault detection modules in MADCABS are emphasized in this paper. Monitoring agents performing PCA, MBPCA, and DPCA techniques have been embedded in the process supervision layer in MADCABS. The effectiveness of these techniques for monitoring chemical processes has been reported in the literature.1-7 In an object-oriented software environment, it is possible to create and distribute these monitoring agents in many ways, such that the process can be monitored locally, regionally, and globally. Each unique monitoring agent, independent of the region being monitored, generates two different monitoring statistics (T2 and the squared prediction error (SPE); see the SPM and Fault Detection Techniques section). The fault detection agents are responsible for promptly warning and triggering other agents in MADCABS, such as the diagnosis agent and the control agents.

In the performance management layer in MADCABS, the performances of process supervision agents under different process conditions are evaluated and recorded for future reference. Metrics such as the speed of abnormality detection and the false and missed alarm rates (Type I and Type II errors, respectively) are used. The numbers of false and missed alarms provide important metrics for evaluating the performance of SPM techniques. PCA-based techniques have been shown to be effective for most chemical processes but can be rendered unreliable if they generate too many false and missed alarms. The performance evaluations reveal the success level of supervision agents in accomplishing their tasks for specific process operation states.

On the basis of the performance evaluation results, the successful agents are given a higher priority when they compete with other agents to achieve a certain task, and the unsuccessful agents may either retune or restructure themselves to reach their objectives successfully. Hence, adaptation is achieved. The aim is to utilize agent-based systems to bring together several alternative SPM techniques from the literature, create a flexible environment for distributed monitoring, and provide a reliable system for automated and adaptive monitoring of complex processes with rapid detection and low false and missed alarm rates.

The outline of the paper is as follows: The monitoring and fault detection agent structure in MADCABS, the algorithms of the multivariate monitoring methods from the literature that are embedded in MADCABS, the layout of the modules in the hierarchical layers, and the collaboration of these agents with other agents in different layers are presented. The consensus criteria used in summarizing the information from individual fault detection agents are discussed. The performance of the combined agent-based monitoring and fault detection framework using different consensus criteria is compared in case studies on a simulated continuous stirred tank reactor (CSTR) network, the comparison of results in terms of detection times and missed alarm distributions is provided, and the adaptation of SPM is illustrated.

SPM Using Agent-Based Systems

Agent-Based Systems. An agent is a software entity that has specific properties and behavioral rules. A proactive agent observes its environment and other agents in the environment, acts on the environment according to its defined behavioral rules, and can adapt to changing process conditions automatically according to predefined criteria. For engineering applications, the rules defined in an agent may be the algorithm of a method; a PCA agent is one example. The PCA agent is then responsible for observing and requesting data from the process, performing singular value decomposition (SVD), and calculating the monitoring confidence limits. The adaptable parameters of the PCA agent would be the time interval for recording the normal operation data and the number of principal components (PCs) used in model building. The PCA agent can automatically decide on the optimum number of PCs by using decision techniques from the literature.8-10 The performance of the PCA agent is observed by a manager agent in the hierarchical agent framework, and if the monitoring or fault detection performance deteriorates in time or becomes worse than that of competing monitoring agents, such as a DPCA agent, then adaptation is sought. Either the optimum number of PCs is recalculated for the current process data, or the time window in which the data are recorded is updated: old data are discarded from the data window as new data become available, or the size of the data window is changed. Adaptation can occur through either supervised or unsupervised learning algorithms. Agents can decide on the number of PCs using methodologies from the literature and on the data window length based on the in-control average run length (ARL), or the optimum values can be provided to the system by the operator.
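As an illustration of the kind of decision rule such an agent could embed, the sketch below picks the number of retained PCs with a cumulative-variance threshold. The class name and the 90% threshold are illustrative assumptions, not the MADCABS implementation (the paper defers to decision techniques from the literature).

```java
// Hypothetical sketch of a PC-number selection rule a PCA agent might embed.
public final class PcSelector {
    // eigenvalues: PC variances sorted in descending order
    public static int selectNumPCs(double[] eigenvalues, double varianceFraction) {
        double total = 0.0;
        for (double ev : eigenvalues) total += ev;
        double captured = 0.0;
        for (int r = 0; r < eigenvalues.length; r++) {
            captured += eigenvalues[r];
            if (captured / total >= varianceFraction) return r + 1;
        }
        return eigenvalues.length;
    }

    public static void main(String[] args) {
        double[] eig = {4.2, 2.1, 0.9, 0.4, 0.2, 0.1};
        System.out.println(selectNumPCs(eig, 0.90)); // prints 3
    }
}
```

An adaptive agent could rerun such a rule whenever the data window is updated, so the retained-PC count tracks the current operating region.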
Agent-based systems are also beneficial in various applications where alternative methods can be used to perform a specific task or when there is no prior information about how well a certain method is suited for the task.11 It is difficult to declare any one method the global best, since the performances of the methods may depend on the operating states and the data. Agent-based systems can provide an adaptive environment where methods are embedded in unique agents that act simultaneously to achieve their objectives. Tracking the performances of agents under different operating states helps in ranking the agents for those operating states and in selecting the best agents for the current operating state.

Figure 1. Interlayer and intralayer information flow in MADCABS.

Monitoring and Fault Detection Framework in MADCABS. Multilayered, autonomous, and adaptive multiagent systems provide a powerful environment for supervision and control of complex distributed processes. The techniques and the MADCABS software were developed at IIT. MADCABS is a multiagent, hierarchical, adaptive, autonomous, distributed decision-making system that automates knowledge extraction from data and analysis. It includes several successful monitoring, diagnosis, and control methodologies from the literature; efficiently combines and analyzes information provided by different agents; ranks and selects the best performing method for the current operating conditions of the system; and adapts deteriorating methods to become better by using experience and built-in knowledge.12-14

MADCABS consists of three hierarchical layers (Figure 1). The communication with a real plant or simulator occurs through the physical communication layer. The process supervision agents are located in the process supervision layer. This layer contains modules for preprocessing of data, statistical process monitoring and fault diagnosis, control, and decision making. Each of these modules consists of a number of agents with different techniques that collaborate with each other during the execution of their tasks. The agent management layer monitors the performance of the agents in the process supervision layer, rates and ranks their performances, and adjusts the confidence level assigned to an agent on the basis of its past performance under similar operating conditions.

There are both intralayer and interlayer communications in MADCABS. The communication between the physical communication layer and the process supervision layer starts with the recording of the process sensor readings into a database, which are then used by the preprocessing agents, statistical monitoring agents, and control and decision agents.

Figure 2. Monitoring, fault detection, and diagnosis agents acting on process units or subsystems. The local and global monitoring agents, denoted by black circles and ellipses, provide the monitoring statistics for each subsystem, denoted by rectangles.

Figure 3. Fault detection agents.
The agents within the process supervision layer cooperate and help each other to achieve a certain task together, while some agents may compete with each other using different strategies for the accomplishment of a task. The diagnosis information provided by the diagnosis agents is shared with the control agents, for example as identified defective sensors or unavailable manipulated variables. The interlayer communication between the process supervision layer and the agent management layer is used for the performance evaluations of the agents. The control actions generated by the control agents are mapped to the actuator panel representations in the physical communication layer and sent back to the plant or the simulator through the physical communication layer (Figure 1).

MADCABS is written in Java, using Repast Simphony as the agent building platform.15 Among the important features of Repast that MADCABS uses are its object-oriented structure, its scheduling tools, and its built-in automated Monte Carlo simulation framework. Repast also allows users to change, add, or delete agents at run time. In Repast Simphony, a context is defined as a container where the agents reside. In MADCABS, there are three contexts, for monitoring, fault detection, and diagnosis agents, which are named accordingly. The communication between agents in these contexts is shown in Figure 2 for a process that consists of four subsystems. In a distributed monitoring framework, each subsystem is monitored by default unless prior information about further grouping is available. In Figure 2, the black circles are the local, and the black ellipses the global, monitoring agents that provide the monitoring statistics for each subsystem. The monitoring agents for each subsystem build alternative statistical models and calculate the confidence limits that specify normal operation. The fault detection agents, which interpret the monitoring statistics, assign themselves to the fault detection organizer agents responsible for each subsystem. Fault detection organizers evaluate the decisions of the fault detection agents, decide on the existence of an abnormality in the subsystem, and flag a consensus fault according to a consensus criterion. The fault flag triggers the action of a diagnosis agent and the control agents. A diagnosis agent uses information from the neighboring fault detection organizers of the faulty subsystem, as well as additional statistical models, to find the process variables contributing most to the inflation of the monitoring statistics, and investigates the potential reasons behind the fault. Control agents then provide control strategies to keep the process operation at the desired level despite the fault or disturbance in the system.

In this paper, the structure of the monitoring and fault detection contexts is emphasized and details of these contexts are provided.

SPM and Fault Detection Techniques. SPM methodologies rely on a statistical model that is used to detect deviations from the desired normal operation (NO). The statistical model is built using data collected when the process is in normal operation. Most multivariate SPM methods use modeling techniques such as PCA to build a reliable model of the normal operation.1-7,16,17 The process progression information is summarized by two statistics generated from the model, the SPE and Hotelling's T2 statistics. Departure of the values of these statistics outside the confidence limits that define normal operation suggests that there is unexplained significant variation in the data that might affect product quality. SPE shows deviations from normal operation based on variations that are not captured by the PCA model, and T2 shows variations within the model. Usually one or both of these statistics will detect the deviation and raise a flag. However, it is possible for a statistic to raise a fault flag when the operation is normal or to fail to raise a fault flag when there actually is a disturbance in the system. These shortcomings of a monitoring technique are called the false and missed alarm rates (type I and type II errors), respectively. It is important for the reliability of an SPM technique to have low rates of missed and false alarms and to promptly detect deviations from the desired operation.

Monitoring Agents. The process monitoring agents use SPM techniques, such as PCA, DPCA, and MBPCA, summarized in the Appendix. Unless otherwise stated, MADCABS creates local monitoring agents for each process unit; however, regional and holistic monitoring can also be done if necessary. For each unit in the process, a PCA model and a DPCA model are built for distributed monitoring. In addition, a multiblock PCA model is built for the whole process, with data blocks coming from the different operating units in the process.12 The fault decision is given by the fault detection organizer agents that are responsible for each unit. This decentralized approach utilizes three monitoring agents (PCA, DPCA, and MBPCA) and hence six fault detection agents (T2 and SPE for each SPM technique used) for each unit (Figure 3). The consensus decision may be based on various consensus criteria; four alternative consensus criteria are presented in this paper and the resulting consensus performance evaluated. The monitoring agents in the monitoring context are the PCAStarter, DPCAStarter, and MultiblockPCAStarter agents and a monitoring organizer. After the system reaches steady state and a sufficient amount of normal operation data is available, the monitoring starter agents are triggered, the statistical models are built, and the projection onto the models begins.
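As a concrete sketch of the two statistics each monitoring agent produces, the following computes T2 and SPE for a new autoscaled observation, assuming a loadings matrix retaining R PCs and the corresponding PC variances are already available. All names are illustrative, not MADCABS code.

```java
// Minimal sketch of the per-observation monitoring statistics, given a
// loadings matrix p (J variables x R retained PCs) and PC variances lambda.
public final class PcaStatistics {
    // scores t = P^T x of the autoscaled observation x
    public static double[] scores(double[][] p, double[] x) {
        int r = p[0].length;
        double[] t = new double[r];
        for (int k = 0; k < r; k++)
            for (int j = 0; j < x.length; j++) t[k] += p[j][k] * x[j];
        return t;
    }

    // Hotelling's T2: variation within the model plane
    public static double t2(double[] t, double[] lambda) {
        double s = 0.0;
        for (int k = 0; k < t.length; k++) s += t[k] * t[k] / lambda[k];
        return s;
    }

    // SPE (squared prediction error): variation not captured by the model
    public static double spe(double[][] p, double[] x, double[] t) {
        double s = 0.0;
        for (int j = 0; j < x.length; j++) {
            double xhat = 0.0;
            for (int k = 0; k < t.length; k++) xhat += p[j][k] * t[k];
            double e = x[j] - xhat;
            s += e * e;
        }
        return s;
    }
}
```

A fault detection agent then simply compares each statistic against its confidence limit at every time tick.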


Figure 4. Fault detection organizer and consensus among six fault detection agents.

As many PCAStarter and DPCAStarter agents as the number of operating units are created. Each PCAStarter agent instance shares the same algorithmic structure but is unique, since the data are unique for each unit, and hence so are the PCA properties, such as the number of PCs to be retained in the model. The same commonalities and differences apply to the DPCAStarter agents. This is an advantage of object-oriented programming, where objects can be cloned and each instance inherits the class-specific properties from the parent class while also carrying its own instance properties. Repast Simphony provides the environment for all of these objects to reside in and the tools to schedule their actions individually. By default, there is only one MultiblockPCAStarter agent for the whole process, such that each operating unit constitutes a data block. The MultiblockPCAStarter forms a single multiblock model, which enables the monitoring of the process both locally and holistically through block statistics and super statistics. Since there is only one model, the number of PCs retained in the model is the same for each block. With multiblock monitoring, it is possible to scale the blocks differently so that, if some of the processing units are more important and require more care, those units can be given more weight in the model. If the process units can be further grouped based on their similarities or for regional monitoring, then it is possible to create as many MultiblockPCAStarter agents as there are logical subgroups to enhance the effectiveness of monitoring; a minimal sketch of the per-unit instantiation follows.
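The sketch below is a hypothetical rendering (not the MADCABS source) of this per-unit instantiation: every starter instance shares the class behavior but holds unit-specific data and its own retained-PC count.

```java
// Hypothetical sketch of per-unit starter creation mirroring the cloning
// described above; class and method names are ours.
import java.util.ArrayList;
import java.util.List;

class PcaStarter {
    final String unitId;   // which operating unit this instance monitors
    int numPCs;            // instance property, chosen from the unit's data

    PcaStarter(String unitId) { this.unitId = unitId; }

    void buildModel(double[][] unitData) {
        // SVD on unitData, PC-number selection, and confidence limits
        // would be computed here.
    }
}

public final class MonitoringOrganizer {
    // one starter per operating unit, as in the distributed default
    public static List<PcaStarter> createStarters(List<String> unitIds) {
        List<PcaStarter> starters = new ArrayList<>();
        for (String id : unitIds) starters.add(new PcaStarter(id));
        return starters;
    }
}
```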


Each model generates two monitoring statistics for each subsystem, a T2 statistic and an SPE statistic, which are observed by two different fault detection agents. In the end, the three monitoring agents generate a total of six fault detection agents for each unit (Figure 3). The fault detection statistics are contained in the fault detection context. The six statistics coming from the different monitoring agents for each subsystem being monitored are summarized via a consensus mechanism (Figure 4). The consensus mechanism is built into the fault detection organizer agents. Each fault detection organizer weighs the statistical information according to a predefined consensus criterion and looks for the consensus decision. The information to be summarized is the individual flags raised by each unique fault detection agent. The consensus decision criterion can be either a voting-based criterion, in which a majority is required to flag a fault for the consensus to flag a fault, or a weight-based criterion. In the latter case, individual fault detection agents are weighted according to their performances in terms of false and missed alarms. The agents compare their decisions with the consensus decision to learn whether they missed an alarm or gave a false positive. Depending on the weight-based scheme being used, the effects of a false consensus on the individual performances are subsequently corrected, either at the end of the fault episode or during the fault based on the earliest detection time for the fault. In the following sections, four different criteria that are used in the fault detection organizers are presented.

Fault Detection Agents. In the distributed framework, where each process unit is monitored locally, there is a fault detection organizer for each unit that keeps track of all the fault flags given by its fault detection agents. Other responsibilities of the fault detection organizer are to declare a consensus fault based on a consensus criterion, to keep a history of the performances of the different fault detection agents under different fault scenarios, and, in the case of a consensus fault decision, to trigger the diagnosis agent. The fault detection agents compute the fault detection statistics. All fault detection agents responsible for a subsystem assign themselves to the fault detection organizer of that subsystem (Figure 3). If the value of a statistic goes out of limits, the agent flags the existence of an abnormality in process operation.

Using agent-based cooperation between different methods that are competing for the same task results in better overall performance than if those methods were used independently.11,18 The use of several monitoring techniques by different monitoring agents in MADCABS provides the opportunity for collaboration in decision-making. Through this diversity, the aim is to design an automated fault detection framework that detects faults on time and gives fewer false and missed alarms than if the monitoring methods were used independently and individually.

Consensus Criteria in Fault Detection. There are several criteria for forming a consensus among different fault detection agents. The first is to flag the existence of a fault if the majority of the fault detection agents are flagging the fault. This necessitates at least three of the six agents (3 T2 and 3 SPE) flagging the fault to declare that there is a fault in the unit. In the following sections, this strategy is referred to as the voting-based criterion (VBC). Alternative criteria integrate the accuracy of the agents in detecting an abnormality, such that the total reliability of the combination that flags the existence of a fault should be greater than or equal to the total reliability of the rest of the agents for a consensus fault to be declared. There are many ways to evaluate and determine the reliability of fault detection agents. Two different reliability-based criteria are presented here: the reward-based criterion and the time-averaged performance-based criterion. The reliabilities of the agents can further be adjusted using recordings of historical performances; the time-averaged performance criterion with history update is presented as the fourth consensus criterion.

Voting-Based Criterion. At each time step, a new observation is projected onto the statistical model, and the fault detection agents that interpret the monitoring statistics decide whether there is a fault. If the value of a statistic goes outside the confidence limits, the existence of a fault is flagged.
In the voting-based criterion, the number of fault detection agents that raise a fault flag is compared with the number of agents that flag normal operation. If a strict majority, "faulty operation flag count > normal operation flag count", is used to indicate the abnormality, detection will be delayed considerably and the missed alarm rate will increase. Using "faulty operation flag count ≥ normal operation flag count" provides faster fault detection, since fewer votes are needed to declare a fault.
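A minimal sketch of the VBC over the six per-unit fault flags, showing the strict-majority (>) and tie-accepting (≥) variants compared in the text (names are ours):

```java
// Voting-based consensus over the per-unit fault detection flags.
public final class VotingConsensus {
    public static boolean consensus(boolean[] faultFlags, boolean strictMajority) {
        int faulty = 0;
        for (boolean f : faultFlags) if (f) faulty++;
        int normal = faultFlags.length - faulty;
        return strictMajority ? faulty > normal : faulty >= normal;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, true, true, false, false, false}; // 3 of 6
        System.out.println(consensus(flags, true));  // false: (>) needs 4 of 6
        System.out.println(consensus(flags, false)); // true: 3 >= 3 suffices
    }
}
```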


Figure 5. False alarm distribution comparison of the performances of individual methods versus the combination of all.

Table 1. Missed Alarm Summary of Different Fault Detection Agent Combinations Using VBC (the Average Values of 100 Runs Are Reported)

             sensor fault (ramp, 8%, time 210-225)    process fault (ramp, 8%, time 210-225)
method       detection time   missed alarm rate       detection time   missed alarm rate
PCA          214.49           5.81                    210.97           0.97
DPCA         214.67           5.69                    210.95           0.95
MBPCA        215.4            6.42                    210.98           0.98
VBC          216.28           6.99                    211              1

Using this single consensus criterion, the effectiveness of diversity on the missed and false alarm rates is tested first. One of the targets of bringing together all three SPM methods was to create an automated system with few false alarms and missed alarms. The effect of diversity on the false alarm rate is demonstrated in Figure 5. Individual PCA, DPCA, and MBPCA techniques have been used to model the same process data coming from the CSTR network used in all the case studies in this paper. The statistical models are built at time 200, and projections continue until time 400. The x-axis of the figure represents the total number of false alarms issued during the 200 time ticks; the y-axis is the total number of runs with a given number of false alarms. The process encounters no faults or disturbances, and the false alarm rate is calculated for the individual agents and for the combination where all of the agents are present, using the VBC. The false alarm distributions resulting after 100 runs show that the number of false alarms is considerably reduced in the consensus case, where all agents are present. For example, PCA has only 1 run with no false alarms and 10 runs with 3 false alarms, while VBC yields 68 runs with no false alarms and 5 runs with 2 false alarms. It is unlikely that the majority of the fault detection agents will give a false alarm simultaneously and produce a false consensus decision.

The reduction in false alarms comes at the expense of the missed alarm rate. Table 1 gives the average detection times and missed alarm rates when SPM is performed by individual agents or by consensus building using all agents via the VBC. Waiting for the majority of votes to provide a reliable consensus decision results in a slight delay in detection compared to the individual agents.

Table 2. Instantaneous Performance Rewards (and Penalties)

                          consensus decision
agent decision       not faulty        faulty
not faulty           0.5               -1
faulty               -0.5              1

However, the overall enhancement, mostly in the false alarm rate, that results from diversity is substantial.

Another drawback of using the VBC is its lack of adaptation. Every time a disturbance enters the process, depending on the influence and magnitude of the disturbance, a fault detection consensus based on the voting count will act the same way. However, an agent-based system can do better by weighting the performances of the individual fault detection statistics under different scenarios and using these weights the next time a consensus decision is required. This way, the SPM system will not act the same way the next time a similar disturbance or fault is encountered. It will be prepared to act faster in succeeding occurrences of the same fault and be more effective in terms of detection times and missed alarm rates.

Reward-Based Performance Criterion. The use of performance weights (voting power) is aimed at increasing the weight of a good technique in consensus decision making and reducing the weight of a less successful method. At each sampling time, new sensor readings are collected and the fault detection agents signal either the existence or the absence of a fault. On the basis of their decisions and the consensus decision, all agents are given an instantaneous performance reward. The rewarding strategy is designed such that a missed alarm is penalized the most and the correct detection of a fault is rewarded the most. The instantaneous performance rewards are given in Table 2, where the rows show the individual agent decisions and the columns show the consensus decision. If a fault detection agent flags faulty operation but the consensus decision is the opposite, the agent is penalized for a false alarm. If an agent does not flag faulty operation but the consensus decision is the existence of abnormal operation, then the agent is penalized for a missed alarm. The reward values for the different decisions are one selection out of many possible combinations. Each agent has an effect on the others through the consensus. The instantaneous performances are summed over time for each detection agent to compute the accumulated performances.


The performance reliability of an agent is determined by its accumulated performance value divided by the total accumulated performance value of all agents in that unit. The performance reliability weights are then considered in the consensus decision making. The reliability weight of agent i is calculated as

    w_i = \frac{\text{accumulated performance of agent } i}{\text{sum of all accumulated performances}}    (1)

    w_{\text{total}} = \sum_i w_i    (2)
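A sketch of this bookkeeping, combining the Table 2 rewards with the weights of eq 1 (the handling of a nonpositive total accumulated performance is not specified in the paper and is left unguarded here):

```java
// Reward-based accumulation per Table 2 and reliability weights per eq 1.
public final class RewardBasedWeights {
    public static double instantReward(boolean agentFaulty, boolean consensusFaulty) {
        if (agentFaulty && consensusFaulty) return 1.0;    // correct detection
        if (!agentFaulty && !consensusFaulty) return 0.5;  // correct normal
        if (agentFaulty) return -0.5;                      // false alarm
        return -1.0;                                       // missed alarm
    }

    // eq 1: w_i = accumulated_i / sum of all accumulated performances
    public static double[] reliabilityWeights(double[] accumulated) {
        double total = 0.0;
        for (double a : accumulated) total += a;
        double[] w = new double[accumulated.length];
        for (int i = 0; i < w.length; i++) w[i] = accumulated[i] / total;
        return w;
    }
}
```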

The consensus declares a fault if w_total of the fault-flagging agents is greater than or equal to w_total of the normal-flagging agents.

In some cases, when the fault is diffusing through the process and affecting the neighboring units, the consensus flag decisions may be oscillatory; some minor faults cause similar behavior as well. This oscillation affects the performance mechanism in an undesirable way: an agent that has been flagging an abnormality correctly in the oscillatory period may not be strong enough at that point to change the consensus status of the flag, and it will then be penalized even though its flag reflected the correct state of the process. In addition, an insensitive method could be rewarded if it did not flag the fault and affected the consensus in an erroneous way. To prevent these erroneous rewards and penalties, the performances of the agents are updated after fault episodes. A fault episode starts when a fault consensus is issued; the episode continues until no fault consensus is flagged for eight consecutive time points. At that point, looking back in history, the performances of the agents are updated. If the performance of an agent was reduced because of an erroneous misclassification as a false alarm, the rewards are added back and the accumulated performance increases. If the performance was rewarded incorrectly, the undeserved rewards are deducted from the accumulated performance at the end of the fault episode.

Time-Averaged Performance Criterion. In addition to weighting methods on the basis of predefined rewards and penalties, it is possible to weigh them using other criteria. An alternative weighting scheme is developed where the performances of the agents are evaluated when they raise a fault flag. When a fault detection agent raises a flag, it might be either a true or a false alarm. If, at the next time step after the flag is raised, the value of the monitoring statistic goes back within the limits, no other fault detection agent flags a fault, and the consensus decision is normal, then it is accepted that the first flag was a false alarm. After an agent is concluded to have given a false alarm, its performance for a false alarm is assessed as

    \text{performance indicator}_{\text{false alarm}} = 1 - \frac{\text{number of false alarms}}{\text{duration of the episode without fault}}    (3)

The earliest detection time is recorded when a fault detection agent raises a fault flag for the first time. This record is deleted if it belongs to a false alarm. But if it is a true alarm and the consensus flags the existence of a fault, the earliest detection time among all of the agents' detection times at the time of the consensus alarm is taken as the time the abnormality begins. The performances of the fault detection agents are calculated based on this earliest detection time, so that the agents that were not able to detect the fault at that time are penalized by the amount of their detection lag.


The performances of all fault detection agents for a missed alarm are assessed as

    \text{performance indicator}_{\text{missed alarm}} = 1 - \frac{\text{number of missed alarms}}{\text{duration of the fault episode}}    (4)

For example, if a fault detection agent first flags the existence of a fault at time 210 and two other agents flag later at 212, a consensus is formed on the existence of an abnormality at time 212, and the performance assessment of the agents for a missed alarm is performed. The fault detection agent that first recognized the existence of the fault starting at time 210 gets a full performance value of 1 if it did not miss a flag until 212, and the other two agents get credit for correctly raising only 1 fault flag in a fault episode of 3 sampling times. Agents that have not raised a fault flag at the time of the consensus get the lowest performance value of 0. The time-averaged performances of the agents are computed by eq 5,

    \text{performance}_{\text{avg},n} = \frac{(n-1) \times \text{performance}_{\text{avg},n-1} + \text{performance indicator}_n}{n}    (5)

where n is the number of performance evaluations up to the current time. This criterion yields performance values normalized between 0 and 1. The consensus declares a fault in a unit if the total performance of the fault-flagging agents is greater than or equal to the total performance of the normal-flagging agents. It is possible with this criterion to use scaling factors to weigh past evaluations differently; the weights here are given as an example. In contrast to the monotonically increasing or decreasing accumulated performances of the reward-based consensus criterion, the performance values with this criterion reach a steady state in time. Another difference is that the performances of the agents are updated during the fault episodes, which is better than waiting for the fault episode to end as in the previous criterion.
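The update logic of eqs 3-5 can be sketched as follows (illustrative names; episode lengths and alarm counts are assumed to be tracked elsewhere):

```java
// Time-averaged performance bookkeeping per eqs 3-5.
public final class TimeAveragedPerformance {
    private double average = 0.0; // stays within [0, 1]
    private int n = 0;            // number of evaluations so far

    // eq 3: penalize false alarms over a fault-free episode
    public static double falseAlarmIndicator(int falseAlarms, int faultFreeTicks) {
        return 1.0 - (double) falseAlarms / faultFreeTicks;
    }

    // eq 4: penalize missed alarms over a fault episode
    public static double missedAlarmIndicator(int missedAlarms, int episodeLength) {
        return 1.0 - (double) missedAlarms / episodeLength;
    }

    // eq 5: running average of the performance indicators
    public void update(double indicator) {
        n++;
        average = ((n - 1) * average + indicator) / n;
    }

    public double value() { return average; }
}
```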

Time-Averaged Performance Criterion with Historical Performance Update. It is difficult to know in the design phase of SPM which method will perform better than the others. Practice has shown that some methods and their agents are well suited to specific situations and ineffective for others. Therefore, multiple alternative methods are implemented in MADCABS. The performances of the different agents are stored as history, along with state metrics that define the situation under which the performances were measured. The performance history is then used as a reference in estimating an agent's potential the next time a similar situation arises. When two agents that can perform the same or similar tasks are available, they are compared based on their historical performances. When an agent is not used for a long time because of its poor performance, it is motivated to update itself and adapt to become better. On the basis of the results of the performance evaluation, the agents update their built-in knowledge and the methods they are using, or retune their parameters. The agent performance evaluation occurs in the topmost agent management layer in MADCABS. The details of the performance evaluation are illustrated in Figure 6 for three hypothetical methods A, B, and C. A historical performance space is formed for each competing SPM agent for performance evaluation. The performance is measured and recorded along with the state metrics that define the state of the system when the performance was observed.

Figure 6. Illustrative comparison of the performances of three competing methods for a new state.

The idea is to compare the current state of the system with the recorded states and estimate the performance of the agents for the current state using their performances for similar states in the historical performance space. Figure 6 shows the details of the performance estimation for method B. The current state of the system is projected onto the history space, and for each method the closest points to the current state are determined along with the previous performances at those points. Each method is expected to perform similarly to its previous performance in similar situations. Therefore, a distance measure is used in estimating the performance of each method for the current state. The agent with the highest performance estimate is selected or given a higher priority or reliability weight. The performance evaluation and selection mechanism can be used both for validation and for inquiry: either the performance of an agent is validated by comparing its current measured performance with the estimated performance, or the best performing method for the current situation is selected (inquiry).

The state metrics for each evaluation are chosen such that they are differentiating and relevant for comparison. The state metrics represent the situation when the performances were measured. Some state metrics for fault detection performance evaluation include the estimated fault severity or degree of catastrophe, the number of SPM agents that interpret abnormal operation, or the degree of inflation in the value of the statistics. The performance measurement can be a single criterion or a composite criterion. The performances of agents can be measured during their performance episodes; in some techniques, the performances can be corrected at the end of the episodes. The common performance criteria for fault detection agents are the rates of false and missed alarms. The state metrics, the performance criteria, and the agents and methods to be evaluated should be determined before the performance analysis. All agents update their performance values, reliabilities, or priorities after the performance evaluation. Then a new performance assessment episode begins, after which the historical performance space is updated with the current performances. This cycle repeats itself for each performance episode.

For fault detection agents, the performance space is built by using the shift in each statistic's value as the state metric and the agent's performance during consensus fault episodes. In other words, if a fault detection agent detects a true fault, it has a high performance, and this is kept in the historical space as the agent's detection performance for the magnitude that corresponds to the shift in the statistic's value. Since every agent has its own historical space, the performance space is a mapping of the detection power of the statistic against the magnitude shift for that statistic. Additional or alternative state metrics can be used; for the purpose of this paper, an example metric that is relevant for comparison is chosen. For each time tick, when the consensus flags a fault, the performance indicators of the agents calculated using eq 5 are recorded with the current value of the statistic. The performances are recorded after the consensus has already been given for that time tick. At the next time tick, before the consensus evaluation, the new magnitudes of the statistics and their individual flags are available for use.
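One plausible realization of this distance-based estimate is a nearest-neighbor average with inverse-distance weights over the recorded (state metric, performance) pairs. The kernel and the neighbor count below are our assumptions; the paper specifies only that a distance measure is used.

```java
// Sketch of a distance-weighted historical performance estimate for one
// agent's (state metric, performance) history; names are illustrative.
public final class HistoricalEstimate {
    public static double estimate(double[] states, double[] perf,
                                  double currentState, int k) {
        Integer[] idx = new Integer[states.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort history points by distance to the current state
        java.util.Arrays.sort(idx, (a, b) -> Double.compare(
                Math.abs(states[a] - currentState),
                Math.abs(states[b] - currentState)));
        double num = 0.0, den = 0.0;
        for (int i = 0; i < Math.min(k, idx.length); i++) {
            double d = Math.abs(states[idx[i]] - currentState);
            double w = 1.0 / (d + 1e-6); // inverse-distance weight
            num += w * perf[idx[i]];
            den += w;
        }
        return num / den; // assumes a nonempty history
    }
}
```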

The historical performance estimate is retrieved for the new magnitude shifts of the statistics, and the performance estimates are combined as in eq 6:

    w = \sqrt{\text{time-averaged performance} \times \text{estimated historical performance}}    (6)

If the time-averaged performance of a fault detection agent is low because of false alarms but the agent performs well in detecting faults, then the historical estimate, which is the estimate of the performance under faults, will help increase the weight of the agent. This way, the agent will gain power when the consensus decision is made. On the other hand, if the agent's performance under fault scenarios has always been poor, the estimated historical performance will be low and will further reduce the weight of the agent in the consensus decision. This prevents unreliable agents from clouding the consensus decision.

Case Study: Autocatalytic CSTR Network

The data are obtained from a simulator of a CSTR network in which three competing species coexist using the same single resource.19-22 The data are written to a database in MADCABS for use by the MADCABS agents. The ordinary differential equations modeling the kinetics are written in C and connected to Repast Simphony through the Java Native Interface (JNI). Reactor networks have various modes of operation to produce different product grades, and nonlinearities that cause different responses to a specific disturbance depending on the current steady state of the process. Reactor networks hosting multiple species show very complex behavior and provide a good case study.19-22 As the number of steady states of the network increases, autocatalytic species are allowed to exist in the network that would otherwise not exist in a single CSTR.19-22 Each species' reproduction and death cycle is represented by the set of isothermal autocatalytic reactions

    R + 2P_i \rightarrow 3P_i
    P_i \rightarrow D_i    (7)

where R is the resource continuously fed to the CSTR and D_i is the so-called dead (or inactive) form of species P_i (i = 1, 2, 3). The rate constant of the first reaction, k, is the species growth rate constant, and k_d is the species death rate constant. Each CSTR is interconnected in a rectangular grid network and has an inlet and an outlet flow. The feed flow contains pure resource that is consumed by the species in the reactor. Each reactor can host multiple species; however, one species is always dominant over the others in the reactor. The production objective is to produce a desired product grade in the network. The ratios of the different grades in the product collected from the network should meet the desired production grade.
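A minimal sketch of these kinetics for a single CSTR, assuming mass-action rates (growth rate k R P_i^2 from the stoichiometry of eq 7, death rate k_d P_i), a space time tau, and illustrative parameter values; the actual case study couples 20 such reactors through interconnection flows and is integrated in C, not Java.

```java
// Single-CSTR sketch of the eq 7 kinetics under assumed parameters.
public final class AutocatalyticCstr {
    public static void main(String[] args) {
        double tau = 10.0, rFeed = 1.0;   // space time, feed resource conc. (assumed)
        double[] k = {1.0, 0.9, 0.8};     // growth rate constants (assumed)
        double kd = 0.02;                 // death rate constant (assumed)
        double res = 1.0;
        double[] p = {0.1, 0.1, 0.1};     // initial species concentrations
        double dt = 0.01;
        for (int step = 0; step < 100000; step++) { // explicit Euler integration
            double consumption = 0.0;
            double[] dp = new double[3];
            for (int i = 0; i < 3; i++) {
                double growth = k[i] * res * p[i] * p[i]; // R + 2Pi -> 3Pi
                consumption += growth;
                dp[i] = -p[i] / tau + growth - kd * p[i]; // outflow + Pi -> Di
            }
            res += dt * ((rFeed - res) / tau - consumption);
            for (int i = 0; i < 3; i++) p[i] += dt * dp[i];
        }
        System.out.printf("R=%.4f P=[%.4f %.4f %.4f]%n", res, p[0], p[1], p[2]);
    }
}
```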


Figure 7. (4 × 5) CSTR network producing three species.

For a network hosting three species, the desired production grade used in the case study is 3:3:4. The feed flow rates and interconnection flow rates are treated as manipulated variables. The resource concentration in each reactor, along with the species concentrations, is also available for monitoring.

Results and Discussion

The agent-based monitoring and fault detection framework in MADCABS has been tested using the CSTR network simulation. The network consists of 20 reactors arranged in a (4 × 5) rectangular grid (Figure 7). The effectiveness of the combined framework is demonstrated when faults are introduced to one of the reactors. This networked system and the distributed multilayered monitoring framework can also be used to investigate more complex problems, such as disturbance propagation or diffusion in the network.

The SPM techniques implemented in MADCABS are very effective in detecting faults even when used individually. Using them simultaneously with agents resulted in a considerable reduction in the false alarm rate of the combined system. The voting-based criterion has been very effective in detecting faults; however, the missed alarm rate of the combined system was slightly higher when the majority (>) of votes was required for consensus. Introducing reliability-based consensus criteria, such as the reward-based or the time-averaged performance-based criteria, enhanced the voting power of good agents.

The combined monitoring and fault detection framework performed well for ramp process faults or disturbances with magnitudes of 3% and higher. The results provided in the following section are the average results of 100 runs for each scenario. The detection times were very close to the actual abnormality initiation times.

For the case of a sensor fault, where a single sensor reading deteriorates in time as a ramp change, the combined system still performed better than the individual methods (Table 1). As expected, however, the combined system was slower for sensor faults than for process faults. In process faults, the abnormality appears as inflation in more than one variable measurement and can be detected promptly.

To see the differences between the performances of the different consensus criteria, consecutive sensor faults have been tested in one reactor of the network (Figure 8). The sensor that provides the resource concentration measurements (Figure 8a) in the top left corner reactor (node 3 in Figure 7) is affected. There are six consecutive ramp changes, simulated in the time periods 210-225, 235-250, 260-275, 285-300, 310-325, and 335-350 with magnitudes 10%, 4%, 3%, 4%, 8%, and 10%, respectively (Table 3). The magnitudes of the faults are selected such that the graphical illustrations display the number of missed alarms clearly in two different regions, indicated by the bimodal shape. Faults 1, 5, and 6 have large magnitudes (8-10%) and result in fewer missed alarms; faults 2, 3, and 4 have small magnitudes (3-4%) and cause more missed alarms. Fault 1 is repeated later as fault 6, and fault 2 is repeated later as fault 4.

In the voting schemes (VBC), where each agent acts independently of the other agents, the combined system reacts the same way each time a fault is encountered. If the detection of the fault was delayed five time ticks at the first encounter, the voting-based consensus will again result in approximately the same delay if the same fault is encountered a second time. Figure 9 represents the missed alarm distribution over the 100 runs of six consecutive faults. Both the bar charts (upper figure) and the area charts (lower figure) display the number of missed alarms cumulatively.

Figure 8. Sensor faults are demonstrated in the top left corner reactor denoted as node 3. (a) Resource concentration and (b-d) three species concentrations in the reactor.

Table 3. Six Consecutive Ramp Changes in the Resource Concentration Sensor

fault number    time interval    percent increase
1               210-225          10
2               235-250          4
3               260-275          3
4               285-300          4
5               310-325          8
6               335-350          10

For example, for 7 missed alarms in Figure 9, there are 31 cases out of 100 runs for fault 1, 13 cases for fault 5 (cumulative height 44), and 38 cases for fault 6 (cumulative height 82). Since the majority (>) of the votes of the fault detection agents is required, abnormality detection by the combined system is delayed, and the missed alarm rate for this consensus criterion is the highest. The overlapping response, in terms of both mean and variance, of the combined system to similar faults over time is represented in the area chart.

Areas 1 and 6 show the missed alarm rates for the faults with magnitude 10%. There are around 30 runs out of 100 (31 and 38 for faults 1 and 6, respectively) that had 7 missed alarm flags during the fault episodes of length 16 (210-225 and 335-350). For faults with smaller magnitudes, the missed alarm rate is higher. In particular, for the fault with magnitude 3%, the fault is not signaled at all (missed alarm count of 16) in 81% of the runs when the consensus criterion is VBC (>). For the consensus criterion where equality of votes (≥) is sufficient to raise a fault flag (Figure 10), the missed alarm rate is smaller: for the fault with magnitude 3%, the fault flag is not raised at all in only 23% of the runs. There are 20 runs for fault 1, 32 runs for fault 5, and 24 runs for fault 6 that have 7 missed alarms when the consensus criterion is VBC (≥). The number of runs that have 6 or fewer missed alarms has increased, and the number of runs that have more than 8 missed alarms has decreased, when the consensus decision is changed from (>) to (≥).


There are 4 runs for fault 1 that have 3 missed alarms, and 1 run for fault 6. These results show that VBC (≥) yields fewer missed alarms; however, when the system encounters similar faults consecutively in time, the combined system gives almost the same response each time.

The effects of the time-averaged performance assessment (Figure 11) and of the historical update (Figure 12) are more pronounced when there are repetitive faults, since learning by the agents and adaptation are then realized. The shift of the area toward smaller missed alarm counts and smaller numbers of occurrences is realized in the weight-based adaptive methods. Figure 11 shows the shift to smaller numbers of missed alarms when the faults are encountered for the second time. The missed alarm count is 7 for 17 runs and 6 for 27 runs for the fault magnitude of 10% when the fault is encountered the second time (fault 6, darkest gray bar and area), whereas the corresponding run counts were 24 and 29 at the first encounter of the same fault (fault 1, black bars and area). The missed alarm count of 16, which occurred in a total of 136 runs for VBC (>) (Figure 9) and in 29 runs for VBC (≥) (Figure 10), decreases to 20 runs for the time-averaged performance criterion (Figure 11) and further to 11 runs with the historical update (Figure 12).

The historical update reduces the number of missed alarms further for all faults, as each fault is encountered again after the system has learned it and adapted itself (Figure 12). The missed alarm count is 7 for 4 runs and 6 for 23 runs for the fault magnitude of 10% when the fault is encountered the second time (fault 6), whereas the corresponding run counts were 24 and 29 at the first encounter of the same fault (fault 1).

Figure 9. Voting-based criterion (>).

Figure 10. Voting-based criterion (≥).

The number of missed alarms decreases overall when the performance values are updated with historical performance values. The historical performance space is very helpful in decreasing the weight of the unreliable fault detection agents in the consensus decision-making. This reduces the delay in detection and results in fewer missed alarms overall.

The consensus average detection delays over 100 runs are given in Table 4, and the consensus average missed alarm rates are given in Table 5. The smallest delays and the lowest numbers of missed alarms are marked with an asterisk. The tables provide the average results, but for these runs it is also instructive to examine the distributions, which show the shift of the missed alarm counts toward smaller values with the introduction of the weight-based consensus and the historical updating.

The average consensus detection delays are inspected to verify the learning of the agents under the performance-based criteria. The voting-based criterion (> or ≥) performs similarly for consecutive faults of the same type, since there is no learning and adaptation. When fault 2, with magnitude 4%, is first encountered, detection is delayed by 10.34 and 13.17 sampling times for the voting-based (≥) and (>) schemes, respectively. When the same magnitude fault is encountered for the second time (fault 4), these numbers are 10.81 and 13.28. Because of the stochastic nature of the process, the numbers will not be exactly the same, but they are similar: the combined response is no better the second time than when the fault was first encountered. This is observed for the faults with magnitudes of 10% as well.


Figure 11. Time-averaged performance criterion.

For the performance-based criteria, the weighting, which induces learning and adaptation, resulted in shorter detection delays. When the system is affected by the same fault for the second time, the detection times are noticeably shorter. The weighting scheme that considers the historical performances of the agents yielded the shortest detection times of all. The detection delay of 4.35 (fault 6) is shorter than 5.61, the delay when the same fault was first encountered (fault 1) under the same criterion, and also shorter than 4.98 and 5.23, the delays when the consensus is formed based on the VBC rather than on individual performances. A similar reduction is also observed for fault 2 (9.51) and its recurrence as fault 4 (8.86).

The missed alarm rates (Table 5) reveal a similar outcome. The criterion with historical performance assessment resulted in the fewest missed alarms for all faults except fault 1. When the same fault is encountered the second time, the agents' performances and reliabilities have been updated, and the combined system promptly and accurately detects the abnormality. The missed alarm rate of 4.92 (fault 6) is less than the 5.89 (fault 1) observed when the fault occurred for the first time, and also less than the rates of 5.65 and 5.75, which belong to VBC (≥), where there is no learning and adaptation. At the first encounter of a fault in the system, the VBC (≥) criterion yields the shortest detection delay of any criterion. In the performance-based TAPC with history scheme, the performance of each fault detection agent may have been affected by false alarms given during the normal operation before the fault, and this effect appears as a slightly longer delay in detecting the first fault. During the progress of a fault, the performances of the agents are updated based on each true positive and false negative, and the performance-based schemes become more effective than VBC (≥) over time.

Figure 12. Time-averaged performance criterion with historical update.

With the incorporation of the historical performances of the agents under disturbances of various magnitudes into the consensus decision, the TAPC with history scheme outperforms the other criteria over time and when the same disturbance is encountered again.

Each consensus criterion is tested on independent data sets with random variations. The missed alarm rates given in Table 5 for the different consensus criteria are average values of 100 runs for each criterion. To show the degree of variation due to noise in these values, the average missed alarm values of three sample sets, each containing 30 random samples from the VBC (>), VBC (≥), and TAPC with history criteria, are illustrated in Figure 13. Despite the randomness and noise in the data sets, there is a distinct separation between the distributions of the VBC (>), VBC (≥), and TAPC with history criteria, the latter yielding the best missed alarm rate of all.

In chemical processes, prompt detection of an abnormality, namely sensitivity, is desired for the reduction of overall production costs and for timely correction of the disturbance before it spreads across the whole process. For every statistical test, there is a trade-off between sensitivity and specificity. The sensitivity of the consensus decision increases when the TAPC with history scheme is used. The effect of this increased sensitivity on the specificity is investigated using the VBC (≥) and TAPC with history criteria. The consensus false alarm rates encountered during the sensor fault case study given in Table 3 are plotted in Figure 14. The number of false alarms was recorded in the time interval 200-350, during which the ramp changes given in Table 3 were in effect.


Table 4. Average Consensus Detection Delays (10% (210-225), 4% (235-250), 3% (260-275), 4% (285-300), 8% (310-325), 10% (335-350))

criterion                                 fault 1   fault 2   fault 3   fault 4   fault 5   fault 6
VBC (≥)                                   4.98*     10.34     12.66     10.81     6.18      5.23
VBC (>)                                   6.97      13.17     15.37     13.28     7.78      6.65
reward-based performance (>)              6.56      10.64     12.92     10.57     6.05      5.22
time-averaged performance                 5.56      10.20     12.16     9.67      5.81      4.99
time-averaged performance with history    5.61      9.51*     10.97*    8.86*     5.07*     4.35*

(* smallest delay for each fault)

Table 5. Average Consensus Missed Alarm Rates (fault magnitudes: 10%, 4%, 3%, 4%, 8%, 10%)

consensus criterion                      fault 1  fault 2  fault 3  fault 4  fault 5  fault 6
VBC (≥)                                    5.65    12.64    14.55    12.94     6.87     5.75
VBC (>)                                    7.58    14.64    15.75    14.59     8.93     7.30
reward-based performance (>)               6.99    12.70    14.69    12.80     6.96     5.77
time-averaged performance                  5.93    12.32    14.18    12.22     6.61     5.43
time-averaged performance with history     5.89    12.05    13.74    11.46     6.19     4.92

The TAPC with history criterion gives no false alarms in 50 of the 100 runs, gives 1 false alarm in 26 of the 100 runs, and gives the maximum number of false alarms, 4, in only 2 runs. VBC (≥) gives no false alarms in 82 runs out of 100 and 1 false alarm in 16 runs, but it is inferior to the TAPC with history criterion in terms of sensitivity in fault detection, as shown by the reductions in the missed alarm rate and the detection delay. This is expected, since the TAPC with history criterion gives a heavier weight to detection techniques with higher sensitivity. As part of the activities of the Abnormal Situation Management (ASM) Consortium led by Honeywell in 1992, a hybrid,

distributed, multiple-expert-based blackboard framework called Dkit was developed for fault diagnosis. Multiple diagnosis methods worked in a blackboard architecture without having to know about each other. This setup required extensive expertise about the process and, hence, frequent human-machine interaction in the development of the causal models and the analysis of the system; therefore, it could not be fully automated.23 In the proposed systematic and adaptive multiagent monitoring and fault detection framework, different SPM techniques work together through a consensus scheme to enhance fault detection speed and accuracy. The consensus criteria can be modified to meet the demands or challenges of a specific process. The overall system dynamically evaluates and updates agent performances under different fault scenarios, automatically identifies the fault detection agents with higher performance values, and, by giving more weight to their decisions in the consensus, yields prompt and accurate fault detection. Prior information on the effectiveness of the SPM techniques is not required, and human-machine interaction is minimal.

Conclusions

Figure 13. Effect of variations in noise on the missed alarm rates for different sample sets for the VBC (>), VBC (≥), and TAPC with history criteria. (a) VBC (>) vs VBC (≥); (b) VBC (≥) vs TAPC with history.

In this paper, it has been demonstrated that the effectiveness of an automated monitoring and fault detection framework can be improved using agent-based systems. The agent performance evaluation and management layer creates an adaptive system that learns from past experience to improve dynamically the performance of the process monitoring and fault detection system. The resulting combined supervision system yields lower false and missed alarm rates using different consensus criteria for faults of various magnitudes. MADCABS provides a unique environment for integrating agent performance evaluation and management with process supervision to create an adaptive system that can learn from the process and adjust its performance over time.

Acknowledgment

This work is supported by the National Science Foundation Grant CTS-0325378 of the ITR program.

Appendix

Figure 14. False alarm rate comparison for the VBC (≥) and TAPC with history criteria.

PCA. For SPM using PCA, the reference PCA model is built from an autoscaled process data matrix X with I samples and J variables. For a good model, the data used in model building should be collected under normal operation and be free of outlier measurements. PCA uses singular value decomposition (SVD) to determine the eigenvectors of the data covariance matrix. The first R principal component (PC) directions from the loadings matrix P are


selected based on the variance explained by each PC.3,17,24 The model scores T are calculated by the projection of the measurements onto the R-dimensional space provided by the loadings as

T = XP, \quad E = X - \hat{X} = X - TP^T \qquad (8)

The model scores T and the model residuals E are used for the calculation of the monitoring statistics and the statistical process control limits. The squared prediction error (SPE) and Hotelling's T² are the two common multivariate monitoring statistics.1-4,24 When a new J-dimensional observation vector x_new is available, it is autoscaled using the same parameters as the reference model and projected onto the model plane formed by the model loadings P, and the new score vector t_new (R × 1) is calculated as

t_{new} = P^T x_{new}, \quad e_{new} = x_{new} - P t_{new} \qquad (9)

The SPE and Hotelling's T² statistics are calculated for the new observation vector and compared to the statistical control limits of the model. If they exceed the corresponding limits, the observation is declared to be out-of-control. Hotelling's T² is calculated for the new observation as

T_{new}^2 = t_{new}^T \Lambda_R^{-1} t_{new} \qquad (10)

where t_new is the score vector for the new observation and Λ_R is the (R × R) diagonal covariance matrix of the first R score variables. For data that follow a multivariate Normal distribution, the T² statistic follows an F-distribution with R and I - R degrees of freedom, or a Beta distribution with R/2 and (J - R)/2 degrees of freedom. For a confidence level α, the T² control limit is given by

T_\alpha^2 = \frac{R(I^2 - 1)}{(I - R)I} F_{R,\, I-R,\, \alpha} \qquad (11)
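For illustration, a minimal numerical sketch of eqs 8-11 in Python follows; the synthetic data, the choice of R, and the numpy/scipy routines are assumptions for demonstration, not the MADCABS implementation.

```python
import numpy as np
from scipy import stats

# Minimal sketch of PCA model building and T^2 monitoring (eqs 8-11);
# the data here are synthetic and assumed already autoscaled.
rng = np.random.default_rng(0)
I, J, R = 200, 5, 2                        # samples, variables, retained PCs

X = rng.standard_normal((I, J))            # reference data under NO
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:R].T                               # loadings P (J x R)
T = X @ P                                  # model scores (eq 8)
Lambda_R = np.diag(s[:R] ** 2 / (I - 1))   # diagonal score covariance

x_new = rng.standard_normal(J)             # new autoscaled observation
t_new = P.T @ x_new                        # projection (eq 9)
T2_new = t_new @ np.linalg.inv(Lambda_R) @ t_new          # eq 10

alpha = 0.05
T2_lim = R * (I**2 - 1) / ((I - R) * I) * stats.f.ppf(1 - alpha, R, I - R)  # eq 11
print(T2_new > T2_lim)   # exceeding the limit flags the observation as out-of-control
```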

The SPE statistic shows deviations from normal operation (NO) based on variations that are not captured by the PCA model. The SPE statistic is calculated for the new observation as in eq 12, where e_new is the J-dimensional residual vector for the new observation.

\mathrm{SPE}_{new} = e_{new}^T e_{new} = \sum_{j=1}^{J} (e_{new}(j))^2 \qquad (12)

Statistical limits for the SPE statistic are computed under the assumption that the data have a multivariate Normal distribution. The critical value is calculated using

Q_\alpha = \theta_1 \left[ 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + \frac{z_\alpha \sqrt{2 \theta_2 h_0^2}}{\theta_1} \right]^{1/h_0} \qquad (13)

The θ values are calculated from the eigenvalues λ of the covariance matrix that are not included in the model (eq 14). The other parameter, h_0, is calculated from the θ values as in eq 15, and z_α is the standard normal variable corresponding to the upper (1 - α) percentile; z_α has the same mathematical sign as h_0.1

\theta_i = \sum_{j=R+1}^{J} \lambda_j^i \quad \text{for } i = 1, 2, 3 \qquad (14)

h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2} \qquad (15)
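A corresponding sketch of the SPE statistic and its limit (eqs 12-15) is given below, under the same synthetic-data assumptions as the T² sketch above.

```python
import numpy as np
from scipy import stats

# Minimal sketch of the SPE statistic and its Q_alpha limit (eqs 12-15);
# synthetic, already-autoscaled data are an assumption for demonstration.
rng = np.random.default_rng(1)
I, J, R = 200, 5, 2
X = rng.standard_normal((I, J))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:R].T                               # loadings (J x R)
lam = s ** 2 / (I - 1)                     # all eigenvalues of cov(X)

x_new = rng.standard_normal(J)
e_new = x_new - P @ (P.T @ x_new)          # residual vector (eq 9)
SPE_new = e_new @ e_new                    # eq 12

theta = [np.sum(lam[R:] ** i) for i in (1, 2, 3)]        # eq 14
h0 = 1 - 2 * theta[0] * theta[2] / (3 * theta[1] ** 2)   # eq 15
z_a = stats.norm.ppf(0.95)                 # upper (1 - alpha) percentile
Q_a = theta[0] * (1 + theta[1] * h0 * (h0 - 1) / theta[0] ** 2
                  + z_a * np.sqrt(2 * theta[1] * h0 ** 2) / theta[0]) ** (1 / h0)  # eq 13
print(SPE_new > Q_a)     # exceeding Q_a flags the observation as out-of-control
```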

PCA partitions the variable space into a principal component subspace, which is defined by the model, and a residual subspace. The residual subspace usually contains noise that is normally uncorrelated, and the SPE statistic and its limits are used to monitor the residual subspace. The common-cause variability in the principal component subspace is better tracked with the T² statistic. If either of these statistics goes beyond its respective control limit during the projection of a new observation, this indicates an unexplained significant variation in the process, and its source cause should be diagnosed in order to take corrective action.

MBPCA. The multiblock PCA method has been used in the literature for processes with multiple operating units and large amounts of data. It has been demonstrated that multiblock methods localize faults in large processes and improve the false and missed alarm rates compared to using a single PCA model for the whole process.5,6,25 In SPM using multiblock PCA, the process data are divided into B blocks X = {X_1, ..., X_B} such that each block X_b is an (I × m_b) matrix of I observations of m_b variables. The consensus PCA (CPCA) algorithm using the nonlinear iterative partial least squares (NIPALS) method is given in refs 5, 6, and 25. In MBPCA, a single model is built using the data coming from the B blocks. It is possible to weight the information coming from each block separately if there is prior information about its importance relative to the other blocks. A decentralized multiblock monitoring approach detects and diagnoses a fault for each block using the block SPE and block Hotelling's T² statistics together with the super SPE and super T² statistics of the whole process. This provides both local monitoring at the block level and holistic monitoring at the super level. For a new observation x_{new}^T = [x_{1,new}^T \; \cdots \; x_{B,new}^T], the block SPE statistic is calculated using

\mathrm{SPE}_{b,new} = \|x_{b,new} - \hat{x}_{b,new}\|^2 = \|e_{b,new}\|^2 \qquad (16)

The confidence limit for the block SPE statistic to be used in fault detection can be derived using \delta_b^2 = g_b^{SPE} \chi_\alpha^2(h_b^{SPE}) for a given confidence level (1 - α), where g_b^{SPE} and h_b^{SPE} are given in ref 5 as

g_b^{SPE} = \frac{\mathrm{tr}\{(\tilde{P}_b \tilde{\Lambda} \tilde{P}_b^T)^2\}}{\mathrm{tr}\{\tilde{P}_b \tilde{\Lambda} \tilde{P}_b^T\}} \qquad (17)

h_b^{SPE} = \frac{(\mathrm{tr}\{\tilde{P}_b \tilde{\Lambda} \tilde{P}_b^T\})^2}{\mathrm{tr}\{(\tilde{P}_b \tilde{\Lambda} \tilde{P}_b^T)^2\}} \qquad (18)

with the PCA loadings partitioned as [P | P̃], where P_b is (J × R), P̃_b is (J × (J - R)), and the residual eigenvalues are Λ̃ = diag{λ_{R+1}, ..., λ_J}. Here χ_α²(h) denotes the Chi-squared variable with h degrees of freedom at significance level α.5 If SPE_b > δ_b², the variables in the bth block are affected by the fault. Similarly, the block Hotelling's T² statistic can be calculated as

T_b^2 = t_b^T \Lambda_b^{-1} t_b \qquad (19)

where Λ_b is the covariance matrix of t_b. If Λ_b is singular, a possible case for CPCA since the block scores are correlated, a pseudoinverse of Λ_b can be used instead. The confidence limit for T_b² is χ_α²(l_b) for a given confidence level (1 - α), where l_b is the rank of Λ_b.


Since T_b² is in quadratic form, the confidence limit under normal conditions is given in ref 5 as

T_b^2 \le g_b^{T} \chi_\alpha^2(h_b^{T}) \equiv \tau_b^2 \qquad (20)

g_b^{T} = \frac{\mathrm{tr}\{(S_b P_b \Lambda^{-1} P_b^T)^2\}}{\mathrm{tr}\{S_b P_b \Lambda^{-1} P_b^T\}} \qquad (21)

h_b^{T} = \frac{(\mathrm{tr}\{S_b P_b \Lambda^{-1} P_b^T\})^2}{\mathrm{tr}\{(S_b P_b \Lambda^{-1} P_b^T)^2\}} \qquad (22)

where S_b = cov(X_b). The decentralized monitoring procedure using the T² statistic is the same as described above for the SPE statistic. Monitoring using the super statistics is the same as monitoring using PCA, provided that the data blocks are properly scaled by the square root of the number of variables in each block.
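The block-level calculations (eqs 16 and 19, with the pseudoinverse and rank-based χ² limit mentioned above) can be sketched as follows; the two-block split, the block scaling, and the use of a single global PCA model in place of the full CPCA/NIPALS algorithm of ref 5 are simplifying assumptions.

```python
import numpy as np
from scipy import stats

# Minimal sketch of decentralized block SPE / block T^2 monitoring
# (eqs 16 and 19); block definitions and scaling are illustrative.
rng = np.random.default_rng(2)
I, R, m = 200, 2, [3, 4]                   # samples, PCs, variables per block

Xb = [rng.standard_normal((I, mb)) / np.sqrt(mb) for mb in m]  # block scaling
X = np.hstack(Xb)                          # combined (I x 7) data matrix
P = np.linalg.svd(X, full_matrices=False)[2][:R].T             # loadings

x_new = np.concatenate([rng.standard_normal(mb) / np.sqrt(mb) for mb in m])
e_new = x_new - P @ (P.T @ x_new)          # super-level residual vector

start = 0
for b, mb in enumerate(m):
    sl = slice(start, start + mb)
    SPE_b = e_new[sl] @ e_new[sl]          # block SPE (eq 16)
    P_b = P[sl, :]                         # block rows of the loadings
    t_b = P_b.T @ x_new[sl]                # block scores for the observation
    T_b = Xb[b] @ P_b                      # reference block scores
    Lam_b = np.cov(T_b, rowvar=False)      # covariance of block scores
    T2_b = t_b @ np.linalg.pinv(Lam_b) @ t_b   # eq 19 (pseudoinverse)
    l_b = np.linalg.matrix_rank(Lam_b)
    T2_lim = stats.chi2.ppf(0.95, l_b)     # chi-squared limit, rank(Lam_b) dof
    print(f"block {b}: SPE={SPE_b:.3f}, T2={T2_b:.3f}, T2 limit={T2_lim:.3f}")
    start += mb
```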

DPCA. The linear relations between X(t) and X(t - n) can be identified using DPCA. DPCA uses a special Hankel matrix, which is formed by augmenting each observation vector with the previous n observations and stacking the data matrix as

X(I) = \begin{bmatrix}
x_t^T & x_{t-1}^T & \cdots & x_{t-n}^T \\
x_{t-1}^T & x_{t-2}^T & \cdots & x_{t-n-1}^T \\
\vdots & \vdots & \ddots & \vdots \\
x_{t-I+n}^T & x_{t-I+n-1}^T & \cdots & x_{t-I}^T
\end{bmatrix} \qquad (23)

where x_t^T is the J-dimensional observation vector of the reference set at time instance t. The procedure is essentially the same as in conventional PCA except that, here, the data matrix is composed of time-shifted duplicate vectors. The number n, which is usually 1 or 2, indicates the order of the dynamic system. For nonlinear systems, n could be higher to better approximate the actual nonlinear relations. Several methods to determine n have been proposed. The T² and SPE statistics and their thresholds given for PCA generalize directly to DPCA.7
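Constructing the lagged data matrix of eq 23 is straightforward, as in the sketch below; the helper name lagged_matrix and the synthetic data are illustrative assumptions.

```python
import numpy as np

def lagged_matrix(X, n):
    """Augment each observation with its n previous observations (eq 23).
    X: (I x J) data matrix; returns an (I - n) x J(n + 1) matrix whose
    row for time t is [x_t, x_{t-1}, ..., x_{t-n}] stacked side by side."""
    I = X.shape[0]
    return np.hstack([X[n - k : I - k] for k in range(n + 1)])

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 4))          # reference data, J = 4 variables
Xd = lagged_matrix(X, n=2)                 # dynamic order n = 2
print(Xd.shape)                            # (98, 12); then apply PCA as usual
```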

Literature Cited

(1) Jackson, J. E. Principal Components and Factor Analysis: Part I, Principal Components. J. Qual. Technol. 1980, 12, 201–213.
(2) Kourti, T.; MacGregor, J. F. Process Analysis, Monitoring and Diagnosis Using Multivariate Projection Methods. Chemometr. Intell. Lab. 1995, 28, 3–21.
(3) Kourti, T.; MacGregor, J. F. Multivariate SPC Methods for Process and Product Monitoring. J. Qual. Technol. 1996, 28, 409–428.
(4) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariable Processes. Control Eng. Pract. 1995, 3, 403–414.
(5) Qin, S. J.; Valle, S.; Piovoso, M. J. On Unifying Multiblock Analysis with Application to Decentralized Process Monitoring. J. Chemometr. 2001, 15, 715–742.


(6) Westerhuis, J. A.; Kourti, T.; MacGregor, J. F. Analysis of Multiblock and Hierarchical PCA and PLS Models. J. Chemometr. 1998, 12, 301–321.
(7) Russell, E. L.; Chiang, L. H.; Braatz, R. D. Fault Detection in Industrial Processes Using Canonical Variate Analysis and Dynamic Principal Component Analysis. Chemometr. Intell. Lab. 2000, 51, 81–93.
(8) Hoskuldsson, A. A Combined Theory for PCA and PLS. J. Chemometr. 1995, 9, 91–123.
(9) Hoskuldsson, A. Dimension of Linear Models. Chemometr. Intell. Lab. 1996, 32, 37–55.
(10) Krzanowski, W. J. Cross-Validation in Principal Component Analysis. Biometrics 1987, 43, 575–584.
(11) Siirola, J. D.; Hauan, S.; Westerberg, A. W. Towards Agents-Based Process Systems Engineering: Proposed Agent Framework. Comput. Chem. Eng. 2003, 27, 1801–1811.
(12) Cinar, A.; Perk, S.; Teymour, F.; North, M.; Tatara, E.; Altaweel, M. Monitoring, Analysis, and Diagnosis of Distributed Processes with Agent-Based Systems. Proceedings of ADCHEM, Istanbul, Turkey, 2009.
(13) Tetiker, M. D.; Artel, A.; Teymour, F.; Cinar, A. Control of Grade Transitions in Distributed Chemical Reactor Networks: An Agent-Based Approach. Comput. Chem. Eng. 2008, 32, 1984–1994.
(14) Perk, S.; Cinar, A. Agent-Based Monitoring, Fault Detection, Diagnosis and Control of Spatially Distributed Processes. Proceedings of DYCOPS, Cancun, Mexico, 2007.
(15) North, M. J.; Howe, T. R.; Collier, N. T.; Vos, J. R. Repast Simphony Runtime System. Proceedings of the Agent 2005 Conference on Generative Social Processes, Models, and Mechanisms, cosponsored by Argonne National Laboratory and The University of Chicago, Chicago, IL, 2005.
(16) Montgomery, D. C.; Mastrangelo, M. C. Some Statistical Process Control Methods for Autocorrelated Data. J. Qual. Technol. 1991, 23, 179–204.
(17) Negiz, A.; Cinar, A. Statistical Monitoring of Multivariable Dynamic Processes with State Space Models. AIChE J. 1997, 43, 2002–2020.
(18) Siirola, J. D.; Hauan, S.; Westerberg, A. W. Computing Pareto Fronts Using Distributed Agents. Comput. Chem. Eng. 2004, 29, 113–126.
(19) Tatara, E.; Birol, I.; Cinar, A.; Teymour, F. Measuring Complexity in Reactor Networks with Cubic Autocatalytic Reactions. Ind. Eng. Chem. Res. 2005, 44, 2781–2791.
(20) Tatara, E.; Birol, I.; Cinar, A.; Teymour, F. Agent-Based Control of Autocatalytic Replicators in Networks of Reactors. Comput. Chem. Eng. 2005, 29, 807–815.
(21) Tatara, E.; North, M.; Hood, C.; Teymour, F.; Cinar, A. Agent-Based Control of Spatially Distributed Chemical Reactor Networks. In Engineering Self-Organising Systems; Springer: Berlin/Heidelberg, 2006; Vol. 3910.
(22) Tatara, E.; Teymour, F.; Cinar, A. Control of Complex Distributed Systems with Distributed Intelligent Agents. J. Process Control 2007, 17, 415–427.
(23) Dash, S.; Venkatasubramanian, V. Challenges in the Industrial Applications of Fault Diagnostic Systems. Comput. Chem. Eng. 2000, 24, 785–791.
(24) Cinar, A.; Palazoglu, A.; Kayihan, F. Chemical Process Performance Evaluation; CRC Press-Taylor & Francis: Boca Raton, FL, 2007.
(25) Wold, S.; Kettaneh, N.; Tjessem, K. Hierarchical Multiblock PLS and PC Models for Easier Model Interpretation and as an Alternative to Variable Selection. J. Chemometr. 1996, 10, 463–482.

Received for review August 31, 2009
Revised manuscript received December 18, 2009
Accepted December 21, 2009

IE901368J