Hybrid Model-Based Framework for Alarm Anticipation - Industrial

Article pubs.acs.org/IECR

Hybrid Model-Based Framework for Alarm Anticipation

Shichao Xu,† Arief Adhitya,† and Rajagopalan Srinivasan*,‡,§

† Institute of Chemical and Engineering Sciences, A*STAR (Agency for Science, Technology and Research), 1 Pesek Road, Jurong Island, Singapore 627833, Singapore
‡ Department of Chemical and Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117576, Singapore

Supporting Information

ABSTRACT: Modern chemical plants consist of a number of integrated and interlinked process units. When an abnormal situation occurs, the automation system alerts the operators through alarms. In this work, we introduce a new type of alarm, known as anticipatory alarms, aimed at enabling operators to orient holistically to the abnormal situation. These anticipatory alarms are developed based on an alarm anticipation algorithm that utilizes dynamic process models to offer an accurate short-term prediction of the process state. In particular, these models predict the rate-of-change of process variables, which are then translated into predictions of time horizons for occurrence of various critical alarms. Anticipatory alarms seek to improve the sensemaking facilities offered to the operator through advance warning of impending alarms. As a result, operators can adopt a more proactive approach in managing abnormal situations. The benefits of anticipatory alarms have been demonstrated through six fault scenarios in a depropanizer unit case study. All alarms are successfully predicted, providing a diagnosis time benefit of around 35 s to the operators.

Industrial & Engineering Chemistry Research, Article
Special Issue: David Himmelblau and Gary Powers Memorial
Received: May 10, 2013
Revised: January 20, 2014
Accepted: January 21, 2014
Published: January 21, 2014
© 2014 American Chemical Society
dx.doi.org/10.1021/ie4014953 | Ind. Eng. Chem. Res. 2014, 53, 5182−5193

1. INTRODUCTION
Modern chemical plants are complex systems and consist of a large number of integrated and interdependent process units. To optimize the supervision of operation in these plants, process operators and engineers depend on automation systems to assist them in (1) extracting key information of the state of the plant, which is then used for (2) managing and controlling operations in real-time. One key constituent of the automation system is the alarm system. During an abnormal situation, when a process variable deviates beyond its acceptable limits, an alarm is flagged, presented in the alarm summary page of the Distributed Control System (DCS) user interface, on panel-mounted annunciator boards, or as audible bells or sirens. The primary function of the alarm system is to direct the operator’s attention toward any plant condition requiring timely assessment or action. Hence, when an alarm occurs, operators are expected to intervene in the process operation (typically through the control system), rectify the cause of the abnormal situation and bring the plant back to normal operating state. On the part of the operators, this requires sensemaking, that is, the active process of building, refining, questioning, and recovering situation awareness.1

Alarms are typically configured in the early stages of the plant’s lifecycle. However, the relative ease of configuring new alarms in modern-day alarm systems often leads to their proliferation, that is, far too many alarms are configured. During an abnormal situation, because of the highly integrated nature of modern chemical plants, this leads to an “alarm flood”, wherein more alarms are generated within a short time than can be physically addressed by the operator (typically 10 alarms per 10 min). Alarm floods lead to information overload on the part of the operator, thus hampering the recovery steps that the operator has to take. A number of real-life incidents offer evidence of the scale of the problem. A simple incident of a compressor trip was reported to generate a total of 392 alarms within 1.5 h,2 that is, 43 alarms on average every 10 min. An accident at Esso’s Longford Refinery generated 8500 alarms over a 12-h period.3 This flood of one alarm every five seconds was highlighted as the main contributor to the accident since the operator missed some important alarms leading up to the accident. Similar statistics were also reported by Srinivasan and co-workers4,5 for a refinery in Singapore. In Texaco’s Milford Haven Refinery, the operator had to recognize, acknowledge, and act on 275 alarms in the last 11 min before an accident occurred. Poorly prioritized alarms and inadequately designed control displays were again pointed out as the root cause of the accident.6

As a benchmark, the UK Health & Safety Executive (HSE) classified the number of alarms that an operator can effectively manage into three levels:7 (1) manageable, one alarm per three minutes; (2) overdemanding, one alarm per 1.5 min; and (3) unmanageable, one alarm per minute. Once the number of alarms reaches an unmanageable level, they become disorienting and result in delays in taking corrective measures, which will eventually lead to an emergency shutdown in the best case or sometimes worse. Some contributors to poor alarm systems include chattering alarms (where the same alarm is triggered three or more times in a minute), duplicate alarms (where one alarm always follows another and hence does not provide the operator any additional information), and nuisance alarms (that do not require any operator action). The financial implications of poor alarm

management have been placed variously at three to ten million pounds per year for a typical oil refinery8 and 10 to 20 billion dollars annually in the U.S. petrochemical industry.7,9

Most of the research in the field of alarm management has sought to develop systematic approaches for rationalization so as to reduce the number of alarms without compromising on the ability to detect all potential abnormal situations. Another complementary aspect is to enable the operator to quickly understand the alarms in the context of the dynamic process state. We seek to address this latter issue in this paper. Recent human factors studies reveal that, when an abnormal situation occurs, those operators who are able to predict the evolution of the plant state are best able to cope with alarm floods.10 Motivated by this, we propose to provide predictive alarm information in order to improve operators’ sensemaking facilities. As a result, when an abnormal situation occurs, the operators can quickly orient themselves and would have a longer lead time to identify the root cause and take corrective actions to bring the plant back to a safe operating range.

The rest of this article is organized as follows: section 2 summarizes the recent developments in alarm management. In section 2.1, we discuss the role of human factors in alarm systems. In section 3, we introduce the concept of anticipatory alarms. The prediction of the future state of the alarm variables requires a dynamic model of the process. We propose a framework for developing such models in section 3.1. Section 4 illustrates these concepts and their effectiveness using a refinery depropanizer unit example. The results are then presented in section 5.

2. LITERATURE REVIEW
Alarm management has received a lot of attention in recent years with a number of standards, handbooks, and articles proposing new techniques. The ISA 18.2 standard11 addresses the development, design, installation, and management of alarm systems adopting a lifecycle approach. The key stages in the lifecycle are specification of the alarm philosophy, identification of potential alarms, detailed design including specifying alarm set points, implementation including operator training, monitoring and assessment of alarm system performance, which may trigger modifications, thus leading to management of change. Various handbooks12−14 offer insights for each of the stages. A number of software systems are also now available that offer various types of analysis for alarm monitoring and reduction including alarm rates and alarm frequency calculation, determination of stale (or standing) alarms, and support for alarm rationalization.15 These have led to a number of successes in actual industrial implementation, such as the case reported by Mahajan and Surve16 at a gas recycling plant of Qatar Petroleum.

In tandem, several researchers have proposed more advanced techniques to evaluate alarm systems and improve alarms. Izadi et al.17 proposed the use of process data and knowledge along with alarm data to diagnose and rectify the problems of alarm systems. For example, using process knowledge, Foong et al.18 developed a fuzzy-logic based alarm prioritization scheme that enabled operators to decide which alarms to attend to first if an alarm flood occurs. One family of approaches has sought to utilize the correlation among different alarms19 or between alarms and operator actions20,21 to determine the need and importance of alarms or select suitable thresholds. Such correlation analysis takes into account the temporal dependencies between alarms and can help detect strongly correlated and hence duplicate alarms, as well as unnecessary alarms which do not require any operator action. Kimura et al.22 developed indices to quantitatively evaluate alarms in terms of their effectiveness, recall, and timeliness, which gauge if the alarm would enable the operator to respond in an effective and timely manner. As another approach to evaluating alarm systems, Liu et al.23 constructed an operator model for use as a virtual subject to analyze operator behavior during process malfunctions without the need of process operational data. Such offline assessment and alarm rationalization approaches are meant to be applied periodically (weekly or monthly) to review the health of the alarm system. Graphical tools, such as visualization, can further help the engineer identify patterns among the alarms “manually”.24 It is also now being recognized that, although alarm rationalization is probably the most effective strategy to solve alarm problems, it is heavily knowledge intensive; hence, statistical and data mining technologies can offer only partial support.17

It has been recognized for some years now that alarm management is not solely a technological problem.25,26 Ultimately, it is the responsibility of the operator to respond to abnormal situations and prevent catastrophic consequences. The actions of the operator during an abnormal situation can be viewed in terms of the Boyd loop, which models the decision-making as consisting of four steps: observe, orient, decide, and act (and hence also called the OODA loop).27 The role of the alarm system in the OODA loop is to help orient the operator to the current (abnormal) state of the plant. Subsequently, operators have to decide and act by taking corrective actions to bring the plant back to normal operating state. Nuisance alarms and alarm floods delay or prevent the orientation through information overload; hence, alarm management is important. However, besides reducing nuisance alarms, other approaches can be additionally pursued to improve the operator’s ability to orient quickly to the plant state.

2.1. Human Factors in Alarm Management. Process control industries usually entail working in a huge, interactive system involving man and machine. Apart from developing smarter alarm management tools that improve the “machine” aspect, the “man” aspect in managing alarms through the understanding of the behaviors of operators during an abnormal situation is equally important.28 In an ethnographic study conducted by Yin,10 the behaviors of plant operators were classified into two groups, namely, novice and expert. Operators belonging to the novice group had less experience in managing the plant. Yin10 found that this group of operators tends to adopt a more reactive monitoring approach, that is, they generally rely on alarms to diagnose faults and to alert them of any abnormal behaviors happening to the plant. As a result, novice operators were more likely to be disoriented during an alarm flood and prone to activating the emergency shutdown when the fault/abnormal situation could not be promptly diagnosed and rectified. Operators belonging to the expert group, on the other hand, had a more complete understanding of the process dynamics. This group tended to adopt a more proactive process monitoring behavior, which involved utilizing trend displays to predict future process states. This mental prediction helped them diagnose and rectify the abnormal situation early and prevent plant shutdowns. One way to mitigate this performance gap between novice and expert operators is to provide tools that can help the former perform just as well as the latter. Expert operators benefit from years of


operational experience, which allows them to develop mental models to quickly understand and anticipate situations. Anticipatory alarms seek to support the performance of novice operators using first-principles and data-driven process models.

The crucial role of prediction during real-time decision making has been brought out in a number of domains ranging from firefighting29 to medical devices.30,31 Predictive tools can provide operators with information about the future state of the process, so as to improve their situation awareness and provide a longer lead time for action. Such a predictive aid is currently not implemented in the chemical industries but has found widespread usage in various other domains. For example, the cockpit display of traffic information in modern airplanes shows other aircraft in the vicinity and their trajectories, and alerts the pilots of any potential conflicts. This improves pilots’ ability to anticipate and reduces their workload.32 Fire fighters anticipate how bushfires will spread so as to come up with the best resolution strategy.29 As human errors in medical device use account for a large portion of medical errors, methods have been developed to predict patient safety in medical devices with integral information technology to reduce such errors.30,31 Tsunami warning systems aim to relay possible impending rogue waves to affected shorelines in order to warn and advise people to evacuate to higher ground.33 Similar predictive applications and benefits are also found in maritime industries.34 A more familiar example is hurricane forecasting, in which powerful simulations seek to extrapolate the paths of hurricanes. These examples reflect the common theme of integrating available information to calculate a prediction, and presenting this result to the user during decision-making. In this work, we adopt a similar approach for alarm management, called anticipatory alarms.

3. ANTICIPATORY ALARMS
The key motivation for anticipatory alarms is to help plant operators, especially those that are less experienced, to manage abnormal situations in a more proactive manner. The information flow in traditional alarm systems from measurements to the display on the DCS is shown in Figure 1. When a large disturbance or abnormal situation occurs in the process, the operator’s attention is captured by the numerous alarms. Other variables that are moving toward their alarm limits but have not yet reached their alarm threshold would appear to be normal. In the absence of trend information, as is the case in most DCS schematics today, the operator does not have readily accessible information for inferring the true state of the plant, which hinders the operator’s sensemaking.

Figure 1. Existing alarms management in chemical plants.

Anticipatory alarms seek to overcome this handicap of partial information.35,36 As shown in Figure 2, in the proposed approach, operators are provided with real-time information not only about the alarms that have already occurred, but also about those that would occur in the near future. Through this, operators can obtain a comprehensive view of the current state of the process and its predicted state, which would help them localize and rectify the problem.

Alarm anticipation can be achieved in various ways. One simple strategy is reducing the alarm thresholds. However, alarm limits are usually set based on numerous considerations, especially safety-related ones.12 Changing the alarm limits could be misleading to the operators and have the inverse effect of worsening safety performance if operators become nonchalant to alarms, knowing subconsciously that they do not accurately reflect safety limits.3 Another approach could be using past data to perform extrapolation. Our preliminary investigation suggests that the quality of such prediction would be highly dependent on the extent of noise in the system. What is required is a multivariate, easy-to-develop, model-driven anticipation scheme that respects the fundamental laws. Therefore, in this paper, we propose a hybrid first-principles and data-driven model that considers interactions between different variables for anticipating alarms.

The proposed anticipatory alarms are built around an alarm anticipation algorithm, and utilize dynamic process models, known as anticipatory alarm models or AA-models, to estimate the rate-of-change of process variables in the near-term. Using the rate-of-change, the time at which each process variable would trigger its alarm limits, termed the anticipated alarm time or AA-time, is calculated and used to trigger anticipatory alarms. Let $y_j$ be an alarm variable, where the index j is used to indicate different alarm variables. Each alarm variable $y_j$ is affected by other measured variables, $z_{jk}$, where the index k indicates that various measured variables could affect $y_j$. In real time, the alarm anticipation system uses AA-models and $z_{jk}$ values to estimate the AA-time for each $y_j$. First, the measurements of the process variables that are required in the AA-model, $z_{jk}$, are obtained at each time instant t. Second, based on the AA-model, the rate-of-change of the alarm variable, $d\hat{y}_j/dt$, is estimated. Finally, the alarm anticipation algorithm utilizes the predicted rate-of-change to estimate the AA-time of $y_j$:

$$t_j^{AA} = \frac{Y_j - y_j}{d\hat{y}_j/dt} \qquad (1)$$

where $Y_j$ represents an alarm limit for $y_j$. The time horizon for which prediction is performed is called the anticipation time window or AA-window. Variables whose AA-time is within the AA-window are conveyed to the operator as anticipated alarms. Figure 3 shows the AA-window, where t* is the current time and W is the length of the AA-window. At t*, the rate-of-change of the process variable is estimated. The AA-time is then calculated based on this predicted rate-of-change, the current variable value, and the alarm limit. There are two possibilities as shown in Figure 3: (a) the AA-time falls within the AA-window ($t_j^{AA} \le W$) or (b) the AA-time is outside the AA-window ($t_j^{AA} > W$). In the former case, an anticipatory alarm is triggered and the operator is notified of the predicted AA-time. In the latter case, no anticipatory alarm is triggered.
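Equation 1 and the AA-window test can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `aa_time`, `anticipate`, and `AnticipatoryAlarm`, the guard for a zero or wrong-signed rate, and the numeric values in the usage note are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnticipatoryAlarm:
    tag: str        # alarm identifier, e.g. "PC11 HI" (illustrative)
    aa_time: float  # predicted seconds until the alarm limit is reached


def aa_time(limit: float, value: float, rate: float) -> Optional[float]:
    """Eq 1: time needed for `value` to reach `limit` at the predicted `rate`.

    Returns None when the variable is not heading toward the limit
    (zero rate, or a rate pointing away from it). This guard is an
    assumption of the sketch; the paper does not spell it out.
    """
    if rate == 0.0:
        return None
    t = (limit - value) / rate
    return t if t >= 0.0 else None


def anticipate(tag: str, limit: float, value: float, rate: float,
               window: float = 60.0) -> Optional[AnticipatoryAlarm]:
    """Raise an anticipatory alarm only if the AA-time lies within the AA-window."""
    t = aa_time(limit, value, rate)
    if t is not None and t <= window:
        return AnticipatoryAlarm(tag, t)
    return None
```

With a hypothetical high limit of 15.0, a current value of 12.0, and a predicted rate of 0.125 units per second, `anticipate("PC11 HI", 15.0, 12.0, 0.125)` reports an AA-time of 24 s, which falls inside a 60 s window. The same formula covers low limits, because both the numerator and the rate are then negative.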


Figure 2. Alarms management using the anticipatory alarms framework.

Figure 3. Anticipation time window.

Figure 4. Structure of anticipatory alarms model.

The rate-of-change of the alarm variables, $d\hat{y}_j/dt$, is predicted online using AA-models, which are developed offline beforehand, as described next.

3.1. AA-Model Development. Developing dynamic models of complex industrial processes is in general challenging. AA-models pose additional requirements since these models must be simple, for real-time use, yet be representative of the process dynamics in a wide range of operations including abnormal situations for which plant data may not be available beforehand. Developing a first-principles model that accurately represents the process dynamics during abnormal situations can be expensive and time-consuming. Further, the necessary parameters such as reaction kinetics and heat transfer coefficients are rarely measured. Data-driven models on the other hand are easier to develop if historical process data is available. However, the prediction capabilities of data-driven models are limited to the domain covered by the data. Since data from abnormal situations is usually scarce, data-driven models by themselves would not offer good predictions. To overcome these challenges, we propose a hybrid modeling strategy that utilizes both first-principles and data-driven modeling techniques to estimate the process state accurately in the short-term. Rather than a single monolithic model that seeks to model the entire process, we consider the evolution of each variable individually and develop a set of simple yet multivariate models. Further, not every variable in the process is considered. A limited number of measured variables in the plant have alarms configured in the DCS. We develop AA-models to predict the evolution of only these variables with alarms, $y_j$. The proposed model structure is shown in Figure 4. In the proposed scheme, measured process variables, $z_{jk}$, are input to the AA-models, which are represented in general by

$$\frac{d\hat{y}_j}{dt} = f(\mathcal{P}_j, Z_j) \qquad (2)$$

where $Z_j = [z_{j1}, z_{j2}, \ldots]$ denotes the set of measured process variables $z_{jk}$ that affect $y_j$, and $\mathcal{P}_j$ denotes the set of parameters $p_{kj}$ in the first-principles model. Function f(...) relates the measured input process variables and the unknown parameters to the rate-of-change of $y_j$. It is a hybrid first-principles and data-driven model. We propose to identify the structure of f using first principles, that is, mass and energy balances. Once the first-principles model of a process variable is derived, we use data-driven modeling techniques based on historical data of $z_{jk}$ and $y_j$ to estimate the unknown parameters $\mathcal{P}_j$. A parameter estimation procedure is used to select the parameter values. A variety of data-driven techniques could be used to estimate $\mathcal{P}_j$. In this paper, we use an optimization-based method similar to the nonlinear least-squares approach that minimizes the error between the actual and the predicted rate-of-change of process variables. The model structure of process variable $y_j$ (given in eq 2) is first linearized and rearranged into the following form:

$$\mathcal{Y}_j(n) = \mathcal{P}_j \cdot \Phi_j(n) = p_{1j}\,\phi_{1j}(n) + p_{2j}\,\phi_{2j}(n) + \cdots + p_{Kj}\,\phi_{Kj}(n), \qquad n = 1, \ldots, N \qquad (3)$$

where $\mathcal{Y}_j$ denotes past measurements of $y_j$, $\mathcal{P}_j = [p_{1j}, p_{2j}, \ldots, p_{Kj}]$ denotes the parameters to be estimated, and $\Phi_j = [\phi_{1j}, \phi_{2j}, \ldots, \phi_{Kj}]$, where $\phi_{kj}$ is a term comprising one or more variables $z_{jk}$ in the linearized equation. There can be a


Figure 5. Schematic of depropanizer unit.

Figure 6. Control volume for developing AA-model of TI17.

driving force ($\phi_{kj}$) that involves more than one variable ($z_{jk}$). For example, in eq 8, the difference between two variables TI16 and TI17, that is, TI16 − TI17, is one of the driving forces. K is the number of terms in $\Phi_j$ and N is the number of samples/observations in the historical data set used for parameter estimation. In the optimization-based method, the aim is to reduce the error between the actual (given by the historical data) and predicted (given by eq 2) rate-of-change of the process variable as follows:

$$\min_{\mathcal{P}_j} \sum_{n=1}^{N} \left( \frac{d\mathcal{Y}_j}{dt}(n) - \frac{d\hat{y}_j}{dt}(n) \right)^2 \qquad (4)$$

subject to

$$\underline{\mathcal{P}}_j \le \mathcal{P}_j \le \overline{\mathcal{P}}_j \qquad (5)$$

where $d\mathcal{Y}_j/dt$ is the rate-of-change of process variable $y_j$ obtained from the historical data and $d\hat{y}_j/dt$ is the predicted rate-of-change of the process variable obtained using eq 2 and $\Phi_j$. $\underline{\mathcal{P}}_j$ and $\overline{\mathcal{P}}_j$ represent the lower and upper bounds of $\mathcal{P}_j$, respectively.

In summary, the procedure for developing the AA-models is as follows: (1) Identify the process variables, $y_j$, in the plant to be used for anticipatory alarms monitoring. (2) For an alarm variable $y_j$, (a) develop the first-principles model based on mass and energy balances relating $y_j$ to measured variables $z_{jk}$ and other unmeasured variables, (b) estimate each required unmeasured variable through interpolation or relating it to measured variables, (c) obtain historical data for $y_j$ and $z_{jk}$ that are required in the first-principles model, and (d) using the first-principles model and historical data, estimate the unknown parameters $\mathcal{P}_j$ using data-driven techniques. Next, the benefits of anticipatory alarms are evaluated using a simulated depropanizer plant case study.
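Step (d) of the procedure, a bounded least-squares fit of the linear-in-parameters form, can be sketched as follows. The paper describes its estimator only as an optimization-based method similar to nonlinear least squares; this sketch swaps in plain projected gradient descent so that it needs nothing beyond the standard library. The function name `fit_bounded`, the iteration count, and the toy data in the usage note are assumptions made for illustration.

```python
from typing import List, Sequence


def fit_bounded(phi: Sequence[Sequence[float]], target: Sequence[float],
                lower: Sequence[float], upper: Sequence[float],
                iters: int = 2000) -> List[float]:
    """Minimise sum_n (target[n] - sum_k p[k] * phi[n][k])**2
    subject to lower[k] <= p[k] <= upper[k], by projected gradient descent.

    phi[n][k] is the k-th driving-force term evaluated at sample n and
    target[n] is the historical value the model should reproduce.
    """
    n_par = len(lower)
    # Twice the squared Frobenius norm of phi bounds the Lipschitz constant
    # of the gradient, so a step of 1/lipschitz is safe (if conservative).
    lipschitz = 2.0 * sum(v * v for row in phi for v in row)
    step = 1.0 / lipschitz
    # Start from zero, clipped into the feasible box.
    p = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(iters):
        grad = [0.0] * n_par
        for row, y in zip(phi, target):
            resid = sum(pk * v for pk, v in zip(p, row)) - y
            for k, v in enumerate(row):
                grad[k] += 2.0 * resid * v
        # Gradient step followed by projection back onto the bounds.
        p = [min(max(pk - step * g, lo), hi)
             for pk, g, lo, hi in zip(p, grad, lower, upper)]
    return p
```

On toy data generated from parameters (2, 3), for example `fit_bounded([[1, 0], [0, 1], [1, 1]], [2, 3, 5], [0, 0], [10, 10])`, the fit recovers values close to 2 and 3. Tightening the first upper bound to 1.5 pins that parameter at its bound and shifts the second to about 3.25, which shows the role of the bound constraint in eq 5. For real plant data a library solver would normally be preferred over this hand-written loop.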

4. CASE STUDY: DEPROPANIZER PLANT
The schematic of the depropanizer plant is shown in Figure 5.37 The primary objective of the unit is to separate a feed mixture (typically from either the deethanizer unit bottoms or the debutanizer unit overheads), consisting primarily of C3 and C4 hydrocarbons, into two product streams. The lighter product, from the top of the unit, consists primarily of C3. The bottom stream consists of C4 and other heavier hydrocarbons, and is further processed in the downstream units to recover the heavy hydrocarbons. The depropanizer unit has 24 measured process variables. However, from the point of view of the operator, not all process variables have equal importance. In this case study, eight process variables, TI13, TI14, TI16, TI17, TC11, LC11, LC12, and PC11, have alarms configured. These were selected for anticipatory alarms.

4.1. Development of AA-Models. An AA-model for each process variable was first developed. Consider TI17, the temperature of tray 1. To develop the AA-model for TI17, we first develop its first-principles model. Taking tray 1 as the control volume as shown in Figure 6, the energy balance equation can be written as

$$m_1 \frac{dH_1^L}{dt} = D_R H_{D_R}^L + V_2 H_2^V - L_R H_1^L - V_1 H_1^V \qquad (6)$$

where $m_1$ is the liquid moles hold-up on tray 1, $D_R$ is the reflux flow rate, $L_R$, $V_1$, and $V_2$ represent the respective liquid and vapor flow rates, and H denotes the respective specific enthalpy. The subscript index of H denotes the tray number, while the superscript index represents the type of flow (i.e., L for liquid and V for vapor). By expanding the specific enthalpy terms, $H_1^L$, $H_1^V$, $H_{D_R}^L$, and $H_2^V$ (derivation details given in the Supporting Information), eq 6 can be rewritten as

$$\frac{d\widehat{\mathrm{TI17}}}{dt} = \frac{A^L\,[L_R(\mathrm{TI16} - \mathrm{TI17})]}{m_1(A^L + B^L\,\mathrm{TI17})} + \frac{B^L\,[L_R(\mathrm{TI16}^2 - \mathrm{TI17}^2)]}{2 m_1(A^L + B^L\,\mathrm{TI17})} + \frac{1}{m_1(A^L + B^L\,\mathrm{TI17})}\left[\frac{V R\,(T_2 - \mathrm{TI17})}{\mathrm{PC11}}\right] + \frac{A^V\,[V(T_2 - \mathrm{TI17})]}{m_1(A^L + B^L\,\mathrm{TI17})} + \frac{B^V\,[L_R(T_2^2 - \mathrm{TI17}^2)]}{2 m_1(A^L + B^L\,\mathrm{TI17})} \qquad (7)$$

where $A^L$, $A^V$, $B^L$, and $B^V$ are unknown parameters in the first-principles model. Equation 7 can be rearranged to the form given in eq 3, which yields eq 8.

Given the historical data, the term on the left-hand side of the equation and those in the square brackets are known while the rest of the terms (five parameters) are unknowns. Equation 8 thus provides the model structure for variable TI17 with the known process variables (measured or estimated) T2, TI16, TI17, PC11, V, and L. T2 is estimated from measured variables TI15 and TI17, L is estimated from FC12, and V is estimated from FI13 and LC11. The data-driven optimization method is then used to determine the unknown parameters, $\mathcal{P}_j$, using historical data. Similar models were developed for all eight alarm variables in the case study. The interested reader is referred to the Supporting Information for details of the other models. Once the AA-models are developed, they are used online to predict the AA-time for the process variables in real-time.

5. RESULTS
The model developed above has been used for generating anticipatory alarms in real-time. In this case study, a total of six different fault scenarios are studied. They are as follows: (S1) loss of cooling water at condenser E12, (S2) loss of hot oil at reboiler E11, (S3) degradation of reflux pump P11A, (S4) loss of feed, (S5) fouling of reboiler E11, and (S6) fouling of condenser E12. The sampling period is four seconds. In each scenario, the fault is introduced at t = 15 s. The actual alarms that are triggered and their sequence in each scenario are given in Table 1. Because of space limitations, we report detailed results for three scenarios. The AA-window length used in these scenarios is 60 s.

Table 1. Alarms Triggered in the Six Scenarios

| scenario | description | actual alarms (time) |
|---|---|---|
| S1 | loss of cooling water | PC11 HI (88 s), TI16 HI (120 s), TI13 HI (132 s), TI17 HI (164 s), LC12 LO (196 s) |
| S2 | loss of hot oil | TC11 LO (80 s), LC11 HI (132 s), LC12 LO (192 s) |
| S3 | degradation of reflux pump | TI17 HI (48 s), LC12 HI (164 s), TI16 HI (180 s), TI14 HI (292 s), LC11 LO (296 s), TC11 HI (360 s), TI13 HI (508 s), PC11 HI (516 s) |
| S4 | loss of feed | LC11 LO (400 s), TI14 HI (440 s), TC11 HI (472 s) |
| S5 | reboiler fouling | TC11 LO (104 s), LC11 HI (252 s), LC12 LO (364 s) |
| S6 | condenser fouling | PC11 HI (84 s), TI16 HI (152 s), TI13 HI (196 s) |

5.1. Scenario 1: Loss of Cooling Water at Condenser E12. In this scenario, there is a loss of cooling water entering the condenser E12. As there is no water to condense the vapor coming from the distillation tower, vapor starts to build up leading to a pressure increase in the column. Eventually, the column pressure PC11 will trigger its high-limit alarm at t = 88


s. As the pressure builds up in the reflux drum, the temperature of its content increases. This causes the temperature of the liquid flowing out of the reflux drum, TI16, to increase, resulting in increasing temperatures of the bottom product (TI13) and the top tray (TI17) as well. As a result, all three process variables, TI16, TI13, and TI17, trigger their high-limit alarms at t = 120, 132, and 164 s, respectively. Since no condensation takes place after the fault, the liquid level in the reflux drum, LC12, decreases and eventually triggers its low-limit alarm at t = 196 s.

Anticipatory alarms can offer significant advance indication for each of these impacted variables as shown in Figure 7. The x-axis denotes the process time while the y-axis denotes the AA-time, that is, the predicted time to alarm ($t_j^{AA}$) for each alarm variable. A marker in the figure indicates an anticipatory alarm, that is, at that sampling instant, a particular alarm variable has been predicted to hit its alarm limit in the next $t_j^{AA}$ seconds ($t_j^{AA} \le 60$ s, the AA-window length). At the end of the simulation, the actual time of every alarm is known. These actual alarm times are indicated by the circled markers that lie on the x-axis. The first anticipatory alarm is triggered at t = 64 s, predicting PC11 HI to occur 27 s later. Thus, the operator is notified early that there is a potential problem involving PC11, 24 s before its alarm actually sounds at t = 88 s. At this point, the operator may not be able to localize the fault yet, since PC11 HI can occur in three scenarios: S1, S3, and S6 (see Table 1). Similarly, the next three alarms, TI16 HI (anticipated at t = 92 s), TI13 HI (anticipated at t = 84 s), and TI17 HI (anticipated at t = 128 s) could be caused by either S1 or S3. Only after the anticipatory alarm for LC12 LO is triggered at t = 160 s, the operator can localize the fault to be S1 (loss of cooling water), seeing that S3 would involve LC12 HI rather than LC12 LO. The operator can therefore conclude from the anticipatory alarm of LC12 LO at t = 160 s that the abnormal situation is in fact loss of cooling water and initiate recovery action even before the LC12 LO alarm actually sounds at t = 196 s. As shown in Figure 7, AA correctly anticipated all the five alarms in this scenario and offered an additional diagnosis time advantage of 36 s.

Figure 7. Anticipatory alarms in scenario 1.

The accuracy of the anticipatory alarms is illustrated in Figure 8, which shows how the AA-times for each of the eight variables compare with the actual alarm time. The actual alarm time is denoted by T at the right end of the x-axis, while the y-axis denotes the time to alarm. The x-coordinate of the first (left-most) marker of each alarm variable shows the time advantage provided by AA, as it signifies the first time the operator is notified of the incipient alarm. For example, the anticipatory alarm for TI13 HI gives a time advantage of 48 s. A perfect prediction will fall on the 45-degree solid line in Figure 8. It can be observed that while some AA-times are higher and others are lower than the actual time to alarm, the predictions become more accurate (the markers approach the 45-degree solid line) as they get closer to the actual alarm time. For instance, at t* = T − 48 s, the AA-time for TI13 HI is 54 s, which is six seconds higher than the actual time to alarm; at t* = T − 4 s, the AA-time for TI13 HI is three seconds, a difference of only one second.

Figure 8. Comparison of AA-times with actual time to alarm in scenario 1.
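The fault localization argument for scenario 1 can be sketched as a set-containment check against the alarm signatures of Table 1. This is only an illustration of the reasoning in the text, not a diagnosis method proposed by the authors; the names `SIGNATURES` and `candidates` are assumptions, and alarm order and timing are deliberately ignored.

```python
from typing import Dict, Iterable, List, Set

# Alarm signatures per fault scenario, transcribed from Table 1
# (alarm times omitted; only which alarms occur is used here).
SIGNATURES: Dict[str, Set[str]] = {
    "S1": {"PC11 HI", "TI16 HI", "TI13 HI", "TI17 HI", "LC12 LO"},
    "S2": {"TC11 LO", "LC11 HI", "LC12 LO"},
    "S3": {"TI17 HI", "LC12 HI", "TI16 HI", "TI14 HI",
           "LC11 LO", "TC11 HI", "TI13 HI", "PC11 HI"},
    "S4": {"LC11 LO", "TI14 HI", "TC11 HI"},
    "S5": {"TC11 LO", "LC11 HI", "LC12 LO"},
    "S6": {"PC11 HI", "TI16 HI", "TI13 HI"},
}


def candidates(observed: Iterable[str]) -> List[str]:
    """Scenarios whose signature contains every alarm seen or anticipated so far."""
    seen = set(observed)
    return sorted(s for s, sig in SIGNATURES.items() if seen <= sig)
```

With only PC11 HI anticipated, `candidates(["PC11 HI"])` leaves S1, S3, and S6 open. Adding TI16 HI, TI13 HI, and TI17 HI narrows this to S1 and S3, and adding LC12 LO leaves S1 alone, matching the narrative above. The diagnosis time advantage quoted in the text is then the gap between the actual LC12 LO alarm (196 s) and its anticipation (160 s), that is, 36 s.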

dx.doi.org/10.1021/ie4014953 | Ind. Eng. Chem. Res. 2014, 53, 5182−5193

Industrial & Engineering Chemistry Research

Article

Figure 9. Anticipatory alarms in scenario 2.

Figure 10. Comparison of AA-times with actual time to alarm in scenario 2.

Figure 11. Anticipatory alarms in scenario 3.

5.2. Scenario 2: Loss of Hot Oil at Reboiler E11. In this scenario, there is a loss of hot oil entering the reboiler E11. This results in less bottom liquid flow being vaporized and causes the temperatures in the column to decrease. For instance, the temperature of tray 34, TC11, dips to below 80 °C, which triggers its low-limit alarm at t = 80 s. The liquid hold-up at the bottom of the column also increases and the bottom hold-up liquid level, LC11, triggers its high-limit alarm at t = 132 s. Since eventually no vapor goes up to the top of the distillation tower, less and less condensation takes place. As such, the liquid level in the reflux drum, LC12, decreases and eventually triggers its low-limit alarm at t = 192 s.

As shown in Figure 9, AA also successfully predicted all three alarms in this scenario. The first anticipatory alarm was triggered at t = 56 s, predicting TC11 LO to occur in 35 s, before it actually sounded at t = 80 s. From Table 1, it can be observed that only S2 and S5 trigger TC11 LO. In fact, both these faults have the same alarm pattern. Thus, when notified of TC11 LO, the operator could localize the fault to be either S2 (loss of hot oil) or S5 (reboiler fouling) and proceed to check the status of the hot oil flow and the reboiler to confirm the diagnosis. In this case, the proposed anticipatory alarms give a diagnosis time advantage of 24 s.

The accuracy of the anticipatory alarms in this scenario is shown in Figure 10. The starting time of AA is different for the three variables, with LC12 LO providing the most advance notification of 60 s. As in scenario 1, while the predicted times initially deviate somewhat from the actual time to alarm (which becomes known only much later), they consistently become more accurate as they approach the actual alarm time.

5.3. Scenario 3: Degradation of Reflux Pump P11A. In this scenario, the performance (horsepower) of reflux pump P11A degrades significantly. Consequently, the reflux flow into the distillation tower decreases. The process variable that is affected directly by this fault is the temperature of the top tray, TI17, which starts to increase. Subsequently, the temperatures of the reflux flow (TI16), of other trays such as tray 26 (TI114) and tray 34 (TC11), and of the bottom product (TI13) are also affected. Eventually, all these variables trigger their respective high-limit alarms: TI17 HI at t = 48 s, TI16 HI at t = 180 s, TI14 HI at t = 292 s, TC11 HI at t = 360 s, and TI13 HI at t = 508 s. Also, since less reflux is being pumped back to the distillation column, the liquid level in the reflux drum starts to increase and LC12 hits its high limit at t = 164 s. If no corrective action is undertaken by the operator, the liquid level will keep on rising, eventually flooding the entire condenser, while the bottom hold-up in the column, LC11, will decrease and ultimately trigger the low-limit alarm at t = 296 s. Vapor continues to build up in the column, leading to a pressure increase. Eventually the column pressure PC11 high-limit alarm is triggered at t = 516 s.
AA successfully predicted all eight alarms in this scenario as well and offered significant advance indication for each of these impacted variables, as shown in Figure 11. The first anticipatory alarm was triggered at t = 20 s, predicting TI17 HI to occur in 37 s, before it actually sounded at t = 48 s. From Table 1, it can be seen that TI17 HI could be caused by either S1 or S3. Only after the anticipatory alarm for LC12 HI was triggered at t = 148 s could the operator localize the fault to be S3 (degradation of reflux pump), seeing that S1 would involve LC12 LO rather than LC12 HI. The operator could thus diagnose the fault based on the anticipatory alarm before the LC12 HI alarm actually sounded at t = 164 s. In this case, the proposed anticipatory alarms provided a diagnosis time advantage of 16 s.

We have performed a similar analysis for the other three scenarios, which in the interest of space are not reported here (see Supporting Information). On average, in the six scenarios AA provides a diagnosis time advantage of 35 s.

5.4. Comparison of Different AA-Window Lengths. The performance of the proposed anticipatory alarms depends on only one tuning parameter, that is, the AA-window length. The selection of the AA-window length depends on several factors. One is the accuracy of the AA-models; the AA-models usually offer higher accuracy when the process variable is near its alarm limit, which favors shorter AA-windows. Further, processes with smaller time constants would also necessitate a shorter AA-window. However, from an operational point of view, the earlier the anticipation, the earlier the operator can respond; hence a longer AA-window is more valuable. The above results used a window length of 60 s. Here, we study the effect of different AA-window lengths on prediction accuracy.
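To make the role of the AA-window concrete, the sketch below shows one way a predicted rate-of-change could be turned into a time to alarm and then gated by the window length. It assumes, purely for illustration, a constant rate over the prediction horizon (linear extrapolation); the actual translation used by the AA-models is defined elsewhere in the paper and may differ. The function names `time_to_limit` and `anticipatory_alarm`, and the numerical values in the usage lines, are hypothetical.

```python
import math


def time_to_limit(value, rate, limit):
    """Seconds until `value` reaches `limit` at a constant `rate`.

    Returns math.inf when the variable is not moving toward the limit.
    Assumption: linear extrapolation of the predicted rate-of-change.
    """
    gap = limit - value
    if gap == 0:
        return 0.0
    # The limit is reached only if the rate has the same sign as the gap.
    if rate == 0 or (gap > 0) != (rate > 0):
        return math.inf
    return gap / rate


def anticipatory_alarm(value, rate, limit, window_s=60.0):
    """Return the AA-time if it lies within the AA-window, otherwise None."""
    t_aa = time_to_limit(value, rate, limit)
    return t_aa if t_aa <= window_s else None


# Illustrative numbers only (not taken from the case study).
print(anticipatory_alarm(10.0, 0.125, 13.5))                 # within a 60 s window
print(anticipatory_alarm(10.0, 0.125, 13.5, window_s=15.0))  # outside a 15 s window
```

Under these assumptions the same prediction raises an anticipatory alarm with a 60 s window but not with a 15 s window, which is the trade-off examined in this section: a shorter window suppresses early, less reliable predictions at the cost of later notification.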

The performance of the AA-models can be evaluated by the average anticipation error and the false positive and false negative rates. Anticipation error is defined as the absolute difference between the AA-time and the actual time to alarm (known only a posteriori) at a particular sampling instant. Figure 12 depicts the predicted and actual times to alarm as a function of time. A perfect model prediction will lead to its AA-time tracking along the diagonal with the actual time. However, in practice, models are seldom perfect and the predicted time would deviate as illustrated. The resulting anticipation error is shown by the shaded area. The average anticipation error, $\varepsilon_j$, of a particular alarm variable $y_j$ is calculated as the mean of the anticipation errors from all sampling instants when its anticipatory alarm is active:

$$\varepsilon_j = \frac{\sum_{i=l}^{M} \left| t_j^{AA}(i) - t_j^{act}(i) \right|}{M - l + 1} \qquad (9)$$

where l represents the sampling instant when the anticipatory alarm starts and M represents the sampling instant just before the actual alarm is triggered. Times $t_j^{act}(i)$ and $t_j^{AA}(i)$ represent the actual time to alarm and the AA-time for the process variable at the ith time instant, respectively. A value of $\varepsilon_j \approx 0$ indicates that the prediction from the model is accurate. The higher the $\varepsilon_j$, the greater the mismatch between the AA-time and the actual time to alarm.

Figure 12. Comparison between actual and predicted times on anticipated alarms.

False alarms offer another indication of performance. A false negative is defined as an instance where an actual alarm occurs without triggering any anticipatory alarm. On the other hand, a false positive is defined as an instance where an anticipatory alarm has been triggered, but the actual alarm does not materialize during the scenario. The false positive rate is calculated as the percentage of samples with false positive anticipatory alarms flagged. These statistics are dependent on the AA-window length. The performance statistics for 120, 90, 60, 45, 30, and 15 s AA-windows are given in Table 2. In all six scenarios, there are no false negatives; all alarms are successfully anticipated. As expected, a shorter AA-window leads to a lower anticipation error and false positive rate. However, there is a trade-off, since the improved accuracy of a shorter AA-window comes with a decreased time advantage from the earlier alarm notification. From the statistics in Table 2, the 60 s AA-window (which has an average anticipation error of 13.59 s and a false positive rate of 2.5%) appears to be a good trade-off, as its performance is significantly better than that of the 90 s (31.87 s and 4.8%) and 120 s (37.25 s and 5.8%) AA-windows, while providing adequate advance notification.

Table 2. Statistics for Different AA-Window Lengths

| AA-window length (s) | average anticipation error (s) | false positive rate (%) | false negative rate |
|---|---|---|---|
| 120 | 37.25 | 5.8 | 0 |
| 90 | 31.87 | 4.8 | 0 |
| 60 | 13.59 | 2.5 | 0 |
| 45 | 11.88 | 1.8 | 0 |
| 30 | 6.32 | 1.1 | 0 |
| 15 | 4.53 | 0.2 | 0 |

6. CONCLUSIONS AND DISCUSSION

In this paper, a new type of alarm, known as the anticipatory alarm, is introduced. This type of alarm aims to provide operators with anticipatory information on alarms that would occur as an abnormal situation arises in a plant. Anticipatory alarms are built around an alarm anticipation algorithm that utilizes models, known as AA-models, to predict the rate-of-change of process variables within the plant. The predicted rates of change are then translated into time predictions, known as AA-times, of the occurrence of various alarms within a certain time window, called the AA-window. Anticipatory alarms would enable operators to carry out more effective sensemaking during abnormal situations, since holistic information about the status of the entire unit can be estimated well before alarms actually occur. As a result, operators can be more proactive in managing abnormal situations. The benefits of AA have been demonstrated through six scenarios in a depropanizer unit case study. All alarms are successfully predicted, providing a diagnosis time benefit of around 35 s to the operators.

Unlike typical models that seek to predict the process variables themselves,38−40 AA-models are developed to predict the rate-of-change of process variables for a short time period in the future. The AA-models developed in this paper utilize techniques from both first-principles and data-driven methods; thus they have several advantages over models developed using either of these methods in isolation. First, they are more flexible than models derived using the first-principles technique alone; second, they extrapolate better than classical black-box/data-driven models.41 The AA-models can be used as long as the system remains in a condition where the mass and energy balances under which the AA-models were derived still apply. For example, Scenario 1, if unresolved, would result in the activation of a pressure safety valve due to the continuing increase of column pressure PC11. This would change the structure of the system and, as a result, the prediction from the AA-models would be erroneous. To prevent this, we include a condition to use the AA-models only when the pressure safety valve is closed, i.e., when the model structure is preserved. While the AA-models developed in this paper are able to provide good predictions, they are "decentralized", that is, different AA-models are derived for different alarm variables. However, the process is integrated and different variables affect one another. As such, the AA-models for these variables could be developed to take such relationships into account. However, this would lead to increased model complexity, which could compromise prediction accuracy. Therefore, it remains to be seen if the performance gain from incorporating multivariate relationships outweighs this additional source of error. Future work will focus on extending the proposed modeling framework to include the multivariate relationships between process variables and establish their relative benefits.

The emphasis in this paper has been on the development of a predictive scheme for anticipatory alarms. Such information needs to be conveyed to operators through a suitable user
interface so that they can use it in their sensemaking. An example of such a display is shown in Figure 13. The alarm display shows the temporal trends of the alarms in two panes, one (historical pane on the left) showing real alarms that have occurred in the recent past and the other (prediction pane on the right) showing anticipatory alarms. Actual alarms and anticipatory alarms are thus displayed in an integrated fashion to enable better sensemaking. In the display, each alarm is depicted as a triangle that is either pointing upward to represent a high-limit alarm or pointing downward to represent a low-limit alarm. In addition, the triangles are colored red to depict a process variable that has triggered a real alarm and amber to depict an anticipatory alarm. Alarms are also grouped based on their process unit to reduce the complexity of the display and allow the operator to identify and understand the alarms in a systematic manner. For example, the alarms in a depropanizer unit are grouped into condenser, distillation tower, reflux drum, reboiler, and feed units. Human factors experiments have to be conducted with human subjects (operators) to determine the effectiveness of such a display in enabling better sensemaking for abnormal situation management in real time. This is the subject of our current research.

Figure 13. Example of display for anticipatory alarms.

ASSOCIATED CONTENT

Supporting Information

Derivation of AA-model equations and results from scenarios 4, 5, and 6. This information is available free of charge via the Internet at http://pubs.acs.org/.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: (65)65168041. Fax: (65)67791936.

Present Address

§Indian Institute of Technology Gandhinagar, Vishwakarma Government Engineering College Complex, Chandkheda, Visat-Gandhinagar Highway, Ahmedabad, Gujarat, India 382424. Email: [email protected]. Tel: (91) 79-32210155.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

This work is funded by the Science and Engineering Research Council (SERC), A*STAR, under the Human Factors Engineering (HFE) Thematic Strategic Research Programme (TSRP). The authors thank Professor Martin Helander and Dr Yin Shanqing for helpful discussion.

REFERENCES

(1) Hutton, R.; Klein, G.; Wiggins, S. Designing for sensemaking: A macrocognitive approach. In Proceeding of the 2008 Human Factors in Computing Systems, Florence, Italy, 5−10 April 2008; ACM Press: New York, 2008.
(2) Mattiasson, C. The alarm system from the operator’s perspective. In IEEE People in Control Meeting, Bath, U.K., 1999; IEEE: Piscataway, NJ, 1999.
(3) Hopkins, A. Lessons from Longford: The Esso Gas Plant Explosion; CCH Australia Limited: North Ryde, New South Wales, Australia, 2000.
(4) Liu, J.; Lim, K. W.; Ho, W. K.; Tan, R.; Srinivasan, R.; Tay, A. The intelligent alarm management system. IEEE Software 2003, 20, 66−71.
(5) Liu, J.; Ho, W.; Lim, K.; Srinivasan, R.; Tan, K. Intelligent alarm management in a petroleum refinery. Hydrocarbon Process. 2004, 47−53.
(6) Health and Safety Executive. The Explosion and Fires at the Texaco Refinery, Milford Haven, 24 July 1994, Incident Report; HSE Books: London, 1997.
(7) Bransby, M.; Jenkinson, J. Alarm management in the chemical and power industries: Results of a survey for the HSE. In IEEE Colloquium on Best Practices in Alarm Management; IEEE: Piscataway, NJ, 1998; pp 1−10.
(8) Campbell-Brown, D. Alarm management: A problem worth taking seriously. Control 1999, 52−56.
(9) Nimmo, I. Adequately address abnormal situation operations. Chem. Eng. Prog. 1995, 91, 36−45.
(10) Yin, S. Proactive monitoring in process control using predictive trend display. PhD thesis, Nanyang Technological University, Singapore, 2012.
(11) International Society of Automation. Management of Alarm Systems for the Process Industries, Technical Report; International Society of Automation: Research Triangle Park, NC, 2009.
(12) Hollifield, B.; Habibi, E. Alarm Management Handbook: A Comprehensive Guide: Practical and Proven Methods to Optimize the Performance of Alarm Management Systems; International Society of Automation: Research Triangle Park, NC, 2010.
(13) Rothenberg, D. H. Alarm Management for Process Control; Momentum Press: New York, 2009.
(14) ASM Consortium. Effective Alarm Management Practices; ASM Consortium: Phoenix, AZ, 2009.
(15) Gooch, J. Keys to successful alarm management. Hydrocarbon Process. 2011, 90 (4), 85−88.
(16) Mahajan, S.; Surve, V. Operator performance enhancement by alarm management. SPE Prod. Oper. Symp. 2012, 1, 532−538.
(17) Izadi, I.; Shah, S.; Chen, T. Effective resource utilization for alarm management. In 49th IEEE Conference on Decision and Control; IEEE: Piscataway, NJ, 2010; pp 6803−6808.
(18) Foong, O.; Sulaiman, S.; Awang Rambli, D.; Abdullah, N. ALAP: Alarm prioritization system for oil refinery. In World Congress on Engineering and Computer Science, San Francisco 20−22 October 2009; International Association of Engineers: Hong Kong, 2009.
(19) Yang, F.; Shah, S.; Xiao, D.; Chen, T. Improved correlation analysis and visualization of industrial alarm data. ISA Trans. 2012, 51 (4), 499−506.
(20) Higuchi, F.; Yamamoto, I.; Takai, T.; Noda, M.; Nishitani, H. Use of event correlation analysis to reduce number of alarms. Comput.-Aided Chem. Eng. 2009, 27, 1521−1526.
(21) Nishiguchi, J.; Takai, T. IPL2 and 3 performance improvement method for process safety using event correlation analysis. Comput. Chem. Eng. 2010, 34, 2007−2013.
(22) Kimura, N.; Takeda, K.; Noda, M.; Hamaguchi, T. An evaluation method for plant alarm system based on a two-layer cause-effect model. Comput.-Aided Chem. Eng. 2011, 29, 1065−1069.
(23) Liu, X.; Noda, M.; Nishitani, H. Evaluation of plant alarm systems by behavior simulation using a virtual subject. Comput. Chem. Eng. 2010, 34 (3), 374−386.
(24) Kondaveeti, S.; Izadi, I.; Shah, S.; Black, T.; Chen, T. Graphical tools for routine assessment of industrial alarm systems. Comput. Chem. Eng. 2012, 46, 39−47.
(25) Cochran, E.; Bullemer, P. Abnormal situation management: Not by new technology alone. Proc. AIChE Process Plant Saf. Symp. 1996, 218−223.
(26) Bloom, C.; Bullemer, P.; Barreth, R.; Reising, D. Situation awareness for refining and petrochemical process operators: Not by technology alone. Proc. NPRA Annu. Meet. 2010, 247−258.
(27) Chu, R.; Bullemer, P.; Harp, S.; Ramanathan, P.; Spoor, D. Qualitative user aiding for alarm management (QUALM): an integrated demonstration of emerging technologies for aiding process control operators. IEEE Int. Conf. Syst., Man, Cybern. 1994, 1, 735−740.
(28) Kim, I. Computerized systems for online management of failures: a state-of-art discussion of alarm systems and diagnostic systems applied in the nuclear industry. Reliab. Eng. Sys. Saf. 1994, 44, 279−295.
(29) Lewandowsky, S.; Dunn, J.; Kirsner, K.; Randell, M. Expertise in the management of bushfires: Training and decision support. Aust. Psychol. 1997, 32, 171−177.
(30) Zhang, J.; Patel, V. L.; Johnson, T.; Chung, P.; Turley, J. Evaluating and predicting patient safety for medical devices with integral information technology. Adv. Patient Saf.: Res. Implementation 2005, 2, 323−336.
(31) Chung, P.; Zhang, J.; Johnson, T.; Patel, V. An extended hierarchical task analysis for error prediction in medical devices. In AMIA Annual Symposium, January 2003; AMIA: Bethesda, MD, 2003.
(32) Morphew, E.; Wickens, C. Pilot performance and workload using traffic displays to support free flight. In 42nd Annual Meeting of HFES, Chicago, IL, 5−9 October, 1998; HFES: Santa Monica, CA, 1998.
(33) Titov, V. In the Sea. Tsunamis Forecasting, Vol. 15; Harvard University Press: Cambridge, MA, 2009.
(34) Van Breda, L. Anticipatory Behaviour in Supervisory Control, Technical Report; Delft University Press: Delft, the Netherlands, 1999.
(35) Xu, S.; Yin, S.; Srinivasan, R.; Helander, M.; Karimi, I.; Srinivasan, R. Proactive alarms monitoring using predictive technologies. Comput.-Aided Chem. Eng. 2012, 31, 1537−1541.
(36) Xu, S.; Srinivasan, R. Anticipatory alarms for predictive abnormal situations management. In 4th World Conference of Safety of Oil and Gas Industry, Seoul, Korea, 2012; WCOGI, 2012.
(37) Helander, M. Personal Communication, 2011.
(38) Thompson, M.; Kramer, M. Modeling chemical process using prior knowledge and neural networks. AIChE J. 1994, 40, 1328−1340.
(39) Kahrs, O.; Marquardt, W. The validity domain of hybrid models and its application in process optimization. Chem. Eng. Process. 2007, 46, 1054−1066.
(40) Ng, C.; Hussain, M. Hybrid neural network-prior knowledge model in temperature control of a semi-batch polymerization process. Chem. Eng. Process. 2004, 43, 559−570.
(41) Psichogios, D.; Ungar, L. A hybrid neural network-first principles approach to process modeling. AIChE J. 1992, 38 (10), 1499−1511.
