Ind. Eng. Chem. Res. 1999, 38, 4359-4371
Hazard Identification in Batch and Continuous Computer-Controlled Plants
P. W. H. Chung,* S. H. Yang, and D. W. Edwards
Department of Chemical Engineering, Loughborough University, Loughborough, Leicestershire LE11 3TU, U.K.
It has been recognized that due to increasing use of computers in chemical processes some incidents have occurred not because of equipment faults but because of errors in control software. A qualitative, functional model-based approach to hazard identification for general computer-controlled plants is presented in this paper. A functional model, the Process Control Event Diagram (PCED), is proposed. It can be used to represent discrete or continuous control systems. The PCED developed for a computer-controlled plant is used as the basis for hazard identification. State transitions are generated manually by applying the PCED and process information in order to identify whether an undesirable state can be reached. Hazard and operability analysis is carried out by introducing deviations for each control action in the PCED. Case studies are presented to illustrate the methodology.

Introduction

New Routes to Failure Introduced by Using Computers. Process plants are becoming more complex and highly automated by using computers. This trend has enhanced the quality and the efficiency of normal operations but has also made systems vulnerable to new types of failure. As a consequence, industrial attention on safe design of computer-controlled plants is increasing. Process hazard identification is an important activity in safe process design. It is a systematic identification and mitigation of potential hazards that could endanger the health and safety of humans or cause serious economic losses. Hazard and Operability (HAZOP) analysis is a systematic procedure for determining the abnormal causes of process deviations from normal behavior and their adverse consequences in a chemical plant from a process safety perspective.
* Corresponding author. Telephone: +44 (0)1509 222543. Fax: +44 (0)1509 223923. E-mail: [email protected].

It is performed by a multidisciplinary team of experts, equipped with a complete description of the process and its operation, who systematically examine every part of the plant Piping and Instrumentation Diagram (P&ID) in order to determine how deviation from the intent of the design of the plant can occur and cause hazards. Detailed descriptions of the HAZOP technique with examples of its applications have been given.1-4 In addition, other hazard analysis techniques such as Failure Modes and Effects Analysis (FMEA), Fault Tree Analysis (FTA), Event Tree Analysis (ETA), and “What If” analysis have been widely used in various industries.5 The use of computers in chemical process control, monitoring, and protection has introduced new potential problems. Several incidents involving computer-based control systems have been reviewed.6-9 Although there are many hazard identification and analysis techniques, there are few which are appropriate for computer-controlled plants. This is because most existing techniques consider neither the control logic, the operating sequence, nor the control algorithm, and they are not
suitable for identifying potential hazards due to computer system failures or inappropriate responses from the computer system. Control Logic Verification. Several recent studies have addressed problems related to the verification of control logic. Moon et al.10 have used a model-based verification method proposed by Clarke et al.11 to verify automatically the safety and operability of discrete chemical process control systems. The technique involves a system description, safety assertions, and a model checker. The system description is a state transition model of the system to be verified. Safety assertions are expressed in temporal logic. These describe the desired system behavior with respect to safety and operability. The model checker searches the state space of the system and determines the truth of the assertions. Due to the exponential growth of the search space that must be examined with application size, an implicit Boolean state-space model is used by Park and Barton12 to represent logic-based control systems instead of a state transition model in an implicit model-checking approach. Verification is posed as a Boolean satisfiability problem and transformed into its equivalent integer programming problem, which can be solved in terms of standard branch-and-bound algorithms. Kowalewski et al.13 proposed a verification method based on the modeling paradigm Condition/Event (C/E) systems. This approach considers control programs specified as Sequential Function Charts (SFCs). To verify a SFC, it is translated into a C/E system and then connected to a C/E model of the plant. The set of reachable states of the resulting closed-loop system is then compared to a set of forbidden states serving as the specification of the undesired behavior. 
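The comparison of reachable states against forbidden states mentioned above can be illustrated generically. The following is a minimal sketch only, not the algorithm of any of the cited works: it runs a plain breadth-first search over an explicit state transition model. The state names, the transition map `TOY_TRANSITIONS`, and the set `TOY_FORBIDDEN` are hypothetical and serve purely as an illustration.

```python
from collections import deque


def reachable_states(initial, transitions):
    """Return every state reachable from `initial` by breadth-first search.

    `transitions` maps a state to an iterable of its successor states.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for successor in transitions.get(state, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def forbidden_reached(initial, transitions, forbidden):
    """Return the forbidden states that the closed-loop model can reach."""
    return reachable_states(initial, transitions) & set(forbidden)


# Hypothetical toy model (not taken from the paper).
TOY_TRANSITIONS = {
    "idle": ["heating"],
    "heating": ["holding", "overheated"],
    "holding": ["idle"],
}
TOY_FORBIDDEN = {"overheated", "overfilled"}

if __name__ == "__main__":
    print(forbidden_reached("idle", TOY_TRANSITIONS, TOY_FORBIDDEN))
```

An empty result would mean that no forbidden state is reachable in the model; the explicit enumeration used here is exactly what becomes impractical as the state space grows, which motivates the implicit formulations cited above.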
A quantitative model-based approach to the safety verification problem for discrete/ continuous processing systems was proposed by Dimitriadis et al.14 An appropriate modeling framework can describe discrete, continuous, and hybrid systems and can be incorporated into a safety verification formulation. The formulation results in a mixed-integer optimization problem. Yang and Chung15-17 use a Process Control Event Diagram (PCED) as a system description to analyze the behavior of the process under the control
10.1021/ie990130k CCC: $18.00 © 1999 American Chemical Society Published on Web 10/14/1999
Ind. Eng. Chem. Res., Vol. 38, No. 11, 1999
logic on the basis of a qualitative simulation. The safety requirement is expressed using constraints linked to each other with Boolean logic operators. The truth of the safety requirement is checked in terms of the results from the qualitative simulation. The control logic is safe only if the calculated value of the expression of the safety requirement is true.

Computer HAZOP. To explore the potential hazards in computer-controlled processes, a number of approaches have been investigated in recent years.6,8,15,16,18-22 Techniques for considering the safety aspects of computer-controlled systems are sometimes referred to as Computer HAZOP (CHAZOP). Nimmo8 extended HAZOP to computer-controlled processes. HAZOP and CHAZOP are carried out separately, and the early involvement of computer specialists in the specification of the control system has been emphasized. Andow6 proposed guidelines on HAZOP procedures for computer-controlled plants. It is recommended that a framework similar to the conventional HAZOP be used. CHAZOP is carried out in two stages: preliminary and full. The intention is to build up gradually a detailed view of how a system is supposed to work and what will happen if it fails. To extend HAZOP to computer control systems, Redmill et al.21 gave new interpretations of the generic guide words for “data flow” and “control flow”. For example, the guide word “No” is interpreted as no data or no control signal. A systematic safety evaluation framework for total system safe design in which a computer-based control system is involved was introduced by Drake and Thurston.22 Seven protection layers are proposed in the framework, and the design philosophy recommends achieving as much risk reduction as practical in each of the inner layers before adding outer layers.

Purpose of the Paper.
The objective of this paper is to describe a qualitative, functional, model-based approach to the hazard identification of computer-controlled plants at the specification stage of the control system life cycle. It is recognized that the operation of most computer control systems involves both discrete and continuous characteristics, and an appropriate modeling technique is proposed to describe it. This modeling method ensures that the functional model-based hazard identification deals with discrete and continuous control systems in a consistent way. We begin with a description of the hazard identification framework for computer-controlled processes. Then a functional modeling technique is briefly introduced. The different stages of the framework for continuous/discrete control systems are discussed in detail on the basis of the functional models. The hazard identification framework is further illustrated using two case studies, an automated semibatch evaporator and a riser reactor in an industrial fluidized catalytic cracking process.
Hazard Identification Framework for Computer-Controlled Processes

The hazard identification framework consists of five stages:15,16

Stage 1: Identify Safety Requirements. In the design and development of any system, accurate requirements are of paramount importance. The objective of this stage is to subdivide the system and to identify the suitable safety requirements for all the safety critical subsystems. To construct safety requirements, we must examine the environment and identify potential hazards. It is the environment in which a system is placed that determines the potential hazards; software is not in itself a hazard, but it can contribute to hazards. Usually a chemical process can be divided into smaller parts on the basis of the operating units with a standard function. Some operating units are safety critical, such as a high-temperature reactor, and some are not. Safety requirements are normally expressed in terms of critical states that certain operating units should not reach. The purpose of the hazard identification framework is to ensure that the control logic will keep the system from reaching such critical states.

Stage 2: Represent Control Logic and Process. A control logic to achieve the required function is specified. During design the control logic should be expressed at a high level to facilitate understanding and discussion among engineers from different disciplines. However, the control logic representation should be unambiguous so as to avoid misinterpretation. A novel representation called Process Control Event Diagram (PCED) has been developed for this purpose. A detailed description of the PCED model will be given in the next section. For hazard identification purposes the process is described in qualitative rather than quantitative terms.

Stage 3: Verify Whether the Control Logic Satisfies the Safety Requirements. The control logic needs to be shown to satisfy the safety requirements of a system before implementation. The PCED can be used to work out the state transitions of the control logic. The control logic is acceptable only if all the states satisfy the safety requirements. Otherwise, the control logic must be modified and then verified again.

Stage 4: Identify Safety Critical Events in the Safety Control Logic. The purpose of this stage is to identify what will happen if the control system deviates from its normal behavior. By introducing a deviation from normal behavior (for example, a wrong signal) for each control action, we can identify the causes and consequences of the deviation. If the control logic verification shows that the deviation causes the system to move into a state which violates safety requirements, then the control logic may have to be modified or something has to be done to prevent the causes of the deviation from happening.

Stage 5: Apply the Life Cycle Question Library to Safety Critical Events. Some safety critical events can be prevented from happening if certain questions are considered in the first place. A safety question library, which was established in our previous work on the basis of industrial incident reports,30 is used for this purpose. The questions in the library are organized into a structure so that relevant questions can be located easily (see section on Safety-Related Questions).

Modeling of Computer Control Systems
An appropriate representation is essential for hazard identification, since it provides the basic information for understanding and discussion. Therefore, it is important that the representation can be easily understood by engineers from different disciplines: process, safety, control, and software. Unfortunately, existing representations, such as the Piping and Instrumentation Diagram (P&ID) and the Signed Directed Graph (SDG),26-28 do not capture the structure and behavioral information of computer control systems in a unified form. Therefore, they are not suitable. Event Time Diagrams (ETDs)19,20 have been used to express the control logic for hazard identification. However, there are limitations. For example, several ETDs may be required for a single control loop to express the different control actions in different situations. The effect of the control system on the process cannot be expressed in the ETD. The PCED15,16,25 representation complements P&IDs and extends features of the ETD.

The basic structure of a PCED consists of seven functional levels; from top to bottom they are operator, human input device and display, communication 1, computer, communication 2, sensor and actuator, and process (see Figure 1). The functional levels in the diagram are used to capture the different types of components of a computer-controlled system. The operator and process functional levels are introduced to enable human effects and the effect of the control system on the process to be represented. Input/output functions are grouped into two levels, Human Input Device (HID) and Display, and Sensor and Actuator, since the former interacts with operators but the latter interacts with the process. The functional levels are interdependent. For example, for an operator to interact with the computer, s/he must use an input device. Similarly, all components associated with the I/O level that interact with the computer must involve some component at the communication level. The functional model can be used to establish “what if” scenarios. For example, if an operator inputs incorrect data, what happens if the error propagates through to the other levels?

A PCED can be used to represent a control system in terms of events, order of events, components, and control and data flow. An event is either an input/output event or a computation event. An input/output event is a labeled arc, which links different object nodes from different functional levels to an I/O node on the computer level. A label describes either the propagation of a signal or data, or a causal action or effect. The direction of an arc represents the direction of the propagation. An object node denotes an object involved in a system. A computation event indicates some processing that has to be carried out by the computer; this is shown as a computation node on a PCED. The actual description of the computation is written separately in pseudocode at a suitable level of abstraction to aid understanding and discussion. The order of events is represented in the horizontal direction.

The PCED model is based on the sequential behavior of a computer. For example, even if two or more input signals change simultaneously, the computer will read the signals one at a time. The timing between reading in the signals may be unspecified when a PCED is drawn. It is part of the hazard identification procedure to consider what will happen if an event happens either too early or too late. A PCED permits analysis of the proposed events and the interactions between them. The model acts as a common viewpoint for people of different disciplines to communicate in a review and allows them to view the interactions of events from their own perspective.

Figure 1. Example PCED.

Figure 1 shows a simple PCED with four events: two I/O events and one computation event then followed by another I/O event. In this example the content of the computation node N3 is simply
If temperature < setpoint then goto N1

Consider another example based on the computer-controlled batch reactor shown in Figure 2; this example is modified from the one described by Kletz.9 It has the following control logic implemented: when a fault occurs, an alarm sounds and all controlled variables are left as they were until the operator tells the control system to continue. There is also a liquid level feedback control loop on the condenser. So both discrete and continuous control systems are involved. The PCEDs of the control logic and the liquid level control loop for the batch reactor are shown in Figures 3 and 4, respectively. In Figures 3 and 4, CATF is the catalyst flow, CATV is the catalyst valve, COOV is the cooling water valve, LSENSOR is the liquid level sensor in the gearbox, LSEN is the liquid level sensor in the condenser, REFV is the reflux valve, and WATF is the cooling water flow. CONL is the liquid level in the condenser. The two computation nodes N3 and N5 in Figure 4 include the following algorithms and safety constraints:
Node N3
Constraint 1
    If CONL = BAD value then goto N6
Constraint 2
    If CONL < min or CONL > max then goto N6
PID control algorithm

$$e = \mathrm{CONL} - \mathrm{SETPOINT} \tag{1}$$

$$\mathrm{OUTPUT} = k_1 e + k_2 \int e \, dt + k_3 \frac{de}{dt} \tag{2}$$

Constraint 3
    If |∆OUTPUT| > set value and ∆OUTPUT > 0 then ∆OUTPUT = set value
    If |∆OUTPUT| > set value and ∆OUTPUT < 0 then ∆OUTPUT = -1 * set value

where ∆OUTPUT is the change of OUTPUT in a calculation interval and set value is the maximum permitted change of the controller output in a calculation interval.

Figure 2. Computer-controlled batch reactor.

Figure 3. PCED of the control logic for the computer-controlled batch reactor.

Figure 4. PCED of the liquid level feedback control loop in the condenser.

Figure 5. Schematic representation of the safety verification problem.
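To make the structure of node N3 concrete, the following is a minimal sketch of one calculation interval, assuming a simple rectangular discretization of eq 2. The class name `LevelController`, the use of `None` to stand for a BAD value, the fixed interval `dt`, and the returned node labels are illustrative assumptions, not part of the original specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LevelController:
    """Sketch of node N3: safety constraints wrapped around a PID algorithm."""
    setpoint: float
    k1: float          # proportional gain
    k2: float          # integral gain
    k3: float          # derivative gain
    level_min: float   # "min" in constraint 2
    level_max: float   # "max" in constraint 2
    max_step: float    # "set value" in constraint 3
    dt: float = 1.0    # calculation interval (assumed constant)
    integral: float = 0.0
    prev_error: float = 0.0
    output: float = 0.0

    def step(self, conl: Optional[float]) -> str:
        """Process one reading of CONL and return the next node to visit."""
        # Constraint 1: a BAD value (modeled here as None) goes to the alarm node.
        if conl is None:
            return "N6"
        # Constraint 2: a level outside the permitted range goes to the alarm node.
        if conl < self.level_min or conl > self.level_max:
            return "N6"
        # PID control algorithm, eqs 1 and 2, in discrete form.
        error = conl - self.setpoint
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        raw = self.k1 * error + self.k2 * self.integral + self.k3 * derivative
        # Constraint 3: limit the change of OUTPUT per calculation interval.
        delta = raw - self.output
        if abs(delta) > self.max_step:
            delta = self.max_step if delta > 0 else -self.max_step
        self.output += delta
        return "N4"  # assumed: the output is then sent to the reflux valve


if __name__ == "__main__":
    ctrl = LevelController(setpoint=1.0, k1=2.0, k2=0.0, k3=0.0,
                           level_min=0.0, level_max=5.0, max_step=0.5)
    print(ctrl.step(3.0), ctrl.output)
```

The sketch keeps the safety constraints visibly separate from the control algorithm, which is the separation that the hazard analysis relies on.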
Node N5
    goto N1

Safety Verification of Control Logic

Although several formal methods for verifying the safety of discrete/continuous control systems have been developed,10,12-16 attempts to verify control logic automatically are still not practical, because it is difficult to specify complete process models. On the other hand, with informal methods it is difficult to account for all possibilities due to the combinatorial nature of the
problem. A promising way is to carry out the safety verification in an informal but structured manner using computer tools. This approach will be discussed in this section. A process is usually deemed to be safe only if, under the control of a control logic, it does not reach any undesirable state from any initial condition; that is, the variables in those states do not take values within any undesirable regions. The safety verification problem is illustrated schematically in Figure 5 for a simple system involving two continuous variables x1 and x2. A state transition table can be used to show the change of states due to a sequence of control actions. On the basis of the understanding of the behavior of the process, an engineer is required to fill in the state transition table. After applying all the control actions, the process reaches its final state. If any intermediate state is undesirable, then the control logic is not safe and must be modified. For HAZOP purposes, the state transition table only needs to be completed in qualitative terms; precise quantitative values are not required. If the process engineer or the design team is not able to work out the state transition, then more preliminary work will have to be carried out in order to improve understanding of the process. For example, given the normal state as the initial state, applying the control logic shown in Figure 3 to the computer-controlled batch reactor of Figure 2 produced the state transitions shown in Table 1. The state S4 is undesirable; therefore, the control logic is not safe and should be modified. Consider changing the control logic to the following: when a fault occurs, an alarm sounds, the catalyst valve closes completely, and the cooling water valve and the reflux valve open fully. Similarly, after building the PCED for this modified control logic, a state transition graph can be generated. No undesirable state is reached; therefore, the modified control logic is safe. 
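The check performed on a completed state transition table can be sketched as follows. This is an illustration under stated assumptions: the rows of `ORIGINAL` are our reading of Table 1, the rows of `MODIFIED` are assumed qualitative outcomes for the modified control logic (the paper reports only that no undesirable state is reached), and expressing the safety requirement as "reactor temperature must not be high" is likewise an assumption.

```python
# Assumed formalization of the safety requirement: variable -> undesirable value.
UNDESIRABLE = {"reactor_temp": "high"}


def unsafe_states(table, undesirable=UNDESIRABLE):
    """Return the names of all states in which any variable is undesirable."""
    return [
        name
        for name, state in table
        if any(state.get(var) == value for var, value in undesirable.items())
    ]


def row(oil, alarm, catf, watf, reflux, temp):
    """Build one qualitative state of the batch reactor."""
    return dict(oil_level=oil, alarm=alarm, catf=catf,
                watf=watf, reflux=reflux, reactor_temp=temp)


# Original logic: on a fault the alarm sounds and everything else is left as is.
ORIGINAL = [
    ("S1", row("normal", "silent", "high", "low", "low", "low")),
    ("S2", row("low", "silent", "high", "low", "low", "low")),
    ("S3", row("low", "sound", "high", "low", "low", "low")),
    ("S4", row("low", "sound", "high", "low", "low", "high")),
]

# Modified logic: catalyst valve closed, cooling water and reflux fully open.
# The final row is an assumed outcome, not data from the paper.
MODIFIED = [
    ("S1", row("normal", "silent", "high", "low", "low", "low")),
    ("S2", row("low", "silent", "high", "low", "low", "low")),
    ("S3", row("low", "sound", "high", "low", "low", "low")),
    ("S4", row("low", "sound", "low", "high", "high", "low")),
]

if __name__ == "__main__":
    print("original:", unsafe_states(ORIGINAL))
    print("modified:", unsafe_states(MODIFIED))
```

The table itself still has to be filled in by an engineer who understands the process; the code only automates the final comparison against the safety requirement.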
Table 1. State Transitions of the Computer-Controlled Batch Reactor under the Control Logic

state | oil level | alarm | catalyst flow (CATF) | cooling water (WATF) | reflux (REFLUX) | reactor temp
S1 | normal | silent | high | low | low | low
(↓ N1)
S2 | low | silent | high | low | low | low
(↓ N2)
S3 | low | sound | high | low | low | low
(↓ N3, N4, N5)
S4 | low | sound | high | low | low | high

Table 2. Attributes, Guide Words, and Interpretations for Control Software

attribute | guide word | interpretation
data flow or control flow | no | no information flow
 | more | more data is passed than expected
 | part of | information passed is incomplete
 | reverse | information flow is in a wrong direction
 | other than | information is complete but incorrect
 | early | information flow before it was intended
 | late | information flow after it was required
data rate | more | data rate is too high
 | less | data rate is too low
data value | more | data value is too high
 | less | data value is too low
event | no | event does not happen
 | as well as | another event takes place as well
 | other than | an unexpected event occurred instead
action | no | no action takes place
 | as well as | additional actions take place
 | part of | incomplete action is performed
 | other than | incorrect action takes place
timing of event or action | no | event/action never takes place
 | early | event/action takes place before expected
 | late | event/action takes place after expected
 | before | happens before another expected event
 | after | happens after another expected event
repetition time | no | output is not updated
 | more | time between outputs is longer than required
 | less | time between outputs is less than required
 | other than | time between outputs is variable
response time | no | never happens
 | more | time is longer than expected
 | less | time is shorter than expected
 | other than | time is variable

Continuous control software usually consists of two parts: control algorithms and safety constraints as shown in Figure 4. The control algorithms carry out the control function for normal operations. The safety constraints deal with abnormal conditions. Potential hazards can be generated due to abnormal conditions. Therefore, hazard identification of continuous control software focuses on the safety constraints for abnormal conditions. This will be discussed in the next section. Working out the state transition of a control logic can be very time-consuming, especially when a large number of variables are involved. Therefore, when creating the column headings for a state transition table, it is
important to focus on the safety critical subsystem and the process variables that are directly related to it. System Deviations Even with a safe control logic a process still can deviate from its design intent if any part of the control system hardware malfunctions. Therefore, we need to identify what can go wrong and consider what consequence may result. An efficient, systematic way of finding potential hazards is to introduce possible deviations from the design intent on the basis of the use of “guide words” which are words or phrases expressing specific types of deviation. In conventional HAZOP the common guide words are none (or no), more, less, and part of. These guide words are applied to the process attributes temperature, pressure, flow, level, and concentration. Another catchall guide word, other, is applied to the maintenance, start-up, and shutdown procedures. For a control system the attributes that can deviate are data flow, control flow, data rate, data value, event, action, timing of event or action, repetition time, and response time. The guide words that are applicable to these system attributes are similar to those for the process. However, the guide words need to be reinterpreted for them to be meaningful in a system context.21 The interpretation shown in Table 2 is based on ref 29. Deviation from normal behavior of the control systems is considered for each I/O node in the PCED. Causes,
corresponding consequences, and actions can be proposed by a team of experts. Potential hazards or safety critical events can then be identified. For example, Table 3 summarizes deviations, corresponding causes, and consequences for the liquid level control loop in the computer-controlled batch reactor shown in Figure 4.

Table 3. Partial HAZOP Analysis Results for the Liquid Level Control Loop in the Computer-Controlled Batch Reactor

HAZOP item | attribute | guide word | deviation | causes | consequences
N1 | data flow | no | no signal from the liquid level sensor LSEN | The liquid level sensor is out of order. | The condenser liquid level, CONL, is set to BAD value. Goto N6.
N2 | data flow | no | no signal from HID | The keyboard is out of order. The communication between the keyboard and the computer is broken. | SETPOINT is BAD value. The signal to the reflux valve, REFV, increases to its upper limit.
N4 | data flow | no | no signal to the reflux valve REFV | The communication between the computer and the reflux valve is broken. | The reflux valve is left uncontrollable.
N6 | data flow | no | no signal to the alarm | The communication to the alarm is broken. | No alarm signal when required.
N1 | data value | less, more | condenser liquid level CONL outside the normal limit | The liquid level sensor has not been calibrated for a long period of time. | The signal to the reflux valve, REFV, will be changed until reaching its upper or lower limit. The process will be disturbed.
N2 | data value | other than | SETPOINT incorrect | The operator made an error. | The signal to the reflux valve, REFV, is changed according to this incorrect setpoint.
N4 | data value | less, more | signal to the reflux valve REFV outside the desired range | The controller parameter is not appropriately tuned. | The change of the flowrate of the reflux is greater/less than the desired value.

Safety-Related Questions

If we can learn from past incidents, we may be able to prevent them from happening again. One way of learning lessons from past incidents is to build a safety-related question library based on analysis of industrial incident reports and to use it to ask the designers safety questions related to the identified safety critical events. Over 170 questions have been derived and organized into a structured framework in our previous work,30 so that relevant questions can be located easily when considering different safety aspects of a computer-controlled plant. The questions are first divided into generic component types corresponding to the functional levels of the PCED model. The questions under each component type are then further divided according to their relevance to the different phases of the system life cycle. For example, one of the safety critical events in the computer-controlled batch reactor is that the cooling water valve fails to open. There are 22 questions in the question library related to the generic component actuator. Eight of these questions are grouped under the implementation and operation phases of the system life cycle. The following shows some of the questions for the implementation and operation phases only. These questions are categorized into four aspects of consideration:

Selection: Question 1: What kind of actuator will be used?
Installation: Question 2: How will this device be installed? Question 3: How will this device be interfaced to the system? Question 4: How will this device be calibrated?
Testing: None
Environment: Question 5: Is this device robust enough for the operating environment and number of operations?
Question 6: Is electromagnetic protection required for this device? Question 7: Is noise filtering/rejection required for this device? Question 8: What particular aspects of the operating environment may affect the operation of this device? By considering the appropriate questions, the possibility of the event cooling water valve fails to open can be reduced. As the event cooling water valve fails to open can also be caused by a communication link problem between the computer and the actuator, questions related to the communication link may also be considered. Hazard Identification for an Automated Semibatch Evaporator This section presents an example of the application of the hazard identification methodology discussed above to a purely discrete control system, an emergency control system in an automated semibatch evaporator which has been used as a benchmark example for safety identification for automated processing systems.13,31,32 Description of the Semibatch Evaporator. The semibatch evaporator which was built at the University of Dortmund, Germany, is shown in Figure 6. It consists of two connected cylindrical tanks T1 and T2 and is equipped with a small PC-based process control system. The following production sequence takes place: salt solution is charged into tank T1 and then evaporated until a desired concentration is reached. During evaporation, the condenser C1 condenses the steam coming from T1. When the desired concentration is reached, the material is drained from T1 into T2 as soon as T2 becomes empty and then the heating is stopped. Safety considerations dictate that high pressure or high temperature in the evaporator and in the condenser tube should be avoided; otherwise, the safety pressure valve will be tripped. In addition, the temperature in the evaporator should not become too low, since a crystallization effect will lead to precipitation of solids and spoil the batch. 
Table 4. State Transitions Generated by the Emergency Operation from a Normal State in the Semibatch Evaporator (the Underlines Express the Control Actions)

state | F12 | F13 | F15 | F18 | HEAT | L_T1 | L_T2 | Tem_T1 | Pre_C1
S1 | off | on | off | off | on | filled | filled | operating temp | operating pressure normal
(↓ N1)
S2 | off | off | off | off | on | filled | filled | operating temp | increasing to above op. pressure
(↓ N2, N3)
S3 | off | off | off | on | on | filled | emptying | operating temp | increasing to above op. pressure
(↓ N4)
S4 | off | off | off | on | off | filled | emptying | decreasing to ambient | decreasing to atmospheric
(↓ N5, N7, N8)
S5 | off | off | off | off | off | filled | empty | decreasing to ambient | decreasing to atmospheric
(↓ N9)
S6 | off | off | on | off | off | emptying | filling | decreasing to ambient | decreasing to atmospheric

Figure 6. Flow chart of the semibatch evaporator.

Figure 7. PCED of the operating sequence for handling the cooling system breakdown.

Consider the following emergency operation sequence for handling a cooling system breakdown. The problem
is to identify if there is any potential risk involved in the operation sequence and, if so, how to avoid it.

Operating sequence for handling a cooling system breakdown:
Step 1: open valve V18 and start pump P1
Step 2: switch off heating in tank T1
Step 3: if tank T2 is empty, close valve V18 and stop pump P1 and open valve V15

Safety Requirements. As mentioned above, the safety requirements for the semibatch evaporator are as follows: (1) High temperature in the evaporator and high pressure in the condenser should be avoided in order to prevent tripping the safety pressure valve. (2) Too low a temperature in the evaporator should be avoided to prevent precipitation of solids and spoiling the batch.

Modeling of Control Logic. The control logic involved in the above operating sequence for handling the cooling system breakdown is modeled as a PCED as shown in Figure 7. The computation node N6 includes the following condition statement:

If L_T2 = 0 goto Node N7; else goto Node N5

The events which initially activate the control logic should be used as a starting point in order to present this control logic in the PCED multilevel structure. In emergency operation, the control logic is activated by the event cooling water flow rate F13 being low. As shown in Figure 7, the computer gives out three control actions in series. Then, after a low signal from sensor
LIS/701 is received, pump P1 is stopped, valve V18 is closed, and valve V15 is opened. Table 4 shows the state transitions generated by applying the above control logic to the process. In Table 4 and Figure 7, L_T1 and L_T2 are the liquid levels in the tanks T1 and T2, respectively. Tem_T1 is the temperature in the tank T1. Pre_C1 is the pressure in the condenser C1. F12, F13, F15, and F18 are the flow rates through valves V12, V13, V15, and V18, respectively. HEAT is the state of the heating system. Verification of Control Logic. Because the cooling system breakdown incident could occur during any operation, the initial state of the system may be normal, as shown in Table 4 as S1, or it may be one of a number of start-up or shutdown states. Suppose the incident happened during normal operation. Table 4 shows first the state transitions from normal operation, state S1, to cooling water system breakdown, state S2, and then state changes caused by applying the above operating sequence to the process. The state transition from S1 to S2 is caused by the cooling system breakdown; S2 to S3 is the result of the control action of turning on P1 and opening V18; S3 to S4 is caused by turning off the Heater; a low signal from the liquid level sensor LIS/ 701 causes S4 to S5 by stopping pump P1 and closing V18; finally S5 to S6 happens when V15 is opened. Compared with the operating requirements, states S2 to S5 are all undesirable. If the process remains in state S2 or S3 for longer than a certain period, then the pressure in the condenser C1 will become too high, which would violate the first safety requirement. If the process stays in state S4 or S5 for longer than a certain period, then this will lead to the temperature in tank T1 being too low, which violates the second safety requirement. The only state transition which cannot be carried out immediately is from S4 to S5, since it must wait until a low signal from LIS/701 is received. 
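The time dependence noted above can be illustrated with a small sketch. It groups the states of Table 4 by the safety requirement they threaten and flags a requirement when the accumulated time spent in those states exceeds a limit. Treating the time in S2 and S3 (and in S4 and S5) as cumulative is our assumption, and all names and numbers in the example are hypothetical; real limits would have to come from a quantitative model of the evaporator.

```python
# States of Table 4 grouped by the safety requirement they threaten.
REQUIREMENTS = {
    "pressure in condenser C1 too high": ("S2", "S3"),
    "temperature in tank T1 too low": ("S4", "S5"),
}


def violated_requirements(dwell_times, limits, requirements=REQUIREMENTS):
    """Return the requirements whose accumulated dwell time exceeds its limit.

    `dwell_times` maps a state name to the time spent in it;
    `limits` maps a requirement to the maximum tolerable accumulated time.
    """
    violated = []
    for requirement, states in requirements.items():
        total = sum(dwell_times.get(state, 0.0) for state in states)
        if total > limits[requirement]:
            violated.append(requirement)
    return violated


if __name__ == "__main__":
    # Hypothetical figures in minutes, for illustration only.
    dwell = {"S2": 1.0, "S3": 2.0, "S4": 30.0, "S5": 1.0}
    limits = {
        "pressure in condenser C1 too high": 10.0,
        "temperature in tank T1 too low": 20.0,
    }
    print(violated_requirements(dwell, limits))
```

In this hypothetical case the long wait in S4 is what breaches the temperature requirement, which corresponds to the one transition that cannot be carried out immediately.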
The waiting time for emptying tank T2 depends on the volume of the liquid in it. Therefore, a maximum possible waiting time for receiving a low signal from LIS/701 needs to be determined on the basis of a quantitative model. The control sequence will work if the cooling water system breaks down during normal operation. But the system can break down at any time, for example during start-up. In this case F12 will be On and L_T1 will be Filling in the state S1 of Table 4. After executing the control sequence, in the state S6, F12 will be On, F15 will be On, and L_T1 will be undesirable. Therefore, a more robust operating sequence must be derived as follows.

Step 1: close valve V12
Step 2: open valve V18 and start pump P1
Step 3: switch off heating in tank T1
Step 4: calculate maximum delay time D for opening valve V15
Step 5: if time D is reached before tank T2 is empty, then goto step 9; else goto next step
Step 6: if tank T2 is empty before delay time D is reached, then stop pump P1 and close valve V18; else goto step 5
Step 7: open valve V15
Step 8: when T1 is empty goto step 11
Step 9: open valve V15
Step 10: when T1 is empty, stop pump P1 and close valve V18
Step 11: close valve V15
Step 12: end

Step 1 is added in order to avoid having F12 on during the operation of emptying tank T1. Steps 4, 5, and 6 are added to make sure that the waiting time before starting to empty tank T1 does not exceed the maximum possible delay time D. Steps 4, 5, and 6 ensure that precipitation of solids and spoiling the batch never occur. The PCED of the robust operating sequences is shown in Figure 8.

Figure 8. PCED of the robust operating sequence for handling the cooling system breakdown.

Table 5. Partial HAZOP Analysis Results for the Emergency Operation Sequence in the Semibatch Evaporator

| HAZOP item | attribute | guide word | deviations | consequences | actions |
| --- | --- | --- | --- | --- | --- |
| N1 | data flow | no | fail to receive a signal from FIS/801 | Cooling system breakdown cannot be identified and the emergency operation may not be carried out on time. | Install a duplicate sensor. |
| N2 | action | no | fail to open V18 | Tank T2 will not be emptied. | Monitor the opening of V18. |
| N3 | action | no | fail to turn on pump P1 | Tank T2 will not be emptied. | Monitor the state of pump P1. |
| N4 | action | no | fail to turn off heater | The pressure in the condenser C1 will keep on increasing until the safety pressure valve is relieved. | Monitor the state of the heater. |

The computation nodes N7, N10, N13, and
N18 include the following statements:

Node N7: Calculate the maximum permitted waiting time D. Count the current waiting time DT
If DT ≥ D, then goto Node N8
else if L_T2 = 0, then goto Node N14
else goto Node N6
Node N10: If L_T1 = 0, then goto Node N11; else goto Node N9
Node N13: goto Node N19
Node N18: If L_T1 = 0, goto Node N19; else goto Node N17

HAZOP Identification for the Computer Logic Control System. Following the principles of HAZOP, deviations from normal behavior can be introduced for each action in the PCED shown in Figure 8. For example, the deviation for the action receiving a low signal from FIS/801, Node N1 in Figure 8, would be fail to receive a signal from FIS/801. The consequence of this deviation is that the emergency operation for handling a cooling system breakdown may not be carried out on time, which may lead to the pressure in the condenser C1, Pre_C1, increasing rapidly. Similarly, deviations for other control actions will need to be considered. Table 5 shows the deviations for three control actions. The deviations, failure to receive a signal from FIS/801 as well as failure to turn off the heater, are safety critical.

Application of the Life Cycle Question Library. Consider the event fail to receive a signal from FIS/801, where the component involved is the sensor FIS/801. There are a number of questions in the question library for sensor components for overall life cycle stages. For example, the following are the questions for the design and implementation stages:

Design stage:
Options
What alternative states, measurement methods would be suitable?
Are multiple sensors required?
Inputs/outputs
What is the expected range of the input values?
Timing/control
When does this state have to be measured?
How often does this state have to be scanned?
If multiple sensors are to be used to monitor a state, what strategy will be adopted?
What variations are due to the positioning of these devices?
Will these variations remain constant with time?
How fast does the response have to be?

Implementation stage:
Selection
What sensor will be used?
Installation
How will this sensor be installed?
How will this sensor be calibrated?
Where will this sensor be positioned?
Is position being representative of state being measured?
Testing
How will this sensor be tested?

By considering the appropriate questions, the possibility of the event fail to receive a signal from FIS/801 can be reduced.

Hazard Identification for Continuous Control Algorithm

This section presents an example of the application of hazard identification methodology to a purely continuous control system, a temperature profile control system of a riser reactor in an industrial fluidized catalytic cracking unit (FCCU).33

Process Description. The outline structure of the FCCU and its control system are shown in Figure 9. The control system was implemented in a Distributed Control System (DCS). The oil feed is mixed with the regenerated catalyst and then reacts endothermically in the riser reactor. The product of the reaction is passed to the main fractionator for further processing, and the catalyst with coke deposits is circulated to the regenerator, where the coke is burnt off with an air supply. The regenerated catalyst is recirculated to the reactor and supplies the heat required for the cracking reaction. Although the reaction is endothermic, the temperature in the reactor is largely dependent on the amount of catalyst present, as the catalyst is heated to a high temperature before entering the reactor. T1.PV is the outlet temperature of the riser reactor. T1.SP is the set point of T1.PV. T1.PV is usually controlled by manipulating the regenerated catalyst valve to change the
catalyst circulation rate using a traditional PID algorithm. The output of the controller, T1.OP, is the control signal to the catalyst valve. The PCED of the control system is shown in Figure 10. The two computation nodes N3 and N5 include the following algorithms and safety constraints:

Node N3
Constraint 1
If T1.PV = BAD value, then goto N6
Constraint 2
If T1.PV < min or T1.PV > max, then goto N6
PID control algorithm

$$e = \mathrm{T1.PV} - \mathrm{T1.SP} \tag{3}$$

$$\mathrm{T1.OP} = k_1 e + k_2 \int e \, dt + k_3 \frac{de}{dt} \tag{4}$$

Constraint 3
If |∆T1.OP| > set value and ∆T1.OP > 0, then ∆T1.OP = set value
If |∆T1.OP| > set value and ∆T1.OP < 0, then ∆T1.OP = -1 * set value

where BAD value means that the sensor or the communication is out of order and the signal from the sensor is a fault code, ∆T1.OP is the change of T1.OP in a calculation interval, and set value is the maximum permitted change of the controller output in a calculation interval.

Node N5
goto N1

Constraint 1 means that if T1.PV has a BAD value, then sound the T1.PV alarm, hold the current T1.OP value, and wait for operator intervention. Constraint 2 means that if the value of T1.PV is outside its normal limits, then sound the T1.PV alarm, hold the current T1.OP value, and wait for operator intervention. The purpose of constraint 3 is to prevent a change in T1.OP deviating from a set range in a single control interval.

Figure 9. Riser reactor and regenerator of a FCCU with the original PID control system.

Figure 10. PCED of the original PID control system for the FCCU riser reactor.

Figure 11. FCCU riser reactor and regenerator with the revised control system.

Figure 12. Configuration of the revised control system.

Figure 13. PCED of the revised control system for the FCCU riser reactor.
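The logic of Node N3 can be written down as a short sketch to make the interplay of constraints 1 to 3 with the PID law concrete. This is an illustrative reading of the node, not the DCS implementation: the discrete integration scheme, the `PidState` container, the function name `node_n3`, and all numeric values are assumptions. The fault code 20000050 is the BAD VALUE code quoted later in the paper.

```python
from dataclasses import dataclass

BAD_VALUE = 20000050  # fault code reported for a failed sensor (quoted in the paper)


@dataclass
class PidState:
    """Values carried from one calculation interval to the next (hypothetical)."""
    integral: float = 0.0
    prev_error: float = 0.0
    output: float = 0.0  # current T1.OP


def node_n3(pv, sp, state, k1, k2, k3, dt, pv_min, pv_max, max_step):
    """One calculation interval of Node N3. Returns (T1.OP, alarm)."""
    # Constraints 1 and 2: on a BAD value or an out-of-range T1.PV,
    # raise the alarm and hold the current output.
    if pv == BAD_VALUE or pv < pv_min or pv > pv_max:
        return state.output, True

    # PID law, eqs 3 and 4, with a simple rectangular integral
    # and a backward-difference derivative (both assumptions).
    error = pv - sp
    state.integral += error * dt
    derivative = (error - state.prev_error) / dt
    state.prev_error = error
    raw_output = k1 * error + k2 * state.integral + k3 * derivative

    # Constraint 3: limit the change of T1.OP in one interval to +/- max_step.
    delta = raw_output - state.output
    delta = max(-max_step, min(max_step, delta))
    state.output += delta
    return state.output, False


# Usage with made-up tuning and limits.
st = PidState()
settings = dict(k1=1.0, k2=0.0, k3=0.0, dt=1.0, pv_min=400.0, pv_max=600.0, max_step=2.0)
step1 = node_n3(510.0, 500.0, st, **settings)      # normal interval, change is clamped
step2 = node_n3(BAD_VALUE, 500.0, st, **settings)  # sensor fault, output is held
```

Holding the output on a fault mirrors the "hold the current T1.OP value and wait for operator intervention" behavior described for constraints 1 and 2.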
This control system has been proved to be safe and was in operation for a long period of time at a refinery.33

Revised Control System. To improve the dynamic behavior of the control system, a revised control scheme was proposed, as shown in Figure 11. Three temperature sensors were placed along the riser of the reactor. T1.PV, T2.PV, and T3.PV are the outlet, middle, and bottom temperatures of the riser reactor, respectively. RC stands for the revised control algorithm. This was a temperature profile control strategy using a weighted average temperature of the reactor as the control variable and using the setpoint of T1.PV, T1.SP, as the setpoint of TT.PV, as shown in eqs 5-7. The configuration of the revised control system in the DCS is illustrated in Figure 12. It consists of a standard PID module and a standard ADD module. Using the standard modules in the DCS ensures safety, simplicity, and reliability of the control system. The PCED of the revised control system is shown in Figure 13. It includes three computation nodes as follows:

Node N4

$$\mathrm{TT.PV} = W_1 \, \mathrm{T1.PV} + W_2 \, \mathrm{T2.PV} + W_3 \, \mathrm{T3.PV} \tag{5}$$

$$W_1 + W_2 + W_3 = 1$$

$$0 < W_3 < W_2 < W_1 < 1$$

where W1, W2, and W3 are weight coefficients.

Node N6
Constraint 1
If T1.PV = BAD value, then goto N9
Constraint 2
If T1.PV < min or T1.PV > max, then goto N9
PID control algorithm

$$e = \mathrm{TT.PV} - \mathrm{T1.SP} \tag{6}$$

$$\mathrm{T1.OP} = k_1 e + k_2 \int e \, dt + k_3 \frac{de}{dt} \tag{7}$$

Constraint 3
If |∆T1.OP| > set value and ∆T1.OP > 0, then ∆T1.OP = set value
If |∆T1.OP| > set value and ∆T1.OP < 0, then ∆T1.OP = -1 * set value

where BAD value is a fault code, ∆T1.OP is the change of T1.OP in a calculation interval, and set value is the maximum permitted change of the controller output in a calculation interval.

Node N8
goto N1

Hidden Errors in the Revised Control System. After the safety constraints 1, 2, and 3 were implemented, the above revised control system was put into use. Because the temperature profile was introduced, any disturbances from oil feed, steam, and catalyst may be overcome more quickly. The outlet temperature T1 can be controlled within ±0.6 °C of its set point. The operation of the revised control system is almost the same as that of the original PID control system for the operators. After about half a year of running, there was an error in the sensor of T2.PV, which got a BAD value immediately (T2.PV = 20000050 (BAD VALUE code)). Then TT.PV reached its high limit value because the “ADD” real-time module did not filter any faults. The PID module in Figure 12 was working in the automatic mode; its output T1.OP decreased step by step due to safety constraint 3. When T1.PV decreased to its lower alarm limit, then safety constraint 2 was activated. The operator set the advanced controller to “MAN”ual mode, checked the sensor of T1.PV, and raised T1.PV manually. No attention was paid to the sensors of T2.PV and T3.PV since the revised control system works in the same way as the original PID control system. After T1.PV went back to its normal value, the operator set the advanced controller to “AUTO”, but the control system failed again. Engineers took several hours to find out the reason for the fault. Although no accident occurred, the reactor outlet temperature dropped from its set point to its low limit several times. This was harmful to the safe and efficient operation of the reactor. From this case study, it is clear that even a DCS with safety constraints is not always safe, especially when changes are introduced. A hazard identification study is necessary.

Table 6. HAZOP Analysis Results for the Advanced Control System in the Riser Reactor

| HAZOP item | attribute | guide word | deviation | consequences | actions |
| --- | --- | --- | --- | --- | --- |
| N1 | data flow | no | no signal | T1.PV is BAD value. TT.PV reaches its upper limit value. T1.OP decreases to its low limit. The whole temperature profile T1, T2, T3 decreases. | Handled in safety constraint 1 in node N6. |
| N2 | data flow | no | no signal | T2.PV is BAD value. TT.PV reaches its upper limit value. T1.OP decreases to its low limit. The whole temperature profile T1, T2, T3 decreases. | Add a safety constraint similar to constraint 1 for T2.PV in node N6. |
| N3 | data flow | no | no signal | T3.PV is BAD value. TT.PV reaches its upper limit value. T1.OP decreases to its low limit. The whole temperature profile T1, T2, T3 decreases. | Add a safety constraint similar to constraint 1 for T3.PV in node N6. |
| N5 | data flow | no | no signal | T1.SP is BAD value. T1.OP increases to its upper limit. The whole temperature profile T1, T2, T3 increases. | The range of setpoint has been defined in keyboard module. The consequence never happens. |
| N7 | data flow | no | no signal | T1.OP is disconnected. Catalyst valve is out of signal and is completely closed. The process is shutdown. | Use high reliable communication facility. |
| N9 | data flow | no | no signal | Alarm is always silent. | Regularly observe the screen display. |
| N1 | data value | less, more | T1.PV outside the normal limit | T1.OP will be changed until reaching its upper or low limit. The process will be disturbed. | Handled in safety constraint 2 in node N6. |
| N2 | data value | less, more | T2.PV outside the normal limit | T1.OP will be changed until reaching its upper or low limit. The process will be disturbed. | Add a safety constraint similar to constraint 2 for T2.PV in node N6. |
| N3 | data value | less, more | T3.PV outside the normal limit | T1.OP will be changed until reaching its upper or low limit. The process will be disturbed. | Add a safety constraint similar to constraint 2 for T3.PV in node N6. |
| N5 | data value | other than | T1.SP incorrect | T1.OP is changed based on this incorrect setpoint. The whole temperature profile will be disturbed. | The range of setpoint has been defined in keyboard module. The consequence never happens. |
| N7 | data value | less, more | T1.OP outside the set range | The change of flow rate of the catalyst is greater than a desired value. A big disturbance will be introduced into the plant. | Handled in safety constraint 3 in node N6. |
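The hidden error can be reproduced numerically with a small sketch. It assumes, for illustration only, weights (0.5, 0.3, 0.2), temperatures of about 500 °C, and alarm limits of 400 and 600 °C; none of these values come from the paper. The function names are hypothetical. The fault code 20000050 is the one quoted in the incident description.

```python
BAD_VALUE = 20000050  # BAD VALUE code quoted in the incident description

# Hypothetical weights satisfying W1 + W2 + W3 = 1 and 0 < W3 < W2 < W1 < 1.
WEIGHTS = (0.5, 0.3, 0.2)


def weighted_average(t1_pv, t2_pv, t3_pv, weights=WEIGHTS):
    """Eq 5 as an unfiltered sum, mimicking an ADD module that passes faults through."""
    w1, w2, w3 = weights
    return w1 * t1_pv + w2 * t2_pv + w3 * t3_pv


def inputs_ok(readings, low, high):
    """True only if every reading is a valid value inside its normal limits."""
    return all(r != BAD_VALUE and low <= r <= high for r in readings)


# T2.PV fails while T1.PV and T3.PV stay normal (illustrative temperatures).
t1_pv, t2_pv, t3_pv = 500.0, BAD_VALUE, 520.0
tt_pv = weighted_average(t1_pv, t2_pv, t3_pv)

# Checking T1.PV alone, as the original constraints 1 and 2 do, sees nothing wrong.
only_t1_checked = inputs_ok([t1_pv], 400.0, 600.0)
# Checking all three inputs, as the actions in Table 6 propose, detects the fault.
all_checked = inputs_ok([t1_pv, t2_pv, t3_pv], 400.0, 600.0)

print(tt_pv, only_t1_checked, all_checked)
```

With these assumed numbers the weighted average is driven far above any plausible temperature limit while the check on T1.PV alone still passes, which is consistent with the behavior the operators observed.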
HAZOP Identification for the Revised Control System

As discussed in the section Safety Verification of Control Logic, the potential hazards involved in a continuous control software-based process are likely to be generated due to incomplete safety constraints for dealing with abnormal operations. To achieve complete safety constraints for the control algorithm shown in Figure 13, all meaningful deviations made by combining guide words with a parameter such as flow or temperature should be introduced into the process according to the PCED expression. If the consequences are unacceptable, then new safety constraints need be added to the control software. Table 6 summarizes the HAZOP results for the revised control algorithm. Several actions are suggested for various deviations which are highlighted in Table 6. Implementing these actions adds the following new safety constraints in the control software for the node N6 in Figure 13 when implementing the control algorithm.

Constraints for T2.PV
Constraint 4
If T2.PV = BAD value, then goto N9
Constraint 5
If T2.PV < min or T2.PV > max, then goto N9

Constraints for T3.PV
Constraint 6
If T3.PV = BAD value, then goto N9
Constraint 7
If T3.PV < min or T3.PV > max, then goto N9

Therefore constraints 1-7 in the node N6 will ensure that the revised control system for the riser temperature profile is safe in the cases of T1.PV, T2.PV, or T3.PV failing. Another way to handle the failures of T1.PV, T2.PV, or T3.PV is to build two constraints for TT.PV instead of constraints for T1.PV, T2.PV, and T3.PV separately.

Constraints for TT.PV
1. If TT.PV = BAD value, then goto N9
2. If TT.PV < min or TT.PV > max, then goto N9

Then Node N6 would have the following contents

Constraint 1a
If TT.PV = BAD value, then goto N9
Constraint 2a
If TT.PV < min or TT.PV > max, then goto N9
PID control algorithm

$$e = \mathrm{TT.PV} - \mathrm{T1.SP} \tag{6}$$

$$\mathrm{T1.OP} = k_1 e + k_2 \int e \, dt + k_3 \frac{de}{dt} \tag{7}$$

Constraint 3a
If |∆T1.OP| > set value and ∆T1.OP > 0, then ∆T1.OP = set value
If |∆T1.OP| > set value and ∆T1.OP < 0, then ∆T1.OP = -1 * set value

Concluding Remarks

Hazard identification for computer-controlled systems is becoming more and more important because of new routes to failure and potential risks introduced by using computers to control, protect, and monitor industrial processes. Existing hazard analysis and identification techniques are not appropriate for computer-controlled processes. A qualitative, functional model-based approach for hazard identification has been presented. The proposed approach is based on the Process Control Event Diagram (PCED), which is used to express the control logic and the control algorithm for discrete/continuous control systems in a consistent way. The hazard identification framework consists of five steps: identification of safety requirements, representation of control system, verification of the control logic, HAZOP analysis, and application of safety-related questions. State transitions are generated on the basis of the PCED and used for verification of the control logic. Hazard identification for continuous control software focuses on generation of safety constraints. The lessons from past incidents are used to avoid these incidents happening again by applying appropriate safety-related questions to safety critical events identified by HAZOP analysis. A purely discrete control system, an automated semibatch evaporator; a purely continuous control system, a riser reactor temperature control system in an industrial FCCU; and a hybrid control system, a computer-controlled batch reactor, are used to illustrate the whole procedure of applying the approach.

Acknowledgment
This project was funded by the EPSRC, U.K., grant No. GR/K90302, and continuing support for collaboration with the Department of Chemical Engineering, Dortmund University, is being funded by the British Council.

Literature Cited

(1) Kletz, T. A. HAZOP & HAZAN: Notes on the identification and assessment of hazards; The Institution of Chemical Engineers: Rugby, England, 1986.
(2) Knowlton, R. E. Hazard and operability studies: the guide word approach; Chematics International Company: Vancouver, 1989.
(3) Lawley, H. G. Sizing up your plant this way. Chem. Eng. Prog. 1970, 70 (4), 45.
(4) A Guide to Hazard and Operability Studies; Chemical Industries Association Ltd., 1977.
(5) Ramiro, J. M. S.; Aisa, P. A. B. Risk analysis and reduction in the chemical process industry; Blackie Academic & Professional: London, 1998; p 16, ISBN 751403741.
(6) Andow, P. Guidance on HAZOP procedures for computer-controlled plants. Contract Research Report, No. 26; HMSO: London, 1991; ISBN 0118859773.
(7) Health and Safety Executive. Out of control: why control systems go wrong and how to prevent failure; HMSO: London, 1995.
(8) Nimmo, I. Extend HAZOP to computer control systems. Chem. Eng. Prog. 1994, Oct, 32.
(9) Kletz, T. Some incidents that have occurred, mainly in computer-controlled process plants. In Computer Control and Human Error; Kletz, Ed.; Institute of Chemical Engineers: Rugby, U.K., 1995; Chapter 1.
(10) Moon, I.; Powers, G. J.; Burch, J. R.; Clarke, E. M. Automatic verification of sequential control systems using temporal logic. AIChE J. 1992, 38 (1), 67.
(11) Clarke, E. M.; Emerson, E. A.; Sistla, A. P. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Programming Lang. Syst. 1986, 8, 244.
(12) Park, T.; Barton, P. I. Implicit model checking of logic-based control systems. AIChE J. 1997, 43 (9), 2246.
(13) Kowalewski, S.; Gesthuisen, R.; Roßmann, V. Model-based verification of batch process control software. Proc. IEEE Conf. SMC, San Antonio; Institute of Electrical and Electronics Engineers: New York, 1994; p 331.
(14) Dimitriadis, V. D.; Shah, N.; Pantelides, C. C. Modeling and safety verification of discrete/continuous processing systems. AIChE J. 1998, 43 (4), 1041.
(15) Yang, S. H.; Chung, P. W. H. Life cycle hazard analysis for computer controlled processes. Comput. Chem. Eng. 1998, 22 (Suppl.), S483.
(16) Yang, S. H.; Chung, P. W. H. Hazard analysis and support tool for computer controlled processes. J. Loss Prevention Process Ind. 1998, 11, 333.
(17) Yang, S. H.; Rong, G.; Chung, P. W. H. Verification of control logic based on process control event diagram. International Conference on CONTROL’98, Swansea, U.K.; 1998; Vol. 2, p 1090.
(18) Willis, D. M. Guidance on HAZOP procedures for computer controlled chemical plant. Loss Prevention Bull. 1992, 108, 19.
(19) Chung, P. W. H.; Broomfield, E. J. Hazard and operability (HAZOP) studies applied to computer-controlled process plants. In Computer Control and Human Error; Kletz, Ed.; Institute of Chemical Engineers: Rugby, U.K., 1995; Chapter 2.
(20) Broomfield, E. J.; Chung, P. W. H. Safety assessment and the software requirement specification. Reliability Eng. Syst. Safety 1997, 55, 295.
(21) Redmill, F.; Chudleigh, M. F.; Catmur, J. R. Principles underlying a guideline for applying HAZOP to programmable electronic systems. Reliability Eng. Syst. Safety 1997, 55, 283.
(22) Drake, E. M.; Thurston, C. W. A safety evaluation framework for process hazards management in chemical facilities with PES-based controls. Process Safety Prog. 1993, 12 (2), 92.
(23) Smith, J. U. M. Using a layered functional model to determine safety requirements. Proceeding of the safety-critical systems symposium; Springer-Verlag: London, 1996; p 56.
(24) Croll, P. R.; Chambers, C.; Bowell, M.; Chung, P. W. H. Towards safer industrial computer controlled systems. Proceedings of the 16th International Conference on Computer Safety, Reliability and Security, York; Springer-Verlag: London, 1997; p 321.
(25) Chung, P. W. H.; Yang, S. H. Functional Modelling for Hazard Identification. Workshop Notes of 5th international Workshop on Advanced in Functional Modelling of Computer Technical System, Paris; 1997; p 117.
(26) Larkin, F. D.; Rushton, A. G.; Chung, P. W. H.; Lees, F. P.; McCoy, S. A.; Wakeman, S. J. Computer-aided Hazard Identification: Methodology and System Architecture. Proc of Hazards XIII, Manchester, U.K.; IChemE Symposium Series No. 141; IChemE: Rugby, U.K., 1997; p 337.
(27) Wang, X. Z.; Yang, S. A.; Yang, S. H.; McGreavy, C. The application for fuzzy qualitative simulation in safety and operability assessment of process plants. Comput. Chem. Eng. 1996, 20 (Suppl.), S671.
(28) Tarifa, E. E.; Scenna, N. J. Fault diagnosis, direct graphs, and fuzzy logic. Comput. Chem. Eng. 1997, 21 (Suppl.), S649.
(29) Ministry of Defence, Hazop studies on systems containing programmable electronics, Part 2: General Application Guidance. Interim, Defence Standard, Glasgow, 1996.
(30) Chung, P. W. H.; Broomfield, E.; Yang, S. H. Safety related questions for computer-controlled plants: derivation, organisation and application. J. Loss Prevention Process Ind. 1998, 11, 397.
(31) Kowalewski, S.; Preußig, J. Verification of sequential controllers with timing functions for chemical processes. 13th IFAC World Congress, San Francisco, CA; 1996; p J 419.
(32) Chung, P. W. H.; Yang, S. H. The application of the HAZAPS methodology to an semi-automated batch evaporator. International Workshop on Discrete Event Systems, Cagliari, Italy; 1998; p 308.
(33) Rong, G.; Yang, S. H.; Chung, P. W. H. Hazard analysis for DCS based advanced control algorithm using functional model. CONTROL’98, Swansea, U.K.; 1998; Vol. 2, p 1090.
Received for review February 22, 1999 Revised manuscript received August 2, 1999 Accepted August 16, 1999 IE990130K