RELIABILITY THEORY IN CHEMICAL SYSTEM DESIGN
DALE F. RUDD, University of Wisconsin, Madison, Wis.
Reliability theory is applied to the design of chemical processing systems which are subject to frequent failures in key processes. By designing redundancies into the processing system, the adverse effects on the system of such failures can be reduced. The concepts of parallel and stand-by redundancies are discussed in some detail, and dynamic programming is applied to determine the optimum design for series processes. Methods for the reliability analysis of more complex systems are presented.
In many industrial situations, the performance of a processing system is greatly influenced by the failures of certain key steps in the processing sequence. In the biochemical industry in particular, such failures occur with discouraging frequency, resulting in great economic losses both in materials and effort. The failure characteristics may, in many cases, be the main factor governing the economic feasibility of a process. To reduce the adverse effects of failures in the system, certain redundancies may be designed into the system by producing replicate batches of key components, by constructing parallel processing systems, or by the use of stand-by reactors. The extent to which these design countermeasures benefit the process system is largely determined by economic factors. The costs of designing and operating such redundancies must be more than balanced by the increased profit obtained by the improved performance of the system.

Reliability theory is an application of the calculus of probability which evolved from the studies of rocket systems and has been used with success in the design of complex electronic systems. In this article, the theory is applied to the economic design of complex chemical processing systems which operate in the presence of frequent failures of key processing steps. First the failure characteristics of a single process are discussed. Then the effects of these failures on the performance of a processing system are determined in terms of the interrelations of processes in the system. The use of redundancy as a design countermeasure is then discussed and methods are given to determine the optimal redundancy for series processes. The problem of determining the reliability of a general system is also presented.

Clearly, reliability theory is only applicable in the design of certain classes of chemical processing systems, and then only in the later stages of the design and, in some cases, even only after plant construction. The primary design criterion is normally performance.
That is, the system must be capable of performing a given task, assuming that every step in the process is perfectly reliable. Frequently, this criterion is sufficient. In many cases, information is available on process reliability from the history of the process performance. In this case, simple design modifications may produce more desirable system performance. More often the frequent failure of a
I&EC FUNDAMENTALS
key step of a process is not detected until the plant is constructed and is in operation. In this case simple redesign countermeasures may improve process performance.

Process Failures
Before a large scale processing system can be studied, it is necessary to have a clear understanding of the performance of each individual process in the system and to have methods available to express this performance quantitatively. The calculus of probability is used for this, but first the types of failures that are encountered will be discussed qualitatively.

Consider the operation of a delicate chemical reactor. This process is subject to frequent failures. That is, during the course of the reaction things may go bad, and the material produced is not the desired product. Such failures could be caused by catalyst degradation, by the chance inclusion of a poison in the feed, by the formation of a hot spot in the reactor, or by any number of complex phenomena. As information is obtained from replicate runs of the process, it is often possible to distinguish three modes of failure which occur in separate periods of time.

During the reactor start-up the failure rate may be quite high. This initially high rate is caused by the failure of substandard elements. For example, a poorly fabricated catalyst pellet may be prone to form hot spots which increase the chances of reaction failure, or a flaw in the glass lining of a reactor might allow corrosion to weaken the reactor so that it must be replaced. These failures occur when the process is initially put into operation, and once the process has survived this initial period the failure rate drops.

It is then common for the failure rate to level out and remain constant for a long period of time. During this period, the process failures are said to be caused by chance. The failures occur without apparent cause, and the rate of failure is fairly independent of time. This is the period of random failures.

As the equipment grows old the failure rate increases. This increase is caused by the normal wear out of the equipment. This is called the period of wear-out failures.

In an actual process, these three periods, initial, random,
and wear out, will be superimposed, and the failure history of a process may not resemble any one of these modes of failure alone. By proper mechanical design and preoperation care, the initial failure period may be reduced. The random failures are generally beyond control, and, in fact, the purpose of reliability theory is to design around these failures. Hence, one must always be concerned with chance failures. The sharp increase in the failure rate which is caused by equipment wear out may not occur until the process is quite old. A study of the random failure period alone is in itself quite valuable, but no reliability analysis is complete until the initial and wear-out failure periods are considered.

Now consider quantitative measures of the failure properties of a process. The failure history of a process is not generally reproducible. Failure is a stochastic or statistical phenomenon, and it is therefore necessary to consider it from a probabilistic point of view. Define the reliability $R(t)$ of a process as the probability that the process will perform properly during the period of time $[0, t]$. Then if $Q(t)$ is the probability that the process will fail during that period of time, $R(t)$ has the properties

$$Q(t) + R(t) = 1, \qquad R(0) = 1, \qquad \lim_{t \to \infty} R(t) = 0 \qquad (1)$$

In a batch process where satisfactory performance is required only during the period of time that the process is in operation, $R = R(T)$ where $T$ is the processing time. The reliability measure, $R$, may be thought of as either a probability of occurrence of an event on a single process, or as the fraction of times that event occurs during a number of trials with the process. In the limit of a large sample these are the same. Performance history of a process can then be used to estimate process reliability.

The rate of failure $\lambda(t)$ is the a priori probability of failure in the period of time $[t, t + dt]$ and is related to the reliability of a process by the relation

$$-\frac{dR}{dt} = \lambda(t)\,R \qquad (2)$$

For the random mode of failure, the rate of failure is not time dependent and hence

$$R_r(t) = \exp[-\lambda t] \qquad (3)$$

The mean time between failures $m$ is the first moment of the failure distribution function.

$$m = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}$$

From a measurement of the mean time between failures $m$, an estimate of the parameter $\lambda$ can be obtained, and the random failure reliability can be determined. When the random failure mode is superimposed on other modes, serious statistical problems may be encountered in attempting to resolve the relative contributions from the various failure modes. With suitable care, these difficulties may be avoided.

The initial and wear-out periods differ from the random period in that the failure rates are time dependent. There is no one model for the mode of failure for these two periods, but often the data may be approximated by a normal distribution function.

$$f(\tau) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(\tau - M)^2}{2\sigma^2}\right] \qquad (4)$$

where $M$ is the mean life of the process, $\sigma^2$ the variance about that mean, and $\tau$ the accumulated operation time for the process. $t$ is the operation time, i.e., the time elapsed since this particular run on the equipment was begun, and $\tau$ is the accumulated operation time for the process, i.e., the total elapsed time since the process was new.

A process which starts in operation at time $t = 0$ using equipment which is of age $\tau$ will have a reliability consisting of the joint probability that the process has not failed by either the initial, random, or wear-out modes of failure.

$$R(t) = R_r(t)\,\frac{R_{\mathrm{init}}(\tau + t)}{R_{\mathrm{init}}(\tau)}\,\frac{R_{\mathrm{wear}}(\tau + t)}{R_{\mathrm{wear}}(\tau)} \qquad (5)$$

where $R_r$, $R_{\mathrm{init}}$, and $R_{\mathrm{wear}}$ are the reliabilities for the random, initial, and wear-out modes taken separately. The second and third terms are the a posteriori probabilities imposing the condition that the equipment must have survived these modes of failure up to $t = 0$, since the process was assumed to be put into operation successfully at $t = 0$.

This background in individual process failure is sufficient to see how the history of a process can be used to characterize its performance and to estimate its future performance. Bazovsky gives a detailed discussion of reliability (2).

Process Systems

Consider now a processing system shown in Figure 1 with a number of individual processing steps which must be performed in series to produce the final product. A primary raw material is reacted with a secondary species in process $N$. The product of this reaction is then fed to process $N - 1$ and reacted with another secondary species, and so on through the process chain. The secondary species are all quite delicate and cannot be stored; hence, it is necessary to produce them upon demand by special reactions. The intermediate products of the reaction are also quite unstable, and if the proper secondary species is not available on time, the processing system will fail, and the entire effort expended in the previous steps of the process is wasted. Thus, the failure of any one of the reactions which produce the secondary species results in the failure of the entire processing system. Let $R_i$ be the reliability of the $i$th process, i.e., the probability that the $i$th secondary species will be available on time. The reliability of the system is the joint probability that all $N$ secondary species are available on time.
$$R_s = \prod_{i=1}^{N} R_i \qquad (6)$$

[Figure 1. Series processing sequence: the primary species enters process $N$, secondary species $N$, $N - 1$, ..., 1 are supplied to the successive processes, and the product leaves process 1.]

VOL. 1 NO. 2 MAY 1962
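As a rough numerical illustration of the relations above, the following Python sketch estimates the failure-rate parameter from a measured mean time between failures, evaluates the random-failure reliability $\exp[-\lambda t]$, and multiplies stage reliabilities to obtain the reliability of a series system. The function names and the sample numbers (a 200-hour mean time between failures and a 50-hour batch) are assumptions made only for illustration, and the product rule presumes that the stages fail independently.

```python
import math
from typing import Sequence


def failure_rate_from_mtbf(mean_time_between_failures: float) -> float:
    """Estimate lambda for the random-failure period as 1 / m."""
    return 1.0 / mean_time_between_failures


def random_failure_reliability(failure_rate: float, t: float) -> float:
    """R_r(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate * t)


def series_reliability(stage_reliabilities: Sequence[float]) -> float:
    """Joint probability that every stage succeeds (stages assumed independent)."""
    result = 1.0
    for r in stage_reliabilities:
        result *= r
    return result


if __name__ == "__main__":
    # Hypothetical numbers: m = 200 h between failures, 50 h batch time.
    lam = failure_rate_from_mtbf(200.0)
    batch_reliability = random_failure_reliability(lam, 50.0)
    print("single stage:", batch_reliability)
    # Three identical stages in series, each required for the product.
    print("three stages in series:", series_reliability([batch_reliability] * 3))
```

Because the system reliability is a product, a single unreliable stage pulls the whole chain down, which is the motivation for the redundancy discussed next.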
If a given secondary species is frequently not available (the process which produces it fails frequently), it would be advisable to produce more than one batch of that species to increase the probability that at least one batch will be available on time. This is called parallel redundancy. While only one batch is needed, several are produced to reduce the effects of batch failures.

Consider now the $i$th step in this reaction system, for which we will prepare $l_i$ batches of the $i$th secondary species. Only one batch can be used; hence, $l_i - 1$ batches are redundant. The probability that a single batch will fail is $1 - R_i$, and the probability that all $l_i$ batches will fail is $(1 - R_i)^{l_i}$. Therefore, the probability that at least one batch will succeed is $R_i' = 1 - (1 - R_i)^{l_i}$, which is by definition the reliability of the $i$th process with its redundancies. Now if each of the $N$ stages in the process system is designed with a redundancy of order $l_i - 1$, the reliability of the system is

$$R_s = \prod_{i=1}^{N} \left[1 - (1 - R_i)^{l_i}\right] \qquad (7)$$
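The parallel-redundancy formula can be checked numerically with a short Python sketch. The function names are hypothetical, the stages are assumed to fail independently, and the three stage reliabilities (3/4, 1/2, 1/3) are illustrative values. Exact fractions are used so that the results are not blurred by rounding.

```python
from fractions import Fraction
from typing import Sequence


def redundant_stage_reliability(r: Fraction, batches: int) -> Fraction:
    """R' = 1 - (1 - R)^l: at least one of l independent batches succeeds."""
    return 1 - (1 - r) ** batches


def redundant_system_reliability(
    reliabilities: Sequence[Fraction], batches: Sequence[int]
) -> Fraction:
    """Product over the stages of the redundant stage reliabilities."""
    result = Fraction(1)
    for r, l in zip(reliabilities, batches):
        result *= redundant_stage_reliability(r, l)
    return result


# Illustrative stage reliabilities (assumed values).
STAGES = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 3)]

if __name__ == "__main__":
    print(redundant_system_reliability(STAGES, [1, 1, 1]))  # 1/8
    print(redundant_system_reliability(STAGES, [2, 2, 2]))  # 25/64
```

Doubling every batch raises the system reliability from 1/8 to 25/64 for these values, a little more than a threefold improvement.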
Example. Consider now the process in which the primary chemical species, $w$, is modified by three successive batch reactions to give the valuable product, $z$.

Reaction 3: $w + A_3 \rightarrow x$
Reaction 2: $x + A_2 \rightarrow y$
Reaction 1: $y + A_1 \rightarrow z$

The secondary species $A_1$, $A_2$, and $A_3$ are prepared for this reaction and cannot be stored. The reliabilities of the reactions which produce $A_1$, $A_2$, and $A_3$ are $R_1 = 3/4$, $R_2 = 1/2$, and $R_3 = 1/3$. The probability of successful operation of this process with no redundancies is

$$R_s = R_1 R_2 R_3 = 1/8$$

Only one trial in eight would be expected to succeed. Now suppose that for each batch of the raw material, two batches of each of the secondary species are prepared, a redundancy of order one. The system reliability is then

$$R_s = [1 - (1 - R_1)^2][1 - (1 - R_2)^2][1 - (1 - R_3)^2] = 25/64$$

The reliability of the system is increased by a factor of about three. The more redundant the system is made, the greater its reliability. But it is expensive to use redundant design, and in the chemical industry reliability is not the most important design criterion.

Optimal Parallel Redundancy

Let $P$ be the profit received if the processing system is successful. The reliability of the system is the fraction of the trials that are successful, and hence the expected profit for the system is $P R_s$. The redundancy built into the system is costly. Let $C_i$ be the construction cost (suitably distributed over the life of the process) of the $i$th kind of reactor, and let $O_i$ be the operation cost. The total secondary species cost (the cost of constructing and operating the system of redundant reactors) is then $\sum_{i=1}^{N} (C_i + O_i) l_i$. The profit for the entire system is the profit from the product minus the cost of production, $P R_s - \sum_{i=1}^{N} (C_i + O_i) l_i$. The optimal parallel redundancy is that which maximizes the system profit.

Dynamic programming will now be used to determine the optimal parallel redundancy. This method is described in detail in the book "The Optimal Design of Chemical Reactors" by Aris (1) and is a skillful application of the principle of optimality developed by Bellman. This particular problem was solved in an unpublished work (3). The principle of optimality states that the last stages of a stage-wise process must necessarily operate optimally with respect to the results of the previous stages. When properly applied, this principle results in a great reduction in the computational labor involved in determining the optimal redundancy.

Divide the staged processing system into two parts, stage number $N$ and stages $N - 1, N - 2, \ldots, 1$. Define $f_{N-1}(S_N)$ as the expected return from the optimal operation of the last $N - 1$ stages of the process as a function of the probability $S_N$ that the upstream stage $N$ operates. The probability that all upstream processes work is defined for the $k$th stage as

$$S_k = \prod_{i=k}^{N} R_i'$$

Now, if the optimal design for the stages $N - 1, N - 2, \ldots, 1$ is known, then stage $N$ can be designed optimally by invoking the principle of optimality and solving the maximum problem in the single variable $l_N$.
$$f_N(S_{N+1}) = \max_{l_N} \left[ f_{N-1}(S_N) - (C_N + O_N)\, l_N \right] \qquad (8)$$
That is, the stage $N$ must be designed such that the profit received from the operation of the stages $N - 1, N - 2, \ldots, 1$ minus the cost of operating the stage $N$ is a maximum. The choice of $l_N$ enters the first term through $S_N = S_{N+1} \left[1 - (1 - R_N)^{l_N}\right]$. Since the primary species is always available, $S_{N+1} = 1$. Now the remaining $N - 1$ stages may be thought of as an $(N - 1)$-stage process which operates with a source of raw materials whose reliability is $S_N$. Invoking the principle of optimality yields

$$f_{N-1}(S_N) = \max_{l_{N-1}} \left[ f_{N-2}(S_{N-1}) - (C_{N-1} + O_{N-1})\, l_{N-1} \right] \qquad (8a)$$
This is the dynamic programming algorithm for the optimal design of stage $N - 1$. This recursive development is continued until only one stage remains and its optimal design is determined by the solution of

$$f_1(S_2) = \max_{l_1} \left[ P S_1 - (C_1 + O_1)\, l_1 \right] \qquad (8b)$$
Now the optimal design for the single stage is determined for a spectrum of $S_2$ values. The results of this design are then used in the recursive dynamic programming formula for the two-stage system for a spectrum of $S_3$ values. This is repeated until the entire set of $N$ maximization problems is solved numerically.

Example. Consider the stagewise process introduced in the previous example problem. Assign costs of operation

         Process 3   Process 2   Process 1
C_i      0.1         0.5         0.5
O_i      0.1         0.5         0.5
and let the profit associated with the final product $z$ be 10 units. The profit for the system with no redundancies is

$$P \prod_{i=1}^{3} R_i - \sum_{i=1}^{3} (C_i + O_i) = -0.95$$
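The expected profit for any choice of batch numbers can be evaluated with a few lines of Python. This is only a sketch: the function name is hypothetical, the stage reliabilities 3/4, 1/2, and 1/3 are taken to be those of the example (their product is 1/8), and the unit costs are the sums $C_i + O_i$ from the cost table.

```python
from typing import Sequence


def expected_profit(
    profit: float,
    reliabilities: Sequence[float],
    unit_costs: Sequence[float],
    batches: Sequence[int],
) -> float:
    """P * R_s minus the total cost of the batches, with R_s from Equation 7."""
    system_reliability = 1.0
    total_cost = 0.0
    for r, cost, l in zip(reliabilities, unit_costs, batches):
        system_reliability *= 1.0 - (1.0 - r) ** l
        total_cost += cost * l
    return profit * system_reliability - total_cost


# Stages listed in the order 1, 2, 3; values assumed from the example.
RELIABILITIES = [0.75, 0.5, 1.0 / 3.0]
UNIT_COSTS = [1.0, 1.0, 0.2]  # C_i + O_i for processes 1, 2, 3

if __name__ == "__main__":
    print(expected_profit(10.0, RELIABILITIES, UNIT_COSTS, [1, 1, 1]))
    print(expected_profit(10.0, RELIABILITIES, UNIT_COSTS, [2, 2, 2]))
```

With one batch per stage the expected profit is about -0.95 units, and with two batches per stage it is still negative (about -0.49), so uniform redundancy is not enough to make this process pay.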
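Now apply the dynamic programming concepts to determine the optimal design. The recursive dynamic programming algorithms are

$$f_1(S_2) = \max_{l_1} \left[ 10\, S_1 - 1.0\, l_1 \right]$$

$$f_2(S_3) = \max_{l_2} \left[ f_1(S_2) - 1.0\, l_2 \right]$$

$$f_3(1) = \max_{l_3} \left[ f_2(S_3) - 0.2\, l_3 \right]$$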
The first maximum problem is solved for the optimal $l_1$ for a spectrum of $S_2$ values, and the results are presented in Table I for a one-stage process. These results are then used to construct the corresponding table for the two-stage process by the solution of the second maximum problem. The results of these calculations are then used to form the single entry for the three-stage process.

Table I. The Dynamic Programming Tables

Three Stages
S_4      f_3      l_3      S_3
1.0      1.28     7        0.94

Two Stages
S_3      f_2      l_2      S_2
1.0      3.15     3        0.88
0.9      2.30     3        0.79
0.8      1.63     2        0.60
0.6      0.30     2        0.45
0.4      0.00     0        0.00

One Stage
S_2      f_1      l_1
1.0      7.38     2
0.8      5.50     2
0.6      3.63     2
0.4      2.00     1
0.2      0.50     1
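The recursion of Equations 8 through 8b can also be evaluated directly by machine. The Python sketch below is one possible arrangement, not the tabular procedure used to prepare Table I: it follows the same functional equation but recomputes the downstream return exactly for each candidate batch number instead of interpolating in a table of $S$ values. The function names, the upper limit of 10 batches per stage, and the stage reliabilities 3/4, 1/2, and 1/3 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def redundant_reliability(r: float, batches: int) -> float:
    """R' = 1 - (1 - R)^l; zero batches gives zero reliability."""
    return 1.0 - (1.0 - r) ** batches


def optimal_return(
    stage: int,
    s_upstream: float,
    reliability: Dict[int, float],
    unit_cost: Dict[int, float],
    profit: float,
    max_batches: int = 10,  # assumed search limit, not from the article
) -> Tuple[float, List[int]]:
    """Return f_stage(s_upstream) and the batch numbers [l_stage, ..., l_1]."""
    if stage == 0:
        # No stages remain: the return is the profit weighted by the
        # probability that every upstream stage has worked.
        return profit * s_upstream, []
    best_value = float("-inf")
    best_plan: List[int] = []
    for batches in range(max_batches + 1):
        s_out = s_upstream * redundant_reliability(reliability[stage], batches)
        downstream, plan = optimal_return(
            stage - 1, s_out, reliability, unit_cost, profit, max_batches
        )
        value = downstream - unit_cost[stage] * batches
        if value > best_value:
            best_value, best_plan = value, [batches] + plan
    return best_value, best_plan


# Example data keyed by stage number (values assumed from the example).
RELIABILITY = {1: 0.75, 2: 0.5, 3: 1.0 / 3.0}
UNIT_COST = {1: 1.0, 2: 1.0, 3: 0.2}  # C_i + O_i

if __name__ == "__main__":
    value, plan = optimal_return(3, 1.0, RELIABILITY, UNIT_COST, 10.0)
    print("batches for stages 3, 2, 1:", plan)
    print("expected profit:", round(value, 3))
```

For these inputs the sketch selects seven, three, and two batches for stages 3, 2, and 1, with an expected profit of about 1.32 units. The design agrees with the one read from Table I; the small difference from the tabulated 1.28 presumably reflects the coarse grid of $S$ values used in the hand calculation.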
process fails. In parallel redundant design, all of the processes operate. The stand-by redundant design is generally economically more desirable than parallel redundant design.

Consider the effect of stand-by redundancies on the reliability of a system. A process with stand-by redundancies may be thought of as a single process where $l - 1$ failures are allowed before the operation of the process is interrupted, where $l - 1$ is the number of stand-by processes. The reliability of this system is the same as for parallel design, but the cost of the redundancies is less since the stand-by reactors operate only in case of failures. The reliability of the redundant process is $R' = 1 - (1 - R)^l$ since, with the stand-by reactors, the process fails only when all $l$ processes fail (the one main process plus $l - 1$ stand-by processes).

To determine the expected cost of operation and construction of this stand-by system, the expected number of stand-by systems that operate must be determined. The cost of construction and operation of this process is $Cl + OE$, where $C$ and $O$ are the unit costs defined before and $E$ is the expected number of processes that operate.

The expected number of processes that operate, $E$, can be determined from the reliability $R$ of the individual processes. If no failures occur, only the main process will operate. The probability that the main process will not fail is $R$. If the main process fails, a stand-by process will take over. If this stand-by process does not fail, then two processes will be used (the main process plus the first stand-by process). $1 - R$ is the probability that the main process fails and $R$ is the probability that the first stand-by does not fail. Thus the probability that two processes will be used is $(1 - R)R$. $(1 - R)^2$ is the probability of failure of the main process and the first stand-by and, hence, is the probability that the second stand-by will operate. The probability that only three processes operate (main process plus first two stand-by processes) is $(1 - R)^2 R$. By the same analysis, the probability that only $k$ processes will operate is $(1 - R)^{k-1} R$. The expected number of processes in operation is

$$E = \sum_{k=1}^{l-1} k\,(1 - R)^{k-1} R + l\,(1 - R)^{l-1}$$
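The expected number of processes that operate in a stand-by design can be tabulated with the following Python sketch. It assumes, as in the argument above, that exactly $k$ processes operate with probability $(1 - R)^{k-1} R$ for $k < l$, and that the last of the $l$ processes is started whenever the first $l - 1$ have failed, whether or not it then succeeds. The function names are hypothetical. The geometric-series identity $E = [1 - (1 - R)^l] / R$ is included as a cross-check.

```python
def expected_processes_operating(r: float, total: int) -> float:
    """Expected number of processes started among one main unit and total - 1 stand-bys."""
    expected = 0.0
    # Exactly k processes operate when the first k - 1 fail and the k-th succeeds.
    for k in range(1, total):
        expected += k * (1.0 - r) ** (k - 1) * r
    # All `total` processes operate when the first total - 1 have failed.
    expected += total * (1.0 - r) ** (total - 1)
    return expected


def expected_processes_closed_form(r: float, total: int) -> float:
    """Equivalent geometric-series form, [1 - (1 - R)^l] / R."""
    return (1.0 - (1.0 - r) ** total) / r


if __name__ == "__main__":
    for total in (1, 2, 3, 5):
        print(
            total,
            expected_processes_operating(0.5, total),
            expected_processes_closed_form(0.5, total),
        )
```

For $R = 1/2$ and $l = 3$ the expected number is 1.75, against three processes operated in the parallel design, which is why the operating cost of the stand-by design is lower at the same reliability.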
Starting with the three-stage process, the system profit is $f_3 = 1.28$ units with the optimal $l_3 = 7$ and $S_3 = 0.94$. Entering the table at $S_3 = 0.94$ for the two-stage process gives $l_2 = 3$ and $S_2 = 0.83$. Entering the table for $S_2 = 0.83$ gives $l_1 = 2$. Thus, the optimal parallel design consists of producing seven batches of $A_3$, three batches of $A_2$, and two batches of $A_1$ at