Ind. Eng. Chem. Res. 2004, 43, 4323-4336


Steady-State Identification, Gross Error Detection, and Data Reconciliation for Industrial Process Units

Shrikant A. Bhat and Deoki N. Saraf*

Process Control Laboratory, Department of Chemical Engineering, Indian Institute of Technology, Kanpur 208016, India

Processing of online data for use with steady-state models requires identification of the existence of a steady state in a process, detection of the presence of gross errors, if any, and data reconciliation to eliminate random measurement noise. The method of Cao and Rhinehart (J. Process Control 1995, 5 (6), 363-374) was modified for steady-state identification by optimizing the filter constants so as to minimize type I and II errors and simultaneously reduce the delay in state detection. A simple algorithm has been presented that makes use of past measurements and the Kalman filter to detect the presence of gross error and estimate its magnitude. This algorithm simultaneously reconciles data for random measurement errors. Another algorithm that makes use of the exponential filter along with the least-squares minimization strategy is also applied for data reconciliation and found to perform equally well. All of these algorithms have been applied to data from an industrial crude distillation unit. The presence of strong autocorrelation in the industrial data was dealt with by adding random noise. This strategy makes the steady-state identification algorithm more suitable for online application.

* To whom correspondence should be addressed. Fax: 91512-2590104. E-mail: [email protected].

Introduction

Ever-increasing competition coupled with stringent environmental and safety norms has led the process industry to large-scale automation. This, of course, requires accurate monitoring of the plant operation, which is achieved through measurement of various parameters such as temperatures, pressures, flows, etc. Present-day computers and related hardware allow hundreds of measurements to be made simultaneously every few seconds. These measurements serve as a basis for each decision that follows. The measurements, on the other hand, are often corrupted with random noise and sometimes with systematic errors. In the presence of these errors, the small changes taking place in the process may get masked, making it difficult to improve upon the performance. Therefore, data processing is an important aspect of process automation and also a potential problem that needs to be addressed before optimization is attempted. Because of the random nature of the measurements, statistical methods provide efficient tools and are being used to effectively solve this problem.

There are three important aspects of data processing: the first is steady-state identification (SSI), which involves ascertaining if the plant is operating under steady-state conditions; the second is gross error detection (GED), which helps in eliminating systematic errors; the third is data reconciliation (DR), which involves making the data free of random errors. Steady-state models are often used for online process monitoring, product property prediction, online optimization, etc. Online data are used to tune these models from time to time to be able to represent a dynamic system using a static model. The methodology used for DR also depends on the existence or otherwise of a steady state. Therefore, SSI is the starting step in all of these applications. To proceed with these techniques

without ascertaining a steady state may lead to grossly misleading results for the process under consideration. Once the existence of a steady state is ensured, the next step that follows is GED. Gross errors arise from instrument failure, measurement bias, the presence of leaks, etc. Although less common, if undetected, these lead to distribution of the error among all of the measurements during the reconciliation exercise, thus totally distorting the true picture. The objective is not only to identify the presence of a gross error but also to estimate its magnitude so that a correction can be applied and further data processing carried out until the instrument is recalibrated and the fault is rectified. The next step that follows is DR. DR aims at eliminating random noise associated with the measurements so that they satisfy process constraints. These constraints are usually in the form of material and energy balances. The error-free measurements are then subjected to further processing.

The crude distillation unit (CDU) is basic to any petroleum refinery: the entire crude oil must be processed through it into several products that either undergo further processing in subsequent units or are directly blended into marketable products. Optimal operation implies that an economic objective function is optimized subject to the fractionation products meeting the desired quality standards and operation being carried out within safe limits. Online optimization requires a nonlinear analytical model, which is usually of the steady-state type. To make it applicable under dynamic operating conditions, it is necessary to tune the model parameters (usually stage efficiencies) frequently using online operating data. Also, in the absence of hardware sensors, soft sensors are used to estimate product properties (quality) from routinely made secondary measurements. Clearly, it is important that only those data that represent steady-state operation and are free from any



random measurement noise and/or gross errors are used as model inputs.

Existing Approaches

There are various techniques available for SSI. Most of these are based on selecting a data window, computing either the average, variance, or regression slope on the data window, and comparing it with that on the previous one using appropriate statistical tests.1-3 A strategy based on moving-average charts common to statistical process control4 involves the use of the moving average and upper and lower 3 standard deviation limits to detect unsteady state. Still another method5 in this category is based on the ratio of variances calculated by two different methods in the same data window. During steady state, the ratio of variances will be near unity, while during unsteady state, the ratio would be much larger than unity. In all of these methods, we need to select a data window and perform computations over the set of data. The longer the window, the more delayed the detection of changes in state, while a shorter window tends to cause an adverse effect on the test statistics because of the noise associated with the measurements. Data storage and processing also impose a computational burden. Narsimhan and co-workers suggested a composite statistical test6 and a mathematical theory of evidence7 for SSI. These methods need the specific assumption that a time period be selected within which the process variables are assumed to be at steady state, but in actual practice it is difficult to choose successive time periods satisfying this assumption. Li et al.8 reported a dynamic filter based approach to identify the steady state and applied it to a CDU. However, they did not provide the details of their method. Cao and Rhinehart9 discussed modification of the primitive F-test type of statistic,5 so that the data are treated sequentially for SSI without the need to select a time window. This strategy makes the method more efficient for online application to industrial processes. The modification is done by incorporating an exponentially weighted moving-average filter to calculate the average and variance by two different methods. The main aim of this study is to extend the method of Cao and Rhinehart so that the onset of an unsteady state as well as the return to a steady state is detected with minimum delay while at the same time keeping type I and II errors within acceptable limits.

Both GED and DR are issues addressed simultaneously in the chemical engineering literature. DR, with the data containing only random errors and subjected to balance constraints, by means of a constrained least-squares minimization procedure, was first introduced by Kuehn and Davidson.10 However, the presence of gross errors in the measurements can corrupt the least-squares procedure, giving undesirable results. So, it was necessary to eliminate gross errors before going for DR. Subsequently, GED methods were developed based on various statistical tests. Comprehensive reviews of these methods have been provided by Crowe11 and Yang et al.12 Narsimhan and Mah13 suggested a generalized likelihood ratio (GLR) method, which is able to distinguish between measurement bias and process leaks, which the earlier methods were not able to achieve. This method applies a serial compensation strategy to identify multiple gross errors and is very useful for industrial applications because it is able to

estimate the magnitude of the gross error. The above methods can only identify the presence of gross errors, and DR must be carried out subsequently. Biegler and Tjoa14 presented a simultaneous strategy for DR and gross error identification that gives unbiased estimates in the presence of gross errors, and a GED test can also be simultaneously constructed. This technique was found to efficiently handle nonlinear constraints. Tong and Crowe15 proposed a method based on principal component analysis that the authors claimed could detect subtle gross errors where other methods would fail. However, none of the above methods were found to be applicable to the CDU under consideration. A close look revealed that spatial redundancy of data, which is germane to all of the above techniques, is not available in industrial multistage processes with multiple input and multiple output streams such as distillation columns, extraction columns, etc. Another limitation of the above methods, with the exception of the GLR method, is that they do not provide an estimate of the magnitude of the gross error, without which erroneous data cannot be corrected. This means that further use of such data must be suspended for as long as the fault persists, which is undesirable in industrial practice, particularly if these data are used to monitor the product quality online. In the present work, we apply an approach similar to that applied to the quasi steady state using the Kalman filter by Stanley and Mah16 and couple it with a simple comparison strategy, making use of the previously reconciled data. This method simultaneously detects gross errors and reconciles data. An estimate of the error is available for compensation until the fault is rectified.

SSI

Theoretical Development. Cao and Rhinehart9 discussed the modification of the F-test type statistic.5 This method involves calculation of the filtered value of the online measurements and the filtered value of the variance by two different approaches: one using the filtered mean-square deviation from the previously filtered value and the other using the filtered mean-square difference of the successive data. Conventionally, the variance is estimated as

\bar{\sigma}^2 = \frac{1}{N-1} \sum_{i=1}^{N} (Z_i - \bar{Z}_N)^2    (1)

where \bar{Z}_N is the average over N measurements. The average, \bar{Z}_N, can also be estimated through the exponentially weighted moving average, Z_{f,i}, of the process variable, Z_i, as follows:

Z_{f,i} = L_1 Z_i + (1 - L_1) Z_{f,i-1}    (2)

The filtered square deviation from the previously filtered value is given by

v_{f,i}^2 = L_2 (Z_i - Z_{f,i-1})^2 + (1 - L_2) v_{f,i-1}^2    (3)

The unbiased estimate of the variance from this first approach is given by

s_{i,1}^2 = \frac{2 - L_1}{2} v_{f,i}^2    (4)


Figure 1. Probability density function (pdf) of R for equal and unequal sets of L2 and L3.

The filtered square difference of successive data is given by

d_{f,i}^2 = L_3 (Z_i - Z_{i-1})^2 + (1 - L_3) d_{f,i-1}^2    (5)

The second unbiased estimate of the variance would be

s_{i,2}^2 = d_{f,i}^2 / 2    (6)

The R statistic is defined as the ratio of the two variance estimates as given by eqs 4 and 6.

R = \frac{(2 - L_1) v_{f,i}^2}{d_{f,i}^2}    (7)
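The recursion in eqs 2-7 is light enough to evaluate at every sampling instant. The following Python sketch implements it; the class name, the initialization of the filtered quantities from the first sample, and the illustrative values of Rcrit are assumptions for demonstration rather than part of the published algorithm (the filter constants shown are the Table 3 optimum for an average of two readings).

```python
import numpy as np

class RStatistic:
    """Sequential steady-state test of Cao and Rhinehart (eqs 2-7)."""

    def __init__(self, L1, L2, L3, z0):
        self.L1, self.L2, self.L3 = L1, L2, L3
        self.z_f = z0        # filtered measurement, eq 2
        self.v2_f = 0.0      # filtered squared deviation, eq 3
        self.d2_f = 1e-12    # filtered squared difference, eq 5 (avoid divide by zero)
        self.z_prev = z0

    def update(self, z):
        # eq 3: squared deviation from the *previous* filtered value
        self.v2_f = self.L2 * (z - self.z_f) ** 2 + (1 - self.L2) * self.v2_f
        # eq 2: exponentially weighted moving average of the measurement
        self.z_f = self.L1 * z + (1 - self.L1) * self.z_f
        # eq 5: squared difference of successive measurements
        self.d2_f = self.L3 * (z - self.z_prev) ** 2 + (1 - self.L3) * self.d2_f
        self.z_prev = z
        # eq 7: ratio of the two variance estimates (eqs 4 and 6)
        return (2 - self.L1) * self.v2_f / self.d2_f

# usage: flag an unsteady state whenever R exceeds the critical value
rng = np.random.default_rng(0)
rstat = RStatistic(L1=0.06, L2=0.35, L3=0.05, z0=100.0)   # constants from Table 3
R_crit = 2.0                                              # illustrative value only
for k in range(1, 400):
    z = 100.0 + 0.25 * k + rng.normal(0.0, np.sqrt(0.5))  # ramping measurement
    steady = rstat.update(z) <= R_crit
```

Note that eq 3 must be evaluated before eq 2 at each step, because it uses the filtered value from the previous instant.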

The R value is compared with the critical value of the R statistic. The filter constants and Rcrit should be chosen so as to have both type I and II errors and the delay in detection within allowable limits. These authors recommended only the upper critical value to be used for practical purposes because R values below the lower critical limit can only occur if the measurements fluctuate between high and low extremes, and in the case of chemical process plants, such a situation seldom occurs. If the R value is greater than Rcrit, the process is considered to be not at steady state at an associated level of significance. In this scheme, L2 and L3, which are the filter constants for the two variances, are proposed to be selected as equal. This weighs the two variances equally in the calculation of the R statistic. Now if we weigh the two variances unequally by choosing L2 greater than L3, it is observed that we are able to reduce type I and II errors as well as the delay in detection. Figure 1 shows the probability density function of R for equal and unequal L2 and L3 with the same L1. As seen in this figure, unequal L2 and L3 only increase the spread of the R statistic both below and above R = 1.

Having selected a given measurement for SSI, two important things of concern are the variance associated with the measurement and the type of unsteady-state effect. The most likely types of unsteady states encountered are either step changes, impulses, or ramp changes of various slopes. In general, unsteady state is triggered by a step change in one of the measurements followed by ramp changes of variable slopes in other measurements. For most process plants, the measurements are assumed to have normally distributed random noise with zero mean and a definite variance. The variance associated with any measurement depends on the measurement type (flow, temperature, etc.), the sensor dynamics, and the associated plant conditions. The

variance may keep on changing with changing plant conditions. In general, the plant conditions do not change drastically unless some serious fault occurs, which may eventually result in a system shutdown. So, the drift in the variance, if any, under normal operating conditions can always be expected not to be drastic but to occur at a slower pace, and for simulation purposes, the variance is assumed to be constant over a period of time. To account for slow drift, the variance can be updated from time to time. A reasonable estimate of the variance is obtained through the previous data history. In the same way, knowledge of the minimum slope of the ramp change that should be detectable can also be obtained a priori. In fact, changes occurring to extents smaller than this may be regarded as quasi steady state, and the detection of such changes should be avoided.

The important factors that gauge the effectiveness of the steady-state identifier are the extent of type I and II errors and the delay in the steady-state as well as unsteady-state detection. Type I errors are contained by selecting a suitable Rcrit for a given set of filter constants. The extent of type II errors as tested9 for a ramp change involves simulation trials for a measurement continuously increasing with the given slope. This gives an estimate of the probability of occurrence of type II errors. One aspect of this analysis is that if we want to study the change in the temperature from 100 °C with slope 0.25/interval, then testing this on 100 000 time intervals leads to testing the temperature range 100-25 100 °C. Now instead of performing this trial for a constant slope for 100 000 steps, if we repeat the trial of a change from 100 to 200 °C (which is more likely to occur) 100 000 times, then we get a different picture. The difference lies in the fact that, in this case, the type II errors have a significant contribution from the delay in detection of the unsteady state in addition to errors due to false indication of the steady state after the unsteady state is detected. The above repetition exercise will show type II errors equivalent to the unsteady-state delay in detection even if there is no false indication of the steady state. If we can choose the filter constants in such a way that most of the type II errors are due to a delay in detection of the unsteady state (within the acceptable range), then the set of filter constants for which the steady-state detection occurs earlier should be selected. In such a case, an acceptable type II error criterion should be chosen and the filter constants manipulated in such a manner that any further reduction in the delay in detection would result in a contribution of false indication of the steady state toward type II errors.

While choosing the filter constants, besides type I and II errors, the third thing that comes into the picture is the delay in detection while moving from the unsteady to the steady state. If for the case mentioned above we analyze the delay in detection of the return to the steady state for the change in temperature from 100 to 200 °C over various trials, we find the distribution to be normal, as shown in Figure 2. This figure shows the steady-state detection profile based on 10 000 trials. Each trial consisted of 4000 steps (15 s interval), with the first 400 representing a constant value of 100 °C with an associated variance of 0.5 and an autocorrelation between successive data (generated through a first-order autoregressive process) of 0.05.


Figure 2. Probability distribution for unsteady- to steady-state detection.

Then there are 1600 steps generated with a slope of 0.0625 °C/step, with the additional noise corresponding to variance 0.5. From step 2000 to 4000, we have data representing a constant value of 200 °C, with the variance of the associated noise at 0.5 and autocorrelation 0.05. Each simulation is based on the average of two data in succession. Thus, the first steady state is from the 1st to the 200th step. The unsteady state is from the 201st to the 1000th step, after which the next steady state follows until the 2000th step. The X axis in Figure 2 represents the time at which the second steady state was detected, and the Y axis represents the probability. Figure 2 shows two different distributions for two sets of filter constants. Clearly, the mean and the variance of the distribution depend on the choice of the filter constants. So, to get the best set of these constants in terms of early detection, we compare the mean and the spread of the various distributions so generated. A narrow spread is preferred over a wide spread if the means are equal; otherwise, the set of constants that results in a larger average delay is rejected. The spread of different distributions is compared by counting the number of trials for which the delay in detection lies to the right of the mean value plus a constant (say 5). In Figure 2, the set of filter constants corresponding to distribution a is superior to that for distribution b because the former has a lower mean than the latter. If both distributions a and b had the same mean value, the area under the curve between the mean value plus 5 and infinity would be compared for both cases. The one with the smaller area is taken to be better than the other. This defines the quantitative aspects of the unsteady- to steady-state delay in detection.

The last criterion in the selection of the filter constants is the delay in detection of the onset of unsteady state. It is desirable that the onset of unsteady state is detected in a time interval equivalent to a change in the true value of the signal by 2-3 standard deviations most of the time. This is achieved by fixing the allowance of delay in the detection of the unsteady state in type II errors between the 2 and 3 standard deviation limits and suitably selecting the filter constants in such a way that type II errors are only due to an unsteady-state delay in detection. For a given set of filter constants, the proper selection of Rcrit would fix type I errors at the desired level. In the present study, the type II error limit was fixed by allowing 15 out of 1000 trials to cross the limit of 2.5 standard deviations (detection interval). This is the measure of type II errors being allowed. The objective then is to select that set of filter constants for which the unsteady- to steady-state delay in detection is at a minimum while at the same time meeting both type I and II error criteria.

Optimization of the Filter Constants. It is necessary to get the best set of filter constants that would

meet the desired criteria as discussed above. In this regard, there is a need to closely look at the significance and relative importance of the three filter constants. L1 is used to filter the raw data. The smaller the value of L1, the heavier the filtering, leading to a larger delay in detection of the steady state but an earlier detection of the unsteady state and fewer type II errors. It is, therefore, desirable to increase L1 so long as the type II error criterion is satisfied. Both L2 and L3 are filter constants used to calculate the variance by two different approaches; L2 is for the variance based on the filtered square deviation from the previous filtered value, while L3 is for the one based on the filtered square difference of successive data. The former appears in the numerator and the latter in the denominator of the R statistic equation. A larger value of L2 and/or a smaller value of L3 would cause the R value to cross the critical limit at a faster pace during the transition from steady to unsteady state and vice versa. A higher value of this ratio would permit earlier detection of both the steady and unsteady states but at the same time causes increased type II errors. It is, therefore, necessary to balance the merits of both L1 and the L2/L3 ratio to achieve the ultimate goal of earlier detection at an allowable error level.

To get the best set of filter constants for a given measurement, we define the variance, the minimum slope of the ramp to be detected, the allowance of the delay in detection of the unsteady state, and the range of change with respect to that measurement. We fix L3 at some minimum value, say 0.01, and then find a pair of L1 and L2 at which type II errors as well as the delay in detection of the unsteady state are within acceptable limits. While finding the L1 and L2 pair, we first fix L2 and increase L1 to settle at the value at which type II errors are within the specified limit. At this point we get a certain delay in detection. We now increment L2 and again go on increasing L1 from a small starting value (generally 0.02) as long as the type II error criterion is satisfied. Then this exercise is repeated with yet another increased value of L2. From these results, for a given value of L3, we choose an L1 and L2 pair at which the detection time is at a minimum (local minimum). This gives us the best estimate corresponding to a value of L3. This entire exercise is repeated for the next incremented value of L3, and again the optimal set of L1 and L2 corresponding to this L3 is calculated. The L3 value is incremented until we get a set of L1, L2, and L3 at which the time lag is at a global minimum. To start with, the increments in the values of L1, L2, and L3 are kept at 0.02, 0.05, and 0.01, with the starting values at 0.02, 0.05, and 0.01, respectively. It may be pointed out here that, in the absence of any known analytical or empirical relationship between the filter constants and the two types of errors and delay times, it is not possible to use standard optimization techniques to find the optimal values of L1, L2, and L3. Optimization through enumeration following a one-at-a-time approach has, therefore, been adopted; a sketch of this enumeration is given below. However, once the significance and relative importance of each of these filter constants is understood (as discussed earlier), not all possible combinations of these parameters need to be examined, thus simplifying the search to a manageable level.
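The enumeration amounts to three nested loops over L3, L2, and L1 with a Monte Carlo evaluation inside; the stepwise procedure given later in the text is the authoritative description. The sketch below is a simplified stand-in for that evaluation, assuming a fixed Rcrit, a small number of trials, a trial layout of 200 steady, 400 ramp, and 400 steady steps, and a per-step count of type II errors; the real procedure selects Rcrit for the desired type I error and uses the trial structure and error limits described in the text.

```python
import numpy as np

def evaluate(L1, L2, L3, R_crit=3.0, trials=50, slope=0.25, var=0.5, seed=0):
    """Monte Carlo estimate of (type II error count during the ramp, mean
    delay in detecting the return to steady state) for one candidate set of
    filter constants.  All trial settings here are illustrative."""
    rng = np.random.default_rng(seed)
    n = 1000  # 200 steady + 400 ramp + 400 steady steps
    true = np.concatenate([np.full(200, 100.0),
                           100.0 + slope * np.arange(1, 401),
                           np.full(400, 100.0 + 400 * slope)])
    z = true + rng.normal(0.0, np.sqrt(var), (trials, n))
    z_f, v2 = z[:, 0].copy(), np.zeros(trials)
    d2, z_prev = np.full(trials, 1e-12), z[:, 0].copy()
    type2, back = 0, np.full(trials, n)
    for k in range(1, n):
        zk = z[:, k]
        v2 = L2 * (zk - z_f) ** 2 + (1 - L2) * v2          # eq 3
        z_f = L1 * zk + (1 - L1) * z_f                      # eq 2
        d2 = L3 * (zk - z_prev) ** 2 + (1 - L3) * d2        # eq 5
        z_prev = zk
        steady = (2 - L1) * v2 / d2 <= R_crit               # eq 7
        if 220 <= k < 600:      # steady flag during the ramp (20-step allowance assumed)
            type2 += int(steady.sum())
        elif k >= 600:          # first steady flag after the ramp ends
            back = np.where(steady & (back == n), k, back)
    return type2, float(np.mean(back - 600))

# one-at-a-time search: for each L3 and L2, push L1 up until the type II limit
# is crossed, and keep the feasible set with the smallest detection delay
best, TYPE2_LIMIT = None, 15           # illustrative limit; grid kept small
for L3 in np.arange(0.01, 0.14, 0.01):
    for L2 in np.arange(0.05, 0.55, 0.05):
        for L1 in np.arange(0.02, 0.30, 0.02):
            n2, delay = evaluate(L1, L2, L3)
            if n2 > TYPE2_LIMIT:
                break                  # larger L1 only increases type II errors
            if best is None or delay < best[0]:
                best = (delay, L1, L2, L3)
print("best (delay, L1, L2, L3):", best)
```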
Once we get an approximate optimum set of filter constants, the search can be further refined to find values closer to the optimum by reducing the steps of the increments. In the present


study, the number of trials was restricted to 1000 because the results with 100 000 trials were not significantly different from those reported here. The steps through which the optimum is obtained are as follows:
(1) Select a value of L3 (say 0.01).
(2) Select a value of L2 (say 0.05).
(3) For these values of L2 and L3, start with a low value of L1 (say 0.02) and calculate Rcrit, type II errors, and the delay in detection of the steady state.
(4) Increment L1 by 0.02, and repeat the calculations of step 3 until the allowable type II error limit is crossed.
(5) Increment L2 by 0.05, and then return to step 3. Go on incrementing L2 until the local minimum in terms of an earlier detection of the steady state is obtained corresponding to the given value of L3.
(6) Increment L3 by 0.01, and return to step 2 until a global minimum is obtained for a given number of data points being averaged.

If we wish to choose a different time interval for averaging, then all of these steps must be carried out afresh. If we increase the number of data points used for averaging, the number of steps equivalent to the unsteady-state "detection interval" decreases. For example, for the temperature changing from 100 to 200 °C, linearly with a slope of 0.25 °C/min, if we average two successive measurements made at 15 s intervals, the allowable step limit would be 2.5σ/(ramp per step) = 1.7675/0.125 = 14 steps because 0.125/step is the change associated with this averaging. Similarly, for an average of four successive measurements, the allowable step limit would be set at 1.7675/0.25 = 7 steps. If we wish to keep both L2 and L3 at the same value,9 it is still possible to choose the filter constants optimally following the above procedure. In this approach, select L2 and L3 at some starting value and then increase L1 until the type II error criteria are struck; then repeat the same exercise for the next incremented value of L2 and L3. In this analysis, it was found that the delay in detection achievable through the optimum in the former approach (L2 ≠ L3) is smaller than that in the latter one (L2 = L3).

Selection of the Measurements for SSI. As pointed out earlier, for any process unit, it is difficult to get a single measurement that would reflect all sorts of unsteady-state effects. Therefore, it is necessary to select more than one measurement as unsteady-state markers. If any one of these measurements shows transient behavior, then an unsteady state is inferred for that unit. However, if this measurement is corrupted with gross error, it would cause false unsteady-state detection. To overcome this difficulty, it is necessary that the existence of an unsteady state is concluded only if at least one more variable shows a transient response. The second variable should preferably be one for which the sensor is located in physical proximity to the first. The chances that both of these measurements will have gross error are small. In the case of the CDU (see Appendix A for a description of the CDU), it is possible to have local disturbances (unsteady-state conditions) that may not be reflected in other parts of the unit. To overcome this problem, the temperatures on 10 different stages were selected for SSI for that column. During any disturbance, it is likely that any one temperature along with one or more adjacent temperatures would show the unsteady state. So, if we get such a combination violating SSI criteria,

unsteady state would be confirmed. Return to the steady state after an unsteady state is confirmed when there are no two adjacent temperature measurements showing an unsteady state. Because these stages span the entire length of the main column, it is expected that an occurrence of a transient in any part of the column will not go undetected. For the CDU, the flow measurements pose a problem as steady-state markers. This is because the incoming and outgoing flows are measured at only a few specific locations. Moreover, several of these flows are fixed by pumping requirements and hence are not free to change with the unsteady state.

GED and DR

Once the existence of the steady state is ascertained, it is necessary to further process the data to remove the associated gross and random errors. Almost all of the techniques in the GED and DR literature are based on spatial redundancy of measurements. In the case of the CDU, spatial redundancy is available only with respect to the incoming feed and the six outgoing product flows. The circulating reflux flows as well as all of the temperatures do not form a part of the spatially redundant system. Nodal aggregation17 to eliminate all of the unmeasured flows results in a single-node system with all of the seven streams connected to it. Hence, it is difficult to apply conventional DR and GED techniques to this system to reconcile the data and to identify, detect, and estimate the magnitude of gross error associated with the measurements. Under these circumstances, the temporal redundancy available with these data is exploited with advantage.

DR. To start with, for the CDU under consideration, we consider 12 flow measurements only. We start with a steady-state condition, and it is assumed that all of the measurements available initially contain only random errors. We consider two strategies to carry out DR for this system. The first strategy is based on the application of the Kalman filter, while the second strategy involves application of the least-squares minimization technique. These two strategies are described below.

(a) Application of the Kalman Filter. The 12 flow measurements of the CDU under consideration are put through the Kalman filter as suggested by Stanley and Mah.16 Following the analysis given in this reference, the Kalman filter equations are formulated for the 11 state variables and the 12 measurements as follows. The measurement vector y is related to the state vector x through the equation

y_k = H_k x_k + v_k    (8)

where H_k is the measurement matrix and v_k is the measurement noise vector at time t_k. The measurements are used to update the prior estimates of x_k and of the error covariance matrix P_k of x_k

\hat{x}_k(+) = \hat{x}_k(-) + K_k [y_k - H_k \hat{x}_k(-)]    (9)

where

K_k = P_k(-) H_k^T [H_k P_k(-) H_k^T + M_k]^{-1}    (10)

P_k(+) = (I - K_k H_k) P_k(-)    (11)

and M is the measurement noise variance-covariance matrix.


Next these estimates are extrapolated to time t_{k+1} based on the system model postulated. Considering the process to be quasi steady state, although the measurements are going to change over a period of time, the prediction for the immediate future is obtained based on the following model:

x_{k+1} = x_k + w    (12)

where w is the process noise vector with mean zero and a covariance matrix Q. The process noise and the measurement noise are assumed to be uncorrelated. The prior estimates of both x and P are assumed to be available from the initial set of data. The initial expectations are given as

E(x_0) = \hat{x}_0    (13)

E[(x - \hat{x}_0)(x - \hat{x}_0)^T] = P_0 = P_0(-)    (14)

With this model, the prediction equations are

\hat{x}_{k+1}(-) = \hat{x}_k(+)    (15)

P_{k+1}(-) = P_k(+) + Q_k    (16)

In this case, the process noise covariance matrix is treated as a filter design parameter and is chosen to be diagonal for ease of computation and simplicity. The measurement noise is also assumed to be uncorrelated, and M is taken to be diagonal. To start with, it is assumed that the measurements are free from gross errors so that averaging 25-30 initial measurements gives an initial estimate of x. The variance calculated on these measurements is used to get initial estimates of M and P. Using an initial estimate of P, the gain matrix for the Kalman filter, K, is evaluated using eq 10. Then the next set of measurements is used to update the estimate of x using eq 9, while the estimate of P is updated using the gain matrix by eq 11. Updated values of both x and P are used in eqs 15 and 16, respectively, to get predicted values corresponding to the given set of measurements. It is necessary to adjust the M/Q ratio to allow for the drift in the measurements occurring during the steady-state regime. The maximum extent of the drift is on the order of 0.05-0.1%/min. The effect of the M/Q ratio in accurately estimating flow changes of this magnitude should be tested by observing the average error associated with the reconciled measurements for different values of the M/Q ratio.
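A compact numpy sketch of the recursion in eqs 8-16 follows. The three-stream node, the noise levels, and the initialization from the first 30 samples are illustrative assumptions standing in for the 12-measurement, 11-state CDU formulation; only the filter equations themselves follow the text.

```python
import numpy as np

def kalman_dr(Y, H, M, Q, x0, P0):
    """Quasi-steady-state Kalman filter for data reconciliation (eqs 8-16).
    Y : (T, m) raw measurements; H : (m, n) measurement matrix;
    M, Q : measurement- and process-noise covariances (diagonal here).
    Returns the (T, m) reconciled estimates H x_hat."""
    x, P = x0.copy(), P0.copy()
    recon = np.empty_like(Y)
    I = np.eye(len(x))
    for k, y in enumerate(Y):
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + M)   # eq 10
        x = x + K @ (y - H @ x)                        # eq 9
        P = (I - K @ H) @ P                            # eq 11
        recon[k] = H @ x
        P = P + Q       # eqs 15 and 16: x carries over, P grows by Q
    return recon

# illustrative node (feed = A + B); the states are the two product flows
H = np.array([[1.0, 0.0],    # product A is measured
              [0.0, 1.0],    # product B is measured
              [1.0, 1.0]])   # feed is measured
rng = np.random.default_rng(1)
Y = np.array([40.0, 60.0]) @ H.T + rng.normal(0.0, 1.0, (500, 3))
M = np.eye(3)                 # measurement noise variances
Q = np.eye(2)                 # chosen so that M/Q = 1, the best ratio in Table 8
x0 = Y[:30, :2].mean(axis=0)  # initial state from the first 30 product readings
recon = kalman_dr(Y, H, M, Q, x0, P0=np.eye(2))
```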

(b) Application of the Least-Squares Minimization Technique. For this case also, we start with true sets of measurements, as was the case with the previous strategy, and compute the intermediate flows through nodal balances. A least-squares minimization technique10 weighted by the inverse of the variances and subject to material balance constraints is applied to the 11-node system (Figure 8 in the appendix). The formulation of this optimization problem involves minimization of the following objective function for the measurement vector y:

\min_y (y - y_m)^T M^{-1} (y - y_m)    subject to    A y = 0

where A is the incidence matrix and y_m is the vector of measurements at a given instance of time.
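For linear balance constraints this weighted least-squares problem has the standard closed-form solution \hat{y} = y_m - M A^T (A M A^T)^{-1} A y_m, obtained from the Lagrangian. The sketch below applies it to a hypothetical single-node balance rather than the 11-node network of Figure 8; the stream values are illustrative.

```python
import numpy as np

def reconcile(y_m, A, M):
    """Minimize (y - y_m)^T M^-1 (y - y_m) subject to A y = 0.
    M is the variance-covariance matrix of the measurements (the weights)."""
    lam = np.linalg.solve(A @ M @ A.T, A @ y_m)   # Lagrange multipliers
    return y_m - M @ A.T @ lam

# hypothetical node: feed F enters, products P1 and P2 leave (F - P1 - P2 = 0)
A = np.array([[1.0, -1.0, -1.0]])                 # incidence matrix
M = np.diag([2.0, 1.0, 1.0])                      # measurement variances
y_m = np.array([101.3, 39.6, 59.2])               # raw flows, not in balance
y_hat = reconcile(y_m, A, M)
print(y_hat, A @ y_hat)                           # reconciled flows close the balance
```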

To start with, all of the values of the flows are assumed to be known, and the intermediate flows in the 11-node system are calculated through nodal balances. For the next set of online measurements, the previous intermediate flows calculated through nodal balances are used as measured values and the reconciliation is carried out. For the subsequent sets, the reconciled values of the previous step are used as measured values for the intermediate flows. For the reconciliation exercise, the variances of the 12 measured flows are obtained from the steady-state measured values, and the variances for the intermediate unmeasured flows should be used as tuning parameters. For a true steady state in which no drift is occurring, very small values of these variances would give a good match between the reconciled and true flow values, but to account for slow drifts occurring within the steady-state regime, it is necessary to keep these variances at higher values. This is achieved by simulating a drift in the measurements and then checking the average errors for different sets of variances of the intermediate flows. The set with acceptable error should be chosen. To get a better picture of the trend of the drift, an exponentially weighted moving-average filter is incorporated to obtain the filtered value, Z_f, for the measurement value, Z, as follows:

Z_{f,i} = K_1 Z_i + (1 - K_1) Z_{f,i-1}    (17)

It is necessary to check which value of the filter constant would follow the drift with minimum average error.

GED. Gross errors in chemical plants are mainly due to measurement bias or instrument failure and are assumed to have a fixed contribution to the measurement value besides occasional stray readings. GED for all of these cases is applied based on a simple comparison strategy. To start with, we assume that there is no gross error in any measurement, which can be assured through calibration of the sensors at the very initial stages. At each step, it is checked whether the next measurement value is within the 6 standard deviation limit from the previous reconciled value. If it is within the limit, the measurement is sent for the further DR exercise. If the limit is violated, then a gross error is identified and the magnitude of the gross error is taken as the difference between the measured and reference value (averaged value or previous reconciled value). The reconciliation exercise is carried out with the previous reference value for this measurement. If the next measurement is more than 6 standard deviations apart from the previous measurement value as well as the previous reconciled value, a still-developing gross error is expected, and the difference between the reference value and this measurement value is considered as the magnitude of the gross error and used as the correction. If the measurement value remains within a 6 standard deviation limit of the previous measurement value and still 6 standard deviations away from the previous reconciled value, then the average of these two measurements is taken and the difference between this averaged value and the reference value is taken as the magnitude of the gross error. The averaging is continued for up to 10 sampling intervals as long as the measurements are found within a 6 standard deviation limit from the previous measurement and at the same time more than 6 standard deviations away from the previous reconciled value. Subsequently, the average value of the error is used as the correction for gross error until another gross error occurs or the source of the gross error is eliminated.
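A minimal sketch of this comparison strategy for one measurement is given below. The 6 standard deviation limit and the 10-interval averaging come from the text; the function interface, the way the reference value is supplied, and the example data are illustrative assumptions.

```python
def ged_step(z, ref, z_prev, sigma, pending, limit=6.0, max_avg=10):
    """One GED decision.  z: new measurement; ref: previous reconciled
    (reference) value; z_prev: previous raw measurement; sigma: measurement
    standard deviation; pending: recent suspect readings kept for averaging.
    Returns (estimated gross error, value passed on to DR, pending)."""
    if abs(z - ref) <= limit * sigma:
        return 0.0, z, []                 # within 6 sigma: no gross error
    if abs(z - z_prev) > limit * sigma:
        # far from both the reference and the previous measurement:
        # a still-developing gross error; correct by the full difference
        return z - ref, ref, []
    # consistent with the previous measurement but far from the reference:
    # average the suspect readings (up to max_avg) to estimate the bias
    pending = (pending + [z])[-max_avg:]
    return sum(pending) / len(pending) - ref, ref, pending

# usage: a bias of about +8 appears in the third sample of a flow signal
sigma, ref, z_prev, pending = 1.0, 100.0, 100.0, []
for z in [100.4, 99.7, 108.2, 107.9, 108.4]:
    bias, use_value, pending = ged_step(z, ref, z_prev, sigma, pending)
    corrected = z - bias          # correction applied until recalibration
    z_prev = z
```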


For this case, a leak at any point except at the exit product streams would lead to an unsteady state and hence would be taken care of during the SSI exercise. The leaks associated with the outgoing product streams would be identified as measurement bias only.

DR using spatial redundancy cannot be used for temperature measurements because empirical thermodynamic correlations relating temperature to enthalpy are only approximate. So, a simple approach based on the exponentially weighted moving-average filter with a small value of the filter constant is used to get the estimate of the reconciled temperature measurements. This would filter the noise associated with the measurements as well as catch the drift occurring during the steady-state regime; however, energy balance cannot be assured. A comparison strategy similar to that applied to the flow values is used for GED in the temperatures.

One limitation of this analysis is that, after the transition from the unsteady to the steady state, we must assume that the measurements do not have any additional gross errors because during the unsteady state we have no means to check whether a gross error has crept into any of the measurements. However, because ramp change is the most common with all of the measurements, it may be possible to track the trend in most of the measurements. The tracking is achieved by fitting a polynomial to a set of measurements. The next measurement value is predicted using this polynomial and compared with the actual measurement. If the difference is more than 6 standard deviations, a gross error is suspected in the measurement. If the difference is within the limit, then the constants of the polynomial are updated by carrying out regression using the next measurement value, and the same exercise is repeated. In this way, it may be possible to get an idea of whether any measurement is affected by gross error during the unsteady state. The instrument should be corrected for any fault before processing of the data for identification of the steady state is started.
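A sketch of this trend-tracking check for the unsteady-state period follows; the window length, the polynomial order, and the example ramp are illustrative choices not fixed by the text.

```python
import numpy as np

def track_gross_error(history, z_new, sigma, order=2, window=20, limit=6.0):
    """Fit a polynomial to the recent measurements, predict the next value,
    and suspect a gross error if the new measurement deviates from the
    prediction by more than `limit` standard deviations."""
    h = np.asarray(history[-window:], dtype=float)
    coeffs = np.polyfit(np.arange(h.size), h, order)   # regression update
    z_pred = np.polyval(coeffs, h.size)                # one-step-ahead prediction
    return abs(z_new - z_pred) > limit * sigma, z_pred

# usage during a ramp: the new reading deviates far from the fitted trend
rng = np.random.default_rng(2)
history = list(200.0 + 0.5 * np.arange(30) + rng.normal(0.0, 0.7, 30))
gross, z_pred = track_gross_error(history, z_new=230.0, sigma=0.7)
```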

Results and Discussion

SSI. For the trial case, which was used to get the steady-state detection profile with the temperature changing from 100 to 200 °C, linearly with a slope of 0.25 °C/min [and autocorrelation between successive data (every 15 s) of 0.05], the trend of an optimum search for the average of two measurements in succession is reported in Tables 1 and 2, while the comparison of the results for an average of two and four measurements for equal and unequal L2 and L3 is reported in Table 3. A total of 1000 such trials were tested, and the average of these 1000 trials is reported in these tables. In the above example, for an average of two readings the process returns to steady state at the 1001st step, but the algorithm detects it at the step given in the column titled "US to SS" in Table 1. The value in excess of 1000, therefore, represents the delay in detecting the switch from the unsteady to the steady state. The first three rows in Table 1 show that, as L1 is increased from 0.06 to 0.1, type II errors increase from 2 to 39 and US to SS detection decreases from the 1042nd step to the 1025th step for L2 = 0.1 and L3 = 0.02.

Table 1. Search for Optimum Values of L1 and L2 for a Given Value of L3 (0.02) for an Average of Two Data (rows marked * are interpolated)

L1      L2    L3    US to SS  no. of type II errors
0.06    0.10  0.02  1042      2
0.08    0.10  0.02  1032      11
0.10    0.10  0.02  1025      39
0.083*  0.10  0.02  1031      15
0.06    0.15  0.02  1031      5
0.08    0.15  0.02  1022      17
0.077*  0.15  0.02  1023      15
0.06    0.20  0.02  1025      5
0.08    0.20  0.02  1017      23
0.071*  0.20  0.02  1021      15
0.04    0.25  0.02  1039      1
0.06    0.25  0.02  1021      11
0.08    0.25  0.02  1014      38
0.063*  0.25  0.02  1020      15
0.04    0.30  0.02  1035      1
0.06    0.30  0.02  1019      16
0.059*  0.30  0.02  1020      15
0.04    0.35  0.02  1033      3
0.06    0.35  0.02  1017      21
0.053*  0.35  0.02  1022      15
0.04    0.40  0.02  1030      3
0.06    0.40  0.02  1015      33
0.048*  0.40  0.02  1024      15
0.04    0.45  0.02  1028      5
0.06    0.45  0.02  1014      56
0.044*  0.45  0.02  1026      15
0.04    0.50  0.02  1027      11
0.06    0.50  0.02  1012      81
0.041*  0.50  0.02  1026      15

Table 2. Optimum Values of L1 and L2 at Different Values of L3

L1      L2    L3    US to SS
0.0587  0.30  0.02  1020
0.0622  0.30  0.03  1018
0.0602  0.35  0.04  1017
0.0604  0.35  0.05  1017
0.0604  0.35  0.06  1017
0.0628  0.30  0.07  1019
0.0689  0.25  0.08  1019
0.0604  0.30  0.09  1020
0.0648  0.25  0.10  1020
0.0630  0.25  0.11  1021
0.0628  0.25  0.12  1021
0.0600  0.25  0.13  1022

Linear interpolation provides the value of L1 corresponding to a type II error count of 15. The corresponding US to SS detection step is also calculated. The rows marked with an asterisk in Table 1 represent the interpolated values for L1, the US to SS detection step, and type II errors. Tables similar to Table 1 are generated for different values of L3 (not shown). The results of these searches are summarized in Table 2, which shows the local optima at different values of L3. The best among them is the set with the minimum US to SS detection step. Similar results are generated using an average of four readings. These optimum values of the filter constants and the corresponding step numbers for detection from US to SS as well as SS to US are compared with those for an average of two readings in Table 3. This table also shows the same parameters if L2 and L3 are kept equal but at optimized values. As seen in Table 3, unequal values of L2 and L3 score over equal ones for both an average of two as well as an average of four readings. Table 4 presents the same set of results for autocorrelation lasting for 1 min. Considering this as a first-order autoregressive process, the autocorrelation between every 15 s reading was taken to be 0.05^{1/4} = 0.4728. It was observed that for this case a "detection interval" of 2.5 times the standard deviation was causing unduly delayed steady-state detection.

Table 3. Comparison of the Delay Times for Equal and Unequal Values of L2 and L3 for Averages of Two and Four Readings (Autocorrelation between Every 15 s of Data at 0.05)

L2 ≠ L3
avg  slope (/step)  L1     L2    L3    US to SS     SS to US   delay
2    0.125          0.06   0.35  0.05  1017 (1000)  209 (200)  8.5 min
4    0.25           0.16   0.50  0.04  505 (500)    105 (100)  5 min

L2 = L3
avg  slope (/step)  L1     L2 = L3  US to SS     SS to US   delay
2    0.125          0.052  0.20     1029 (1000)  209 (200)  14.5 min
4    0.25           0.15   0.24     513 (500)    105 (100)  13 min

(Values in parentheses are the steps at which the state actually changes; "delay" is the US to SS detection delay in minutes.)

Table 4. Comparison of the Delay Times for Equal and Unequal Values of L2 and L3 for Averages of Two and Four Readings (Autocorrelation between Every 15 s of Data at 0.4728)

L2 ≠ L3
avg  slope (/step)  L1     L2    L3    US to SS     SS to US   delay
2    0.125          0.04   0.40  0.02  1024 (1000)  212 (200)  12 min
4    0.25           0.12   0.50  0.10  509 (500)    106 (100)  9 min

L2 = L3
avg  slope (/step)  L1     L2 = L3  US to SS     SS to US   delay
2    0.125          0.032  0.12     1063 (1000)  212 (200)  31.5 min
4    0.25           0.135  0.24     513 (500)    106 (100)  13 min

(Values in parentheses are the steps at which the state actually changes; "delay" is the US to SS detection delay in minutes.)

Table 5. Performance Evaluation of Optimal Filter Constants for an Average of Four Readings

case  L1     L2    L3    autocorrelation    type II errors    Rcrit  average R
                         (15 s intervals)   (100 000 steps)          (unsteady state)
1     0.15   0.24  0.24  0.05               5                 2.83   21.19
2     0.16   0.5   0.04  0.05               13                3.84   15.62
3     0.135  0.24  0.24  0.4728             6                 3.65   26.21
4     0.12   0.5   0.1   0.4728             9                 4.71   29.13

Table 6. Performance Evaluation of Filter Constants

case  L1    L2    L3    type II errors   Rcrit  average R   SS to US   US to SS   SS to US  US to SS
                        (100 000 steps)         (unsteady)  (0.2σ/T)   (0.2σ/T)   (σ/T)     (σ/T)
1     0.1   0.1   0.1   116              1.89   5.31        112        519        104       551
2     0.08  0.16  0.16  278              2.63   8.38        111        515        104       540
3     0.06  0.45  0.07  271              3.63   12.44       110        508        103       536

(0.2σ/T and σ/T are the slopes of the ramp; T = processing time interval.)

In this case, the increased autocorrelation causes Rcrit to be higher and hence L1 to be smaller to meet the "detection interval" of 2.5 times the standard deviation, and this tends to delay the steady-state detection too much. A detection interval of 3 standard deviations works well in bringing down the delay in detection of the steady state for data with autocorrelation lasting for 1 min. Here again, unequal values of L2 and L3 result in significantly reduced lags.

From Tables 3 and 4, it is seen that averaging four readings causes a further reduction in the steady-state detection delay as compared to the cases where two readings are averaged. A higher level of autocorrelation is seen to result in an increased delay in detection of the steady state. However, for all of these cases, unequal L2 and L3 always lead to faster detection of a return to steady state. When two readings are averaged, there is a sizable difference in the delay in detection between the cases with autocorrelation lasting for 15 and 60 s for L2 = L3 (14.5 and 31.5 min, respectively). For a higher degree of autocorrelation (60 s), if only two readings are averaged, the steady-state detection gets unduly delayed because of the smaller slope of the ramp, the increased variance associated with the measurements, and only two manipulated parameters, L1 and L2 (or L3), on hand. Clearly, this is not the problem with the unequal L2 and L3 case, where the delay increases from 8.5 to 12 min with the increase in the autocorrelation between successive data.

Tables 1-4 were generated by repeating the calculations 1000 times. In another calculation, the unsteady state was allowed to grow from 100 °C at a slope of 0.25/min for 100 000 steps (minutes). The results are presented in Table 5 with four readings averaged. Cases 1 and 3 are for L2 = L3 for autocorrelation dying out (to 0.05) in 15 and 60 s, respectively. Cases 2 and 4 represent the same information for L2 ≠ L3. The type II errors are given in column 6. Columns 7 and 8 give Rcrit and the average value of R (during the unsteady state). Clearly, the R value stays well above Rcrit, causing only a few type II errors when optimum values of the filter constants are used.

Consider a case for a ramp change with a slope of 0.2σ/processing time interval.18 The values of L1, L2, and L3 are suggested at 0.1, 0.1, and 0.1 for a type I error at a 1% level of significance and a type II error at 0.1%. If for the same case we apply the optimal filter search, for both equal and unequal L2 and L3, the values turn out to be different. These results are reported in Table 6. The R profile for a single randomly generated case is shown in Figure 3. From the above results it is clear that, for nearly the same allowable type I and II errors, the optimal selection with L2 = L3 is better than the values suggested.9 For L2 ≠ L3, the performance in terms of earlier detection of the steady state is still better. For a higher value of the slope (say a slope of σ/processing time interval), the average SS to US as well as US to SS detection steps for the three cases mentioned above are also shown in Table 6. For this case, the SS to US delay in detection decreases while that for US to SS increases as compared to the minimum slope case. Here again, the L2 ≠ L3 filter constants perform much better in terms of earlier detection of states.


Figure 3. R profile for different sets of filter constants for the three cases reported in Table 6.

Table 7. Change in the Kero CR Return Temperature and Corresponding Changes in the Tray Temperatures

location        initial temp (°C)  final temp (°C)
Kero CR return  58.2               69.84
tray 2          106.8              110.7
tray 5          127.7              132.1
tray 14         160.2              164.3
tray 24         196.4              199.9
tray 28         236.1              239.4
tray 32         270.9              273.6
tray 37         322                324
tray 39         342.7              346.3
tray 44         368.3              368.3
tray 49         361.4              361.4

Test Problems. Two test cases are tried in this study. One is based on simulated data and the other on actual data from industry. We consider each of them one by one.

Case Based on Simulated Data. Pseudo-experimental data were generated using a steady-state simulator19 at two different conditions, and linear dynamics was superimposed. Consider a case in which the Kero CR return temperature is changed from the base case of 58.2 °C to 69.84 °C. All of the flow rates and the other CR return temperatures were kept constant. The temperature profiles for the two steady states (initial and final) are given in Table 7.

(a) SSI. As seen from Table 7, the temperatures at all of the locations changed because of the step change; however, the extent of change varied throughout the column. As discussed earlier, for detection of the steady state, all of the temperature changes were studied. Only if two adjacent temperatures showed an unsteady state was an unsteady state concluded; this means that for both of these temperatures R becomes greater than Rcrit. To check for the delay in steady- as well as unsteady-state detection, we generate the first 400 steps representing the initial value of the temperature, a noise variance at 0.5% of the temperature value, and autocorrelation at 0.05. This was followed by 80 steps in which the temperature value was linearly increased to the final value with the associated noise. This was further followed by 400 steps representing the final value, a noise variance of 0.5% of the temperature value, and autocorrelation at 0.05. An additional 800 steps representing a small drift of 0.05%/min were introduced in each temperature and checked against the unsteady-state detection criteria. Assuming these data to have been obtained at 15 s intervals, an average of every four steps was

carried out to get the first steady state from the 1st to the 100th step, followed by the unsteady state from the 101st to the 120th step and the second steady state from the 121st to the 220th step. There are again additional steps from the 221st to the 420th step representing the slow drift. It was observed that for both the unequal and the equal L2 and L3 cases unsteady-state detection occurred at the 108th step, when the 5th and 14th tray temperatures showed unsteady state. For this case the unsteady-state detection time (8 min) was more than the average time (5 min) because the slope at which the temperatures are changing in this case is less than the slope (0.25%/min) at which the filter constants were evaluated. The steady state is detected at the 126th step for the unequal L2 and L3 case and at the 133rd step for the equal L2 and L3 case. For the drift generated at 0.05%/min after the 220th step, although the R statistic values for some of the temperatures crossed the critical limit at some of the steps, the occurrences of two adjacent temperature values showing unsteady state were negligible. This shows that the selection of two adjacent measurements for unsteady-state detection also reduces the possibility of type I errors. From the above simulation experiments, we conclude the following:

1. For US to SS detection, it is desirable to use unequal (optimal) values of L2 and L3 because there is faster detection of the return to steady state when using L2 ≠ L3 as compared to L2 = L3. Earlier detection of the steady state means earlier resumption of online monitoring and property prediction.

2. SS to US detection occurs with the same delay for both equal and unequal values of L2 and L3.

3. A small drift (0.05%/min or less) will not be detected as unsteady state when one uses optimal filter constants. This is desirable because it has been observed that in CDUs small drifts can occur for long periods of time without affecting normal operation.

(b) DR Using the Kalman Filter. The M/Q ratio is used as a tuning parameter for the Kalman filter. A higher value of the M/Q ratio would give more weight to the previously filtered value and would be suitable for conditions when the measurements are not drifting. To suitably account for the drift associated with the measurements, it is necessary to adjust the value of the M/Q ratio so that the average error associated with the reconciled measurements is at a minimum. Therefore, Kalman filter tuning should be done with respect to the drifting situation. A temperature drift on the order of 0.05%/min has been considered to be within the steady-state regime. Because the corresponding drift associated with the flows is generally of higher magnitude, the drift in the flows in the steady-state regime is assumed to be at 0.1%. It is, therefore, necessary to adjust the M/Q ratio for the Kalman filter to be able to adjust to this drift. For this purpose, a drift with a slope of 0.1%/min was superimposed on the incoming crude and the outgoing product flows, which already had random noise with a variance of 3% of the flow value. To be able to get an average of four readings, measurements were generated with a drift of 0.025% at every step and then every four readings were averaged. The average error associated with the reconciled values for various ratios of M/Q was calculated. The ratio that gave a minimum error was selected. Table 8 shows the effect of M/Q on the performance of the Kalman filter. The errors were

evaluated on a total of 1600 measurements for each of the flows. A total of 1000 such trials were performed, and the errors were averaged again. This covers the range of a 40% increase in the flows over a time interval of 400 min. The average error is calculated as the average of the sum of the squares of the deviations of the reconciled flows from their respective true values over 400 steps, which is reported in Table 8 for various values of the M/Q ratio. As seen from this table, the error is at a minimum for M/Q equal to 1 in the present case. For this case, the total error associated with the 12 flows was found to be 71.32. After four successive data were averaged, the error was reduced to 35.65. With application of the Kalman filter, the error is further reduced to 25.52 for M/Q = 1. The average error for the no-drift situation is 23.93 at M/Q = 1.

Table 8. Effect of M/Q on the Kalman Filter Performance

M/Q   average error over 1000 trials
0.01  33.15
0.1   31.11
1     25.40
5     31.84
10    42.09

Table 9. Effect of the Exponential Filter Constant and Intermediate Flow Variance on Reconciliation

                                      filter constant
                                      0.6    0.7    0.8    0.9    1
errors with the exponential filter    28.95  27.85  29.96  32.81  36.17

average error after reconciliation, for variance of the intermediate flows:
100                                   28.63  27.37  27.41  28.35  29.97
200                                   26.94  26.25  26.96  28.59  30.85
300                                   26.52  26.09  27.07  28.96  31.47
400                                   26.35  26.06  27.19  29.22  31.86
500                                   26.26  26.06  27.28  29.41  32.12

Figure 4. Column top temperature history showing strong autocorrelation.

(c) DR Using the Exponential Filter with Optimization Strategy. The error associated with this analysis depends on both the choice of the exponential filter constant and the variance of the intermediate flows. The results for the same case as considered for the Kalman filter are reported in Table 9. In this table, the row labeled "errors with the exponential filter" shows the total average errors associated with all of the measurements after filtering alone. Again this error is averaged over 400 steps as before for all of the 12 flows being reconciled. However, it may be noted that, at this stage, material balance may not be satisfied. The average error associated with the exponential filter depends on the value of the filter constant. The optimization strategy applied to these filtered data may cause an increase or decrease in the average errors depending upon the variance of the intermediate flows. The variance of the intermediate flows that gives a further reduction in the average error over and above that of the exponential filter should be preferred. The optimization strategy also keeps the measurements in material balance, which is not the case with the exponential filter alone. A combination of both would help in reducing errors while at the same time satisfying the balance constraints. From Table 9, one would conclude that a value of 0.7 for the filter constant with the variance for the intermediate flows in the range of 300-400 is desirable for this case. The above set of data when treated with this method resulted in a total error of 23.7 as opposed to 23.93 using the Kalman filter

The error in the presence of drift was 26.06 by the exponential filter method as compared to 25.52 by Kalman filtering. Clearly, the Kalman filter and the exponential filter with the optimization strategy performed equally well for the test problem. Increasing the variance of the intermediate flows would increase the errors associated with both the drift and no-drift cases.

Case Based on Data from Industry. The data are available at an interval of 1 min for a period of 25 h, with a noticeable unsteady state in between. Figure 4 shows a portion of the raw data with a high degree of autocorrelation, which was found to be in the range of 0.7-0.9. It was observed that autocorrelation could be eliminated completely only if the data were sampled at longer intervals, say every 15-20 min, but this would unduly delay data processing as well as property prediction. It is well-known that online data at times show a high degree of autocorrelation where none is expected, particularly in refinery processes. One must, therefore, learn to work with such data for practical reasons, and a suitable modification must be made to the SSI strategy. This is achieved by adding a random component to each sampled value so that the autocorrelation is reduced to an acceptable level while the unsteady state is not suppressed. With the high degree of autocorrelation present between successive data at 1 min intervals, the extent of random noise to be added would be very high; if the data are instead sampled at 2, 3, or 5 min intervals, the required noise decreases correspondingly. It is, therefore, desirable to select the sampling interval for SSI in an appropriate manner. In this study we present the analysis for data sampled every 5 min.

Based on the plant data representing the steady state, estimates of the variance and of the autocorrelation between sampled data are obtained. Once the sampling interval is fixed at 5 min, the noise added to the data should be such that the autocorrelation between successive 5 min samples is brought down to an allowable range. In this study, noise equivalent to 1.5 standard deviations was found adequate to bring the autocorrelation to within 0.05 for all of the measurements. The variance associated with the modified data (after the addition of noise) is then calculated and used to compute the optimal set of filter constants for SSI. The variance and autocorrelation associated with the data keep changing over a period of time; however, the change is usually slow unless there is an abrupt occurrence of unsteady state. A straightforward strategy is to update the filter constants from time to time. These constants can be evaluated a priori for different values of variance and autocorrelation and used as required.
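A rough sketch of this preprocessing step is given below. It assumes a lag-1 autocorrelation estimate and zero-mean Gaussian noise sized as a multiple of the sampled series' standard deviation; the 1.5σ factor mirrors the choice above, but the synthetic series, helper names, and seed are illustrative only.

    import numpy as np

    def lag1_autocorr(x):
        """Sample lag-1 autocorrelation of a measurement series."""
        x = np.asarray(x, dtype=float)
        dx = x - x.mean()
        return float(np.dot(dx[:-1], dx[1:]) / np.dot(dx, dx))

    def decorrelate(x, noise_sigma_factor=1.5, rng=None):
        """Add zero-mean Gaussian noise (here 1.5 sigma of the sampled series) so
        that successive sampled values look closer to independent for the SSI test."""
        rng = np.random.default_rng() if rng is None else rng
        x = np.asarray(x, dtype=float)
        return x + rng.normal(0.0, noise_sigma_factor * x.std(ddof=1), size=x.shape)

    # synthetic stand-in for a strongly autocorrelated 1-min plant series
    rng = np.random.default_rng(0)
    raw = np.empty(1500)
    raw[0] = 0.0
    for k in range(1, raw.size):
        raw[k] = 0.85 * raw[k - 1] + rng.normal(0.0, 0.3)

    x5 = raw[::5]                          # keep every 5th-minute sample
    x5_noisy = decorrelate(x5, rng=rng)    # adding noise lowers the apparent autocorrelation
    print(round(lag1_autocorr(x5), 2), "->", round(lag1_autocorr(x5_noisy), 2))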


Figure 5. State detection profile using an optimal set of filter constants.

Figure 6. Reconciled flows using the Kalman filter and an optimization algorithm with the exponential filter.

Table 10. Variance and Optimal Set of Filter Constants for Temperature Measurements (Data from Industry)

    serial no.   variance   L1      L2     L3
    1            0.14       0.08    0.4    0.04
    2            0.07       0.162   0.45   0.02
    3            0.03       0.162   0.45   0.02
    4            0.58       0.061   0.25   0.03
    5            0.75       0.061   0.25   0.03
    6            0.51       0.061   0.25   0.03
    7            0.9        0.042   0.2    0.05
    8            1.22       0.042   0.2    0.05
    9            2.07       0.036   0.1    0.03
    10           0.61       0.061   0.25   0.03

Table 11. GED in the Presence of Drift of 0.1%/min (Random Error Limit = 6σavg)

    flow       gross error introduced (10σavg)   gross error detected
    UN         27.39                             31.02
    reflux     25.98                             28.07
    Top CR     28.72                             25.51
    Kero CR    58.09                             61.44
    SCN        14.23                             15.08
    SK         30.62                             33.94
    LGO CR     27.39                             27.12
    LGO        25.98                             28.44
    HGO CR     51.23                             48.48
    HGO        22.91                             24.74
    crude      78.42                             97.16
    LR         55.32                             62.30

In the present study, the addition of random noise to the data seems to suppress the drifting effect to some extent. While this strategy was found to work well in the present case, the question of how much noise to add to each measurement, and at what sampling interval, needs to be studied in greater detail. Research aimed at eliminating the effect of changing variance and autocorrelation on the SSI strategy is in progress in our group and will be communicated soon.

The values of the variance and the corresponding sets of filter constants are reported in Table 10. The filter constants are not evaluated separately for each of the 10 temperature measurements; instead, the variances are grouped into five categories, approximately 0.14 (for the 1st temperature), 0.07 (for the 2nd and 3rd temperatures), 0.6 (for the 4th-6th and 10th temperatures), 1 (for the 7th and 8th temperatures), and 2.07 (for the 9th temperature), and one set of optimal filter constants is evaluated for each category. The steady/unsteady-state detection profile for the optimal filter constants is shown in Figure 5. The results are based on 100 random trials, each consisting of the plant data with random noise added to it. The Y axis in this figure represents the probability of detecting unsteady state at the step given on the X axis. The temperatures included in this analysis are measured along the length of the main column, and unsteady state is declared only when two successive temperatures show unsteady state. For this case, unsteady state existed from the 138th to the 156th step, and detection of the next steady state occurs in almost all of the cases at the 166th step. From this figure, it is clear that the present scheme of data processing works satisfactorily: the unsteady state that really exists can still be identified even when the data show a high degree of autocorrelation. Had the 1 min raw data been used without adding random noise, most of the steady-state region would have been identified as unsteady.
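For reference, a compact sketch of the R-statistic steady-state test of Cao and Rhinehart,9 as applied here to one noise-conditioned temperature, is given below. The critical limit R_CRIT, the initial values, and the particular constants (taken from one row of Table 10) are placeholders; this is a sketch of our reading of the method, not a verbatim implementation.

    class RStatSSI:
        """R statistic: ratio of two noise-variance estimates, one from the
        deviation about a filtered mean, one from successive differences.
        R near 1 suggests steady state; R above a critical value flags
        unsteady state."""

        def __init__(self, l1, l2, l3, x0):
            self.l1, self.l2, self.l3 = l1, l2, l3
            self.xf = x0          # filtered measurement
            self.v2 = 1.0         # filtered squared deviation from the filtered mean (arbitrary start)
            self.d2 = 1.0         # filtered squared difference of successive data (arbitrary start)
            self.x_prev = x0

        def update(self, x):
            # deviation uses the previous filtered value, then the filter is advanced
            self.v2 = self.l2 * (x - self.xf) ** 2 + (1 - self.l2) * self.v2
            self.xf = self.l1 * x + (1 - self.l1) * self.xf
            self.d2 = self.l3 * (x - self.x_prev) ** 2 + (1 - self.l3) * self.d2
            self.x_prev = x
            return (2.0 - self.l1) * self.v2 / self.d2   # R statistic

    R_CRIT = 2.0   # placeholder; the actual limit depends on the allowed type I error
    ssi = RStatSSI(l1=0.042, l2=0.2, l3=0.05, x0=250.0)   # constants as in Table 10, serial no. 7
    # for x in temperature_stream: unsteady = ssi.update(x) > R_CRIT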

The flow history of the raw and reconciled data for one of the flows is shown in Figure 6, using both the Kalman filter and the exponential filter strategy; the other flows show similar trends. Drifting variance can be accounted for by updating it from time to time. The estimate of the variance at every sampling interval can also be obtained from eqs 4 and 6 and should be averaged over a certain period of time to reduce the noise associated with it; however, this aspect needs to be studied in greater detail.

GED. The GED scheme is tested with the same simulated data as those used with the Kalman filter for M/Q = 1, for both the drift and no-drift situations. The noise variance of the flows is assumed to be 3% of the flow value, so the noise variance associated with the average of four readings is reduced to 0.75% of the flow value; the GED is, therefore, based on this reduced variance. Table 11 shows the minimum gross errors that are detected in the presence of a drift of 0.1%/min. The random error limit in this case is kept at 6 standard deviations; i.e., deviations of up to 6 standard deviations are attributed to random noise alone, and anything beyond this limit is flagged as a gross error. The minimum magnitude of gross error that is detected correctly with this random error limit is 10 standard deviations. Table 12 shows the detection capability of the algorithm when multiple gross errors are present. Three cases are reported, with gross errors of magnitude 10 standard deviations introduced in streams UN, SCN, and HGO for case I; reflux, Kero CR, SCN, LGO CR, and crude for case II; and reflux, Kero CR, SCN, SK, HGO CR, HGO, and LR for case III. All of the errors are detected reasonably well in all three cases. This is to be expected because, unlike most other GED methods, the present algorithm relies on a comparison with the previous data, which ensures that the existence of multiple errors does not affect the detection.
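The gross-error check itself is essentially a threshold on the deviation of the newly averaged measurement from the previously reconciled value. The sketch below uses 6σ of the averaged data as the random-error limit and reports the deviation as the estimated bias; the compensation step and the helper names are our reading of the scheme under stated assumptions, not a verbatim implementation.

    import numpy as np

    def gross_error_check(y_new, x_prev_reconciled, sigma_avg, limit=6.0):
        """Flag a gross error when the new (four-sample averaged) measurement
        deviates from the previously reconciled value by more than limit*sigma.
        Returns (flag, bias_estimate, value_passed_on_to_reconciliation)."""
        deviation = y_new - x_prev_reconciled
        if abs(deviation) > limit * sigma_avg:
            return True, deviation, y_new - deviation   # pass on a bias-corrected value
        return False, 0.0, y_new

    # usage with the noise model above (variance of averaged data = 0.75% of the flow)
    flow_prev = 1000.0                        # previously reconciled UN flow, MT/day
    sigma_avg = np.sqrt(0.0075 * flow_prev)   # ~2.74 MT/day, as in Table 12
    print(gross_error_check(1031.0, flow_prev, sigma_avg))   # flags a ~31 MT/day gross error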


Figure 7. Schematic of the CDU.

Table 12. Multiple GED (Random Error Limit = 6σavg)

                            variance of       case I              case II             case III
    flow      MT/day        averaged data     error    detected   error    detected   error    detected
    UN        1000          7.50              27.39    30.88      -        -          -        -
    reflux    900           6.75              -        -          25.98    28.07      25.98    28.07
    top CR    1100          8.25              -        -          -        -          -        -
    Kero CR   4500          33.75             -        -          58.09    61.44      58.09    61.44
    SCN       270           2.03              14.23    15.01      14.23    15.37      14.23    14.94
    SK        1250          9.38              -        -          -        -          30.62    33.36
    LGO CR    1000          7.50              -        -          27.39    27.12      -        -
    LGO       900           6.75              -        -          -        -          -        -
    HGO CR    3500          26.25             -        -          -        -          51.23    48.48
    HGO       700           5.25              22.91    24.56      -        -          22.91    24.39
    crude     8200          61.50             -        -          78.42    97.61      -        -
    LR        4080          30.60             -        -          -        -          55.32    61.28

    All introduced errors and detected values are in MT/day; a dash indicates that no gross error was introduced in that stream for the given case.

Conclusions

In this work an attempt has been made to address three inherent data processing problems, namely, SSI, GED, and DR, for systems having multiple staged operations in a single unit, such as a CDU. For SSI, the technique suggested by Cao and Rhinehart9 was modified: an algorithm was developed to obtain an optimal set of filter constants L1, L2, and L3 that gives early steady- and unsteady-state indication with both type I and type II errors within specified limits. Choosing unequal values of L2 and L3 worked better in terms of earlier detection of the steady state. Application of the present SSI method to real-time industrial data at 1 min intervals would suggest the process to be perennially in a transient state because of the presence of strong autocorrelation; however, by sampling the data less frequently (every 5 min) and adding a random error component, it was possible to suppress the autocorrelation and yet identify the real transient states.

For the DR problem, in the absence of adequate spatial redundancy, the temporal redundancy associated with the measurements was exploited using two different approaches: the first was based on application of the Kalman filter, while the second applied a least-squares optimization technique to exponentially filtered raw data. Both approaches worked equally well in reducing the random errors associated with the raw data, as tested on simulated measurements representing allowable drift. GED based on a simple comparison strategy making use of the previously reconciled data worked well in detecting single as well as multiple gross errors. The minimum gross error that can be detected depends on the variance associated with the averaged data and a suitable choice of the random error limit. Although in the present study the above methodology has been applied only to data from an industrial distillation column, it is equally applicable to all processes in which multiple staged operations are housed in a single unit, such as extraction, absorption, and adsorption.


Appendix

CDU. The schematic of the CDU is shown in Figure 7. The measured streams are the mass flow rate of the feed (crude oil), the product streams UN, SCN, SK or kero, LGO, HGO, and LR, and the circulating flows top reflux, Top CR, Kero CR, LGO CR, and HGO CR. In addition to the above 12 flows, temperatures such as all circulating reflux inlet and outlet temperatures, all draw plate temperatures, the flash zone temperature, the heater outlet temperature, and the top plate and reflux temperatures are measured. All of these measurements except the LR and UN flows and the draw plate temperatures are used as inputs to the simulator, while the rest are used to tune the model online. It is therefore necessary to know the true state of all of these measurements at all times so long as steady state prevails. The representation of this unit in the form of a process digraph is shown in Figure 8.

Figure 8. Nodal representation of the CDU.

Nomenclature

A = incidence matrix
CR = circulating reflux
d^2 = estimate of the squared difference of successive data
H = measurement matrix
HGO = heavy gas oil
K = Kalman filter gain matrix
K1 = exponential filter constant (exponential filter with optimization technique)
LGO = light gas oil
LR = long residue
L1, L2, L3 = exponential filter constants (steady-state identification)
M = measurement noise variance-covariance matrix
N = number of measurements
P(+) = error covariance matrix of x immediately after a discrete measurement
P(-) = error covariance matrix of x immediately before a discrete measurement
Q = process noise variance-covariance matrix
R = statistic for steady-state identification
SCN = special cut naphtha
SK/Kero = superior kerosene
s1^2 = noise variance using the mean-square deviation approach
s2^2 = noise variance using the mean of squared differences of successive data approach
UN = unstabilized naphtha
v = measurement noise vector
v^2 = mean-square deviation
w = process noise vector
x = state vector
x̂(+) = estimate of x immediately after a discrete measurement
x̂(-) = estimate of x immediately before a discrete measurement
y = measurement vector (for Kalman filter analysis)
Z = measurement value
Z̄ = average measurement value

Subscripts
0 = initial value
f = filtered value
i, k = time interval
m = measured value
N = number of measurements

Superscript
T = transpose

Greek Symbol
σ̄ = estimate of standard deviation

Other Symbols
E(·) = expected value


Literature Cited

(1) Bethea, R. M.; Rhinehart, R. R. Applied Engineering Statistics; Marcel Dekker: New York, 1991.
(2) Alekman, S. L. Control for the Process Industries; Putman Publications: Chicago, IL, Nov 1994; Vol. VII, No. 11, p 62.
(3) Jubien, G.; Bihary, G. Control for the Process Industries; Putman Publications: Chicago, IL, Nov 1994; Vol. VII, No. 11, p 64.
(4) Loar, J. Control for the Process Industries; Putman Publications: Chicago, IL, Nov 1994; Vol. VII, No. 11, p 62.
(5) Crow, E. L.; Davis, F. A.; Maxfield, M. W. Statistics Manual; Dover Publications: New York, 1955; p 63.
(6) Narasimhan, S.; Mah, R. S. H.; Tamhane, A. C.; Woodward, J. W.; Hale, J. C. A composite statistical test for detecting changes of steady states. AIChE J. 1986, 32, 1409-1418.
(7) Narasimhan, S.; Kao, C. N.; Mah, R. S. H. Detecting changes of steady state using the mathematical theory of evidence. AIChE J. 1987, 33 (11), 1930-1933.
(8) Li, B.; Chen, B.; Wang, J.; Cong, S. Steady-state online data reconciliation in a crude oil distillation unit. Hydrocarbon Process. 2001, Mar, 61-64.
(9) Cao, S.; Rhinehart, R. R. An efficient method for on-line identification of steady state. J. Process Control 1995, 5 (6), 363-374.
(10) Kuehn, D. R.; Davidson, H. Computer control. II: Mathematics of control. Chem. Eng. Prog. 1961, 57, 44-47.
(11) Crowe, C. M. Data reconciliation: progress and challenges. J. Process Control 1996, 6 (2/3), 89-98.
(12) Yang, Y.; Ten, R.; Jao, L. A study of gross error detection and data reconciliation in process industries. Comput. Chem. Eng. 1995, 19 (Suppl.), S217-S222.
(13) Narasimhan, S.; Mah, R. S. H. Generalized likelihood ratio method for gross error identification. AIChE J. 1987, 33 (9), 1514-1521.
(14) Biegler, L. T.; Tjoa, B. Simultaneous strategies for data reconciliation and gross error detection of nonlinear systems. Comput. Chem. Eng. 1991, 15 (10), 679-690.
(15) Tong, H.; Crowe, C. M. Detection of gross errors in data reconciliation by principal component analysis. AIChE J. 1995, 41 (7), 1712-1722.
(16) Stanley, G. M.; Mah, R. S. H. Estimation of flows and temperatures in process networks. AIChE J. 1977, 23 (5), 642-650.
(17) Mah, R. S. H.; Stanley, G. M.; Downing, D. M. Reconciliation and rectification of process flow and inventory data. Ind. Eng. Chem., Process Des. Dev. 1976, 15 (1), 175-183.
(18) Cao, S.; Rhinehart, R. R. Critical values for a steady-state identifier. J. Process Control 1997, 7 (2), 149-152.
(19) Kumar, V.; Sharma, A.; Chowdhary, I. R.; Ganguly, S.; Saraf, D. N. A crude distillation unit model for online applications. Fuel Process. Technol. 2001, 73, 1-21.

Received for review July 8, 2003
Revised manuscript received January 14, 2004
Accepted May 17, 2004

IE030563U