Article pubs.acs.org/IECR
Selective Use of Adaptive Soft Sensors Based on Process State
Hiromasa Kaneko, Takeshi Okada, and Kimito Funatsu*
Department of Chemical System Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
ABSTRACT: Soft sensors are widely used to realize efficient operations in chemical processes because some governing variables, such as product quality, cannot be measured directly through hardware in real time. One of the design problems of soft sensors is the degradation of their prediction accuracy. To reduce degradation, a range of adaptive models has been developed, such as moving window, just-in-time, and time difference models. However, none of these adaptive models performs well in all process states. To address this problem, we developed an online monitoring system using multivariate statistical process control to select the appropriate adaptive model for each process state. The proposed method was applied to dynamic simulation data and empirical industrial data, and it achieved higher predictive accuracy than traditional adaptive models. This novel approach may be used to reduce the maintenance cost of soft sensors.
1. INTRODUCTION
The efficient operation of chemical processes requires the monitoring and control of process variables, such as pressure, temperature, flow rate, liquid level, and concentration. However, there are other important variables, such as product quality, that are not as amenable to direct measurement via hardware in real time. To estimate the value of such a difficult-to-measure variable, which we will denote as y, so-called soft sensors1−3 have been developed. A soft sensor or virtual sensor is a piece of software that processes several hardware measurements together, outputting information similar to that of a traditional hardware sensor. By using soft sensors, y values can be estimated from other easy-to-measure variables X. The chemical processes can be controlled efficiently and safely using the estimated values of y.
The predictive accuracy of soft sensors decreases because of changes in, e.g., the state of the chemical plant, loss of catalyst performance, and sensor and process drift. This degradation of soft sensors in particular makes it difficult to identify the reasons for abnormal plant function. If the prediction error of a given y is above a certain threshold, there is no effective method to know whether the abnormal situation is due to the y analyzer itself or the degradation of the soft sensor model. To reduce degradation, the soft sensor model can be maintained through periodic reconstruction. The two main approaches are to use a sequentially updating scheme, such as a moving window (MW)4,5 or a recursive6 algorithm, or a just-in-time (JIT) scheme, of which the distance-based,7 the correlation-based,8 and the locally weighted partial-least-squares9 algorithms are well established.
© 2014 American Chemical Society
For example, an MW model is reconstructed with the data that were measured most recently, while a distance-based JIT model is reconstructed with data whose similarities with the prediction data (e.g., measured by Euclidean distance) are higher than those of other data. To address the challenge of constructing a soft sensor model that handles abnormal entries in the training data, we previously developed a method based on the time difference (TD) of y and X.10−12 This TD model handles the effects of deterioration with age, such as sensor drift, and the gradual
changes in the state of the plant, minimizing the need for model reconstruction. Models such as the MW, JIT, and TD models that can predict the y values while adapting to the states of the plant are called adaptive models.13 There are no adaptive models with high predictive ability in all process states. Kaneko et al. found that TD models maintained high predictive accuracy even with shifting X or y values, independent of the rate of degradation.14 In contrast, the JIT models were only able to adapt to the degradation if it involved a shift in X values. These characteristics were confirmed through the analyses of numerical simulation and empirical industrial data. The different characteristics of each adaptive model mean that models should be matched to the degradation of the plant. Accordingly, we have been developing model selection methods based on the reliability of adaptive soft sensors.15,16 A TD model was used to predict y variables, and its reliability was monitored. When the reliability was low, the model was switched to the MW type.15 We extend this strategy to switch between TD and JIT models. A TD model is used in normal states and an MW or a JIT model is used in states in variation. This is effective because TD models have no model reconstruction, are stable, and can adapt to the shifts of X and y values, whereas MW and JIT models are adaptive to the rate of change of X and y values, i.e., a change of the process characteristics. In this paper, we present a novel online monitoring system to select the appropriate adaptive model for each process state. We use multivariate statistical process control (MSPC) methods17 to determine the reliability of the TD model, which indicates whether the current state is a normal state or a state in variation. We denote this approach the “discriminant model of soft sensors”. 
The selection of an appropriate soft sensor model with the discriminant model enables high predictive accuracy and reduces the maintenance cost of soft sensors. Here, “maintenance cost” means the cost of reconstructing soft sensor models. For example, when abnormal data or outliers are stored in a database, predictive MW and JIT models cannot be constructed and the database must be managed, which incurs a cost. To verify the effectiveness of the proposed method, we analyze data obtained by dynamic simulation of an existing full-scale distillation column. Furthermore, by analyzing data from an operational industrial plant, we discuss the prediction accuracy of the proposed method in each state of the plant.
Received: May 21, 2014
Revised: August 11, 2014
Accepted: September 23, 2014
Published: September 23, 2014
Industrial & Engineering Chemistry Research, dx.doi.org/10.1021/ie502058t | Ind. Eng. Chem. Res. 2014, 53, 15962−15968
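The selection strategy outlined in this introduction can be illustrated with a minimal Python sketch. It assumes that the reliability index of the TD model (the PR value introduced in section 2.1) is already available as a number, and it follows the threshold rule given in section 2.2, where the threshold is the smallest PR value that covers 99.7% of the training PR values. The function names `pr_threshold` and `select_model`, the label strings, and the synthetic training PR values are hypothetical and are not taken from the paper.

```python
import numpy as np


def pr_threshold(pr_train, coverage=0.997):
    """Smallest training PR value that covers `coverage` of the training PR values."""
    pr_sorted = np.sort(np.asarray(pr_train, dtype=float))
    # The 1e-9 offset guards against floating-point round-up in coverage * n.
    k = int(np.ceil(coverage * pr_sorted.size - 1e-9)) - 1
    return float(pr_sorted[max(k, 0)])


def select_model(pr_new, threshold):
    """Use the TD model while PR is below the threshold, otherwise an MW or JIT model."""
    return "TD" if pr_new < threshold else "MW_OR_JIT"


# Hypothetical training PR values, used only to exercise the two functions.
pr_train = np.linspace(0.0, 1.0, 1000)
threshold = pr_threshold(pr_train)
choices = [select_model(pr, threshold) for pr in (0.1, 0.5, 2.0)]
```

In this sketch the MW or JIT model would still have to be updated before it predicts, as section 2.2 describes; only the switching decision is shown.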
2. METHODS
We employ the ensemble prediction method (EPM)12 as it is an MSPC approach based on TD models that can adapt to the shifts of y values and X values. For a detailed comparison of MSPC methods, such as support vector machine-based fault detection methods,18,19 the interested reader is referred to Okada et al.16 The TD model is detailed in Appendix A.
2.1. Ensemble Prediction Method. Figure 1 shows the basic concept of the EPM. Let Δx(k|k − i) = x(k) − x(k − i) be the difference in the easy-to-measure variables X between times k and k − i. For each Δx(k|k − i), with i = 1, ..., n, the TD model f, which is described in Appendix A, predicts Δypred values as follows:
\[
\begin{aligned}
\Delta y_{\mathrm{pred}}(k|k-1) &= f(\Delta x(k|k-1)) \\
\Delta y_{\mathrm{pred}}(k|k-2) &= f(\Delta x(k|k-2)) \\
&\;\;\vdots \\
\Delta y_{\mathrm{pred}}(k|k-n) &= f(\Delta x(k|k-n))
\end{aligned}
\tag{1}
\]
From the Δypred values we calculate the corresponding series of ypred:
\[
\begin{aligned}
y_{\mathrm{pred}}(k|k-1) &= \Delta y_{\mathrm{pred}}(k|k-1) + y(k-1) \\
y_{\mathrm{pred}}(k|k-2) &= \Delta y_{\mathrm{pred}}(k|k-2) + y(k-2) \\
&\;\;\vdots \\
y_{\mathrm{pred}}(k|k-n) &= \Delta y_{\mathrm{pred}}(k|k-n) + y(k-n)
\end{aligned}
\tag{2}
\]
The standard deviation of the ypred values is used as an index of prediction reliability (PR). When the PR values are large, the process is changing and the prediction errors of the TD model will be large because the TD model cannot handle a change in the slope between X and y.14 When the PR values are small, the process state will be stable and the TD model will have high predictive accuracy.
Figure 1. Basic concept of the EPM.12
2.2. Proposed Method. When the EPM determines that the process characteristics are changing, the TD model is switched for an MW or JIT model. The decision algorithm is shown in Figure 2. First, the reliability of the TD model (see Appendix A) is checked for new data using PR. PR in the EPM is explained in section 2.1. The PR threshold in the EPM is determined as the minimum PR value within which the PR values of 99.7% of the training data are included. Then, the PR value for new data is compared with the PR threshold. When the PR value is lower than the threshold, the TD model is reliable and predicts the y value; otherwise the TD model is not reliable and an MW or a JIT model is used after the model is updated. The window size is defined as the number of data used for the model construction. Although MW and JIT models can adapt quickly to a change from an unsteady to a steady state if the window is small, they are more stable and accurate when the window size is large. As the discriminant model of soft sensors selects a TD model soon after a steady state is reached, slow adaptation times are avoided while also taking advantage of a large MW/JIT window during unsteady states. Thus, the discriminant model can prevent the predictive accuracy of the MW or JIT model from decreasing.
Figure 2. Decision algorithm for adaptive model selection using the proposed method.
3. RESULTS AND DISCUSSION
To verify the effectiveness of the proposed method, we analyzed data obtained from a dynamic simulation of a depropanizer distillation column (based on a real industrial plant) and from the operation of distillation columns at the Mizushima Works, Mitsubishi Chemical Corp. We used a partial-least-squares method20 for the regression and set the window size for constructing the MW and JIT models as 50. The JIT model was Euclidean distance based.
3.1. Analysis of Dynamic Simulation Data. The dynamic simulation was performed with Visual Modeler (Omega Simulation Co., Ltd.).24 The y variable was the molar concentration of the bottom n-butane, and the X variables were 17 variables such as temperature, pressure, and flow rate. Further technical details of the depropanizer distillation column are described by Kaneko and Funatsu.25,26 A schematic representation of the depropanizer distillation column and the y variable
and the X variables are shown in Appendix B. The feed molar concentrations of ethane, propane, isobutene, n-butane, and isopentane were 0.2, 65, 4.7, 30, and 0.1%, respectively. The steam temperature was set as 185 °C. Noise generated using the random walk method27 was added to the molar concentrations of propane and n-butane and to the steam temperature, within ±1%, ±1%, and ±3 °C, respectively. The setting values for each instrument and the control system were given as the default values. Two types of disturbances were generated in this study: (i) changes in the feed concentrations of n-butane and propane, and (ii) changes in the steam temperature. The concentration of n-butane varied as follows: 30% from 1 to 1440 min (24 h), 35% from 1440 to 4320 min (72 h), 30% from 4320 to 7200 min (120 h), 35% from 7200 to 10 080 min (168 h), and 30% from 10 080 to 17 280 min (288 h). The feed concentration of propane changed in correspondence with that of the n-butane. The steam temperatures were 185 °C from 1 to 12 960 min (216 h), 220 °C from 12 960 to 15 840 min (264 h), and 185 °C from 15 840 to 17 280 min (288 h). The time plots of the concentration of n-butane and the steam temperature are shown in Appendix B. Both the measurement interval and the measurement delay of the y variable were set as 6 min. The first 5760 min (96 h) of data were used as the training data, and the next 11 520 min (192 h) were used as the test data. Only the variation in the feed concentration of n-butane was included in the training data; the variation in the steam temperature was an unknown disturbance. To incorporate the dynamics of process variables into soft sensor models, X included each process variable delayed for durations in the range of 0−60 min in steps of 6 min; i.e., each X variable consisted of 11 time points (0, 6, 12, 18, 24, 30, 36, 42, 48, 54, and 60 min). The TD interval was also 6 min. The PR threshold in the EPM was determined to be 0.0761 using the procedure described in section 2.2. The proposed discriminant model was able to reduce the maintenance cost required for the MW and JIT models by 98.6% because these models were applied to only 27 of the 1920 test data, whereas the TD model was used for the other data. Table 1 shows the root-mean-square-error (RMSE) values of the test data for each model.

Table 1. Prediction Result of Each Model for Dynamic Simulation Data
model                          RMSE
TD model                       0.0735
MW model                       0.0922
JIT model                      0.1035
proposed model (TD and MW)     0.0500
proposed model (TD and JIT)    0.0963

When used in isolation, the TD model had a lower RMSE value than the MW model, indicating better predictive accuracy, and the MW model in turn performed better than the JIT model. When the TD model was paired with either the MW or the JIT model in the proposed method, the predictive accuracy improved relative to the MW or JIT model used alone, with the TD and MW pairing being the best combination overall. The JIT model had the highest RMSE value, indicating it to be the least suitable for adapting to the degradation in this process. Figure 3 shows time plots of the prediction error of y. When the disturbance took place from 1300 to 1900 min (Figure 3a,b), the error of the TD model was less than those of the MW and JIT models. The variation in the feed concentration was
included in the training data and the TD model was able to adapt. The relatively small PR values in the EPM during this disturbance meant that the proposed discriminant model mostly operated with the TD model, which had high predictive ability. The disturbance of the steam temperature, which was not included in the training data, was an unknown disturbance for the TD model. When this disturbance commenced (Figure 3c,d), the PR values indicated a state of variation, and the MW or JIT model was selected. Evidently, the largest y prediction error of the TD model could be avoided by switching to the MW or JIT model. After the disturbance had passed, although the MW and JIT models were affected by the variations and their predictive ability was low, the TD model, which was not affected by the variation, was selected again with the proposed method and was able to produce small prediction errors of y. The proposed method achieved high predictive accuracy both in normal states and in states in variation. The discriminant model of soft sensors was able to determine whether the TD model could predict y values accurately or not. The proposed method can reduce the frequency of usage of the MW or JIT models and improve the predictive accuracy of soft sensors.
Figure 3. Examples of y prediction errors for the dynamic simulation data. The plot captions indicate the model (MW, JIT) used in conjunction with the TD model in the proposed model. (a) MW, from 1300 to 1900 min; (b) JIT, from 1300 to 1900 min; (c) MW, from 7100 to 7600 min; (d) JIT, from 7100 to 7600 min.
3.2. Analysis of Empirical Industrial Data. The following is an analysis of data obtained from the operation of distillation columns at the Mizushima Works. The y variable was the concentration of the bottom product with the lower boiling point, and the X variables were 19 process variables (including temperature, pressure, and flow rate). The technical details of this distillation column are given by Kaneko et al.4 The y and X measurement intervals were 30 min and 1 min, respectively. The training data were recorded from January to December 2002, and the test data were recorded from January 2003 to December 2006.
Figure 4. Examples of RMSE values when real industrial data were used. The plot captions indicate the model (MW, JIT) used in conjunction with the TD model in the proposed model. (a) MW, from 462 700 to 490 000 min; (b) JIT, from 462 700 to 490 000 min; (c) MW, from 515 000 to 562 000 min; (d) JIT, from 515 000 to 562 000 min; (e) MW, from 1 380 000 to 1 450 000 min; (f) JIT, from 1 380 000 to 1 450 000 min.
To incorporate the dynamics of process variables into soft sensor models, the X variables included each process variable delayed for durations in the range of 0−60 min in steps of 10 min; i.e., seven time points (0, 10, 20, 30, 40, 50, and 60 min). The TD interval was 30 min because the y measurement interval was 30 min. The PR threshold in the EPM was determined to be 0.354. The proposed method could reduce the maintenance cost required for the MW model or the JIT model by 71.0% because the MW model or the JIT model was applied to 16 769 data of 57 738 data whereas the TD model was used for the other data. The RMSE value for each model is shown in Table 2. The
RMSE value of the TD model was the highest in this case study. Using the proposed method with the TD and MW models resulted in an RMSE value slightly lower than the values from either of the two models used separately. The same held true for the TD−JIT pairing. Figure 4 shows time plots of RMSE values, each of which was calculated with the 48 data (1 day) closest to each time. From Figure 4a−d, the proposed method could use the TD and MW models or the TD and JIT models depending on the process states. When the TD model had higher predictive ability than the MW model or the JIT model, the TD model was selected appropriately, and vice versa. As shown in Figure 4e,f, when the RMSE values of the TD model were large, which meant that the process was in a state of variation, the proposed method could select the MW model or the JIT model. The PR in the EPM detected the times when the state of the plant differed from those in the training data for the TD model. The MW or JIT model, and not the TD model, was then used for predicting the values of y. After each variation, when the RMSE values of the TD model became smaller than those of the MW or JIT models, which were affected by the old data from the variation, the proposed method could select the TD model appropriately and achieve high prediction accuracy. It was confirmed that the PR in the EPM, which is the discriminant model of soft sensors, could assess the current states of the plant and select the appropriate adaptive model with high accuracy for the state.

Table 2. Prediction Result of Each Model for Real Industrial Data
model                          RMSE
TD model                       0.3973
MW model                       0.3763
JIT model                      0.3820
proposed model (TD and MW)     0.3728
proposed model (TD and JIT)    0.3780

4. CONCLUSIONS
In this paper, we developed an MSPC-based discriminant model that used a reliability metric to select the optimal adaptive model for predictive accuracy. The discriminant model demonstrates that the TD model is best suited for steady-state operation, MW and JIT models perform better during unsteady states, and by switching between them one can achieve an aggregate performance better than that of the component models individually. Through the analyses of dynamic simulation data and empirical industrial data, we concluded that the proposed method could reduce maintenance cost and improve the predictive accuracy of soft sensors.
Although the Euclidean distance based JIT model was used in this paper, other JIT models such as the locally weighted partial-least-squares model9,21 and other MW models such as the online support vector regression model22,23 can be employed in our proposed system. The discriminant model of soft sensors was constructed using the MSPC methods. The performance improvement of the MSPC methods will contribute to the discriminant ability of the proposed method. We believe that, by applying our proposed method to process control and achieving adaptation to the process characteristics with highly accurate prediction, chemical plants will be operated effectively and stably.

■ APPENDIX A: TD MODEL
For the present time t and a time interval i, we define the time differences of X and y, respectively, as
\[ \Delta X(t|t-i) = X(t) - X(t-i) \tag{A.1} \]
\[ \Delta y(t|t-i) = y(t) - y(t-i) \tag{A.2} \]
We model the relationship between ΔX(t|t − i) and Δy(t|t − i) as
\[ \Delta y(t|t-i) = f(\Delta X(t|t-i)) + e \tag{A.3} \]
where f is a regression model and e is a vector of residuals. For new data at time t′ and the interval i, the model f gives the predicted time difference of y as follows:
\[ \Delta x(t'|t'-i) = x(t') - x(t'-i) \tag{A.4} \]
\[ \Delta y_{\mathrm{pred}}(t'|t'-i) = f(\Delta x(t'|t'-i)) \tag{A.5} \]
This is then used to determine the y prediction from the previous value:
\[ y_{\mathrm{pred}}(t') = y(t'-i) + \Delta y_{\mathrm{pred}}(t'|t'-i) \tag{A.6} \]
This method can be easily extended to nonuniform time intervals. By constructing time difference models, the effects of deterioration with age, such as drift and gradual changes in the states of plants, are implicitly handled.
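The TD model of Appendix A (eqs A.1−A.6) and the ensemble prediction of section 2.1 (eqs 1 and 2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ordinary least squares without an intercept stands in for the partial-least-squares regression used in the paper, the PR is taken as the sample standard deviation (`ddof=1`, a choice the text does not specify), and the function names `fit_td_model` and `epm`, the synthetic data, and the coefficient vector `b_true` are hypothetical.

```python
import numpy as np


def fit_td_model(X, y, i=1):
    """Fit dy = dX @ coef on time differences over interval i (eqs A.1-A.3)."""
    dX = X[i:] - X[:-i]
    dy = y[i:] - y[:-i]
    # Assumption: least squares replaces the PLS regression used in the paper.
    coef, *_ = np.linalg.lstsq(dX, dy, rcond=None)
    return coef


def epm(coef, X, y, k, n):
    """Ensemble of TD predictions of y(k) from lags 1..n and their PR (requires k >= n)."""
    lags = np.arange(1, n + 1)
    d_y_pred = (X[k] - X[k - lags]) @ coef   # eq 1
    y_pred = d_y_pred + y[k - lags]          # eq 2
    return y_pred, y_pred.std(ddof=1)        # PR: standard deviation of y_pred


# Synthetic data: a linear process with a constant offset in y.
rng = np.random.default_rng(0)
b_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ b_true + 3.0
coef = fit_td_model(X, y)

# Same slope as in training: the ensemble members agree and PR is near zero.
y_pred, pr_stable = epm(coef, X, y, k=150, n=10)

# Doubled slope between X and y: the ensemble members disagree and PR grows.
y_changed = X @ (2.0 * b_true) + 3.0
_, pr_changed = epm(coef, X, y_changed, k=150, n=10)
```

The constant offset of 3.0 cancels in the differences, which mirrors the statement that TD models handle shifts, while the doubled slope in `y_changed` mimics the slope change that the PR is meant to flag.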
■ APPENDIX B: DEPROPANIZER DISTILLATION COLUMN USED FOR DYNAMIC SIMULATION
Figure 5 shows a schematic representation of the depropanizer distillation column, and Table 3 shows the y variable and the X variables. The time plots of the concentration of n-butane and the steam temperature are shown in Figure 6.

Figure 5. Schematic representation of the depropanizer distillation column.25,26

Table 3. Process Variables Measured in the Depropanizer Distillation Column25,26
objective variable
symbol   description
A        molar concn of bottom n-butane
explanatory variables
no.   symbol   description
1     F1       feed flow
2     F2       bottom flow
3     F3       top flow
4     F4       steam flow
5     F5       flow to reboiler
6     F6       reflux flow
7     P1       press. 1
8     P2       press. 2
9     T1       feed temp
10    T2       top temp
11    T3       bottom temp
12    T4       steam temp
13    T5       temp 1
14    T6       temp 2
15    T7       temp 3
16    L1       liquid level 1
17    L2       liquid level 2

Figure 6. Time plots of concentration of n-butane and steam temperature.

■ AUTHOR INFORMATION
Corresponding Author
*Tel.: +81-3-5841-7751. Fax: +81-3-5841-7771. E-mail: [email protected].
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
H.K. is grateful for financial support in the form of a Grant-in-Aid for Young Scientists (B) (No. 24760629). The authors acknowledge the support of the Core Research for Evolutional Science and Technology (CREST) project “Development of a knowledge-generating platform driven by big data in drug discovery through production processes” of the Japan Science and Technology Agency (JST). The authors acknowledge the support of Mizushima Works, Mitsubishi Chemical Corp.

■ REFERENCES
(1) Kano, M.; Nakagawa, Y. Data-based Process Monitoring, Process Control, and Quality Improvement: Recent Developments and Applications in Steel Industry. Comput. Chem. Eng. 2008, 32, 12.
(2) Kadlec, P.; Gabrys, B.; Strandt, S. Data-driven Soft Sensors in the Process Industry. Comput. Chem. Eng. 2009, 33, 795.
(3) Kano, M.; Fujiwara, K. Virtual sensing technology in process industries: Trends and challenges revealed by recent industrial applications. J. Chem. Eng. Jpn. 2013, 46, 1−17.
(4) Kaneko, H.; Arakawa, M.; Funatsu, K. Development of a New Soft Sensor Method Using Independent Component Analysis and Partial Least Squares. AIChE J. 2009, 55, 87.
(5) Kadlec, P.; Gabrys, B. Local Learning-Based Adaptive Soft Sensor for Catalyst Activation Prediction. AIChE J. 2010, 57, 1288.
(6) Qin, S. J. Recursive PLS Algorithms for Adaptive Data Modelling. Comput. Chem. Eng. 1998, 22, 503.
(7) Cheng, C.; Chiu, M. S. A New Data-based Methodology for Nonlinear Process Modeling. Chem. Eng. Sci. 2004, 59, 2801.
(8) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor Development Using Correlation-based Just-in-time Modeling. AIChE J. 2009, 55, 1754.
(9) Schaal, S.; Atkeson, C. G.; Vijayakumar, S. Scalable techniques from nonparametric statistics for real time robot learning. Appl. Intell. 2002, 17, 49−60.
(10) Kaneko, H.; Funatsu, K. Maintenance-Free Soft Sensor Models with Time Difference of Process Variables. Chemom. Intell. Lab. Syst. 2011, 107, 312.
(11) Kaneko, H.; Funatsu, K. Development of Soft Sensor Models Based on Time Difference of Process Variables with Accounting for Nonlinear Relationship. Ind. Eng. Chem. Res. 2011, 50, 10643.
(12) Kaneko, H.; Funatsu, K. A Soft Sensor Method Based on Values Predicted from Multiple Intervals of Time Difference for Improvement and Estimation of Prediction Accuracy. Chemom. Intell. Lab. Syst. 2011, 109, 197.
(13) Kadlec, P.; Grbic, R.; Gabrys, B. Review of Adaptation Mechanisms for Data-driven Soft Sensors. Comput. Chem. Eng. 2011, 35, 1.
(14) Kaneko, H.; Funatsu, K. Classification of the Degradation of Soft Sensor Models and Discussion on Adaptive Models. AIChE J. 2013, 59, 2339.
(15) Okada, T.; Kaneko, H.; Funatsu, K. Development of an Adaptive Soft Sensor Method Considering Prediction Confidence of Models. J. Comput. Chem. Jpn. 2012, 11, 24 (in Japanese).
(16) Okada, T.; Kaneko, H.; Funatsu, K. Development of a Model Selection Method Based on Reliability of a Soft Sensor Model. Songklanakarin J. Sci. Technol. 2012, 34, 217.
(17) Bersimis, S.; Psarakis, S.; Panaretos, J. Multivariate Statistical Process Control Charts: An Overview. Qual. Reliab. Eng. Int. 2007, 23, 517.
(18) Kulkarni, A.; Jayaraman, V. K.; Kulkarni, B. D. Knowledge incorporated support vector machines to detect faults in Tennessee Eastman Process. Comput. Chem. Eng. 2005, 29, 2128.
(19) Yu, J. A support vector clustering-based probabilistic method for unsupervised fault detection and classification of complex chemical processes using unlabeled data. AIChE J. 2012, 59, 407−419.
(20) Wold, S.; Sjöström, M.; Eriksson, L. PLS−regression: a Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109.
(21) Kim, S.; Kano, M.; Nakagawa, H.; Hasebe, S. Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection. Int. J. Pharm. 2011, 421, 269−274.
(22) Kaneko, H.; Funatsu, K. Application of Online Support Vector Regression for Soft Sensors. AIChE J. 2014, 60, 600.
(23) Kaneko, H.; Funatsu, K. Adaptive Soft Sensor Model Using Online Support Vector Regression with the Time Variable and Discussion on Appropriate Hyperparameters and Window Size. Comput. Chem. Eng. 2013, 58, 288.
(24) http://www.omegasim.co.jp/product/vm/index.htm.
(25) Kaneko, H.; Funatsu, K. Consideration of Soft Sensor Methods Based on Time Difference and Discussion on Intervals of Time Difference. J. Comput. Aided Chem. 2012, 13, 29 (in Japanese).
(26) Kaneko, H.; Funatsu, K. Development of Soft Sensor Models Based on Time Difference of Process Variables with Accounting for Nonlinear Relationship. Ind. Eng. Chem. Res. 2011, 50, 10643.
(27) Grady, L. Random Walks for Image Segmentation. IEEE Trans. Pattern Anal. 2006, 28, 1768.