Article Cite This: Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
pubs.acs.org/IECR
Online Quality Prediction of Industrial Terephthalic Acid Hydropurification Process Using Modified Regularized Slow-Feature Analysis
Weimin Zhong,* Chao Jiang, Xin Peng, Zhi Li, and Feng Qian
Key Laboratory of Advanced Control and Optimization for Chemical Processes of Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
ABSTRACT: Purified terephthalic acid (PTA) is an important product for the polyester and textile industry. In the industrial PTA-production process, 4-carboxybenzaldehyde (4-CBA) is a detrimental byproduct that can lower the polymerization rate and the average molecular weight of the polymer. Therefore, the content of 4-CBA in the final product can be used as a quality index to evaluate the current running status of the PTA-production process. However, because of slow catalyst deactivation, this process is markedly nonlinear and dynamic, and it is very difficult to obtain 4-CBA-content values directly from the process in real time using traditional prediction methods. To better estimate the status of the PTA-production process, this paper proposes a novel online quality-prediction method based on modified regularized slow-feature analysis (ReSFA) for predicting the concentration of 4-CBA. The proposed method handles process dynamics better by exploiting the temporal relationships of the input variables and incorporating the neighborhood relationships of both the input and the output variables. Meanwhile, a modified just-in-time-learning method is introduced to deal with nonlinearity and improve online prediction performance. Finally, a case study using data sampled from an industrial terephthalic acid hydropurification process demonstrates the effectiveness and superiority of the proposed method.
1. INTRODUCTION
Terephthalic acid (TA), the main intermediate of polyethylene terephthalate (PET), plays an indispensable role in the modern polymer and textile industry.1−3 However, crude terephthalic acid (CTA) contains impurities such as 4-CBA at about 2000−6000 ppm,4,5 which can lower the polymerization rate and the average molecular weight of the polymer. It is not easy to separate 4-CBA from TA by physical approaches because the molecular weight of 4-CBA is similar to that of TA.2 Thus, a CTA-purification process is necessary to decrease the 4-CBA content of TA. A real PTA-production process is composed of two main phases. The first phase is a p-xylene (PX)-oxidation process, in which PX in an acetic acid solvent is oxidized to TA in a continuously stirred tank reactor using air or molecular oxygen;3 CTA is produced in this phase. In the second phase, namely, the hydropurification process, CTA is refined by reacting 4-CBA with hydrogen, and CTA is finally converted into PTA after several further processing steps. Considering the detrimental effect of 4-CBA in downstream industrial processes, the 4-CBA concentration is a natural index for evaluating the PTA-production process. However, because of the complexity, nonlinearity, dynamics, and strong coupling of the CTA-hydropurification process with slow catalyst deactivation, the 4-CBA content cannot be measured directly from the real process. Instead,
the products of the practical PTA-production process must be tested in a laboratory for 2 h, and the laboratory data cannot be used directly to evaluate the real-time running status of the process because the 2 h analysis introduces a serious delay: off-specification product may be produced for up to 2 h before a problem is detected. Therefore, traditional 4-CBA-measuring methods are not suitable for real-time evaluation. In modern industrial processes, safety and quality requirements are paramount; thus, real-time monitoring and control have become increasingly important, and key quality-related variables must be measured precisely.6−11 However, some of these variables may be time-consuming or costly to measure in real time because of expensive procedures, dangerous environments, long analysis cycles, and other reasons.6 Inaccurate estimates of key variables may result in poor performance of closed-loop control systems. As an effective alternative to hardware sensors, soft sensors have been applied to alleviate these problems. The basic idea of soft sensors is
Received: March 26, 2018; Revised: June 6, 2018; Accepted: June 28, 2018; Published: June 28, 2018
DOI: 10.1021/acs.iecr.8b01270 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
Industrial & Engineering Chemistry Research
to establish a regression model that predicts hard-to-measure variables from easily measured ones. Conventional soft sensors can be classified into two main categories: model-driven and data-driven approaches. With the popularization of distributed-control systems (DCSs), massive volumes of process data can be archived, which has driven the rapid advancement of data-driven soft sensors during the past two decades. Representative data-driven methods that have been extensively applied in industry include principal-component analysis (PCA),8 independent-component analysis (ICA),7 partial least-squares (PLS),12 extreme learning machines (ELMs),13 artificial neural networks (ANNs),14 support vector machines (SVMs),15 and Gaussian process regression (GPR).16 However, some of these methods (e.g., original PCA and PLS) can only extract static latent features, which are unsuitable for modeling dynamic processes because they do not contain temporally related process information. In practical chemical processes, changes in equipment characteristics over time, such as catalyst deactivation, equipment aging, environmental change, variation of feed properties, and changes of operating points, make the process dynamic, which gradually degrades models and causes model mismatch. To solve these problems, slow-feature analysis (SFA), a recently emerged unsupervised algorithm,17 is applied in this work because of its remarkable ability to extract slowly varying, temporally related features for modeling dynamic processes. SFA has been extensively applied in blind source separation,18 image processing,19 and process monitoring.20,21 In SFA, the extracted slowly varying features represent intrinsic information on the process, whereas quickly varying features can be regarded as carriers of process noise.
On the basis of the extracted slow features, SFA is well suited to modeling dynamic processes, and several SFA-based methods have already been applied in prediction modeling. Shang et al.22 proposed probabilistic slow-feature analysis (PSFA), which uses a state-space model to describe the dynamics for soft-sensor modeling. Fan et al.23 put forward a robust PSFA-based regression model that accounts for outliers in the observation data. These studies adopting SFA for quality-prediction modeling have achieved good results. However, because SFA extracts slow features globally, the preservation of local information is not taken into consideration. To solve this problem, Böhmer et al.24 developed regularized SFA by introducing a regularization term into the optimization objective of SFA. Nevertheless, local-structure information exists not only in the input space but also in the output space, and regularized SFA preserves local structure only in the input space. In other words, the locality of the output variables is neglected, and discarding this output-space locality information may result in inaccurate prediction models. Therefore, to better deal with the dynamics of the CTA-hydropurification process, this paper incorporates the temporal local information of both the input space and the output space into a regularization term that modifies conventional SFA. Although SFA can handle process dynamics well to some extent, maintaining the high performance of a quality-prediction model is still challenging. To cope with the degradation of prediction models, many adaptive mechanisms have been introduced into existing quality-prediction methods, such as moving-window (MW) models,25 recursive models,26 and time-difference (TD) models.27 However, these mechanisms share some common limitations. MW and recursive models cannot handle abrupt changes until a large amount of data from the new operating condition has been modeled. Additionally, most recursive approaches fit a single global model that may fail to cope with strong nonlinearity, and TD models cannot deal with nonlinearity either. Just-in-time-learning (JITL)-based models, notable for their advantages in local modeling and in handling time delays, have been proposed for performance monitoring and soft sensing in many areas.28,29 A JITL framework constructs local models from the samples in a historical data set that are most relevant to the query sample. The criteria for evaluating the similarity between historical samples and query samples have evolved considerably. One of the most common criteria is based on the Euclidean distance, where the k nearest neighbors of the query sample form the relevant data set. Techniques such as k surrounding neighbors30 and k bipartite neighbors31 have been put forward to improve the accuracy of similar-data selection. Besides distance information, the angle between two samples has also been taken into consideration.32 Fujiwara et al.33 proposed a correlation-based similarity criterion that combines the squared prediction error (SPE) and Hotelling T2 statistics of PCA, so that the correlation among variables can be exploited and sample selection improved. Recently, Yuan et al.34 proposed a probabilistic JITL (P-JITL) framework that uses a symmetric Kullback−Leibler-divergence measure to deal with samples containing missing values in chemical processes.
In the traditional correlation-based JITL (CoJITL) method, sample selection and local modeling perform better than in distance-based and angle-based methods. PCA is used to construct the T2 and SPE statistics, which stand for correlation similarity and dissimilarity, respectively, and a correlation-based similarity criterion is built from these two statistics. However, Arbel et al. and Tyréus35 emphasized the importance of identifying the dominant process variables when using PCA, and Jiang et al.36 pointed out that useful process information may be suppressed or submerged in unmatched subspaces when the original PCA is used to construct subspaces. Hence, variable selection requires careful consideration when PCA is used to compress process information. When PCA is used to construct similarity criteria in a conventional CoJITL framework, only variables with large covariance are selected; variables with small covariance are considered less important and discarded, which may cause some loss of information. To avoid losing process information as much as possible, this paper replaces ordinary PCA in the CoJITL framework with sensitive principal-component analysis (SPCA). SPCA is an effective tool for capturing process information from variables with small covariance: it detects which variables are "sensitive" to the current measured data by checking the sensitivity of each variable to changes in the T2 statistic. Using SPCA in CoJITL modeling reduces the loss of process information, makes sample selection more reasonable, and yields more effective local models for the subsequent regression. Meanwhile, inspired by the idea that the complexity of a process can be represented by
temporally invariant features,17 a regression model based on modified regularized slow-feature analysis (ReSFA) is, for the first time, integrated into the JITL framework to improve prediction performance. The rest of this paper is organized as follows. Section 2 reviews preliminaries on the CTA-hydropurification process, ordinary SFA, and the JITL framework. The proposed JITL-based SFA regression model is described in detail in Section 3. Case studies of quality prediction for the CTA-hydropurification process are presented in Section 4. Finally, conclusions are drawn in Section 5.
2. PRELIMINARIES
2.1. Description of the CTA-Hydropurification Process. The industrial PTA-production process comprises two main sections, namely, the oxidation process and the hydropurification process. In the oxidation process, the raw material, p-xylene, in an acetic acid solvent is oxidized by air or molecular oxygen in a continuously stirred tank reactor into TA with a high content of the byproduct 4-CBA. It is hard to separate 4-CBA using conventional physical methods, but the CTA-hydropurification process can lower the 4-CBA content of CTA. In this process,1,12 CTA containing around 3000 ppm 4-CBA as well as polyaromatic compounds from the PX-oxidation process is mixed with deionized water in a storage tank, giving a CTA slurry with a mass fraction of about 29%. The slurry is then pumped through five heat exchangers in series, which raises its temperature and dissolves the CTA in the water. Afterward, the slurry and hydrogen are injected into the top of a fixed-bed reactor filled with 0.5 wt % palladium-on-carbon (Pd/C) catalyst. Pd sinters very easily, and a rough proportionality has been shown to exist between catalyst activity and Pd surface area, so Pd sintering can result in catalyst deactivation. Under a pressure of about 7.9 MPa, 4-CBA reacts with hydrogen and is converted into p-toluic acid, which is more soluble than 4-CBA. The product then leaves the bottom of the reactor and passes through five crystallizers in series, where its temperature drops to ambient. Finally, a product with a 4-CBA content below 25 ppm is obtained. Figure 1 illustrates the flowchart of the CTA-hydropurification process.
2.2. Slow-Feature Analysis. SFA is an unsupervised algorithm that extracts invariant features from quickly varying signals. Unlike traditional PCA and PLS, SFA can extract temporally related latent variables (viz., slow features) that better address process dynamics. The methodological details of SFA are as follows.17 Given a temporal sequence $X = [x_1, \ldots, x_t] \in \mathbb{R}^{N\times t}$, where N is the dimension of each sample and the sequence {x_i} is centered in advance, SFA learns a representation $T = [t_1, \ldots, t_t] \in \mathbb{R}^{n\times t}$ whose temporal variation is globally minimized. The temporal variation at time i is defined as $\dot{t}_i = t_i - t_{i-1}$, and the optimization problem of SFA can be formulated as
$$\arg\min_{T} \sum_{i=2}^{t} \|\dot{t}_i\|_2^2 \quad \text{s.t.} \quad TT^{\mathrm{T}} = I \tag{1}$$
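where I denotes an identity matrix, and the constraint $TT^{\mathrm{T}} = I$ guarantees a nontrivial solution. To simplify this optimization problem, the output variables, namely, the slow features, are assumed to be a linear combination of the input variables. The linear mapping is
$$T = W^{\mathrm{T}}X \tag{2}$$
where $W \in \mathbb{R}^{N\times n}$. We then have
$$\sum_{i=2}^{t} \|\dot{t}_i\|_2^2 = \operatorname{tr}(\dot{T}\dot{T}^{\mathrm{T}}) = \operatorname{tr}[W^{\mathrm{T}}(\dot{X}\dot{X}^{\mathrm{T}})W] \tag{3}$$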
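where tr(·) denotes the matrix-trace operator, $\dot{T} = [\dot{t}_2, \ldots, \dot{t}_t] \in \mathbb{R}^{n\times(t-1)}$, and $\dot{X} = [\dot{x}_2, \ldots, \dot{x}_t] \in \mathbb{R}^{N\times(t-1)}$. Furthermore, the sequence {x_i} needs to be whitened:
$$XX^{\mathrm{T}} = I \tag{4}$$
Therefore,
$$TT^{\mathrm{T}} = W^{\mathrm{T}}XX^{\mathrm{T}}W = W^{\mathrm{T}}W = I \tag{5}$$
and the optimization problem of SFA can be rewritten as
$$\arg\min_{W} \operatorname{tr}[W^{\mathrm{T}}(\dot{X}\dot{X}^{\mathrm{T}})W] \quad \text{s.t.} \quad W^{\mathrm{T}}W = I \tag{6}$$
The mapping W can be calculated by solving the eigendecomposition problem
$$(\dot{X}\dot{X}^{\mathrm{T}})W = W\Lambda \tag{7}$$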
where Λ denotes a diagonal matrix of eigenvalues, and W consists of the eigenvectors associated with the n smallest eigenvalues, that is, the slowest directions. Finally, the slow features (SFs), T, are obtained using eq 2.
2.3. Just-in-Time-Learning Framework. JITL is a framework that models the current situation from the most relevant data samples. Thus, it inherently tracks variations in process characteristics well and enables a more flexible and adaptive soft-sensor model. Given newly measured input variables, $x_{new}$, the JITL framework involves four main steps: (1) searching the historical data samples for samples relevant to $x_{new}$ using a certain similarity criterion, (2) building a local model for the current measured sample from the relevant data samples, (3) predicting the output variables with the local model, and (4) updating the historical database with the newly measured sample. Early JITL methods used distance as the similarity criterion, for example, the Euclidean norm $d(x_{new}, x_i) = \|x_{new} - x_i\|_2$, where $x_i$ is drawn from the historical database and i = 1, 2, ..., n (n denotes the size of the database). However, a relatively precise local model can be built from strongly correlated samples even when their distance is large. Hence, not only distances but also angles among samples should be considered. The angle is calculated as
$$\theta(x_{new}, x_i) = \arccos\frac{x_{new}^{\mathrm{T}}x_i}{\|x_{new}\|_2\,\|x_i\|_2}$$
and the two measures can be merged into a single similarity criterion:
$$D(x_{new}, x_i) = \alpha \exp[-d(x_{new}, x_i)] + (1 - \alpha)\cos[\theta(x_{new}, x_i)] \tag{8}$$
where α ∈ [0, 1]. In this paper, this JITL-modeling method is referred to as JITL(D&A). Correlation-based JITL has been proposed to improve similarity measurement. This method uses PCA to establish T2 and SPE statistics for similarity calculations, in which the T2 and SPE statistics serve as indices of similarity and dissimilarity, respectively, and integrates them into a single index:
$$J = \rho T^2_{PCA} + (1 - \rho)SPE_{PCA} \tag{9}$$
where ρ ∈ [0, 1]. In this paper, this method is referred to as JITL(PCA). Figure 2 compares sample selection in Euclidean-distance-based and correlation-based JITL modeling; red points denote the chosen data, and black points denote the discarded data. In traditional JITL modeling (Figure 2, left), a neighboring region around the query data is defined only on the basis of distance, so samples with small distances but weak correlations may be selected even though they are unsuitable for local modeling. In CoJITL modeling (Figure 2, right), sample selection is based on correlation as well as distance; hence, samples can be selected more accurately.
Figure 1. Flowchart of the purified terephthalic acid hydropurification process.
Figure 2. Sample selection of Euclidean-distance-based JITL modeling (left) and CoJITL modeling (right).
3. MODIFIED JITL-BASED REGULARIZED-SFA FRAMEWORK FOR QUALITY-PREDICTION MODELING
3.1. Improved Correlation-Based Similarity Measurement. In this subsection, a novel correlation-based similarity measurement is proposed, in which sensitive PCA replaces PCA in the CoJITL framework because it reduces information loss and yields more accurate similarity criteria. When conventional PCA is used for similarity and dissimilarity measurement, the first several principal components (PCs) and their corresponding variables are considered to contain the most information on the process, and the remaining PCs are not used. Therefore, only the first PCs are used to calculate the T2 and SPE statistics. The T2 statistic measures the variation along the directions of the PCs, but which directions matter for a new sample is not fixed, because there is no definite mapping between newly sampled data and the PCs. Thus, the remaining variables or PCs may also carry process information, and extracting information from them is important. In the proposed similarity measurement, SPCA is used as an effective tool to extract process information from variables that traditional CoJITL modeling may discard.
To illustrate SPCA-based similarity measurement, let the scaled database matrix (zero mean, unit variance) be $X_{database} \in \mathbb{R}^{s\times M}$, where M is the length of the database and s is the number of variables, and let the scaled measured vector be $x_{measured} \in \mathbb{R}^{s\times 1}$. First, a conventional PCA model is built on $X_{database}$. The loading matrix, $P \in \mathbb{R}^{s\times k}$, is derived via singular-value decomposition (SVD), where k is the number of retained PCs. The number of retained PCs is determined by the cumulative-percent-variance (CPV) criterion $\sum_{i=1}^{k}\lambda_i / \sum_{i=1}^{s}\lambda_i \geq \theta_1$, where $\lambda_i$ is the variance of the ith score vector and θ1 is a threshold; the smallest k satisfying this criterion is used. The T2 statistic of the first k PCs is calculated as $T^2_{measured} = x_{measured}^{\mathrm{T}}P\Lambda^{-1}P^{\mathrm{T}}x_{measured}$, where $\Lambda \in \mathbb{R}^{k\times k}$ is the diagonal matrix of the retained PC variances. It has been shown that PCs with small variance may be as important as those with large variance, so extracting information from variables with small variance is significant, especially when the number of process variables is large. To describe the variation in the direction of the mth PC, the change rate $R_{T^2_{m,a}}$ of $T^2_m$ on the ath measured vector is defined as
$$R_{T^2_{m,a}} = \frac{T^2_{m,a}}{\frac{1}{M}\sum_{j=1}^{M} T^2_{m,j}} \tag{10}$$
where $T^2_{m,a}$ is the T2 statistic of the mth PC (m = 1, 2, ..., k) on the ath measured sample. After the PCs are sorted in descending order of change rate, the number of sensitive PCs (SPCs) is determined by the rule $\sum_{m=1}^{l} R_{T^2_{m,a}} / \sum_{m=1}^{k} R_{T^2_{m,a}} \geq \theta_2$, where l is the number of retained SPCs, and θ2 is a threshold value to
measure the sensitivity of the retained PCs. If $\sum_{m=1}^{l} R_{T^2_{m,a}} / \sum_{m=1}^{k} R_{T^2_{m,a}} \geq \theta_2$ is satisfied, the first l SPCs of the database samples are selected to reconstruct the T2 statistic, $T^2_{database} = x_{database}^{\mathrm{T}}P'\Lambda'^{-1}P'^{\mathrm{T}}x_{database}$, where $x_{database} \in \mathbb{R}^{s\times 1}$, $P' \in \mathbb{R}^{s\times l}$, and $\Lambda' \in \mathbb{R}^{l\times l}$. The SPE statistic is obtained as $SPE_{database} = ee^{\mathrm{T}}$, where $e = x_{database}^{\mathrm{T}}(I - P'P'^{\mathrm{T}})$. The T2 and SPE statistics are then integrated into a new similarity measurement:
$$J = \rho T^2_{database} + (1 - \rho)SPE_{database} \tag{11}$$
where ρ ∈ [0, 1]. According to eq 11, the similarity $J_i$ can be calculated for each sample in the database, i = 1, 2, ..., M. The similarities are then sorted in descending order, and the samples with the largest similarity are selected according to the criterion
$$\frac{\sum_{i=1}^{k} J_i}{\sum_{i=1}^{M} J_i} \geq \theta_3 \tag{12}$$
where θ3 ∈ [0, 1] is a threshold on data similarity, and k here denotes the number of selected samples. The proposed JITL modeling (JITL(SPCA)) is shown in Figure 3, where the red points stand for the chosen samples and the black points for the discarded samples. Unlike conventional CoJITL modeling, the proposed method may choose some data that CoJITL modeling discards, which can contribute to better local modeling. Thus, sample selection in the proposed JITL modeling is more reasonable, and information loss is reduced.
Figure 3. Proposed JITL modeling.
3.2. Modified Regularized Slow-Feature Analysis. Because conventional SFA extracts only globally optimized features, its ability to retain local process information and to deal with dynamics is limited. We therefore integrate the local relationships of samples into the optimization problem of SFA. Motivated by the principle that SFA extracts slowly varying features by minimizing the differences of slow features, $\dot{t}_i$, SFA can be improved by additionally minimizing the variations among these differences. In this way, the improved SFA algorithm can extract "slower" features that better reveal the essence of the process. Moreover, the constraints should be designed so that the improved SFA preserves global information together with local information. The new optimization problem is
$$\arg\min_{W} \left(\sum_{i=2}^{t} \|\dot{t}_i\|_2^2 + \mu\sum_{i=2}^{t}\sum_{j=2}^{t} K_{(i-1)(j-1)}\|\dot{t}_i - \dot{t}_j\|_2^2\right) \quad \text{s.t.} \quad W^{\mathrm{T}}W = I \tag{13}$$
where K is a matrix that measures the locality of samples, and μ is a weight that balances temporal slowness against the regularization term. The first part of the objective is identical to SFA, whereas the second part acts as the regularization term. Matrix K preserves the locality of samples; in this paper, it is built from Euclidean distances. Note that most JITL-modeling methods do not use output information, which may waste useful information. Accordingly, output information is introduced into matrix K so that locality is preserved in both the input and the output data spaces. Let the temporal sequences of input and output be $X = [x_1, \ldots, x_t] \in \mathbb{R}^{s\times t}$ and $Y = [y_1, \ldots, y_t] \in \mathbb{R}^{L\times t}$, respectively, where L is the output dimension. Matrix K is defined as
$$K_{ij} = \exp\left(-\frac{(\eta\|x_i - x_j\| + (1 - \eta)\|y_i - y_j\|)^2}{\sigma}\right) \tag{14}$$
where η balances the input variables against the output variables, and σ regulates the neighboring correlations of local variables. Equation 14 is applied only if $x_i$ is among the k nearest neighbors of $x_j$ or $y_i$ is among the k nearest neighbors of $y_j$; otherwise, $K_{ij} = 0$. In this way, temporally related features carrying both global and local process information can be obtained, and full use is made of the input and output information. A simple calculation gives
$$\sum_{i=2}^{t}\|\dot{t}_i\|_2^2 + \mu\sum_{i=2}^{t}\sum_{j=2}^{t}K_{(i-1)(j-1)}\|\dot{t}_i - \dot{t}_j\|_2^2 = \operatorname{tr}(\dot{T}\dot{T}^{\mathrm{T}}) + \mu\operatorname{tr}[\dot{T}(\Lambda - K)\dot{T}^{\mathrm{T}}] = \operatorname{tr}\{\dot{T}[I + \mu(\Lambda - K)]\dot{T}^{\mathrm{T}}\} = \operatorname{tr}(\dot{T}D\dot{T}^{\mathrm{T}}) \tag{15}$$
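where Λ is a diagonal matrix with $\Lambda_{(i-1)(i-1)} = \sum_{j=2}^{t} K_{(i-1)(j-1)}$, i = 2, ..., t, and
$$D = I + \mu(\Lambda - K) \tag{16}$$
For symmetric K, the double sum in eq 15 equals $2\operatorname{tr}[\dot{T}(\Lambda - K)\dot{T}^{\mathrm{T}}]$; the constant factor of 2 is absorbed into μ.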
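The locality matrix of eq 14 and the matrices Λ and D of eq 16 can be assembled in a few lines. The NumPy sketch below is an illustration under stated assumptions, not the authors' MATLAB implementation: the function names, the neighbor count k = 5, the values of η, σ, and μ, and the synthetic data are hypothetical; the kNN mask is symmetrized (a pair is kept if either sample is a neighbor of the other) so that D stays symmetric; and each difference $\dot{x}_i$ is paired with its later sample $x_i$ when K is built. The last lines check eq 15 numerically, with the factor of 2 from the symmetric double sum written out explicitly.

```python
import numpy as np


def locality_matrix(X, Y, eta=0.5, sigma=1.0, k=5):
    """Locality matrix K of eq 14, gated by the kNN rule.

    X: (s, m) inputs and Y: (L, m) outputs, one sample per column.
    """
    m = X.shape[1]
    # Pairwise Euclidean distances in the input and output spaces, shape (m, m)
    dx = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    dy = np.linalg.norm(Y[:, :, None] - Y[:, None, :], axis=0)
    kernel = np.exp(-((eta * dx + (1.0 - eta) * dy) ** 2) / sigma)

    def knn_mask(d):
        # mask[i, j] is True when sample i is among the k nearest neighbours of sample j
        d = d + np.diag(np.full(m, np.inf))    # a sample is not its own neighbour
        rows = np.argsort(d, axis=0)[:k, :]
        cols = np.broadcast_to(np.arange(m), rows.shape)
        mask = np.zeros((m, m), dtype=bool)
        mask[rows, cols] = True
        return mask

    mask = knn_mask(dx) | knn_mask(dy)
    mask = mask | mask.T                       # assumption: symmetrised neighbour graph
    return np.where(mask, kernel, 0.0)


def regularization_matrices(K, mu):
    """Degree matrix Lambda and D = I + mu * (Lambda - K), as in eq 16."""
    Lam = np.diag(K.sum(axis=1))
    D = np.eye(K.shape[0]) + mu * (Lam - K)
    return Lam, D


# Usage on synthetic data (hypothetical sizes: s = 7 inputs, L = 1 output, t = 60)
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((7, 60)), axis=1)
Y = X[:2].sum(axis=0, keepdims=True)
Xdot = np.diff(X, axis=1)                      # differences x_i - x_{i-1}, shape (7, 59)
# Assumption: difference i is paired with its later sample when building K
K = locality_matrix(X[:, 1:], Y[:, 1:], sigma=50.0)
Lam, D = regularization_matrices(K, mu=0.1)

# Numerical check of eq 15 for an arbitrary projection W (factor 2 made explicit)
W = rng.standard_normal((7, 3))
Tdot = W.T @ Xdot
m = Tdot.shape[1]
double_sum = sum(K[i, j] * np.sum((Tdot[:, i] - Tdot[:, j]) ** 2)
                 for i in range(m) for j in range(m))
trace_form = 2.0 * np.trace(Tdot @ (Lam - K) @ Tdot.T)
print(np.isclose(double_sum, trace_form))      # True
```

Symmetrizing the mask is one common way to keep $\Lambda - K$ a proper graph Laplacian; without it, D is not symmetric and the eigenproblems that follow are no longer symmetric.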
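We can rewrite the optimization problem as
$$\arg\min_{W} \operatorname{tr}(\dot{T}D\dot{T}^{\mathrm{T}}) \quad \text{s.t.} \quad \dot{T}\Lambda\dot{T}^{\mathrm{T}} = I,\; W^{\mathrm{T}}W = I \tag{17}$$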
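For simplicity, the new optimization problem can be reformulated as
$$\arg\min_{W} \frac{\operatorname{tr}(\dot{T}D\dot{T}^{\mathrm{T}})}{\operatorname{tr}(\dot{T}\Lambda\dot{T}^{\mathrm{T}})} \quad \text{s.t.} \quad W^{\mathrm{T}}W = I \tag{18}$$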
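Considering eq 2, we then have
$$\arg\min_{W} \frac{\operatorname{tr}[W^{\mathrm{T}}(\dot{X}D\dot{X}^{\mathrm{T}})W]}{\operatorname{tr}[W^{\mathrm{T}}(\dot{X}\Lambda\dot{X}^{\mathrm{T}})W]} \quad \text{s.t.} \quad W^{\mathrm{T}}W = I \tag{19}$$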
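Equation 19 is a trace-ratio problem, and eq 20 relaxes it to a generalized symmetric eigenproblem whose eigenvectors with the smallest eigenvalues form W. A minimal NumPy sketch of that step, using a Cholesky reduction to a standard eigenproblem, is shown below. The back-transformed eigenvectors satisfy $W^{\mathrm{T}}(\dot{X}\Lambda\dot{X}^{\mathrm{T}})W = I$, which matches the constraint $\dot{T}\Lambda\dot{T}^{\mathrm{T}} = I$ of eq 17 but does not additionally enforce $W^{\mathrm{T}}W = I$. The function name, the small ridge term, and the plain Gaussian kernel used as a stand-in for K in the usage lines are assumptions, not details from the paper.

```python
import numpy as np


def resfa_fit(Xdot, D, Lam, n, ridge=1e-8):
    """Solve (Xdot D Xdot^T) w = omega (Xdot Lam Xdot^T) w; keep the n smallest omegas."""
    A = Xdot @ D @ Xdot.T
    B = Xdot @ Lam @ Xdot.T
    A = 0.5 * (A + A.T)                                # guard against round-off asymmetry
    B = 0.5 * (B + B.T) + ridge * np.eye(B.shape[0])   # assumption: ridge keeps B positive definite
    L = np.linalg.cholesky(B)                          # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                              # equivalent standard symmetric problem
    omegas, V = np.linalg.eigh(0.5 * (C + C.T))        # eigenvalues in ascending order
    W = Linv.T @ V                                     # back-transform, so W^T B W = I
    return W[:, :n], omegas[:n]


# Usage with a plain Gaussian kernel standing in for K (not the full eq 14 rule)
rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal((5, 80)), axis=1)    # s = 5 inputs, t = 80 samples
Xdot = np.diff(X, axis=1)
dist2 = ((Xdot[:, :, None] - Xdot[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-dist2 / 10.0)
np.fill_diagonal(K, 0.0)
Lam = np.diag(K.sum(axis=1))
D = np.eye(K.shape[0]) + 0.1 * (Lam - K)

W, omegas = resfa_fit(Xdot, D, Lam, n=2)
T = W.T @ X                                            # regularized slow features, eq 2
```

Because the eigenvalues are returned in ascending order, the first n columns give the directions with the smallest ratio in eq 19, that is, the slowest regularized features.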
Eventually, the mapping W is obtained by solving the generalized eigendecomposition
$$(\dot{X}D\dot{X}^{\mathrm{T}})W = (\dot{X}\Lambda\dot{X}^{\mathrm{T}})W\Omega \tag{20}$$
where Ω is a diagonal matrix of eigenvalues; the eigenvectors associated with the n smallest eigenvalues form W. The improved SFA algorithm is summarized in Table 1.
Table 1. Algorithm: Modified Regularized Slow-Feature Analysis
3.3. Quality-Prediction Method Based on Improved JITL−Regularized SFA. In this subsection, a novel SFA-based quality-prediction method is proposed to improve the prediction performance of the regression model. The overall procedure is as follows. Each sample collected from the process in real time is first normalized. Then, a similarity criterion based on SPCA is established for the measured sample. In SPCA, traditional PCA is first performed on the current measured data, and as many PCs as possible are retained to avoid information loss. Next, the change rate of the T2 statistic is tested for each retained PC according to eq 10, and the first several PCs with large change rates are preserved. On the basis of the preserved PCs and variables, the T2 and SPE statistics of each sample in the database are calculated, and the similarity of each historical sample is obtained according to eq 11. The most similar data are then chosen from the database according to the selection criterion in eq 12. After sample selection, the proposed ReSFA is performed on the chosen samples to establish a local model that approximates the regression relationship between the output and the extracted latent features. With these features, the regression coefficients for the current sample are calculated by least squares:
$$\theta = (TT^{\mathrm{T}})^{-1}TY_{selected} \tag{21}$$
where T contains the regularized slow features extracted from the selected samples, and $Y_{selected} \in \mathbb{R}^{t\times L}$ stacks the output variables of the selected samples row-wise. The predicted output is then
$$Y_{predicted} = \theta^{\mathrm{T}}W^{\mathrm{T}}x_{measured} \tag{22}$$
After the output of the current measured sample has been predicted, the historical database is updated with this sample. The next sample is then collected, and the procedure continues until no more samples can be collected or a fault occurs. The whole procedure is illustrated in Figure 4.
Figure 4. Procedure for the improved JITL-ReSFA(SPCA).
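The online part of the procedure (SPCA-based similarity, sample selection, and the local least-squares prediction of eqs 21 and 22) can be sketched as follows. This is a hypothetical NumPy illustration that follows the description in Sections 3.1 and 3.3 literally: the function names, the default thresholds θ1 = θ2 = 0.9 and θ3 = 0.8, ρ = 0.5, and the synthetic data are assumptions. Sorting the PCs by change rate before applying the θ2 rule is our reading of the text, and in the actual method W would come from ReSFA fitted on the selected samples rather than the identity stand-in used in the usage lines.

```python
import numpy as np


def spca_similarity(X_db, x_meas, theta1=0.9, theta2=0.9, rho=0.5):
    """Similarity J_i of eq 11 for every column of the scaled database X_db (s, M)."""
    s, M = X_db.shape
    # Conventional PCA by SVD; k PCs kept by the CPV rule with threshold theta1
    U, svals, _ = np.linalg.svd(X_db, full_matrices=False)
    lam = svals ** 2 / (M - 1)
    k = min(int(np.searchsorted(np.cumsum(lam) / lam.sum(), theta1)) + 1, lam.size)
    P, lam_k = U[:, :k], lam[:k]

    # Change rate of each PC's T2 on the measured sample, eq 10
    T2_db = (P.T @ X_db) ** 2 / lam_k[:, None]          # T2_{m,j}, shape (k, M)
    T2_meas = (P.T @ x_meas) ** 2 / lam_k                # T2_{m,a}, shape (k,)
    R = T2_meas / T2_db.mean(axis=1)

    # Sensitive PCs: largest change rates until their cumulative share reaches theta2
    order = np.argsort(R)[::-1]
    share = np.cumsum(R[order]) / R.sum()
    l = min(int(np.searchsorted(share, theta2)) + 1, k)
    P_s, lam_s = P[:, order[:l]], lam_k[order[:l]]

    # T2 and SPE of each database sample on the sensitive PCs, combined as in eq 11
    scores = P_s.T @ X_db
    T2 = np.sum(scores ** 2 / lam_s[:, None], axis=0)
    spe = np.sum((X_db - P_s @ scores) ** 2, axis=0)
    return rho * T2 + (1.0 - rho) * spe


def select_samples(J, theta3=0.8):
    """Eq 12: sort J in descending order; keep the shortest prefix whose share reaches theta3."""
    order = np.argsort(J)[::-1]
    share = np.cumsum(J[order]) / J.sum()
    return order[:min(int(np.searchsorted(share, theta3)) + 1, J.size)]


def local_predict(W, X_sel, Y_sel, x_meas):
    """Eqs 21 and 22: least squares on T = W^T X_sel, then y = theta^T W^T x."""
    T = W.T @ X_sel                                      # (n, t_sel)
    theta = np.linalg.solve(T @ T.T, T @ Y_sel)          # (n, L); Y_sel is (t_sel, L)
    return theta.T @ (W.T @ x_meas)                      # (L,)


# Usage on synthetic scaled data; W = I is a hypothetical stand-in for the ReSFA mapping
rng = np.random.default_rng(2)
s, M = 7, 300
X_db = rng.standard_normal((s, M))
X_db = (X_db - X_db.mean(axis=1, keepdims=True)) / X_db.std(axis=1, ddof=1, keepdims=True)
beta = rng.standard_normal((s, 1))
Y_db = X_db.T @ beta                                     # noise-free outputs, (M, 1)
x_meas = rng.standard_normal(s)

J = spca_similarity(X_db, x_meas)
idx = select_samples(J)
y_hat = local_predict(np.eye(s), X_db[:, idx], Y_db[idx], x_meas)
```

With noise-free linear outputs and W = I, the local least-squares fit recovers the true coefficients, so the prediction equals the exact output; this makes the sketch easy to sanity-check before a real ReSFA mapping is plugged in.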
4. CASE STUDY AND DISCUSSION
In the industrial PTA-production process, the byproduct 4-CBA in TA has an unfavorable effect on polymerization, so the 4-CBA content in the final product of the CTA-hydropurification process can be considered an important quality index of the process. However, the 4-CBA content cannot be measured online using traditional methods. To better evaluate the whole PTA-production process, this paper
proposes an online quality-prediction method using an improved just-in-time-learning-based regularized slow-feature regression model (JITL-ReSFA(SPCA)) to predict the 4-CBA content in the final product, PTA. All computations were run on a computer with Windows 7 (64-bit), an Intel Core i5-2430 (2.40 GHz) CPU, 4 GB of RAM, and MATLAB 2017a.
4.1. Selection of Variables. Before modeling and quantitative analysis with the proposed method, it is important to select the variables of the PTA-production process. Thanks to the DCS of the industrial CTA-hydropurification process, some variables can be measured directly online: the feed flow rate (ton/h), the TA concentration (%), the hydrogen flow rate (kg/h), and the reactor temperature (°C). Furthermore, Li et al.12 pointed out that the reactor pressure (MPa) and catalyst activity (kmol kgc−1 s−1) should be taken into consideration. However, reactor-pressure data are difficult to obtain because the pressure changes quickly and fluctuates around a constant value during normal operation; in our case, this constant value is about 7.9 MPa. Thus, we constructed the reactor-pressure values with stochastic volatility according to Li et al.12 Additionally, catalyst-activity values are hard to obtain from the process in real time. For a better description of the CTA-hydropurification process, we rely on the catalyst-deactivation model of Li et al.,1 which can be described as
$$k_{new} = k_{old} - \eta t^2 \tag{23}$$
where $k_{\text{new}}$ is the new catalyst activity, $k_{\text{old}} = 0.7$ is the old catalyst activity, and $\eta = 8 \times 10^{-9}$ is a constant parameter obtained from real plant data and simulations. Furthermore, the 4-CBA concentration (%) in TA and the 4-CBA concentration (ppm) in PTA are hard to measure online, but they can be obtained through laboratory testing. Finally, seven variables were chosen as inputs for the soft-computing model: the mass flow of CTA (kg/h), the mass flow of deionized water (kg/h), the mass flow of hydrogen (kg/h), the reactor temperature (°C), the reactor pressure (MPa), the 4-CBA content in CTA (kg/h), and the catalyst activity (kmol kgc−1 s−1). One output variable, the 4-CBA content in PTA (kg/h), is to be predicted. The variables are described in Table 2, and both the inputs and the output are shown in Figure 5.
Figure 5. Eight measurements in the CTA-hydropurification process: (a) CTA mass flow, (b) water mass flow, (c) hydrogen mass flow, (d) reactor temperature, (e) 4-CBA content in CTA, (f) reactor pressure, (g) catalyst activity, and (h) 4-CBA content in PTA.
4.2. Data Preprocessing. After the input and output variables were determined, the data had to be preprocessed before modeling. In our case, a total of 633 samples (about 3.5 months) were collected from an industrial CTA-hydropurification process at a sampling interval of 4 h. To approximate the real process, additive white Gaussian noise (AWGN) was added to the output 4-CBA data at a signal-to-noise ratio of 20 dB. In a real process, outliers in the collected samples are unavoidable. They can cause model mismatch and degrade prediction performance to a certain extent, so the raw data must be preprocessed in advance. The 3σ criterion is an effective method for detecting outliers in industrial processes. We therefore applied it to the sampled data to identify outliers, which were then discarded. After preprocessing, 595 samples remained.
4.3. Quality Prediction for 4-CBA Content in the CTA-Hydropurification Process Based on Modified JITL−Regularized SFA. In this section, the proposed JITL-ReSFA-based quality-prediction model is applied to predict the 4-CBA content in the final product of an industrial CTA-hydropurification process. Its estimation performance is compared with those of several existing methods. The root-mean-squared error (RMSE), the coefficient of determination (R²), and the maximum absolute error (MAE) are used to quantitatively assess the performance of the different quality-prediction methods.
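The three criteria can be computed directly from the true and predicted outputs. Below is a minimal NumPy sketch of eqs 24−26 as they are defined for this study. Note that "MAE" here denotes the maximum absolute error, not the mean absolute error. The function names `rmse`, `r2`, and `max_abs_error` are hypothetical and are not taken from the authors' MATLAB code.

```python
import numpy as np


def rmse(y, y_hat):
    """Root-mean-squared error (eq 24)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y, y_hat):
    """Coefficient of determination (eq 25); negative when worse than predicting the mean."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def max_abs_error(y, y_hat):
    """Maximum absolute error, called MAE in this paper (eq 26)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.max(np.abs(y - y_hat)))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred), max_abs_error(y_true, y_pred))
```

With these toy vectors, only the last prediction is off, by 1. This gives RMSE = √(1/4) = 0.5, R² = 1 − 1/5 = 0.8, and a maximum absolute error of 1.0. Because R² compares the residuals with the spread of the test outputs, it becomes strongly negative when a model's errors are much larger than that spread.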
Table 2. Description of the Variables for the CTA-Hydropurification Process
description                    unit
CTA mass flow                  kg/h
deionized-water mass flow      kg/h
H2 mass flow                   kg/h
reactor temperature            °C
reactor pressure               MPa
4-CBA content in CTA           kg/h
catalyst activity              kmol kgc−1 s−1
4-CBA content in PTA           kg/h
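The synthetic catalyst-activity input of eq 23 and the preprocessing steps of section 4.2 can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation:

- t is assumed to be in hours and measured from the start of the data record, because the time unit of η is not stated here.
- The SNR is assumed to be the ratio of mean signal power to noise power, similar in spirit to MATLAB's awgn with the 'measured' option.
- The 3σ rule is assumed to discard a sample when any variable lies more than three standard deviations from its mean.
- The output profile `y_clean` is a hypothetical placeholder.

```python
import numpy as np

K_OLD = 0.7   # old catalyst activity, from the text
ETA = 8e-9    # deactivation constant, from the text (time unit assumed: hours)


def catalyst_activity(t):
    """Eq 23: k_new = k_old - eta * t**2, with t in hours (assumption)."""
    return K_OLD - ETA * np.asarray(t, dtype=float) ** 2


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the given SNR, based on the measured mean signal power."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)


def three_sigma_mask(data):
    """True for samples whose every variable lies within mean +/- 3 std (assumed rule)."""
    mu = data.mean(axis=0)
    sd = data.std(axis=0)
    return np.all(np.abs(data - mu) <= 3.0 * sd, axis=1)


rng = np.random.default_rng(0)
t = np.arange(633) * 4.0                    # 633 samples, one every 4 h
k = catalyst_activity(t)                    # synthetic catalyst-activity input
y_clean = 1.0 + 0.1 * np.sin(t / 200.0)     # hypothetical 4-CBA output profile
y_noisy = add_awgn(y_clean, snr_db=20.0, rng=rng)
data = np.column_stack([k, y_noisy])
keep = three_sigma_mask(data)
clean_data = data[keep]                     # samples retained after 3-sigma screening
print(k[0], k[-1], clean_data.shape)
```

Under the hour assumption, eq 23 lowers the activity from 0.7 to about 0.649 over the 633 samples (t = 0 to 2528 h). If t were measured in another unit, η would have to be rescaled to keep the same deactivation curve.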
DOI: 10.1021/acs.iecr.8b01270 Ind. Eng. Chem. Res. XXXX, XXX, XXX−XXX
$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}} \tag{24}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}} \tag{25}$$

$$\text{MAE} = \max_{i}\left\{\left|y_i - \hat{y}_i\right|\right\} \tag{26}$$

where $y_i$ is the real value of the output variable, $\hat{y}_i$ is the predicted output, $\bar{y}$ is the mean of the output, and $N$ is the number of query samples. For comparison, the training and testing sets contain 300 and 295 samples, respectively, for all methods. To verify the effectiveness of the proposed method, qualitative and quantitative analyses are conducted against several existing methods and several modified SFA algorithms. The existing methods are PLS, kernel PLS (KPLS), recursive KPLS (RKPLS), ELM, and SFA. The modified SFA methods are kernel SFA (KSFA), moving-window SFA (MW-SFA), regularized SFA (ReSFA), and MW-ReSFA. To further demonstrate the validity of the proposed JITL-modeling method and the proposed similarity criterion, two classical JITL-modeling methods are compared with the proposed correlation-based JITL modeling with SPCA. These are JITL modeling based on Euclidean distances and angles, and correlation-based JITL modeling. For a fair comparison, all three JITL-modeling methods use ReSFA for local modeling.
Before comparing the performance of these methods, we give their parameters, which are of great significance. These parameters are chosen empirically, for two reasons. (1) The parameters in the compared methods need to be identical to those in the proposed method, and some of them might differ if each were chosen optimally. (2) If the parameters were optimized, the proposed method would incur unsatisfactory computation time, and its online prediction performance would deteriorate. The parameters are as follows:
(1) For KPLS, RKPLS, and KSFA, Gaussian kernel functions are used for a better evaluation of the prediction performance of these methods. The kernel bandwidth cannot be set too large. With a very large bandwidth, the kernel matrix approximates an all-ones matrix, which weakens the ability to deal with nonlinearity. Therefore, we set the bandwidth to 50, which is identical to the bandwidth in ReSFA when σ is set to 2500.
(2) For ELM, the activation functions of the hidden nodes are all Gaussian kernel functions. Apart from the activation function, the other important parameter of ELM is the number of hidden nodes. In practice, ELM can achieve good generalization performance if it has enough hidden nodes; thus, we set the number of hidden nodes to 2000.
(3) For ReSFA, μ = 0.7 and σ = 2500.
(4) For MW-SFA and MW-ReSFA, the window size is 300.
(5) For JITL-ReSFA(D&A), α = 0.6, μ = 0.7, and σ = 2500.
(6) For JITL-ReSFA(PCA), θ1 = 0.99, θ3 = 0.7, μ = 0.7, ρ = 0.6, η = 0.6, and σ = 2500.
(7) For JITL-ReSFA(SPCA), θ1 = 0.99, θ2 = 0.5, θ3 = 0.7, μ = 0.7, ρ = 0.6, η = 0.6, and σ = 2500.
With the parameters fixed, the methods are compared for three main purposes:
(1) to demonstrate conventional SFA’s ability to handle process dynamics;
(2) to verify that the proposed ReSFA outperforms traditional SFA in dealing with dynamical processes;
(3) to validate that the proposed JITL-modeling method leads to less information loss and better sample selection than classical JITL-modeling methods.
The quantitative results of all methods are given in Table 3.
Table 3. Error Comparisons
method              RMSE       R²             MAE
PLS                 0.8149     −1.3771        2.1693
KPLS                94.8290    −32 188.0000   378.7900
RKPLS               0.3211     0.6310         0.9239
ELM                 9.0224     −281.8828      7.2004
SFA                 0.9477     −2.2150        3.2293
KSFA                44.1230    −6967.6000     127.8300
MW-SFA              0.3726     0.5030         1.3358
ReSFA               0.3268     0.6177         0.9912
MW-ReSFA            0.3181     0.6379         0.9422
JITL-ReSFA(D&A)     0.3591     0.5236         1.0848
JITL-ReSFA(PCA)     0.3063     0.6664         0.9155
JITL-ReSFA(SPCA)    0.3029     0.6715         0.9045
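Two points in the kernel-parameter discussion can be checked numerically: the equivalence of a bandwidth of 50 and σ = 2500, and the all-ones-matrix argument. The sketch below makes three assumptions:

- ReSFA's Gaussian kernel is assumed to be written as exp(−‖x − y‖²/σ).
- The bandwidth form is assumed to be exp(−‖x − y‖²/b²), so that b = √σ = 50. The exact kernel forms used by the authors are not given in this section.
- The input matrix is random placeholder data with seven variables, not plant data.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between the rows of X and Y."""
    return np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)


def gaussian_kernel_sigma(X, Y, sigma):
    """exp(-||x - y||^2 / sigma): assumed sigma parameterization (ReSFA)."""
    return np.exp(-sq_dists(X, Y) / sigma)


def gaussian_kernel_bandwidth(X, Y, b):
    """exp(-||x - y||^2 / b^2): assumed bandwidth parameterization."""
    return np.exp(-sq_dists(X, Y) / b ** 2)


rng = np.random.default_rng(1)
X = rng.normal(scale=20.0, size=(50, 7))          # placeholder data, 7 inputs

K_sigma = gaussian_kernel_sigma(X, X, sigma=2500.0)
K_b50 = gaussian_kernel_bandwidth(X, X, b=50.0)
K_huge = gaussian_kernel_bandwidth(X, X, b=1e5)   # deliberately oversized bandwidth

print(np.allclose(K_sigma, K_b50))    # True, because b**2 = 2500 = sigma
print(np.abs(K_huge - 1.0).max())     # tiny: the matrix is nearly all ones
```

When every entry of the kernel matrix is close to 1, all samples look equally similar. The kernel model then loses its ability to represent nonlinear structure, which is why the bandwidth is capped.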
To demonstrate conventional SFA’s efficiency in coping with dynamics, we compared the prediction performance of SFA with those of PLS, KPLS, and ELM. The plot comparisons are shown in Figure 6. The qualitative comparisons in Figure 6 show that PLS and SFA perform better than ELM and KPLS. This is because the nonlinearity of the CTA-hydropurification process is not very strong, and PLS and SFA, as linear methods, can handle weak nonlinearity reasonably well. The quantitative comparisons in Table 3 show that PLS performs better overall than SFA over the long term (RMSE of 0.8149 vs 0.9477). Notably, however, SFA seems to be very effective in the early stage of the process. To verify this, we used the first 100 samples of the testing set to simulate the early running conditions of the CTA-hydropurification process. The results are given in Figure 7 and Table 4.
Figure 6. Plot comparisons: (a) SFA, (b) PLS, (c) KPLS, and (d) ELM.
Figure 7. Plot comparisons in the first 100 samples of the test data: (a) SFA and (b) PLS.
Table 4. Error Comparisons between PLS and SFA in the First 100 Samples of the Test Data
method    RMSE      R²         MAE
PLS       0.7846    −2.3372    1.7700
SFA       0.3073    0.4882     0.8372
Table 4 shows that the RMSE of SFA is smaller than that of PLS, indicating that SFA outperforms PLS in the early phase of the process. SFA predicts more accurately there because it exploits dynamic information to some extent. However, the extracted dynamic information degrades with time. Because SFA fails to exploit local process information, model mismatch may occur over a long enough time span. A vertical comparison of SFA, PLS, KPLS, and ELM is given in Figure 8a, which also shows qualitatively that the overall performance of PLS is better than that of SFA.
To better evaluate the proposed ReSFA method, ReSFA and MW-ReSFA are compared with SFA, some classical modified SFA algorithms, and RKPLS. The plot comparisons of these methods are depicted in Figure 9, and vertical comparisons are provided in Figure 8b,c. The prediction results in Table 3 show that the proposed ReSFA method effectively alleviates this degradation of SFA. Because ReSFA extracts both global and local information, the dynamics of the process can be modeled well. The moving-window strategy does help SFA deal with process dynamics. Even so, the predictive accuracy of MW-SFA is lower than that of ReSFA and, a fortiori, lower than that of MW-ReSFA. This shows that SFA itself is inherently limited in dealing with dynamics. Moreover, ReSFA, a method without an adaptive mechanism, approaches the accuracy of RKPLS (RMSE of 0.3268 vs 0.3211), which further validates the effectiveness of the proposed ReSFA method.
Figure 9. Plot comparisons: (a) SFA, (b) KSFA, (c) MW-SFA, (d) RKPLS, (e) MW-ReSFA, and (f) ReSFA.
Additional experiments were carried out for the classical JITL-modeling methods and the proposed JITL-modeling method. JITL-ReSFA(D&A) does not give satisfactory predictive accuracy and even performs worse than ReSFA. Its similarity measurements are based only on the distances and angles between samples and do not take the correlation among samples into account. Hence, in JITL(D&A) modeling, the samples chosen for local modeling may not resemble the query data, which degrades the prediction model. In contrast, JITL-ReSFA(PCA) improves on this, as it uses correlations together with distances among the sample data. Furthermore, the proposed JITL-ReSFA(SPCA) method achieves a higher prediction accuracy than JITL-ReSFA(PCA). This comparison indicates that the proposed JITL-modeling approach is better suited to constructing local models and yields more appropriate similarity measurements. The plot comparisons between the three JITL approaches are illustrated in Figure 10, and a vertical comparison is shown in Figure 8d.
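The JITL idea behind the D&A baseline is to build a local model, for each query, only from the historical samples most similar to it. The sketch below illustrates that idea under explicit assumptions:

- The similarity is assumed to take the distance-and-angle form s_i = α·√(exp(−d_i²)) + (1 − α)·cos θ_i, with α = 0.6 as in the parameter list.
- The angle is taken between the raw query and sample vectors.
- Ordinary least squares stands in for the ReSFA local model.
- `da_similarity`, `jitl_predict`, and `n_local` are hypothetical names.

This sketch does not reproduce the proposed correlation-based SPCA criterion.

```python
import numpy as np


def da_similarity(X, x_q, alpha=0.6):
    """Assumed distance-and-angle similarity between query x_q and each row of X.

    s_i = alpha * sqrt(exp(-d_i**2)) + (1 - alpha) * cos(theta_i)
    """
    d = np.linalg.norm(X - x_q, axis=1)
    dist_term = np.exp(-0.5 * d ** 2)                  # equals sqrt(exp(-d**2))
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(x_q)
    cos_theta = (X @ x_q) / np.maximum(norms, 1e-12)   # guard against zero vectors
    return alpha * dist_term + (1.0 - alpha) * cos_theta


def jitl_predict(X, y, x_q, n_local=30, alpha=0.6):
    """Pick the n_local most similar samples and fit a local linear model.

    Ordinary least squares is only a stand-in for the ReSFA local model.
    """
    s = da_similarity(X, x_q, alpha)
    idx = np.argsort(s)[::-1][:n_local]                # most similar samples first
    A = np.column_stack([np.ones(len(idx)), X[idx]])   # intercept + inputs
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return float(np.concatenate(([1.0], x_q)) @ coef)


# Hypothetical usage on synthetic data with a piecewise-linear target
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] > 0, 2.0 * X[:, 1], -X[:, 2])
x_q = np.array([1.0, 0.5, -0.3])
print(jitl_predict(X, y, x_q, n_local=30))
```

A new local model is fitted for every query, which is what lets JITL track nonlinearity and drift. It is also why the choice of similarity measure matters: if dissimilar samples are selected, as the text argues for the D&A criterion, the local fit degrades.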
Figure 8. Vertical comparisons of methods: (a) PLS, KPLS, ELM, and SFA; (b) SFA, KSFA, and ReSFA; (c) RKPLS, MW-SFA, and MW-ReSFA; and (d) JITL-ReSFA(D&A), JITL-ReSFA(PCA), and JITL-ReSFA(SPCA).
Figure 10. Plot comparisons between three JITL-modeling-based ReSFAs: (a) distance- and angle-based JITL, (b) correlation-based JITL, and (c) the proposed JITL.
5. CONCLUSION
In this paper, a novel modified regularized-slow-feature-analysis method based on a just-in-time-learning framework is proposed for 4-CBA-content prediction in industrial CTA-hydropurification processes. In the proposed method, a novel JITL framework is put forward to improve sample selection, and correlations among the data are taken into consideration. Furthermore, to minimize information loss, sensitive PCA (SPCA) is introduced into correlation-based JITL (CoJITL) modeling. SPCA overcomes a shortcoming of PCA by capturing information from variables with small variance. After sample selection, a modified regularized slow-feature analysis (ReSFA) is used to establish local models by approximating the regression relationships between the outputs and the extracted latent features. The proposed ReSFA considers local information together with input and output information. Finally, a case study on predicting the 4-CBA content in the final product of the CTA-hydropurification process is carried out to evaluate the predictive accuracy of the proposed JITL-ReSFA(SPCA) method. The results demonstrate, both quantitatively and qualitatively, the superiority of the proposed method over the other methods compared. One limitation is worth noting. To construct the similarity criterion, whether PCA or SPCA is used, the process must be assumed in advance to approximately follow a normal distribution. If the process does not follow a normal distribution, this method may give poor or wrong predictions.
AUTHOR INFORMATION
Corresponding Author
*Tel.: +86-21-64252640. E-mail: [email protected].
ORCID
Weimin Zhong: 0000-0002-4285-4739
Feng Qian: 0000-0003-2781-332X
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
This work was supported by the National Natural Science Foundation of China (Key Program 61333010), the National Science Fund for Distinguished Young Scholars (61725301), the International (Regional) Cooperation and Exchange Project (61720106008), the Program of Introducing Talents of Discipline to Universities (the 111 Project) under Grant B17017, Fundamental Research Funds for the Central Universities (222201814041), and Shanghai Sailing Program (18YF1405200).
REFERENCES
(1) Li, Z.; Zhong, W.; Wang, X.; Luo, N.; Qian, F. Control structure design of an industrial crude terephthalic acid hydropurification process with catalyst deactivation. Comput. Chem. Eng. 2016, 88 (6), 1−12.
(2) Azarpour, A.; Zahedi, G. Performance analysis of crude terephthalic acid hydropurification in an industrial trickle-bed reactor experiencing catalyst deactivation. Chem. Eng. J. 2012, 209 (41), 180−193.
(3) Qian, F.; Tao, L.; Sun, W.; Du, W. Development of a Free Radical Kinetic Model for Industrial Oxidation of p-Xylene Based on Artificial Neural Network and Adaptive Immune Genetic Algorithm. Ind. Eng. Chem. Res. 2012, 51 (8), 3229−3237.
(4) Pellegrini, R.; Agostini, G.; Groppo, E.; Piovano, A.; Leofanti, G.; Lamberti, C. 0.5 wt.% Pd/C catalyst for purification of terephthalic acid: Irreversible deactivation in industrial plants. J. Catal. 2011, 280 (2), 150−160.
(5) Jhung, S. H.; Romanenko, A. V.; Lee, K. H.; Park, Y. S.; Moroz, E. M.; Likholobov, V. A. Carbon-supported palladium-ruthenium catalyst for hydropurification of terephthalic acid. Appl. Catal., A 2002, 225 (1), 131−139.
(6) Liu, Y.; Gao, Z.; Li, P.; Wang, H. Just-in-Time Kernel Learning with Adaptive Parameter Selection for Soft Sensor Modeling of Batch Processes. Ind. Eng. Chem. Res. 2012, 51 (11), 4313−4327.
(7) Tong, C.; Lan, T.; Shi, X. Soft sensing of non-Gaussian processes using ensemble modified independent component regression. Chemom. Intell. Lab. Syst. 2016, 157, 120−126.
(8) Yuan, X.; Ge, Z.; Huang, B.; Song, Z.; Wang, Y. Semi-supervised JITL framework for nonlinear industrial soft sensing based on locally semi-supervised weighted PCR. IEEE Trans. Ind. Inform. 2017, 13 (2), 532−541.
(9) Ge, Z. Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemom. Intell. Lab. Syst. 2017, 171, 16−25.
(10) Ge, Z.; Song, Z.; Ding, S. X.; Huang, B. Data Mining and Analytics in the Process Industry: The Role of Machine Learning. IEEE Access 2017, 5 (99), 20590−20616.
(11) Ge, Z. Distributed predictive modeling framework for prediction and diagnosis of key performance index in plant-wide processes. J. Process Control 2018, 65, 107−117.
(12) Li, Z.; Zhong, W.; Peng, X.; Du, W.; Qian, F. Soft sensor based on recursive kernel partial least squares for 4-carboxybenzaldehyde of an industrial terephthalic acid hydropurification process. Chem. Eng. Trans. 2017, 61, 463−468.
(13) Yao, L.; Ge, Z. Deep Learning of Semi-supervised Process Data with Hierarchical Extreme Learning Machine and Soft Sensor Application. IEEE Trans. Ind. Electron. 2018, 65 (2), 1490−1498.
(14) Shang, C.; Yang, F.; Huang, D.; Lyu, W. Data-driven soft sensor development based on deep learning technique. J. Process Control 2014, 24 (3), 223−233.
(15) Kaneko, H.; Funatsu, K. Application of online support vector regression for soft sensors. AIChE J. 2014, 60 (2), 600−612.
(16) Ni, W.; Tan, S. K.; Ng, W. J.; Brown, S. D. Moving-Window GPR for Nonlinear Dynamic System Modeling with Dual Updating and Dual Preprocessing. Ind. Eng. Chem. Res. 2012, 51 (18), 6416−6428.
(17) Wiskott, L.; Sejnowski, T. J. Slow feature analysis: unsupervised learning of invariances. Neural Comput. 2002, 14 (4), 715−770.
(18) Blaschke, T.; Zito, T.; Wiskott, L. Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Comput. 2007, 19 (4), 994−1021.
(19) Sun, L.; Jia, K.; Chan, T. H.; Fang, Y.; Wang, G.; Yan, S. DLSFA: Deeply-Learned Slow Feature Analysis for Action Recognition. In Proc. IEEE CVPR 2014, 2625−2632.
(20) Shang, C.; Huang, B.; Yang, F.; Huang, D. Slow feature analysis for monitoring and diagnosis of control performance. J. Process Control 2016, 39, 21−34.
(21) Shang, C.; Yang, F.; Gao, X.; Huang, X.; Suykens, J. A. K.; Huang, D. Concurrent monitoring of operating condition deviations and process dynamics anomalies with slow feature analysis. AIChE J. 2015, 61 (11), 3666−3682.
(22) Shang, C.; Huang, B.; Yang, F.; Huang, D. Probabilistic slow feature analysis-based representation learning from massive process data for soft sensor modeling. AIChE J. 2015, 61 (12), 4126−4139.
(23) Fan, L.; Kodamana, H.; Huang, B. Identification of robust probabilistic slow feature regression model for process data contaminated with outliers. Chemom. Intell. Lab. Syst. 2018, 173, 1−13.
(24) Böhmer, W.; Grünewälder, S.; Nickisch, H.; Obermayer, K. Generating feature spaces for linear algorithms with regularized sparse kernel slow feature analysis. Mach. Learn. 2012, 89 (1−2), 67−86.
(25) Yao, L.; Ge, Z. Moving window adaptive soft sensor for state shifting process based on weighted supervised latent factor analysis. Control Eng. Pract. 2017, 61, 72−80.
(26) Shang, C.; Yang, F.; Huang, B.; Huang, D. Recursive Slow Feature Analysis for Adaptive Monitoring of Industrial Processes. IEEE Trans. Ind. Electron. 2018, 65, 8895.
(27) Kaneko, H.; Funatsu, K. Development of Soft Sensor Models Based on Time Difference of Process Variables with Accounting for Nonlinear Relationship. Ind. Eng. Chem. Res. 2011, 50 (18), 10643−10651.
(28) Peng, X.; Tang, Y.; Du, W.; Qian, F. An Online Performance Monitoring and Modeling Paradigm based on Just-in-time Learning and Extreme Learning Machine for Non-Gaussian Chemical Process. Ind. Eng. Chem. Res. 2017, 56 (23), 6671−6684.
(29) Peng, X.; Tang, Y.; He, W.; Du, W.; Qian, F. A Just-in-Time Learning based Monitoring and Classification Method for Hyper/Hypocalcemia Diagnosis. IEEE/ACM Trans. Comput. Biol. Bioinf. 2018, 15 (3), 788−801.
(30) Zhang, J.; Yim, Y. S.; Yang, J. Intelligent selection of instances for prediction functions in lazy learning algorithms. Artif. Intell. Rev. 1997, 11 (1−5), 175−191.
(31) Zheng, Q.; Kimura, H. Just-in-Time Modeling for Function Prediction and Its Applications. Asian J. Control 2001, 3 (1), 35−44.
(32) Cheng, C.; Chiu, M. S. A new data-based methodology for nonlinear process modeling. Chem. Eng. Sci. 2004, 59 (13), 2801−2810.
(33) Fujiwara, K.; Kano, M.; Hasebe, S.; Takinami, A. Soft-sensor development using correlation-based just-in-time modeling. AIChE J. 2009, 55 (7), 1754−1765.
(34) Yuan, X.; Ge, Z.; Huang, B.; Song, Z. A Probabilistic Just-in-Time Learning Framework for Soft Sensor Development With Missing Data. IEEE Trans. Control Syst. Technol. 2017, 25 (3), 1124−1132.
(35) Tyréus, B. D. Dominant Variables for Partial Control. 1. A Thermodynamic Method for Their Identification. Ind. Eng. Chem. Res. 1999, 38 (4), 1432−1443.
(36) Jiang, Q.; Yan, X.; Zhao, W. Fault Detection and Diagnosis in Chemical Processes Using Sensitive Principal Component Analysis. Ind. Eng. Chem. Res. 2013, 52 (4), 1635−1644.