
Quality-related Statistical Process Monitoring Based on Global Plus Local Projection to Latent Structures. Jinglin Zhou, Shunli Zhang, Han Zhang, and Jing Wang. Ind. Eng. Chem. Res., Just Accepted Manuscript, DOI: 10.1021/acs.iecr.7b04554, Publication Date (Web): March 25, 2018.


A Quality-related Statistical Process Monitoring Method Based on Global Plus Local Projection to Latent Structures†

J. L. Zhou, S. L. Zhang, H. Zhang, and J. Wang∗

The College of Information Science and Technology, Beijing University of Chemical Technology, 100029, Beijing, China. E-mail: [email protected]

Abstract

The partial least squares (PLS) method is widely used for quality monitoring of process control systems, but it performs poorly on systems with strong local nonlinearity. To enhance the monitoring of such nonlinear systems, a novel statistical model based on global plus local projection to latent structures (GPLPLS) is proposed. First, the characteristics of the quality-related global and local partial least squares (QGLPLS) method are analyzed: its principal components preserve the local structural information within each data set as much as possible, but not the correlation between the data sets. The GPLPLS model, in contrast, pays more attention to the correlation of the extracted principal components. GPLPLS extracts the maximum linear correlation information and, at the same time, extracts as much of the local nonlinear structural correlation between the process and quality variables as possible. A corresponding quality-relevant monitoring strategy is then established. Finally, the validity and effectiveness of the GPLPLS-based statistical model are illustrated through the Tennessee Eastman process (TEP) simulation platform. The experimental results demonstrate that the proposed model maintains the local properties of the original data as much as possible and gives better monitoring results than PLS and QGLPLS.

† Corresponding author e-mail: [email protected]. ORCID J. Wang: 0000-0002-6847-8452. Notes: The authors declare no competing financial interest. ∗ To whom correspondence should be addressed.

Introduction

Owing to rising demands on process operation and product quality, modern industrial processes have become more complicated and produce large numbers of process and quality variables. Fault detection and diagnosis are therefore essential for complex industrial processes. Because the quality variables are measured at a much lower frequency and usually with a significant time delay,6 data-driven statistical process monitoring plays an important role in fault detection and diagnosis by extracting useful information from the highly correlated process and quality variables.1-5 Monitoring the process variables related to the quality variables is important for finding potential problems that may lead to system shutdown with possibly enormous economic loss.

Currently, partial least squares (PLS), one of these data-driven methods,7-10 is widely used for quality-relevant process monitoring because of its ability to extract latent variables that establish the relationship between the input and output spaces.11 The T^2 and SPE statistics are used to signal faults during monitoring. Although PLS works well in many cases, it has two problems. First, the score vectors that form the T^2 statistic contain variations that are irrelevant to the output variables. Second, the input space still contains large variations after the PLS decomposition, so it is inappropriate to use the SPE statistic for quality-relevant process monitoring. The total PLS (T-PLS) was proposed to resolve these issues, and the recursive T-PLS12 was then proposed to adapt the model to the process over time. Qin et al. noted that T-PLS decomposes the process data into four subspaces unnecessarily and proposed the concurrent projection to latent structures (C-PLS) method. However, PLS is in essence a linear projection, which is not applicable to nonlinear systems.


Currently, some other methods related to PLS have been proposed for quality-relevant fault detection in nonlinear processes. The kernel partial least squares (KPLS) method, which maps the original data onto a high-dimensional feature space, is one such nonlinear PLS model.13,14 KPLS effectively handles the nonlinear relationship between the principal components of the input and output spaces. The kernel concurrent canonical correlation analysis (KCCCA) algorithm was proposed for quality-relevant nonlinear process monitoring and also considers the nonlinearity in the quality variables.15 Other methods include the neural network PLS (NNPLS), which introduces nonlinearity between the input and output score vectors,16,17 the quadratic PLS (QPLS),18 the recursive nonlinear PLS (RNPLS),19 and the nonlinear PLS with slice-transformation (SLT)-based piecewise-linear inner relation (NPLS-SLT)20 model. However, these algorithms suffer from different shortcomings; for example, the kernel function of KPLS is difficult to choose, and optimizing the neural network in NNPLS is computationally complex.

Because PLS and its extensions focus only on global structural information and cannot extract the local adjacent structural information of the data, they are not well suited to extracting nonlinear features. Local linearization is therefore considered for dealing with nonlinear problems. In recent years, locality preserving projections (LPP),21,22 which belong to the manifold learning family, have been proposed to capture local adjacent structural features and effectively make up for this deficiency. The LPP method preserves local features by projecting the global structure into an approximately linear space and by constructing a neighborhood graph to explore the inherent geometric features and manifold structure of the sample data. However, LPP does not consider the overall structure and lacks a detailed analysis of the correlation between the process and quality variables. Combining the PLS and LPP methods has therefore become a topic of growing interest for engineers.


Regarding the combination of global and local information, Zhong et al. proposed a quality-related global and local partial least squares (QGLPLS) model.23 The QGLPLS method integrates the advantages of locality preserving projections (LPP) and partial least squares (PLS) and extracts meaningful low-dimensional representations from the high-dimensional process and quality data. The principal components of QGLPLS preserve the local structural information within their respective data sets as much as possible. However, the correlation between the process and quality variables is not enhanced, and the constraints of LPP are removed from the optimization objective function, so the monitoring results are seriously affected. After further analysis of the geometric characteristics of LPP and PLS, Wang et al. proposed a new integrated method, the locality-preserving partial least squares (LPPLS) model,24 which pays more attention to the locality-preserving characteristics. LPPLS can exploit the underlying geometric structure, which contains the local characteristics, in the input and output spaces. Although maximizing the degree of correlation between the process and quality variables was considered, the global characteristics were converted into a combination of multiple locally linearized characteristics rather than being expressed directly. In many processes the linear relationship is dominant, and it is best described directly rather than through a combination of multiple locally linearized characteristics.

To establish a quality-related model, a novel statistical method based on global plus local projection to latent structures (GPLPLS) is proposed in this study; it efficiently preserves the global and local characteristics and focuses more attention on the relevance of the extracted principal components. The main idea of the GPLPLS statistical model is to enhance the local nonlinear characteristics by LPP or LLE (locally linear embedding, another popular manifold learning method for maintaining local structural information) during the PLS decomposition. GPLPLS therefore not only has the PLS property that the correlation between the process and quality variables is maximized, but also has the ability of LPP/LLE to maintain the local nonlinear structure. The monitoring procedure based on this model is then established and compared with several other methods on the well-known Tennessee Eastman process (TEP)25 simulation benchmark.


The rest of this paper is organized as follows. Section 2 briefly reviews the QGLPLS method and presents a new objective function for the LLE method. The new GPLPLS method is proposed and its locality-preserving capacity analyzed in Section 3. In Section 4, process monitoring based on GPLPLS is established, and the validity and effectiveness of the GPLPLS-based statistical model are illustrated through the Tennessee Eastman process (TEP)25 simulation platform in Section 5. Finally, conclusions are given in Section 6.

The QGLPLS model and a new optimization objective for LLE

Similar to the QGLPLS model, the GPLPLS model is an integration of global and local structural information. To better analyze the relationship between the QGLPLS and GPLPLS models, a brief introduction to the QGLPLS model is given in this section. Then, to introduce the new manifold learning algorithm into the PLS model, we convert the optimization objective of locally linear embedding (LLE).

A brief review of the QGLPLS model

Consider the model between two normalized data sets X = [x_1, x_2, ..., x_m] ∈ R^{n×m} and Y = [y_1, y_2, ..., y_l] ∈ R^{n×l} in the QGLPLS algorithm,23 where m and l are, respectively, the numbers of process and quality variables, and n is the number of sampling instants. The main idea of QGLPLS is to obtain the relationship between the process and quality variables by combining the PLS and LPP methods while maintaining the local characteristics as much as possible. The optimization objective function of the QGLPLS model is given as follows:23

J_QGLPLS(w, c) = arg max{ w^T X^T Y c + λ_1 w^T X^T S_x X w + λ_2 c^T Y^T S_y Y c }
              = arg max{ w^T X^T Y c + λ_1 w^T θ_x w + λ_2 c^T θ_y c }
s.t. w^T w = 1, c^T c = 1                                                    (1)


where θ_x = X^T S_x X and θ_y = Y^T S_y Y represent the local structural information of the process variables X and the quality variables Y. S_x, S_y and the D_1, D_2 below are the local characteristic parameters of the LPP algorithm.23 The parameters λ_1 and λ_2 control the trade-off between the global and local features. It is easy to see that the optimization problem of the QGLPLS algorithm (1) includes the optimization problem of the PLS algorithm,

J_PLS(w, c) = arg max{ w^T X^T Y c }
s.t. w^T w = 1, c^T c = 1                                                    (2)

and part of the optimization problem of the LPP algorithm,

J_LPP(w) = arg max{ w^T X^T S_x X w }
s.t. w^T X^T D_1 X w = 1                                                     (3)

The optimization objective function (1) appears to be a good combination of the global characteristics of PLS and the locality-preserving characteristics of LPP. Is that really the case? Consider the solution of the optimization problem first. To solve the optimization objective function (1), the following Lagrange function is introduced:

Ψ(w, c) = w^T X^T Y c + λ_1 w^T θ_x w + λ_2 c^T θ_y c − η_1 (w^T w − 1) − η_2 (c^T c − 1)

At the extremum, the optimization objective (1) can then be written as

J_QGLPLS(w, c) = η_1 + η_2                                                   (4)

Setting λ_1 = η_1 and λ_2 = η_2, the optimal projections w and c are the eigenvectors corresponding to the largest eigenvalue 4η_1η_2 of (I − θ_x)^{-1} X^T Y (I − θ_y)^{-1} Y^T X and (I − θ_y)^{-1} Y^T X (I − θ_x)^{-1} X^T Y, respectively, that is,

(I − θ_x)^{-1} X^T Y (I − θ_y)^{-1} Y^T X w = 4η_1η_2 w
(I − θ_y)^{-1} Y^T X (I − θ_x)^{-1} X^T Y c = 4η_1η_2 c                       (5)
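For illustration only, the eigenvalue problems in eq (5) can be solved numerically as in the short sketch below. This is not part of the original derivation; θ_x and θ_y are simple placeholder matrices here rather than LPP-derived quantities, and the data are random.

```python
import numpy as np

# Minimal sketch of eq. (5): w and c are the leading eigenvectors of the two
# coupled matrices built from X, Y and the local-structure terms theta_x, theta_y.
rng = np.random.default_rng(0)
n, m, l = 100, 5, 2
X = rng.standard_normal((n, m)); X -= X.mean(0); X /= X.std(0)
Y = rng.standard_normal((n, l)); Y -= Y.mean(0); Y /= Y.std(0)
theta_x = 0.1 * np.eye(m)          # placeholder for X^T S_x X
theta_y = 0.1 * np.eye(l)          # placeholder for Y^T S_y Y

Mx = np.linalg.solve(np.eye(m) - theta_x, X.T @ Y)   # (I - theta_x)^{-1} X^T Y
My = np.linalg.solve(np.eye(l) - theta_y, Y.T @ X)   # (I - theta_y)^{-1} Y^T X
val_w, vec_w = np.linalg.eig(Mx @ My)                # first line of eq. (5)
val_c, vec_c = np.linalg.eig(My @ Mx)                # second line of eq. (5)
w = np.real(vec_w[:, np.argmax(np.real(val_w))])
c = np.real(vec_c[:, np.argmax(np.real(val_c))])
w /= np.linalg.norm(w); c /= np.linalg.norm(c)       # enforce the unit-norm constraints
```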

Equation (4) shows that η_1 + η_2 should be maximized if the optimal solution of the QGLPLS problem is to be obtained, but according to (5) the QGLPLS solution in fact maximizes η_1η_2. Clearly, maximizing η_1 + η_2 and maximizing η_1η_2 are different conditions in most cases. Why does this happen? Return to the optimization problem (1). Problem (1) combines the global (PLS) and local (LPP) characteristics, which is undeniably an interesting combination. However, the latent factors of PLS are chosen so that they manifest the factor variation as much as possible and so that the correlation between the latent factors is as strong as possible, whereas the latent factors of LPP only need to manifest the local structure as much as possible. In other words, the local structural information of the process variables X (θ_x = X^T S_x X) and of the quality variables Y (θ_y = Y^T S_y Y) is enhanced, but not the correlation between them. A straightforward combination of the global and local characteristics may therefore lead to erroneous results.

Like LPP, LLE converts a global nonlinear problem into a combination of local linear problems by introducing local structural information, but LLE has fewer tunable parameters than LPP. LLE is thus another good option for systems with strong local nonlinearity, but it cannot be combined with PLS directly, mainly because the optimization objectives of the two methods are not consistent.

New optimization objective for LLE

For the LLE algorithm, the size of the neighborhood must be determined first, that is, how many samples are needed to linearly represent a given sample. Assume that this number is k_x. The k nearest neighbors (KNN) of a sample are chosen by means of Euclidean distance. After finding the k_x nearest neighbors of a sample x_i, the linear relationship between x_i and its neighbors is determined by finding the weights of the linear reconstruction. Suppose there are n samples in an m-dimensional space; the mean square error can then be used as the loss function:

J(A) = min Σ_{i=1}^{n} || x_i − Σ_{j=1}^{k_x} a_{ij,x} x_j ||^2
s.t. Σ_{j=1}^{k_x} a_{ij,x} = 1                                              (6)

where [a_{ij,x}] := A_x ∈ R^{n×k_x} (i = 1, 2, ..., n, j = 1, 2, ..., k_x) are the weight coefficients. LLE finds d projection vectors w_1, ..., w_d that make up a projection matrix W = [w_1, ..., w_d] ∈ R^{m×d} with the same weight coefficients a_{ij,x}. The points of the space X are projected into a new low-dimensional space Φ = [φ_1^T, φ_2^T, ..., φ_n^T]^T ∈ R^{n×d} (d < m). The diagonal matrix S_{m,x} is defined elementwise as

S_{m,x}(i, i) = 1/σ_{x,i},  σ_{x,i} > 0
S_{m,x}(i, i) = 0,          σ_{x,i} = 0

where i = 1, 2, ..., n and σ_{x,i} are the singular values. In the traditional generalized-inverse definition only a few of the largest singular values are inverted, which inevitably loses information; here, all the effective (nonzero) singular values are retained to maintain the information integrity. The minimum-value problem (8) is then changed to the following maximum-value problem:

J_LLE(W) = max tr( W^T X_M^T X_M W )
s.t. W^T X^T X W = I                                                         (10)

where X_M := S_x V_x X. In general, LLE requires the dimension d of the low-dimensional space to be specified in advance, and selecting an appropriate d is not easy. In PCA, however, d can be determined by specific criteria such as the cumulative contribution. In fact, the optimization problem (10) can be rewritten further as

J_LLE(w) = max w^T X_M^T X_M w
s.t. w^T X^T X w = 1                                                         (11)

where w ∈ R^{m×1}, so the criteria used by PCA to select the number of principal components can be applied directly to LLE.
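To make these objectives concrete, the sketch below (an illustration only) first solves the reconstruction-weight problem (6) and then treats the rewritten objective (11) as a generalized eigenvalue problem. X_M is left as a random placeholder because its construction from the weights follows intermediate equations omitted above, and the regularization constant in the weight solve is an assumption of the sketch.

```python
import numpy as np
from scipy.linalg import eigh

def lle_weights(X, k):
    """Reconstruction weights of eq. (6): each sample is expressed by its k
    nearest neighbours with weights that sum to one (regularised local solve)."""
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:k + 1]
        Z = X[nbrs] - X[i]                               # centre neighbours on x_i
        G = Z @ Z.T + 1e-3 * np.trace(Z @ Z.T) * np.eye(k)
        a = np.linalg.solve(G, np.ones(k))
        A[i, nbrs] = a / a.sum()                         # enforce sum_j a_ij = 1
    return A

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 6))
A = lle_weights(X, k=8)

# Objective (11): maximise w^T X_M^T X_M w subject to w^T X^T X w = 1,
# i.e. a generalised symmetric eigenvalue problem; X_M is a placeholder here.
X_M = rng.standard_normal((80, 6))
vals, vecs = eigh(X_M.T @ X_M, X.T @ X)
w = vecs[:, -1]                                          # direction of largest eigenvalue
```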


With the new optimization objective of LLE and the combination of global and local characteristics, the next section attempts to simultaneously maintain the local information of the process and quality variables and enhance their correlation.

GPLPLS models and their extracted components

GPLPLS models

A nonlinear function F(Z) can be written, according to its Taylor series, as

F(Z) = A(Z − Z_0) + g(Z − Z_0)                                               (12)

where the first term A(Z − Z_0) represents the linear part and the second term g(Z − Z_0) the nonlinear part. In many actual systems, especially near the equilibrium point Z_0, the linear part is primary and the nonlinear part secondary. When the PLS method is used to model a nonlinear system Y = F(X), the result may be poor because PLS, which uses the linear dimensionality-reduction method PCA to obtain the principal components, is built only on the relationship between the linear part of the input space X and that of the output space Y. To obtain a better model with local characteristics, the original data can be mapped onto a high-dimensional feature space (the KPLS model) or the nonlinear characteristics can be converted into a combination of multiple locally linearized characteristics (the LPPLS model). To a certain extent, both schemes can solve some nonlinear problems. However, the feature space of KPLS is not easy to determine, and for the dominant linear part it is more appropriate than in LPPLS to use a global feature description.

To better illustrate the relationship of PLS and LPPLS with the upcoming new model, let us review the PLS and LPPLS modeling processes. The optimization objective function of PLS (2) actually contains two objectives: the latent factors are chosen to manifest the factor variation as much as possible, and the correlation between the latent factors of the input and output spaces is made as strong as possible. The former indicates that the latent factors are extracted using the principle of PCA, that is,

J_PCA(w) = w^T X^T X w
s.t. w^T w = 1                                                               (13)
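For concreteness, eq (13) can be checked numerically: the short sketch below (an illustration, not part of the original text) extracts the first PCA direction as the leading right singular vector of the centered data matrix.

```python
import numpy as np

# The first PCA direction maximises w^T X^T X w with ||w|| = 1,
# i.e. it is the leading eigenvector of X^T X (first right singular vector of X).
X = np.random.default_rng(3).standard_normal((100, 7))
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
w = Vt[0]            # leading loading vector, unit norm
t = X @ w            # corresponding score vector
```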

Correspondingly, in the LPPLS model, LPP (3) is used in place of PCA (13) to preserve the strong local nonlinearities during the PLS decomposition. Inspired by the roles of PCA in the PLS model and of LPP in the LPPLS model, we attempt to extract latent factors of the nonlinear system with a new dimensionality-reduction method that combines the global (PCA) and local (LPP or LLE) characteristics. Although QGLPLS combines global and local features, the combination of the two is not coordinated. How can the two features be combined while keeping a consistent objective? According to the expression of a nonlinear function (12), the input and output spaces can both be divided into a linear part and a nonlinear part, and the nonlinear part can be converted into a combination of local linear problems by introducing local structural information. The input space X (and, equally, the output space Y) is therefore mapped into a new feature space X_F (or Y_F) that includes a global linear subspace and a number of local linear subspaces. Consequently, the following new optimization objective function of the global plus local projection to latent structures (GPLPLS) method is obtained by using the feature spaces X_F and Y_F in place of the original spaces X and Y:

J_GPLPLS(w, c) = arg max{ w^T X_F^T Y_F c }
s.t. w^T w = 1, c^T c = 1                                                    (14)

where X_F = X + λ_x θ_x^{1/2} and Y_F = Y + λ_y θ_y^{1/2}.

The linear model of X and Y established by PLS (2) actually contains two relationships: the input and output spaces are each divided into "scores" and "loadings" (the outer relationship), and the principal components of the input space X are related to those of the output space Y (the inner model).


Obviously, the locality information can be preserved for the inner model, for the outer model, or for both. Four different optimization objective functions are therefore immediately available for different values of λ_x and λ_y:
1) PLS: λ_x = 0 and λ_y = 0;
2) GPLPLS_x: λ_x > 0 and λ_y = 0;
3) GPLPLS_y: λ_x = 0 and λ_y > 0;
4) GPLPLS_{x+y}: λ_x > 0 and λ_y > 0.

The relationship among the GPLPLS models

Three GPLPLS models are given above. What are the internal links among these models, and how do their modeling processes differ? These questions are answered in this subsection. Assume that the original relationship is Y = f(X). Local linear embedding or locality preservation can be regarded as linearization of the system around an equilibrium point. From this perspective, the models for the different λ_x and λ_y combinations are as follows.

PLS model:          Ŷ = A_0 X
GPLPLS_x model:     Ŷ = A_1 [X, X_{z_i}]
GPLPLS_y model:     Ŷ = A_2 [X, f(X_{l_j})]
GPLPLS_{x+y} model: Ŷ = A_3 [X, X_{z_i}, f(X_{l_j})]

where X_{z_i}, i = 1, 2, ..., k_x, and Y_{l_j} = f(X_{l_j}), j = 1, 2, ..., k_y, are the local characteristics of the input and output spaces, respectively, and A_0, A_1, A_2 and A_3 are model coefficient matrices. Obviously, PLS uses a simple linear approximation of the original system; for a strongly nonlinear system this approximation is generally poor. GPLPLS instead uses spatial local decomposition and approximates the original system with the sum of multiple simple linear models. GPLPLS_x or GPLPLS_y is a special case of GPLPLS_{x+y}. It may seem that these three combinations cover all the possible GPLPLS models.


Let us go back to the optimization function of the GPLPLS_{x+y} model again:

J_GPLPLS_{x+y}(w, c) = arg max{ w^T (X + λ_x θ_x^{1/2})^T (Y + λ_y θ_y^{1/2}) c }
                     = arg max{ w^T X^T Y c + λ_x w^T θ_x^{1/2 T} Y c + λ_y w^T X^T θ_y^{1/2} c + λ_x λ_y w^T θ_x^{1/2 T} θ_y^{1/2} c }
s.t. w^T w = 1, c^T c = 1                                                    (15)

It is obvious that eq (15) contains two coupling components, θ_x^{1/2 T} Y and X^T θ_y^{1/2}, which represent the correlation between the linear primary part and the nonlinear part. In some cases these coupling components may have a negative impact on the model. On the other hand, not only can the outer relationship of the input and output spaces be expanded into a combination of linear and nonlinear parts; the inner relationship between the input and output spaces, that is, the final model, can also be described as such a combination. It is then natural to model the linear and nonlinear parts separately, without establishing the coupling between them, so the coupling components need not appear in the model's optimization function. This immediately gives the following GPLPLS_{xy} model optimization function:

J_GPLPLS_{xy}(w, c) = arg max{ w^T X^T Y c + λ_{xy} w^T θ_x^{1/2 T} θ_y^{1/2} c }
s.t. w^T w = 1, c^T c = 1                                                    (16)

The parameter λ_{xy} ≥ 0 is likewise introduced to control the trade-off between the global and local features.

Capturing the components of GPLPLS

In this subsection we deduce how to capture the components of GPLPLS. To facilitate comparison with conventional linear PLS, we denote E_0F = X_F and F_0F = Y_F. All the GPLPLS model optimization objectives can then be written in the following form:

J_GPLPLS(w, c) = arg max{ w^T X_F^T Y_F c + λ_{xy} w^T θ_x^{1/2 T} θ_y^{1/2} c }
s.t. w^T w = 1, c^T c = 1                                                    (17)

where at least one of [λ_x, λ_y] and λ_{xy} is zero. The latent variables of the GPLPLS model (17) are computed as follows. First, a Lagrangian multiplier vector is introduced to convert the constrained optimization problem (17) into an unconstrained one:

Ψ(w_1, c_1) = w_1^T E_0F^T F_0F c_1 + λ_{xy} w_1^T θ_x^{1/2 T} θ_y^{1/2} c_1 − λ_1 (w_1^T w_1 − 1) − λ_2 (c_1^T c_1 − 1)        (18)

The optimal solutions for w_1 and c_1 are obtained from ∂Ψ/∂w_1 = 0 and ∂Ψ/∂c_1 = 0, respectively.

The optimization problem is then transformed into finding the eigenvectors corresponding to the largest eigenvalue of the matrices (E_0F^T F_0F + λ_{xy} θ_x^{1/2 T} θ_y^{1/2})(E_0F^T F_0F + λ_{xy} θ_x^{1/2 T} θ_y^{1/2})^T and (F_0F^T E_0F + λ_{xy} θ_y^{1/2 T} θ_x^{1/2})(F_0F^T E_0F + λ_{xy} θ_y^{1/2 T} θ_x^{1/2})^T, that is,

(E_0F^T F_0F + λ_{xy} θ_x^{1/2 T} θ_y^{1/2})(E_0F^T F_0F + λ_{xy} θ_x^{1/2 T} θ_y^{1/2})^T w_1 = θ^2 w_1        (19)

(F_0F^T E_0F + λ_{xy} θ_y^{1/2 T} θ_x^{1/2})(F_0F^T E_0F + λ_{xy} θ_y^{1/2 T} θ_x^{1/2})^T c_1 = θ^2 c_1        (20)

where θ = w_1^T X_F^T Y_F c_1 + λ_{xy} w_1^T θ_x^{1/2 T} θ_y^{1/2} c_1. With the optimal weight vectors w_1 and c_1 obtained from eqs (19) and (20), the latent variables t_1 and u_1 are calculated as

t_1 = E_0F w_1,   u_1 = F_0F c_1
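The leading eigenvectors in eqs (19) and (20) can equivalently be read off a singular value decomposition of the augmented cross-product matrix. The sketch below illustrates this; E_0F, F_0F and the square-root factors Lx, Ly (standing for θ_x^{1/2}, θ_y^{1/2}) are random placeholders, not quantities from any real data set.

```python
import numpy as np

# w1 and c1 of eqs. (19)-(20) are the leading left/right singular vectors of
# M = E0F^T F0F + lambda_xy * theta_x^{1/2 T} theta_y^{1/2}.
rng = np.random.default_rng(4)
n, m, l = 120, 6, 2
E0F = rng.standard_normal((n, m))
F0F = rng.standard_normal((n, l))
Lx = rng.standard_normal((n, m))     # placeholder for theta_x^{1/2}
Ly = rng.standard_normal((n, l))     # placeholder for theta_y^{1/2}
lam_xy = 1.0

M = E0F.T @ F0F + lam_xy * (Lx.T @ Ly)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
w1, c1 = U[:, 0], Vt[0]              # unit-norm weight vectors
t1 = E0F @ w1                        # score vectors
u1 = F0F @ c1
```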


The load vectors are calculated as

p_1 = E_0F^T t_1 / ||t_1||^2,   q_1 = F_0F^T t_1 / ||t_1||^2

and the residual matrices E_1F and F_1F are

E_1F = E_0F − t_1 p_1^T,   F_1F = F_0F − t_1 q_1^T.
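The extraction of further components repeats this deflation, as described below. A compact sketch of such a loop is given here for illustration; keeping the local cross-product term fixed across iterations is an assumption of the sketch, not a statement about the authors' implementation.

```python
import numpy as np

def gplpls_components(E, F, G, d):
    """Extract d latent variables by deflating E and F as in the text.
    G stands for lambda_xy * theta_x^{1/2 T} theta_y^{1/2} and is kept fixed
    across iterations (an assumption of this sketch)."""
    T, P, Q, W = [], [], [], []
    for _ in range(d):
        M = E.T @ F + G
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        w = U[:, 0]
        t = E @ w
        p = E.T @ t / (t @ t)
        q = F.T @ t / (t @ t)
        E = E - np.outer(t, p)       # E_{i+1,F}
        F = F - np.outer(t, q)       # F_{i+1,F}
        T.append(t); P.append(p); Q.append(q); W.append(w)
    return (np.column_stack(T), np.column_stack(P),
            np.column_stack(Q), np.column_stack(W))

# toy usage with random placeholders
rng = np.random.default_rng(5)
E0, F0 = rng.standard_normal((50, 6)), rng.standard_normal((50, 2))
G0 = rng.standard_normal((6, 2))
T, P, Q, W = gplpls_components(E0, F0, G0, d=2)
```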

Similar to the PLS method, the remaining latent variables are computed from the reduced residual matrices E_iF and F_iF, i = 1, 2, ..., d − 1. In general, not all of the available components are needed to establish the regression model; only the first d components that give a good predictive regression model are used, and d can be determined by cross-validation.

Remark 1. Like the QGLPLS method, the main idea of the GPLPLS method is to integrate the local and global structural characteristics (covariance), and GPLPLS integrates them better than QGLPLS does. Unlike QGLPLS, GPLPLS not only maintains the local structural characteristics but also extracts their correlation information as much as possible. As a result, GPLPLS extracts the maximum global correlation and, at the same time, as much of the local structural correlation between the process and quality variables as possible.

Remark 2. All of the characteristics of LPPLS are described by local features. This indiscriminate description has advantages for strongly nonlinear systems, but not necessarily for linearly dominant systems. The method proposed in this paper is directed at linearly dominant processes while still maintaining part of the nonlinear relationship; it integrates the global characteristics (covariance) and the nonlinear correlation (multi-covariance) as much as possible.


Quality monitoring based on GPLPLS and posterior monitoring assessment

Process and quality monitoring based on GPLPLS

The PLS model uses two statistics to monitor processes. One is T^2, which is composed of the first d principal components and reflects the major variation related to output quality. The other is SPE, which reflects the variation irrelevant to output quality. Unlike PCA, however, PLS may not extract much of the variance of the input space, so the residual part may still contain large variations, making it inappropriate to monitor with the SPE statistic. In other words, the residual subspace of PLS is not a true residual subspace; we prefer to call it the remaining subspace, that is, the input space with the output-relevant part removed. Otherwise, the GPLPLS monitoring method is very similar to the PLS method, so in GPLPLS-based monitoring T^2 statistics are used to monitor both the principal-component subspace and the remaining subspace.

As in traditional PLS, process monitoring based on the GPLPLS method is divided into two parts, off-line training and on-line monitoring; the detailed procedure is given below. The GPLPLS model maps the input space X and the output space Y to a low-dimensional space defined by a small number of latent variables (t_1, ..., t_d), and E_0F and F_0F are decomposed as

E_0F = Σ_{i=1}^{d} t_i p_i^T + Ē_0F = T P^T + Ē_0F
F_0F = Σ_{i=1}^{d} t_i q_i^T + F̄_0F = T Q^T + F̄_0F                           (21)

where T = [t_1, t_2, ..., t_d] are the latent score vectors and P = [p_1, ..., p_d] and Q = [q_1, ..., q_d] are the loadings of E_0F and F_0F, respectively. To represent t_i in terms of E_0F,

T = E_0F R = (I + λ_x S_x^{1/2}) E_0 R                                       (22)

where R = [r_1, ..., r_d], in which

r_i = Π_{j=1}^{i−1} (I − w_j p_j^T) w_i

The decompositions (21) and (22) are not expressed in terms of the scaled and mean-centered data E_0 and F_0 and are therefore difficult to apply in practice. From (21) we have

E_0 = T_0 P^T + X_e                                                          (23)

F_0 = T_0 Q̄^T + F̄_0 = E_0 R Q̄^T + F̄_0                                       (24)

where T_0 = E_0 R, X_e = E_0 − T_0 P^T, and Q̄ = T_0^+ F_0. To perform process monitoring on a new data sample x (and subsequently y), which is scaled and mean-centered, an oblique projection onto the input data space is induced:

x = x̂ + x_e,   x̂ = P R^T x,   x_e = (I − P R^T) x                            (25)

The T_pc^2 and T_e^2 statistics are then monitored with the following calculations:

t = R^T x
T_pc^2 = t^T Λ^{-1} t = t^T ( T_0^T T_0 / (n − 1) )^{-1} t
T_e^2 = x_e^T Λ_e^{-1} x_e = x_e^T ( X_e^T X_e / (n − 1) )^{-1} x_e           (26)

where Λ and Λ_e are the sample covariance matrices. Generally, the corresponding control limits of the T_pc^2 and T_e^2 statistics are estimated from the F distribution.26 However, the T_pc^2 and T_e^2 of the GPLPLS method are not obtained from a scaled and mean-centered matrix E_0F, and the output variables may not follow a Gaussian distribution. Therefore, the corresponding control limits Th_pc,α and Th_e,α should be calculated from the probability density functions of the statistics, which can be estimated using the non-parametric kernel density estimation (KDE) method.27 Finally, the fault detection logic in the input space is:

T_pc^2 > Th_pc,α                              quality-relevant fault
T_pc^2 > Th_pc,α or T_e^2 > Th_e,α            process-relevant fault
T_pc^2 ≤ Th_pc,α and T_e^2 ≤ Th_e,α           fault-free                     (27)
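As an illustration of how eqs (25)-(27) might be evaluated for a single sample, a minimal sketch is given below. The function name, the pseudo-inverse used when the covariance is rank-deficient, and the way the detection logic is collapsed into one verdict are choices of this sketch, not of the original paper.

```python
import numpy as np

def gplpls_monitor(x, R, P, T0, Xe_train, lim_pc, lim_e):
    """T_pc^2 and T_e^2 of eq. (26) for one scaled, mean-centred sample x,
    with the detection logic of eq. (27). R, P are GPLPLS weight/loading
    matrices, T0 the training scores, Xe_train the training remaining part."""
    n = T0.shape[0]
    t = R.T @ x
    T2_pc = t @ np.linalg.solve(T0.T @ T0 / (n - 1), t)
    xe = x - P @ (R.T @ x)                               # remaining part, eq. (25)
    T2_e = xe @ np.linalg.pinv(Xe_train.T @ Xe_train / (n - 1)) @ xe
    if T2_pc > lim_pc:
        verdict = "quality-relevant fault"    # also process-relevant per eq. (27)
    elif T2_e > lim_e:
        verdict = "process-relevant fault"
    else:
        verdict = "fault-free"
    return T2_pc, T2_e, verdict
```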

Based on the preceding modeling process and monitoring strategy, the GPLPLS algorithm for multiple-input, multiple-output data is as follows (a compact sketch of the KDE-based control limit used in step 3 is given after the list).
1. Scale the raw data to zero mean and unit variance to obtain X and Y. Combine the LLE/LPP optimization goals (11)/(3) with the PLS goal (2), then perform GPLPLS on X and Y using (23) and (24) to obtain T_0, Q̄ and R. The number of GPLPLS factors d is determined by cross-validation.
2. Form the input-remaining subspace X_e.
3. Calculate the control limits with the non-parametric kernel density estimation (KDE) method27 and perform fault monitoring using the fault detection logic (27).
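The sketch below is one simple numerical way to obtain a KDE-based control limit from the training values of a statistic; the grid construction and quantile read-off are assumptions of the sketch rather than the exact procedure of ref 27.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_control_limit(stat_train, alpha=0.9975):
    """Control limit of a monitoring statistic: fit a Gaussian KDE to the
    training values and read off the alpha-quantile of the estimated density."""
    kde = gaussian_kde(stat_train)
    grid = np.linspace(0.0, float(np.max(stat_train)) * 3.0, 4000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]
    return float(grid[np.searchsorted(cdf, alpha)])

# usage (hypothetical training statistics):
# lim_pc = kde_control_limit(T2_pc_train); lim_e = kde_control_limit(T2_e_train)
```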

Posterior monitoring assessment

Many quality-relevant monitoring methods are now available for industrial processes and are commonly evaluated on the well-known Tennessee Eastman process (TEP) simulation benchmark.25 Most methods aim for the quality-relevant alarm rate to be as high as possible, but which monitoring result is actually the more reasonable? This question seems to have received little attention.


Similar to the performance evaluation index of a control loop, the following posterior monitoring assessment (PMA) index is introduced to assess whether the quality-relevant alarm rates are reasonable:

PMA = E(Y_N^2) / E(Y_F^2)                                                    (28)

where E is the mathematical expectation, and Y_N and Y_F are the training output data and the fault-case output data, respectively, both normalized by the mean and standard deviation of Y_N. An index close to 1 implies that the quality in the fault case is close to normal conditions; an index greater than 1 means that the quality in the fault case is better than under normal conditions; and an index far below 1 indicates that the quality differs greatly from normal operating conditions, in which case the corresponding quality-relevant index T^2 (PLS method) or T_pc^2 (GPLPLS method) should be high and the others low. However, a single PMA index does not fully reflect the dynamic behavior, because the controllers in the process can reduce the effect of some faults; two PMA indices are therefore used to describe the dynamic and steady-state effects. Moreover, the output variables may be high-dimensional, so to keep the assessment conservative the worst case over the outputs is adopted, that is,

PMA_1 = min_i { E[Y_N^2(k_0 : k_1, i)] / E[Y_F^2(k_0 : k_1, i)] },  i = 1, 2, ..., l        (29)

PMA_2 = min_i { E[Y_N^2(k_2 : n, i)] / E[Y_F^2(k_2 : n, i)] },  i = 1, 2, ..., l            (30)

where k_i, i = 0, 1, 2, are given constants. It is worth noting that these PMA indices are used only to check whether the fault detection results are reasonable. Compared with fault detection based on the models (such as the PLS and GPLPLS models), the assessment is objective, but it is not suitable for detecting whether a quality-relevant fault has occurred, because quality testing takes more time.
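For illustration, eqs (29) and (30) could be evaluated as in the following sketch; the helper name and the 0-based index convention are assumptions of the sketch.

```python
import numpy as np

def pma(Y_normal, Y_fault, k_lo, k_hi):
    """Worst-case (minimum over outputs) ratio of the normal to faulty second
    moments over the sample window [k_lo, k_hi). Y_fault is assumed to be
    normalised by the mean and standard deviation of the training data."""
    num = (Y_normal[k_lo:k_hi] ** 2).mean(axis=0)
    den = (Y_fault[k_lo:k_hi] ** 2).mean(axis=0)
    return float((num / den).min())

# usage: PMA1 over the transient window, PMA2 over the steady-state window
# pma1 = pma(Yn, Yf, k0, k1); pma2 = pma(Yn, Yf, k2, n)
```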


Case study

The proposed fault detection strategy based on the GPLPLS model was tested on the TEP simulation platform.25 As an industrial process benchmark, TEP has been widely used for comparing various approaches to process monitoring and fault diagnosis.28-31 A detailed description of TEP can be found in Chiang, Russell and Braatz,28 and the simulation data were downloaded from http://web.mit.edu/braatzgroup/links.html. The proposed method is compared with the monitoring strategies of PLS, QGLPLS, LPPLS and concurrent projection to latent structures (CPLS).32 The CPLS algorithm projects the input and output spaces into five subspaces: a joint input-output subspace, an output-principal subspace, an output-residual subspace, an input-principal subspace and an input-residual subspace. Because only the monitoring ability for quality-relevant faults is considered here, the CPLS model used for comparison is slightly modified: the input-principal and input-residual subspaces are replaced by the input-remaining subspace X_e, and the corresponding monitoring statistics are replaced by T_e^2. To highlight process-based quality monitoring, monitoring of the output-principal and output-residual subspaces of the CPLS model is not considered in the simulation analysis. To compare the different methods, two different data sets are used, one consistent with ref 4 and the other consistent with ref 24.

Models and discussion

The process variable matrix X consists of all process measurement variables XMEAS(1-22) and the manipulated variables XMV(1-11) except XMV(5) and XMV(9), and XMEAS(35) and XMEAS(36) form the quality variable matrix Y.4 The training data set is the normal data IDV(0), and the testing data sets are the 21 faulty data sets IDV(1-21). The fault detection rate (FDR) and false alarm rate (FAR) with a 99.75% control limit for PLS, CPLS and GPLPLS are given in Table 1 and Table 2. The FDR and FAR are defined as follows:30

FDR = [ No. of samples (T > T_th | f ≠ 0) / total samples (f ≠ 0) ] × 100


FAR = [ No. of samples (T > T_th | f = 0) / total samples (f = 0) ] × 100
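For completeness, the two rates can be computed directly from the alarm sequence, as in the following sketch (the function name and boolean-mask interface are assumptions, not part of the original text).

```python
import numpy as np

def fdr_far(stat, limit, fault_mask):
    """Fault detection rate and false alarm rate, in percent.
    `stat` is the monitored statistic, `limit` its control limit and
    `fault_mask` a boolean array marking the faulty samples."""
    alarms = np.asarray(stat) > limit
    fault_mask = np.asarray(fault_mask, dtype=bool)
    fdr = 100.0 * alarms[fault_mask].mean()
    far = 100.0 * alarms[~fault_mask].mean()
    return fdr, far
```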

In this simulation, the parameters of the GPLPLSxy method are chosen as kx = 22, ky = 23, λx = λy = 0, andλxy = 1. k0 = 161. The number of principal components for the PLS, CPLS and GPLPLS models are set to 6, 6 and 2 by the cross-validation-based approach. k1 = n = 960 and k2 = 701. The results of PMAi , andi = 1, and2 are also given in Table 1. Table 1: FDR of PLS, CPLS and GPLPLS PLS IDV 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

CPLS

T2

Te2

T2

Te2

99.63 98.50 1.00 19.13 22.00 99.25 100.00 96.00 0.50 26.38 26.63 97.50 94.88 91.50 1.25 20.13 77.38 89.38 0.50 30.50 41.88

99.88 97.25 0.88 100.00 100.00 100.00 100.00 97.00 1.00 82.63 75.75 99.88 95.00 100.00 1.38 37.63 96.63 90.13 41.13 90.75 47.63

84.13 94.75 0.13 7.25 17.38 98.25 97.88 76.13 0.38 16.38 8.13 83.75 88.00 20.88 1.25 9.13 36.50 89.00 0.00 20.13 37.25

99.75 98.25 1.13 100.00 100.00 100.00 100.00 97.88 1.63 84.63 77.13 99.75 95.13 100.00 2.88 44.00 97.00 89.88 39.00 88.25 45.75

T2 35.00 74.00 0.25 0.50 13.25 96.88 26.00 72.63 0.38 17.50 1.50 71.88 75.50 0.38 3.13 8.63 8.75 87.00 0.00 12.50 21.25

GPLPLSxy Te2 SPE 99.75 99.25 97.88 98.00 1.25 0.63 100.00 84.38 100.00 21.63 100.00 100.00 100.00 100.00 97.88 95.50 0.75 0.63 84.75 20.75 77.25 59.13 99.88 96.88 95.25 93.00 100.00 100.00 3.25 0.25 44.00 8.25 96.63 86.38 90.13 89.25 36.13 4.00 90.25 24.63 50.75 36.38

PMA PMA1 PMA2 0.204 0.693 0.066 0.058 0.772 0.8670 0.888 0.9277 0.3018 0.9461 0.0029 0.0026 0.1439 0.9721 0.0596 0.0951 0.8977 0.8465 0.5888 0.5064 0.7830 0.6956 0.0404 0.0232 0.0229 0.0208 1.0721 0.8580 0.9027 0.5710 0.7770 0.5355 0.6443 0.6862 0.0049 0.0037 0.9453 0.8859 0.6700 0.7366 0.2342 0.1063

Three detection criteria are given in the GPLPLSxy model in Table 1, where Te2 are the residual detection indices. GPLPLSxy does not extract variances in the input space in a descending order unlike PCA (13). Therefore, the input residuals can still contain large variations, making them inappropriate to be monitored by the SPE statistic. For example, IDV(5) is a quality-recoverable step fault. Although the cascade controller can compensate for its impact on output quality, at least the process-related monitoring statistic should provide a consistent fault detection statistic. 21

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Table 2: FAR of PLS, CPLS and GPLPLS PLS IDV 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

T2

Te2

0.00 0.63 0.63 0.00 0.00 0.00 0.00 0.00 1.25 0.00 0.63 0.63 0.00 0.00 0.00 1.25 0.63 0.00 0.00 0.00 1.25

0.00 0.00 0.00 0.63 0.63 0.63 0.00 0.63 1.25 0.00 0.63 0.00 0.00 0.63 0.00 1.25 0.00 0.63 0.00 0.00 0.63

CPLS T2 Te2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.63 0.00 0.63 0.00 0.00 0.00 0.00 0.63 0.00 0.63 0.63 0.00 0.00 0.00 0.63 0.63 0.63 0.63 0.00 0.00 0.00 0.00 0.00 2.50 2.50 0.00 0.00 0.00 0.63 0.00 0.00 0.00 0.00 0.00 2.50

22

GPLPLSxy T2 Te2 0.00 0.63 0.00 0.00 0.00 0.63 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.63 0.00 0.63 0.00 0.00 0.00 0.00 0.00 1.25 2.50 0.63 0.00 0.00 0.00 0.63 0.00 0.00 0.00 0.00 0.63 2.50

ACS Paragon Plus Environment

Page 22 of 39

Page 23 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

The SPE statistic does not provide this whereas the Te2 statistic does. From the results of the Te2 statistic, most of these failures (IDV(1-21) except for IDV(3, 9, and 15)) have caused a big change in the input space, but some SPE statistics do not reach this conclusion. Therefore, we do not give the SPE statistics for the FAR (Table 2). Although the results of the different FDR methods (Table 1) are different, the results of the corresponding FAR (Table 2) are almost the same. This implies that the new model does not significantly increase the risk of false alarms. With these two PMA indices, the 21 faults can be divided into two types. One is qualityirrelevant faults (PMA1 > 0.9 or PMA1 + PMA2 > 1.5 ) which include IDV(3,4,9,11,14,15, and 19) and the other is quality relevant faults. The quality relevant faults can be further subdivided into four types. The first type (IDV(10,16,17, and 20)) of fault has a slight impact on quality (0.5 < PMAi < 0.8 i = 1, 2). The second type are quality-recoverable faults (PMA1 < 0.35 and PMA2 > 0.65) which include IDV(1,5, and 7). The third type (IDV(2, 6, 8, 12, 13, and 18)) have a serious impact on quality (PMAi < 0.1 i = 1, 2). The fourth type only includes IDV(21) which is a fault that causes the output variables to drift slowly. It should be noted that this classification depends on the choice of parameters k0 , k1 and k2 . It is only a preliminary result and the final conclusion requires further analysis, but it has a reference value. For the serious quality-relevant faults, all methods can give consistent test results, so those faults will be not discussed in the next fault detection analysis.

Fault detection analysis In this subsection, the difference between the fault detection results of the PLS, CPLS and GPLPLS models will be discussed. There are several cases (IDV(3,9,15)) where there are no failures or the failures have little impact on the output and process variables. Both approaches give consistent conclusions. For other faults, their diagnosis results have some differences. Three faulty scenarios including quality-recoverable faults, slight quality-relevant faults and quality-irrelevant faults are analyzed as follows. It should be noted that in all the monitoring figures, the red dotted line is the 99.75% control limit and the blue line is the monitored value. In the figure of the predicted output 23

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

values, the blue dotted lines are the real output values and the green lines are the predicted output values. Scenario 1: Quality-recoverable faults The faulty cases are IDV(1), IDV(5) and IDV(7). All of those faulty cases are step change faults but the feedback controllers or the cascade controller in the process can reduce the product quality effect of those faults. Therefore, the product quality variables in IDV(1), IDV(5) and IDV(7) tend to return to normal as shown in Figs.1-2. The corresponding monitoring results of IDV(1, 5, and 7) are shown in Figs.3-5 by the PLS, CPLS and GPLPLS methods, respectively. All the T 2 and Te2 statistics of the PLS, CPLS and GPLPLS methods detect process-relevant faults in the input 2 statistic tends to return to normal whereas the T 2 statistic remains at a space. For GPLPLS, the Tpc e

high value, which means that those faults are quality-recoverable faults. Existing works that report high quality-relevant fault detection rates (T 2 , for example, of IDV(1) and IDV(7) in the PLS and CPLS methods) for those faults may be reporting nuisance alarms. The CPLS method eliminates some nuisance alarms for IDV(1) and IDV(7) and the T 2 statistics are also very close to the control limits, but the T 2 statistics are still over the control limits. This shows that the CPLS method can improve detection performance but still does not capture the nature of the quality-recoverable fault detection problem. In this scenario, the GPLPLS method can simultaneously and accurately reflect the changes in process and quality variables. PLS for IDV(1)

5.5

PLS for IDV(5)

5.5

5

PLS for IDV(7)

5.5

5

5

4.5

4.5 4.5 4

3.5

4

Output Y

Output Y

4

Output Y

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 39

3.5

3

3.5

3 3

2.5

2.5

2.5

2

1.5

2

2 0

100

200

300

400

500

600

700

800

900

1000

1.5 0

100

Sample

200

300

400

500

600

700

800

900

1000

0

Sample

100

200

300

400

500

600

700

800

900

1000

Sample

Figure 1: The output predicted values for IDV(1), IDV(5) and IDV(7) using the PLS method Scenario 2: Quality-irrelevant faults The faulty cases include (IDV(4,11,14, and 19)). The faults in IDV(4), IDV(11) and IDV(14) were reported to be quality-irrelevant but process-relevant faults with a different data set. 24 In this case study, the PMA indices indicate that these are still quality-irrelevant faults and the corre24

ACS Paragon Plus Environment

Page 25 of 39

GPLPLS xy for IDV(1)

5.5

GPLPLS xy for IDV(5)

5.5

5

GPLPLS xy for IDV(7)

5.5

5

5

4.5

4.5 4.5 4

3.5

4

Output Y

Output Y

Output Y

4

3.5

3

3.5

3 3

2.5

2.5

2.5

2

1.5

2

2 0

100

200

300

400

500

600

700

800

900

1000

1.5 0

100

200

300

400

500

Sample

600

700

800

900

1000

0

100

200

300

400

500

Sample

600

700

800

900

1000

Sample

Figure 2: The output predicted values for IDV(1), IDV(5) and IDV(7) using the GPLPLSxy method

PLS for IDV(1)

800

CPLS for IDV(1)

500

GPLPLS xy for IDV(1)

150

400

600

100 400

T2

T2

T2

300 200 50 200

100

0

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

1500

500

600

700

800

900

1000

600

700

800

900

1000

sample

2000

2000

1500

1500

T 2e

T 2e

T 2e

1000 1000

1000

500 500

0

500

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

500

sample

Figure 3: PLS, CPLS and GPLPLSxy monitoring results for IDV(1)

PLS for IDV(5)

200

CPLS for IDV(5)

150

GPLPLS xy for IDV(5)

150

150

100

T2

100

T2

T2

100

50

50

50

0

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

15000

500

600

700

800

900

1000

600

700

800

900

1000

sample

8000

15000

6000

T 2e

T 2e

10000

T 2e

10000 4000

5000

5000 2000

0

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

500

sample

Figure 4: PLS, CPLS and GPLPLSxy monitoring results for IDV(5)

PLS for IDV(7)

600

CPLS for IDV(7)

500

500

GPLPLS xy for IDV(7)

500

400

400

300

300

300

T2

T2

T2

400

200

200

200 100

100 0

100

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

600

1200

500

1000

400

800

500

600

700

800

900

1000

600

700

800

900

1000

sample 800

300

T 2e

T 2e

600

T e2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

600

200

400

100

200

400

200

0

0 0

100

200

300

400

500

sample

600

700

800

900

1000

0 0

100

200

300

400

500

600

700

800

900

1000

sample

0

100

200

300

400

500

sample

Figure 5: PLS, CPLS and GPLPLSxy monitoring results for IDV(7)

25

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

sponding output quality value and its prediction values are shown in Figs.6- 7. The monitoring results of IDV(4), IDV(11) and IDV(14) are shown in Figs.8 - 10. All these faults are related to the reactor cooling water, and these disturbances hardly affect the output product quality. GPLPLSxy , 2 shows that these faults are quality-irrelevant faults, and T 2 detects these faults in the input Tpc e

remaining subspace. Their corresponding fault detection rates are 100.00%,77.25% and 100.00%. For PLS and CPLS, T 2 and Te2 detect these faults, respectively. That is PLS-based or CPLS-based monitoring indicates that these disturbances are quality-reverent. It should be noted that the CPLS method can also efficiently filter out the quality-irrelevant faults (IDV(4) and IDV(11). Although the monitoring of IDV(14) is not ideal, it basically filters out such faults (from 91.50% to 20.88% ). This indicates that CPLS can improve the monitoring for such faults. Anyway, existing works that report high detection rates for these faults at best give nuisance alarms. In this scenario, the GPLPLS method shows its superior performance over PLS and CPLS in filtering out the qualityirrelevant faults. PLS for IDV(4)

PLS for IDV(11)

5.5

5

4.5

4.5

4.5

4

4

Output Y

5

3.5

3.5

4

3.5

3

3

3

2.5

2.5

2.5

2

2 0

100

200

300

400

500

600

700

800

900

1000

PLS for IDV(14)

5.5

5

Output Y

Output Y

5.5

2 0

100

200

300

400

Sample

500

600

700

800

900

1000

0

100

200

300

400

Sample

500

600

700

800

900

1000

Sample

Figure 6: The output predicted values for IDV(4), IDV(11) and IDV(14) using the PLS method

GPLPLS xy for IDV(4)

5.5

GPLPLS xy for IDV(11)

5.5

5

4.5

4.5

4.5

4

4

Output Y

5

3.5

3.5

4

3.5

3

3

3

2.5

2.5

2.5

2

2 0

100

200

300

400

500

600

700

800

900

1000

GPLPLS xy for IDV(14)

5.5

5

Output Y

Output Y

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 39

2 0

100

200

300

Sample

400

500

600

700

800

900

1000

0

Sample

100

200

300

400

500

600

700

800

900

1000

Sample

Figure 7: The output predicted values for IDV(4), IDV(11) and IDV(14) using GPLPLSxy method

Scenario 3: Slight-quality-relevant faults The faulty cases include (IDV(10,16,17, and 20)). Few researchers pay special attention to this view. From the quality-relevant alarm rate, they are similar to the quality-recoverable faults. 26

ACS Paragon Plus Environment

Page 27 of 39

PLS for IDV(4)

50

CPLS for IDV(4)

40

40

GPLPLS xy for IDV(4)

10 8

30

T2

T2

6

T2

30 20

20

4 10

10 0

2

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

200

200

150

150

150

100

100

50

50

0 300

400

500

400

600

700

800

900

500

600

700

800

900

1000

600

700

800

900

1000

100 50

0 200

300

T e2

200

T 2e

250

T e2

250

100

200

sample

250

0

100

sample

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

500

sample

Figure 8: PLS, CPLS and GPLPLSxy monitoring results for IDV(4)

PLS for IDV(11)

100

CPLS for IDV(11)

40

80

GPLPLS xy for IDV(11)

20

30

15

T2

T2

T2

60 20

10

40 10

20 0

5

0 0

100

200

300

400

500

600

700

800

900

1000

0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

500 400

400

300

300

500

600

700

800

900

1000

600

700

800

900

1000

sample

500

600 500

T e2

T 2e

T e2

400

200

200

100

100

0

0

300 200

0

100

200

300

400

500

600

700

800

900

1000

100 0 0

100

200

300

400

sample

500

600

700

800

900

1000

0

100

200

300

400

sample

500

sample

Figure 9: PLS, CPLS and GPLPLSxy monitoring result for IDV(11)


Figure 10: PLS, CPLS and GPLPLSxy monitoring results for IDV(14)


Although these faults also belong to the quality-relevant category, they have little impact on quality, and the associated T² monitoring values are relatively small. To a certain extent, they can therefore also be regarded as quality-irrelevant faults, and many methods, including PLS, fail to detect them accurately. The monitoring results for IDV(10), IDV(16), IDV(17), and IDV(20) are shown in Figs. 11-14, and the corresponding output predictions are shown in Figs. 15 and 16. These figures show that the monitoring trends produced by the GPLPLS method match the quality changes more closely.


Figure 11: PLS, CPLS and GPLPLSxy monitoring results for IDV(10)


Figure 12: PLS, CPLS and GPLPLSxy monitoring results for IDV(16)


Figure 13: PLS, CPLS and GPLPLSxy monitoring results for IDV(17)



Figure 14: PLS, CPLS and GPLPLSxy monitoring results for IDV(20)


Figure 15: The output predicted values for IDV(16), IDV(17) and IDV(20) using the PLS method


Figure 16: The output predicted values for IDV(16), IDV(17) and IDV(20) using the GPLPLSxy method


As can be seen from the above three scenarios, the GPLPLS method filters out nuisance alarms for quality-recoverable, quality-relevant, and slight-quality-relevant faults alike. There are two possible reasons. The first is that the principal components of GPLPLS are built on the global characteristics augmented with nonlinear local structural features, which enhances the model's nonlinear mapping capability. The second is the use of non-Gaussian thresholds, which makes it possible to handle cases where the monitored signal does not satisfy the Gaussian assumption.
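A common way to obtain such a non-Gaussian threshold is to take a high quantile of a kernel density estimate of the monitoring statistic under normal operation, rather than an F- or chi-squared-based limit. The following is a minimal sketch of that idea, assuming the normal-operation statistic is already available; it is not claimed to be the exact thresholding procedure used here, although the 99.75% level matches the control limit used for the reported FDRs.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_control_limit(stat_normal, alpha=0.9975):
    """Alpha-quantile of a kernel density estimate of the monitoring
    statistic computed on normal (fault-free) training data."""
    stat_normal = np.asarray(stat_normal, dtype=float)
    kde = gaussian_kde(stat_normal)              # Gaussian kernels, default bandwidth
    grid = np.linspace(0.0, 3.0 * stat_normal.max(), 5000)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                               # normalize the discretized CDF
    return grid[np.searchsorted(cdf, alpha)]

# Example (hypothetical array name): alarms are raised when T2 exceeds the limit.
# limit_t2 = kde_control_limit(t2_train)
```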

Other GPLPLS methods and the new data set

For the same data set, the FDRs of the other three GPLPLS models are shown in Table 3, in which K = [kx, ky]. From the highlighted columns of the table it can be seen that the consistency among these methods is good, especially between the FDRs of GPLPLSx+y and GPLPLSxy. To discuss these models more clearly, IDV(7) is selected for further analysis, and the corresponding monitoring results are given in Fig. 17. Figure 17 shows that the detection of IDV(7) by GPLPLSy is inconsistent with the PMA in Table 1: although the Tpc² statistic tends to return to normal, there are still many alarms, and most of them are nuisance alarms. The likely reason is that GPLPLSy only enhances the local structure of the output space, so it performs best when the input space is linear and only the output space is nonlinear. The input space of the TEP benchmark, however, may also be strongly nonlinear, which explains the better monitoring performance of the other three models. The GPLPLS models built on the LPP local strategy, which introduce one or two additional parameters, lead to the same conclusion, as shown in Table 4 (in which Σ = [σx, σy]).
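The parameters listed in Tables 3 and 4 (kx, ky, σx, σy) play the same roles as in standard LPP/LLE local modeling: k fixes the neighborhood size and σ the kernel width used to weight neighbors. The sketch below of an LPP-style neighborhood weight matrix, in the spirit of ref 21, is included only to make those roles concrete; it is an assumed illustration, not the paper's implementation, and the function and variable names are hypothetical.

```python
import numpy as np

def lpp_weight_matrix(X, k=12, sigma=2.0):
    """k-nearest-neighbor affinity matrix with heat-kernel weights
    W[i, j] = exp(-||x_i - x_j||^2 / sigma), the local-structure
    ingredient of LPP-type latent-variable models."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(sq_dist[i])[1:k + 1]          # k nearest neighbors, skipping the point itself
        W[i, neighbors] = np.exp(-sq_dist[i, neighbors] / sigma)
    return np.maximum(W, W.T)                                 # symmetric affinity matrix
```

Larger k or σ spreads weight over more distant samples, so the local term behaves more like the global one; this is consistent with the broadly similar FDRs obtained across the parameter settings of Tables 3 and 4.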


Figure 17: The monitoring results for IDV(7) by different GPLPLS methods


Table 3: FDR of GPLPLS methods

IDV    GPLPLSx (kx = 16)    GPLPLSy (ky = 16)    GPLPLSx+y (K = [22, 24])    GPLPLSxy (K = [22, 23])
       T²      Te²          T²      Te²          T²      Te²                 T²      Te²
1      35.50   99.75        38.75   99.75        35.13   99.75               35.00   99.75
2      70.75   98.38        95.13   98.13        74.00   98.38               74.00   98.38
3      0.00    1.38         1.00    1.25         0.25    1.13                0.25    1.38
4      0.00    100.00       1.25    100.00       0.50    100.00              0.50    100.00
5      10.75   100.00       19.25   100.00       13.25   100.00              13.25   100.00
6      96.13   100.00       98.75   100.00       96.88   100.00              96.88   100.00
7      23.50   100.00       79.25   100.00       26.25   100.00              26.00   100.00
8      68.63   97.88        81.88   97.88        72.63   97.88               72.63   97.88
9      0.00    1.50         0.75    1.38         0.38    1.13                0.38    1.25
10     13.88   84.75        21.13   84.75        17.50   84.38               17.50   84.50
11     0.88    77.50        2.88    77.00        1.50    76.63               1.50    76.75
12     68.25   99.88        87.00   99.75        71.88   99.88               71.88   99.88
13     72.63   95.25        88.00   95.13        75.50   95.13               75.50   95.25
14     0.00    100.00       3.25    100.00       0.50    100.00              0.38    100.00
15     0.88    2.50         1.38    3.50         3.13    1.63                3.13    1.63
16     7.13    45.38        12.88   43.50        8.63    42.63               8.63    43.75
17     1.88    96.88        11.38   97.00        8.88    96.75               8.75    96.88
18     86.38   90.00        88.88   90.00        87.00   90.00               87.00   89.88
19     0.00    38.25        0.00    38.50        0.00    37.75               0.00    37.38
20     8.63    90.63        22.50   89.75        12.50   90.50               12.50   90.38
21     14.00   52.75        31.63   44.25        21.25   49.63               21.25   50.25


Table 4: FDR of GPLPLS methods based on the LPP local strategy

IDV    GPLPLSx (kx = 12, σx = 2)   GPLPLSy (ky = 20, σy = 0.05)   GPLPLSx+y (K = [12, 20], Σ = [2, 1])   GPLPLSxy (K = [14, 22], Σ = [0.05, 1.3])
       T²      Te²                 T²      Te²                    T²      Te²                            T²      Te²
1      49.00   99.75               61.13   99.75                  47.63   99.75                          41.75   99.75
2      56.38   98.38               94.88   98.13                  46.75   98.38                          60.00   98.38
3      0.13    1.25                0.75    1.75                   0.50    1.25                           0.25    1.25
4      0.38    100.00              1.38    100.00                 0.50    100.00                         0.38    100.00
5      12.88   100.00              19.63   100.00                 13.75   100.00                         12.75   100.00
6      96.75   100.00              99.13   100.00                 97.00   100.00                         96.88   100.00
7      26.00   100.00              56.75   100.00                 27.13   100.00                         26.63   100.00
8      71.50   97.88               85.25   97.88                  72.63   97.88                          72.00   97.88
9      0.25    1.00                1.00    1.50                   0.38    1.25                           0.38    1.25
10     18.25   84.00               21.50   84.75                  18.38   84.25                          18.00   84.25
11     2.00    76.75               3.00    77.13                  2.38    77.00                          1.88    77.00
12     71.25   99.88               88.38   99.75                  71.75   99.88                          71.63   99.88
13     75.13   95.25               84.75   95.13                  75.75   95.25                          75.63   95.25
14     0.25    100.00              8.75    100.00                 0.38    100.00                         0.50    100.00
15     2.75    1.75                1.50    3.13                   3.38    1.88                           2.75    1.88
16     9.13    42.88               9.75    45.00                  9.50    43.00                          9.13    43.00
17     5.63    97.00               14.25   96.88                  7.50    97.00                          5.50    97.00
18     86.75   89.88               89.38   90.00                  86.88   89.88                          86.75   89.88
19     0.00    36.75               0.13    39.00                  0.00    37.13                          0.00    37.13
20     10.25   90.38               25.13   89.38                  11.75   90.38                          11.25   90.38
21     20.38   49.50               29.50   43.63                  20.88   49.00                          20.88   49.38


Thus far, the proposed method has not been compared with QGLPLS, LPPLS, and similar methods. This is because those methods have parameters that need to be adjusted, and different parameter choices yield different results. To be as consistent as possible with the existing results, we selected the same data set as in ref 24 to carry out the following new tests. In this case study, all the process measurements (XMEAS(1:22) := x1:x22) and the 11 manipulated variables (x23:x33), i.e., all XMVs except XMV(12), are selected to constitute the process variable matrix X. The composition G of stream 9 and the composition E of stream 11, i.e., XMEAS(35) and XMEAS(38), form the quality variable matrix Y. In the simulation, the parameters of the manifold-learning-based PLS models are set as follows: δx = 0.1, δy = 0.8, kx = 12, and ky = 12 for the QGLPLS model; δx = 1.5, δy = 0.8, kx = 20, and ky = 15 for the LPPLS model; and kx = 11 and ky = 16 for the GPLPLSxy model. The quality-relevant FDRs with a 99.75% control limit for the PLS, CPLS, QGLPLS, LPPLS, and GPLPLS models are given in Table 5. As can be seen from Tables 5 and 1, although the data set is different, the PMA results are similar; the quality-relevant monitoring results should therefore also be similar, and it is clear that the GPLPLSxy model gives the more consistent conclusion. For the LPPLS model, the main reason for its poorer performance is unsuitable parameter selection: appropriate parameters are difficult to choose, and how to fix them remains an open problem. The QGLPLS model suffers not only from unsuitable parameters but also from an unsuitable model structure, and both lead to poor monitoring results.

Table 5: Quality-relevant FDRs using different methods

IDV    PLS     CPLS    QGLPLS   LPPLS    GPLPLS   PMA1    PMA2
1      99.13   96.13   99.75    98.63    66.75    0.204   0.683
2      98.00   81.25   97.63    98.13    92.75    0.066   0.055
3      0.38    0.50    1.13     0.50     0.50     0.772   1.189
4      0.63    0.13    98.88    0.25     0.25     0.888   1.024
5      21.88   20.38   21.38    99.63    17.63    0.302   1.038
6      99.25   99.25   99.38    100.00   96.38    0.003   0.003
7      36.75   35.63   83.63    37.63    27.75    0.144   1.031
8      92.50   87.75   93.38    92.25    74.88    0.060   0.072
9      0.63    0.38    0.75     0.63     0.00     0.898   0.805
10     30.00   28.00   23.13    49.00    13.88    0.589   0.811
11     1.38    0.25    53.50    2.88     0.38     0.783   0.756
12     87.50   84.75   87.75    95.50    75.50    0.040   0.032
13     93.88   85.00   95.25    94.13    79.75    0.023   0.022
14     33.50   1.63    96.88    2.50     0.00     1.072   0.774
15     0.63    0.75    1.50     0.75     0.50     0.903   0.565
16     14.25   12.63   9.00     53.38    8.00     0.777   0.526
17     56.00   37.13   96.75    52.75    1.63     0.644   0.696
18     88.00   88.00   90.25    87.88    86.75    0.005   0.004
19     0.00    0.00    2.50     3.25     0.00     0.945   0.751
20     26.63   27.75   36.25    28.13    10.25    0.670   0.775
21     29.88   24.50   44.38    42.38    8.63     0.234   0.086
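For readers who wish to reproduce this case study, the sketch below shows one way the process and quality matrices described above could be assembled and autoscaled. The assumed column layout of the raw TEP record (41 measurements followed by the 11 logged manipulated variables, with XMV(12) already omitted) and the function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def build_tep_xy(data):
    """Assemble X (x1:x33) and Y from a TEP record assumed to be laid out as
    n_samples x 52: XMEAS(1..41) followed by XMV(1..11)."""
    data = np.asarray(data, dtype=float)
    xmeas, xmv = data[:, :41], data[:, 41:52]
    X = np.hstack([xmeas[:, :22], xmv])      # XMEAS(1:22) plus the 11 manipulated variables
    Y = xmeas[:, [34, 37]]                   # XMEAS(35), XMEAS(38): compositions G (stream 9) and E (stream 11)
    # autoscale (zero mean, unit variance) before fitting the latent-variable model
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return X, Y
```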

Conclusion

In this paper, a novel statistical model based on global plus local projection to latent structures (GPLPLS) is proposed, which pays more attention to the relevance of the extracted principal components. The GPLPLS model further illustrates the importance of maximizing the correlation information between the process and quality variables. Building on the local-structure-preserving capability of LLE/LPP, the GPLPLS method ensures that the correlation between the dimension-reduced data remains maximal, a property not addressed by the QGLPLS method. The linear correlation information between the process and quality variables extracted by GPLPLS is therefore maximized, and the local nonlinear structural correlation information is extracted as much as possible. The TEP benchmark simulation results also show that the GPLPLS method greatly improves quality-relevant monitoring performance. Therefore, the development of the GPLPLS model is worthwhile, and the model can make the monitoring of quality-relevant nonlinear statistical processes more effective.





Acknowledgment

J. L. Zhou acknowledges the grant funded by the NSFC (No. 61473025); J. Wang acknowledges the grant funded by the NSFC (No. 61573050) and the open-project grant funded by the State Key Laboratory of Synthetical Automation for Process Industry at Northeastern University (No. PAL-N201702).

References

1. Ding, S. X. Data-driven design of monitoring and diagnosis systems for dynamic processes: A review of subspace technique based schemes and some recent results. Journal of Process Control 2014, 24, 431–449.
2. Aumi, S.; Corbett, B.; Clarke-Pringle, T. Data-driven model predictive quality control of batch processes. AIChE Journal 2013, 59, 2852–2861.
3. Peng, K.; Zhang, K.; Dong, J. Quality-relevant fault detection and diagnosis for hot strip mill process with multi-specification and multi-batch measurements. Journal of the Franklin Institute 2015, 352, 987–1006.
4. Zhang, K.; Dong, J.; Peng, K. A novel dynamic non-Gaussian approach for quality-related fault diagnosis with application to the hot strip mill process. Journal of the Franklin Institute 2016, 354, 702–721.
5. Yin, S.; Ding, S.; Xie, X.; Luo, H. A review on basic data-driven approaches for industrial process monitoring. IEEE Transactions on Industrial Electronics 2014, 61, 6418–6428.
6. Zhou, L.; Chen, J.; Song, Z.; Ge, Z. Semi-supervised PLVR models for process monitoring with unequal sample sizes of process variables and quality variables. Journal of Process Control 2015, 21, 1–16.
7. Ge, Z.; Song, Z.; Gao, F. Nonlinear quality prediction for multiphase batch processes. AIChE Journal 2012, 58, 1778–1787.
8. Li, G.; Qin, S. J.; Zhou, D. Geometric properties of partial least squares for process monitoring. Automatica 2010, 46, 204–210.
9. Zhao, C. Quality-relevant fault diagnosis with concurrent phase partition and analysis of relative changes for multiphase batch processes. Intelligent Control and Automation, IEEE 2014, 1372–1377.
10. Zhang, Y.; Qin, S. J. Improved nonlinear fault detection technique and statistical analysis. AIChE Journal 2008, 54, 3207–3220.
11. Qin, S. J. Statistical process monitoring: basics and beyond. Journal of Chemometrics 2003, 17, 480–502.
12. Dong, J.; Zhang, K.; Huang, Y.; Li, G.; Peng, K. Adaptive total PLS based quality-relevant process monitoring with application to the Tennessee Eastman process. Neurocomputing 2015, 154, 77–85.
13. Rosipal, R.; Trejo, L. J. Kernel partial least squares regression in reproducing kernel Hilbert space. Journal of Machine Learning Research 2002, 2, 97–123.
14. Godoy, J. L.; Zumoffen, D. A.; Vega, J. R.; Marchetti, J. L. New contributions to non-linear process monitoring through kernel partial least squares. Chemometrics and Intelligent Laboratory Systems 2014, 135, 76–89.
15. Zhu, Q.; Lin, Q.; Qin, S. J. Quality-relevant fault detection of nonlinear processes based on kernel concurrent canonical correlation analysis. American Control Conference, IEEE 2017, 5404–5409.
16. Qin, S. J.; McAvoy, T. J. Nonlinear PLS modeling using neural networks. Computers and Chemical Engineering 1992, 16, 379–391.
17. Qin, S. J.; McAvoy, T. J. Nonlinear FIR modeling via a neural net PLS approach. Computers and Chemical Engineering 1996, 20, 147–159.
18. Wold, S.; Kettaneh-Wold, N.; Skagerberg, B. Nonlinear PLS modeling. Chemometrics and Intelligent Laboratory Systems 1989, 7, 53–65.
19. Li, C.; Ye, H.; Wang, G.; Zhang, J. A recursive nonlinear PLS algorithm for adaptive nonlinear process modeling. Chemical Engineering and Technology 2005, 28, 141–152.
20. Shan, P.; Peng, S.; Tang, L.; Yang, C.; Zhao, Y.; Xie, Q.; Li, C. A nonlinear partial least squares with slice transform based piecewise linear inner relation. Chemometrics and Intelligent Laboratory Systems 2015, 143, 97–110.
21. He, X.; Niyogi, P. Locality preserving projections. Advances in Neural Information Processing Systems 2003, 16, 186–197.
22. He, X.; Yan, S.; Hu, Y.; Niyogi, P.; Zhang, H. J. Face recognition using Laplacianfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27, 328–340.
23. Zhong, B.; Wang, J.; Zhou, J.; Wu, H.; Jin, Q. Quality-related statistical process monitoring method based on global and local partial least-squares projection. Industrial and Engineering Chemistry Research 2016, 55, 1609–1622.
24. Wang, J.; Zhong, B.; Zhou, J. Quality-relevant fault monitoring based on locality preserving partial least squares statistical models. Industrial and Engineering Chemistry Research 2017, 56, 7009–7020.
25. Lyman, P. R.; Georgakis, C. Plant-wide control of the Tennessee Eastman problem. Computers and Chemical Engineering 1995, 19, 321–331.
26. Zhou, D.; Li, G.; Qin, S. J. Total projection to latent structures for process monitoring. AIChE Journal 2010, 56, 168–178.
27. Lee, J.; Yoo, C.; Lee, I. Statistical process monitoring with independent component analysis. Journal of Process Control 2004, 14, 467–485.
28. Chiang, L. H.; Russell, E. L.; Braatz, R. D. Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis. Chemometrics and Intelligent Laboratory Systems 2000, 50, 243–252.
29. Deng, X.; Tian, X.; Chen, S. Modified kernel principal component analysis based on local structure analysis and its application to nonlinear process fault diagnosis. Chemometrics and Intelligent Laboratory Systems 2013, 127, 195–209.
30. Ge, W.; Wang, J.; Zhou, J.; Wu, H.; Jin, Q. Incipient fault detection based on fault extraction and residual evaluation. Industrial and Engineering Chemistry Research 2015, 54, 3664–3677.
31. Yu, J. Local and global principal component analysis for process monitoring. Journal of Process Control 2012, 22, 1358–1373.
32. Qin, S.; Zheng, Y. Quality-relevant and process-relevant fault monitoring with concurrent projection to latent structures. AIChE Journal 2012, 59, 496–504.

Figure 18: TOC graphic. The schematic illustrates, for both the process data X and the quality data Y, the local modeling steps ((1) select neighbors, (2) reconstruct with linear weights, (3) project) combined with the global outer relationships and linked by the inner relationship U = BT.