A Bibliometric Review and Analysis of Data-Driven Fault Detection


Review

A bibliometric review and analysis of data-driven fault detection and diagnosis methods for process systems
Md Alauddin, Faisal Khan, Syed Ahmad Imtiaz, and Salim Ahmed
Ind. Eng. Chem. Res., Just Accepted Manuscript • DOI: 10.1021/acs.iecr.8b00936 • Publication Date (Web): 10 Jul 2018
Downloaded from http://pubs.acs.org on July 11, 2018


Industrial & Engineering Chemistry Research

A bibliometric review and analysis of data-driven fault detection and diagnosis methods for process systems
Md Alauddin, Faisal Khan*, Syed Imtiaz, and Salim Ahmed
Centre for Risk Integrity and Safety Engineering (C-RISE), Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John’s, NL, Canada
Corresponding author’s email: [email protected]

Abstract: Accident prevention is one of the most desired and challenging goals in the process industries, and fault detection and diagnosis (FDD) is critical to achieving it. FDD has been an active area of research for decades. This review focuses on data-driven techniques, because data analytics is receiving growing emphasis in all areas, including the process industries. The analysis addresses the following fundamental questions: (i) How are the leading areas evolving? (ii) Who are the contributing authors? (iii) What are the key sources and domains of publications? and (iv) Which countries are active in this research area? Furthermore, we briefly describe four techniques, principal component analysis (PCA), partial least squares (PLS), independent component analysis (ICA), and the Gaussian mixture model (GMM), to represent the state-of-the-art algorithms from different periods. Significant work in this field is being carried out throughout the world, in both developed and developing countries. China is emerging as the leading contributor by total number of publications, while Singapore has the highest per-capita publication count. Finally, the linkage between different types of publications, especially between engineering journals and industrial journals, is growing, which indicates that these techniques are gaining industrial importance. It can be concluded that data-based process monitoring is developing rapidly and is being applied in the process industries; nevertheless, the pace of application is not on par with the pace of theoretical development.

Keywords: fault detection, fault diagnosis, process system, process monitoring, bibliometric analysis.

Nomenclature

| Symbol | Meaning |
|---|---|
| A | correlation matrix |
| AE | auxiliary equipment |
| ANN | artificial neural network |
| AP | industrial applications in process systems |
| AR | journal article |
| ARIMA | autoregressive integrated moving average |
| ARMA | autoregressive moving average |
| BC | book chapter |
| BIP | Bayesian inference-based probability index |
| Bk | book |
| CCA | canonical correlation analysis |
| Co | conference article |
| CP | chemical process systems |
| CUSUM | cumulative sum |
| DA | discriminant analysis |
| DGMM | dynamic Gaussian mixture model |
| DICA | dynamic independent component analysis |
| Dr | regularized Mahalanobis distance |
| [symbol missing] | covariance of ith Gaussian component |
| EM | expectation maximization |
| ES | expert system |
| EWMA | exponentially weighted moving average |
| FDD | fault detection and diagnosis |
| FGMM | finite Gaussian mixture model |
| FKICA | fault-related kernel independent component analysis |
| FL | fuzzy logic |
| G | the multivariate Gaussian probability density function |
| GMM | Gaussian mixture model |
| HOEVD | higher-order eigenvalue decomposition |
| HOS | higher-order statistics |
| ICA | independent component analysis |
| JADE | joint approximate diagonalization of eigenmatrices |
| K | number of Gaussian components |
| KPCA | kernel principal component analysis |
| LPP | locality preserving projections |
| M | algorithm and methods development |
| MD | maximal diagonality |
| MIPLS | multiway interval partial least squares |
| MKICA | multiway kernel ICA |
| MKPCA | multiway kernel principal component analysis |
| ML | maximum likelihood |
| MPLS | multi-way partial least squares |
| N | total number of training samples |
| NKGMM | nonlinear kernel Gaussian mixture model |
| NLGBN | nonlinear Gaussian belief network |
| OP | offshore processes |
| OPC | optimal principal components |
| O-PLS | orthogonal projections to latent structures |
| P | transformation matrix |
| P | probability density function |
| pj | jth eigenvector of the correlation matrix |
| PC | principal components |
| PCA | principal component analysis |
| PE | process equipment |
| PLS | partial least squares |
| PLS-SEM | partial least squares structural equation modeling |
| Q | output weight |
| QTA | qualitative trend analysis |
| R | review article |
| RKF-PCA | robust kernel fuzzy PCA |
| RPLS | recursive partial least squares |
| S | the original signal from the source |
| SDG | signed directed graph |
| SPC | statistical process control |
| SPCA | sparse principal component analysis |
| STOTD | simultaneous third-order tensor diagonalization |
| SVM | support vector machine |
| T | transformed data matrix |
| T | input latent vector |
| TPLS | total projection to latent structure |
| U | output latent vector |
| U | recovered signal |
| V | number of scalar parameters |
| W | input weight |
| WoS | Web of Science |
| X | original data |
| X | input matrix |
| X | observed signal |
| X* | back-transformed data |
| xt | monitored sample |
| Y | output matrix |
| µi | mean of ith Gaussian component |
| ωi | weight of ith Gaussian component |
| λj | jth eigenvalue of correlation matrix |


1. Introduction

Chemical process systems are high-volume, hazardous operations in which an accident can cause colossal loss of lives and assets. Efficient process monitoring and early detection and diagnosis of abnormal operations are essential to prevent such losses. This can be achieved through preventive actions such as process monitoring, system reconfiguration, installation of safety instrumented systems, and timely maintenance and repair 1. Fault detection, identification, and diagnosis are the essential elements of an efficient process monitoring system. Fault detection evaluates whether the process is operating under normal conditions; fault identification and fault diagnosis determine the type of a fault and the cause of the fault, respectively 2.

Fault detection and diagnosis methods can broadly be classified into two categories: model-based and data-based. The model-based approaches involve the rigorous development of a process model based on first principles3,4. Methods based on first principles are robust and reliable, and they have been widely studied, developed, and applied to process systems. However, they are not easy to implement for early fault detection in complex processes, because it is difficult to capture system complexities and nonlinearities in first-principles models. In addition, they require reliable a priori quantitative or qualitative knowledge about the process5,6. Data-driven methods, on the other hand, are based on the available process measurements and require minimal process knowledge. For example, when PCA is applied for FDD, the variables need not be classified as inputs or outputs; instead, all the variables are processed symmetrically. The measured data are transformed into features, which are analyzed for fault detection and diagnosis7–9.

The data-based fault detection and diagnosis methods can be classified as qualitative or quantitative10,11. Expert systems and qualitative trend analysis (QTA) are two well-known qualitative techniques11. In QTA, the data are represented as primitives that establish the trend12. QTA provides relatively accurate results; however, it becomes computationally expensive for complex processes12. The quantitative methods can be classified into two categories: statistical and non-statistical. The support vector machine (SVM) and the artificial neural network (ANN) are the two most commonly used non-statistical methods. They are also referred to as supervised learning methods. SVM and ANN perform FDD by constructing learning models from labelled historical data and comparing the models against the current process data. The major drawback of these supervised FDD techniques is that they require a substantial number of labelled data samples for training13. Statistical methods such as principal component analysis (PCA) and partial least squares (PLS) project the data onto a lower-dimensional space to achieve fault detection and diagnosis14,15.

Data-based FDD methods can further be classified as univariate or multivariate. In the univariate methods, each variable is monitored individually by limit checking or trend checking using conventional statistical process control (SPC) charts such as the Shewhart control chart, the cumulative sum (CUSUM) control chart, and the exponentially weighted moving average (EWMA) control chart16. Although the univariate control strategy is still dominant in the process industries, its performance deteriorates when the variables are highly correlated17. Moreover, univariate monitoring causes alarm flooding, which is becoming a major challenge for the safety of many process systems. In contrast, the multivariate methods are based on the cumulative effects of a group of variables. They take into account the correlation between the variables and are better able to distinguish operational changes from process faults. Furthermore, they produce fewer monitoring indices than the univariate models, which helps prevent alarm flooding18. Fig. 1 presents a classification of well-known data-based FDD methods. Some of the prominent multivariate statistical methods for FDD are discussed in Sec. 3.2.1-3.2.4.
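The univariate EWMA and CUSUM charts mentioned above can be illustrated with a minimal sketch. The Python below is an illustration only and is not taken from the cited works: the function names, the smoothing constant `lam`, the limit width `L`, and the CUSUM parameters `k` and `h` are assumed textbook-style defaults, and the in-control mean `mu` and standard deviation `sigma` are taken as known.

```python
import math

def ewma_alarms(x, mu, sigma, lam=0.2, L=3.0):
    """Return the sample indices at which the EWMA statistic leaves its limits."""
    z = mu  # the EWMA statistic starts at the in-control mean
    alarms = []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1.0 - lam) * z
        # Time-varying half-width of the control limits
        width = L * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        if abs(z - mu) > width:
            alarms.append(t - 1)
    return alarms

def cusum_alarms(x, mu, sigma, k=0.5, h=5.0):
    """Two-sided tabular CUSUM; k (slack) and h (decision limit) are in units of sigma."""
    c_plus = c_minus = 0.0
    alarms = []
    for t, xt in enumerate(x):
        c_plus = max(0.0, c_plus + (xt - mu) - k * sigma)
        c_minus = max(0.0, c_minus - (xt - mu) - k * sigma)
        if c_plus > h * sigma or c_minus > h * sigma:
            alarms.append(t)
    return alarms

# Hypothetical measurements: in-control data followed by a sustained mean shift
normal = [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.0]
shifted = normal + [10.6] * 12
print(ewma_alarms(shifted, mu=10.0, sigma=0.1))
print(cusum_alarms(shifted, mu=10.0, sigma=0.1))
```

Each chart watches a single variable, so a plant with hundreds of measurements needs hundreds of such charts, which is the source of the alarm flooding discussed above.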

Fig. 1: Classification of data-based FDD methods

Many researchers have reviewed and compared the data-based fault detection and diagnostic techniques. For instance, Venkatasubramanian et al19 examined the fault diagnostic methods based on process history. They compared the techniques against ten characteristics and concluded that no single technique is superior in all respects. However, being an early article, it does not include more recently developed methods such as the Gaussian mixture model, the Markov model, and vine copula-based techniques. Ge, Song and Gao20 reviewed the data-based process monitoring techniques with respect to the nonlinear, non-Gaussian, multimode, and dynamic nature of the processes. Similarly, Stosch et al21 presented a review of hybrid and semi-parametric FDD methods. Other notable surveys of data-driven FDD include Dai & Gao7, Gao, Cecati, & Ding22, Ge2, Qin8, Tidriri et al23, Yin et al24 and Yin et al25.

In this paper, a review and analysis of data-driven fault detection and diagnosis in process systems is presented. The aim is to (i) provide an overview of the development of various data-based methods, (ii) identify the significant contributors to data-based fault detection and diagnosis, (iii) discover the principal sources, and (iv) identify the countries active in this research area. The structure of the subsequent sections is shown in Fig. 2.

Fig. 2: Structure of the forthcoming contents of the paper.

2. Method of investigation

The bibliographic data used in this study were retrieved in the first week of November 2017 from the Web of Science (WoS) Core Collection and Scopus databases. Many search keywords related to data-based monitoring of process systems were used to maximize the number of bibliographic records. The search strings were as follows:

TOPIC: (Data-based OR data-driven OR supervised OR unsupervised OR Process history based OR statistical OR Multivariate Statistical OR parametric OR semi parametric OR non-parametric OR history-based OR causality-based OR Logic based)
AND TOPIC: (fault detect* OR fault diagnos* OR process monitor* OR "fault isolation" OR abnormal situation prediction OR abnormal situation management OR fault management OR process fault OR statistical process monit* OR fault identification)
AND TOPIC: (Process system* OR Process indust* OR chemical process* OR chemical process industries OR industrial Process* OR biochemical processes OR offshore process OR offshore process systems)

The search resulted in 9889 documents in the WoS database. The results were further refined by subject area and limited to Electrical and Electronics Engineering, Automation Control Systems, Computer Science and Artificial Intelligence, Chemical Engineering, Mechanical Engineering, Industrial Engineering, Manufacturing Engineering, Petroleum Engineering, Multidisciplinary Engineering, Statistics, and Applied Chemistry. This resulted in 5051 documents. Similarly, 16205 documents were recorded in the Scopus database, which were further refined to 9752 documents for the areas of Engineering (ENGI), Computer Science (COMP), Mathematics (MATH), and Chemical Engineering (CENG).

The comparatively higher number of documents in Scopus for the same keyword-based search is due to two reasons: a larger number of conferences indexed in Scopus and fewer subject-area categories. For example, Scopus indexes all the engineering papers in two groups: ENGI and CENG. In contrast, WoS records them under several categories such as Aerospace Engineering, Agricultural Engineering, Automation Control System, Biomedical Engineering, Electrical and Electronics Engineering, Environmental Engineering, Civil Engineering, Chemical Engineering, Industrial Engineering, Instrumentation, Manufacturing Engineering, Marine Engineering, Mechanical Engineering, Ocean Engineering, Petroleum Engineering, and Multidisciplinary Engineering, to name a few. However, further refinement in Scopus can be achieved based on sources.
The combined bibliographic data were screened for eliminating duplicates and irrelevant documents. This resulted in 4081 records that were analyzed using Vosviewer and MS Excel to extract meaningful outcomes. The stepwise description of the search process is given in Fig. 3.
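The merging and duplicate-screening steps can be sketched as follows. This is a minimal illustration and not the authors' procedure, which was partly manual: the column names `TI`, `PY`, `Title`, and `Year`, the sample records, and the rule of matching on a normalized title plus publication year are all assumptions made for the example.

```python
import csv
import io
import re

def normalize_title(title):
    """Lower-case a title and collapse punctuation and whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def read_wos(tab_text):
    """Read tab-delimited records and map them to Scopus-style column names."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return [{"Title": r["TI"], "Year": r["PY"], "Source": "WoS"} for r in reader]

def read_scopus(csv_text):
    """Read comma-separated records exported in csv format."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"Title": r["Title"], "Year": r["Year"], "Source": "Scopus"}
            for r in reader]

def merge_records(*record_lists):
    """Concatenate record lists, keeping the first record per (title, year)."""
    seen, merged = set(), []
    for records in record_lists:
        for rec in records:
            key = (normalize_title(rec["Title"]), rec["Year"])
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged

# Hypothetical exports; the titles are invented for illustration
wos_text = ("TI\tPY\n"
            "Process monitoring with PCA\t2010\n"
            "Fault diagnosis using ICA\t2012\n")
scopus_text = ("Title,Year\n"
               "Process Monitoring with PCA.,2010\n"
               "GMM-based fault detection,2015\n")
combined = merge_records(read_scopus(scopus_text), read_wos(wos_text))
print(len(combined), "unique records")
```

Automatic matching of this kind only removes exact duplicates after normalization; screening for irrelevant documents still requires manual review.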


Step 1: Keywords selection

Step 2: Document search in database
• Search in Scopus database and saving the full record (csv format)
• Search in Web of Science database and saving the full record (tab-delimited format)
• Conversion of WoS data from tab-delimited to csv format
• Conversion of WoS (csv) to Scopus (csv) manually
• Combining the data from both sources into a single file
• Screening for duplicates or irrelevant data

Step 3: Analysis of the bibliometric data using Vosviewer and MS Excel

Fig. 3: Stepwise description of the investigation method

3. Results and discussion

The bibliometric data were processed using Vosviewer 1.6.5 (www.vosviewer.com) and MS Excel to determine the status of FDD methods and applications, the publication trend, and the contributors, sources, and active countries in this field. The following sections discuss the findings.

3.1 Publication trend: A total of 4081 manuscripts related to data-based fault detection and diagnosis in process systems were found. They were categorized by publication type as journal articles, conference papers, books, book chapters, reviews, and editorials (Table 1). Journal articles had the largest share of documents (61.82%), followed by conference papers (35.82%). Fig. 4a presents the classification based on the foci of the articles. For each category, the numerical value is the number of documents published and the percentage is its share of the total publications. The symbols have the following meanings. M: publications focused on the development of algorithms and methods; R: review articles; CP: articles related to applications in the process industries; OP: articles with applications in offshore processing; PE: publications applying these techniques to process equipment such as reactors, heat exchangers, pumps, compressors, and distillation columns; and AE: publications with application to auxiliary equipment such as sensors. Fig. 4a shows that 78% of the publications were devoted to the development of algorithms and methods, whereas only 20% addressed applications to process systems.

Table 1: Types of publications related to data-driven FDD

| Document type | Number of documents | Percentage share of documents |
|---|---|---|
| Article | 2523 | 61.82% |
| Book | 5 | 0.12% |
| Book Chapter | 23 | 0.56% |
| Conference Papers | 1462 | 35.82% |
| Conference Review | 1 | 0.02% |
| Editorial | 4 | 0.10% |
| Note | 1 | 0.02% |
| Review Article | 62 | 1.52% |
| Total | 4081 | 100.00% |

Fig. 4a: Categories of publications on data-based FDD (M= algorithm and methods development; R= review article; CP = chemical process systems, OP= offshore processes, PE= process equipment, AE= auxiliary equipment)
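As a quick check on the arithmetic, the percentage shares in Table 1 can be recomputed from the document counts. The snippet below is only a sketch; the dictionary name `counts` and the two-decimal rounding are choices made for the example.

```python
# Document counts as reported in Table 1
counts = {
    "Article": 2523,
    "Book": 5,
    "Book Chapter": 23,
    "Conference Papers": 1462,
    "Conference Review": 1,
    "Editorial": 4,
    "Note": 1,
    "Review Article": 62,
}

total = sum(counts.values())

# Share of each document type as a percentage of all documents
shares = {doc_type: round(100.0 * n / total, 2) for doc_type, n in counts.items()}

for doc_type, share in shares.items():
    print(f"{doc_type}: {share:.2f}%")
print("Total documents:", total)
```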

Fig. 4b displays the publication trends in distinct areas of FDD over the last four decades. The symbols CP, OP, PE, and AE have the same meanings as above, while AP denotes the publications related to applications; CP, OP, PE, and AE are subsets of AP. Publications in this field commenced in the early 1980s, grew quadratically in the mid-1990s, and grew almost exponentially after 2000. These techniques began to be tried in industrial applications by 2000. However, their application in process systems has followed only a weak linear trend and has not kept pace with their development. These techniques can now be used for monitoring and fault diagnosis of a process, process equipment, or auxiliary devices. Specifically, 182 publications on the application of data-based monitoring in process systems were documented during the last three years (2015-17): 134 on the monitoring of chemical processes, 5 on offshore processes, 20 on process equipment, and 23 on auxiliary equipment.

Fig. 4b: Number of documents published over the last four decades in the area of data-based FDD in process systems (M= algorithm and methods development; R= review article; CP = chemical processes systems, OP= offshore processes, PE= process equipment, AE= auxiliary equipment, AP= industrial application in process systems)

Based on these trends, the evolution and development of research on FDD can be classified into four categories, namely, formulation of basic data-driven algorithms, advancement of the algorithms, application of the algorithms in process systems, and development and application of advanced hybrid techniques in process systems (Fig. 4c).


Fig. 4c: The evolution of data-based FDD algorithms in process systems

3.2 Key areas: Data-driven fault detection and diagnosis has become an established field with numerous emerging areas. The keywords used in a research field indicate the diverse nature of that field. Accordingly, 7195 keywords were recorded in this study. Fig. 5a shows the most frequently used keywords in the domain of data-based FDD. The color represents the density of the cluster, whereas the relative font size indicates the frequency of occurrence26. The relative font size could therefore be an indication of a keyword's significance and relative advancement.

Fig. 5a: Frequently used author’s keywords in data-based FDD (Red: highest density cluster; Yellow: medium density cluster; Green: low density cluster)
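The keyword frequencies that drive the font sizes in such a map can be tallied in a few lines. The sketch below is not the Vosviewer implementation; the semicolon separator, the case-insensitive matching, and the sample records are assumptions made for illustration.

```python
from collections import Counter

def keyword_frequencies(keyword_fields, sep=";"):
    """Count in how many documents each author keyword occurs."""
    counts = Counter()
    for field in keyword_fields:
        # Each keyword is counted once per document, ignoring case and padding
        keywords = {k.strip().lower() for k in field.split(sep) if k.strip()}
        counts.update(keywords)
    return counts

# Hypothetical author-keyword fields, one string per document
records = [
    "Fault detection; PCA; Process monitoring",
    "fault detection; PLS",
    "Process monitoring; PCA; Fault Detection",
]
freq = keyword_frequencies(records)
print(freq.most_common(3))
```

In practice, synonyms and spelling variants (for example "PCA" and "principal component analysis") would need to be merged before counting, which this sketch does not attempt.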


The recorded keywords relate both to areas of application and to featured algorithms. Fault analysis (detection, diagnosis, isolation, prediction, and identification), process monitoring, anomaly detection, fault tolerant control, abnormal situation management, and condition-based maintenance were the central areas (Fig. 5b).

Fig. 5b: key areas of applications in FDD (Red: highest density cluster; Yellow: medium density cluster; Green: low density cluster)

Fig. 5c: The distinct data-driven fault detection and diagnostic methods and algorithms used in process systems (Red: highest density cluster; Yellow: medium density cluster; Green: low density cluster)


Fig. 5c presents the prominent FDD methods developed to date. Two distinct nuclei can be observed, characterized by multivariate and univariate methods. Multivariate techniques have received more attention in recent years, and techniques such as principal component analysis (PCA) and partial least squares (PLS) are well established. In addition, many new techniques are emerging, such as independent component analysis (ICA), the Bayesian network (BN), and the self-organizing map (SOM). The Shewhart chart, exponentially weighted moving average (EWMA), cumulative sum (CUSUM) control chart, average run length, autoregressive integrated moving average (ARIMA), and autoregressive moving average (ARMA) are sensor-based methods. The other noticeable clusters include knowledge-based methods and clustering algorithms such as k-means clustering and the Gaussian mixture model. In addition, many methods derived from knowledge-based and sensor-based approaches have evolved. These include the artificial neural network (ANN), support vector machines (SVM), pattern recognition, fuzzy logic (FL), and hidden Markov models (HMM). These FDD algorithms and their subclasses are shown in Fig. 5d-e.

Fig. 5d: Multivariate data-driven FDD methods



Fig. 5e: The univariate data-driven FDD methods

Fig. 5f presents the breakthrough of the 15 most commonly employed FDD techniques in the last 30 years. PCA, ANN, FL, BN, and PLS registered the highest number of publications. However, BN, HMM, GMM, ICA, and SVM recorded significant growth in the last five years. Fig. 5g shows the status of some of the successful combinations of these FDD techniques. ANN-FL, ANN-PCA, PCA-ICA, PCA-BN, PCA-SVM, and ICA-SVM were the most frequently used combinations. It also appears that BN is emerging as a preferred algorithm for combination with other algorithms.

Fig. 5f: Breakthrough of distinct FDD algorithms in process systems in last 30 years




Fig. 5g: Growth of major recombination of distinct algorithms in process systems in last 30 years

The FDD techniques could be broadly divided into three categories based on their relative occurrences:

i. Most frequently occurring keywords: PCA, PLS, ANN, and FL
ii. Moderately occurring: SVM, BN, and ICA
iii. Less frequently occurring: SOM, HMM, GMM, discriminant analysis (DA), locality preserving projections (LPP), signed directed graph (SDG), and loss function models.

PCA and PLS are mature tools and have been extensively applied in process systems. Their simplicity and adaptability have led to their application in various process industries. For instance, the major advantages of PCA and PLS are their ability to handle large numbers of highly correlated data, measurement errors, and missing data. Their dimension-reduction feature makes them an ideal choice for combination with other algorithms for handling big data. ICA is a developing area and is being implemented in many industrial applications. GMM is an evolving technique that has the potential to be a superior alternative for handling the non-Gaussianity of industrial data. ICA produces statistically independent features that can capture higher-order statistics, and GMM does not require an assumption about the data distribution. We have selected PCA, PLS, ICA, and GMM to represent the state-of-the-art algorithms from different periods. These algorithms are described briefly in the next section.



3.2.1 Principal component analysis (PCA): PCA is one of the most popular and widely used data-based process monitoring techniques. The idea behind PCA is to reduce the dimensionality of a data set while retaining its variation27–29. This is accomplished by extracting a set of loading vectors that maximize the variance in the transformed space30. These orthogonal weight vectors are ordered by retained variance; the feature with the maximum variance is called the first principal component, the feature that captures the second most variance is the second principal component, and so on31. For a given data matrix $X \in \mathbb{R}^{N \times m}$, where N is the number of samples and m is the number of process variables, the principal components (PCs) are given by linear combinations of the original variables. The coefficients of each linear combination are obtained from the eigenvectors of the covariance matrix of the original variables. The principal components, or score vectors, are collected in a matrix $T = [t_1, t_2, \dots, t_r, \dots, t_m] \in \mathbb{R}^{N \times m}$ such that the PCs are in descending order of the variance they capture in the original data. The steps for calculating the principal components are summarized below32–34.

a. Calculation of the correlation matrix A. Assuming that X is mean centered, the correlation matrix is given by Eq. 1.

$$A = \frac{1}{N-1} X^{T} X \qquad (1)$$

b. Calculation of the eigenvalues $\lambda_j$ of the matrix A and the eigenvectors $p_j$ (Eq. 2). The eigenvectors of the covariance matrix are known as loadings. They are the coefficients of the linear equations relating the original variables to the transformed variables.

$$(A - \lambda_j I)\, p_j = 0, \qquad j = 1, 2, 3, \dots, m \qquad (2)$$

c. The first few eigenvectors $p_j$, j = 1, ..., r, corresponding to the largest eigenvalues are selected.

d. The variables are mapped from the original space to a new space in which they are uncorrelated (Eq. 4),

$$T_r = X P \qquad (4)$$

where $P = [p_1, \dots, p_r] \in \mathbb{R}^{m \times r}$ is the transformation matrix and $T_r \in \mathbb{R}^{N \times r}$ holds the retained scores.

e. Finally, the data are projected back to the original domain (Eq. 5).

$$X^{*} = T_r P^{T} \qquad (5)$$
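Steps a to e can be sketched in plain Python as follows. This is an illustrative sketch, not the procedure of the cited references: the eigenpairs are found by power iteration with deflation (a simple substitute for a full eigendecomposition), the covariance matrix of mean-centered data is used for A, and the function names and the small data set are assumptions made for the example.

```python
import math

def mean_center(X):
    """Subtract the column means from every row of X."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]

def covariance(Xc):
    """Step a: A = Xc^T Xc / (N - 1) for mean-centered data Xc."""
    n, m = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(m)]
            for i in range(m)]

def dominant_eigenpair(A, iters=1000):
    """Power iteration: largest eigenvalue and unit eigenvector of symmetric A."""
    m = len(A)
    v = [1.0 + i for i in range(m)]  # arbitrary start, assumed not orthogonal to the target
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
    return sum(v[i] * Av[i] for i in range(m)), v

def pca(X, r):
    """Return eigenvalues, loading vectors, scores, and the back-projected data."""
    Xc = mean_center(X)
    B = covariance(Xc)
    m = len(B)
    eigvals, P = [], []
    for _ in range(r):  # steps b and c: r largest eigenpairs
        lam, p = dominant_eigenpair(B)
        eigvals.append(lam)
        P.append(p)
        # Deflation removes the component that was just extracted
        B = [[B[i][j] - lam * p[i] * p[j] for j in range(m)] for i in range(m)]
    # Step d: scores of the mean-centered data on the retained loadings
    T = [[sum(x[j] * p[j] for j in range(m)) for p in P] for x in Xc]
    # Step e: back-projection (in mean-centered coordinates)
    X_star = [[sum(t[k] * P[k][j] for k in range(r)) for j in range(m)] for t in T]
    return eigvals, P, T, X_star

# Hypothetical two-variable data set with strongly correlated columns
X = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
     [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
eigvals, P, T, X_star = pca(X, r=2)
print(eigvals)
```

With all components retained the back-projection reproduces the centered data; with fewer components the difference between the two is the residual used by the SPE index.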




Fault detection is performed using Hotelling's T² statistic and the squared prediction error (SPE), which is also known as the Q-statistic. T² is the sum of the normalized squared scores. Assuming the data to be mean centred, the T² statistic and SPE are given by Eq. 6 and Eq. 7, respectively,

$$T_i^{2} = t_i \Lambda_r^{-1} t_i^{T} \qquad (6)$$

$$SPE_i = \lVert x_i - x_i^{*} \rVert^{2} \qquad (7)$$

where $\Lambda_r$ is a diagonal matrix containing the r largest eigenvalues, $t_i$ is the ith row of $T_r \in \mathbb{R}^{N \times r}$, the matrix of the first r retained score vectors from the PCA model, and $x_i$ and $x_i^{*}$ are the ith rows of X and X*. A fault is detected if either index exceeds its corresponding threshold. The threshold expressions use the normal deviate corresponding to the upper 1 − α percentile.
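The two monitoring indices can be computed for a single sample as sketched below. The SPE threshold shown is the Jackson-Mudholkar approximation, which is assumed here because it is a commonly used form that involves a normal deviate; it is not confirmed to be the expression used in the original equations. The function names, the loading vector, and the eigenvalues are hypothetical.

```python
import math
from statistics import NormalDist

def t2_statistic(x, P, eigvals):
    """Hotelling T^2 of a mean-centered sample: squared scores scaled by eigenvalues."""
    scores = [sum(xj * pj for xj, pj in zip(x, p)) for p in P]
    return sum(t * t / lam for t, lam in zip(scores, eigvals))

def spe_statistic(x, P):
    """Squared prediction error: squared norm of x minus its back-projection."""
    scores = [sum(xj * pj for xj, pj in zip(x, p)) for p in P]
    x_star = [sum(t * p[j] for t, p in zip(scores, P)) for j in range(len(x))]
    return sum((xj - sj) ** 2 for xj, sj in zip(x, x_star))

def spe_limit(discarded_eigvals, alpha=0.01):
    """SPE threshold from the discarded eigenvalues (assumed Jackson-Mudholkar form)."""
    th1, th2, th3 = (sum(l ** k for l in discarded_eigvals) for k in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    # Normal deviate corresponding to the upper 1 - alpha percentile
    c = NormalDist().inv_cdf(1.0 - alpha)
    term = (c * math.sqrt(2.0 * th2 * h0 ** 2) / th1
            + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2)
    return th1 * term ** (1.0 / h0)

# Hypothetical two-variable model with one retained component
P = [[math.sqrt(0.5), math.sqrt(0.5)]]   # retained loading vector
eigvals = [1.28]                         # retained eigenvalue
discarded = [0.05]                       # eigenvalue left out of the model

x_ok = [1.0, 1.0]    # follows the modelled correlation
x_bad = [1.0, -1.0]  # breaks the correlation structure

limit = spe_limit(discarded)
print(t2_statistic(x_ok, P, eigvals), spe_statistic(x_ok, P))
print(spe_statistic(x_bad, P) > limit)
```

The second sample has an unremarkable T² but a large SPE, which illustrates why the two indices are monitored together: T² tracks variation inside the model subspace and SPE tracks departures from it.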

An argument against PCA is that it cannot work with non-Gaussian and nonlinear data35–37. Moreover, its dependence on the Pearson covariance matrix makes PCA sensitive to outliers38. This makes PCA less attractive for practical applications. The problem of nonlinearity can be handled by its variant, kernel principal component analysis (KPCA), which maps the nonlinear variables into a higher-dimensional space where the variables are linearly related and then uses PCA for fault detection and diagnosis39. However, KPCA has difficulty in fault diagnosis, since the contribution plot cannot be calculated in the original space because of the irreversible nature of the kernel function. Many non-parametric and semi-parametric PCA models have been developed to widen the scope for industrial application. Yu et al38,40 proposed a new formulation of PCA based on a semi-parametric Gaussian transformation and the distance correlation coefficient. The multiway kernel principal component analysis (MKPCA) was proposed for dynamic data41. Many authors have proposed computationally efficient methods to deal with big data from complex industrial processes. For instance, Zass and Shashua42 developed the non-negative sparse PCA for creating a low-dimensional representation of data. Heo, Gader, and Frigui43 introduced a robust kernel fuzzy PCA (RKF-PCA) based on a fuzzy iterative scheme and kernel mapping. Nonetheless, these methods exhibit inadequate feature extraction capability because they exploit only up to the second-order statistics of the data set. The timeline for the development of these techniques is presented in Fig. 5h.

Fig. 5h: Timeline of PCA based FDD methods

3.2.2 Partial Least Squares (PLS): PLS is based on the extraction of latent variables from input and output while maximizing the covariance between a pair of latent variables44,45. It finds a set of orthogonal components that maximize the explanation of both input and output, and furnishes the predictive dependence between them. In effect, PLS combines the operations of PCA with Canonical Correlation Analysis (CCA) and Multivariate Linear Regression (MLR)46. The PLS algorithm projects $X \in \mathbb{R}^{N \times l}$ and $Y \in \mathbb{R}^{N \times m}$ onto the latent variables $T \in \mathbb{R}^{N \times \gamma}$ to establish the correlation model of X and Y, that is, $X = T P^T + E$ and $Y = T Q^T + F$, where E and F are residual matrices.

Here, $\gamma$ is the number of latent variables, and $P \in \mathbb{R}^{l \times \gamma}$ and $Q \in \mathbb{R}^{m \times \gamma}$ are the loading matrices of X and Y. Assuming that the input matrix X and output matrix Y are normalized (zero mean and unit variance), the objective function of PLS is given in Eq. 14 and Eq. 15 47.




Here, w and q denote the input and output weights, respectively. The optimal solution of this problem is given by Eq. 16 and Eq. 17 47.

Here, λw and λq are eigenvalues, and w and q are the eigenvectors of the corresponding matrices. Subsequently, the input and output latent score vectors t and u are calculated as t = Xw and u = Yq, which is referred to as the outer model. Once the outer model is obtained, the inner model is built between the latent scores u and t (Eq. 18).

After extracting the set of score vectors and loading vectors, the deflated matrices of X and Y are calculated for the next set of vectors (Eq. 19-20)48. This procedure is repeated iteratively until the required number of latent variables has been extracted47.
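The outer model, inner model and deflation steps described above can be sketched in plain Python for a single latent variable. This is a minimal illustration under stated assumptions, not the formulation of the cited works: it assumes the common textbook form in which w is the dominant eigenvector of XᵀYYᵀX (found here by power iteration), q is the matching eigenvector of YᵀXXᵀY, the inner model is a least-squares fit u ≈ b t, and deflation subtracts t pᵀ from X and b t qᵀ from Y. The function name `pls_component`, the helper functions and the toy data are hypothetical.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def pls_component(X, Y, n_iter=200):
    """Extract one PLS latent variable from column-centered X (N x l), Y (N x m)."""
    Xt, Yt = transpose(X), transpose(Y)
    w = normalize([1.0] * len(Xt))  # arbitrary start; must not be orthogonal to the solution
    for _ in range(n_iter):
        # power iteration on X^T Y Y^T X, evaluated right to left
        w = normalize(matvec(Xt, matvec(Y, matvec(Yt, matvec(X, w)))))
    q = normalize(matvec(Yt, matvec(X, w)))  # output weights, dominant eigenvector of Y^T X X^T Y
    t = matvec(X, w)                         # outer model: t = Xw
    u = matvec(Y, q)                         # outer model: u = Yq
    tt = dot(t, t)
    b = dot(u, t) / tt                       # inner model (assumed least-squares form): u ~ b t
    p = [dot(col, t) / tt for col in Xt]     # X loading vector
    X_defl = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(X, t)]
    Y_defl = [[y - b * ti * qj for y, qj in zip(row, q)] for row, ti in zip(Y, t)]
    return w, q, t, u, b, X_defl, Y_defl

# Toy data (hypothetical): centered only, and the output is twice the first input column.
X = [[-1.5, -0.5], [-0.5, 0.5], [0.5, -0.5], [1.5, 0.5]]
Y = [[-3.0], [-1.0], [1.0], [3.0]]
w, q, t, u, b, X_defl, Y_defl = pls_component(X, Y)
```

Calling `pls_component` again on `X_defl` and `Y_defl` would yield the next latent variable, which mirrors the iterative procedure described in the text.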

Although effective, PLS cannot handle nonlinearity, non-Gaussianity, or process dynamics. Kaspar and Harmon Ray49 therefore proposed dynamic PLS to capture the time-varying nature of process data. Joe Qin50 formulated recursive partial least squares (RPLS) algorithms for on-line process modelling, to adapt to process changes, and for off-line modelling, to deal with a large number of data samples. PLS also performs poorly when the input space has significant variation orthogonal to the output space. Trygg and Wold51 proposed orthogonal projections to latent structures (O-PLS), which removes variation from X (descriptor variables) that is not correlated to Y (property variables). Zhou, Li, and Qin15 proposed total projection to latent structures (TPLS), which further decomposes particular subspaces. In addition, multi-way partial least squares (MPLS)52 has been recognized for its ability to model processes where the variables are highly correlated. Hair et al53 described partial least squares structural equation modelling (PLS-SEM) for non-normal data and small sample sizes. Stubbs, Zhang, and Morris54 proposed multiway interval partial least squares (MIPLS) for efficient monitoring of batch processes. Fig. 5i shows the timeline for the development of PLS-based methods.



Fig. 5i: Timeline of PLS-based FDD methods

3.2.3 Independent component analysis (ICA): ICA is well suited to non-Gaussian data, which are frequently encountered in process systems. It is based on higher order statistics (HOS) such as kurtosis and negentropy. Unlike PCA, ICA vectors are not orthogonal, and each component is equally important. The mathematical model for ICA can be represented by Eq. 21-22 55.

u = Wx (21)
x = As (22)

Here, s, x and u represent the original signal from a source, the observed signal, and the recovered signal, respectively, while A and W denote the mixing and de-mixing matrices. Given only the observed signal x, both the mixing matrix A and the source s are estimated. ICA can be achieved by three routes56:

i. Performing PCA and subsequently fixing the remaining degrees of freedom using HOS.
ii. Working directly with HOS.
iii. Exploiting second- and higher-order statistics jointly.

Several algorithms have been developed based on the above approaches. For example, the FastICA algorithm is based on whitening and projection onto non-orthogonal weight vectors. Here, the optimal weight vectors are determined iteratively by maximizing kurtosis. FastICA algorithms are parallel, distributed, computationally simple and cubically convergent. However, being based on the Newton-Raphson method, FastICA can get trapped at local minima57. Other well-known ICA algorithms include simultaneous third-order tensor diagonalization (STOTD), joint approximate diagonalization of eigenmatrices (JADE), higher-order eigenvalue decomposition (HOEVD), maximal diagonality (MD) and maximum likelihood (ML)58. A major drawback of conventional ICA is that the independent components cannot be ranked by the amount of variance they explain59. In addition, ICA is limited to linear systems and relies on kernel based mapping to deal with nonlinear systems, which becomes computationally expensive as the number of samples increases. Moreover, fault identification and diagnosis are difficult due to the irreversible nature of the kernel mapping60. Many researchers have modified the existing algorithms to address these limitations. For instance, Zhang & Zhang61 introduced a modified ICA based on particle swarm optimization (PSO). Stefatos and Hamza62 used dynamic independent component analysis (DICA) for FDD to capture dynamic patterns. Zhang and Qin63 proposed a multiway kernel ICA (MKICA) method for dealing with nonlinearities in industrial systems. Similarly, Rashid and Yu64 introduced the hidden Markov model into ICA for handling multimodal problems in industrial processes. Cai and Tian65 and Cai, Tian, and Chen66 proposed robust ICA for handling outliers and noisy data. Tong, Palazoglu, and Yan67 suggested an improved ICA based on ensemble learning and Bayesian inference. Du et al68 introduced fault-related kernel independent component analysis (FKICA), which decomposes data into four subspaces and makes the algorithm more sensitive to specific faults. Fig. 5j presents the developmental stages of ICA based methods.

Fig. 5j: Timeline of ICA based FDD methods
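The kurtosis-based iteration mentioned above for FastICA can be sketched for a single weight vector in two dimensions. This is a minimal sketch rather than a reference implementation: it assumes the data are already whitened, and it uses the widely quoted one-unit fixed-point update w ← E[z (wᵀz)³] − 3w followed by renormalization. The function name `fastica_one_unit`, the uniform sources, and the 30° rotation used as the mixing matrix A are all hypothetical choices made so that the observed data remain approximately white without a separate whitening step.

```python
import math
import random

def fastica_one_unit(Z, n_iter=30):
    """One-unit, kurtosis-based fixed-point iteration on whitened 2-D samples Z."""
    w = (1.0, 0.0)  # arbitrary unit-norm starting vector
    n = len(Z)
    for _ in range(n_iter):
        g1 = g2 = 0.0
        for z1, z2 in Z:
            y3 = (w[0] * z1 + w[1] * z2) ** 3
            g1 += z1 * y3
            g2 += z2 * y3
        # fixed-point update w+ = E[z (w^T z)^3] - 3 w, then renormalize
        w1, w2 = g1 / n - 3.0 * w[0], g2 / n - 3.0 * w[1]
        norm = math.hypot(w1, w2)
        w = (w1 / norm, w2 / norm)
    return w

random.seed(0)
a = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has zero mean and unit variance
S = [(random.uniform(-a, a), random.uniform(-a, a)) for _ in range(5000)]

phi = math.radians(30.0)
c, s = math.cos(phi), math.sin(phi)
# x = A s (Eq. 22) with A a rotation, so x stays approximately white
X = [(c * s1 - s * s2, s * s1 + c * s2) for s1, s2 in S]

w = fastica_one_unit(X)
# |w . a_j| near 1 means w matches column j of A up to sign,
# so u = w^T x (one row of Eq. 21) recovers one source
alignment = max(abs(w[0] * c + w[1] * s), abs(-w[0] * s + w[1] * c))
```

For sub-Gaussian sources such as the uniform ones used here, the sign of w flips from one iteration to the next, which is why the alignment is checked in absolute value.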

3.2.4 Gaussian mixture model (GMM): The Gaussian mixture model (GMM) is one of the more recent algorithms for dealing with non-Gaussian data. It is a statistical method based on the weighted sum of the probability density functions of multiple Gaussian distributions. The number of Gaussian components K (along with the statistical parameters) is estimated by the expectation maximization (EM) algorithm, which is composed of two steps, namely the expectation step (E-step) and the maximization step (M-step)69,70. The E-step determines the posterior probabilities given the current parameters and the prior probabilities, while the M-step re-estimates the parameters. The maximum likelihood solution is obtained by repeating the E- and M-steps iteratively. Thissen et al71 proposed a GMM based on the Bayesian information criterion (BIC). First, a Gaussian mixture model is built from normal operating data with multiple steady states. The optimal number of Gaussian components is determined by iterative estimation. The contribution index is calculated for each Gaussian component. Finally, the major faulty variables are identified by integrating the multiple local contribution indices into a single global contribution index using Bayesian inference. For an m-dimensional sample point x from a multimode process, the probability density function can be expressed by Eq. 23 69.

$p(x) = \sum_{i=1}^{K} \omega_i \, g(x \mid \mu_i, \Sigma_i)$ (23)

Essentially, the probability density function of the Gaussian mixture model is equivalent to the weighted sum of the density functions of all Gaussian components. Here, $\mu_i$, $\Sigma_i$ and $\omega_i$ represent the mean, the covariance and the weight of the ith Gaussian component, while K is the number of Gaussian components. $g(x \mid \mu_i, \Sigma_i)$ denotes the multivariate Gaussian probability density function with mean $\mu_i$ and covariance $\Sigma_i$. Mathematically, it can be expressed as follows69:

$g(x \mid \mu_i, \Sigma_i) = \dfrac{1}{(2\pi)^{m/2} |\Sigma_i|^{1/2}} \exp\left[ -\tfrac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right]$ (24)

The total number of Gaussian components K and their statistical parameters are estimated using a modified expectation maximization (EM) algorithm. After obtaining the GMM, the posterior probability of an arbitrary monitored sample x belonging to each Gaussian component can be obtained through Bayesian inference. Starting from the initial estimates, the posterior probability of the ith training sample $x_i$ within the kth Gaussian component $C_k$ at the lth iteration is computed via the expectation step (Eq. 25)72.

The initial parameter values can be set through an initial guess based on prior knowledge.




In the subsequent maximization step, the probability density parameters of the kth Gaussian component are re-estimated using Eq. 26-28 72. In these equations, N is the number of training samples used to learn the Gaussian mixture model, and the updates also involve the number of scalar parameters to be estimated. The EM iterations stop when the mean vectors $\mu_k$, the covariance matrices $\Sigma_k$ and the prior probabilities $\omega_k$ converge to stable solutions.
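The alternation of the E-step and M-step described above can be sketched for a univariate two-component mixture. This is a simplified illustration and not the modified EM algorithm of the cited works: it assumes the standard (unmodified) EM updates, a fixed number of components, scalar data, and a fixed iteration count in place of a convergence test. The function names `gauss` and `em_gmm_1d`, the synthetic data and the initial guesses are hypothetical.

```python
import math
import random

def gauss(x, mu, var):
    """Univariate Gaussian density (the scalar case of Eq. 24)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, mus, variances, weights, n_iter=100):
    """Standard EM for a 1-D Gaussian mixture; the parameter lists are updated in place."""
    K, N = len(mus), len(data)
    for _ in range(n_iter):
        # E-step: posterior probability of each sample belonging to each component
        post = []
        for x in data:
            joint = [weights[k] * gauss(x, mus[k], variances[k]) for k in range(K)]
            total = sum(joint)
            post.append([j / total for j in joint])
        # M-step: re-estimate mean, variance and weight of each component
        for k in range(K):
            nk = sum(p[k] for p in post)
            mus[k] = sum(p[k] * x for p, x in zip(post, data)) / nk
            variances[k] = sum(p[k] * (x - mus[k]) ** 2 for p, x in zip(post, data)) / nk
            weights[k] = nk / N
    return mus, variances, weights

# Synthetic data from two operating modes centred at 0 and 5 (hypothetical values)
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(5.0, 1.0) for _ in range(200)]
# Initial guess standing in for "prior knowledge"
mus, variances, weights = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
```

The posterior probabilities computed in the E-step are the same quantities that are later reused as mode membership weights when a new sample is monitored.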

Considering that each component $C_k$ follows a unimodal Gaussian distribution, the squared Mahalanobis distance of a sample from the center of $C_k$ follows a chi-square distribution. Industrial multivariate process data usually exhibit significant collinearity, so the covariance matrix of the measurement data is often ill-conditioned. This can be dealt with by using the regularized Mahalanobis distance ($D_r$) instead of the standard one, as given in Eq. 29 73,74.

$D_r = (x - \mu_k)^T (\Sigma_k + \epsilon I)^{-1} (x - \mu_k)$ (29)

Here, $\epsilon$ is a small positive value used to remove the potential ill-conditioning of the covariance matrix $\Sigma_k$. The distance $D_r$ approximately follows a $\chi^2$ distribution (Eq. 30)75.
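The effect of the regularization term can be illustrated with a two-variable example in which the covariance matrix is exactly singular. This is a sketch under stated assumptions: the distance is taken to be (x − µ)ᵀ(Σ + εI)⁻¹(x − µ), the 2×2 inverse is written in closed form, and the control limit uses the fact that a chi-square variable with two degrees of freedom has the quantile −2 ln(1 − p). The function name, the value of ε and the data are hypothetical.

```python
import math

def regularized_mahalanobis_2d(x, mu, cov, eps=1e-3):
    """Squared distance (x - mu)^T (cov + eps*I)^{-1} (x - mu) for two variables."""
    a, b = cov[0][0] + eps, cov[0][1]
    c, d = cov[1][0], cov[1][1] + eps
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # closed-form 2x2 inverse applied to the deviation vector
    vx = (d * dx - b * dy) / det
    vy = (-c * dx + a * dy) / det
    return dx * vx + dy * vy

# Two perfectly collinear variables: the covariance matrix is singular,
# so the standard Mahalanobis distance is undefined.
cov = [[1.0, 1.0], [1.0, 1.0]]
mu = (0.0, 0.0)

d_along = regularized_mahalanobis_2d((1.0, 1.0), mu, cov)    # deviation consistent with the correlation
d_across = regularized_mahalanobis_2d((1.0, -1.0), mu, cov)  # deviation that breaks the correlation

# 99% limit of a chi-square distribution with 2 degrees of freedom
limit_99 = -2.0 * math.log(1.0 - 0.99)
```

With ε = 0.001, the sample that respects the collinearity stays well inside the limit, while the sample that violates it is amplified by roughly 1/ε and exceeds the limit by a wide margin.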

Finally, the faulty variables are determined by the Bayesian inference based probability index (BIP) control chart. The BIP of a monitored sample $x_t$ can be formulated as given in Eq. 31 69.

In Eq. 31, the Mahalanobis distance based probability represents the possibility that the data points of the ith operating mode have a shorter distance from the mode center than the monitored sample $x_t$. The posterior probability corresponds to the likelihood that the monitored sample $x_t$ belongs to the ith mode. Thus, the BIP index represents the possibility that the data points in all the different modes have shorter distances from the corresponding mode centers than the monitored sample $x_t$.

Li, Qin, and Yuan76 devised a framework for root cause analysis and fault location in stationary as well as dynamic processes. The authors proposed a dynamic time warping based causality analysis to uncover the causal relations among candidates. Yu72 devised a nonlinear kernel Gaussian mixture model (NKGMM) for handling nonlinearities and introduced a dynamic Gaussian mixture model (DGMM) based on particle filter resampling for time-varying operations77. This approach demonstrates superior performance in terms of fault detection rate, false alarm rate and faulty variable diagnosis. However, it does not perform well in isolating and classifying different process faults and disturbances under dynamically shifting modes. Yu and Qin70 developed a multiway Gaussian mixture model to monitor a batch process with multiple operational phases. The GMM was integrated with hybrid unfolding of the three-way data matrix to classify different samples into several Gaussian components. Furthermore, Yu and Qin75 proposed a finite Gaussian mixture model (FGMM) based on the Figueiredo–Jain (F–J) algorithm for optimizing the number of Gaussian components and estimating the statistical distribution parameters. It resulted in higher detection rates, lower false alarm rates and shorter delays in fault detection, and was consistently effective in detecting different process faults. Jiang, Huang and Yan78 presented a Bayesian method based on GMM and optimal principal components (OPCs) for efficient fault diagnosis. GMM and Bayesian inference were used to identify the operating mode, and a local PCA model was then established in each mode. Finally, the OPCs were selected through a genetic algorithm based stochastic optimization technique.
Hongyang Yu, Khan and Garaniya79 proposed a probabilistic multivariate method for fault diagnosis of industrial processes. It employs a Gaussian copula based on rank correlation to model dependencies and capture the nonlinear relationships between process variables. Specifically, the reference dependence structures of the process variables are determined from normal process data and then compared with those obtained from the faulty data samples. The technique is effective in handling nonlinearities; however, it requires longer computational time for convergence. Similarly, Hongyang Yu, Khan and Garaniya80 devised a three-layered nonlinear Gaussian belief network (NLGBN) for fault diagnosis in industrial systems. They introduced the G-index, a novel parameter for process monitoring, based on expectation maximization. The method appears robust to both nonlinear subspaces and noisy data, and it improves fault detection accuracy compared with conventional methods. Fig. 5k shows the progression of GMM based methods.




Fig. 5k: Timeline of GMM based FDD methods

3.4 Contributors: A total of 6665 authors were found to have contributed to data-based process monitoring; of these, 278 authors have more than five documents, while 98 authors have more than ten documents. Table 2 lists the most prominent authors in this area. It comprises authors from several countries, which indicates the global recognition of the research area. MacGregor J. F. was the most frequently cited author, with 6037 citations for his 30 documents. Similarly, Venkatasubramanian V., Qin S. J., Rengaswamy R., and Kourti T. were also highly recognized and cited. Yin S., Tsung F., Zhang Y., Kruger U., Song Z. and Zhang Z. were significant contributors and topped the list based on the number of documents published in journals and conferences.

Table 2: List of eminent authors based on citations

S.No.   Author                    Citations*
1       MacGregor J F             6037 (30)
2       Venkatasubramanian V      2749 (10)
3       Qin S J                   2664 (27)
4       Rengaswamy R              2551 (8)
5       Kourti T                  2535 (14)
6       Kavuri S N                2330 (2)
7       Yin K                     2330 (2)
8       Nomikos P                 2200 (3)
9       Yin S                     2041 (45)
10      Lee I B                   1659 (18)
11      Ding S X                  1540 (36)
12      Tsung F                   1305 (44)
13      Morris A J                1099 (31)
14      Martin E B                1086 (30)
15      Kano M                    1071 (15)
16      Song Z                    929 (39)
17      Ge Z                      899 (34)
18      Kruger U                  802 (36)
19      Zhang Y                   780 (39)
20      Yu J                      713 (23)
21      Gao F                     684 (28)
22      Zou C                     589 (21)
23      Zhang J                   567 (39)
24      Chen J                    543 (29)

*The number of citations is based on the publications selected for this study (number of documents in parentheses).

Fig. 6a and Fig. 6b present the types and the categories of publications of the selected authors, respectively. It can be seen that Yin S has the highest number of documents, including 24 journal articles, 19 conference papers, one book chapter and one editorial. However, Tsung F has 39 journal articles, the highest among the selected authors. Tsung F, Song Z, Zhang J and Ding S X can be identified as algorithm developers, while Khan F, Gao F, Zhang Y, Wang H and Ding S X are influential application-oriented authors.

Fig. 6a: Types of publication of selected authors (authors with more than 25 documents) (AR= Journal article; Co= conference article; Bk= book; BC= book chapter)




Fig. 6b: Categories of publication of the selected authors (authors with more than 25 docs) (M= Algorithm and methods development; R= Review article; AP = applications)

The five most cited authors were analysed to identify the dominant factors behind their citations. Fig. 6c illustrates the selected authors' citations for the different categories of publications. MacGregor J F and Kourti T owe most of their citations to algorithm and application-oriented papers, while review articles were the dominant factor for the citations of Venkatasubramanian V and Rengaswamy R. Algorithm-based articles, in turn, are the primary contributors to the citations of Qin S J. Fig. 6d depicts the selected authors' citations for the distinct types of publications. Here, AR1 and CO1 are the average citations (citations per document) of journal and conference articles, respectively. It reveals that journal articles are the principal contributors to citations, whereas conference articles play a negligible role.

Fig. 6c: Citations of selected authors for distinct categories of publications of data driven FDD in process systems (M= algorithm and methods development; R= review article; AP= industrial application in process systems)

Fig. 6d: The comparative citations of journal and conference papers of selected authors (AR1= average citations of journal papers; CO1= average citations of conference articles)
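The notion of average citations (citations per document) can be illustrated with the totals reported in Table 2 for the five most cited authors. This is only a rough sketch: it uses each author's overall totals, whereas AR1 and CO1 are computed separately for journal and conference articles, and that split is not available in Table 2. The variable names are hypothetical.

```python
# (citations, documents) for the five most cited authors, as listed in Table 2
table2_top5 = {
    "MacGregor J F": (6037, 30),
    "Venkatasubramanian V": (2749, 10),
    "Qin S J": (2664, 27),
    "Rengaswamy R": (2551, 8),
    "Kourti T": (2535, 14),
}

# overall citations per document (not split by publication type)
average_citations = {
    author: citations / documents
    for author, (citations, documents) in table2_top5.items()
}

# authors ordered from highest to lowest citations per document
ranked = sorted(average_citations, key=average_citations.get, reverse=True)
```

On these totals, the two authors whose citations are attributed mainly to review articles come out with the highest citations per document, which is consistent with the observation made in the text.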


3.5 Publication over geographical areas: The number of publications across geographical areas is a measure of the versatility and recognition of a research topic. Work on data-based process monitoring is prevalent throughout the globe. Fig. 7a shows the standing of the notable countries active in this area of research. They include both developed and developing nations, which indicates that the field is perceived as important irrespective of the economic status of a country. With 988 documents, China topped the list for the highest number of publications, followed by the USA (850), UK (305) and Canada (243). Publication in proportion to population and wealth was also evaluated and is presented in Fig. 7b. The former is expressed as per capita publication (the total number of publications divided by the entire population of the country), scaled to the number of publications per million people. Singapore recorded the highest per capita publication (16.408 publications per million people), followed by Taiwan (8.049), Canada (6.696), UK (4.647) and Belgium (4.405). The cumulative effect of population and economy was expressed in terms of effectiveness, which is defined as the number of publications per unit average income (Eq. 32). This indicates how important a research field is to developed and developing nations alike.

$\text{Effectiveness} = \dfrac{\text{Number of publications}}{\text{Average income}}$ (32)

A higher number of publications per unit average income represents better utilization of resources (both human and monetary). China and India were found to be the most effective countries, with effectiveness values of 121.64 and 61.99 publications per unit average income, respectively. It can also be observed that all the BRICS countries except Russia record high effectiveness values.
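The two normalizations described above reduce to simple ratios, sketched below. The publication counts (305 for the UK, 988 for China) are taken from the text; the population and average income figures are assumptions chosen only for illustration and are not taken from the paper, and the income unit (thousand US dollars per person) is likewise assumed. The function names are hypothetical.

```python
def publications_per_million(publications, population):
    """Per capita publication, scaled to publications per million people."""
    return publications / (population / 1e6)

def effectiveness(publications, average_income):
    """Number of publications per unit average income (Eq. 32)."""
    return publications / average_income

# UK: 305 documents (from the text); population of 65.64 million is an assumed figure
uk_per_million = publications_per_million(305, 65.64e6)

# China: 988 documents (from the text); average income of 8.12 (thousand USD) is an assumed figure
china_effectiveness = effectiveness(988, 8.12)
```

With these assumed inputs, the results land close to the reported values of 4.647 publications per million people for the UK and 121.64 for the effectiveness of China, which suggests that the average income is of the order of the per capita GDP expressed in thousand US dollars.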




Fig. 7a: Number of documents published in the area of data-based FDD by selected countries

Fig. 7b: Efficiency of publication of selected countries

Fig. 7c presents the co-citation network among the notable countries. The sizes of the spheres represent their relative strength of publication (based on the number of documents), the colors indicate their stage of evolution over time, and the links represent their co-citation. It can be concluded that up to 2005, the USA and UK were the most active countries working in this field, whereas China has been dominant since 2012. Canada, Germany, France, Taiwan and South Korea became significant contributors over the past decade. India, Iran, Malaysia and Brazil have established their presence. In addition, Qatar and Saudi Arabia have recently emerged in data-based process monitoring. The USA, UK, Canada, China, France, South Korea, Taiwan, Germany and Hong Kong have been widely acknowledged and cited for their notable contributions. The linkage strength between two countries represents their co-citation index: it indicates how frequently documents from the two countries are cited together by a third country.

Fig. 7c: Co-citation network of the countries working in the area of data-based FDD.
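The co-citation index described above can be sketched as a pair count over citing documents. This is a simplified illustration and not the procedure of the bibliometric software used in the study: it assumes each citing document is reduced to the set of countries of the works it cites, and it counts a pair once per citing document. The function name and the example records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often two countries are cited together by the same document.

    reference_lists: one collection of cited countries per citing document.
    """
    counts = Counter()
    for cited in reference_lists:
        # each unordered pair of distinct countries is counted once per citing document
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Hypothetical citing documents, each listing the countries of the works it cites
citing_documents = [
    ["USA", "UK", "China"],
    ["USA", "UK"],
    ["China", "Canada"],
    ["USA", "China"],
]
links = cocitation_counts(citing_documents)
```

In a network plot such as Fig. 7c, a count like `links[("UK", "USA")]` would correspond to the thickness of the link drawn between the two countries.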

3.6 Publishing Sources: Fig. 8a shows the major sources publishing articles on data-driven fault detection and diagnosis in process systems. Only journal sources are presented here, although many conferences also make notable contributions. Quality and Reliability Engineering International, Industrial and Engineering Chemistry Research, Chemometrics and Intelligent Laboratory Systems, Journal of Process Control, Computers and Chemical Engineering, AIChE Journal and International Journal of Production Research are the top journals with a large volume of publications. The total numbers of publications from these sources were 143, 132, 102, 96, 86, 74 and 68, respectively. It can also be noticed that all the sources except Journal of Process Control publish predominantly in the category of algorithm development. Journal of Process Control, Quality and Reliability Engineering International, Computers and Industrial Engineering, and Expert Systems with Applications are the major contributors to application-oriented publications. Industrial and Engineering Chemistry Research is the most cited journal with 6021 citations, followed by AIChE Journal (5073) and Journal of Process Control (4271).



Fig. 8a: Categories of publication of the selected Sources (M= Algorithm and methods development; R= Review article; AP = applications)

Fig. 8b demonstrates the bibliographic coupling of various sources. Here again, the size of a sphere indicates its relative strength of publication and the color indicates its evolution over time. It demonstrates that AIChE Journal, Industrial and Engineering Chemistry Research, Computers and Chemical Engineering, Chemometrics and Intelligent Laboratory Systems, Journal of Process Control, Chemical Engineering Science, Journal of Chemometrics, Control Engineering Practice, Automatica, Journal of Quality Technology and Technometrics are some of the most prominent contributing sources. The coupling linkage reveals that up to 2010, the coupling between different journals, especially between engineering and industrial journals, was limited. Recently, however, it has increased severalfold, which indicates that these techniques are gaining industrial importance. It can be noticed that Automatica, Computers and Chemical Engineering, Journal of Quality Technology and Technometrics were key contributors in the early 2000s. Since 2010, AIChE Journal, Control Engineering Practice, Chemical Engineering Science, Journal of Chemometrics and Journal of Manufacturing have become significant contributors. Similarly, Chemometrics and Intelligent Control, Journal of Process Control and Expert Systems with Applications have recently added essential contributions to the area of data-based process monitoring. In addition, Canadian Journal of Chemical Engineering, Process Safety and Environmental Protection, Journal of Loss Prevention, and Quality Technology are emerging in this field.

4. Conclusion: This paper presented a bibliometric analysis and review of data-driven fault detection and diagnosis methods for process systems. It investigated the evolution of the leading areas of FDD, the contributing authors, the key sources and the notable countries active in this research area. It was found that FDD is an emerging research area with an exponential growth rate. However, the application of the methods in process systems has not kept pace with their development. It was observed that numerous semi-parametric and hybrid techniques have been developed to deal with process nonlinearities, non-Gaussianity in data, and the complexities of process systems. Significant work in this field is being carried out throughout the world, in both developed and developing countries. China was seen to be the most active country in terms of the number of publications, while Singapore recorded the highest per capita publication. Twenty-five most influential authors based on the number of publications and citations were listed. Their works were compared based on types of publication (i.e., journal article, conference article, book, and book chapter) and categories of publication (i.e., algorithm and methods development, review article, and application article). Quality and Reliability Engineering International, Industrial and Engineering Chemistry Research, Chemometrics and Intelligent Laboratory Systems, Journal of Process Control, Computers and Chemical Engineering, AIChE Journal and International Journal of Production Research were the top journals with large volumes of publications. The coupling linkage revealed that up to 2010, the coupling between different journals, especially between the engineering journals and the industrial journals, was limited. Recently, however, it has increased severalfold, which indicates that these techniques are gaining industrial importance. Though statistical process control (SPC) and multivariate statistical process control (MSPC) have been around for over three decades now, there is no assessment of the impact or industry acceptance of the methods. Anecdotal evidence shows that most of the available literature is from the academic community, describing algorithms that are applied to benchmark systems or to data collected from industry. Only a few online industrial applications have taken place. We believe that the process industries have not yet fully harnessed the benefits of advanced monitoring tools. There is a surge in the use of artificial intelligence in the Information Technology sector. It is expected that these developed methods will find use in other sectors, including process fault detection and diagnosis. Just as the 19th century is known for coal based technologies and the 20th century for oil based systems, the current century is becoming known for digitalization and data mining. The advancement in computing power has expedited data mining and feature extraction for capturing the dynamics of complex systems.
Data-based algorithms will play a central role in process monitoring, control, fault detection and diagnosis, and most importantly in safety management. It can be concluded that data-based FDD is a growing research area. However, it is at a nascent stage with respect to applications in process systems, with potential for broader use in the near future.

5. Acknowledgements

The authors gratefully acknowledge the financial support provided by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chair (Tier I) Program in Offshore Safety and Risk Engineering.


Industrial & Engineering Chemistry Research

Fig. 8b: Bibliographic coupling of distinct publication sources




References (1)

Niu, G.; Yang, B. S.; Pecht, M. Development of an Optimized Condition-Based Maintenance System by Data Fusion and Reliability-Centered Maintenance. Reliab. Eng. Syst. Saf. 2010, 95 (7), 786–796.

(2)

Ge, Z. Review on Data-Driven Modeling and Monitoring for Plant-Wide Industrial Processes. Chemom. Intell. Lab. Syst. 2017, 171,16–25.

(3)

Liu, S.; McDermid, J. A.; Chen, Y. A Rigorous Method for Inspection of Model-Based Formal Specifications. IEEE Trans. Reliab. 2010, 59 (4), 667–684.

(4)

Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.; Kavuri, S. N. A Review of Process Fault Detection and Diagnosis Part I: Quantitative Model-Based Methods. Comput. Chem. Eng. 2003, 27(3), 293–311.

(5)

Staroswiecki, M. Quantitative and Qualitative Models for Fault Detection and Isolation. Mech. Syst. Signal Process. 2000, 14, 301–325.

(6)

Isermann, R. Model-Based Fault-Detection and Diagnosis - Status and Applications. Annu. Rev. Control. 2005, 29(1), 71–85.

(7)

Dai, X.; Gao, Z. From Model, Signal to Knowledge: A Data-Driven Perspective of Fault Detection and Diagnosis. IEEE Trans. Ind. Informatics 2013, 9 (4), 2226–2238.

(8)

Qin, S. J. Survey on Data-Driven Industrial Process Monitoring and Diagnosis. Annu. Rev. Control. 2012, 39(2), 220–234.

(9)

Li, D.; Hu, G.; Spanos, C. J. A Data-Driven Strategy for Detection and Diagnosis of Building Chiller Faults Using Linear Discriminant Analysis. Energy Build. 2016, 128, 519–529.

(10)

Hwang, I.; Kim, S.; Kim, Y.; Seah, C. E. A Survey of Fault Detection, Isolation, and Reconfiguration Methods. IEEE Trans. Control Syst. Technol. 2010, 18 (3), 636–653.

(11)

Venkatasubramanian, V.; Rengaswamy, R.; Ka, S. N.; Kavuri, S. N.; Ka, S. N. A Review of Process Fault Detection and Diagnosis Part II : Qualitative Models and Search Strategies. Comput. Chem. Eng. 2003, 27 (3), 313–326.

(12)

Maurya, M. R.; Rengaswamy, R.; Venkatasubramanian, V. Fault Diagnosis by Qualitative Trend Analysis of the Principal Components. Chem. Eng. Res. Des. 2005, 83 (9), 1122–1132.

(13)

Zhu, X.; Goldberg, A. B. Introduction to Semi-Supervised Learning. Synth. Lect. Artif. Intell. Mach. Learn. 2009, 3 (1), 1–130.

(14)

Kourti, T.; MacGregor, J. F. J. F. Process Analysis, Monitoring and Diagnosis, Using Multivariate Projection Methods. Chemom. Intell. Lab. Syst. 1995, 28, 3–21.

(15)

Zhou, D.; Li, G.; Qin, S. J. Total Projection to Latent Structures for Process Monitoring. AIChE J. 2010, 56 (1), 168–178.

(16)

De Vargas, V. D. C. C.; Lopes, L. F. D.; Souza, A. M. Comparative Study of the Performance of the CUSUM and EWMA Control Charts. Comput. Ind. Eng. 2004, 46, 707–724.

(17)

Harrou, F.; Nounou, M. N.; Nounou, H. N.; Madakyaru, M. PLS-Based EWMA Fault Detection Strategy for Process Monitoring. J. Loss Prev. Process Ind. 2015, 36, 108–119.

(18)

Anderson, M. J. Multivariate Control Charts for Ecological and Environmental Monitoring. Ecol. Appl. 2004, 14 (6), 1921–1935.

(19)

Venkatasubramanian, V.; Rengaswamy, R.; Kavuri, S. N.; Yin, K. A Review of Process Fault Detection and


35


Page 36 of 40

Diagnosis Part III: Process History Based Methods. Comput. Chem. Eng. 2003, 27(3), 327–346. (20)

Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on Data-Based Process Monitoring. Ind. Eng. Chem. Res. 2013, 52 (10), 3543–3562.

(21)

von Stosch, M.; Oliveira, R.; Peres, J.; Feyo de Azevedo, S. Hybrid Semi-Parametric Modeling in Process Systems Engineering: Past, Present and Future. Comput. Chem. Eng. 2014, 60, 86–101.

(22)

Gao, Z.; Cecati, C.; Ding, S. X. A Survey of Fault Diagnosis and Fault-Tolerant Techniques Part I: Fault Diagnosis. IEEE Trans. Ind. Electron. 2015, 62 (6), 3768–3774.

(23)

Tidriri, K.; Chatti, N.; Verron, S.; Tiplica, T. Bridging Data-Driven and Model-Based Approaches for Process Fault Diagnosis and Health Monitoring: A Review of Researches and Future Challenges. Annu. Rev. Control. 2016, 42,63–81.

(24)

Yin, S.; Ding, S. X.; Haghani, A.; Hao, H.; Zhang, P. A Comparison Study of Basic Data-Driven Fault Diagnosis and Process Monitoring Methods on the Benchmark Tennessee Eastman Process. J. Process Control 2012, 22 (9), 1567–1581.

(25)

Yin, S.; Ding, S. X.; Xie, X.; Luo, H. A Review on Basic Data-Driven Approaches for Industrial Process Monitoring. IEEE Trans. Ind. Electron. 2014, 61 (11), 6414–6428.

(26)

van Eck, N. J.; Waltman, L. Software Survey: VOSviewer, a Computer Program for Bibliometric Mapping. Scientometrics 2010, 84 (2), 523–538.

(27)

Jackson, J. E. A User’s Guide to Principal Components; John Wiley & Sons Inc., 2004; pp 4–25.

(28)

Van Der Maaten, L. J. P.; Postma, E. O.; Van Den Herik, H. J. Dimensionality Reduction: A Comparative Review. J. Mach. Learn. Res. 2009, 10, 1–41.

(29)

Partridge, M. Fast Dimensionality Reduction and Simple PCA. Intell. Data Anal. 1998, 2 (3), 203–214.

(30)

Wang, H.; Shangguan, L.; Guan, R.; Billard, L. Principal Component Analysis for Compositional Data Vectors. Comput. Stat. 2015, 30 (4), 1079–1096.

(31)

Yu, H.; Khan, F.; Garaniya, V. Nonlinear Gaussian Belief Network Based Fault Diagnosis for Industrial Processes. J. Process Control 2015, 35, 1–39.

(32)

Imtiaz, S. A.; Shah, S. L.; Patwardhan, R.; Palizban, H. A.; Ruppenstein, J. Detection, Diagnosis and Root Cause Analysis of Sheet-Break in a Pulp and Paper Mill with Economic Impact Analysis. Can. J. Chem. Eng. 2007, 85 (4), 512–525.

(33)

Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6 (6), 329–348.

(34)

Qin, S. J. Statistical Process Monitoring: Basics and Beyond. J. Chemom. 2003, 17 (8–9), 480–502.

(35)

Lawrence, N. Probabilistic Non-Linear Principal Component Analysis with Gaussian Process Latent Variable Models. J. Mach. Learn. Res. 2005, 6, 1783–1816.

(36)

Liang, Z.; Lee, Y. Eigen-Analysis of Nonlinear PCA with Polynomial Kernels. Stat. Anal. Data Min. 2013, 6 (6), 529–544.

(37)

Mika, S.; Schölkopf, B.; Smola, A.; Müller, K.; Scholz, M.; Rätsch, G. Kernel PCA and De-Noising in Feature Spaces. Adv. Neural Inf. Process. Syst. 1999, 11, 536–542.

(38)

Yu, H.; Khan, F.; Garaniya, V. An Alternative Formulation of PCA for Process Monitoring Using Distance Correlation. Ind. Eng. Chem. Res. 2016, 55 (3), 656–669.




(39)

Gharahbagheri, H.; Imtiaz, S. A.; Khan, F. Root Cause Diagnosis of Process Fault Using KPCA and Bayesian Network. Ind. Eng. Chem. Res. 2017, 56 (8), 2054–2070.

(40)


(41)

Lee, J.-M.; Yoo, C.; Lee, I.-B. Fault Detection of Batch Processes Using Multiway Kernel Principal Component Analysis. Comput. Chem. Eng. 2004, 28 (9), 1837–1847.

(42)

Zass, R.; Shashua, A. Nonnegative Sparse PCA. Adv. Neural Inf. Process. Syst. 2007, 19, 1561–1568.

(43)

Heo, G.; Gader, P.; Frigui, H. RKF-PCA: Robust Kernel Fuzzy PCA. Neural Netw. 2009, 22 (5–6), 642–650.

(44)

Becker, J. M.; Klein, K.; Wetzels, M. Hierarchical Latent Variable Models in PLS-SEM: Guidelines for Using Reflective-Formative Type Models. Long Range Plann. 2012, 45 (5–6), 359–394.

(45)

Abdi, H. Partial Least Squares Regression and Projection on Latent Structure Regression (PLS Regression). Wiley Interdiscip. Rev. Comput. Stat. 2010, 2 (1), 97–106.

(46)

Wang, M.; Yan, G.; Fei, Z. Kernel PLS Based Prediction Model Construction and Simulation on Theoretical Cases. Neurocomputing 2015, 165, 389–394.

(47)

Dong, Y.; Qin, S. J. Dynamic-Inner Partial Least Squares for Dynamic Data Modeling. IFAC-PapersOnLine 2015, 28 (8), 117–122.

(48)

Wold, S.; Sjöström, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58 (2), 109–130.

(49)

Kaspar, M. H.; Ray, W. H. Dynamic PLS Modelling for Process Control. Chem. Eng. Sci. 1993, 48 (20), 3447–3461.

(50)

Qin, S. J. Recursive PLS Algorithms for Adaptive Data Modeling. Comput. Chem. Eng. 1998, 22 (4–5), 503–514.

(51)

Trygg, J.; Wold, S. Orthogonal Projections to Latent Structures (O-PLS). J. Chemom. 2002, 16 (3), 119–128.

(52)

Nomikos, P.; MacGregor, J. F. Multi-Way Partial Least Squares in Monitoring Batch Processes. Chemom. Intell. Lab. Syst. 1995, 30 (1), 97–108.

(53)

Hair, J. F., Jr.; Sarstedt, M.; Hopkins, L.; Kuppelwieser, V. G. Partial Least Squares Structural Equation Modeling (PLS-SEM). Eur. Bus. Rev. 2014, 26 (2), 106–121.

(54)

Stubbs, S.; Zhang, J.; Morris, J. Multiway Interval Partial Least Squares for Batch Process Performance Monitoring. Ind. Eng. Chem. Res. 2013, 52 (35), 12399–12407.

(55)

Hyvärinen, A.; Oja, E. Independent Component Analysis: Algorithms and Applications. Neural Netw. 2000, 13 (4–5), 411–430.

(56)

De Lathauwer, L.; De Moor, B.; Vandewalle, J. An Introduction to Independent Component Analysis. J. Chemom. 2000, 14 (3), 123–149.

(57)

Shen, H.; Kleinsteuber, M.; Hüper, K. Local Convergence Analysis of FastICA and Related Algorithms. IEEE Trans. Neural Networks 2008, 19 (6), 1022–1032.

(58)

De Lathauwer, L.; De Moor, B.; Vandewalle, J. An Introduction to Independent Component Analysis. J. Chemom. 2000, 14 (3), 123–149.

(59)

Chawla, M. P. S. PCA and ICA Processing Methods for Removal of Artifacts and Noise in Electrocardiograms: A Survey and Comparison. Appl. Soft Comput. 2011, 11 (2), 2216–2226.



(60)

Deng, X.; Tian, X.; Chen, S. Modified Kernel Principal Component Analysis Based on Local Structure Analysis and Its Application to Nonlinear Process Fault Diagnosis. Chemom. Intell. Lab. Syst. 2013, 127, 195–209.

(61)

Zhang, Y.; Zhang, Y. Fault Detection of Non-Gaussian Processes Based on Modified Independent Component Analysis. Chem. Eng. Sci. 2010, 65 (16), 4630–4639.

(62)

Stefatos, G.; Ben Hamza, A. Dynamic Independent Component Analysis Approach for Fault Detection and Diagnosis. Expert Syst. Appl. 2010, 37 (12), 8606–8617.

(63)

Zhang, Y.; Qin, S. J. Fault Detection of Nonlinear Processes Using Multiway Kernel Independent Component Analysis. Ind. Eng. Chem. Res. 2007, 46, 7780–7787.

(64)

Rashid, M. M.; Yu, J. Hidden Markov Model Based Adaptive Independent Component Analysis Approach for Complex Chemical Process Monitoring and Fault Detection. Ind. Eng. Chem. Res. 2012, 51 (15), 5506–5514.

(65)

Cai, L.; Tian, X. A New Fault Detection Method for Non-Gaussian Process Based on Robust Independent Component Analysis. Process Saf. Environ. Prot. 2014, 92 (6), 645–658.

(66)

Cai, L.; Tian, X.; Chen, S. A Process Monitoring Method Based on Noisy Independent Component Analysis. Neurocomputing 2014, 127, 231–246.

(67)

Tong, C.; Palazoglu, A.; Yan, X. Improved ICA for Process Monitoring Based on Ensemble Learning and Bayesian Inference. Chemom. Intell. Lab. Syst. 2014, 135, 141–149.

(68)

Du, W.; Fan, Y.; Zhang, Y.; Zhang, J. Fault Diagnosis of Non-Gaussian Process Based on FKICA. J. Franklin Inst. 2017, 354 (6), 2573–2590.

(69)

Yu, J. A New Fault Diagnosis Method of Multimode Processes Using Bayesian Inference Based Gaussian Mixture Contribution Decomposition. Eng. Appl. Artif. Intell. 2013, 26 (1), 456–466.

(70)

Yu, J.; Qin, S. J. Multiway Gaussian Mixture Model Based Multiphase Batch Process Monitoring. Ind. Eng. Chem. Res. 2009, 48 (18), 8585–8594.

(71)

Thissen, U.; Swierenga, H.; De Weijer, A. P.; Wehrens, R.; Melssen, W. J.; Buydens, L. M. C. Multivariate Statistical Process Control Using Mixture Modelling. J. Chemom. 2005, 19 (1), 23–31.

(72)

Yu, J. A Nonlinear Kernel Gaussian Mixture Model Based Inferential Monitoring Approach for Fault Detection and Diagnosis of Chemical Processes. Chem. Eng. Sci. 2012, 68 (1), 506–519.

(73)

Chen, T.; Zhang, J. On-Line Multivariate Statistical Monitoring of Batch Processes Using Gaussian Mixture Model. Comput. Chem. Eng. 2010, 34 (4), 500–507.

(74)

Cai, L.; Tian, X.; Chen, S. Monitoring Nonlinear and Non-Gaussian Processes Using Gaussian Mixture Model-Based Weighted Kernel Independent Component Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28 (1), 122–135.

(75)

Yu, J.; Qin, S. J. Multimode Process Monitoring with Bayesian Inference-Based Finite Gaussian Mixture Models. AIChE J. 2008, 54 (7), 1811–1829.

(76)

Li, G.; Qin, S. J.; Yuan, T. Data-Driven Root Cause Diagnosis of Faults in Process Industries. Chemom. Intell. Lab. Syst. 2016, 159, 1–11.

(77)

Yu, J. A Particle Filter Driven Dynamic Gaussian Mixture Model Approach for Complex Process Monitoring and Fault Diagnosis. J. Process Control 2012, 22 (4), 778–788.

(78)

Jiang, Q.; Huang, B.; Yan, X. GMM and Optimal Principal Components-Based Bayesian Method for Multimode Fault Diagnosis. Comput. Chem. Eng. 2016, 84, 338–349.

(79)

Yu, H.; Khan, F.; Garaniya, V. A Probabilistic Multivariate Method for Fault Diagnosis of Industrial Processes. Chem. Eng. Res. Des. 2015, 104 (3), 306–318.

(80)


Appendix Table A: List of prominent authors in data-driven FDD (based on number of documents)

S.No.   Author          Documents
1       Yin S           45
2       Tsung F         44
3       Zhang Y         40
4       Wang H          40
5       Song Z          39
6       Zhang J         39
7       Kruger U        37
8       Ding S X        36
9       Ge Z            34
10      Xie L           33
11      Morris A J      32
12      Zhao C          32
13      Martin E B      30
14      Wang Y          30
15      MacGregor J F   30
16      Chen J          29
17      Gao F           28
18      Liu J           28
19      Khan F          28
20      Qin S J         27
21      Li Z            25
22      Wang Z          25
23      Yu J            23
24      Huang B         23


TOC

Fig. 5a: Frequently used author’s keywords in data-based FDD

