Sequential Dependence Modeling Using Bayesian Theory and D-Vine Copula and Its Application on Chemical Process Risk Prediction

Xiang Ren, Shaojun Li,* Cheng Lv, and Ziyang Zhang

Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China

ABSTRACT: An emerging kind of prediction model for sequential data with multiple time series is proposed. Because D-vine copula provides more flexibility in dependence modeling, accounting for conditional dependence, asymmetries, and tail dependence, it is employed to describe sequential dependence between variables in the sample data. A D-vine model with the form of a time window is created to fit the correlation of variables well. To describe the randomness dynamically, Bayesian theory is also applied. As an application, a detailed modeling of prediction of abnormal events in a chemical process is given. Statistics (e.g., mean, variance, skewness, kurtosis, confidence interval, etc.) of the posterior predictive distribution are obtained by Markov chain Monte Carlo simulation. It is shown that the model created in this paper achieves a prediction performance better than that of some other system identification methods, e.g., autoregressive moving average model and back propagation neural network.

Received: May 7, 2014. Revised: July 13, 2014. Accepted: August 28, 2014. Published: August 28, 2014.

1. INTRODUCTION

Risk assessment and fault detection in chemical processes have been active areas of research, and complex and interconnected behaviors are frequently encountered when modeling some related processes. Generally, the issue of predicting the number of alarms or abnormal events can be treated as a time series analysis problem with various forms. The time domain approach,1 one of the basic approaches in time series analysis, has had a significant effect on dependence modeling. This approach can be used as an efficient forecasting tool focusing on modeling some future value of a time series as a parametric function of the current and past values. Among those methods, Box and Pierce2 developed a systematic class of models called autoregressive integrated moving average (ARIMA) models to handle time-correlated modeling and forecasting. Subsequently, several special or advanced techniques were also proposed, for example, generalized autoregressive conditional heteroskedasticity models for the analysis of volatility3 and adaptive splines threshold autoregressive models for long-range dependence and nonlinearity.4

More typically, when it comes to the coupling relations with finite variables, a series of probability estimation methods are also qualified. These methods intrinsically model the joint distribution of variables and can provide a more complete description of the estimated parameters with highly complex dependence. The problem is how to analyze, extract, and make use of the information from the multidimensional sample data to model the complex dependence structure. Though some traditional tools like the linear correlation coefficient and scatter plot in multivariate dependence modeling achieved reasonable results,5 in recent years several efficient methods have also been applied to industry areas, such as copula theory and the moment-based method. Yi and Bier6,7 first employed bivariate copulas to estimate conditional system failure probability in the nuclear industry. Because there exists a nonlinear relationship between variables and behavior-based factors involving human operators, Seider and co-workers8-10 developed a kind of Bayesian model accounting for the interdependences among safety systems with the Cuadras-Auge copula and multivariate Gaussian copula for real-time risk evaluation of chemical processes. In Bayesian simulation, posterior distributions were updated dynamically to yield mean abnormal events and fault probabilities in each time period. Once the alarms or the abnormal events are predicted and evaluated, operators will be more cautious and have enough time to detect and diagnose the faults. Recently, Ahooyi and co-workers11,12 proposed a moment-based method for estimating the probabilities of rare events. The method is based on maximizing the information entropy subject to equality constraints on the moment functions and can be applied to model fault propagation in complex highly interactive systems under a Bayesian network structure. When compared with some conventional bivariate copulas, the moment-based probabilities provide a more efficient method for estimating probabilities of industrial processes with high nonlinearity and uncertainty.

As for copula theory in dependence modeling, standard multivariate copulas such as the multivariate Gaussian or Student-t as well as the exchangeable Archimedean copulas13 have limited abilities to fit those dependence models well. In this regard, vine copulas are able to model complex patterns by benefiting from the rich variety of bivariate copulas (structures and parameters) as building blocks.14 It has been tested in many applications that vine copulas have much more flexibility in dependence modeling, accounting for conditional dependence, asymmetries, and tail dependence. For example, Czado15 illustrated the model flexibility of vine copula by considering a D-vine tree in 3 dimensions and showed that vine copulas


could describe asymmetries well by using different kinds of copulas along with their parameters. Berg et al.13 tested the good performance of vine copulas (pair-copula constructions) in computational complexity and fitting capability compared with those of nested Archimedean constructions, one group of relatively flexible multivariate Archimedean copula extensions. Reference 16 gives a description of the tail behavior of vine copulas.

This paper focuses on establishing a kind of dynamic model accounting for multivariate sequential dependence, with the method of Bayesian theory and D-vine copula. As an application, a case study involving a plant run by multiple shifts of operators is introduced. The alarm sample data of four operator groups are simulated by an improved Gaussian regression method. These data perform well in sequential dependence analysis. The number of abnormal events is predicted dynamically using a Bayesian hierarchical model, and D-vine copula is employed in the joint posterior distribution to capture the sequential dependence among process variables at subsequent time periods. To perform Bayesian simulation, the Markov chain Monte Carlo method with the Metropolis-Hastings algorithm17 is used to obtain the confidence intervals.

The paper is organized as follows: Section 2 gives a review of Bayesian theory and D-vine copula. A generalized state estimation approach with an n-dimensional data set is introduced in Section 3, with an emphasis on sequential dependence analysis. Section 4 provides a detailed Bayesian modeling considering sequential dependence among four operator groups to predict the frequency of abnormal events in a specific chemical plant. Performance of the statistical model is discussed compared with that of the autoregressive moving average (ARMA) model and back propagation (BP) neural network. Finally, Section 5 provides some summarizing comments and conclusions.

2. METHODOLOGICAL BACKGROUND

In this section, a brief review of the Bayesian statistical method is given, along with copula theory accounting for sequential dependence between random variables, which is helpful in the Bayesian modeling in Section 4.

2.1. Bayesian Theory. Bayesian theory is an efficient statistical method, especially in solving the problems of low-probability, high-consequence events. It has the ability in real time to evaluate and predict unobservable parameters using historical information (prior distribution) and the real-time sample data (likelihood function).18 Assume that Data ∈ χ denotes the observed data in the sample space χ and ϕ ∈ Θ is the unobservable parameter vector linked with the sample data. The Bayesians hold the view that there exists uncertainty in ϕ; if Data and ϕ are exchangeable, then

p(\phi|\mathrm{Data}) = \frac{p(\phi)\, l(\phi|\mathrm{Data})}{\int p(\phi)\, l(\phi|\mathrm{Data})\, \mathrm{d}\phi} \propto p(\phi)\, l(\phi|\mathrm{Data})   (1)

where p(ϕ) is the prior density of ϕ and l(ϕ|Data) refers to the likelihood function, while p(ϕ|Data) denotes the posterior density of ϕ. Obviously, the posterior density is proportional to the product of its prior and the likelihood.

Now, we need to estimate the unknown state value x, which is related to the observed data, Data ∈ χ. x may be governed by the unknown parameter vector ϕ (e.g., the parameters of the distribution of x), termed the hyperparameter, which seems to have a direct link with Data. Assuming that (x, ϕ) are exchangeable in their joint distribution,

p(x, \phi|\mathrm{Data}) = p(x|\phi, \mathrm{Data})\, p(\phi|\mathrm{Data}) \propto p(x|\phi, \mathrm{Data})\, p(\mathrm{Data}|\phi)\, p(\phi)   (2)

Note that p(x|ϕ,Data) usually has a fixed structure linked with only (x, ϕ), which can be simplified as p(x|ϕ,Data) = p(x|ϕ). In general, ϕ is unknown, so the distribution for x must average over the uncertainty in ϕ:

p(x|\mathrm{Data}) = \int p(x, \phi|\mathrm{Data})\, \mathrm{d}\phi \propto \int p(x|\phi, \mathrm{Data})\, p(\mathrm{Data}|\phi)\, p(\phi)\, \mathrm{d}\phi   (3)

2.2. D-Vine Copula. Copulas are able to model a dependence structure of multivariate data. Recall from Sklar19 that an n-dimensional joint distribution can be decomposed into its n univariate marginal distributions and an n-dimensional copula. Let F be the n-dimensional cumulative distribution function of the random vector x = [x1, x2, ···, xn]T with marginal cumulative distributions F1, F2, ···, Fn; then there exists a copula function C such that

F(x_1, x_2, \cdots, x_n) = C(F_1(x_1), F_2(x_2), \cdots, F_n(x_n))   (4)

where C is unique if F1, F2, ···, Fn are continuous. Similarly, the corresponding joint probability density f(x1, x2, ···, xn) can be obtained as a function of the copula density c and the marginal densities f1, f2, ···, fn:

f(x_1, x_2, \cdots, x_n) = f_1(x_1) \times f_2(x_2) \times \cdots \times f_n(x_n) \cdot c[F_1(x_1), F_2(x_2), \cdots, F_n(x_n)]   (5)

where c is defined as

c(F_1, F_2, \cdots, F_n) = \frac{\partial^n C}{\partial F_1\, \partial F_2 \cdots \partial F_n}   (6)

For more details, see ref 20. However, conventional copulas are very restrictive in multivariate dependence modeling with complex structures. Though extensions of those copulas offer some improvement,21 there still exist other limitations, such as parameter restriction, computationally time-consuming problems, etc.22
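To make eqs 4-6 concrete, the following Python sketch (not part of the original paper) evaluates a bivariate joint density from two Gaussian margins and a Gaussian copula and cross-checks the result against the equivalent bivariate normal density; the marginal parameters and the copula parameter rho are arbitrary illustrative values.

# Minimal sketch of Sklar's decomposition (eqs 4-6) with a bivariate Gaussian copula.
# Illustrative only; parameter values are not from the paper.
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.7                                   # copula (correlation) parameter
mu1, s1, mu2, s2 = 10.0, 2.0, 5.0, 1.5      # arbitrary Gaussian margins

def gaussian_copula_density(u1, u2, rho):
    """c(u1, u2) = phi2(z1, z2; rho) / (phi(z1) * phi(z2)), i.e., eq 6 for n = 2."""
    z1, z2 = norm.ppf(u1), norm.ppf(u2)
    joint = multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, rho], [rho, 1.0]]).pdf([z1, z2])
    return joint / (norm.pdf(z1) * norm.pdf(z2))

# Joint density via eq 5: f(x1, x2) = f1(x1) * f2(x2) * c(F1(x1), F2(x2))
x1, x2 = 11.0, 5.5
u1, u2 = norm.cdf(x1, mu1, s1), norm.cdf(x2, mu2, s2)
f_joint = norm.pdf(x1, mu1, s1) * norm.pdf(x2, mu2, s2) * gaussian_copula_density(u1, u2, rho)

# Cross-check against the equivalent bivariate normal density
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
f_ref = multivariate_normal(mean=[mu1, mu2], cov=cov).pdf([x1, x2])
print(f_joint, f_ref)   # the two values agree up to floating-point error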

2.2.1. Pair-Copula Construction. Vine copulas were initially proposed by Joe23 and developed further by Bedford and Cooke;24-26 statistical inference on parameter estimation was explored by Aas et al.27 Compared with some standard multivariate copulas or generalized forms, they have much more flexibility in dependence modeling, accounting for conditional dependence as well as asymmetries and tail dependence. Vines are hierarchical graphic models for describing multivariate copulas using a rich variety of bivariate copulas as building blocks, so-called pair-copulas, whose modeling scheme is based on a decomposition of an n-dimensional multivariate density into n(n − 1)/2 bivariate copula densities. Thus far, the typical 2 types of pair copula constructions (PCCs) have been used more generally, canonical vines (C-vines) and D-vines. Herein, we concentrate on D-vine copulas, whose standard density formula can be written as

f(\mathbf{x}) = \prod_{k=1}^{n} f_k(x_k) \times \left\{ \prod_{i=1}^{n-1} \prod_{j=1}^{n-i} c_{j,j+i|j+1:j+i-1}\big(F(x_j|x_{j+1}, \cdots, x_{j+i-1}), F(x_{j+i}|x_{j+1}, \cdots, x_{j+i-1});\ \theta_{j,j+i|j+1:j+i-1}\big) \right\}   (7)

where f(x) is the joint density function of the n-dimensional random variables x; fk (k = 1, 2, ···, n) denotes the marginal densities and cj,j+i|j+1:j+i−1 the bivariate copula densities; and θj,j+i|j+1:j+i−1 denotes the corresponding copula parameter between the conditional distribution functions of xj and xj+i conditioned on xj+1, xj+2, ···, xj+i−1. Taking the PCC in four dimensions as an example, the order of variables is chosen according to its sequential form; then the joint density of x1, x2, x3, x4 can be formulated as follows:

f(x_1, x_2, x_3, x_4) = f_1(x_1) \cdot f(x_2|x_1) \cdot f(x_3|x_1, x_2) \cdot f(x_4|x_1, x_2, x_3)   (8)

With the property of the conditional density function, we have that

f(x_2|x_1) = c_{1,2}(F(x_1), F(x_2)) \cdot f_2(x_2)
f(x_3|x_1, x_2) = c_{1,3|2}(F(x_1|x_2), F(x_3|x_2)) \cdot c_{2,3}(F(x_2), F(x_3)) \cdot f_3(x_3)
f(x_4|x_1, x_2, x_3) = c_{1,4|2,3}(F(x_1|x_2, x_3), F(x_4|x_2, x_3)) \cdot c_{2,4|3}(F(x_2|x_3), F(x_4|x_3)) \cdot c_{3,4}(F(x_3), F(x_4)) \cdot f_4(x_4)   (9)

In the following sections of this paper, each bivariate copula is briefly denoted by c, e.g., c1,2(F(x1), F(x2)) is simply written as c1,2. Substituting eq 9 into eq 8, the D-vine density with four dimensions can be obtained:

f(x_1, x_2, x_3, x_4) = c_{1,2} \cdot c_{2,3} \cdot c_{3,4} \cdot c_{1,3|2} \cdot c_{2,4|3} \cdot c_{1,4|2,3} \cdot f_1(x_1) \cdot f_2(x_2) \cdot f_3(x_3) \cdot f_4(x_4)   (10)

Equation 10 decomposes a multivariate density into 6 bivariate copula densities and their margins. For more details of the demonstration, see ref 28. Figure 1 is the graphical model of the D-vine with three trees, which shows this process more clearly.

Figure 1. Graphical model: D-vine trees with edge indices.

The pairs (1,2), (2,3), and (3,4) in the first tree and the conditional pairs (1,3|2), (2,4|3), and (1,4|2,3) in the second and third trees incorporate together and form the D-vine model. More generally, returning back to eq 7, the outer product runs over the n − 1 trees, while the pairs in each tree are designated by the inner product.

2.2.2. Conditional Distribution and Simulation. A challenging problem for inference is how to evaluate the conditional distribution functions in eq 7. Aas et al.27 provided analytical expressions for a variety of bivariate copulas. For simplicity, define uj|i ≡ F(xj|xi, xi+1, ···, xj−1) and ui|j ≡ F(xi|xi+1, xi+2, ···, xj), where i < j; then the conditional distribution function can be easily established in a recursive form28

u_{j|i} = h_{j,i|i+1:j-1}(u_{j|i+1}|u_{i|j-1};\ \theta_{j,i|i+1:j-1}) = \frac{\partial C_{i,j|i+1:j-1}(u_{i|j-1}, u_{j|i+1};\ \theta_{i,j|i+1:j-1})}{\partial u_{i|j-1}}
u_{i|j} = h_{i,j|i+1:j-1}(u_{i|j-1}|u_{j|i+1};\ \theta_{i,j|i+1:j-1}) = \frac{\partial C_{i,j|i+1:j-1}(u_{i|j-1}, u_{j|i+1};\ \theta_{i,j|i+1:j-1})}{\partial u_{j|i+1}}   (11)

where Ci,j|i+1:j−1 is the cumulative distribution function corresponding to the pair-copula density ci,j|i+1:j−1. Here, it needs to be stressed that θi,j|i+1:j−1 or θj,i|i+1:j−1 is just a sign, only to make the equation more symmetric, as it does not cause differences between them in a specific D-vine model. Considering a D-vine model in four dimensions, once the marginal cumulative distributions ui|i ≡ Fi(xi) (i = 1, 2, ···, n) are given, their corresponding conditional distributions (forward recursion or backward recursion in eq 11) can be easily obtained, which is shown more clearly in Figure 2.

Figure 2. Calculation for conditional distribution functions: the dependencies between {uj|i, ui|j} values resulting from both the forward and backward recursions of eq 11.

The associated conditional distributions in eq 9 are calculated as follows, which will be used in the Bayesian modeling part in Section 4:

u_{1|2} = \frac{\partial C_{1,2}(u_{1|1}, u_{2|2};\ \theta_{1,2})}{\partial u_{2|2}}, \quad u_{3|2} = \frac{\partial C_{2,3}(u_{2|2}, u_{3|3};\ \theta_{2,3})}{\partial u_{2|2}}, \quad u_{2|3} = \frac{\partial C_{2,3}(u_{2|2}, u_{3|3};\ \theta_{2,3})}{\partial u_{3|3}}
u_{4|3} = \frac{\partial C_{3,4}(u_{3|3}, u_{4|4};\ \theta_{3,4})}{\partial u_{3|3}}, \quad u_{1|3} = \frac{\partial C_{1,3|2}(u_{1|2}, u_{3|2};\ \theta_{1,3|2})}{\partial u_{3|2}}, \quad u_{4|2} = \frac{\partial C_{2,4|3}(u_{2|3}, u_{4|3};\ \theta_{2,4|3})}{\partial u_{2|3}}
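The recursion in eq 11 and the six conditional distributions above can be illustrated with a short Python sketch. It assumes, purely for illustration, that every pair-copula is Gaussian (so the h-function has a simple closed form); the pseudo-observations and copula parameters are made-up values, and the variables u1_23 and u4_23 correspond to the quantities written as u1|3 and u4|2 above.

# Sketch of the eq 11 recursion for a 4-dimensional D-vine, assuming Gaussian
# pair-copulas for illustration only (the paper selects families per edge).
import numpy as np
from scipy.stats import norm

def h(u, v, rho):
    """h(u | v; rho) = dC(u, v; rho)/dv for the bivariate Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return norm.cdf((x - rho * y) / np.sqrt(1.0 - rho**2))

# pseudo-observations u_{i|i} = F_i(x_i) for one time window (illustrative values)
u1, u2, u3, u4 = 0.62, 0.71, 0.55, 0.48
th12, th23, th34 = 0.85, 0.87, 0.86        # tree-1 parameters (assumed)
th13_2, th24_3 = 0.30, 0.25                # tree-2 parameters (assumed)

# first tree -> conditionals needed in the second tree
u1_2 = h(u1, u2, th12)      # F(x1 | x2)
u3_2 = h(u3, u2, th23)      # F(x3 | x2)
u2_3 = h(u2, u3, th23)      # F(x2 | x3)
u4_3 = h(u4, u3, th34)      # F(x4 | x3)

# second tree -> conditionals needed in the third tree
u1_23 = h(u1_2, u3_2, th13_2)   # F(x1 | x2, x3), written u1|3 in the text
u4_23 = h(u4_3, u2_3, th24_3)   # F(x4 | x2, x3), written u4|2 in the text
print(u1_23, u4_23)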

2.2.3. Introduction to Software Package CDVine. Complexity in fulfilling C-vine and D-vine simulation limits the popularity of this approach; thus far, several software packages are available, such as the software tool UNICORN26 and the package CDVine29 for the statistical software R. The CDVine package provides functions for statistical inference of canonical vine and D-vine copulas. It contains tools for bivariate exploratory data analysis and for bivariate as well as vine copula selection; both graphical and analytical tools are included. It also includes sampling methods and plotting methods.14 In this paper, several functions in the CDVine package are used in the D-vine model simulation.

3. STATE ESTIMATION WITH SEQUENTIAL DEPENDENCE

It is convenient to use a set of state equations to describe a multivariate system30-32 whose discrete form with random noise can be written as

x_t = f(x_{t-1}, \omega_{t-1})   (12)

Here, x_t ∈ ℝ^{n_x} is the state vector, the system error is represented by ω_t, and f(·): ℝ^{n_x} × ℝ^{n_ω} → ℝ^{n_x} denotes the state equation of a random nonlinear system. The state equation describes only the nonlinear relationship of the state vector in two adjacent time periods.31 However, multiple couplings existing in multiple time series are commonly encountered, which can be described as

x_t = f(x_{t-(n-1)}, x_{t-(n-2)}, \cdots, x_{t-1}, \omega_{t-(n-1)}, \omega_{t-(n-2)}, \cdots, \omega_{t-1})   (13)

Herein, x_t is a scalar compared with that in eq 12. x_{t−(n−1)}, x_{t−(n−2)}, ···, x_{t−1}, x_t is named as a time window with length n, which can also be treated as an n-dimensional multivariable shown in eq 7. To estimate the state value x_t in eq 13, a set of sequential data with an n-dimensional variable is discussed. Its matrix form is expressed as

\mathrm{Data} = [\,\mathrm{Data}_1\ \mathrm{Data}_2\ \cdots\ \mathrm{Data}_n\,] = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \\ y_{n+1} & y_{n+2} & \cdots & y_{2n} \\ \vdots & \vdots & \vdots & \vdots \\ y_{(m-1)n+1} & y_{(m-1)n+2} & \cdots & y_{mn} \end{bmatrix}   (14)

where m denotes the number of sample data for each variable in a specific time window with length n; y_t (t = 1, 2, ···, mn) represents the state or observed value in time interval t; and each column denotes the samples for a specific variable. It is worth noting that if each row of Data described in eq 14 is incorporated from tail to head sequentially, then the sequential data for the whole states can be obtained, that is, [y_1, y_2, ···, y_n, y_{n+1}, ···, y_{mn}]. Generally, the current state is correlated with the states of its previous time in a specific time window with length n. For describing this "catenulate" coupling form, n types of the corresponding D-vine models are constructed; hence, the concept of sequential dependence is needed and developed. Considering that there always exist uncertainties among sample data, Bayesian statistical theory is employed to predict those values dynamically, with a posterior distribution form. Overall, the assumption above is practical because the coupling in a specific time window is rather strong; moreover, the information on the rest of the samples in the previous time has been introduced into the Bayesian model.

Consider the joint density function of [x_{t−(n−1)}, ···, x_{t−2}, x_{t−1}, x_t] in the current time window, which can be decomposed into the product of their marginal density functions and the D-vine copula density

f(x_{t-(n-1)}, \cdots, x_{t-2}, x_{t-1}, x_t | \mathrm{Data}) = \left\{ \prod_{i=1}^{n} f(x_{t-n+i}|\mathrm{Data}) \right\} \cdot c_{\mathrm{D\text{-}vine}}   (15)

where f(x_{t−n+i}|Data) represents the marginal posterior distribution of x_{t−n+i} and c_{D-vine} denotes the D-vine copula density, which will be further discussed in Section 4. Because there exist uncertainties on the corresponding parameters of the distribution of x_{t−n+i} (i = 1, 2, ···, n), the Bayesian hierarchical model in eq 3 is employed. When combined with eq 15, we have

f(x_{t-(n-1)}, \cdots, x_{t-2}, x_{t-1}, x_t | \mathrm{Data}) \propto \left\{ \prod_{i=1}^{n} \int f(x_{t-n+i}|\phi, \mathrm{Data})\, f(x_{t-n+i}|\phi)\, p(\phi)\, \mathrm{d}\phi \right\} \cdot c_{\mathrm{D\text{-}vine}}   (16)

Equation 16 is the basic model to predict those sequential data, where ϕ is called a hyperparameter with a vector form, which is influenced by the corresponding sample data and related to its random variable. Figure 3 depicts the overall flowchart of the process for state estimation. The empirical data can be divided into two parts. Historical data are used for correlation analysis and copula fitting and can be invoked into the prior distribution. Meanwhile, the information on the likelihood function is supplied by the historical samples of testing data for each corresponding random variable with the time series. In Bayesian simulation, the conjugate prior method, Monte Carlo (MC) method, and Markov chain Monte Carlo (MCMC) method are used to get the posterior means and other statistics.

Figure 3. Overall schematic of Bayesian modeling with sequential dependence.
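A minimal sketch of the time-window arrangement in eq 14 is given below, assuming a synthetic alarm sequence and a window length n = 4 (matching the four operator groups in Section 4); the variable names are illustrative only.

# Sketch of arranging a single sequence [y1, ..., y_{mn}] into the m-by-n
# time-window matrix "Data" of eq 14; the series here is synthetic.
import numpy as np

n = 4                                    # time-window length
y = np.arange(1, 25)                     # y_1 ... y_{mn}, a stand-in sequence
m = y.size // n
Data = y[: m * n].reshape(m, n)          # row t holds [y_{(t-1)n+1}, ..., y_{tn}]

# column j (0-based) collects the samples of the j-th variable of the window,
# e.g. the alarms of one operator group across its shifts
print(Data)
print("samples of variable 3:", Data[:, 2])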


4. CASE STUDY

4.1. Introduction to Abnormal Events with Multiple Shifts of Operators. The development of risk assessment and management methods to handle abnormal events has been an active area in industry risk modeling, particularly in the chemical process industries (CPIs). The CPIs handle many hazardous chemicals in storage units, reactors, separators, and other components.8 To maintain safe production in chemical processes, the core goal is to control parameters like temperature, pressure, level, flow, and ingredients within their normal operating scope. If a parameter exceeds its threshold of alarms in different levels, then an abnormal event is triggered.

In an industrial-scale chemical plant, about 5000−10 000 alarms typically occur daily.9 Generally, each of the major equipment areas is related to a single operator's scope of responsibility in the plant.33 Meanwhile, for each of the major equipment areas, there are large numbers of process variables to be observed. In fact, every single operator or operator group usually has to face hundreds of alarms daily. When an alarm flood occurs, this number will increase quickly. For example, in 1994, 11 min prior to an explosion in the Texaco Milford Haven refinery, the two operators had to recognize, acknowledge, and act on 275 alarms.34 To achieve more efficient alarm management, in 1999 the Engineering Equipment and Materials Users Association guidelines35 established that the number of alarms should not exceed 10 in the first 10 min following an upset. In this paper, the alarms in a major equipment area are studied.

Generally, the vast majority of chemical processes are continuous production, and a work shift system is usually introduced.36,37 For example, in China, most chemical plants operate on a 4-group-3-shift basis, that is, every group alternately works for 8 h and rests for another 24 h. The act of former group A will have a crucial impact on group B, as well as some indirect influence on groups C and D, and so on, which presents the problem of multivariate sequential dependence. Therefore, to make the model more powerful, it is of great interest to consider sequential dependence for multidimensional data sets with sequential form. In the following work, a case study involving a chemical plant run by four shifts of operators has been addressed, and the number of abnormal events has also been predicted dynamically by using Bayesian methods and D-vine copula.

4.2. Data Process. With respect to a 4-group-3-shift case study in a chemical process, suppose Ai, Bi, Ci, and Di are the ith shifts of groups A, B, C, and D, respectively. Thus, the shifts in the first day are A1, B1, and C1; D1, A2, and B2 for the second; and so on. Because there exists uncertainty in the number of alarms for each group, herein four Gaussian distributions with expectation μi and variance σi2 are designed

x_i \sim \mathrm{Normal}(\mu_i, \sigma_i^2) \quad (i = 1, 2, 3, 4)   (17)

where xi denotes the number of alarms of group i in the corresponding time interval. In fact, it has been examined that the number of alarms can be approximately assumed to follow a Gaussian distribution by analyzing the raw data obtained from the distributed control system (DCS) in a refinery. In this paper, 220 consecutive shifts for each operator group, i.e., a total of 880 consecutive shifts, are used to simulate the process, and the simulated data are given in the Supporting Information. The previous 800 data points are used for correlation analysis as historical information, and the latter 80 data points are used to examine the prediction performance. As explained in eq 14, herein the sequential data for the four operator groups, [y1, y2, y3, y4, y5, ···, y4m], can also be described in a matrix form

\mathrm{Data} = [\,\mathrm{Data}_1\ \mathrm{Data}_2\ \mathrm{Data}_3\ \mathrm{Data}_4\,] = \begin{bmatrix} y_1 & y_2 & y_3 & y_4 \\ y_5 & y_6 & y_7 & y_8 \\ \vdots & \vdots & \vdots & \vdots \\ y_{4m-3} & y_{4m-2} & y_{4m-1} & y_{4m} \end{bmatrix}   (18)

where m denotes the number of samples for each operator group and yt (t = 1, 2, ···, 4m) represents the number of abnormal events in time interval t; each column denotes the samples in a specific operator group. For definiteness and without loss of generality, let the samples in columns 1 to 4 represent the alarms of operator groups A to D in their own time intervals, respectively. For example, the sample data for group C is Data3 = [y3, y7, ···, y4m−1]T. Figure 4a depicts the number of alarms for the four operator groups with a length of 220. Figure 4b integrates the data points in order for the four operator groups into a sequential form.

Figure 4. Alarm time series for four operator groups: (a) simulated data points with 220 time intervals for each operator group and (b) integrating the data points for four operator groups in order and yielding 880 data points with a sequential form.

To analyze the sequential dependence between variables for the four operator groups, the alarm time series for each group are normalized and transformed via its empirical cumulative distribution function. This procedure is widely used in rank correlation analysis, which can be seen in ref 5. The scatter plots for the 4-dimensional data sets are given in Figure 5. It is shown that the constructed sequential data are strongly correlated. However, there exists logical dependence in sequential modeling. As discussed before, e.g., the effect of group A on group B is much more significant than that of group B on group A; hence, it is imperative to distinguish the correlation coefficient τAB from τBA.

Figure 5. Scatter plot of the normalized data sets for four operator groups.

The scatter plots in Figure 5 give a complete view of sequential dependence in a matrix form. The plot in row i and column j represents the effect of group i on group j. Herein, operator groups A, B, C, and D are denoted by 1, 2, 3, and 4, respectively. It is also found that there is evidently strong dependence in 1 → 2, 2 → 3, and 3 → 4, while dependences of variables with 1 or even more intervals become weaker

gradually, which is shown to be more practical in the multiple operator shifts case study. From a quantitative viewpoint, Kendall's τ has been calculated with the following matrix form

\tau^{\mathrm{Kendall}} = \begin{bmatrix} - & 0.701 & 0.630 & 0.502 \\ 0.418 & - & 0.696 & 0.551 \\ 0.542 & 0.507 & - & 0.692 \\ 0.637 & 0.591 & 0.519 & - \end{bmatrix}

For example, the Kendall's τ of τ1,2 is 0.701, much larger than that of τ2,1, 0.418, which shows the significant difference in sequential dependence.

4.3. Model Construction and Parameter Estimation. In this section, we aim to establish a Bayesian model to predict the number of alarms at present. Before that, the concept of time window is proposed to model the multivariable with alarm time series for different operator groups. By analyzing the previous 800 data points, copula families and their corresponding parameters have also been tested, which will be invoked in the posterior distribution function in the Bayesian model.

4.3.1. Pair-Copula Construction in Sequential Modeling. Compared with the conventional copulas, C-vine and D-vine provide a more flexible structure in describing coupling between random variables. For a set of sequential data with four time series, the coupling form among them is complex. As has been explained before, A3 is influenced by D2, and it also correlates with C2, B2, A2, and D1, and even more, indirectly. For simplicity, we consider only the couplings with its three previous groups, that is, D2, C2, and B2. In fact, A2 is treated as the sample datum of A3 and can be employed by the likelihood function in the Bayesian model. Overall, in the case study of multiple shifts of operators or other similar problems, there is no need to optimize the coupling order of variables in a specific D-vine model (the structure selection procedure in Figure 3) under Bayesian simulation, which will make the problem easy to handle.

For further analysis, the sequential data are decomposed into four types of time windows, 1 → 2 → 3 → 4, 2 → 3 → 4 → 1, 3 → 4 → 1 → 2, and 4 → 1 → 2 → 3, denoted as TW1, TW2, TW3, and TW4, respectively. Likewise, 1, 2, 3, and 4 represent four operator groups in their corresponding time periods. Also, four types of time windows correspond to four classes of D-vine models, as shown in Figure 6.

Figure 6. Flowchart of D-vine modeling. Four time windows correspond to four D-vine models.

According to eqs 9 and 10, four types of D-vine models can be considered separately; their joint distribution functions of the alarms of four adjacent operator groups are established as follows:

f^1(x_1, x_2, x_3, x_4) = c^1 \cdot f(x_1) \cdot f(x_2) \cdot f(x_3) \cdot f(x_4), \quad c^1 = c_{1,2} \cdot c_{2,3} \cdot c_{3,4} \cdot c_{1,3|2} \cdot c_{2,4|3} \cdot c_{1,4|2,3}   (19)

f^2(x_1, x_2, x_3, x_4) = c^2 \cdot f(x_2) \cdot f(x_3) \cdot f(x_4) \cdot f(x_1), \quad c^2 = c_{2,3} \cdot c_{3,4} \cdot c_{4,1} \cdot c_{2,4|3} \cdot c_{3,1|4} \cdot c_{2,1|3,4}   (20)

f^3(x_1, x_2, x_3, x_4) = c^3 \cdot f(x_3) \cdot f(x_4) \cdot f(x_1) \cdot f(x_2), \quad c^3 = c_{3,4} \cdot c_{4,1} \cdot c_{1,2} \cdot c_{3,1|4} \cdot c_{4,2|1} \cdot c_{3,2|4,1}   (21)

f^4(x_1, x_2, x_3, x_4) = c^4 \cdot f(x_4) \cdot f(x_1) \cdot f(x_2) \cdot f(x_3), \quad c^4 = c_{4,1} \cdot c_{1,2} \cdot c_{2,3} \cdot c_{4,2|1} \cdot c_{1,3|2} \cdot c_{4,3|1,2}   (22)

4.3.2. Copula Selection and Parameter Estimation. It is important to select a group of well-fitted pair copulas (structures and parameters) for the given data, which will have a great impact on the prediction performance. For a specific pair copula, c_{j,j+i} (i = 1, 2, ···, n−1; j = 1, 2, ···, n−i), which has been described in eq 7, its corresponding parameters using different available copula families can be estimated by maximum likelihood estimation

\hat{\theta}^{o}_{j,j+i|(j+1):(j+i-1)} = \arg\max_{\theta^{o}_{j,j+i|(j+1):(j+i-1)}} \sum_{k=1}^{M} \log[c^{o}(u^{k}_{j|j+i-1}, u^{k}_{j+i|j+1};\ \theta^{o}_{j,j+i|(j+1):(j+i-1)})]   (23)

where M = 200 denotes the number of conditional distribution data (observations); u^k_{j|j+i−1} denotes the kth observation of u_{j|j+i−1}, which can be calculated by the recursive form in eq 11; c^o denotes the oth (o = 1, 2, ···, O) available copula family, and O is the number of the available copula families to be selected. Then the Akaike information criterion (AIC) is computed for the whole O copula families,38 and the family with the minimum value is chosen:

\mathrm{AIC}^{o}_{j,j+i} = -2 \sum_{k=1}^{M} \log[c^{o}(u^{k}_{j|j+i-1}, u^{k}_{j+i|j+1};\ \theta^{o}_{j,j+i|(j+1):(j+i-1)})] + 2\lambda   (24)

where λ is a constant related to the number of parameters: λ = 1 for one-parameter copulas, and λ = 2 for two-parameter copulas. Overall, with 200 bivariate copula data for each operator group, pair copulas in the D-vine model are fitted sequentially by proceeding iteratively tree-by-tree. This process involves only bivariate estimation for each individual pair copula term, and both structure selection and parameter estimation are handled simultaneously. The log-likelihood of a D-vine copula for given data, pair-copula structures, and parameters is defined as follows:

\mathrm{log\text{-}likelihood} = \sum_{k=1}^{M} \sum_{i=1}^{d-1} \sum_{j=1}^{d-i} \log[c_{j,j+i|(j+1):(j+i-1)}(u^{k}_{j|j+i-1}, u^{k}_{j+i|j+1};\ \hat{\theta}_{j,j+i|(j+1):(j+i-1)})]   (25)

Herein, the available copula families are fitted from six copula families, involving two elliptical copulas, Gaussian and Student-t, and four Archimedean copulas, Clayton, Gumbel, Frank, and Joe. The functions CDVineCopSelect and CDVineLogLik in the CDVine package are used to fit the sample data. Figure 7 shows the copula families chosen and their parameters for the four types of D-vine models described in eqs 19−22.

Figure 7. Four-dimensional D-vine structure along with the parameters for each copula. Copula family name is shown on the edge, and the parameters of these copulas appear beneath the names; run time and log-likelihood defined in eq 25 are shown in the top left corners of the subplots. Four subplots represent four types of D-vine models in different time windows (TW1, TW2, TW3, and TW4).

It can be found that the run time in copula selection and estimation (six copula families to be selected) for each D-vine model is approximately 1.5 s. In fact, the run time is less than 3 s when considering all 40 copula families available in the CDVine package. Therefore, the computational time required for capturing the right structure of the D-vine copula is less than those of conventional multivariate copulas because the multivariate optimization problem is converted to bivariate copula estimation problems by proceeding sequentially tree-by-tree in a D-vine model.14 More to the point, from Figure 7, we can see that very low dependence is also estimated (especially in the third tree), and these fitted copulas with low dependence may contribute little to the prediction performance; moreover, it does really waste time in solving the optimization problems. Consequently, it has been suggested that each pair-copula perform a test for independence that indicates whether the corresponding pair-copula can be replaced by the independent copula.39 Because the data we focus on are simulated rather than practical, the further simplification is not discussed here. The corresponding methods on independence tests are referenced in ref 40.
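The selection rule in eqs 23 and 24 can be sketched as follows. The Python code fits two candidate one-parameter copula families (Clayton and Gaussian) to a set of pseudo-observations by maximum likelihood and keeps the family with the smaller AIC; the paper itself uses the CDVine package with a six-family candidate set, so this is only a reduced, hand-rolled illustration on synthetic data.

# Reduced sketch of eqs 23-24: MLE per candidate family, then pick min AIC.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def clayton_logpdf(u, v, theta):
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u**(-theta) + v**(-theta) - 1.0))

def gaussian_logpdf(u, v, rho):
    x, y = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1.0 - rho**2)
            - (rho**2 * (x**2 + y**2) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho**2)))

def fit_aic(logpdf, u, v, bounds):
    res = minimize_scalar(lambda t: -np.sum(logpdf(u, v, t)),
                          bounds=bounds, method="bounded")
    return 2.0 * res.fun + 2.0, res.x          # AIC = -2 logL + 2*lambda, lambda = 1

# synthetic pseudo-observations standing in for (u_{j|...}, u_{j+i|...}), M = 200
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=200)
u, v = norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

aic_c, th_c = fit_aic(clayton_logpdf, u, v, (1e-3, 20.0))
aic_g, th_g = fit_aic(gaussian_logpdf, u, v, (-0.99, 0.99))
best = min([("Clayton", aic_c, th_c), ("Gaussian", aic_g, th_g)], key=lambda r: r[1])
print(best)   # family with the minimum AIC and its fitted parameter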


4.3.3. Bayesian Modeling. When handling a chemical process, operators may face masses of alarms. These alarms can be grouped into several categories according to different consequences. Once alarms and the fault probabilities are detected, predicted, and evaluated dynamically, operators will take measures to avoid an accident in real time. Generally, the current frequency of alarms is influenced by two factors (operation factors are discussed here, excluding other factors like equipment failure): (a) operators at present, related to skills, emotions, external environment, etc., and (b) the operation condition at present, which is mainly relevant to the skills of previous operator groups. The basic idea of our model can be summarized as follows: 80 samples are dynamically invoked into the likelihood function of the Bayesian formula, and prior information is supplied by the 800 historical data. Meanwhile, to account for sequential dependence in factor (b), D-vine copula is introduced to the prior distributions.

4.3.3.1. Bayesian Model with Noncoupling. Let xt be the number of abnormal events in time interval t, which follows the Gaussian distribution xt ∼ Normal(μ_{t mod 4}, σ2), where "mod" represents the modulo operation; the means of the four operator groups are shown in eq 17. If the coupling relations are neglected, their joint posterior distribution can be decomposed into four marginal posterior distributions:

f(x_{t-3}, x_{t-2}, x_{t-1}, x_t|\mathrm{Data}) = f(x_{t-3}|\mathrm{Data}) \cdot f(x_{t-2}|\mathrm{Data}) \cdot f(x_{t-1}|\mathrm{Data}) \cdot f(x_t|\mathrm{Data})   (26)

To reduce the complexity in modeling and computation, herein we consider only the uncertainty on μt, which is assumed to follow a Gaussian distribution, μt ∼ Normal(μ0, σ02). In addition, we assume that σt is fixed in our simulated data for convenience. In fact, it varies for different operator groups. The value becomes smaller with skilled operators who can achieve a relatively stable operation level, and vice versa. Each marginal posterior distribution of the expectation is calculated using the Bayesian hierarchical model in eqs 2 and 3:

f(\mu_t|\mathrm{Data}) \propto f(\mathrm{Data}|\mu_t)\, f(\mu_t) \propto \exp\left\{ -\frac{1}{2}\left(\frac{q}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu_t^2 + \left(\frac{1}{\sigma^2}\sum_{i=1}^{q} y_{4i-4+(t\,\mathrm{mod}\,4)} + \frac{1}{\sigma_0^2}\mu_0\right)\mu_t \right\} \propto \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left(-\frac{(\mu_t - \mu_n)^2}{2\sigma_n^2}\right)   (27)

where

\mu_n = \frac{q\sigma_0^2}{q\sigma_0^2 + \sigma^2}\,\bar{y} + \frac{\sigma^2}{q\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \sigma_n^2 = \frac{\sigma^2\sigma_0^2}{q\sigma_0^2 + \sigma^2}   (28)

\bar{y} = (1/q)\sum_{i=1}^{q} y_{4i-4+(t\,\mathrm{mod}\,4)} denotes the sample mean for a specific operator group in time interval t, and q is the current number of its sample data. Meanwhile, their posterior distributions of alarms can be obtained as follows:

f(x_t|\mathrm{Data}) \propto \int f(x_t|\mu_t, \mathrm{Data})\, f(\mu_t|\mathrm{Data})\, \mathrm{d}\mu_t = \left\{\int \frac{1}{2\pi\sigma\sigma_n} \exp\left[-\frac{\sigma^2 + \sigma_n^2}{2\sigma^2\sigma_n^2}\left(\mu_t - \frac{\sigma^2\mu_n + \sigma_n^2 x_t}{\sigma^2 + \sigma_n^2}\right)^2\right] \mathrm{d}\mu_t\right\} \exp\left(-\frac{(x_t - \mu_n)^2}{2(\sigma^2 + \sigma_n^2)}\right) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2 + \sigma_n^2}} \exp\left(-\frac{(x_t - \mu_n)^2}{2(\sigma^2 + \sigma_n^2)}\right)   (29)

In the noncoupling model, the posterior distribution of alarms is also a Gaussian distribution. Moreover, both the posterior mean and the predictive mean are the same, that is, a weighted average of the prior mean μ0 and the sample mean ȳ, where the weights are determined by the sample size q, variance σ2, and σ02. With an increase in q, the posterior variance decreases, indicating a more accurate estimation.

4.3.3.2. Bayesian Model Using D-Vine Copula. Analogously, D-vine copula is employed in the model when accounting for sequential dependence. The joint posterior distribution is the product of the marginal densities and the vine copula density cD-vine:

f(x_{t-3}, x_{t-2}, x_{t-1}, x_t|\mathrm{Data}) = c_{\mathrm{D\text{-}vine}} \cdot f(x_{t-3}|\mathrm{Data}) \cdot f(x_{t-2}|\mathrm{Data}) \cdot f(x_{t-1}|\mathrm{Data}) \cdot f(x_t|\mathrm{Data})   (30)

Herein, cD-vine satisfies cD-vine = c^{t mod 4}, where "mod" represents the modulo operation. Hence, c^{t mod 4} can be obtained from eqs 19−22. When eq 29 is substituted into eq 30, the final Bayesian model with D-vine copula can be expressed as

f(x_{t-3}, x_{t-2}, x_{t-1}, x_t|\mathrm{Data}) \propto c_{\mathrm{D\text{-}vine}} \cdot \exp\left(-\sum_{i=1}^{4} \frac{(x_{t-4+i} - \mu_n)^2}{2(\sigma^2 + \sigma_n^2)}\right)   (31)
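The conjugate update behind eqs 27-29 can be written in a few lines of Python. The prior parameters, the fixed observation standard deviation, and the sample values below are placeholders rather than the paper's data; the script returns μn, σn2, and a 90% interval from the Normal(μn, σ2 + σn2) predictive density.

# Sketch of eqs 27-29: conjugate Normal update and the posterior predictive.
# All numbers are illustrative placeholders.
import numpy as np
from scipy.stats import norm

y = np.array([128., 141., 120., 135., 150., 129.])   # sample alarms for one group
mu0, s0 = 130.0, 15.0        # prior mean / std of mu_t (assumed)
s = 12.0                     # fixed std of the alarm count (assumed)
q, ybar = y.size, y.mean()

mu_n = (q * s0**2) / (q * s0**2 + s**2) * ybar + s**2 / (q * s0**2 + s**2) * mu0
sn2 = (s**2 * s0**2) / (q * s0**2 + s**2)             # eq 28

pred_sd = np.sqrt(s**2 + sn2)                         # predictive std, eq 29
lo, hi = norm.ppf([0.05, 0.95], loc=mu_n, scale=pred_sd)
print(mu_n, sn2, (lo, hi))    # posterior mean, posterior variance, 90% interval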


4.4. Simulation. In order to perform the Bayesian simulation, MC and MCMC methods are provided to evaluate the expectation of the frequency of alarms in each time period as well as other statistics, e.g., standard errors, skewness, kurtosis, etc. An ARMA model based on the least squares (LS) method and a BP neural network simulation are also given. Then, we study the results above and discuss the advantages of the statistical model in state estimation with sequential dependence.

4.4.1. Monte Carlo Method. The Bayesian model with D-vine copula in eq 31 is essentially the function of a joint density with four random variables and a copula density. For simplicity, a more succinct form is given. As for the TW1 model in eq 19, the marginal posterior density of alarm frequencies for the ith (i = 1, 2, 3, 4) operator group is f(xi|Data). As x1, x2, and x3 are fixed in time t − 3, t − 2, and t − 1, their joint posterior density function can be written as

f(x_{t-3}, x_{t-2}, x_{t-1}, x_t|\mathrm{Data}) = c_{1,2} \cdot c_{2,3} \cdot c_{3,4} \cdot c_{1,3|2} \cdot c_{2,4|3} \cdot c_{1,4|2,3} \cdot f(x_{t-3}|\mathrm{Data}) \cdot f(x_{t-2}|\mathrm{Data}) \cdot f(x_{t-1}|\mathrm{Data}) \cdot f(x_t|\mathrm{Data}) \propto c_{1,2} \cdot c_{2,3} \cdot c_{3,4} \cdot c_{1,3|2} \cdot c_{2,4|3} \cdot c_{1,4|2,3} \cdot f(x_t|\mathrm{Data})   (32)

Thereby, the 4-dimensional density is simplified as a 1-dimensional density. This simplified approach is suitable because the numbers of alarms in the previous three time periods correspond to three different distributions; their actual values represent only one possible sample that has happened. The direct coupling relations can be reflected quite well by essentially reducing the dimension of each fitted bivariate copula to one, which relates to only the current alarms with the margins of the other variables fixed. Their posterior means are computed using MC integration by sampling data from the marginal posterior distribution not accounting for sequential dependence

E(x_t|\mathrm{Data}) = \int_0^{+\infty} x_t\, f(x_{t-3}, x_{t-2}, x_{t-1}, x_t|\mathrm{Data})\, \mathrm{d}x_t = \frac{\sum_{h=1}^{N_{\mathrm{it}}} x_t^h\, f(x_{t-3}^h, x_{t-2}^h, x_{t-1}^h, x_t^h|\mathrm{Data})}{\sum_{h=1}^{N_{\mathrm{it}}} f(x_{t-3}^h, x_{t-2}^h, x_{t-1}^h, x_t^h|\mathrm{Data})} = \frac{\sum_{h=1}^{N_{\mathrm{it}}} x_t^h\, c_{\mathrm{D\text{-}vine}}\, f(x_t^h|\mathrm{Data})}{\sum_{h=1}^{N_{\mathrm{it}}} c_{\mathrm{D\text{-}vine}}\, f(x_t^h|\mathrm{Data})}   (33)

where h is the iteration counter, x_t^h is sampled by Normal(μn, σn2), and Nit = 100 000 is the total number of iterations to obtain a converged posterior mean, E(xt|Data).

4.4.2. Markov Chain Monte Carlo Method. To obtain more statistical information on the predictive values, the MCMC method with the Metropolis-Hastings (M-H) algorithm is introduced. This draws samples from approximate distributions, called a proposal distribution, and corrects them for a better approximation of the complex target distribution. The procedure for the MCMC simulation is as follows:

Step 1: Specify an initial value, x_t^0. Herein, the historical datum in time t − 4, y_{t−4}, is selected, ensuring a higher convergence rate for the created Markov chain.

Step 2: Define the proposal distribution of x_t^*, which is taken as a Gaussian distribution

f(x_t^*) = \frac{1}{\sqrt{2\pi}\,\sigma_t^*} \exp\left(-\frac{(x_t^* - \mu_t^*)^2}{2\sigma_t^{*2}}\right)   (34)

where μt* and σt* denote the expectation and standard deviation of the proposal distribution, respectively. μt* satisfies μt* = x_t^{h−1}, where x_t^{h−1} denotes the state value of the (h − 1)th iteration. Herein, σt* is set as 10 so that a suitable Markov chain can be obtained with its acceptance rate between 50% and 85% over the whole time periods.17

Step 3: Calculate the transition probability r, which satisfies

r = \frac{f(x_{t-3}, x_{t-2}, x_{t-1}, x_t^*|\mathrm{Data})\, J^h(x_t^{h-1}|x_t^*)}{f(x_{t-3}, x_{t-2}, x_{t-1}, x_t^{h-1}|\mathrm{Data})\, J^h(x_t^*|x_t^{h-1})} = \frac{c_{\mathrm{D\text{-}vine}}^*\, f(x_t^*|\mathrm{Data})}{c_{\mathrm{D\text{-}vine}}\, f(x_t^{h-1}|\mathrm{Data})}   (35)

where c_{D-vine}^* and c_{D-vine} can be written as

c_{\mathrm{D\text{-}vine}}^* = c_{t-3,t-2}(u_{t-3}, u_{t-2}) \cdot c_{t-2,t-1}(u_{t-2}, u_{t-1}) \cdot c_{t-1,t}(u_{t-1}, u_t^*) \cdot c_{t-3,t-1|t-2}(u_{t-3|t-2}, u_{t-1|t-2}) \cdot c_{t-2,t|t-1}(u_{t-2|t-1}, u_{t|t-1}^*) \cdot c_{t-3,t|t-2,t-1}(u_{t-3|t-1}, u_{t|t-2}^*)
c_{\mathrm{D\text{-}vine}} = c_{t-3,t-2}(u_{t-3}, u_{t-2}) \cdot c_{t-2,t-1}(u_{t-2}, u_{t-1}) \cdot c_{t-1,t}(u_{t-1}, u_t) \cdot c_{t-3,t-1|t-2}(u_{t-3|t-2}, u_{t-1|t-2}) \cdot c_{t-2,t|t-1}(u_{t-2|t-1}, u_{t|t-1}) \cdot c_{t-3,t|t-2,t-1}(u_{t-3|t-1}, u_{t|t-2})   (36)

Considering that μt* = x_t^{h−1} and according to eq 34, we can find that the conditional proposal distribution, J^h(x_t^*|x_t^{h−1}), equals the conditional distribution of x_t^{h−1} given x_t^*, J^h(x_t^{h−1}|x_t^*), which satisfies

J^h(x_t^*|x_t^{h-1}) = J^h(x_t^{h-1}|x_t^*) = \frac{1}{\sqrt{2\pi}\,\sigma_t^*} \exp\left(-\frac{(x_t^* - x_t^{h-1})^2}{2\sigma_t^{*2}}\right)   (37)

Step 4: Accept x_t^h = x_t^* with probability r. Otherwise, x_t^h = x_t^{h−1}. Repeat for h = 1, 2, ···, Nit (Nit = 100 000 is the maximum iteration).
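Steps 1-4 correspond to a standard random-walk Metropolis-Hastings sampler. The Python sketch below uses a symmetric Gaussian proposal as in eqs 34 and 37, so the ratio r reduces to the ratio of (unnormalized) target densities; the toy log-target stands in for cD-vine·f(xt|Data) and would be replaced by the actual model in practice.

# Random-walk Metropolis-Hastings sketch (Steps 1-4, eqs 34-37).
import numpy as np

def metropolis_hastings(log_target, x0, prop_sd=10.0, n_iter=100_000, seed=1):
    rng = np.random.default_rng(seed)
    chain = np.empty(n_iter)
    x, accepted = x0, 0
    for h in range(n_iter):
        x_star = rng.normal(x, prop_sd)               # proposal J(x* | x^{h-1}), eq 34
        log_r = log_target(x_star) - log_target(x)    # symmetric proposal cancels (eq 37)
        if np.log(rng.uniform()) < log_r:             # accept with probability min(1, r)
            x, accepted = x_star, accepted + 1
        chain[h] = x
    return chain, accepted / n_iter

# toy target: Normal(130, 14^2), standing in for the posterior of one group's alarms
log_target = lambda x: -0.5 * ((x - 130.0) / 14.0) ** 2
chain, acc_rate = metropolis_hastings(log_target, x0=120.0)
print(acc_rate, chain[1000:].mean(), np.percentile(chain[1000:], [5, 50, 95]))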


Because the state values in the obtained Markov chain are able to describe the samples of the target distribution, the MCMC method provides almost all of the statistical information on the posterior distribution of alarms for each operator group. In the Supporting Information, detailed statistical information on alarms from time interval 5 to time interval 80 is given (the alarms from time interval 1 to time interval 4 are set as initial samples), including their quantiles, medians, means, standard deviations, skewness, and kurtosis. Figure 8 depicts the histograms of frequencies of the four operator groups in eight different time intervals (the first and the last time windows).

Figure 8. Histograms of alarm predictions for four operator groups with eight selected time intervals.

It shows that the alarms of the four groups with their confidence intervals vary dynamically. Sequential dependence significantly affects the corresponding alarms in a specific time window. The standard deviation of each group tends to be smaller in general as the historical samples increase; this corresponds to the property of the Bayesian inference technique. More specifically, alarms of group A are approximately Gaussian distributions, for their skewness and kurtosis are nearly 0. Distributions for operator group B have left tails in the beginning and then change into right tails, whereas operator groups C and D have longer left tails and peakedness shapes for the most part. These properties are due to sequential dependence modeling that makes the estimation performance more complex and diversified. It can also be found that there exist some outliers. For example, the alarms of group D in time interval 44, x44 = 158, lies outside its confidence interval with probability 90%, [125.36, 155.37]. This is mainly due to the strong randomness of the data. The fitting performance of the sequential data using D-vine copula may also affect its prediction accuracy.

4.4.3. ARMA Model and BP Neural Network Simulation. In the field of system identification, two types of prediction models are also applied, both the ARMA model with the least squares method and the BP neural network. In order to achieve a relatively objective conclusion, some strategies are used as follows. The generalized ARMA model with time series form can be written as2

x(t) = A(q)x(t) + C(q)e(t), \quad A(q) = a_1 q^{-1} + a_2 q^{-2} + \cdots + a_{n_a} q^{-n_a}, \quad C(q) = 1 + c_1 q^{-1} + c_2 q^{-2} + \cdots + c_{n_c} q^{-n_c}   (38)

where A(q) and C(q) are delay operators, na refers to the order of the process model with time series, and nc denotes the order of the noise model. Herein, it is suitable to set the order of the process model as 3 because of the four different operator groups, i.e., na = 3. The order of the noise model is chosen by minimizing the AIC index. As shown in Figure 9, the AIC index slows gradually as nc increases. When nc = 3, the slope of the curve becomes relatively small. Therefore, the structure parameters of the ARMA model are set as na = nc = 3. Model parameters are estimated using an LS algorithm. The previous 800 data are used for training, while the remaining 80 are for testing. The fitting model is optimized as

x(t) = 0.9981x(t-1) - x(t-2) + 0.998x(t-3) + e(t) - 0.3423e(t-1) + 0.9911e(t-2) - 0.3449e(t-3)   (39)

Figure 9. AIC indices with increasing orders of the noise model.

Because of nonlinearity, high parallelism, fault and noise tolerance, and learning and generalization capabilities, artificial neural networks have been widely used in control, optimization, classification, prediction, etc.41 Here, a dynamic model with a BP neural network simulation is designed to predict the abnormal events. A trilayer neural network with a 3 × 9 × 1 structure is used. It has been shown by a group of simulations with different network structures that this structure can obtain relatively accurate and stable results compared with those of other selections of neuron numbers in the hidden layer.

4.4.4. Comparison Study. The predictive values of the different methods are depicted in Figure 10. It can be found that all five results can track the actual values well.

Figure 10. Comparison of the performance of different prediction models. The numbers of alarms are depicted in a time-series form with 76 time intervals, from time interval 5 to time interval 80.

Herein, the best-fit curves for the different methods are plotted to give a better comparison, as shown in Figure 11. An LS algorithm is used to obtain the corresponding linear regression equations

x^i = \hat{\beta}_0 + \hat{\beta}_1 x^0 \quad (i = 1, 2, \cdots, 5)   (40)

where x^i denotes the simulation results with the ith method and x^0 represents the actual data; β̂0 and β̂1 are the regression coefficients.
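A small sketch of the regression in eq 40 is given below, assuming synthetic stand-ins for the actual and predicted alarm series; np.polyfit returns the slope and intercept of the least-squares best-fit line.

# Sketch of eq 40: regress a method's predictions on the actual series.
import numpy as np

rng = np.random.default_rng(2)
x0 = rng.normal(130.0, 20.0, size=76)           # "actual" alarm counts (synthetic)
x_pred = x0 + rng.normal(0.0, 8.0, size=76)     # one method's predictions (synthetic)

b1, b0 = np.polyfit(x0, x_pred, deg=1)          # slope, intercept of the best-fit line
print(b0, b1)   # a fit close to the 45-degree line has b0 near 0 and b1 near 1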


To measure the regression effect, the residual standard deviation (RSD) index is employed:

\mathrm{RSD}^i = \sqrt{\frac{\sum_{t=1}^{L}(x_t^i - \hat{\beta}_0 - \hat{\beta}_1 x_t^0)^2}{L - 2}} \quad (i = 1, 2, \cdots, 5)   (41)

where L = 76 denotes the number of time intervals (the first four time intervals are not included) and x_t^i denotes the predictive mean in time interval t using the ith model; x_t^0 is the actual value in time interval t. A best-fit curve that is closer to the 45° diagonal line with a smaller RSD value indicates better prediction performance.

Figure 11. Best-fit curves/regression curves for different prediction models. Actual data is plotted on the x-axis, and simulation results of each method on the y-axis: (a) noncoupling, (b) MC with D-vine, (c) MCMC with D-vine, (d) ARMA model, and (e) BP neural network.

Two indices, the mean relative error (MRE) and the root mean square error (RMSE), are defined to quantitatively compare the prediction performance of the different methods:

\mathrm{MRE}^i = \frac{1}{L}\sum_{t=1}^{L}\frac{|x_t^i - x_t^0|}{x_t^0}   (42)

\mathrm{RMSE}^i = \sqrt{\frac{1}{L}\sum_{t=1}^{L}(x_t^i - x_t^0)^2}   (43)

The prediction results are listed in Table 1. It can be found that the MRE and RMSE of the D-vine model using the MCMC method are 0.0650 and 11.40, respectively, far better than those of the noncoupling model (0.1236 and 20.86), indicating that the sequential dependence using D-vine copula modeling exhibits a remarkable effect.

Table 1. Comparison of Prediction Performance for Different Methods; MRE and RMSE Indices Are Calculated

model type   noncoupling   MC+D-vine   MCMC+D-vine   ARMA     BP
MRE          0.1236        0.0684      0.0650        0.0724   0.0701
RMSE         20.86         12.06       11.40         13.45    12.72

The MCMC method obtains an estimation that is more accurate than that of the MC method because the convergence rate of the MC method is related to the approximated sampling distribution. Usually, in Bayesian modeling, the variance is rather large in the beginning, resulting in more randomness when sampling, and the results may be difficult to converge in finite iterations. However, the MCMC method using the transition probability guarantees that we can choose an efficient Markov chain with some visual indices, e.g., the cumulative average. Moreover, the BP neural network model is better than ARMA and slightly worse than the Bayesian coupling model. Although the best-fit curve of BP is a little closer to the 45° diagonal line than that of the MCMC with D-vine method, all three indices (RSD, MRE, and RMSE) of the MCMC with D-vine method are significantly smaller. Note that the empirical data have properties of linearity (a special form of nonlinearity) and uncertainty; ARMA and BP consider only its nonlinear part and may mistakenly handle the stochastic part as the linear/nonlinear factors to some extent, as shown in Figure 9. Therefore, the seemingly good prediction performance of ARMA and BP reflects some random factors. Once the training data are changed, these methods cannot exhibit good, stable results on the same level. By contrast, D-vine copula modeling has a great ability to describe this coupling or nonlinear form. Posterior distribution using Bayesian inference supplies confidence intervals, which can reflect these uncertainties well. Therefore, from the point of its prediction performance and the corresponding mechanism, the sequential dependence model proposed in this paper is more accurate and practical.

Overall, it is shown that the model achieves a better prediction performance compared with that of the ARMA model and BP neural network. Most importantly, some corresponding statistics of the posterior distribution have been obtained by the MCMC method. It is concluded that the Bayesian model with D-vine copula can achieve a stable prediction performance by considering both nonlinearity and randomness.
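The three comparison indices can be computed directly from a predicted and an actual series, as in the hedged Python sketch below (eqs 41-43 with the square roots written out); the arrays are synthetic stand-ins for the 76 compared time intervals.

# Sketch of the comparison indices in eqs 41-43.
import numpy as np

def rsd(pred, actual):
    b1, b0 = np.polyfit(actual, pred, deg=1)             # eq 40 regression
    resid = pred - (b0 + b1 * actual)
    return np.sqrt(np.sum(resid**2) / (len(pred) - 2))   # eq 41

def mre(pred, actual):
    return np.mean(np.abs(pred - actual) / actual)       # eq 42

def rmse(pred, actual):
    return np.sqrt(np.mean((pred - actual) ** 2))        # eq 43

rng = np.random.default_rng(3)
actual = rng.normal(130.0, 20.0, size=76)                # synthetic actual series
pred = actual + rng.normal(0.0, 10.0, size=76)           # synthetic predictions
print(rsd(pred, actual), mre(pred, actual), rmse(pred, actual))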

5. CONCLUSION

The concept of sequential dependence penetrates into many fields of application, such as finance, economics, industry, etc. In this paper, we propose a novel Bayesian model using D-vine copula to predict generalized sequential data exhibiting

sequential dependence. The characteristics of the statistical model can be summarized as follows:

1. The D-vine model is implemented in a specific time window.
2. Frequencies of abnormal events are estimated with confidence intervals rather than point estimations, and the Bayesian estimation has been proven to be the minimum variance estimation.
3. When compared with the ARMA model and BP neural network, the statistical model requires less information (fitted-copula parameters) of the sample data (training data), and the corresponding correlation coefficients are not that sensitive to the prediction performance.



ASSOCIATED CONTENT

Supporting Information

Table S1, 800 historical data of alarms for four operator groups; Table S2, 80 testing data of alarms for four operator groups; Table S3, means and other statistics of simulation results using different methods. This material is available free of charge via the Internet at http://pubs.acs.org.



AUTHOR INFORMATION

Corresponding Author

*Tel.: +86-21-64253820. E-mail: [email protected].

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS

The authors appreciate the National Natural Science Foundation of China (Project 21176072) and the Fundamental Research Funds for the Central Universities for their financial support.



REFERENCES

(1) Shumway, R. H.; Stoffer, D. S. Time Series Analysis and Its Applications: With R Examples; Springer: New York, 2011.
(2) Box, G. E. P.; Pierce, D. A. Distribution of residual autocorrelations in autoregressive integrated moving average models. J. Am. Stat. Assoc. 1970, 72, 397−402.
(3) Hansen, P. R.; Lunde, A. A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)? J. Appl. Econometrics 2005, 20, 873−889.
(4) Lewis, P. A. W.; Ray, B. K. Modeling Long-Range Dependence, Nonlinearity, and Periodic Phenomena in Sea Surface Temperatures Using TSMARS. J. Am. Stat. Assoc. 1997, 92 (439), 881−893.
(5) Schirmacher, D.; Schirmacher, E. Multivariate Dependence Modeling Using Pair-Copulas. Presented at the Enterprise Risk Management Symposium, Chicago, IL, April 14−16, 2008; pp 1−52.
(6) Bier, V. M.; Yi, W. A Bayesian method for analyzing dependencies in precursor data. Int. J. Forecasting 1995, 11, 25−41.
(7) Yi, W.; Bier, V. M. An Application of Copulas to Accident Precursor Analysis. Manage. Sci. 1998, 44 (12), 257−270.
(8) Meel, A.; Seider, W. D. Plant-specific dynamic failure assessment using Bayesian theory. Chem. Eng. Sci. 2006, 61, 7036−7056.
(9) Pariyani, A.; Seider, W. D.; Oktem, U. G.; Soroush, M. Dynamic Risk Analysis Using Alarm Databases to Improve Process Safety and Product Quality: Part I-Data Compaction. AIChE J. 2012, 58 (3), 812−825.
(10) Pariyani, A.; Seider, W. D.; Oktem, U. G.; Soroush, M. Dynamic Risk Analysis Using Alarm Databases to Improve Process Safety and Product Quality: Part II-Bayesian Analysis. AIChE J. 2012, 58 (3), 826−841.
(11) Ahooyi, T. M.; Arbogast, J. E.; Oktem, U. G.; Seider, W. D.; Soroush, M. Estimation of Complete Discrete Multivariate Probability Distributions from Scarce Data with Application to Risk Assessment and Fault Detection. Ind. Eng. Chem. Res. 2014, 53 (18), 7538−7547.
(12) Ahooyi, T. M.; Soroush, M.; Arbogast, J. E.; Seider, W. D.; Oktem, U. G. Maximum-Likelihood Maximum-Entropy Constrained Probability Density Function Estimation for Prediction of Rare Events. AIChE J. 2014, 60 (3), 1013−1026.
(13) Berg, D.; Aas, K. Models for construction of multivariate dependence: A comparison study. Eur. J. Finance 2009, 15, 675−701.
(14) Brechmann, E. C.; Schepsmeier, U. Modeling Dependence with C- and D-Vine Copulas: The R Package CDVine. J. Stat. Software 2013, 52 (3), 1−27.
(15) Czado, C. Pair-Copula Construction of Multivariate Copulas. In Proceedings of the Statistical Workshop, Warsaw, 2009; pp 93−109.
(16) Joe, H.; Li, H.; Nikoloulopoulos, A. Tail dependence and vine copulas. J. Multivar. Anal. 2010, 101, 252−270.
(17) Stewart, W. J. Probability, Markov Chains, Queues, and Simulation: The Mathematical Basis of Performance Modeling; Princeton University Press: Princeton, NJ, 2009.
(18) Gelman, A.; Carlin, J. B.; et al. Bayesian Data Analysis, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, 2003.
(19) Sklar, A. Fonctions de répartition à n dimensions et leurs marges. Publications de l'Institut de Statistique de l'Université de Paris 1959, 8, 229−231.
(20) Nelsen, R. B. An Introduction to Copulas, 2nd ed.; Springer: Berlin, 2006.
(21) Savu, C.; Trede, M. Hierarchical Archimedean Copulas. In Proceedings of the International Conference on High Frequency Finance; Konstanz, Germany, 2006.
(22) Kurowicka, D. Introduction: Dependence Modeling. In Dependence Modeling: Vine Copula Handbook; World Scientific Publishing: Singapore, 2011.
(23) Joe, H. Families of m-Variate Distributions with Given Margins and m(m−1)/2 Bivariate Dependence Parameters. In Distributions with Fixed Marginals and Related Topics; Ruschendorf, L., Schweizer, B., Taylor, M. D., Eds.; Institute of Mathematical Statistics: Hayward, CA, 1996; pp 120−141.
(24) Bedford, T.; Cooke, R. M. Probability Density Decomposition for Conditionally Dependent Random Variables Modeled by Vines. Ann. Math. Artif. Intell. 2001, 32, 245−268.
(25) Bedford, T.; Cooke, R. M. Vines - A New Graphical Model for Dependent Random Variables. Ann. Stat. 2002, 30, 1031−1068.
(26) Kurowicka, D.; Cooke, R. M. Uncertainty Analysis with High Dimensional Dependence Modeling; John Wiley & Sons: Chichester, U.K., 2006.
(27) Aas, K.; Czado, C.; Frigessi, A.; Bakken, H. Pair-Copula Construction of Multiple Dependence. Insurance Math. Econ. 2009, 44 (2), 182−198.
(28) Smith, M.; Min, A.; Czado, C.; Almeida, C. Modelling Longitudinal Data Using a Pair-Copula Decomposition of Serial Dependence. J. Am. Stat. Assoc. 2010, 105 (492), 1467−1479.
(29) Schepsmeier, U.; Brechmann, E. C. Package CDVine, R package version, 2013. http://CRAN.R-project.org/package=CDVine.
(30) Chen, W.; Bakshi, B. R.; Goel, P. K.; Ungarala, S. Bayesian Estimation via Sequential Monte Carlo Sampling: Unconstrained Nonlinear Dynamic Systems. Ind. Eng. Chem. Res. 2004, 43 (14), 4012−4025.
(31) Lang, L.; Chen, W.; Bakshi, B. R.; Goel, P. K.; Ungarala, S. Bayesian estimation via sequential Monte Carlo sampling-Constrained dynamic systems. Automatica 2007, 43, 1615−1622.
(32) Ungarala, S.; Chen, Z.; Li, K. Bayesian State Estimation of Nonlinear Systems Using Approximate Aggregate Markov Chains. Ind. Eng. Chem. Res. 2006, 45 (12), 4208−4221.
(33) Laberge, J. C.; Bullemer, P.; Tolsma, M.; Reising, D. V. C. Addressing alarm flood situations in the process industries through alarm summary display design and alarm response strategy. Int. J. Ind. Ergonomics 2014, 44, 395−406.
(34) Adnan, N. A.; Izadi, I.; Chen, T. On expected detection delays for alarm systems with deadbands and delay-timers. J. Process Control 2011, 21, 1318−1331.
(35) Engineering Equipment and Materials Users Association. Alarm Systems: A Guide to Design, Management and Procurement; Engineering Equipment and Materials Users Association: London, 1999.
(36) Kleindorfer, P.; Oktem, U. G.; Pariyani, A.; Seider, W. D. Assessment of catastrophe risk and potential losses in industry. Comput. Chem. Eng. 2012, 47, 85−96.
(37) Pariyani, A.; Seider, W. D.; Oktem, U. G.; Soroush, M. Incidents Investigation and Dynamic Analysis of Large Alarm Databases in Chemical Plants: A Fluidized-Catalytic-Cracking Unit Case Study. Ind. Eng. Chem. Res. 2010, 49 (17), 8062−8079.
(38) Akaike, H. Information theory and an extension of the maximum likelihood principle. In Proceedings of the Second International Symposium on Information Theory; Akademiai Kiado: Budapest, 1973; pp 267−281.
(39) Czado, C.; Schepsmeier, U.; Min, A. Maximum Likelihood Estimation of Mixed C-Vines with Application to Exchange Rates. Stat. Modell. 2012, 12 (3), 229−255.
(40) Genest, C.; Favre, A. Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng. 2007, 12, 347−368.
(41) Basheer, I. A.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43, 3−31.
