Hierarchical Multiblock T-PLS Based Operating Performance

Article Cite This: Ind. Eng. Chem. Res. 2018, 57, 14617−14627

pubs.acs.org/IECR

Hierarchical Multiblock T‑PLS Based Operating Performance Assessment for Plant-Wide Processes

Yan Liu,*,†,‡,§ Fuli Wang,†,‡ Furong Gao,§ and Haonan Cui∥

† College of Information Science & Engineering, Northeastern University, Shenyang, Liaoning 110819, China
‡ State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang, Liaoning 110819, China
§ Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
∥ Zhongxing Telecommunication Equipment Corporation, Shenzhen, Guangdong 518000, China




ABSTRACT: With the pursuit of profit, the operating performance assessment of industrial processes has gradually attracted attention in modern industrial production. However, owing to the complexity of plant-wide processes, many existing operating performance assessment techniques yield results that are hard to interpret, and identifying the cause of nonoptimal operating performance is also difficult. In this study, a new operating performance assessment method is proposed for plant-wide processes based on hierarchical multiblock total projection to latent structures (T-PLS). The proposed approach divides the process variables into several conceptually meaningful blocks and incorporates the block information into a higher level for operating performance assessment of plant-wide processes. With the new method, the assessment task is greatly reduced and the interpretability of the assessment models is greatly improved. Once the process operating performance is nonoptimal, the responsible block is first located from the block assessment results, and then the variable contributions are calculated in the corresponding block so as to narrow the search and improve the accuracy of the cause identification result. A case study of the gold hydrometallurgy process evaluates the feasibility and efficiency of the proposed method.

Received: June 16, 2018. Revised: August 28, 2018. Accepted: October 9, 2018. Published: October 9, 2018. DOI: 10.1021/acs.iecr.8b02685. © 2018 American Chemical Society.

1. INTRODUCTION

Generally, the process operating performance is maintained at a good operating level in the early stage of production. However, due to process disturbances, parameter drift, and operation errors, the operating performance may deteriorate with time from the optimal level, which cancels the benefits of technology designs for operation optimization and results in degraded operating behavior. Hence, it is necessary to propose an effective assessment strategy of operating performance for industrial processes.

Over the past decades, many process monitoring methods have been developed,1−6 among which principal component analysis (PCA) and partial least-squares (PLS) are the most widely used. Thereafter, several extensions that handle various factors, including process dynamics,7−9 nonlinearity,10,11 and non-Gaussianity,12−15 have also become available. However, since the main task of process monitoring is to maintain production safety, it can no longer satisfy the quest of enterprises for profit. For most enterprises, the final goal is profit maximization, so it is important to develop an effective strategy to ensure the optimal operating performance of the production process. As a starting point, the operating performance assessment of industrial processes came into being.16−18 Under normal and safe operating conditions, operating performance assessment measures the gap between the current operating condition and the optimal operating level.16 Subsequently, managers and operators can have a good grasp of the operating level through operating performance assessment, and then propose suggestions on operating adjustment and performance improvement. Despite a rich body of literature on industrial process monitoring, research on operating performance assessment of industrial processes is still in its infancy. In our previous work, some methods have been developed on this issue.19−21

It is worth noting that many modern industrial processes are designed to be very complex. On the one hand, this is due to the high production difficulty of the products. For example, the annealing furnace is an important part of a continuous annealing line.22 In order to produce products that meet special requirements on hardness and ductility, the processes in the annealing furnace include heating, soaking, cooling, overaging, and final cooling, and the steel strip has to travel through all these furnace units driven by a series of tension rollers. On the other hand, the requirement for high precision of the products also increases the complexity of the manufacturing process. One example is the gold hydrometallurgy process.23−28 To extract as much gold as possible from low-grade ore, the process is usually designed to include two cyanide leaching units, two washing units, and one cementation unit. These complex industrial processes usually contain a large number of process variables coming from many different operating units, and they are known as plant-wide processes.

As the complexity of plant-wide processes increases, the difficulty of operating performance assessment also increases significantly, which is reflected in two aspects: (i) the complexity of process analysis increases; (ii) the interpretability of the assessment model and of the nonoptimal cause identification result decreases. Thus, the operating performance assessment of plant-wide processes is a considerable challenge.

In the field of process monitoring, many hierarchical and multiblock strategies have been proposed to overcome the complexity of plant-wide processes for fault detection.29−35 The basic idea of the hierarchical and multiblock strategies is to divide the process variables into several meaningful blocks so as to obtain local information and global information simultaneously from the process data. Since the number of process variables in each individual block is clearly reduced, the complex large-scale problem can be solved by dealing with several simple problems. However, to the best of our knowledge, few studies have been reported so far that deal with the operating performance assessment of plant-wide processes in a decentralized manner. Thus, the pioneering work in process monitoring provides an abundant theoretical basis for the following work. In this study, a novel operating performance assessment method is proposed for plant-wide processes based on hierarchical multiblock total projection to latent structures (T-PLS).
The core idea is to break a plant-wide process into several conceptually meaningful blocks according to the physical processing units, and then to build an assessment strategy by incorporating the block information into a higher level for the operating performance assessment of the plant-wide process. In this way, both the local information within each block and the cross-information among multiple blocks can be obtained. Also, the effects of the information at different levels on the operating performance assessment of plant-wide processes can be characterized. In theory, depending on the process complexity, a plant-wide process can be divided horizontally into many blocks and vertically into many layers. Each block corresponds to a physical processing unit, and the process variables within a block are closely correlated with each other, while each layer may include several blocks, and the cross-information among different blocks is described at this level. To facilitate description and understanding, a plant-wide process with two layers and several blocks is taken as an example to illustrate the proposed method in this study; there is no essential difference when the method is applied to processes with more layers and blocks. The two layers are named the unit layer and the plant-wide layer, from the bottom up. Figure 1 is the schematic diagram of the hierarchical multiblock structure. In the unit layer, the assessment models of the different units are built based on the performance-related information extracted by T-PLS.36 They reveal the impacts of the unit behaviors on the local operating performance and lay the foundation for the plant-wide assessment model. The unit assessment models also play an important role in nonoptimal cause identification. Furthermore, the plant-wide assessment model is established in the plant-wide layer on the basis of the unit assessment models.

Figure 1. Schematic diagram of the hierarchical multiblock structure of a plant-wide process.

Both the main variation information of the plant-wide process and the cross-information among different units can be extracted by T-PLS. The plant-wide assessment model characterizes the combined behaviors of all units, taking into account the correlations and interactions among them, and shows their effect on the operating performance of the plant-wide process. It serves operating performance assessment from the global perspective. The complexity of the process analysis is greatly reduced by decomposing the plant-wide process meaningfully in a hierarchical multiblock manner; meanwhile, the interpretability of the assessment models in the different layers is significantly improved. In online assessment, the assessment strategy is developed based on the similarities between the performance-related information of the online data and that of the assessment models in both the unit layer and the plant-wide layer. A greater similarity means a better running state. When nonoptimal operating performance appears, the responsible unit can be quickly located based on the assessment results in the unit layer. Then the variable contributions are calculated in the responsible unit for nonoptimal cause identification. Compared with analyzing the nonoptimal cause directly at the plant-wide level, this approach narrows the range of cause tracing and excludes the interference of unrelated units, so as to improve the accuracy of the cause identification result. The schematic diagram of the proposed method is shown in Figure 2. The proposed assessment method can be used in cases where the number of process variables is large and additional information is available for blocking the variables into conceptually meaningful blocks. The contributions of the present paper are summarized as follows: (i) process analysis becomes easier after breaking down a plant-wide process into

Figure 2. Schematic diagram of the proposed method.


multiple layers and multiple blocks; (ii) the interpretability of the assessment models is improved and their meanings are made clearer; (iii) the scope of nonoptimal cause identification is narrowed down.

The organization of this paper is as follows. Section 2 presents a brief description of the T-PLS algorithm. Section 3 describes the hierarchical multiblock T-PLS based assessment method proposed in this study. An application study of the gold hydrometallurgy process is presented in section 4. Finally, the paper ends with some conclusions and acknowledgments.

2. REVIEW OF TOTAL PROJECTION TO LATENT STRUCTURES

T-PLS is an improved version of partial least-squares (PLS), in which the output-unrelated part and the output-related part are further decomposed in the PLS systematic subspace, and large process variations are separated from noise in the PLS residual subspace. In this way, more accurate information can be provided for those who care about certain aspects of the whole information. Assume that $X = [x_1, x_2, \cdots, x_N]^T \in \mathbb{R}^{N \times J}$ and $y = [y_1, y_2, \cdots, y_N]^T \in \mathbb{R}^{N \times 1}$ are the process data and output data, respectively, where $N$ is the number of samples and $J$ is the number of process variables. Based on the nonlinear iterative partial least-squares (NIPALS) algorithm,37 the normalized $(X, y)$ is projected into a low-dimensional space and formulated as follows:

$$
\begin{cases}
X = TP^T + E \\
y = Tq + f
\end{cases}
\tag{1}
$$

where $T \in \mathbb{R}^{N \times A}$ and $P \in \mathbb{R}^{J \times A}$ are the score matrix and loading matrix of $X$, respectively; $A$ is the number of PLS components and can be determined by cross-validation;38 $q \in \mathbb{R}^{A \times 1}$ is the loading vector of $y$; $E$ and $f$ are the residual matrix of $X$ and the residual vector of $y$, respectively. With the original weight matrix $R = W(P^T W)^{-1} \in \mathbb{R}^{J \times A}$,39 the score matrix $T$ can be calculated from $X$ directly as $T = XR$. $W$ is the weight matrix of $X$ and is obtained from the PLS decomposition.

The T-PLS algorithm is implemented on eq 1 by first setting $t_y = Tq \in \mathbb{R}^{N \times 1}$. In the PLS systematic subspace, the output-unrelated information $\hat{X}_o$ is further separated from the output-related information $\hat{X}_y$, where $\hat{X}_o = \hat{X} - \hat{X}_y$, $\hat{X} = TP^T$, $\hat{X}_y = t_y p_y^T$, and $p_y = \hat{X}^T t_y / (t_y^T t_y)$. Running PCA on $\hat{X}_o$ with $A - 1$ components gives $\hat{X}_o = T_o P_o^T$. Furthermore, in the PLS residual subspace, the large process variation information $\hat{X}_r$ is separated from the noise $E_r$ by PCA decomposition of $E$ with $A_r$ components, where $A_r < J - A$ and $\hat{X}_r = T_r P_r^T$. Thus, based on the T-PLS algorithm, $X$ and $y$ are eventually modeled as follows:

$$
\begin{cases}
X = t_y p_y^T + T_o P_o^T + T_r P_r^T + E_r \\
y = t_y + f
\end{cases}
\tag{2}
$$

where $t_y \in \mathbb{R}^{N \times 1}$, $T_o \in \mathbb{R}^{N \times (A-1)}$, and $T_r \in \mathbb{R}^{N \times A_r}$ are the scores directly correlated to $y$, orthogonal to $y$, and the main part of $E$, respectively; $p_y \in \mathbb{R}^{J \times 1}$, $P_o \in \mathbb{R}^{J \times (A-1)}$, and $P_r \in \mathbb{R}^{J \times A_r}$ are the loadings corresponding to $t_y$, $T_o$, and $T_r$, respectively; $E_r \in \mathbb{R}^{N \times J}$ and $f \in \mathbb{R}^{N \times 1}$ are the residual parts of $X$ and $y$, respectively, and they represent noise. For a new sample $x_{\mathrm{new}} \in \mathbb{R}^{J \times 1}$, the scores and residual part are calculated as below:

$$
\begin{cases}
t_{y,\mathrm{new}} = q^T R^T x_{\mathrm{new}} = g_y x_{\mathrm{new}} \in \mathbb{R} \\
t_{o,\mathrm{new}} = P_o^T (P - p_y q^T) R^T x_{\mathrm{new}} = G_o x_{\mathrm{new}} \in \mathbb{R}^{(A-1) \times 1} \\
t_{r,\mathrm{new}} = P_r^T (I - P R^T) x_{\mathrm{new}} = G_r x_{\mathrm{new}} \in \mathbb{R}^{A_r \times 1} \\
\breve{x}_{r,\mathrm{new}} = (I - P_r P_r^T)(I - P R^T) x_{\mathrm{new}} = \breve{G} x_{\mathrm{new}} \in \mathbb{R}^{J \times 1}
\end{cases}
\tag{3}
$$

with $g_y = q^T R^T$, $G_o = P_o^T (P - p_y q^T) R^T$, $G_r = P_r^T (I - P R^T)$, and $\breve{G} = (I - P_r P_r^T)(I - P R^T)$.

It is well-known that the process operating performance has a close relationship with the comprehensive economic index, such as the costs, profits, total revenue, product quality, or a weighted integration of several important production indices. If the comprehensive economic index approaches or reaches the optimal level in history, one can be sure that the process is operating in the optimal state. Additionally, the process variation information closely related to the comprehensive economic index is contained in the process data. This is precisely the performance-related information, and it is applicable to operating performance assessment. As known from the T-PLS decomposition, the performance-related information refers to the score $t_y$, so $t_y$ and $t_{y,\mathrm{new}}$ are used from the perspective of operating performance assessment, and $g_y$ is essentially the parameter of the assessment model.

3. OPERATING PERFORMANCE ASSESSMENT OF PLANT-WIDE PROCESSES BASED ON HIERARCHICAL MULTIBLOCK T-PLS

In this section, the detailed methodology of the proposed hierarchical multiblock T-PLS based assessment method is described. For a plant-wide process that can be decomposed into the hierarchical multiblock structure shown in Figure 1, the proposed method builds the assessment models in both the unit layer and the plant-wide layer by T-PLS. The unit assessment models portray the impacts of the local behaviors on the process operating performance, lay the foundation for the plant-wide assessment model, and play an important role in locating the position of nonoptimality. Thereafter, the plant-wide assessment model is established at the high level for operating performance assessment from a global point of view, where the cross-information between units is extracted and the dimensionality of the process variables is further reduced. In online assessment, the assessment indices are constructed based on the similarities between the performance-related information of the online data and that of the assessment models in both the unit layer and the plant-wide layer, and then a decision can be made on the operating performance of the plant-wide process according to the value of the assessment index. Once the process operating performance is nonoptimal, the responsible unit can be quickly located through the assessment results in the unit layer, and then the variable contributions are calculated in the responsible unit for nonoptimal cause identification.

3.1. Establishment of Assessment Model. According to process knowledge and expert experience, the process data under optimal operating performance are collected from the historical data and used to establish the assessment models. In plant-wide processes, the number of process variables $J$ is always very large. To improve the interpretability of the assessment model and reduce the complexity of process analysis, the process variables are divided into several blocks based

on prior knowledge. As shown in Figure 1, it is assumed that there are a total of $Q$ units in the plant-wide process. Then the process data of the $q$th unit are denoted as $X^q = [x_1^q, x_2^q, \cdots, x_N^q]^T \in \mathbb{R}^{N \times J_q}$, $q = 1, 2, \cdots, Q$, where $J_q$ is the number of process variables in the $q$th unit and satisfies $\sum_{q=1}^{Q} J_q = J$. So the modeling data matrix $X$ can be expressed in block matrix format, that is, $X = [X^1, X^2, \cdots, X^Q]$. Center $X^q$ and $y$ to zero mean and scale them to unit variance before modeling; the assessment model of the $q$th unit is then formulated as follows:

$$
\begin{cases}
X^q = t_y^q p_y^{qT} + T_o^q P_o^{qT} + T_r^q P_r^{qT} + E_r^q \\
y = t_y^q + f^q
\end{cases}
\tag{4}
$$

where $t_y^q$, $T_o^q$, and $T_r^q$ are the scores directly correlated to $y$, orthogonal to $y$, and the main part of the PLS residual of $X^q$, respectively; $p_y^q$, $P_o^q$, and $P_r^q$ are the loadings of $t_y^q$, $T_o^q$, and $T_r^q$, correspondingly; $E_r^q$ and $f^q$ are the residual parts of $X^q$ and $y$, respectively. For sample $x_n^q$, $n = 1, 2, \cdots, N$, its score can be calculated as below:

$$
t_{y,n}^q = q_y^{qT} R^{qT} x_n^q = g_y^q x_n^q
\tag{5}
$$

where $g_y^q = q_y^{qT} R^{qT}$ is the parameter of the unit assessment model; $q_y^q$ is the loading vector of $y$, and $R^q$ is the original weight matrix obtained from PLS. After all $Q$ unit assessment models are built, the performance-related information extracted from the different units is arranged as follows:

$$
\tilde{X} = [t_y^1, t_y^2, \cdots, t_y^Q] = [\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_N]^T \in \mathbb{R}^{N \times Q}
\tag{6}
$$

where $t_y^q = [t_{y,1}^q, t_{y,2}^q, \cdots, t_{y,N}^q]^T$ and $\tilde{x}_n = [t_{y,n}^1, t_{y,n}^2, \cdots, t_{y,n}^Q]^T$. Although the correlations within each unit are well extracted, the correlations between units are not. Therefore, we build the assessment model on a higher layer to extract the cross-information between units. According to the newly defined process data $\tilde{X}$, the plant-wide assessment model is established as follows:

$$
\begin{cases}
\tilde{X} = \tilde{t}_y \tilde{p}_y^T + \tilde{T}_o \tilde{P}_o^T + \tilde{T}_r \tilde{P}_r^T + \tilde{E}_r \\
y = \tilde{t}_y + \tilde{f}
\end{cases}
\tag{7}
$$

The score of sample $\tilde{x}_n$ is calculated by

$$
\tilde{t}_{y,n} = \tilde{q}_y^T \tilde{R}^T \tilde{x}_n = \tilde{g}_y \tilde{x}_n
\tag{8}
$$

where $\tilde{g}_y = \tilde{q}_y^T \tilde{R}^T$ is the parameter of the plant-wide assessment model, and $\tilde{q}_y$ and $\tilde{R}$ are the loading vector of $y$ and the original weight matrix obtained from PLS, respectively. Figure 3 illustrates the proposed hierarchical multiblock assessment modeling strategy.

Figure 3. Hierarchical multiblock assessment modeling strategy.

It is worth noting that the block division plays an important role in the hierarchical multiblock assessment method. Hence, it is important to select the variable blocking intelligently based on prior knowledge, so that the most important correlations are extracted within the divided blocks. When the established models are used for process operating performance assessment, this makes the process analysis easier and the meaning of each unit assessment model clear. Moreover, with the development of the plant-wide assessment model, the cross-information between units can be efficiently extracted. The variable dimension may also be reduced at the high level, and more redundant information can be removed from the process data, which also enhances the performance of the assessment method.

3.2. Online Operating Performance Assessment. The basic idea of online assessment is to compare the performance-related information of the online data with that of the assessment models. If the process operating performance is optimal, the performance-related information contained in the online data must be consistent with that covered by the assessment models. Thus, one can use the similarity of the performance-related information to evaluate the process operating performance in real time. A new sample $x_{\mathrm{new}}$ is first divided into several blocks in the same manner as in offline modeling and denoted as follows:

$$
\begin{aligned}
x_{\mathrm{new}} &= [x_{\mathrm{new},1}^1, x_{\mathrm{new},2}^1, \cdots, x_{\mathrm{new},J_1}^1 \,|\, x_{\mathrm{new},1}^2, x_{\mathrm{new},2}^2, \cdots, x_{\mathrm{new},J_2}^2 \,|\, \cdots \,|\, x_{\mathrm{new},1}^Q, x_{\mathrm{new},2}^Q, \cdots, x_{\mathrm{new},J_Q}^Q]^T \\
&= [x_{\mathrm{new}}^1, x_{\mathrm{new}}^2, \cdots, x_{\mathrm{new}}^Q]^T
\end{aligned}
\tag{9}
$$

where $x_{\mathrm{new}}^q = [x_{\mathrm{new},1}^q, x_{\mathrm{new},2}^q, \cdots, x_{\mathrm{new},J_q}^q]$ is the process data belonging to the $q$th unit. Normalize $x_{\mathrm{new}}^q$ with the same mean and standard deviation as the corresponding unit assessment model. Then the score of $x_{\mathrm{new}}^q$ is calculated as follows:

$$
t_{y,\mathrm{new}}^q = q_y^{qT} R^{qT} x_{\mathrm{new}}^{qT} = g_y^q x_{\mathrm{new}}^{qT}
\tag{10}
$$

Thereafter, the $Q$ scores are arranged as $\tilde{x}_{\mathrm{new}} = [t_{y,\mathrm{new}}^1, t_{y,\mathrm{new}}^2, \cdots, t_{y,\mathrm{new}}^Q]^T$. In order to extract the cross-information between units, the performance-related information in the plant-wide layer is extracted from $\tilde{x}_{\mathrm{new}}$ and is formulated as below:

$$
\tilde{t}_{y,\mathrm{new}} = \tilde{q}_y^T \tilde{R}^T \tilde{x}_{\mathrm{new}} = \tilde{g}_y \tilde{x}_{\mathrm{new}}
\tag{11}
$$

To measure the similarities between the performance-related information of the online sample and that of the different assessment models, the Euclidean distances between scores are calculated correspondingly:

$$
d^q = (t_{y,\mathrm{new}}^q - \bar{t}_y^q)^2
\tag{12}
$$

$$
d = (\tilde{t}_{y,\mathrm{new}} - \bar{\tilde{t}}_y)^2
\tag{13}
$$

where $\bar{t}_y^q = \sum_{n=1}^{N} t_{y,n}^q / N$ and $\bar{\tilde{t}}_y = \sum_{n=1}^{N} \tilde{t}_{y,n} / N$ are the means of the scores in the unit layer and the plant-wide layer, respectively. According to the property of T-PLS, $\bar{t}_y^q$ and $\bar{\tilde{t}}_y$ are all zero. Thus, eqs 12 and 13 can be rewritten as

$$
d^q = (t_{y,\mathrm{new}}^q)^2
\tag{14}
$$

$$
d = (\tilde{t}_{y,\mathrm{new}})^2
\tag{15}
$$
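As an illustration of eqs 5−15, the following Python sketch (not part of the original study) fits the unit-layer and plant-wide-layer score directions and evaluates the distances for a new sample. It computes only the performance-related direction of T-PLS, written here as the column vector $Rq$ so that $t_y = XRq$; the further splitting into $T_o$ and $T_r$ is omitted. The function names, the synthetic data, the block layout, and the numbers of components are all assumptions made for illustration, and the sketch also assumes that the unit-layer scores are passed to the plant-wide model without rescaling, which the text does not specify.

```python
import numpy as np

def pls1_nipals(X, y, n_comp):
    """PLS1 by NIPALS on already centred/scaled X (N x J) and y (N,)."""
    E, f = X.astype(float).copy(), y.astype(float).copy()
    J = E.shape[1]
    W, P, q = np.zeros((J, n_comp)), np.zeros((J, n_comp)), np.zeros(n_comp)
    for a in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)      # weight vector
        t = E @ w                   # score vector
        p = E.T @ t / (t @ t)       # loading of X
        q[a] = f @ t / (t @ t)      # loading of y
        E -= np.outer(t, p)         # deflate X
        f -= q[a] * t               # deflate y
        W[:, a], P[:, a] = w, p
    return W, P, q

def fit_direction(X, y, n_comp):
    """Return g (J,) with t_y = X @ g, where g = R q and R = W (P^T W)^{-1}."""
    W, P, q = pls1_nipals(X, y, n_comp)
    R = W @ np.linalg.inv(P.T @ W)
    return R @ q

def fit_hierarchical(X, y, blocks, n_comp_unit, n_comp_plant):
    """blocks: one list of column indices per unit (hypothetical layout)."""
    yz = (y - y.mean()) / y.std(ddof=1)
    units, scores = [], []
    for cols in blocks:
        mu = X[:, cols].mean(axis=0)
        sd = X[:, cols].std(axis=0, ddof=1)
        Xq = (X[:, cols] - mu) / sd
        g = fit_direction(Xq, yz, n_comp_unit)
        units.append((cols, mu, sd, g))
        scores.append(Xq @ g)                  # unit-layer scores, eq 5
    X_tilde = np.column_stack(scores)          # eq 6 (assumed: no rescaling)
    g_tilde = fit_direction(X_tilde, yz, n_comp_plant)   # eqs 7 and 8
    return units, g_tilde

def assess(x_new, units, g_tilde):
    """Return (d^q for every unit, d) following eqs 10, 11, 14 and 15."""
    t_units = np.array([((x_new[cols] - mu) / sd) @ g
                        for cols, mu, sd, g in units])
    t_plant = t_units @ g_tilde
    return t_units ** 2, t_plant ** 2

# Synthetic stand-in for "optimal" historical data: two units, three variables each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=200)
blocks = [[0, 1, 2], [3, 4, 5]]

units, g_tilde = fit_hierarchical(X, y, blocks, n_comp_unit=2, n_comp_plant=1)
d_units, d_plant = assess(X.mean(axis=0), units, g_tilde)   # sample at the training mean
```

In this sketch a sample located at the training mean has all scores equal to zero, so every distance vanishes, which mirrors the zero-mean property used to pass from eqs 12 and 13 to eqs 14 and 15.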


meaningful to normalize the raw variable contribution $\mathrm{contr}_j^{\mathrm{raw}}$ to ensure that all variables give the same contribution statistically in this condition. Thus, $\mathrm{contr}_j^{\mathrm{raw}}$ is normalized by

$\mathrm{contr}_j =$

For the convenience of online assessment, the assessment indices are defined based on $d^q$ and $d$ as follows:

$$
\gamma^q = e^{-\beta_1 d^q}
\tag{16}
$$

$$
\gamma = e^{-\beta_2 d}
\tag{17}
$$
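The behavior of the indices in eqs 16 and 17 can be sketched numerically as follows. This is a minimal illustration only: the function name `assessment_index`, the distance values, and the two tuning values are hypothetical and are not taken from the case study.

```python
import numpy as np

def assessment_index(d, beta):
    """gamma = exp(-beta * d); for d >= 0 and beta > 0 the value lies in (0, 1]."""
    return np.exp(-beta * np.asarray(d, dtype=float))

# Hypothetical squared score distances, from optimal (0) to clearly nonoptimal.
d = np.array([0.0, 0.5, 2.0, 8.0])

# Two hypothetical settings of the tuning parameter: a larger value makes the
# index fall off faster as the distance grows.
for beta in (0.1, 1.0):
    print(beta, np.round(assessment_index(d, beta), 3))
```

Evaluating the index on historical samples with known optimal and nonoptimal performance for a few candidate tuning values is one simple way to see how well a given value separates the two groups.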

Here, $\beta_1$ and $\beta_2$ are adjustable parameters defined by the user. Mathematically, $\gamma^q$ and $\gamma$ are exponential functions of $d^q$ and $d$, respectively, with the range (0,1]. Taking $\gamma^q$ as an example, if $\gamma^q$ is near 1, then $d^q$ is close to 0, that is, the performance-related information of the online data is very close to that of the unit assessment model. Thus, one can consider that the process operating performance is satisfactory in the $q$th unit. In contrast, if $\gamma^q$ is close to 0, the process operating performance is hardly satisfactory. The parameter $\beta_1$ determines the shape and gradient of the exponential function: the larger $\beta_1$ is, the steeper the function is and the faster $\gamma^q$ decreases. An appropriate $\beta_1$ should make $\gamma^q$ close to 1 for the optimal operating performance and close to 0 under the worst operating performance. In addition, it should ensure that the values of $\gamma^q$ at different sampling instants are separated as far as possible between optimal and nonoptimal operating performances. To do this, process data under different operating performances should be selected from the historical data, and the assessment indices are then calculated for each of them. Thereafter, the values of $\beta_1$ and $\beta_2$ can be determined based on expert experience. To make a strict distinction between optimality and nonoptimality, a threshold θ(0.5