
Multiresolution Soft Sensors (MR-SS): A New Class of Model Structures for Handling Multiresolution Data

Tiago J. Rato, Marco S. Reis*

CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra, Portugal


*Corresponding author: e-mail: [email protected], phone: +351 239 798 700, FAX: +351 239 798 703


Abstract

The key quality features of industrial processes are typically obtained offline, with a considerable delay and by resorting to expensive equipment. To avoid this experimental burden, soft sensors have been developed. However, current methodologies assume that all variables under analysis have the same level of granularity, while in reality they often present a multiresolution structure, with some variables containing instantaneous information about the process and others representing averages over hours, shifts, or production batches. Furthermore, this multiresolution structure is quite often confused with a multirate or multiscale problem, and therefore we clearly delineate their main differences. Based on this distinction, we propose a new class of model structures that explicitly incorporates multiresolution into industrial soft sensors. Several simulated examples demonstrate that the proposed approach leads to better prediction capabilities than its counterparts and is also robust to mismatches between the modelling assumptions and the actual multiresolution structure of the data.

Keywords: Soft sensors; Multiresolution data; Partial least squares; Dynamic partial least squares.


1 Introduction

Industrial processes are usually equipped with a large variety of sensors that track the process evolution over time. However, quality variables tend to be available less frequently due to the nature of the associated measurement systems, which usually involve complex and time-consuming experimental protocols performed by trained personnel 1-3. Given their importance, these variables are often the targets for process supervision, monitoring, control and optimization, for which more frequent estimates are highly desirable. Soft sensors are inferential models developed to achieve this goal, bringing other associated benefits, such as a reduction of the inspection overhead and improvements in the consistency of the final product quality and in process efficiency, among others 1, 4-7.

In the workflow for soft sensor development, two aspects should be considered: (i) the appropriateness of the model structure to accommodate the process-driven features (dynamics, non-linearity, system dimensionality, natural blocks of variables, etc.) and (ii) the structure of the data available to implement them (missing data, acquisition rates, variables' resolution, presence of outliers, delay structure, etc.). Both aspects have an influence on the nature of the inferential model to be proposed and tuned for acting as a reliable industrial soft sensor.

Regarding the first aspect, there are essentially two paths for developing a model with the appropriate process-driven characteristics: the model-based path and the data-driven (or empirical) path. Model-based approaches rely on first-principles models describing the fundamental physical and chemical phenomena taking place in the process. Even though this type of approach can be applied over a broad operation region, it relies on the existence of profound and detailed knowledge about the system, as well as on accurate estimates for all the parameters involved, something that may be difficult to achieve in many industrial contexts. On the other hand, the second class of approaches makes use of data-driven empirical methods taken from different domains of data science (statistics, machine learning, chemometrics, etc.) in order to build predictive models based on historical data. Examples of methodologies employed in this context include principal components regression 8, partial least squares 9-11, artificial neural networks 12, neuro-fuzzy systems 13, 14, classification and regression trees (including their bagging and boosting variants) 15, machine learning algorithms 16, Gaussian processes 17, 18 and support vector machines 19, 20. Application examples of these models include the use of partial least squares (PLS) and dynamic PLS to predict distillate and bottom compositions in a distillation column 21; a moving average PLS soft sensor for estimating the product's quality in a batch polymerization process 10; multi-layer perceptron neural networks for the estimation of cement fineness 22; or the use of dynamic artificial neural network models to estimate NOx and O2 emissions in industrial boilers 23. These examples clearly show the wide scope of application of soft sensors and their potential utility; more examples can be found in several of the review papers published on this topic 1, 5, 6.

The second aspect of soft sensor development, related to the structure of the data to be used in their implementation, has received far less attention from the scientific community. In this domain, published works address mostly the existence of outliers 24-27, missing data 28-30 and different sampling rates (multirate data) 2, 31-33. Regarding the last aspect, several approaches have been proposed. A simple solution is to downsample the more frequently observed variables and then build a model with the remaining low sampling rate data. By doing so, information on the highly sampled variables is lost. Furthermore, this type of model is only capable of predicting observations at the same low frequency rate that was adopted for building the model. As an alternative, numerical interpolation can be used to insert the missing low frequency observations. Likewise, lifting techniques can also be used to derive an online model that is able to provide estimates of the quality variable at higher sampling rates 2, 34, 35. However, even in this case several observations are discarded, which may deteriorate modeling performance, especially if the process has dynamic characteristics. To circumvent this limitation, approaches based on finite impulse response (FIR) models have been proposed to weight past observations before including them in the model 31, 36. In a similar manner, Shang, Huang, Suykens and Huang 4 introduced a regularization approach to DPLS in order to smooth out the coefficients over time, based on the idea that nearby observations should be closely related.

Even though these multirate approaches can cover a relevant number of applications, they may sometimes be misused. This happens when the data structure is wrongly assessed. One such case occurs when a multiresolution data structure is wrongly taken as a multirate data structure. In fact, both of them look exactly the same and are virtually indistinguishable from the standpoint of any unsupervised analysis of the data sets, as illustrated in Figure 1: both the multirate and multiresolution cases depicted in this figure present the same data collection pattern. However, in the multirate case the values represent the instantaneous measurements of variables with different sampling rates, whereas in the multiresolution case they contain information about the process with different levels of granularity (different resolutions); the different supports are indicated by the dotted lines and the value is recorded in the table at the end of each aggregation period. Therefore, background knowledge about the way the values are obtained is fundamental to correctly identify the underlying data structure.


Figure 1 Schematic illustration of two data sets with the same pattern of collected data (black dots represent observations for each variable), but with different data structures: a) multirate; b) multiresolution.

In summary, the definition of the data structure depends on the way the values in the table are obtained: aggregation rules, averaging time support, batch formation, sample preparation, etc. For example, let us consider the reading of concentrations in reactive systems. In these cases, a sample of the reactive mixture can be collected and accumulated over a pre-specified time interval and then sent to the laboratory for analysis. The returned value does not refer solely to the final period of the sampling procedure, but rather to its full duration, representing an overall (delocalized) measure of concentration over this period. If several variables are collected in this way, or averaged/aggregated with different time spans, as often happens in process DCSs, a data set with multiresolution characteristics is generated.

The current so-called "multiresolution" approaches also fail to accommodate this data structure. In fact, this nomenclature does not faithfully reflect the analysis being done and should be corrected to "multiscale" (instead of "multiresolution"). The reason lies in the fact that such methods apply a wavelet transformation in order to decompose single-resolution signals into several time-frequency scales 37-41. This leads to a set of detail coefficients, which represent the specific contribution of each scale, and approximation coefficients, which carry the coarsest approximation of the original signal (low frequency bands). As coefficients at different scales are analysed, these methodologies should be called "multiscale". A multiresolution approach, in contrast, analyses signals represented at different resolutions or granularity levels (i.e., only approximation coefficients at different scales are considered, and not the detail coefficients). This is the differentiating aspect between multiscale and multiresolution approaches. These terms were often used interchangeably in the past for the case of multiscale analysis, because the wavelet decomposition is also referred to as a multiresolution analysis, but this designation stems from the original application in image analysis, where the purpose was indeed to analyse images at different levels of granularity. In the context of the present article (and beyond), "multiscale" and "multiresolution" represent different types of methodologies, and we recommend following our proposed nomenclature. From the wavelet theory perspective, multiresolution data sets are composed only of approximation coefficients of the collected variables at different scales, and there is no information on the detail coefficients.

To further highlight the differences between multirate, multiscale and multiresolution approaches, Table 1 presents the main data characteristics that they account for during modelling. This includes the accommodation of different time supports and sampling rates, the modelling of dynamic dependencies, and the modelling of multiresolution dependencies.

Table 1 Differentiating aspects of multiresolution, multirate and multiscale methodologies.

Methodology     | Handles data with different resolutions (granularity) | Handles data with different sampling rates | Makes use of both wavelet detail and approximation coefficients
Multirate       | No                                                     | Yes                                        | No
Multiscale      | No                                                     | No                                         | Yes
Multiresolution | Yes                                                    | Yes                                        | No

In multiresolution scenarios, the relationships between variables at different resolutions can be adequately represented using a tree structure connecting their measurements with different granularities (see Figure 2 and references 38, 39, 42). However, this structure and other characteristics of multiresolution data are not currently addressed during the construction of soft sensors. Dynamic approaches (e.g., DPLS), for instance, only address the time relationships, leaving the resolution domain unmodeled. Therefore, an appropriate framework to systematically handle multiresolution data and integrate them into model building is highly desirable. To this end, a novel modelling methodology is proposed and studied in this work, including the formal definition of the required data processing operations and procedures to estimate and apply multiresolution models. The rest of this paper is organized as follows. In the next section the fundamental operations needed to develop and define the multiresolution models are introduced and described. These operators are then employed in the construction of models with different levels of complexity in Section 3, where a weighting procedure is also proposed for handling multiresolution data. Afterwards, the proposed multiresolution soft sensors are tested and compared in a set of simulated case studies (Section 4). The results obtained are discussed in Section 5 and the main conclusions of this work are summarized in Section 6.

2 Construction of Multiresolution Empirical Models: Definitions and Operators

In a multiresolution scenario, variables' values are made available at different time instants, depending on the associated granularity, a feature that is also present in a multirate scenario (see Figure 1 and the discussion around it). However, in a multiresolution setting, the data convey information over a longer time period, namely the time horizon in between recorded observations (henceforth called the variables' time support). In this case, if a variable has its values available only at the end of every r time intervals, then each one of these values represents an average over the last r time instants (this time interval corresponds to the finer time grid established by the sensors with the fastest acquisition rates). This type of variable is represented as $x^{(r)}$, where r is the period of time between observations, which characterizes the variable's granularity in a multiresolution context (it is the variable's time support).

The process data are also characterized by a quality variable (y) that needs to be predicted from a set of readily available process variables (x). The quality variable is assumed to be sampled less frequently (higher granularity or lower resolution), namely with a time support of r. On the other hand, the process variables can have any assortment of resolutions, with a time support, q, ranging from 1 to r. As considered in similar application scenarios 37, 38, the variables' time supports are assumed to be powers of 2, leading to a computational procedure that can be represented by a dyadic tree. Otherwise, an interpolation scheme can be applied to match this dyadic scheme.

To better contextualize the notation used, let us consider the case where the quality variable $y^{(4)}$ has a low resolution, with time support 4, and that three process variables are available: (i) $x_1^{(1)}$, a variable with high resolution (time support 1); (ii) $x_2^{(2)}$, a variable representing averages over every two samples (time support 2); (iii) $x_3^{(4)}$, a variable with low resolution, representing averages over every four samples (time support 4). The data from this process can then be arranged according to Equation (1), where $x_{i,k}^{(q)}$ is the i-th process variable with time support q, collected at time k:

$$
\begin{bmatrix}
- \\ - \\ - \\ y_4^{(4)} \\ - \\ - \\ - \\ y_8^{(4)} \\ \vdots
\end{bmatrix}
= f\left(
\begin{bmatrix}
x_{1,1}^{(1)} & - & - \\
x_{1,2}^{(1)} & x_{2,2}^{(2)} & - \\
x_{1,3}^{(1)} & - & - \\
x_{1,4}^{(1)} & x_{2,4}^{(2)} & x_{3,4}^{(4)} \\
x_{1,5}^{(1)} & - & - \\
x_{1,6}^{(1)} & x_{2,6}^{(2)} & - \\
x_{1,7}^{(1)} & - & - \\
x_{1,8}^{(1)} & x_{2,8}^{(2)} & x_{3,8}^{(4)} \\
\vdots & \vdots & \vdots
\end{bmatrix}
\right)
\qquad (1)
$$


Likewise, $y_k^{(r)}$ represents the quality variable with time support r at time k. Note that all variables are only simultaneously available every r observations. The corresponding schematic representation of the data table and the way its values were computed is presented in Figure 2.

Figure 2 Graphical representation of the data that is actually collected over time for each variable (in grey) and their relationship with the highest resolution (r = 1), represented as a dyadic tree.

By grouping all the variables with the same resolution in $\mathbf{X}^{(q)}$ (where q represents the time support), Equation (1) can be written more generally as

$$
\mathbf{Y}^{(r)} = f\left( \begin{bmatrix} \mathbf{X}^{(1)} & \mathbf{X}^{(2)} & \cdots & \mathbf{X}^{(q)} \end{bmatrix} \right)
\qquad (2)
$$

The approaches proposed to handle the structure of multiresolution data arranged as shown in Equation (2) are described in Section 3. These approaches make use of a set of operators and associated nomenclature introduced in the following subsections. The operators are expressed in matrix format and represent the multiresolution extension of well-known operations for manipulating data 43-45. If multiple operators are applied to the data, the order of operations is set from right to left, following the usual convention from functional analysis. For instance, in the case of $D^{(r)} L^{(q,r)} \mathbf{X}^{(q)}$, the operator $L^{(q,r)}$ is applied first and $D^{(r)}$ is applied to its outcome (these operators are described in the following subsections).


2.1 Downsampling Operator

As the quality variable is available with the coarsest granularity, and consequently at the lowest frequency, the simplest approach for modeling it consists in downsampling the remaining process variables with finer granularities to the same low frequency. This is done by discarding all the x observations without a corresponding value in y. In order to codify this procedure in a compact way, a downsampling operator is introduced. The downsampling operator, $D^{(r)}$, retains every $(k \cdot r)$-th observation of the original data matrix, with $k = 1, \ldots, \lfloor n/r \rfloor$, where $\lfloor z \rfloor$ represents the largest integer less than or equal to z. Therefore, the matrix representation of $D^{(r)}$ consists of a $\lfloor n/r \rfloor \times n$ array of zeros with 1 on the $(k, k \cdot r)$ elements, $k = 1, \ldots, \lfloor n/r \rfloor$. Recalling the example of Equation (1), the application of $D^{(4)}$ to $\mathbf{X}^{(1)}$ leads to

$$
D^{(4)} \mathbf{X}^{(1)} =
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
x_{1,1}^{(1)} \\ x_{1,2}^{(1)} \\ x_{1,3}^{(1)} \\ x_{1,4}^{(1)} \\ x_{1,5}^{(1)} \\ x_{1,6}^{(1)} \\ x_{1,7}^{(1)} \\ x_{1,8}^{(1)}
\end{bmatrix}
=
\begin{bmatrix}
x_{1,4}^{(1)} \\ x_{1,8}^{(1)}
\end{bmatrix}
\qquad (3)
$$

In this example, the $D^{(4)}$ operator retains every fourth observation, leading to a reduced data set with a lower sampling frequency. Note, however, that this operator does not change the resolution of the variables, since they are still related in the same way to the original time support (an operator that changes the variables' resolution will be introduced in Section 2.4).
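To make the operator concrete, the following is a minimal sketch of $D^{(r)}$ in Python/NumPy. It is an illustration only (the original study mentions a Matlab implementation); the function name and the toy data are assumptions made for this example.

```python
import numpy as np

def downsampling_operator(n, r):
    """Build the floor(n/r) x n matrix D(r): a 1 on the (k, k*r) elements,
    k = 1, ..., floor(n/r) (1-based indexing), zeros elsewhere."""
    m = n // r
    D = np.zeros((m, n))
    for k in range(1, m + 1):
        D[k - 1, k * r - 1] = 1.0
    return D

# Example of Equation (3): keep every 4th of 8 high resolution observations
X1 = np.arange(1.0, 9.0).reshape(-1, 1)      # stand-in for x_{1,1}, ..., x_{1,8}
print(downsampling_operator(8, 4) @ X1)      # -> [[4.], [8.]]
```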

2.2 Lag Operator

Given the natural dependency of the quality variable on past observations (either due to process dynamics, or because of the time support of the window relative to its resolution, which naturally involves past observations), a lag operator is introduced here.


This operator introduces lags in the observations of x, taking into account its resolution (time support q) and the size of the time window to be modelled (r). For modelling static relationships between x and y, only the information of x contained within the time support of y is needed. The lag operator is instrumental in collecting these data, and therefore there is a link between the number of lags to consider for each x and the time support of y. More specifically, the lag operator creates an extended data matrix with time shifted observations of x that fall inside a window of size r. The procedure is codified through the use of the backward shift operator $z^{-1}$, defined as 46

$$ x(k-1) = z^{-1} x(k) \qquad (4) $$

The time support of y implies that the observations to be lagged are located at the time indices $k, k-1, k-2, \ldots, k-(r-1)$ (note that $k-r$ already falls outside the current time window). On the other hand, the time support of x indicates that it is obtained at multiples of q. Therefore, the time indices of interest are $k, k-q, k-2q, \ldots, k - \lfloor (r-1)/q \rfloor q$. Under these definitions, the lag operator, $L^{(q,r)}$, shifts the observations collected every q samples over a window of r observations and is represented as

$$ L^{(q,r)} = \begin{bmatrix} z^{0} & z^{-q} & \cdots & z^{-jq} & \cdots & z^{-\lfloor (r-1)/q \rfloor q} \end{bmatrix}, \quad j = 0, \ldots, \lfloor (r-1)/q \rfloor \qquad (5) $$

In matrix form, this operator is applied by means of the Kronecker product (represented by $\otimes$). This is exemplified for the case of Equation (1) by applying $L^{(1,4)}$ to $\mathbf{X}^{(1)}$:

$$
L^{(1,4)} \mathbf{X}^{(1)} =
\begin{bmatrix} z^{0} & z^{-1} & z^{-2} & z^{-3} \end{bmatrix} \otimes
\begin{bmatrix} x_{1,1}^{(1)} \\ x_{1,2}^{(1)} \\ x_{1,3}^{(1)} \\ x_{1,4}^{(1)} \\ x_{1,5}^{(1)} \\ x_{1,6}^{(1)} \\ x_{1,7}^{(1)} \\ x_{1,8}^{(1)} \end{bmatrix}
=
\begin{bmatrix}
x_{1,1}^{(1)} & - & - & - \\
x_{1,2}^{(1)} & x_{1,1}^{(1)} & - & - \\
x_{1,3}^{(1)} & x_{1,2}^{(1)} & x_{1,1}^{(1)} & - \\
x_{1,4}^{(1)} & x_{1,3}^{(1)} & x_{1,2}^{(1)} & x_{1,1}^{(1)} \\
x_{1,5}^{(1)} & x_{1,4}^{(1)} & x_{1,3}^{(1)} & x_{1,2}^{(1)} \\
x_{1,6}^{(1)} & x_{1,5}^{(1)} & x_{1,4}^{(1)} & x_{1,3}^{(1)} \\
x_{1,7}^{(1)} & x_{1,6}^{(1)} & x_{1,5}^{(1)} & x_{1,4}^{(1)} \\
x_{1,8}^{(1)} & x_{1,7}^{(1)} & x_{1,6}^{(1)} & x_{1,5}^{(1)}
\end{bmatrix}
\qquad (6)
$$

In this example, the process variable has a time support q = 1 and is time shifted over the last r = 4 observations.


Even though the lag operator will mostly be used here to compute a lower resolution version of the process variables that matches the resolution of the quality variable (r), it can also be used to add older observations through the adoption of multiples of r (2r, 3r, …).
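As an illustration of the lag operator, the sketch below builds the lagged data matrix directly rather than through the Kronecker-product formalism of Equation (6). It is a Python/NumPy example under stated assumptions, not the authors' implementation.

```python
import numpy as np

def lag_matrix(x, q, r):
    """Lagged data matrix for a single variable x with time support q:
    columns hold x shifted by 0, q, 2q, ..., floor((r-1)/q)*q samples,
    with NaN standing in for the '-' entries of Equation (6)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_lags = (r - 1) // q + 1
    out = np.full((n, n_lags), np.nan)
    for j in range(n_lags):
        shift = j * q
        out[shift:, j] = x[:n - shift]
    return out

x = np.arange(1.0, 9.0)            # x_{1,1}, ..., x_{1,8} with q = 1
print(lag_matrix(x, q=1, r=4))     # reproduces the lag pattern of Equation (6)
```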

2.3 Zero-Order Hold Infilling Operator

The multiresolution scenario often leads to a situation of irregular sampling, which may pose a problem during process modelling and prediction due to the existence of missing observations. A possible solution is to infill the missing observations in some meaningful way and then apply the conventional modelling methodologies. One such approach is to hold the past observation constant until a new observation is obtained 47. This operation is defined here by a zero-order hold operator, $H^{(r)}$, which takes the form of an $n \times n$ matrix of zeros with 1 on the $(k + s, k)$ elements, with $k = 1, \ldots, n$ and $s = 0, \ldots, r - 1$. Its application is exemplified below for a variable with time support 2.

$$
H^{(2)} \mathbf{X}^{(2)} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
- \\ x_{2,2}^{(2)} \\ - \\ x_{2,4}^{(2)} \\ - \\ x_{2,6}^{(2)} \\ - \\ x_{2,8}^{(2)}
\end{bmatrix}
=
\begin{bmatrix}
- \\ x_{2,2}^{(2)} \\ x_{2,2}^{(2)} \\ x_{2,4}^{(2)} \\ x_{2,4}^{(2)} \\ x_{2,6}^{(2)} \\ x_{2,6}^{(2)} \\ x_{2,8}^{(2)}
\end{bmatrix}
\qquad (7)
$$
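A small Python/NumPy sketch of the zero-order hold infilling follows (an assumption-based illustration of Equation (7), with missing entries encoded as NaN):

```python
import numpy as np

def zero_order_hold(x, r):
    """Hold the most recent recorded value of a variable with time support r
    (values recorded at times r, 2r, 3r, ..., 1-based); instants before the
    first record are left missing, as in Equation (7)."""
    x = np.asarray(x, dtype=float)
    held = np.full_like(x, np.nan)
    for k in range(len(x)):
        last = ((k + 1) // r) * r - 1     # 0-based index of the latest recorded value
        if last >= 0:
            held[k] = x[last]
    return held

x2 = np.array([np.nan, 1.0, np.nan, 2.0, np.nan, 3.0, np.nan, 4.0])   # support 2
print(zero_order_hold(x2, r=2))    # -> [nan 1. 1. 2. 2. 3. 3. 4.]
```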

2.4 Average Operator

The previous operations preserve the resolution of the original variable and only shift or downsample the original array. In this section, an operator is presented that introduces granularity by computing averages over a certain time support. Therefore, the information it returns corresponds to a different resolution than that of the original variable to which it was applied. This change in resolution is performed here by an averaging operator, but different filters can be used by simply changing the averaging weights.

The average operator, $A^{(q,r)}$, transforms a variable of resolution q into a variable of resolution r by averaging the observations in a window of size r. Therefore, the new observation at time k is an average of observations at times $k, k-1, \ldots, k-(r-1)$. However, due to the time support of x, the only available observations are at multiples of q. Converting this into a matrix formulation leads to an $n \times n$ matrix of zeros with a weight of $q/r$ on the $(k, k-s)$ elements, for all combinations of $k = r, \ldots, n$ and $s = 0, q, 2q, 3q, \ldots, \lfloor (r-1)/q \rfloor q$. For instance, the application of $A^{(1,4)}$ to $\mathbf{X}^{(1)}$ produces a vector of moving averages over the last 4 observations:

$$
A^{(1,4)} \mathbf{X}^{(1)} =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/4 & 1/4 & 1/4 & 1/4 & 0 & 0 & 0 & 0 \\
0 & 1/4 & 1/4 & 1/4 & 1/4 & 0 & 0 & 0 \\
0 & 0 & 1/4 & 1/4 & 1/4 & 1/4 & 0 & 0 \\
0 & 0 & 0 & 1/4 & 1/4 & 1/4 & 1/4 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/4 & 1/4 & 1/4
\end{bmatrix}
\begin{bmatrix}
x_{1,1}^{(1)} \\ x_{1,2}^{(1)} \\ x_{1,3}^{(1)} \\ x_{1,4}^{(1)} \\ x_{1,5}^{(1)} \\ x_{1,6}^{(1)} \\ x_{1,7}^{(1)} \\ x_{1,8}^{(1)}
\end{bmatrix}
=
\begin{bmatrix}
- \\ - \\ - \\ x_{1,1-4}^{(1)} \\ x_{1,2-5}^{(1)} \\ x_{1,3-6}^{(1)} \\ x_{1,4-7}^{(1)} \\ x_{1,5-8}^{(1)}
\end{bmatrix}
\qquad (8)
$$

For $A^{(q,r)}$, the weights ($q/r$) are set based on the fraction of observations within the specified window. However, this averaging procedure can be generalized by letting the weights be given by a certain function of the index s, in order to allow, for instance, different time instants to have different importance in the low resolution values. An example of such a function is the exponential downweighting function $w_s = (1 - \lambda)^s$, where $0 \le \lambda < 1$ is the forgetting factor. This alternative weighted version of the operator is represented by $W^{(q,r)}$.
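The averaging and weighting operators can be sketched with a single helper: lam = 0 reproduces $A^{(q,r)}$ (equal weights q/r), while lam > 0 gives the truncated exponential profile of $W^{(q,r)}$ (see Equation (21)). This is a Python/NumPy illustration under stated assumptions, not the authors' code; in particular, using the time lag s as the exponent for q > 1 is an assumption.

```python
import numpy as np

def weighting_operator(n, q, r, lam=0.0):
    """n x n matrix converting a variable of time support q into resolution r:
    weights are placed on the (k, k-s) elements for k = r, ..., n and
    s = 0, q, 2q, ..., floor((r-1)/q)*q.  lam = 0 gives the averaging operator
    A(q,r) with weights q/r; lam > 0 uses exponential weights (1-lam)**s."""
    W = np.zeros((n, n))
    lags = np.arange(0, r, q)                          # available lags inside the window
    w = np.full(lags.size, q / r) if lam == 0.0 else (1.0 - lam) ** lags
    for k in range(r - 1, n):                          # 0-based rows for times r, ..., n
        W[k, k - lags] = w
    return W

x1 = np.arange(1.0, 9.0)
print(weighting_operator(8, q=1, r=4) @ x1)            # moving averages of Equation (8)
```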

2.5 “Partial Least Squares” Operator

High-dimensional data sets are often associated with high levels of correlation between subsets of input variables, which in turn lead to modelling difficulties for traditional least squares approaches. A number of methodologies have been developed for handling this data structure, Partial Least Squares (PLS) being one solution that has found good acceptance in industry, including for soft sensor development (see Section 1). PLS relates two data matrices, X (predictors) and Y (responses), through a latent variable model structure that can be recast as a linear regression model 48-51. In order to find the desired latent variables, PLS searches for the orthogonal linear combinations in X (n × m) with maximal covariance with Y (n × a), usually through the NIPALS algorithm 49, 52. Following this approach, the two data blocks can be related through the following internal relationships 50, 52, 53:

$$ \mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} \qquad (9) $$

$$ \mathbf{Y} = \mathbf{T}\mathbf{C}\mathbf{Q}^{T} + \mathbf{F} \qquad (10) $$

where T is an (n × p) matrix of latent variables relating X and Y; P is an (m × p) X-loading matrix; Q is an (a × p) Y-loading matrix; C is a (p × p) regression coefficient matrix; and E (n × m) and F (n × a) are residual matrices. Likewise, the external relationship can be written as Y = XB + F, where B is an (m × a) coefficient matrix. In this case, $\mathbf{B} = \mathbf{W}(\mathbf{P}^{T}\mathbf{W})^{-1}\mathbf{C}\mathbf{Q}^{T}$, where W is an (m × p) weighting matrix on X.

To select the number of latent variables, p, the adjusted Wold's criterion was used in this study 54. This criterion compares successive values of the prediction residual error sum of squares (PRESS) and keeps the lowest value of p that satisfies the criterion $\mathrm{PRESS}(p+1)/\mathrm{PRESS}(p) > 0.90$.

In order to facilitate the exposition of the multiresolution modelling approaches, PLS is here defined by a partial least squares operator, P{X, Y}, which returns the appropriate coefficient matrix B. As the modelling approaches defined in Section 3 differ only in the treatment applied to the X block, the PLS operator is omitted from their representation; it should be clear from the context which data sets are being used for modelling. In the future, other predictive methods can also be used as alternatives to PLS, represented by their respective operators. They can be easily accommodated in the proposed framework by replacing the "PLS" operator with the one for the new method.
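A possible sketch of the P{X, Y} operator in Python uses scikit-learn's PLSRegression together with a cross-validated PRESS for the adjusted Wold criterion. The 5-fold scheme below is an assumption, since the paper does not spell out how PRESS is computed.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def pls_operator(X, Y, max_lv=10, ratio=0.90, n_splits=5):
    """Fit PLS models with an increasing number of latent variables and keep the
    smallest p satisfying PRESS(p+1)/PRESS(p) > ratio (adjusted Wold criterion).
    Returns the fitted PLS model; the coefficient matrix B is exposed as
    model.coef_ (its orientation depends on the scikit-learn version)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)

    def press(p):                       # cross-validated prediction residual error sum of squares
        err = 0.0
        for tr, te in KFold(n_splits=n_splits).split(X):
            m = PLSRegression(n_components=p).fit(X[tr], Y[tr])
            err += float(np.sum((Y[te] - m.predict(X[te])) ** 2))
        return err

    p, prev = 1, press(1)
    while p < max_lv:
        nxt = press(p + 1)
        if nxt / prev > ratio:          # no worthwhile improvement: stop adding latent variables
            break
        p, prev = p + 1, nxt
    return PLSRegression(n_components=p).fit(X, Y)
```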


3 Modelling Approaches

In this section the modelling approaches for building soft sensors for data sets with the structure of Equation (1) are presented. The proposed alternative, based on the explicit consideration of the multiresolution characteristics of the data, is put forward in Section 3.2, alongside the current, simpler approaches where these features are not considered (Section 3.1). A summary of all approaches is provided at the end of the section (Section 3.3).

3.1 Single-resolution and unstructured approaches

Partial least squares is a versatile approach for modelling the relationship between two blocks of variables at the same resolution. However, in the multiresolution scenario, the observed variables are available at different time instants and with different resolutions. A common fix to circumvent this pattern is to downsample both the X and Y blocks to the lowest sampling rate (with period r; the same simple solution is often applied for multirate signals). This can be done through the use of the downsampling operator, $D^{(r)}$, leading to a model relating $D^{(r)}\mathbf{Y}$ and $D^{(r)}\mathbf{X}$. Afterwards, the standard PLS model is applied as usual (see Section 2.5), leading to:

$$ D^{(r)} \mathbf{Y}^{(r)} = \sum_{q=1}^{r} \left\{ D^{(r)} \mathbf{X}^{(q)} \right\} \mathbf{B}^{(q)} \qquad (11) $$

Note that the matrix of coefficients $\mathbf{B}^{(q)}$ is not an operator and that it is applied to the data after preprocessing. Therefore, $\mathbf{X}^{(q)}$ is first transformed according to the $D^{(r)}$ operator, and only after this are the regression coefficients applied (the braces in Equation (11) emphasize the correct order of application of the operator and coefficients). Although several resolutions may appear simultaneously in the model, this fact is disregarded. Furthermore, the model only includes information about the current state of the process. Therefore, this approach is considered an unstructured approach, since there is no attempt to explicitly accommodate the variables' resolution. This model is henceforth referred to as 1:DX (the notation summarizes the main operations performed on the data, which in this case is a Downsampling preprocessing of X). As the variables are only downsampled, 1:DX does not change their resolution and therefore there is a loss of information in the variables with higher resolutions.

Assuming that the low resolution variables are averages over a certain time period, the observations that were discarded in model 1:DX can alternatively be included through averaging. This can be done through the use of the average operator, $A^{(q,r)}$, described in Section 2.4. This leads to a set of moving averages, which are downsampled in order to match the resolution of Y. This modeling approach, which brings resolution consistency to the analysis (a true single-resolution methodology), is summarized as

$$ D^{(r)} \mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)} A^{(q,r)} \mathbf{X}^{(q)} \mathbf{B}^{(q)} \qquad (12) $$

where the $\mathbf{B}^{(q)}$ coefficients are found by application of PLS. This model is defined as 2:DAX, following the main operations used during modelling (Downsampling and Averaging of X).

In order to tentatively improve modelling performance, the observations of X that would be discarded in the downsampling procedure (relative to the variables with higher resolution) can be included as time shifted variables, at their native resolution, and thus used to build a dynamic PLS model (see Figure 3). The first step of DPLS involves the lagging of past observations, which can be done by application of the lag operator, $L^{(q,r)}$ (Section 2.2), to $\mathbf{X}^{(q)}$, leading to a data set arranged as in Equation (6) (see Figure 3(a)). Afterwards, downsampling is performed (Figure 3(b)) and the resulting matrices are used to fit a DPLS model between $D^{(r)}\mathbf{Y}^{(r)}$ and the concatenated $D^{(r)}L^{(q,r)}\mathbf{X}^{(q)}$. This can be equivalently represented as

$$ D^{(r)} \mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)} L^{(q,r)} \mathbf{X}^{(q)} \mathbf{B}^{(q)} \qquad (13) $$

leading to model 3:DLX (i.e., it uses Downsampling of the Lagged observations of X). By following this model building scheme, all observations within the time support of Y(r) are included in the model (see Figures 2 and 3). Since Y(r) is in effect related to them (Y(r) is an average over the past r samples), this inclusion may lead to better predictive performance. However, as it does not consider the actual structure of multiresolution data, it can still be a suboptimal solution.
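To illustrate how the operators compose into the regressor blocks of models 1:DX, 2:DAX and 3:DLX (Equations (11)-(13)), the sketch below concatenates the preprocessed blocks before they are handed to PLS. It relies on the illustrative helpers downsampling_operator, weighting_operator and lag_matrix sketched in Section 2 and is an assumption-based outline, not the authors' implementation.

```python
import numpy as np

def build_regressors(X_blocks, n, r, model="3:DLX"):
    """X_blocks maps each time support q to its (n x m_q) 2-D block X(q).
    Returns the regressor matrix to be related to D(r) Y(r) by PLS."""
    D = downsampling_operator(n, r)
    parts = []
    for q, Xq in X_blocks.items():
        if model == "1:DX":            # inner points of x are simply removed
            parts.append(D @ Xq)
        elif model == "2:DAX":         # inner points are averaged, then downsampled
            parts.append(D @ weighting_operator(n, q, r) @ Xq)
        elif model == "3:DLX":         # inner points kept as lagged variables
            lagged = np.hstack([lag_matrix(Xq[:, j], q, r) for j in range(Xq.shape[1])])
            parts.append(D @ np.nan_to_num(lagged))
        else:
            raise ValueError(model)
    return np.hstack(parts)
```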


Figure 3 Schematic representation of the (a) lagging and (b) downsampling operations used by DPLS. The collected samples are presented in grey.

3.2 Multiresolution approach

A characteristic trait of multiresolution data is the dependency between the high and low resolution measurements of the same variable. This happens because the recorded variables (at low resolution) are filtered or weighted versions of inner (unrecorded) high resolution samples. Therefore, it can be assumed that for each observed lower resolution realization at time k, $y_k^{(r)}$, there is a relationship with a set of unknown high resolution observations, $\tilde{y}_k^{(1)}$, such that:

$$ y_k^{(r)} = w_0 \tilde{y}_k^{(1)} + w_1 \tilde{y}_{k-1}^{(1)} + \ldots + w_{r-1} \tilde{y}_{k-(r-1)}^{(1)} \qquad (14) $$

where $w_i$ are weighting constants involved in the computation of the lower resolution values. This relationship is assumed to occur at the highest resolution (time support 1), since it is the common ground for all variables (i.e., all variables can be traced back to a high resolution level). This relationship is also visible in Figures 2 and 4, where the low resolution variables (x3(4) and y(4)) present a direct connection with their high resolution counterparts.


For simplicity of exposition, let us assume that at the highest resolution there is a static dependency (i.e., the variables at time k do not depend on other time instants from the past). The same assumption is made for the variables available at lower resolutions: their unmeasured values at high resolution are assumed to also have a static dependency. In this context, instead of predicting y(r) directly, its unknown high resolution values, $\tilde{y}_k^{(1)}$, could be estimated first from the process variables, such that

$$ \tilde{y}_k^{(1)} = \mathbf{b} \mathbf{x}_k^{(1)} \qquad (15) $$

where b is a vector of regression coefficients (note, however, that at this point the coefficients cannot be fitted, since the high resolution signal is unknown). Afterwards, the low resolution signal could be computed from the high resolution estimate by application of the weighting constants, leading to

$$ y_k^{(r)} = \sum_{i=0}^{r-1} w_i \tilde{y}_{k-i}^{(1)} = \sum_{i=0}^{r-1} w_i \mathbf{b} \mathbf{x}_{k-i}^{(1)} = \mathbf{b} \sum_{i=0}^{r-1} w_i \mathbf{x}_{k-i}^{(1)} = \mathbf{b} \mathbf{u}_k^{(1)} \qquad (16) $$

where

$$ \mathbf{u}_k^{(1)} = \sum_{i=0}^{r-1} w_i \mathbf{x}_{k-i}^{(1)} \qquad (17) $$

represents the sum of weighted observations over the r period.

that the process variables can be subject to the same multiresolution weighting profile that explicitly models the inner relationship between the high and low resolution signals of y.


Figure 4 Relationship between low and high resolution variables.

More generally, dynamic dependencies can be allowed to link the high resolution quality variable to the process variables. In this case, the model is rewritten as

$$
\begin{cases}
y_k^{(r)} = \displaystyle\sum_{i=0}^{r-1} w_i \tilde{y}_{k-i}^{(1)} \\
\tilde{y}_k^{(1)} = \displaystyle\sum_{j=0}^{L} b_j x_{k-j}^{(1)}
\end{cases}
\qquad (18)
$$

where $\tilde{y}_k^{(1)}$ is the unmeasured high resolution signal, $w_i$ are weights relating the high and low resolution signals, $b_j$ are regression coefficients and L is the number of past observations that have a dynamic effect on $\tilde{y}_k^{(1)}$. For the static case, L = 0. In Appendix A, it is demonstrated that the model of Equation (18) can also be represented as

$$
\begin{cases}
y_k^{(r)} = \displaystyle\sum_{i=0}^{L} b_i u_{k-i}^{(1)} \\
u_k^{(1)} = \displaystyle\sum_{i=0}^{r-1} w_i x_{k-i}^{(1)}
\end{cases}
\qquad (19)
$$

Since the inner weights of y(r) are unknown, an optimization step is implemented in order to find the set of weights that leads to the minimum mean squared error (MSE) for the observed y's. These values are determined simultaneously with the regression coefficients, which are obtained by PLS; the optimization problem is solved using the fmincon function of Matlab® (R2016a) with the default interior-point algorithm. Both operations are described in Table 2. The final model structure for this approach takes the following form:


$$ D^{(r)} \mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)} W^{(q,r)} \mathbf{X}^{(q)} \mathbf{B}^{(q)} \qquad (20) $$

and is here referred to as model 4:DWX (i.e., the model performs Downsampling and Weighting on X).

Table 2 Pseudo-code for determining the model parameters of 4:DWX.

1. Determine the weighting constants
   a. Solve the following optimization problem:
      $\mathbf{w} = \arg\min_{\mathbf{w}} \left( \mathrm{MSE}(\mathbf{w}) \right)$
      where MSE(w) is computed using the steps in stage 2.
2. Determine the MSE(w)
   a. For a given w, build the respective weighting operator $W^{(q,r)}$;
   b. Fit a PLS model between $D^{(r)}\mathbf{Y}^{(r)}$ and the concatenated components of $D^{(r)}W^{(q,r)}\mathbf{X}^{(q)}$, $q = 1, \ldots, r$;
   c. Compute the model errors as $\mathbf{E}^{(r)}(\mathbf{w}) = D^{(r)}\mathbf{Y}^{(r)} - \sum_{q=1}^{r} D^{(r)}W^{(q,r)}\mathbf{X}^{(q)}\mathbf{B}^{(q)}$;
   d. Compute the MSE as $\mathrm{MSE}(\mathbf{w}) = \frac{1}{n} \mathbf{E}^{(r)}(\mathbf{w})^{T} \mathbf{E}^{(r)}(\mathbf{w})$.

Although the weights can be left entirely free in the estimation process, under the multiresolution framework they typically represent averages over past high resolution observations. Furthermore, depending on the process characteristics, it is also reasonable to assume a certain decay in the importance of the contributions from past observations. Therefore, some estimation degrees of freedom can be saved by properly parameterizing this behavior, namely through an exponentially weighted filter where the weights are given by


$$ w_i = (1 - \lambda)^i, \quad i = 0, \ldots, r-1 \qquad (21) $$

with $0 \le \lambda < 1$, λ being the forgetting factor. For λ = 0, all past observations are weighted equally, leading to the usual averaging filter (as used in 2:DAX), while for higher values of λ, the importance of recent observations becomes larger. Note that the weights are applied here only to the last r observations, while typical exponential downweighting procedures use the entire data window 55. Therefore, the proposed weighting procedure is a truncated version of the exponentially weighted moving average (EWMA). Another advantage of using an exponential filter is the simplification of the optimization problem, since only one parameter (the forgetting factor) needs to be estimated in order to establish the weighting profile. Therefore, major optimization issues are not expected, and the time-consuming task is the fitting of a new PLS model for each tentative forgetting factor.
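The estimation loop of Table 2, with the weights parameterized by the forgetting factor of Equation (21), can be sketched as follows. This Python outline replaces the authors' fmincon call with a bounded scalar search (SciPy) and fixes the number of PLS components for simplicity (the paper re-selects it with the adjusted Wold criterion); downsampling_operator and weighting_operator are the illustrative helpers from Section 2.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.cross_decomposition import PLSRegression

def fit_dwx(X_blocks, Y, n, r, n_components=3):
    """Fit model 4:DWX: search the forgetting factor that minimizes the MSE of
    a PLS model between D(r) Y(r) and the weighted, downsampled X blocks."""
    D = downsampling_operator(n, r)
    y = D @ np.asarray(Y, dtype=float).reshape(n, -1)      # low resolution response

    def regressors(lam):
        return np.hstack([D @ weighting_operator(n, q, r, lam) @ Xq
                          for q, Xq in X_blocks.items()])

    def mse(lam):                                          # steps 2a-2d of Table 2
        U = regressors(lam)
        pls = PLSRegression(n_components=n_components).fit(U, y)
        return float(np.mean((y - pls.predict(U)) ** 2))

    res = minimize_scalar(mse, bounds=(0.0, 0.999), method="bounded")
    lam = float(res.x)
    return lam, PLSRegression(n_components=n_components).fit(regressors(lam), y)
```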

3.3 Summary

The set of modelling approaches considered in this study is summarized in Table 3, where the main characteristics of each methodology are highlighted. They are also represented by their respective mathematical expressions, using the operators introduced in Section 2. Furthermore, it is recalled that methodologies 1 to 3 can already be found in the literature, while methodology 4 is completely new.

Table 3 Summary of the modelling approaches, their short designation and respective defining relationships.

Number | Short name | Model
1 | DX  | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)}\mathbf{X}^{(q)}\mathbf{B}^{(q)}$; inner points of x are removed.
2 | DAX | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)}A^{(q,r)}\mathbf{X}^{(q)}\mathbf{B}^{(q)}$; inner points of x are averaged.
3 | DLX | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)}L^{(q,r)}\mathbf{X}^{(q)}\mathbf{B}^{(q)}$; inner points of x are shifted as new variables.
4 | DWX | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=1}^{r} D^{(r)}W^{(q,r)}\mathbf{X}^{(q)}\mathbf{B}^{(q)}$; x is exponentially weighted using a filter set by y.

4 Case studies

In this section we present the results of applying multiresolution soft sensors to several case studies. The first battery of case studies involves simulated systems with a latent variable structure (Section 4.1), where the purpose is to test the multiresolution approaches under a well-defined set of conditions. By doing so, the results of the method can be better interpreted and insights acquired into the way it works, without the interfering effects introduced by more complex testing scenarios. This is suitable for characterizing the method in an initial stage of validation and for establishing its main properties. Then, in Section 4.2, we test the method under more realistic conditions, by resorting to the simulation of a distillation column for the separation of a ternary mixture of methanol, ethanol and 1-propanol, with non-ideal vapor-liquid equilibria. The purpose here is to assess the method's robustness under more challenging situations and to illustrate the way it is developed and applied in more realistic settings.

The assessment and comparison of the methods' performances is made on the basis of the mean squared error (MSE, Equation (22)), using the noise-free signal as reference (as the goal of the soft sensor is to estimate the true underlying signal), over 100 replicates of the test data. In all cases, the models are trained on 1000 observations and tested on another 1000 observations. The studies are designed to enable the application of a permutation test 56 in order to find out whether the MSE values obtained for the different methods are statistically different, and thus to assess the relative performance of the methodologies in a rigorous and consistent way. These permutation tests are applied to the MSEs of each pair of methods using 10000 permutations of the sign (+) or (–) of the difference between the MSEs. Afterwards, the empirical distribution of the test statistic and the corresponding p-value are determined.

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (22) $$
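A sketch of the paired sign-permutation test on the MSE differences follows; the exact test statistic is not spelled out in the text, so the mean difference and a two-sided p-value are assumptions made for illustration.

```python
import numpy as np

def sign_permutation_test(mse_a, mse_b, n_perm=10000, seed=None):
    """Randomly flip the sign of each paired MSE difference to build the null
    distribution of the mean difference and return a two-sided p-value."""
    rng = np.random.default_rng(seed)
    d = np.asarray(mse_a, dtype=float) - np.asarray(mse_b, dtype=float)
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = (signs * d).mean(axis=1)
    return float(np.mean(np.abs(null) >= abs(observed)))
```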


4.1 Case Study 1: Latent Variables Process

To illustrate the use of the proposed methodology, data are generated using a latent variable model structure. This is a suitable testing scenario, as many real systems tend to be well described by such a model structure 57, 58. The simulated data consist of nine process variables (X, m = 9) and one quality variable (Y, a = 1) related according to Equations (9) and (10). The process parameters P and Q are random matrices with orthogonal columns, while the elements of C were randomly generated using the standard normal distribution. In order to confer dynamic properties on the process, three latent variables (T, p = 3) following an autoregressive model were used. Each latent variable, $t_i$, is computed as

$$ t_i = \varphi t_{i-1} + \varepsilon_i \qquad (23) $$

where $\varphi$ is the autoregressive coefficient and $\varepsilon_i$ is a random white noise variable with zero mean and variance $(1 - \varphi^2)$ (with this choice, the latent variables have unit variance). Each latent variable is defined by a different autoregressive coefficient, set to 0.9, 0.5 and 0.3, respectively. The noise matrices E and F were set to signal-to-noise ratios (SNR) of 20 dB, the SNR being defined as $\mathrm{SNR} = 10\log_{10}\left( \mathrm{var}(\mathrm{signal}) / \mathrm{var}(\mathrm{noise}) \right)$.

ACS Paragon Plus Environment

Industrial & Engineering Chemistry Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

practice can be connected to averaging procedures taking place at the level of the DCS, mostly for compression purposes). In this case, variables 1 to 3 are taken every instant (q = 1), variables 4 to 6 are taken every two time instants (q = 2) and variables 7 to 9 are taken every four time instants (q = 4). Note that since the process parameters are randomly generated, the order of the variables is indifferent. The weights used in the computation of the low resolution variables (x(q) with q > 1 and y(r)) were set differently for three scenarios. In the first case, the weights were set to the

same value, representing the typical multiresolution framework of averages computed over a given time period (Section 4.1.1). In the second case, the variables were subjected to an exponential filter with λ = 0.75 (Section 4.1.2). Finally, in the third case the process variables represent averages and the quality variable is exponentially weighted with λ = 0.75 (Section 4.1.3). After this data processing stage, the data sets obtained for analysis have the general structure of Equation (1). The results are presented in the following subsection, according to the way the low resolution variables were obtained.
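A compact simulation of the latent variable process described above can be sketched as follows (Python/NumPy). The exact way the random loading matrices are drawn is not fully specified in the text, so the QR-based construction below is an assumption made for illustration.

```python
import numpy as np

def simulate_latent_process(n=1000, m=9, phi=(0.9, 0.5, 0.3), snr_db=20, seed=None):
    """Generate high resolution X (n x m) and Y (n x 1) from three unit-variance
    AR(1) latent variables (Equation (23)), random loadings and 20 dB noise.
    Returns the noisy X, the noisy Y and the noise-free Y used as MSE reference."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    T = np.zeros((n, p))
    for i, ph in enumerate(phi):                         # AR(1) latent variables
        eps = rng.normal(0.0, np.sqrt(1.0 - ph ** 2), size=n)
        for k in range(1, n):
            T[k, i] = ph * T[k - 1, i] + eps[k]
    P, _ = np.linalg.qr(rng.normal(size=(m, p)))         # X-loadings, orthogonal columns
    q = rng.normal(size=(1, p)); q /= np.linalg.norm(q)  # Y-loading row vector (a = 1)
    C = rng.normal(size=(p, p))                          # inner regression coefficients
    X_clean, Y_clean = T @ P.T, T @ C @ q.T

    def add_noise(S):                                    # additive noise at the target SNR
        noise_var = S.var(axis=0) / (10 ** (snr_db / 10))
        return S + rng.normal(size=S.shape) * np.sqrt(noise_var)

    return add_noise(X_clean), add_noise(Y_clean), Y_clean
```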

4.1.1 Multiresolution data obtained by equal weighting of high resolution data

This section considers the scenario where the low resolution variables are computed as plain averages of high resolution data, in which all the high resolution values within the time support contribute equally to the low resolution value that is actually observed. The MSEs obtained in 100 Monte Carlo simulations are represented in Figure 5, and the associated results for the permutation tests in Table 4 (the P, Q, and C matrices were randomly generated in each new run, leading to different testing scenarios, in order to avoid the dominance of special simulated conditions). Figure 5 clearly shows that 1:DX presents the worst performance, as a result of only modelling the instantaneous relationships and of not including information on the multiresolution structure. On the other hand, the methodologies with the best performances are 2:DAX and 4:DWX, as can be verified in Table 4, where it can be seen that they lead to significantly lower MSE than the other methods and that they are not statistically different from each other in this regard. This happens because both methodologies internally weight the observations according to a specific profile in order to mimic the multiresolution characteristics. In the case of 2:DAX, the weighting profile is imposed to be an average, while for 4:DWX the weights are fitted by tuning the forgetting factor according to the optimization problem described in Section 3.2 (see Table 2). The 0.95 quantile of the distribution of the forgetting factors estimated during the Monte Carlo simulations is 0.033, which in this rather extreme scenario leads to a weighting profile of w = [1.000 0.967 0.935 0.904]. Therefore, even in this case, the past observations are given rather similar weights, which justifies the similarity in performance with 2:DAX. Based on these results, both methodologies were found to be good alternatives, but since 4:DWX is more flexible in the setting of the weights, it is considered the preferable choice.


Figure 5 Boxplot of MSE for 100 replicates of the latent variable case study with averages computed using equal weights.

Table 4 Permutation tests applied to the MSE of the modelling methodologies in the latent variable case study with averages computed using equal weights. The p-value and test statistic are presented for each pair A – B. Values in bold represent cases where A is statistically better than B. Underlined values are cases where A and B are statistically equal.

B \ A   | 1:DX | 2:DAX | 3:DLX | 4:DWX
1:DX    | -    |       |       |
2:DAX   |      |       |       |
3:DLX   |      |       |       |
4:DWX   |      |       |       |