
Tiago J. Rato and Marco Seabra Reis. Building Optimal Multiresolution Soft Sensors for Continuous Processes. Ind. Eng. Chem. Res., Just Accepted Manuscript. DOI: 10.1021/acs.iecr.7b04623. Publication Date (Web): April 12, 2018.


Building Optimal Multiresolution Soft Sensors for Continuous Processes

Tiago J. Rato, Marco S. Reis*

CIEPQPF, Department of Chemical Engineering, University of Coimbra, Rua Sílvio Lima, 3030-790, Coimbra, Portugal

*Corresponding author: e-mail: [email protected], phone: +351 239 798 700, FAX: +351 239 798 703


Abstract

Data-driven models used in soft sensor applications are expected to capture the dominant relationships between the different process variables and the outputs, while accounting for their high-dimensional, dynamic and multiresolution character. While the first two characteristics are often addressed, the multiresolution aspect is usually disregarded and confused with a multirate scenario: multiresolution occurs when variables have different levels of granularity due to, for instance, automatic averaging operations over certain time windows; on the other hand, a multirate structure is caused by the existence of different sampling rates, but the granularity of the recorded values is the same. This has two major and immediate implications. Firstly, current methods are unable to handle variables with different resolutions in a consistent and rigorous way, since they tacitly assume that data represent instantaneous observations and not averages over time windows. Secondly, even if data are available at a single resolution (i.e., all variables with the same granularity), it is not guaranteed that the native resolution of the predictors is the most appropriate for modeling. Therefore, soft sensor development must address not only the selection of the best set of predictors to be included in the model, but also the optimum resolution to adopt for each predictor. In this work, two novel multiresolution frameworks for soft sensor development are proposed, MRSS-SC and MRSS-DC, that actively introduce multiresolution into the data by searching for the best granularity for each variable. The performance of these methodologies is comparatively assessed against current single-resolution counterparts. The optimized multiresolution soft sensors are guaranteed to be at least as good as their single-resolution versions, and the results confirm that they almost always perform substantially better.

Keywords: Multiresolution data; Partial least squares; Dynamic partial least squares; Stepwise regression.


1 Introduction

Process models are the basis of most operations in industry. With the emergence of Industry 4.0, data-driven empirical models in general, and soft sensors in particular, have been acquiring an increasing importance. Their quality depends on the ability to address the characteristics of data, which in turn are related to the natural behavior of the process together with the data acquisition system set in place. One such characteristic is the high dimensionality caused by the increasing availability of inexpensive sensors that allow for the collection of hundreds of variables across the process, a task that is facilitated by the appearance of new solutions in the domain of the Internet of Things (IoT). On the other hand, the number of sources of variability is much lower than the number of measured variables, leading to redundancy in the collected information, which appears in the form of highly correlated data. This characteristic of data poses estimation difficulties for classic regression models based on weighted or unweighted least squares [1-4]. To face this limitation, variable selection algorithms, such as forward selection, backward elimination, stepwise regression or genetic algorithms [1, 5-8], have been adopted for reducing the degree of redundancy in the data, while simultaneously handling the sparsity problem, i.e., the existence of noise factors or variables with no predictive value, which should also be removed from the model.

An alternative approach to variable selection for handling high-dimensional data sets consists in selecting dimensions, instead of variables. Dimension selection (or projection) methods project the original data onto a lower-dimensional subspace that captures the essential part of the underlying sources of variation. The well-known class of latent variable methods, such as principal component regression (PCR) [9] and partial least squares (PLS) [10-12], falls in this category. PLS has been extensively applied to build soft sensors for continuous and batch processes, given its well-known ability to handle collinearity in the predictors [13-17]. However, it is also recognized that PLS performance can improve by removing noisy variables through some variable selection scheme [7, 18-21]. Likewise, it has also been demonstrated that smoothing the variables through the use of moving average schemes can lead to improved models [22, 23]. Other classes of approaches for handling high-dimensional data, such as penalized regression methods (ridge regression, LASSO, elastic net) and machine learning approaches (random forests, gradient boosting trees), are reported in Ref. 24.


Another characteristic of industrial data is the presence of non-uniform spacing between recorded values. This sparse data structure can originate from multiple acquisition rates (multirate systems) or from the existence of variables at multiple resolutions. A common example where a multirate structure can be encountered is the case where some variables (usually, the final quality variables) are obtained less frequently as a result of complex and time-consuming experimental protocols, while process variables can be easily obtained, at higher rates, by dedicated sensors. In this case, all variables have the same resolution (highly localized in time), but different sampling rates. For these situations, several multirate solutions were already proposed in the literature [25-29]. The case of multiresolution is, however, often overlooked, despite its prevalence and relevance in industry. It regards situations where the recorded observations have different granularities. This happens because they are subject to aggregation rules that merge multiple (high resolution) samples into a single (low resolution) observation, such as averaging operations. The recorded values therefore have different resolutions (or granularities) and do not refer solely to the reporting time instant (as happens in the multirate case), but rather to the entire time window over which they are computed or acquired. Low resolution observations can also result from compound sampling procedures, where samples are collected during a period of time and accumulated, before being mixed and tested, leading to properties that represent the state of the process over the collection time window. In multiresolution data structures, the time window used for aggregating high resolution observations or for accumulating process samples is at the origin of the variable's granularity and is here called the variable's time support. This characteristic makes multiresolution data sets fundamentally different from their multirate counterparts, even though the data tables that they produce may look visually similar. For more discussion on the differences between multirate and multiresolution data sets we refer the reader to Ref. 30. In spite of the need for dedicated multiresolution methodologies, there is currently a lack of solutions addressing the case of multiresolution structures and, by consequence, of multiresolution soft sensors. To the best of our knowledge, the only proposal in this regard is the Multiresolution PLS in Ref. 30, which implements a weighting scheme designed for handling multiresolution data. This modeling approach considers the case where data already have a multiresolution structure and the aim is to accommodate the different resolutions by implicitly matching


the resolution of the predictors with the resolution of the response. In this approach, the resolution of the data is never changed, nor are predictors selected to be included in the model. Therefore, when applied to single-resolution data sets (which is the scope of the current work), this previous contribution becomes equivalent to the standard PLS model. The key point we address in this work is the following: even if the data are available at a single resolution, it is not guaranteed that the native resolution of the predictors is really the best resolution for deriving a predictive model. On the contrary, we have all reasons to suspect the opposite. The existing data collection systems were designed and installed by third-party companies during the commissioning stage of the DCSs and IT infrastructure. Their concern was not to optimize the performance of future predictive models, but to ensure that all relevant variables are sampled at a sufficiently high resolution and rate, in order to stabilize the process and capture the local variability from the standpoint of the control and monitoring systems. However, from the perspective of developing optimal soft sensors, robustness and parsimony often lead to the conclusion that it is indeed preferable to select fewer variables and adopt low resolution descriptions of them. In summary, the decision of specifying the original data resolution and acquisition rate was certainly made by someone who is not involved in future model development tasks, and who perhaps does not even belong to the company. Therefore, there is no guarantee at all that the optimal sampling and resolution settings, for prediction or for any other purpose, are the ones present in the original data set. In this work, we will show that it is not only desirable to select the set of predictors most suitable for estimating the response during soft sensor development, but also to determine at which resolution they should be included in the model: variable and time parsimony should be addressed and optimized simultaneously for soft sensors to achieve the best predictive performance. To achieve this goal, a novel multiresolution framework for soft sensor development is proposed. The proposed solution was designed for continuous processes, and can effectively address the simultaneous presence of high dimensionality (through variable selection), process dynamics (through the selection of time-shifted versions of the original variables) and multiresolution (through the selection of the resolution most suitable for each variable).


The rest of the article is organized as follows. In the next section the standard regression models based on latent variables and stepwise regression are briefly reviewed (Subsections 2.1 to 2.3), followed by the description of two novel approaches for building multiresolution soft sensors for static (MRSS-SC) and dynamic (MRSS-DC) continuous systems (Subsection 2.4). Afterwards, the proposed methodologies are tested and compared in a set of simulated case studies (Section 3). The results obtained are discussed in Section 4 and the main conclusions of this work are summarized in Section 5.

2 Empirical Modeling Approaches for High-Dimensional Data

To facilitate the presentation of the multiresolution empirical models proposed in this article, we start by defining the nomenclature and structure assumed for data sets arising from typical industrial processes. In this context, process and quality variables usually have different resolutions as a result of different aggregation rules implemented over the recorded values. Without loss of generality, the multiresolution structure of these variables is assumed to follow a dyadic tree, as in the case of wavelet-based frameworks [31, 32]. If such a structure is not met, interpolation [31] or multiresolution infilling of missing observations [33] can be used to accommodate the observations into the dyadic scheme. (The dyadic structure was adopted to facilitate the generalization of the multiresolution model structure to different levels of granularity, using a simple and compact parameterization.) Each variable is here generally represented as $x_{i,t}^{(q)}$, where i is the variable index, t is the observation time instant and $0 \leq q \leq r$ is the variable resolution, r being the lowest resolution used in the representation (coarsest granularity) and 0 the highest resolution (finest granularity). The variable's resolution also defines the length of the time support and the periodicity at which values become available. That is, variables with resolution q are recorded at every $2^q$-th observation, after being averaged over the last $2^q$ high resolution observations (see Figure 1).
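As an illustration (ours, not part of the original article), the following Python sketch generates the resolution-q versions of a high-resolution signal on this dyadic tree: each value is the average over a time support of $2^q$ observations and becomes available once per window of $2^q$ time instants. The function name is hypothetical.

```python
import numpy as np

def to_resolution(x, q):
    """Resolution-q version of a high-resolution signal x (q = 0 is the native
    resolution): averages over non-overlapping windows of 2**q observations,
    one value reported per window (dyadic tree of Figure 1)."""
    w = 2 ** q                      # length of the time support at resolution q
    n = (len(x) // w) * w           # drop a final, incomplete window, if any
    return np.asarray(x, dtype=float)[:n].reshape(-1, w).mean(axis=1)

# Example: a signal observed at the native resolution (q = 0) and two coarser versions.
rng = np.random.default_rng(0)
x0 = rng.normal(size=16)
x1, x2 = to_resolution(x0, 1), to_resolution(x0, 2)   # 8 and 4 values, respectively
```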


Figure 1 Graphical representation of the relationship between an observed variable at different resolutions (gray) and their time support at the highest resolution (r = 0) represented as a dyadic tree.

For soft sensor development, two blocks of variables are available: predictors and responses. The predictors are here represented by a (n×m) matrix X, while the responses are given by a (n×a) matrix Y, where n is the number of observations at the highest resolution level (note that when a variable is collected at a low resolution, its value is not recorded at all time instants and thus some of its elements in the data matrices are blank, see Figure 1), m is the number of predictors and a is the number of responses. In this study, we only address the case of one response, a = 1. Regarding their resolution, all predictors are assumed to be available at the highest resolution (q = 0), since they are usually related to easily obtained process variables (i.e., sensor data, such as temperature, flow, pressure, pH, conductivity, etc.). Note, however, that this is only a constraint for the classical modelling approaches, while the proposed methodology can still be applied even if the predictors have different resolutions. On the other hand, the response can be at any coarser (or lower) resolution, $r \geq 0$. The resolution of the response also defines the lowest resolution of the data set, as happens in practice. Also note that during the n high resolution time instants, the response is only recorded at every $2^r$-th observation (in the meantime, the values are accumulated to be averaged and recorded at the end of the period). Therefore, the effective number of response observations that can be used for modelling is $\lfloor n/2^r \rfloor$, where $\lfloor z \rfloor$ is the operator representing the largest integer less than or equal to z. For this reason, before applying


any of the modeling strategies considered in this study, both X (or a pre-processed version of X) and Y need to be downsampled to the resolution of the response. To facilitate the construction and interpretation of the empirical models described in the following subsections, we will use the following operators, which are more fully described in Appendix A (see also Ref. 30). These are necessary for a rigorous treatment and description of the model structures, but for a first contact with this material, the reader may think of the models as containing variables at multiple resolutions, indexed by r (r = 0 stands for the finest resolution, and r > 0 for the coarser or lower ones). These operators are:

• Downsampling operator, $D^{(r)}$, that reduces the number of observations by retaining only every $2^r$-th observation.

• Lag operator, $L^{(q,l)}$, that adds time-shifted versions of a variable with resolution q within a time window of l past observations.

• Average operator, $A^{(q,r)}$, that averages all observations of a variable with resolution q within a time window of $2^r$ observations.

Finally, we would like to point out that, while the average operator is perhaps the most intuitive approach for aggregating multiple observations, other operators can also be employed. For instance, an operator based on exponentially weighted moving averages can also be used to give more importance to the most recent observations. Regardless of the type of operator used to aggregate the data, the structure of the models described in the following subsections remains the same.
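As a minimal sketch (ours; the function names are assumptions, not those of Appendix A), the three operators can be written for data held in a NumPy array whose rows are the high-resolution time instants. Keeping the row indexing of the original time axis (with NaN where a value is undefined) makes it straightforward to compose the operators later on.

```python
import numpy as np

def downsample(X, r):
    """D(r): retain only every 2**r-th row (the instants at which a
    resolution-r response is reported)."""
    return np.atleast_2d(X)[2 ** r - 1::2 ** r, :]

def lag(X, l):
    """L(0,l): append time-shifted copies [X(t), X(t-1), ..., X(t-l)] as extra
    columns; the first l rows are NaN because their past is not observed."""
    X = np.atleast_2d(X).astype(float)
    n, m = X.shape
    out = np.full((n, m * (l + 1)), np.nan)
    for k in range(l + 1):
        out[k:, k * m:(k + 1) * m] = X[:n - k, :]
    return out

def average(X, q):
    """A(0,q): average of the last 2**q high-resolution observations (the time
    support ending at each instant); the first 2**q - 1 rows are NaN."""
    X = np.atleast_2d(X).astype(float)
    w = 2 ** q
    out = np.full_like(X, np.nan)
    c = np.cumsum(np.vstack([np.zeros((1, X.shape[1])), X]), axis=0)
    out[w - 1:, :] = (c[w:, :] - c[:-w, :]) / w
    return out
```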

2.1 Partial Least Squares: PLS

Partial least squares (PLS) is one of the most well-known and widely used methodologies for modelling high-dimensional industrial data sets [10, 12, 34]. Its advantage comes from the ability to handle highly collinear data sets, by estimating the low-dimensional predictor subspace with maximum prediction power. More specifically, PLS successively finds the linear combinations of the X-variables with maximal covariance with the linear combinations of the Y-variables [35]. This is usually achieved by application of the NIPALS algorithm [10, 35]. PLS is an estimation principle applicable to the following latent variable model structure [10-12]:

$$\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} \quad (1)$$

$$\mathbf{Y} = \mathbf{T}\mathbf{C}\mathbf{Q}^{T} + \mathbf{F} \quad (2)$$

where T is a (n × p) matrix of latent variables relating X and Y; P is a (m × p) X-loading matrix; Q is a (a × p) Y-loading matrix; C is a (p × p) regression coefficient matrix; E (n × m) and F (n × a) are residual matrices. This PLS parametrization can be recast into a standard linear regression format, $\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{F}$, where B is a (m × a) coefficient matrix given by $\mathbf{B} = \mathbf{W}(\mathbf{P}^{T}\mathbf{W})^{-1}\mathbf{C}\mathbf{Q}^{T}$, with W a (m × p) weighting matrix on X.

Among the many possibilities available to select the number of latent variables, p, the adjusted Wold's criterion was used in this study [36]. This criterion compares the successive prediction residual error sums of squares (PRESS) and keeps the lowest value of p that satisfies the condition $PRESS(p+1)/PRESS(p) > 0.90$.

The formulation of PLS in Equations (1)-(2) is only suitable for single-resolution data sets. Therefore, when applied to multiresolution data, it has to be modified in order to take into account that variables are recorded at different time instants and with different resolutions. For simplicity, it is here assumed that only the response is at a lower resolution. This is a common situation in practice and it implies that some preprocessing has to be done in order to match and synchronize the available observations. The simplest way to do it, without considering any multiresolution processing (i.e., following the procedures currently in use), is to downsample the observations from all predictor variables to the same periodicity of the low resolution response variable, so that only every $2^r$-th observation from each predictor is retained (where r is the resolution of the response). This can be done by use of the downsampling operator, $D^{(r)}$ (see Appendix A.1), leading to two new data blocks: $D^{(r)}\mathbf{Y}$ and $D^{(r)}\mathbf{X}$. Afterwards, the standard PLS model can be applied without limitations. By doing so, the final model structure is given by,


$$D^{(r)}\mathbf{Y}^{(r)} = \left\{D^{(r)}\mathbf{X}^{(0)}\right\}\mathbf{B}^{(0)}_{PLS} \quad (3)$$

Remember that $\mathbf{X}^{(q)}$ represents the version of the original matrix at resolution q; in this case q = 0. (Note that downsampling $\mathbf{X}^{(q)}$ also does not change its resolution.) This model is henceforth called 1:PLS.
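As a hedged sketch of how 1:PLS can be assembled (our illustration, not the authors' implementation): the predictors are downsampled to the response periodicity and a PLS model is fitted, with the number of latent variables chosen by the adjusted Wold criterion applied to cross-validated PRESS values. scikit-learn's PLSRegression is assumed as the PLS engine, and the blocks are assumed to be centered/scaled beforehand.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def cv_press(X, y, p, n_splits=5):
    """Cross-validated PRESS of a PLS model with p latent variables."""
    press = 0.0
    for tr, te in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
        model = PLSRegression(n_components=p).fit(X[tr], y[tr])
        press += np.sum((y[te] - model.predict(X[te]).ravel()) ** 2)
    return press

def fit_1pls(X0, y_r, r, p_max=10):
    """1:PLS (Equation (3)): downsample the high-resolution predictors to the
    response periodicity and fit PLS.  The number of latent variables p is the
    lowest one satisfying PRESS(p+1)/PRESS(p) > 0.90 (adjusted Wold criterion)."""
    Xd = np.atleast_2d(X0)[2 ** r - 1::2 ** r, :]     # D(r) X(0)
    y = np.asarray(y_r, dtype=float).ravel()          # D(r) Y(r): one value per 2**r instants
    Xd = Xd[:len(y)]
    p = 1
    while p < p_max and cv_press(Xd, y, p + 1) / cv_press(Xd, y, p) <= 0.90:
        p += 1
    return PLSRegression(n_components=p).fit(Xd, y), p
```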

2.2 Dynamic Partial Least Squares: DPLS

The standard PLS model does not contemplate any dynamic modeling features, and is unable to capture dependencies of the response on past observations in the X-block. This may represent a limitation for two reasons. First, it misses all possible dynamic relationships, which are likely to exist in process data. Second, even when the data are collected from a static process, if the response is available at a lower resolution, the recorded values are an average of the respective (unknown) high resolution signals within a time window (i.e., the variable's time support; see Figure 1). Therefore, the observed averages are likely to be related with some past observations of the predictors. For these reasons, the matrix of predictors should include not only their present observations (as happens in the standard PLS) but also a window of past observations, leading to a dynamic PLS model formulation (DPLS) [37-39]. The first step to build a DPLS model involves the generation of an extended matrix of predictors that concatenates the current observations and their time-shifted versions (at their native resolution, q = 0), such that,

$$\tilde{\mathbf{X}}^{(0)}(t) = L^{(0,l)}\mathbf{X}^{(0)}(t) = \left[\mathbf{X}^{(0)}(t)\;\; \mathbf{X}^{(0)}(t-1)\;\; \cdots\;\; \mathbf{X}^{(0)}(t-l)\right] \quad (4)$$

where $\mathbf{X}^{(0)}(t-k)$ represents the original high-resolution matrix of predictors (q = 0) shifted k observations into the past and l is the number of past observations to be included in the model (i.e., the number of lags). (Note that this step can also be carried out by application of the lag operator, $L^{(q,l)}$; see Appendix A.2.) The advantage of this procedure is that some (or all) of the observations that would be discarded by PLS during downsampling are now included in the model as time-shifted variables.


Afterwards, it is still necessary to downsample both data sets in order to remove the observations with unknown values in Y, leading to the $D^{(r)}\mathbf{Y}^{(r)}$ and $D^{(r)}L^{(0,l)}\mathbf{X}^{(0)}$ blocks. Note that since lags are added before downsampling, the dynamic structure of the data is preserved. Furthermore, if $l \geq 2^r - 1$, none of the observations is discarded and all observations within the time support of the response are included in the model. As such, DPLS can potentially model some of the multiresolution characteristics. However, this is made by wrongly treating them as dynamic relationships – this is still a single resolution approach. The $D^{(r)}\mathbf{Y}^{(r)}$ and $D^{(r)}L^{(0,l)}\mathbf{X}^{(0)}$ blocks are then fed to the standard PLS estimation procedure, but since the past observations are included in the extended X-block, a DPLS model is obtained in the end, as follows:

$$D^{(r)}\mathbf{Y}^{(r)} = \left\{D^{(r)}L^{(0,l)}\mathbf{X}^{(0)}\right\}\mathbf{B}^{(0)}_{DPLS} \quad (5)$$

The models based on this structure are here referred to as 2:DPLS.
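A corresponding sketch for 2:DPLS (again ours, with assumed names): the lagged matrix of Equation (4) is evaluated only at the instants where the resolution-r response is reported, which combines the lagging and downsampling steps in one pass, and a standard PLS model is then fitted.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def fit_2dpls(X0, y_r, r, l, n_components):
    """2:DPLS (Equation (5)): fit PLS on [X(t), X(t-1), ..., X(t-l)] evaluated at
    the reporting instants of the resolution-r response."""
    X0 = np.atleast_2d(X0)
    y = np.asarray(y_r, dtype=float).ravel()
    t_resp = np.arange(2 ** r - 1, X0.shape[0], 2 ** r)[:len(y)]  # reporting instants
    ok = t_resp >= l                       # instants with l past observations available
    Xl = np.hstack([X0[t_resp[ok] - k, :] for k in range(l + 1)])
    return PLSRegression(n_components=n_components).fit(Xl, y[ok])
```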

2.3 Single-Resolution Forward Stepwise Regression: SR-FSR

Stepwise regression [1] is a sequential approach for fitting regression models by selecting the most relevant predictors based on the statistical significance of their contribution to improving the estimation of the response variable. The forward stepwise procedure starts by building a reference model using the predictor with the highest correlation with the response (if it is significant; Stage 1 of Table 1). Afterwards, a two-stage procedure is run in order to successively add or remove variables from the model. In the first stage, each variable outside the model is tested for inclusion in the model (Stage 2 of Table 1). This test is made by building a tentative model with each one of these variables and performing a partial F-test to assess the contribution of the inserted variable to improving the model's explanation power (the null hypothesis is that its contribution is zero). If any of the tentative coefficients is found to be significant (i.e., it has a p-value lower than the entrance tolerance, meaning that it is highly unlikely that its contribution is zero), then the tentative variable leading to the lowest p-value is added to the model. Afterwards, the new reference model is tested for the removal of any variable (Stage 3 of Table 1). Again, this is made by using a partial F-test. In this case, if the contribution of a variable in the model is found not to be statistically different from zero, the variable with the highest p-value (provided it is higher than an exit tolerance) is removed from the model. This addition/removal sequence is repeated until no further variables are added to or removed from the model. The pseudo-code for the implemented stepwise regression procedure is provided in Table 1. In this study, the entrance tolerance (significance level) was set to 0.05 and the exit tolerance to 0.10 (the defaults in the software platform used, Matlab®, The MathWorks, Inc.).

The standard procedure for stepwise regression assumes that all variables are available at the same resolution, i.e., it is a single-resolution methodology. Furthermore, it does not change the original resolution of the variables. Therefore, it is neither capable of making the best use of all available information (e.g., the time support of a low resolution response covers several past observations of high resolution predictors, but only the most recent observations are used) nor does it optimize the predictors' resolution (e.g., the predictors might have a large uncertainty and thus averaging them over a time window might smooth out the noise and improve their estimates). To apply this modelling strategy, the data sets have to be downsampled to the lowest resolution, leading to the $D^{(r)}\mathbf{Y}$ and $D^{(r)}\mathbf{X}$ blocks. Afterwards, the downsampled blocks can be related through a linear model identical to Equation (3). The only difference with respect to the approach based on PLS is the estimation principle adopted for obtaining the regression coefficients: now these are obtained through ordinary least squares. This type of empirical model is referred to as Single-Resolution Forward Stepwise Regression, or, for short, 3:SR-FSR.

Table 1 Pseudo-code for the single-resolution stepwise regression.

1. Select the predictor with the highest correlation with the response and use it to build a reference model;
2. Inclusion stage:
   a. For each variable not in the reference model, fit a tentative model and perform an F-test to determine if the additional regression coefficient is likely to be equal to zero. Save the respective p-value;
   b. If there is at least one p-value lower than an entrance tolerance, select the variable with the lowest p-value, add it to the model and update the reference model.
3. Removal stage:
   a. For each variable in the reference model, perform an F-test to determine if its regression coefficient is likely to be equal to zero. Save the respective p-value;
   b. If there is at least one p-value greater than an exit tolerance, select the variable with the greatest p-value, remove it from the model and update the reference model.
4. If no variable has been added in Stage 2 nor removed in Stage 3, stop the algorithm. Otherwise, go to Stage 2.
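A compact sketch of this forward stepwise procedure (ours; the original study used the Matlab stepwise implementation), based on ordinary least squares and partial F-tests, with entrance and exit tolerances of 0.05 and 0.10:

```python
import numpy as np
from scipy.stats import f as f_dist

def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit (intercept plus the listed columns)."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def partial_f_pvalue(X, y, cols, j):
    """p-value of the partial F-test for adding column j to the model built on `cols`."""
    dof = len(y) - (len(cols) + 2)                 # intercept + cols + candidate j
    rss_red, rss_full = _rss(X, y, cols), _rss(X, y, cols + [j])
    F = (rss_red - rss_full) / (rss_full / dof)
    return float(f_dist.sf(F, 1, dof))

def sr_fsr(X, y, p_enter=0.05, p_exit=0.10):
    """Forward stepwise regression of Table 1; returns the selected column indices.
    (The initial step of Table 1 -- picking the most correlated predictor -- is
    equivalent to the first inclusion step below.)"""
    selected = []
    while True:
        changed = False
        # Inclusion stage (Stage 2)
        cand = {j: partial_f_pvalue(X, y, selected, j)
                for j in range(X.shape[1]) if j not in selected}
        if cand and min(cand.values()) < p_enter:
            selected.append(min(cand, key=cand.get)); changed = True
        # Removal stage (Stage 3)
        inmod = {j: partial_f_pvalue(X, y, [k for k in selected if k != j], j)
                 for j in selected}
        if inmod and max(inmod.values()) > p_exit:
            selected.remove(max(inmod, key=inmod.get)); changed = True
        if not changed:
            return selected
```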


2.4 Optimal Construction of Multiresolution Soft Sensors for Continuous Processes: MRSS-SC, MRSS-DC

The multiresolution empirical model building methodology for static continuous processes (MRSS-SC) proposed in this article builds on the standard SR-FSR introduced in Section 2.3. However, in the new MRSS-SC framework, the variables' resolution is also subject to selection, together with the predictors to be included in the model. In this way, MRSS-SC has an extra degree of freedom for model building, which is the ability to use lower resolution versions of the X-block variables, if that is advantageous for the properties of the final estimated model, namely its prediction ability and robustness. Now, the model developer has the possibility to tune the resolution of the variables to use, no longer being tied to the native resolution imposed by the data collection system, which resulted from a decision of a third party completely unrelated to the current modelling task, but that otherwise would interfere with the model performance. With MRSS-SC, the ability to tune the resolution returns to the hands of the user.

In brief terms, the underlying idea of MRSS-SC is to simultaneously select the best variables to be included in the model and to search for the best resolution to use for the selected variables. The search space for the variables' resolution is constrained to be between the native resolution of the variable (q = 0) and a given maximum resolution defined by the user (q = q_max). For simplicity, it is assumed that all predictors have the same native resolution. However, the procedure can still be applied even if their native resolutions are different from each other. The only modification to the algorithm is the minimum resolution of the search space, which becomes dependent on the variable under analysis.

The overall MRSS-SC model structure is obtained by building an extended matrix of predictors containing duplicates of the original variables over all resolutions under analysis. The signal at each resolution is computed through the use of the average operator ($A^{(q,r)}$). Afterwards, all resolutions are concatenated into an extended matrix,

$$\mathbf{X} = \left[A^{(0,0)}\mathbf{X}^{(0)}\;\; A^{(0,1)}\mathbf{X}^{(0)}\;\; \cdots\;\; A^{(0,q_{max})}\mathbf{X}^{(0)}\right] \quad (6)$$


This extended matrix is then used for variable-resolution selection, following a procedure similar to that of SR-FSR, but subject to some extra selection rules during model building (see below). Furthermore, since the response is observed less frequently, the response and predictor blocks need to be downsampled. Consequently, in a rigorous multiresolution nomenclature, the MRSS-SC model is given by,

$$D^{(r)}\mathbf{Y}^{(r)} = D^{(r)}\mathbf{X}\mathbf{B}_{MRSS} \;\Leftrightarrow\; D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=0}^{q_{max}}\left\{D^{(r)}A^{(0,q)}\mathbf{X}^{(0)}\right\}\mathbf{B}^{(q)}_{MRSS} \quad (7)$$

where $\mathbf{B}_{MRSS} = \left[\mathbf{B}^{(0)\,T}_{MRSS}\;\; \mathbf{B}^{(1)\,T}_{MRSS}\;\; \cdots\;\; \mathbf{B}^{(q_{max})\,T}_{MRSS}\right]^{T}$.

In summary, Equation (7) indicates that, for all possible resolutions (q = 0, ..., q_max), the predictors are averaged over a time support of $2^q$ observations that contains the current observation (using $A^{(0,q)}$), and then downsampled to the response sampling rate (using $D^{(r)}$), after which they are finally multiplied by the regression coefficients ($\mathbf{B}^{(q)}_{MRSS}$).

Since the model is composed of all combinations of variables and resolutions, each one of these combinations can be interpreted as a pseudo-variable representing a specific variable-resolution combination.
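To make the structure of Equations (6)-(7) concrete, a small sketch (ours, with assumed helper names) builds the extended matrix of variable-resolution pseudo-variables and downsamples it to the response periodicity; the selection procedure of Table 2 then operates on the columns of this matrix.

```python
import numpy as np

def trailing_average(X, q):
    """A(0,q): average of the last 2**q high-resolution observations ending at each
    instant; the first 2**q - 1 rows are NaN (start-up transient)."""
    X = np.atleast_2d(X).astype(float)
    w = 2 ** q
    out = np.full_like(X, np.nan)
    c = np.cumsum(np.vstack([np.zeros((1, X.shape[1])), X]), axis=0)
    out[w - 1:, :] = (c[w:, :] - c[:-w, :]) / w
    return out

def mrss_design(X0, r, q_max):
    """Extended matrix of Equation (6), downsampled to the response periodicity as in
    Equation (7).  Each column is a pseudo-variable: variable j at resolution q."""
    X0 = np.atleast_2d(X0)
    Xbar = np.hstack([trailing_average(X0, q) for q in range(q_max + 1)])
    Xbar = Xbar[2 ** r - 1::2 ** r, :]              # apply D(r) to the extended block
    labels = [(j, q) for q in range(q_max + 1) for j in range(X0.shape[1])]
    return Xbar, labels   # rows containing NaN (start-up) would be dropped before fitting
```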

In comparison to the other methodologies, we recall that SR-FSR, PLS and DPLS use only the native resolution of the predictors, whereas MRSS-SC can change or tune their resolution in order to optimize the prediction accuracy. Furthermore, if the maximum resolution (q_max) is set to zero, the MRSS-SC model becomes comparable to SR-FSR and PLS (the only difference being the approach used to estimate the coefficients). Likewise, the model of the dynamic counterpart of MRSS-SC (to be introduced later in this article) is similar to DPLS when the maximum resolution is zero.

By considering the variable-resolution combinations as the new set of variables, the standard SR-FSR can then be applied. However, since SR-FSR does not make any distinction regarding the variables' resolution, the same variable could be included in the model more than once, but at different resolutions. To avoid this situation, in the proposed MRSS-SC framework, the eligible variable-resolutions are restricted to the set of variables not previously included in the model. In other words, if a given variable is already in the model, all other variable-resolutions sharing the same variable are not


considered for further addition to the model. The reasoning for selecting only one resolution for each variable is to enforce the construction of intuitive, interpretable and parsimonious models, as well as to mitigate the over-representation of variables (for instance, it may be interesting to know at which resolution a given variable is most predictive).

Another difference relative to SR-FSR is the selection criterion. In MRSS-SC, the criterion for variable-resolution selection is based on the root mean squared error (RMSE) computed from K-fold cross-validation. For each validation fold k = 1, ..., K, the RMSE is computed as:

$$RMSE(k) = \sqrt{\frac{1}{h}\sum_{i=1}^{h}\left(y_i - \hat{y}_i\right)^2} \quad (8)$$

where $y_i$ is the observed response, $\hat{y}_i$ is the estimated response and $h = \lfloor n/K \rfloor$ is the number of observations in a fold. By using the RMSE as selection criterion, variable-resolution combinations are added to or removed from the model based on a robust assessment of their ability to explain and improve the response estimates. Furthermore, in order to avoid local optima, the cross-validation procedure is replicated c times. Each replicate is based on a sampled data set obtained by taking K·h random observations from the original $D^{(r)}\mathbf{Y}^{(r)}$ and $D^{(r)}\mathbf{X}$. Therefore, only one data set is needed for cross-validation. In these conditions, given the randomization of observations and their subdivision into folds, the training and validation data sets used to compute the RMSE are highly likely to be rather distinct from each other, making the assessment process conservative and therefore robust.

The main steps of the MRSS-SC procedure are described in more detail in the following paragraphs and summarized in Table 2. To initiate MRSS-SC (Stage 3 of Table 2), the algorithm starts by selecting the variable-resolution that leads to the lowest M-RMSE (M-RMSE is the median of CV-RMSE; for each combination of variable-resolution, the algorithm returns a CV-RMSE vector of dimensions (c·K × 1), composed of the RMSE of each of the K folds in every one of the c replicates). Since no variable is yet included in the model, it is necessary to test all possible univariate models. This is done for all combinations of variables and resolutions, from which their CV-RMSE and M-RMSE are retrieved following the cross-validation


approach described before. Based on this information, the variable-resolution with the lowest M-RMSE is selected for building a reference model. The respective CV-RMSE and M-RMSE are also saved for future reference.

In the following stage (Stage 4 of Table 2), the algorithm searches for the next variable-resolution to be added to the model. In analogy to the standard SR-FSR, this is done by tentatively adding a variable-resolution to the model and computing the respective performance metrics (CV-RMSE and M-RMSE). The variable-resolution selection is then made by testing whether there is any statistical improvement in the M-RMSE after adding the tentative variable-resolution. This is done by resorting to a paired Wilcoxon signed rank test [40, 41], comparing the tentative CV-RMSE and the reference CV-RMSE. The Wilcoxon signed rank test is a non-parametric test used to determine if the difference between two paired samples comes from a distribution with a median equal to zero. Therefore, when the paired Wilcoxon signed rank test is applied to the tentative CV-RMSE and reference CV-RMSE, it can be used to infer whether the M-RMSE of both models are statistically different (i.e., the p-value of the statistical test is lower than the entrance tolerance). Based on this, only the tentative variable-resolutions that lead to statistically different M-RMSE are considered for addition to the model. Furthermore, their M-RMSE should be lower than the reference M-RMSE in order to ensure that an improvement is obtained. Among the variable-resolutions that respect such conditions (and only for those variables that are not yet in the model), the variable-resolution leading to the lowest M-RMSE is added to the model. Subsequently, the reference model, CV-RMSE and M-RMSE are updated.

After selecting which variable-resolution is added to the model, the algorithm determines if any of the variable-resolutions in the model should be removed (Stage 5 of Table 2). This is done by tentatively removing each variable-resolution and comparing the respective M-RMSE with the reference. Again, this comparison is done through the paired Wilcoxon signed rank test between the tentative CV-RMSE and reference CV-RMSE. Furthermore, the variable-resolution to be removed should lead to a decrease in the M-RMSE. Therefore, the variable-resolutions of interest are the ones with p-value lower than an exit tolerance and with lower M-RMSE than the reference model. Among these variable-resolutions, the one that leads to the lowest M-RMSE is selected and


removed from the model. Afterwards, the reference model, CV-RMSE and M-RMSE are updated.

The “inclusion” and “removal” stages are then repeated until no further variable-resolution is either added or removed from the model. In the current study, the entrance tolerance (or significance level) was set to 0.05, while the exit tolerance was set to 0.10, in order to ensure that the results are comparable with SR-FSR. The number of cross-validation replicates (c) was set to 20 and the number of folds (K) was set to 5. The results obtained with the application of this MultiResolution procedure for Soft Sensor development in Static Continuous systems are reported under 4:MRSS-SC.

Although we are only reporting the results for the above selection strategy, other selection criteria were also considered in this study. The most relevant of them is the use of an F-test on the regression coefficients, as in SR-FSR. Another alternative approach is to select the resolutions using Genetic Algorithms [42-44], having the M-RMSE as the optimality criterion. All of these scenarios led, however, to similar results, which further corroborates the solution selected to be implemented in MRSS-SC.

Besides searching for the optimal set of variables and their resolution, MRSS-SC can also be used to fit dynamic models. This can be accomplished by adding time-shifted versions of the original set of predictors (as in the case of DPLS). Afterwards, each time-shift can be treated as a new variable, which is also subject to selection and resolution optimization. For modelling purposes, this translates into the application of Equation (4) to the original data (leading to $\tilde{\mathbf{X}}^{(0)}$) and subsequently averaging each time-shift over the resolutions of interest:

$$\mathbf{X}_{dyn} = \left[A^{(0,0)}\tilde{\mathbf{X}}^{(0)}\;\; A^{(0,1)}\tilde{\mathbf{X}}^{(0)}\;\; \cdots\;\; A^{(0,q_{max})}\tilde{\mathbf{X}}^{(0)}\right] \quad (9)$$

The final model is then represented by,

$$D^{(r)}\mathbf{Y}^{(r)} = D^{(r)}\mathbf{X}_{dyn}\mathbf{B}_{MRSS} \;\Leftrightarrow\; D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=0}^{q_{max}}\left\{D^{(r)}A^{(0,q)}L^{(0,l)}\mathbf{X}^{(0)}\right\}\mathbf{B}^{(q)}_{MRSS} \quad (10)$$

The order of operations in Equation (10) establishes that the predictors are first shifted in time within a time window of l observations (using $L^{(0,l)}$). Then each time-shifted version is averaged over a time support of $2^q$ observations (using $A^{(0,q)}$), downsampled to the response time support (using $D^{(r)}$) and finally multiplied by the regression coefficients ($\mathbf{B}^{(q)}_{MRSS}$). Consequently, the model can accommodate all combinations of variables, resolutions and time-shifts. However, by application of the MRSS procedure, only some of those combinations are actually included in the model. Finally, it should be noted that the past information introduced by the lag operator and the average operator serve different purposes. The lag operator is used to account for process dynamics and does not change the variables' resolution. On the other hand, the average operator is used to pre-process each variable, returning its output at several distinct resolutions. This MultiResolution framework for Soft Sensor development in Dynamic Continuous processes is here defined as 5:MRSS-DC.

Table 2 Pseudo-code for the multiresolution stepwise regression.

variables, resolutions and time-shifts. However, by application of the MRSS procedure, only some of those combinations are actually included in the model. Finally, it should be noted that the past information introduced by the lag operator and average operator serve different purposes. The lag operator is used to account for process dynamics and does not change the variables resolution. On the other hand, the average operator is used to pre-process each variable, returning its output at several distinct resolutions. This MultiResolution framework for Soft Sensor development, in Dynamic Continuous processes, is here defined as 5: MRSS-DC. Table 2 Pseudo-code for the multiresolution stepwise regression.

1. 2.

Build an extended data set with all combinations of variables and resolutions. Build c random permutations of the original extended data set for cross-validation. Each data set is composed by K·h observations randomly selected from the original extended data set. 3. Fit an initial model: a. For each combination of variable-resolution, fit a tentative model on each c random set and compute the respective CV-RMSE obtained by K-fold cross-validation; Save the CVRMSE and M-RMSE; b. Select the variable-resolution that has the lowest M-RMSE. Save its model and CV-RMSE for reference. 4. Inclusion stage: a. For each variable not in the reference model: i. For all resolutions, fit a model with the tentative variable-resolution on each c random set and compute the respective CV-RMSE obtained by K-fold cross-validation. Compute the p-value of the Wilcoxon signed rank test between the tentative CVRMSE and reference CV-RMSE. Save the M-RMSE and p-value; b. For the tentative models with p-value lower than an entrance tolerance and M-RMSE lower than the reference M-RMSE, select the variable-resolution with lowest M-RMSE, add it to the model and update the reference model and CV-RMSE. 5. Removal stage: a. For each variable-resolution in the reference model: i. Fit a model without the tentative variable-resolution on each c random set and compute the respective CV-RMSE obtained by K-fold cross-validation. Compute the p-value of the Wilcoxon signed rank test between the tentative CV-RMSE and reference CV-RMSE. Save the M-RMSE and p-value; b. For the tentative models with p-value lower than an exit tolerance and M-RMSE lower than the reference M-RMSE, select the variable-resolution with lowest M-RMSE, remove it from the model and update the reference model and CV-RMSE. 6. If no variable has been added in Stage 4 nor removed in Stage 5, stop the algorithm. Otherwise, go to Stage 4.

2.5

Summary

The soft sensor methodologies presented in this section are summarized in Table 3. The models are defined by the operators in Appendix A, which also codify the amount of 18 ACS Paragon Plus Environment

Page 18 of 39

Page 19 of 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Industrial & Engineering Chemistry Research

information used by each modelling strategy. Note however that the variable selection methodologies restrict the amount of variables that are included in the model and thus some of the elements of the B(q) matrices are actually zero.

Table 3 Summary of the modelling approaches considered in this work, their short designation and respective defining relationships.

Number | Short name | Model
1 | PLS | $D^{(r)}\mathbf{Y}^{(r)} = \{D^{(r)}\mathbf{X}^{(0)}\}\mathbf{B}^{(0)}_{PLS}$
2 | DPLS | $D^{(r)}\mathbf{Y}^{(r)} = \{D^{(r)}L^{(0,l)}\mathbf{X}^{(0)}\}\mathbf{B}^{(0)}_{DPLS}$
3 | SR-FSR | $D^{(r)}\mathbf{Y}^{(r)} = \{D^{(r)}\mathbf{X}^{(0)}\}\mathbf{B}^{(0)}_{FSR}$
4 | MRSS-SC | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=0}^{q_{max}}\{D^{(r)}A^{(0,q)}\mathbf{X}^{(0)}\}\mathbf{B}^{(q)}_{MRSS}$
5 | MRSS-DC | $D^{(r)}\mathbf{Y}^{(r)} = \sum_{q=0}^{q_{max}}\{D^{(r)}A^{(0,q)}L^{(0,l)}\mathbf{X}^{(0)}\}\mathbf{B}^{(q)}_{MRSS}$

3 Case studies

In order to illustrate the added value of the proposed methodology, we have applied it to two simulated case studies representing typical industrial scenarios. These contemplate situations where the data have a single-resolution structure and a multiresolution structure, in order to demonstrate that both cases can benefit from the use of the multiresolution frameworks for soft sensor construction. The first case study consists of a continuous stirred-tank reactor where an endothermic reaction takes place, while the second case study concerns the simulation of a distillation column. The use of such simulated scenarios enables direct control of the testing and comparison conditions and access to the true values of the response, necessary for rigorously computing the effective estimation performance, while keeping their complexity close to that exhibited by real systems.

All case studies were replicated 100 times, to safeguard against abnormal realizations as well as to assess the robustness of the modeling methodologies. Each replicate is composed of two independent data sets used for training and testing. Each data set contains $2^r \times 1000$ observations, where r is the resolution of the response. However, since the response has a time support of $2^r$, only 1000 observations (corresponding to the time instants at which the response is observed) can be used for training and testing. The additional inter-sample observations collected for the predictors (which have time support $2^0 = 1$) are treated according to the specifications of each modeling strategy. That is, for 1:PLS and 3:SR-FSR they are discarded (downsampled); for 2:DPLS they are added as time-shifted variables; for 4:MRSS-SC and 5:MRSS-DC they are averaged into low resolution variables. The number of latent variables retained in the 1:PLS and 2:DPLS models was selected through the adjusted Wold's criterion (see Section 2.1). The performance assessment is based on the mean squared error (MSE, Equation (11)), using the noise-free signal as reference. The relative performance of the methodologies was further assessed by the use of a paired t-test.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (11)$$

3.1 Case Study 1: Continuous Stirred-Tank Reactor

The first process structure simulates a dynamical model of a continuous stirred-tank reactor (CSTR) [45] with free discharge flow, where an endothermic reaction of the type A → B takes place (see Figure 2). The system is also composed of a heating jacket and is under a PI control system in order to maintain the temperature and fluid level close to their set-points. The inputs of the system are the feed stream concentration of compound A (CA0) and temperature (T0) and the heating fluid inlet temperature (Tj0). The system outputs are the CSTR level (h), the concentration of compound A (CA), the temperature (T) and the heating fluid outlet temperature (Tj). All variables are corrupted by white noise with a signal-to-noise ratio (SNR) of 20 dB, the SNR being defined as $SNR = 10\log_{10}\left(\mathrm{var}(signal)/\mathrm{var}(noise)\right)$.
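A minimal sketch (ours) of how measurement noise at a prescribed SNR can be generated for such a simulation, following the definition above:

```python
import numpy as np

def add_noise(signal, snr_db=20.0, rng=None):
    """Corrupt a noise-free signal with white Gaussian noise at the given SNR (dB),
    with SNR = 10*log10(var(signal)/var(noise))."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise_var = np.var(signal) / 10 ** (snr_db / 10)
    return np.asarray(signal, dtype=float) + rng.normal(scale=np.sqrt(noise_var),
                                                        size=np.shape(signal))
```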

Figure 2 Process flow diagram of the CSTR.

In this case study, the concentration of compound A (CA) is used as the response variable, while the remaining variables are used as predictors. Furthermore, two scenarios where the response has different resolutions were considered. In the first case, the response variable is at the highest resolution (r = 0), representing, for instance, a local high resolution sensor. This is the case where the original data are available at a single resolution, and we explore the benefits of introducing a multiresolution structure in the modeling stage. As for the second case, the response variable is defined as a low resolution variable, containing the average values over 4 time instants (r = 2). This situation can be interpreted as compound samples resulting from the collection of samples at 4 consecutive time instants, which are then mixed and sent for laboratory analysis. In both cases, the predictors are available at the highest resolution (q = 0). The behavior of these variables is exemplified in Figure 3.

Since the native resolution of the predictors may not be the best resolution to model the response, the multiresolution methodologies (4:MRSS-SC and 5:MRSS-DC) are allowed to search for other lower resolution versions up to a maximum of q_max = 5. That is, the predictors can be averaged over dyadic windows with at most $2^5 = 32$ observations. Regarding the process dynamics, the maximum number of lags was set to 16. For 2:DPLS the number of lags is selected by testing all possible lags and then using the model leading to the lowest MSE computed from cross-validation on the training data set. As for 5:MRSS-DC, the time-shifted variables are treated as regular variables and the methodology is left free to choose which combinations of variables, time-shifts and resolutions are included in the model.

Figure 3 Graphical representation of the response at high resolution (CA(0)) and low resolution (CA(2)) and its predictors at high resolution (h, T, Tj, CA0, T0 and Tj0).
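Before turning to the results, a short sketch (ours) of the assessment protocol described at the beginning of Section 3: the MSE of each soft sensor is computed against the noise-free response (Equation (11)), and two methods are compared across the 100 replicates with a paired t-test.

```python
import numpy as np
from scipy.stats import ttest_rel

def mse(y_noise_free, y_est):
    """Mean squared error against the noise-free response (Equation (11))."""
    return float(np.mean((np.asarray(y_noise_free) - np.asarray(y_est)) ** 2))

# mse_dpls and mse_mrssdc would be length-100 arrays with the per-replicate test-set
# MSEs of, e.g., 2:DPLS and 5:MRSS-DC; the paired t-test then assesses whether their
# difference in performance is statistically significant:
# t_stat, p_value = ttest_rel(mse_dpls, mse_mrssdc)
```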

The MSEs for the scenario where the response is at the highest resolution are presented in Figure 4 (single-resolution data sets). For reference, we note that the variance of the response is 4.2 × 10^-4 and that the variance of its noise is 3.9 × 10^-6. A comparison between the estimates of 2:DPLS and 5:MRSS-DC is also provided in Figure 5. Figure 4 clearly indicates that the worst modelling approaches are 1:PLS and 3:SR-FSR. The main reasons for these results are that they are neither able to model process dynamics nor try to improve the model description by changing the predictors' resolution. In turn, when 2:DPLS is applied, a significant improvement can be observed. Nevertheless, the estimates of 2:DPLS are still somewhat distant and dispersed w.r.t. the real values (see Figure 5 (a)). The 2:DPLS approach also presented a tendency to select a considerably large number of lags (16 lags 86% of the time and 15 lags 10% of the time), but since the resolution of the predictors is kept unchanged, the past information is only being used to estimate the response, rather than smoothing out the predictors. On the other hand, when the variables' resolution is also optimized, a significant improvement can be observed for 4:MRSS-SC and 5:MRSS-DC. These results also show that the main cause of improvement in the soft sensors' predictive performance is the selection of the predictors' resolution. Therefore, even when the response is at the same resolution as the predictors, there can be an advantage in changing the resolution of the predictors to optimize the performance of the soft sensors.

To gain further insight into the factors leading to this result, the percentage of times that each resolution was selected by 4:MRSS-SC is represented in Figure 6 for each predictor. From this figure it can be verified that most of the predictors are set to a low resolution. This is most relevant for CA0 (with resolution 5 in 89% of the cases) and T (with resolution 5 in 94% of the cases). On the other hand, h is often selected at the highest resolution (with resolution 0 in 99% of the cases). From the process dynamics, it is known that these three variables have a direct effect on CA and thus it comes as no surprise that they are found to be the most relevant during model building. Furthermore, the lower resolutions attributed to CA0 and T indicate that the relationships between the output concentration and these variables are characterized by slow-evolving dynamics. That is, CA is not just the result of the instantaneous input concentration and reactor temperature, but rather of their past values and evolution over time.

Similar results are obtained when the response variable is at a lower resolution than the predictors, see Figures 7 to 9. Nevertheless, it is now visible that the lower resolution of the response also induces the selection of predictors at even lower resolutions, which highlights even further the advantages of 5:MRSS-DC in simultaneously coping with process dynamics and the low resolution character of the response. Furthermore, this example also shows that while just optimizing the resolution of the predictors (4:MRSS-SC) leads to significant improvements relative to the standard 2:DPLS, the major gain in performance is obtained when the process dynamics is complemented by the multiresolution selection approach (5:MRSS-DC).

Figure 4 Boxplot of MSE for 100 replicates of the CSTR case study, when all variables are at the highest resolution.

Figure 5 Relation between the real response (y) and its estimate (yest) using (a) 2:DPLS and (b) 5:MRSS-DC, when all variables are at their highest resolution.

[Figure 6: six bar charts, one per predictor (h, T, Tj, CA0, T0, Tj0), showing the percentage of replicates (0–100%) in which each resolution (0–5) was selected.]
Figure 6 Selected resolution for each predictor in the CSTR case study, when all original variables are available at the highest resolution.

Figure 7 Boxplot of MSE for 100 replicates of the CSTR case study, when the predictors are at the highest resolution (q = 0) and the response is at the lowest resolution (r = 2).

Figure 8 Relation between the real response (y) and its estimate (yest) using (a) 2:DPLS and (b) 5:MRSS-DC, when the predictors are at the highest resolution (q = 0) and the response is at the lowest resolution (r = 2).

[Figure 9: six bar charts, one per predictor (h, T, Tj, CA0, T0, Tj0), showing the percentage of replicates (0–100%) in which each resolution (0–5) was selected.]
Figure 9 Selected resolution for each predictor in the CSTR case study, when the predictors are at the highest resolution (q = 0) and the response is at the lowest resolution (r = 2).

3.2 Case Study 2: Distillation Column

In this section, the prediction capabilities of the different soft sensors are tested on a simulation scenario of a distillation column operating according to the model described in Ref. 46 (available at http://www.nt.ntnu.no/users/skoge/distillation/nonideal_skouras/ternary; accessed on August 29, 2016). The distillation column separates a ternary mixture of methanol, ethanol and 1-propanol. The model is based on non-ideal vapor-liquid equilibria, computed through the Wilson equation. The column is composed of 41 stages, including a reboiler (stage 1) and a total condenser (stage 41). The feed is located at stage 21. The inputs of the system are the reflux (L) and boilup (V) flow rates, with disturbances in the feed flow (F) and in the feed composition (xA, xB and xC). The system is also regulated by two PI control loops on the condenser holdup (MD) and reboiler holdup (MB), which are controlled through the distillate flow (D) and bottom flow (B), respectively. Variability is introduced in the system through the feed flow (F) and feed composition (xA, xB and xC), whose behavior is described by the stochastic equations presented in Table 4. These variations cause fluctuations within a range of about ± 10 % of the nominal value. Based on these disturbances, the simulator returns the compositions and temperatures in all stages of the column. The measurements are finally corrupted with white noise with an SNR of 20 dB.
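As a concrete illustration of this last step, the sketch below corrupts a simulated measurement with zero-mean white noise at a prescribed SNR, assuming the common definition SNR(dB) = 10·log10(var_signal / var_noise); the function name and the use of the empirical signal variance are assumptions of this sketch rather than details taken from the original simulations.

import numpy as np

def add_white_noise(x, snr_db=20.0, rng=None):
    # Add zero-mean Gaussian white noise so that
    # 10*log10(var(x) / var(noise)) = snr_db (assumed SNR definition).
    rng = np.random.default_rng() if rng is None else rng
    var_noise = np.var(x) / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(0.0, np.sqrt(var_noise), size=x.shape)

# Example: corrupt a temperature-like profile with 20 dB noise
x = np.linspace(340.0, 355.0, 200)
x_noisy = add_white_noise(x, snr_db=20.0, rng=np.random.default_rng(1))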

Table 4 Configuration of the disturbances over time. ω(t;φ) represents an autoregressive time series with unit variance and autoregressive coefficient φ.

Disturbance      Equation
F [kmol/min]     F(t) = 1 + 0.033 × ω(t; 0.9)
xA               xA = zA / (zA + zB + zC),  zA(t) = 0.4 + 0.033 × ω(t; 0.95)
xB               xB = zB / (zA + zB + zC),  zB(t) = 0.4 + 0.033 × ω(t; 0.95)
xC               xC = zC / (zA + zB + zC),  zC(t) = 0.2 + 0.033 × ω(t; 0.95)
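For reproducibility, the disturbances in Table 4 can be generated as in the sketch below, which assumes that ω(t; φ) is a stationary first-order autoregressive sequence with unit marginal variance, i.e. ω(t) = φ·ω(t−1) + ε(t) with ε ~ N(0, 1 − φ²); the initialization, series length and random seed are illustrative choices.

import numpy as np

def ar1_unit_variance(n, phi, rng):
    # Stationary AR(1) series with unit marginal variance:
    # w(t) = phi * w(t-1) + e(t),  e ~ N(0, 1 - phi**2)
    w = np.empty(n)
    w[0] = rng.normal(0.0, 1.0)
    e = rng.normal(0.0, np.sqrt(1.0 - phi ** 2), size=n)
    for t in range(1, n):
        w[t] = phi * w[t - 1] + e[t]
    return w

rng = np.random.default_rng(2)
n = 2000
F = 1.0 + 0.033 * ar1_unit_variance(n, 0.90, rng)    # feed flow [kmol/min]
zA = 0.4 + 0.033 * ar1_unit_variance(n, 0.95, rng)
zB = 0.4 + 0.033 * ar1_unit_variance(n, 0.95, rng)
zC = 0.2 + 0.033 * ar1_unit_variance(n, 0.95, rng)
zT = zA + zB + zC
xA, xB, xC = zA / zT, zB / zT, zC / zT               # normalized feed composition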

In typical configurations of distillation columns, the variable of interest that defines the quality of the product is the distillate composition. Furthermore, this variable is often obtained by laboratory analysis and has a low resolution (i.e., it is an overall measure of quality over a given time window). To reflect this characteristic, the distillate composition returned by the simulator is converted into a low-resolution variable with resolution r = 2 by averaging the original signal over a time support of 4 sampling times. Given the importance of this variable, a soft sensor is to be developed using the temperatures from the column stages as predictors. These temperature readings are all available at the highest resolution (q = 0). After downsampling the temperatures to the sampling rate of the distillate composition, 1:PLS and 3:SR-FSR can be applied without further processing. Consequently, both approaches lose the information conveyed by the discarded samples. As an alternative, 2:DPLS accommodates these observations through the use of time-shifted variables, which also allows the process dynamics to be accounted for. The maximum number of lags was set to l = 16, and the actual number of lags was selected by cross-validation on the training data sets through minimization of the MSE. Based on this procedure, 2:DPLS tends to use between 5 and 7 lags. However, it should be noted that 2:DPLS still uses the native resolution of the temperatures. The multiresolution aspect is introduced by 4:MRSS-SC and 5:MRSS-DC. In both approaches, the lowest resolution that can be selected is set to qmax = 5, meaning that, at most, each temperature is averaged over 32 observations.

For 5:MRSS-DC, the maximum number of lags was also set to l = 16. In this case, each time-shifted version of the original variables is treated as a regular variable, which is also tested at all possible resolutions, as illustrated in the sketch below.
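For concreteness, the candidate set handled by 5:MRSS-DC can be pictured as one column per combination of predictor, lag (0 to l = 16) and resolution (q = 0 to qmax = 5). The sketch below builds such a candidate matrix using a simple causal-average coarsening; it only illustrates the combinatorial structure of the search space, not the selection algorithm itself, and the function names, the coarsening scheme and the example variable names are assumptions of this sketch.

import numpy as np

def coarsen(x, q):
    # Causal average over 2**q samples (illustrative coarsening scheme).
    w = 2 ** q
    if w == 1:
        return x.astype(float)
    y = np.convolve(x, np.ones(w) / w, mode="full")[: len(x)]
    y[: w - 1] = np.nan
    return y

def candidate_set(X, names, max_lag=16, q_max=5):
    # Build the lag x resolution candidate predictors from a matrix X of
    # shape (n_samples, n_vars) recorded at the native resolution (q = 0).
    cols, labels = [], []
    for j, name in enumerate(names):
        for q in range(q_max + 1):
            xq = coarsen(X[:, j], q)
            for lag in range(max_lag + 1):
                shifted = np.full(len(xq), np.nan)
                shifted[lag:] = xq[: len(xq) - lag]
                cols.append(shifted)
                labels.append(f"{name}(q={q}, lag={lag})")
    return np.column_stack(cols), labels

# Example with two hypothetical stage temperatures
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 2))
Z, labels = candidate_set(X, ["T_stage5", "T_stage25"], max_lag=16, q_max=5)
print(Z.shape)   # (600, 2 variables x 6 resolutions x 17 lags) = (600, 204)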

The MSEs for these methodologies over the 100 replicates are depicted in Figure 10. Note that the low MSE values in Figure 10 are caused by the low magnitude of the distillate concentration, which is on average 1.7 × 10-4 with a variance of 1.3 × 10-9. Furthermore, the added noise has a variance of 3.3 × 10-12. These results are complemented by the paired t-tests in Table 5.
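The pairwise comparisons reported in Table 5 can be reproduced by applying a paired t-test to the 100 replicate-wise MSE values of any two methods. The sketch below uses scipy.stats.ttest_rel with placeholder MSE vectors; the actual MSE values, as well as the sidedness of the test used to declare one method better than the other, are not reproduced here.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Placeholder MSE vectors for two methods over the 100 replicates
# (in practice these come from the test-set errors of each soft sensor).
mse_A = rng.normal(2.0e-11, 3.0e-12, size=100)
mse_B = rng.normal(2.6e-11, 3.0e-12, size=100)

t_stat, p_value = stats.ttest_rel(mse_A, mse_B)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")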

Figure 10 Boxplot of MSE for 100 replicates of the distillation column case study, when the predictors are at the highest resolution (q = 0) and the response is at the lowest resolution (r = 2).

Table 5 Paired t-tests applied to the MSE of the modelling methodologies in the distillation column case study, when the predictors are at the highest resolution (q = 0) and the response is at the lowest resolution (r = 2). The p-value and test statistic are presented for each pair A – B. Values in bold represent cases where A is statistically better than B; underlined values are cases where A and B are statistically equivalent.

A 1:PLS 2:DPLS 3:SR-FSR 4:MRSS-SC 5:MRSS-DC

1:PLS