
Ind. Eng. Chem. Res. 2006, 45, 6330-6338

Generalized Multiresolution Decomposition Frameworks for the Analysis of Industrial Data with Uncertainty and Missing Values

Marco S. Reis* and Pedro M. Saraiva

GEPSI-PSE Group, Department of Chemical Engineering, University of Coimbra, Pólo II - Rua Sílvio Lima, 3030-790 Coimbra, Portugal

* To whom correspondence should be addressed. Tel.: +351 239 798 700/727. Fax: +351 239 798 703. E-mail: [email protected].

In this paper, we address the problem of extending the multiscale decomposition framework based upon the wavelet transform to situations where datasets contain any type of missing data pattern (e.g., random, multirate). Furthermore, the proposed approaches explicitly integrate data uncertainty information into their algorithms, in order to exploit all knowledge available about the data during the decomposition stage, as well as to enable its posterior use in subsequent data-analysis tasks. Besides their role in the multiscale decomposition of complex datasets, these frameworks, called generalized multiresolution decomposition frameworks (GMRD), also lead to new developments in data-analysis tools based upon the information they provide, namely, in the selection of relevant scales for data analysis and in the improvement of signal-denoising procedures. Guidelines are presented regarding their adequate use and how they can be combined into an integrated multiscale data-analysis framework. The pertinence of the proposed GMRD frameworks is illustrated through several examples involving simulated, laboratory, and industrial datasets.

1. Introduction

The multiresolution decomposition analysis based upon the wavelet transform is instrumental when one needs to focus data analysis at a particular scale or to separate the several contributions to an overall phenomenon, arising from different scales either in time or length. It has been playing an important role in several domains since its original introduction by Mallat,1 either from a theoretical perspective, for instance in the constructive procedure of new orthogonal basis sets with compact support proposed by Daubechies,2 or from a practical point of view, by allowing the unification of several similar approaches used in separate fields under different names, such as "multiresolution signal processing" from computer vision, "pyramid algorithms" from image processing, "subband coding" and "filter banks" from signal processing, and "quadrature mirror filters" from digital speech processing.3-6 Examples of applied domains where new and improved approaches have begun to emerge as a consequence of applying wavelet-based multiresolution decomposition analysis include signal and image denoising7,8 and compression,9 process monitoring,10 system identification,11,12 time series analysis,13 regression, classification and clustering,14,15 and numerical analysis.16

However, this wavelet-based multiresolution decomposition framework can only be straightforwardly applied in those situations where there are no missing data in the datasets to be processed, a condition quite often not verified in industrial and laboratory applications, for a number of reasons, such as the occurrence of process upsets; problems in the chain "process sensor/transmission system/storage unit"; sensors with different acquisition rates (multirate sensors) and/or acquisition times; and laboratory measurements made available only after a certain time period, and sometimes with invalid results (thus giving rise to "blanks" in the datasets), among others. Therefore, in this paper, we put forward generalized multiresolution decomposition frameworks (GMRD) that are able to cope with any missing data structure, enabling data analysis to be

straightforwardly carried out at several scales under these circumstances, while furthermore opening new opportunities to develop applications that explore the type of data structures they provide, such as the selection of adequate scales for performing data analysis (the term "generalized" reflects the broader dataset contexts where such frameworks can now be applied).

Another theme that has gained considerable importance recently regards the specification and incorporation of uncertainty information in data analysis. Measurement uncertainty17-19 is a key concept in metrology, defined as a "parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand"17 (the measurand being the "particular quantity subject to measurement"). According to the "Guide to the Expression of Uncertainty in Measurement" (GUM),17 the standard uncertainty, u(x_i) (to which we will refer here simply as uncertainty), is expressed in terms of a standard deviation of the values collected from a series of observations (the so-called Type A evaluation) or through other adequate means (Type B evaluation), namely, relying upon an assumed probability density function based on the degree of belief that an event will occur. Numerical quantities, y, calculated from uncertain measurements, {x_i}, i = 1, ..., N, according to a functional relationship of the type

$$ y = f(x_1, x_2, \ldots, x_N) \qquad (1) $$

turn out to be uncertain quantities as well and, therefore, should have an associated uncertainty value, the combined standard uncertainty, u_c(y), which is calculated according to a propagation formula, such as the following:

$$ u_c^2(y) = \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}\, u(x_i, x_j) = \sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^2 u^2(x_i) + 2 \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}\, u(x_i, x_j) \qquad (2) $$

Equation 2 is based on a Taylor series expansion neglecting second- and higher-order terms,20 which should be added if nonlinear effects become important.
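For uncorrelated inputs, only the first sum on the right-hand side of eq 2 survives, and the propagation can be carried out numerically. The sketch below is a minimal illustration of that special case; the example function and uncertainty values are ours, chosen purely for demonstration.

```python
import numpy as np

def combined_standard_uncertainty(f, x, u, h=1e-6):
    """First-order (GUM) propagation, eq 2, for uncorrelated inputs:
    uc(y)^2 = sum_i (df/dx_i)^2 * u(x_i)^2, with the gradient obtained
    by central finite differences."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        grad[i] = (f(xp) - f(xm)) / (2.0 * h)
    return float(np.sqrt(np.sum(grad**2 * u**2)))

# y = x1 * x2, with u(x1) = 0.1 and u(x2) = 0.2:
uc = combined_standard_uncertainty(lambda x: x[0] * x[1], [2.0, 3.0], [0.1, 0.2])
print(uc)  # sqrt((3*0.1)^2 + (2*0.2)^2) ~ 0.5
```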


Therefore, with the development of detailed metrology procedures for specifying the quality associated with measured quantities, which are furthermore being promoted and increasingly enforced by standardization organizations, one now frequently has two tables available for analysis: one with raw data and another with their associated uncertainties. Classical data-analysis tasks, formerly based strictly on raw data, such as principal components analysis and multivariate linear regression approaches (e.g., ordinary least squares, principal components regression), are therefore being "updated", leading to their improved measurement-uncertainty-based counterparts, which explicitly consider combined data/uncertainty structures.21-26

In this paper, the availability of such a double-table structure will also be explored in order to improve the performance of our GMRD frameworks. Through them, we are not only able to provide the uncertainties associated with the coefficients computed at each scale but may even also integrate uncertainty information in the computation of these coefficients.

We will, thus, address here the development of generalized multiresolution decomposition (GMRD) methodologies that incorporate data uncertainty information and are able to cope with difficult data/uncertainty structures often met in industrial practice, like missing data and heteroscedastic (or nonhomogeneous) uncertainties. We also present some new developments that rely upon the information processed by these frameworks in order to illustrate their conceptual pertinence, namely, regarding issues such as scale selection and signal denoising, and illustrate their application through several examples, involving the analysis of simulated, laboratory, and industrial data.

In the next section, we present three uncertainty-based multiresolution decomposition frameworks, address some guidelines for their use, and show how they can be adequately integrated in a multiscale data-analysis framework. Section 3 presents new developments of data-analysis tools relying upon GMRD frameworks, and in the following section we illustrate their application by using simulated and real-world industrial and laboratory datasets. Finally, Section 5 presents the main conclusions reached so far.

2. Generalized Multiresolution Decomposition Frameworks (GMRD)

The wavelet-based multiresolution decomposition proposed by Mallat1 enables the computation of coarser approximations for a given signal available at a finest scale, which can be seen as projections onto approximation subspaces indexed by a scale index (j), V_j, that span progressively shorter regions of L^2(R) and have a nested structure (V_{j+1} ⊂ V_j, V_{j+1} ≠ V_j). On the other hand, the details that are lost in the process of projecting the signal onto increasingly coarser approximation spaces can also be considered as projections onto complementary subspaces, the detail spaces, W_j, that, in conjunction with the approximation space, span the full space of the original signal. This allows us to write a given signal at the finest scale, say f_0, as the sum of its projection onto the approximation space at scale j, plus all the details relative to the scales in between:

$$ f_0 = f_j + \sum_{i=1}^{j} w_i \;\Longleftrightarrow\; f_0 = \mathrm{Pr}_{V_j} f_0 + \sum_{i=1}^{j} \mathrm{Pr}_{W_i} f_0 \qquad (3) $$

The projections f_j and {w_i}, i = 1, ..., j, in eq 3 can adequately be written in terms of linear combinations of the spaces' basis functions multiplied by the expansion coefficients (calculated as inner products of the signal and the basis functions). These expansion coefficients are called approximation coefficients, a_{jk} (k ∈ Z) (multipliers of the basis functions for the approximation space), and detail coefficients, d_{ik} (i = 1, ..., j; k ∈ Z) (multipliers of the basis functions for the detail spaces), and are usually referred to as the orthogonal wavelet transform or wavelet coefficients:

$$ f_0 = \sum_k a_{jk}\,\phi_{j,k} + \sum_{i=1}^{j} \sum_k d_{ik}\,\psi_{i,k}, \qquad a_{jk} = \langle f_0, \phi_{j,k} \rangle, \quad d_{ik} = \langle f_0, \psi_{i,k} \rangle \qquad (4) $$
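In the complete-data, homoscedastic case these coefficients are exactly the classical orthogonal wavelet transform, for which efficient pyramid algorithms exist. A minimal sketch of eqs 3 and 4 in code, assuming the third-party PyWavelets package is available:

```python
import numpy as np
import pywt  # PyWavelets (assumed available)

rng = np.random.default_rng(0)
f0 = np.sin(np.linspace(0, 4 * np.pi, 256)) + 0.1 * rng.standard_normal(256)

# Orthogonal multiresolution decomposition down to scale j = 3:
# coeffs = [a3, d3, d2, d1]
coeffs = pywt.wavedec(f0, "haar", level=3)
a3, details = coeffs[0], coeffs[1:]

# Perfect reconstruction: f0 equals its approximation at scale 3
# plus all the intermediate detail contributions (eq 3)
f0_hat = pywt.waverec(coeffs, "haar")
assert np.allclose(f0, f0_hat)
```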

For the purposes of this paper, we will define a "generalized multiresolution decomposition framework" as any algorithm that provides expansion coefficients (approximation and detail coefficients) of the type mentioned above. In the particular situations where no data are missing and their associated uncertainties remain constant over the time (or length) domain (i.e., the error structure affecting measurements is homogeneous, or homoscedastic), they reduce to the classic wavelet transform coefficients, so that they take advantage of all the available knowledge regarding wavelet basis functions, with well-defined properties established by design and for which there are very efficient algorithms to recursively compute coefficients based on convolutions with quadrature mirror filters. However, since this procedure cannot be directly applied in less conventional situations, namely, those with missing data, something that occurs quite often in industrial scenarios, and furthermore does not explicitly take into account data uncertainties when these are available, we will propose in the next subsections three categories of methodologies that allow one to handle these issues under different settings: when there are missing data and homoscedastic (constant) or heteroscedastic (varying) uncertainties (Methods 1 and 2), and when there are no missing data and the uncertainties are either homoscedastic or heteroscedastic (Method 3).

2.1. Method 1: Adjusting Filter Weights According to Data Uncertainties.

The Haar wavelet transform, perhaps the simplest of all known wavelet transform families, attributes a very clear meaning to its coefficients: approximation coefficients are averages of nonoverlapping blocks with two elements (scaled by a factor of 1/√2), and detail coefficients correspond to the difference between the average and the first element. Cascading this procedure across the successive sets of approximation coefficients results in the Haar wavelet transform, which can then be written as

$$ a^{j+1}_{[k/2]} = C_{[k/2]} \left( a^j_k + a^j_{k+1} \right) \qquad (5) $$

$$ d^{j+1}_{[k/2]} = C_{[k/2]} \left( a^{j+1}_{[k/2]} - a^j_k \right) \qquad (6) $$

where [x] is the smallest integer n such that n ≥ x, and C_{[k/2]} = √2/2. Such a process gives equal weight to both values in the calculation of their average (approximation coefficient). With data measurement uncertainties at our disposal, the averaging process can be modified so that the calculation of the coarser approximation coefficient (at scale j + 1) gives more weight to the datum with less uncertainty (at scale j). This can be achieved by using different averaging coefficients applied to each datum:

$$ a^{j+1}_{[k/2]} = C^{j+1,1}_{[k/2]}\, a^j_k + C^{j+1,2}_{[k/2]}\, a^j_{k+1} \qquad (7) $$

Adequate weights can be set using formulas for the MVUE (minimum variance unbiased estimator) of the (common) average, suggesting the averaging coefficients associated with each


datum presented in eq 8. As the computed approximation coefficients at scale j + 1 are just linear combinations of finer-scale observations at scale j, the estimator underlying eq 8 is indeed a BLUE (best linear unbiased estimator). The only underlying assumption that we have to consider here is the independence of the errors affecting these finer-scale observations, which must have associated finite standard deviations (uncertainties). In this case, we propose eq 9 for computing detail coefficients at scale j + 1, as it enables a sound and consistent interpretation of the concept of "details" under these new circumstances, as will be explained below, while maintaining a structural similarity with the classical formula:

$$ C^{j+1,1}_{[k/2]} = \frac{1/u(a^j_k)^2}{1/u(a^j_k)^2 + 1/u(a^j_{k+1})^2}, \qquad C^{j+1,2}_{[k/2]} = 1 - C^{j+1,1}_{[k/2]} \qquad (8) $$

$$ d^{j+1}_{[k/2]} = C^{j+1,1}_{[k/2]} \left( a^j_k - a^{j+1}_{[k/2]} \right) = C^{j+1,2}_{[k/2]} \left( a^{j+1}_{[k/2]} - a^j_{k+1} \right) \qquad (9) $$

Data uncertainty for the approximation coefficients at scale j should also be propagated to the approximation and detail coefficients at the next coarser scale j + 1, so that the averaging procedure can continue to coarser scales, enabling the specification of uncertainties associated with each coefficient for posterior use. This can be done by applying the law of propagation of uncertainties to the present situation,19

$$ u(a^{j+1}_{[k/2]}) = \sqrt{ (C^{j+1,1}_{[k/2]})^2\, u(a^j_k)^2 + (C^{j+1,2}_{[k/2]})^2\, u(a^j_{k+1})^2 } \qquad (10) $$

$$ u(d^{j+1}_{[k/2]}) = \sqrt{ (C^{j+1,1}_{[k/2]})^2 \left( u(a^j_k)^2 + u(a^j_{k+1})^2 - 2\, C^{j+1,1}_{[k/2]}\, u(a^j_k)^2 \right) } \qquad (11) $$

where it is assumed that the errors from two successive observations are independent from each other, although more complex error structures can also be handled. By conducting a multiresolution decomposition using this procedure, we give more weight to the values with less uncertainty during the calculation of the approximation coefficients. In the limit, if a datum at scale j has a very high uncertainty associated with it, then this value will not contribute significantly to the calculation of the next approximation coefficient at scale j + 1, and the detail coefficient calculated at this scale will also have a very low value, in agreement with the intuitive reasoning that, in fact, very little information ("detail") is lost in the replacement of the two values by their weighted average (using the inverse of the squared uncertainties as weights) when one of them is not reliable at all. Extending this reasoning even further, we can see that this calculation scheme allows for the natural integration of missing data in the analysis, as a missing datum can be considered to be any finite number with an infinite uncertainty associated with it, which effectively removes it from eqs 7-11. In this case, the coarser approximation coefficient will have the same value as the nonmissing datum, and the coarser detail coefficient will be zero. Therefore, there is no need for any additional formal modification of Method 1 in order to accommodate the presence of missing data. For the situation where there are no missing data and data uncertainties are homoscedastic, this multiresolution decomposition framework provides the same results as the Haar transform (up to a scaling factor of 2^{j/2} for the coefficients at each scale).
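The following minimal sketch implements a single decomposition level of Method 1 (eqs 7-11); the function name and the encoding of a missing datum as a placeholder value with a huge uncertainty are ours, mirroring the limiting argument above.

```python
import numpy as np

MISSING = 1e30   # a missing datum: any placeholder value, huge uncertainty

def method1_level(a, u):
    """One decomposition level of Method 1 (eqs 7-11).
    a, u -- approximation coefficients and their uncertainties at scale j
    (even length); a missing observation is any placeholder value whose
    uncertainty is set to MISSING. Returns coefficients at scale j + 1."""
    ak, ak1 = a[0::2], a[1::2]
    uk, uk1 = u[0::2], u[1::2]
    c1 = (1.0 / uk**2) / (1.0 / uk**2 + 1.0 / uk1**2)            # eq 8
    c2 = 1.0 - c1
    a_next = c1 * ak + c2 * ak1                                  # eq 7 (BLUE)
    d_next = c1 * (ak - a_next)                                  # eq 9
    u_a = np.sqrt(c1**2 * uk**2 + c2**2 * uk1**2)                # eq 10
    u_d = np.sqrt(c1**2 * (uk**2 + uk1**2 - 2.0 * c1 * uk**2))   # eq 11
    return a_next, u_a, d_next, u_d

# With the first datum missing, the coarser approximation keeps the second
# value and its uncertainty, while the detail and its uncertainty collapse
# toward zero, as argued in the text:
a = np.array([0.0, 1.2])
u = np.array([MISSING, 0.1])
print(method1_level(a, u))   # a ~ 1.2, u(a) ~ 0.1, d ~ 0, u(d) ~ 0
```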

2.2. Method 2: Use Haar Wavelet Filter, Accommodate Missing Data, and Propagate Data Uncertainties to Coarser Coefficients.

In this second methodology, the averaging and differencing coefficients are kept constant and equal to the ones suggested by the Haar filter. When there are no missing data, the uncertainties of the finer approximation coefficients are propagated to the coarser approximation and detail coefficients using the law of propagation of uncertainties,

$$ u(a^{j+1}_{[k/2]}) = u(d^{j+1}_{[k/2]}) = \sqrt{ (\sqrt{2}/2)^2\, u(a^j_k)^2 + (\sqrt{2}/2)^2\, u(a^j_{k+1})^2 } \qquad (12) $$

where independent errors are assumed. If there are missing data in the dataset, the computation of the next coarser approximation and detail coefficients follows by successively applying the set of rules defined in Table 1 to each new pair of approximation coefficients at scale j, {a^j_k, a^j_{k+1}}.

Table 1. GMRD Frameworks: Table of Rules for Method 2

Rule 1. No missing data ⇒ use Haar and calculate uncertainties through eq 10
Rule 2. {a^j_k} missing ⇒ a^{j+1}_{[k/2]} = a^j_{k+1}, u(a^{j+1}_{[k/2]}) = u(a^j_{k+1}); d^{j+1}_{[k/2]} = 0, u(d^{j+1}_{[k/2]}) = 0
Rule 3. {a^j_{k+1}} missing ⇒ a^{j+1}_{[k/2]} = a^j_k, u(a^{j+1}_{[k/2]}) = u(a^j_k); d^{j+1}_{[k/2]} = 0, u(d^{j+1}_{[k/2]}) = 0
Rule 4. {a^j_k, a^j_{k+1}} both missing ⇒ a^{j+1}_{[k/2]} = missing, u(a^{j+1}_{[k/2]}) = missing; d^{j+1}_{[k/2]} = missing, u(d^{j+1}_{[k/2]}) = missing
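A compact sketch of the Table 1 rules for one pair of coefficients follows (the naming is ours; the differencing convention follows eqs 5 and 6, and missing values are encoded as NaN):

```python
import math

SQRT2_2 = math.sqrt(2.0) / 2.0
MISSING = float("nan")

def method2_pair(ak, uk, ak1, uk1):
    """Apply the Table 1 rules to one pair of scale-j approximation
    coefficients (ak, ak1) with uncertainties (uk, uk1); missing values
    are NaN. Returns (a, u_a, d, u_d) at scale j + 1."""
    mk, mk1 = math.isnan(ak), math.isnan(ak1)
    if not mk and not mk1:                       # Rule 1: Haar + eq 12
        a = SQRT2_2 * (ak + ak1)
        d = SQRT2_2 * (a - ak)                   # differencing as in eq 6
        u = math.sqrt(0.5 * uk**2 + 0.5 * uk1**2)
        return a, u, d, u
    if mk and not mk1:                           # Rule 2: keep the survivor
        return ak1, uk1, 0.0, 0.0
    if not mk and mk1:                           # Rule 3
        return ak, uk, 0.0, 0.0
    return MISSING, MISSING, MISSING, MISSING    # Rule 4: still missing
```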

From this set of rules, it is possible to see that, when no missing data are present, the procedure consists of applying the Haar wavelet with uncertainty propagation. However, when there are missing data, missing values may remain present at coarser scales (Rule 4). This fact is instrumental for analyzing the information content at different scales and will be explored in the development of a procedure that suggests the minimum scale for carrying out a given data-analysis task, presented in the next section.

2.3. Method 3: Use Any Wavelet Filter and Propagate Data Uncertainties to Coarser Coefficients.

Missing data are not always an issue in the datasets to be analyzed. Therefore, for those situations where complete datasets are available, it is advantageous to explore the benefits of wavelet filter coefficients that were designed in some optimal sense, so that their good multiscale decomposition properties can be brought to the analysis. However, data uncertainties should also be incorporated, to carry the available knowledge regarding raw-data uncertainty over to the calculated approximation and detail coefficients. There is a situation where this is particularly simple, which occurs when the uncertainties across observations for each variable are homoscedastic (constant). In this case, it can be shown that all the approximation and detail coefficients at coarser scales have the same uncertainty as the raw data at the finest scale.8,27 This can easily be checked by analyzing eq 12 for the case of the Haar wavelet, but it still holds for any orthogonal wavelet family. For heteroscedastic situations, the law of propagation of uncertainties is applied in order to compute the uncertainties associated with coarser coefficients. When used with the Haar wavelet filter, this method coincides with Method 2 for cases where no data are missing.

2.4. Guidelines on the Use of GMRD Frameworks.

Method 1, on one hand, and Methods 2 and 3, on the other, differ deeply in how they implement the incorporation of uncertainty information in their GMRD frameworks. In this section, a general guideline about which type of approach to use, and when, is provided.


Figure 1. (a) True signal used in the simulation; (b) a realization of the noisy signal; and (c) box plots of the difference in MSE at each scale (j) obtained for the two types of methods over 100 simulations.

Figure 2. Block diagram underlying the selection procedure for an adequate GMRD framework (Method 1, 2, or 3).

For that purpose, let us consider an artificial piecewise constant signal, where the values are held constant in windows of 2^4 = 16 successive values (Figure 1a), to which proportional noise with uncertainties assumed to be known is added. Using the noisy signal (Figure 1b), it is possible to compute its approximations at coarser scales (j = 1, 2, ...) according to the two types of approaches, and then to see which method performs better in the task of approximating the true signal when projected at the same scale, j. Our assumed performance index here is the mean square error (MSE) between the approximation at scale j calculated for the noisy signal and that for the true signal, MSE(j), with Figure 1c summarizing the results obtained from 100 such simulations. These results illustrate a guideline according to which, from the strict standpoint of the approximation ability at coarser scales, Method 1 is more adequate than Methods 2 and 3 for constant signals and for piecewise constant signals, until we reach the scale where the "true" values begin to vary from observation to observation, i.e., at which the piecewise constant behavior stops. As the original signal has constant values along windows of 16 values, the piecewise constant pattern breaks down after scale j = 4 (the support of the coefficients at this scale in the original, finest-scale domain is 2^4 = 16, meaning that they will vary from observation to observation after this point). This occurs because Method 1 is based on the MVUE estimator of an underlying constant mean for two successive values, thus leading to improved results when this assumption holds, at least approximately, as in the case of piecewise constant signals, and being overtaken by the second type of methods when such an assumption is no longer valid. This conclusion is formalized in Figure 2, where a block diagram is presented summarizing the decision process that leads to the selection of an adequate GMRD framework.
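We cannot reproduce Figure 2 itself here, so the sketch below is our hypothetical encoding of the decision logic as it reads from the surrounding text:

```python
def choose_gmrd_method(has_missing_data: bool,
                       piecewise_constant_signal: bool) -> int:
    """Hypothetical encoding of the Figure 2 decision logic: returns the
    GMRD framework (1, 2, or 3) to adopt for a given dataset."""
    if not has_missing_data:
        # complete dataset: any (optimally designed) wavelet filter
        # plus uncertainty propagation can be used
        return 3
    if piecewise_constant_signal:
        # missing data + (approximately) piecewise constant behavior:
        # uncertainty-weighted averaging (MVUE/BLUE) performs best
        return 1
    # missing data, no piecewise-constant assumption: Haar weights
    # plus the Table 1 rules
    return 2
```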

3. Data Analysis Applications Based Upon GMRD Frameworks

GMRD frameworks provide approximation and detail coefficients for the different scales at which data can be represented, along with their associated uncertainties. This information can be the final goal of the analysis (to see how information is distributed across scales), or it can be an intermediate step in an integrated multiscale data-analysis procedure (Figure 3). In this section, we address two representatives of such data-analysis tasks, to illustrate the practical interest of GMRD frameworks: scale selection and signal denoising.

Figure 3. Schematic illustration of the basic modules underlying an integrated multiscale data analysis toolbox.

3.1. Scale Selection for Data Analysis.

When dealing with industrial databases, where hundreds of variables from different points in the mill are being collected, together with product-quality variables obtained from the quality-control labs, it often happens that such datasets present a very sparse structure. This means that they have a lot of "holes", because of variables having different acquisition rates and/or missing data randomly scattered through the records for each variable, owing to process, instrumentation, communications, or data-storage-related problems. Any effort directed toward conducting data analysis at a very fine time scale (e.g., minutes) may therefore become useless when, for instance, most of the variables are being collected at a coarser time scale (e.g., hours). It would, therefore, be very appealing to have at our disposal a tool that could suggest the finest time scale at which a given analysis can be carried out, leaving to the analyst the final decision about which coarser scale should indeed be adopted. Method 2 is able to cope both with missing data and data uncertainty and, therefore, provides a GMRD framework that is instrumental in deciding about a minimum scale for analysis on the basis of either the amount of missing data present or the uncertainty information available, or even both. After introducing the general methodologies for scale selection, we will present, in the next section, a real case study where they are applied in order to select an appropriate scale for conducting data analysis considering only the presence of missing data, and another case study where the decision was made based on the available uncertainty information.


3.1.a. Scale Selection Based on Missing Data.

The four rules presented before, regarding the implementation of our GMRD framework according to Method 2, lead to detail and approximation coefficients that can also eventually contain missing data. Let us now define the "reconstruction" procedure that, starting from the coarser approximation coefficients and using the sets of finer detail coefficients, successively "reconstructs" the finer approximation signals for all the scales below the coarser one, based on the set of rules defined in Table 2.

Table 2. Rules to Be Adopted during the Reconstruction Procedure for the GMRD Framework (Method 2), within the Scope of Scale Selection

Rule 1. No value missing ⇒ use the Haar reconstruction procedure
Rule 2. {a^{j+1}_{[k/2]}, d^{j+1}_{[k/2]}} missing ⇒ a^j_k = missing, d^j_k = missing

By using this "reconstruction" procedure, it is possible to come up with a succession of "reconstructed" approximation signals for all scales, which differ from the ones that were obtained in the decomposition phase in the presence of missing data. This happens because, when one datum is missing, the proposed decomposition procedure leads to the application of Rules 2 and 3, introducing a nonmissing datum for the coarser approximation coefficient and a zero in the coarser detail coefficient; then, during the "reconstruction" phase, Rule 1 results in two equal nonmissing values, where originally we had only one. Therefore, in the presence of missing data, the "reconstruction" process "creates" more data (or energy) through a scheme closely related to wavelet interpolation (this is why we keep the term "reconstruction" between quotation marks, as it is a matter not only of reconstructing but also of interpolating when there are missing data). It is this increase in the energy of the approximation signals at the finest scales in the presence of missing data (energy is here defined as the sum of the squares of nonmissing values) that allows for the development of a diagnostic tool that provides the scale up to which missing data play a significant role (interfering in the reconstruction phase) and after which such behavior becomes attenuated. By plotting the energy of the approximation coefficients at all scales obtained in the decomposition procedure, and that of the approximation coefficients obtained after the "reconstruction" phase, along with their difference (to better extract the point where their behavior stops diverging significantly), it is possible to quickly suggest a minimum scale to be considered for conducting proper data analysis, as sketched below.
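A minimal sketch of this diagnostic, assuming unit uncertainties so that only the missing-data rules matter (the function names, the NaN encoding, and the inversion of eqs 5-6 are ours):

```python
import numpy as np

S = np.sqrt(2.0) / 2.0

def decompose_step(a):
    """Method 2 decomposition step (Table 1), with NaN marking missing."""
    ak, ak1 = a[0::2], a[1::2]
    approx = S * (ak + ak1)                   # Rule 1 (Haar, eqs 5-6)
    detail = S * (approx - ak)
    only_k = ~np.isnan(ak) & np.isnan(ak1)    # Rule 3: second datum missing
    only_k1 = np.isnan(ak) & ~np.isnan(ak1)   # Rule 2: first datum missing
    approx[only_k], detail[only_k] = ak[only_k], 0.0
    approx[only_k1], detail[only_k1] = ak1[only_k1], 0.0
    return approx, detail                     # Rule 4: both-NaN stays NaN

def reconstruct_step(approx, detail):
    """Table 2 "reconstruction": invert eqs 5-6 where values are present;
    all-missing pairs propagate NaN automatically."""
    ak = approx - detail / S
    ak1 = approx / S - ak
    out = np.empty(2 * approx.size)
    out[0::2], out[1::2] = ak, ak1
    return out

def energy(x):
    return np.nansum(x**2)   # sum of squares of nonmissing values

# Illustrative signal with randomly scattered missing samples
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 6 * np.pi, 64))
sig[rng.choice(64, size=12, replace=False)] = np.nan

approxs, details = [sig], []
for _ in range(4):                            # decomposition phase
    a, d = decompose_step(approxs[-1])
    approxs.append(a)
    details.append(d)

rec = approxs[-1]                             # "reconstruction" phase
rec_energy = [energy(rec)]
for d in reversed(details):
    rec = reconstruct_step(rec, d)
    rec_energy.insert(0, energy(rec))

dec_energy = [energy(a) for a in approxs]
# Scales where rec_energy exceeds dec_energy are still affected by the
# missing data; the first scale where they roughly agree suggests the
# minimum scale for analysis.
print(dec_energy)
print(rec_energy)
```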
3.1.b. Scale Selection Based on Data Uncertainties.

Excessive noise may also hinder the analysis at finer scales, because any fine structure that might be present is almost completely masked by the superposition of the unstructured noise component. Basically, this means not only that the true signal's and the noise's spectra overlap in the higher frequency ranges but also that the magnitude of the power spectrum of the noise source is higher in these frequency regions. Such frequency bands therefore do not contain relevant information about the underlying true signal and should not be considered during data analysis. A simple and effective way of accomplishing this goal consists of applying an uncertainty-based coefficient-thresholding methodology and checking whether there are any scales where detail coefficients are massively thresholded (note that the detail coefficients for a given scale contain localized information regarding frequency bands28). GMRD using Method 3, which incorporates uncertainty propagation, should then be adopted here, and a plot of the energy associated with the original detail coefficients and with the thresholded ones (or of the difference between them), as well as plots of the percentage of the original energy in each scale that is eliminated by the thresholding operation, will highlight those scales dominated by noise, which are, therefore, not meaningful for proper data analysis.
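A small helper of the kind implied here could look as follows (a sketch; the names are ours, and the per-scale detail coefficients and their uncertainties are assumed to come from a Method 3 decomposition):

```python
import numpy as np

def thresholding_report(details, u_details, k=2.0):
    """Per-scale share of detail energy, and of coefficients, removed by
    the uncertainty-based hard threshold |d| < k * u(d); `details` and
    `u_details` are lists of per-scale arrays."""
    report = []
    for d, u in zip(details, u_details):
        killed = np.abs(d) < k * u
        e_total = float(np.sum(d**2))
        e_removed = float(np.sum(d[killed]**2))
        pct_energy = 100.0 * e_removed / e_total if e_total > 0 else 0.0
        report.append((pct_energy, 100.0 * float(killed.mean())))
    return report  # scales with ~100% removed are noise-dominated
```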

3.1.c. Scale Selection Based on Missing Data and Data Uncertainties.

The procedure proposed for supporting scale selection when both missing data and data uncertainties have to be taken into account consists of applying the two methodologies presented before simultaneously. Thus, it consists of decomposing the data using Method 2, after which uncertainty-based thresholding is applied to the nonmissing detail coefficients. The simultaneous analysis of the plots relative to the (differential) distribution of energy contained in the detail coefficients (based on data uncertainties) and in the approximation coefficients (which consider the presence of missing data), as described above, provides the information needed to support a decision that considers both missing data and data uncertainty.

3.2. Signal Denoising.

Wavelets have found great success in the task of "cleaning" signals of undesirable components of a stochastic nature, often referred to generically as "noise". They also prove to be useful, in situations where an underlying model is available (empirical or derived from first principles), in the related task of optimal estimation.29-33 If we are in such a position that the main noise features are known, namely, the measurement uncertainties, even though no model can be adopted, then it is desirable to use this important additional piece of information to come up with improved denoising schemes. The standard denoising procedure consists of the following sequence of steps: (1) decomposition of the signal into its wavelet coefficients; (2) application of a thresholding technique to the calculated coefficients; and (3) reconstruction of the signal using the processed coefficients.7 A simple yet effective way of integrating data uncertainty information in this type of denoising scheme, which has proved to be quite successful in several applications, consists of implementing a GMRD framework (namely, using Method 3, as missing data are not usually a problem when the objective is to clean data of noise) and using the uncertainty information provided for each detail and approximation coefficient to implement a hard-thresholding policy that eliminates coefficients whose absolute magnitude is below k·u(d^j_k), where k is a thresholding constant, common to all coefficients, and u(d^j_k) is the uncertainty associated with coefficient d^j_k. In the next section, an example illustrating this class of procedures is presented.
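A minimal sketch of this uncertainty-based hard thresholding for the homoscedastic special case, assuming the third-party PyWavelets package (for heteroscedastic data, u(d) would instead come from eq 10-12 style propagation at each scale):

```python
import numpy as np
import pywt  # PyWavelets (assumed available)

def uncertainty_denoise(signal, u, wavelet="haar", level=4, k=None):
    """Steps 1-3 of Section 3.2 for the homoscedastic case: with an
    orthogonal wavelet every coefficient inherits the raw-data
    uncertainty u, so the hard threshold k*u(d) is k*u at all scales."""
    if k is None:
        k = np.sqrt(2.0 * np.log(signal.size))  # Donoho-Johnstone constant
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    coeffs[1:] = [np.where(np.abs(d) < k * u, 0.0, d) for d in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

# Illustrative use with a known measurement uncertainty u = 0.1:
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 512)
noisy = np.sin(8 * np.pi * t) + 0.1 * rng.standard_normal(t.size)
clean_estimate = uncertainty_denoise(noisy, u=0.1)
```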
4. Illustrative Examples Regarding GMRD Applications

In this section, we present several simulated examples and real-world case studies regarding the use of GMRD frameworks in the tasks of scale selection and signal denoising, using the methodologies presented in the previous section.

4.1. Scale Selection in a Pulp and Paper Dataset.

A selected subgroup of nine key quality variables relative to the pulp produced in an integrated Portuguese pulp and paper mill (Portucel), related to paper structure, strength, and optical properties, was collected over 4.5 years. These variables are to be analyzed in order to identify any relevant variation patterns along time, as well as process upsets and disturbances, so that potential root causes can be found and worked out, leading to process improvement. The finest resolution (j = 0) present in the data corresponds to a daily basis. The first decision that one has to make concerns the choice of the scale at which the analysis should be conducted. We based our decision on a criterion that considers only the presence of missing data, because it was known, in advance, that this was the major problem in our dataset.


Figure 4. (a) Plot of the energy contained in the approximation signals after decomposition and reconstruction, at several scales, and (b) semilog plot of the difference between these two energies, for each scale.

Figure 5. Results obtained in the data analysis conducted at scale j = 3, where the monitoring statistic T_w^2 is the counterpart of the T^2 statistic used in MSPC based on PCA. The first subset of data (to the left of the vertical dashed line) is used as a reference set to estimate the monitoring parameters (after outliers have been removed). The horizontal dashed line marks the 95% upper control limit for the adopted statistic.

Therefore, the scale selection methodology centered on missing-data criteria was adopted, and all variables were analyzed in order to decide about the finest scale at which such an analysis should be undertaken. Since the measurement frequencies for all nine variables in the plant laboratory are approximately the same, it is not difficult to come up with a single scale valid for all variables. Figure 4 illustrates some of the plots thus obtained, relative to variable X8, where it is possible to see that, after scale j = 3 (i.e., 2^3 = 8 days), the effects of the presence of missing data significantly decrease (the energy associated with the approximation coefficients obtained in the decomposition phase gets close to the one obtained after the "reconstruction" phase). Therefore, the scale selected for conducting data analysis in this case was j = 3. This is consistent with the fact that, during a period of ~1 year, these variables were measured weekly. After selecting a proper scale of analysis (j = 3), we applied a modification of the multivariate process-monitoring scheme based on PCA, designed to take into account the uncertainty information available in the approximation coefficients, i.e., one that explicitly considers all available information (raw values and uncertainties).34 Some results of this study are presented in Figure 5, where one of the statistics of this method, analogous to the PCA-MSPC T^2 statistic,35,36 is shown. It is possible to see that the relationships between the several quality variables have deviated substantially

across time in a kind of stepwise pattern, occasionally spiked with some rare additional events. An increase in variation during the final time period is also observable.

In this example, the selection of the scale to be employed for data analysis was facilitated by the fact that the same scale was suggested by the procedure for all the variables considered, for reasons that were already addressed. However, there are situations where that does not happen, such as when variables have different acquisition rates or distinct amounts of missing data. For these situations, there are two alternatives: (1) to analyze each variable separately and then, based on the knowledge we have about the process and the relative importance of the variables, decide about the scale; and (2) if no such subject-matter knowledge is available, to apply an averaging weighting scheme to the scales suggested for each variable, related to the purposes of the subsequent analysis. For instance, we can use MLPCA, an extension of PCA that makes explicit use of data uncertainties and is able to cope with missing data,24 in order to use the information provided by this method for the loading vectors (which contain information about the importance of each variable in each component), as well as the weighted sum of squares explained by each component (which contains information regarding the importance of each component in the explanation of the overall data variability, taking uncertainties into account), to come up with weights to be applied to the scales suggested for each variable, considering their importance in the explanation of the underlying overall latent variable structure; one such hypothetical scheme is sketched below.
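The text leaves the exact weighting open; the sketch below is merely one hypothetical reading, with invented names (MLPCA loadings, per-component explained weighted sums of squares, and per-variable suggested scales as inputs):

```python
import numpy as np

def weighted_scale_suggestion(scales, loadings, explained_wss):
    """Hypothetical weighting for alternative (2): combine per-variable
    suggested scales using MLPCA loadings (n_vars x n_components) and the
    weighted sum of squares explained by each component as importance."""
    importance = np.abs(loadings) @ np.asarray(explained_wss, dtype=float)
    weights = importance / importance.sum()
    return float(np.average(scales, weights=weights))

# e.g., three variables suggesting scales 3, 4 and 2 would be combined as
# suggested = weighted_scale_suggestion([3, 4, 2], loadings, wss)
```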


Figure 6. Surface profile in the transversal direction, for a paper sample exhibiting waviness phenomena.

Figure 7. Plots of (a) the distribution of energy in the detail coefficients across scales, (b) the percentage of energy originally contained in each scale that is removed by the thresholding operation (relative to the original energy content of that scale), and (c) the percentage of eliminated coefficients in each scale (relative to the original number of coefficients in that scale).

Figure 8. Denoising results associated with the four alternative methodologies (Haar, TI Haar, Haar + uncertainty propagation, and TI Haar + uncertainty propagation) for 100 noise realizations.

4.2. Analysis of Paper Surface Characteristics Using Profilometry.

Paper surface plays a key role in its quality, as it is directly connected to a number of important paper properties from the end user's perspective, such as general appearance (optical properties, flatness), printability (e.g., the absorption of ink), and friction. Figure 6 presents an accurate surface profile obtained with a mechanical stylus profilometer, where it is quite clear that different surface phenomena are located at different scales: at a coarser scale, the presence of waves indicates a problem known as "waviness" or "piping streaks", which has a characteristic wavelength of ~15 mm, while at finer scales, paper micro- and macroroughness (relative to variations in the cross direction, X, over the ranges of 1-100 µm and 100-1000 µm, respectively) dominate variability. However, there is also an additional contribution to the observed profile that should be considered, due to measurement noise, which is a consequence of the limited resolution of the measuring device, unable to detect oscillations in the thickness direction (Z) below a certain level, in this case, 8 nm. The proposed GMRD framework based on data uncertainties (Method 3) allows for the incorporation of this type of knowledge and provides important clues about the minimum scale that can be used and the scales where the dominant phenomena are located. In Figure 7, we present the distribution of energy contained in the detail coefficients obtained by decomposing the original profile using a Symmlet-8 wavelet filter (part a), and also information regarding the coefficients that are eliminated after applying a thresholding operation that removes details below the resolution level: the percentage of energy originally contained in each scale that is removed by the thresholding operation (part b) and the percentage of eliminated coefficients at each scale (part c). From Figure 7, it can be seen that the dominating phenomena are located at scale 11, corresponding to 8.93 × 2^11 µm ≈ 18.3 mm (the separation between two successive samples in the X direction is ~8.93 µm), i.e., quite close to the characteristic wavelength of the waviness phenomena, and that the profile is almost unaffected by measurement noise at all scales, since only very few coefficients are discarded at the finest scales as a

consequence of the limited resolution of the measuring device. Therefore, the high-resolution profilometer is indeed suitable to assess fine details of the paper surface (the minimum scale for analysis is j = 1), and one may also conclude that, in this particular case, all the scales do contain potentially relevant information regarding the characterization of the underlying phenomena. Furthermore, efforts for analyzing and monitoring waviness phenomena should focus mainly around scales j = 10 and j = 11.

4.3. Uncertainty-Based Denoising.

In this example, we illustrate how the ideas presented in Section 3.2 can be implemented in practice, with added benefits. To enable a comparative assessment of the several methodologies involved in denoising, the true signal must be known, and therefore, we followed an approach based upon simulated data. The true signal consists of a smoothed version of a near-infrared (NIR) spectrum, to which heteroscedastic proportional noise was added. The standard denoising procedure, described in Section 3.2, was then applied to the noisy signal, using the Haar wavelet transform with the threshold suggested by Donoho and Johnstone,7 T = σ̂ √(2 ln N), where σ̂ is a robust estimator of the (constant) noise standard deviation. Furthermore, the "translation invariant" extension of this classic procedure, based on Coifman's "cycle spinning" concept,37 "average[shift-denoise-unshift]", where all possible shifts are used, was also tested. We will call this alternative methodology "TI Haar". These two methods are compared with their counterpart procedures that take advantage of the uncertainty information provided by our GMRD frameworks (Methods 2 or 3, which coincide when there are no missing data), as proposed in Section 3.2; these are called, respectively, "Haar + uncertainty propagation" and "TI Haar + uncertainty propagation" (only 10 rotations were used for this last method). All the tested methods used the same wavelet (Haar), threshold constant (√(2 ln N)), and thresholding policy ("hard threshold"). Figure 8 presents results regarding the MSE scores of the reconstructed signal (scale j = 0) relative to the true one, obtained from 100 realizations of additive noise. A clear improvement in MSE is found for the uncertainty-based methods relative to their classical counterparts. Figure 9 illustrates the denoising effect for one such realization, where the more effective denoising action provided by the methods based upon GMRD frameworks can be seen graphically. One can also see how the smoothing action due to the averaging scheme over several shifts attenuates the discontinuous nature of the denoised signal obtained with the Haar wavelet filter alone.
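A sketch of the "average[shift-denoise-unshift]" cycle-spinning wrapper, again assuming PyWavelets and the homoscedastic hard-threshold helper from Section 3.2 (names are ours; like the TI variant in the text, only a subset of shifts is used):

```python
import numpy as np
import pywt  # PyWavelets (assumed available)

def denoise(x, u, k, wavelet="haar", level=4):
    """Hard threshold at k * u(d) (homoscedastic case, Section 3.2)."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    coeffs[1:] = [np.where(np.abs(d) < k * u, 0.0, d) for d in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

def ti_denoise(x, u, k, n_shifts=10, **kwargs):
    """Cycle spinning: average the denoised results over circular shifts,
    undoing each shift before averaging."""
    acc = np.zeros_like(x, dtype=float)
    for s in range(n_shifts):
        acc += np.roll(denoise(np.roll(x, s), u, k, **kwargs), -s)
    return acc / n_shifts
```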


Figure 9. Examples of denoising using the four methods referred to in the text (Haar, TI Haar, Haar + uncertainty propagation, and TI Haar + uncertainty propagation), for a realization of additive heteroscedastic proportional noise.

5. Conclusions

MRD frameworks play an essential role when one needs to focus data analysis at a particular scale or to identify several contributions to an overall phenomenon, arising from different scales in time or length. However, the presence of missing data prevents the use of classical MRD based on wavelets, and the incorporation of data uncertainty in the analysis is also desirable from the standpoint of using all information right from the beginning, making it also available for subsequent tasks, in an integrated multiscale data-analysis environment. Therefore, we propose in this paper generalized multiresolution decomposition frameworks that are able to cope with these issues, and provide guidelines about their use and integration. Methods 1 and 2 handle the presence of missing data and any structure of data uncertainties, with the former being especially devoted to piecewise constant signals. Method 3 handles those cases where no missing data are present, incorporating data uncertainty in the computation of detail and approximation coefficients.

It should be stressed that Methods 1 and 2 are not extensions of the wavelet transform in a strict sense, as some of its fundamental properties do not always hold, such as the energy-conservation property in Method 2 (in the sense of the Plancherel formula27). However, they do allow one to extend the wavelet multiresolution decomposition to contexts where it could not be applied otherwise (at least without some serious preprocessing effort), namely, when we have missing data. Furthermore, such methods provide new tools for addressing other types of problems in data analysis (such as that of selecting a proper scale) or for improving currently available methodologies (signal denoising), by making a more complete use of all available data information. Simulated and real-world data-analysis problems illustrate the use of the various methodologies proposed and their practical benefits.

Acknowledgment

The authors would like to acknowledge Portucel, SA (Grupo Portucel Soporcel, Portugal), for providing real data for the case studies, and the Portuguese FCT for financial support through research project POCTI/EQU/47638/2002.

Literature Cited

(1) Mallat, S. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Machine Intell. 1989, 11 (7), 674-693.
(2) Daubechies, I. Orthonormal Bases of Compactly Supported Wavelets. Commun. Pure Appl. Math. 1988, 41, 909-996.
(3) Bruce, A.; Donoho, D.; Gao, H.-Y. Wavelet Analysis. IEEE Spectrum 1996, 26-35.
(4) Burrus, C. S.; Gopinath, R. A.; Guo, H. Introduction to Wavelets and Wavelet Transforms: A Primer; Prentice Hall: Upper Saddle River, NJ, 1998.
(5) Rioul, O.; Vetterli, M. Wavelets and Signal Processing. IEEE Signal Process. Mag. 1991, 8 (4), 14-38.
(6) Hubbard, B. B. The World According to Wavelets: The Story of a Mathematical Technique in the Making, 2nd ed.; A K Peters: Natick, MA, 1998.
(7) Donoho, D. L.; Johnstone, I. M. Ideal Spatial Adaptation by Wavelet Shrinkage; Department of Statistics, Stanford University: Stanford, CA, 1992.
(8) Jansen, M. Noise Reduction by Wavelet Thresholding; Springer: New York, 2001; Vol. 161.
(9) Vetterli, M.; Kovačević, J. Wavelets and Subband Coding; Prentice Hall: Upper Saddle River, NJ, 1995.
(10) Bakshi, B. R. Multiscale PCA with Application to Multivariate Statistical Process Control. AIChE J. 1998, 44 (7), 1596-1610.
(11) Carrier, J. F.; Stephanopoulos, G. Wavelet-Based Modulation in Control-Relevant Process Identification. AIChE J. 1998, 44 (2), 341-360.
(12) Nikolaou, M.; Vuthandam, P. FIR Model Identification: Parsimony Through Kernel Compression with Wavelets. AIChE J. 1998, 44 (1), 141-150.
(13) Percival, D. B.; Walden, A. T. Wavelet Methods for Time Series Analysis; Cambridge University Press: Cambridge, U.K., 2000.
(14) Alsberg, B. K.; Woodward, A. M.; Winson, M. K.; Rowland, J. J.; Kell, D. B. Variable Selection in Wavelet Regression Models. Anal. Chim. Acta 1998, 368, 29-44.
(15) Walczak, B.; van den Bogaert, V.; Massart, D. L. Application of Wavelet Packet Transform in Pattern Recognition of Near-IR Data. Anal. Chem. 1996, 68, 1742-1747.
(16) Beylkin, G.; Coifman, R.; Rokhlin, V. Fast Wavelet Transforms and Numerical Algorithms I. Commun. Pure Appl. Math. 1991, 44, 141-183.
(17) ISO. Guide to the Expression of Uncertainty in Measurement; ISO: Geneva, Switzerland, 1993.
(18) Kimothi, S. K. The Uncertainty of Measurements; ASQ: Milwaukee, WI, 2002.
(19) Lira, I. Evaluating the Measurement Uncertainty; Institute of Physics Publishing: Bristol, U.K., 2002.
(20) Herrador, M. Á.; Asuero, A. G.; González, A. G. Estimation of the Uncertainty of Indirect Measurements from the Propagation of Distributions by Using the Monte Carlo Method: An Overview. Chemom. Intell. Lab. Syst. 2005, 79, 115-122.
(21) Bro, R.; Sidiropoulos, N. D.; Smilde, A. K. Maximum Likelihood Fitting Using Ordinary Least Squares Algorithms. J. Chemom. 2002, 16, 387-400.
(22) Reis, M. S.; Saraiva, P. M. Integration of Data Uncertainty in Linear Regression and Process Optimization. AIChE J. 2005, 51 (11), 3007-3019.
(23) Río, F. J.; Riu, J.; Rius, F. X. Prediction Intervals in Linear Regression Taking into Account Errors on Both Axes. J. Chemom. 2001, 15, 773-788.
(24) Wentzell, P. D.; Andrews, D. T.; Hamilton, D. C.; Faber, K.; Kowalski, B. R. Maximum Likelihood Principal Component Analysis. J. Chemom. 1997, 11, 339-366.
(25) Wentzell, P. D.; Andrews, D. T.; Kowalski, B. R. Maximum Likelihood Multivariate Calibration. Anal. Chem. 1997, 69, 2299-2311.
(26) Wentzell, P. D.; Lohnes, M. T. Maximum Likelihood Principal Component Analysis with Correlated Measurement Errors: Theoretical and Practical Considerations. Chemom. Intell. Lab. Syst. 1999, 45, 65-85.
(27) Mallat, S. A Wavelet Tour of Signal Processing; Academic Press: San Diego, CA, 1998.
(28) Alsberg, B. K.; Woodward, A. M.; Kell, D. B. An Introduction to Wavelet Transforms for Chemometricians: A Time-Frequency Approach. Chemom. Intell. Lab. Syst. 1997, 37, 215-239.


(29) Basseville, M.; Benveniste, A.; Chou, K. C.; Golden, S. A.; Nikoukhah, R.; Willsky, A. S. Modeling and Estimation of Multiresolution Stochastic Processes. IEEE Trans. Inf. Theory 1992, 38 (2), 766-784.
(30) Chou, K. C.; Willsky, A. S.; Benveniste, A. Multiscale Recursive Estimation, Data Fusion, and Regularization. IEEE Trans. Autom. Control 1994, 39 (3), 464-478.
(31) Dyer, M. A Multiscale Approach to State Estimation with Applications in Process Operability Analysis and Model Predictive Control. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 2000.
(32) Nounou, M. N.; Bakshi, B. R. On-Line Multiscale Filtering of Random and Gross Errors Without Process Models. AIChE J. 1999, 45 (5), 1041-1058.
(33) Stephanopoulos, G.; Dyer, M.; Karsligil, O. Multi-Scale Modeling, Estimation and Control of Processing Systems. Comput. Chem. Eng. 1997, 21, S797-S803 (Supplement for PSE '97-ESCAPE-7, Joint 6th International Symposium on Process Systems Engineering and 30th European Symposium on Computer Aided Process Engineering, Trondheim, Norway, May 1997).
(34) Reis, M. S.; Saraiva, P. M. Heteroscedastic Latent Variable Modelling with Applications to Multivariate Statistical Process Control. Chemom. Intell. Lab. Syst. 2006, 80, 57-66.
(35) MacGregor, J. F.; Kourti, T. Statistical Process Control of Multivariate Processes. Control Eng. Pract. 1995, 3 (3), 403-414.
(36) Wise, B. M.; Gallagher, N. B. The Process Chemometrics Approach to Process Monitoring and Fault Detection. J. Process Control 1996, 6 (6), 329-348.
(37) Coifman, R. R.; Donoho, D. L. Translation-Invariant De-Noising; Department of Statistics, Stanford University: Stanford, CA, 1995.

Received for review November 25, 2005. Revised manuscript received July 7, 2006. Accepted July 11, 2006.

IE051313B