Ind. Eng. Chem. Res. 1998, 37, 267-274


A Practical Assessment of Process Data Compression Techniques

Matthew J. Watson, Antonios Liakopoulos, Dragana Brzakovic, and Christos Georgakis*

Chemical Process Modeling and Control Research Center, Lehigh University, Bethlehem, Pennsylvania 18015

* Author to whom correspondence should be addressed: Iacocca Hall, Lehigh University, 111 Research Drive, Bethlehem, PA 18015. Telephone: (610) 758-5432. Fax: (610) 758-5297. E-mail: [email protected].

Plant data are used to compare the effectiveness of wavelet-based methods with other compression techniques. The challenge is to effectively treat the data so that the maximum compression ratio is achieved while the important features are retained in the compressed data. Wavelets have properties that are desirable for data compression. They are localized in time (or space) and in frequency. This means that important short-lived high-frequency disturbances can be preserved in the compressed data, and these disturbances may be differentiated from slower, low-frequency trends. Besides discrete wavelet transforms, linear interpolation, discrete cosine transform, and vector quantization are also used to compress data. The transform-based compression algorithms perform better than the linear interpolation methods, such as swinging door, that have been used traditionally in the chemical process industries. Among these techniques, the wavelet-based one compresses the process data with excellent overall and best local accuracy.

1. Introduction

There is often a need to retrieve large quantities of archival data for the purposes of plant diagnostics or model identification and validation. With the globalization of the operations of many companies this task also implies that the archival and recalling locations might be separated by several thousand miles. To speed up retrieval time from archive to requesting computer, the data need to be transmitted in compressed form. The problem of data compression for any type of engineering data has the same basis: maximize the compression ratio while maintaining as much of the desirable features of the signal as possible. The features that are retained depend upon the type of compression technique used.

The objective of any data compression algorithm is to represent a given data set with another smaller data set. In order to accomplish this, a data compression algorithm takes advantage of any redundancy or repetition in the data. Frequently, data compression can be carried out for the purpose of separating the useful features of the data from those not needed. Most data compression algorithms consist of one or a combination of the following: data transform, quantization, and coding.

In contrast to the vast amount of research in data compression as applied to image or acoustic signals, there have been few studies of data compression in the process industries. Hale and Sellars (1981) describe a data compression algorithm used in industry, whereby a signal is approximated by a piecewise linear function. The swinging door algorithm (Bristol, 1990), which is a variation of the piecewise linear theme, is also in use in the process industry. Feehs and Arce (1988) describe how vector quantization can be used to compress process trend recordings. Hu and Arce (1988) later studied the

application of subband decomposition before vector quantizing the signal. Taking the subband decomposition of a signal is closely related to taking the wavelet transform. Bakshi and Stephanopoulos published a series of papers studying how wavelets can be applied to extract temporal features and detect faults in process signals (Bakshi and Stephanopoulos, 1994a,b). More recently, the problem of compressing chemical process data through the use of wavelets has been addressed (Bakshi and Stephanopoulos, 1996; Watson et al., 1994). These papers present qualitative comparisons of some data compression algorithms and give some quantitative comparisons for short-time data sets.

The objective of this paper is to describe three ways in which data compression is achieved: piecewise linear functional approximation, application of a data transform and the discarding of the insignificant transform coefficients, and vector quantization. Each of these methods is described in sections 2.1, 2.2, and 2.3, respectively. The data compression methods described are then used to compress large sets of real process data. One of the data sets contains four flow rate measurements from an Amoco depropanizer (MacFarlane, 1993). Each variable in this data set contains 1399 data points. The other set is temperature data from duPont's Falcon project (Moser, 1994), and each variable consists of 17 152 time sequence measurements. Comparisons between the compression methods are presented and discussed in sections 3 and 4, mostly in terms of how the error, the difference between the reconstructed and original signal, varies as a function of compression ratio.

2. Compression Methods and Algorithms

2.1. Piecewise Linear Compression. In piecewise linear compression a signal is assumed to continue in a straight line, within an error bound, until a point lying outside of the error bound forces a recording to be made. A new line is then assumed and the algorithm continues. In this type of data compression, where the



Figure 1. Boxcar compression. (Redrawn from Hale and Sellars, 1981.)

recording time step varies, the measured value and its date-time tag must be recorded. If each date-time tag is assumed to require the same amount of memory as one recorded measurement, the compression ratio is

compression ratio = (no. of original measurements) / (no. of recorded measurements × 2)

The boxcar algorithm makes a recording when the current value differs from the last recorded value by an amount greater than or equal to the predetermined recording limit (error bound) for that variable. The previous value processed should be recorded, not the current value which caused the triggering of the recording, as in Figure 1. The boxcar algorithm performs best when the process runs for long stretches of steady-state operation.

The backward slope algorithm projects a recording limit into the future on the basis of the slope of the previous two recorded values. The previous value is recorded if the current value lies outside of the recording limit. Once a value is recorded, a new line and recording limit are projected into the future, and the algorithm is repeated (Figure 2).

The boxcar and backward slope algorithms can be combined into a hybrid method. If a value lies outside of the backward slope recording limit, the method reverts to the boxcar until a recording is made. If the boxcar test fails first, the method continues with backward slope until a recording is made. If both tests fail, a recording is made and the algorithm starts over.

The swinging door algorithm (Bristol, 1990) is similar to the backward slope algorithm, except that the recording limit is based on the slope of the line between the previously recorded value and the current measured value. When the current measured value has exceeded the error bound defined by the recording limit, the value at the previous time step is recorded and the algorithm is repeated.

These compression algorithms have a minimal computational load and were developed at a time when algorithms that achieved higher compression ratios did not justify the additional computation. Today's computational environment allows more complex algorithms, discussed below, to be applied effectively to process data.

Figure 2. Backward slope compression. (Redrawn from Hale and Sellars, 1981.)
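For illustration only, the following Python/NumPy sketch implements the boxcar variant described above. It is not the authors' implementation; the function names, the interpolation-based reconstruction, and the handling of the first and last samples are our own choices, and corner cases (e.g., repeated triggers on adjacent samples) are ignored for brevity.

```python
import numpy as np

def boxcar_compress(times, values, limit):
    """Boxcar recording: trigger when the current value differs from the last
    recorded value by at least the recording limit, then store the *previous*
    sample (value and date-time tag), as in Figure 1."""
    rec_t, rec_v = [times[0]], [values[0]]        # always keep the first sample
    for i in range(1, len(values)):
        if abs(values[i] - rec_v[-1]) >= limit:
            rec_t.append(times[i - 1])
            rec_v.append(values[i - 1])
    rec_t.append(times[-1])                       # keep the last sample so the
    rec_v.append(values[-1])                      # full record can be rebuilt
    return np.array(rec_t), np.array(rec_v)

def reconstruct(times, rec_t, rec_v):
    """Reconstruction is straight-line interpolation between recorded samples."""
    return np.interp(times, rec_t, rec_v)

# Compression ratio as defined above (each recording also carries a date-time tag):
#   ratio = len(values) / (2 * len(rec_v))
```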

2.2. Data Transforms. In the continuous formulation, a data transform is nondestructive or lossless, in that no information is lost when the transform is made. Typically the transform has an inverse that can perfectly reconstruct the original data. Examples of some commonly used transforms are the Laplace, the Fourier, and, more recently, the wavelet transform. A linear transform often compacts most of the information of the original data set into a smaller number of vector components. For example, the discrete cosine transform performs a mapping from the time to the frequency domain, and often a signal vector has very little energy in the higher frequency regions of the spectral band. This property, known as compaction, implies that a large portion of the components of the transformed signal vector are likely to be very close to zero and may often be neglected entirely (Gersho and Gray, 1992). Setting these coefficients to zero is known as thresholding and is the basis for data compression through functional approximation. When coefficients are neglected, the transform is no longer lossless in that the reconstructed signal differs from the original.

In the remainder of this section we briefly present the definitions of the transforms considered in this paper. Some properties of the transforms relevant to the discussion of the results are also reviewed. The continuous Fourier transform is given by

H(f) = ∫_{-∞}^{∞} h(t) e^{-2πift} dt    (1)

where f is the frequency and h(t) is the function in the time domain. The discrete form of the same transform is

H_n = Σ_{k=0}^{N-1} h_k e^{-2πikn/N}    (2)

where h_k, k = 0, 1, ..., N - 1, is the time sequence array of length N, and H_n is the discrete Fourier transform coefficient. The discrete and continuous Fourier transforms are related, to a first approximation, by H(f_n) ≈ ∆H_n, where ∆ is the sampling interval in the time domain.
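As an aside (not in the original paper), eq 2 can be checked numerically in a few lines of Python/NumPy; numpy's FFT uses the same sign and scaling convention as eq 2, and the sample signal and sampling interval below are arbitrary.

```python
import numpy as np

N, delta = 128, 0.01                       # number of samples and sampling interval (arbitrary)
t = np.arange(N) * delta
h = np.exp(-t) * np.sin(2 * np.pi * 5.0 * t)

# Eq 2 evaluated directly: H_n = sum_k h_k exp(-2*pi*i*k*n/N)
k = np.arange(N)
H_direct = np.array([np.sum(h * np.exp(-2j * np.pi * k * n / N)) for n in range(N)])

H_fft = np.fft.fft(h)                      # same convention as eq 2
assert np.allclose(H_direct, H_fft)

H_continuous = delta * H_fft               # first approximation to H(f_n), as noted above
```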


The discrete cosine transform of N data points is given by

F_k = √(2/N) ν_k Σ_{j=0}^{N-1} f_j cos[(2j + 1)kπ / (2N)],   k = 0, 1, 2, ..., N - 1    (3)

where ν_k = 1/√2 if k = 0 and ν_k = 1 otherwise, f_j is the discrete time data sequence, and F_k is the kth discrete cosine transform coefficient. Algorithms for computing the discrete cosine transform can be found in other references (Ahmed et al., 1974; Elliott and Rao, 1982; Rao and Yip, 1990).
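A direct (if slow) Python/NumPy transcription of eq 3 and its inverse is given below purely for illustration; because the cosine basis is orthonormal, the round trip is exact before any coefficients are discarded. The function names and the random test vector are ours, not the paper's.

```python
import numpy as np

def dct_eq3(f):
    """Forward transform exactly as written in eq 3 (orthonormal DCT-II)."""
    N = len(f)
    j = np.arange(N)
    F = np.empty(N)
    for k in range(N):
        nu = 1.0 / np.sqrt(2.0) if k == 0 else 1.0
        F[k] = np.sqrt(2.0 / N) * nu * np.sum(f * np.cos((2 * j + 1) * k * np.pi / (2 * N)))
    return F

def idct_eq3(F):
    """Inverse: synthesis with the same orthonormal cosine basis."""
    N = len(F)
    k = np.arange(N)
    nu = np.where(k == 0, 1.0 / np.sqrt(2.0), 1.0)
    f = np.empty(N)
    for j in range(N):
        f[j] = np.sqrt(2.0 / N) * np.sum(nu * F * np.cos((2 * j + 1) * k * np.pi / (2 * N)))
    return f

x = np.random.randn(64)
assert np.allclose(idct_eq3(dct_eq3(x)), x)   # lossless before thresholding
```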

The integral wavelet transform can be written as

f̂(a,b) = |a|^{-1/2} ∫_{-∞}^{∞} f(t) ψ((t - b)/a) dt    (4)

where ψ(t) is a wavelet function, known as the mother wavelet. If the wavelet transform f̂(a,b) is evaluated at the position b = k/2^j and with dilation a = 2^{-j}, and if the wavelet dilations and translations ψ_{j,k}(t) = 2^{j/2} ψ(2^j t - k) are orthonormal, then the wavelet coefficients are

c_{j,k} = f̂(1/2^j, k/2^j)    (5)

where j and k are integers. Most functions and discrete data sets can be expressed in terms of a doubly infinite series as follows:

f(t) = Σ_{j,k=-∞}^{∞} c_{j,k} ψ_{j,k}(t)    (6)

For a discrete data set, and an orthonormal wavelet basis, the wavelet transform coefficients are given by

c_{j,k} = 2^{j/2} Σ_{l=-∞}^{∞} f(l) ψ(2^j l - k)    (7)

where f(l) is the lth element of the discrete time data sequence (Chui, 1992). In this work we extensively use Daubechies’ wavelets (Daubechies, 1988, 1992) which are orthonormal and have compact support. They can be expressed in terms of finite impulse response (FIR) filter coefficients, which simplifies the calculation of the wavelet transform coefficients and its inverse and allows for fast computation. There are several ways of taking the discrete wavelet transform. The most computationally efficient way of finding the discrete wavelet transform is by using multiresolution analysis (Mallat, 1989; Strang, 1989). Daubechies’ wavelets and the multiresolution analysis form of the discrete wavelet transform are used in this paper. The multiresolution analysis output from a signal can be represented as a gray-level image on the time-frequency plane where dark areas represent coefficients with large magnitude and white areas represent zero coefficients (Taswell, 1993), or in three-dimensional space, where the three axes are time, frequency, and coefficient magnitude. Figure 3a shows the function

Figure 3. Discrete wavelet transform and power spectrum of a signal whose frequency varies with time. (a) The signal is sin(50πt³); (b) frequency spectrum of the signal; (c) multiresolution analysis representation of the wavelet transform.

sin(50πt³) sampled at 128 discrete points over the time interval t ∈ [0, 1]. In contrast to the Fourier and cosine transforms, which can only resolve a function in terms of its frequency components, the wavelet transform simultaneously resolves a set of data into its time and frequency (translation and dilation) components. Thus, short-lived high-frequency components of the data can be distinguished and, if necessary, separated from slower temporal trends. The frequency (Fourier) spectrum, in Figure 3b, shows information about what frequencies are present in the entire signal, but tells one nothing of where the frequencies occur. In contrast, the discrete wavelet transform of the signal, represented on a time-frequency plane using multiresolution analysis in Figure 3c, gives information on the magnitude of frequencies over discrete time intervals. Each discrete wavelet transform coefficient is represented within a rectangular time-frequency box, known as a tile, of area ∆t∆f = constant. Information about low frequencies is given over a longer period of time, corresponding to a wide, short time-frequency tile. Information at high frequencies, while blurred over a larger frequency range, is given over a smaller time interval, corresponding to a narrow, tall tile.

Once the discrete transform has been calculated, compression is achieved by rounding the smallest coefficients to zero until either a desired compression ratio is reached or the accuracy of the reconstructed signal has exceeded a desired bound. The frequent occurrence of the zero transform coefficients can be taken advantage of, for example, by storing the non-zero coefficients in the form {sign(w_k)(k + |w_k|/|w_k|_∞)} (Kantor, 1993), where w_k are the transform coefficients that have been sorted in descending order. The non-zero coefficients, in this form, must be stored together with the length of the original vector and |w_k|_∞, the maximum transform coefficient. The compression ratio is then given by

compression ratio = (length of original vector) / (number of non-zero coefficients + 2)    (8)
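The thresholding step and the bookkeeping of eq 8 can be sketched with the PyWavelets package (an assumption on our part; the authors used their own Matlab routines). Here 'db5' stands in for a Daubechies wavelet of order 5, and the number of retained coefficients is the tuning knob.

```python
import numpy as np
import pywt  # PyWavelets, assumed available; not the toolbox used in the paper

def wavelet_compress(signal, n_keep, wavelet="db5"):
    """Keep (at least) the n_keep largest-magnitude wavelet coefficients, zero the
    rest, and report the compression ratio of eq 8. Boundary handling makes the
    total coefficient count slightly larger than the signal length."""
    coeffs = pywt.wavedec(signal, wavelet)
    flat = np.concatenate(coeffs)
    thresh = np.sort(np.abs(flat))[-n_keep] if n_keep < len(flat) else 0.0
    coeffs = [np.where(np.abs(c) >= thresh, c, 0.0) for c in coeffs]
    nonzero = sum(int(np.count_nonzero(c)) for c in coeffs)
    ratio = len(signal) / (nonzero + 2)                     # eq 8
    reconstructed = pywt.waverec(coeffs, wavelet)[: len(signal)]
    return reconstructed, ratio
```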

2.3. Quantization. Quantization is similar to rounding numbers to a desired level of accuracy, but it is not


necessarily restricted to rounding to the nearest whole number or to a decimal place. In the scalar case the real line is divided up into line segments known as cells. The cells are smaller (closer together) in the region of the real line where a measurement is most likely to occur. Outside of this region the cells can be made larger without a significant loss in accuracy. Scalar quantization can be extended to vector quantization where, instead of dividing the real line into line segments, the R^k space is divided into smaller regions of dimension k (Gersho and Gray, 1992; Linde et al., 1980). For example, in two dimensions a cell becomes a polygon and in three dimensions a polyhedron. Each cell, i, is assigned a single value (vector), y_i, which is the midpoint of that cell. The collection of values (vectors), y_i, is known as the codebook. A data point, or vector of points, that lies within cell i is approximated by the value (vector) y_i. Quantization is lossy in that a quantized signal cannot be perfectly reconstructed or inverted to give the exact original signal. The advantage is much higher compression ratios.

Vector quantization was implemented by taking sequential blocks of data of a single variable, of the same dimension as the codebook elements, and quantizing the blocks. For example, if a three-dimensional codebook was used, blocks of three data points were quantized sequentially. This approach was used here since the motivation for compressing the data is primarily to decrease transmission time of individual measurements. To calculate the codebook, with a given number of elements, N, and dimension, k, an iterative procedure must be used (Gersho and Gray, 1992). The iteration is generally based on a subset of the entire data set, known as training data. While the optimality of the codebook can be guaranteed for the compression of the training data, it cannot be guaranteed for the entire data set to be compressed. The ability of the vector quantizer to compress data effectively depends on the training data that are used to calculate the codebook.

Once a decision has been made about the number of elements N in the codebook and the dimension k of each element, the compression ratio can be calculated on the basis of the number of bits required to represent the signal before and after quantization. To calculate the memory required after the signal has been quantized, we have assumed it has been coded. The idea of coding is to assign a binary code to the codebook indices i. Numerals from another base, for example hexadecimal, could be used here also. The theoretical limit to the average number of bits needed to represent a given set of characters is given by the information entropy function (Held and Marshall, 1991), measured in bits:

H = -Σ_{i=1}^{N} P_i log2 P_i    (9)

where P_i is the probability of the ith codebook element occurring and N is the number of cells (quantization intervals). Since the P_i's are probabilities, Σ_i P_i = 1. In general, each of the elements of a codebook has an approximately equal probability of occurring, so that P_i = 1/N in eq 9. For this reason coding the index with a binary number of constant bitlength is good enough. Then, from eq 9, the number of bits required to code the indices, i = 1, 2, 3, ..., N, from the codebook is log2(N) rounded up to the nearest integer. A 64-bit computer requires k × 64 bits of memory to represent each group of data to be quantized by a k-dimensional codebook. The compression ratio is then

compression ratio = (k × 64) / log2(N)    (10)

since the memory required to store the codebook (= 64Nk bits) can be neglected for large data sets. For example, a three-dimensional codebook containing 128 elements has a compression ratio of 3 × 64/log2(128) = 27.43.

3. Results

Two groups of plant data were used to test the following data compression methods: piecewise linear functional approximation, vector quantization, and data transforms (Fourier, cosine, and wavelet). The first set of data is flow rate measurements from one of Amoco's depropanizing distillation columns (MacFarlane, 1993). The second set of data is from duPont's Falcon project (Moser, 1994). Two sets of data from the first group and two from the second group of data are shown in Figure 4.

Figure 4. Amoco and duPont data sets.

The % relative global error is used in the evaluation of the effectiveness of the compression algorithms. It is calculated as the ratio of the L2 norm of the difference between the original and reconstructed signal and the L2 norm of the original signal:

% relative global error = 100 × √[Σ_i (f_i - f̂_i)²] / √[Σ_i (f_i)²]    (11)

where f_i denotes the ith element of the original signal and f̂_i is the ith element of the reconstructed signal. The % relative maximum error, which gives a localized measure of the error, is based on the absolute value of the maximum difference between the original and reconstructed signal and is expressed as a fraction of the L∞ norm of the signal:

% relative maximum error = 100 × max_i(|f_i - f̂_i|) / max_i(|f_i|)    (12)
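For reference, eqs 11 and 12 amount to the following two Python/NumPy helpers (our own naming, not from the paper):

```python
import numpy as np

def relative_global_error(f, f_hat):
    """Eq 11: L2 norm of the reconstruction error relative to the L2 norm of the signal, in percent."""
    return 100.0 * np.linalg.norm(f - f_hat) / np.linalg.norm(f)

def relative_maximum_error(f, f_hat):
    """Eq 12: largest pointwise error relative to the largest signal magnitude, in percent."""
    return 100.0 * np.max(np.abs(f - f_hat)) / np.max(np.abs(f))
```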


Figure 5. Comparison of piecewise-linear compression algorithms: -·-, boxcar; - -, backward slope; ---, boxcar and backward slope; —, swinging door.

3.1. Piecewise Linear Compression. For the piecewise linear case, the relation between compression ratio and error was found by varying the recording limit, making the linear approximations, and comparing the reconstructed signal with the original. The results are shown in Figure 5. Note that the recording limit depends on the range of the signal. For example, the reboiler flow has a range of values from 5.9 to 8.1 and the recording limit was varied from 0.1 to 0.5, whereas the feed flow has a range of 3500-5500 so its recording limit was considerably larger. Figure 5 shows that the performances of the piecewise linear methods are comparable, an expected result considering that the same interpolating function is used to reconstruct the signals, although the polynomials used to establish the recording limit vary between zeroth and first order. In terms of the local error, it was found that the swinging door algorithm compressed the data most effectively (Watson, 1996).

3.2. Vector Quantizer Compression. The Amoco and Falcon data sets were quantized with codebooks of various dimensions and sizes. The dimension was varied between 1 (scalar) and 6, for codebooks of size 16, 32, and 64. For codebooks containing 128 elements the dimension was varied from 1 to 4, and for codebooks with 256 elements, the dimension was 1 and 2. The Amoco data sets are relatively small and the entire set is used as the training data in the calculation of each codebook. The duPont data sets are much larger and, to simulate what would be done in a practical setting, training data sets are selected. This also reduces the computational load. An initial codebook was found iteratively using the pairwise nearest neighbor algorithm (Gersho and Gray, 1992), and the initial codebook was then trained using the LBG algorithm named after Linde, Buzo, and Gray (1980). The training ratio (the ratio of the number of training vectors to the number of codebook vectors) was 8. Figure 6 shows the effect of codebook size and dimension on vector quantization compression performance for the Amoco and Falcon data. The general trend is clear: the most effective codebook has a high dimension and a large number of elements.

3.3. Transform Compression. Figure 7 shows the % relative global error as a function of compression ratio for the wavelet, discrete cosine, and Fourier transform.

Figure 6. Vector quantization of data: ●, scalar codebook; +, two-dimensional codebook; -·-, three-dimensional codebook; - - -, four-dimensional codebook; - -, five-dimensional codebook; —, six-dimensional codebook. The left-most point of each curve represents the design with the largest number of code elements.
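To make the codebook design of section 3.2 and the compression ratio of eq 10 concrete, the sketch below trains a codebook on sequential blocks of a single variable with a plain k-means (Lloyd) iteration, which is the core of the LBG algorithm. The pairwise-nearest-neighbor seeding used in the paper is replaced by random seeding purely to keep the example short, so the names, seeding, and iteration count are our own assumptions.

```python
import numpy as np

def train_codebook(training, N, k, iters=50, seed=0):
    """Train an N-element, k-dimensional codebook on sequential blocks of the
    training signal (a simplified LBG/k-means iteration, randomly seeded)."""
    blocks = training[: len(training) // k * k].reshape(-1, k).astype(float)
    rng = np.random.default_rng(seed)
    codebook = blocks[rng.choice(len(blocks), N, replace=False)].copy()
    for _ in range(iters):
        dist = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        idx = dist.argmin(axis=1)                       # nearest code vector for each block
        for i in range(N):
            if np.any(idx == i):
                codebook[i] = blocks[idx == i].mean(axis=0)   # move to cell centroid
    return codebook

def quantize(signal, codebook):
    """Only the codebook indices need to be stored or transmitted."""
    k = codebook.shape[1]
    blocks = signal[: len(signal) // k * k].reshape(-1, k).astype(float)
    dist = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Compression ratio of eq 10 for a six-dimensional, 64-element design on a 64-bit machine:
N, k = 64, 6
ratio = k * 64 / np.log2(N)        # = 64
```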

Figure 7. Comparison of transforms' effectiveness in compressing data: —, Daubechies' fifth-order wavelet; - -, Fourier transform; ---, cosine transform.

The curves were generated by taking the transform of the entire signal and varying the number of the transform coefficients that are considered small and are thus set to zero. The compression ratio is varied by changing the number of discarded coefficients, and the reconstructed signal is compared with the original to calculate the relative global error. The wavelet transform of the signal was taken using Daubechies' compactly supported orthonormal wavelets (Daubechies, 1988). Information entropy was used to evaluate the order of the wavelet that would compress the signal most effectively, although, in general, all the different wavelets perform equally effectively. In fact, a set of 20 wavelets (Taswell, 1993) was tested, and it was found that there was no significant variation in the compression performance (Watson, 1996). From Figure 7 it is obvious that the wavelet and cosine transforms have performed better than the Fourier transform. It is well-known that in general the discrete cosine transform is better at compressing data than the discrete Fourier transform (Rao and Yip, 1990). It might be considered surprising that the cosine transform did as well as the wavelet transform, but it should be noted that the comparison is made on

the basis of global error and not the local error. Comments on the smaller local error achieved by the wavelet transform compression are presented in the Discussion.

Figure 8. Overall comparison of the compression algorithms: —, wavelet; -·-, discrete cosine transform; ---, vector quantization; - -, swinging door.

4. Discussion

To facilitate the comparison of the compression techniques, only the best from each family of algorithms are considered in this section. Among the linear methods, the swinging door performed the best. From the transform methods we select, for this comparison, the wavelet transform and its closest competitor, the cosine transform. Highlights of the Results section are shown in Figure 8. Clearly the swinging door algorithm does not compress data as effectively as the other two compression algorithms. However, the advantage of using a piecewise linear method is its simplicity. The recording limit can be matched to the intrinsic noise level of a measurement so that only significant changes and trends are recorded. Many process variables have long periods of steady-state or pseudo-steady-state behavior, and piecewise linear functions are well suited to describing this type of behavior. Reconstruction of the compressed data is simply a matter of drawing straight lines between recorded measurements and need not involve a computer if hard copies of the recorded measurements are available. Insight can be gained from the compressed data without having to first decode the data. The disadvantages are the date-time tag required for each recording and the varying time step between each measurement. If each measurement has its own date-time tag, much of the compression is lost. Also, the variable time step can be problematic for tasks such as model identification.

The performance of a vector quantizer is strongly dependent on the dimension and the size of the codebook. To compress data most effectively, large codebooks of a high dimension are required. However, the design of a codebook, even for a small set of training data, is extremely time consuming. For example, if the codebook is to contain 64 one-dimensional elements, and a training ratio of 8 is used, the training data consist of 512 elements. The first exhaustive search for the nearest neighbor pair (see section 2.3) consists of Σ_{n=1}^{511} n comparisons, the second Σ_{n=1}^{510} n comparisons, and so on. Finding the 64 nearest neighbors requires Σ_{k=64}^{511} Σ_{n=1}^{k} n = 22 325 856 comparisons. Only the six-dimensional vector quantization results are shown in Figure 8, since this gave the lowest reconstruction errors for a given compression ratio. The codebook sizes shown are 16, 32, and 64, which correspond to decreasing compression ratios and data reconstruction errors. Designing a codebook of this nature for a single process variable requires an unreasonably large amount of computer time, especially if the data set is large. For example, the design of a six-dimensional codebook containing 64 elements for data sets dP1 or dP2 takes several hours of CPU time on an IBM Powerstation 320H workstation, using Matlab algorithms. The designing of a separate codebook for every measured variable of a large processing plant is simply infeasible.

The justification for this computational effort is that once the codebook has been calculated, it can be used to quantize an infinite length of data. But the quantization is only effective as long as the data are similar to the training data. If the training data used to design the codebook are not selected appropriately, a suboptimal quantizer results. A case in point is the failure to select enough training data to cover the entire range of measurements that can be expected from the measuring device. Choosing appropriate training data requires either a lot of data, and hence a lot of computer time, or engineering judgment. Neither of these requirements is satisfactory for most problems.

Figure 8 shows that the quantizer compresses data more effectively than wavelet transforms when the codebook design is based on the entire data set, as was the case for the Amoco flow rate data. In a typical application the length of the data set to be quantized is considerably longer than that of the Amoco flow rate data used in this study, and thus the training data used to design the codebook is a fraction of the total data available.

Vector quantization is not a global method. That is, a codebook that compresses one data set effectively may not compress another data set very well. Table 1 illustrates this point. The codebook of one data set was used to compress another data set with a similar range. Codebook (a) is the original codebook intended for Falcon data set 1 (dP1). The compression ratio that corresponds to an error of 0.2% is found, and codebook (b), from another similar data set, is used to compress data set 1 to the same compression ratio. The error values are listed in Table 1. Clearly, as the dimension of codebook (b) increases, the error of the quantization increases. If the codebook from a data set with a vastly different range was used instead, the relative global error would be considerably worse. Ideally, a separate codebook must be calculated for each process data set that is to be compressed.

Table 1. Effect of Using a Different Codebook on the Reconstruction Error of Data dP1

                                              relative global error (%)
codebook dimension    compression ratio    codebook (a)    codebook (b)
        1                    10                0.2             0.3
        2                    25                0.2             0.5
        3                    45                0.2             0.8
        4                    50                0.2             0.8
        5                    70                0.2             1.0
        6                    80                0.2             1.0

The calculation of the compression ratio for a technique based on functional approximation is independent


Figure 9. Falcon Data Set 3 (top) and difference between reconstructed and original data for vector quantization (VQ), discrete cosine transform (DCT), and discrete wavelet transform (DWT). Note that the DWT scale has been magnified 10 times.

of the number of bits required by the computer to store each measurement. However, the calculation of the compression ratio for vector quantization, eq 10, is directly proportional to the computer's bitlength. A lower bitlength shifts the plots of compression ratio versus error, in Figures 6 and 8, to the left, thereby lowering the compression performance of the quantizer.

The trends that are of interest in the process industries are usually dynamic in nature and should be reconstructed accurately from the compressed data. Data that contain long periods of steady-state behavior with intermittent regions of dynamic behavior are not compressed very effectively with a vector quantizer. It is not always possible to capture all of the dynamic characteristics of the data since vector quantization is essentially an averaging process. That is to say, if the steady-state behavior is more prevalent in the data, the vector quantizer gives more weight to reconstructing the steady-state part of the data accurately than to the transient or intermittent dynamic behavior. To illustrate the point, Figure 9 shows a data set, also from duPont's Falcon project, that contains long periods of steady-state behavior. The vector quantizer incurs large errors around the points where there is a large step change, whereas the wavelet-based technique shows almost no difference between the original and compressed signal around the step changes.

From Figure 8, it might appear, at first sight, that using wavelets to compress data has no significant advantage over the discrete cosine transform. It shows that the discrete cosine transform compresses the data as effectively as the discrete wavelet transform and in some cases more effectively. However, wavelets are well suited to describe long periods of steady-state behavior followed by abrupt changes because of their ability to resolve a signal simultaneously into its time and frequency components. Sinusoidal bases tend to introduce large errors in the region of the abrupt change when some of the coefficients are thresholded. This was illustrated in Figure 9, where the difference between the reconstructed and original signal is largest around the step changes. This is due to the well-known Gibbs phenomenon. To illustrate the point, we computed the discrete cosine transforms of a step function and a discrete time impulse and truncated the number of coefficients that are used in the reconstruction series.

Figure 10. Gibb’s phenomena for third-order Daubechies’ wavelet and sinusoidal basis. (a) step function reconstructed from truncated wavelet series; (b) step reconstructed from truncated discrete cosine series; (c) discrete time impulse reconstructed from truncated wavelet series; (d) discrete time impulse reconstructed from truncated discrete cosine series.

Figure 11. Comparison of transforms' effectiveness in compressing data: —, wavelet transform; ---, Fourier transform; - -, cosine transform.

In Figure 10, two signals, a step and a spike, each consisting of 200 points, have been reconstructed from truncated wavelet and cosine series. Of the 200 series coefficients, 166 were set to zero for the step function, and 181 were set to zero for the spike. It can be seen in Figure 10 that wavelets describe a step function more accurately, even though the same number of coefficients are thresholded. Notice also that the maximum value of the discrete time impulse function, Figure 10c,d, is considerably less than 1 when it is reconstructed from a truncated cosine series, whereas the truncated wavelet series preserves the original magnitude of the spike.

The comparisons have, up until now, been based entirely on the relative global error defined in eq 11, which is based on the L2 norm. If the maximum relative error is used as a basis of comparison instead, the picture is more complete. Figure 11 shows the % relative maximum error, eq 12, as a function of compression ratio. Clearly, the wavelet transform has a consistently lower relative maximum error. As was


shown in Figure 10, the error is largest for a truncated sinusoidal series when there is a sudden change, such as a step or spike. Wavelets, however, are better suited to describe sudden changes because the reconstruction series combines dilated and translated versions of the mother wavelet. These two factors allow a sudden, time-localized change to be captured in a smaller number of larger wavelet coefficients.

There are some issues that remain for future work. Two of them are discussed here. Firstly, non-zero transform coefficients can be quantized and encoded prior to storage to increase the compression ratio. The effect of this on the error of reconstruction needs to be evaluated. Secondly, it would be of interest to compare the effectiveness of the wavelet transform with the Karhunen-Loeve transform. The Karhunen-Loeve transform is an optimal data transform, although there is no fast algorithm and the set of basis functions, or vectors, is determined on a case by case basis (Rao and Yip, 1990).

5. Conclusions

In this paper, several ways in which process data can be compressed were described. The different compression methods were applied to long sets of real plant data, and comparisons were made. Compressing data by transforming the data and thresholding the insignificant transform coefficients is the most effective means of compressing large sets of industrial data.

A piecewise linear approach in general does not compress data as effectively as data transforms. The compression ratios are lower for a given error due to the date-time tag associated with each recording as necessitated by the variable time step. The use of a vector quantizer to compress plant data is impractical because of the amount of time it takes to calculate the codebook for each data set and the inability of the vector quantizer to use a codebook for different data sets.

The wavelet compression algorithm is simple and fast and can be applied to any data set. The wavelet transform compresses data as effectively as the discrete cosine transform when the L2 norm is used to quantify the reconstruction error. The wavelet transform is superior at reconstructing sudden changes in the measured data (steps, spikes, etc.). This is important for the process control engineer who needs to carry out model identification and investigate controller tuning. Furthermore, computational efficiency is improved when orthonormal wavelets are used. It takes less time to compute the discrete wavelet transform than the discrete Fourier or cosine transform.

Acknowledgment

The authors thank Amoco and duPont for supplying the data.

Literature Cited

Ahmed, N.; Natarajan, T.; Rao, K. R. Discrete cosine transform. IEEE Trans. Comput. 1974, C-23, 90-93.
Bakshi, B. R.; Stephanopoulos, G. Representation of process trends. III. Multiscale extraction of trends from process data. Comput. Chem. Eng. 1994a, 18 (4), 267-302.
Bakshi, B. R.; Stephanopoulos, G. Representation of process trends. IV. Induction of real-time patterns from operating data for diagnosis and supervisory control. Comput. Chem. Eng. 1994b, 18 (4), 303-332.
Bakshi, B. R.; Stephanopoulos, G. Compression of chemical process data through functional approximation and feature extraction. AIChE J. 1996, 42 (2), 477-492.
Bristol, E. H. Swinging door trending: Adaptive trend recording? Advances in Instrumentation and Control; Instrument Society of America: Research Triangle Park, NC, 1990; Vol. 45, pp 749-754.
Chui, C. K. An Introduction to Wavelets; Academic Press: San Diego, 1992.
Daubechies, I. Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 1988, 41, 909-996.
Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, 1992.
Elliott, D. F.; Rao, K. R. Fast Transforms: Algorithms, Analyses, Applications; Academic Press: New York, 1982.
Feehs, R. J.; Arce, G. R. Vector quantization for data compression of trend recordings. Technical Report Udel-EE 88-11-1; University of Delaware: Newark, DE, 1988.
Gersho, A.; Gray, R. M. Vector Quantization and Signal Compression; Kluwer Academic Publishers: Boston, 1992.
Hale, J.; Sellars, H. Historical data recording for process computers. Chem. Eng. Prog. 1981, 77 (11), 38-43.
Held, G.; Marshall, T. Data Compression: Techniques and Applications: Hardware and Software Considerations; Wiley: New York, 1991.
Hu, T. W.; Arce, G. R. Application of subband decomposition to process control data. Technical Report Udel-EE 88-12-1; University of Delaware: Newark, DE, 1988.
Kantor, J. C. Wavelet Toolbox Reference, Version 1.1; University of Notre Dame: Notre Dame, IN, 1993.
Linde, Y.; Buzo, A.; Gray, R. An algorithm for vector quantizer design. IEEE Trans. Commun. 1980, COM-28, 84-95.
MacFarlane, R. Amoco Corporation, personal communication, 1993.
Mallat, S. G. Multiresolution approximations and wavelet orthonormal bases of L2(R). Trans. Am. Math. Soc. 1989, 315 (1), 69-87.
Moser, A. R. E. I. du Pont de Nemours and Company, personal communication, 1994.
Rao, K. R.; Yip, P. Discrete Cosine Transform: Algorithms, Advantages, Applications; Academic Press: Boston, 1990.
Strang, G. Wavelets and dilation equations: A brief introduction. SIAM Rev. 1989, 31 (4), 614-627.
Taswell, C. WavBox 3: Wavelet Toolbox for Matlab; Stanford University: Stanford, CA, 1993.
Watson, M. J. Wavelet Techniques in Process Data Compression. Master's Thesis, Lehigh University, Bethlehem, PA, 1996.
Watson, M. J.; Liakopoulos, A.; Brzakovic, D.; Georgakis, C. Wavelet techniques in data compression and dynamic model identification. Research Progress Report 19; Chemical Process Modeling and Control Research Center, Lehigh University: Bethlehem, PA, 1994.

Received for review June 2, 1997
Revised manuscript received October 15, 1997
Accepted October 17, 1997

Abstract published in Advance ACS Abstracts, December 15, 1997.

IE970401W